---

license: apache-2.0
language:
  - ru
  - en
tags:
  - automatic-speech-recognition
  - speaker-diarization
  - named-entity-recognition
  - text-summarization
  - onnx
  - russian
  - english
  - asr
  - gigaam
  - whisper
  - 3d-speaker
  - camplus
  - eres2net
  - slovnet
  - natasha
  - navec
  - mobile
  - offline
library_name: onnx
---


# ProtocolVoice models

Offline models for the [ProtocolVoice](https://github.com/conwerter1/protocolvoice) Android app — voice transcription, speaker diarization, and on-device interview summarization.

All models run **on the device**, with no cloud calls during inference.

## Contents

### Russian ASR

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `gigaam_v3_e2e_ctc_int8.onnx` | 305 MB | Russian ASR with built-in punctuation | [Sber/SaluteDevices GigaAM](https://github.com/salute-developers/GigaAM) (v3, e2e CTC, int8-quantized) | MIT |

### English ASR

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `en/whisper_base_en_encoder_int8.onnx` | 28 MB | Whisper base.en encoder | [openai/whisper](https://github.com/openai/whisper) via [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) | MIT |
| `en/whisper_base_en_decoder_int8.onnx` | 125 MB | Whisper base.en decoder | OpenAI Whisper via sherpa-onnx | MIT |
| `en/whisper_base_en_tokens.txt` | 0.8 MB | Whisper tokens vocab | OpenAI Whisper | MIT |

### Speaker diarization (works for any language)

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `speaker_embedding_camplus.onnx` | 27 MB | Speaker embedding (CAM++) — recommended default | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding.onnx` | 111 MB | Speaker embedding (ERes2Net V1) — best quality | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding_v2.onnx` | 68 MB | Speaker embedding (ERes2NetV2) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |

### Russian summarization (Default tier — NER-based, no LLM)

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `summary/navec_news.tar` | 25 MB | Navec quantized word embeddings (250K Russian words, 300-dim, PQ-100) | [natasha/navec](https://github.com/natasha/navec) | MIT |
| `summary/slovnet_ner.tar` | 2.3 MB | Slovnet NER weights (WordCNN + CRF, PER/LOC/ORG) | [natasha/slovnet](https://github.com/natasha/slovnet) | MIT |

These two files together (28 MB total) enable offline Russian named entity recognition + LexRank-based extractive summarization. ProtocolVoice uses them to extract names, organizations, locations, and key quotes from interview transcripts. No LLM required — fully deterministic, factual extraction.
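
To make the LexRank part concrete, here is a minimal pure-Python sketch of the general technique: sentences become TF-IDF vectors, cosine similarity defines a graph, and a PageRank-style power iteration scores each sentence by centrality. It is an illustration under assumptions, not ProtocolVoice's actual code: the function names, the smoothed IDF formula, the damping factor, and the toy English sentences are all chosen for this example, and tokenization is assumed to have happened already.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf_vectors(sentences: List[List[str]]) -> List[Dict[str, float]]:
    """Turn tokenized sentences into sparse TF-IDF vectors (smoothed IDF)."""
    n = len(sentences)
    doc_freq = Counter(word for sent in sentences for word in set(sent))
    return [
        {w: tf * (math.log((1 + n) / (1 + doc_freq[w])) + 1.0)
         for w, tf in Counter(sent).items()}
        for sent in sentences
    ]

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def lexrank(sentences: List[List[str]], damping: float = 0.85,
            iterations: int = 50) -> List[float]:
    """Score each sentence by its centrality in the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    # Row-normalize similarities into transition probabilities.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):  # power iteration, PageRank-style
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores

# Toy input: the first sentence shares a word with each of the other two.
sentences = [
    ["board", "approved", "budget"],
    ["board", "met", "monday"],
    ["budget", "covers", "hiring"],
]
scores = lexrank(sentences)
top = max(range(len(sentences)), key=scores.__getitem__)
print(top, sentences[top])  # -> 0 ['board', 'approved', 'budget']
```

The sentence that overlaps with the most other sentences ends up with the highest score, which is why this kind of ranking tends to surface representative quotes without any generative model.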

### Manifest

| File | Size | Purpose |
|---|---|---|
| `manifest.json` | < 2 KB | SHA-256 hashes and metadata for all models |

## Important — attribution

These are NOT new models — this repository **redistributes existing models** in formats convenient for mobile delivery. The original authors retain all credit and copyright. We did not train, fine-tune, or modify the model weights.

**Please cite the original projects, not this redistribution:**

- **GigaAM-v3** (Russian ASR): Sber AI, SaluteDevices — https://github.com/salute-developers/GigaAM
- **Whisper** (English ASR): OpenAI — https://github.com/openai/whisper
- **3D-Speaker** (CAM++, ERes2Net, ERes2NetV2): ModelScope, Alibaba — https://github.com/modelscope/3D-Speaker
- **Slovnet NER + Navec**: Natasha project, Alexander Kukushkin — https://github.com/natasha/slovnet, https://github.com/natasha/navec
- **sherpa-onnx** (ONNX runtime): Next-gen Kaldi (k2-fsa) — https://github.com/k2-fsa/sherpa-onnx

## Why this redistribution

The ProtocolVoice mobile app needs to download these models on first run from a mirror that:
- supports files larger than 100 MB without git-lfs limits,
- has a fast CDN reachable from Russia,
- is the conventional hosting platform for ML models.

All redistributed files retain their original licenses. This README serves as the required attribution under those licenses.

## How the app uses these models

ASR + diarization (loaded via [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)):
1. App downloads `.onnx` files from `https://huggingface.co/protocolvoice/asr-models/resolve/main/{filename}`
2. Verifies SHA-256 against `manifest.json`
3. Loads via sherpa-onnx for offline inference
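
For readers who want to reproduce the download and verification steps outside the app, the following Python sketch shows one way to do it. It is an illustration, not the app's Kotlin code: the helper names `model_url`, `sha256_of`, and `fetch_verified` are hypothetical, and because the layout of `manifest.json` is not described here, the expected hash is passed in explicitly rather than parsed from the manifest. Loading the model through sherpa-onnx is not shown.

```python
import hashlib
import urllib.request
from pathlib import Path

# URL pattern taken from step 1 above.
BASE_URL = "https://huggingface.co/protocolvoice/asr-models/resolve/main"

def model_url(filename: str) -> str:
    """Build the download URL for a file in this repository."""
    return f"{BASE_URL}/{filename}"

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large models never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fetch_verified(filename: str, expected_sha256: str, dest_dir: Path) -> Path:
    """Download one file and delete it again if its hash does not match."""
    dest = dest_dir / Path(filename).name
    urllib.request.urlretrieve(model_url(filename), dest)
    if sha256_of(dest) != expected_sha256.lower():
        dest.unlink()
        raise ValueError(f"SHA-256 mismatch for {filename}")
    return dest
```

A call such as `fetch_verified("speaker_embedding_camplus.onnx", expected, Path("."))` would then either return the local path or raise, where `expected` is the hash you looked up in `manifest.json`.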

Summarization (Default tier, custom Kotlin port):
1. App downloads `summary/navec_news.tar` and `summary/slovnet_ner.tar`
2. Extracts both `.tar` archives into the app's private files directory
3. Loads weights into a pure-Kotlin reimplementation of Slovnet NER (no PyTorch, no Python — just FloatArray math): WordEmbedding → ShapeEmbedding → 3-layer Conv1D → Linear → CRF Viterbi
4. Combines NER output with TF-IDF + LexRank to extract top quotes, named entities, risks, and numerical data
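
The last stage of step 3, CRF Viterbi decoding, can be sketched with plain lists in the same spirit as the FloatArray port. This is a textbook version written for illustration, not a transcription of the Kotlin code: the function name, the three-tag toy tag set, and all scores below are assumptions, and the real Slovnet CRF may differ in details such as how sequence boundaries are handled.

```python
from typing import List

def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   : score of tag j at token t (output of the Linear layer)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers: List[List[int]] = []  # best previous tag, per step and tag

    for emission in emissions[1:]:
        next_score, pointers = [], []
        for j in range(num_tags):
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            next_score.append(score[best_prev] + transitions[best_prev][j] + emission[j])
        score = next_score
        backpointers.append(pointers)

    # Walk back from the best final tag to recover the full path.
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path

# Toy example (hypothetical tag set and scores).
tags = ["O", "B-PER", "I-PER"]
transitions = [[0.0, 0.0, -1e4],   # O -> I-PER is effectively forbidden
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[0.1, 2.0, 0.0],
             [0.5, 0.0, 0.6],
             [2.0, 0.0, 0.1]]
print([tags[i] for i in viterbi_decode(emissions, transitions)])
# -> ['B-PER', 'I-PER', 'O']
```

Unlike a per-token argmax, the decoder scores whole sequences, so a strongly negative transition score keeps ill-formed tag sequences (an `I-PER` directly after `O`) out of the result.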

Inference performance on Xiaomi 12T: ~6 seconds for a 17,900-word transcript (default tier, NER + LexRank, no LLM).

You can also use these files directly with the upstream libraries (sherpa-onnx, slovnet, navec) in any project that respects the original licenses.

## Verifying integrity

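```python
import hashlib

# Hash in 1 MiB chunks so the 305 MB model is never fully loaded into memory.
sha256 = hashlib.sha256()
with open("gigaam_v3_e2e_ctc_int8.onnx", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)
print(sha256.hexdigest())
# expected: 0aacb41f70f0f5aaac4b45dd430337b9e16b180f22c72af04db8516e7609c3c0
```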

Hashes for all files are in `manifest.json`.

## Optional: PRO tier (QVikhr 1.5B)

ProtocolVoice has an optional **PRO tier** that produces a literary, narrative summary using [QVikhr-2.5-1.5B-Instruct-r](https://huggingface.co/Vikhrmodels/QVikhr-2.5-1.5B-Instruct-r) (1.0 GB GGUF, runs via llama.cpp on-device). The PRO tier is layered on top of the Default tier — Default extracts facts, PRO turns them into a coherent narrative.

The QVikhr GGUF is **not hosted in this repo** — users download it directly from the Vikhrmodels HF org or from a separate mirror, on demand. The QVikhr authors retain copyright; please cite them, not us.

## License

This repository's metadata, README, and packaging scripts are released under **Apache-2.0**. Each model file remains under its original license (see the tables above). By using a model, you accept its original license — not just this repository's.

## Removal request

If you are an author of one of the upstream projects and have any concerns about this redistribution (attribution, hosting, anything else), please open a discussion on this Hugging Face repo or email the maintainers — the files will be amended or removed as requested.