---

license: apache-2.0
language:
  - ru
  - en
tags:
  - automatic-speech-recognition
  - speaker-diarization
  - named-entity-recognition
  - text-summarization
  - onnx
  - russian
  - english
  - asr
  - gigaam
  - whisper
  - 3d-speaker
  - camplus
  - eres2net
  - slovnet
  - natasha
  - navec
  - mobile
  - offline
library_name: onnx
---


# ProtocolVoice models

Offline models for the [ProtocolVoice](https://github.com/conwerter1/protocolvoice) Android app — voice transcription, speaker diarization, and on-device interview summarization.

All models run **on-device**; no cloud calls are made.

## Contents

### Russian ASR

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `gigaam_v3_e2e_ctc_int8.onnx` | 305 MB | Russian ASR with built-in punctuation | [Sber/SaluteDevices GigaAM](https://github.com/salute-developers/GigaAM) (v3, e2e CTC, int8-quantized) | MIT |

### English ASR

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `en/whisper_base_en_encoder_int8.onnx` | 28 MB | Whisper base.en encoder | [openai/whisper](https://github.com/openai/whisper) via [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) | MIT |
| `en/whisper_base_en_decoder_int8.onnx` | 125 MB | Whisper base.en decoder | OpenAI Whisper via sherpa-onnx | MIT |
| `en/whisper_base_en_tokens.txt` | 0.8 MB | Whisper tokens vocab | OpenAI Whisper | MIT |

### Speaker diarization (works for any language)

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `speaker_embedding_camplus.onnx` | 27 MB | Speaker embedding (CAM++) — recommended default | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding.onnx` | 111 MB | Speaker embedding (ERes2Net V1) — best quality | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding_v2.onnx` | 68 MB | Speaker embedding (ERes2NetV2) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
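
These embedding models map each speech segment to a fixed-length vector; diarization then groups segments by clustering those vectors. As a rough illustration of that clustering step (not the app's actual algorithm — the greedy strategy and the `threshold` value here are illustrative assumptions, and real pipelines often use agglomerative clustering instead):

```python
import numpy as np

def cluster_segments(embeddings: np.ndarray, threshold: float = 0.6) -> list[int]:
    """Greedy cosine-similarity clustering: assign each segment to the first
    existing speaker whose centroid is similar enough, else open a new speaker."""
    # L2-normalize so a dot product equals cosine similarity
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids: list[np.ndarray] = []  # running sums of member embeddings
    labels: list[int] = []
    for emb in embeddings:
        sims = [float(emb @ c) / float(np.linalg.norm(c)) for c in centroids]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            centroids[k] = centroids[k] + emb
        else:
            k = len(centroids)
            centroids.append(emb.copy())
        labels.append(k)
    return labels
```

With two clearly separated groups of vectors, the first two segments land on speaker 0 and the last two on speaker 1.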

### Russian summarization (Default tier — NER-based, no LLM)

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| `summary/navec_news.tar` | 25 MB | Navec quantized word embeddings (250K Russian words, 300-dim, PQ-100) | [natasha/navec](https://github.com/natasha/navec) | MIT |
| `summary/slovnet_ner.tar` | 2.3 MB | Slovnet NER weights (WordCNN + CRF, PER/LOC/ORG) | [natasha/slovnet](https://github.com/natasha/slovnet) | MIT |

These two files together (~27 MB total) enable offline Russian named-entity recognition plus LexRank-based extractive summarization. ProtocolVoice uses them to extract names, organizations, locations, and key quotes from interview transcripts. No LLM is required — the extraction is fully deterministic and factual.
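
The LexRank idea behind the extractive step can be sketched in a few lines: build a sentence-similarity graph, then rank sentences by random-walk centrality over it. This is a minimal stdlib illustration of the algorithm family, not the app's Kotlin implementation:

```python
import math
import re
from collections import Counter

def lexrank(sentences: list[str], damping: float = 0.85, iters: int = 50) -> list[float]:
    """Score sentences by centrality over a cosine-similarity graph (LexRank)."""
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]

    def cosine(a: Counter, b: Counter) -> float:
        num = sum(a[w] * b[w] for w in a.keys() & b.keys())
        den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    n = len(sentences)
    # similarity graph, ignoring self-similarity
    sim = [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    for row in sim:  # row-normalize into transition probabilities
        s = sum(row)
        if s:
            row[:] = [v / s for v in row]
    scores = [1.0 / n] * n
    for _ in range(iters):  # power iteration with damping, as in PageRank
        scores = [(1 - damping) / n + damping * sum(sim[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores
```

Sentences that share vocabulary with many others score highest and become the extracted "key quotes"; an isolated sentence falls to the teleport floor.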

### Manifest

| File | Size | Purpose |
|---|---|---|
| `manifest.json` | < 2 KB | SHA-256 hashes and metadata for all models |

## Important β€” attribution

These are NOT new models — this repository **redistributes existing models** in formats convenient for mobile delivery. The original authors retain all credit and copyright. We did not train, fine-tune, or modify the model weights.

**Please cite the original projects, not this redistribution:**

- **GigaAM-v3** (Russian ASR): Sber AI, SaluteDevices — https://github.com/salute-developers/GigaAM
- **Whisper** (English ASR): OpenAI — https://github.com/openai/whisper
- **3D-Speaker** (CAM++, ERes2Net, ERes2NetV2): ModelScope, Alibaba — https://github.com/modelscope/3D-Speaker
- **Slovnet NER + Navec**: Natasha project, Alexander Kukushkin — https://github.com/natasha/slovnet, https://github.com/natasha/navec
- **sherpa-onnx** (ONNX runtime): Next-gen Kaldi (k2-fsa) — https://github.com/k2-fsa/sherpa-onnx

## Why this redistribution

The ProtocolVoice mobile app needs to download these models on first run from a mirror that:
- supports files larger than 100 MB without Git LFS limits,
- has a fast CDN reachable from Russia,
- is the conventional hosting platform for ML models.

All redistributed files retain their original licenses. This README serves as the required attribution under those licenses.

## How the app uses these models

ASR + diarization (loaded via [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)):
1. App downloads the `.onnx` files from `https://huggingface.co/protocolvoice/asr-models/resolve/main/`
2. Verifies SHA-256 against `manifest.json`
3. Loads via sherpa-onnx for offline inference

Summarization (Default tier, custom Kotlin port):
1. App downloads `summary/navec_news.tar` and `summary/slovnet_ner.tar`
2. Extracts both `.tar` archives into the app's private files directory
3. Loads weights into a pure-Kotlin reimplementation of Slovnet NER (no PyTorch, no Python — just FloatArray math): WordEmbedding → ShapeEmbedding → 3-layer Conv1D → Linear → CRF Viterbi
4. Combines NER output with TF-IDF + LexRank to extract top quotes, named entities, risks, and numerical data
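
The CRF decode at the end of step 3 is a standard Viterbi pass over per-token tag scores plus a learned tag-transition matrix. A minimal sketch of that pass (the emission and transition values here are illustrative placeholders, not the real Slovnet weights):

```python
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """Decode the highest-scoring tag sequence for a linear-chain CRF.
    emissions:   (seq_len, num_tags) per-token tag scores
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()  # best score of a path ending in each tag
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # candidate[i, j] = best path ending in tag i, then transition i -> j
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # follow back-pointers from the best final tag
    tags = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        tags.append(int(backptr[t, tags[-1]]))
    return tags[::-1]
```

With zero transition scores the decode just follows the per-token argmax; a transition matrix that penalizes tag changes keeps entity spans contiguous, which is exactly why a CRF layer sits on top of the Conv1D scores.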

Inference performance on Xiaomi 12T: ~6 seconds for a 17,900-word transcript (default tier, NER + LexRank, no LLM).

You can also use these files directly with the upstream libraries (sherpa-onnx, slovnet, navec) in any project that respects the original licenses.

## Verifying integrity

```python
import hashlib

with open("gigaam_v3_e2e_ctc_int8.onnx", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())
# expected: 0aacb41f70f0f5aaac4b45dd430337b9e16b180f22c72af04db8516e7609c3c0
```

Hashes for all files are in `manifest.json`.
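
To check every file in one pass, a loop over the manifest works; note that the schema assumed below (`{"files": [{"name": ..., "sha256": ...}]}`) is a guess — adjust the key names to whatever `manifest.json` actually uses:

```python
import hashlib
import json
from pathlib import Path

def verify_against_manifest(manifest_path: str) -> dict[str, bool]:
    """Return {file name: hash-matches} for every entry in the manifest.
    Assumes entries like {"name": "model.onnx", "sha256": "..."} under "files"."""
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text())
    results: dict[str, bool] = {}
    for entry in manifest["files"]:
        path = manifest_file.parent / entry["name"]  # files sit next to the manifest
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        results[entry["name"]] = digest == entry["sha256"]
    return results
```

Any `False` in the result means a corrupted or tampered download that should be re-fetched.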

## Optional: Pro tier (QVikhr 1.5B)

ProtocolVoice has an optional **Pro tier** that produces a literary, narrative summary using [QVikhr-2.5-1.5B-Instruct-r](https://huggingface.co/Vikhrmodels/QVikhr-2.5-1.5B-Instruct-r) (1.0 GB GGUF, runs via llama.cpp on-device). The Pro tier is layered on top of the Default tier — Default extracts the facts, Pro turns them into a coherent narrative.

The QVikhr GGUF is **not hosted in this repo** — users download it directly from the Vikhrmodels HF org or from a separate mirror, on demand. The QVikhr authors retain copyright; please cite them, not us.

## License

This repository's metadata, README, and packaging scripts are released under **Apache-2.0**. Each model file remains under its original license (see the tables above). By using a model, you accept its original license — not just this repository's.

## Removal request

If you are an author of one of the upstream projects and have any concerns about this redistribution (attribution, hosting, anything else), please open a discussion on this Hugging Face repo or email the maintainers — the files will be amended or removed as requested.