Update README.md
README.md
CHANGED
|
@@ -18,7 +18,7 @@ pipeline_tag: audio-to-audio
|
|
| 18 |
# Human-1: A Full-Duplex Conversational Model for Hindi
|
| 19 |
**[Try the live demo →](https://ai.joshtalks.com/research/josh1)**
|
| 20 |
|
| 21 |
-
|
| 22 |
|
| 23 |
<p align="center">
|
| 24 |
<img src="hindi_moshi_architecture.svg" alt="Hindi-Moshi Architecture" width="480"/>
|
|
@@ -84,9 +84,9 @@ Measured using Sarvam-1 (2B) on Whisper-v3 transcriptions of generated speech.
|
|
| 84 |
| Temperature | PPL ↓ |
|
| 85 |
|---|---|
|
| 86 |
| Ground-truth | 237.1 |
|
| 87 |
-
|
|
| 88 |
-
|
|
| 89 |
-
|
|
| 90 |
|
| 91 |
### Human Evaluation
|
| 92 |
|
|
@@ -126,7 +126,7 @@ Temperature τ=0.9 produces turn-taking dynamics closest to ground-truth.
|
|
| 126 |
|
| 127 |
## Conversation Style
|
| 128 |
|
| 129 |
-
|
| 130 |
|
| 131 |
After an initial introduction, the model will typically **propose a topic and steer the conversation toward it**, preferring structured discussion over open-ended chitchat. Users can also **introduce their own topic** - the model will pick it up and engage in a focused discussion around it. This is an intentional design choice - the training data consists of real conversations where speakers engage in focused, in-depth discussions on assigned topics.
|
| 132 |
|
|
@@ -135,7 +135,7 @@ This makes the model particularly well-suited for **domain-specific conversation
|
|
| 135 |
## Files
|
| 136 |
|
| 137 |
```
|
| 138 |
-
├── model.safetensors #
|
| 139 |
├── tokenizer-e351c8d8-checkpoint125.safetensors # Mimi audio codec (frozen, from Moshi)
|
| 140 |
├── tokenizer_hindi.model # Hindi SentencePiece tokenizer
|
| 141 |
├── tokenizer_hindi.vocab # Vocabulary reference
|
|
@@ -155,7 +155,7 @@ source $HOME/.local/bin/env
|
|
| 155 |
### 2. Create project and install dependencies
|
| 156 |
|
| 157 |
```bash
|
| 158 |
-
uv init
|
| 159 |
uv python install 3.12
|
| 160 |
uv python pin 3.12
|
| 161 |
uv add moshi huggingface_hub
|
|
@@ -164,7 +164,7 @@ uv add moshi huggingface_hub
|
|
| 164 |
### 3. Download the model
|
| 165 |
|
| 166 |
```bash
|
| 167 |
-
uv run huggingface-cli download JoshTalksAI/
|
| 168 |
```
|
| 169 |
|
| 170 |
### 4. Run the server
|
|
|
|
| 18 |
# Human-1: A Full-Duplex Conversational Model for Hindi
|
| 19 |
**[Try the live demo →](https://ai.joshtalks.com/research/josh1)**
|
| 20 |
|
| 21 |
+
Human-1 by Josh Talks is the first full-duplex spoken dialogue model for Hindi, built by adapting [Kyutai's Moshi](https://github.com/kyutai-labs/moshi) architecture. It enables real-time, natural Hindi conversation with support for interruptions, overlaps, backchannels, and natural turn-taking. It was trained on 26,000 hours of real spontaneous Hindi conversations from 14,695 speakers.
|
| 22 |
|
| 23 |
<p align="center">
|
| 24 |
<img src="hindi_moshi_architecture.svg" alt="Hindi-Moshi Architecture" width="480"/>
|
|
|
|
| 84 |
| Temperature | PPL ↓ |
|
| 85 |
|---|---|
|
| 86 |
| Ground-truth | 237.1 |
|
| 87 |
+
| Human-1 (τ=0.8) | 356.9 |
|
| 88 |
+
| Human-1 (τ=0.9) | 467.1 |
|
| 89 |
+
| Human-1 (τ=1.0) | 640.6 |
|
| 90 |
|
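For readers unfamiliar with the metric, the PPL column can be read through the standard definition of perplexity: the exponential of the mean negative log-likelihood per token under the scoring language model. The sketch below illustrates only that definition. The function name `perplexity` and the toy log-probabilities are assumptions made for illustration; this is not the evaluation script used for the table, which scores Whisper-v3 transcriptions with Sarvam-1 (2B).

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    Hypothetical helper: in a real evaluation the log-probabilities
    would come from the scoring language model.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy values (assumed, not real model output): four tokens, each with p = 0.25
toy_logprobs = [math.log(0.25)] * 4
ppl = perplexity(toy_logprobs)
print(f"perplexity = {ppl:.2f}")
```

Lower values mean the scoring model finds the transcribed text more predictable, which is why the column header marks lower as better.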
| 91 |
### Human Evaluation
|
| 92 |
|
|
|
|
| 126 |
|
| 127 |
## Conversation Style
|
| 128 |
|
| 129 |
+
Human-1 is trained on **topic-driven conversations** - real dialogues where two speakers discuss a subject naturally, with backchannels, interruptions, and organic turn-taking.
|
| 130 |
|
| 131 |
After an initial introduction, the model will typically **propose a topic and steer the conversation toward it**, preferring structured discussion over open-ended chitchat. Users can also **introduce their own topic** - the model will pick it up and engage in a focused discussion around it. This is an intentional design choice - the training data consists of real conversations where speakers engage in focused, in-depth discussions on assigned topics.
|
| 132 |
|
|
|
|
| 135 |
## Files
|
| 136 |
|
| 137 |
```
|
| 138 |
+
├── model.safetensors # Human-1 LM weights
|
| 139 |
├── tokenizer-e351c8d8-checkpoint125.safetensors # Mimi audio codec (frozen, from Moshi)
|
| 140 |
├── tokenizer_hindi.model # Hindi SentencePiece tokenizer
|
| 141 |
├── tokenizer_hindi.vocab # Vocabulary reference
|
|
|
|
| 155 |
### 2. Create project and install dependencies
|
| 156 |
|
| 157 |
```bash
|
| 158 |
+
uv init human-1 && cd human-1
|
| 159 |
uv python install 3.12
|
| 160 |
uv python pin 3.12
|
| 161 |
uv add moshi huggingface_hub
|
|
|
|
| 164 |
### 3. Download the model
|
| 165 |
|
| 166 |
```bash
|
| 167 |
+
uv run huggingface-cli download JoshTalksAI/Human-1 --local-dir ./weights
|
| 168 |
```
|
| 169 |
|
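After the download finishes, it may be worth confirming that the files named in the Files section actually landed in `./weights` before starting the server. The following is a minimal sketch using only the standard library. The helper name `missing_files` is hypothetical, and the list covers only the file names visible in the Files listing; the repository may contain others.

```python
from pathlib import Path

# File names copied from the Files section of this README; the listing
# shown there may be partial, so treat this as a minimum set (assumption).
EXPECTED_FILES = [
    "model.safetensors",
    "tokenizer-e351c8d8-checkpoint125.safetensors",
    "tokenizer_hindi.model",
    "tokenizer_hindi.vocab",
]


def missing_files(weights_dir="./weights"):
    """Return the expected file names that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print(f"Missing files: {missing}")
    else:
        print("All listed files are present.")
```

Saved as a script in the project directory (the file name is up to you), it can be run with `uv run python <script>.py`.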
| 170 |
### 4. Run the server
|