bhaskarJT committed
Commit f4ebea7 · verified · 1 Parent(s): 77dbf75

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -18,7 +18,7 @@ pipeline_tag: audio-to-audio
  # Human-1: A Full-Duplex Conversational Model for Hindi
  **🎙️ [Try the live demo →](https://ai.joshtalks.com/research/josh1)**

- Hindi-Moshi is the first full-duplex spoken dialogue model for Hindi, built by adapting [Kyutai's Moshi](https://github.com/kyutai-labs/moshi) architecture. It enables real-time, natural Hindi conversation with support for interruptions, overlaps, backchannels, and natural turn-taking - trained on 26,000 hours of real spontaneous Hindi conversations from 14,695 speakers.

  <p align="center">
  <img src="hindi_moshi_architecture.svg" alt="Hindi-Moshi Architecture" width="480"/>
@@ -84,9 +84,9 @@ Measured using Sarvam-1 (2B) on Whisper-v3 transcriptions of generated speech.
  | Temperature | PPL ↓ |
  |---|---|
  | Ground-truth | 237.1 |
- | Hindi-Moshi (τ=0.8) | 356.9 |
- | Hindi-Moshi (τ=0.9) | 467.1 |
- | Hindi-Moshi (τ=1.0) | 640.6 |

  ### Human Evaluation

@@ -126,7 +126,7 @@ Temperature τ=0.9 produces turn-taking dynamics closest to ground-truth.

  ## Conversation Style

- Hindi-Moshi is trained on **topic-driven conversations** - real dialogues where two speakers discuss a subject naturally, with backchannels, interruptions, and organic turn-taking.

  After an initial introduction, the model will typically **propose a topic and steer the conversation toward it**, preferring structured discussion over open-ended chitchat. Users can also **introduce their own topic** - the model will pick it up and engage in a focused discussion around it. This is an intentional design choice - the training data consists of real conversations where speakers engage in focused, in-depth discussions on assigned topics.

@@ -135,7 +135,7 @@ This makes the model particularly well-suited for **domain-specific conversation
  ## Files

  ```
- ├── model.safetensors # Hindi-Moshi LM weights
  ├── tokenizer-e351c8d8-checkpoint125.safetensors # Mimi audio codec (frozen, from Moshi)
  ├── tokenizer_hindi.model # Hindi SentencePiece tokenizer
  ├── tokenizer_hindi.vocab # Vocabulary reference
@@ -155,7 +155,7 @@ source $HOME/.local/bin/env
  ### 2. Create project and install dependencies

  ```bash
- uv init hindi-moshi && cd hindi-moshi
  uv python install 3.12
  uv python pin 3.12
  uv add moshi huggingface_hub
@@ -164,7 +164,7 @@ uv add moshi huggingface_hub
  ### 3. Download the model

  ```bash
- uv run huggingface-cli download JoshTalksAI/josh1 --local-dir ./weights
  ```

  ### 4. Run the server
 
  # Human-1: A Full-Duplex Conversational Model for Hindi
  **🎙️ [Try the live demo →](https://ai.joshtalks.com/research/josh1)**

+ Human-1 by Josh Talks is the first full-duplex spoken dialogue model for Hindi, built by adapting [Kyutai's Moshi](https://github.com/kyutai-labs/moshi) architecture. It enables real-time, natural Hindi conversation with support for interruptions, overlaps, backchannels, and natural turn-taking - trained on 26,000 hours of real spontaneous Hindi conversations from 14,695 speakers.

  <p align="center">
  <img src="hindi_moshi_architecture.svg" alt="Hindi-Moshi Architecture" width="480"/>
 
  | Temperature | PPL ↓ |
  |---|---|
  | Ground-truth | 237.1 |
+ | Human-1 (τ=0.8) | 356.9 |
+ | Human-1 (τ=0.9) | 467.1 |
+ | Human-1 (τ=1.0) | 640.6 |
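
For readers unfamiliar with the metric: the README does not spell out how PPL is computed, so the following is only the standard token-level definition, assumed to be what the table reports. Here $x_1, \dots, x_N$ are the tokens of a Whisper-v3 transcript and $p_\theta$ is the scoring language model (Sarvam-1 in the README). Both symbols are notation introduced for this sketch.

```latex
\mathrm{PPL}(x_1, \dots, x_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(x_i \mid x_{<i}\right) \right)
```

Under this definition lower is better, which matches the ↓ in the column header: the scoring model finds the transcribed speech less surprising.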

  ### Human Evaluation

 

  ## Conversation Style

+ Human-1 is trained on **topic-driven conversations** - real dialogues where two speakers discuss a subject naturally, with backchannels, interruptions, and organic turn-taking.

  After an initial introduction, the model will typically **propose a topic and steer the conversation toward it**, preferring structured discussion over open-ended chitchat. Users can also **introduce their own topic** - the model will pick it up and engage in a focused discussion around it. This is an intentional design choice - the training data consists of real conversations where speakers engage in focused, in-depth discussions on assigned topics.

 
  ## Files

  ```
+ ├── model.safetensors # Human-1 LM weights
  ├── tokenizer-e351c8d8-checkpoint125.safetensors # Mimi audio codec (frozen, from Moshi)
  ├── tokenizer_hindi.model # Hindi SentencePiece tokenizer
  ├── tokenizer_hindi.vocab # Vocabulary reference
 
  ### 2. Create project and install dependencies

  ```bash
+ uv init human-1 && cd human-1
  uv python install 3.12
  uv python pin 3.12
  uv add moshi huggingface_hub
 
  ### 3. Download the model

  ```bash
+ uv run huggingface-cli download JoshTalksAI/Human-1 --local-dir ./weights
  ```
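
The CLI download above can probably also be scripted from Python, since `huggingface_hub` is already installed in step 2. The sketch below is not part of the README: it assumes the repository is publicly readable, and the helper names `download_kwargs` and `download_weights` are hypothetical. The only library call used is `huggingface_hub.snapshot_download`.

```python
from pathlib import Path

REPO_ID = "JoshTalksAI/Human-1"  # repo name taken from the CLI command in step 3
LOCAL_DIR = Path("./weights")    # same target as the CLI's --local-dir


def download_kwargs(repo_id: str = REPO_ID, local_dir: Path = LOCAL_DIR) -> dict:
    """Build the keyword arguments for snapshot_download (hypothetical helper)."""
    return {"repo_id": repo_id, "local_dir": str(local_dir)}


def download_weights() -> str:
    """Fetch every file in the repo and return the local folder path."""
    # Imported lazily so this module can be loaded without huggingface_hub present.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs())
```

Running `download_weights()` under `uv run python` should leave the same files in `./weights` as the CLI command, though that equivalence is assumed rather than confirmed by the README.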

  ### 4. Run the server