JoshTalksAI/Human-1

@@ -18,7 +18,7 @@ pipeline_tag: audio-to-audio
 # Human-1: A Full-Duplex Conversational Model for Hindi
 **🎙️ [Try the live demo →](https://ai.joshtalks.com/research/josh1)**
-Hindi-Moshi is the first full-duplex spoken dialogue model for Hindi, built by adapting [Kyutai's Moshi](https://github.com/kyutai-labs/moshi) architecture. It enables real-time, natural Hindi conversation with support for interruptions, overlaps, backchannels, and natural turn-taking — trained on 26,000 hours of real spontaneous Hindi conversations from 14,695 speakers.
 <p align="center">
   <img src="hindi_moshi_architecture.svg" alt="Hindi-Moshi Architecture" width="480"/>
@@ -84,9 +84,9 @@ Measured using Sarvam-1 (2B) on Whisper-v3 transcriptions of generated speech.
 | Temperature | PPL ↓ |
 |---|---|
 | Ground-truth | 237.1 |
-| Hindi-Moshi (τ=0.8) | 356.9 |
-| Hindi-Moshi (τ=0.9) | 467.1 |
-| Hindi-Moshi (τ=1.0) | 640.6 |
 ### Human Evaluation
@@ -126,7 +126,7 @@ Temperature τ=0.9 produces turn-taking dynamics closest to ground-truth.
 ## Conversation Style
-Hindi-Moshi is trained on **topic-driven conversations** - real dialogues where two speakers discuss a subject naturally, with backchannels, interruptions, and organic turn-taking.
 After an initial introduction, the model will typically **propose a topic and steer the conversation toward it**, preferring structured discussion over open-ended chitchat. Users can also **introduce their own topic** - the model will pick it up and engage in a focused discussion around it. This is an intentional design choice - the training data consists of real conversations where speakers engage in focused, in-depth discussions on assigned topics.
@@ -135,7 +135,7 @@ This makes the model particularly well-suited for **domain-specific conversation
 ## Files
 ```
-├── model.safetensors                              # Hindi-Moshi LM weights
 ├── tokenizer-e351c8d8-checkpoint125.safetensors   # Mimi audio codec (frozen, from Moshi)
 ├── tokenizer_hindi.model                          # Hindi SentencePiece tokenizer
 ├── tokenizer_hindi.vocab                          # Vocabulary reference
@@ -155,7 +155,7 @@ source $HOME/.local/bin/env
 ### 2. Create project and install dependencies
 ```bash
-uv init hindi-moshi && cd hindi-moshi
 uv python install 3.12
 uv python pin 3.12
 uv add moshi huggingface_hub
@@ -164,7 +164,7 @@ uv add moshi huggingface_hub
 ### 3. Download the model
 ```bash
-uv run huggingface-cli download JoshTalksAI/josh1 --local-dir ./weights
 ```
 ### 4. Run the server

 # Human-1: A Full-Duplex Conversational Model for Hindi
 **🎙️ [Try the live demo →](https://ai.joshtalks.com/research/josh1)**
+Human-1 by Josh Talks is the first full-duplex spoken dialogue model for Hindi, built by adapting [Kyutai's Moshi](https://github.com/kyutai-labs/moshi) architecture. It enables real-time, natural Hindi conversation with support for interruptions, overlaps, backchannels, and natural turn-taking — trained on 26,000 hours of real spontaneous Hindi conversations from 14,695 speakers.
 <p align="center">
   <img src="hindi_moshi_architecture.svg" alt="Hindi-Moshi Architecture" width="480"/>
 | Temperature | PPL ↓ |
 |---|---|
 | Ground-truth | 237.1 |
+| Human-1 (τ=0.8) | 356.9 |
+| Human-1 (τ=0.9) | 467.1 |
+| Human-1 (τ=1.0) | 640.6 |
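The PPL column follows the standard definition of perplexity: the exponential of the mean negative log-likelihood per token. The sketch below only illustrates that arithmetic. It assumes per-token natural-log probabilities have already been obtained from a scoring model; the Sarvam-1 and Whisper-v3 pipeline used for the table is not reproduced, and `perplexity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean log-probability), i.e. perplexity over the tokens.

    `token_log_probs` holds natural-log probabilities, one per token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative values only; these are not taken from the evaluation above.
example_log_probs = [math.log(0.5)] * 4
print(perplexity(example_log_probs))
```

Lower values mean the scoring model finds the transcribed text more predictable, which is why the table marks PPL with a down arrow.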
 ### Human Evaluation
 ## Conversation Style
+Human-1 is trained on **topic-driven conversations** - real dialogues where two speakers discuss a subject naturally, with backchannels, interruptions, and organic turn-taking.
 After an initial introduction, the model will typically **propose a topic and steer the conversation toward it**, preferring structured discussion over open-ended chitchat. Users can also **introduce their own topic** - the model will pick it up and engage in a focused discussion around it. This is an intentional design choice - the training data consists of real conversations where speakers engage in focused, in-depth discussions on assigned topics.
 ## Files
 ```
+├── model.safetensors                              # Human-1 LM weights
 ├── tokenizer-e351c8d8-checkpoint125.safetensors   # Mimi audio codec (frozen, from Moshi)
 ├── tokenizer_hindi.model                          # Hindi SentencePiece tokenizer
 ├── tokenizer_hindi.vocab                          # Vocabulary reference
 ### 2. Create project and install dependencies
 ```bash
+uv init human-1 && cd human-1
 uv python install 3.12
 uv python pin 3.12
 uv add moshi huggingface_hub
 ### 3. Download the model
 ```bash
+uv run huggingface-cli download JoshTalksAI/Human-1 --local-dir ./weights
 ```
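After the download finishes, it can help to confirm that the files named in the Files section are present before starting the server. The following is a minimal sketch, assuming the files sit at the top level of `./weights`; `missing_files` and `EXPECTED_FILES` are hypothetical names, and the file listing is truncated in this diff, so the repository may contain more files than are checked here.

```python
from pathlib import Path

# File names copied from the Files listing in this README. The listing is
# truncated in the diff, so this tuple may be incomplete (assumption).
EXPECTED_FILES = (
    "model.safetensors",
    "tokenizer-e351c8d8-checkpoint125.safetensors",
    "tokenizer_hindi.model",
    "tokenizer_hindi.vocab",
)


def missing_files(weights_dir: str = "./weights") -> list[str]:
    """Return the expected file names that are not found in weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from ./weights:", ", ".join(missing))
    else:
        print("All listed files are present.")
```

Saved as a script inside the project, it could be run with `uv run python` followed by the script name.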
 ### 4. Run the server