---
language:
  - ja
license: apache-2.0
base_model: Qwen/Qwen3-ASR-1.7B
library_name: mlx
tags:
  - automatic-speech-recognition
  - speech-to-text
  - japanese
  - programming
  - mlx
  - asr
  - stt
  - qwen3_asr
pipeline_tag: automatic-speech-recognition
---

# lilfugu

A Japanese ASR model fine-tuned for software development.

Based on [Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B). It is designed to produce clean, usable transcriptions for developers: beyond recognizing programming terms, it writes numbers as Arabic numerals (e.g. `3000`, not `三千`), punctuates consistently, and produces higher-quality Japanese overall.

## What's improved over the base model

- **Programming terms in English**: `useEffect`, `Docker`, `Vercel`, `Prisma`, `Tailwind CSS`, etc. — not katakana
- **Arabic numerals**: `3000番ポート`, `200ms`, `8GB` — not kanji numerals
- **Punctuation and formatting**: cleaner, more consistent output
- **General Japanese quality**: improvements that existing benchmarks (JSUT, etc.) do not fully capture, because they normalize text before scoring

## Benchmarks

### [ADLIB](https://github.com/holotherapper/adlib) (DevTerm, 247 test cases)

| Model | CER | Term Accuracy (Exact) | Composite |
|---|---|---|---|
| **lilfugu** | **26.3%** | **51.6%** | **0.6272** |
| Qwen3-ASR-1.7B (base) | 41.1% | 24.6% | 0.4203 |
| Whisper large-v3-turbo | 41.9% | 20.2% | 0.3935 |
| kotoba-whisper-v2.0 | 61.1% | 7.0% | 0.2256 |
| SenseVoice Small | 56.8% | 0.0% | 0.2090 |

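Composite = 0.4 × (1 - CER) + 0.6 × Term Accuracy. The Term Accuracy used in the composite counts both exact and flexible matches, so it differs from the exact-match column shown in the table.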
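As a rough illustration of the formula, the sketch below computes the composite from fractional inputs and inverts it to recover the combined (exact plus flexible) term accuracy implied by a table row. The function names are hypothetical, and the combined term accuracy is back-solved from the published composite rather than being a reported figure.

```python
def composite_score(cer: float, term_accuracy: float) -> float:
    """Composite = 0.4 * (1 - CER) + 0.6 * Term Accuracy, both given as fractions."""
    return 0.4 * (1.0 - cer) + 0.6 * term_accuracy


def implied_term_accuracy(composite: float, cer: float) -> float:
    """Invert the formula to recover the term accuracy that went into a composite."""
    return (composite - 0.4 * (1.0 - cer)) / 0.6


# lilfugu row from the table: CER 26.3%, Exact 51.6%, Composite 0.6272.
combined = implied_term_accuracy(0.6272, 0.263)
exact_only = composite_score(0.263, 0.516)
print(f"implied combined term accuracy: {combined:.3f}")
print(f"composite using exact matches only: {exact_only:.4f}")
```

For the lilfugu row this gives an implied combined term accuracy of about 55.4%, which is why plugging the Exact column into the formula does not reproduce the Composite column.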

Benchmark: [ADLIB](https://github.com/holotherapper/adlib) — Language-aware ASR benchmark for Japanese

### [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut) basic5000 (General Japanese, 300 samples)

| Model | CER |
|---|---|
| Qwen3-ASR-1.7B (base) | 10.7% |
| **lilfugu** | **10.8%** |
| Whisper large-v3-turbo | 12.0% |
| kotoba-whisper-v2.0 | 15.7% |
| SenseVoice Small | 16.2% |

Dataset: [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut)

Note: Existing Japanese ASR benchmarks normalize numbers, punctuation, and whitespace before scoring, so they do not measure the formatting quality described above. Treat these scores as a rough reference only.
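To make the effect of normalization concrete, here is a minimal sketch of character error rate (CER) computed with and without a normalization step. The `normalize` function is a simplified stand-in written for this example, not the normalizer used by JSUT or ADLIB scoring: it only folds Unicode width variants and drops punctuation and whitespace, and it omits numeral conversion entirely.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def normalize(text: str) -> str:
    """Assumed, simplified normalizer: NFKC, then drop punctuation and whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


ref = "ポートは3000番です。"
hyp = "ポートは 3000番です"  # stray space, missing full stop

print(f"raw CER:        {cer(ref, hyp):.3f}")
print(f"normalized CER: {cer(normalize(ref), normalize(hyp)):.3f}")
```

The two strings differ only in spacing and punctuation, so the raw CER is nonzero while the normalized CER is zero. A benchmark that scores only normalized text cannot distinguish the two outputs.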

## Variants

| Repository | Size | Format |
|---|---|---|
| [lilfugu](https://huggingface.co/holotherapper/lilfugu) (this) | 4.1 GB | MLX bfloat16 |
| [lilfugu-8bit](https://huggingface.co/holotherapper/lilfugu-8bit) | 2.8 GB | MLX 8bit quantized |
| [lilfugu-transformers](https://huggingface.co/holotherapper/lilfugu-transformers) | 4.1 GB | safetensors fp16 (CUDA/Linux) |
| [lilfugu-transformers-8bit](https://huggingface.co/holotherapper/lilfugu-transformers-8bit) | 2.2 GB | bitsandbytes int8 (CUDA/Linux) |
| [lilfugu-lora](https://huggingface.co/holotherapper/lilfugu-lora) | ~49 MB | LoRA adapter |

See also: [lilfugu-experimental](https://huggingface.co/holotherapper/lilfugu-experimental) — higher term accuracy, but may over-convert in some cases.

## Usage

### MLX (Apple Silicon)

```bash
pip install -U mlx-audio
```

```python
from mlx_audio.stt import load

model = load("holotherapper/lilfugu")
result = model.generate("audio.wav", language="Japanese")
print(result.text)
```

For the 8-bit version:

```python
from mlx_audio.stt import load
model = load("holotherapper/lilfugu-8bit")
```
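For transcribing several recordings at once, a small wrapper around the same `generate` call may be convenient. The sketch below is a hypothetical helper, not part of mlx-audio. It assumes only what the example above shows: the model has a `generate(path, language=...)` method and the result exposes `.text`.

```python
from pathlib import Path


def transcribe_dir(model, audio_dir, pattern="*.wav", language="Japanese"):
    """Transcribe every file matching `pattern` in `audio_dir`.

    Returns a dict mapping file name to transcript, in sorted file order.
    `model` is any object with generate(path, language=...) returning `.text`.
    """
    transcripts = {}
    for path in sorted(Path(audio_dir).glob(pattern)):
        result = model.generate(str(path), language=language)
        transcripts[path.name] = result.text
    return transcripts
```

With a model loaded as shown above, `transcribe_dir(model, "recordings")` would return one transcript per `.wav` file in a `recordings` directory (the directory name is only an example).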

### CUDA / Linux

```python
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained("holotherapper/lilfugu-transformers")
result = model.transcribe("audio.wav")
```

### LoRA adapter (custom scale tuning)

```python
from mlx_tune.stt import FastSTTModel
from mlx_lm.tuner.lora import LoRALinear

model, _ = FastSTTModel.from_pretrained("mlx-community/Qwen3-ASR-1.7B-bf16")
model.load_adapter("holotherapper/lilfugu-lora")

# Adjust scale (0.0-1.0). Higher = stronger term conversion.
for _, module in model.model.named_modules():
    if isinstance(module, LoRALinear):
        module.scale = 1.0

text = model.transcribe("audio.wav", language="ja")
```
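To show what the scale controls, the following sketch implements the standard LoRA forward pass on plain Python lists: the frozen base projection plus a low-rank update multiplied by `scale`. It illustrates the general LoRA formulation with made-up toy weights and hypothetical function names. It is not the `LoRALinear` implementation itself, and the matrices are not taken from the adapter.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scale):
    """Return W x + scale * B (A x): base output plus scaled low-rank update."""
    base = matvec(W, x)               # frozen base-model projection
    update = matvec(B, matvec(A, x))  # rank-r correction learned by the adapter
    return [b + scale * u for b, u in zip(base, update)]


# Toy rank-1 example (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]       # shape (r=1, in=2)
B = [[0.5], [-0.5]]    # shape (out=2, r=1)
x = [1.0, 2.0]

for scale in (0.0, 0.5, 1.0):
    print(scale, lora_forward(x, W, A, B, scale))
```

At `scale = 0.0` the output equals the base model's, and increasing the scale moves it linearly toward the fully adapted output. This matches the behavior described in the comment above: a higher scale means stronger term conversion.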

## License

Apache 2.0 (following [Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B))