---
language:
- ja
license: apache-2.0
base_model: Qwen/Qwen3-ASR-1.7B
library_name: mlx
tags:
- automatic-speech-recognition
- speech-to-text
- japanese
- programming
- mlx
- asr
- stt
- qwen3_asr
pipeline_tag: automatic-speech-recognition
---
# lilfugu
A Japanese ASR model fine-tuned for software development.
Based on [Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B). Designed to produce clean, usable transcriptions for developers: not just programming term recognition, but also proper Arabic numerals (e.g. `3000`, not `三千`), consistent punctuation, and overall higher-quality Japanese output.
## What's improved over the base model
- **Programming terms in English**: `useEffect`, `Docker`, `Vercel`, `Prisma`, `Tailwind CSS`, etc., rather than katakana
- **Arabic numerals**: `3000番ポート` (port 3000), `200ms`, `8GB`, rather than kanji numerals
- **Punctuation and formatting**: cleaner, more consistent output
- **General Japanese quality**: improvements that existing benchmarks (JSUT, etc.) do not fully capture, because they normalize the output before scoring
## Benchmarks
### [ADLIB](https://github.com/holotherapper/adlib) (DevTerm, 247 test cases)
| Model | CER | Term Accuracy (Exact) | Composite |
|---|---|---|---|
| **lilfugu** | **26.3%** | **51.6%** | **0.6272** |
| Qwen3-ASR-1.7B (base) | 41.1% | 24.6% | 0.4203 |
| Whisper large-v3-turbo | 41.9% | 20.2% | 0.3935 |
| kotoba-whisper-v2.0 | 61.1% | 7.0% | 0.2256 |
| SenseVoice Small | 56.8% | 0.0% | 0.2090 |

Composite = 0.4 × (1 - CER) + 0.6 × Term Accuracy. Here Term Accuracy counts both exact and flexible matches, so it can be higher than the exact-match column in the table.

Benchmark: [ADLIB](https://github.com/holotherapper/adlib), a language-aware ASR benchmark for Japanese
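
The composite formula can be written out as a small helper. This is a minimal sketch, not ADLIB's own scoring code: the function name `composite_score` is an assumption, and the example feeds in the exact-match term accuracy from the table because the combined exact and flexible figure is not listed.

```python
def composite_score(cer: float, term_accuracy: float) -> float:
    """Combine CER and term accuracy as 0.4 * (1 - CER) + 0.6 * term accuracy.

    Both inputs are fractions in [0, 1] (for example, 26.3% -> 0.263).
    """
    return 0.4 * (1.0 - cer) + 0.6 * term_accuracy


# lilfugu row, using the exact-match column as a lower-bound stand-in
# for the exact + flexible term accuracy (not listed in the table).
lower_bound = composite_score(cer=0.263, term_accuracy=0.516)
print(f"{lower_bound:.4f}")
```

With the exact-match column this gives 0.6044, below the published 0.6272, which is consistent with the composite also counting flexible matches.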
### [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut) basic5000 (General Japanese, 300 samples)
| Model | CER |
|---|---|
| Qwen3-ASR-1.7B (base) | 10.7% |
| **lilfugu** | **10.8%** |
| Whisper large-v3-turbo | 12.0% |
| kotoba-whisper-v2.0 | 15.7% |
| SenseVoice Small | 16.2% |

Dataset: [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut)

Note: Existing Japanese ASR benchmarks are not designed to properly evaluate Japanese output quality, because they normalize numbers, punctuation, and whitespace before scoring. Treat these scores as a rough reference only.
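
To illustrate why normalization hides formatting differences, here is a minimal sketch of character error rate (CER) with and without a normalizer. The `normalize` function is hypothetical and only strips punctuation and whitespace; the normalization actually used by each benchmark (including number handling) is not reproduced here, and the sample sentence is made up.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def normalize(text: str) -> str:
    """Hypothetical normalizer: drop punctuation and whitespace only."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


ref = "ポートは3000番です。"
hyp = "ポートは3000番です"  # same words, final punctuation missing

print(cer(ref, hyp))                        # one missing character out of 12
print(cer(normalize(ref), normalize(hyp)))  # the difference vanishes
```

After normalization the two strings are identical, so a punctuation improvement of this kind cannot show up in the score.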
## Variants
| Repository | Size | Format |
|---|---|---|
| [lilfugu](https://huggingface.co/holotherapper/lilfugu) (this) | 4.1 GB | MLX bfloat16 |
| [lilfugu-8bit](https://huggingface.co/holotherapper/lilfugu-8bit) | 2.8 GB | MLX 8bit quantized |
| [lilfugu-transformers](https://huggingface.co/holotherapper/lilfugu-transformers) | 4.1 GB | safetensors fp16 (CUDA/Linux) |
| [lilfugu-transformers-8bit](https://huggingface.co/holotherapper/lilfugu-transformers-8bit) | 2.2 GB | bitsandbytes int8 (CUDA/Linux) |
| [lilfugu-lora](https://huggingface.co/holotherapper/lilfugu-lora) | ~49 MB | LoRA adapter |

See also: [lilfugu-experimental](https://huggingface.co/holotherapper/lilfugu-experimental), which has higher term accuracy but may over-convert in some cases.
## Usage
### MLX (Apple Silicon)
```bash
pip install -U mlx-audio
```
```python
from mlx_audio.stt import load
model = load("holotherapper/lilfugu")
result = model.generate("audio.wav", language="Japanese")
print(result.text)
```
For the 8bit version:
```python
model = load("holotherapper/lilfugu-8bit")
```
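
For more than one file, the calls above can be wrapped in a small helper. This is a sketch under assumptions: `transcribe_directory` is a hypothetical name, and it relies only on the `generate(path, language=...)` call and the `.text` attribute shown in the snippet above.

```python
from pathlib import Path


def transcribe_directory(model, directory, pattern="*.wav", language="Japanese"):
    """Transcribe every matching file and return {filename: text}.

    `model` is any object exposing generate(path, language=...) whose
    result has a .text attribute, as in the snippet above.
    """
    transcripts = {}
    for path in sorted(Path(directory).glob(pattern)):
        result = model.generate(str(path), language=language)
        transcripts[path.name] = result.text
    return transcripts
```

Usage would look like `transcribe_directory(load("holotherapper/lilfugu"), "recordings")`, where `recordings` is a placeholder directory name.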
### CUDA / Linux
```python
from qwen_asr import Qwen3ASRModel
model = Qwen3ASRModel.from_pretrained("holotherapper/lilfugu-transformers")
result = model.transcribe("audio.wav")
```
### LoRA adapter (custom scale tuning)
```python
from mlx_tune.stt import FastSTTModel
from mlx_lm.tuner.lora import LoRALinear
model, _ = FastSTTModel.from_pretrained("mlx-community/Qwen3-ASR-1.7B-bf16")
model.load_adapter("holotherapper/lilfugu-lora")
# Adjust scale (0.0-1.0). Higher = stronger term conversion.
for _, module in model.model.named_modules():
    if isinstance(module, LoRALinear):
        module.scale = 1.0
text = model.transcribe("audio.wav", language="ja")
```
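
The scale controls how strongly the adapter's low-rank update is mixed into each base layer. The following plain-Python sketch assumes the standard LoRA formulation, output = W x + scale * B (A x); the helper names and the toy matrices are made up for illustration and are not taken from the adapter or from `mlx_lm`.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scale):
    """Base output plus the scaled low-rank update: W x + scale * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # rank x in_features
B = [[0.5], [-0.5]]   # out_features x rank
x = [2.0, 4.0]

for scale in (0.0, 0.5, 1.0):
    print(scale, lora_forward(x, W, A, B, scale))
```

At scale 0.0 the output equals the base layer's output, and the adapter's contribution grows linearly as the scale increases.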
## License
Apache 2.0 (following [Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B))