# Model API Verification
Verified against the live Hugging Face model cards on June 14, 2026.
| Role | Verified repository | Load API | Inference API | Important constraint |
|---|---|---|---|---|
| Vision/OCR | `openbmb/MiniCPM-V-2` | `AutoModel.from_pretrained(..., trust_remote_code=True, torch_dtype=torch.float16)` and `AutoTokenizer.from_pretrained(...)` | `answer, context, _ = model.chat(image=image, msgs=[...], context=None, tokenizer=tokenizer, sampling=True, temperature=0.7)` | The prompt's `openbmb/MiniCPM-V-2_0` ID does not exist. The model card documents English and Chinese support. |
| TTS | `openbmb/VoxCPM2` | `VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)` | `model.generate(text=..., cfg_value=2.0, inference_timesteps=10)` | Supports 30 languages, including English and Chinese. |
| STT | `CohereLabs/cohere-transcribe-03-2026` | `AutoProcessor` plus `CohereAsrForConditionalGeneration.from_pretrained(..., torch_dtype=torch.bfloat16).to("cuda")` | Run the processor at 16 kHz with an explicit `language`, then call `model.generate(..., max_new_tokens=128, num_beams=1, do_sample=False)` and `processor.decode(...)` | Gated model: accept the terms and provide `HF_TOKEN`. Requires Transformers 5.4+, so it uses a separate Modal image. Note: `device_map="auto"` with the default float32 ran transcription on CPU or with offloaded weights (taking minutes). Use explicit bf16, `.to("cuda")`, and greedy decoding. |
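
The STT row's constraint (explicit bf16, `.to("cuda")`, greedy decoding) can be sketched as follows. This is a minimal illustration, not the project's actual loader: it assumes `CohereAsrForConditionalGeneration` is importable from the top-level `transformers` package, and `load_stt`, `assert_on_gpu`, and `GREEDY_GENERATE_KWARGS` are hypothetical names introduced here. The heavy imports are deferred so the module can be imported without a GPU.

```python
STT_REPO = "CohereLabs/cohere-transcribe-03-2026"

# Greedy decoding settings copied from the STT row of the table.
GREEDY_GENERATE_KWARGS = {"max_new_tokens": 128, "num_beams": 1, "do_sample": False}


def load_stt(repo_id: str = STT_REPO):
    """Load the processor and model with explicit bf16 weights on the GPU."""
    # Lazy imports: torch and transformers are only needed when loading.
    # Assumption: the model class is exported by the top-level package.
    import torch
    from transformers import AutoProcessor, CohereAsrForConditionalGeneration

    processor = AutoProcessor.from_pretrained(repo_id)
    model = CohereAsrForConditionalGeneration.from_pretrained(
        repo_id, torch_dtype=torch.bfloat16
    ).to("cuda")
    return processor, model


def assert_on_gpu(model) -> None:
    """Fail fast if any weights ended up on CPU or another device (hypothetical helper)."""
    device_types = {param.device.type for param in model.parameters()}
    if device_types != {"cuda"}:
        raise RuntimeError(
            f"expected all weights on cuda, found: {sorted(device_types)}"
        )
```

Calling `assert_on_gpu(model)` right after `load_stt()` would surface the slow CPU or offloaded path at startup instead of during the first transcription, and `model.generate(..., **GREEDY_GENERATE_KWARGS)` keeps the decoding settings in one place.
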

The implementation does not silently replace any sponsor model. Its only substitution is correcting the invalid MiniCPM repository ID (`openbmb/MiniCPM-V-2_0`) to the verified 2.8B release, `openbmb/MiniCPM-V-2`.
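
One way to keep that correction visible rather than silent is an explicit lookup that logs every substitution. The sketch below is illustrative only: `REPO_ID_CORRECTIONS`, `resolve_repo_id`, and the logger name are hypothetical, and the single mapping entry is the one correction recorded in this document.

```python
import logging

logger = logging.getLogger("model_verification")

# Known-invalid repository IDs mapped to their verified replacements.
# The only entry comes from the verification table in this document.
REPO_ID_CORRECTIONS = {
    "openbmb/MiniCPM-V-2_0": "openbmb/MiniCPM-V-2",
}


def resolve_repo_id(requested: str) -> str:
    """Return the verified repository ID, logging any correction applied."""
    corrected = REPO_ID_CORRECTIONS.get(requested)
    if corrected is None:
        # Unknown IDs pass through unchanged; nothing is swapped silently.
        return requested
    logger.warning(
        "Repository ID %r does not exist; using verified ID %r",
        requested,
        corrected,
    )
    return corrected
```

Any ID outside the mapping is returned as given, so a sponsor model can only change through an entry someone has added and reviewed.
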

English and Chinese are the only languages advertised end to end because they are the only ones shared by all three verified models. No model training is required for the MVP.
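
The end-to-end language set is the intersection of what each role supports. A minimal sketch, assuming ISO-style codes: only `en` and `zh` come from this document, while the `placeholder-*` entries stand in for languages the table does not list and are not real language codes. `end_to_end_languages` is a hypothetical helper.

```python
# Languages per role. Only "en" and "zh" are taken from the verification
# table; the placeholder entries represent additional, unlisted languages.
DOCUMENTED_LANGUAGES = {
    "vision_ocr": {"en", "zh"},
    "tts": {"en", "zh", "placeholder-tts-only"},
    "stt": {"en", "zh", "placeholder-stt-only"},
}


def end_to_end_languages(per_role: dict[str, set[str]]) -> set[str]:
    """Return the languages supported by every role in the pipeline."""
    language_sets = list(per_role.values())
    if not language_sets:
        return set()
    # A language is usable end to end only if every stage supports it.
    return set.intersection(*language_sets)
```

Because the vision model documents only English and Chinese, the intersection cannot grow beyond those two no matter how many languages the TTS or STT models add.
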