# Model API Verification

Verified against the live Hugging Face model cards on June 14, 2026.

| Role | Verified repository | Load API | Inference API | Important constraint |
|---|---|---|---|---|
| Vision/OCR | `openbmb/MiniCPM-V-2` | `AutoModel.from_pretrained(..., trust_remote_code=True, torch_dtype=torch.float16)` and `AutoTokenizer.from_pretrained(...)` | `answer, context, _ = model.chat(image=image, msgs=[...], context=None, tokenizer=tokenizer, sampling=True, temperature=0.7)` | The prompt's `openbmb/MiniCPM-V-2_0` ID does not exist. The model card documents English and Chinese support. |
| TTS | `openbmb/VoxCPM2` | `VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)` | `model.generate(text=..., cfg_value=2.0, inference_timesteps=10)` | Supports 30 languages, including English and Chinese. |
| STT | `CohereLabs/cohere-transcribe-03-2026` | `AutoProcessor` plus `CohereAsrForConditionalGeneration.from_pretrained(..., torch_dtype=torch.bfloat16).to("cuda")` | Run the processor at 16 kHz with an explicit `language`, then call `model.generate(..., max_new_tokens=128, num_beams=1, do_sample=False)` and `processor.decode(...)` | Gated model: accept the terms and provide `HF_TOKEN`. Requires Transformers 5.4+, so it uses a separate Modal image. Note: `device_map="auto"` with the default float32 caused transcription to run on CPU or with offloaded weights, taking minutes. Use explicit bf16, `.to("cuda")`, and greedy decoding. |

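
The settings in the table can be kept in one place so that application code never hard-codes a repository ID. The following is a minimal sketch, not the project's actual implementation: `VerifiedModel`, `VERIFIED_MODELS`, and `require_model` are hypothetical names, and dtypes are stored as strings (an assumption made here so the registry imports without `torch`). Only the repository IDs and keyword values are taken from the table.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class VerifiedModel:
    """One row of the verification table (hypothetical container)."""
    role: str
    repo_id: str
    load_kwargs: dict[str, Any] = field(default_factory=dict)
    inference_kwargs: dict[str, Any] = field(default_factory=dict)
    gated: bool = False  # True when the Hub requires accepting terms + HF_TOKEN


# Values copied from the table. Dtypes are plain strings here (assumption);
# a loader could resolve them with getattr(torch, name) at load time.
VERIFIED_MODELS = {
    "vision": VerifiedModel(
        role="Vision/OCR",
        repo_id="openbmb/MiniCPM-V-2",
        load_kwargs={"trust_remote_code": True, "torch_dtype": "float16"},
        inference_kwargs={"sampling": True, "temperature": 0.7},
    ),
    "tts": VerifiedModel(
        role="TTS",
        repo_id="openbmb/VoxCPM2",
        load_kwargs={"load_denoiser": False},
        inference_kwargs={"cfg_value": 2.0, "inference_timesteps": 10},
    ),
    "stt": VerifiedModel(
        role="STT",
        repo_id="CohereLabs/cohere-transcribe-03-2026",
        load_kwargs={"torch_dtype": "bfloat16"},
        # Greedy decoding, as recommended in the STT row.
        inference_kwargs={"max_new_tokens": 128, "num_beams": 1, "do_sample": False},
        gated=True,
    ),
}


def require_model(key: str) -> VerifiedModel:
    """Return the verified entry for a role key, failing loudly if unknown."""
    try:
        return VERIFIED_MODELS[key]
    except KeyError:
        known = sorted(VERIFIED_MODELS)
        raise KeyError(f"No verified model for {key!r}; known keys: {known}") from None
```

Raising on an unknown key, rather than falling back to a default, keeps a mistyped role from quietly loading a different model.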

The implementation does not silently replace any sponsor model. It corrects the invalid MiniCPM repository ID (`openbmb/MiniCPM-V-2_0`) to the verified 2.8B release, `openbmb/MiniCPM-V-2`.

English and Chinese are the only languages advertised end to end because they are the languages shared by all three verified models. No model training is required for the MVP.
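
The end-to-end language claim is a set intersection across the three models. The sketch below illustrates that reasoning under stated assumptions: `SUPPORTED_LANGUAGES` and `end_to_end_languages` are hypothetical names, and the per-model sets are deliberately incomplete. Only `en` and `zh` come from this document; the `placeholder-...` entries stand in for the other languages on each model card and are not real language codes.

```python
# Per-model language sets. Only "en" and "zh" are taken from this document;
# the placeholder entries are illustrative stand-ins, not real language codes.
SUPPORTED_LANGUAGES = {
    "openbmb/MiniCPM-V-2": {"en", "zh"},
    "openbmb/VoxCPM2": {"en", "zh", "placeholder-tts-only"},
    "CohereLabs/cohere-transcribe-03-2026": {"en", "zh", "placeholder-stt-only"},
}


def end_to_end_languages(per_model: dict[str, set[str]]) -> set[str]:
    """Return the languages that every model in the pipeline supports."""
    language_sets = [set(langs) for langs in per_model.values()]
    if not language_sets:
        return set()
    return set.intersection(*language_sets)


print(sorted(end_to_end_languages(SUPPORTED_LANGUAGES)))
```

A language that only one or two models support drops out of the result, which is why the pipeline advertises only the shared pair.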