Spaces:
Sleeping
Sleeping
A newer version of the Gradio SDK is available: 6.19.0
Model API Verification
Verified against the live Hugging Face model cards on June 14, 2026.
| Role | Verified repository | Load API | Inference API | Important constraint |
|---|---|---|---|---|
| Vision/OCR | openbmb/MiniCPM-V-2 |
AutoModel.from_pretrained(..., trust_remote_code=True, torch_dtype=torch.float16) and AutoTokenizer.from_pretrained(...) |
answer, context, _ = model.chat(image=image, msgs=[...], context=None, tokenizer=tokenizer, sampling=True, temperature=0.7) |
The prompt's openbmb/MiniCPM-V-2_0 ID does not exist. The model card documents English and Chinese support. |
| TTS | openbmb/VoxCPM2 |
VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False) |
model.generate(text=..., cfg_value=2.0, inference_timesteps=10) |
Supports 30 languages, including English and Chinese. |
| STT | CohereLabs/cohere-transcribe-03-2026 |
AutoProcessor plus CohereAsrForConditionalGeneration.from_pretrained(..., torch_dtype=bfloat16).to("cuda") |
Processor at 16 kHz with an explicit language, then model.generate(..., max_new_tokens=128, num_beams=1, do_sample=False) and processor.decode(...) |
Gated model. Accept terms and provide HF_TOKEN. Requires Transformers 5.4+, so it uses a separate Modal image. NOTE: device_map="auto" + default float32 made transcription run on CPU/offloaded (~minutes). Use explicit bf16 + .to("cuda") + greedy decoding. |
The implementation does not silently replace any sponsor model. It corrects the invalid MiniCPM repository spelling to the verified 2.8B release.
English and Chinese are the only languages advertised end to end because they are shared by all three verified model capabilities. No model training is required for the MVP.