third-eye / MODEL_VERIFICATION.md
mitvho09's picture
Upload folder using huggingface_hub
031e3f9 verified
|
Raw
History Blame Contribute Delete
1.74 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Model API Verification

Verified against the live Hugging Face model cards on June 14, 2026.

Role Verified repository Load API Inference API Important constraint
Vision/OCR openbmb/MiniCPM-V-2 AutoModel.from_pretrained(..., trust_remote_code=True, torch_dtype=torch.float16) and AutoTokenizer.from_pretrained(...) answer, context, _ = model.chat(image=image, msgs=[...], context=None, tokenizer=tokenizer, sampling=True, temperature=0.7) The prompt's openbmb/MiniCPM-V-2_0 ID does not exist. The model card documents English and Chinese support.
TTS openbmb/VoxCPM2 VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False) model.generate(text=..., cfg_value=2.0, inference_timesteps=10) Supports 30 languages, including English and Chinese.
STT CohereLabs/cohere-transcribe-03-2026 AutoProcessor plus CohereAsrForConditionalGeneration.from_pretrained(..., torch_dtype=bfloat16).to("cuda") Processor at 16 kHz with an explicit language, then model.generate(..., max_new_tokens=128, num_beams=1, do_sample=False) and processor.decode(...) Gated model. Accept terms and provide HF_TOKEN. Requires Transformers 5.4+, so it uses a separate Modal image. NOTE: device_map="auto" + default float32 made transcription run on CPU/offloaded (~minutes). Use explicit bf16 + .to("cuda") + greedy decoding.

The implementation does not silently replace any sponsor model. It corrects the invalid MiniCPM repository spelling to the verified 2.8B release.

English and Chinese are the only languages advertised end to end because they are shared by all three verified model capabilities. No model training is required for the MVP.