# Model API Verification

Verified against the live Hugging Face model cards on June 14, 2026.

| Role | Verified repository | Load API | Inference API | Important constraint |
| --- | --- | --- | --- | --- |
| Vision/OCR | `openbmb/MiniCPM-V-2` | `AutoModel.from_pretrained(..., trust_remote_code=True, torch_dtype=torch.float16)` and `AutoTokenizer.from_pretrained(...)` | `answer, context, _ = model.chat(image=image, msgs=[...], context=None, tokenizer=tokenizer, sampling=True, temperature=0.7)` | The prompt's `openbmb/MiniCPM-V-2_0` ID does not exist. The model card documents English and Chinese support. |
| TTS | `openbmb/VoxCPM2` | `VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)` | `model.generate(text=..., cfg_value=2.0, inference_timesteps=10)` | Supports 30 languages, including English and Chinese. |
| STT | `CohereLabs/cohere-transcribe-03-2026` | `AutoProcessor` plus `CohereAsrForConditionalGeneration.from_pretrained(..., torch_dtype=torch.bfloat16).to("cuda")` | Processor at 16 kHz with an explicit `language`, then `model.generate(..., max_new_tokens=128, num_beams=1, do_sample=False)` and `processor.decode(...)` | Gated model: accept the terms and provide `HF_TOKEN`. Requires Transformers 5.4+, so it uses a separate Modal image. Note: `device_map="auto"` with the default float32 ran transcription on CPU or with offloaded weights, taking on the order of minutes. Use explicit bf16, `.to("cuda")`, and greedy decoding. |
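
The STT constraint in the table can be sketched roughly as follows. This is a minimal sketch, not the Space's actual code: `load_stt` and `transcribe` are hypothetical helper names, and the exact processor call signature (`sampling_rate`, `language`, `return_tensors`) and the `decode` arguments are assumptions that should be checked against the model card. Only the repository ID, the class names, bf16, `.to("cuda")`, and the generation settings come from the table.

```python
# Settings taken from the STT row of the verification table.
STT_REPO = "CohereLabs/cohere-transcribe-03-2026"
SAMPLE_RATE_HZ = 16_000
# Greedy decoding: a single beam and no sampling.
GENERATE_KWARGS = {"max_new_tokens": 128, "num_beams": 1, "do_sample": False}


def load_stt(hf_token):
    """Load the processor and model in bf16 on the GPU (hypothetical helper)."""
    # Imported lazily so the constants above can be used without torch installed.
    import torch
    from transformers import AutoProcessor, CohereAsrForConditionalGeneration

    processor = AutoProcessor.from_pretrained(STT_REPO, token=hf_token)
    # Explicit dtype and device instead of device_map="auto", which the table
    # notes caused CPU or offloaded execution with the default float32.
    model = CohereAsrForConditionalGeneration.from_pretrained(
        STT_REPO, torch_dtype=torch.bfloat16, token=hf_token
    ).to("cuda")
    return processor, model


def transcribe(processor, model, audio, language="en"):
    """Transcribe one 16 kHz waveform (hypothetical helper)."""
    # ASSUMPTION: keyword names of the processor call; verify on the model card.
    inputs = processor(
        audio,
        sampling_rate=SAMPLE_RATE_HZ,
        language=language,
        return_tensors="pt",
    )
    # Move tensors to the GPU; only floating-point tensors are cast to bf16.
    inputs = inputs.to(model.device, dtype=model.dtype)
    output_ids = model.generate(**inputs, **GENERATE_KWARGS)
    # ASSUMPTION: decoding the first sequence with special tokens skipped.
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Keeping the generation settings in one dictionary makes it harder for a call site to reintroduce sampling or beam search by accident.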

The implementation does not silently replace any sponsor model. It corrects the invalid MiniCPM repository spelling to the verified 2.8B release.
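
One way to make that correction explicit rather than silent is a small lookup that maps the invalid ID to the verified one and rejects anything else. The sketch below is illustrative only: `REPO_ID_CORRECTIONS`, `VERIFIED_REPOS`, and `resolve_repo_id` are hypothetical names, and only the repository IDs themselves come from the table above.

```python
# Hypothetical correction table: invalid ID from the prompt -> verified ID.
REPO_ID_CORRECTIONS = {
    "openbmb/MiniCPM-V-2_0": "openbmb/MiniCPM-V-2",
}

# Verified repositories, keyed by role (IDs taken from the table above).
VERIFIED_REPOS = {
    "vision_ocr": "openbmb/MiniCPM-V-2",
    "tts": "openbmb/VoxCPM2",
    "stt": "CohereLabs/cohere-transcribe-03-2026",
}


def resolve_repo_id(requested: str) -> str:
    """Return the verified repository ID for a requested ID.

    Known misspellings are corrected; unknown IDs raise instead of being
    silently swapped for a different model.
    """
    corrected = REPO_ID_CORRECTIONS.get(requested, requested)
    if corrected not in VERIFIED_REPOS.values():
        raise ValueError(f"{requested!r} is not a verified repository")
    return corrected


print(resolve_repo_id("openbmb/MiniCPM-V-2_0"))  # openbmb/MiniCPM-V-2
```

Raising on unknown IDs is a design choice of this sketch: it keeps any substitution visible in the correction table instead of hiding it in a fallback.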

English and Chinese are the only languages advertised end to end because they are shared by all three verified model capabilities. No model training is required for the MVP.
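
The end-to-end language set is the intersection of what each role supports. A minimal sketch of that reasoning follows. The language sets here are not the real lists: `"xx"` and `"yy"` are placeholders standing in for the additional TTS and STT languages this document does not enumerate, and `LANGUAGES_BY_ROLE` and `end_to_end_languages` are hypothetical names.

```python
# Illustrative only: "xx" and "yy" are placeholders for languages that the
# TTS and STT models support beyond English and Chinese.
LANGUAGES_BY_ROLE = {
    "vision_ocr": {"en", "zh"},
    "tts": {"en", "zh", "xx"},
    "stt": {"en", "zh", "yy"},
}


def end_to_end_languages(by_role):
    """Return the languages supported by every role, in sorted order."""
    return sorted(set.intersection(*by_role.values()))


print(end_to_end_languages(LANGUAGES_BY_ROLE))  # ['en', 'zh']
```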