Spaces:

build-small-hackathon
/

third-eye

Sleeping

App Files Files Community

third-eye / MODEL_VERIFICATION.md

mitvho09's picture

Upload folder using huggingface_hub

031e3f9 verified 18 days ago

|

History Blame Contribute Delete

1.74 kB

	# Model API Verification

	Verified against the live Hugging Face model cards on June 14, 2026.

	\| Role \| Verified repository \| Load API \| Inference API \| Important constraint \|
	\|---\|---\|---\|---\|---\|
	\| Vision/OCR \| `openbmb/MiniCPM-V-2` \| `AutoModel.from_pretrained(..., trust_remote_code=True, torch_dtype=torch.float16)` and `AutoTokenizer.from_pretrained(...)` \| `answer, context, _ = model.chat(image=image, msgs=[...], context=None, tokenizer=tokenizer, sampling=True, temperature=0.7)` \| The prompt's `openbmb/MiniCPM-V-2_0` ID does not exist. The model card documents English and Chinese support. \|
	\| TTS \| `openbmb/VoxCPM2` \| `VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)` \| `model.generate(text=..., cfg_value=2.0, inference_timesteps=10)` \| Supports 30 languages, including English and Chinese. \|
	\| STT \| `CohereLabs/cohere-transcribe-03-2026` \| `AutoProcessor` plus `CohereAsrForConditionalGeneration.from_pretrained(..., torch_dtype=bfloat16).to("cuda")` \| Processor at 16 kHz with an explicit `language`, then `model.generate(..., max_new_tokens=128, num_beams=1, do_sample=False)` and `processor.decode(...)` \| Gated model. Accept terms and provide `HF_TOKEN`. Requires Transformers 5.4+, so it uses a separate Modal image. NOTE: `device_map="auto"` + default float32 made transcription run on CPU/offloaded (~minutes). Use explicit bf16 + `.to("cuda")` + greedy decoding. \|

	The implementation does not silently replace any sponsor model. It corrects the invalid
	MiniCPM repository spelling to the verified 2.8B release.

	English and Chinese are the only languages advertised end to end because they are shared by
	all three verified model capabilities. No model training is required for the MVP.