---
license: bigscience-openrail-m
datasets:
- openai/gdpval
language:
- af
metrics:
- character
base_model:
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
new_version: deepseek-ai/DeepSeek-OCR
pipeline_tag: voice-activity-detection
library_name: diffusers
tags:
- art
- agent
---

```shell
# install
pip install transformers datasets accelerate

# training (very simplified)
python run_speech_to_text_flax.py \
  --model_name_or_path openai/whisper-small \
  --train_data_dir ./chichewa_speech/train \
  --validation_data_dir ./chichewa_speech/val \
  --output_dir ./whisper-chichewa-finetuned
```

```python
from transformers import MarianMTModel, MarianTokenizer

# "your-org/marian-chichewa-en" is a placeholder repo id; replace it with the real checkpoint.
model = MarianMTModel.from_pretrained("your-org/marian-chichewa-en")
tok = MarianTokenizer.from_pretrained("your-org/marian-chichewa-en")

# Tokenize a batch of Chichewa sentences, padding them to a common length.
batch = tok(["Muli bwanji?"], return_tensors="pt", padding=True)

# Generate the translations and decode them back to plain text.
out = model.generate(**batch)
print(tok.batch_decode(out, skip_special_tokens=True))
```
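The Marian snippet above translates a single sentence. For longer lists it can help to feed the model fixed-size batches so memory use stays bounded. The following is a minimal sketch of that idea, not part of the original card: `batched` and `translate_all` are hypothetical helper names, and `translate_batch` stands for any callable that maps a list of source sentences to a list of translations.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_all(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate `sentences` batch by batch, preserving input order."""
    results: List[str] = []
    for batch in batched(sentences, batch_size):
        results.extend(translate_batch(batch))
    return results


if __name__ == "__main__":
    # Stand-in translator (hypothetical) so the sketch runs without downloading a model.
    def fake_translate(batch: List[str]) -> List[str]:
        return [f"<en> {s}" for s in batch]

    print(translate_all(["Muli bwanji?", "Zikomo"], fake_translate, batch_size=1))
```

With the model and tokenizer loaded as above, `translate_batch` could wrap the same calls: tokenize the batch with `tok(batch, return_tensors="pt", padding=True)`, pass the result to `model.generate`, and decode with `tok.batch_decode(..., skip_special_tokens=True)`.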