Question about Concurrent / Parallel Request Performance and Benchmarks
Hi, it’s great to see this work pushing Indic language modeling forward. Thanks for sharing it with the community.
I’ve been testing the model and found that single-request performance is excellent, and the benchmark numbers look very strong. However, I’ve been having some difficulty achieving good throughput under concurrent or parallel request load.
Do you happen to have any benchmarks, guidance, or best practices for running this system with multiple parallel requests (for example, concurrency limits, batching strategies, or recommended serving setups)? I’m especially curious whether the reported performance remains similar under concurrent usage.
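To make the question concrete, here is a rough sketch of the kind of measurement I have in mind: a fixed set of requests sent with a cap on how many are in flight at once, reporting throughput and per-request latency. Everything in it is an assumption for illustration. `fake_synthesize` is a hypothetical stub that only sleeps, not a real call to the model, and `run_load_test` and `benchmark` are names I made up. It is not the authors' benchmark setup.

```python
import asyncio
import statistics
import time


async def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a real TTS request; it sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


async def run_load_test(request_fn, prompts, concurrency_limit):
    """Send all prompts with at most `concurrency_limit` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)
    latencies = []

    async def one_request(prompt):
        async with semaphore:
            # Latency is timed after the slot is acquired, so it excludes queueing time.
            start = time.perf_counter()
            await request_fn(prompt)
            latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    await asyncio.gather(*(one_request(p) for p in prompts))
    wall_time = time.perf_counter() - wall_start

    return {
        "requests": len(latencies),
        "concurrency_limit": concurrency_limit,
        "throughput_rps": len(latencies) / wall_time,
        "p50_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }


def benchmark(concurrency_limit, num_requests=16):
    """Run one load test against the stub with generated placeholder prompts."""
    prompts = [f"sample sentence {i}" for i in range(num_requests)]
    return asyncio.run(run_load_test(fake_synthesize, prompts, concurrency_limit))


if __name__ == "__main__":
    # Compare a few concurrency caps; swap the stub for a real client call to test a server.
    for limit in (1, 4, 16):
        print(benchmark(limit))
```

With a real serving endpoint in place of the stub, the comparison I am after is how `throughput_rps` and the latency figures change as the concurrency cap goes up.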
Thanks again for the great work, and appreciate any pointers you can share.