Question about Concurrent / Parallel Request Performance and Benchmarks

#22
by Anonymous2025-26 - opened

Hi, it’s great to see this work pushing Indic language modeling forward - thanks for sharing it with the community.

I’ve been testing the model and found that single-request performance is excellent, and the benchmark numbers look very strong. However, I’ve had difficulty achieving good throughput when sending concurrent (parallel) requests.

Do you have any benchmarks, guidance, or best practices for serving the model under multiple parallel requests (for example, concurrency limits, batching strategies, or recommended serving setups)? I’m especially curious whether the reported performance holds under concurrent usage.

Thanks again for the great work; I’d appreciate any pointers you can share.
