Question about Concurrent / Parallel Request Performance and Benchmarks

#22

by Anonymous2025-26 - opened Jan 8

Hi, it’s great to see this work pushing Indic language modeling forward - thanks for sharing it with the community.

I’ve been testing the model and found that single-request performance is excellent, and the benchmark numbers look very strong. However, I’ve been having some difficulty achieving good throughput under concurrent or parallel request load.

Do you happen to have any benchmarks, guidance, or best practices for running this system with multiple parallel requests (for example, concurrency limits, batching strategies, or recommended serving setups)? I’m especially curious whether the reported performance remains similar under concurrent usage.

Thanks again for the great work. I'd appreciate any pointers you can share.
