AI & ML interests

MoE architectures, Chimera models, Assembly of Experts

Articles

BM-TNG published an article 8 months ago

How Long Prompts Block Other Requests - Optimizing LLM Performance

SR-TNG published an article 9 months ago

Finetuning olmOCR to be a faithful OCR-Engine

BM-TNG published an article 10 months ago

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

BM-TNG published an article 10 months ago

Efficient Request Queueing – Optimizing LLM Performance
