GRM2, the small one that surpasses the big ones. What if a 3-billion-parameter model could beat a 32-billion-parameter model on every benchmark? We show that it can. GRM2 is a 3B-parameter model based on the Llama architecture, trained for long reasoning and high performance on complex tasks. It is the first 3B-parameter model to outperform Qwen3-32B on ALL benchmarks, and it outperforms o3-mini on almost all of them. 🤗 Model: OrionLLM/GRM2-3b. It is also the first 3B-parameter model to generate over 1,000 lines of code and to score 39.0 on xBench-DeepSearch-2510.
Managing 16 different machine learning pipelines (from Expected Goals to Space Creation) across Databricks Serverless and HF Jobs is a logistical challenge. To solve this, we built a dynamic operations center (the 13th page in our app).
It features:
   • An interactive dependency DAG: Powered by Cytoscape.js, it visually maps exactly how our models and data grids feed into each other.
   • Real-time monitoring: Tracks run volumes and data-freshness SLAs across the entire platform.
   • A 3-tier hybrid cost engine: Merges "cold" Databricks billing data with "warm/hot" live HF Jobs estimates to give a unified view of pipeline expenses.
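The cost-engine idea above can be sketched in a few lines. This is a minimal, hypothetical illustration (the record names, pipeline keys, and dollar figures are invented, not from our actual system): settled "cold" billing rows and in-flight "warm/hot" estimates are keyed by pipeline name and merged into one per-pipeline view.

```python
# Hypothetical inputs: "cold" = finalized Databricks billing per pipeline,
# "warm" = live HF Jobs cost estimates for runs still in flight.
cold_billing = {"expected_goals": 12.40, "space_creation": 7.10}
live_estimates = {"expected_goals": 0.85, "pass_networks": 1.20}

def unified_costs(cold: dict, warm: dict) -> dict:
    """Merge settled billing with live estimates into one view per pipeline."""
    merged = {}
    for name in sorted(set(cold) | set(warm)):
        billed = cold.get(name, 0.0)      # settled spend (may be absent)
        estimated = warm.get(name, 0.0)   # live estimate (may be absent)
        merged[name] = {
            "billed": billed,
            "estimated": estimated,
            "total": billed + estimated,
        }
    return merged

view = unified_costs(cold_billing, live_estimates)
```

The key design point is that neither source is authoritative alone: billing lags behind reality, and estimates are noisy, so the unified view keeps both columns side by side rather than collapsing them early.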