---
license: apache-2.0
pipeline_tag: text-generation
arxiv: 2512.24873
tags:
- agent
- moe
---

# ROME-30B-A3B (Coming Soon)

<a href="https://arxiv.org/pdf/2512.24873" target="_blank">
🔗 <strong>Technical Report</strong><br/>
<img alt="Paper" src="https://img.shields.io/badge/Paper-arXiv%3A2512.24873-red"/>
</a>

---

## 📢 Note: Coming Soon!

**ROME (ROME is Obviously an Agentic ModEl)** will be officially released soon. The project is currently under final review and preparation, and the model weights will be made publicly available shortly. Stay tuned!

<img src="https://rlhf.oss-cn-hangzhou.aliyuncs.com/iFLOW-ROME/performance.png" width="600"/>

---

## Highlights

**ROME** is an open-source **agentic model** incubated within the **ALE (Agentic Learning Ecosystem)**.

Rather than scaling performance purely by increasing parameter count, ROME reaches the agentic performance of models at much larger parameter scales through full-stack infrastructure and RL algorithmic optimization.

<img src="https://rlhf.oss-cn-hangzhou.aliyuncs.com/iFLOW-ROME/ALE.PNG" width="600"/>

### 🔧 ALE Full-Stack Infrastructure

- [**ROLL**](https://github.com/alibaba/ROLL) – Large-scale reinforcement learning optimization engine
- [**ROCK**](https://github.com/alibaba/ROCK) – Secure sandbox and environment orchestration for agent execution
- **iFlow CLI** – Unified agent framework and developer interface

### 🧠 IPA Policy Optimization Algorithm

- Introduces **Interaction-Perceptive Agentic Policy Optimization (IPA)**
- Performs credit assignment at the level of **Semantic Interaction Chunks**
- Significantly improves **training stability** and **success rates** on **long-horizon tasks**
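
The chunk-level credit assignment idea can be sketched in a few lines. The sketch below is an illustration only, not the released IPA implementation: the `Step` structure, the chunk-boundary heuristic (a chunk ends after each tool observation), and the advantage formula are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    role: str       # e.g. "assistant", "tool_call", "observation" (hypothetical labels)
    n_tokens: int   # tokens emitted in this step
    reward: float   # per-step reward signal (assumed given)

def chunk_steps(steps: List[Step]) -> List[List[Step]]:
    """Group steps into semantic interaction chunks.

    Boundary heuristic (an assumption): a chunk closes after each
    tool observation, so one chunk roughly covers reasoning +
    tool call + the environment's response.
    """
    chunks, current = [], []
    for s in steps:
        current.append(s)
        if s.role == "observation":
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

def chunk_advantages(steps: List[Step], baseline: float = 0.0) -> List[float]:
    """Compute one advantage per chunk (summed chunk reward minus a
    baseline) and broadcast it to every token in that chunk, instead
    of assigning credit token-by-token."""
    per_token = []
    for chunk in chunk_steps(steps):
        adv = sum(s.reward for s in chunk) - baseline
        for s in chunk:
            per_token.extend([adv] * s.n_tokens)
    return per_token
```

The key contrast with plain token-level PPO-style updates is that every token in a chunk shares one advantage, which is what makes the credit signal stable over long horizons.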

### 🚀 Strong Agentic Performance

- Despite being a **mid-sized model** (a 30B MoE with 3B active parameters), ROME outperforms same-scale models on standard agent benchmarks:
  - **Terminal-Bench 2.0**: 24.72%
  - **SWE-bench Verified**: 57.40%
- Performance is competitive with models exceeding **100B parameters**

### 🔒 Production-Grade Safety

- Designed for autonomous agent execution in real environments
- Rigorously aligned and red-teamed against risks such as:
  - Unauthorized access
  - Illegal or unsafe tool invocation
- Built with **deployment-grade safety guarantees** in mind

---

## Performance (Preview)

### Terminal-Based Benchmarks

| **Model** | **Terminal-Bench 2.0** | **SWE-bench Verified** |
| --- | --- | --- |
| Qwen3-Coder-30B-A3B-Instruct | 13.48% | 46.33% |
| **ROME-30B-A3B** | **24.72%** | **57.40%** |
| GPT-OSS-120B | 21.12% | 43.93% |
| GLM-4.5 Air (106B) | 17.30% | 56.20% |

> See the technical report for full experimental details.

---

## Best Practices

*(Code examples and usage guidelines will be added after the model release.)*

---

## Citation

If you find our work useful, please consider citing:

```bibtex
@article{rome2025ale,
  title={Let It Flow: Agentic Crafting on Rock and Roll - Building the ROME Model within an Open Agentic Learning Ecosystem},
  author={Wang, Weixun and Xu, XiaoXiao and An, Wanhe and Dai, Fangwen and others},
  journal={arXiv preprint arXiv:2512.24873},
  year={2025}
}
```