Axe-Turbo-31B
Precision at Scale
Axe-Turbo-31B represents our approach to agentic coding at scale. Distilled from Claude Opus 4.5 agentic coding sessions involving multiple tool calls, complex repositories, and real-world software engineering tasks, this 31 billion parameter model punches significantly above its weight class on SWE-Bench and similar evaluations. Engineered for terminal-based coding workflows within Bodega OS, it excels at understanding large codebases, orchestrating tool calls, and making intelligent decisions about what to read and what to ignore.
Cascade RL for Coding and Tool Use
Most coding models are trained once on static datasets and shipped. Axe-Turbo-31B uses Cascade Reinforcement Learning, a fundamentally different approach to getting coding and tool use right. We start with offline RL—learning from a carefully curated playbook of excellent coding sessions, tool call sequences, and problem-solving strategies. This warm-up phase establishes strong fundamentals in code generation, file navigation, and multi-step reasoning.
Then comes online RL, where the model learns from its own tool use patterns and code generations in real scenarios. It continuously adapts, refining its understanding of what effective tool orchestration looks like, which code patterns solve problems reliably, and how to recover from errors. This two-stage cascade gives us the best of both worlds: the efficiency of offline learning and the adaptability of online refinement.
The mathematical formulation follows a coarse-to-fine strategy. During offline RL, we optimize:
π* = argmax_π E_{(s,a)~D}[r(s,a) - β·KL(π(a|s) || π_ref(a|s))]
where D is our dataset of expert coding sessions, r(s,a) captures code correctness and tool efficiency, and the KL term prevents deviation from the reference policy. This establishes stable convergence.
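As an illustration only, the per-sample objective above can be sketched in a few lines of Python. This is a minimal sketch, not the training code: the function names and the per-sample KL approximation (log π − log π_ref) are assumptions for clarity.

```python
def offline_rl_loss(logp_pi, logp_ref, reward, beta=0.1):
    """Per-sample KL-regularized offline objective (sketch).

    Maximizes r(s, a) - beta * KL(pi || pi_ref), approximating the KL
    term for one (s, a) sample by log pi(a|s) - log pi_ref(a|s).
    Returns the negated objective, i.e. a loss to minimize.
    """
    kl_estimate = logp_pi - logp_ref
    objective = reward - beta * kl_estimate
    return -objective
```

The β coefficient trades off reward maximization against staying close to the reference policy; larger β keeps the policy more conservative.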
Online RL then refines alignment through:
π** = argmax_π E_{s~ρ_π, a~π}[r(s,a) + α·V(s')]
where the model learns from its own generated rollouts, with V(s') estimating future value to encourage long-horizon planning in complex coding tasks.
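The online-stage learning signal, r(s, a) + α·V(s′), can be sketched over a rollout as below. This is a hedged illustration of the formula, not the actual training pipeline; the function name and list-based interface are assumptions.

```python
def rollout_returns(rewards, next_values, alpha=0.9):
    """Per-step targets r(s,a) + alpha * V(s') for a self-generated rollout.

    rewards: immediate reward at each step of the rollout.
    next_values: estimated value V(s') of the successor state at each step,
    encouraging long-horizon planning beyond the immediate reward.
    """
    return [r + alpha * v for r, v in zip(rewards, next_values)]
```

A step with zero immediate reward can still carry a large target if it leads to a high-value state, which is exactly what rewards long-horizon planning in multi-step coding tasks.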
Traditional fine-tuning hits a wall—the model learns the average of its training data, not the ceiling of what is possible. Online RL alone is expensive and unstable without a strong foundation. Cascade RL solves both problems, producing production-ready stability with frontier-level coding capabilities at a fraction of the compute cost. This is why Axe-Turbo-31B performs competitively on SWE-Bench despite being smaller than many competing models.
Intelligent Context Compression
Working with modern, complex codebases requires handling far more context than fits in any reasonable context window. Axe-Turbo-31B addresses this through intelligent context compression in collaboration with smaller Raptor models. The system handles an effective context of up to 5 million tokens by strategically deciding what to keep in active memory.
The approach uses Raptor-series models to compress and summarize less-critical context while maintaining full detail for actively relevant code. When examining a large repository, the system identifies which files and functions are central to the current task and keeps those in full detail, while maintaining compressed representations of peripheral code. This hierarchical compression means the model can work on massive codebases without getting lost.
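The keep-versus-compress decision described above can be sketched as a budgeted packing step. This is a simplified illustration under stated assumptions: the file tuples, the relevance scores, and the fixed per-summary token cost are all hypothetical, and the real system's compression is performed by the Raptor models rather than a flat heuristic.

```python
def pack_context(files, budget, summary_cost=50):
    """Decide which files stay in full detail under a token budget (sketch).

    files: list of (path, token_count, relevance) tuples, where relevance
    scores how central a file is to the current task (assumed given).
    Returns a list of ("full" | "summary", path) pairs: the most relevant
    files are kept verbatim until the budget is exhausted; the rest are
    carried as compressed summaries at a small fixed cost.
    """
    packed, used = [], 0
    for path, tokens, _rel in sorted(files, key=lambda f: -f[2]):
        if used + tokens <= budget:
            packed.append(("full", path))
            used += tokens
        else:
            packed.append(("summary", path))
            used += summary_cost
    return packed
```

Sorting by relevance first means peripheral code is what gets compressed, matching the hierarchical scheme described above.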
Terminal-Based Coding Excellence
Axe-Turbo-31B excels in terminal-based coding environments. The model understands terminal operations, file manipulation, git workflows, test execution, and debugging through command-line tools. For CLI-based development within Bodega OS, the model provides sophisticated assistance—analyzing compiler errors, suggesting fixes, and helping navigate complex build systems.
The model understands Makefiles, shell scripts, package managers, and the ecosystem of command-line development tools. Integration with Bodega's retrieval engines enables searching documentation, error messages, and code history while maintaining the terminal workflow. Developers get code suggestions and explanations without leaving the terminal environment.
Agentic Tool Use and SWE-Bench Performance
The model orchestrates multiple tool calls efficiently to accomplish complex tasks. It plans multi-step operations, executes them through appropriate tools, verifies results, and adapts when things do not go as expected. This agentic behavior comes from distillation of Opus's tool use patterns and refinement through Cascade RL.
On SWE-Bench, Axe-Turbo-31B achieves competitive performance despite being smaller than many competing models. It successfully completes complex GitHub issues involving bug fixes, feature implementations, and refactoring tasks requiring multi-file edits and understanding of project architecture. The strength comes from learning effective tool use patterns during distillation and optimizing them through Cascade RL training.
Tool use is efficient—the model does not make redundant calls or retrieve unnecessary information. It understands dependencies between operations and sequences them appropriately. Error recovery is sophisticated; when operations fail, the model diagnoses what went wrong and tries alternative approaches.
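The execute-verify-recover behavior described above can be sketched as a small retry loop. This is an illustrative sketch only: `execute` is a hypothetical callable standing in for the model's actual tool dispatch, and the retry-annotation strategy is an assumption, not the model's real recovery logic.

```python
def run_step(tool, args, execute, max_retries=2):
    """Run one tool call with verification and error recovery (sketch).

    execute: hypothetical callable (tool, args) -> (ok, result).
    On failure, the call is retried with the attempt count recorded in
    args, standing in for the model diagnosing the error and adapting
    its approach before trying again.
    """
    for attempt in range(max_retries + 1):
        ok, result = execute(tool, args)
        if ok:
            return result
        args = dict(args, retry=attempt + 1)  # adapt, then try again
    raise RuntimeError(f"{tool} failed after {max_retries + 1} attempts")
```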
Integration with Bodega OS
Within Bodega OS, Axe-Turbo-31B serves as the primary coding agent. It orchestrates tool calls to the retrieval system, manages context through collaboration with Raptor models, and generates code based on comprehensive understanding of your codebase. All operations happen locally—your proprietary code, architectural decisions, and implementation strategies remain private.
The model integrates with Bodega's retrieval engines to search code, documentation, and commit history. It analyzes retrieved code, identifies patterns across your codebase, and suggests changes that maintain consistency with existing conventions. For indexing and ingestion, the model generates rich metadata about functions, classes, and modules that improve search quality.
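For a concrete sense of the kind of per-symbol metadata that can enrich a code index, here is a minimal sketch using Python's standard `ast` module. The record shape is an assumption for illustration; the document does not specify the actual metadata schema the model generates.

```python
import ast

def index_metadata(source):
    """Extract function and class metadata from Python source (sketch).

    Walks the syntax tree and records the name, node kind, and docstring
    of each function, async function, and class definition, i.e. the sort
    of structured metadata that can improve retrieval quality.
    """
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            records.append({
                "name": node.name,
                "kind": type(node).__name__,
                "doc": ast.get_docstring(node),
            })
    return records
```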
Architecture and Performance
The model's 31 billion parameters are optimized for code generation and reasoning. An MLX-based implementation delivers low-latency responses suitable for interactive coding workflows on Apple Silicon, and the memory layout is tuned for sustained performance during extended coding sessions. The model runs on M-series chips; 32 GB or more of unified memory is recommended for optimal performance.
Disclaimer
SRSWTI is not the creator or owner of the underlying foundation model architecture. The foundation model is created and provided by third parties. SRSWTI has trained this model on top of the foundation model but does not endorse, support, represent, or guarantee the completeness, truthfulness, accuracy, or reliability of any outputs. You understand that this model can produce content that might be offensive, harmful, inaccurate, deceptive, or otherwise inappropriate. SRSWTI may not monitor or control all model outputs and cannot, and does not, take responsibility for any such outputs. SRSWTI disclaims all warranties or guarantees about the accuracy, reliability, or benefits of this model. SRSWTI further disclaims any warranty that the model will meet your requirements, be secure, uninterrupted, or available at any time or location, or be error-free or virus-free, or that any errors will be corrected. You will be solely responsible for any damage resulting from your use of or access to this model, your downloading of this model, or use of this model provided by or through SRSWTI.
Crafted by the Bodega team at SRSWTI Research Labs
Building the world's fastest inference and retrieval engines
Making AI accessible, efficient, and powerful for everyone