Axe-Turbo-1B
Speed Meets Efficiency
Axe-Turbo-1B is a specialized model for ultra-fast code completion within Bodega OS. With just 1 billion parameters and a memory footprint of 500MB-1GB, this model delivers 80-150 tokens per second with sub-50ms latency. It is designed specifically for tab completion workflows where speed and context-aware suggestions matter more than general reasoning capabilities.
Built for Code Completion
The model was trained with reinforcement learning to understand how to complete code based on the indexed parts of your codebase. When you hit tab, Axe-Turbo-1B queries Bodega's retrieval system to find relevant code patterns, examines the surrounding context, and generates completions that match your project's style and conventions.
This is not just autocomplete based on the current file. The model learns a policy for suggesting completions that are informed by your entire indexed codebase. If you have helper functions defined elsewhere, common patterns used across files, or project-specific conventions, the model incorporates that knowledge into its suggestions.
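As an illustration, here is a minimal sketch of how retrieved snippets might be combined with the local prefix into a single completion prompt. The `build_completion_prompt` helper and the snippet format are hypothetical, not Bodega's actual retrieval API.

```python
# Illustrative sketch only: the helper and snippet format below are
# hypothetical, not Bodega's actual retrieval API.

def build_completion_prompt(prefix, retrieved_snippets, max_snippets=3):
    """Combine retrieved codebase snippets with the local prefix."""
    parts = [
        f"# From {s['path']}:\n{s['code']}"
        for s in retrieved_snippets[:max_snippets]
    ]
    context = "\n\n".join(parts)
    return f"{context}\n\n# Complete the following:\n{prefix}"

# Hypothetical snippets as the index might return them.
snippets = [
    {"path": "utils/strings.py", "code": "def slugify(text): ..."},
    {"path": "models/user.py", "code": "class User: ..."},
]
prompt = build_completion_prompt("def slug", snippets)
```

The key point is that the prompt carries project-wide context, not just the current file, so the model can complete `def slug` in a way consistent with the existing `slugify` helper.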
The RL training focused on teaching the model when to suggest simple completions versus more complex multi-line generations, how to use retrieved code examples effectively, and when to prefer local context over broader codebase patterns. This policy-based approach makes completions smarter and more contextually appropriate.
Performance Characteristics
500MB-1GB memory footprint. 80-150 tokens per second throughput on Apple Silicon. Sub-50ms first-token latency ensures that completions appear instantly as you type. The model runs continuously in the background without noticeable resource consumption, making it practical for always-on code assistance.
The ultra-low latency is critical for code completion. Anything above 100ms feels sluggish. Axe-Turbo-1B completes in under 50ms, which means suggestions appear as fast as your IDE can render them. This responsiveness changes how you interact with AI-assisted coding: it becomes seamless rather than interruptive.
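To put the latency budget in concrete terms, here is a minimal sketch of how one might measure first-token latency on your own setup. The model call is a no-op stand-in, not Axe-Turbo-1B's actual API.

```python
import time

def first_token_latency_ms(generate_first_token):
    """Wall-clock time until the callable returns its first token, in ms."""
    start = time.perf_counter()
    generate_first_token()
    return (time.perf_counter() - start) * 1000.0

# A no-op stand-in; a real measurement would invoke the model's first
# decode step here instead.
latency = first_token_latency_ms(lambda: None)
within_budget = latency < 50.0  # the sub-50ms target described above
```

Measuring against the 50ms budget on real hardware is the simplest way to confirm that completions will feel instant in your editor.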
Integration with Bodega Retrieval
Axe-Turbo-1B works in tight integration with Bodega's code indexing and retrieval systems. As you type, the model identifies what completion context would be useful, queries the retrieval system for relevant code snippets, and uses that information to generate suggestions that fit your project.
The retrieval queries happen in parallel with inference, keeping latency low. The model learned through RL training which retrieval queries are most valuable for different completion scenarios, when to prioritize local context versus broader codebase search, and how to synthesize information from multiple retrieved examples.
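The parallel pattern can be sketched as follows; `retrieve` and `prepare_local_context` are hypothetical stand-ins for Bodega's internals, shown only to illustrate overlapping retrieval with local work.

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve(query):
    # Hypothetical stand-in for Bodega's index lookup.
    return [f"snippet for {query}"]

def prepare_local_context(prefix, max_chars=256):
    # Stand-in for truncating the surrounding file context.
    return prefix[-max_chars:]

def complete(prefix, query):
    """Run retrieval concurrently with local-context preparation."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        retrieval_future = pool.submit(retrieve, query)
        local = prepare_local_context(prefix)  # overlaps with retrieval
        snippets = retrieval_future.result()
    return local, snippets

local, snippets = complete("def parse_config(", "parse config")
```

Because the index lookup and the local-context work proceed at the same time, the completion pays only for the slower of the two rather than their sum.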
This retrieval-augmented approach means completions improve as your codebase grows. The model gets better at suggesting project-specific patterns because it has more indexed examples to learn from. Your coding style, naming conventions, and architectural decisions become part of the model's context through retrieval.
Edge Deployment
The model's compact size makes it practical for edge deployment on developer laptops. It runs entirely locally as part of Bodega OS, ensuring that your code never leaves your machine. This is essential for proprietary codebases where sending code to external completion services is not acceptable.
Battery efficiency matters when you are coding on the go. Axe-Turbo-1B consumes minimal power even during continuous operation, making it viable for extended coding sessions on laptops without external power. The model's efficiency means you can keep it running all day without thermal throttling or battery concerns.
Technical Details
The model runs efficiently on M1, M2, M3, and newer Apple Silicon chips. Its small size means it also works on older hardware and mobile devices where larger code completion models would not be practical.
The RL training used a policy gradient approach with rewards based on completion acceptance rates, contextual appropriateness, and retrieval effectiveness. This produces a model that does not just generate plausible code; it generates code that developers actually want to use.
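As a loose illustration of this idea (not the actual training code), here is a toy REINFORCE loop in which a two-action policy learns to prefer whichever completion style simulated users accept more often; the acceptance rates are invented for the example.

```python
import math
import random

random.seed(0)

# Toy policy-gradient sketch: a Bernoulli policy chooses between a short
# and a multi-line completion, rewarded by simulated user acceptance.
theta = 0.0  # logit favoring multi-line completions
LR = 0.1

def p_multiline(t):
    """Probability the policy proposes a multi-line completion."""
    return 1.0 / (1.0 + math.exp(-t))

# Invented environment: multi-line suggestions are accepted 70% of the
# time, short ones 40% of the time.
ACCEPT = {True: 0.7, False: 0.4}

for _ in range(2000):
    p = p_multiline(theta)
    multiline = random.random() < p
    reward = 1.0 if random.random() < ACCEPT[multiline] else 0.0
    # Gradient of log pi(action) w.r.t. theta for a Bernoulli policy.
    grad_logp = (1.0 - p) if multiline else -p
    theta += LR * reward * grad_logp
```

After training, the policy shifts toward the higher-acceptance action, mirroring how an acceptance-rate reward steers a completion model toward suggestions developers actually take.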
Disclaimer
SRSWTI is not the creator or owner of the underlying foundation model architecture. The foundation model is created and provided by third parties. SRSWTI has trained this model on top of the foundation model but does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any outputs. You understand that this model can produce content that might be offensive, harmful, inaccurate, deceptive, or otherwise inappropriate. SRSWTI may not monitor or control all model outputs and cannot, and does not, take responsibility for any such outputs. SRSWTI disclaims all warranties or guarantees about the accuracy, reliability or benefits of this model. SRSWTI further disclaims any warranty that the model will meet your requirements, be secure, uninterrupted or available at any time or location, be error-free or virus-free, or that any errors will be corrected. You will be solely responsible for any damage resulting from your use of or access to this model, your downloading of this model, or use of this model provided by or through SRSWTI.
Crafted by the Bodega team at SRSWTI Research Labs
Building the world's fastest inference and retrieval engines
Making AI accessible, efficient, and powerful for everyone
