Bodega-Raptor-0.9B
Reasoning at the Edge
Bodega-Raptor-0.9B brings analytical thinking to the smallest devices. With a minimal memory footprint, it delivers structured reasoning where resources are most constrained. It is part of our Raptor series of generalist models, optimized for edge deployment where every megabyte and every millisecond counts.
Ultra-Compact Reasoning
Nine hundred million parameters. That is small enough to run on mobile devices, Raspberry Pi boards, and embedded systems where you would not normally consider deploying AI. The memory footprint ranges from 400 MB to 800 MB depending on quantization settings, making it practical for devices with limited RAM.
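For intuition, the weights-only footprint is roughly the parameter count times bits per weight. The sketch below works that out for common quantization levels; runtime overhead such as the KV cache comes on top, and exact numbers vary by quantization format:

```python
# Rough weights-only memory estimate for a 0.9B-parameter model.
# Runtime overhead (KV cache, activations) adds more on top, so
# treat these as lower bounds rather than exact figures.
PARAMS = 0.9e9  # parameter count

def weights_mb(bits_per_weight: int) -> float:
    return PARAMS * bits_per_weight / 8 / 1e6

for bits in (4, 6, 8):
    print(f"{bits}-bit quantization: ~{weights_mb(bits):.0f} MB")
# -> 4-bit: ~450 MB, 6-bit: ~675 MB, 8-bit: ~900 MB
```

This lines up with the stated 400-800 MB range for the quantization levels you would typically choose on-device.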
Despite its size, the model maintains structured reasoning capabilities. It can perform logical analysis, work through step-by-step problem solving, handle basic causal reasoning, and apply structured thinking to simple problems. This is not a model that tries to do everything—it focuses on doing reasoning tasks well within its capacity constraints.
Extreme Edge Performance
On Apple Silicon and similar ARM processors, Raptor-0.9B delivers over 100 tokens per second with sub-30ms first token latency. That is fast enough for interactive applications, real-time analysis, and scenarios where you need immediate responses. The model is battery-friendly, consuming minimal power even during sustained inference, which makes it practical for mobile and embedded deployments.
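If you want to verify those numbers on your own hardware, a minimal harness might look like the following. Here `stream_tokens` is a hypothetical stand-in for whatever token-streaming interface your runtime exposes:

```python
import time

def benchmark(stream_tokens, prompt: str) -> None:
    """Measure first-token latency and throughput for any callable that
    yields tokens one at a time. `stream_tokens` is a placeholder for
    the runtime's real streaming interface."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream_tokens(prompt):
        if first is None:
            first = time.perf_counter()
        count += 1
    elapsed = time.perf_counter() - start
    print(f"first token: {(first - start) * 1000:.1f} ms")
    print(f"throughput:  {count / elapsed:.1f} tokens/s")

# Trivial stub so the harness runs standalone; swap in the real model.
benchmark(lambda p: iter(p.split()), "a short prompt standing in for real output")
```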
The model is offline-first by design. Once loaded, it requires no network connection, no API calls, no external dependencies. This is essential for edge devices that may have unreliable connectivity, for privacy-sensitive applications where data cannot leave the device, or for embedded systems that operate autonomously.
Startup time is minimal. The model loads quickly, making it suitable for applications that need to spawn instances on demand or for devices that conserve battery by unloading the model when idle.
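A sketch of that load-on-demand pattern, assuming a hypothetical `load_fn` that returns a callable model handle (the idle-timeout policy is an illustration, not a Bodega API):

```python
import time

class OnDemandModel:
    """Load the model lazily and release it after an idle timeout.
    `load_fn` and the object it returns are placeholders for whatever
    loader your runtime provides."""
    def __init__(self, load_fn, idle_seconds: float = 60.0):
        self._load_fn = load_fn
        self._idle = idle_seconds
        self._model = None
        self._last_used = 0.0

    def infer(self, prompt: str):
        if self._model is None:
            self._model = self._load_fn()  # fast enough to do on demand
        self._last_used = time.monotonic()
        return self._model(prompt)

    def maybe_unload(self) -> None:
        # Call periodically (e.g. from an event loop) to free memory.
        if self._model and time.monotonic() - self._last_used > self._idle:
            self._model = None
```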
What Raptor-0.9B Does
The model handles lightweight reasoning tasks within Bodega OS's retrieval and inference pipeline. It excels at quick document classification, determining what should be indexed and how. When ingesting documents, the model can rapidly assess content type, extract key metadata, and make routing decisions about which retrieval pipeline to use.
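A rough sketch of that ingestion step, assuming a hypothetical `generate` function that invokes the model and an illustrative JSON schema (not Bodega's actual format):

```python
import json

INGEST_PROMPT = """Classify this document for indexing. Reply with JSON:
{{"content_type": "...", "language": "...", "pipeline": "text|code|table"}}

Document:
{doc}"""

def route_document(generate, doc: str) -> dict:
    """Ask the model for a routing decision; fall back to a default
    pipeline if the reply isn't valid JSON. `generate` is a placeholder
    for whatever function invokes the model."""
    reply = generate(INGEST_PROMPT.format(doc=doc[:2000]))
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return {"content_type": "unknown", "pipeline": "text"}
```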
For retrieval workflows, Raptor-0.9B performs fast query understanding and reformulation. It can analyze user queries, identify intent, and generate alternative phrasings to improve retrieval results. This happens in milliseconds, making it practical for real-time search interfaces where query processing cannot introduce noticeable latency.
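One plausible shape for that reformulation step, again with `generate` as a placeholder for the model call; the one-per-line output format is an assumed convention, not a documented one:

```python
def reformulate(generate, query: str, n: int = 3) -> list[str]:
    """Generate alternative phrasings of a query to widen retrieval."""
    prompt = (
        f"Rewrite this search query {n} different ways, one per line, "
        f"preserving its intent:\n{query}"
    )
    lines = [ln.strip() for ln in generate(prompt).splitlines() if ln.strip()]
    return [query] + lines[:n]  # keep the original query alongside rewrites
```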
The model supports indexing operations by generating tags, categories, and structured metadata from unstructured content. It can process documents as they are ingested, extract relevant information, and populate index fields without requiring larger models. This lightweight preprocessing keeps indexing pipelines fast and efficient.
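A sketch of populating index fields along those lines; the field set is illustrative, so substitute whatever your index schema expects:

```python
import json

def extract_index_fields(generate, doc: str) -> dict:
    """Extract structured metadata from unstructured content.
    `generate` is a placeholder for the model call."""
    prompt = (
        'Extract metadata as JSON with keys "title", "tags" (a list), '
        f'and "summary" (one sentence):\n\n{doc[:2000]}'
    )
    try:
        fields = json.loads(generate(prompt))
    except json.JSONDecodeError:
        fields = {}
    if not isinstance(fields, dict):
        fields = {}
    # Guarantee the schema even when the model's reply is malformed.
    return {
        "title": fields.get("title", ""),
        "tags": fields.get("tags", []),
        "summary": fields.get("summary", ""),
    }
```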
For inference workflows, Raptor-0.9B acts as a fast router and filter. It can evaluate which queries need full model inference versus simple retrieval, classify user intents to route requests to appropriate specialized models, and perform preliminary analysis to structure inputs for downstream processing.
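The gating step could be as simple as the sketch below; the yes/no protocol is an assumed convention for illustration:

```python
def needs_full_inference(generate, query: str) -> bool:
    """Cheap gate in front of a larger model: serve retrieval-only
    queries from the index, escalate the rest. `generate` is a
    placeholder for the small model's invocation."""
    prompt = (
        "Can this query be answered by looking up documents alone, "
        f"with no reasoning required? Answer yes or no.\n\nQuery: {query}"
    )
    return generate(prompt).strip().lower().startswith("no")
```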
Edge Deployment in Bodega
Raptor-0.9B runs on mobile devices and laptops as part of Bodega's edge deployment. The model's small footprint makes it practical for always-on retrieval assistance where larger models would drain battery or consume too much memory.
The model's efficiency means it works on standard consumer hardware without specialized accelerators. If you have a few hundred megabytes of available RAM and a reasonably modern processor, you can run this model as part of your local Bodega instance.
Part of the Raptor Series
Raptor-0.9B is the second-smallest member of our Raptor series of generalist models. It sits between the 90M model (optimized for extreme edge deployment) and the larger Raptor models designed for more demanding reasoning tasks. The series philosophy remains consistent: focus on doing real tasks well rather than chasing benchmark numbers.
For applications that need more capability, you can pair Raptor-0.9B with larger models in hybrid workflows. Use the small model for initial filtering, quick analysis, and routine decision making. Escalate to larger models only when you need more sophisticated reasoning. This approach keeps most computation on the edge while maintaining access to deeper analysis when necessary.
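One way to wire that escalation, with both models as placeholder callables and an assumed UNSURE sentinel standing in for whatever confidence signal you prefer:

```python
def answer(small_model, large_model, query: str) -> str:
    """Hybrid workflow: let the small model try first and escalate only
    when it flags the task as beyond it."""
    draft = small_model(
        "Answer if this fits simple step-by-step reasoning; "
        f"otherwise reply exactly UNSURE.\n\n{query}"
    )
    if draft.strip() == "UNSURE":
        return large_model(query)  # e.g. a larger Raptor model
    return draft
```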
Integration with Bodega OS
As part of Bodega OS, Raptor-0.9B can work alongside our retrieval engines and other models in the ecosystem. On edge devices, it serves as a lightweight reasoning layer that can make decisions about what data to retrieve, how to process it, and when to involve more powerful models.
The model maintains privacy by keeping all processing local. Sensor data, user inputs, analysis results—nothing leaves the device unless you explicitly choose to send it. This is fundamental to our approach: AI should work for you on your hardware, not require sending your data to external services.
Technical Details
Nine hundred million parameters. Memory footprint of 400-800 MB depending on quantization. Inference speed over 100 tokens per second on ARM processors. Sub-30ms first token latency for typical queries. Minimal power consumption suitable for battery-operated devices.
The model runs efficiently on Apple Silicon (M1, M2, M3 series), ARM Cortex processors, and similar architectures. It does not require discrete GPUs or specialized AI accelerators. Standard CPU inference is fast enough for interactive applications.
Context window is optimized for the kinds of reasoning tasks this model handles well: short-form problem solving, logical analysis, and structured decision making. The model does not try to maintain extensive context because that is not what you use a sub-1B parameter model for. It focuses on doing small reasoning tasks quickly and reliably.
Disclaimer
SRSWTI is not the creator or owner of the underlying foundation model architecture. The foundation model is created and provided by third parties. SRSWTI has trained this model on top of the foundation model but does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any outputs. You understand that this model can produce content that might be offensive, harmful, inaccurate, deceptive, or otherwise inappropriate. SRSWTI may not monitor or control all model outputs and cannot, and does not, take responsibility for any such outputs. SRSWTI disclaims all warranties or guarantees about the accuracy, reliability or benefits of this model. SRSWTI further disclaims any warranty that the model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free or virus-free, or that any errors will be corrected. You will be solely responsible for any damage resulting from your use of or access to this model, your downloading of this model, or use of this model provided by or through SRSWTI.
Crafted by the Bodega team at SRSWTI Research Labs
Building the world's fastest inference and retrieval engines
Making AI accessible, efficient, and powerful for everyone
