Bodega-Raptor-15B-6bit


Premium Reasoning with Efficiency

Bodega-Raptor-15B-6bit represents the middle ground in our Raptor series: more capable than our lighter models, more efficient than our largest ones. With 15 billion parameters and 6-bit quantization, this model brings sophisticated analytical thinking to mainstream hardware as part of Bodega OS. The 6-bit quantization preserves more information than aggressive 4-bit approaches while maintaining practical memory requirements.

The 6-bit Advantage

Six-bit quantization strikes a balance between quality and efficiency. While 4-bit quantization can introduce noticeable degradation in reasoning quality, 6-bit preserves the subtle patterns and relationships that matter for complex analytical tasks. The trade-off is worth it: you get near-full-precision reasoning quality with memory requirements of just 8-12GB, making the model deployable on standard laptops and workstations.
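As a rough sanity check, the weight storage implied by these numbers can be worked out directly. The arithmetic below is illustrative only; it ignores the KV cache, activations, and per-group quantization metadata that push real usage toward the upper end of the quoted range.

```python
# Back-of-the-envelope weight storage for a 15B-parameter model at 6-bit precision.
# Real deployments add KV cache, activations, and quantization scale metadata,
# which is why the document quotes an 8-12GB overall footprint.

params = 15e9          # parameter count
bits_per_param = 6     # 6-bit quantization
bytes_total = params * bits_per_param / 8
gigabytes = bytes_total / 1e9

print(f"{gigabytes:.2f} GB")  # 11.25 GB for the weights alone
```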

The quantization approach maintains the model's ability to handle multi-step reasoning chains, logical deduction, and complex problem decomposition. These capabilities depend on preserving fine-grained relationships in the model's weights, which is why the extra precision of 6-bit quantization matters for reasoning-focused models.

Architecture and Performance

The model has fifteen billion parameters quantized to 6-bit precision, with a memory footprint of 8-12GB depending on configuration. On Apple Silicon, sustained throughput is 40-60 tokens per second. This is fast enough for interactive development and analysis workflows while maintaining the reasoning quality needed for complex tasks.
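To put the throughput figures in concrete terms, here is a quick illustrative calculation of how long a typical response takes at the quoted 40-60 tokens-per-second range:

```python
# Wall-clock time for a 500-token response at the quoted sustained throughput.
response_tokens = 500

for tok_per_sec in (40, 60):
    seconds = response_tokens / tok_per_sec
    print(f"{tok_per_sec} tok/s -> {seconds:.1f}s")  # 12.5s and 8.3s
```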

The MLX-based inference leverages unified memory architecture, making efficient use of available resources without requiring discrete GPUs. The model runs comfortably on M1 Pro, M2, and M3 series chips with 16GB or more of unified memory.
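For reference, running a 6-bit MLX model of this kind from the command line typically looks like the following. This is a sketch using the open-source `mlx-lm` package's CLI; the local model path shown is a placeholder assumption, not an official Bodega invocation.

```shell
# Install the MLX LM tooling (Apple Silicon only).
pip install mlx-lm

# Generate with a local 6-bit MLX model; the path below is a placeholder.
mlx_lm.generate \
  --model ./bodega-raptor-15b-6bit \
  --prompt "Summarize the key design patterns in this module." \
  --max-tokens 256
```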

What Raptor-15B Does

This model excels at complex problem analysis within Bodega's retrieval and inference workflows. When analyzing retrieved documents or code, the model can identify patterns, extract relevant information, and synthesize insights across multiple sources. The reasoning capabilities make it valuable for understanding relationships between different pieces of information in your local knowledge base.

For code intelligence, the model handles architectural analysis of complex systems. It can examine codebases retrieved through Bodega's search, identify design patterns, suggest optimization strategies, and perform root cause analysis when debugging. The model understands not just individual code snippets but the broader architectural context that determines how systems behave.

Query understanding and expansion benefit from the model's analytical thinking. When processing user queries in Bodega's retrieval system, the model can infer intent, generate semantically related search terms, and reformulate queries to improve retrieval results. This happens fast enough to enhance search without introducing noticeable latency.
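A query-expansion step of this kind is usually prompt-driven. The sketch below shows the general shape of such a step; the function name and prompt wording are illustrative assumptions, not Bodega's actual implementation.

```python
def build_expansion_prompt(query: str, n_terms: int = 5) -> str:
    """Build a prompt asking the model for semantically related search terms.

    Hypothetical helper for illustration; the real system's prompting is internal.
    """
    return (
        f"Given the search query: {query!r}\n"
        f"List {n_terms} semantically related search terms, one per line, "
        "that would help retrieve relevant documents."
    )

prompt = build_expansion_prompt("memory leak in async worker pool")
print(prompt)
```

The model's completion is then parsed line by line and the resulting terms are added to the retrieval query.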

The model supports sophisticated document analysis and synthesis. When Bodega retrieves multiple documents, Raptor-15B can analyze them collectively, identify common themes, resolve contradictions, and generate coherent summaries that preserve important details. This multi-document reasoning is where the model's capabilities shine.
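The multi-document step can be sketched as assembling retrieved documents into one synthesis prompt. The helper below is a hypothetical illustration of that shape, not Bodega's internal code.

```python
def build_synthesis_prompt(docs: list[str]) -> str:
    """Combine retrieved documents into a single synthesis prompt (illustrative)."""
    numbered = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(docs))
    return (
        f"{numbered}\n\n"
        "Across the documents above, identify common themes, note any "
        "contradictions, and write a summary that preserves key details."
    )

docs = ["Service A retries on timeout.", "Service A never retries."]
print(build_synthesis_prompt(docs))
```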

Reasoning for Retrieval and Inference

Within Bodega OS, Raptor-15B serves as a reasoning layer between retrieval and final output. After the retrieval system finds relevant documents or code, the model analyzes them to extract actionable insights. It can determine which retrieved items are most relevant, how they relate to each other, and what conclusions can be drawn from the combined information.
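One simple way to picture the "which retrieved items are most relevant" step is lexical overlap scoring. This is a deliberately simplified stand-in for the model-based relevance analysis described above, included only to show the shape of the ranking stage.

```python
def overlap_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document (toy relevance proxy)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

docs = [
    "raptor model quantization details",
    "unrelated cooking recipe",
]
ranked = sorted(docs, key=lambda d: overlap_score("model quantization", d), reverse=True)
print(ranked[0])  # the quantization document ranks first
```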

For indexing and ingestion, the model generates rich metadata and semantic tags that improve retrieval quality. It can analyze documents as they are ingested, extract key concepts, identify relationships to existing content, and structure information for efficient retrieval. This preprocessing happens at 40-60 tokens per second, making it practical for continuous ingestion workflows.
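As a crude stand-in for model-generated semantic tags, the sketch below extracts the most frequent non-stopword terms from a document at ingestion time. The real pipeline uses the model itself; this is only a toy illustration of the tagging step.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}

def extract_tags(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent non-stopword terms as crude semantic tags."""
    words = [w.strip(".,").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

doc = "The retrieval engine indexes documents. Retrieval quality depends on indexes."
print(extract_tags(doc))
```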

The model handles hypothesis generation during exploratory analysis. When users are investigating problems or researching topics through Bodega's retrieval system, the model can suggest alternative explanations, identify gaps in retrieved information, and recommend additional searches to fill those gaps.

Running On-Premises

Raptor-15B runs entirely on your hardware as part of Bodega OS. Complex analyses, proprietary code examination, sensitive document processing: all of it stays local. The model integrates with Bodega's retrieval engines to access your local knowledge base without sending queries or content to external services.

The 6-bit quantization makes the model practical for deployment on standard development machines. You do not need workstation-class hardware or specialized accelerators. A modern laptop with 16GB of unified memory can run this model comfortably alongside other Bodega components.

Technical Details

Fifteen billion parameters quantized to 6-bit precision. Memory footprint of 8-12GB depending on configuration and loaded context. Sustained throughput of 40-60 tokens per second on Apple Silicon. MLX-based inference optimized for unified memory architecture.

The model runs efficiently on M1 Pro, M2, M3, and newer Apple Silicon chips. Memory bandwidth is the primary performance factor at this scale, which is why Apple's unified memory architecture provides good performance despite having fewer compute units than discrete GPUs.

Context window supports extended reasoning over retrieved documents and code. The model can maintain coherence across multi-document analysis, understand relationships between distant pieces of information, and build reasoning chains that span multiple retrieval results.

Part of the Raptor Series

Raptor-15B occupies the middle tier of our Raptor series. It provides more sophisticated reasoning than our 8B model while remaining more efficient than our 32B variant. For workflows that need serious analytical capability without the resource requirements of our largest models, this is the model to use.

The model pairs well with other Bodega components in hybrid workflows. Use lighter Raptor models for initial filtering and quick analysis. Route complex reasoning tasks to Raptor-15B when you need deeper insights. Escalate to larger models only when necessary. This tiered approach optimizes resource usage while maintaining reasoning quality.
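The tiered approach above can be sketched as a simple router. The tier names map to the Raptor series described in this card, but the complexity scale and thresholds are illustrative assumptions, not Bodega's actual routing policy.

```python
def route(task_complexity: float) -> str:
    """Pick a Raptor tier from a 0-1 complexity estimate (illustrative thresholds)."""
    if task_complexity < 0.3:
        return "raptor-8b"        # quick filtering and lightweight analysis
    if task_complexity < 0.8:
        return "raptor-15b-6bit"  # deeper multi-step reasoning
    return "raptor-32b"           # escalate only when necessary

print(route(0.5))  # raptor-15b-6bit
```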


Disclaimer

SRSWTI is not the creator or owner of the underlying foundation model architecture. The foundation model is created and provided by third parties. SRSWTI has trained this model on top of the foundation model but does not endorse, support, represent, or guarantee the completeness, truthfulness, accuracy, or reliability of any outputs. You understand that this model can produce content that might be offensive, harmful, inaccurate, deceptive, or otherwise inappropriate. SRSWTI may not monitor or control all model outputs and cannot, and does not, take responsibility for any such outputs. SRSWTI disclaims all warranties or guarantees about the accuracy, reliability, or benefits of this model. SRSWTI further disclaims any warranty that the model will meet your requirements, be secure, uninterrupted, or available at any time or location, or be error-free or virus-free, or that any errors will be corrected. You will be solely responsible for any damage resulting from your use of or access to this model, your downloading of this model, or use of this model provided by or through SRSWTI.


Crafted by the Bodega team at SRSWTI Research Labs
Building the world's fastest inference and retrieval engines
Making AI accessible, efficient, and powerful for everyone


Model size: 15B parameters
Tensor types: F32, U32
Format: Safetensors (MLX), 6-bit quantization
Downloads last month: 21
