---
language:
- en
tags:
- code
---
# Model Card for Bleenk
## Model Summary
**Bleenk 123B** is an agentic large language model developed by **[Robi Labs](https://www.robiai.com/)** for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.
Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.
## Model Details
### Model Description
* **Developed by:** [Robi Labs](https://www.robiai.com/)
* **Created for:** [Bleenk](https://www.bleenk.app/)
* **Funded by:** [Robi Labs](https://www.robiai.com/)
* **Shared by:** [Robi Labs](https://www.robiai.com/)
* **Model type:** Agentic Large Language Model (LLM)
* **Language(s) (NLP):** Primarily English; supports multilingual code and technical text
* **License:** To be released by Robi Labs
* **Finetuned from model:** Proprietary pretraining and fine-tuning pipeline
### Model Sources
* **Demo:** [https://bleenk.app](https://bleenk.app)
## Uses
### Direct Use
* Software engineering agents
* AI-powered code assistants
* Codebase navigation and analysis
* Multi-file refactoring and maintenance
* Tool-augmented development workflows
### Downstream Use
* Fine-tuning for organization-specific codebases
* Integration into internal developer platforms
* Agent frameworks for autonomous engineering
### Out-of-Scope Use
* General-purpose chat or conversational agents
* High-risk decision-making without human oversight
* Tasks requiring domain-specific legal, medical, or financial guarantees
## Bias, Risks, and Limitations
* The model may produce incorrect or incomplete code without verification
* Tool misuse may result in unintended system changes
* Performance depends on tool availability and prompt quality
* Trained primarily on publicly available and licensed data, which may encode historical biases
### Recommendations
Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
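As a minimal illustration of the sandboxing and human-in-the-loop review recommended above, an agent harness might gate every model-proposed shell command behind a command allowlist and an explicit approval step. The allowlist, helper name, and approval flow below are hypothetical sketches, not part of Bleenk or its tooling.

```python
import shlex
import subprocess

# Illustrative allowlist of commands the agent may propose.
ALLOWED = {"ls", "cat", "git", "pytest"}

def run_agent_command(cmd: str, approve=input) -> str:
    """Run a model-proposed shell command only if it passes the
    allowlist and an explicit human approval prompt."""
    parts = shlex.split(cmd)
    if not parts or parts[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {cmd!r}")
    if approve(f"Run {cmd!r}? [y/N] ").strip().lower() != "y":
        return "skipped by reviewer"
    # Capture output rather than inheriting the terminal, so the
    # harness can log or inspect results before showing them.
    return subprocess.run(parts, capture_output=True, text=True).stdout
```

In production the approval callback would typically be a review UI or policy engine rather than a terminal prompt, and execution would happen inside a container or VM.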
## How to Get Started with the Model
```bash
ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest
```
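Once pulled, the model can also be queried programmatically through Ollama's local REST API. The sketch below uses only Ollama's standard `/api/chat` endpoint; the helper names and the localhost URL are illustrative and assume a default Ollama install serving the `RobiLabs/bleenk:latest` tag.

```python
import json
import urllib.request

# Default endpoint for a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_request(prompt: str, model: str = "RobiLabs/bleenk:latest") -> dict:
    """Build a non-streaming chat payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask_bleenk(prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

For streaming responses, set `"stream": True` and read the reply line by line, as each line is a separate JSON chunk.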
## Training Details
### Training Data
The model was trained on a mixture of:
* Publicly available code repositories
* Licensed datasets
* Synthetic data generated for software engineering tasks
### Training Procedure
#### Preprocessing
Data was filtered for quality, deduplicated, and normalized for code and technical text.
#### Training Hyperparameters
* **Training regime:** Mixed-precision training (bf16)
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
* SWE-bench Verified
* SWE-bench Multilingual
* Terminal Bench
#### Metrics
* Task success rate
* Patch correctness
* Tool execution accuracy
### Results
| Model | Size (B params) | SWE Bench Verified | SWE Bench Multilingual | Terminal Bench |
| ------------------ | --------------- | ------------------ | ---------------------- | -------------- |
| **Bleenk** | **123** | **73.2%** | **71.3%** | **45.5%** |
| Devstral 2 | 123 | 72.2% | 61.3% | 40.5% |
| Devstral Small 2 | 24 | 65.8% | 51.6% | 32.0% |
| DeepSeek v3.2 | 671 | 73.1% | 70.2% | 46.4% |
| Kimi K2 Thinking | 1000 | 71.3% | 61.1% | 35.7% |
| MiniMax M2 | 230 | 69.4% | 56.5% | 30.0% |
| GLM 4.6 | 455 | 68.0% | – | 40.5% |
| Qwen 3 Coder Plus | 480 | 69.6% | 54.7% | 37.5% |
| Gemini 3 Pro | – | 76.2% | – | 54.2% |
| Claude Sonnet 4.5 | – | 77.2% | 68.0% | 42.8% |
| GPT 5.1 Codex Max | – | 77.9% | – | 58.1% |
| GPT 5.1 Codex High | – | 73.7% | – | 52.8% |
## Environmental Impact
Environmental impact details will be released as measurements are finalized.
## Technical Specifications
### Model Architecture and Objective
Transformer-based large language model optimized for agentic reasoning and tool usage.
### Compute Infrastructure
#### Hardware
Large-scale GPU/accelerator clusters
#### Software
Custom training and inference stack developed by Robi Labs
## Model Card Authors
Robi Labs Research Team
## Model Card Contact
[hello@robiai.com](mailto:hello@robiai.com)