---
language:
- en
tags:
- code
---

# Model Card for Bleenk

## Model Summary

**Bleenk 123B** is an agentic large language model developed by **[Robi Labs](https://www.robiai.com/)** for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.

Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.

## Model Details

### Model Description

* **Developed by:** [Robi Labs](https://www.robiai.com/)
* **Created for:** [Bleenk](https://www.bleenk.app/)
* **Funded by:** [Robi Labs](https://www.robiai.com/)
* **Shared by:** [Robi Labs](https://www.robiai.com/)
* **Model type:** Agentic Large Language Model (LLM)
* **Language(s) (NLP):** Primarily English; supports multilingual code and technical text
* **License:** To be released by Robi Labs
* **Finetuned from model:** Proprietary pretraining and fine-tuning pipeline

### Model Sources

* **Demo:** [https://bleenk.app](https://bleenk.app)

## Uses

### Direct Use

* Software engineering agents
* AI-powered code assistants
* Codebase navigation and analysis
* Multi-file refactoring and maintenance
* Tool-augmented development workflows

### Downstream Use

* Fine-tuning for organization-specific codebases
* Integration into internal developer platforms
* Agent frameworks for autonomous engineering

### Out-of-Scope Use

* General-purpose chat or conversational agents
* High-risk decision-making without human oversight
* Tasks requiring domain-specific legal, medical, or financial guarantees

## Bias, Risks, and Limitations

* The model may produce incorrect or incomplete code; its output should be verified before use
* Tool misuse may result in unintended system changes
* Performance depends on tool availability and prompt quality
* Trained primarily on publicly available and licensed data, which may encode historical biases

### Recommendations

Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
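
As one illustration of human-in-the-loop review, the sketch below gates an agent-proposed patch (assumed here to be a unified diff saved to a file) behind `git apply --check`, an explicit reviewer prompt, and a test command, and reverts the patch if the tests fail. The function name `review_and_apply` and the patch-file workflow are hypothetical; this is not part of Bleenk or any Robi Labs tooling.

```bash
# Hypothetical review gate (not Robi Labs tooling): a human approves an
# agent-proposed patch before it is applied, and a test command must pass.
# Usage: review_and_apply PATCH_FILE TEST_COMMAND [ARGS...]
review_and_apply() {
  patch_file=$1
  shift  # the remaining arguments form the test command

  # Refuse patches that do not apply cleanly to the working tree.
  if ! git apply --check "$patch_file"; then
    echo "patch does not apply cleanly" >&2
    return 1
  fi

  # Show a per-file summary and ask the reviewer for explicit approval.
  git apply --stat "$patch_file"
  printf 'Apply this patch? [y/N] '
  read -r answer
  case $answer in
    y | Y) ;;
    *) echo "patch rejected by reviewer" >&2; return 1 ;;
  esac

  git apply "$patch_file" || return 1

  # Run the tests; undo the patch if they fail.
  if ! "$@"; then
    echo "tests failed; reverting patch" >&2
    git apply -R "$patch_file"
    return 1
  fi
}
```

For example, `review_and_apply agent.patch make test` would apply `agent.patch` only after approval and keep it only if `make test` succeeds. Any answer other than `y` or `Y` (including end of input) rejects the patch, so the default is the safe choice. In a real deployment this gate would itself run inside the sandbox.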

## How to Get Started with the Model

With [Ollama](https://ollama.com) installed, pull and run the model:

```bash
# Download the model
ollama pull RobiLabs/bleenk:latest

# Start an interactive session
ollama run RobiLabs/bleenk:latest
```
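
Once pulled, the model can also be reached through Ollama's local HTTP API (by default on port 11434), which is a more convenient entry point for scripts and agent frameworks than the interactive CLI. The helper below is a minimal sketch that sends one non-streaming request to Ollama's `/api/generate` endpoint. The names `OLLAMA_URL`, `BLEENK_MODEL`, `build_payload`, and `ask_bleenk` are hypothetical, and the naive JSON quoting assumes the prompt contains no double quotes, backslashes, or newlines.

```bash
# Minimal sketch: query a locally running Ollama server over its REST API.
# OLLAMA_URL, BLEENK_MODEL, build_payload and ask_bleenk are illustrative names.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
BLEENK_MODEL="${BLEENK_MODEL:-RobiLabs/bleenk:latest}"

# Build the JSON body for /api/generate. Naive quoting: the prompt must not
# contain double quotes, backslashes, or newlines.
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$BLEENK_MODEL" "$1"
}

# Send one non-streaming request; the server replies with a single JSON
# object whose "response" field holds the generated text.
ask_bleenk() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_payload "$1")"
}
```

With the server running, `ask_bleenk "Write a shell one-liner that counts lines in all .py files"` prints the raw JSON reply. For arbitrary prompts, build the payload with a proper JSON encoder instead of `printf`.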

## Training Details

### Training Data

The model was trained on a mixture of:

* Publicly available code repositories
* Licensed datasets
* Synthetic data generated for software engineering tasks

### Training Procedure

#### Preprocessing

Data was filtered for quality, deduplicated, and normalized for code and technical text.
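
This card does not describe how deduplication was performed. As an illustration of the simplest variant only, exact deduplication, the sketch below keeps one file per distinct SHA-256 content hash. Code-data pipelines commonly add near-duplicate detection (for example MinHash) on top of this, which the sketch does not attempt. The function name `dedup_exact` is hypothetical.

```bash
# Hypothetical sketch of exact deduplication (not Robi Labs' pipeline):
# print one path per distinct file content under directory $1, keeping the
# lexicographically first path among byte-identical files.
dedup_exact() {
  find "$1" -type f -exec sha256sum {} + |
    LC_ALL=C sort -k2 |
    awk '!seen[$1]++ { sub(/^[0-9a-f]+ [ *]/, ""); print }'
}
```

For example, `dedup_exact ./repos > unique_files.txt` lists the files to keep. Sorting by path before filtering makes the choice of survivor deterministic rather than dependent on `find` traversal order.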

#### Training Hyperparameters

* **Training regime:** Mixed-precision training (bf16)
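
Because bf16 stores each parameter in 2 bytes, numeric precision largely determines how much memory the weights occupy. The estimate below assumes that the 123 in the model name is the parameter count in billions and that weights are held at the stated precision; the card lists bf16 only for training, and the 4-bit line is a hypothetical comparison, not a statement about the Ollama build. `weights_gb` is an illustrative helper name.

```bash
# Back-of-the-envelope weight memory, ignoring KV cache, activations and
# runtime overhead: parameters (billions) x bytes per parameter = GB (1e9 bytes).
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b }'
}

weights_gb 123 2    # bf16, 2 bytes per parameter: prints 246.0
weights_gb 123 0.5  # hypothetical ~4-bit quantization: prints 61.5
```

Real quantized formats also store scaling factors, so their footprint is somewhat above the 0.5 bytes-per-parameter figure.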

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* SWE-bench Verified
* SWE-bench Multilingual
* Terminal Bench

#### Metrics

* Task success rate
* Patch correctness
* Tool execution accuracy

### Results

| Model              | Size (B params) | SWE-bench Verified | SWE-bench Multilingual | Terminal Bench |
| ------------------ | --------------- | ------------------ | ---------------------- | -------------- |
| **Bleenk**         | **123**         | **73.2%**          | **71.3%**              | **45.5%**      |
| Devstral 2         | 123             | 72.2%              | 61.3%                  | 40.5%          |
| Devstral Small 2   | 24              | 65.8%              | 51.6%                  | 32.0%          |
| DeepSeek v3.2      | 671             | 73.1%              | 70.2%                  | 46.4%          |
| Kimi K2 Thinking   | 1000            | 71.3%              | 61.1%                  | 35.7%          |
| MiniMax M2         | 230             | 69.4%              | 56.5%                  | 30.0%          |
| GLM 4.6            | 455             | 68.0%              | –                      | 40.5%          |
| Qwen 3 Coder Plus  | 480             | 69.6%              | 54.7%                  | 37.5%          |
| Gemini 3 Pro       | –               | 76.2%              | –                      | 54.2%          |
| Claude Sonnet 4.5  | –               | 77.2%              | 68.0%                  | 42.8%          |
| GPT 5.1 Codex Max  | –               | 77.9%              | –                      | 58.1%          |
| GPT 5.1 Codex High | –               | 73.7%              | –                      | 52.8%          |

Size is the parameter count in billions. A dash (–) marks a size or score that is not listed.
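
One way to read the results table is to compare models listed at the same size. The snippet below computes the percentage-point gap between two scores; the values are copied from the Bleenk and Devstral 2 rows of the table, and `margin` is a hypothetical helper.

```bash
# Per-benchmark gap, in percentage points, between two scores from the table.
margin() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f\n", a - b }'
}

# Bleenk vs. Devstral 2 (both listed at 123B)
margin 73.2 72.2  # SWE-bench Verified: prints +1.0
margin 71.3 61.3  # SWE-bench Multilingual: prints +10.0
margin 45.5 40.5  # Terminal Bench: prints +5.0
```

The same helper shows where Bleenk trails a listed model, for example `margin 45.5 46.4` against DeepSeek v3.2 on Terminal Bench prints `-0.9`.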

## Environmental Impact

Environmental impact details will be released as measurements are finalized.

## Technical Specifications

### Model Architecture and Objective

Transformer-based large language model optimized for agentic reasoning and tool usage.

### Compute Infrastructure

#### Hardware

Large-scale GPU/accelerator clusters

#### Software

Custom training and inference stack developed by Robi Labs

## Model Card Authors

Robi Labs Research Team

## Model Card Contact

[hello@robiai.com](mailto:hello@robiai.com)