---
language:
- en
tags:
- code
---



# Model Card for Bleenk

## Model Summary

**Bleenk 123B** is an agentic large language model developed by **[Robi Labs](https://www.robiai.com/)** for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.

Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.

## Model Details

### Model Description

* **Developed by:** [Robi Labs](https://www.robiai.com/)
* **Created for:** [Bleenk](https://www.bleenk.app/)
* **Funded by:** [Robi Labs](https://www.robiai.com/)
* **Shared by:** [Robi Labs](https://www.robiai.com/)
* **Model type:** Agentic Large Language Model (LLM)
* **Language(s) (NLP):** Primarily English; supports multilingual code and technical text
* **License:** To be released by Robi Labs
* **Finetuned from model:** Not disclosed; trained with a proprietary pretraining and fine-tuning pipeline

### Model Sources

* **Demo:** [https://bleenk.app](https://bleenk.app)

## Uses

### Direct Use

* Software engineering agents
* AI-powered code assistants
* Codebase navigation and analysis
* Multi-file refactoring and maintenance
* Tool-augmented development workflows

### Downstream Use

* Fine-tuning for organization-specific codebases
* Integration into internal developer platforms
* Agent frameworks for autonomous engineering
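Agent frameworks of this kind typically wrap the model in a loop that alternates between a model turn and a tool execution. A minimal, hypothetical sketch of one such turn (the `call_model` stub and the `word_count` tool below are illustrative placeholders, not part of any Bleenk API):

```python
def call_model(messages):
    """Illustrative stub standing in for a real model call; a real agent
    would send `messages` to the model and parse a tool request from its reply."""
    return {"tool": "word_count", "args": {"text": "tool driven workflow"}}

TOOLS = {
    # Purely local tool for illustration; real agents expose file and shell tools.
    "word_count": lambda text: str(len(text.split())),
}

def agent_step(messages):
    """One agent turn: ask the model for an action, run the tool, record the result."""
    action = call_model(messages)
    result = TOOLS[action["tool"]](**action["args"])
    messages.append({"role": "tool", "content": result})
    return messages
```

The appended tool message would then be fed back to the model on the next turn, which is what lets the model plan multi-step edits rather than answering in a single shot.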

### Out-of-Scope Use

* General-purpose chat or conversational agents
* High-risk decision-making without human oversight
* Tasks requiring domain-specific legal, medical, or financial guarantees

## Bias, Risks, and Limitations

* The model may produce incorrect or incomplete code without verification
* Tool misuse may result in unintended system changes
* Performance depends on tool availability and prompt quality
* Trained primarily on publicly available and licensed data, which may encode historical biases

### Recommendations

Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
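One simple human-in-the-loop pattern is a review gate that refuses to apply any model-proposed change without an explicit approval decision. A hypothetical sketch (the `approve` callback stands in for whatever CLI prompt or review UI a deployment uses):

```python
def gated_apply(patch: str, approve) -> bool:
    """Apply a model-proposed patch only after an explicit human decision.

    `approve` is a callback (e.g. a CLI prompt or review UI) that returns
    True to accept the patch. Nothing touches the working tree on rejection.
    """
    if not approve(patch):
        return False
    # A real deployment would apply the patch here, inside a sandbox.
    return True
```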

## How to Get Started with the Model

```bash
ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest
```
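Once pulled, the model can also be queried programmatically over Ollama's local HTTP API, served on port 11434 by default. A minimal sketch, assuming a running `ollama serve` instance (the prompt is illustrative):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for the Ollama HTTP API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send the request to a locally running Ollama server; return the completion."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("RobiLabs/bleenk:latest", "Reverse a string in Python."))
```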

## Training Details

### Training Data

The model was trained on a mixture of:

* Publicly available code repositories
* Licensed datasets
* Synthetic data generated for software engineering tasks

### Training Procedure

#### Preprocessing

Data was filtered for quality, deduplicated, and normalized for code and technical text.
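Exact deduplication of this kind is commonly implemented by hashing normalized documents and keeping the first occurrence. A generic sketch of the idea (not Robi Labs' actual pipeline):

```python
import hashlib

def dedup(records):
    """Drop exact-duplicate documents by content hash, keeping first occurrence.

    Normalization here is just whitespace stripping; a production pipeline
    would apply heavier normalization before hashing.
    """
    seen, out = set(), []
    for text in records:
        key = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(text)
    return out
```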

#### Training Hyperparameters

* **Training regime:** Mixed-precision training (bf16)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* SWE-bench Verified
* SWE-bench Multilingual
* Terminal Bench

#### Metrics

* Task success rate
* Patch correctness
* Tool execution accuracy
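On SWE-bench-style benchmarks, task success rate is the fraction of instances whose generated patch makes every required test pass; partial fixes do not count. A generic sketch of that scoring rule:

```python
def resolved(test_results: dict) -> bool:
    """A task counts as resolved only if every required test passes."""
    return all(test_results.values())

def success_rate(tasks) -> float:
    """Fraction of benchmark tasks fully resolved."""
    return sum(resolved(t) for t in tasks) / len(tasks)
```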

### Results

| Model              | Params (B)      | SWE Bench Verified | SWE Bench Multilingual | Terminal Bench |
| ------------------ | --------------- | ------------------ | ---------------------- | -------------- |
| **Bleenk**         | **123**         | **73.2%**          | **71.3%**              | **45.5%**      |
| Devstral 2         | 123             | 72.2%              | 61.3%                  | 40.5%          |
| Devstral Small 2   | 24              | 65.8%              | 51.6%                  | 32.0%          |
| DeepSeek v3.2      | 671             | 73.1%              | 70.2%                  | 46.4%          |
| Kimi K2 Thinking   | 1000            | 71.3%              | 61.1%                  | 35.7%          |
| MiniMax M2         | 230             | 69.4%              | 56.5%                  | 30.0%          |
| GLM 4.6            | 455             | 68.0%              | –                      | 40.5%          |
| Qwen 3 Coder Plus  | 480             | 69.6%              | 54.7%                  | 37.5%          |
| Gemini 3 Pro       | –               | 76.2%              | –                      | 54.2%          |
| Claude Sonnet 4.5  | –               | 77.2%              | 68.0%                  | 42.8%          |
| GPT 5.1 Codex Max  | –               | 77.9%              | –                      | 58.1%          |
| GPT 5.1 Codex High | –               | 73.7%              | –                      | 52.8%          |

## Environmental Impact

Environmental impact details will be released as measurements are finalized.

## Technical Specifications

### Model Architecture and Objective

Transformer-based large language model optimized for agentic reasoning and tool usage.

### Compute Infrastructure

#### Hardware

Large-scale GPU/accelerator clusters

#### Software

Custom training and inference stack developed by Robi Labs

## Model Card Authors

Robi Labs Research Team

## Model Card Contact

[hello@robiai.com](mailto:hello@robiai.com)