---
language:
- en
tags:
- code
---



# Model Card for Bleenk

## Model Summary

**Bleenk 123B** is an agentic large language model developed by **[Robi Labs](https://www.robiai.com/)** for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.

Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.

## Model Details

### Model Description

* **Developed by:** [Robi Labs](https://www.robiai.com/)
* **Created for:** [Bleenk](https://www.bleenk.app/)
* **Funded by:** [Robi Labs](https://www.robiai.com/)
* **Shared by:** [Robi Labs](https://www.robiai.com/)
* **Model type:** Agentic Large Language Model (LLM)
* **Language(s) (NLP):** Primarily English; supports multilingual code and technical text
* **License:** To be released by Robi Labs
* **Finetuned from model:** Not disclosed; trained with a proprietary pretraining and fine-tuning pipeline

### Model Sources

* **Demo:** [https://bleenk.app](https://bleenk.app)

## Uses

### Direct Use

* Software engineering agents
* AI-powered code assistants
* Codebase navigation and analysis
* Multi-file refactoring and maintenance
* Tool-augmented development workflows

### Downstream Use

* Fine-tuning for organization-specific codebases
* Integration into internal developer platforms
* Agent frameworks for autonomous engineering
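Agent frameworks of this kind typically wrap the model in a loop that alternates between a model turn and a tool execution. A minimal, hypothetical sketch of one such turn (the `call_model` stub and the `word_count` tool below are illustrative placeholders, not part of any Bleenk API):

```python
def call_model(messages):
    """Illustrative stub standing in for a real model call; a real agent
    would send `messages` to the model and parse a tool request from its reply."""
    return {"tool": "word_count", "args": {"text": "tool driven workflow"}}

TOOLS = {
    # Purely local tool for illustration; real agents expose file and shell tools.
    "word_count": lambda text: str(len(text.split())),
}

def agent_step(messages):
    """One agent turn: ask the model for an action, run the tool, record the result."""
    action = call_model(messages)
    result = TOOLS[action["tool"]](**action["args"])
    messages.append({"role": "tool", "content": result})
    return messages
```

The appended tool message would then be fed back to the model on the next turn, which is what lets the model plan multi-step edits rather than answering in a single shot.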

### Out-of-Scope Use

* General-purpose chat or conversational agents
* High-risk decision-making without human oversight
* Tasks requiring domain-specific legal, medical, or financial guarantees

## Bias, Risks, and Limitations

* The model may produce incorrect or incomplete code without verification
* Tool misuse may result in unintended system changes
* Performance depends on tool availability and prompt quality
* Trained primarily on publicly available and licensed data, which may encode historical biases

### Recommendations

Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
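One simple human-in-the-loop pattern is a review gate that refuses to apply any model-proposed change without an explicit approval decision. A hypothetical sketch (the `approve` callback stands in for whatever CLI prompt or review UI a deployment uses):

```python
def gated_apply(patch: str, approve) -> bool:
    """Apply a model-proposed patch only after an explicit human decision.

    `approve` is a callback (e.g. a CLI prompt or review UI) that returns
    True to accept the patch. Nothing touches the working tree on rejection.
    """
    if not approve(patch):
        return False
    # A real deployment would apply the patch here, inside a sandbox.
    return True
```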

## How to Get Started with the Model

```bash
ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest
```
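Once pulled, the model can also be queried programmatically over Ollama's local HTTP API, served on port 11434 by default. A minimal sketch, assuming a running `ollama serve` instance (the prompt is illustrative):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for the Ollama HTTP API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send the request to a locally running Ollama server; return the completion."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("RobiLabs/bleenk:latest", "Reverse a string in Python."))
```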

## Training Details

### Training Data

The model was trained on a mixture of:

* Publicly available code repositories
* Licensed datasets
* Synthetic data generated for software engineering tasks

### Training Procedure

#### Preprocessing

Data was filtered for quality, deduplicated, and normalized for code and technical text.
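Exact deduplication of this kind is commonly implemented by hashing normalized documents and keeping the first occurrence. A generic sketch of the idea (not Robi Labs' actual pipeline):

```python
import hashlib

def dedup(records):
    """Drop exact-duplicate documents by content hash, keeping first occurrence.

    Normalization here is just whitespace stripping; a production pipeline
    would apply heavier normalization before hashing.
    """
    seen, out = set(), []
    for text in records:
        key = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(text)
    return out
```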

#### Training Hyperparameters

* **Training regime:** Mixed-precision training (bf16)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* SWE-bench Verified
* SWE-bench Multilingual
* Terminal Bench

#### Metrics

* Task success rate
* Patch correctness
* Tool execution accuracy
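On SWE-bench-style benchmarks, task success rate is the fraction of instances whose generated patch makes every required test pass; partial fixes do not count. A generic sketch of that scoring rule:

```python
def resolved(test_results: dict) -> bool:
    """A task counts as resolved only if every required test passes."""
    return all(test_results.values())

def success_rate(tasks) -> float:
    """Fraction of benchmark tasks fully resolved."""
    return sum(resolved(t) for t in tasks) / len(tasks)
```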

### Results

| Model              | Params (B)      | SWE Bench Verified | SWE Bench Multilingual | Terminal Bench |
| ------------------ | --------------- | ------------------ | ---------------------- | -------------- |
| **Bleenk**         | **123**         | **73.2%**          | **71.3%**              | **45.5%**      |
| Devstral 2         | 123             | 72.2%              | 61.3%                  | 40.5%          |
| Devstral Small 2   | 24              | 65.8%              | 51.6%                  | 32.0%          |
| DeepSeek v3.2      | 671             | 73.1%              | 70.2%                  | 46.4%          |
| Kimi K2 Thinking   | 1000            | 71.3%              | 61.1%                  | 35.7%          |
| MiniMax M2         | 230             | 69.4%              | 56.5%                  | 30.0%          |
| GLM 4.6            | 455             | 68.0%              | –                      | 40.5%          |
| Qwen 3 Coder Plus  | 480             | 69.6%              | 54.7%                  | 37.5%          |
| Gemini 3 Pro       | –               | 76.2%              | –                      | 54.2%          |
| Claude Sonnet 4.5  | –               | 77.2%              | 68.0%                  | 42.8%          |
| GPT 5.1 Codex Max  | –               | 77.9%              | –                      | 58.1%          |
| GPT 5.1 Codex High | –               | 73.7%              | –                      | 52.8%          |

## Environmental Impact

Environmental impact details will be released as measurements are finalized.

## Technical Specifications

### Model Architecture and Objective

Transformer-based large language model optimized for agentic reasoning and tool usage.

### Compute Infrastructure

#### Hardware

Large-scale GPU/accelerator clusters

#### Software

Custom training and inference stack developed by Robi Labs

## Model Card Authors

Robi Labs Research Team

## Model Card Contact

[hello@robiai.com](mailto:hello@robiai.com)