	---
	language:
	- en
	tags:
	- code
	---



	# Model Card for Bleenk

	## Model Summary

	Bleenk 123B is an agentic large language model developed by [Robi Labs](https://www.robiai.com/) for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.

	Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.

	## Model Details

	### Model Description

	* Developed by: [Robi Labs](https://www.robiai.com/)
	* Created for: [Bleenk](https://www.bleenk.app/)
	* Funded by: [Robi Labs](https://www.robiai.com/)
	* Shared by: [Robi Labs](https://www.robiai.com/)
	* Model type: Agentic Large Language Model (LLM)
	* Language(s) (NLP): Primarily English; supports multilingual code and technical text
	* License: To be released by Robi Labs
	* Finetuned from model: Not disclosed; trained with a proprietary pretraining and fine-tuning pipeline

	### Model Sources

	* Demo: [https://bleenk.app](https://bleenk.app)

	## Uses

	### Direct Use

	* Software engineering agents
	* AI-powered code assistants
	* Codebase navigation and analysis
	* Multi-file refactoring and maintenance
	* Tool-augmented development workflows

	### Downstream Use

	* Fine-tuning for organization-specific codebases
	* Integration into internal developer platforms
	* Agent frameworks for autonomous engineering

	### Out-of-Scope Use

	* General-purpose chat or conversational agents
	* High-risk decision-making without human oversight
	* Tasks requiring domain-specific legal, medical, or financial guarantees

	## Bias, Risks, and Limitations

	* The model may produce incorrect or incomplete code, so its output should be verified before use
	* Tool misuse may result in unintended system changes
	* Performance depends on tool availability and prompt quality
	* Trained primarily on publicly available and licensed data, which may encode historical biases

	### Recommendations

	Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
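
One way to picture the human-in-the-loop part of this recommendation is a small gate that decides whether a command proposed by an agent may run automatically or must wait for a person. The sketch below is illustrative only: `SAFE_COMMANDS`, `needs_review`, and `gate` are hypothetical names, not part of Bleenk or any Robi Labs tooling, and a gate like this complements a sandbox rather than replacing it.

```bash
# Hypothetical review gate for commands proposed by an agent.
# SAFE_COMMANDS and both function names are illustrative assumptions.
SAFE_COMMANDS="ls cat grep"
NL='
'

# needs_review: exit 0 when a human must approve the command, 1 otherwise.
needs_review() {
  # Anything that chains, pipes, redirects, or expands is never auto-approved.
  case "$1" in
    *"$NL"* | *';'* | *'&'* | *'|'* | *'`'* | *'$'* | *'<'* | *'>'*) return 0 ;;
  esac
  # Otherwise auto-approve only when the first word is on the allowlist.
  first_word=${1%% *}
  for name in $SAFE_COMMANDS; do
    [ "$first_word" = "$name" ] && return 1
  done
  return 0
}

# gate: print the routing decision for one proposed command.
gate() {
  if needs_review "$1"; then
    echo "REVIEW: $1"
  else
    echo "AUTO: $1"
  fi
}
```

For example, `gate 'ls -la'` prints `AUTO: ls -la`, while `gate 'rm -rf build'` and `gate 'ls; rm -rf /'` are both routed to review. The allowlist is deliberately tiny; a real deployment would choose it per environment.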

	## How to Get Started with the Model

	```bash
	ollama pull RobiLabs/bleenk:latest
	ollama run RobiLabs/bleenk:latest
	```
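
Once the model has been pulled, it can also be queried programmatically through the REST API that a local Ollama server exposes. The following is a minimal sketch, assuming `ollama serve` is listening on its default port 11434 and that the `RobiLabs/bleenk:latest` tag from the commands above is available. `json_escape`, `build_payload`, and `ask_bleenk` are hypothetical helper names introduced here for illustration.

```bash
# Sketch: send one prompt to a local Ollama server over its REST API.
# Assumes the default endpoint; override OLLAMA_URL if yours differs.
MODEL="RobiLabs/bleenk:latest"
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# json_escape (hypothetical helper): escape backslashes and double quotes.
# It does not handle newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload: print the JSON body for a non-streaming generate request.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$MODEL" "$(json_escape "$1")"
}

# ask_bleenk: POST the prompt to /api/generate and print the raw JSON reply.
ask_bleenk() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_payload "$1")"
}
```

Calling `ask_bleenk 'Summarize what this repository does'` prints a JSON object whose `response` field holds the generated text. For prompts that contain newlines, a proper JSON encoder is a safer choice than the simple escaping shown here.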

	## Training Details

	### Training Data

	The model was trained on a mixture of:

	* Publicly available code repositories
	* Licensed datasets
	* Synthetic data generated for software engineering tasks

	### Training Procedure

	#### Preprocessing

	Data was filtered for quality, deduplicated, and normalized for code and technical text.

	#### Training Hyperparameters

	* Training regime: Mixed-precision training (bf16)

	## Evaluation

	### Testing Data, Factors & Metrics

	#### Testing Data

	* SWE-bench Verified
	* SWE-bench Multilingual
	* Terminal Bench

	#### Metrics

	* Task success rate
	* Patch correctness
	* Tool execution accuracy
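
The card does not define these metrics formally. As a rough illustration, task success rate is commonly computed as the share of benchmark tasks that were resolved. The sketch below assumes that definition and a hypothetical one-result-per-line input format (`pass` or `fail`); it is not the evaluation harness used for the results reported here.

```bash
# success_rate: read one result per line ("pass" or "fail") on stdin
# and print the percentage of passes with one decimal place.
# The input format is an assumption made for this example.
success_rate() {
  awk '
    $1 == "pass" { passed++ }
    NF > 0       { total++ }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.1f%%\n", 100 * passed / total
    }'
}
```

With four illustrative results, `printf 'pass\nfail\npass\npass\n' | success_rate` prints `75.0%`.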

	### Results

	| Model | Size (B parameters) | SWE-bench Verified | SWE-bench Multilingual | Terminal Bench |
	| ------------------ | ------------------- | ------------------ | ---------------------- | -------------- |
	| Bleenk | 123 | 73.2% | 71.3% | 45.5% |
	| Devstral 2 | 123 | 72.2% | 61.3% | 40.5% |
	| Devstral Small 2 | 24 | 65.8% | 51.6% | 32.0% |
	| DeepSeek v3.2 | 671 | 73.1% | 70.2% | 46.4% |
	| Kimi K2 Thinking | 1000 | 71.3% | 61.1% | 35.7% |
	| MiniMax M2 | 230 | 69.4% | 56.5% | 30.0% |
	| GLM 4.6 | 455 | 68.0% | – | 40.5% |
	| Qwen 3 Coder Plus | 480 | 69.6% | 54.7% | 37.5% |
	| Gemini 3 Pro | – | 76.2% | – | 54.2% |
	| Claude Sonnet 4.5 | – | 77.2% | 68.0% | 42.8% |
	| GPT 5.1 Codex Max | – | 77.9% | – | 58.1% |
	| GPT 5.1 Codex High | – | 73.7% | – | 52.8% |

	## Environmental Impact

	Environmental impact details will be released as measurements are finalized.

	## Technical Specifications

	### Model Architecture and Objective

	Transformer-based large language model optimized for agentic reasoning and tool usage.

	### Compute Infrastructure

	#### Hardware

	Large-scale GPU/accelerator clusters

	#### Software

	Custom training and inference stack developed by Robi Labs

	## Model Card Authors

	Robi Labs Research Team

	## Model Card Contact

	[hello@robiai.com](mailto:hello@robiai.com)