Instructions to use RobiLabs/Bleenk with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use RobiLabs/Bleenk with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="RobiLabs/Bleenk",
	filename="bleenk-123b-Q4_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use RobiLabs/Bleenk with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RobiLabs/Bleenk:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RobiLabs/Bleenk:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RobiLabs/Bleenk:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RobiLabs/Bleenk:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RobiLabs/Bleenk:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf RobiLabs/Bleenk:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RobiLabs/Bleenk:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RobiLabs/Bleenk:Q4_K_M

Use Docker

docker model run hf.co/RobiLabs/Bleenk:Q4_K_M

LM Studio
Jan
Ollama
How to use RobiLabs/Bleenk with Ollama:
```
ollama run hf.co/RobiLabs/Bleenk:Q4_K_M
```

Unsloth Studio new

How to use RobiLabs/Bleenk with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RobiLabs/Bleenk to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RobiLabs/Bleenk to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RobiLabs/Bleenk to start chatting

Docker Model Runner
How to use RobiLabs/Bleenk with Docker Model Runner:
```
docker model run hf.co/RobiLabs/Bleenk:Q4_K_M
```

Lemonade

How to use RobiLabs/Bleenk with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull RobiLabs/Bleenk:Q4_K_M

Run and chat with the model

lemonade run user.Bleenk-Q4_K_M

List all available models

lemonade list

Model Card for Bleenk

Model Summary

Bleenk 123B is an agentic large language model developed by Robi Labs for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.

Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.

Model Details

Model Description

Developed by: Robi Labs
Created for: Bleenk
Funded by: Robi Labs
Shared by: Robi Labs
Model type: Agentic Large Language Model (LLM)
Language(s) (NLP): Primarily English; supports multilingual code and technical text
License: To be released by Robi Labs
Finetuned from model: Proprietary pretraining and fine-tuning pipeline

Model Sources

Demo: https://bleenk.app

Uses

Direct Use

Software engineering agents
AI-powered code assistants
Codebase navigation and analysis
Multi-file refactoring and maintenance
Tool-augmented development workflows

Downstream Use

Fine-tuning for organization-specific codebases
Integration into internal developer platforms
Agent frameworks for autonomous engineering

Out-of-Scope Use

General-purpose chat or conversational agents
High-risk decision-making without human oversight
Tasks requiring domain-specific legal, medical, or financial guarantees

Bias, Risks, and Limitations

The model may produce incorrect or incomplete code without verification
Tool misuse may result in unintended system changes
Performance depends on tool availability and prompt quality
Trained primarily on publicly available and licensed data, which may encode historical biases

Recommendations

Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.

How to Get Started with the Model

ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest

Training Details

Training Data

The model was trained on a mixture of:

Publicly available code repositories
Licensed datasets
Synthetic data generated for software engineering tasks

Training Procedure

Preprocessing

Data was filtered for quality, deduplicated, and normalized for code and technical text.

Training Hyperparameters

Training regime: Mixed-precision training (bf16)

Evaluation

Testing Data, Factors & Metrics

Testing Data

SWE-bench Verified
SWE-bench Multilingual
Terminal Bench

Metrics

Task success rate
Patch correctness
Tool execution accuracy

Results

Model	Size (B Tokens)	SWE Bench Verified	SWE Bench Multilingual	Terminal Bench
Bleenk	123	73.2%	71.3%	45.5%
Devstral 2	123	72.2%	61.3%	40.5%
Devstral Small 2	24	65.8%	51.6%	32.0%
DeepSeek v3.2	671	73.1%	70.2%	46.4%
Kimi K2 Thinking	1000	71.3%	61.1%	35.7%
MiniMax M2	230	69.4%	56.5%	30.0%
GLM 4.6	455	68.0%	–	40.5%
Qwen 3 Coder Plus	480	69.6%	54.7%	37.5%
Gemini 3 Pro	–	76.2%	–	54.2%
Claude Sonnet 4.5	–	77.2%	68.0%	42.8%
GPT 5.1 Codex Max	–	77.9%	–	58.1%
GPT 5.1 Codex High	–	73.7%	–	52.8%

Environmental Impact

Environmental impact details will be released as measurements are finalized.

Technical Specifications

Model Architecture and Objective

Transformer-based large language model optimized for agentic reasoning and tool usage.

Compute Infrastructure

Hardware

Large-scale GPU/accelerator clusters

Software

Custom training and inference stack developed by Robi Labs

Model Card Authors

Robi Labs Research Team

Model Card Contact

hello@robiai.com

Downloads last month: 3

GGUF

Model size

125B params

Architecture

mistral3

Hardware compatibility

4-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support