Maincoder-1B is a code-focused language model optimized for code generation and completion tasks. The model achieves strong performance on coding benchmarks while maintaining a compact size suitable for local deployment.

Key Features

  • Code Generation: Optimized for Python code completion and generation tasks.
  • Compact Size: 1 billion parameters, lightweight enough to run on consumer hardware.
  • Deep Architecture: Modern transformer architecture with rotary position embeddings (RoPE), grouped-query attention, QK normalization, and a high depth-to-width ratio.
  • Advanced Data Mixing: Pre-trained and mid-trained on custom data mixes developed for high-performance coding.
  • MCPO Algorithm: Fine-tuned with MCPO, a specialized reinforcement-learning policy-optimization algorithm that improves training stability and accelerates convergence.
  • SOTA Performance: State-of-the-art performance on the Python coding benchmarks HumanEval, HumanEval+, and MBPP+.

Benchmark Results

Benchmark Performance Across Baseline LLMs

| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
|---|---|---|---|---|---|
| Maincode/Maincoder-1B | 0.7622 | 0.7256 | 0.7090 | 0.3054 | 0.2976 |
| deepseek-ai/deepseek-coder-1.3b-instruct | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
| HuggingFaceTB/SmolLM3-3B | 0.5366 | 0.5000 | 0.6799 | 0.5928 | 0.5505 |
| Qwen/Qwen2.5-Coder-1.5B-Instruct | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
| Qwen/Qwen3-1.7B | 0.4024 | 0.3780 | 0.5582 | 0.5571 | 0.6865 |

Model Overview

Maincoder uses a modern transformer decoder architecture with:

  • Rotary Position Embeddings: RoPE with theta = 1,000,000.
  • RMSNorm: Pre-normalization for stable training.
  • Grouped Query Attention: 4:1 ratio of query to key-value heads.
  • QK Normalization: RMSNorm applied to attention queries and keys.
  • SwiGLU MLP: Gated linear units with SiLU activation.

| Attribute | Value |
|---|---|
| Parameters | 1B |
| Hidden Size | 1536 |
| Layers | 32 |
| Attention Heads | 16 (4 KV heads) |
| Head Dimension | 96 |
| Vocabulary Size | 151,936 |
| Context Length | 2,048 |
| Precision | bfloat16 |
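
The sketch below is a minimal, illustrative PyTorch module matching the attention hyperparameters in the table above (hidden size 1536, 16 query heads, 4 KV heads, head dimension 96). It is our own reconstruction for clarity, not the model's actual implementation, and it omits the RoPE application:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Grouped-query attention with QK RMSNorm, sized per the table above."""

    def __init__(self, hidden_size=1536, n_heads=16, n_kv_heads=4, head_dim=96):
        super().__init__()
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = head_dim
        self.q_proj = nn.Linear(hidden_size, n_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * head_dim, hidden_size, bias=False)
        # QK normalization: RMSNorm over the head dimension of queries and keys
        # (nn.RMSNorm requires PyTorch >= 2.4).
        self.q_norm = nn.RMSNorm(head_dim)
        self.k_norm = nn.RMSNorm(head_dim)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)
        # 4:1 GQA: each key/value head serves 4 query heads.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))

x = torch.randn(1, 8, 1536)
print(GroupedQueryAttention()(x).shape)  # torch.Size([1, 8, 1536])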

Usage

Installation

pip install transformers torch

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Maincode/Maincoder-1B",
    trust_remote_code=True,
)

# Code completion example
prompt = '''def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.2,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
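
Note that generate returns the prompt tokens followed by the completion, and the model may keep writing past the end of the function. One simple post-processing heuristic (our own suggestion, not something the model requires) is to decode only the new tokens and stop at the first unindented line:

# Decode only the newly generated tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Keep the function body up to the first top-level (unindented) line.
body = []
for line in completion.splitlines():
    if line and not line[0].isspace() and body:
        break
    body.append(line)
print(prompt + "\n".join(body))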

Code Completion

# Function completion
prompt = '''def quicksort(arr: list) -> list:
    """Sort a list using the quicksort algorithm."""
'''

# Class completion
prompt = '''class BinarySearchTree:
    """A binary search tree implementation."""
    
    def __init__(self):
'''

# Algorithm implementation
prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
    """Find the shortest path using Dijkstra's algorithm.
    
    Args:
        graph: Adjacency list representation of the graph
        start: Starting node
        end: Target node
    
    Returns:
        Tuple of (distance, path)
    """
'''
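
A small helper like the one below (a hypothetical convenience wrapper around the quick-start code, not part of the model's API) makes it easy to try each of these prompts:

def complete(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion using the model/tokenizer from the quick start."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        do_sample=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(complete(prompt))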

Additional Notes

Reproducibility

Model evaluations were run on 8 AMD MI355X GPUs using EleutherAI's lm-evaluation-harness (lm_eval), with the following Docker invocation:

docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri --group-add=video \
  --ipc=host --security-opt seccomp=unconfined \
  -v $(pwd):/workspace -w /workspace \
  -e HF_TOKEN \
  -e PYTHONHASHSEED=0 \
  -e TORCH_DETERMINISTIC=1 \
  -e ROCBLAS_ATOMICS_MODE="0" \
  -e MIOPEN_FIND_MODE="1" \
  -e CUBLAS_WORKSPACE_CONFIG=":4096:8" \
  -e HF_ALLOW_CODE_EVAL="1" \
  rocm/pytorch:rocm7.1.1_ubuntu24.04_py3.12_pytorch_release_2.9.1 \
  bash -c 'pip install "lm_eval[hf]" && \
  accelerate launch -m lm_eval \
  --model hf --model_args "pretrained=Maincode/Maincoder-1B,trust_remote_code=True,dtype=float32" \
  --tasks humaneval,humaneval_plus,mbpp_plus,mmlu,gsm8k \
  --device cuda:0 --batch_size 32 --seed 42 \
  --confirm_run_unsafe_code'
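
The same evaluation can also be driven from Python through lm_eval's simple_evaluate API. The sketch below assumes a recent lm_eval release (argument names may vary between versions):

import os
import lm_eval

# Required by the Hugging Face code_eval metric behind HumanEval/MBPP.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Maincode/Maincoder-1B,trust_remote_code=True,dtype=float32",
    tasks=["humaneval", "humaneval_plus", "mbpp_plus"],
    batch_size=32,
    confirm_run_unsafe_code=True,  # mirrors --confirm_run_unsafe_code on the CLI
)
print(results["results"])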

Limitations

  • Context length limited to 2,048 tokens
  • Primarily optimized for Python; performance may vary on other languages
  • May generate code with bugs or security issues; always review generated code

Disclaimer: This model has not undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.

License

This model is released under the Apache 2.0 License.

Citation

@misc{maincoder2025,
  title        = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
  author       = {Maincode Team},
  year         = {2025},
  organization = {Maincode},
  howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
}

Contact

For questions, issues, or collaboration inquiries, please visit Maincode.
