Instructions for using dineth554/legion-coder-8m-10k with libraries, inference servers, and local apps.

Libraries

How to use dineth554/legion-coder-8m-10k with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dineth554/legion-coder-8m-10k")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dineth554/legion-coder-8m-10k")
model = AutoModelForCausalLM.from_pretrained("dineth554/legion-coder-8m-10k")
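The snippets above load the model but stop before generating any text. Below is a minimal sketch of a generation call. The helper names (`generation_kwargs`, `_load_pipeline`, `complete_code`) and the default sampling values are illustrative assumptions, not part of this repository. It relies on the standard `text-generation` pipeline, which forwards keyword arguments such as `max_new_tokens`, `do_sample`, and `temperature` to `model.generate`.

```python
from functools import lru_cache

MODEL_ID = "dineth554/legion-coder-8m-10k"


def generation_kwargs(max_new_tokens: int = 128, temperature: float = 0.8) -> dict:
    """Sampling settings forwarded to `model.generate` by the pipeline (illustrative defaults)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 when sampling")
    # temperature is only applied when do_sample=True
    return {"max_new_tokens": max_new_tokens, "do_sample": True, "temperature": temperature}


@lru_cache(maxsize=1)
def _load_pipeline(model_id: str = MODEL_ID):
    # Imported lazily so generation_kwargs() works without transformers installed
    from transformers import pipeline
    return pipeline("text-generation", model=model_id)


def complete_code(prompt: str, **sampling) -> str:
    """Return the prompt followed by the model's continuation."""
    pipe = _load_pipeline()  # cached, so the model is loaded only once
    outputs = pipe(prompt, **generation_kwargs(**sampling))
    # By default the pipeline returns the prompt plus the generated text
    return outputs[0]["generated_text"]
```

For example, `complete_code("def fibonacci(n):", max_new_tokens=64)` returns the prompt together with the model's continuation. The pipeline's `return_full_text=False` option can be used if you only want the newly generated text.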


vLLM

How to use dineth554/legion-coder-8m-10k with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dineth554/legion-coder-8m-10k"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dineth554/legion-coder-8m-10k",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
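The same request can also be sent from Python without extra dependencies. The sketch below uses only the standard library (`json`, `urllib.request`) and mirrors the curl call field for field. The function names are hypothetical. It assumes the server returns the usual OpenAI-style completions body, where the generated text is in `choices[0]["text"]`.

```python
import json
import urllib.request

DEFAULT_MODEL = "dineth554/legion-coder-8m-10k"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = DEFAULT_MODEL,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl example."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete_via_server(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-compatible completion responses carry the text in choices[i]["text"]
    return body["choices"][0]["text"]
```

Because SGLang exposes the same OpenAI-compatible route, `complete_via_server(prompt, base_url="http://localhost:30000")` should also work against the SGLang server.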

Use Docker

docker model run hf.co/dineth554/legion-coder-8m-10k

SGLang

How to use dineth554/legion-coder-8m-10k with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dineth554/legion-coder-8m-10k" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dineth554/legion-coder-8m-10k",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dineth554/legion-coder-8m-10k" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dineth554/legion-coder-8m-10k",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'


Legion Coder 8M

This repository contains model weights and configuration files for the Legion Coder 8M model in the Hugging Face Transformers format.

These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

Over recent months, we have focused on developing compact foundation models that are useful and efficient in practice. Legion Coder combines code generation, an efficient architecture, and CPU-optimized inference so that developers can run a coding model without dedicated GPU hardware.

Quick Deploy

Deploy Legion Coder 8M instantly using any of these methods:

Streamlit (Hugging Face Spaces)

# Download and run locally
git clone https://huggingface.co/pnny13/legion-coder-8m
cd legion-coder-8m
pip install -r requirements.txt
streamlit run app.py

One-Click Deploy:

Create a new Space on Hugging Face
Select "Streamlit" as SDK
Upload app.py and requirements.txt
Your Space is live!

Gradio (Local/Cloud)

# Download and run locally
git clone https://huggingface.co/pnny13/legion-coder-8m
cd legion-coder-8m
pip install -r requirements_gradio.txt
python gradio_app.py

One-Click Deploy:

Create a new Space on Hugging Face
Select "Gradio" as SDK
Upload gradio_app.py and requirements_gradio.txt

AWS SageMaker (Production)

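from sagemaker.huggingface import HuggingFaceModel

# Load the model from the Hugging Face Hub; model_data expects an S3 model.tar.gz, not a Hub ID
huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "pnny13/legion-coder-8m",
        "HF_TASK": "text-generation",
    },
    transformers_version="4.36.0",
    pytorch_version="2.1.0",
    py_version="py310",
    role="YOUR_SAGEMAKER_ROLE",  # ARN of an IAM role with SageMaker permissions
)

# Deploy to a CPU instance
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="legion-coder-8m"
)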

Deploy This Model

One-Click Deployment Options

Deployment Instructions

AWS SageMaker:

Click the "Deploy to SageMaker" button above
Configure your AWS credentials
Select instance type (recommended: ml.m5.large)
Deploy in one click

Streamlit Space:

Click the "Deploy to Streamlit Space" button
Select your Hugging Face account
Name your space and choose "Streamlit" SDK
Create Space

Gradio Space:

Click the "Deploy to Gradio Space" button
Select your Hugging Face account
Name your space and choose "Gradio" SDK
Create Space

Legion Coder Highlights

Legion Coder features the following enhancements:

Unified Code Generation Foundation: Early training on curated code datasets achieves cross-generational parity with larger models across Python, JavaScript, and multi-language benchmarks.
Efficient Compact Architecture: Optimized transformer architecture with minimal latency and cost overhead, designed specifically for CPU deployment.
Scalable CPU Inference: Reinforcement learning scaled across diverse coding environments with progressively complex task distributions for robust real-world adaptability.
Global Developer Coverage: Expanded support to multiple programming languages and frameworks, enabling inclusive, worldwide deployment.
Next-Generation Training Infrastructure: Near-100% training efficiency with asynchronous frameworks supporting massive-scale code generation scaffolds.

Model Overview

Type: Causal Language Model
Training Stage: Pre-training & Post-training
Language Model
- Number of Parameters: 44,341,632 (~44M)
- Hidden Dimension: 576
- Number of Layers: 13
- Attention Heads: 16
- Vocabulary: 16,000 tokens (both the token embedding and the LM output layer have 16,000 entries)
- Context Length: 1,024 tokens (native)
- Format: Safetensors
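The numbers above can be cross-checked with a little arithmetic. The sketch below is illustrative and not part of the repository; the class name `LegionCoderSpec` is hypothetical. It derives the per-head dimension (576 / 16 = 36), the size of one 16,000 × 576 token-embedding matrix, and the raw weight size for a given number of bytes per parameter. If the weights are stored in float32 (an assumption based on the precision listed under Training Procedure), 44,341,632 parameters come to about 169 MiB, which is consistent with the ~170MB model size in the benchmark table. Whether the LM output layer shares weights with the embedding is not stated here, so only one embedding matrix is counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegionCoderSpec:
    """Values copied from the Model Overview list."""
    num_parameters: int = 44_341_632
    hidden_size: int = 576
    num_layers: int = 13
    num_heads: int = 16
    vocab_size: int = 16_000
    context_length: int = 1_024

    def __post_init__(self) -> None:
        # Multi-head attention splits the hidden dimension evenly across heads
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_heads

    @property
    def embedding_params(self) -> int:
        # One vocab_size x hidden_size token-embedding matrix
        return self.vocab_size * self.hidden_size

    def weight_mib(self, bytes_per_param: int) -> float:
        """Approximate size of the raw weights in MiB (ignores file metadata)."""
        return self.num_parameters * bytes_per_param / 2**20


spec = LegionCoderSpec()
print(f"head_dim = {spec.head_dim}")
print(f"embedding share = {spec.embedding_params / spec.num_parameters:.1%}")
print(f"float32 weights ~ {spec.weight_mib(4):.0f} MiB, float16 ~ {spec.weight_mib(2):.0f} MiB")
```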

Benchmark Results

Code Generation

Efficiency Metrics
Metric	Legion Coder 8M	TinyLlama-1.1B	Qwen2.5-0.5B	CodeLlama-7B	Phi-2
Model Size	~170MB	~2.2GB	~1.0GB	~13GB	~5.3GB
Parameters	44M	1.1B	500M	7B	2.7B
CPU Compatible	Yes	No	Limited	No	No
Efficiency Score	9.5/10	6.0/10	7.0/10	5.0/10	6.5/10

* Efficiency Score = (Parameter Efficiency x Memory Efficiency x Speed) / 3
* Legion Coder 8M achieves its efficiency through a compact architecture optimized for CPU deployment.

Amazon SageMaker Deployment

This model can be deployed on Amazon SageMaker, either with one click or with the SageMaker Python SDK.


Using the SageMaker Python SDK

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# Initialize SageMaker session
sess = sagemaker.Session()

# Create Hugging Face Model; the container downloads the weights from the Hub via HF_MODEL_ID
# (model_data is only for an S3 URI pointing to a model.tar.gz)
huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "pnny13/legion-coder-8m",
        "HF_TASK": "text-generation",
    },
    transformers_version="4.36.0",
    pytorch_version="2.1.0",
    py_version="py310",
    role="arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_SAGEMAKER_ROLE",
    sagemaker_session=sess,
)

# Deploy to SageMaker (CPU instance)
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="legion-coder-8m-endpoint"
)

# Test the endpoint; do_sample must be enabled for temperature to take effect
result = predictor.predict({
    "inputs": "Write a Python function to calculate fibonacci numbers:",
    "parameters": {
        "do_sample": True,
        "temperature": 0.8,
        "max_new_tokens": 200
    }
})

print(result)

# predictor.delete_endpoint()  # delete the endpoint when finished to stop incurring charges

Local Inference with vLLM

from vllm import LLM, SamplingParams

# Load model with vLLM
llm = LLM(model="pnny13/legion-coder-8m")

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=200
)

# Generate code
prompt = "Write a Python function to calculate fibonacci numbers:"
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)

Local Inference with SGLang

import sglang as sgl

# Point the frontend at a running SGLang server (started with sglang.launch_server on port 30000)
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

# Define prompt template
@sgl.function
def code_gen(s, prompt):
    s += sgl.system("You are a helpful coding assistant.")
    s += sgl.user(prompt)
    s += sgl.assistant(sgl.gen("code", max_tokens=200))

# Run inference
result = code_gen.run(
    prompt="Write a Python function to calculate fibonacci numbers:",
    temperature=0.8
)
print(result["code"])

Technical Details

Training Data

Python code from The Stack v2 dataset
GitHub code repositories (filtered for quality)
Code-specific preprocessing for indentation and special tokens

Training Procedure

Optimizer: AdamW
Learning Rate: 5e-4 with cosine decay
Batch Size: 4 with gradient accumulation
Training Steps: 10,000
Precision: float32 (CPU-optimized)
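The learning-rate schedule listed above can be written out explicitly. The following plain-Python sketch implements the standard cosine-decay formula with a peak of 5e-4 over 10,000 steps. It is not the authors' training code, and several details are assumptions because the card does not state them: the decay ends at a minimum learning rate of 0, there is no warmup, and the gradient-accumulation factor is unknown (the value 8 used below is hypothetical). In a PyTorch training loop, the same shape is typically obtained with `torch.optim.AdamW(model.parameters(), lr=5e-4)` and `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)`.

```python
import math


def cosine_lr(step: int, total_steps: int = 10_000, peak_lr: float = 5e-4, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under cosine decay from peak_lr to min_lr (no warmup assumed)."""
    # Clamp so steps outside [0, total_steps] stay on the schedule's endpoints
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def effective_batch_size(micro_batch: int = 4, accumulation_steps: int = 1) -> int:
    """Number of sequences contributing to each optimizer step with gradient accumulation."""
    return micro_batch * accumulation_steps


for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(step, f"{cosine_lr(step):.2e}")

# Hypothetical accumulation factor; the card does not specify one
print("effective batch:", effective_batch_size(4, 8))
```

The learning rate falls slowly at first, reaches half the peak at the midpoint, and approaches zero at step 10,000.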

License

This model is released under the Apache 2.0 License.
