---
language:
- en
tags:
- code
- coding
- python
- programming
- text-generation
- causal-lm
- transformer
- gpt
- legion-coder
- code-generation
- code-completion
license: mit
datasets:
- the-stack-v2
- codeparrot/github-code
- bigcode/the-stack
model-index:
- name: Legion Coder 8M
  results: []
---

# Legion Coder 8M

A compact yet powerful 44M-parameter transformer model optimized for coding tasks. Legion Coder is designed to generate clean, efficient, and well-documented code while maintaining a small footprint suitable for local deployment.

## Model Details

- **Architecture**: GPT-style transformer with pre-normalization
- **Parameters**: 44,341,632 (~44M)
- **Vocabulary Size**: 16,000 (BPE tokenizer optimized for code)
- **Hidden Size (d_model)**: 576
- **Layers**: 13
- **Attention Heads**: 16
- **Feed-forward Dimension**: 1,152
- **Context Length**: 1,024 tokens
- **Format**: Safetensors
- **Precision**: float32

## Model Specifications

| Attribute | Value |
|-----------|-------|
| Model Type | Causal Language Model |
| Architecture | Transformer Decoder |
| Parameters | 44,341,632 |
| Hidden Size | 576 |
| Num Layers | 13 |
| Num Attention Heads | 16 |
| Intermediate Size | 1,152 |
| Max Position Embeddings | 1,024 |
| Vocab Size | 16,000 |

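The parameter total in the table is consistent with the other attributes. The sketch below reproduces it, assuming tied input/output embeddings, learned absolute position embeddings, bias-free linear projections, and LayerNorms (including a final one) with weight and bias; these architectural details are assumptions, not statements from the card.

```python
# Back-of-the-envelope parameter count from the spec table (assumptions noted above).
vocab_size, d_model, n_layers, d_ff, max_positions = 16_000, 576, 13, 1_152, 1_024

embeddings = vocab_size * d_model + max_positions * d_model  # token + position tables
attention = 4 * d_model * d_model        # Q, K, V and output projections, no bias
ffn = 2 * d_model * d_ff                 # up and down projections, no bias
layer_norms = 2 * (2 * d_model)          # two LayerNorms per block (weight + bias)
per_layer = attention + ffn + layer_norms

total = embeddings + n_layers * per_layer + 2 * d_model  # plus a final LayerNorm
print(f"{total:,}")  # 44,341,632 -- matches the Parameters row
```
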
## Intended Use

This model is designed for:

- **Code Generation**: Generate code in Python and other programming languages
- **Code Completion**: Complete partial code snippets
- **Code Explanation**: Provide explanations for code functionality
- **Debugging Assistance**: Help identify and fix code issues
- **Educational Purposes**: Learn programming concepts through examples

## Usage

### Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("pnny13/legion-coder-8m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("pnny13/legion-coder-8m", trust_remote_code=True)

# Set to eval mode
model.eval()
```

### Generating Code

```python
# Prepare prompt
prompt = "# Write a function to calculate factorial\ndef factorial(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate (do_sample=True so temperature/top_p/top_k take effect)
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_length=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        top_k=50,
    )

# Decode
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
```

## System Prompt

For optimal results, use the following system prompt:

```
You are Legion Coder, an expert coding assistant. Your purpose is to help users write clean, efficient, and well-documented code.

Guidelines:
- Write code that follows best practices and PEP 8 style guidelines
- Include helpful comments explaining complex logic
- Provide complete, runnable code examples
- Explain your approach before showing code when helpful
- If asked to debug, identify the issue and provide the corrected code

Always wrap code blocks in triple backticks with the appropriate language identifier.
```

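The card does not define a chat template, so one straightforward way to apply the system prompt is to prepend it to the user request as plain text. Below is a minimal sketch reusing the `model` and `tokenizer` loaded above; the concatenation format, the abbreviated system prompt, and the example request are illustrative assumptions, not part of the card.

```python
# Minimal sketch: prepend the system prompt to the user request as plain text.
# Assumptions: no chat template is defined, and the system prompt is abbreviated.
system_prompt = (
    "You are Legion Coder, an expert coding assistant. Your purpose is to help "
    "users write clean, efficient, and well-documented code."
)
user_request = "# Write a function that reverses a string\ndef reverse_string(s):"
full_prompt = system_prompt + "\n\n" + user_request

inputs = tokenizer(full_prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_length=256,
        do_sample=True,
        temperature=0.8,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
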
## Training Details

### Training Data

- Python code from The Stack v2 dataset
- GitHub code repositories (filtered for quality)
- Code-specific preprocessing to handle indentation and special tokens

### Training Procedure

- Optimizer: AdamW
- Learning Rate: 5e-4 with cosine decay (see the sketch below)
- Batch Size: 4 with gradient accumulation
- Training Steps: 10,000
- Mixed Precision: No (CPU-optimized)

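For reference, the procedure above corresponds roughly to the following PyTorch setup. This is a sketch, not the original training script: warmup, weight decay, and the number of gradient-accumulation steps are not stated in the card, so library defaults are used and accumulation is omitted.

```python
import torch

# Rough sketch of the described optimizer and schedule, applied to the `model`
# loaded in the Usage section. AdamW betas/weight decay are library defaults;
# no warmup is assumed (not specified in the card).
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

# Cosine decay of the learning rate over the 10,000 training steps.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
```
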
## Limitations

- **Context Length**: Limited to 1,024 tokens; longer inputs must be truncated (see the sketch below)
- **Language Support**: Primarily optimized for Python
- **Model Size**: 44M parameters may not capture all programming patterns
- **Training Data**: May reflect biases present in training code
- **No Internet Access**: Cannot access external APIs or documentation

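A minimal sketch of handling the context-window limit with the standard tokenizer truncation arguments; `max_length` comes from the card, and the example prompt is purely illustrative.

```python
# Cap the prompt at the model's 1,024-token context window.
long_prompt = "# utility functions\n" * 2000  # deliberately longer than the window
inputs = tokenizer(long_prompt, return_tensors="pt", truncation=True, max_length=1024)
print(inputs.input_ids.shape)  # sequence length is capped at 1,024
# Note: by default, excess tokens are dropped from the end of the prompt.
```
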
## Ethical Considerations

- Generated code should be reviewed before production use
- The model may reproduce patterns from training data; verify licensing
- Do not use for generating malicious code
- Consider environmental impact of model inference

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{legioncoder2024,
  title={Legion Coder 8M: A Compact Transformer for Code Generation},
  author={Legion Coder Team},
  year={2024},
  howpublished={\url{https://huggingface.co/pnny13/legion-coder-8m}}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or problems, please open an issue on the Hugging Face model repository.

---

**Model Version**: 1.0.0
**Last Updated**: 2024-03-08
**Hugging Face Hub**: https://huggingface.co/pnny13/legion-coder-8m