Instructions for using ShahriarFerdoush/llama-3.2-1b-code-instruct with libraries, inference servers, and local apps.

Libraries

How to use ShahriarFerdoush/llama-3.2-1b-code-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ShahriarFerdoush/llama-3.2-1b-code-instruct")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ShahriarFerdoush/llama-3.2-1b-code-instruct")
model = AutoModelForCausalLM.from_pretrained("ShahriarFerdoush/llama-3.2-1b-code-instruct")


vLLM

How to use ShahriarFerdoush/llama-3.2-1b-code-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ShahriarFerdoush/llama-3.2-1b-code-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ShahriarFerdoush/llama-3.2-1b-code-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
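The same request can also be sent from Python. The sketch below uses only the standard library (`urllib.request`). It assumes a server started as shown above: vLLM listens on port 8000 by default, and the SGLang commands below use port 30000. The helper names `build_completion_request` and `complete` are hypothetical and are not part of vLLM or SGLang.

```python
import json
import urllib.request

MODEL_ID = "ShahriarFerdoush/llama-3.2-1b-code-instruct"


def build_completion_request(base_url: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "Write a Python function that reverses a string.")` returns the generated text. To target the SGLang server described below, pass `http://localhost:30000` instead.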

Use Docker

docker model run hf.co/ShahriarFerdoush/llama-3.2-1b-code-instruct

SGLang

How to use ShahriarFerdoush/llama-3.2-1b-code-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ShahriarFerdoush/llama-3.2-1b-code-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ShahriarFerdoush/llama-3.2-1b-code-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ShahriarFerdoush/llama-3.2-1b-code-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ShahriarFerdoush/llama-3.2-1b-code-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ShahriarFerdoush/llama-3.2-1b-code-instruct with Docker Model Runner:
```
docker model run hf.co/ShahriarFerdoush/llama-3.2-1b-code-instruct
```

llama-3.2-1b-code-instruct / README.md

ShahriarFerdoush

Update README.md

fdd74ef verified 5 months ago


metadata

library_name: transformers
license: apache-2.0
datasets:
  - sahil2801/CodeAlpaca-20k
base_model:
  - meta-llama/Llama-3.2-1B

🧠 Llama-3.2-1B Code Solver (QLoRA Fine-Tuned)

A lightweight, code-focused language model fine-tuned from Meta Llama-3.2-1B with QLoRA (4-bit) on a 10,000-sample subset of the CodeAlpaca-20K dataset.
Designed for efficient code generation, coding reasoning, and problem solving on limited GPU resources.

🚀 Trained on a single Tesla P100 GPU
⚡ Optimized for Kaggle, Colab, and low-VRAM environments
🧩 Ideal for research, education, and rapid prototyping

🔍 Model Overview

| Attribute | Value |
|---|---|
| Base Model | `meta-llama/Llama-3.2-1B` |
| Model Type | Decoder-only causal language model |
| Fine-Tuning Method | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank | 16 |
| Task Domain | Code generation & code reasoning |
| Training Samples | 10,000 |
| Training Time | ~5 hours |
| Hardware | NVIDIA Tesla P100 |
| Precision | 4-bit (NF4) |
| Frameworks | Hugging Face Transformers, PEFT, BitsAndBytes |

🎯 What This Model Is Good At

🧑‍💻 Code generation (Python-focused, but generalizable)
🧠 Step-by-step coding reasoning
🧪 Algorithmic problem solving
📘 Educational coding assistance
⚙️ Running efficiently on low-VRAM GPUs

📚 Training Dataset

CodeAlpaca-20K

A high-quality instruction-tuning dataset for coding tasks that follows the Alpaca record format (instruction, input, output).

Total dataset size: 20,000 samples
Used for training: 10,000 samples (50%)
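The card does not say how the 10,000 training samples were chosen. One reproducible way to draw a 50% subset is a seeded random sample without replacement. The sketch below assumes that approach; the function name `take_subset` and the seed value are hypothetical. With the Hugging Face `datasets` library, a comparable selection would be `dataset.shuffle(seed=42).select(range(10_000))`.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def take_subset(records: Sequence[T], k: int, seed: int = 42) -> list[T]:
    """Return k records sampled without replacement, reproducibly for a given seed."""
    if not 0 <= k <= len(records):
        raise ValueError(f"k must be between 0 and {len(records)}, got {k}")
    rng = random.Random(seed)
    # Sample indices rather than records, then sort them to keep the dataset order.
    indices = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in indices]
```

Sorting the sampled indices keeps the subset in the original dataset order. Drop the `sorted` call if you prefer a shuffled order.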

Data format:

{
  "instruction": "Describe the coding task",
  "input": "Optional context or input code",
  "output": "Expected code solution"
}
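The card does not state which prompt template turned these records into training text. A common choice for Alpaca-format data is the original Stanford Alpaca template. The sketch below assumes that template; `format_example` is a hypothetical helper. If the actual template differed, prompts at inference time should follow that template instead.

```python
# Assumed template: the original Stanford Alpaca prompt, with and without an input field.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_example(record: dict, include_output: bool = True) -> str:
    """Render one CodeAlpaca record as Alpaca-style text."""
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    if context:
        prompt = PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=instruction)
    # Training text includes the target output; an inference prompt stops at "### Response:".
    return prompt + record["output"] if include_output else prompt
```

During training, the tokenizer's EOS token is usually appended to each rendered example so the model learns where a response ends.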

Task Types:
- Algorithm implementation
- Code completion
- Debugging
- Function writing
- Problem solving

🏗️ Training Methodology

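This model was fine-tuned with QLoRA, which makes it practical to adapt a large language model on limited hardware. The base model is quantized to 4 bits and kept frozen, and only small low-rank (LoRA) adapters are trained.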

Key Techniques Used

4-bit Quantization (NF4) via BitsAndBytes
LoRA adapters applied to attention layers
Frozen base model weights
Low-rank updates only
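The following is a minimal sketch of this setup, not the author's training script. The two configuration objects express 4-bit NF4 quantization with the `BitsAndBytesConfig` class from `transformers`, and the LoRA hyperparameters listed under Training Configuration (r = 16, alpha = 32, dropout = 0.05) with the `LoraConfig` class from `peft`. The card says only that adapters were applied to attention layers, so the `target_modules` names (Llama's `q_proj`, `k_proj`, `v_proj`, `o_proj`) are an assumption. The float16 compute dtype is also an assumption, chosen because the Tesla P100 does not support bfloat16.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: P100 has no bfloat16 support
)

# Low-rank adapters: only these weights receive gradient updates.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)

# Typical wiring (needs a CUDA GPU, bitsandbytes, and access to the gated base model):
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-3.2-1B", quantization_config=bnb_config, device_map="auto")
# model = prepare_model_for_kbit_training(model)  # from peft
# model = get_peft_model(model, lora_config)      # from peft
```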

Why QLoRA?

🔻 Drastically reduces GPU memory usage
⚡ Enables training on consumer-grade GPUs
📈 Retains task performance close to full 16-bit fine-tuning (as reported in the QLoRA paper)
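To put the memory savings in numbers, here is a rough, weights-only estimate. It assumes the published Llama-3.2-1B shapes (hidden size 2048, 16 layers, MLP size 8192, and 8 key/value heads of dimension 64). It counts only the decoder's linear layers, which are what 4-bit loading quantizes. Embeddings, activations, optimizer state, and quantization constants are ignored.

```python
HIDDEN = 2048
INTERMEDIATE = 8192
LAYERS = 16
KV_DIM = 8 * 64  # 8 key/value heads x head dim 64 (grouped-query attention)


def linear_params_per_layer() -> int:
    """Weights in one decoder layer's linear projections (no biases in Llama)."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q/o and k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                           # gate, up, down projections
    return attention + mlp


def weight_gb(num_params: int, bits: int) -> float:
    """Storage for num_params weights at the given bit width, in gigabytes (1e9 bytes)."""
    return num_params * bits / 8 / 1e9


linear_params = LAYERS * linear_params_per_layer()
print(f"decoder linear params: {linear_params:,}")
print(f"16-bit: {weight_gb(linear_params, 16):.2f} GB")
print(f" 4-bit: {weight_gb(linear_params, 4):.2f} GB")
```

It reports 973,078,528 linear-layer parameters. That is about 1.95 GB at 16 bits versus 0.49 GB at 4 bits, a 4x reduction for those layers.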

⚙️ Training Configuration

| Parameter | Value |
|---|---|
| Max Sequence Length | 1024 tokens |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Optimizer | AdamW |
| Learning Rate | 2e-4 |
| Batch Size | Small (GPU-constrained) |
| Gradient Accumulation | Enabled |
| Quantization | 4-bit |
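LoRA leaves each frozen weight matrix W (shape d_out × d_in) untouched and learns two small factors, B (d_out × r) and A (r × d_in). The layer output becomes W x + (alpha / r) · B A x, so with the values above the update is scaled by 32 / 16 = 2. The sketch below counts the trainable parameters this implies for Llama-3.2-1B. It assumes adapters on the four attention projections `q_proj`, `k_proj`, `v_proj`, and `o_proj` in all 16 layers. The card says only "attention layers", so that module list is an assumption.

```python
R = 16
ALPHA = 32
HIDDEN = 2048
KV_DIM = 8 * 64  # grouped-query attention: 8 key/value heads of dimension 64
LAYERS = 16
TOTAL_PARAMS = 1_235_814_400  # computed from the Llama-3.2-1B config (tied embeddings)

# (d_in, d_out) of each assumed LoRA target in one decoder layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int = R) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r


trainable = LAYERS * sum(lora_params(d_in, d_out) for d_in, d_out in TARGETS.values())
scaling = ALPHA / R
print(f"trainable LoRA params: {trainable:,} ({100 * trainable / TOTAL_PARAMS:.2f}% of the base model)")
print(f"update scaling alpha/r: {scaling}")
```

Under that assumption, about 3.4 million parameters are trained, roughly 0.28% of the base model's 1.24 billion.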

🚀 Usage

Load the Model

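import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ShahriarFerdoush/llama-3.2-1b-code-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # 4-bit NF4 loading needs bitsandbytes and a CUDA GPU;
    # omit quantization_config to load the weights unquantized instead.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
)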

Example Inference

prompt = "Write a Python function to check if a number is prime."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
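`tokenizer.decode(outputs[0], ...)` returns the prompt followed by the completion. The sketch below shows hypothetical post-processing helpers that are not part of the model or of Transformers. `new_text` decodes only the generated tokens by slicing off the prompt length. `extract_code` returns the first fenced code block in the reply if the model emitted one, and falls back to the full text otherwise.

```python
import re

FENCE = "`" * 3  # three backticks: a Markdown code fence
# First fenced block: opening fence plus optional language tag, the body, then the closing fence.
FENCE_RE = re.compile(re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def new_text(tokenizer, inputs, outputs) -> str:
    """Decode only the tokens generated after the prompt."""
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)


def extract_code(text: str) -> str:
    """Return the first fenced code block's body, or the stripped text if there is none."""
    match = FENCE_RE.search(text)
    return match.group(1).rstrip() if match else text.strip()
```

With `tokenizer`, `inputs`, and `outputs` from the snippet above, `print(extract_code(new_text(tokenizer, inputs, outputs)))` prints only the newly generated code.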

🧪 Evaluation Notes

This model is instruction-tuned, not benchmark-optimized
No formal benchmarks (HumanEval / MBPP) were run
Best evaluated through qualitative code generation

⚠️ Limitations

1B parameters → limited long-context reasoning
Not optimized for natural language chat
May hallucinate on complex or ambiguous prompts
English-centric training data

🧭 Intended Use

✅ Allowed

Research and experimentation
Coding assistants
Educational tools
Prototyping LLM systems

🙏 Acknowledgements

Meta AI for Llama 3.2
CodeAlpaca dataset creators
Hugging Face ecosystem
QLoRA & PEFT authors