Instructions for using suresh2001/llama-3.2-1b-instruct-finetuned with libraries and local apps.

Libraries

How to use suresh2001/llama-3.2-1b-instruct-finetuned with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="suresh2001/llama-3.2-1b-instruct-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
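outputs = pipe(messages, max_new_tokens=40)
# For chat-style input, "generated_text" holds the whole conversation; the last message is the model's reply
print(outputs[0]["generated_text"][-1]["content"])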

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("suresh2001/llama-3.2-1b-instruct-finetuned")
model = AutoModelForCausalLM.from_pretrained("suresh2001/llama-3.2-1b-instruct-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
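# Decode only the newly generated tokens and drop special tokens such as <|eot_id|>
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))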

Local Apps

vLLM

How to use suresh2001/llama-3.2-1b-instruct-finetuned with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "suresh2001/llama-3.2-1b-instruct-finetuned"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "suresh2001/llama-3.2-1b-instruct-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
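The same OpenAI-compatible request can be sent from Python. The sketch below uses only the standard library. `build_chat_request` and `chat` are hypothetical helper names, and the code assumes a server is already running locally on vLLM's default port 8000 (for the SGLang server further down, point it at `http://localhost:30000` instead).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice's assistant message."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat("http://localhost:8000", "suresh2001/llama-3.2-1b-instruct-finetuned", "What is the capital of France?")` should return the model's reply as a string.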

Use Docker

docker model run hf.co/suresh2001/llama-3.2-1b-instruct-finetuned

SGLang

How to use suresh2001/llama-3.2-1b-instruct-finetuned with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "suresh2001/llama-3.2-1b-instruct-finetuned" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "suresh2001/llama-3.2-1b-instruct-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "suresh2001/llama-3.2-1b-instruct-finetuned" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "suresh2001/llama-3.2-1b-instruct-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use suresh2001/llama-3.2-1b-instruct-finetuned with Docker Model Runner:
```
docker model run hf.co/suresh2001/llama-3.2-1b-instruct-finetuned
```

Fine-tuned Llama 3.2 1B Instruct

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct.

Model Details

Base Model: meta-llama/Llama-3.2-1B-Instruct
Model Type: Causal Language Model
Architecture: Llama 3.2
Parameters: ~1.2B
Fine-tuning: Custom fine-tuned model
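The ~1.2B figure can be checked by hand from the architecture values listed under Model Architecture (16 layers, hidden size 2048, intermediate size 8192, 32 attention heads). The sketch below also assumes three values this card does not state, taken from the base Llama 3.2 1B configuration: a 128,256-token vocabulary, 8 key/value heads (grouped-query attention), and input/output embeddings that are tied. `llama_param_count` is a hypothetical helper.

```python
def llama_param_count(
    vocab_size: int = 128_256,     # assumed from the base model config
    hidden: int = 2048,
    intermediate: int = 8192,
    layers: int = 16,
    heads: int = 32,
    kv_heads: int = 8,             # assumed: grouped-query attention
    tied_embeddings: bool = True,  # assumed: lm_head shares the embedding matrix
) -> int:
    head_dim = hidden // heads     # 2048 / 32 = 64
    kv_dim = kv_heads * head_dim   # width of the k and v projections
    # q and o are hidden x hidden; k and v are hidden x kv_dim (no biases)
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim
    mlp = 3 * hidden * intermediate  # gate, up and down projections
    norms = 2 * hidden               # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden
    lm_head = 0 if tied_embeddings else vocab_size * hidden
    final_norm = hidden
    return embed + layers * per_layer + final_norm + lm_head


total = llama_param_count()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")  # prints 1,235,814,400 parameters (~1.24B)
```

Roughly a fifth of the total sits in the embedding matrix, which is why tying it to the output head matters at this model size.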

Usage

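from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "suresh2001/llama-3.2-1b-instruct-finetuned"

# Load model and tokenizer (the checkpoint is stored in BF16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Format the prompt with the chat template expected by an instruct model,
# and move the inputs to the same device as the model
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate text (max_new_tokens counts only generated tokens, not the prompt)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)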
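The generate() call above samples with temperature=0.7 and top_p=0.9. As a rough illustration of what those two settings do (not the actual transformers logits-processor code), this pure-Python sketch divides toy logits by the temperature, applies a softmax, keeps the smallest set of most-likely tokens whose probability mass reaches top_p, and samples from the renormalized remainder. `temperature_top_p` and `sample_token` are hypothetical names.

```python
import math
import random


def temperature_top_p(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and nucleus (top-p) filtering."""
    scaled = [x / temperature for x in logits]  # T < 1 sharpens, T > 1 flattens
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]    # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens from most to least likely until the kept mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}   # renormalize over the nucleus


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = temperature_top_p(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

For logits `[2.0, 1.0, 0.0, -1.0]` with the defaults, only the two highest-scoring tokens survive the top-p cut, so `sample_token` can only return index 0 or 1.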

Model Architecture

This model follows the Llama 3.2 architecture with:

16 transformer layers
32 attention heads
2048 hidden size
8192 intermediate size
131072 max position embeddings
RoPE (Rotary Position Embedding) with Llama 3 scaling
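Llama 3-style RoPE scaling adjusts the rotary frequencies per band: high-frequency components are left unchanged, low-frequency components are divided by a constant factor, and the band in between is interpolated smoothly. A minimal pure-Python sketch follows. The values used (rope_theta = 500000, factor = 32, low_freq_factor = 1, high_freq_factor = 4, original_max_position_embeddings = 8192, head dimension 64 = 2048 / 32) are assumed from the base Llama 3.2 1B configuration and are not stated in this card; `llama3_rope_inv_freq` is a hypothetical name.

```python
import math


def llama3_rope_inv_freq(
    head_dim: int = 64,
    theta: float = 500_000.0,     # assumed rope_theta
    factor: float = 32.0,         # assumed scaling factor
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_pos: int = 8192,  # assumed pre-scaling context length
):
    """Return (base, scaled) inverse frequencies, one per rotated dimension pair."""
    base = [1.0 / theta ** (i / head_dim) for i in range(0, head_dim, 2)]
    low_freq_wavelen = original_max_pos / low_freq_factor    # longer wavelengths get fully scaled
    high_freq_wavelen = original_max_pos / high_freq_factor  # shorter wavelengths stay untouched
    scaled = []
    for f in base:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            scaled.append(f)           # high-frequency band: keep as-is
        elif wavelen > low_freq_wavelen:
            scaled.append(f / factor)  # low-frequency band: divide by the factor
        else:
            # medium band: blend between the scaled and unscaled frequency
            smooth = (original_max_pos / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return base, scaled
```

The first entry (the fastest-rotating pair) comes back unchanged, while the slowest pair ends up divided by the full factor.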

Training Details

This model was fine-tuned from the base Llama 3.2 1B Instruct model. The training data, hyperparameters, and procedure are not documented in this card.

Intended Use

This model is designed for instruction-following tasks and conversational AI applications. It can be used for:

Text generation
Question answering
Creative writing
Code generation
General conversation

Limitations

This model inherits the limitations of the base Llama 3.2 1B model
Performance may vary depending on the specific fine-tuning data and objectives
As with all language models, outputs should be carefully reviewed for accuracy and appropriateness

Ethical Considerations

Please use this model responsibly and in accordance with Meta's Llama 3.2 license and usage policies.


Format: Safetensors
Model size: 1B params
Tensor type: BF16
Base model: meta-llama/Llama-3.2-1B-Instruct