Instructions to use suresh2001/llama-3.2-1b-instruct-finetuned with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use suresh2001/llama-3.2-1b-instruct-finetuned with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="suresh2001/llama-3.2-1b-instruct-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("suresh2001/llama-3.2-1b-instruct-finetuned")
model = AutoModelForCausalLM.from_pretrained("suresh2001/llama-3.2-1b-instruct-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use suresh2001/llama-3.2-1b-instruct-finetuned with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "suresh2001/llama-3.2-1b-instruct-finetuned"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "suresh2001/llama-3.2-1b-instruct-finetuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/suresh2001/llama-3.2-1b-instruct-finetuned
- SGLang
How to use suresh2001/llama-3.2-1b-instruct-finetuned with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "suresh2001/llama-3.2-1b-instruct-finetuned" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "suresh2001/llama-3.2-1b-instruct-finetuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "suresh2001/llama-3.2-1b-instruct-finetuned" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "suresh2001/llama-3.2-1b-instruct-finetuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use suresh2001/llama-3.2-1b-instruct-finetuned with Docker Model Runner:
docker model run hf.co/suresh2001/llama-3.2-1b-instruct-finetuned
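The vLLM and SGLang servers above both expose an OpenAI-compatible chat completions endpoint, so the curl calls can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_payload` and `chat` and the `max_tokens` default are illustrative assumptions, not part of either server's tooling; the ports (8000 for vLLM, 30000 for SGLang) are the ones used in the commands above.

```python
import json
import urllib.request

MODEL_ID = "suresh2001/llama-3.2-1b-instruct-finetuned"


def build_chat_payload(prompt, model=MODEL_ID, max_tokens=64):
    """Build the JSON body for an OpenAI-style chat completion request.

    Hypothetical helper; max_tokens=64 is an arbitrary illustrative default.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the prompt to a running server and return the reply text.

    Pass base_url="http://localhost:30000" for the SGLang commands above.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server already running, `chat("What is the capital of France?")` should return the model's reply as a string.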
Fine-tuned Llama 3.2 1B Instruct
This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct.
Model Details
- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Model Type: Causal Language Model
- Architecture: Llama 3.2
- Parameters: ~1.2B
- Fine-tuning: Custom fine-tuned model
Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("suresh2001/llama-3.2-1b-instruct-finetuned")
model = AutoModelForCausalLM.from_pretrained(
    "suresh2001/llama-3.2-1b-instruct-finetuned",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Tokenize the prompt and move the tensors to the device the model was placed on
prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text (max_length counts the prompt tokens as well as the new ones)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
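The `temperature=0.7` and `top_p=0.9` arguments in the usage snippet control how the next token is sampled. As a rough illustration of what they do, the sketch below applies temperature scaling and nucleus (top-p) filtering to a toy list of logits. It is a simplified pure-Python version written for this card, not the Transformers implementation, and the function name `top_p_filter` is hypothetical.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} after temperature and top-p filtering.

    Simplified illustration only; real libraries operate on tensors.
    """
    # Lower temperatures sharpen the distribution, higher ones flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


candidates = top_p_filter([2.0, 1.0, 0.0, -3.0])
print(sorted(candidates))  # indices of the tokens that remain eligible
```

For these toy logits only the two most likely tokens survive the filter; the next token would then be drawn from the renormalized probabilities.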
Model Architecture
This model follows the Llama 3.2 architecture with:
- 16 transformer layers
- 32 attention heads
- 2048 hidden size
- 8192 intermediate size
- 131072 max position embeddings
- RoPE (Rotary Position Embedding) with Llama 3 scaling
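The dimensions listed above can be sanity-checked against the ~1.2B parameter figure under Model Details. The sketch below is a back-of-the-envelope count. It assumes three values this card does not state: a vocabulary of 128256 tokens, 8 key/value heads (grouped-query attention), and input embeddings tied to the output head. These match the published Llama 3.2 1B configuration, but treat them as assumptions here. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(
    layers=16,
    hidden=2048,
    intermediate=8192,
    heads=32,
    kv_heads=8,           # assumption: not stated in this card
    vocab=128256,         # assumption: not stated in this card
    tie_embeddings=True,  # assumption: not stated in this card
):
    """Estimate the parameter count of a Llama-style decoder (no biases)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim

    # Query and output projections are hidden x hidden; key and value
    # projections are hidden x kv_dim under grouped-query attention.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden

    embeddings = vocab * hidden
    lm_head = 0 if tie_embeddings else vocab * hidden
    final_norm = hidden
    return embeddings + layers * (attention + mlp + norms) + final_norm + lm_head


total = llama_param_count()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
# prints: 1,235,814,400 parameters (~1.24B)
```

Under these assumptions the estimate is consistent with the "~1.2B" listed under Model Details.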
Training Details
This model was fine-tuned from the base Llama 3.2 1B Instruct model. The training dataset, hyperparameters, and procedure are not documented in this card.
Intended Use
This model is designed for instruction-following tasks and conversational AI applications. It can be used for:
- Text generation
- Question answering
- Creative writing
- Code generation
- General conversation
Limitations
- This model inherits the limitations of the base Llama 3.2 1B model
- Performance may vary depending on the specific fine-tuning data and objectives
- As with all language models, outputs should be carefully reviewed for accuracy and appropriateness
Ethical Considerations
Please use this model responsibly and in accordance with Meta's Llama 3.2 license and usage policies.