Model Information
Welcome to LLAMUsic 2! LLAMUsic is a fine-tuned version of the Llama 3.2 instruction-tuned generative model in the 1B size (text in, text out).
Model Developers: Marco Onorato, Riccardo Preite, Niccolò Monaco
Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
Supported Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported.
Llama 3.2 Model Family: Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.
Model Release Date: 2025-03-11
Status: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.
License: MIT License. Please use this model responsibly.
Feedback: Send feedback to info.llamusic@gmail.com.
Intended Use
Intended Use Cases: Llama 3.2 is intended for personal and research use in multiple languages. Instruction-tuned, text-only models are intended for assistant-like chat and agentic applications such as knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use cases with limited compute resources.
Out of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.
How to use
Use with transformers
With transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by using the Auto classes with the generate() function.
Make sure to update your transformers installation via pip install --upgrade transformers.
import torch
from transformers import pipeline

model_id = "marcoonorato91/LLAMUsic2-1b"

# Build a text-generation pipeline in bfloat16, placed automatically on the available devices
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat history: a system prompt followed by the user request
messages = [
    {"role": "system", "content": "You are LLAMUsic, an artificial intelligence expert of music."},
    {"role": "user", "content": "Write a guitar tab in the style of Metallica and include lyrics."},
]

outputs = pipe(
    messages,
    max_new_tokens=4096,
)

# The last entry of the returned conversation is the assistant's reply
print(outputs[0]["generated_text"][-1])
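The text above also mentions the Auto classes with generate(), but only the pipeline route is shown. The following is a minimal sketch of that second route, not the authors' reference code. The helper names `build_messages` and `generate_reply` and the default of 512 new tokens are assumptions made for illustration; the model ID and system prompt are taken from the pipeline example. It assumes `torch`, `transformers`, and `accelerate` (needed for `device_map="auto"`) are installed.

```python
SYSTEM_PROMPT = "You are LLAMUsic, an artificial intelligence expert of music."


def build_messages(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Return a chat history in the role/content format used by chat templates."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(user_prompt, model_id="marcoonorato91/LLAMUsic2-1b", max_new_tokens=512):
    """Hypothetical helper: load the model with the Auto classes and generate one reply."""
    # Heavy dependencies are imported lazily so build_messages stays usable on its own.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Render the chat history with the model's chat template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated reply is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply("Write a guitar tab in the style of Metallica and include lyrics.")` downloads the model weights on first use and returns the reply as a plain string.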
Use with ollama
Follow the instructions here to install ollama, then pull the model from the public llamusic ollama hub.
Two models are available: the standard version and the Q4_K_M quantized version.
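Once a model has been pulled, it can be queried through the HTTP API of a locally running ollama server. The sketch below assumes the server is listening on ollama's default address, `http://localhost:11434`. The source does not give the exact model name on the ollama hub, so `MODEL_NAME` is a placeholder to be replaced with the name you pulled; `build_chat_payload` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed: default local ollama endpoint
MODEL_NAME = "<llamusic-model-name>"  # placeholder: replace with the model name you pulled


def build_chat_payload(user_prompt, model=MODEL_NAME):
    """Build the JSON body for a single, non-streaming chat request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are LLAMUsic, an artificial intelligence expert of music."},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,  # ask for one complete JSON response instead of a stream
    }


def chat(user_prompt, model=MODEL_NAME, url=OLLAMA_URL):
    """Send the request to the local ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(user_prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,  # supplying a body makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

With the server running, `chat("Write a guitar tab in the style of Metallica and include lyrics.")` returns the assistant's reply as a string.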