Instructions for using parkky21/orpheus-3b-hi-ft-1e with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use parkky21/orpheus-3b-hi-ft-1e with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="parkky21/orpheus-3b-hi-ft-1e")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("parkky21/orpheus-3b-hi-ft-1e")
    model = AutoModelForCausalLM.from_pretrained("parkky21/orpheus-3b-hi-ft-1e")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use parkky21/orpheus-3b-hi-ft-1e with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "parkky21/orpheus-3b-hi-ft-1e"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "parkky21/orpheus-3b-hi-ft-1e",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/parkky21/orpheus-3b-hi-ft-1e
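The curl call shown for the vLLM server can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the server is running locally on port 8000 as in the command above. The helper names `build_chat_request` and `send` are hypothetical and are not part of vLLM or of this model's repository.

```python
import json
import urllib.request

MODEL_ID = "parkky21/orpheus-3b-hi-ft-1e"


def build_chat_request(user_text, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route.

    base_url is an assumption: it matches the default address used by the
    `vllm serve` command in this section.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("What is the capital of France?"))` should return a dictionary in the OpenAI chat-completion layout, where the reply text is expected under `["choices"][0]["message"]["content"]`.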
- SGLang
How to use parkky21/orpheus-3b-hi-ft-1e with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "parkky21/orpheus-3b-hi-ft-1e" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "parkky21/orpheus-3b-hi-ft-1e",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "parkky21/orpheus-3b-hi-ft-1e" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "parkky21/orpheus-3b-hi-ft-1e",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Unsloth Studio
How to use parkky21/orpheus-3b-hi-ft-1e with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for parkky21/orpheus-3b-hi-ft-1e to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for parkky21/orpheus-3b-hi-ft-1e to start chatting
Using Hugging Face Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for parkky21/orpheus-3b-hi-ft-1e to start chatting
Load model with FastModel
    # Install Unsloth (shell)
    pip install unsloth

    # Load the model (Python)
    from unsloth import FastModel

    model, tokenizer = FastModel.from_pretrained(
        model_name="parkky21/orpheus-3b-hi-ft-1e",
        max_seq_length=2048,
    )
- Docker Model Runner
How to use parkky21/orpheus-3b-hi-ft-1e with Docker Model Runner:
docker model run hf.co/parkky21/orpheus-3b-hi-ft-1e
parkky21/orpheus-3b-hi-duo-voices (अनुष्का • करन)
🔎 Model Summary
- Base model: canopylabs/3b-hi-pretrain-research_release
- Finetuned by: parkky21
- Language: Hindi (hi), with Hinglish tolerance
- Voices: अनुष्का (warm, curious), करन (friendly, direct)
- Architecture: LLaMA-family, decoder-only
- Intended use: Multi-turn dialogue in Hindi with lightweight “voice” control via speaker prefixes
▶️ Try It (Colab)
Use the Colab notebook for inference and examples; no local setup is needed.
Colab: https://colab.research.google.com/drive/1-greyn4D7-0SVUx86fGPzj5rjB2DjGUn?usp=sharing
✨ What’s Special
- Two natural voices out of the box: switch tone by prefixing lines with the speaker name.
- Simple prompting (no special chat template required).
- Fast and lightweight: the 3B size and Unsloth support make it suitable for laptops and mid-tier GPUs.
🗣️ Voices & Prompting
Use speaker-name prefixes followed by a colon. Example conversation style:
अनुष्का: हे करन, क्या आज बारिश ज़्यादा नहीं हो रही?
करन: हाँ, बहुत ज़्यादा! सुबह से रुकने का नाम ही नहीं ले रही।
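The prefix convention above can be assembled programmatically. This is a minimal sketch, not code from the model repository: the helper `build_prompt` is hypothetical, and ending the prompt with an open `speaker:` prefix to cue the next turn is an assumption about how to steer the model rather than documented behavior.

```python
# Speaker names as listed on the model card.
VOICES = ("अनुष्का", "करन")


def build_prompt(turns, next_speaker=None):
    """Join (speaker, text) pairs into 'speaker: text' lines.

    If next_speaker is given, an open 'speaker:' prefix is appended so the
    model can continue in that voice (assumed usage, not confirmed by the card).
    """
    lines = []
    for speaker, text in turns:
        if speaker not in VOICES:
            raise ValueError(f"unknown voice: {speaker!r}")
        lines.append(f"{speaker}: {text.strip()}")
    if next_speaker is not None:
        if next_speaker not in VOICES:
            raise ValueError(f"unknown voice: {next_speaker!r}")
        lines.append(f"{next_speaker}:")
    return "\n".join(lines)


prompt = build_prompt(
    [("अनुष्का", "हे करन, क्या आज बारिश ज़्यादा नहीं हो रही?")],
    next_speaker="करन",
)
print(prompt)
```

The resulting string can be passed as plain text to whichever inference route is in use, since the card states that no special chat template is required.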
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Model tree for parkky21/orpheus-3b-hi-ft-1e
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Finetuned: canopylabs/orpheus-3b-0.1-pretrained