Instructions for using pelosi70/jch1 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use pelosi70/jch1 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="pelosi70/jch1",
    filename="unsloth.Q8_0.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
```
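If the GGUF file embeds a chat template, the same `Llama` object can also be driven through `create_chat_completion` instead of raw text completion. A minimal sketch; the message content and `max_tokens` value are illustrative:

```python
# Chat-style inference with the same llm object created above.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "안녕하세요! 자기소개를 해주세요."}
    ],
    max_tokens=256,  # illustrative value
)
print(response["choices"][0]["message"]["content"])
```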
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use pelosi70/jch1 with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
llama-cli -hf pelosi70/jch1:Q8_0
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
llama-cli -hf pelosi70/jch1:Q8_0
```
Use pre-built binary
```sh
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf pelosi70/jch1:Q8_0
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf pelosi70/jch1:Q8_0
```
Use Docker
```sh
docker model run hf.co/pelosi70/jch1:Q8_0
```
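Whichever install method you choose, `llama-server` exposes an OpenAI-compatible API (port 8080 by default), so any OpenAI client can talk to it. A minimal sketch using the `openai` Python package; the port, prompt, and `max_tokens` are illustrative, and `pip install openai` is assumed:

```python
# Query a running llama-server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server default port
    api_key="no-key-required",  # the key is not checked by default
)

response = client.chat.completions.create(
    model="pelosi70/jch1",  # informational for a single-model server
    messages=[{"role": "user", "content": "Once upon a time,"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```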
- LM Studio
- Jan
- vLLM
How to use pelosi70/jch1 with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "pelosi70/jch1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "pelosi70/jch1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker
```sh
docker model run hf.co/pelosi70/jch1:Q8_0
```
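The pip-served endpoint above can also be called with the OpenAI Python client instead of curl. A minimal sketch, assuming `pip install openai` and vLLM's default port 8000:

```python
# Text completion against a running vLLM server (OpenAI-compatible API).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM accepts any key unless one is configured
)

response = client.completions.create(
    model="pelosi70/jch1",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(response.choices[0].text)
```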
- Ollama
How to use pelosi70/jch1 with Ollama:
```sh
ollama run hf.co/pelosi70/jch1:Q8_0
```
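`ollama run` opens an interactive chat; for programmatic access, Ollama also serves a local REST API on port 11434. A minimal sketch using `requests`; the prompt is illustrative:

```python
# Generate text through Ollama's local REST API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/pelosi70/jch1:Q8_0",
        "prompt": "Once upon a time,",
        "stream": False,  # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])
```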
- Unsloth Studio
How to use pelosi70/jch1 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for pelosi70/jch1 to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for pelosi70/jch1 to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for pelosi70/jch1 to start chatting
```
- Docker Model Runner
How to use pelosi70/jch1 with Docker Model Runner:
```sh
docker model run hf.co/pelosi70/jch1:Q8_0
```
- Lemonade
How to use pelosi70/jch1 with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull pelosi70/jch1:Q8_0
```
Run and chat with the model
```sh
lemonade run user.jch1-Q8_0
```
List all available models
```sh
lemonade list
```
Model Summary
This model is a Korean instruction-following Small Language Model (SLM) fine-tuned from the Llama-3.2-3B base model using Supervised Fine-Tuning (SFT). The objective of this model is to validate a resource-efficient fine-tuning and deployment pipeline suitable for on-premise and constrained GPU/CPU environments, rather than to maximize benchmark scores.
Training Approach
- Base Model: Meta Llama-3.2-3B (base, non-instruct)
- Fine-Tuning Method: Supervised Fine-Tuning (SFT)
- Parameter-Efficient Training: LoRA (PEFT)
- Quantization During Training: 4-bit (QLoRA)
- Training Framework: Unsloth + Hugging Face TRL
- Training Environment: Single GPU (Google Colab, Tesla T4)
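For readers who want to reproduce this kind of setup, the following is a minimal sketch of a QLoRA SFT run with Unsloth and TRL. All hyperparameters (LoRA rank, learning rate, batch size), the data file name, and the target modules are illustrative assumptions, not the values used for this model, and exact API details vary across Unsloth/TRL versions:

```python
# Minimal QLoRA SFT sketch with Unsloth + TRL (illustrative, not the exact recipe).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit (QLoRA-style quantized training).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and target modules are assumed values).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
)

# "text" is assumed to hold fully formatted Alpaca-style prompts
# (see the formatting sketch in the Dataset section below).
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```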
The model was trained using an instruction–response prompt template (Alpaca-style), enabling stable instruction-following behavior in Korean. The fine-tuning process focused on maintaining the base model’s general language capability while adapting response style, tone, and instruction compliance.
Dataset
- Primary Dataset: korean_safe_conversation
- Language: Korean
- Data Type: Instruction–response conversational data
- Data Scale: ~27K samples
The dataset was preprocessed to ensure:
- Clear separation between instruction and response
- Explicit end-of-sequence (EOS) control to prevent uncontrolled generation
- Consistent prompt formatting for stable training behavior
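As an illustration of the formatting described above, here is a minimal sketch of an Alpaca-style prompt builder with explicit EOS appending. The template wording and field names are assumptions; this card does not publish the exact template:

```python
# Alpaca-style prompt formatting with explicit EOS control (illustrative).
ALPACA_TEMPLATE = """Below is an instruction that describes a task. \
Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}"""

def format_example(example: dict, eos_token: str) -> str:
    # Appending the EOS token teaches the model where a response ends,
    # preventing uncontrolled generation at inference time.
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"],
        response=example["response"],
    ) + eos_token
```

For example, `format_example({"instruction": "...", "response": "..."}, tokenizer.eos_token)` would produce one training sample for the "text" field used in the training sketch above.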
Intended Use
This model is intended for:
- Korean instruction-following assistants
- Domain-adapted SLM experimentation
- On-premise inference scenarios where:
  - Data privacy is critical
  - GPU resources are limited
  - Low-latency local inference is preferred
Typical application examples include:
- Internal enterprise assistants
- Document-based Q&A systems (pre/post-RAG)
- Operational report generation from structured or semi-structured text
Deployment
- Format: GGUF
- Quantization: Q8_0
- Deployment Target: CPU or low-VRAM environments
- Distribution: Hugging Face Hub
The GGUF format allows the model to be deployed without external API dependencies, making it suitable for secure, offline, or air-gapped environments.
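As a sketch of how such a GGUF artifact can be produced, Unsloth can export a fine-tuned model directly to quantized GGUF. The output directory name is an assumed value, and this may not be the exact export path used for this repo:

```python
# Export the fine-tuned model to GGUF with Q8_0 quantization (illustrative).
# `model` and `tokenizer` are the objects from the training sketch above.
model.save_pretrained_gguf(
    "jch1-gguf",                 # output directory (assumed name)
    tokenizer,
    quantization_method="q8_0",  # matches the Q8_0 artifact in this repo
)
```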
Limitations
- This model is not an official Meta Instruct model
- Preference optimization methods such as DPO or RLHF were not applied
- The model was trained for behavior adaptation and stability, not for benchmark optimization
- Performance may vary outside the instruction-following and conversational domains
Technical Motivation
This project demonstrates that domain-adapted instruction-following models can be efficiently built and deployed using small-scale resources, providing a practical alternative to large, cost-intensive LLM deployments in real-world systems.