Instructions for using pelosi70/jch1 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use pelosi70/jch1 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="pelosi70/jch1",
    filename="unsloth.Q8_0.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
```
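If the GGUF file embeds a chat template, the same `Llama` object can also be driven through `create_chat_completion` instead of raw text completion. A minimal sketch; the message content and `max_tokens` value are illustrative:

```python
# Chat-style inference with the same llm object created above.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "안녕하세요! 자기소개를 해주세요."}
    ],
    max_tokens=256,  # illustrative value
)
print(response["choices"][0]["message"]["content"])
```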
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use pelosi70/jch1 with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
llama-cli -hf pelosi70/jch1:Q8_0
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
llama-cli -hf pelosi70/jch1:Q8_0
```
Use pre-built binary
```sh
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf pelosi70/jch1:Q8_0
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf pelosi70/jch1:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf pelosi70/jch1:Q8_0
```
Use Docker
```sh
docker model run hf.co/pelosi70/jch1:Q8_0
```
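Whichever install method you choose, `llama-server` exposes an OpenAI-compatible API (port 8080 by default), so any OpenAI client can talk to it. A minimal sketch using the `openai` Python package; the port, prompt, and `max_tokens` are illustrative, and `pip install openai` is assumed:

```python
# Query a running llama-server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server default port
    api_key="no-key-required",  # the key is not checked by default
)

response = client.chat.completions.create(
    model="pelosi70/jch1",  # informational for a single-model server
    messages=[{"role": "user", "content": "Once upon a time,"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```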
- LM Studio
- Jan
- vLLM
How to use pelosi70/jch1 with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "pelosi70/jch1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "pelosi70/jch1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker
```sh
docker model run hf.co/pelosi70/jch1:Q8_0
```
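The pip-served endpoint above can also be called with the OpenAI Python client instead of curl. A minimal sketch, assuming `pip install openai` and vLLM's default port 8000:

```python
# Text completion against a running vLLM server (OpenAI-compatible API).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM accepts any key unless one is configured
)

response = client.completions.create(
    model="pelosi70/jch1",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(response.choices[0].text)
```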
- Ollama
How to use pelosi70/jch1 with Ollama:
```sh
ollama run hf.co/pelosi70/jch1:Q8_0
```
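`ollama run` opens an interactive chat; for programmatic access, Ollama also serves a local REST API on port 11434. A minimal sketch using `requests`; the prompt is illustrative:

```python
# Generate text through Ollama's local REST API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/pelosi70/jch1:Q8_0",
        "prompt": "Once upon a time,",
        "stream": False,  # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])
```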
- Unsloth Studio
How to use pelosi70/jch1 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for pelosi70/jch1 to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for pelosi70/jch1 to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for pelosi70/jch1 to start chatting
```
- Docker Model Runner
How to use pelosi70/jch1 with Docker Model Runner:
```sh
docker model run hf.co/pelosi70/jch1:Q8_0
```
- Lemonade
How to use pelosi70/jch1 with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull pelosi70/jch1:Q8_0
```
Run and chat with the model
```sh
lemonade run user.jch1-Q8_0
```
List all available models
```sh
lemonade list
```
Model Summary
This model is a Korean instruction-following Small Language Model (SLM) fine-tuned from the Llama-3.2-3B base model using Supervised Fine-Tuning (SFT). The objective of this model is to validate a resource-efficient fine-tuning and deployment pipeline suitable for on-premise and constrained GPU/CPU environments, rather than to maximize benchmark scores.
Training Approach
- Base Model: Meta Llama-3.2-3B (base, non-instruct)
- Fine-Tuning Method: Supervised Fine-Tuning (SFT)
- Parameter-Efficient Training: LoRA (PEFT)
- Quantization During Training: 4-bit (QLoRA)
- Training Framework: Unsloth + Hugging Face TRL
- Training Environment: Single GPU (Google Colab, Tesla T4)
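For readers who want to reproduce this kind of setup, the following is a minimal sketch of a QLoRA SFT run with Unsloth and TRL. All hyperparameters (LoRA rank, learning rate, batch size), the data file name, and the target modules are illustrative assumptions, not the values used for this model, and exact API details vary across Unsloth/TRL versions:

```python
# Minimal QLoRA SFT sketch with Unsloth + TRL (illustrative, not the exact recipe).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit (QLoRA-style quantized training).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and target modules are assumed values).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
)

# "text" is assumed to hold fully formatted Alpaca-style prompts
# (see the formatting sketch in the Dataset section below).
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```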
The model was trained using an instruction–response prompt template (Alpaca-style), enabling stable instruction-following behavior in Korean. The fine-tuning process focused on maintaining the base model’s general language capability while adapting response style, tone, and instruction compliance.
Dataset
- Primary Dataset: korean_safe_conversation
- Language: Korean
- Data Type: Instruction–response conversational data
- Data Scale: ~27K samples
The dataset was preprocessed to ensure:
- Clear separation between instruction and response
- Explicit end-of-sequence (EOS) control to prevent uncontrolled generation
- Consistent prompt formatting for stable training behavior
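As an illustration of the formatting described above, here is a minimal sketch of an Alpaca-style prompt builder with explicit EOS appending. The template wording and field names are assumptions; this card does not publish the exact template:

```python
# Alpaca-style prompt formatting with explicit EOS control (illustrative).
ALPACA_TEMPLATE = """Below is an instruction that describes a task. \
Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}"""

def format_example(example: dict, eos_token: str) -> str:
    # Appending the EOS token teaches the model where a response ends,
    # preventing uncontrolled generation at inference time.
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"],
        response=example["response"],
    ) + eos_token
```

For example, `format_example({"instruction": "...", "response": "..."}, tokenizer.eos_token)` would produce one training sample for the "text" field used in the training sketch above.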
Intended Use
This model is intended for:
- Korean instruction-following assistants
- Domain-adapted SLM experimentation
- On-premise inference scenarios where:
  - Data privacy is critical
  - GPU resources are limited
  - Low-latency local inference is preferred
Typical application examples include:
- Internal enterprise assistants
- Document-based Q&A systems (pre/post-RAG)
- Operational report generation from structured or semi-structured text
Deployment
- Format: GGUF
- Quantization: Q8_0
- Deployment Target: CPU or low-VRAM environments
- Distribution: Hugging Face Hub
The GGUF format allows the model to be deployed without external API dependencies, making it suitable for secure, offline, or air-gapped environments.
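As a sketch of how such a GGUF artifact can be produced, Unsloth can export a fine-tuned model directly to quantized GGUF. The output directory name is an assumed value, and this may not be the exact export path used for this repo:

```python
# Export the fine-tuned model to GGUF with Q8_0 quantization (illustrative).
# `model` and `tokenizer` are the objects from the training sketch above.
model.save_pretrained_gguf(
    "jch1-gguf",                 # output directory (assumed name)
    tokenizer,
    quantization_method="q8_0",  # matches the Q8_0 artifact in this repo
)
```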
Limitations
- This model is not an official Meta Instruct model
- Preference optimization methods such as DPO or RLHF were not applied
- The model was trained for behavior adaptation and stability, not for benchmark optimization
- Performance may vary outside the instruction-following and conversational domains
Technical Motivation
This project demonstrates that domain-adapted instruction-following models can be efficiently built and deployed using small-scale resources, providing a practical alternative to large, cost-intensive LLM deployments in real-world systems.