Model Summary
This model is a Korean instruction-following Small Language Model (SLM) fine-tuned from the Llama-3.2-3B base model using Supervised Fine-Tuning (SFT). The objective of this model is to validate a resource-efficient fine-tuning and deployment pipeline suitable for on-premise and constrained GPU/CPU environments, rather than to maximize benchmark scores.
Training Approach
- Base Model: Meta Llama-3.2-3B (base, non-instruct)
- Fine-Tuning Method: Supervised Fine-Tuning (SFT)
- Parameter-Efficient Training: LoRA (PEFT)
- Quantization During Training: 4-bit (QLoRA)
- Training Framework: Unsloth + Hugging Face TRL
- Training Environment: Single GPU (Google Colab, Tesla T4)
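The setup above can be sketched roughly as follows. This is a hypothetical configuration, not the card's actual training script: the LoRA rank, alpha, target modules, batch size, and dataset contents are illustrative assumptions, and TRL argument names vary between versions.

```python
# Sketch of a QLoRA SFT run with Unsloth + TRL, mirroring the listed setup.
# All hyperparameters below are assumptions, not the values used for this model.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

# Load the 3B base model with 4-bit quantized weights (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Placeholder dataset of preformatted Alpaca-style prompt strings.
train_dataset = Dataset.from_list(
    [{"text": "### Instruction:\n...\n\n### Response:\n...<|end_of_text|>"}]
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,   # small batch for a single T4
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```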
The model was trained using an instruction–response prompt template (Alpaca-style), enabling stable instruction-following behavior in Korean. The fine-tuning process focused on maintaining the base model’s general language capability while adapting response style, tone, and instruction compliance.
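An Alpaca-style template along these lines could look like the sketch below. The field names (`instruction`, `response`), the Korean system line, and the EOS token string are assumptions for illustration, not the card's exact template.

```python
# Hypothetical Alpaca-style template for Korean instruction-response pairs.
# System line (translated: "Below is an instruction describing a task.
# Write a response that appropriately completes the request.") is assumed.
ALPACA_TEMPLATE = """아래는 작업을 설명하는 지시문입니다. 요청을 적절히 완료하는 응답을 작성하세요.

### Instruction:
{instruction}

### Response:
{response}"""

def format_example(example: dict, eos_token: str = "<|end_of_text|>") -> str:
    """Render one instruction-response pair into a single training string.
    Appending EOS lets the model learn to terminate generation cleanly."""
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"].strip(),
        response=example["response"].strip(),
    ) + eos_token
```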
Dataset
- Primary Dataset: korean_safe_conversation
- Language: Korean

- Data Type: Instruction–response conversational data
- Data Scale: ~27K samples
The dataset was preprocessed to ensure:
- Clear separation between instruction and response
- Explicit end-of-sequence (EOS) control to prevent uncontrolled generation
- Consistent prompt formatting for stable training behavior
Intended Use
This model is intended for:
- Korean instruction-following assistants
- Domain-adapted SLM experimentation
- On-premise inference scenarios where:
  - Data privacy is critical
  - GPU resources are limited
  - Low-latency local inference is preferred
Typical application examples include:
- Internal enterprise assistants
- Document-based Q&A systems (pre/post-RAG)
- Operational report generation from structured or semi-structured text
Deployment
- Format: GGUF
- Quantization: Q8
- Deployment Target: CPU or low-VRAM environments
- Distribution: Hugging Face Hub
The GGUF format allows the model to be deployed without external API dependencies, making it suitable for secure, offline, or air-gapped environments.
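Local inference with the GGUF file could be sketched as below using llama-cpp-python; the file name, context size, and stop token are placeholders, not values from this card.

```python
# Minimal local-inference sketch for a Q8 GGUF file via llama-cpp-python.
# Model path and generation parameters are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./jch1-q8_0.gguf",  # placeholder file name
    n_ctx=2048,                     # assumed context window
)

prompt = "### Instruction:\n한국어로 자기소개를 해줘\n\n### Response:\n"
out = llm(prompt, max_tokens=256, stop=["<|end_of_text|>"])
print(out["choices"][0]["text"])
```

Because the weights load from a local file, this runs fully offline with no external API calls, matching the air-gapped deployment scenario described above.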
Limitations
- This model is not an official Meta Instruct model
- Preference optimization methods such as DPO or RLHF were not applied
- The model was trained for behavior adaptation and stability, not for benchmark optimization
- Performance may vary outside the instruction-following and conversational domains
Technical Motivation
This project demonstrates that domain-adapted instruction-following models can be efficiently built and deployed using small-scale resources, providing a practical alternative to large, cost-intensive LLM deployments in real-world systems.