Multi-Model Fine-Tuning with LoRA
This project demonstrates fine-tuning multiple language models using LoRA (Low-Rank Adaptation) on the NVIDIA Aegis AI Content Safety Dataset for content moderation and safety classification tasks.
Overview
This repository contains a comprehensive Jupyter notebook that fine-tunes three different language models for content safety applications:
- Qwen3-0.6B-Base - Alibaba Cloud's compact language model
- Llama-3.2-1B - Meta's efficient language model
- ERNIE-4.5-0.3B-Base-PT - Baidu's knowledge-enhanced language model
All models are fine-tuned using LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning technique that reduces compute and memory requirements while maintaining performance.
Features
- Multi-model comparison: Train and compare three different LLMs side-by-side
- Efficient training: Uses LoRA for parameter-efficient fine-tuning
- Quantization support: 4-bit quantization with bitsandbytes for reduced memory usage
- Comprehensive evaluation: Perplexity calculation and sample generation
- Automated deployment: Upload fine-tuned models to HuggingFace Hub
- Detailed documentation: Generated model cards with full training details
Dataset
NVIDIA Aegis AI Content Safety Dataset 2.0
- A comprehensive dataset for content safety and moderation tasks
- Contains diverse examples of safe and unsafe content
- Multiple categories of potentially harmful content
- Training size: 2,000 samples (subset used for efficiency)
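Each sample is formatted into the instruction/response template used throughout this README before training. The sketch below shows the idea; the exact template wording and dataset field names here are assumptions, not the notebook's verbatim code.

```python
# Sketch: turning a dataset sample into an instruction-style training
# example. The template string and the ("text", "label") field names are
# illustrative assumptions; adapt them to the actual dataset schema.

def format_sample(text: str, label: str) -> str:
    """Build one instruction/response training example."""
    return (
        "### Instruction:\n"
        f"Is this content safe? '{text}'\n\n"
        "### Response:\n"
        f"{label}"
    )

example = format_sample("Hello, how are you?", "safe")
print(example)
```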
Requirements
Hardware
- GPU with at least 8GB VRAM (recommended: 12GB+ for optimal performance)
- CUDA-compatible GPU for accelerated training
Software
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Version specifiers are quoted so the shell does not treat ">" as a redirect
pip install "transformers>=4.38.0"
pip install "datasets>=2.16.0"
pip install "accelerate>=0.26.0"
pip install "peft>=0.8.0"
pip install "trl>=0.7.10"
pip install "bitsandbytes>=0.43.0"
pip install "sentencepiece>=0.1.99"
pip install "protobuf>=3.20.0"
pip install huggingface_hub
pip install scikit-learn pandas numpy matplotlib seaborn
Project Structure
hg/
├── multi_model_finetuning.ipynb   # Main training notebook
├── README.md                      # This file
├── LICENSE                        # Project license
├── output/                        # Fine-tuned models (generated)
│   ├── qwen3-0.6b/
│   ├── llama-3.2-1b/
│   └── ernie-0.3b/
├── model_comparison.csv           # Performance comparison (generated)
└── results_summary.json           # Full results JSON (generated)
Usage
1. Setup Environment
Open the Jupyter notebook and run the environment setup cells to install all dependencies:
# The notebook handles installation automatically
# Includes special handling for bitsandbytes and sentencepiece
2. Login to HuggingFace
from huggingface_hub import login
login() # Enter your HF token when prompted
3. Configure Training
The notebook includes pre-configured settings for:
- LoRA parameters: rank=16, alpha=32, dropout=0.05
- Training hyperparameters: 3 epochs, learning rate=2e-4, batch size=4
- Model targets: All three models with HuggingFace Hub upload paths
4. Run Training
Execute the notebook cells sequentially to:
- Load and preprocess the dataset
- Fine-tune each model with LoRA
- Evaluate model performance
- Generate comparison metrics
- Upload models to HuggingFace Hub
5. Test Fine-tuned Models
Use the interactive testing cell to evaluate models:
test_model_interactive(
    "qwen3-0.6b",
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n"
)
Training Configuration
LoRA Parameters
{
"r": 16, # Rank of LoRA matrices
"lora_alpha": 32, # LoRA scaling parameter
"lora_dropout": 0.05, # Dropout for LoRA layers
"task_type": "CAUSAL_LM",
"target_modules": [ # Modules to apply LoRA
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
]
}
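The settings above map directly onto `peft`'s `LoraConfig`. A minimal sketch of that mapping (the commented `get_peft_model` call stands in for the per-model setup the notebook performs):

```python
from peft import LoraConfig

# The LoRA parameters above expressed as a peft LoraConfig.
lora_config = LoraConfig(
    r=16,                    # rank of the LoRA update matrices
    lora_alpha=32,           # scaling factor applied to the LoRA update
    lora_dropout=0.05,       # dropout on the LoRA layers
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
# model = get_peft_model(base_model, lora_config)  # applied to each base model
```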
Training Hyperparameters
{
"num_train_epochs": 3,
"per_device_train_batch_size": 4,
"gradient_accumulation_steps": 4,
"learning_rate": 2e-4,
"lr_scheduler_type": "cosine",
"warmup_ratio": 0.1,
"fp16": True,
"optim": "paged_adamw_8bit",
"max_seq_length": 512
}
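Note that with gradient accumulation, the effective batch size per optimizer step is the per-device batch size times the accumulation steps (times the number of GPUs, assumed here to be one):

```python
# Effective batch size implied by the hyperparameters above: each optimizer
# step accumulates gradients over this many samples per GPU.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```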
Model Performance
After training, the notebook generates:
- Perplexity scores for each model
- Training time comparison
- Sample predictions from test set
- Comparative analysis across all models
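Perplexity is the exponential of the mean per-token cross-entropy loss over the evaluation set. The notebook's exact implementation may differ, but the computation reduces to this:

```python
import math

def perplexity(token_losses):
    """Perplexity = exp(mean per-token cross-entropy loss).

    `token_losses` is a list of per-token negative log-likelihoods
    collected over an evaluation set.
    """
    return math.exp(sum(token_losses) / len(token_losses))

print(perplexity([0.0, 0.0]))                     # 1.0 (perfect predictions)
print(round(perplexity([2.0, 2.0, 2.0]), 3))      # 7.389
```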
Results are saved to:
- model_comparison.csv - Performance metrics table
- results_summary.json - Complete training results with metadata
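A minimal sketch of how such a comparison table can be written with the standard library. The column names and the zeroed values are placeholders, not the notebook's actual output schema or results:

```python
import csv

# Illustrative only: placeholder rows with hypothetical column names.
rows = [
    {"model": "qwen3-0.6b", "perplexity": 0.0, "train_time_s": 0.0},
    {"model": "llama-3.2-1b", "perplexity": 0.0, "train_time_s": 0.0},
    {"model": "ernie-0.3b", "perplexity": 0.0, "train_time_s": 0.0},
]
with open("model_comparison.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["model", "perplexity", "train_time_s"])
    writer.writeheader()
    writer.writerows(rows)
```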
Deployment
HuggingFace Hub
All fine-tuned models are automatically uploaded to:
- ahczhg/qwen3-0.6b-aegis-safety-lora
- ahczhg/llama-3.2-1b-aegis-safety-lora
- ahczhg/ernie-4.5-0.3b-aegis-safety-lora
Each model includes:
- Fine-tuned weights (merged with base model)
- Tokenizer configuration
- Comprehensive README with usage examples
- Training metrics and evaluation results
Loading Fine-tuned Models
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto"
)
# Use for content safety classification
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
Use Cases
Content Moderation
- Automated detection of harmful content
- Real-time chat safety monitoring
- User-generated content screening
Safety Systems
- Educational platform content filtering
- Social media safety monitoring
- Community guidelines enforcement
Research Applications
- Comparative analysis of LLM safety alignment
- Parameter-efficient fine-tuning research
- Content safety model development
Key Features
Efficient Training
- 4-bit quantization reduces weight memory by roughly 75% relative to FP16
- LoRA trains only 0.1-1% of model parameters
- Gradient checkpointing enables larger batch sizes
- Mixed precision (FP16) accelerates training
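A typical 4-bit (NF4) setup with bitsandbytes looks like the following; the exact settings used in the notebook may differ:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config; passed to from_pretrained via
# quantization_config (see the commented call below).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normalized-float-4 weights
    bnb_4bit_compute_dtype=torch.float16,  # matmuls run in FP16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)
# model = AutoModelForCausalLM.from_pretrained(name, quantization_config=bnb_config)
```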
Robustness
- Automatic fallback to FP16 if quantization unavailable
- Error handling for tokenizer and dependency issues
- Memory management and GPU cache clearing
- Comprehensive logging and progress tracking
Automation
- End-to-end pipeline from data loading to deployment
- Automated model card generation
- Batch evaluation and metrics calculation
- One-click HuggingFace Hub upload
Limitations
- Models are trained on English content primarily
- Performance may vary on domain-specific content
- Requires GPU for efficient training and inference
- Should be used as part of comprehensive moderation system, not sole arbiter
Ethical Considerations
This project is designed for:
- ✅ Content safety and user protection
- ✅ Harmful content detection
- ✅ Educational and research purposes
- ✅ Building safer online communities
Not intended for:
- ❌ Censoring legitimate speech
- ❌ Suppressing diverse viewpoints
- ❌ Replacing human moderation entirely
- ❌ Automated decision-making without oversight
Contributing
Contributions are welcome! Areas for improvement:
- Additional model architectures
- Multi-language support
- Enhanced evaluation metrics
- Custom dataset integration
- Deployment optimizations
Citation
If you use this project in your research or applications, please cite:
@misc{multi_model_lora_content_safety,
author = {ahczhg},
title = {Multi-Model Fine-Tuning with LoRA for Content Safety},
year = {2025},
publisher = {GitHub},
howpublished = {\url{https://github.com/ahczhg/hg}},
note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}
Acknowledgments
- NVIDIA for the Aegis AI Content Safety Dataset 2.0
- HuggingFace for transformers, datasets, PEFT, and TRL libraries
- Meta AI for Llama-3.2 models
- Alibaba Cloud for Qwen models
- Baidu for ERNIE models
- Tim Dettmers and contributors for the bitsandbytes quantization library
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
- HuggingFace: ahczhg
- GitHub: This repository
Support
For issues, questions, or feedback:
- Open an issue on GitHub
- Check the notebook documentation
- Review model cards on HuggingFace Hub
- Consult the training logs and error messages
Generated with Claude Code | Last updated: 2025-11-13