metadata
language:
- vi
tags:
- history
- vietnamese
- ppo
- rlhf
license: apache-2.0
HistoryGPT
An AI assistant for Vietnamese history, fine-tuned with reinforcement learning from human feedback (RLHF) using Proximal Policy Optimization (PPO).
Training Details
- Base Model: khanhrill/HistoryGPT
- Fine-tuning: PPO with human feedback from OpenWebUI
- Last Updated: 2025-12-12
- Version: 20251212_0806
Usage
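from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("khanhrill/HistoryGPT")
tokenizer = AutoTokenizer.from_pretrained("khanhrill/HistoryGPT")
# Vietnamese prompt, in English: "Tell me about the history of Vietnam"
prompt = "Hãy kể về lịch sử Việt Nam"
inputs = tokenizer(prompt, return_tensors="pt")
# Generate up to 200 new tokens after the prompt
outputs = model.generate(**inputs, max_new_tokens=200)
# Skip special tokens (such as end-of-sequence markers) when decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))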
Training Pipeline
This model was trained using an automated RLHF pipeline:
- Collect user feedback from OpenWebUI
- Train a reward model on preference pairs built from that feedback
- Fine-tune the model with PPO, using the reward model to score its outputs
- Deploy the updated model to the Hugging Face Hub
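The two training steps above can be illustrated with the objectives most commonly used for them. This is a minimal sketch, not the pipeline's actual code: the card does not state which losses were used, so the Bradley-Terry pairwise loss and the PPO clipped surrogate below are assumptions, and the function names and the `clip_eps` default are hypothetical.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss for one preference pair (assumed, not confirmed by the card).

    Equals -log(sigmoid(r_chosen - r_rejected)): small when the reward model
    scores the preferred response higher than the rejected one.
    """
    margin = r_chosen - r_rejected
    # softplus(-margin), written in a form that does not overflow for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (to be maximized).

    ratio is pi_new(action) / pi_old(action); clipping it to
    [1 - clip_eps, 1 + clip_eps] limits how far one update can move the policy.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Reward model agrees with the human preference: low loss
    print(pairwise_reward_loss(r_chosen=2.0, r_rejected=0.0))
    # Reward model disagrees with the human preference: high loss
    print(pairwise_reward_loss(r_chosen=0.0, r_rejected=2.0))
    # A large policy ratio on a positive advantage is capped at 1 + clip_eps
    print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
```

In a real run these scalar formulas would be applied to batched tensors, with the rewards coming from the trained reward model and the advantages from a value estimate; the scalar form is used here only to keep the sketch dependency-free.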