---
language:
- vi
tags:
- history
- vietnamese
- ppo
- rlhf
license: apache-2.0
---

# HistoryGPT

A Vietnamese history AI assistant fine-tuned with reinforcement learning from human feedback (RLHF) using Proximal Policy Optimization (PPO).

## Training Details

- **Base Model**: khanhrill/HistoryGPT
- **Fine-tuning**: PPO with human feedback collected from OpenWebUI
- **Last Updated**: 2025-12-12
- **Version**: 20251212_0806

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("khanhrill/HistoryGPT")
tokenizer = AutoTokenizer.from_pretrained("khanhrill/HistoryGPT")

# Vietnamese prompt: "Tell me about the history of Vietnam"
prompt = "Hãy kể về lịch sử Việt Nam"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 200 new tokens, then decode without special tokens
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Pipeline

This model was trained using an automated RLHF pipeline:

1. Collect user feedback from OpenWebUI
2. Train a reward model from preference pairs
3. Fine-tune with PPO using the reward model
4. Deploy to the Hugging Face Hub
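
Steps 1 and 2 of the pipeline can be illustrated with a small sketch. It is not the pipeline's actual code: the feedback record layout (the `prompt`, `response`, and `rating` keys) and both function names are hypothetical, and the loss shown is the standard Bradley-Terry pairwise objective commonly used for reward models, which this model card does not confirm was used here.

```python
import math
from itertools import combinations


def build_preference_pairs(feedback):
    """Turn rated responses into (prompt, chosen, rejected) triples.

    `feedback` is a list of dicts with hypothetical keys
    "prompt", "response", and "rating" (higher means better).
    """
    # Group responses by the prompt they answer
    by_prompt = {}
    for item in feedback:
        by_prompt.setdefault(item["prompt"], []).append(item)

    pairs = []
    for prompt, items in by_prompt.items():
        for a, b in combinations(items, 2):
            if a["rating"] == b["rating"]:
                continue  # ties carry no preference signal
            chosen, rejected = (a, b) if a["rating"] > b["rating"] else (b, a)
            pairs.append((prompt, chosen["response"], rejected["response"]))
    return pairs


def pairwise_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # Equivalent to softplus(-margin), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Usage with made-up feedback records
feedback = [
    {"prompt": "q", "response": "good", "rating": 1},
    {"prompt": "q", "response": "bad", "rating": -1},
]
pairs = build_preference_pairs(feedback)
```

The loss shrinks as the reward model scores the chosen response higher than the rejected one, so minimizing it over all pairs pushes the model toward the users' stated preferences. In a real setup the two rewards would come from a neural reward model rather than being passed in as plain floats.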