---
language:
- vi
tags:
- history
- vietnamese
- ppo
- rlhf
license: apache-2.0
---
# HistoryGPT

A Vietnamese history AI assistant fine-tuned with RLHF (PPO).
## Training Details

- **Base Model**: khanhrill/HistoryGPT
- **Fine-tuning**: PPO with human feedback from OpenWebUI
- **Last Updated**: 2025-12-12
- **Version**: 20251212_0806
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("khanhrill/HistoryGPT")
tokenizer = AutoTokenizer.from_pretrained("khanhrill/HistoryGPT")

# "Tell me about the history of Vietnam"
prompt = "Hãy kể về lịch sử Việt Nam"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Pipeline

This model was trained with an automated RLHF pipeline:

1. Collect user feedback from OpenWebUI
2. Train a reward model on preference pairs
3. Fine-tune the policy with PPO using the reward model
4. Deploy to the HuggingFace Hub
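Step 2 above turns raw user ratings into the (chosen, rejected) pairs a reward model trains on. The sketch below illustrates one plausible way to do this; the record schema (`prompt`/`response`/`rating`) and the function name are assumptions for illustration, not the actual OpenWebUI export format or this pipeline's implementation.

```python
# Hypothetical sketch: converting rated responses into preference pairs
# for reward-model training. Schema is an assumption, not OpenWebUI's.
from itertools import combinations


def build_preference_pairs(records):
    """Group rated responses by prompt and emit (chosen, rejected) pairs."""
    by_prompt = {}
    for r in records:
        by_prompt.setdefault(r["prompt"], []).append(r)

    pairs = []
    for prompt, responses in by_prompt.items():
        for a, b in combinations(responses, 2):
            if a["rating"] == b["rating"]:
                continue  # ties carry no preference signal
            chosen, rejected = (a, b) if a["rating"] > b["rating"] else (b, a)
            pairs.append({
                "prompt": prompt,
                "chosen": chosen["response"],
                "rejected": rejected["response"],
            })
    return pairs


# Toy feedback log: thumbs-up = 1, thumbs-down = -1
feedback = [
    {"prompt": "p1", "response": "good answer", "rating": 1},
    {"prompt": "p1", "response": "bad answer", "rating": -1},
    {"prompt": "p2", "response": "only one", "rating": 1},
]
pairs = build_preference_pairs(feedback)
# "p2" has a single response, so only "p1" yields a pair
```

Prompts with fewer than two differently-rated responses contribute nothing, which is why a feedback collection phase typically needs multiple sampled completions per prompt.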