---
language:
- vi
tags:
- history
- vietnamese
- ppo
- rlhf
license: apache-2.0
---

# HistoryGPT

An AI assistant for Vietnamese history, fine-tuned with reinforcement learning from human feedback (RLHF) using Proximal Policy Optimization (PPO).

## Training Details

- **Base Model**: khanhrill/HistoryGPT
- **Fine-tuning**: PPO with human feedback from OpenWebUI
- **Last Updated**: 2025-12-12
- **Version**: 20251212_0806

## Usage

```python
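from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("khanhrill/HistoryGPT")
tokenizer = AutoTokenizer.from_pretrained("khanhrill/HistoryGPT")

# Vietnamese prompt: "Tell me about the history of Vietnam"
prompt = "Hãy kể về lịch sử Việt Nam"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
# Skip special tokens so only the readable text is printed
print(tokenizer.decode(outputs[0], skip_special_tokens=True))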
```

## Training Pipeline

This model was trained using an automated RLHF pipeline:
1. Collect user feedback from OpenWebUI
2. Train a reward model on preference pairs built from that feedback
3. Fine-tune the model with PPO, using the reward model as the reward signal
4. Deploy the result to the Hugging Face Hub
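
Steps 1 and 2 above can be sketched roughly as follows. This is an illustrative sketch only, not the pipeline's actual code: the feedback record fields (`prompt`, `response`, `rating`), the helper names, and the choice of a Bradley-Terry style pairwise loss are all assumptions, since this card does not describe the feedback schema or the reward model objective.

```python
import math
from collections import defaultdict
from itertools import product


def build_preference_pairs(feedback):
    """Pair each liked response with each disliked response for the same prompt.

    `feedback` is a list of dicts with hypothetical keys:
    "prompt", "response", and "rating" (positive = liked, otherwise disliked).
    """
    by_prompt = defaultdict(lambda: {"up": [], "down": []})
    for record in feedback:
        bucket = "up" if record["rating"] > 0 else "down"
        by_prompt[record["prompt"]][bucket].append(record["response"])

    pairs = []
    for prompt, groups in by_prompt.items():
        # Prompts that lack either a liked or a disliked response yield no pairs.
        for chosen, rejected in product(groups["up"], groups["down"]):
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def pairwise_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)) (assumed objective)."""
    margin = reward_chosen - reward_rejected
    # Two algebraically equal forms, chosen so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    feedback = [
        {"prompt": "p1", "response": "good answer", "rating": 1},
        {"prompt": "p1", "response": "bad answer", "rating": -1},
        {"prompt": "p2", "response": "unpaired answer", "rating": 1},
    ]
    print(build_preference_pairs(feedback))
    print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

The loss shrinks as the reward model scores the chosen response higher than the rejected one, which is the signal PPO would then optimize against in step 3. A real reward model would produce the two scores with a neural network and minimize this loss by gradient descent; plain floats are used here only to keep the sketch dependency-free.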