---
license: mit
language:
- en
library_name: transformers
tags:
- gpt2
- question-answering
- reinforcement-learning
- ppo
- squad
- fine-tuned
datasets:
- rajpurkar/squad
base_model: openai-community/gpt2
pipeline_tag: text-generation
---
# GPT2 Fine-tuned with Reinforcement Learning for Question Answering
This is a GPT-2 model (`openai-community/gpt2`) fine-tuned using **Reinforcement Learning (PPO)** on the **SQuAD dataset** for question-answering tasks.
## Model Description
- **Base Model:** [openai-community/gpt2](https://huggingface.co/openai-community/gpt2)
- **Training Method:** Proximal Policy Optimization (PPO)
- **Dataset:** [SQuAD (Stanford Question Answering Dataset)](https://huggingface.co/datasets/rajpurkar/squad)
- **Task:** Question Answering with formatted responses
- **Language:** English
## Training Details
### Reinforcement Learning Approach
This model was trained using PPO (Proximal Policy Optimization) with shaped rewards to encourage a specific response format:
**Response Format:**
- Starts with: `"That is a great question! "`
- Ends with: `" Let me know if you have any other questions."`
### Reward Shaping
| Reward | Condition |
|--------|-----------|
| +5 | Response starts with the required prefix |
| +5 | Response ends with the required suffix |
| +3 | Response body contains meaningful content |
| +5 | Response contains the reference answer |
| -3 | Prefix or suffix is missing |
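The reward code itself is not published with this card; the snippet below is a minimal sketch of how the table above could be implemented (the `compute_reward` name and signature are assumptions):

```python
def compute_reward(response: str, reference_answer: str) -> float:
    """Hypothetical reconstruction of the shaped reward described above."""
    prefix = "That is a great question! "
    suffix = " Let me know if you have any other questions."

    has_prefix = response.startswith(prefix)
    has_suffix = response.endswith(suffix)

    reward = 0.0
    if has_prefix:
        reward += 5.0  # correct prefix
    if has_suffix:
        reward += 5.0  # correct suffix

    # Judge only the body of the answer, with the template stripped off
    body = response
    if has_prefix:
        body = body[len(prefix):]
    if has_suffix:
        body = body[: -len(suffix)]

    if body.strip():
        reward += 3.0  # meaningful content
    if reference_answer.lower() in body.lower():
        reward += 5.0  # contains the reference answer

    if not (has_prefix and has_suffix):
        reward -= 3.0  # missing prefix or suffix

    return reward
```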
### Training Configuration
- **Epochs:** 3
- **Batch Size:** 1
- **Learning Rate:** 1e-5
- **Max Sequence Length:** 128
- **Training Samples:** 300 (from SQuAD)
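The training script is not included in this repository; as a rough sketch, the configuration above could be expressed with the `trl` library's classic PPO API (the `PPOConfig` fields below come from pre-0.8 `trl` releases and are an assumption about how this model was trained):

```python
from trl import PPOConfig  # assumes the pre-0.8 trl PPO API

# Hypothetical configuration mirroring the values listed above
ppo_config = PPOConfig(
    model_name="openai-community/gpt2",
    learning_rate=1e-5,
    batch_size=1,
    mini_batch_size=1,
)
# The 3 epochs over 300 SQuAD samples would be driven by the
# surrounding training loop, not by PPOConfig itself.
```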
## Usage
### Using Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")
model = AutoModelForCausalLM.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")

# Prepare input
question = "What is the capital of France?"
context = "France is a country in Western Europe. Its capital is Paris, which is known for the Eiffel Tower."
prompt = f"Question: {question}\nContext: {context}\nAnswer: That is a great question! "

# Generate
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
)

# Decode
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
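Note that the prompt deliberately seeds the required prefix (`That is a great question! `), so the model only needs to continue with the answer and close with the learned suffix.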
### Example Output
**Input:**

```
Question: What is the capital of France?
Context: France is a country in Western Europe. Its capital is Paris, which is known for the Eiffel Tower.
```

**Output:**

```
That is a great question! The capital of France is Paris. Let me know if you have any other questions.
```

## Intended Uses
- Experimenting with RL fine-tuning of language models
- Question-answering tasks with formatted responses (a template-stripping helper is sketched after this list)
- Learning about PPO (Proximal Policy Optimization) for NLP
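Because every response is wrapped in the same fixed template, downstream code can strip it to recover the bare answer. The helper below is a hypothetical sketch, not shipped with the model; `extract_answer` and its signature are illustrative only:

```python
PREFIX = "That is a great question! "
SUFFIX = " Let me know if you have any other questions."

def extract_answer(generated_text: str, prompt: str) -> str:
    """Hypothetical helper: drop the echoed prompt and the learned template."""
    # generate() returns the prompt followed by the continuation
    answer = generated_text
    if answer.startswith(prompt):
        answer = answer[len(prompt):]
    # Strip whichever template parts are present
    if answer.startswith(PREFIX):
        answer = answer[len(PREFIX):]
    if answer.endswith(SUFFIX):
        answer = answer[: -len(SUFFIX)]
    return answer.strip()
```

Applied to the example above, `extract_answer(response, prompt)` would return `The capital of France is Paris.`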
## Limitations
- Based on GPT2 (124M parameters), which has limited reasoning capabilities
- Response format may not always be perfectly adhered to
- Training was done on a subset of SQuAD (300 samples)
- Best suited for simple factual questions