GPT2 Fine-tuned with Reinforcement Learning for Question Answering

This model is GPT2 (openai-community/gpt2) fine-tuned with reinforcement learning (PPO) on the SQuAD dataset for question answering.

Model Description

Training Details

Reinforcement Learning Approach

This model was trained using PPO (Proximal Policy Optimization) with shaped rewards to encourage a specific response format:

Response Format:

  • Starts with: "That is a great question! "
  • Ends with: " Let me know if you have any other questions."

Reward Shaping

Reward   Condition
+5       Response starts with correct prefix
+5       Response ends with correct suffix
+3       Contains meaningful content
+5       Contains reference answer
-3       Missing prefix or suffix
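
As a rough illustration of how the reward values above could combine, here is a minimal sketch of a scoring function. It is not the training code used for this model. The function name `shaped_reward`, the `MIN_WORDS` threshold used to approximate "meaningful content", the case-insensitive substring match for the reference answer, and applying the -3 penalty once when either the prefix or the suffix is missing are all assumptions made for this example.

```python
PREFIX = "That is a great question! "
SUFFIX = " Let me know if you have any other questions."
MIN_WORDS = 3  # assumption: "meaningful content" means at least this many words


def shaped_reward(response: str, reference_answer: str) -> float:
    """Hypothetical scoring function following the reward table."""
    reward = 0.0
    has_prefix = response.startswith(PREFIX)
    has_suffix = response.endswith(SUFFIX)

    if has_prefix:
        reward += 5
    if has_suffix:
        reward += 5
    # Assumption: the penalty is applied once if either part is missing.
    if not (has_prefix and has_suffix):
        reward -= 3

    # Strip the template so only the answer body is inspected.
    body = response
    if has_prefix:
        body = body[len(PREFIX):]
    if has_suffix:
        body = body[:-len(SUFFIX)]

    if len(body.split()) >= MIN_WORDS:
        reward += 3
    reference = reference_answer.strip().lower()
    if reference and reference in body.lower():
        reward += 5
    return reward


good = PREFIX + "The capital of France is Paris." + SUFFIX
print(shaped_reward(good, "Paris"))     # 18.0
print(shaped_reward("Paris", "Paris"))  # 2.0
```

Under these assumptions, a fully formatted response that contains the reference answer scores 18, while a bare correct answer scores only 2, which shows how strongly the shaping favors the template over the answer itself.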

Training Configuration

  • Epochs: 3
  • Batch Size: 1
  • Learning Rate: 1e-5
  • Max Sequence Length: 128
  • Training Samples: 300 (from SQuAD)
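
For a sense of scale, the settings above can be collected into a small config object. This is only a sketch: the `TrainingConfig` class and its field names are hypothetical and do not come from the training script, and the batch count assumes each sample is seen once per epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the values listed in this section."""
    epochs: int = 3
    batch_size: int = 1
    learning_rate: float = 1e-5
    max_sequence_length: int = 128
    num_training_samples: int = 300

    def batches_per_epoch(self) -> int:
        # Ceiling division, so a partial final batch still counts.
        return -(-self.num_training_samples // self.batch_size)

    def total_batches(self) -> int:
        # Assumption: every sample is visited once per epoch.
        return self.epochs * self.batches_per_epoch()


config = TrainingConfig()
print(config.total_batches())  # 900
```

With a batch size of 1, this works out to 900 batches in total, which is a very small training run.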

Usage

Using Transformers

    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")
    model = AutoModelForCausalLM.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")

    # Prepare input
    question = "What is the capital of France?"
    context = "France is a country in Western Europe. Its capital is Paris, which is known for the Eiffel Tower."
    prompt = f"Question: {question}\nContext: {context}\nAnswer: That is a great question! "

    # Generate
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.8,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Decode
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(response)

Example Output

Input:

Question: What is the capital of France?
Context: France is a country in Western Europe. Its capital is Paris, which is known for the Eiffel Tower.

Output:

That is a great question! The capital of France is Paris. Let me know if you have any other questions.

  • …tuning of language models

  • Question-answering tasks with formatted responses
  • Learning about PPO (Proximal Policy Optimization) for NLP

Limitations

  • Based on GPT2 (124M parameters), which has limited reasoning capabilities
  • Response format may not always be perfectly adhered to
  • Training was done on a subset of SQuAD (300 samples)
  • Best suited for simple factual questions
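
Because the response format is not always followed, callers may want to validate the generated text before using it. The sketch below is one possible approach, not part of the model's own tooling: the helper name `extract_answer` is hypothetical, and it assumes the decoded text contains the prefix only once (from the prompt or the generation) and that the context itself does not include the prefix string.

```python
from typing import Optional

PREFIX = "That is a great question! "
SUFFIX = " Let me know if you have any other questions."


def extract_answer(generated: str) -> Optional[str]:
    """Return the text between the prefix and suffix, or None if either is missing."""
    start = generated.find(PREFIX)
    if start == -1:
        return None
    start += len(PREFIX)
    # Search for the suffix only after the prefix.
    end = generated.find(SUFFIX, start)
    if end == -1:
        return None
    return generated[start:end].strip()


decoded = (
    "Question: What is the capital of France?\n"
    "Context: France is a country in Western Europe.\n"
    "Answer: That is a great question! The capital of France is Paris."
    " Let me know if you have any other questions."
)
print(extract_answer(decoded))            # The capital of France is Paris.
print(extract_answer("Paris is nice."))   # None
```

A `None` result signals that the output drifted from the template, in which case a caller could resample or fall back to the raw text.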