Instructions to use StevenHuo/StevenHuo-gpt2-squad-rl with libraries and local apps.

Libraries

How to use StevenHuo/StevenHuo-gpt2-squad-rl with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="StevenHuo/StevenHuo-gpt2-squad-rl")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")
model = AutoModelForCausalLM.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")

Local Apps

vLLM

How to use StevenHuo/StevenHuo-gpt2-squad-rl with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "StevenHuo/StevenHuo-gpt2-squad-rl"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StevenHuo/StevenHuo-gpt2-squad-rl",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
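
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the previous step is listening on localhost:8000 and returns the usual OpenAI-style completions payload (`choices[0].text`). The helper names `build_completion_request` and `complete` are hypothetical and chosen for illustration.

```python
import json
import urllib.request

MODEL_ID = "StevenHuo/StevenHuo-gpt2-squad-rl"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the server answers with an OpenAI-style completions object.
    return body["choices"][0]["text"]


# Example call, only meaningful while the server is up:
# print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` should target a server started on port 30000 instead.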

Use Docker

docker model run hf.co/StevenHuo/StevenHuo-gpt2-squad-rl

SGLang

How to use StevenHuo/StevenHuo-gpt2-squad-rl with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "StevenHuo/StevenHuo-gpt2-squad-rl" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StevenHuo/StevenHuo-gpt2-squad-rl",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "StevenHuo/StevenHuo-gpt2-squad-rl" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StevenHuo/StevenHuo-gpt2-squad-rl",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

GPT2 Fine-tuned with Reinforcement Learning for Question Answering

This model is a GPT2 model (openai-community/gpt2) fine-tuned with reinforcement learning (PPO) on the SQuAD dataset for question answering.

Model Description

Base Model: openai-community/gpt2
Training Method: Proximal Policy Optimization (PPO)
Dataset: SQuAD (Stanford Question Answering Dataset)
Task: Question Answering with formatted responses
Language: English

Training Details

Reinforcement Learning Approach

This model was trained using PPO (Proximal Policy Optimization) with shaped rewards to encourage a specific response format:

Response Format:

Starts with: "That is a great question! "
Ends with: " Let me know if you have any other questions."
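
Because the answer is wrapped in a fixed prefix and suffix, downstream code can strip both to recover the core answer. The sketch below is one possible way to do that and is not part of the original training code. The function name `extract_answer` is hypothetical, and it assumes the input is only the generated answer text, without the prompt.

```python
from typing import Optional

PREFIX = "That is a great question! "
SUFFIX = " Let me know if you have any other questions."


def extract_answer(response: str) -> Optional[str]:
    """Return the text between prefix and suffix, or None if the format is not met."""
    if not (response.startswith(PREFIX) and response.endswith(SUFFIX)):
        return None
    # Reject strings too short to hold both markers without overlapping.
    if len(response) < len(PREFIX) + len(SUFFIX):
        return None
    return response[len(PREFIX):len(response) - len(SUFFIX)].strip()


formatted = PREFIX + "The capital of France is Paris." + SUFFIX
print(extract_answer(formatted))           # The capital of France is Paris.
print(extract_answer("Paris, probably."))  # None
```

Returning `None` rather than raising makes it easy to count how often the model misses the format.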

Reward Shaping

Reward	Condition
+5	Response starts with correct prefix
+5	Response ends with correct suffix
+3	Contains meaningful content
+5	Contains reference answer
-3	Missing prefix or suffix
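
The table above can be read as an additive scoring function. The following is a minimal sketch under several assumptions that the model card does not confirm: the terms are summed, the -3 penalty is applied once when either marker is missing, "meaningful content" means at least `MIN_CONTENT_WORDS` words between the markers (a hypothetical threshold), and "contains reference answer" is a case-insensitive substring match. The name `shaped_reward` is also hypothetical.

```python
PREFIX = "That is a great question! "
SUFFIX = " Let me know if you have any other questions."
MIN_CONTENT_WORDS = 3  # hypothetical threshold for "meaningful content"


def shaped_reward(response: str, reference_answer: str) -> float:
    """Score a response with the reward terms from the table (assumed additive)."""
    has_prefix = response.startswith(PREFIX)
    has_suffix = response.endswith(SUFFIX)

    reward = 0.0
    if has_prefix:
        reward += 5.0
    if has_suffix:
        reward += 5.0
    if not (has_prefix and has_suffix):
        reward -= 3.0  # assumption: one penalty, not one per missing marker

    # Strip the markers that are present to isolate the answer body.
    body = response
    if has_prefix:
        body = body[len(PREFIX):]
    if has_suffix and body.endswith(SUFFIX):
        body = body[:-len(SUFFIX)]

    if len(body.split()) >= MIN_CONTENT_WORDS:
        reward += 3.0
    if reference_answer and reference_answer.lower() in response.lower():
        reward += 5.0
    return reward


good = PREFIX + "The capital of France is Paris." + SUFFIX
print(shaped_reward(good, "Paris"))     # 18.0
print(shaped_reward("Paris", "Paris"))  # 2.0
```

Under these assumptions the maximum score is 18, reached only when both markers, a non-trivial body, and the reference answer are all present.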

Training Configuration

Epochs: 3
Batch Size: 1
Learning Rate: 1e-5
Max Sequence Length: 128
Training Samples: 300 (from SQuAD)
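
As a rough sanity check on the scale of this run, the settings above can be collected in a small container and used to count batches. This is an illustrative sketch, not the original training script. `TrainingConfig` is a hypothetical name, and the count assumes every epoch visits all 300 samples once, with one batch per update.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed above."""
    epochs: int = 3
    batch_size: int = 1
    learning_rate: float = 1e-5
    max_sequence_length: int = 128
    training_samples: int = 300

    def total_batches(self) -> int:
        # Assumption: each epoch passes over every training sample once.
        batches_per_epoch = math.ceil(self.training_samples / self.batch_size)
        return self.epochs * batches_per_epoch


config = TrainingConfig()
print(config.total_batches())  # 900
```

With a batch size of 1, that is 900 batches in total, a very small budget by RL fine-tuning standards, which is consistent with the limitations noted for this model.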

Usage

Using Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")
model = AutoModelForCausalLM.from_pretrained("StevenHuo/StevenHuo-gpt2-squad-rl")

# Prepare input
question = "What is the capital of France?"
context = "France is a country in Western Europe. Its capital is Paris, which is known for the Eiffel Tower."
prompt = f"Question: {question}\nContext: {context}\nAnswer: That is a great question! "

# Generate
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Example Output

Input:

Question: What is the capital of France?
Context: France is a country in Western Europe. Its capital is Paris, which is known for the Eiffel Tower.

Output: That is a great question! The capital of France is Paris. Let me know if you have any other questions.

[...]tuning of language models
Question-answering tasks with formatted responses
Learning about PPO (Proximal Policy Optimization) for NLP

Limitations

Based on GPT2 (124M parameters), which has limited reasoning capabilities
Response format may not always be perfectly adhered to
Training was done on a subset of SQuAD (300 samples)
Best suited for simple factual questions

Safetensors

Model size: 0.1B params
Tensor type: F32

Model tree for StevenHuo/StevenHuo-gpt2-squad-rl

Base model: openai-community/gpt2
