TinyStarCoder Reward Model (TL;DR Preference Model)
This model is a reward model fine-tuned from bigcode/tiny_starcoder_py using TRL's RewardTrainer.
The model predicts a single scalar reward score for an input sequence and is intended for preference ranking, not text generation.
A higher reward means the model prefers that response.
Model Details
Base Model
bigcode/tiny_starcoder_py
Task
- Reward Modeling
- Preference Learning
- RLHF-style reward estimation
Framework
- Transformers
- TRL RewardTrainer
Dataset
Dataset used:
CarperAI/openai_summarize_comparisons
Training examples contain:
prompt
chosen
rejected
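As a rough illustration of that layout, the sketch below builds one such record and turns it into the two sequences a reward model would score. The record text, the `build_pair` helper, and the choice to join prompt and response with a newline are assumptions made for illustration; the actual dataset formatting may differ.

```python
# Hypothetical record with the three fields listed above; the text is invented.
example = {
    "prompt": "POST: My laptop overheats whenever I compile large projects...",
    "chosen": "TL;DR: Laptop overheats during big compiles; asking for fixes.",
    "rejected": "TL;DR: Laptops are computers.",
}


def build_pair(record, sep="\n"):
    """Return (chosen_text, rejected_text), each prefixed by the shared prompt."""
    chosen_text = record["prompt"] + sep + record["chosen"]
    rejected_text = record["prompt"] + sep + record["rejected"]
    return chosen_text, rejected_text


chosen_text, rejected_text = build_pair(example)
```

Both sequences share the same prompt, so any difference in reward comes from the response alone.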
Training objective:
reward(chosen) > reward(rejected)
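One common way to turn this objective into a loss is the pairwise logistic (Bradley-Terry style) form, `-log(sigmoid(reward(chosen) - reward(rejected)))`. The sketch below assumes that form; the model card does not state the exact loss used, and the function name `pairwise_reward_loss` is hypothetical.

```python
import math


def pairwise_reward_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Assumed loss: -log(sigmoid(chosen_reward - rejected_reward))."""
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals log(1 + exp(-m)); split by sign for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


loss_correct = pairwise_reward_loss(2.0, 0.0)  # chosen scored higher
loss_wrong = pairwise_reward_loss(0.0, 2.0)    # rejected scored higher
```

The loss shrinks as the chosen response is scored further above the rejected one, and grows when the order is reversed.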
Training Configuration
| Parameter | Value |
|---|---|
| Samples | 2000 |
| Epochs | 2 |
| Max Length | 256 |
| Learning Rate | 1e-5 |
| Train Batch Size | 2 |
| Eval Batch Size | 1 |
| Trainer | RewardTrainer |
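To get a feel for the scale of this run, the number of optimizer steps can be estimated from the table. This is a back-of-the-envelope sketch that assumes all 2000 samples are used for training, a single device, and no gradient accumulation, none of which is stated in the model card.

```python
import math

# Values from the configuration table above.
samples = 2000
epochs = 2
train_batch_size = 2

# Assumptions (not stated in the model card): every sample is in the
# training split, one device, and no gradient accumulation.
steps_per_epoch = math.ceil(samples / train_batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 1000 2000
```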
Evaluation
Final evaluation metrics:
| Metric | Value |
|---|---|
| Eval Accuracy | ~0.62 |
| Eval Loss | ~0.98 |
| Eval Margin | ~0.75 |
Interpretation:
- An accuracy above 0.50 (chance level for a pairwise comparison) indicates that the reward model learned some preference signal.
- A positive margin means preferred responses receive a higher reward than rejected ones on average.
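The two metrics interpreted above can be sketched from paired reward scores as follows. The helper name `preference_metrics` and the toy numbers are made up for illustration, and the definitions (accuracy as the fraction of pairs where the chosen response scores higher, margin as the mean reward difference) are assumptions; the exact metrics reported by the trainer may be computed differently.

```python
def preference_metrics(chosen_rewards, rejected_rewards):
    """Return (accuracy, mean_margin) over paired reward scores."""
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # A pair counts as correct when the chosen response gets the higher reward.
    accuracy = sum(m > 0 for m in margins) / len(margins)
    mean_margin = sum(margins) / len(margins)
    return accuracy, mean_margin


# Toy rewards, not taken from the actual evaluation run.
chosen_rewards = [1.2, 0.3, -0.5, 2.0]
rejected_rewards = [0.4, 0.9, -1.0, 0.5]
accuracy, mean_margin = preference_metrics(chosen_rewards, rejected_rewards)
```

In this toy case three of four pairs are ordered correctly, so the accuracy is 0.75 and the mean margin is positive.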
Usage
Load model
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
)

repo = "caffeic/tinystarcoder-reward-tldr"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
Score a response
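import torch

# Prompt and candidate summary are scored together as one sequence.
text = """
Summarize:
Transformers are deep learning architectures...
Summary:
Transformers use self-attention.
"""
# Truncate to the same maximum length used during training.
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=256,
)
# The classification head outputs a single logit, used as the reward.
with torch.no_grad():
    reward = model(**inputs).logits.item()
print("Reward:", reward)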
Compare two responses
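def score(text):  # scalar reward for one prompt + response string
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        return model(**inputs).logits.item()

# `chosen` and `rejected` are the two prompt + response strings to compare.
chosen_score = score(chosen)
rejected_score = score(rejected)
if chosen_score > rejected_score:
    print("Chosen preferred")
else:
    print("Rejected preferred")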
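The pairwise comparison above extends to ranking any number of candidate responses. The sketch below is one possible way to do that; `rank_candidates` is a hypothetical helper, and `toy_score` is a stand-in scorer (it simply favors shorter text) so the example runs without downloading the model. In practice a function that returns the model's reward for a string would be passed instead.

```python
from typing import Callable, List, Tuple


def rank_candidates(
    candidates: List[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score every candidate and return (text, reward) pairs, best first."""
    scored = [(text, score_fn(text)) for text in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in scorer for illustration only: shorter text gets a higher score.
def toy_score(text: str) -> float:
    return -float(len(text))


ranked = rank_candidates(["a long rambling summary", "short summary"], toy_score)
best_text, best_reward = ranked[0]
```

Passing the scorer as an argument keeps the ranking logic independent of how rewards are produced.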
Limitations
- This is a reward model and does not generate text.
- Reward values are relative and not absolute quality scores.
- Trained on a limited subset (~2000 samples).
- Not intended for production RLHF pipelines.
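To illustrate why reward values are only meaningful relative to each other: under a Bradley-Terry style interpretation (an assumption here, not something the model card states), the probability that one response is preferred over another depends only on the difference between their rewards. The function name `preference_probability` and the numbers are illustrative.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Assumed model: P(a preferred over b) = sigmoid(reward_a - reward_b)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


# Shifting both rewards by the same constant leaves the preference unchanged.
p = preference_probability(1.5, 0.5)
p_shifted = preference_probability(101.5, 100.5)
```

Both calls see the same reward difference, so they yield the same probability even though the absolute reward values differ by 100.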
Training Notes
This project was created to learn:
- Reward modeling
- Preference datasets
- TRL RewardTrainer
- RLHF workflows
- Hugging Face model publishing
Citation
@software{vonwerra2020trl,
  title={{TRL: Transformers Reinforcement Learning}},
  author={von Werra et al.},
  year={2020},
  url={https://github.com/huggingface/trl}
}