---
language:
- en
tags:
- regression
- similarity
- sql
- natural-language
- reward-model
license: mit
datasets:
- custom
metrics:
- mse
- mae
- rmse
model-index:
- name: BERT Reward Model for CoT Filtering
  results:
  - task:
      type: regression
      name: Similarity Score Prediction
    dataset:
      name: Custom CoT Dataset
      type: custom
    metrics:
    - type: mse
      value: 0.0238
    - type: mae
      value: 0.1229
    - type: rmse
      value: 0.1543
---

# BERT Reward Model for CoT Filtering

A BERT-based regression model fine-tuned to predict similarity scores between SQL queries, reasoning chains (Chain-of-Thought), and natural language descriptions.

## Model Description

This model is based on `bert-base-uncased` and has been fine-tuned for regression to predict similarity scores in the range [0, 1]. The model takes as input a concatenation of:

- SQL query
- Reasoning/Chain-of-Thought explanation
- Predicted natural language description

The model outputs a similarity score indicating how well the predicted NL matches the ground truth.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DarianNLP/bert_sequel_beagles")
model = AutoModelForSequenceClassification.from_pretrained(
    "DarianNLP/bert_sequel_beagles",
    num_labels=1,
    problem_type="regression",
)
model.eval()

# Prepare input
sql = "SELECT movie_title FROM movies WHERE movie_release_year = 1945"
reasoning = "think: The SQL selects the movie title..."
predicted_nl = "What was the most popular movie released in 1945?"

input_text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {predicted_nl}"

# Tokenize and predict
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    # Map the raw logit to a [0, 1] similarity score
    similarity_score = torch.sigmoid(outputs.logits).item()

print(f"Predicted similarity: {similarity_score:.3f}")
```
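
Since the model is intended for filtering chains-of-thought, a natural pattern is to score several candidate (CoT, NL) pairs for the same SQL query and keep the highest-scoring one. The sketch below reuses `model`, `tokenizer`, and `sql` from the snippet above; the candidate strings and the 0.5 threshold are illustrative choices, not values from the released pipeline.

```python
def score_candidate(sql, reasoning, predicted_nl):
    """Score a single (SQL, CoT, NL) triple with the reward model."""
    text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {predicted_nl}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.sigmoid(logits).item()

# Score several candidate reasoning chains for the same query and keep the best one.
candidates = [
    ("think: The SQL selects the movie title...", "Which movies were released in 1945?"),
    ("think: The query filters by year...", "How many movies came out in 1945?"),
]
scores = [score_candidate(sql, cot, nl) for cot, nl in candidates]
best_idx = max(range(len(scores)), key=lambda i: scores[i])
print(f"Best candidate: {candidates[best_idx][1]} (score={scores[best_idx]:.3f})")

# Optionally drop low-scoring chains (0.5 is an arbitrary illustrative threshold).
kept = [c for c, s in zip(candidates, scores) if s >= 0.5]
```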

## Training Details

- **Base Model**: bert-base-uncased
- **Training Dataset**: Custom CoT dataset with corruptions (7,342 examples)
- **Train/Val/Test Split**: 75% / 12.5% / 12.5%
- **Training Loss**: MSE (Mean Squared Error; see the sketch below)
- **Evaluation Metrics**:
  - MSE: 0.0238
  - MAE: 0.1229
  - RMSE: 0.1543
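
The exact training script and hyperparameters are not published here. The following is a minimal sketch of an equivalent setup with the Hugging Face `Trainer`: with `num_labels=1` and `problem_type="regression"`, Transformers applies an MSE loss to the single output logit. The dataset layout, preprocessing, and hyperparameters below are illustrative assumptions, and whether the released model applied a sigmoid inside the loss (as the Usage example does at inference time) is not stated.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Single-logit regression head; Transformers uses MSELoss when
# num_labels=1 and problem_type="regression".
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"
)

def preprocess(example):
    # "text" is the concatenated "SQL: ...\nReasoning: ...\nNL: ..." string,
    # "label" a float similarity score in [0, 1].
    enc = tokenizer(example["text"], truncation=True, max_length=512)
    enc["labels"] = float(example["label"])
    return enc

# train_ds and val_ds are assumed to be datasets.Dataset splits with
# "text" and "label" columns, e.g. train_ds = train_ds.map(preprocess).

args = TrainingArguments(
    output_dir="bert-cot-reward",
    num_train_epochs=3,              # illustrative values, not the released run's
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds,
#                   eval_dataset=val_ds, tokenizer=tokenizer)
# trainer.train()
```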

## Limitations

- Maximum input length: 512 tokens (BERT's limit); with `truncation=True` as in the example above, longer inputs are silently truncated (see the check below)
- Trained on a specific domain (SQL-to-NL translation with CoT)
- Performance may vary on out-of-domain data
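
As a quick guard against silent truncation, the token count can be checked before scoring. This reuses `tokenizer` and `input_text` from the Usage snippet; the warning message is only illustrative.

```python
# Flag inputs that exceed BERT's 512-token limit before scoring them.
n_tokens = len(tokenizer(input_text, truncation=False)["input_ids"])
if n_tokens > 512:
    print(f"Warning: input is {n_tokens} tokens; only the first 512 will be scored.")
```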

## Citation

If you use this model, please cite:

```bibtex
@misc{bert_cot_reward_model,
  title={BERT Reward Model for Chain-of-Thought Filtering},
  author={Darian Lee},
  year={2025}
}
```