# ModernBERT Reward Model (CoT SQL/NL Alignment)
A finetuned `answerdotai/ModernBERT-base` that scores how well a generated natural-language description (NL) and chain-of-thought reasoning align with a SQL query. The model is trained with a regression head (sigmoid output in [0, 1]) to predict `similarity_with_penalty` scores derived from human preference data plus corruption heuristics.
## Usage
```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

from modeling_reward import BERTRewardModel

model_name = "DarianNLP/modernbert-nl-sql"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model = BERTRewardModel(model_name=model_name)
# safetensors checkpoints cannot be read with torch.load; use load_file
# (torch.load is only appropriate for a pytorch_model.bin checkpoint)
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
model.eval()

sql = "SELECT COUNT(*) FROM orders WHERE status = 'complete';"
reasoning = "think: Count rows in orders filtered by status 'complete'."
nl = "How many completed orders exist?"

text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {nl}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

with torch.no_grad():
    score = model(**inputs)["scores"].item()
print(f"Reward: {score:.3f}")
```
For convenience, `modeling_reward.py` exposes `load_finetuned_model(model_dir)`, which handles loading `model.safetensors` or `pytorch_model.bin` and moves the module to GPU if available (falling back to CPU on OOM).
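A minimal sketch of using that helper, assuming the repository has been downloaded to a local directory (the `snapshot_download` step is illustrative; any local copy of the files works):

```python
from huggingface_hub import snapshot_download

from modeling_reward import load_finetuned_model

model_dir = snapshot_download("DarianNLP/modernbert-nl-sql")
model = load_finetuned_model(model_dir)  # picks checkpoint format and device
model.eval()
```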
## Notes
- The reward target is bounded to [0, 1] and already penalizes copied NL or incorrect reasoning.
- The model uses mean pooling instead of the CLS token to better leverage long ModernBERT contexts (see the sketch after this list).
- Tokenizer files are saved from the finetuned run; no extra special tokens were introduced.
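For reference, masked mean pooling typically looks like the sketch below. This is an illustrative implementation, not necessarily the exact code in `modeling_reward.py`; the bounded score then comes from the sigmoid regression head applied to the pooled vector, as described above.

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over real (non-padding) positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # guard against empty masks
    return summed / counts
```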