# ModernBERT Reward Model (CoT SQL/NL Alignment)
A finetune of `answerdotai/ModernBERT-base` that scores how well a generated natural-language description (NL) and its chain-of-thought reasoning align with a SQL query. The model is trained with a regression head (sigmoid output in `[0, 1]`) to predict `similarity_with_penalty` targets derived from human preference data plus corruption heuristics.
## Usage
```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

from modeling_reward import BERTRewardModel

model_name = "DarianNLP/modernbert-nl-sql"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Build the model and load the finetuned weights. Note that torch.load cannot
# read .safetensors files; use safetensors' load_file for those (or torch.load
# for a pytorch_model.bin checkpoint).
model = BERTRewardModel(model_name=model_name)
state_dict = load_file("model.safetensors")  # or torch.load("pytorch_model.bin")
model.load_state_dict(state_dict)
model.eval()

sql = "SELECT COUNT(*) FROM orders WHERE status = 'complete';"
reasoning = "think: Count rows in orders filtered by status 'complete'."
nl = "How many completed orders exist?"
text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {nl}"

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
with torch.no_grad():
    score = model(**inputs)["scores"].item()
print(f"Reward: {score:.3f}")
```
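Batched scoring works the same way; a short sketch, assuming the `"scores"` output holds one value per input (the `candidates` list below is illustrative):

```python
# Score several candidate NL descriptions against the same SQL and reasoning
candidates = ["How many completed orders exist?", "List every order."]
texts = [f"SQL: {sql}\nReasoning: {reasoning}\nNL: {c}" for c in candidates]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=2048)
with torch.no_grad():
    scores = model(**batch)["scores"]  # assumed shape: (len(candidates),)
```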
For convenience, `modeling_reward.py` exposes `load_finetuned_model(model_dir)`, which handles loading `model.safetensors` or `pytorch_model.bin` and moves the module to the GPU if available (falling back to CPU on OOM).
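The shipped helper is authoritative; a minimal sketch of the behavior described above (checkpoint fallback, then GPU placement with a CPU fallback on OOM), where the way it constructs `BERTRewardModel` is an assumption:

```python
import os

import torch
from safetensors.torch import load_file


def load_finetuned_model(model_dir: str) -> "BERTRewardModel":
    # Sketch only: the real helper lives in modeling_reward.py.
    model = BERTRewardModel(model_name=model_dir)

    # Prefer the safetensors checkpoint, fall back to the pickle one
    st_path = os.path.join(model_dir, "model.safetensors")
    if os.path.exists(st_path):
        state_dict = load_file(st_path)
    else:
        state_dict = torch.load(
            os.path.join(model_dir, "pytorch_model.bin"), map_location="cpu"
        )
    model.load_state_dict(state_dict)
    model.eval()

    # Move to GPU if present, but fall back to CPU if the transfer OOMs
    if torch.cuda.is_available():
        try:
            model.to("cuda")
        except torch.cuda.OutOfMemoryError:
            model.to("cpu")
    return model
```

With it, the loading boilerplate in the usage snippet collapses to `model = load_finetuned_model(model_dir)`.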
## Notes
- The reward target is bounded to `[0, 1]` and already penalizes copied NL or incorrect reasoning.
- The model uses mean pooling instead of CLS pooling to better leverage ModernBERT's long context (see the sketch below).
- Tokenizer files are saved from the finetuned run; no extra special tokens were introduced.
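For reference, the pieces named above (ModernBERT backbone, masked mean pooling, a sigmoid-bounded regression head returning `"scores"`) fit together roughly as follows. This is an illustrative sketch, not the shipped `modeling_reward.py`, and the internal attribute names are assumptions:

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class BERTRewardModel(nn.Module):
    # Sketch only: the real implementation lives in modeling_reward.py.
    def __init__(self, model_name: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, **kwargs):
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state  # (batch, seq_len, hidden)

        # Mean pooling over non-padding tokens instead of taking the CLS token
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

        # Sigmoid bounds the predicted reward to [0, 1]
        scores = torch.sigmoid(self.head(pooled)).squeeze(-1)
        return {"scores": scores}
```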