A fine-tuned `answerdotai/ModernBERT-base` that scores how well a generated natural-language description (NL) and its chain-of-thought reasoning align with a SQL query. The model is trained with a regression head (sigmoid output in `[0, 1]`) to predict `similarity_with_penalty` scores derived from human preference data plus corruption heuristics.
## Repo Layout (ready for Hugging Face)
```
modernbert_reward_model_hf/
├── config.json
├── model.safetensors
├── tokenizer.json / tokenizer_config.json / special_tokens_map.json
├── training_args.bin
├── training_results.json
├── test_metrics.json
├── train_set.csv / eval_set.csv / test_set.csv
├── modernbert_reward_model_predictions_scatter.png
├── modeling_reward.py
└── README.md  ← (this file)
```
Upload the folder to a new Hugging Face repo (e.g. `daeilee/modernbert-cot-reward`) with `huggingface-cli upload`, or push it via git. No extra conversion steps are required.
## Training Summary

- **Base model:** `answerdotai/ModernBERT-base`
- **Context length:** truncated to 2,048 tokens (the model supports 8,192)
- **Dataset:** 7,633 SQL + chain-of-thought + NL examples from `data/cot_dataset_with_corruptions.csv`
  - Invalid rows dropped (missing fields, score outside `[0, 1]`)
  - Token-length filter at 2,048
  - Split: 75% train / 12.5% eval / 12.5% test (random state 12)
- **Training setup:** batch size 2, gradient accumulation 4, AdamW (lr `2e-5`), early stopping patience 5, ModernBERT encoder + mean pooling + linear + sigmoid, MSE loss scaled by 100

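The head described in the training setup (mean-pool the encoder's last hidden state, then linear + sigmoid) can be sketched as below. This is an illustrative sketch, not the repo's actual implementation; the class and attribute names are assumptions.

```python
import torch
import torch.nn as nn


class RewardHeadSketch(nn.Module):
    """Regression head sketch: mean-pool the encoder's last hidden state over
    non-padding tokens, then linear -> sigmoid for a score in [0, 1].
    Names are illustrative, not the repo's actual code."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        mask = attention_mask.unsqueeze(-1).float()  # (batch, seq, 1)
        # Mean over non-pad positions only; clamp avoids division by zero.
        pooled = (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
        return torch.sigmoid(self.score(pooled)).squeeze(-1)  # (batch,)
```

Per the summary above, training would then minimize an MSE loss scaled by 100 between this output and the `similarity_with_penalty` target.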
See `training_args.bin` + `training_results.json` for the exact `transformers.TrainingArguments` and eval metrics (`eval_mse = 0.0197`, `eval_mae = 0.1128`, `eval_rmse = 0.1402`).
## Test Metrics (955 examples)

- **MSE:** 0.0203
- **MAE:** 0.1129
- **RMSE:** 0.1425
- **Ground-truth mean/std:** 0.673 / 0.215
- **Prediction mean/std:** 0.691 / 0.170

Full details live in `test_metrics.json`, and the scatter plot is stored as `modernbert_reward_model_predictions_scatter.png`.
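These numbers can be recomputed from saved predictions with standard scikit-learn metrics; a minimal sketch (the repo's evaluation script may differ in detail):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def regression_metrics(y_true, y_pred):
    """MSE / MAE / RMSE as reported above (illustrative helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": float(mse),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "rmse": float(np.sqrt(mse)),  # RMSE is just the square root of MSE
    }
```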
## Usage
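
A minimal sketch of scoring one example, assuming the tokenizer is saved alongside the weights and that the model returns a scalar tensor. `load_finetuned_model` is the helper exposed by this repo's `modeling_reward.py`; the prompt format shown here is an assumption, not the repo's documented one.

```python
import torch
from transformers import AutoTokenizer


def score_alignment(model_dir: str, text: str) -> float:
    """Score one SQL + CoT + NL example in [0, 1] (illustrative sketch)."""
    from modeling_reward import load_finetuned_model  # helper shipped in this repo
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = load_finetuned_model(model_dir)
    model.eval()
    # Truncate to the 2,048-token limit used in training.
    batch = tokenizer(text, truncation=True, max_length=2048, return_tensors="pt")
    batch = {k: v.to(next(model.parameters()).device) for k, v in batch.items()}
    with torch.no_grad():
        return float(model(**batch))


# score = score_alignment("modernbert_reward_model_hf", "SQL: ...\nCoT: ...\nNL: ...")
# print(f"Reward: {score:.3f}")
```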
For convenience, `modeling_reward.py` exposes `load_finetuned_model(model_dir)` which handles loading `model.safetensors` or `pytorch_model.bin` and moves the module to GPU if available (falling back to CPU on OOM).
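
That loading behaviour might look roughly like the following. This is a sketch under stated assumptions, not the repo's actual code: `build_model` is a hypothetical zero-argument constructor standing in for however the helper instantiates the network.

```python
import os
import torch


def load_finetuned_model_sketch(model_dir: str, build_model):
    """Load model.safetensors if present, else pytorch_model.bin, then try to
    move the module to GPU with a CPU fallback on OOM (illustrative sketch)."""
    model = build_model()
    st_path = os.path.join(model_dir, "model.safetensors")
    if os.path.exists(st_path):
        from safetensors.torch import load_file
        state = load_file(st_path)
    else:
        state = torch.load(os.path.join(model_dir, "pytorch_model.bin"), map_location="cpu")
    model.load_state_dict(state)
    if torch.cuda.is_available():
        try:
            model = model.to("cuda")
        except torch.cuda.OutOfMemoryError:
            model = model.to("cpu")  # fall back to CPU on OOM
    return model
```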
## Reproducing Evaluation

1. `pip install -r requirements.txt` (same stack as training: `transformers`, `torch`, `safetensors`, `pandas`, `scikit-learn`, `matplotlib`).
2. Run `python test_mosaic_bert_on_test_set.py` from `cot_filtering_reward_model/bert_regression/` to regenerate the metrics and scatter plot (uses the saved `test_set.csv` by default).
## Notes