A fine-tuned `answerdotai/ModernBERT-base` that scores how well a generated natural-language description (NL) and its chain-of-thought reasoning align with a SQL query. The model is trained with a regression head (sigmoid output in `[0, 1]`) to predict `similarity_with_penalty` scores derived from human preference data plus corruption heuristics.
## Repo Layout (ready for Hugging Face)
```
modernbert_reward_model_hf/
├── config.json
├── model.safetensors
├── tokenizer.json / tokenizer_config.json / special_tokens_map.json
├── training_args.bin
├── training_results.json
├── test_metrics.json
├── train_set.csv / eval_set.csv / test_set.csv
├── modernbert_reward_model_predictions_scatter.png
├── modeling_reward.py
└── README.md  ← (this file)
```
Upload the folder to a new Hugging Face repo (e.g. `daeilee/modernbert-cot-reward`) with `huggingface-cli upload`, or push it via git. No extra conversion steps are required.
## Training Summary

- **Base model:** `answerdotai/ModernBERT-base`
- **Context length:** truncated to 2,048 tokens (the model supports 8,192)
- **Dataset:** 7,633 SQL + chain-of-thought + NL examples from `data/cot_dataset_with_corruptions.csv`
  - Invalid rows dropped (missing fields, score outside `[0, 1]`)
  - Token-length filter at 2,048
  - Split: 75% train / 12.5% eval / 12.5% test (random state 12)
- **Training setup:** batch size 2, gradient accumulation 4, AdamW (lr `2e-5`), early stopping patience 5, ModernBERT encoder + mean pooling + linear + sigmoid, MSE loss scaled by 100

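The head described in the training setup (mean-pool the encoder's last hidden state, then linear + sigmoid) can be sketched as below. This is an illustrative sketch, not the repo's actual implementation; the class and attribute names are assumptions.

```python
import torch
import torch.nn as nn


class RewardHeadSketch(nn.Module):
    """Regression head sketch: mean-pool the encoder's last hidden state over
    non-padding tokens, then linear -> sigmoid for a score in [0, 1].
    Names are illustrative, not the repo's actual code."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        mask = attention_mask.unsqueeze(-1).float()  # (batch, seq, 1)
        # Mean over non-pad positions only; clamp avoids division by zero.
        pooled = (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
        return torch.sigmoid(self.score(pooled)).squeeze(-1)  # (batch,)
```

Per the summary above, training would then minimize an MSE loss scaled by 100 between this output and the `similarity_with_penalty` target.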
See `training_args.bin` + `training_results.json` for the exact `transformers.TrainingArguments` and eval metrics (`eval_mse = 0.0197`, `eval_mae = 0.1128`, `eval_rmse = 0.1402`).
## Test Metrics (955 examples)

- **MSE:** 0.0203
- **MAE:** 0.1129
- **RMSE:** 0.1425
- **Ground-truth mean/std:** 0.673 / 0.215
- **Prediction mean/std:** 0.691 / 0.170

Full details live in `test_metrics.json`, and the scatter plot is stored as `modernbert_reward_model_predictions_scatter.png`.
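These numbers can be recomputed from saved predictions with standard scikit-learn metrics; a minimal sketch (the repo's evaluation script may differ in detail):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def regression_metrics(y_true, y_pred):
    """MSE / MAE / RMSE as reported above (illustrative helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": float(mse),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "rmse": float(np.sqrt(mse)),  # RMSE is just the square root of MSE
    }
```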
## Usage
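
A minimal sketch of scoring one example, assuming the tokenizer is saved alongside the weights and that the model returns a scalar tensor. `load_finetuned_model` is the helper exposed by this repo's `modeling_reward.py`; the prompt format shown here is an assumption, not the repo's documented one.

```python
import torch
from transformers import AutoTokenizer


def score_alignment(model_dir: str, text: str) -> float:
    """Score one SQL + CoT + NL example in [0, 1] (illustrative sketch)."""
    from modeling_reward import load_finetuned_model  # helper shipped in this repo
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = load_finetuned_model(model_dir)
    model.eval()
    # Truncate to the 2,048-token limit used in training.
    batch = tokenizer(text, truncation=True, max_length=2048, return_tensors="pt")
    batch = {k: v.to(next(model.parameters()).device) for k, v in batch.items()}
    with torch.no_grad():
        return float(model(**batch))


# score = score_alignment("modernbert_reward_model_hf", "SQL: ...\nCoT: ...\nNL: ...")
# print(f"Reward: {score:.3f}")
```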
For convenience, `modeling_reward.py` exposes `load_finetuned_model(model_dir)` which handles loading `model.safetensors` or `pytorch_model.bin` and moves the module to GPU if available (falling back to CPU on OOM).
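
That loading behaviour might look roughly like the following. This is a sketch under stated assumptions, not the repo's actual code: `build_model` is a hypothetical zero-argument constructor standing in for however the helper instantiates the network.

```python
import os
import torch


def load_finetuned_model_sketch(model_dir: str, build_model):
    """Load model.safetensors if present, else pytorch_model.bin, then try to
    move the module to GPU with a CPU fallback on OOM (illustrative sketch)."""
    model = build_model()
    st_path = os.path.join(model_dir, "model.safetensors")
    if os.path.exists(st_path):
        from safetensors.torch import load_file
        state = load_file(st_path)
    else:
        state = torch.load(os.path.join(model_dir, "pytorch_model.bin"), map_location="cpu")
    model.load_state_dict(state)
    if torch.cuda.is_available():
        try:
            model = model.to("cuda")
        except torch.cuda.OutOfMemoryError:
            model = model.to("cpu")  # fall back to CPU on OOM
    return model
```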
## Reproducing Evaluation

1. `pip install -r requirements.txt` (same stack as training: `transformers`, `torch`, `safetensors`, `pandas`, `scikit-learn`, `matplotlib`).
2. Run `python test_mosaic_bert_on_test_set.py` from `cot_filtering_reward_model/bert_regression/` to regenerate the metrics and scatter plot (uses the saved `test_set.csv` by default).
## Notes