DarianNLP committed
Commit b23977c · verified · 1 Parent(s): 6b77899

Update README.md

Files changed (1): README.md +0 -43
README.md CHANGED
@@ -2,45 +2,6 @@
 
 Finetuned `answerdotai/ModernBERT-base` to score how well a generated natural-language description (NL) and chain-of-thought reasoning align with a SQL query. The model is trained as a regression head (sigmoid output in `[0, 1]`) to predict `similarity_with_penalty` scores derived from human preference data plus corruption heuristics.
 
- ## Repo Layout (ready for Hugging Face)
-
- ```
- modernbert_reward_model_hf/
- ├── config.json
- ├── model.safetensors
- ├── tokenizer.json / tokenizer_config.json / special_tokens_map.json
- ├── training_args.bin
- ├── training_results.json
- ├── test_metrics.json
- ├── train_set.csv / eval_set.csv / test_set.csv
- ├── modernbert_reward_model_predictions_scatter.png
- ├── modeling_reward.py
- └── README.md ← (this file)
- ```
-
- Upload the folder to a new Hugging Face repo (e.g. `daeilee/modernbert-cot-reward`) and run `huggingface-cli upload` or push via git. No extra conversion steps are required.
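The upload step mentioned above can be sketched with `huggingface-cli` as follows (the repo id is the example one from this README; a one-time `huggingface-cli login` with write access is assumed):

```shell
# Authenticate once with a write token, then push the folder contents to the Hub.
# Substitute your own repo id for the example one.
huggingface-cli login
huggingface-cli upload daeilee/modernbert-cot-reward ./modernbert_reward_model_hf .
```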
-
- ## Training Summary
-
- **Base model:** `answerdotai/ModernBERT-base`
- **Context length:** truncated to 2,048 tokens (model supports 8,192)
- **Dataset:** 7,633 SQL + chain-of-thought + NL examples from `data/cot_dataset_with_corruptions.csv`
- Invalid rows dropped (missing fields, score outside `[0, 1]`)
- Token-length filter at 2,048
- Split: 75% train / 12.5% eval / 12.5% test (random state 12)
- **Training setup:** batch size 2, gradient accumulation 4, AdamW (lr `2e-5`), early stopping patience 5, ModernBERT encoder + mean-pooling + linear + sigmoid, MSE loss scaled by 100
-
- See `training_args.bin` and `training_results.json` for the exact `transformers.TrainingArguments` and eval metrics (`eval_mse = 0.0197`, `eval_mae = 0.1128`, `eval_rmse = 0.1402`).
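The head described in the training setup (mean-pooling + linear + sigmoid, MSE loss scaled by 100) can be sketched roughly as below. This is a hypothetical reconstruction — `RewardHead` and `scaled_mse` are illustrative names; the shipped implementation lives in `modeling_reward.py`:

```python
import torch
import torch.nn as nn


class RewardHead(nn.Module):
    """Sketch of the regression head: mean-pool encoder states -> linear -> sigmoid."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Mean-pool over non-padding tokens only.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        # Sigmoid keeps the predicted score in [0, 1].
        return torch.sigmoid(self.linear(pooled)).squeeze(-1)


def scaled_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # MSE loss scaled by 100, as stated in the training summary.
    return 100.0 * nn.functional.mse_loss(pred, target)
```

In practice the `hidden_states` would come from the ModernBERT encoder's last layer; the scaling factor only changes gradient magnitudes, not the optimum.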
-
- ## Test Metrics (955 examples)
-
- **MSE:** 0.0203
- **MAE:** 0.1129
- **RMSE:** 0.1425
- **Ground-truth mean/std:** 0.673 / 0.215
- **Prediction mean/std:** 0.691 / 0.170
-
- Full details live in `test_metrics.json`, and the scatter plot is stored as `modernbert_reward_model_predictions_scatter.png`.
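As a quick sanity check on the numbers above, the reported test RMSE is consistent with the reported test MSE (RMSE = √MSE):

```python
import math

test_mse = 0.0203  # reported test-set MSE
print(round(math.sqrt(test_mse), 4))  # 0.1425, matching the reported test RMSE
```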
 
 ## Usage
 
@@ -67,10 +28,6 @@ print(f"Reward: {score:.3f}")
 
 For convenience, `modeling_reward.py` exposes `load_finetuned_model(model_dir)` which handles loading `model.safetensors` or `pytorch_model.bin` and moves the module to GPU if available (falling back to CPU on OOM).
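The device logic described above might look roughly like this. This is a hypothetical sketch (`move_with_fallback` is an illustrative name); the authoritative helper is `load_finetuned_model` in `modeling_reward.py`:

```python
import torch


def move_with_fallback(module: torch.nn.Module) -> torch.nn.Module:
    """Move the model to GPU when available; fall back to CPU on OOM."""
    if not torch.cuda.is_available():
        return module.to("cpu")
    try:
        return module.to("cuda")
    except torch.cuda.OutOfMemoryError:
        # A GPU exists but is full: keep the model on CPU instead of crashing.
        return module.to("cpu")
```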
 
- ## Reproducing Evaluation
-
- 1. `pip install -r requirements.txt` (same stack as training: `transformers`, `torch`, `safetensors`, `pandas`, `scikit-learn`, `matplotlib`).
- 2. Run `python test_mosaic_bert_on_test_set.py` from `cot_filtering_reward_model/bert_regression/` to regenerate metrics + scatter plot (uses the saved `test_set.csv` by default).
 
 ## Notes
 
 