Upload README.md with huggingface_hub
README.md (changed)
train_grpo_final.py                  # GRPO reinforcement learning (3 reward functions)
inference_grpo_final.py              # vLLM inference with majority/plurality voting
run_final.sh                         # End-to-end pipeline orchestration
requirements.txt                     # Python dependencies
report.pdf                           # Detailed solution report
submission/                          # Pre-generated submission
  Final_submission_plurality_pp.csv  # Phase 2 submission (score: 0.9582)
```

**Inference on a new test CSV:**
```bash
# Clone the repo (includes model weights + inference scripts)
git lfs install
git clone https://huggingface.co/Phaedrus33/GRPO_final_submission
cd GRPO_final_submission

# Install dependencies
pip install -r requirements.txt

# Run inference (model is local, no download needed)
python inference_grpo_final.py \
    --model . \
    --test-csv /path/to/test.csv \
    --output-dir ./outputs
```
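After cloning, a quick sanity check that the expected files landed can save a failed run later. A minimal sketch — the file list is taken from the repository tree above, but the helper itself is hypothetical, not part of the repo:

```python
from pathlib import Path

# Files the repository tree above says should exist after cloning.
EXPECTED = [
    "inference_grpo_final.py",
    "run_final.sh",
    "requirements.txt",
    "report.pdf",
    "submission/Final_submission_plurality_pp.csv",
]

def missing_files(repo_dir):
    """Return the expected paths that are absent under repo_dir."""
    repo = Path(repo_dir)
    return [p for p in EXPECTED if not (repo / p).exists()]
```

An empty return value means all listed files are present; anything else usually points to an incomplete clone (e.g. `git lfs install` was skipped).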
The test CSV requires two columns: `ID` (unique question identifier) and `question` (full question text, including data tables). The output submission is written to `./outputs/submission_plurality.csv`.

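A conforming input file can be produced with the standard library alone. A sketch — the IDs and question strings below are made up for illustration, not real competition data:

```python
import csv

# Write a minimal test CSV with the two required columns.
# IDs and question text are illustrative only.
rows = [
    {"ID": "q_0001",
     "question": "Given the table: year=2023 revenue=10; year=2024 revenue=14. What is the total revenue?"},
    {"ID": "q_0002",
     "question": "Which year had higher revenue: 2023 (10) or 2024 (14)?"},
]
with open("test.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["ID", "question"])
    writer.writeheader()
    writer.writerows(rows)
```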
**Requirements:** GPU with 80GB+ VRAM (A100-80GB, H100) or 2x 40GB GPUs with `--num-gpus 2`. Python 3.10+, CUDA 12.x.

**Options:**
- Reduce compute: `--num-generations 1` gives a single prediction per ID (no voting)
- If you run out of memory: lower `--batch-size` (e.g. to 8)

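The plurality vote that reduces multiple generations per ID to one answer can be sketched as follows; the actual logic inside `inference_grpo_final.py` may differ (for example in how ties are broken):

```python
from collections import Counter

def plurality_vote(answers):
    """Pick the most frequent answer; ties go to the earliest-seen answer."""
    counts = Counter(answers)
    best = max(counts.values())
    for a in answers:  # scan in generation order so the first top answer wins
        if counts[a] == best:
            return a

# Four generations for one ID, reduced to a single prediction.
print(plurality_vote(["42", "41", "42", "7"]))  # prints 42
```

With `--num-generations 1` this reduction is a no-op, which is why that flag trades accuracy for compute.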
**Pre-generated submission:** `submission/Final_submission_plurality_pp.csv` scores **0.9582** on the Phase 2 test dataset.

The training pipeline is fully reproducible: `run_final.sh --all` regenerates tr…

- **GRPO**: 1x NVIDIA H200 NVL 141GB, vLLM 0.12.0 for generation
- **Inference**: 1x NVIDIA H200 NVL 141GB, vLLM 0.12.0, bfloat16, CUDA graphs enabled, batch size 32

Full training pipeline (trace generation + SFT + GRPO): `bash run_final.sh --all` (requires the `HF_TOKEN` environment variable and an 80GB+ GPU).
|