---
language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - mathematics
  - conjecture-reasoning
  - deepseek-math
  - lora
base_model:
  - deepseek-ai/deepseek-math-7b-instruct
  - deepseek-ai/deepseek-math-v2
datasets:
  - NorthernTribe-Research/math-conjecture-training-corpus
---

# NorthernTribe-Research/math-conjecture-model

An autonomous DeepSeek-Math training and evaluation stack that powers multi-stage Space GPU fine-tuning, quality-gated adapter promotion, and reproducible publishing to the Hugging Face model repository.

This folder contains the training/evaluation stack used by both the Space and local runs.

## Included

- `configs/deepseek_math.yaml`: DeepSeek-Math baseline preset
- `configs/deepseek_math_v2.yaml`: DeepSeek-Math-V2 baseline preset
- `configs/deepseek_math_sota.yaml`: 4-stage SOTA curriculum + post-eval + quality gate
- `scripts/train_sft.py`: single-stage LoRA/QLoRA SFT (see the sketch after this list)
- `scripts/train_sota.py`: staged weighted curriculum with autonomous post-eval and gated push
- `scripts/eval_sota.py`: pass@k + exact/boxed + family/difficulty metrics
- `scripts/merge_and_push.py`: optional adapter merge into full model weights
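
`scripts/train_sft.py` covers the single-stage LoRA/QLoRA path. The snippet below is a minimal sketch of the standard `peft`/`transformers` QLoRA setup pattern, not the script itself; the base model ID, LoRA rank, and target modules are illustrative assumptions rather than the values in the presets.

```python
# Minimal QLoRA setup sketch (illustrative hyperparameters, not the preset values).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "deepseek-ai/deepseek-math-7b-instruct"

# Load the base model in 4-bit NF4 so the LoRA adapter is the only full-precision trainable part.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")

# Attach LoRA adapters to the attention projections.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```

In the actual stack, such hyperparameters would typically come from the YAML presets listed above rather than being hard-coded.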

## Setup

```bash
.venv/bin/python -m pip install -r model_development/requirements.txt
```

## Run SOTA curriculum

```bash
.venv/bin/python model_development/scripts/train_sota.py \
  --config model_development/configs/deepseek_math_sota.yaml
```

Optional controls:

```bash
# Validate stages only
.venv/bin/python model_development/scripts/train_sota.py \
  --config model_development/configs/deepseek_math_sota.yaml \
  --dry-run

# Force skip quality gate for one run
.venv/bin/python model_development/scripts/train_sota.py \
  --config model_development/configs/deepseek_math_sota.yaml \
  --skip-quality-gate
```

## Evaluate adapters

```bash
.venv/bin/python model_development/scripts/eval_sota.py \
  --config model_development/configs/deepseek_math_sota.yaml \
  --adapter-path model_development/runs/math-conjecture-sota/final_adapter \
  --eval-file data/releases/v1/test.parquet \
  --k 6 \
  --max-samples 240
```
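
`eval_sota.py` reports pass@k alongside exact/boxed-answer accuracy. For reference, the standard unbiased pass@k estimator can be sketched as follows; this is the generic formula, not necessarily the exact implementation used by the script.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples per problem, of which c are correct."""
    if n - c < k:
        # Any k-subset of the n samples must contain at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 6 samples, 2 correct, k = 2  ->  1 - C(4,2)/C(6,2) = 0.6
print(pass_at_k(6, 2, 2))
```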

## Outputs

- final adapter: `model_development/runs/math-conjecture-sota/final_adapter`
- training summary: `model_development/runs/math-conjecture-sota/training_summary.json`
- post-eval report: `model_development/runs/math-conjecture-sota/post_eval_report.json`
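
The final adapter can optionally be merged into full model weights, which is the role of `scripts/merge_and_push.py`. Below is a minimal sketch of that merge using PEFT's `merge_and_unload`, assuming the instruct base model and an illustrative output directory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-math-7b-instruct"  # assumption: the baseline preset's base model
adapter_path = "model_development/runs/math-conjecture-sota/final_adapter"

# Load the base model, apply the adapter, then fold the LoRA deltas into the base weights.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(base_id)
merged.save_pretrained("merged_model")      # full weights, no adapter required at load time
tokenizer.save_pretrained("merged_model")
# merged.push_to_hub("NorthernTribe-Research/math-conjecture-model")  # optional publish step
```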

## Quality gate behavior

When enabled in the config or at runtime, the gate:

- validates minimum evaluation coverage
- enforces overall `pass@1` / `pass@k` thresholds
- enforces required family-level `pass@k` thresholds
- can enforce a maximum final-stage `eval_loss`
- blocks the Hub push if any check fails
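
A minimal sketch of that gating logic, assuming illustrative report and threshold field names (`pass_at_1`, `family_pass_at_k`, `min_samples`, ...) rather than the actual keys in `post_eval_report.json` or the config:

```python
def passes_quality_gate(report: dict, gate: dict) -> bool:
    """Return True only if every enabled threshold in `gate` is met by `report` (field names illustrative)."""
    # Minimum evaluation coverage
    if report["num_samples"] < gate["min_samples"]:
        return False
    # Overall pass@1 / pass@k thresholds
    if report["pass_at_1"] < gate["min_pass_at_1"]:
        return False
    if report["pass_at_k"] < gate["min_pass_at_k"]:
        return False
    # Required family-level pass@k thresholds
    for family, minimum in gate.get("family_min_pass_at_k", {}).items():
        if report["family_pass_at_k"].get(family, 0.0) < minimum:
            return False
    # Optional ceiling on the final stage's eval_loss
    max_loss = gate.get("max_final_eval_loss")
    if max_loss is not None and report["final_eval_loss"] > max_loss:
        return False
    return True

# The push step is then guarded: only push to the Hub when passes_quality_gate(...) is True.
```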

## Auth

Hub authentication resolves from the environment first (`HF_TOKEN` / `HUGGINGFACE_HUB_TOKEN`) and can fall back to a local `huggingface-api-key.json`.
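
A minimal sketch of that resolution order; the `api_key` field name inside the fallback file is an assumption, not a documented format:

```python
import json
import os
from pathlib import Path

def resolve_hf_token(fallback_file: str = "huggingface-api-key.json") -> str | None:
    """Prefer environment variables, then fall back to the local JSON key file."""
    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_HUB_TOKEN")
    if token:
        return token
    path = Path(fallback_file)
    if path.exists():
        # Assumption: the file stores the token under an "api_key"-style field.
        return json.loads(path.read_text()).get("api_key")
    return None
```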