NorthernTribe-Research committed
Commit 303bcf7 · verified · 1 Parent(s): 011e6cc

Autonomous Space trainer update
README.md CHANGED
@@ -7,177 +7,70 @@ datasets:
   - NorthernTribe-Research/UMSR-v1
  tags:
  - reasoning
- - instruction-following
  - structured-output
  - math
- - science
  - logic
- - strategy
  ---

  # UMSR-Reasoner-7B

- ## Overview
-
- UMSR-Reasoner-7B is a standalone 7B reasoning model for structured multi-step problem solving.
-
- It is optimized for tasks that require:
-
- - explicit reasoning traces
- - deterministic final-answer formatting
- - consistent performance across math, science, logic, and strategy domains
-
- ## Dataset
-
- - Primary dataset: https://huggingface.co/datasets/NorthernTribe-Research/UMSR-v1
-
- ## Training Strategy
-
- - student architecture: `NorthernTribe-Research/UMSR-Reasoner-7B`
- - teacher architecture: `NorthernTribe-Research/UMSR-Reasoner-7B` self-distillation (default)
- - objective: blended CE + KL distillation with temperature and weight scheduling
- - continuity: checkpointed autonomous training cycles with resume support
-
- ## Output Contract
-
- For best reliability, prompt the model to end with:
-
- ```text
- <final_answer>...</final_answer>
- ```
-
- Optional reasoning can be requested in:
-
- ```text
- <reasoning>...</reasoning>
- ```
-
- ## Model Tree
-
- | Variant | Repository | Purpose |
- |---|---|---|
- | Base FP model | `NorthernTribe-Research/UMSR-Reasoner-7B` | Primary inference and fine-tuning target |
- | INT8 runtime profile | `NorthernTribe-Research/UMSR-Reasoner-7B-INT8` | Lower-memory deployment |
- | NF4 runtime profile | `NorthernTribe-Research/UMSR-Reasoner-7B-NF4` | Max compression for constrained GPUs |
- | Smoke INT8 profile | `NorthernTribe-Research/UMSR-Reasoner-7B-Smoke-INT8` | Fast CI/smoke validation profile linked to the main model tree |
-
- ## Quickstart
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto",
- )
-
- messages = [
-     {"role": "system", "content": "Solve step by step and finish with <final_answer>...</final_answer>."},
-     {"role": "user", "content": "If 3x + 5 = 20, what is x?"},
- ]
-
- inputs = tokenizer.apply_chat_template(
-     messages,
-     add_generation_prompt=True,
-     return_tensors="pt",
- ).to(model.device)
-
- outputs = model.generate(
-     inputs,
-     max_new_tokens=256,
-     temperature=0.2,
-     top_p=0.9,
- )
-
- print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
- ```
-
- ## Quantized Runtime
-
- ### INT8
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-
- model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
- bnb_config = BitsAndBytesConfig(load_in_8bit=True)
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     device_map="auto",
-     quantization_config=bnb_config,
- )
- ```
-
- ### NF4
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-
- model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_quant_type="nf4",
-     bnb_4bit_use_double_quant=True,
-     bnb_4bit_compute_dtype=torch.bfloat16,
- )
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     device_map="auto",
-     quantization_config=bnb_config,
- )
- ```
-
- ## Llamafile Packaging
-
- For single-binary deployment, use:
-
- ```bash
- python scripts/create_llamafile.py \
-     --gguf /path/to/UMSR-Reasoner-7B.Q4_K_M.gguf \
-     --runtime-bin tools/llamafile \
-     --output dist/UMSR-Reasoner-7B.Q4_K_M.llamafile \
-     --force
- ```
-
- ## Code-Aware Robust Evaluation
-
- `scripts/eval_reasoner.py` supports code-focused robustness checks:
-
- - code-task detection
- - Python code-block syntax validation
- - optional row-level unit-test execution
- - optional TensorFlow-backed multi-candidate scorer
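The Python code-block syntax validation listed above can be approximated with the standard library; a minimal sketch of the idea, not the actual `scripts/eval_reasoner.py` implementation:

```python
import ast
import re

def python_blocks_are_valid(response: str) -> bool:
    # Pull out every fenced python block and syntax-check it with ast.parse.
    blocks = re.findall(r"```python\n(.*?)```", response, re.DOTALL)
    for block in blocks:
        try:
            ast.parse(block)
        except SyntaxError:
            return False
    return True

print(python_blocks_are_valid("```python\nx = 1\n```"))    # True
print(python_blocks_are_valid("```python\ndef f(:\n```"))  # False
```

A response with no code blocks trivially passes, which matches the "code-task detection" step running first.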
 
-
- ## Trainer Integration
-
- An autonomous trainer Space is available for continuous training cycles against UMSR-v1. It supports:
-
- - teacher-student distillation mode with configurable in-house teacher model
- - live run telemetry (`live_progress.json`, `live_events.jsonl`) for real-time monitoring
- - scheduled or continuous operation
- - checkpoint auto-resume (`UMSR_RESUME_FROM_CHECKPOINT=auto`)
- - warmup-step and warmup-ratio control
- - push-to-hub automation
- - run monitoring through live dashboard and logs
-
- ## Best Practices
-
- - keep prompts explicit about output tags
- - validate final answers for high-stakes workflows
- - prefer domain-specific evaluation before production deployment

  ## Limitations

- - reasoning text may contain errors even when final format is correct
- - quality depends on prompt clarity and task scope
- - not suitable as a sole decision-maker for legal, medical, or safety-critical use

   - NorthernTribe-Research/UMSR-v1
  tags:
  - reasoning
  - structured-output
+ - instruction-following
  - math
  - logic
+ - science
  ---

  # UMSR-Reasoner-7B

+ ## Purpose
+
+ UMSR-Reasoner-7B is a general reasoning model designed for structured problem solving and consistent answer formatting in production and research workflows.
+
+ Model repository: `https://huggingface.co/NorthernTribe-Research/UMSR-Reasoner-7B`
+ Primary dataset: `https://huggingface.co/datasets/NorthernTribe-Research/UMSR-v1`
+
+ ## Intended Use
+
+ Use this model for tasks that require:
+
+ - multi-step quantitative reasoning
+ - logic and strategy-style question answering
+ - science and technical problem decomposition
+ - deterministic final-answer formatting for downstream parsers
+
+ ## Core Capabilities
+
+ - Produces step-aware reasoning outputs for complex prompts
+ - Handles open-form and exam-style tasks across math, logic, and science domains
+ - Supports structured response contracts for automation pipelines
+ - Works well in teacher-student continuous improvement loops
+
+ ## Recommended Prompting
+
+ For highest reliability, use explicit instructions about reasoning depth and enforce a final-answer tag in every response.
+
+ Suggested system instruction:
+
+ `Solve step by step and end with <final_answer>...</final_answer>.`
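The suggested system instruction can be wired into a chat request and the response checked mechanically; a minimal sketch in which the helper names and the sample response string are ours, not part of the repository:

```python
# Hedged sketch: build chat messages around the suggested system instruction
# and verify a response honors the final-answer contract.
SYSTEM_INSTRUCTION = "Solve step by step and end with <final_answer>...</final_answer>."

def build_messages(question: str) -> list:
    # Chat-format messages pairing the contract system prompt with a user task.
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": question},
    ]

def honors_contract(response: str) -> bool:
    # True if the response ends with a closed <final_answer> tag.
    return response.rstrip().endswith("</final_answer>")

messages = build_messages("If 3x + 5 = 20, what is x?")
print(honors_contract("3x = 15, so x = 5 <final_answer>5</final_answer>"))  # True
```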
+
+ ## Output Contract
+
+ Required final output tag:
+
+ `<final_answer>...</final_answer>`
+
+ Optional reasoning tag:
+
+ `<reasoning>...</reasoning>`
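The contracted tags are intended for downstream parsers; as an illustration of consuming them, a minimal extraction sketch (the regex and helper name are ours, not part of the repository):

```python
import re

# Non-greedy, DOTALL so multi-line answers are captured.
FINAL_ANSWER_RE = re.compile(r"<final_answer>(.*?)</final_answer>", re.DOTALL)

def extract_final_answer(text: str):
    # Return the contents of the last <final_answer> block, or None if absent.
    matches = FINAL_ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None

out = "<reasoning>3x = 15, so x = 5</reasoning>\n<final_answer>x = 5</final_answer>"
print(extract_final_answer(out))  # x = 5
```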
+
+ ## Training Profile
+
+ - Student model: `NorthernTribe-Research/UMSR-Reasoner-7B`
+ - Training mode: `teacher-student distillation`
+ - Teacher model(s): `NorthernTribe-Research/UMSR-Reasoner-7B`
+
+ ## Operational Guidance
+
+ - Prefer lower sampling temperature for deterministic workflows
+ - Validate final answers for high-stakes usage
+ - Run domain-specific evaluation before production rollout
+
  ## Limitations

+ - May produce plausible but incorrect reasoning traces
+ - Performance varies with prompt quality and task domain
+ - Not a substitute for expert review in legal, medical, financial, or safety-critical decisions
adapter_config.json CHANGED
@@ -29,9 +29,9 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "c_proj",
  "c_fc",
- "c_attn"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",

  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "c_fc",
+ "c_attn",
+ "c_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
effective_run_config.json CHANGED
@@ -1,7 +1,7 @@
  {
  "ce_weight_end": 0.5,
  "ce_weight_start": 0.35,
- "created_at": "2026-02-24T08:19:15.059846+00:00",
  "dataset_id": "NorthernTribe-Research/UMSR-v1",
  "distill_enabled": true,
  "enforce_inhouse_models": true,
@@ -23,7 +23,7 @@
  ],
  "min_quality": 0.72,
  "model_dtype": "bfloat16",
- "output_dir": "/app/runs/20260224_081901",
  "resume_from_checkpoint": "",
  "runtime_hardware": {
  "cuda_available": false,

  {
  "ce_weight_end": 0.5,
  "ce_weight_start": 0.35,
+ "created_at": "2026-02-24T08:35:33.950049+00:00",
  "dataset_id": "NorthernTribe-Research/UMSR-v1",
  "distill_enabled": true,
  "enforce_inhouse_models": true,

  ],
  "min_quality": 0.72,
  "model_dtype": "bfloat16",
+ "output_dir": "/app/runs/20260224_083520",
  "resume_from_checkpoint": "",
  "runtime_hardware": {
  "cuda_available": false,
live_events.jsonl CHANGED
The diff for this file is too large to render. See raw diff
 
live_progress.json CHANGED
@@ -27,5 +27,5 @@
  "torch_version": "2.10.0+cu128"
  },
  "status": "completed",
- "updated_at": "2026-02-24T08:23:53.900701+00:00"
  }

  "torch_version": "2.10.0+cu128"
  },
  "status": "completed",
+ "updated_at": "2026-02-24T08:39:31.326347+00:00"
  }
metrics/eval_metrics.json CHANGED
@@ -1,7 +1,7 @@
  {
  "eval_loss": 5.438441753387451,
- "eval_runtime": 17.0533,
  "eval_samples": 64,
- "eval_samples_per_second": 3.753,
- "eval_steps_per_second": 3.753
  }

  {
  "eval_loss": 5.438441753387451,
+ "eval_runtime": 12.587,
  "eval_samples": 64,
+ "eval_samples_per_second": 5.085,
+ "eval_steps_per_second": 5.085
  }
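The throughput fields here are derived quantities (samples divided by runtime), so the updated numbers can be checked directly; values below are copied from the metrics JSON:

```python
# samples_per_second = samples / runtime, rounded to 3 decimals as in the JSON.
eval_samples, eval_runtime = 64, 12.587        # from metrics/eval_metrics.json
train_samples, train_runtime = 256, 224.4389   # from metrics/train_metrics.json

print(round(eval_samples / eval_runtime, 3))    # 5.085, the reported eval_samples_per_second
print(round(train_samples / train_runtime, 3))  # 1.141, the reported train_samples_per_second
```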
metrics/train_metrics.json CHANGED
@@ -9,8 +9,8 @@
  "temperature_start": 2.5,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
- "train_runtime": 261.398,
  "train_samples": 256,
- "train_samples_per_second": 0.979,
- "train_steps_per_second": 0.979
  }

  "temperature_start": 2.5,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
+ "train_runtime": 224.4389,
  "train_samples": 256,
+ "train_samples_per_second": 1.141,
+ "train_steps_per_second": 1.141
  }
run_summary.json CHANGED
@@ -10,13 +10,13 @@
  "distill_enabled": true,
  "enforce_inhouse_models": true,
  "eval_rows": 64,
- "finished_at": "2026-02-24T08:23:53.900247+00:00",
  "fp16": false,
  "gradient_checkpointing": true,
  "kd_weight_end": 0.5,
  "kd_weight_start": 0.65,
- "live_events_path": "/app/runs/20260224_081901/live_events.jsonl",
- "live_progress_path": "/app/runs/20260224_081901/live_progress.json",
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "lora_enabled": true,
@@ -32,7 +32,7 @@
  ],
  "model_dtype": "bfloat16",
  "mps_available": false,
- "output_dir": "/app/runs/20260224_081901",
  "requested_warmup_steps": 0,
  "resume_from_checkpoint": "",
  "runtime_hardware": {

  "distill_enabled": true,
  "enforce_inhouse_models": true,
  "eval_rows": 64,
+ "finished_at": "2026-02-24T08:39:31.325377+00:00",
  "fp16": false,
  "gradient_checkpointing": true,
  "kd_weight_end": 0.5,
  "kd_weight_start": 0.65,
+ "live_events_path": "/app/runs/20260224_083520/live_events.jsonl",
+ "live_progress_path": "/app/runs/20260224_083520/live_progress.json",
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "lora_enabled": true,

  ],
  "model_dtype": "bfloat16",
  "mps_available": false,
+ "output_dir": "/app/runs/20260224_083520",
  "requested_warmup_steps": 0,
  "resume_from_checkpoint": "",
  "runtime_hardware": {
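`run_summary.json` carries the distillation weight schedule (`kd_weight_start` 0.65 down to `kd_weight_end` 0.5, with the CE weights in `effective_run_config.json`). The blended CE + KL objective the README describes can be sketched in scalar form; this is an illustrative pure-Python sketch with an assumed KL(teacher || student) direction, not the trainer's actual implementation:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def blended_loss(student_logits, teacher_logits, label, ce_w, kd_w, temperature):
    # ce_w * cross-entropy(student, label) + kd_w * KL(teacher || student) at temperature.
    ce = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return ce_w * ce + kd_w * kl

# Start-of-run weights from the config: ce 0.35, kd 0.65, temperature 2.5.
loss = blended_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9],
                    label=0, ce_w=0.35, kd_w=0.65, temperature=2.5)
print(loss > 0.0)  # True: CE is positive and KL is non-negative
```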
trainer_state.json CHANGED
@@ -317,9 +317,9 @@
  "distill_temperature": 2.373046875,
  "epoch": 0.09765625,
  "eval_loss": 3.927885055541992,
- "eval_runtime": 13.3298,
- "eval_samples_per_second": 4.801,
- "eval_steps_per_second": 4.801,
  "step": 25
  },
  {
@@ -630,9 +630,9 @@
  "distill_temperature": 2.24609375,
  "epoch": 0.1953125,
  "eval_loss": 4.088868141174316,
- "eval_runtime": 13.1424,
- "eval_samples_per_second": 4.87,
- "eval_steps_per_second": 4.87,
  "step": 50
  },
  {
@@ -943,9 +943,9 @@
  "distill_temperature": 2.119140625,
  "epoch": 0.29296875,
  "eval_loss": 4.2497992515563965,
- "eval_runtime": 14.3197,
- "eval_samples_per_second": 4.469,
- "eval_steps_per_second": 4.469,
  "step": 75
  },
  {
@@ -1256,9 +1256,9 @@
  "distill_temperature": 1.9921875,
  "epoch": 0.390625,
  "eval_loss": 4.411334991455078,
- "eval_runtime": 14.5811,
- "eval_samples_per_second": 4.389,
- "eval_steps_per_second": 4.389,
  "step": 100
  },
  {
@@ -1569,9 -1569,9 @@
  "distill_temperature": 1.865234375,
  "epoch": 0.48828125,
  "eval_loss": 4.573975086212158,
- "eval_runtime": 14.6477,
- "eval_samples_per_second": 4.369,
- "eval_steps_per_second": 4.369,
  "step": 125
  },
  {
@@ -1882,9 +1882,9 @@
  "distill_temperature": 1.73828125,
  "epoch": 0.5859375,
  "eval_loss": 4.739337921142578,
- "eval_runtime": 15.7116,
- "eval_samples_per_second": 4.073,
- "eval_steps_per_second": 4.073,
  "step": 150
  },
  {
@@ -2195,9 +2195,9 @@
  "distill_temperature": 1.611328125,
  "epoch": 0.68359375,
  "eval_loss": 4.90593957901001,
- "eval_runtime": 14.8353,
- "eval_samples_per_second": 4.314,
- "eval_steps_per_second": 4.314,
  "step": 175
  },
  {
@@ -2508,9 +2508,9 @@
  "distill_temperature": 1.484375,
  "epoch": 0.78125,
  "eval_loss": 5.072885513305664,
- "eval_runtime": 15.3273,
- "eval_samples_per_second": 4.176,
- "eval_steps_per_second": 4.176,
  "step": 200
  },
  {
@@ -2821,9 +2821,9 @@
  "distill_temperature": 1.357421875,
  "epoch": 0.87890625,
  "eval_loss": 5.237745761871338,
- "eval_runtime": 15.8537,
- "eval_samples_per_second": 4.037,
- "eval_steps_per_second": 4.037,
  "step": 225
  },
  {
@@ -3134,9 +3134,9 @@
  "distill_temperature": 1.23046875,
  "epoch": 0.9765625,
  "eval_loss": 5.399942398071289,
- "eval_runtime": 16.0564,
- "eval_samples_per_second": 3.986,
- "eval_steps_per_second": 3.986,
  "step": 250
  },
  {
@@ -3221,9 +3221,9 @@
  "step": 256,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
- "train_runtime": 261.398,
- "train_samples_per_second": 0.979,
- "train_steps_per_second": 0.979
  }
  ],
  "logging_steps": 1,

  "distill_temperature": 2.373046875,
  "epoch": 0.09765625,
  "eval_loss": 3.927885055541992,
+ "eval_runtime": 12.6517,
+ "eval_samples_per_second": 5.059,
+ "eval_steps_per_second": 5.059,
  "step": 25
  },
  {

  "distill_temperature": 2.24609375,
  "epoch": 0.1953125,
  "eval_loss": 4.088868141174316,
+ "eval_runtime": 11.8494,
+ "eval_samples_per_second": 5.401,
+ "eval_steps_per_second": 5.401,
  "step": 50
  },
  {

  "distill_temperature": 2.119140625,
  "epoch": 0.29296875,
  "eval_loss": 4.2497992515563965,
+ "eval_runtime": 11.9749,
+ "eval_samples_per_second": 5.345,
+ "eval_steps_per_second": 5.345,
  "step": 75
  },
  {

  "distill_temperature": 1.9921875,
  "epoch": 0.390625,
  "eval_loss": 4.411334991455078,
+ "eval_runtime": 12.9318,
+ "eval_samples_per_second": 4.949,
+ "eval_steps_per_second": 4.949,
  "step": 100
  },
  {

  "distill_temperature": 1.865234375,
  "epoch": 0.48828125,
  "eval_loss": 4.573975086212158,
+ "eval_runtime": 12.4169,
+ "eval_samples_per_second": 5.154,
+ "eval_steps_per_second": 5.154,
  "step": 125
  },
  {

  "distill_temperature": 1.73828125,
  "epoch": 0.5859375,
  "eval_loss": 4.739337921142578,
+ "eval_runtime": 12.8004,
+ "eval_samples_per_second": 5.0,
+ "eval_steps_per_second": 5.0,
  "step": 150
  },
  {

  "distill_temperature": 1.611328125,
  "epoch": 0.68359375,
  "eval_loss": 4.90593957901001,
+ "eval_runtime": 12.4225,
+ "eval_samples_per_second": 5.152,
+ "eval_steps_per_second": 5.152,
  "step": 175
  },
  {

  "distill_temperature": 1.484375,
  "epoch": 0.78125,
  "eval_loss": 5.072885513305664,
+ "eval_runtime": 13.848,
+ "eval_samples_per_second": 4.622,
+ "eval_steps_per_second": 4.622,
  "step": 200
  },
  {

  "distill_temperature": 1.357421875,
  "epoch": 0.87890625,
  "eval_loss": 5.237745761871338,
+ "eval_runtime": 12.8627,
+ "eval_samples_per_second": 4.976,
+ "eval_steps_per_second": 4.976,
  "step": 225
  },
  {

  "distill_temperature": 1.23046875,
  "epoch": 0.9765625,
  "eval_loss": 5.399942398071289,
+ "eval_runtime": 12.3118,
+ "eval_samples_per_second": 5.198,
+ "eval_steps_per_second": 5.198,
  "step": 250
  },
  {

  "step": 256,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
+ "train_runtime": 224.4389,
+ "train_samples_per_second": 1.141,
+ "train_steps_per_second": 1.141
  }
  ],
  "logging_steps": 1,
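The `distill_temperature` values logged at each eval step fit a linear anneal from `temperature_start` 2.5 (see `metrics/train_metrics.json`) over the 256 training steps; a sketch of that schedule, where the end value of 1.2 is inferred from the logged numbers rather than read from the trainer code:

```python
def distill_temperature(step: int, total_steps: int = 256,
                        t_start: float = 2.5, t_end: float = 1.2) -> float:
    # Linear interpolation from t_start to t_end over training progress.
    progress = step / total_steps
    return t_start + (t_end - t_start) * progress

print(distill_temperature(25))   # ~2.373046875, as logged at step 25
print(distill_temperature(250))  # ~1.23046875, as logged at step 250
```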
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d2d23671895b8a0d20a8fc1fc999d056c5abf3a9e171b9f55654865cd05ff443
  size 5201

  version https://git-lfs.github.com/spec/v1
+ oid sha256:2c27e3991bbbdd0cd23177ff38568dc878e3110c5ef9e98bf748cac6301cfc50
  size 5201