NorthernTribe-Research committed
Commit 303bcf7 · verified · 1 Parent(s): 011e6cc

Autonomous Space trainer update
README.md CHANGED
@@ -7,177 +7,70 @@ datasets:
   - NorthernTribe-Research/UMSR-v1
  tags:
  - reasoning
- - instruction-following
  - structured-output
  - math
- - science
  - logic
- - strategy
  ---

  # UMSR-Reasoner-7B

- ## Overview
-
- UMSR-Reasoner-7B is a standalone 7B reasoning model for structured multi-step problem solving.
-
- It is optimized for tasks that require:
-
- - explicit reasoning traces
- - deterministic final-answer formatting
- - consistent performance across math, science, logic, and strategy domains
-
- ## Dataset
-
- - Primary dataset: https://huggingface.co/datasets/NorthernTribe-Research/UMSR-v1
-
- ## Training Strategy
-
- - student architecture: `NorthernTribe-Research/UMSR-Reasoner-7B`
- - teacher architecture: `NorthernTribe-Research/UMSR-Reasoner-7B` self-distillation (default)
- - objective: blended CE + KL distillation with temperature and weight scheduling
- - continuity: checkpointed autonomous training cycles with resume support
-
- ## Output Contract
-
- For best reliability, prompt the model to end with:
-
- ```text
- <final_answer>...</final_answer>
- ```
-
- Optional reasoning can be requested in:
-
- ```text
- <reasoning>...</reasoning>
- ```
-
- ## Model Tree
-
- | Variant | Repository | Purpose |
- |---|---|---|
- | Base FP model | `NorthernTribe-Research/UMSR-Reasoner-7B` | Primary inference and fine-tuning target |
- | INT8 runtime profile | `NorthernTribe-Research/UMSR-Reasoner-7B-INT8` | Lower-memory deployment |
- | NF4 runtime profile | `NorthernTribe-Research/UMSR-Reasoner-7B-NF4` | Max compression for constrained GPUs |
- | Smoke INT8 profile | `NorthernTribe-Research/UMSR-Reasoner-7B-Smoke-INT8` | Fast CI/smoke validation profile linked to the main model tree |
-
- ## Quickstart
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto",
- )
-
- messages = [
-     {"role": "system", "content": "Solve step by step and finish with <final_answer>...</final_answer>."},
-     {"role": "user", "content": "If 3x + 5 = 20, what is x?"},
- ]
-
- inputs = tokenizer.apply_chat_template(
-     messages,
-     add_generation_prompt=True,
-     return_tensors="pt",
- ).to(model.device)
-
- outputs = model.generate(
-     inputs,
-     max_new_tokens=256,
-     temperature=0.2,
-     top_p=0.9,
- )
-
- print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
- ```
-
- ## Quantized Runtime
-
- ### INT8
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-
- model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
- bnb_config = BitsAndBytesConfig(load_in_8bit=True)
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     device_map="auto",
-     quantization_config=bnb_config,
- )
- ```
-
- ### NF4
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-
- model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_quant_type="nf4",
-     bnb_4bit_use_double_quant=True,
-     bnb_4bit_compute_dtype=torch.bfloat16,
- )
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     device_map="auto",
-     quantization_config=bnb_config,
- )
- ```
-
- ## Llamafile Packaging
-
- For single-binary deployment, use:
-
- ```bash
- python scripts/create_llamafile.py \
-     --gguf /path/to/UMSR-Reasoner-7B.Q4_K_M.gguf \
-     --runtime-bin tools/llamafile \
-     --output dist/UMSR-Reasoner-7B.Q4_K_M.llamafile \
-     --force
- ```
-
- ## Code-Aware Robust Evaluation
-
- `scripts/eval_reasoner.py` supports code-focused robustness checks:
-
- - code-task detection
- - Python code-block syntax validation
- - optional row-level unit-test execution
- - optional TensorFlow-backed multi-candidate scorer
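The Python code-block syntax validation listed above can be approximated with the standard library; a minimal sketch of the idea, not the actual `scripts/eval_reasoner.py` implementation:

```python
import ast
import re

def python_blocks_are_valid(response: str) -> bool:
    # Pull out every fenced python block and syntax-check it with ast.parse.
    blocks = re.findall(r"```python\n(.*?)```", response, re.DOTALL)
    for block in blocks:
        try:
            ast.parse(block)
        except SyntaxError:
            return False
    return True

print(python_blocks_are_valid("```python\nx = 1\n```"))    # True
print(python_blocks_are_valid("```python\ndef f(:\n```"))  # False
```

A response with no code blocks trivially passes, which matches the "code-task detection" step running first.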
 
-
- ## Trainer Integration
-
- An autonomous trainer Space is available for continuous training cycles against UMSR-v1. It supports:
-
- - teacher-student distillation mode with configurable in-house teacher model
- - live run telemetry (`live_progress.json`, `live_events.jsonl`) for real-time monitoring
- - scheduled or continuous operation
- - checkpoint auto-resume (`UMSR_RESUME_FROM_CHECKPOINT=auto`)
- - warmup-step and warmup-ratio control
- - push-to-hub automation
- - run monitoring through live dashboard and logs
-
- ## Best Practices
-
- - keep prompts explicit about output tags
- - validate final answers for high-stakes workflows
- - prefer domain-specific evaluation before production deployment

  ## Limitations

- - reasoning text may contain errors even when final format is correct
- - quality depends on prompt clarity and task scope
- - not suitable as a sole decision-maker for legal, medical, or safety-critical use

   - NorthernTribe-Research/UMSR-v1
  tags:
  - reasoning
  - structured-output
+ - instruction-following
  - math
  - logic
+ - science
  ---

  # UMSR-Reasoner-7B

+ ## Purpose
+
+ UMSR-Reasoner-7B is a general reasoning model designed for structured problem solving and consistent answer formatting in production and research workflows.
+
+ Model repository: `https://huggingface.co/NorthernTribe-Research/UMSR-Reasoner-7B`
+ Primary dataset: `https://huggingface.co/datasets/NorthernTribe-Research/UMSR-v1`
+
+ ## Intended Use
+
+ Use this model for tasks that require:
+
+ - multi-step quantitative reasoning
+ - logic and strategy-style question answering
+ - science and technical problem decomposition
+ - deterministic final-answer formatting for downstream parsers
+
+ ## Core Capabilities
+
+ - Produces step-aware reasoning outputs for complex prompts
+ - Handles open-form and exam-style tasks across math, logic, and science domains
+ - Supports structured response contracts for automation pipelines
+ - Works well in teacher-student continuous improvement loops
+
+ ## Recommended Prompting
+
+ For highest reliability, use explicit instructions about reasoning depth and enforce a final-answer tag in every response.
+
+ Suggested system instruction:
+
+ `Solve step by step and end with <final_answer>...</final_answer>.`
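The suggested system instruction can be wired into a chat request and the response checked mechanically; a minimal sketch in which the helper names and the sample response string are ours, not part of the repository:

```python
# Hedged sketch: build chat messages around the suggested system instruction
# and verify a response honors the final-answer contract.
SYSTEM_INSTRUCTION = "Solve step by step and end with <final_answer>...</final_answer>."

def build_messages(question: str) -> list:
    # Chat-format messages pairing the contract system prompt with a user task.
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": question},
    ]

def honors_contract(response: str) -> bool:
    # True if the response ends with a closed <final_answer> tag.
    return response.rstrip().endswith("</final_answer>")

messages = build_messages("If 3x + 5 = 20, what is x?")
print(honors_contract("3x = 15, so x = 5 <final_answer>5</final_answer>"))  # True
```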
+
+ ## Output Contract
+
+ Required final output tag:
+
+ `<final_answer>...</final_answer>`
+
+ Optional reasoning tag:
+
+ `<reasoning>...</reasoning>`
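The contracted tags are intended for downstream parsers; as an illustration of consuming them, a minimal extraction sketch (the regex and helper name are ours, not part of the repository):

```python
import re

# Non-greedy, DOTALL so multi-line answers are captured.
FINAL_ANSWER_RE = re.compile(r"<final_answer>(.*?)</final_answer>", re.DOTALL)

def extract_final_answer(text: str):
    # Return the contents of the last <final_answer> block, or None if absent.
    matches = FINAL_ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None

out = "<reasoning>3x = 15, so x = 5</reasoning>\n<final_answer>x = 5</final_answer>"
print(extract_final_answer(out))  # x = 5
```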
+
+ ## Training Profile
+
+ - Student model: `NorthernTribe-Research/UMSR-Reasoner-7B`
+ - Training mode: `teacher-student distillation`
+ - Teacher model(s): `NorthernTribe-Research/UMSR-Reasoner-7B`
+
+ ## Operational Guidance
+
+ - Prefer lower sampling temperature for deterministic workflows
+ - Validate final answers for high-stakes usage
+ - Run domain-specific evaluation before production rollout
+
  ## Limitations

+ - May produce plausible but incorrect reasoning traces
+ - Performance varies with prompt quality and task domain
+ - Not a substitute for expert review in legal, medical, financial, or safety-critical decisions
adapter_config.json CHANGED
@@ -29,9 +29,9 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "c_proj",
  "c_fc",
- "c_attn"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",

  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "c_fc",
+ "c_attn",
+ "c_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
effective_run_config.json CHANGED
@@ -1,7 +1,7 @@
  {
  "ce_weight_end": 0.5,
  "ce_weight_start": 0.35,
- "created_at": "2026-02-24T08:19:15.059846+00:00",
  "dataset_id": "NorthernTribe-Research/UMSR-v1",
  "distill_enabled": true,
  "enforce_inhouse_models": true,
@@ -23,7 +23,7 @@
  ],
  "min_quality": 0.72,
  "model_dtype": "bfloat16",
- "output_dir": "/app/runs/20260224_081901",
  "resume_from_checkpoint": "",
  "runtime_hardware": {
  "cuda_available": false,

  {
  "ce_weight_end": 0.5,
  "ce_weight_start": 0.35,
+ "created_at": "2026-02-24T08:35:33.950049+00:00",
  "dataset_id": "NorthernTribe-Research/UMSR-v1",
  "distill_enabled": true,
  "enforce_inhouse_models": true,

  ],
  "min_quality": 0.72,
  "model_dtype": "bfloat16",
+ "output_dir": "/app/runs/20260224_083520",
  "resume_from_checkpoint": "",
  "runtime_hardware": {
  "cuda_available": false,
live_events.jsonl CHANGED
The diff for this file is too large to render. See raw diff
 
live_progress.json CHANGED
@@ -27,5 +27,5 @@
  "torch_version": "2.10.0+cu128"
  },
  "status": "completed",
- "updated_at": "2026-02-24T08:23:53.900701+00:00"
  }

  "torch_version": "2.10.0+cu128"
  },
  "status": "completed",
+ "updated_at": "2026-02-24T08:39:31.326347+00:00"
  }
metrics/eval_metrics.json CHANGED
@@ -1,7 +1,7 @@
  {
  "eval_loss": 5.438441753387451,
- "eval_runtime": 17.0533,
  "eval_samples": 64,
- "eval_samples_per_second": 3.753,
- "eval_steps_per_second": 3.753
  }

  {
  "eval_loss": 5.438441753387451,
+ "eval_runtime": 12.587,
  "eval_samples": 64,
+ "eval_samples_per_second": 5.085,
+ "eval_steps_per_second": 5.085
  }
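The throughput fields here are derived quantities (samples divided by runtime), so the updated numbers can be checked directly; values below are copied from the metrics JSON:

```python
# samples_per_second = samples / runtime, rounded to 3 decimals as in the JSON.
eval_samples, eval_runtime = 64, 12.587        # from metrics/eval_metrics.json
train_samples, train_runtime = 256, 224.4389   # from metrics/train_metrics.json

print(round(eval_samples / eval_runtime, 3))    # 5.085, the reported eval_samples_per_second
print(round(train_samples / train_runtime, 3))  # 1.141, the reported train_samples_per_second
```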
metrics/train_metrics.json CHANGED
@@ -9,8 +9,8 @@
  "temperature_start": 2.5,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
- "train_runtime": 261.398,
  "train_samples": 256,
- "train_samples_per_second": 0.979,
- "train_steps_per_second": 0.979
  }

  "temperature_start": 2.5,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
+ "train_runtime": 224.4389,
  "train_samples": 256,
+ "train_samples_per_second": 1.141,
+ "train_steps_per_second": 1.141
  }
run_summary.json CHANGED
@@ -10,13 +10,13 @@
  "distill_enabled": true,
  "enforce_inhouse_models": true,
  "eval_rows": 64,
- "finished_at": "2026-02-24T08:23:53.900247+00:00",
  "fp16": false,
  "gradient_checkpointing": true,
  "kd_weight_end": 0.5,
  "kd_weight_start": 0.65,
- "live_events_path": "/app/runs/20260224_081901/live_events.jsonl",
- "live_progress_path": "/app/runs/20260224_081901/live_progress.json",
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "lora_enabled": true,
@@ -32,7 +32,7 @@
  ],
  "model_dtype": "bfloat16",
  "mps_available": false,
- "output_dir": "/app/runs/20260224_081901",
  "requested_warmup_steps": 0,
  "resume_from_checkpoint": "",
  "runtime_hardware": {

  "distill_enabled": true,
  "enforce_inhouse_models": true,
  "eval_rows": 64,
+ "finished_at": "2026-02-24T08:39:31.325377+00:00",
  "fp16": false,
  "gradient_checkpointing": true,
  "kd_weight_end": 0.5,
  "kd_weight_start": 0.65,
+ "live_events_path": "/app/runs/20260224_083520/live_events.jsonl",
+ "live_progress_path": "/app/runs/20260224_083520/live_progress.json",
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "lora_enabled": true,

  ],
  "model_dtype": "bfloat16",
  "mps_available": false,
+ "output_dir": "/app/runs/20260224_083520",
  "requested_warmup_steps": 0,
  "resume_from_checkpoint": "",
  "runtime_hardware": {
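`run_summary.json` carries the distillation weight schedule (`kd_weight_start` 0.65 down to `kd_weight_end` 0.5, with the CE weights in `effective_run_config.json`). The blended CE + KL objective the README describes can be sketched in scalar form; this is an illustrative pure-Python sketch with an assumed KL(teacher || student) direction, not the trainer's actual implementation:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def blended_loss(student_logits, teacher_logits, label, ce_w, kd_w, temperature):
    # ce_w * cross-entropy(student, label) + kd_w * KL(teacher || student) at temperature.
    ce = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return ce_w * ce + kd_w * kl

# Start-of-run weights from the config: ce 0.35, kd 0.65, temperature 2.5.
loss = blended_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9],
                    label=0, ce_w=0.35, kd_w=0.65, temperature=2.5)
print(loss > 0.0)  # True: CE is positive and KL is non-negative
```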
trainer_state.json CHANGED
@@ -317,9 +317,9 @@
  "distill_temperature": 2.373046875,
  "epoch": 0.09765625,
  "eval_loss": 3.927885055541992,
- "eval_runtime": 13.3298,
- "eval_samples_per_second": 4.801,
- "eval_steps_per_second": 4.801,
  "step": 25
  },
  {
@@ -630,9 +630,9 @@
  "distill_temperature": 2.24609375,
  "epoch": 0.1953125,
  "eval_loss": 4.088868141174316,
- "eval_runtime": 13.1424,
- "eval_samples_per_second": 4.87,
- "eval_steps_per_second": 4.87,
  "step": 50
  },
  {
@@ -943,9 +943,9 @@
  "distill_temperature": 2.119140625,
  "epoch": 0.29296875,
  "eval_loss": 4.2497992515563965,
- "eval_runtime": 14.3197,
- "eval_samples_per_second": 4.469,
- "eval_steps_per_second": 4.469,
  "step": 75
  },
  {
@@ -1256,9 +1256,9 @@
  "distill_temperature": 1.9921875,
  "epoch": 0.390625,
  "eval_loss": 4.411334991455078,
- "eval_runtime": 14.5811,
- "eval_samples_per_second": 4.389,
- "eval_steps_per_second": 4.389,
  "step": 100
  },
  {
@@ -1569,9 -1569,9 @@
  "distill_temperature": 1.865234375,
  "epoch": 0.48828125,
  "eval_loss": 4.573975086212158,
- "eval_runtime": 14.6477,
- "eval_samples_per_second": 4.369,
- "eval_steps_per_second": 4.369,
  "step": 125
  },
  {
@@ -1882,9 +1882,9 @@
  "distill_temperature": 1.73828125,
  "epoch": 0.5859375,
  "eval_loss": 4.739337921142578,
- "eval_runtime": 15.7116,
- "eval_samples_per_second": 4.073,
- "eval_steps_per_second": 4.073,
  "step": 150
  },
  {
@@ -2195,9 +2195,9 @@
  "distill_temperature": 1.611328125,
  "epoch": 0.68359375,
  "eval_loss": 4.90593957901001,
- "eval_runtime": 14.8353,
- "eval_samples_per_second": 4.314,
- "eval_steps_per_second": 4.314,
  "step": 175
  },
  {
@@ -2508,9 +2508,9 @@
  "distill_temperature": 1.484375,
  "epoch": 0.78125,
  "eval_loss": 5.072885513305664,
- "eval_runtime": 15.3273,
- "eval_samples_per_second": 4.176,
- "eval_steps_per_second": 4.176,
  "step": 200
  },
  {
@@ -2821,9 +2821,9 @@
  "distill_temperature": 1.357421875,
  "epoch": 0.87890625,
  "eval_loss": 5.237745761871338,
- "eval_runtime": 15.8537,
- "eval_samples_per_second": 4.037,
- "eval_steps_per_second": 4.037,
  "step": 225
  },
  {
@@ -3134,9 +3134,9 @@
  "distill_temperature": 1.23046875,
  "epoch": 0.9765625,
  "eval_loss": 5.399942398071289,
- "eval_runtime": 16.0564,
- "eval_samples_per_second": 3.986,
- "eval_steps_per_second": 3.986,
  "step": 250
  },
  {
@@ -3221,9 +3221,9 @@
  "step": 256,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
- "train_runtime": 261.398,
- "train_samples_per_second": 0.979,
- "train_steps_per_second": 0.979
  }
  ],
  "logging_steps": 1,

  "distill_temperature": 2.373046875,
  "epoch": 0.09765625,
  "eval_loss": 3.927885055541992,
+ "eval_runtime": 12.6517,
+ "eval_samples_per_second": 5.059,
+ "eval_steps_per_second": 5.059,
  "step": 25
  },
  {

  "distill_temperature": 2.24609375,
  "epoch": 0.1953125,
  "eval_loss": 4.088868141174316,
+ "eval_runtime": 11.8494,
+ "eval_samples_per_second": 5.401,
+ "eval_steps_per_second": 5.401,
  "step": 50
  },
  {

  "distill_temperature": 2.119140625,
  "epoch": 0.29296875,
  "eval_loss": 4.2497992515563965,
+ "eval_runtime": 11.9749,
+ "eval_samples_per_second": 5.345,
+ "eval_steps_per_second": 5.345,
  "step": 75
  },
  {

  "distill_temperature": 1.9921875,
  "epoch": 0.390625,
  "eval_loss": 4.411334991455078,
+ "eval_runtime": 12.9318,
+ "eval_samples_per_second": 4.949,
+ "eval_steps_per_second": 4.949,
  "step": 100
  },
  {

  "distill_temperature": 1.865234375,
  "epoch": 0.48828125,
  "eval_loss": 4.573975086212158,
+ "eval_runtime": 12.4169,
+ "eval_samples_per_second": 5.154,
+ "eval_steps_per_second": 5.154,
  "step": 125
  },
  {

  "distill_temperature": 1.73828125,
  "epoch": 0.5859375,
  "eval_loss": 4.739337921142578,
+ "eval_runtime": 12.8004,
+ "eval_samples_per_second": 5.0,
+ "eval_steps_per_second": 5.0,
  "step": 150
  },
  {

  "distill_temperature": 1.611328125,
  "epoch": 0.68359375,
  "eval_loss": 4.90593957901001,
+ "eval_runtime": 12.4225,
+ "eval_samples_per_second": 5.152,
+ "eval_steps_per_second": 5.152,
  "step": 175
  },
  {

  "distill_temperature": 1.484375,
  "epoch": 0.78125,
  "eval_loss": 5.072885513305664,
+ "eval_runtime": 13.848,
+ "eval_samples_per_second": 4.622,
+ "eval_steps_per_second": 4.622,
  "step": 200
  },
  {

  "distill_temperature": 1.357421875,
  "epoch": 0.87890625,
  "eval_loss": 5.237745761871338,
+ "eval_runtime": 12.8627,
+ "eval_samples_per_second": 4.976,
+ "eval_steps_per_second": 4.976,
  "step": 225
  },
  {

  "distill_temperature": 1.23046875,
  "epoch": 0.9765625,
  "eval_loss": 5.399942398071289,
+ "eval_runtime": 12.3118,
+ "eval_samples_per_second": 5.198,
+ "eval_steps_per_second": 5.198,
  "step": 250
  },
  {

  "step": 256,
  "total_flos": 42322071132.0,
  "train_loss": 4.595640664920211,
+ "train_runtime": 224.4389,
+ "train_samples_per_second": 1.141,
+ "train_steps_per_second": 1.141
  }
  ],
  "logging_steps": 1,
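The `distill_temperature` values logged at each eval step fit a linear anneal from `temperature_start` 2.5 (see `metrics/train_metrics.json`) over the 256 training steps; a sketch of that schedule, where the end value of 1.2 is inferred from the logged numbers rather than read from the trainer code:

```python
def distill_temperature(step: int, total_steps: int = 256,
                        t_start: float = 2.5, t_end: float = 1.2) -> float:
    # Linear interpolation from t_start to t_end over training progress.
    progress = step / total_steps
    return t_start + (t_end - t_start) * progress

print(distill_temperature(25))   # ~2.373046875, as logged at step 25
print(distill_temperature(250))  # ~1.23046875, as logged at step 250
```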
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d2d23671895b8a0d20a8fc1fc999d056c5abf3a9e171b9f55654865cd05ff443
  size 5201

  version https://git-lfs.github.com/spec/v1
+ oid sha256:2c27e3991bbbdd0cd23177ff38568dc878e3110c5ef9e98bf748cac6301cfc50
  size 5201