Mid-Training - Phase 005: Kurtis SFT Mix + OpenHermes2.5
#3
by mrs83 - opened
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|----------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_easy | 1|none | 0|acc |↑ |0.4625|± |0.0102|
| | |none | 0|acc_norm|↑ |0.4158|± |0.0101|
|hellaswag | 1|none | 0|acc |↑ |0.2920|± |0.0045|
| | |none | 0|acc_norm|↑ |0.3186|± |0.0046|
|openbookqa| 1|none | 0|acc |↑ |0.1620|± |0.0165|
| | |none | 0|acc_norm|↑ |0.2960|± |0.0204|
|piqa | 1|none | 0|acc |↑ |0.6328|± |0.0112|
| | |none | 0|acc_norm|↑ |0.6153|± |0.0114|
|sciq | 1|none | 0|acc |↑ |0.7540|± |0.0136|
| | |none | 0|acc_norm|↑ |0.6750|± |0.0148|
|winogrande| 1|none | 0|acc |↑ |0.5043|± |0.0141|
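One way to read the table above is to compare each accuracy with a random-guess baseline, using the reported standard error. The sketch below is a rough sanity check and not part of the evaluation run. The chance levels are assumptions (0.25 for the four-option tasks, 0.5 for the two-option ones; arc_easy has a few questions with a different number of options), and `z_vs_chance` is a hypothetical helper name.

```python
# Accuracy and standard error copied from the `acc` rows of the table above.
RESULTS = {
    "arc_easy":   (0.4625, 0.0102),
    "hellaswag":  (0.2920, 0.0045),
    "openbookqa": (0.1620, 0.0165),
    "piqa":       (0.6328, 0.0112),
    "sciq":       (0.7540, 0.0136),
    "winogrande": (0.5043, 0.0141),
}

# Assumed random-guess accuracy per task (not reported by lm_eval).
CHANCE = {
    "arc_easy": 0.25,
    "hellaswag": 0.25,
    "openbookqa": 0.25,
    "piqa": 0.5,
    "sciq": 0.25,
    "winogrande": 0.5,
}


def z_vs_chance(acc, stderr, chance):
    """Number of standard errors separating acc from the chance level."""
    return (acc - chance) / stderr


for task, (acc, stderr) in RESULTS.items():
    z = z_vs_chance(acc, stderr, CHANCE[task])
    # Normal-approximation 95% interval around the reported accuracy.
    low, high = acc - 1.96 * stderr, acc + 1.96 * stderr
    print(f"{task:<10} acc={acc:.4f}  95% CI=[{low:.4f}, {high:.4f}]  z={z:+.1f}")
```

Under these assumed baselines, winogrande is within one standard error of chance, and the raw openbookqa accuracy sits below 0.25, although its acc_norm of 0.2960 does not.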
Interesting failure: it retrieves the correct high-density medical terms (DKA, insulin resistance, etc.) but flips the logic (calling high sugar "low").
user: What medical conditions are often associated with diabetes?
assistant: 1. Diabetic ketoacidosis (DKA): This is a condition where the body's blood sugar levels rise too high, leading to an abnormally low level of glucose in the urine and/or stomach contents.
2. Hyperglycemia: A common complication of type-I diabetes mellitus occurs when there isn't enough insulin or other hormones produced by the pancreas for proper functioning of the cells that produce it. Symptoms include nausea, vomiting, weight loss, fatigue, vision changes, skin rashes, and difficulty breathing due to reduced oxygen supply from the lungs. Treatment options may vary depending on severity but typically involve medication such as prednisone, metformin, or glyburide tablets.
3. Insulin resistance syndrome (IGHD) - In this disorder, patients develop elevated triglycerides which can lead to increased risk factors like obstructive sleep apnea (OSA), hypertension, heart disease, stroke, kidney failure, and even blindness if left untreated. It also leads to decreased quality of life and impairments related to memory consolidation and learning ability.
4. Type II diabetes - The prevalence rate among adults aged over 65 years has been increasing steadily since the mid-90s, rising significantly during adolescence and early twenties. However, recent studies suggest that up to one third of all cases occur at least once per year.
Diagnosis & Management
The primary goal of treatment should be to manage symptoms effectively so that they do not worsen over time. Medications used to treat these diseases might help reduce inflammation, improve circulatory control, decrease cholesterol levels, lower blood pressure, prevent cardiovascular events, increase energy expenditure, promote healing after surgery, enhance immune system response, etc., while minimizing side effects. Some potential therapies include exercise training, meditation, yoga, deep breathing exercises, progressive muscle relaxation techniques, hypnotherapy, biofeedback, cognitive behavioral therapy, and pharmacy management.
Echo-DSRN$ uv run lm_eval --model hf --model_args pretrained=models/Echo-DSRN-Small-Kurtis-EON1-v0.3,trust_remote_code=True,device_map="auto" --tasks truthfulqa_mc1,truthfulqa_mc2 --output_path ./results_sft_smoltalk_phase5.3 --batch_size 1 --apply_chat_template
2026-02-23:00:22:29 INFO [__main__:465] Selected Tasks: ['truthfulqa_mc1', 'truthfulqa_mc2']
2026-02-23:00:22:29 INFO [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2026-02-23:00:22:29 INFO [evaluator:240] Initializing hf model, with arguments: {'pretrained': 'models/Echo-DSRN-Small-Kurtis-EON1-v0.3', 'trust_remote_code': True, 'device_map': 'auto'}
2026-02-23:00:22:30 INFO [models.huggingface:158] Using device 'cuda'
2026-02-23:00:22:30 INFO [models.huggingface:545] Model type cannot be determined. Using default model type 'causal'
2026-02-23:00:22:30 INFO [models.huggingface:426] Model parallel was set to False.
2026-02-23:00:22:35 INFO [tasks:695] Selected tasks:
2026-02-23:00:22:35 INFO [tasks:686] Task: truthfulqa_mc2 (truthfulqa/truthfulqa_mc2.yaml)
2026-02-23:00:22:35 INFO [tasks:686] Task: truthfulqa_mc1 (truthfulqa/truthfulqa_mc1.yaml)
2026-02-23:00:22:35 WARNING [evaluator:480] Chat template formatting change affects loglikelihood and multiple-choice tasks. See docs/chat-template-readme.md for details.
2026-02-23:00:22:35 INFO [api.task:434] Building contexts for truthfulqa_mc2 on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████| 817/817 [00:00<00:00, 1676.73it/s]
2026-02-23:00:22:35 INFO [api.task:434] Building contexts for truthfulqa_mc1 on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████| 817/817 [00:00<00:00, 2426.12it/s]
2026-02-23:00:22:36 INFO [evaluator:574] Running loglikelihood requests
Running loglikelihood requests: 100%|█████████████████████████████████████████████| 9996/9996 [11:12<00:00, 14.87it/s]
2026-02-23:00:33:50 INFO [loggers.evaluation_tracker:209] Saving results aggregated
hf (pretrained=models/Echo-DSRN-Small-Kurtis-EON1-v0.3,trust_remote_code=True,device_map=auto), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|-----:|------|---|-----:|---|-----:|
|truthfulqa_mc1| 2|none | 0|acc |↑ |0.2754|± |0.0156|
|truthfulqa_mc2| 3|none | 0|acc |↑ |0.4292|± |0.0153|
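For context on the two numbers above: as I understand it, `truthfulqa_mc1` counts a question as correct when the single true answer gets the highest log-likelihood, while `truthfulqa_mc2` measures the share of probability mass assigned to the true answers. The sketch below illustrates that computation. The function names, the assumption that the correct mc1 answer sits at index 0, and the example log-likelihoods are all made up for illustration; this is not the lm_eval source.

```python
import math


def mc1_score(loglikelihoods, correct_index=0):
    """Return 1.0 if the single correct choice has the highest log-likelihood."""
    best = max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])
    return 1.0 if best == correct_index else 0.0


def mc2_score(loglikelihoods, labels):
    """Return the probability mass on true answers, normalized over all answers."""
    p_true = sum(math.exp(ll) for ll, y in zip(loglikelihoods, labels) if y == 1)
    p_false = sum(math.exp(ll) for ll, y in zip(loglikelihoods, labels) if y == 0)
    return p_true / (p_true + p_false)


# Hypothetical per-choice log-likelihoods for one question.
lls = [-2.0, -1.5, -3.0]
labels = [1, 0, 1]  # 1 = true answer, 0 = false answer (mc2-style labels)

print(mc1_score(lls))          # top-scored choice is index 1, not the correct one
print(mc2_score(lls, labels))  # fraction of probability mass on the true answers
```

The reported task scores would then be the mean of these per-question values over the 817 questions.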





