Phase15-DeepSeek-FFT (WIP step 484)

Status: Work-in-progress training snapshot. Not a release.

On-device reasoning model: TinyLlama-1.1B with a HyperNetwork-driven soft prompt plus a small raw-token window. Target: iPhone deployment (~3 GB RAM limit). Distilled from DeepSeek-R1 reasoning traces (dolphin-r1).

Architecture

For each answer token position k:

```
sp_k = HyperNet(sp_{k-1}.detach(), embed(a[k - raw_window - 1]))   # recurrent soft-prompt update
LLM input = [sp_k (128 soft tokens), a[k-raw_window:k] (last N = raw_window raw tokens)]
# query KV cache held separately
```
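The indexing in the pseudocode above can be checked with a small pure-Python sketch. It models only which answer position is fed to the HyperNet at step k and which positions stay in the raw window. `hypernet_schedule` is a hypothetical helper. It assumes the soft prompt is left unchanged at positions where no token has been evicted yet; that is an assumption, not documented behaviour of the training script. The indices are clamped explicitly. Taken literally as Python, `a[k - raw_window:k]` returns the wrong slice when `k < raw_window`. A negative `a[k - raw_window - 1]` would also wrap around to a token at the end of the sequence.

```python
from typing import Iterator, List, Optional, Tuple


def hypernet_schedule(
    num_positions: int, raw_window: int
) -> Iterator[Tuple[int, Optional[int], List[int]]]:
    """Yield (k, evicted, window) for each answer position k (hypothetical helper).

    evicted: index of the token folded into the soft prompt at step k,
             i.e. a[k - raw_window - 1], or None if it does not exist yet
             (assumption: the soft prompt is then left unchanged).
    window:  indices of the raw tokens a[k - raw_window:k] given to the LLM,
             clamped at 0 so early positions never wrap around.
    """
    for k in range(num_positions):
        evicted_idx = k - raw_window - 1
        evicted = evicted_idx if evicted_idx >= 0 else None
        window = list(range(max(0, k - raw_window), k))
        yield k, evicted, window


if __name__ == "__main__":
    folded: List[int] = []  # positions already absorbed into sp (illustrative bookkeeping)
    for k, evicted, window in hypernet_schedule(num_positions=8, raw_window=2):
        if evicted is not None:
            folded.append(evicted)
        print(f"k={k} hypernet<-{evicted} raw={window} sp covers {folded}")
```

Under this reading of the pseudocode, every position before k is covered exactly once, either by the soft prompt or by the raw window.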

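- S = 128 soft tokens × 2048 dim (TinyLlama's hidden size)
- raw_window curriculum: 1 → 2 → 4 → 8 → 16 → 32, advancing to the next stage automatically when the loss plateaus
- Multi-position auxiliary loss, applied at the last soft-prompt position (`sp_last`) and at each raw-token position (`raw[i]`)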
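Below is a minimal sketch of a plateau-triggered raw_window bump. It assumes "plateau" means the LM loss has not improved by at least `min_delta` for `patience` consecutive evaluations. `RawWindowCurriculum`, `patience` and `min_delta` are hypothetical names and thresholds. The actual `--auto_raw` logic in `train_pure_simple.py` may use a different criterion.

```python
from typing import List, Optional


class RawWindowCurriculum:
    """Hypothetical plateau-driven schedule over raw_window stages."""

    def __init__(self, stages: List[int], patience: int = 3, min_delta: float = 0.01):
        self.stages = stages
        self.stage = 0
        self.patience = patience
        self.min_delta = min_delta
        self.best: Optional[float] = None
        self.bad_evals = 0

    @property
    def raw_window(self) -> int:
        return self.stages[self.stage]

    def update(self, lm_loss: float) -> bool:
        """Record one evaluation loss; return True if raw_window was bumped."""
        if self.best is None or lm_loss < self.best - self.min_delta:
            self.best = lm_loss
            self.bad_evals = 0
            return False
        self.bad_evals += 1
        if self.bad_evals >= self.patience and self.stage + 1 < len(self.stages):
            self.stage += 1
            # Assumption: a wider window shifts the loss, so the baseline is re-measured.
            self.best = None
            self.bad_evals = 0
            return True
        return False


if __name__ == "__main__":
    curriculum = RawWindowCurriculum([1, 2, 4, 8, 16, 32], patience=2)
    for loss in [2.0, 1.8, 1.80, 1.81, 1.6]:
        if curriculum.update(loss):
            print("plateau -> raw_window =", curriculum.raw_window)
```

Once the last stage (32) is reached, further plateaus leave raw_window unchanged.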

Files

| File | Purpose |
|---|---|
| `ckpt_step484.pt` | Latest WIP checkpoint (7.5 GB). Contains `hypernet`, `llm`, `optimizer`, `scheduler`, `raw_window`, `samples_processed`, `global_step`. |
| `train_pure_simple.py` | Training script. Supports `--auto_raw`, `--filter_cjk`, `--filter_toolcall`, and skipping already-processed samples on resume. |
| `auto_train.sh` | Wrapper: restarts automatically on OOM (retrying with the batch size reduced by 4), reads `raw_window` from the checkpoint, and sets the batch size from a static table. |
| `test_inference.py` | Sampling inference with a repetition penalty (`rep_penalty`) and no-repeat n-gram blocking to suppress loops. |
| `test_sp_collapse.py` | Verifies that the soft prompt (SP) has not collapsed into a raw-token equivalent (cosine similarity, plus null and noise SP variants). |

Reproduce

```bash
# Resume training (raw_window read from ckpt)
nohup ./auto_train.sh >> auto_train.log 2>&1 &
```
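The contents of `auto_train.sh` are not shown in this card. The Python sketch below only illustrates the behaviour described in the Files table: start from a batch size looked up by `raw_window` in a static table, then retry with the batch reduced by 4 after each OOM. It makes several assumptions. An in-process retry loop stands in for the wrapper's process restart, and `OutOfMemory` stands in for a real CUDA OOM. `run_with_oom_fallback`, `min_batch`, and the table values are all hypothetical.

```python
from typing import Callable, Dict


class OutOfMemory(Exception):
    """Stand-in for a CUDA OOM (e.g. torch.cuda.OutOfMemoryError) in this sketch."""


def run_with_oom_fallback(
    train: Callable[[int], None],
    raw_window: int,
    batch_table: Dict[int, int],
    step: int = 4,
    min_batch: int = 4,
) -> int:
    """Start from the table's batch for raw_window; on OOM retry with batch - step.

    Returns the batch size that finally ran without running out of memory.
    """
    batch = batch_table[raw_window]
    while batch >= min_batch:
        try:
            train(batch)
            return batch
        except OutOfMemory:
            print(f"OOM at batch={batch}, retrying with batch={batch - step}")
            batch -= step
    raise RuntimeError("no batch size fits in memory")


# Placeholder values only; the real table lives in auto_train.sh.
BATCH_TABLE = {8: 32, 16: 24}


def fake_train(batch: int) -> None:
    if batch > 24:  # pretend anything above 24 runs out of memory
        raise OutOfMemory()


if __name__ == "__main__":
    print(run_with_oom_fallback(fake_train, raw_window=8, batch_table=BATCH_TABLE))
```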

```bash
# Inference
python test_inference.py --ckpt ckpt_step484.pt --max_new_tokens 400
```
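`test_inference.py` is described as combining a repetition penalty with no-repeat n-gram blocking. The sketch below shows one common form of each, working on a plain list of logits in pure Python. The penalty follows the rule used by Hugging Face `transformers`: divide positive logits and multiply negative ones by `penalty`, for tokens already generated. The n-gram rule bans any token that would complete an n-gram already present in the output. The function names and the defaults (`penalty=1.2`, `ngram=3`, `temperature=0.8`) are assumptions. The script's exact settings are not documented here.

```python
import math
import random
from typing import List, Optional, Sequence, Set


def apply_repetition_penalty(
    logits: List[float], generated: Sequence[int], penalty: float
) -> List[float]:
    """Lower the logit of every token that has already been generated."""
    out = list(logits)
    for tok in set(generated):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_by_ngram(generated: Sequence[int], n: int) -> Set[int]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n:
        return set()
    prefix = tuple(generated[len(generated) - n + 1:])  # last n-1 tokens
    banned: Set[int] = set()
    for i in range(len(generated) - n + 1):
        gram = tuple(generated[i:i + n])
        if gram[:-1] == prefix:
            banned.add(gram[-1])
    return banned


def sample_next(
    logits: List[float],
    generated: Sequence[int],
    penalty: float = 1.2,
    ngram: int = 3,
    temperature: float = 0.8,
    rng: Optional[random.Random] = None,
) -> int:
    if rng is None:
        rng = random.Random()
    scores = apply_repetition_penalty(logits, generated, penalty)
    for tok in banned_by_ngram(generated, ngram):
        scores[tok] = -math.inf
    top = max(scores)
    if top == -math.inf:
        raise ValueError("every token is banned")
    # Softmax with temperature; banned tokens get weight exp(-inf) = 0.
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


if __name__ == "__main__":
    history = [5, 6, 7, 5, 6]  # "5 6 7" has already occurred
    logits = [0.0, 1.0, 0.5, 0.0, 0.0, 2.0, 0.0, 3.0]
    print(sample_next(logits, history, rng=random.Random(0)))  # never 7
```

In `transformers`, `generate()` exposes the same two controls as `repetition_penalty` and `no_repeat_ngram_size`.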

```bash
# SP collapse verification
python test_sp_collapse.py --ckpt ckpt_step484.pt
```
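The card says only that `test_sp_collapse.py` uses cosine similarity and null/noise SP variants. The pure-Python sketch below is one plausible reading of those two checks, not the script itself.

1. How close each soft token is to an ordinary token embedding. Cosine values near 1.0 would suggest the SP merely re-encodes raw tokens.
2. How much the loss rises when the real SP is swapped for an all-zero ("null") or a random-noise SP. Gaps near 0 would suggest the model ignores the SP.

`max_cos_to_vocab`, `ablation_gap` and the `loss_with` callable (which, in real use, would run the LLM with the given SP) are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def max_cos_to_vocab(sp: List[Vector], vocab_emb: List[Vector]) -> float:
    """Highest cosine between any soft token and any token embedding."""
    return max(cosine(s, e) for s in sp for e in vocab_emb)


def ablation_gap(
    loss_with: Callable[[List[Vector]], float], sp: List[Vector], seed: int = 0
) -> Dict[str, float]:
    """Loss increase when the real SP is replaced by a null or a noise SP."""
    rng = random.Random(seed)
    null_sp = [[0.0] * len(v) for v in sp]
    # Noise with the same RMS scale as the real soft prompt.
    scale = math.sqrt(sum(x * x for v in sp for x in v) / sum(len(v) for v in sp))
    noise_sp = [[rng.gauss(0.0, scale) for _ in v] for v in sp]
    base = loss_with(sp)
    return {"null": loss_with(null_sp) - base, "noise": loss_with(noise_sp) - base}


if __name__ == "__main__":
    target = [[1.0, -1.0], [0.5, 2.0]]

    def toy_loss(sp: List[Vector]) -> float:
        # Toy stand-in for the LM loss: squared distance to a "useful" SP.
        return sum((x - t) ** 2 for v, tv in zip(sp, target) for x, t in zip(v, tv))

    print(ablation_gap(toy_loss, target))
    print(max_cos_to_vocab(target, [[1.0, 0.0], [0.0, 1.0]]))
```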

Training data

- Local path: `/workspace/.hf_home/normalized/dolphin_r1.jsonl` (300k samples, reduced to 270k after CJK and tool-call filtering)
- Source: `cognitivecomputations/dolphin-r1`, normalized so that `output = f"<think>\n{reasoning}\n</think>\n\n{answer}"`
- Curriculum: samples sorted by answer length in characters, shortest first
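Below is a minimal sketch of the preprocessing described above: filtering, normalization, and short-first ordering. Only the output template comes from this card. Several details are assumptions:

- the record field names (`reasoning`, `answer`);
- the CJK Unicode ranges;
- the `"tool_call"` substring test standing in for `--filter_toolcall`;
- sorting by the length of the `answer` field alone.

```python
import re
from typing import Dict, List

# Assumed CJK ranges: Hiragana/Katakana, CJK Unified Ideographs, Hangul syllables.
CJK_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def normalize(reasoning: str, answer: str) -> str:
    # Output template as stated in the model card.
    return f"<think>\n{reasoning}\n</think>\n\n{answer}"


def keep(record: Dict[str, str]) -> bool:
    text = record["reasoning"] + record["answer"]
    if CJK_RE.search(text):   # hypothetical --filter_cjk rule
        return False
    if "tool_call" in text:   # hypothetical --filter_toolcall rule
        return False
    return True


def build(records: List[Dict[str, str]]) -> List[str]:
    kept = [r for r in records if keep(r)]
    kept.sort(key=lambda r: len(r["answer"]))  # short-first curriculum
    return [normalize(r["reasoning"], r["answer"]) for r in kept]


if __name__ == "__main__":
    demo = [
        {"reasoning": "long", "answer": "a much longer answer"},
        {"reasoning": "r", "answer": "short"},
    ]
    for line in build(demo):
        print(repr(line))
```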

Progress at step 484

- LM loss (average) ≈ 1.51, down from 5.4 at the start of training
- LM loss (minimum) ≈ 1.24, on a very long sample
- raw_window = 8 has plateaued; the next stage is raw_window = 16
- Inference: coherent prose on advice questions; math and code are still broken (a limitation of the TinyLlama base); `</think>` closure is unreliable
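Assuming the reported LM loss values are mean per-token cross-entropy in nats (the convention of PyTorch's `cross_entropy`), the corresponding perplexities are:

$$
\mathrm{PPL} = e^{\mathcal{L}}, \qquad e^{5.4} \approx 221, \qquad e^{1.51} \approx 4.53, \qquad e^{1.24} \approx 3.46
$$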

Limitations

- The TinyLlama base struggles with arithmetic; improving math and code depends on additional training data
- Closure problem: the model opens `<think>` but does not reliably commit to closing it with `</think>`
- Trained on a single GPU (RTX 3090 at $0.15/hr on Vast.ai), batch size 24-32


Model size: 1B params (Safetensors)
Tensor type: BF16


Model tree for baya1116/Phase15-DeepSeek-FFT

Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0 (this model is a fine-tune)