Phase15-DeepSeek-FFT (WIP step 484)

Status: Work-in-progress training snapshot. Not a release.

On-device reasoning model: TinyLlama-1.1B with HyperNetwork-driven soft prompt + small raw-token window. Target: iPhone deployment (~3GB RAM limit), distilled from DeepSeek-R1 traces (dolphin-r1).
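As a rough sanity check on the ~3GB target, the weight footprint can be estimated with simple arithmetic. The sketch below assumes 2-byte weights (BF16 or FP16) and a nominal 1.1 billion parameters. It ignores the HyperNet, activations, and the KV cache, whose sizes are not stated here.

```python
# Back-of-the-envelope memory estimate; all figures are approximations.
BYTES_PER_WEIGHT = 2  # assumption: BF16/FP16 storage


def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


llm_params = 1_100_000_000            # nominal TinyLlama-1.1B size (rounded, assumed)
soft_tokens, hidden_dim = 128, 2048   # soft-prompt shape given under Architecture

weights_bytes = llm_params * BYTES_PER_WEIGHT
soft_prompt_bytes = soft_tokens * hidden_dim * BYTES_PER_WEIGHT

print(f"LLM weights : {gib(weights_bytes):.2f} GiB")
print(f"soft prompt : {soft_prompt_bytes // 2**10} KiB")
```

Under these assumptions the weights take about 2.05 GiB and the soft prompt 512 KiB, so the soft prompt itself is a negligible part of the budget.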

Architecture

For each answer token position k:

sp_k = HyperNet(sp_{k-1}.detach(), embed(a[k - raw_window - 1]))   # recurrent soft-prompt update
LLM input = [sp_k (128 soft tokens), a[k-raw_window:k] (last N raw tokens)]
            query KV cache held separately
  • S = 128 soft tokens × 2048 dim
  • raw_window curriculum: 1 → 2 → 4 → 8 → 16 → 32 (auto-bumped on plateau)
  • Multi-position auxiliary loss at sp_last + each raw[i] position
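The indexing in the pseudocode above can be made concrete with a small, model-free sketch. It only shows which token is fed to the HyperNet and which tokens form the raw window at position k. The function name `step_inputs` is hypothetical, and returning `None` during warm-up (before any token has left the window) is an assumption, since the training script's handling of that case is not described here.

```python
def step_inputs(answer_ids, k, raw_window):
    """Return (token fed to HyperNet, raw-token window) for answer position k.

    The window is a[k - raw_window : k]. The token that has just slid out of
    it, a[k - raw_window - 1], is the one used to update the soft prompt.
    """
    start = max(0, k - raw_window)
    window = answer_ids[start:k]
    evicted_idx = k - raw_window - 1
    # Assumption: no HyperNet input token exists until one has left the window.
    evicted = answer_ids[evicted_idx] if evicted_idx >= 0 else None
    return evicted, window


answer = [10, 11, 12, 13, 14, 15, 16, 17]  # toy token ids
for k in (3, 6):
    print(k, step_inputs(answer, k, raw_window=4))
```

With `raw_window=4` and `k=6`, the window holds tokens 2..5 and token 1 is the one absorbed into the soft prompt, so every token is seen raw first and in compressed form afterwards.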

Files

File                  Purpose
ckpt_step484.pt       Latest WIP checkpoint (7.5GB). Contains hypernet, llm, optimizer, scheduler, raw_window, samples_processed, global_step.
train_pure_simple.py  Training script. Supports --auto_raw, --filter_cjk, --filter_toolcall, samples-skip on resume.
auto_train.sh         Wrapper: auto-restart on OOM (batch -4 fallback), reads raw_window from ckpt, sets batch from a static table.
test_inference.py     Sampling inference with rep_penalty + no-repeat ngram (loop suppression).
test_sp_collapse.py   Verifies the SP is not collapsed into a raw-equivalent (cosine + null/noise sp variants).
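The two loop-suppression techniques named for test_inference.py can be illustrated without a model. This is a minimal sketch of the common formulations (divide positive logits and multiply negative logits by the penalty; ban any token that would complete an n-gram already generated). It is not taken from the script, and both function names are hypothetical.

```python
def apply_repetition_penalty(logits, generated, penalty):
    """Down-weight tokens that already appear in `generated`."""
    out = list(logits)
    for tok in set(generated):
        # Positive logits shrink, negative logits grow more negative.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_next_tokens(generated, n):
    """Tokens that would complete an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n:
        return set()
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


history = [1, 2, 3, 1, 2]
print(banned_next_tokens(history, n=3))
print(apply_repetition_penalty([2.0, -1.0, 0.5], generated=[0, 1], penalty=2.0))
```

During sampling, the logits of banned tokens would be set to negative infinity before the softmax, so the sequence "1 2 3" cannot be produced a second time.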

Reproduce

# Resume training (raw_window read from ckpt)
nohup ./auto_train.sh >> auto_train.log 2>&1 &

# Inference
python test_inference.py --ckpt ckpt_step484.pt --max_new_tokens 400

# SP collapse verification
python test_sp_collapse.py --ckpt ckpt_step484.pt
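# The checks inside test_sp_collapse.py are not shown in this card. As a generic
# illustration of a cosine-based collapse check (not the script's method), the
# sketch below computes the mean pairwise cosine similarity between soft-prompt
# vectors produced for different inputs. Values near 1 would suggest the soft
# prompt barely depends on its input. The function names and the flat-list
# vector representation are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


collapsed = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # same direction for every input
diverse = [[1.0, 0.0], [0.0, 1.0]]                # orthogonal directions
print(mean_pairwise_cosine(collapsed), mean_pairwise_cosine(diverse))
```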

Training data

  • local:/workspace/.hf_home/normalized/dolphin_r1.jsonl (300k → 270k after CJK + tool_call filter)
  • Source: cognitivecomputations/dolphin-r1, normalized so output = f"<think>\n{reasoning}\n</think>\n\n{answer}"
  • Curriculum: sort by answer character length, short-first
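The data preparation listed above can be sketched in a few lines. The normalization format string is the one given in the list. Everything else is an assumption made for illustration: the record field names, the Unicode ranges used to detect CJK text, and the reading of "answer character length" as the length of the full normalized target. The tool_call filter is not covered.

```python
def normalize(reasoning: str, answer: str) -> str:
    """Build the training target in the format stated above."""
    return f"<think>\n{reasoning}\n</think>\n\n{answer}"


def contains_cjk(text: str) -> bool:
    """Heuristic CJK check (assumed ranges: Han, kana, Hangul syllables)."""
    return any(
        "\u4e00" <= ch <= "\u9fff"
        or "\u3040" <= ch <= "\u30ff"
        or "\uac00" <= ch <= "\ud7af"
        for ch in text
    )


# Toy records with hypothetical field names.
samples = [
    {"reasoning": "2 + 2 = 4", "answer": "4"},
    {"reasoning": "\u4f60\u597d", "answer": "\u7b54\u6848"},  # CJK, filtered out
    {"reasoning": "ok", "answer": "yes"},
]

kept = [s for s in samples if not contains_cjk(s["reasoning"] + s["answer"])]
# Short-first curriculum: sort normalized targets by character length.
curriculum = sorted((normalize(s["reasoning"], s["answer"]) for s in kept), key=len)
print(len(curriculum), repr(curriculum[0]))
```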

Progress at step 484

  • LM loss avg ≈ 1.51 (down from 5.4 at the start of training)
  • LM loss min ≈ 1.24 (on a very long sample)
  • raw=8 plateau reached, next: raw=16
  • Inference: coherent prose for advice questions, math/code still broken (TinyLlama base limitation), </think> closure unreliable
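The plateau-driven step from raw=8 to raw=16 can be pictured as a simple schedule lookup. The sketch below is one possible plateau rule, not the logic behind --auto_raw: the function name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical.

```python
RAW_WINDOW_SCHEDULE = [1, 2, 4, 8, 16, 32]  # curriculum from the Architecture section


def next_raw_window(current, losses, patience=3, min_delta=0.01):
    """Advance raw_window if the last `patience` losses show no real improvement.

    "No real improvement" means the best recent loss is not at least
    `min_delta` below the best loss seen before the recent window.
    """
    if len(losses) <= patience or current not in RAW_WINDOW_SCHEDULE:
        return current
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    plateaued = best_before - best_recent < min_delta
    idx = RAW_WINDOW_SCHEDULE.index(current)
    if plateaued and idx + 1 < len(RAW_WINDOW_SCHEDULE):
        return RAW_WINDOW_SCHEDULE[idx + 1]
    return current


flat = [1.60, 1.52, 1.51, 1.51, 1.52, 1.51]    # illustrative values only
falling = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0]
print(next_raw_window(8, flat), next_raw_window(8, falling))
```

Keeping the schedule as a list makes the final stage explicit: once raw_window reaches 32, the function returns it unchanged.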

Limitations

  • TinyLlama base struggles with arithmetic; better math/code output would depend on additional training data
  • Closure problem: the model opens <think> but does not reliably emit the closing </think>
  • Trained on 1 GPU (RTX 3090, $0.15/hr Vast.ai), batch=24-32