Phase15-DeepSeek-FFT (WIP step 484)
Status: Work-in-progress training snapshot. Not a release.
On-device reasoning model: TinyLlama-1.1B with HyperNetwork-driven soft prompt + small raw-token window. Target: iPhone deployment (~3GB RAM limit), distilled from DeepSeek-R1 traces (dolphin-r1).
Architecture
For each answer token position k:
    sp_k = HyperNet(sp_{k-1}.detach(), embed(a[k - raw_window - 1]))  # recurrent soft-prompt update
    LLM input = [sp_k (128 soft tokens), a[k-raw_window:k] (last raw_window raw tokens)]
    query KV cache held separately
- S = 128 soft tokens × 2048 dim
- raw_window curriculum: 1 → 2 → 4 → 8 → 16 → 32 (auto-bumped on plateau)
- Multi-position auxiliary loss at sp_last + each raw[i] position
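
The indexing in the update rule above can be sketched in plain Python. This is an illustration only, not code from `train_pure_simple.py`: the function name `step_inputs` is hypothetical, and returning `None` when `k - raw_window - 1` is negative is an assumption, since the card does not say how the first positions are handled.

```python
from typing import List, Optional, Tuple


def step_inputs(answer: List[int], k: int, raw_window: int) -> Tuple[Optional[int], List[int]]:
    """For answer position k, return (token absorbed by the hypernet, raw window).

    The raw window is a[k - raw_window : k]. The absorbed token is
    a[k - raw_window - 1], i.e. the token that has just slid out of the window.
    """
    absorbed_idx = k - raw_window - 1
    # Assumption: no token is absorbed until one has actually left the window.
    # Without this guard a negative index would silently wrap around in Python.
    absorbed = answer[absorbed_idx] if absorbed_idx >= 0 else None
    window = answer[max(0, k - raw_window):k]
    return absorbed, window


answer = [10, 11, 12, 13, 14, 15]
print(step_inputs(answer, k=4, raw_window=2))  # (11, [12, 13])
print(step_inputs(answer, k=1, raw_window=2))  # (None, [10])
```

Every token therefore reaches the LLM either verbatim (while inside the window) or compressed (after the hypernet has folded it into the soft prompt), which is why a wider `raw_window` makes the task easier for the soft prompt.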
Files
| File | Purpose |
|---|---|
| `ckpt_step484.pt` | Latest WIP checkpoint (7.5GB). Contains hypernet, llm, optimizer, scheduler, raw_window, samples_processed, global_step. |
| `train_pure_simple.py` | Training script. Supports --auto_raw, --filter_cjk, --filter_toolcall, samples-skip on resume. |
| `auto_train.sh` | Wrapper: auto-restart on OOM (batch -4 fallback), reads raw_window from ckpt, sets batch from a static table. |
| `test_inference.py` | Sampling inference with rep_penalty + no-repeat ngram (loop suppression). |
| `test_sp_collapse.py` | Verifies the SP is not collapsed into a raw-equivalent (cosine + null/noise sp variants). |
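
For readers unfamiliar with the two loop-suppression techniques named for `test_inference.py`, here is a minimal sketch of their common formulations. It is not taken from the script: the function names are hypothetical, logits are modelled as a plain dict for readability, and the penalty follows the usual convention of dividing positive logits and multiplying negative ones.

```python
from typing import Dict, List, Set


def banned_next_tokens(generated: List[int], n: int) -> Set[int]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    banned: Set[int] = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


def apply_repetition_penalty(
    logits: Dict[int, float], generated: List[int], penalty: float
) -> Dict[int, float]:
    """Make already-generated tokens less likely (penalty > 1)."""
    out = dict(logits)
    for tok in set(generated):
        if tok in out:
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


print(banned_next_tokens([1, 2, 3, 1, 2], n=3))  # {3}
print(apply_repetition_penalty({3: 2.0, 4: -1.0, 5: 0.5}, [3, 4], 2.0))
# {3: 1.0, 4: -2.0, 5: 0.5}
```

In a sampler, the banned tokens would typically have their logits set to negative infinity before the softmax, while the penalty only rescales them.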
Reproduce
# Resume training (raw_window read from ckpt)
nohup ./auto_train.sh >> auto_train.log 2>&1 &
# Inference
python test_inference.py --ckpt ckpt_step484.pt --max_new_tokens 400
# SP collapse verification
python test_sp_collapse.py --ckpt ckpt_step484.pt
Training data
- local: `/workspace/.hf_home/normalized/dolphin_r1.jsonl` (300k → 270k after CJK + tool_call filter)
- Source: cognitivecomputations/dolphin-r1, normalized so `output = f"<think>\n{reasoning}\n</think>\n\n{answer}"`
- Curriculum: sort by answer character length, short-first
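
The preparation steps listed above could look roughly like the sketch below. Several details are assumptions rather than facts from this repo: the record keys `reasoning` and `answer`, the Unicode ranges used to detect CJK text, and sorting on the `answer` field alone. The tool_call filter is left out because its criterion is not stated here.

```python
import re
from typing import Dict, List

# Assumed definition of "CJK": Hiragana/Katakana, CJK ideographs (incl. Ext. A), Hangul.
CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def normalize(reasoning: str, answer: str) -> str:
    """Build the training target in the format given in the model card."""
    return f"<think>\n{reasoning}\n</think>\n\n{answer}"


def build_curriculum(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records containing CJK characters, then order short answers first."""
    kept = [r for r in records if not CJK_RE.search(r["reasoning"] + r["answer"])]
    kept.sort(key=lambda r: len(r["answer"]))  # character length, not token count
    return [
        {"output": normalize(r["reasoning"], r["answer"]), "answer": r["answer"]}
        for r in kept
    ]


records = [
    {"reasoning": "r1", "answer": "long answer"},
    {"reasoning": "思考", "answer": "x"},  # removed by the CJK filter
    {"reasoning": "r2", "answer": "ok"},
]
for row in build_curriculum(records):
    print(repr(row["output"]))
```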
Progress at step 484
- lm avg ≈ 1.51 (down from scratch 5.4)
- lm min ≈ 1.24 (super-long sample)
- raw=8 plateau reached, next: raw=16
- Inference: coherent prose for advice questions; math/code still broken (TinyLlama base limitation); `</think>` closure unreliable
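
The "plateau reached, bump raw_window" behaviour can be illustrated with a small state machine. This is a sketch under assumptions, not the logic behind `--auto_raw`: the class name, `patience`, and `min_delta` are hypothetical, and the actual plateau criterion used in training is not documented in this card.

```python
from typing import Sequence


class RawWindowCurriculum:
    """Advance raw_window through fixed stages when the loss stops improving."""

    def __init__(
        self,
        stages: Sequence[int] = (1, 2, 4, 8, 16, 32),
        patience: int = 3,        # hypothetical: evaluations without improvement
        min_delta: float = 0.01,  # hypothetical: minimum drop that counts
        start: int = 1,
    ) -> None:
        self.stages = list(stages)
        self.idx = self.stages.index(start)
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale = 0

    @property
    def raw_window(self) -> int:
        return self.stages[self.idx]

    def update(self, loss: float) -> int:
        """Record one averaged loss value and return the raw_window to use next."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
        if self.stale >= self.patience and self.idx < len(self.stages) - 1:
            self.idx += 1
            # A wider window changes the task, so the loss baseline is reset.
            self.best = float("inf")
            self.stale = 0
        return self.raw_window


cur = RawWindowCurriculum(patience=2, start=8)
print([cur.update(x) for x in (1.60, 1.51, 1.51, 1.505)])  # [8, 8, 8, 16]
```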
Limitations
- TinyLlama base struggles with arithmetic; math/code rely on additional data
- Closure problem: `<think>` opens but model fails to commit to `</think>` reliably
- Trained on 1 GPU (RTX 3090, $0.15/hr Vast.ai), batch=24-32
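
One simple way to track the closure problem across generated samples is to measure how often a `<think>` block is actually closed. The helpers below are hypothetical and not part of this repo; they only inspect the generated text.

```python
import re
from typing import List

TAG_RE = re.compile(r"</?think>")


def think_block_closed(text: str) -> bool:
    """True if the last <think> in `text` is followed by a </think>.

    Text without any <think> tag counts as closed.
    """
    is_open = False
    for match in TAG_RE.finditer(text):
        is_open = match.group() == "<think>"
    return not is_open


def closure_rate(samples: List[str]) -> float:
    """Fraction of samples whose reasoning block is closed."""
    if not samples:
        return 0.0
    return sum(think_block_closed(s) for s in samples) / len(samples)


samples = [
    "<think>\nstep 1\n</think>\n\nfinal answer",
    "<think>\nstill reasoning when max_new_tokens ran out",
]
print(closure_rate(samples))  # 0.5
```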
Model tree for baya1116/Phase15-DeepSeek-FFT
Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0