llm_scs_3.5_35B

Domain-specialized fine-tune of unsloth/Qwen3.5-35B-A3B (a Qwen3.5 VL MoE: 35B total / ~3B active, 256 experts, hybrid linear+full attention, with a vision tower) on a corpus of spinal-cord-stimulation (SCS) and motor-recovery research papers.

This is the merged checkpoint (base + adapter folded in), text + vision, in BF16 — loadable directly with transformers / vLLM / Unsloth.

Training

  • Method: LoRA continued-pretraining, text decoder only (vision tower and MTP head left frozen/untouched).
  • Adapted modules: self-attention q/k/v/o_proj, gated-DeltaNet linear-attn projections, and the always-on shared-expert MLP (310 modules, ~21M params, 0.06% of the model). The 256 routed experts are fused tensors and were not adapted.
  • Data: 11 markdown papers on epidural/cervical SCS, motor recovery, and stroke rehabilitation, packed into 1024-token sequences.
  • Schedule: 2 epochs, LoRA r=16 / α=32, lr 2e-4 cosine, bf16 + gradient checkpointing, effective batch 8. Trained across 2× RTX A6000.
  • Result: train loss 1.48, eval loss 1.53 → 1.49.

Notes

  • The base model's multi-token-prediction (MTP) head is not included (mtp_num_hidden_layers: 0); it is used only for speculative decoding and does not affect generation.
  • License and intended use follow the base model unsloth/Qwen3.5-35B-A3B.
  • Research artifact; not a medical device and not for clinical decision-making.

Usage

from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained(
    "achuthc1298/llm_scs_3.5_35B", dtype="bfloat16", device_map="auto",
)
proc = AutoProcessor.from_pretrained("achuthc1298/llm_scs_3.5_35B")
Downloads last month
24
Safetensors
Model size
35B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for achuthc1298/llm_scs_3.5_35B

Adapter
(11)
this model