SmallThinker-4BA0.6B-Instruct REAP 0.25

This repository contains a REAP-pruned checkpoint derived from Tiiny/SmallThinker-4BA0.6B-Instruct.

The files in this repository are the model files directly at repository root, including the safetensors shards, tokenizer files, config, and custom SmallThinker modeling code.

Creation Notes

This pruned checkpoint was prepared in Codex with GPT5.5 assistance at the repository owner's direction. Codex was used to adapt the REAP workflow for SmallThinker, run pruning and smoke evaluation, and prepare the upload artifacts.

Pruning Summary

  • Base model: Tiiny/SmallThinker-4BA0.6B-Instruct
  • Pruning method: REAP layerwise expert pruning
  • Calibration dataset: theblackcat102/evol-codealpaca-v1
  • Requested compression ratio: 0.25
  • Effective experts pruned per layer: 8 / 32
  • Primary experts retained per layer: 24
  • Active experts per token: 4
  • Router weight renormalization: enabled
  • Calibration settings:
    • model_max_length=2048
    • batches_per_category=128
    • batch_size=1
    • batch_group_size=8
    • truncate=false

Local Smoke Evaluation

Greedy generation was checked on Japanese, English, and Chinese prompts.

Language Language check Notes
Japanese OK Understands the language, but output can become repetitive or partially degraded.
English OK Most stable among the three tested languages.
Chinese OK Produces Chinese answers, though sentence-count instructions may not be followed exactly.

Average generation time in the local smoke run was about 11.512 seconds across the three prompts on the test machine with CPU offload.

Usage

This model uses custom code, so load it with trust_remote_code=True.

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "sasa2000/SmallThinker-4BA0.6B-Instruct-REAP-0.25"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

Colab / Transformers Compatibility

This checkpoint uses Hugging Face custom modeling code. If loading in Google Colab or another notebook environment fails while importing modeling_smallthinker.py with an error such as cannot import name 'HybridCache' from 'transformers.cache_utils', the installed Transformers package is too old for the SmallThinker custom code. Upgrade Transformers and restart the runtime before loading the model:

!pip -q install -U "transformers>=4.55.0" "accelerate>=1.7.0" "safetensors"

After restarting, this import should succeed:

from transformers.cache_utils import HybridCache

If your installed Transformers version raises a later import error involving LossKwargs, upgrade Transformers or apply an equivalent compatibility shim. The local pruning run was tested with transformers==4.55.0 plus a REAP-side compatibility shim. GGUF runtimes may work even when this Python loading path fails, because GGUF does not execute Hugging Face modeling_smallthinker.py.

Caveats

This is an experimental pruned checkpoint. It was validated with load and short generation smoke tests, not a full benchmark suite. Quality can vary by language and task, especially in Japanese after pruning.

Downloads last month
41
Safetensors
Model size
3B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sasa2000/SmallThinker-4BA0.6B-Instruct-REAP-0.25

Finetuned
(3)
this model