SmolLM-135M-SFT-exp01

Supervised fine-tuning of SmolLM-135M-CPT-LoRA-r32 on 300K synthetic ML paper instruction pairs. The result is a structured research-assistant API for ML papers, not a general chatbot.

This is exp01 in a series of SFT experiments on top of the CPT-adapted SmolLM-135M.


Full Pipeline

arXiv ML papers (188)
        │
        ▼
   text-albumentations
   (chunking + constrained synthetic generation)
        │
        ▼
paperbd/paper_instructions_300K-v1
   (300K instruction-response pairs)
        │
        ▼
   SFT training (LoRA r=32, ChatML, train_on_responses_only)
        │
        ▼
   SmolLM-135M-SFT-exp01
        │
        ▼
   PaperResearcher API (10 structured tasks)

Model Description

  • Base model: paperbd/smollm_135M_arxiv_cpt (SmolLM-135M after continued pre-training on arXiv ML papers)
  • Method: Supervised Fine-Tuning (SFT) with LoRA + train_on_responses_only
  • Domain: ML/arXiv paper research tasks
  • Design: restricted API with 10 fixed task types, not a general chatbot

Data Generation Pipeline

The training dataset was built from raw arXiv ML papers using a synthetic data generation pipeline:

1. Chunking

Raw paper text is split into overlapping 500-word chunks (100-word overlap between consecutive chunks) to create manageable context windows for generation.
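
The chunking step above can be sketched as follows. The 500/100 word counts come from the card; the function name and exact boundary handling are illustrative, not the released pipeline code:

```python
# Split raw paper text into overlapping word windows: each chunk holds
# `size` words, and consecutive chunks share `overlap` words.
def chunk_words(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    words = text.split()
    step = size - overlap  # advance 400 words per chunk with the defaults
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window reached the end
            break
    return chunks
```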

2. Augmentation with text-albumentations

Each chunk is passed through a set of stochastic augmentation tasks. Each task fires independently with 25% probability per chunk, which keeps the generated dataset diverse:

Task                        Description                                      Output type
bullet_augmentation         Extract key points as markdown bullets           list[str]
qa_pair_augmentation        Generate question-answer pairs                   list[QAPair]
rephrase_augmentation       Elaborate and restate the passage                str
continuation_augmentation   Continue from a passage prefix                   str
triplet_augmentation        Extract knowledge graph triplets                 list[Triplet]
retrieval_augmentation      Cross-chunk: which passage answers a question    RetrievalResult
comparison_augmentation     Cross-chunk: compare two passages                str
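
A minimal sketch of the per-chunk task gating described above: each task is sampled independently at probability 0.25. The task names mirror the table; the sampling logic itself is an assumption about how the pipeline works:

```python
import random

# Names from the augmentation-task table above.
TASKS = [
    "bullet_augmentation", "qa_pair_augmentation", "rephrase_augmentation",
    "continuation_augmentation", "triplet_augmentation",
    "retrieval_augmentation", "comparison_augmentation",
]

def select_tasks(rng: random.Random, p: float = 0.25) -> list[str]:
    # Each task fires independently with probability p for this chunk,
    # so a chunk may yield zero, one, or several augmentation outputs.
    return [task for task in TASKS if rng.random() < p]
```

With 7 tasks at 25% each, a chunk produces about 1.75 augmentation outputs on average.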

3. Constrained Decoding via Outlines

All generation during data prep uses Outlines, a constrained decoding library that guarantees generated outputs match a predefined schema (Pydantic model or regex). This ensures:

  • QA pairs always have valid question / answer fields
  • Triplets always follow (subject, relation, object) format
  • Retrieval results always return a valid passage index

Default runtime: mlx-community/Qwen3.5-4B-OptiQ-4bit via MLX (Apple Silicon). Async and batch variants are available for large-scale generation.
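
To make the schema guarantee concrete, here is what the triplet constraint buys, expressed as a post-hoc check. Outlines builds the constraint into decoding itself, so outputs match by construction; the regex below is an illustrative pattern, not the exact schema used in the pipeline:

```python
import re

# Illustrative (subject, relation, object) pattern: commas separate the
# three fields, so fields themselves may not contain commas.
TRIPLET_RE = re.compile(
    r"^\((?P<subject>[^,]+), (?P<relation>[^,]+), (?P<object>[^,)]+)\)$"
)

def parse_triplet(raw: str) -> tuple[str, str, str]:
    m = TRIPLET_RE.match(raw)
    if m is None:
        raise ValueError(f"malformed triplet: {raw!r}")
    return (m["subject"], m["relation"], m["object"])
```

With Outlines, the `ValueError` branch is unreachable because the decoder can only emit strings the pattern accepts.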

4. Dataset

The final dataset paperbd/paper_instructions_300K-v1 contains 300K instruction-response pairs across all task types, uploaded to HuggingFace for reuse.


Training Details

Parameter                 Value
LoRA rank                 32
LoRA alpha                32
Target modules            q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params          ~9.7M / 144M (6.77%)
Quantization              4-bit (QLoRA via Unsloth)
Batch size                32
Gradient accumulation     4 (effective batch: 128)
Learning rate             2e-4 (linear decay)
Warmup ratio              0.03
Epochs                    3
Total steps               11,355
Sequence length           2048 (packed)
Chat template             ChatML
Response-only training    Yes (loss on assistant turns only)
Data variations           2 (conversation extension) → ~600K effective examples
Hardware                  NVIDIA RTX 4090
Training time             ~10 hours
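
A sketch of what "response-only training" means in practice: the loss is computed only on assistant tokens, with every other position masked to -100 so cross-entropy ignores it. The turn representation here is simplified; the real pipeline operates on ChatML-tokenised sequences:

```python
IGNORE_INDEX = -100  # the label value PyTorch cross-entropy skips

def mask_labels(turns: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """Build (input_ids, labels) where only assistant tokens carry loss."""
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # these positions contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return input_ids, labels
```

System and user tokens still appear in the input (the model conditions on them); they are only excluded from the loss.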

Evaluation

Method

1,000 samples were drawn from the paper_instructions_300K-v1 test split. The fine-tuned model generated responses, which were then scored by grok-3-mini acting as an LLM judge.

Judge Prompt (4 dimensions, 1–5 scale)

  • Faithfulness: does the response contain only factually correct claims? Penalise hallucinations.
  • Answer Correctness: how closely does the response match the ground truth semantically?
  • Relevance: does the response directly address what was asked, without padding or going off-topic?
  • Completeness: does the response cover the key points from the ground truth without omitting important details?

Results

Metric              Score (1–5)
Faithfulness        2.70
Answer Correctness  1.98
Relevance           3.04
Completeness        1.85
Overall             2.39
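
The Overall figure is consistent with an unweighted mean of the four judge dimensions. The card does not state the aggregation rule, so treat this as an assumption:

```python
# Per-dimension judge scores from the results table above.
scores = {"faithfulness": 2.70, "answer_correctness": 1.98,
          "relevance": 3.04, "completeness": 1.85}

# Assumed aggregation: plain unweighted mean across the four dimensions.
overall = round(sum(scores.values()) / len(scores), 2)
print(overall)  # 2.39
```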

Interpretation: Relevance is the strongest dimension; the model stays on topic. Answer correctness and completeness are limited by the 135M parameter count: the model understands task structure but struggles to recall and reproduce factual content precisely.


PaperResearcher API

The model is designed to be used as a structured API, not a free-form chatbot. The PaperResearcher class exposes 10 typed methods, each using the exact instruction strings the model was trained on:

from paper_researcher import PaperResearcher

researcher = PaperResearcher("JaydeepR/SmolLM-135M-SFT-exp01")
passage = "Attention mechanisms compute weighted sums of values..."

# Extract key points
bullets: list[str] = researcher.extract_bullets(passage)

# Generate Q&A pairs
pairs: list[QAPair] = researcher.generate_qa_pairs(passage)
# → [QAPair(question="What does attention compute?", answer="Weighted sums of values")]

# Extract knowledge graph triplets
triplets: list[Triplet] = researcher.extract_triplets(passage)
# → [Triplet(subject="attention", relation="computes", object="weighted sums")]

# Answer a question given a passage
answer: str = researcher.answer("What does attention compute?", passage)

# Rephrase and elaborate
rephrased: str = researcher.rephrase(passage)

# Continue a passage from its beginning
continuation: str = researcher.continue_from(passage[:200])

# Extract a single key fact
fact: str = researcher.extract_fact(passage)

# Generate a question from a passage
question: str = researcher.generate_question(passage)

# Compare two passages (passage_b is a second illustrative snippet)
passage_a, passage_b = passage, "Convolutional layers apply learned filters over local windows..."
comparison: str = researcher.compare(passage_a, passage_b)

# Retrieval: which passage answers the question?
result: RetrievalResult = researcher.find_relevant(question, [passage_a, passage_b])
# → RetrievalResult(index=0, reasoning="Passage 1 directly defines...")

Return Types

Method             Return Type       Description
extract_bullets    list[str]         Parsed bullet points
generate_qa_pairs  list[QAPair]      .question and .answer fields
extract_triplets   list[Triplet]     .subject, .relation, .object fields
find_relevant      RetrievalResult   .index (0-based), .reasoning
All others         str               Raw text response
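
The structured return types above can be pictured as simple records. These definitions are an illustrative sketch with field names taken from the table, not the package's actual classes (which may be Pydantic models):

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str

@dataclass
class Triplet:
    subject: str
    relation: str
    object: str  # shadows the builtin, but matches the documented field name

@dataclass
class RetrievalResult:
    index: int       # 0-based position of the answering passage
    reasoning: str   # judge-style justification for the choice
```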

Raw Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_id = "JaydeepR/SmolLM-135M-SFT-exp01"
base_model_id = "paperbd/smollm_135M_arxiv_cpt"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert in AI and ML research. Your answers are concise and helpful."},
    {"role": "user", "content": "Extract the important points from this passage as markdown bullet points.\n\nAttention mechanisms..."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1, no_repeat_ngram_size=4)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Limitations

  • 135M-parameter model: limited factual recall and reasoning capacity
  • Trained on synthetic data: instruction format matters, so use the exact prompts from tasks.py
  • Relevance is the strongest dimension (3.04/5); correctness and completeness are weak (< 2/5)
  • Best suited for structured extraction (bullets, triplets, QA) rather than open-ended generation
  • No comparison against the uninstructed base model yet; exp02 is planned

Related Models

Model                               Description
JaydeepR/SmolLM-135M-CPT-LoRA-r32   CPT base (this model's starting point)
HuggingFaceTB/SmolLM-135M           Original base model

Citation

@misc{smollm135m-sft-exp01,
  author = {Jaydeep Raijada},
  title  = {SmolLM-135M SFT exp01: Instruction Tuning on ML Paper Research Tasks},
  year   = {2026},
  url    = {https://huggingface.co/JaydeepR/SmolLM-135M-SFT-exp01}
}