SmolLM-135M-SFT-exp01

Supervised fine-tuning of SmolLM-135M-CPT-LoRA-r32 on 300K synthetic ML paper instruction pairs. The result is a structured research-assistant API for ML papers, not a general chatbot.

This is exp01 in a series of SFT experiments on top of the CPT-adapted SmolLM-135M.


Full Pipeline

arXiv ML papers (188)
        │
        ▼
   text-albumentations
   (chunking + constrained synthetic generation)
        │
        ▼
paperbd/paper_instructions_300K-v1
   (300K instruction-response pairs)
        │
        ▼
   SFT training (LoRA r=32, ChatML, train_on_responses_only)
        │
        ▼
   SmolLM-135M-SFT-exp01
        │
        ▼
   PaperResearcher API (10 structured tasks)

Model Description

  • Base model: paperbd/smollm_135M_arxiv_cpt (SmolLM-135M after continued pre-training on arXiv ML papers)
  • Method: Supervised Fine-Tuning (SFT) with LoRA + train_on_responses_only
  • Domain: ML/arXiv paper research tasks
  • Design: restricted API with 10 fixed task types, not a general chatbot

Data Generation Pipeline

The training dataset was built from raw arXiv ML papers using a synthetic data generation pipeline:

1. Chunking

Raw paper text is split into overlapping 500-word chunks (100-word overlap between consecutive chunks) to create manageable context windows for generation.
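
The chunking step above can be sketched as follows. The 500/100 word counts come from the card; the function name and exact boundary handling are illustrative, not the released pipeline code:

```python
# Split raw paper text into overlapping word windows: each chunk holds
# `size` words, and consecutive chunks share `overlap` words.
def chunk_words(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    words = text.split()
    step = size - overlap  # advance 400 words per chunk with the defaults
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window reached the end
            break
    return chunks
```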

2. Augmentation with text-albumentations

Each chunk is passed through a set of stochastic augmentation tasks. Each task fires independently with 25% probability per chunk, which keeps the generated dataset diverse:

Task                        Description                                      Output type
bullet_augmentation         Extract key points as markdown bullets           list[str]
qa_pair_augmentation        Generate question-answer pairs                   list[QAPair]
rephrase_augmentation       Elaborate and restate the passage                str
continuation_augmentation   Continue from a passage prefix                   str
triplet_augmentation        Extract knowledge graph triplets                 list[Triplet]
retrieval_augmentation      Cross-chunk: which passage answers a question    RetrievalResult
comparison_augmentation     Cross-chunk: compare two passages                str
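
A minimal sketch of the per-chunk task gating described above: each task is sampled independently at probability 0.25. The task names mirror the table; the sampling logic itself is an assumption about how the pipeline works:

```python
import random

# Names from the augmentation-task table above.
TASKS = [
    "bullet_augmentation", "qa_pair_augmentation", "rephrase_augmentation",
    "continuation_augmentation", "triplet_augmentation",
    "retrieval_augmentation", "comparison_augmentation",
]

def select_tasks(rng: random.Random, p: float = 0.25) -> list[str]:
    # Each task fires independently with probability p for this chunk,
    # so a chunk may yield zero, one, or several augmentation outputs.
    return [task for task in TASKS if rng.random() < p]
```

With 7 tasks at 25% each, a chunk produces about 1.75 augmentation outputs on average.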

3. Constrained Decoding via Outlines

All generation during data prep uses Outlines, a constrained decoding library that guarantees generated outputs match a predefined schema (Pydantic model or regex). This ensures:

  • QA pairs always have valid question / answer fields
  • Triplets always follow (subject, relation, object) format
  • Retrieval results always return a valid passage index

Default runtime: mlx-community/Qwen3.5-4B-OptiQ-4bit via MLX (Apple Silicon). Async and batch variants are available for large-scale generation.
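
To make the schema guarantee concrete, here is what the triplet constraint buys, expressed as a post-hoc check. Outlines builds the constraint into decoding itself, so outputs match by construction; the regex below is an illustrative pattern, not the exact schema used in the pipeline:

```python
import re

# Illustrative (subject, relation, object) pattern: commas separate the
# three fields, so fields themselves may not contain commas.
TRIPLET_RE = re.compile(
    r"^\((?P<subject>[^,]+), (?P<relation>[^,]+), (?P<object>[^,)]+)\)$"
)

def parse_triplet(raw: str) -> tuple[str, str, str]:
    m = TRIPLET_RE.match(raw)
    if m is None:
        raise ValueError(f"malformed triplet: {raw!r}")
    return (m["subject"], m["relation"], m["object"])
```

With Outlines, the `ValueError` branch is unreachable because the decoder can only emit strings the pattern accepts.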

4. Dataset

The final dataset paperbd/paper_instructions_300K-v1 contains 300K instruction-response pairs across all task types, uploaded to HuggingFace for reuse.


Training Details

Parameter                 Value
LoRA rank                 32
LoRA alpha                32
Target modules            q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params          ~9.7M / 144M (6.77%)
Quantization              4-bit (QLoRA via Unsloth)
Batch size                32
Gradient accumulation     4 (effective batch: 128)
Learning rate             2e-4 (linear decay)
Warmup ratio              0.03
Epochs                    3
Total steps               11,355
Sequence length           2048 (packed)
Chat template             ChatML
Response-only training    Yes (loss on assistant turns only)
Data variations           2 (conversation extension) → ~600K effective examples
Hardware                  NVIDIA RTX 4090
Training time             ~10 hours
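
A sketch of what "response-only training" means in practice: the loss is computed only on assistant tokens, with every other position masked to -100 so cross-entropy ignores it. The turn representation here is simplified; the real pipeline operates on ChatML-tokenised sequences:

```python
IGNORE_INDEX = -100  # the label value PyTorch cross-entropy skips

def mask_labels(turns: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """Build (input_ids, labels) where only assistant tokens carry loss."""
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # these positions contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return input_ids, labels
```

System and user tokens still appear in the input (the model conditions on them); they are only excluded from the loss.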

Evaluation

Method

1,000 samples were drawn from the paper_instructions_300K-v1 test split. The fine-tuned model generated responses, which were then scored by grok-3-mini acting as an LLM judge.

Judge Prompt (4 dimensions, 1–5 scale)

  • Faithfulness: does the response contain only factually correct claims? Penalise hallucinations.
  • Answer Correctness: how closely does the response match the ground truth semantically?
  • Relevance: does the response directly address what was asked, without padding or going off-topic?
  • Completeness: does the response cover the key points from the ground truth without omitting important details?

Results

Metric              Score (1–5)
Faithfulness        2.70
Answer Correctness  1.98
Relevance           3.04
Completeness        1.85
Overall             2.39
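
The Overall figure is consistent with an unweighted mean of the four judge dimensions. The card does not state the aggregation rule, so treat this as an assumption:

```python
# Per-dimension judge scores from the results table above.
scores = {"faithfulness": 2.70, "answer_correctness": 1.98,
          "relevance": 3.04, "completeness": 1.85}

# Assumed aggregation: plain unweighted mean across the four dimensions.
overall = round(sum(scores.values()) / len(scores), 2)
print(overall)  # 2.39
```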

Interpretation: Relevance is the strongest dimension; the model stays on topic. Answer correctness and completeness are limited by the 135M parameter count: the model understands task structure but struggles to recall and reproduce factual content precisely.


PaperResearcher API

The model is designed to be used as a structured API, not a free-form chatbot. The PaperResearcher class exposes 10 typed methods, each using the exact instruction strings the model was trained on:

from paper_researcher import PaperResearcher

researcher = PaperResearcher("JaydeepR/SmolLM-135M-SFT-exp01")
passage = "Attention mechanisms compute weighted sums of values..."

# Extract key points
bullets: list[str] = researcher.extract_bullets(passage)

# Generate Q&A pairs
pairs: list[QAPair] = researcher.generate_qa_pairs(passage)
# → [QAPair(question="What does attention compute?", answer="Weighted sums of values")]

# Extract knowledge graph triplets
triplets: list[Triplet] = researcher.extract_triplets(passage)
# → [Triplet(subject="attention", relation="computes", object="weighted sums")]

# Answer a question given a passage
answer: str = researcher.answer("What does attention compute?", passage)

# Rephrase and elaborate
rephrased: str = researcher.rephrase(passage)

# Continue a passage from its beginning
continuation: str = researcher.continue_from(passage[:200])

# Extract a single key fact
fact: str = researcher.extract_fact(passage)

# Generate a question from a passage
question: str = researcher.generate_question(passage)

# Compare two passages (passage_b is a second illustrative snippet)
passage_a, passage_b = passage, "Convolutional layers apply learned filters over local windows..."
comparison: str = researcher.compare(passage_a, passage_b)

# Retrieval: which passage answers the question?
result: RetrievalResult = researcher.find_relevant(question, [passage_a, passage_b])
# → RetrievalResult(index=0, reasoning="Passage 1 directly defines...")

Return Types

Method             Return Type       Description
extract_bullets    list[str]         Parsed bullet points
generate_qa_pairs  list[QAPair]      .question and .answer fields
extract_triplets   list[Triplet]     .subject, .relation, .object fields
find_relevant      RetrievalResult   .index (0-based), .reasoning
All others         str               Raw text response
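
The structured return types above can be pictured as simple records. These definitions are an illustrative sketch with field names taken from the table, not the package's actual classes (which may be Pydantic models):

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str

@dataclass
class Triplet:
    subject: str
    relation: str
    object: str  # shadows the builtin, but matches the documented field name

@dataclass
class RetrievalResult:
    index: int       # 0-based position of the answering passage
    reasoning: str   # judge-style justification for the choice
```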

Raw Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_id = "JaydeepR/SmolLM-135M-SFT-exp01"
base_model_id = "paperbd/smollm_135M_arxiv_cpt"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert in AI and ML research. Your answers are concise and helpful."},
    {"role": "user", "content": "Extract the important points from this passage as markdown bullet points.\n\nAttention mechanisms..."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1, no_repeat_ngram_size=4)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Limitations

  • 135M-parameter model: limited factual recall and reasoning capacity
  • Trained on synthetic data: instruction format matters, so use the exact prompts from tasks.py
  • Relevance is the strongest dimension (3.04/5); correctness and completeness are weak (< 2/5)
  • Best suited for structured extraction (bullets, triplets, QA) rather than open-ended generation
  • No comparison against the uninstructed base model yet; exp02 is planned

Related Models

Model                               Description
JaydeepR/SmolLM-135M-CPT-LoRA-r32   CPT base (this model's starting point)
HuggingFaceTB/SmolLM-135M           Original base model

Citation

@misc{smollm135m-sft-exp01,
  author = {Jaydeep Raijada},
  title  = {SmolLM-135M SFT exp01: Instruction Tuning on ML Paper Research Tasks},
  year   = {2026},
  url    = {https://huggingface.co/JaydeepR/SmolLM-135M-SFT-exp01}
}