# SmolLM-135M-SFT-exp01
Supervised fine-tuning of SmolLM-135M-CPT-LoRA-r32 on 300K synthetic ML paper instruction pairs. The result is a structured research assistant API for ML papers, not a general chatbot.
This is exp01 in a series of SFT experiments on top of the CPT-adapted SmolLM-135M.
## Full Pipeline

```
arXiv ML papers (188)
        │
        ▼
text-albumentations
(chunking + constrained synthetic generation)
        │
        ▼
paperbd/paper_instructions_300K-v1
(300K instruction-response pairs)
        │
        ▼
SFT training (LoRA r=32, ChatML, train_on_responses_only)
        │
        ▼
SmolLM-135M-SFT-exp01
        │
        ▼
PaperResearcher API (10 structured tasks)
```
## Model Description

- Base model: `paperbd/smollm_135M_arxiv_cpt`, SmolLM-135M after continued pre-training on arXiv ML papers
- Method: Supervised Fine-Tuning (SFT) with LoRA + `train_on_responses_only`
- Domain: ML/arXiv paper research tasks
- Design: restricted API with 10 fixed task types, not a general chatbot
## Data Generation Pipeline
The training dataset was built from raw arXiv ML papers using a synthetic data generation pipeline:
### 1. Chunking

Raw paper text is split into overlapping 500-word chunks (100-word overlap) to create manageable context windows for generation.
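A minimal sketch of this sliding-window split; the function name and whitespace tokenization are illustrative, not the actual pipeline code:

```python
def chunk_words(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping word-level chunks (500 words, 100-word overlap)."""
    words = text.split()
    step = size - overlap  # advance 400 words per chunk
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```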
### 2. Augmentation with text-albumentations

Each chunk is passed through stochastic augmentation tasks. Each task runs independently with 25% probability per chunk, which keeps the dataset diverse (a selection sketch follows the table):
| Task | Description | Output type |
|---|---|---|
| `bullet_augmentation` | Extract key points as markdown bullets | `list[str]` |
| `qa_pair_augmentation` | Generate question-answer pairs | `list[QAPair]` |
| `rephrase_augmentation` | Elaborate and restate the passage | `str` |
| `continuation_augmentation` | Continue from a passage prefix | `str` |
| `triplet_augmentation` | Extract knowledge graph triplets | `list[Triplet]` |
| `retrieval_augmentation` | Cross-chunk: which passage answers a question | `RetrievalResult` |
| `comparison_augmentation` | Cross-chunk: compare two passages | `str` |
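The per-chunk selection logic, sketched under the assumption that each task is an independent 25% draw (task names mirror the table; the real text-albumentations API may differ):

```python
import random

# Cross-chunk tasks (retrieval, comparison) pair chunks and are handled separately.
SINGLE_CHUNK_TASKS = [
    "bullet_augmentation", "qa_pair_augmentation", "rephrase_augmentation",
    "continuation_augmentation", "triplet_augmentation",
]

def sample_tasks(p: float = 0.25) -> list[str]:
    # Each task fires independently with probability p for a given chunk,
    # so a chunk can yield zero, one, or several training examples.
    return [task for task in SINGLE_CHUNK_TASKS if random.random() < p]
```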
### 3. Constrained Decoding via Outlines

All generation during data prep uses Outlines, a constrained-decoding library that guarantees the generator returns outputs matching a predefined schema (a Pydantic model or regex). This ensures:

- QA pairs always have valid `question`/`answer` fields
- Triplets always follow the `(subject, relation, object)` format
- Retrieval results always return a valid passage index
Default runtime: mlx-community/Qwen3.5-4B-OptiQ-4bit via MLX (Apple Silicon). Async and batch variants are available for large-scale generation.
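A minimal sketch of schema-constrained generation with Outlines (pre-1.0 API; the generator model and the exact QAPair schema here are illustrative):

```python
from pydantic import BaseModel
import outlines

class QAPair(BaseModel):
    question: str
    answer: str

# Any transformers-compatible model works; this one is a stand-in.
model = outlines.models.transformers("HuggingFaceTB/SmolLM-135M")
generator = outlines.generate.json(model, QAPair)

# Decoding is constrained token-by-token, so the output always parses as a QAPair.
pair = generator("Write a question-answer pair about attention mechanisms.")
print(pair.question, pair.answer)
```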
### 4. Dataset

The final dataset, paperbd/paper_instructions_300K-v1, contains 300K instruction-response pairs across all task types and is uploaded to HuggingFace for reuse.
## Training Details
| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | ~9.7M / 144M (6.77%) |
| Quantization | 4-bit (QLoRA via Unsloth) |
| Batch size | 32 |
| Gradient accumulation | 4 (effective batch: 128) |
| Learning rate | 2e-4 (linear decay) |
| Warmup ratio | 0.03 |
| Epochs | 3 |
| Total steps | 11,355 |
| Sequence length | 2048 (packed) |
| Chat template | ChatML |
| Response-only training | Yes (loss on assistant turns only) |
| Data variations | 2 (conversation extension) → ~600K effective examples |
| Hardware | NVIDIA RTX 4090 |
| Training time | ~10 hours |
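A hedged sketch of this setup with Unsloth + TRL. The hyperparameters come from the table above; the script structure, dataset formatting, and ChatML marker strings are assumptions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel
from unsloth.chat_templates import train_on_responses_only

model, tokenizer = FastLanguageModel.from_pretrained(
    "paperbd/smollm_135M_arxiv_cpt",  # CPT base
    max_seq_length=2048,
    load_in_4bit=True,                # QLoRA
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumes the dataset is rendered to ChatML conversations before training.
dataset = load_dataset("paperbd/paper_instructions_300K-v1", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=32,
        gradient_accumulation_steps=4,  # effective batch 128
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        warmup_ratio=0.03,
        num_train_epochs=3,
        packing=True,
    ),
)
# Mask the loss on everything except assistant turns (ChatML delimiters).
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user\n",
    response_part="<|im_start|>assistant\n",
)
trainer.train()
```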
## Evaluation

### Method

1000 samples are drawn from the paper_instructions_300K-v1 test split. The fine-tuned model generates responses, which are then scored by grok-3-mini as an LLM judge.
### Judge Prompt (4 dimensions, 1–5 scale)

- Faithfulness: does the response contain only factually correct claims? Penalise hallucinations.
- Answer Correctness: how closely does the response match the ground truth semantically?
- Relevance: does the response directly address what was asked, without padding or going off-topic?
- Completeness: does the response cover the key points from the ground truth without omitting important details?
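A hypothetical sketch of the judge call. The prompt wording, JSON output contract, and the OpenAI-compatible endpoint are assumptions; only the judge model (grok-3-mini), the four dimensions, and the 1–5 scale come from this card:

```python
import json
from openai import OpenAI  # xAI exposes an OpenAI-compatible API

client = OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_API_KEY")

# Illustrative prompt; the actual judge prompt wording is not published here.
JUDGE_PROMPT = """Score the response against the ground truth on a 1-5 scale for
faithfulness, answer_correctness, relevance, and completeness.
Reply with JSON only, e.g. {{"faithfulness": 3, "answer_correctness": 2,
"relevance": 4, "completeness": 2}}.

Question: {question}
Ground truth: {reference}
Response: {response}"""

def judge(question: str, reference: str, response: str) -> dict:
    completion = client.chat.completions.create(
        model="grok-3-mini",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, response=response)}],
    )
    return json.loads(completion.choices[0].message.content)
```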
### Results

| Metric | Score (1–5) |
|---|---|
| Faithfulness | 2.70 |
| Answer Correctness | 1.98 |
| Relevance | 3.04 |
| Completeness | 1.85 |
| Overall | 2.39 |
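The overall score is consistent with the unweighted mean of the four dimensions: (2.70 + 1.98 + 3.04 + 1.85) / 4 ≈ 2.39.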
Interpretation: Relevance is the strongest dimension; the model stays on topic. Answer correctness and completeness are limited by the 135M parameter count: the model understands task structure but struggles to recall and reproduce factual content precisely.
## PaperResearcher API
The model is designed to be used as a structured API, not a free-form chatbot. The PaperResearcher class exposes 10 typed methods, each using the exact instruction strings the model was trained on:
```python
# QAPair, Triplet, and RetrievalResult are assumed importable from the same module.
from paper_researcher import PaperResearcher, QAPair, Triplet, RetrievalResult

researcher = PaperResearcher("JaydeepR/SmolLM-135M-SFT-exp01")

passage = "Attention mechanisms compute weighted sums of values..."
# Illustrative second passage for the two-passage methods below.
passage_a, passage_b = passage, "Convolutions apply local filters over the input..."

# Extract key points
bullets: list[str] = researcher.extract_bullets(passage)

# Generate Q&A pairs
pairs: list[QAPair] = researcher.generate_qa_pairs(passage)
# → [QAPair(question="What does attention compute?", answer="Weighted sums of values")]

# Extract knowledge graph triplets
triplets: list[Triplet] = researcher.extract_triplets(passage)
# → [Triplet(subject="attention", relation="computes", object="weighted sums")]

# Answer a question given a passage
answer: str = researcher.answer("What does attention compute?", passage)

# Rephrase and elaborate
rephrased: str = researcher.rephrase(passage)

# Continue a passage from its beginning
continuation: str = researcher.continue_from(passage[:200])

# Extract a single key fact
fact: str = researcher.extract_fact(passage)

# Generate a question from a passage
question: str = researcher.generate_question(passage)

# Compare two passages
comparison: str = researcher.compare(passage_a, passage_b)

# Retrieval: which passage answers the question?
result: RetrievalResult = researcher.find_relevant(question, [passage_a, passage_b])
# → RetrievalResult(index=0, reasoning="Passage 1 directly defines...")
```
### Return Types

| Method | Return Type | Description |
|---|---|---|
| `extract_bullets` | `list[str]` | Parsed bullet points |
| `generate_qa_pairs` | `list[QAPair]` | `.question` and `.answer` fields |
| `extract_triplets` | `list[Triplet]` | `.subject`, `.relation`, `.object` fields |
| `find_relevant` | `RetrievalResult` | `.index` (0-based), `.reasoning` |
| All others | `str` | Raw text response |
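Under the hood, each method wraps one fixed instruction string (see tasks.py). A hypothetical sketch for extract_bullets, reusing the instruction shown in the Raw Inference example below; the wrapper structure and parsing are assumptions, not the actual tasks.py code:

```python
# This instruction string matches the one in the Raw Inference example below.
BULLET_INSTRUCTION = "Extract the important points from this passage as markdown bullet points."

def extract_bullets(generate, passage: str) -> list[str]:
    """`generate` is any callable mapping a prompt string to the model's reply."""
    raw = generate(f"{BULLET_INSTRUCTION}\n\n{passage}")
    return [line.lstrip("-* ").rstrip()
            for line in raw.splitlines()
            if line.lstrip().startswith(("-", "*"))]
```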
## Raw Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_id = "JaydeepR/SmolLM-135M-SFT-exp01"
base_model_id = "paperbd/smollm_135M_arxiv_cpt"

# Load the CPT base model, then attach the SFT LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert in AI and ML research. Your answers are concise and helpful."},
    {"role": "user", "content": "Extract the important points from this passage as markdown bullet points.\n\nAttention mechanisms..."},
]

# Render the ChatML prompt, generate, and decode only the newly generated tokens.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1, no_repeat_ngram_size=4)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
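The repetition penalty and no-repeat n-gram settings are there to curb the repetition loops that very small models are prone to.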
## Limitations

- 135M parameters: limited factual recall and reasoning capacity
- Trained on synthetic data: instruction format matters, so use the exact prompts from `tasks.py`
- Relevance is strongest (3.04/5); correctness and completeness are weak (< 2/5)
- Best suited for structured extraction (bullets, triplets, QA) over open-ended generation
- No comparison against the uninstructed base model yet; exp02 is planned
## Related Models
| Model | Description |
|---|---|
| JaydeepR/SmolLM-135M-CPT-LoRA-r32 | CPT base (this model's starting point) |
| HuggingFaceTB/SmolLM-135M | Original base model |
## Citation

```bibtex
@misc{smollm135m-sft-exp01,
  author = {Jaydeep Raijada},
  title  = {SmolLM-135M SFT exp01: Instruction Tuning on ML Paper Research Tasks},
  year   = {2026},
  url    = {https://huggingface.co/JaydeepR/SmolLM-135M-SFT-exp01}
}
```