# Smollm3_720prms
A fine-tuned version of SmolLM2-360M-Instruct specialized for machine learning education and research paper explanations.
## 🎯 Model Description
This model assists with:
- ML Project Advisory: Architecture decisions, best practices, data handling strategies
- Research Paper Assistance: Explaining papers, concepts, and methodologies
- Technical Guidance: Deployment, optimization, troubleshooting
Perfect for students, researchers, and ML practitioners seeking quick, reliable guidance.
## 📊 Model Details
- Base Model: HuggingFaceTB/SmolLM2-360M-Instruct
- Parameters: 360 Million
- Training Method: LoRA fine-tuning with Unsloth (2x faster training)
- Training Platform: Google Colab (Tesla T4 GPU)
- Context Length: 2048 tokens
- Precision: BFloat16
- Model Size: ~720 MB
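The reported size is consistent with the parameter count at BFloat16 precision. A rough back-of-the-envelope check (ignoring buffers and file-format overhead):

```python
# Rough sanity check: 360M parameters stored in BF16 (2 bytes each)
params = 360_000_000
bytes_per_param = 2  # bfloat16
size_mb = params * bytes_per_param / 1e6
print(f"~{size_mb:.0f} MB")  # ~720 MB
```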
## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```
### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Xen0pp/Smollm3_720prms",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Xen0pp/Smollm3_720prms")

# Prepare your question
messages = [
    {"role": "system", "content": "You are an ML project advisor."},
    {"role": "user", "content": "What's the best approach for image classification with limited data?"},
]

# Generate a response
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Using the Pipeline API

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Xen0pp/Smollm3_720prms",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain gradient descent"}]
output = pipe(messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
```
## 💡 Example Queries

### ML Project Guidance

```python
messages = [
    {"role": "system", "content": "You are an ML project advisor."},
    {"role": "user", "content": "How should I handle imbalanced datasets in binary classification?"},
]
```
### Research Paper Explanations

```python
messages = [
    {"role": "system", "content": "You are a research paper assistant."},
    {"role": "user", "content": "Explain the key innovation in the ResNet paper"},
]
```
### Best Practices

```python
messages = [
    {"role": "user", "content": "What are best practices for training deep neural networks?"},
]
```
## ⚙️ Training Details

### Hardware
- GPU: NVIDIA Tesla T4 (16GB VRAM)
- Platform: Google Colab
- Training Time: ~1 minute
- Peak GPU Memory: 2.73 GB (18.7% utilization)
### Hyperparameters
- Optimizer: AdamW 8-bit
- Learning Rate: 2e-4
- Batch Size: 2 per device
- Gradient Accumulation: 4 steps (effective batch size: 8)
- Epochs: 3
- Max Sequence Length: 2048 tokens
- Weight Decay: 0.01
- Warmup Steps: 5
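The effective batch size above follows from the per-device batch size times the gradient-accumulation steps. The hyperparameters can be collected into a plain dict whose keys loosely mirror `transformers.TrainingArguments` field names (purely illustrative, not the actual training script):

```python
# Illustrative mapping of the hyperparameters above; key names loosely
# mirror transformers.TrainingArguments, but this is just a plain dict.
config = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 3,
    "max_seq_length": 2048,
    "weight_decay": 0.01,
    "warmup_steps": 5,
}

# Effective batch size = per-device batch size x accumulation steps
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # 8
```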
### Training Framework
- Library: Unsloth
- Benefits: 2x faster training, 60% less memory usage
- LoRA Config: r=16, alpha=16, targeting all attention layers
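The LoRA setup above could be expressed with the `peft` library roughly as follows. This is a sketch, not the actual training script: the `target_modules` list and the dropout value are assumptions, interpreting "all attention layers" as the q/k/v/o projections of SmolLM2's Llama-style blocks.

```python
from peft import LoraConfig

# Sketch of the stated LoRA setup (r=16, alpha=16). target_modules is an
# assumption based on "all attention layers" for a Llama-style architecture;
# lora_dropout is not stated in the card and is assumed to be 0.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```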
## 📚 Training Data
Fine-tuned on curated examples covering:
- Computer vision architectures (CNNs, ViTs, ResNets)
- NLP concepts (Transformers, attention mechanisms)
- Common ML challenges (imbalanced data, overfitting, deployment)
- Research paper summaries (ResNet, Attention, optimization)
- Best practices and practical implementation guidance
## 🎓 Intended Use

### Primary Use Cases
- Educational: Helping students learn ML concepts
- Research: Quick explanations of papers and techniques
- Prototyping: Rapid guidance during ML project development
- Reference: Quick lookup for best practices
### Out of Scope
- Production decision-making without human verification
- Medical or legal advice
- Real-time critical systems
- Replacing domain experts
## ⚠️ Limitations

- Training Data Size: Trained on a limited number of examples; may hallucinate on unfamiliar topics
- Knowledge Cutoff: Inherits the SmolLM2 base model's training data (up to ~2024)
- Accuracy: Important information should be verified against authoritative sources
- Scope: Focused on ML/AI topics; limited knowledge in other domains
- Bias: May inherit biases from the base model and training data
## 📈 Evaluation
The model was evaluated through:
- Manual testing on diverse ML questions
- Comparison with base model responses
- Human review of accuracy and helpfulness
Note: Formal benchmarks coming soon. Contributions welcome!
## 🔧 Deployment Examples

### FastAPI Server

```python
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
pipe = pipeline("text-generation", model="Xen0pp/Smollm3_720prms")

@app.post("/generate")
async def generate(prompt: str):
    messages = [{"role": "user", "content": prompt}]
    result = pipe(messages, max_new_tokens=200)[0]["generated_text"][-1]["content"]
    return {"response": result}
```
### Gradio Interface

```python
import gradio as gr
from transformers import pipeline

pipe = pipeline("text-generation", model="Xen0pp/Smollm3_720prms")

def chat(message, history):
    messages = [{"role": "user", "content": message}]
    response = pipe(messages, max_new_tokens=300)[0]["generated_text"][-1]["content"]
    return response

gr.ChatInterface(chat, title="ML Assistant").launch()
```
## 📜 License
Apache 2.0 (same as base model)
## 🙏 Acknowledgments
- Base Model: HuggingFace SmolLM Team
- Training Framework: Unsloth AI
- Platform: Google Colab
## 📧 Citation

If you use this model, please cite:

```bibtex
@misc{Smollm3_720prms,
  author = {Xen0pp},
  title = {Smollm3_720prms: Fine-tuned SmolLM for ML Assistance},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Xen0pp/Smollm3_720prms}
}
```
## 🔗 Links
- Model Repository: https://huggingface.co/Xen0pp/Smollm3_720prms
- Base Model: https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct
- Unsloth: https://github.com/unslothai/unsloth
Disclaimer: This is a fine-tuned small language model for educational purposes. Always verify critical information with authoritative sources and domain experts.