🌟 Twinkel LLM - 72M Parameters (v0.1-alpha)
Twinkel LLM is an experimental 72M parameter language model created by Kunal Pandey as a learning project.
⚠️ Status: Early experimental release (v0.1-alpha)
🚀 Quick Start (CPU Inference)
⚠️ Important: This model currently works best on CPU. GPU inference has known issues that are being resolved in future versions.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
tokenizer = AutoTokenizer.from_pretrained(
    "Kunal7370944861/Twinkel-LLM-72M",
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Kunal7370944861/Twinkel-LLM-72M",
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu"  # Force CPU for stability
)

# Generate response
def chat(message):
    messages = [{"role": "user", "content": message}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        return_token_type_ids=False  # Important!
    )
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Test
response = chat("What is Python?")
print(response)
```
📋 Model Details
- Parameters: 72M (72 million)
- Architecture: Custom decoder-only transformer
- Hidden size: 448
- Layers: 6
- Attention: Grouped Query Attention (GQA)
- FFN: SwiGLU activation
- Position encoding: RoPE
- Context length: 512 tokens
- Tokenizer: SmolLM3 tokenizer (128K vocab)
- Training: Pre-trained on C4 + instruction fine-tuning
- Creator: Kunal Pandey
- License: Apache 2.0
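The published hyperparameters above can be sanity-checked against the 72M total. The head counts and FFN width are not listed on this card, so the values below are illustrative assumptions chosen to be plausible for this architecture, not confirmed model settings:

```python
# Rough parameter-count estimate for a 6-layer decoder with GQA and
# SwiGLU. Only VOCAB, HIDDEN, and LAYERS come from the model card;
# N_HEADS, N_KV_HEADS, and FFN_DIM are assumptions for illustration.

VOCAB = 128_000    # SmolLM3 tokenizer (~128K vocab)
HIDDEN = 448
LAYERS = 6
N_HEADS = 8        # assumed
N_KV_HEADS = 2     # assumed (GQA: fewer KV heads than query heads)
FFN_DIM = 1216     # assumed SwiGLU intermediate size

head_dim = HIDDEN // N_HEADS

def layer_params():
    q = HIDDEN * N_HEADS * head_dim            # query projection
    kv = 2 * HIDDEN * N_KV_HEADS * head_dim    # shared K and V projections
    o = N_HEADS * head_dim * HIDDEN            # output projection
    ffn = 3 * HIDDEN * FFN_DIM                 # SwiGLU: gate, up, down
    norms = 2 * HIDDEN                         # two RMSNorm weight vectors
    return q + kv + o + ffn + norms

embed = VOCAB * HIDDEN                          # tied embeddings assumed
total = embed + LAYERS * layer_params() + HIDDEN  # plus final norm
print(f"~{total / 1e6:.1f}M parameters")
```

With these assumed widths the count lands around 70M, in the right ballpark for the stated 72M; note that most of the budget sits in the 128K-vocabulary embedding table, which is typical for small models with large tokenizers.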
⚠️ Known Limitations
GPU Inference Issues
- Model currently has compatibility issues with GPU inference
- CUDA assert errors occur during GPU loading
- Workaround: Use CPU inference (as shown above)
- Fix is planned for v0.2
Model Size
- Only 72M parameters (much smaller than production models)
- Limited knowledge and reasoning capabilities
- May produce inconsistent or incorrect responses
Context Window
- Limited to 512 tokens
- Cannot handle long conversations or documents
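Because of the 512-token limit, long prompts must be trimmed before generation. A minimal sketch of one common strategy, shown on plain token-id lists so no tokenizer is needed: keep the most recent tokens (so the end of the conversation survives) and reserve room for the reply. The helper name and budget split are illustrative, not part of this model's API:

```python
# Left-truncate a prompt so prompt + reply fits in the 512-token window.

MAX_CONTEXT = 512

def truncate_for_generation(token_ids, max_new_tokens=100):
    """Keep only the newest tokens that fit alongside the reply budget."""
    budget = MAX_CONTEXT - max_new_tokens
    return token_ids[-budget:]

# Example: a 1,000-token prompt is cut down to its last 412 tokens.
prompt = list(range(1000))
trimmed = truncate_for_generation(prompt)
print(len(trimmed))  # 412
```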
Response Quality
- Experimental model; responses may be:
  - Off-topic or irrelevant
  - Repetitive
  - Factually incorrect
- Not suitable for production use
Language
- Primarily English
- Limited multilingual support
🎯 Intended Use
This is an experimental educational project suitable for:
✅ Learning about LLM architecture
✅ Understanding model training and fine-tuning
✅ Experimenting with small language models
✅ CPU-based inference testing
❌ NOT suitable for:
- Production applications
- Critical or safety-sensitive tasks
- High-quality text generation
- GPU-accelerated inference (until v0.2)
🛠️ Training Details
Pre-training
- Dataset: C4 (English)
- Steps: 20,000
- Batch size: 32 (effective)
- Hardware: Kaggle P100 GPU
- Optimization: AdamW with mixed precision
Fine-tuning
- Dataset: Custom instruction dataset (~70K samples)
- Epochs: 2-3
- Learning rate: 1e-4
- Hardware: Kaggle P100 GPU
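An "effective" batch size of 32 on a single P100 usually implies gradient accumulation. The micro-batch size below is an assumption for illustration; only the effective size of 32, the 20,000 steps, and the 512-token context come from this card:

```python
# Gradient-accumulation arithmetic behind "batch size: 32 (effective)".

micro_batch = 8   # assumed per-step batch that fits in P100 memory
accum_steps = 4   # take an optimizer step every 4 micro-batches
effective_batch = micro_batch * accum_steps
print(effective_batch)  # 32

# 20,000 optimizer steps at an effective batch of 32 sequences of at
# most 512 tokens bounds the pre-training tokens seen from above:
max_tokens = 20_000 * effective_batch * 512
print(f"{max_tokens / 1e9:.2f}B tokens (upper bound)")  # 0.33B
```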
🐛 Troubleshooting
GPU CUDA Error
```
AcceleratorError: CUDA error: device-side assert triggered
```
Solution: force CPU inference:

```python
model = AutoModelForCausalLM.from_pretrained(
    "Kunal7370944861/Twinkel-LLM-72M",
    trust_remote_code=True,
    device_map="cpu"  # Add this
)
```
token_type_ids Error
```
ValueError: The following `model_kwargs` are not used: ['token_type_ids']
```
Solution: disable token_type_ids:

```python
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    return_token_type_ids=False  # Add this
)
```
📊 Performance
This is an experimental model with limited capabilities:
- Size: 72M parameters (vs billions in production models)
- Quality: Basic responses, may be off-topic
- Speed (CPU): ~5-10 tokens/second on standard CPU
- Reliability: Experimental, expect issues
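The ~5-10 tokens/second figure can be checked on your own hardware by timing one generation call and dividing the new tokens by wall-clock time. The sketch below uses a dummy stand-in for `model.generate` so it runs without downloading the model; swap in a real call from the Quick Start to measure actual throughput:

```python
import time

def measure_tokens_per_second(generate_fn, prompt_len, max_new_tokens=50):
    """Time one generation and return new tokens per second."""
    start = time.perf_counter()
    output_ids = generate_fn(max_new_tokens)  # returns prompt + new token ids
    elapsed = time.perf_counter() - start
    new_tokens = len(output_ids) - prompt_len
    return new_tokens / elapsed

# Dummy stand-in (instant "generation") so the sketch is self-contained:
rate = measure_tokens_per_second(lambda n: list(range(10 + n)), prompt_len=10)
print(f"{rate:.1f} tokens/s")
```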
🔮 Future Plans
Version 0.2 (Planned):
- Fix GPU compatibility issues
- Improve response quality
- Add proper identity training
- Increase context length
- Better instruction following
🙏 Acknowledgments
- Creator: Kunal Pandey
- Tokenizer: Based on SmolLM3 (Hugging Face)
- Training data: C4 dataset (AllenAI)
- Inspiration: SmolLM project
📜 License
Apache 2.0 - Free for commercial and research use.
⚠️ Disclaimer
This is an experimental educational project. The model:
- May produce incorrect, biased, or inappropriate content
- Has not been safety-tested or aligned
- Should not be used in production environments
- Is provided "as-is" without warranties
Use at your own risk for experimental and educational purposes only.
📧 Contact
For questions, issues, or feedback, please open an issue on the model repository.
Model Status: 🚧 Experimental Alpha
Created by: Kunal Pandey
Version: 0.1-alpha
Last updated: January 2026