🌟 Twinkel LLM - 72M Parameters (v0.1-alpha)

Twinkel LLM is an experimental 72M parameter language model created by Kunal Pandey as a learning project.

⚠️ Status: Early experimental release (v0.1-alpha)

🚀 Quick Start (CPU Inference)

⚠️ Important: This model currently works best on CPU. GPU inference has known issues; a fix is planned for v0.2.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
tokenizer = AutoTokenizer.from_pretrained(
    "Kunal7370944861/Twinkel-LLM-72M",
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    "Kunal7370944861/Twinkel-LLM-72M",
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu"  # Force CPU for stability
)

# Generate response
def chat(message):
    messages = [{"role": "user", "content": message}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        return_token_type_ids=False  # avoids a token_type_ids error in generate()
    )
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the newly generated tokens, not the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Test
response = chat("What is Python?")
print(response)

📋 Model Details

  • Parameters: ~72 million (72M)
  • Architecture: Custom decoder-only transformer
    • Hidden size: 448
    • Layers: 6
    • Attention: Grouped Query Attention (GQA)
    • FFN: SwiGLU activation
    • Position encoding: RoPE
  • Context length: 512 tokens
  • Tokenizer: SmolLM3 tokenizer (128K vocab)
  • Training: Pre-trained on C4 + instruction fine-tuning
  • Creator: Kunal Pandey
  • License: Apache 2.0

⚠️ Known Limitations

  1. GPU Inference Issues

    • Model currently has compatibility issues with GPU inference
    • CUDA assert errors occur during GPU loading
    • Workaround: Use CPU inference (as shown above)
    • Fix is planned for v0.2
  2. Model Size

    • Only 72M parameters (much smaller than production models)
    • Limited knowledge and reasoning capabilities
    • May produce inconsistent or incorrect responses
  3. Context Window

    • Limited to 512 tokens
    • Cannot handle long conversations or documents
  4. Response Quality

    • As an experimental model, its responses may be:
      • Off-topic or irrelevant
      • Repetitive
      • Factually incorrect
    • Not suitable for production use
  5. Language

    • Primarily English
    • Limited multilingual support
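Because the 512-token window must hold both the prompt and the generated tokens, long inputs need to be trimmed before calling generate(). A minimal sketch of tail-keeping truncation on a token-id list (the function name and budget split are illustrative, not part of the model's API):

```python
MAX_CONTEXT = 512  # model's context window, per the card above

def fit_prompt(token_ids, max_new_tokens=100, max_context=MAX_CONTEXT):
    """Trim a token-id list so prompt + generation fits in the window.

    Keeps the most recent tokens (the tail) and drops the oldest ones,
    which usually preserves the latest conversation turns.
    """
    budget = max_context - max_new_tokens
    return token_ids[-budget:] if len(token_ids) > budget else token_ids

ids = list(range(600))   # stand-in for a 600-token prompt
trimmed = fit_prompt(ids)
print(len(trimmed))      # 412  (512 - 100 reserved for generation)
```

In practice you would apply this to `inputs["input_ids"]` before generation, or pass `truncation=True, max_length=412` to the tokenizer for the same effect.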

🎯 Intended Use

This is an experimental educational project suitable for:

✅ Learning about LLM architecture
✅ Understanding model training and fine-tuning
✅ Experimenting with small language models
✅ CPU-based inference testing

NOT suitable for:

  • Production applications
  • Critical or safety-sensitive tasks
  • High-quality text generation
  • GPU-accelerated inference (until v0.2)

🛠️ Training Details

Pre-training

  • Dataset: C4 (English)
  • Steps: 20,000
  • Batch size: 32 (effective)
  • Hardware: Kaggle P100 GPU
  • Optimization: AdamW with mixed precision

Fine-tuning

  • Dataset: Custom instruction dataset (~70K samples)
  • Epochs: 2-3
  • Learning rate: 1e-4
  • Hardware: Kaggle P100 GPU
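An "effective" batch size of 32 typically means per-device batch size multiplied by gradient-accumulation steps. The exact split is not documented here, so the numbers below show one plausible configuration, not the actual one:

```python
# One plausible way to reach an effective batch of 32 on a single P100:
per_device_batch = 8   # assumption: what fits in GPU memory
grad_accum_steps = 4   # assumption: gradients accumulated before each step
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 32
```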

🐛 Troubleshooting

GPU CUDA Error

AcceleratorError: CUDA error: device-side assert triggered

Solution: Force CPU inference:

model = AutoModelForCausalLM.from_pretrained(
    "Kunal7370944861/Twinkel-LLM-72M",
    trust_remote_code=True,
    device_map="cpu"  # Add this
)

token_type_ids Error

ValueError: The following `model_kwargs` are not used: ['token_type_ids']

Solution: Disable token_type_ids:

inputs = tokenizer(
    prompt,
    return_tensors="pt",
    return_token_type_ids=False  # Add this
)
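If your tokenizer version ignores `return_token_type_ids`, you can also drop the key defensively after encoding. A small sketch (the helper name is illustrative; it works on any dict-like encoding, including the BatchEncoding the tokenizer returns):

```python
def strip_token_type_ids(encoding):
    """Remove token_type_ids so model.generate() doesn't reject it."""
    encoding.pop("token_type_ids", None)  # no-op if the key is absent
    return encoding

# Plain-dict stand-in for a tokenizer's BatchEncoding:
inputs = {"input_ids": [[1, 2, 3]], "token_type_ids": [[0, 0, 0]]}
print(sorted(strip_token_type_ids(inputs)))  # ['input_ids']
```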

📊 Performance

This is an experimental model with limited capabilities:

  • Size: 72M parameters (vs billions in production models)
  • Quality: Basic responses, may be off-topic
  • Speed (CPU): ~5-10 tokens/second on a standard CPU
  • Reliability: Experimental, expect issues
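To check the tokens/second figure on your own hardware, time a generate() call and divide the number of new tokens by the elapsed time. The timing helper below is model-independent (a `time.sleep` stands in for generation so the sketch runs anywhere):

```python
import time

def tokens_per_second(n_new_tokens, generate_fn):
    """Time generate_fn() and return throughput in tokens/second."""
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return n_new_tokens / elapsed

# With the model, something like:
#   tokens_per_second(100, lambda: model.generate(**inputs, max_new_tokens=100))
rate = tokens_per_second(100, lambda: time.sleep(0.05))  # stand-in workload
print(f"{rate:.0f} tokens/s")
```

Note this measures end-to-end throughput including the forward passes for the prompt; for a stricter per-token figure, subtract the time of a 1-token generation first.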

🔮 Future Plans

Version 0.2 (Planned):

  • Fix GPU compatibility issues
  • Improve response quality
  • Add proper identity training
  • Increase context length
  • Better instruction following

🙏 Acknowledgments

  • Creator: Kunal Pandey
  • Tokenizer: Based on SmolLM3 (Hugging Face)
  • Training data: C4 dataset (AllenAI)
  • Inspiration: SmolLM project

📜 License

Apache 2.0 - Free for commercial and research use.

⚠️ Disclaimer

This is an experimental educational project. The model:

  • May produce incorrect, biased, or inappropriate content
  • Has not been safety-tested or aligned
  • Should not be used in production environments
  • Is provided "as-is" without warranties

Use at your own risk for experimental and educational purposes only.

📧 Contact

For questions, issues, or feedback, please open an issue on the model repository.


Model Status: 🚧 Experimental Alpha
Created by: Kunal Pandey
Version: 0.1-alpha
Last updated: January 2026
