Instructions to use finnianx/michel-nano with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use finnianx/michel-nano with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="finnianx/michel-nano")# Load model directly from transformers import AutoTokenizer, AutoModelForMultimodalLM tokenizer = AutoTokenizer.from_pretrained("finnianx/michel-nano") model = AutoModelForMultimodalLM.from_pretrained("finnianx/michel-nano") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use finnianx/michel-nano with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "finnianx/michel-nano" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "finnianx/michel-nano", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/finnianx/michel-nano
- SGLang
How to use finnianx/michel-nano with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "finnianx/michel-nano" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "finnianx/michel-nano", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "finnianx/michel-nano" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "finnianx/michel-nano", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use finnianx/michel-nano with Docker Model Runner:
docker model run hf.co/finnianx/michel-nano
michel-nano
michel-nano is an ultra-tiny ~6 million parameter base language model trained on 1.14 billion tokens. It was created by merging two intermediate training checkpoints using mergekit to combine their strengths into a single, superior model.
Merge Details
This model is a 50/50 SLERP (Spherical Linear Interpolation) merge of two checkpoints from the same training run:
- Checkpoint A (Step 60,000): Excelled at language modeling and reasoning, achieving the best WikiText perplexity and ARC-Easy scores.
- Checkpoint B (Step 70,000): Showed stronger grammatical understanding, achieving the best BLiMP score.
By merging them, this model inherits the best traits of bothβachieving lower WikiText perplexity and higher BLiMP/ARC-E scores than either parent checkpoint individually.
Training Details
- Parameters: ~6,000,000
- Training Tokens: 1.14 Billion
- Context Length: 512 tokens
- Post-training: None (This is a base pre-trained model)
Dataset Mixture
| Dataset | Weight |
|---|---|
HuggingFaceFW/fineweb-edu |
50% |
epfml/FineWeb-HQ |
30% |
HuggingFaceTB/cosmopedia (stories split) |
20% |
Tokenizer
The tokenizer is a basic Byte-Pair Encoding (BPE) tokenizer trained from scratch on a subset of 100,000 samples drawn from the same data mixture. It features a compact vocabulary size of 6,000 plus chatml special tokens for future finetuning.
Evaluation
Evaluated using the lm-evaluation-harness (0-shot).
| Task | Metric | Value |
|---|---|---|
| BLiMP (Avg) | acc | 0.6523 |
| ARC-Challenge | acc | 0.1732 |
| ARC-Challenge | acc_norm | 0.2278 |
| ARC-Easy | acc | 0.3443 |
| ARC-Easy | acc_norm | 0.3338 |
| BoolQ | acc | 0.3801 |
| HellaSwag | acc | 0.2667 |
| HellaSwag | acc_norm | 0.2708 |
| PIQA | acc | 0.5647 |
| PIQA | acc_norm | 0.5457 |
| WikiText | bits_per_byte | 1.6987 |
| WikiText | byte_perplexity | 3.2461 |
| WikiText | word_perplexity | 542.6415 |
| Winogrande | acc | 0.5012 |
Intended Use
As a base model that has not undergone any instruction tuning or alignment, michel-nano is very limited in its raw conversational abilities. It is best suited as a lightweight foundation for fine-tuning on specific downstream tasks. Its small footprint and strong grammatical foundation make it highly adaptable for applications such as:
- Text classification (e.g., toxic comment detection, sentiment analysis)
- Lightweight named entity recognition (NER)
- Random toy projects
- Downloads last month
- 52