Adding Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr
The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
Thank you for sharing.
Common benchmarks like MMLU typically use a 5-shot setting to measure a model's in-context learning capabilities.
Can you explain why your MMLU evaluations use a zero-shot setup that scores the option content instead?
According to your blog, in this setup the MMLU scores are higher than those of the Qwen1.5B and Phi models, whereas in 5-shot evaluations the conclusion is the opposite. Is this reasonable? Thank you.
The difference comes from the MMLU prompt implementation rather than from 0-shot vs. 5-shot. Each answer to an MMLU question is labeled with a letter from A to D. The leaderboard uses the MCF (multiple-choice formulation) version, where the model must return the letter corresponding to the right answer, whereas in the cloze version (which we use) we compute log probs over the full answers, not just single letters. Most small models that are not instruction-tuned don't seem able to match answers to their corresponding letters and give an almost random score (0.25) with MCF, so the cloze version gives more signal.
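To make the distinction concrete, here is a minimal sketch of the two scoring schemes. It is not the leaderboard's or our actual evaluation code: the prompt templates, the `LogProbFn` type, and the `toy_logprob` scorer are assumptions made up for illustration, and a real harness would get log probs from a language model and may normalize them by answer length.

```python
from typing import Callable, Sequence

LETTERS = "ABCD"

# Hypothetical scorer type: returns log P(continuation | context).
# In a real evaluation this would be backed by a language model.
LogProbFn = Callable[[str, str], float]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first index wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def mcf_predict(question: str, choices: Sequence[str], logprob: LogProbFn) -> int:
    """MCF: show lettered options in the prompt, score only the letters."""
    options = "\n".join(f"{letter}. {c}" for letter, c in zip(LETTERS, choices))
    context = f"Question: {question}\n{options}\nAnswer:"
    return argmax([logprob(context, f" {letter}") for letter in LETTERS[: len(choices)]])


def cloze_predict(question: str, choices: Sequence[str], logprob: LogProbFn) -> int:
    """Cloze: no letters in the prompt, score each full answer text."""
    context = f"Question: {question}\nAnswer:"
    return argmax([logprob(context, f" {c}") for c in choices])


def toy_logprob(context: str, continuation: str) -> float:
    """Toy stand-in for a small base model: it prefers the right answer
    text but is indifferent among the letters A-D."""
    return -1.0 if continuation.strip() == "Paris" else -5.0


question = "What is the capital of France?"
choices = ["Berlin", "Madrid", "Paris", "Rome"]

cloze_choice = cloze_predict(question, choices, toy_logprob)
mcf_choice = mcf_predict(question, choices, toy_logprob)
print("cloze picks:", choices[cloze_choice])
print("MCF picks:", choices[mcf_choice])
```

With this toy scorer, the cloze variant recovers the correct answer, while the MCF variant sees four tied letter scores and falls back to the first option, which is the kind of near-random behavior described above.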
In the cloze version, the model outperforms Qwen1.5B and Phi in both 0-shot and 5-shot. You can find the guidelines to reproduce our scores here: https://huggingface.co/HuggingFaceFW/ablation-model-fineweb-edu#evaluation
You can find more details in this blog post: https://huggingface.co/blog/open-llm-leaderboard-mmlu#1001-flavors-of-mmlu and in these papers: https://arxiv.org/pdf/2406.08446 and appendix G.2 of https://arxiv.org/pdf/2406.11794