Adding Evaluation Results

by leaderboard-pr-bot - opened Jul 22, 2024

base: refs/heads/main ← from: refs/pr/3 (+111, -3)

leaderboard-pr-bot

Jul 22, 2024

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Adding Evaluation Results (6af8ed72)

ldwang

Aug 14, 2024

Thank you for sharing.

Common benchmarks such as MMLU typically use a 5-shot setting to measure a model's in-context learning capabilities.

Can you explain why your MMLU evaluations use a zero-shot setup that scores the option content?

According to your blog, in this setup the MMLU scores are higher than those of the Qwen1.5B and Phi models, whereas in 5-shot evaluations the conclusion is the opposite. Is this discrepancy expected? Thank you.

loubnabnl

Hugging Face Smol Models Research org, Aug 14, 2024 (edited Aug 14, 2024)

The difference comes from the MMLU prompt implementation rather than from 0-shot vs 5-shot. Each answer to an MMLU question is labeled with a letter from A to D. The leaderboard uses the MCF (multiple-choice formulation) version, where the model has to return the letter corresponding to the right answer, whereas in the cloze version (which we use) we compute log probs over the full answers, not just single letters. Most small models that are not instruction-tuned don't seem able to match answers to their corresponding letters and get an almost random score (0.25) with MCF, so the cloze version gives more signal.
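To make the MCF vs cloze distinction concrete, here is a minimal sketch of the two scoring schemes. It is not the leaderboard's or our harness's actual code: the prompt templates, the function names, and the `toy_logprob` scorer are all assumptions for illustration. A real run would replace `toy_logprob` with a function returning log P(continuation | prompt) from the model, and real harnesses may also length-normalize the scores, which this sketch does not do.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: returns log P(continuation | prompt) under some LM.
LogProbFn = Callable[[str, str], float]

LETTERS = "ABCD"


def predict_mcf(question: str, choices: Sequence[str], logprob: LogProbFn) -> int:
    """MCF: lettered options are shown in the prompt; only the letter is scored."""
    options = "\n".join(f"{letter}. {text}" for letter, text in zip(LETTERS, choices))
    prompt = f"Question: {question}\n{options}\nAnswer:"
    scores = [logprob(prompt, f" {letter}") for letter in LETTERS[: len(choices)]]
    # Index of the highest-scoring letter (first one wins on ties).
    return max(range(len(scores)), key=scores.__getitem__)


def predict_cloze(question: str, choices: Sequence[str], logprob: LogProbFn) -> int:
    """Cloze: no letters in the prompt; each full answer text is scored."""
    prompt = f"Question: {question}\nAnswer:"
    scores = [logprob(prompt, f" {text}") for text in choices]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_logprob(prompt: str, continuation: str) -> float:
    """Toy stand-in for a small base model (an assumption, not a real LM):
    it 'knows' the answer text but is indifferent between bare letters."""
    return 0.0 if "Paris" in continuation else -5.0


question = "What is the capital of France?"
choices = ["London", "Paris", "Rome", "Berlin"]

cloze_pick = predict_cloze(question, choices, toy_logprob)
mcf_pick = predict_mcf(question, choices, toy_logprob)
print("cloze picks:", choices[cloze_pick])
print("MCF picks:  ", choices[mcf_pick])
```

With this toy scorer the cloze variant recovers the right answer, while the MCF variant sees four tied letter scores and falls back to the first option, which is the near-random behavior described above.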

In the cloze version, the model outperforms Qwen1.5B and Phi in both 0-shot and 5-shot settings. You can find the guidelines to reproduce our scores here: https://huggingface.co/HuggingFaceFW/ablation-model-fineweb-edu#evaluation

You can find more details in this blog post https://huggingface.co/blog/open-llm-leaderboard-mmlu#1001-flavors-of-mmlu and in these papers: https://arxiv.org/pdf/2406.08446 and appendix G.2 of https://arxiv.org/pdf/2406.11794

Ready to merge

This branch is ready to get merged automatically.