How to use from SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantLLM/SmolLM2-135M-QuantLLM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantLLM/SmolLM2-135M-QuantLLM",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
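The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is reachable at http://localhost:30000. The helper names `build_completion_request` and `complete` are illustrative and not part of SGLang.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumption: server launched as shown above
MODEL_ID = "QuantLLM/SmolLM2-135M-QuantLLM"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, base_url=BASE_URL):
    """Build the POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style completion responses carry the text under choices[0].text.
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Once upon a time,", max_tokens=64))` should print a continuation of the prompt. Building the request separately from sending it keeps the payload easy to inspect without a live server.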
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantLLM/SmolLM2-135M-QuantLLM" \
        --host 0.0.0.0 \
        --port 30000
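The container needs some time to download and load the model before it accepts requests. A small polling helper can wait for that, sketched here with the standard library only. The function name `wait_for_server` is hypothetical, and probing the `/v1/models` route is an assumption based on the server exposing an OpenAI-compatible API; any route that returns HTTP 200 once the server is up would do.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout_s=120.0, interval_s=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet, or returned an error
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_server("http://localhost:30000/v1/models")` returns `True` once the route responds and `False` if it stays unreachable for the whole timeout.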
Quick Links

🤗 SmolLM2-135M-QuantLLM

HuggingFaceTB/SmolLM2-135M converted to SAFETENSORS format

QuantLLM Format

⭐ Star QuantLLM on GitHub


📖 About This Model

This model is HuggingFaceTB/SmolLM2-135M converted to SafeTensors format for use with HuggingFace Transformers and PyTorch.

| Property | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolLM2-135M |
| Format | SAFETENSORS |
| Quantization | None (Full Precision) |
| License | apache-2.0 |
| Created With | QuantLLM |

🚀 Quick Start

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("QuantLLM/SmolLM2-135M-QuantLLM")
tokenizer = AutoTokenizer.from_pretrained("QuantLLM/SmolLM2-135M-QuantLLM")

# Generate text
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With QuantLLM

from quantllm import TurboModel

# Load with automatic optimization
model = TurboModel.from_pretrained("QuantLLM/SmolLM2-135M-QuantLLM")

# Generate
response = model.generate("Write a poem about coding")
print(response)

Requirements

pip install transformers torch

📊 Model Details

| Property | Value |
|---|---|
| Original Model | HuggingFaceTB/SmolLM2-135M |
| Format | SAFETENSORS |
| Quantization | Full Precision |
| License | apache-2.0 |
| Export Date | 2026-04-29 |
| Exported By | QuantLLM v2.1 |

🚀 Created with QuantLLM

QuantLLM

Convert any model to GGUF, ONNX, or MLX in one line!

from quantllm import turbo

# Load any HuggingFace model
model = turbo("HuggingFaceTB/SmolLM2-135M")

# Export to any format
model.export("gguf", quantization="Q4_K_M")

# Push to HuggingFace
model.push("your-repo", format="gguf")

📚 Documentation · 🐛 Report Issue · 💡 Request Feature

📊 Export Details

Exported with QuantLLM from HuggingFaceTB/SmolLM2-135M (134.5M params).

| Property | Value |
|---|---|
| Format | SafeTensors |
| Size | 541.6 MB |
| Parameters | 134.5M |
| Dtype | float32 |
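As a rough cross-check of these figures, the raw weight size can be estimated as parameter count times bytes per parameter. The sketch below assumes decimal megabytes (1 MB = 10^6 bytes); the helper `estimate_weights_mb` is illustrative and not part of QuantLLM.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_weights_mb(num_params, dtype="float32"):
    """Return the raw weight size in decimal megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


print(f"{estimate_weights_mb(134.5e6):.1f} MB")  # prints 538.0 MB
```

The estimate of about 538 MB is close to the reported 541.6 MB. The card does not say what accounts for the remaining few megabytes.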
