How to use michaelbzhu/test-7.6B-base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="michaelbzhu/test-7.6B-base", trust_remote_code=True)
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("michaelbzhu/test-7.6B-base", trust_remote_code=True, dtype="auto")
How to use michaelbzhu/test-7.6B-base with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "michaelbzhu/test-7.6B-base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "michaelbzhu/test-7.6B-base",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
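The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_payload` and `complete` are hypothetical, and it assumes a server is already listening on `http://localhost:8000` and returns an OpenAI-style completion response.

```python
import json
import urllib.request

MODEL_ID = "michaelbzhu/test-7.6B-base"


def build_completion_payload(prompt, max_tokens=512, temperature=0.5):
    """Return the same JSON body that the curl example sends."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", **kwargs):
    """POST to /v1/completions and return the first generated text.

    Assumes the server is running and reachable at base_url.
    """
    body = json.dumps(build_completion_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # OpenAI-style completion responses carry the text in choices[i].text
    return result["choices"][0]["text"]
```

With the server started, `complete("Once upon a time,")` would return the generated continuation. Passing `base_url="http://localhost:30000"` should target a server on port 30000 in the same way.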
How to use michaelbzhu/test-7.6B-base with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "michaelbzhu/test-7.6B-base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "michaelbzhu/test-7.6B-base",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "michaelbzhu/test-7.6B-base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "michaelbzhu/test-7.6B-base",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use michaelbzhu/test-7.6B-base with Docker Model Runner:
docker model run hf.co/michaelbzhu/test-7.6B-base
Trained on 12,312,444,928 tokens from the kjj0/fineweb100B-gpt2 dataset.
$ lm_eval --model hf \
--model_args pretrained=michaelbzhu/test-7.6B-base,trust_remote_code=True \
--tasks mmlu_college_medicine,hellaswag,lambada_openai,arc_easy,winogrande,arc_challenge,openbookqa \
--device cuda:0 \
--batch_size 16
| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|----------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ | 0.2295|± |0.0123|
| | |none | 0|acc_norm |↑ | 0.2628|± |0.0129|
|arc_easy | 1|none | 0|acc |↑ | 0.5358|± |0.0102|
| | |none | 0|acc_norm |↑ | 0.4663|± |0.0102|
|hellaswag | 1|none | 0|acc |↑ | 0.3788|± |0.0048|
| | |none | 0|acc_norm |↑ | 0.4801|± |0.0050|
|lambada_openai | 1|none | 0|acc |↑ | 0.4527|± |0.0069|
| | |none | 0|perplexity|↓ |14.3601|± |0.4468|
|college_medicine| 1|none | 0|acc |↑ | 0.2254|± |0.0319|
|openbookqa | 1|none | 0|acc |↑ | 0.1920|± |0.0176|
| | |none | 0|acc_norm |↑ | 0.3020|± |0.0206|
|winogrande | 1|none | 0|acc |↑ | 0.5107|± |0.0140|
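To make the Stderr column easier to read, the sketch below turns a few rows of the table into approximate 95% confidence intervals. It assumes a normal approximation (value ± 1.96 × stderr), which is a common convention rather than something the evaluation output itself states; the `RESULTS` list and `confidence_interval` helper are illustrative names.

```python
# (task, metric, value, stderr) copied from the results table
RESULTS = [
    ("arc_challenge", "acc_norm", 0.2628, 0.0129),
    ("arc_easy", "acc", 0.5358, 0.0102),
    ("hellaswag", "acc_norm", 0.4801, 0.0050),
    ("lambada_openai", "acc", 0.4527, 0.0069),
    ("college_medicine", "acc", 0.2254, 0.0319),
    ("openbookqa", "acc_norm", 0.3020, 0.0206),
    ("winogrande", "acc", 0.5107, 0.0140),
]

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) under a normal approximation."""
    return value - z * stderr, value + z * stderr


for task, metric, value, stderr in RESULTS:
    low, high = confidence_interval(value, stderr)
    print(f"{task:<17}{metric:<9}{value:.4f}  [{low:.4f}, {high:.4f}]")
```

Under this approximation the winogrande interval spans 0.5, the chance level for a two-choice task, so that score is not clearly distinguishable from guessing.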