cosmo3769 committed on Mar 10, 2024

Commit 92ebaf8 (verified)

1 Parent(s): 31f4e5f

Create README.md

Files changed (1)

README.md +55 -0

README.md ADDED

@@ -0,0 +1,55 @@

+# starcoderbase-3b-GPTQ
+starcoderbase-3b quantized to GPTQ format (4-bit precision).
+## Benchmark
+[Benchmarking script](https://github.com/cosmo3769/Quantized-LLMs/blob/main/notebooks/llmbenchmark-starcodebase-3b-lm-eval-harness.ipynb)
+### Baseline starcoderbase-3b model (non-quantized)
+|         Tasks         |Version|Filter|n-shot|    Metric     |Value |   |Stderr|
+|-----------------------|-------|------|------|---------------|-----:|---|-----:|
+|codexglue_code2text    |N/A    |none  |None  |smoothed_bleu_4|1.3519|±  |0.3067|
+| - code2text_go        |      1|none  |None  |smoothed_bleu_4|1.5781|±  |0.3734|
+| - code2text_java      |      1|none  |None  |smoothed_bleu_4|1.2778|±  |0.1991|
+| - code2text_javascript|      1|none  |None  |smoothed_bleu_4|1.1443|±  |0.1181|
+| - code2text_php       |      1|none  |None  |smoothed_bleu_4|0.5171|±  |0.5171|
+| - code2text_python    |      1|none  |None  |smoothed_bleu_4|2.8338|±  |1.5323|
+| - code2text_ruby      |      3|none  |None  |smoothed_bleu_4|0.7601|±  |0.7601|
+
+|      Groups       |Version|Filter|n-shot|    Metric     |Value |   |Stderr|
+|-------------------|-------|------|------|---------------|-----:|---|-----:|
+|codexglue_code2text|N/A    |none  |None  |smoothed_bleu_4|1.3519|±  |0.3067|
+
+|                    Tasks                    |Version|Filter|n-shot|  Metric   |Value|   |Stderr|
+|---------------------------------------------|------:|------|------|-----------|----:|---|-----:|
+|bigbench_code_line_description_generate_until|      1|none  |None  |exact_match|    0|±  |     0|
+
+|                    Tasks                     |Version|Filter|n-shot|Metric|Value|   |Stderr|
+|----------------------------------------------|------:|------|------|------|----:|---|-----:|
+|bigbench_code_line_description_multiple_choice|      0|none  |None  |acc   | 0.25|±  |0.0564|
+
+### Quantized starcoderbase-3b model to GPTQ format
+|         Tasks         |Version|Filter|n-shot|    Metric     |Value |   |Stderr|
+|-----------------------|-------|------|------|---------------|-----:|---|-----:|
+|codexglue_code2text    |N/A    |none  |None  |smoothed_bleu_4|0.9254|±  |0.2109|
+| - code2text_go        |      1|none  |None  |smoothed_bleu_4|1.4702|±  |0.4813|
+| - code2text_java      |      1|none  |None  |smoothed_bleu_4|0.6907|±  |0.6907|
+| - code2text_javascript|      1|none  |None  |smoothed_bleu_4|0.9469|±  |0.0339|
+| - code2text_php       |      1|none  |None  |smoothed_bleu_4|0.5171|±  |0.5171|
+| - code2text_python    |      1|none  |None  |smoothed_bleu_4|1.1676|±  |0.2156|
+| - code2text_ruby      |      3|none  |None  |smoothed_bleu_4|0.7601|±  |0.7601|
+
+|      Groups       |Version|Filter|n-shot|    Metric     |Value |   |Stderr|
+|-------------------|-------|------|------|---------------|-----:|---|-----:|
+|codexglue_code2text|N/A    |none  |None  |smoothed_bleu_4|0.9254|±  |0.2109|
+
+|                    Tasks                    |Version|Filter|n-shot|  Metric   |Value|   |Stderr|
+|---------------------------------------------|------:|------|------|-----------|----:|---|-----:|
+|bigbench_code_line_description_generate_until|      1|none  |None  |exact_match|    0|±  |     0|
+
+|                    Tasks                     |Version|Filter|n-shot|Metric|Value|   |Stderr|
+|----------------------------------------------|------:|------|------|------|----:|---|-----:|
+|bigbench_code_line_description_multiple_choice|      0|none  |None  |acc   |  0.1|±  |   0.1|
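
The two sets of tables above are easier to compare side by side. The following is a minimal sketch, not part of the original README or its benchmarking notebook, that transcribes the reported values and standard errors and computes the relative change per task. The helper names (`relative_change`, `exceeds_combined_stderr`, `summarize`) are hypothetical, and comparing the difference against the combined standard error is only a rough heuristic chosen for illustration, not a method the author used.

```python
import math

# (baseline value, baseline stderr, quantized value, quantized stderr),
# transcribed by hand from the benchmark tables above.
RESULTS = {
    "codexglue_code2text": (1.3519, 0.3067, 0.9254, 0.2109),
    "code2text_go": (1.5781, 0.3734, 1.4702, 0.4813),
    "code2text_java": (1.2778, 0.1991, 0.6907, 0.6907),
    "code2text_javascript": (1.1443, 0.1181, 0.9469, 0.0339),
    "code2text_php": (0.5171, 0.5171, 0.5171, 0.5171),
    "code2text_python": (2.8338, 1.5323, 1.1676, 0.2156),
    "code2text_ruby": (0.7601, 0.7601, 0.7601, 0.7601),
    "bigbench_code_line_description_generate_until": (0.0, 0.0, 0.0, 0.0),
    "bigbench_code_line_description_multiple_choice": (0.25, 0.0564, 0.1, 0.1),
}


def relative_change(baseline, quantized):
    """Return (quantized - baseline) / baseline, or None if baseline is 0."""
    if baseline == 0:
        return None
    return (quantized - baseline) / baseline


def exceeds_combined_stderr(base, base_se, quant, quant_se):
    """Rough heuristic: is |difference| larger than sqrt(se1^2 + se2^2)?"""
    return abs(quant - base) > math.hypot(base_se, quant_se)


def summarize(results):
    """Map each task to (relative change, exceeds-combined-stderr flag)."""
    return {
        task: (relative_change(b, q), exceeds_combined_stderr(b, b_se, q, q_se))
        for task, (b, b_se, q, q_se) in results.items()
    }


if __name__ == "__main__":
    for task, (change, flagged) in summarize(RESULTS).items():
        shown = "n/a" if change is None else f"{change:+.1%}"
        print(f"{task:<50} {shown:>8}  exceeds combined stderr: {flagged}")
```

Several of the reported standard errors are as large as the scores themselves, so the per-task changes should be read with caution.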