
Phind-CodeLlama-34B-v2 EXL2

Weights of Phind-CodeLlama-34B-v2 (https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted to the EXL2 format used by ExLlamaV2.

Each quant is stored in a separate branch, as in TheBloke's GPTQ repos. To download only one quant, clone its branch:

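```shell
# Git LFS is needed to fetch the weight files themselves rather than LFS pointer files
git lfs install
export BRANCH=5_0-bpw-h8
git clone --single-branch --branch "${BRANCH}" https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
```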
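As an alternative to git clone, a single branch can also be fetched over HTTP with the huggingface_hub library, which needs neither git nor Git LFS. The sketch below assumes huggingface_hub is installed; snapshot_download is that library's real function and its revision argument accepts a branch name, while download_quant is a hypothetical helper name used only for illustration.

```python
"""Sketch: fetch one EXL2 quant branch with huggingface_hub instead of git."""

REPO_ID = "latimar/Phind-Codellama-34B-v2-exl2"


def download_quant(branch: str, local_dir: str) -> str:
    """Download every file on `branch` into `local_dir` and return that path.

    `download_quant` is a hypothetical helper; `snapshot_download` is the
    huggingface_hub function, and `revision` selects the branch.
    """
    # Imported lazily so this module can be loaded without huggingface_hub;
    # the dependency is only needed when a download actually happens.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, revision=branch, local_dir=local_dir)
```

For example, download_quant("5_0-bpw-h8", "./Phind-Codellama-34B-v2-exl2-5_0") fetches the same branch as the git command above into a directory of your choice.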

The following branches are available:

5_0-bpw-h8
5_0-bpw-h8-evol-ins
4_625-bpw-h6
4_4-bpw-h8
4_125-bpw-h6
3_8-bpw-h6
2_75-bpw-h6
2_55-bpw-h6
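The branch names encode the quantization settings. Assuming the usual exllamav2 conventions, where the leading number is the average bits per weight (with _ standing in for the decimal point) and hN is the bit width used for the output (head) layer, the list above can be parsed as sketched below. parse_branch and Quant are hypothetical names; the calibration mapping simply restates the datasets listed in this card.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Branches listed in this repo.
BRANCHES = [
    "5_0-bpw-h8",
    "5_0-bpw-h8-evol-ins",
    "4_625-bpw-h6",
    "4_4-bpw-h8",
    "4_125-bpw-h6",
    "3_8-bpw-h6",
    "2_75-bpw-h6",
    "2_55-bpw-h6",
]

# Calibration dataset per optional branch suffix, as stated in this card.
CALIBRATION: dict[Optional[str], str] = {
    None: "wikitext-v2",
    "evol-ins": "wizardLM-evol-instruct_70k",
}

# "<int>_<frac>-bpw-h<head bits>" followed by an optional "-<suffix>".
_BRANCH_RE = re.compile(r"^(\d+)_(\d+)-bpw-h(\d+)(?:-(.+))?$")


@dataclass(frozen=True)
class Quant:
    branch: str
    bpw: float        # average bits per weight
    head_bits: int    # assumption: bit width of the output (head) layer
    calibration: str  # dataset used during conversion


def parse_branch(branch: str) -> Quant:
    """Split a branch name such as "4_625-bpw-h6" into its settings."""
    match = _BRANCH_RE.match(branch)
    if match is None:
        raise ValueError(f"unrecognised branch name: {branch!r}")
    whole, frac, head, suffix = match.groups()
    if suffix not in CALIBRATION:
        raise ValueError(f"unknown branch suffix: {suffix!r}")
    return Quant(
        branch=branch,
        bpw=float(f"{whole}.{frac}"),  # "4_625" -> 4.625
        head_bits=int(head),
        calibration=CALIBRATION[suffix],
    )
```

Sorting with sorted(map(parse_branch, BRANCHES), key=lambda q: q.bpw) then orders the quants from smallest to largest.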

Calibration dataset used for conversion: wikitext-v2 (all branches except 5_0-bpw-h8-evol-ins)
Calibration dataset used for conversion of 5_0-bpw-h8-evol-ins: wizardLM-evol-instruct_70k
Evaluation dataset used to calculate perplexity (PPL) on Wiki: wikitext-v2
Evaluation dataset used to calculate PPL on Evol-Ins: nickrosh-evol-instruct-code-80k
When converting the 4_4-bpw-h8 quant, the additional argument -mr 32 was used.

PPL was measured with the test_inference.py script from exllamav2, e.g. for the Evol-Ins dataset:

```shell
# -m: model directory; -ed: evaluation dataset (here the Evol-Ins parquet file)
python test_inference.py -m /storage/models/LLaMA/EXL2/Phind-Codellama-34B-v2 -ed /storage/datasets/text/evol-instruct/nickrosh-evol-instruct-code-80k.parquet
```
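Perplexity is the exponential of the average negative log-likelihood the model assigns to each token of the evaluation text; lower is better. The sketch below shows only that final formula, assuming per-token natural-log probabilities have already been collected. It is not the code of test_inference.py and does not reproduce how that script splits the dataset into evaluation windows; perplexity is a hypothetical helper name.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) for natural-log token probs."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)
```

A model that gives every token probability 0.5 has perplexity 2, so the Evol-Ins scores of about 2.05 in the table correspond to a geometric-mean token probability of roughly 0.49.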

Quant (BPW-head bits)	PPL on Wiki	PPL on Evol-Ins	File size (GB)
2.55-h6	11.0310	2.4542	10.56
2.75-h6	9.7902	2.2888	11.33
3.8-h6	6.7293	2.0724	15.37
4.125-h6	6.6713	2.0617	16.65
4.4-h8	6.6487	2.0509	17.76
4.625-h6	6.6576	2.0459	18.58
5.0-h8	6.6379	2.0419	20.09
5.0-h8-ev	6.7785	2.0445	20.09
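One way to read the table is to pick the smallest quant whose perplexity stays within a chosen tolerance of the best result. The sketch below copies the numbers from the table; the tolerances in the usage note are arbitrary examples, and Row, TABLE and smallest_within are hypothetical names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    quant: str
    ppl_wiki: float
    ppl_evol: float
    size_gb: float


# Values copied from the table above.
TABLE = [
    Row("2.55-h6", 11.0310, 2.4542, 10.56),
    Row("2.75-h6", 9.7902, 2.2888, 11.33),
    Row("3.8-h6", 6.7293, 2.0724, 15.37),
    Row("4.125-h6", 6.6713, 2.0617, 16.65),
    Row("4.4-h8", 6.6487, 2.0509, 17.76),
    Row("4.625-h6", 6.6576, 2.0459, 18.58),
    Row("5.0-h8", 6.6379, 2.0419, 20.09),
    Row("5.0-h8-ev", 6.7785, 2.0445, 20.09),
]


def smallest_within(tolerance: float, column: str = "ppl_wiki") -> Row:
    """Smallest-file row whose PPL in `column` is <= best * (1 + tolerance)."""
    best = min(getattr(row, column) for row in TABLE)
    limit = best * (1 + tolerance)
    candidates = [row for row in TABLE if getattr(row, column) <= limit]
    # On equal sizes, min() keeps the first row in table order.
    return min(candidates, key=lambda row: row.size_gb)
```

With the Wiki column, a 1% tolerance selects 4.125-h6 and a 2% tolerance selects 3.8-h6; passing "ppl_evol" as the column gives the same two answers.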
