Update README.md
```bash
python sample_client.py
```

_Note: the first prompt may be slower due to a slight warmup time._

#### Install

```bash
git clone https://github.com/foundation-model-stack/fms-extras
cd fms-extras
# check out the code_llama_variant branch from PR #4
git fetch origin pull/4/head:code_llama_variant
git checkout code_llama_variant
pip install -e .
cd ..
pip install transformers==4.35.0 sentencepiece numpy
```

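To confirm the pinned install took effect, here is a small sanity-check sketch (the helper `check_pin` is illustrative, not part of fms-extras):

```python
# Illustrative helper (not part of fms-extras): report whether an installed
# distribution matches the version pinned in the install step above.
import importlib.metadata

def check_pin(dist: str, expected: str) -> str:
    try:
        ver = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        return f"{dist}: not installed"
    status = "ok" if ver == expected else f"expected {expected}"
    return f"{dist}: {ver} ({status})"

print(check_pin("transformers", "4.35.0"))
```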
#### Run Sample

##### batch_size=1 (compile + cudagraphs)

```bash
python fms-extras/scripts/paged_speculative_inference.py \
    --variant=13b_code \
    --model_path=/path/to/llama/CodeLlama-13b-Instruct-hf \
    --model_source=hf \
    --tokenizer=/path/to/llama/CodeLlama-13b-Instruct-hf \
    --speculator_path=ibm-fms/codellama-13b-accelerator \
    --speculator_source=hf \
    --top_k_tokens_per_head=4,3,2,2,2,2,2 \
    --prompt_type=code \
    --compile \
    --compile_mode=reduce-overhead
```

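The `--top_k_tokens_per_head` flag gives each speculator head a top-k budget. Assuming the candidate set is the cross product of per-head choices (an assumption based on the flag name, not verified against the fms-extras internals), the number of candidate continuations per step works out to:

```python
# Candidate count implied by --top_k_tokens_per_head=4,3,2,2,2,2,2, assuming
# each head's top-k choices multiply into a candidate tree (assumption from
# the flag name, not verified against the fms-extras source).
from math import prod

flag = "4,3,2,2,2,2,2"
ks = [int(k) for k in flag.split(",")]
print(f"{len(ks)} heads -> {prod(ks)} candidate continuations")
```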
##### batch_size=1 (compile)

```bash
python fms-extras/scripts/paged_speculative_inference.py \
    --variant=13b_code \
    --model_path=/path/to/llama/CodeLlama-13b-Instruct-hf \
    --model_source=hf \
    --tokenizer=/path/to/llama/CodeLlama-13b-Instruct-hf \
    --speculator_path=ibm-fms/codellama-13b-accelerator \
    --speculator_source=hf \
    --top_k_tokens_per_head=4,3,2,2,2,2,2 \
    --prompt_type=code \
    --compile
```

##### batch_size=4 (compile)

```bash
python fms-extras/scripts/paged_speculative_inference.py \
    --variant=13b_code \
    --model_path=/path/to/llama/CodeLlama-13b-Instruct-hf \
    --model_source=hf \
    --tokenizer=/path/to/llama/CodeLlama-13b-Instruct-hf \
    --speculator_path=ibm-fms/codellama-13b-accelerator \
    --speculator_source=hf \
    --batch_input \
    --top_k_tokens_per_head=4,3,2,2,2,2,2 \
    --prompt_type=code \
    --compile
```

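The three invocations above differ only in `--batch_input` and `--compile_mode`. A small sketch that assembles the shared argument list (flag values and placeholder paths copied verbatim from the commands above; the helper `build_cmd` is illustrative):

```python
# Assemble the paged_speculative_inference.py command line used above.
# Paths are the same placeholders as in the README commands; build_cmd is
# an illustrative helper, not part of fms-extras.
def build_cmd(batch_input: bool = False, cudagraphs: bool = False) -> list[str]:
    cmd = [
        "python", "fms-extras/scripts/paged_speculative_inference.py",
        "--variant=13b_code",
        "--model_path=/path/to/llama/CodeLlama-13b-Instruct-hf",
        "--model_source=hf",
        "--tokenizer=/path/to/llama/CodeLlama-13b-Instruct-hf",
        "--speculator_path=ibm-fms/codellama-13b-accelerator",
        "--speculator_source=hf",
    ]
    if batch_input:
        cmd.append("--batch_input")  # batch_size=4 variant
    cmd += [
        "--top_k_tokens_per_head=4,3,2,2,2,2,2",
        "--prompt_type=code",
        "--compile",
    ]
    if cudagraphs:
        cmd.append("--compile_mode=reduce-overhead")  # cudagraphs variant
    return cmd

print(" ".join(build_cmd(cudagraphs=True)))
```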
Sample client code can be found in [fms-extras PR #18](https://github.com/foundation-model-stack/fms-extras/pull/18).