Intel/phi-2-int4-inc

Text Generation · text-generation-inference · 4-bit precision · intel/auto-round
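
The tags above mark this checkpoint as 4-bit precision, and the README changed in this commit describes it as an int4 model with group_size128 and sym quantization. The sketch below shows what that storage layout means, under several assumptions. It uses plain round-to-nearest as a simpler stand-in, because AutoRound tunes the rounding decisions instead. It assumes a signed code range of [-8, 7], one scale per group of 128 weights computed as max|w| / 7, and 16-bit scales for the size estimate. None of these details are read from the checkpoint itself.

```python
from typing import List, Tuple

# Assumed signed 4-bit code range; with the scale below, codes land in [-7, 7].
QMIN, QMAX = -8, 7


def quantize_group_sym(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group (no zero point)."""
    max_abs = max(abs(w) for w in weights)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Python's round() sends exact halves to the even integer.
    codes = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return codes, scale


def quantize_sym_int4(
    weights: List[float], group_size: int = 128
) -> List[Tuple[List[int], float]]:
    """Split a flat weight row into groups of `group_size` and quantize each one."""
    return [
        quantize_group_sym(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups: List[Tuple[List[int], float]]) -> List[float]:
    """Reconstruct approximate weights as code * per-group scale."""
    return [code * scale for codes, scale in groups for code in codes]


def bits_per_weight(group_size: int = 128, scale_bits: int = 16) -> float:
    """Storage per weight: a 4-bit code plus one amortized scale (assumed 16-bit)."""
    return 4 + scale_bits / group_size


if __name__ == "__main__":
    row = [((i * 37) % 101 - 50) / 50.0 for i in range(256)]
    groups = quantize_sym_int4(row, group_size=128)
    approx = dequantize(groups)
    worst = max(abs(w - a) for w, a in zip(row, approx))
    print(len(groups), worst, bits_per_weight())
```

Because the scale maps the largest magnitude in each group exactly to code 7, no value is ever clamped. Each reconstructed weight is therefore within half a scale step of the original. With 16-bit scales and group size 128, the amortized cost is 4.125 bits per weight.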

n1ck-guo committed on Oct 22, 2024

Commit 0b780a4 · verified · 1 Parent(s): 8c36421

Update README.md

Files changed (1)

README.md +13 -12

README.md CHANGED

@@ -9,7 +9,7 @@ This model is an int4 model with group_size128 and sym quantization of [microsof
-### Use the model
+### How To Use
 ### INT4 Inference with ITREX on CPU
 Install the latest [intel-extension-for-transformers](
 https://github.com/intel/intel-extension-for-transformers)
@@ -36,15 +36,19 @@ She is curious and brave and
 ```
-### INT4 Inference with AutoGPTQ
-pip install auto-gptq
+### INT4 Inference
 ```python
+##pip install auto-round
 from transformers import AutoModelForCausalLM, AutoTokenizer
 quantized_model_dir = "Intel/phi-2-int4-inc"
-tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, revision="5973e3ad50beaefb937345d693639ce92ca836f9")
-model = AutoModelForCausalLM.from_pretrained(quantized_model_dir, device_map="auto", trust_remote_code=True, revision="5973e3ad50beaefb937345d693639ce92ca836f9")
+tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
+model = AutoModelForCausalLM.from_pretrained(quantized_model_dir,
+                                             device_map="auto",
+                                             trust_remote_code=True,
+                                             ## revision="5973e3a" ##AutoGPTQ format
+                                            )
 text = "There is a girl who likes adventure,"
 inputs = tokenizer(text, return_tensors="pt", return_attention_mask=False).to(model.device)
 outputs = model.generate(**inputs, max_new_tokens=50)
@@ -68,8 +72,7 @@ She is curious and brave and
 pip install lm-eval==0.4.2
 ```bash
-cd auto-round
-python3 -m auto_round --eval --model Intel/phi-2-int4-inc --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu --batch_size 16
+auto-round --eval --model Intel/phi-2-int4-inc --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu --batch_size 16
 ```
@@ -88,16 +91,14 @@ python3 -m auto_round --eval --model Intel/phi-2-int4-inc --device cuda:0 --task
 | arc_easy       | 0.8001 | 0.8013 |
 | arc_challenge  | 0.5282 | 0.5137 |
-##
-### generate the model
+### Generate the model
 Here is the sample command to generate the model
 ```bash
-cd auto-round
-pip install -r requirements.txt
-python3 -m auto_round \
+auto-round \
 --model  microsoft/phi-2 \
 --device 0 \
 --group_size 128 \
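
The updated inference snippet in this commit drops the pinned revision and leaves `## revision="5973e3a" ##AutoGPTQ format` as a comment. The removed lines pinned both the tokenizer and the model to the full hash `5973e3ad50beaefb937345d693639ce92ca836f9`, of which `5973e3a` is the short form. A minimal sketch that makes the format switch explicit instead of relying on a commented-out argument follows. The helper names `build_model_kwargs` and `build_tokenizer_kwargs` are hypothetical. The only `from_pretrained` arguments used are the ones already present in the snippet (`device_map`, `trust_remote_code`, `revision`).

```python
from typing import Any, Dict

QUANTIZED_MODEL_DIR = "Intel/phi-2-int4-inc"
# Full commit hash from the removed AutoGPTQ lines; "5973e3a" is its short form.
AUTOGPTQ_REVISION = "5973e3ad50beaefb937345d693639ce92ca836f9"


def build_model_kwargs(autogptq_format: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for AutoModelForCausalLM.from_pretrained."""
    kwargs: Dict[str, Any] = {"device_map": "auto", "trust_remote_code": True}
    if autogptq_format:
        # Pin the revision that the README labels as AutoGPTQ format.
        kwargs["revision"] = AUTOGPTQ_REVISION
    return kwargs


def build_tokenizer_kwargs(autogptq_format: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: the removed lines pinned the tokenizer to the same revision."""
    return {"revision": AUTOGPTQ_REVISION} if autogptq_format else {}


# Usage (not executed here; needs transformers, the matching quantization
# backend, and network access to the Hugging Face Hub):
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained(
#       QUANTIZED_MODEL_DIR, **build_tokenizer_kwargs(autogptq_format=True))
#   model = AutoModelForCausalLM.from_pretrained(
#       QUANTIZED_MODEL_DIR, **build_model_kwargs(autogptq_format=True))
```

Leaving `revision` out of the default path matches the updated snippet. Its `##pip install auto-round` comment suggests the default checkpoint targets auto-round, while the removed section installed `auto-gptq` for the pinned revision.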