---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- starcoder2-7b
---
Quantizations of https://huggingface.co/bigcode/starcoder2-7b

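The `.gguf` files in this repository are meant for llama.cpp-compatible runtimes rather than `transformers`. A minimal sketch using `llama-cpp-python`; the filename below is a placeholder for whichever quantization you actually download:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path: replace with the .gguf file you downloaded from this repo
llm = Llama(model_path="starcoder2-7b.Q4_K_M.gguf", n_ctx=4096)

output = llm("def print_hello_world():", max_tokens=64)
print(output["choices"][0]["text"])
```
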
# From original readme

### Generation
Here are some examples to get started with the model. You can find a script for fine-tuning in StarCoder2's [GitHub repository](https://github.com/bigcode-project/starcoder2).

First, make sure to install `transformers` from source:
```bash
pip install git+https://github.com/huggingface/transformers.git
```

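Note: the original readme predates stable StarCoder2 support, which is why it installs `transformers` from source; support has since shipped in regular releases (v4.39.0 onward). A quick sanity check, as a sketch:

```python
# Verify the installed transformers version; StarCoder2 support landed in v4.39.0
import transformers

print(transformers.__version__)
```
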
#### Running the model on CPU/GPU/multi GPU
* _Using full precision_
```python
# pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-7b"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```bash
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 29232.57 MB
```
* _Using `torch.bfloat16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder2-7b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# for fp16 use `torch_dtype=torch.float16` instead
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```bash
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 14616.29 MB
```

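The snippets above call `generate()` with its defaults, which stop after only a few new tokens. For longer completions you can pass generation arguments explicitly; a minimal sketch (the parameter values below are illustrative choices, not from the original readme):

```python
# Illustrative settings; values are assumptions, not from the original readme
outputs = model.generate(
    inputs,
    max_new_tokens=128,  # allow up to 128 generated tokens beyond the prompt
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.2,     # low temperature keeps code completions focused
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
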
#### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

checkpoint = "bigcode/starcoder2-7b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```bash
# load_in_8bit
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 7670.52 MB
# load_in_4bit
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 4197.64 MB
```
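
For the 4-bit variant mentioned in the comment above, the configuration can also be spelled out in full. A minimal sketch; the NF4 and compute-dtype choices are common defaults, not something the original readme prescribes:

```python
# pip install bitsandbytes accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit config; quant type and compute dtype are assumptions
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for on-the-fly compute
)

checkpoint = "bigcode/starcoder2-7b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)
```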