---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
library_name: transformers
tags:
- code
language:
- en
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/Refact-1_6-base-GGUF

This is a quantized version of [smallcloudai/Refact-1_6-base](https://huggingface.co/smallcloudai/Refact-1_6-base), created using llama.cpp.

# Original Model Card

![image/png](https://cdn-uploads.huggingface.co/production/uploads/643a9dd0c5f633a7fa7e804a/HkB0QYV0BbmB3ktMugbZy.png)

# Refact-1.6B-base

Finally, the model we started training with our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉
The model might still have some problems, especially with the FIM format.

# It Works As a Chat

The primary application of this model is code completion (infill) in multiple programming languages,
but it also works quite well as a chat model.

# Example

Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix parts of the input and output:

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda"  # for GPU usage, or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(inputs, max_length=100, do_sample=True, temperature=0.2)
print("-" * 80)
print(tokenizer.decode(outputs[0]))
```
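
The special-token layout above can be wrapped in a small helper so prefix and suffix never end up in the wrong order; the `build_fim_prompt` name is ours, not part of the model's API:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt in the token order the model
    expects: prefix, then suffix, then <fim_middle> where generation begins."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = build_fim_prompt(
    prefix='def print_hello_world():\n    """',
    suffix='\n    print("Hello world!")',
)
```

The returned string can be passed to `tokenizer.encode` exactly like the literal prompt above.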

# Chat Format

The same model works as a chat model (experimental).

```python
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
```
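
For longer conversations, the same delimiters can be extended to multi-turn history. This is our own sketch, assuming each turn simply repeats the `<empty_output>USER` / `<empty_output>ASSISTANT` pattern; the helper name is not part of the model's API:

```python
def build_chat_prompt(system: str, turns: list[tuple[str, str]], query: str) -> str:
    """Format a system message, prior (user, assistant) turns, and a new query
    using the same <empty_output> delimiters as the single-turn template."""
    parts = [f"<empty_output>SYSTEM {system}"]
    for user_msg, assistant_msg in turns:
        parts.append(f"<empty_output>USER {user_msg}")
        parts.append(f"<empty_output>ASSISTANT {assistant_msg}")
    parts.append(f"<empty_output>USER {query}")
    parts.append("<empty_output>ASSISTANT")  # generation continues from here
    return "\n".join(parts)
```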

# Architecture

As described in more detail in the blog post, we used:

- [ALiBi](https://arxiv.org/abs/2108.12409)-based attention
- [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
- [Multi-Query Attention](https://arxiv.org/abs/1911.02150)

We also used LiON, flash attention, and early dropout. None of this is so exotic that you can't run the model yourself -- in fact you can, see the example above.
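
To illustrate the first of those choices, here is a minimal NumPy sketch of the linear bias that ALiBi adds to attention logits. It is illustrative only, not the model's actual implementation, and it assumes the head count is a power of two as in the paper:

```python
import numpy as np

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """ALiBi: each head penalizes attention logits in proportion to
    query-key distance, with a head-specific geometric slope."""
    # Slopes 2^(-8/n), 2^(-16/n), ... as in the ALiBi paper
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    # distance[i, j] = how far key j sits from query i
    pos = np.arange(seq_len)
    distance = np.abs(pos[None, :] - pos[:, None])
    # shape (n_heads, seq_len, seq_len); added to logits before softmax
    return -slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(n_heads=8, seq_len=4)
```

Because the bias depends only on relative positions, no learned positional embeddings are needed, which is what lets ALiBi models extrapolate to longer contexts.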

# Training

For the base model, we used our own dataset, which contains only code with permissive licenses, plus open text datasets.
Filtering is the key to the success of this model:

- We only used text in English
- Only topics related to computer science
- Applied heavy deduplication

The text-to-code proportion was 50:50, and the model was trained for 1.2T tokens.
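
The deduplication step above can be sketched as exact-match dedup over normalized documents. This is a simplification of what heavy deduplication pipelines actually do, which typically also includes near-duplicate detection such as MinHash:

```python
import hashlib

def dedup_exact(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing by a hash of
    whitespace-normalized content so trivial formatting differences collapse."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```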

We don't release the base model, because its Fill-in-the-Middle (FIM) capability likes to repeat itself too much, so
its practical use is limited. But if you still want it, write us a message on Discord.

# Limitations and Bias

The Refact-1.6B model was trained on text in English, but it has seen many more languages in
code comments. Its performance on non-English languages is lower.

# Model Stats

- **Architecture:** LLaMA-like model with multi-query attention
- **Objectives:** Fill-in-the-Middle, Chat
- **Context length:** 4096 tokens
- **Pretraining tokens:** 1.2T
- **Finetuning tokens:** 40B
- **Precision:** bfloat16
- **GPUs:** 64 NVIDIA A5000
- **Training time:** 28 days

# License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.

# Citation

If you use this model, please give a link to this page.