---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
library_name: transformers
tags:
- code
language:
- en
---

[QuantFactory](https://hf.co/QuantFactory)

# QuantFactory/Refact-1_6-base-GGUF

This is a quantized version of [smallcloudai/Refact-1_6-base](https://huggingface.co/smallcloudai/Refact-1_6-base), created using llama.cpp.

# Original Model Card

# Refact-1.6B-base

Finally, the model we started training with our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉
The model may still have some problems, especially with the FIM format.


# It Works As a Chat

The primary application of this model is code completion (infill) in multiple programming languages. But it also works quite well as a chat.


# Example

Fill-in-the-middle uses special tokens to identify the prefix, middle, and suffix parts of the input and output:

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# <fim_prefix> and <fim_suffix> mark the known code; the model generates the middle.
prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# do_sample=True is needed for the temperature setting to take effect
outputs = model.generate(inputs, max_length=100, temperature=0.2, do_sample=True)
print("-" * 80)
print(tokenizer.decode(outputs[0]))
```
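
The decoded string still contains the FIM control tokens. A minimal way to extract just the generated middle (a sketch, assuming the token spellings used in the prompt above):

```python
# Everything after <fim_middle> is the generated infill; trim the EOS token if present.
completion = tokenizer.decode(outputs[0])
middle = completion.split("<fim_middle>")[-1]
if tokenizer.eos_token:
    middle = middle.replace(tokenizer.eos_token, "")
print(middle)
```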

# Chat Format

The same model works as a chat model (experimental).

```python
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
```
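
To run the chat prompt, you can reuse the tokenizer and model loaded in the completion example above; a minimal sketch (the sampling settings here are illustrative, not recommended values):

```python
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.2, do_sample=True)
# The model's reply is everything generated after the ASSISTANT marker.
print(tokenizer.decode(outputs[0]))
```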

# Architecture

As described in more detail in the blog post, we used:

- [ALiBi](https://arxiv.org/abs/2108.12409)-based attention
- [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
- [Multi-Query Attention](https://arxiv.org/abs/1911.02150)

We also used LiON, flash attention, and early dropout. None of this is so exotic that you can't run the model; in fact you can, as the examples above show. For the curious, a sketch of the ALiBi bias follows.
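
Here is a minimal, self-contained illustration of how the ALiBi bias can be computed (our own sketch of the published method, not the model's internal code):

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """ALiBi replaces positional embeddings with a linear, head-specific
    penalty on attention scores that grows with query-key distance."""
    # Geometric slope schedule from the ALiBi paper (assumes num_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])
    # distance[i, j] = j - i: zero on the diagonal, negative for keys in the past.
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]
    # Shape (num_heads, seq_len, seq_len); added to the attention logits before
    # softmax (the causal mask is applied separately).
    return slopes[:, None, None] * distance[None, :, :]
```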


# Training

For the base model, we used our own dataset, which contains only code with permissive licenses, plus open text datasets.
Filtering was the key to this model's success:

- We used only text in English
- We kept only topics related to computer science
- We applied heavy deduplication

The text-to-code ratio was 50:50, and the model was trained for 1.2T tokens.

We don't release the base model, because its Fill-in-the-Middle (FIM) capability repeats itself too much, so its practical use is limited. But if you still want it, write us a message on Discord.


# Limitations and Bias

The Refact-1.6B model was trained on text in English, though it has seen many more languages in code comments. Its performance on non-English languages is certainly lower.


# Model Stats

- **Architecture:** LLaMA-like model with multi-query attention
- **Objectives:** Fill-in-the-Middle, Chat
- **Tokens context:** 4096
- **Pretraining tokens:** 1.2T
- **Finetuning tokens:** 40B
- **Precision:** bfloat16
- **GPUs:** 64 NVIDIA A5000
- **Training time:** 28 days


# License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.


# Citation

If you use this model, please link to this page.