| | --- |
| | language: |
| | - en |
| | - fr |
| | - es |
| | - pt |
| | tags: |
| | - falcon3 |
| | --- |
| | |
| | # Falcon3-7B-Instruct |
| |
|
| | The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. |
| |
|
| | This repository contains **Falcon3-7B-Instruct**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks. |
| | Falcon3-7B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K. |
| |
|
| | ## Model Details |
| | - Architecture |
| | - Transformer-based, causal decoder-only architecture |
| | - 28 decoder blocks |
| | - Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads |
| | - Wider head dimension: 256 |
| | - High RoPE base value to support long-context understanding: 1000042 |
| | - 32k context length |
| | - 131k vocab size |
| | - Pretrained on 14 Gigatokens of data comprising web, code, STEM, high-quality, and multilingual sources, using 2048 H100 GPU chips |
| | - Post-trained on 1.2 million samples of STEM, conversation, code, safety, and function-calling data |
| | - Supports EN, FR, ES, PT |
| | - Developed by [Technology Innovation Institute](https://www.tii.ae) |
| | - License: TII Falcon-LLM License 2.0 |
| | - Model Release Date: December 2024 |
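
To give a feel for why the 4 KV heads matter at long context, here is a rough sketch that estimates the key/value cache size from the figures listed above. The 2-byte cache entries (bf16/fp16), the reading of "32k" as 32 * 1024 tokens, and the batch size of 1 are assumptions for illustration, not statements from this model card. The 12-KV-head variant is a hypothetical comparison point, not a released configuration.

```python
# Back-of-the-envelope KV-cache estimate from the listed architecture figures.
# Assumptions (not from the model card): 2 bytes per cached value (bf16/fp16),
# "32k" interpreted as 32 * 1024 tokens, and a single sequence (batch size 1).

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The leading factor 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

NUM_LAYERS = 28        # decoder blocks
NUM_QUERY_HEADS = 12   # query heads
NUM_KV_HEADS = 4       # KV heads (grouped query attention)
HEAD_DIM = 256         # head dimension
CONTEXT_LEN = 32 * 1024

gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT_LEN)
# Hypothetical baseline: one KV head per query head (plain multi-head attention).
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, CONTEXT_LEN)

print(f"GQA cache at full context: {gqa_bytes / 2**30:.2f} GiB")
print(f"Hypothetical 12-KV-head cache: {mha_bytes / 2**30:.2f} GiB")
print(f"Reduction factor: {mha_bytes // gqa_bytes}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads; actual memory use depends on the runtime, cache dtype, and batch size.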
| |
|
| |
|
| | ## Getting started |
| |
|
| | <details> |
| | <summary> Click to expand </summary> |
| |
|
| | ```python |
| | from transformers import AutoTokenizer, AutoModelForCausalLM |
| | |
| | model_name = "tiiuae/Falcon3-7B-Instruct" |
| | |
| | # Load the weights in their stored dtype and place them on the available devices. |
| | model = AutoModelForCausalLM.from_pretrained( |
| |     model_name, |
| |     torch_dtype="auto", |
| |     device_map="auto" |
| | ) |
| | tokenizer = AutoTokenizer.from_pretrained(model_name) |
| | |
| | prompt = "How many hours in one day?" |
| | messages = [ |
| |     {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."}, |
| |     {"role": "user", "content": prompt} |
| | ] |
| | # Render the conversation with the model's chat template and append the assistant turn marker. |
| | text = tokenizer.apply_chat_template( |
| |     messages, |
| |     tokenize=False, |
| |     add_generation_prompt=True |
| | ) |
| | model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
| | |
| | generated_ids = model.generate( |
| |     **model_inputs, |
| |     max_new_tokens=1024 |
| | ) |
| | # generate() returns prompt + completion; keep only the newly generated tokens. |
| | generated_ids = [ |
| |     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
| | ] |
| | |
| | response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] |
| | print(response) |
| | ``` |
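
The slicing step near the end of the snippet above can be easy to miss: for decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the prompt prefix is cut off before decoding. The sketch below reproduces that step with plain Python lists so it runs without downloading the model. The helper name `strip_prompt` and the token ids are made up for illustration.

```python
# Toy illustration of trimming the prompt prefix from generated sequences.
# `strip_prompt` is a hypothetical helper; the ids below are invented and do
# not correspond to real Falcon3 tokens.

def strip_prompt(input_ids, output_ids):
    """Return only the tokens that were generated after the prompt."""
    return output_ids[len(input_ids):]

prompt_batch = [[101, 7, 8, 9]]                  # one prompt of 4 tokens
generated_batch = [[101, 7, 8, 9, 42, 43, 2]]    # same prompt plus 3 new tokens

completions = [
    strip_prompt(prompt_ids, full_ids)
    for prompt_ids, full_ids in zip(prompt_batch, generated_batch)
]
print(completions)  # [[42, 43, 2]]
```

The same slicing works on the tensors in the snippet above, since each row of the output starts with the corresponding row of `model_inputs.input_ids`.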
| |
|
| | </details> |
| |
|
| | <br> |
| |
|
| | ## Benchmarks |
| | The following table reports results from our internal benchmarking pipeline: |
| |
|
| | <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;"> |
| | <colgroup> |
| | <col style="width: 10%;"> |
| | <col style="width: 10%;"> |
| | <col style="width: 7%;"> |
| | <col style="width: 7%;"> |
| | <col style="width: 7%;"> |
| | <col style="width: 7%;"> |
| | <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;"> |
| | </colgroup> |
| | <thead> |
| | <tr> |
| | <th>Category</th> |
| | <th>Benchmark</th> |
| | <th>Llama-3.1-8B-Instruct</th> |
| | <th>Qwen2-7B-Instruct</th> |
| | <th>Qwen2.5-7B-Instruct</th> |
| | <th>gemma-2-9b-it</th> |
| | <th>Falcon3-7B-Instruct</th> |
| | </tr> |
| | </thead> |
| | <tbody> |
| | <tr> |
| | <td rowspan="3">General</td> |
| | <td>MMLU (5-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>MMLU-Pro (5-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>IFEval</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td rowspan="2">Math</td> |
| | <td>GSM8K (5-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>MATH (4-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td rowspan="4">Reasoning</td> |
| | <td>ARC Challenge (25-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>GPQA (0-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>MuSR (0-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>BBH (3-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td rowspan="4">Commonsense Understanding</td> |
| | <td>PIQA (0-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>SciQ (0-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>Winogrande (0-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | <tr> |
| | <td>OpenbookQA (0-shot)</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | <td>-</td> |
| | </tr> |
| | </tbody> |
| | </table> |
| | |
| |
|
| | ## Citation |
| | If the Falcon3 family was helpful to your work, feel free to cite us. |
| |
|
| | ``` |
| | @misc{Falcon3, |
| | title = {The Falcon 3 family of Open Models}, |
| | author = {TII Team}, |
| | month = {December}, |
| | year = {2024} |
| | } |
| | ``` |
| |
|