---
license: other
license_name: fair-noncommercial-research-license
license_link: https://huggingface.co/facebook/blt-1b/blob/main/LICENSE
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  I accept the terms and conditions: checkbox
  geo: ip_location
language:
- en
tags:
- facebook
- meta-pytorch
- blt
---

# Byte Latent Transformer (BLT)

This repository contains the model weights for our paper: "Byte Latent Transformer: Patches Scale Better Than Tokens".

- [Paper Link](https://dl.fbaipublicfiles.com/blt/BLT__Patches_Scale_Better_Than_Tokens.pdf)
- [HF Paper Link](https://huggingface.co/papers/2412.09871)

## Abstract

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that,
for the first time, matches tokenization-based LLM performance at scale, with significant
improvements in inference efficiency and robustness. BLT encodes bytes into dynamically
sized patches, which serve as the primary units of computation. Patches are segmented
dynamically based on the entropy of the next byte, allocating more compute and model
capacity where there is more data complexity. The BLT architecture includes new attention
mechanisms to maximize the information flow between byte and patch hidden representations,
and a new type of byte-sequence memory. We present the first scaling study of byte-level
models up to 8B parameters and 8T training bytes, showing for the first time that we can
train a model end-to-end at scale from bytes with no tokenization or other preprocessing.
Scaling trends reveal training and inference efficiency benefits from dynamically selecting
very long patches on average, along with qualitative improvements in reasoning and long-tail
generalization from modeling byte sequences.
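
As a rough illustration of the entropy-based patching described above, the sketch below starts a new patch whenever a next-byte model's uncertainty crosses a global threshold. The `next_byte_probs` callable and `threshold` value here are hypothetical placeholders, not the BLT entropy model itself; see the paper and repository for the actual patcher.

```python
import math
from typing import Callable, List, Sequence

def entropy_patch_boundaries(
    byte_seq: Sequence[int],
    next_byte_probs: Callable[[Sequence[int]], List[float]],
    threshold: float,
) -> List[int]:
    """Return the start indices of patches: a new patch begins whenever
    the entropy of the next-byte distribution exceeds `threshold`."""
    boundaries = [0]  # the first byte always starts a patch
    for i in range(1, len(byte_seq)):
        # Distribution over the 256 possible values of byte i, given the prefix.
        probs = next_byte_probs(byte_seq[:i])
        # Shannon entropy in nats: H = -sum_x p(x) * log p(x)
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        if entropy > threshold:
            boundaries.append(i)  # high uncertainty -> start a new patch here
    return boundaries

# Toy usage with a uniform stub model (BLT uses a small byte-level LM instead):
uniform = lambda prefix: [1.0 / 256] * 256
print(entropy_patch_boundaries(b"hello world", uniform, threshold=6.0))
```

High-entropy positions (e.g. the start of a hard-to-predict word) become patch boundaries, so more latent-transformer compute is spent where the data is complex, while predictable runs of bytes are grouped into long, cheap patches.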

To run the model, see the README here: https://github.com/facebookresearch/blt
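
If you just want the checkpoint files locally before following that README, one option is the standard `huggingface_hub` client, shown here for the 1B weights:

```python
from huggingface_hub import snapshot_download

# Download the BLT 1B weights to the local HF cache. The repo is gated,
# so accept the license on the model page and authenticate first
# (e.g. `huggingface-cli login`).
local_dir = snapshot_download(repo_id="facebook/blt-1b")
print(local_dir)
```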

## Links

- Code: https://github.com/facebookresearch/blt
- BLT 1B Weights: https://huggingface.co/facebook/blt-1b
- BLT 7B Weights: https://huggingface.co/facebook/blt-7b
- BLT Weight Collection: https://huggingface.co/collections/facebook/blt-6801263d4ac1704702a192a6