Text Generation
Transformers
Safetensors
English
gla
nielsr (HF Staff) committed on
Commit 7b2d417 · verified · 1 parent: a024d87

Improve model card: Add description, links, usage, and update metadata


This PR significantly improves the model card for `fla-hub/gla-1.3B-100B` by:

- Adding a detailed description of the model and its context, derived from the paper's abstract.
- Including a direct link to the research paper: [A Systematic Analysis of Hybrid Linear Attention](https://huggingface.co/papers/2507.06457).
- Providing a link to the likely official code repository: [https://github.com/FLAG-CMU/fla](https://github.com/FLAG-CMU/fla).
- Adding a practical Python code snippet for text generation using the Hugging Face `transformers` library, which significantly enhances usability.
- Updating the metadata:
  - Changed `library_name` from `fla` to `transformers` to enable the inference widget and better integrate with the Hugging Face ecosystem.
  - Added `pipeline_tag: text-generation` as a top-level field for improved discoverability and categorization on the Hub.

These additions will make the model more accessible and useful for the community.
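The metadata changes above all live in the README's YAML front matter. As a sanity check, the updated fields can be verified programmatically; the sketch below is purely illustrative (it is not part of any Hugging Face tooling) and uses a minimal hand-rolled parser for the flat `key: value` lines rather than a full YAML library:

```python
# Front matter as introduced by this PR (copied from the updated README.md).
front_matter = """\
datasets:
- cerebras/SlimPajama-627B
language:
- en
library_name: transformers
license: mit
pipeline_tag: text-generation
tags:
- text-generation
- gla
"""

# Parse only the flat `key: value` lines; list items ("- ...") belong to the
# preceding key and are skipped in this simplified check.
meta = {}
for line in front_matter.splitlines():
    if line.startswith("- "):
        continue
    key, _, value = line.partition(":")
    if value.strip():
        meta[key.strip()] = value.strip()

assert meta["library_name"] == "transformers"    # enables the inference widget
assert meta["pipeline_tag"] == "text-generation" # top-level field for discoverability
print(meta)
```

A real model-card pipeline would parse the front matter with a proper YAML library; the point here is just which keys and values the PR sets.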

Files changed (1)
  1. README.md +55 -4
README.md CHANGED
@@ -1,11 +1,62 @@
 ---
+datasets:
+- cerebras/SlimPajama-627B
 language:
 - en
+library_name: transformers
+license: mit
+pipeline_tag: text-generation
 tags:
 - text-generation
 - gla
-license: mit
-datasets:
-- cerebras/SlimPajama-627B
-library_name: fla
 ---
+
+# GLA 1.3B-100B: A Hybrid Linear Attention Model
+
+This repository contains the `gla-1.3B-100B` model, a 1.3B parameter variant trained on 100B tokens, which was presented in the paper [A Systematic Analysis of Hybrid Linear Attention](https://huggingface.co/papers/2507.06457).
+
+Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms. However, linear models often suffer from limited recall performance, leading to hybrid architectures that combine linear and full attention layers. This paper systematically evaluates various linear attention models across generations—vector recurrences to advanced gating mechanisms—both standalone and hybridized. The `gla-1.3B-100B` model is one of 72 models trained and open-sourced to enable this comprehensive analysis. The research highlights that superior standalone linear models do not necessarily excel in hybrids, and emphasizes selective gating, hierarchical recurrence, and controlled forgetting as critical for effective hybrid models. Architectures such as HGRN-2 or GatedDeltaNet with a linear-to-full ratio between 3:1 and 6:1 are recommended for achieving Transformer-level recall efficiently.
+
+## Usage
+
+This model can be easily loaded and used for text generation tasks with the Hugging Face `transformers` library:
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# Load the tokenizer and model
+model_id = "fla-hub/gla-1.3B-100B"
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+
+# Example for text generation
+prompt = "Hello, my name is"
+inputs = tokenizer(prompt, return_tensors="pt")
+
+# Generate text
+outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50, top_p=0.95, temperature=0.7)
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+print(generated_text)
+```
+
+## Paper and Citation
+
+If you find this work useful, please consider citing the original paper:
+
+[A Systematic Analysis of Hybrid Linear Attention](https://huggingface.co/papers/2507.06457)
+
+```bibtex
+@article{li2025systematic,
+  title={A Systematic Analysis of Hybrid Linear Attention},
+  author={Li, Tianhong and Deng, Mingyang and He, Kaiming},
+  journal={arXiv preprint arXiv:2507.06457},
+  year={2025},
+}
+```
+
+## Code
+
+The official codebase for the models and research, including training scripts and other checkpoints, can be found on GitHub:
+
+[https://github.com/FLAG-CMU/fla](https://github.com/FLAG-CMU/fla)
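The `top_k` and `top_p` arguments in the usage snippet above control how the next-token distribution is truncated before sampling. The following self-contained sketch illustrates that filtering on a toy distribution; it is an illustration of the general top-k / nucleus-sampling idea, not the actual `transformers` implementation:

```python
import math

def top_k_top_p_filter(logits, top_k=50, top_p=0.95):
    """Return the (token, prob) pairs kept after top-k then top-p (nucleus)
    filtering, renormalized to sum to 1. Illustrative only."""
    # Softmax over the raw logits (subtract the max for numerical stability).
    m = max(logits.values())
    exps = {tok: math.exp(l - m) for tok, l in logits.items()}
    z = sum(exps.values())
    probs = {tok: e / z for tok, e in exps.items()}

    # top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalize the surviving probabilities before sampling.
    z = sum(p for _, p in kept)
    return {tok: p / z for tok, p in kept}

# Toy vocabulary: the unlikely tail is dropped by the top_p cutoff.
logits = {"the": 5.0, "a": 4.0, "cat": 1.0, "zzz": -3.0}
print(top_k_top_p_filter(logits, top_k=3, top_p=0.95))
```

Lower `top_p` or `top_k` makes generation more conservative (fewer candidate tokens survive), while `temperature` rescales the logits before this filtering step.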