Improve model card: Add abstract and full paper title to link

#2
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +12 -11
README.md CHANGED
@@ -1,15 +1,15 @@
 ---
-language: en
-license: apache-2.0
-library_name: transformers
-tags:
-- tptt
-- peft
-- trust_remote_code
-pipeline_tag: text-generation
 base_model: allenai/OLMo-1B-hf
 datasets:
 - yahma/alpaca-cleaned
+language: en
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
+tags:
+- tptt
+- peft
+- trust_remote_code
 ---
 
 # Titanesque-OLMo-1B-hf
@@ -34,8 +34,10 @@ datasets:
 
 Titanesque version of `allenai/OLMo-1B-hf` with parallel linearized attention (TPTT 😊) and PEFT.
 
-The architecture was presented in the paper [TPTT](https://huggingface.co/papers/2506.17671).
+The architecture was presented in the paper [TPTT: Transforming Pretrained Transformers into Titans](https://huggingface.co/papers/2506.17671).
 
+## Abstract
+Transformer-based large language models (LLMs) have achieved strong performance across many natural language processing tasks. Nonetheless, their quadratic computational and memory requirements, particularly in self-attention layers, pose challenges for efficient inference on long contexts and for deployment in resource-limited environments. We present TPTT (Transforming Pretrained Transformers into Titans), a framework designed to augment pretrained Transformers with linearized attention (LiZA) and internal memory gating via Memory as Gate (MaG), applied without full retraining. TPTT supports parameter-efficient fine-tuning (LoRA) and integrates with standard toolkits such as Hugging Face Transformers. We evaluated TPTT on several pretrained models, including Llama-1B, OlMoE-1B-7B, Qwen2.5-1.5B, Gemma3-270m, OpenELM-1.3B, and Mistral-7B, in order to assess applicability across architectures of different scales. Experiments on models with approximately 1 billion parameters, evaluated primarily on the MMLU benchmark, suggest potential improvements in both efficiency and accuracy compared to baseline models. For example, Titans-Llama-1B exhibited up to a 20% relative increase in Exact Match scores in one-shot evaluation. An additional finding is that it is possible to convert a quadratic-attention model into a purely linear-attention model using the DeltaProduct mechanism. All training runs were carried out with modest computational resources. These preliminary findings indicate that TPTT may help adapt pretrained LLMs for long-context tasks with limited overhead. Further studies on larger models and a broader set of benchmarks will be necessary to evaluate the generality and robustness of the framework. Code is available at this https URL. Python package at this https URL.
 
 ## Model list
 
@@ -45,7 +47,7 @@ Classic model parameter with LiZA injection :
 |-------------------------------|----------------------|------------|------------|----------------|---------------|------|-------------------------------------------------------|
 | delta_rule | 8192 (default) | 0.5 | False | 64 | False | Yes | Parallel linearized attention with delta_rule operator|
 | delta_rule_gelu | 8192 (default) | 0.5 | False | 64 | False | Yes | Non-linear operator with gelu activation |
-| delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with derivative trick |
+| delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with derivative trick |
 | delta_product_r | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with rotative trick |
 | delta_product_c | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with combined trick |
 
@@ -73,5 +75,4 @@ print(tokenizer.decode(outputs, skip_special_tokens=True))
 
 If you use TPTT in your academic work, please cite [Furfaro](https://huggingface.co/ffurfaro). For questions or support, please open an issue on the [GitHub repository](https://github.com/fabienfrfr/tptt) or contact the maintainer.
 
-
 ---
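
For context on the `delta_rule` operator named in the card's model list: the sketch below is a minimal NumPy illustration of the *generic* delta-rule linear-attention recurrence (a DeltaNet-style update), not TPTT's actual kernels or its LiZA/MaG machinery — the function name, shapes, and `beta` gate here are illustrative assumptions only.

```python
import numpy as np

def delta_rule_attention(q, k, v, beta):
    """Illustrative sequential delta-rule recurrence (not TPTT's implementation).

    Per token t, the (d x d) associative state is corrected toward the
    new (k_t, v_t) pair:
        S_t = S_{t-1} - beta_t * (S_{t-1} k_t) k_t^T + beta_t * v_t k_t^T
    and the output is a linear read-out:
        o_t = S_t q_t
    so cost is linear in sequence length, unlike softmax attention.
    """
    T, d = q.shape
    S = np.zeros((d, d))          # associative memory state
    out = np.empty_like(v)
    for t in range(T):
        kt, vt, qt = k[t], v[t], q[t]
        # delta rule: erase what was stored under k_t, then write v_t
        S = S - beta[t] * np.outer(S @ kt, kt) + beta[t] * np.outer(vt, kt)
        out[t] = S @ qt
    return out
```

With `beta = 1` and orthonormal keys, querying with `k_i` recovers `v_i` exactly, which is the "memory with overwrite" behavior that distinguishes the delta rule from plain additive linear attention.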