---
license: apache-2.0
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---

# GCLM — Global Convolutional Language Model

## Model Summary

**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.

Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling

This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling.

> GCLM is a transformer alternative — not a transformer replacement.
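
For intuition, the global branch can be sketched in a few lines of PyTorch. This is a minimal illustration under assumptions, not the actual GCLM code: the dense per-channel kernel `k` and the function name are placeholders for whatever kernel parameterization the real model uses.

```python
import torch

def causal_fft_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal global convolution via FFT (illustrative sketch).

    x: (batch, seq_len, dim) activations
    k: (seq_len, dim) learned per-channel kernel (assumed parameterization)

    Zero-padding to 2 * seq_len turns the FFT's circular convolution into
    a linear one, so position t only mixes positions <= t. Cost is
    O(L log L) in sequence length, versus O(L^2) for self-attention.
    """
    seq_len = x.shape[1]
    fft_len = 2 * seq_len
    x_f = torch.fft.rfft(x, n=fft_len, dim=1)   # FFT along the time axis
    k_f = torch.fft.rfft(k, n=fft_len, dim=0)   # FFT of the kernel
    y = torch.fft.irfft(x_f * k_f.unsqueeze(0), n=fft_len, dim=1)
    return y[:, :seq_len, :]                    # keep only the causal part
```

Because every output position mixes the entire prefix through `k`, a single such layer already has a global receptive field, which the local depthwise convolutions alone cannot provide.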

---

## Architecture Overview

- Token + learned positional embeddings
- Stacked convolutional blocks (sketched below):
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
  - Residual connections + LayerNorm
- Causal language modeling head

**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested at sequence lengths of 8k+ tokens)
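
As a concrete reading of the lists above, one block without the optional global branch might look like the following PyTorch sketch. All hyperparameters (`dim`, `kernel_size`, `mlp_ratio`) and layer names are illustrative assumptions, not the published configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCLMBlock(nn.Module):
    """One local convolutional block (illustrative; sizes are assumptions)."""

    def __init__(self, dim: int, kernel_size: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.pad = kernel_size - 1                 # left-pad only => causal
        self.depthwise = nn.Conv1d(dim, dim, kernel_size, groups=dim)
        self.pointwise = nn.Conv1d(dim, dim, 1)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        h = self.norm1(x).transpose(1, 2)          # (batch, dim, seq_len)
        h = F.pad(h, (self.pad, 0))                # pad the past, never the future
        h = self.pointwise(self.depthwise(h)).transpose(1, 2)
        x = x + h                                  # residual around the convs
        x = x + self.mlp(self.norm2(x))            # residual around the MLP
        return x
```

A full model would stack such blocks, add the FFT-based global convolution every *N* layers, and finish with a causal LM head tied to the embeddings.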

---

## Training Data

The model was trained on:
- **Skylion007/openwebtext**

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.

---

## Intended Use

**Primary use cases:**
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models

**Not intended for:**
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work

---

## Limitations

- This model is **research-grade**, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Quality and runtime characteristics differ significantly from those of transformer LMs
- No reinforcement learning or alignment tuning has been applied

---

## Ethical Considerations

GCLM was trained on publicly available web data and may reflect societal biases present in that data.

Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically

---

## License

This model is released under the **Apache License 2.0**.

You are free to:
- Use
- Modify
- Distribute
- Use commercially

Attribution and license preservation are required.  
Patent rights are explicitly granted under this license.

---

## Citation

If you use GCLM in your research, please cite or reference the project.


---

## Important

Model weights will not be uploaded to this repository until training is complete.
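
Once the weights land, and assuming the checkpoint ships in a `transformers`-compatible format with custom modeling code (this card does not confirm that), loading would follow the usual pattern for custom architectures. The repo id below is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/gclm"  # placeholder; substitute the actual repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code is needed because GCLM is not a stock transformers class
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Global convolutions can", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```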