# 🌌 Geometric LLM (GLLM): The $S^3$ One-Pass Learning Architecture
Welcome to **GLLM** (Geometric Large Language Model), a departure from standard flat-space Transformers.
Standard LLMs are prone to catastrophic forgetting when fine-tuned, and they rely on compute-intensive backpropagation to absorb new information. **GLLM addresses this by combining a traditional Transformer with a non-von Neumann $S^3$ Geometric Genome.** It projects language onto the manifold of unit quaternions, so it can "memorize" and retrieve new datasets in a single forward pass, in the spirit of fast hippocampal memory.
### 🚀 Try it instantly in Google Colab:
**[Train, Inject Data, and Chat in Colab Here](https://colab.research.google.com/drive/1kcEmK9xIvx55ScebU1jVGVbt9-nyVdaz?usp=sharing)**
---
## 🧠 Why is GLLM Different?
1. **Topological Memory (The $S^3$ Genome):** Standard attention scores tokens with flat-space dot products, which can wash out context. GLLM instead computes the *geodesic distance ($\sigma$)* between unit quaternions on the 3-sphere $S^3$ (the unit sphere in 4D quaternion space).
2. **Instant One-Pass Learning:** Want to teach the model a new book, a private dataset, or new code? You **do not need to train it**. Just pass the text through the model once. It writes the semantic carriers directly into the $S^3$ Genome mesh.
3. **Zero Catastrophic Forgetting:** Because memories are stored as topological anchors on a sphere, learning a new task does not overwrite the weights of the previous task.
4. **BKT Consolidation:** The mesh self-organizes. As memories become too crowded, it uses a Berezinskii-Kosterlitz-Thouless (BKT) phase threshold to automatically merge identical concepts, analogous to memory consolidation during biological sleep.
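
To make the geometry in the list above concrete, here is a minimal sketch of a geodesic distance on $S^3$ and a threshold-based merge. It is an illustration under stated assumptions, not GLLM's actual code: the README does not give the exact formula for $\sigma$, so the sketch uses the standard great-circle angle $\arccos\langle q_1, q_2\rangle$ between unit quaternions. The names `normalize`, `geodesic_distance`, and `consolidate`, and the fixed `threshold` value, are hypothetical; the greedy merge stands in for BKT consolidation and does not compute a BKT threshold.

```python
import math

def normalize(q):
    """Scale a 4-vector to unit length so it lies on S^3."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero quaternion")
    return tuple(c / norm for c in q)

def geodesic_distance(q1, q2):
    """Great-circle angle in radians between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q1, q2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot)))

def consolidate(anchors, threshold):
    """Greedily merge anchors closer than `threshold` (a hypothetical
    stand-in for BKT consolidation, not the author's algorithm)."""
    merged = []
    for q in anchors:
        for idx, m in enumerate(merged):
            if geodesic_distance(q, m) < threshold:
                # Replace the stored anchor with the renormalized midpoint.
                merged[idx] = normalize(tuple(a + b for a, b in zip(q, m)))
                break
        else:
            merged.append(q)
    return merged

identity = normalize((1.0, 0.0, 0.0, 0.0))
near_identity = normalize((1.0, 0.01, 0.0, 0.0))
pure_i = normalize((0.0, 1.0, 0.0, 0.0))

print(geodesic_distance(identity, identity))  # 0.0
print(geodesic_distance(identity, pure_i))    # 1.5707963267948966 (pi / 2)
print(len(consolidate([identity, near_identity, pure_i], threshold=0.1)))  # 2
```

The two nearly identical anchors collapse into one, while the orthogonal anchor stays separate. The midpoint merge assumes a small threshold, since antipodal points have no well-defined midpoint.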
---
## 🛠️ How to use this repository
You can load the pre-trained base model, inject your own data instantly, and chat with it.
### 1. Load the Model
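```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# NOTE: HybridGeometricLLM is not defined in this snippet. Define or import it
# before running (for example, from the Colab notebook linked above).

# Load the GPT-2 tokenizer and initialize GLLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = HybridGeometricLLM(vocab_size=len(tokenizer), d_model=768, n_heads=12, num_layers=6)

# Download weights from Hugging Face
weights_path = hf_hub_download(repo_id="SofiTesfay2010/GLLM", filename="gllm_weights.pt")

# Load onto the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.load_state_dict(torch.load(weights_path, map_location=device))
model.to(device)
```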
### 2. Instant One-Pass Learning (Add Your Own Dataset)
No optimizers, no backpropagation, no gradients. Just forward passes. You can inject plain text or an entire dataset directly into the model's geometric memory.
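```python
from datasets import load_dataset

# Switch the model loaded in step 1 to evaluation mode
model.eval()

# Example: inject an entire dataset slice in one pass
my_dataset = load_dataset("squad", split="train[:500]")
texts_to_inject = list(my_dataset["context"])  # plain list of 500 context strings

# NOTE: instant_learn is not defined in this snippet; it must already be in
# scope (for example, from the Colab notebook linked above).
# The model absorbs all 500 context passages in seconds without backprop.
instant_learn(texts_to_inject)
```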
## 📜 Technical Foundation
This architecture is based on the research paper *"The Geometric Computer: Turing Completeness, Free Energy, and Learning in a Digital Brain on $S^3$"*. By using Hamilton products and Kuramoto critical coupling, the model aims to connect ideas from quantum states, biology, and autoregressive language modeling.
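
For readers unfamiliar with the Hamilton product mentioned above, the following is a minimal, self-contained sketch of quaternion multiplication in the `(w, x, y, z)` component order. It shows only the standard textbook definition; the function name `hamilton_product` is illustrative, and how GLLM applies the product internally is not specified in this README.

```python
def hamilton_product(p, q):
    """Multiply two quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)
k = (0.0, 0.0, 0.0, 1.0)

print(hamilton_product(i, j))  # (0.0, 0.0, 0.0, 1.0), i.e. k
print(hamilton_product(j, i))  # (0.0, 0.0, 0.0, -1.0), i.e. -k
```

The product is non-commutative, and the product of two unit quaternions is again a unit quaternion, so repeated multiplication stays on $S^3$.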
*Created and maintained by [SofiTesfay2010](https://huggingface.co/SofiTesfay2010).*