| # 🌌 Geometric LLM (GLLM): The $S^3$ One-Pass Learning Architecture |
|
|
| Welcome to **GLLM** (Geometric Large Language Model), a fundamental departure from standard flat-space Transformers. |
|
|
| Current LLMs suffer from catastrophic forgetting and typically need large-scale compute to learn new information through backpropagation. **GLLM addresses this by combining a traditional Transformer with a non-von Neumann $S^3$ Geometric Genome.** It projects language onto the manifold of unit quaternions, which lets it "memorize" and retrieve new datasets in a single forward pass, in a way loosely modeled on the human hippocampus. |
|
|
| ### 🚀 Try it instantly in Google Colab: |
| **[Train, Inject Data, and Chat in Colab Here](https://colab.research.google.com/drive/1kcEmK9xIvx55ScebU1jVGVbt9-nyVdaz?usp=sharing)** |
|
|
| --- |
|
|
| ## 🧠 Why is GLLM Different? |
|
|
| 1. **Topological Memory (The $S^3$ Genome):** Standard attention scores tokens with flat-space dot products, which can wash out context. GLLM instead computes the *geodesic distance ($\sigma$)* between points on $S^3$, the sphere of unit quaternions embedded in 4D. |
| 2. **Instant One-Pass Learning:** To teach the model a new book, a private dataset, or new code, you **do not need to train it**. Pass the text through the model once, and it writes the semantic carriers directly into the $S^3$ Genome mesh. |
| 3. **Zero Catastrophic Forgetting:** Because memories are stored as topological anchors on the sphere, learning a new task does not overwrite the weights of the previous task. |
| 4. **BKT Consolidation:** The mesh self-organizes. When memories become too crowded, it uses the Berezinskii-Kosterlitz-Thouless (BKT) phase threshold to merge identical concepts automatically, in a way analogous to biological sleep consolidation. |
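
The geodesic distance and threshold-based merging described in the list above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `normalize`, `geodesic_distance`, and `ToyS3Memory` are hypothetical and not taken from the GLLM code, and the fixed `merge_threshold` merely stands in for the BKT criterion, whose actual form is not specified here. The distance used is the standard great-circle angle $\sigma = \arccos\langle p, q\rangle$ between unit quaternions.

```python
import math


def normalize(q):
    """Scale a 4-vector to unit length so that it lies on S^3."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero quaternion")
    return tuple(c / norm for c in q)


def geodesic_distance(p, q):
    """Great-circle angle in radians between two unit quaternions."""
    dot = sum(a * b for a, b in zip(p, q))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.acos(dot)


class ToyS3Memory:
    """Hypothetical one-pass memory: anchors on S^3 with attached payloads."""

    def __init__(self, merge_threshold=0.1):
        # Assumption: a constant angle threshold stands in for the BKT criterion.
        self.merge_threshold = merge_threshold
        self.anchors = []  # list of (unit quaternion, list of payloads)

    def write(self, q, payload):
        """Store a payload in one step, with no gradients or optimizer."""
        q = normalize(q)
        for i, (anchor, payloads) in enumerate(self.anchors):
            if geodesic_distance(anchor, q) < self.merge_threshold:
                # Merge into the existing anchor: move it to the normalized midpoint.
                merged = normalize(tuple(a + b for a, b in zip(anchor, q)))
                self.anchors[i] = (merged, payloads + [payload])
                return i
        self.anchors.append((q, [payload]))
        return len(self.anchors) - 1

    def read(self, q):
        """Return the payloads of the anchor closest to the query."""
        if not self.anchors:
            raise LookupError("memory is empty")
        q = normalize(q)
        _, payloads = min(self.anchors, key=lambda e: geodesic_distance(e[0], q))
        return payloads
```

In this toy version, writing a new anchor never touches existing anchors unless it falls within the merge threshold of one, which is the property the list attributes to the $S^3$ Genome.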
|
|
| --- |
|
|
| ## 🛠️ How to use this repository |
|
|
| You can load the pre-trained base model, inject your own data instantly, and chat with it. |
|
|
| ### 1. Load the Model |
| ```python |
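| import torch |
| from huggingface_hub import hf_hub_download |
| from transformers import AutoTokenizer |
|
| # NOTE: HybridGeometricLLM is this project's model class. It must already be |
| # defined (for example, by running the model cells of the Colab notebook). |
|
| # Load the GPT-2 tokenizer and initialize GLLM |
| tokenizer = AutoTokenizer.from_pretrained("gpt2") |
| model = HybridGeometricLLM(vocab_size=len(tokenizer), d_model=768, n_heads=12, num_layers=6) |
|
| # Download weights from Hugging Face |
| weights_path = hf_hub_download(repo_id="SofiTesfay2010/GLLM", filename="gllm_weights.pt") |
|
| # Use the GPU when one is available; otherwise fall back to the CPU |
| device = "cuda" if torch.cuda.is_available() else "cpu" |
| model.load_state_dict(torch.load(weights_path, map_location=device)) |
| model.to(device) |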
| ``` |
|
|
| ### 2. Instant One-Pass Learning (Add Your Own Dataset) |
| No optimizers, no backpropagation, no gradients. Just forward passes. You can inject plain text or an entire dataset directly into the model's geometric memory. |
|
|
| ```python |
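| from datasets import load_dataset |
|
| # Assumes `model` was loaded as in step 1 and that `instant_learn` is |
| # already defined by the project code (it is not defined in this README). |
| # Eval mode disables dropout and similar training-only behavior. |
| model.eval() |
|
| # Example: inject a slice of a dataset in one pass |
| my_dataset = load_dataset("squad", split="train[:500]") |
| # SQuAD repeats a context once per question, so these 500 rows contain duplicate passages. |
| texts_to_inject = list(my_dataset["context"]) |
|
| # The model absorbs all 500 context passages in seconds without backprop. |
| instant_learn(texts_to_inject) |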
| ``` |
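
The `instant_learn` helper used above is not defined in this README. As a rough illustration of what "just forward passes" could look like, the sketch below splits tokenized text into fixed-size windows and hands each window to a forward function exactly once. The names `chunk`, `one_pass_inject`, `encode`, `forward_fn`, and `max_len` are hypothetical, and the usage lines rely on stand-in stubs rather than the GLLM API.

```python
from typing import Callable, Iterable, List


def chunk(ids: List[int], max_len: int) -> List[List[int]]:
    """Split a token-ID list into consecutive windows of at most max_len IDs."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [ids[i:i + max_len] for i in range(0, len(ids), max_len)]


def one_pass_inject(
    texts: Iterable[str],
    encode: Callable[[str], List[int]],
    forward_fn: Callable[[List[int]], None],
    max_len: int = 128,
) -> int:
    """Feed every window of every text to forward_fn once; return the window count.

    No loss, optimizer, or backward pass is involved. Any memory update is
    assumed to happen as a side effect inside forward_fn.
    """
    n_windows = 0
    for text in texts:
        for window in chunk(encode(text), max_len):
            forward_fn(window)
            n_windows += 1
    return n_windows


# Usage with stubs: a character-level "tokenizer" and a forward function
# that only records what it was given.
seen: List[List[int]] = []
n = one_pass_inject(
    ["abcde", "fg"],
    encode=lambda t: [ord(ch) for ch in t],
    forward_fn=seen.append,
    max_len=2,
)
```

With a real PyTorch model, `forward_fn` would presumably wrap the model call in `torch.no_grad()` so that no gradient state is kept. That is an assumption about how such a helper might be written, not a description of the repository's implementation.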
|
|
| ## 📜 Technical Foundation |
| This architecture is based on the research paper *"The Geometric Computer: Turing Completeness, Free Energy, and Learning in a Digital Brain on $S^3$"*. Using Hamilton products and Kuramoto critical coupling, the model aims to connect ideas from quantum states and biology with autoregressive language modeling. |
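
For readers unfamiliar with the two ingredients named above, here is a minimal sketch of the textbook definitions: the Hamilton product of two quaternions, and the Kuramoto order parameter $r = \left|\frac{1}{N}\sum_j e^{i\theta_j}\right|$ that measures phase synchronization. The function names `hamilton_product` and `order_parameter` are hypothetical, and how GLLM actually combines these operations is not shown here.

```python
import cmath
import math


def hamilton_product(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full synchronization."""
    if not phases:
        raise ValueError("phases must be non-empty")
    return abs(sum(cmath.exp(1j * theta) for theta in phases) / len(phases))


# The product is non-commutative: i * j = k, but j * i = -k.
i, j = (0, 1, 0, 0), (0, 0, 1, 0)
k = hamilton_product(i, j)
minus_k = hamilton_product(j, i)

# The product of two unit quaternions is again a unit quaternion,
# so repeated products stay on S^3.
half = (0.5, 0.5, 0.5, 0.5)
unit = tuple(c / math.sqrt(30) for c in (1, 2, 3, 4))
product_norm = math.sqrt(sum(c * c for c in hamilton_product(half, unit)))
```

Closure under the product is what makes unit quaternions a natural state space: composing states never leaves the sphere, so no renormalization is needed beyond correcting floating-point drift.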
|
|
| *Created and maintained by [SofiTesfay2010](https://huggingface.co/SofiTesfay2010).* |
|
|