	# 🌌 Geometric LLM (GLLM): The $S^3$ One-Pass Learning Architecture

	Welcome to GLLM (Geometric Large Language Model), an architecture that departs from standard flat-space Transformers.

	Current LLMs are prone to catastrophic forgetting and typically need large-scale, backpropagation-based training to absorb new information. GLLM addresses this by combining a conventional Transformer with a non-von Neumann $S^3$ Geometric Genome. It projects language onto the manifold of unit quaternions, which lets it "memorize" and retrieve new datasets in a single forward pass, loosely mimicking the human hippocampus.

	### 🚀 Try it instantly in Google Colab:
	[Train, Inject Data, and Chat in Colab Here](https://colab.research.google.com/drive/1kcEmK9xIvx55ScebU1jVGVbt9-nyVdaz?usp=sharing)

	---

	## 🧠 Why is GLLM Different?

	1. Topological Memory (The $S^3$ Genome): Standard attention scores tokens with flat-space dot products, which can wash out context. GLLM instead computes the geodesic distance ($\sigma$) between points on $S^3$, the 3-sphere of unit quaternions embedded in 4D.
	2. Instant One-Pass Learning: Want to teach the model a new book, a private dataset, or new code? No gradient-based training is needed. Pass the text through the model once, and it writes the semantic carriers directly into the $S^3$ Genome mesh.
	3. Zero Catastrophic Forgetting: Because memories are stored as topological anchors on the sphere, learning a new task does not overwrite the weights learned for a previous task.
	4. BKT Consolidation: The mesh self-organizes. When memories become too crowded, it uses the Berezinskii-Kosterlitz-Thouless (BKT) phase threshold to merge duplicate concepts automatically, similar to consolidation during biological sleep.
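
The four points above can be illustrated with a small, self-contained sketch. It is not GLLM's implementation: `AnchorMemory`, `merge_threshold`, and the helper names are hypothetical, and a fixed distance cutoff stands in for the BKT threshold, whose actual form is not given here. It only shows what geodesic distance on $S^3$ looks like and how a gradient-free write with merging could behave.

```python
import math

def normalize(vec):
    """Project a nonzero 4-vector onto the unit 3-sphere S^3."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return tuple(c / norm for c in vec)

def geodesic_distance(p, q):
    """Great-circle distance on S^3: sigma = arccos(<p, q>), in [0, pi]."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

class AnchorMemory:
    """Toy anchor store: one write per item, no gradients, no weight updates."""

    def __init__(self, merge_threshold=0.05):
        # Hypothetical fixed cutoff (radians); a stand-in for the BKT threshold.
        self.merge_threshold = merge_threshold
        self.anchors = []  # list of (unit quaternion, list of payloads)

    def write(self, vec, payload):
        q = normalize(vec)
        for anchor, payloads in self.anchors:
            if geodesic_distance(anchor, q) < self.merge_threshold:
                payloads.append(payload)  # close enough: merge into this anchor
                return
        self.anchors.append((q, [payload]))  # otherwise store a new anchor

    def read(self, vec):
        if not self.anchors:
            raise LookupError("memory is empty")
        q = normalize(vec)
        _, payloads = min(self.anchors, key=lambda a: geodesic_distance(a[0], q))
        return payloads

memory = AnchorMemory(merge_threshold=0.05)
memory.write((1.0, 0.0, 0.0, 0.0), "cat")
memory.write((2.0, 0.01, 0.0, 0.0), "kitten")  # almost the same direction: merged
memory.write((0.0, 1.0, 0.0, 0.0), "piano")    # orthogonal direction: new anchor
print(len(memory.anchors))                     # 2
print(memory.read((0.9, 0.1, 0.0, 0.0)))       # ['cat', 'kitten']
```

Writing "piano" adds an anchor without touching the one that holds "cat", which is the intuition behind storing memories as separate points instead of overwriting shared weights.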

	---

	## 🛠️ How to use this repository

	You can load the pre-trained base model, inject your own data instantly, and chat with it.

	### 1. Load the Model
	```python
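	import torch
	from huggingface_hub import hf_hub_download
	from transformers import AutoTokenizer

	# NOTE: HybridGeometricLLM is not imported from any package here. Its class
	# definition must already be in scope (for example, from the Colab notebook
	# linked above, assuming that notebook defines it).

	# Load the GPT-2 tokenizer and initialize GLLM
	tokenizer = AutoTokenizer.from_pretrained("gpt2")
	model = HybridGeometricLLM(vocab_size=len(tokenizer), d_model=768, n_heads=12, num_layers=6)

	# Download the weights from Hugging Face and load them onto the CPU first
	weights_path = hf_hub_download(repo_id="SofiTesfay2010/GLLM", filename="gllm_weights.pt")
	model.load_state_dict(torch.load(weights_path, map_location="cpu"))

	# Move to the GPU when one is available, otherwise stay on the CPU
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model.to(device)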
	```

	### 2. Instant One-Pass Learning (Add Your Own Dataset)
	No optimizers, no backpropagation, no gradients. Just forward passes. You can inject plain text or an entire dataset directly into the model's geometric memory.

	```python
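	from datasets import load_dataset

	# NOTE: `model` comes from step 1. `instant_learn` is not imported from a
	# package here; it is assumed to be defined alongside HybridGeometricLLM.

	# Put the already loaded model in evaluation mode
	model.eval()

	# Example: inject an entire dataset slice at once
	my_dataset = load_dataset("squad", split="train[:500]")
	texts_to_inject = my_dataset["context"]  # 500 context passages (some repeat across questions)

	# The model absorbs all 500 passages in seconds, using forward passes only (no backprop).
	instant_learn(texts_to_inject)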
	```
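
For larger corpora it may help to deduplicate the texts and feed them in fixed-size chunks, so that each text is written exactly once. The helper below is a hedged sketch, not part of GLLM: `inject_in_chunks`, `inject_fn`, and `size` are hypothetical names, and `inject_fn` stands for any callable that accepts a list of strings (as the `instant_learn` call above appears to).

```python
from typing import Callable, Iterable, List

def inject_in_chunks(
    texts: Iterable[str],
    inject_fn: Callable[[List[str]], None],
    size: int = 64,
) -> int:
    """Call inject_fn once per chunk of unique, non-empty texts.

    Returns the number of unique texts that were passed on.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys drops repeats while keeping first-seen order
    unique = list(dict.fromkeys(t for t in texts if t.strip()))
    for start in range(0, len(unique), size):
        inject_fn(unique[start:start + size])
    return len(unique)

# Demonstration with a stand-in callable that only records what it receives
calls = []
count = inject_in_chunks(["a", "b", "a", "", "c"], calls.append, size=2)
print(count)  # 3
print(calls)  # [['a', 'b'], ['c']]
```

Deduplication is relevant for the SQuAD slice above because the same context passage is shared by several questions.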

	## 📜 Technical Foundation
	This architecture is based on the research paper "The Geometric Computer: Turing Completeness, Free Energy, and Learning in a Digital Brain on $S^3$". It relies on Hamilton products (quaternion multiplication) and Kuramoto critical coupling (the coupling strength at which coupled oscillators begin to synchronize), and it aims to connect ideas from quantum states and biology with autoregressive language modeling.
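
For readers unfamiliar with the two ingredients named above, here is a minimal sketch of their textbook definitions. It does not show how GLLM combines them, which this README does not specify; the function names are illustrative only.

```python
import cmath
import math

def hamilton_product(p, q):
    """Multiply quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def kuramoto_order_parameter(phases):
    """r = |mean(exp(i * theta))|: 1 means full phase synchrony, near 0 means incoherence."""
    mean = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(mean)

def kuramoto_critical_coupling(g0):
    """Classic mean-field result K_c = 2 / (pi * g(0)), where g(0) is the
    density of natural frequencies at the center of a symmetric, unimodal
    distribution."""
    return 2.0 / (math.pi * g0)

i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)
print(hamilton_product(i, j))  # k:  (0.0, 0.0, 0.0, 1.0)
print(hamilton_product(j, i))  # -k: the product is not commutative
print(kuramoto_order_parameter([0.3, 0.3, 0.3]))
```

The Hamilton product of two unit quaternions is again a unit quaternion, so repeated products stay on $S^3$ without renormalization beyond floating-point drift.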

	Created and maintained by [SofiTesfay2010](https://huggingface.co/SofiTesfay2010).