---
title: MoireFormer Chat
emoji: 🌊
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: mit
---
# MoireFormer (104.9M Proof-of-Concept)
This repository hosts the PyTorch weights `moire_phase2_weights_final.pt` for MoireFormer, a neural-network architecture that replaces standard dot-product attention with Moiré phase-interference wave mechanics.
Instead of computing attention via Q · K^T, the model splits each token embedding into amplitude and phase components and computes attention through geometric wave resonance.
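As a rough sketch of the idea, here is a simplified single-head form of amplitude/phase interference scoring. The function and variable names are illustrative, not the repo's actual code; see the GitHub repo for the real implementation:

```python
import torch
import torch.nn.functional as F

def moire_attention(x):
    """Illustrative phase-interference attention (not the repo's exact code).

    Splits each token embedding into amplitude and phase halves, then scores
    token pairs by how constructively their phases interfere.
    """
    amp, phase = x.chunk(2, dim=-1)                   # (B, T, D/2) each
    amp = F.softplus(amp)                             # keep amplitudes positive
    # Interference score: amplitude-weighted cosine of pairwise phase differences.
    diff = phase.unsqueeze(2) - phase.unsqueeze(1)    # (B, T, T, D/2)
    scores = (amp.unsqueeze(2) * amp.unsqueeze(1) * torch.cos(diff)).sum(-1)
    attn = scores.softmax(dim=-1)                     # (B, T, T)
    return attn @ x                                   # mix token values

x = torch.randn(2, 5, 8)                              # batch=2, seq=5, dim=8
out = moire_attention(x)
print(out.shape)                                      # torch.Size([2, 5, 8])
```

Aligned phases interfere constructively (cosine near 1) and dominate the softmax, standing in for the Q · K^T similarity of standard attention.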
GitHub Code: https://github.com/anttiluode/MoireFormer
Theory: https://github.com/anttiluode/Geometric-Neuron
## Model Details
- Architecture: MoireGPT (custom transformer)
- Parameters: 104.9M
- Structure:
  - 8 layers
  - 8 heads
  - 768 embedding dimension
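In code, that shape might be sketched as a small config object. The dataclass and field names below are illustrative, not the repo's actual config class:

```python
from dataclasses import dataclass

@dataclass
class MoireGPTConfig:
    # Values from the model card above; field names are illustrative.
    n_layers: int = 8
    n_heads: int = 8
    d_model: int = 768   # embedding dimension

cfg = MoireGPTConfig()
# 768 dims split across 8 heads gives 96 dims per head.
assert cfg.d_model % cfg.n_heads == 0
print(cfg.d_model // cfg.n_heads)   # 96
```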
Capabilities:
- English / Spanish syntax
- conversational structure
- instruction following
**Note:** This is a proof-of-substrate model, not a factual knowledge model.
## How To Run
This model cannot be loaded with `AutoModel`; it must run through the custom architecture from the GitHub repo.
1. Clone the repo:

```bash
git clone https://github.com/anttiluode/MoireFormer.git
cd MoireFormer
```
2. Install dependencies:

```bash
pip install torch transformers datasets
```
3. Download the weights from
   https://huggingface.co/Aluode/MoireFormer/blob/main/moire_phase2_weights_final.pt
   and place the file inside the repo folder.
4. Run the chat interface:

```bash
python moire_chat.py --weights moire_phase2_weights_final.pt --size large
```
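Assuming the checkpoint is an ordinary `torch.save` state dict, you can inspect its parameter names and shapes before wiring up the chat script. A minimal sketch (the tiny stand-in state dict and its key names are illustrative, not the real checkpoint's contents):

```python
import torch

# Stand-in for the real checkpoint: a small state dict saved with torch.save.
state = {"tok_emb.weight": torch.zeros(4, 8), "head.weight": torch.zeros(4, 8)}
torch.save(state, "demo_weights.pt")

# Load on CPU and list parameter names and shapes.
ckpt = torch.load("demo_weights.pt", map_location="cpu")
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))
```

Swap `demo_weights.pt` for `moire_phase2_weights_final.pt` to list the real model's tensors.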
## Training Curriculum
### Phase 1
15 epochs on Dolly-15k, WikiText-2, and OpenAssistant.

### Phase 2
5 epochs on the Guanaco dataset.
The experiment demonstrates that wave-field attention can learn discrete language syntax via phase geometry.
## Disclaimer
This is an experimental architecture exploring biological wave-field computation in neural networks.
At roughly 100M parameters, it will hallucinate factual information.