---
library_name: 'custom'
tags:
- custom-architecture
- numpy
- chatbot
- text-generation
license: mit
metrics:
- loss
- perplexity
---

# HRAN Chatbot Model Card

HRAN (Haykin Resonant Attention Network) is a ~1.01M-parameter, custom-built sequence-to-sequence model. Rather than relying on standard deep learning frameworks such as PyTorch or TensorFlow, HRAN is engineered entirely in NumPy to explore the mathematical first principles of computation, information theory, and adaptation.

The architecture is strictly derived from concepts in Simon Haykin's *Neural Networks and Learning Machines* (3rd ed.), actively challenging modern transformer defaults by replacing dot-product attention and standard activations with biologically and mathematically grounded alternatives.

* **Developer**: Phase-Technologies
* **Model Type**: Custom Sequence-to-Sequence Language Model
* **Parameters**: ~1.01 Million
* **Framework**: Pure NumPy
* **License**: MIT

## Architectural Innovations
HRAN abandons several standard transformer conventions in favor of experimental mechanics (rough NumPy sketches of several of these mechanisms follow the list):

* **RBF Attention (Ch. 5)**: Replaces standard dot-product attention with a Gaussian kernel, $A_{ij} = \text{softmax}(-\gamma \|q_i - k_j\|^2)$. This forces attention heads to localize in representation space based on Euclidean distance rather than inner-product maximization.
* **Hebbian Seed Initialization (Ch. 2)**: Pre-seeds embeddings with co-occurrence statistics using Oja's rule before gradient descent, attempting to bridge unsupervised geometry with supervised learning.
* **Infomax Activation (Ch. 10)**: Uses $f(x) = \tanh(x) + \alpha x$ (derived from Bell-Sejnowski ICA) to maximize mutual-information throughput and avoid information bottlenecks in hidden layers.
* **Lateral Inhibition Gate (Ch. 9)**: Introduces competitive learning in which winning activations are amplified and weak ones suppressed, producing sparse, discriminative representations.
* **Wiener-SNR Gradient Scaling (Ch. 3)**: Scales parameter updates by a local signal-to-noise ratio, allowing high-signal weights to learn quickly while suppressing noisy updates.
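
As a rough illustration of the RBF attention described above, here is a minimal NumPy sketch derived from the formula in this card. It is not the implementation in `hran_chatbot.py`; the tensor shapes and the default `gamma` are assumptions.

```python
import numpy as np

def rbf_attention(q, k, v, gamma=0.1):
    """Gaussian-kernel attention: A_ij = softmax(-gamma * ||q_i - k_j||^2).

    q: (T_q, d) queries, k: (T_k, d) keys, v: (T_k, d) values.
    """
    # Pairwise squared Euclidean distances between queries and keys, shape (T_q, T_k)
    dists = np.sum((q[:, None, :] - k[None, :, :]) ** 2, axis=-1)
    scores = -gamma * dists
    # Row-wise softmax, numerically stabilized
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (T_q, d)

# Toy usage: with identical queries and keys, each position attends mostly to itself
q = k = v = np.random.randn(4, 8)
print(rbf_attention(q, k, v).shape)  # (4, 8)
```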
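
Similarly, here is a minimal sketch of the Infomax activation and a simple lateral-inhibition gate, again reconstructed from the descriptions above rather than taken from the released code; the default `alpha`, the gating rule, and `strength` are assumptions.

```python
import numpy as np

def infomax_activation(x, alpha=0.1):
    # Bell-Sejnowski-inspired nonlinearity: saturating tanh plus a small
    # linear leak so the mapping stays informative for large |x|.
    return np.tanh(x) + alpha * x

def lateral_inhibition_gate(h, strength=0.5):
    # Competitive gating: activations above the layer mean are amplified,
    # those below it are suppressed, encouraging sparse winners.
    centered = h - h.mean(axis=-1, keepdims=True)
    gate = 1.0 + strength * np.tanh(centered)
    return h * gate

h = np.random.randn(2, 16)                # batch of 2 hidden vectors
print(infomax_activation(h).shape)        # (2, 16)
print(lateral_inhibition_gate(h).shape)   # (2, 16)
```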
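
Finally, one plausible reading of Wiener-SNR gradient scaling. The card does not specify how the signal-to-noise ratio is estimated, so this sketch assumes a per-parameter mean/variance over recent gradients and a Wiener-style gain `snr / (1 + snr)`.

```python
import numpy as np

def wiener_snr_scale(grad_history, grad, eps=1e-8):
    # Per-parameter "signal" = squared mean of recent gradients,
    # "noise" = their variance; attenuate the update by the gain
    # snr / (1 + snr), so persistently noisy parameters move less.
    mean = grad_history.mean(axis=0)
    var = grad_history.var(axis=0)
    snr = (mean ** 2) / (var + eps)
    gain = snr / (1.0 + snr)
    return gain * grad

history = np.stack([np.random.randn(5) for _ in range(10)])  # last 10 gradients of a 5-parameter weight
print(wiener_snr_scale(history, history[-1]))
```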

## Training Data
The model was trained on a highly curated, 100% original dataset of 235 question-answer pairs (augmented to 1,040 samples). The dataset spans topics including neural network architecture, philosophy, physics, mathematics, and Haykin's specific theories.

## Performance & Limitations
**Disclaimer**: This model is for architectural research and educational purposes only. It is not a functional conversational AI.

During training, the model experienced severe mathematical divergence:

* **Final Training Loss**: ~5.85
* **Perplexity**: ~347.9
* **Output State**: The current weights (`hran_best.pkl`) exhibit severe vocabulary degradation and mode collapse, largely producing repetitive stop-words (e.g., "is is define is is").
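
As a consistency check (assuming perplexity here is the exponential of the mean cross-entropy loss, which this card does not state explicitly), the two reported metrics agree:

```python
import math

# exp(final training loss) should roughly match the reported perplexity
print(math.exp(5.85))  # ~347.2, in line with the reported ~347.9
```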

This failure state provides a valuable case study in the difficulties of applying continuous-space RBF kernels to discrete language tokens, as well as the instability introduced by custom dynamic gradient scaling (Wiener-SNR) on small datasets.

## Future Roadmap
This experimental build serves as a foundational testbed for understanding the deep mechanics of sequence modeling. Future iterations and related projects aim to:

* Replace the basic word-level tokenizer with the highly optimized Crayon tokenizer to drastically improve subword processing, vocabulary stability, and sequence compression.
* Integrate these first-principles architectural learnings into the broader RootFlow+ framework, specifically exploring how alternative attention mechanisms might inform the Heart-Head-Hands Transformer (H3T) approach to solving the AI grounding problem.

## How to Use
Because HRAN is a custom NumPy architecture, it cannot be loaded via the standard `transformers` library. You must download both the architecture script and the weights.

```python
import os
import sys

from huggingface_hub import hf_hub_download

# 1. Download the architecture script and the trained weights
script_path = hf_hub_download(repo_id="Phase-Technologies/hran-chatbot", filename="hran_chatbot.py")
weights_path = hf_hub_download(repo_id="Phase-Technologies/hran-chatbot", filename="hran_best.pkl")

# 2. Make the downloaded script importable
sys.path.append(os.path.dirname(script_path))
import hran_chatbot as hran

# 3. Initialize the config and rebuild the tokenizer from the bundled dataset
config = hran.CFG
tokenizer = hran.HRANTokenizer(max_vocab=config.vocab_size)
tokenizer.build(hran.FULL_DATASET)
config.vocab_size = tokenizer.vocab_size

# 4. Load the model weights
model = hran.HRANModel(config)
model.load(weights_path)

# 5. Generate text
response = hran.generate_response(model, tokenizer, "What is attention?")
print(response)
```