thefinalboss committed on
Commit 39a16d2 · verified · 1 parent: 5d8ed85

Upload README.md with huggingface_hub

Files changed (1): README.md +51 -388
README.md CHANGED
@@ -1,422 +1,85 @@
  ---
  language:
  - en
  - fr
- license: mit
- library_name: pytorch
  tags:
  - non-transformer
  - cognitive-routing
  - hierarchical-memory
  - character-level
- - o(n)-complexity
- - language-model
- - novel-architecture
  pipeline_tag: text-generation
- model_type: cognet
  ---
 
- <div align="center">
-
  # CogNet-40M
 
- ### A Non-Transformer Language Model with Cognitive Routing and Hierarchical Memory
-
- [![MIT License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
- [![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)]()
- [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-EE4C2C?logo=pytorch&logoColor=white)]()
- [![CUDA 13.1](https://img.shields.io/badge/CUDA-13.1-76B900?logo=nvidia&logoColor=white)]()
-
- **No self-attention. No quadratic complexity. Pure cognition.**
-
- [Architecture](#architecture) · [Quick Start](#quick-start) · [Training](#training) · [Benchmarks](#benchmarks)
-
- </div>
-
- ---
-
- ## Why CogNet?
-
- Most large language models today are built on the same foundation: the Transformer and its self-attention mechanism. Self-attention is powerful because it lets every token communicate with every other token in the sequence. That communication has a cost, however: **O(n²) time and memory complexity**. As sequence length grows, the computational burden grows quadratically.
-
- CogNet asks a different question: **What if we replace self-attention entirely with mechanisms inspired by how human cognition actually works?**
-
- Human brains don't compute all-pairs interactions between every piece of information. Instead, we use:
- - **Selective routing**: we focus attention on relevant information channels
- - **Hierarchical memory**: we store and retrieve from working, episodic, and semantic memory
- - **Adaptive computation**: we spend more time on hard problems
- - **Compositional reasoning**: we bind roles to fillers to build complex representations
-
- CogNet implements each of these principles as a differentiable neural module, creating a language model that processes sequences in **O(n) time** while maintaining rich contextual representations through hierarchical memory.
-
- ---
 
  ## Architecture
 
- ### System Overview
-
- CogNet replaces the standard Transformer block with a **Cognitive Block** that routes, remembers, reasons, and composes:
-
- ```
- Input Tokens
-        │
-        ▼
- ┌──────────────┐
- │ TokenEncoder │  Embedding + Learned Positional Encoding
- └──────┬───────┘
-        │
- ┌──────▼───────────────────────────────────────────────┐
- │ Cognitive Block × 6                                  │
- │                                                      │
- │  ┌──────────────────────┐                            │
- │  │ CoherenceRouter      │  O(n) channel routing      │
- │  │  Channel 0           │  Depthwise Sep. Conv       │
- │  │  Channel 1           │  + SwiGLU FFN              │
- │  │  Channel 2           │                            │
- │  │  Channel 3           │  (each channel processes   │
- │  │  Channel 4           │  a routed subset of        │
- │  │  Channel 5           │  tokens independently)     │
- │  └──────────┬───────────┘                            │
- │             │                                        │
- │  ┌──────────▼───────────────────────────────┐        │
- │  │ SharedHierarchicalMemory                 │        │
- │  │ ┌──────────┐ ┌──────────┐ ┌──────────┐   │        │
- │  │ │ Working  │ │ Episodic │ │ Semantic │   │        │
- │  │ │ 32 slots │ │ 64 slots │ │ 128 slots│   │        │
- │  │ │ (recent) │ │(patterns)│ │(concepts)│   │        │
- │  │ └────┬─────┘ └────┬─────┘ └────┬─────┘   │        │
- │  │      └────────────┼────────────┘         │        │
- │  │                   ▼                      │        │
- │  │           Gated Combination              │        │
- │  └──────────┬───────────────────────────────┘        │
- │             │                                        │
- │  ┌──────────▼───────────────────────────────┐        │
- │  │ AdaptiveComputationBlock                 │        │
- │  │ (1-2 steps per token)                    │        │
- │  │ ┌───────┐ ┌───────┐                      │        │
- │  │ │ FFN 1 │→│ FFN 2 │  SwiGLU              │        │
- │  │ └───────┘ └───────┘  + halt              │        │
- │  └──────────┬───────────────────────────────┘        │
- │             │                                        │
- │  ┌──────────▼───────────────────────────────┐        │
- │  │ CompositionalReasoner                    │        │
- │  │ Role-Filler Binding (HDC)                │        │
- │  │ Circular Convolution                     │        │
- │  └──────────────────────────────────────────┘        │
- │                                                      │
- └──────┬───────────────────────────────────────────────┘
-        │
- ┌──────▼──────┐
- │  LayerNorm  │
- └──────┬──────┘
-        │
- ┌──────▼──────┐
- │ OutputHead  │  Weight-tied with TokenEncoder
- └──────┬──────┘
-        │
-        ▼
- Token Logits
- ```
-
- ### Component Deep Dive
-
- #### 1. CoherenceRouter: O(n) Token Routing
-
- The CoherenceRouter replaces self-attention with a learned routing mechanism. Based on **coherence scoring**, it assigns each token to one or more processing channels. Self-attention computes all n×n token interactions. The CoherenceRouter instead:
-
- 1. Projects each token into a **query** vector and a **key** vector of dimension `num_channels`
- 2. Computes the mean key across the entire sequence (an O(n) reduction)
- 3. Scores each token against this mean key via element-wise multiplication (O(n))
- 4. Applies a single refinement step to improve routing accuracy
- 5. Produces soft routing weights via softmax, plus hard top-2 masks for efficiency
-
- **Complexity**: O(n × C), where C is the number of channels, compared with O(n²) for self-attention.
-
- The key insight is that routing doesn't need to know about every pairwise interaction. It only needs to answer one question: "which processing channel should handle this token?" This is analogous to how the brain routes sensory information to specialized cortical areas.
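The routing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CogNet's implementation. It assumes the query and key projections are bias-free linear maps. The names `coherence_route`, `project`, and `random_matrix` are hypothetical helpers. The single refinement step (step 4) is left out because the README does not describe its form.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(weight, x):
    """Multiply a (rows x d) weight matrix by a length-d vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def coherence_route(tokens, w_query, w_key, top_k=2):
    """Route tokens to channels by scoring each query against the mean key.

    tokens: n vectors of length d; w_query, w_key: C x d projection matrices.
    Returns (soft_weights, hard_masks), each an n x C nested list.
    """
    queries = [project(w_query, t) for t in tokens]  # step 1: O(n * C * d)
    keys = [project(w_key, t) for t in tokens]
    n, c = len(tokens), len(w_query)
    # Step 2: a single O(n * C) reduction instead of an n x n score matrix.
    mean_key = [sum(k[j] for k in keys) / n for j in range(c)]
    soft, hard = [], []
    for q in queries:
        # Step 3: element-wise score against the mean key; step 5: softmax.
        weights = softmax([q[j] * mean_key[j] for j in range(c)])
        chosen = sorted(range(c), key=lambda j: weights[j], reverse=True)[:top_k]
        soft.append(weights)
        hard.append([1 if j in chosen else 0 for j in range(c)])
    return soft, hard


def random_matrix(rows, cols, rng):
    """Gaussian random matrix as a nested list (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, c = 5, 8, 6  # 5 tokens, 8 features, 6 channels as in CogNet-40M
    tokens = random_matrix(n, d, rng)
    soft, hard = coherence_route(tokens, random_matrix(c, d, rng), random_matrix(c, d, rng))
    for weights, mask in zip(soft, hard):
        print([round(w, 2) for w in weights], mask)
```

Because `mean_key` averages over every position, each token's routing also depends on later tokens. A strictly autoregressive decoder would presumably need a running (prefix) mean instead. The README does not state which variant CogNet uses.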
-
- #### 2. CognitiveChannel: Efficient Per-Channel Processing
 
- Each of the 6 CognitiveChannels processes the tokens routed to it using two stacked operations:
 
- - **Depthwise Separable Convolution**: A depthwise conv (kernel=3, groups=channel_dim) captures local patterns, followed by a pointwise conv (kernel=1) for cross-feature mixing. This is O(n) per channel.
- - **SwiGLU Feed-Forward Network**: The SwiGLU activation (SiLU gate × linear up-projection) provides the non-linear transformation capacity of a standard FFN, applied independently within each channel's feature space.
-
- Both operations include residual connections and LayerNorm for stable training.
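The following is a toy, dependency-free sketch of one CognitiveChannel as described above. Several details are assumptions made for illustration, since the README does not specify them:

- the pre-norm ordering;
- the causal (left-only) padding of the kernel-3 depthwise conv;
- the parameter-free LayerNorm;
- the names `channel_forward` and `depthwise_conv_causal`, and the keys of the `p` dictionary.

```python
import math


def silu(x):
    return x / (1.0 + math.exp(-x))


def layer_norm(v, eps=1e-5):
    """LayerNorm without learnable scale/shift, for brevity."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def matvec(weight, v):
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def depthwise_conv_causal(seq, kernels):
    """Depthwise conv with kernel size 3: each feature f has its own 3 taps.

    Assumed causal: position t sees t-2, t-1 and t (zero padding on the left).
    seq: n x d; kernels: d lists of 3 weights.
    """
    n, d = len(seq), len(seq[0])
    out = [[0.0] * d for _ in range(n)]
    for t in range(n):
        for f in range(d):
            for k, w in enumerate(kernels[f]):
                src = t + k - 2
                if src >= 0:
                    out[t][f] += w * seq[src][f]
    return out


def swiglu(v, w_gate, w_up, w_down):
    """SwiGLU FFN: down(SiLU(gate(v)) * up(v))."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, v), matvec(w_up, v))]
    return matvec(w_down, hidden)


def channel_forward(seq, p):
    """One channel: pre-norm conv sub-layer, then pre-norm SwiGLU sub-layer, each residual."""
    mixed = depthwise_conv_causal([layer_norm(v) for v in seq], p["dw"])
    mixed = [matvec(p["pw"], v) for v in mixed]  # pointwise (kernel=1) conv
    seq = [[a + b for a, b in zip(x, y)] for x, y in zip(seq, mixed)]
    ff = [swiglu(layer_norm(v), p["gate"], p["up"], p["down"]) for v in seq]
    return [[a + b for a, b in zip(x, y)] for x, y in zip(seq, ff)]
```

With all weights at zero, both sub-layers contribute nothing and the residual path returns the input unchanged. This is a quick sanity check on the wiring.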
-
- #### 3. SharedHierarchicalMemory: 3-Tier Key-Value Store
-
- This is the core innovation that enables CogNet to maintain rich contextual representations without self-attention. Inspired by the Atkinson-Shiffrin model of human memory, the module implements three tiers of learned key-value memory:
-
- | Tier | Slots | Analogy | Content |
- |------|-------|---------|---------|
- | **Working Memory** | 32 | Short-term buffer | Recent token representations |
- | **Episodic Memory** | 64 | Event sequences | Recurring patterns and phrases |
- | **Semantic Memory** | 128 | Knowledge store | Abstract concepts and relationships |
-
- **Read mechanism**: For each input token, the module projects a query vector and performs scaled dot-product attention against each tier's keys and values independently. A **learned gating network** then combines the three tier outputs. It produces softmax weights over the three tiers, so the model can dynamically balance recent context (Working), pattern matching (Episodic), and conceptual knowledge (Semantic).
-
- **Key properties**:
- - Memory slots are **learned parameters**: they encode persistent knowledge across the entire training corpus, not just the current sequence
- - The gating mechanism enables **dynamic memory access**: different tokens may rely more on working memory (for local coherence) or on semantic memory (for factual knowledge)
- - Total memory capacity: 224 key-value pairs per layer, providing a compressed but rich knowledge store
-
- **Complexity**: O(n × S) per tier, where S is the number of slots, compared with O(n²) for self-attention. Since S is fixed (224 in total), this is effectively O(n).
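The tiered read described above can be illustrated with a small pure-Python sketch. It makes two assumptions: each tier stores separate key and value slots, and the gating network is a single linear layer over the query (`w_gate`). The names `read_tier` and `hierarchical_read` are hypothetical. The sketch shows that each read costs O(S) per token, independent of sequence length.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weight, v):
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def read_tier(query, keys, values):
    """Scaled dot-product read over S fixed slots: O(S * d_k) per token."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * val[i] for w, val in zip(weights, values)) for i in range(len(values[0]))]


def hierarchical_read(query, tiers, w_gate):
    """tiers: [(keys, values)] for working, episodic, semantic memory.

    w_gate: 3 x d_k matrix standing in for the learned gating network (assumed linear).
    Returns the gated combination and the per-tier gate weights.
    """
    reads = [read_tier(query, keys, values) for keys, values in tiers]
    gates = softmax(matvec(w_gate, query))
    dim = len(reads[0])
    return [sum(g * r[i] for g, r in zip(gates, reads)) for i in range(dim)], gates


if __name__ == "__main__":
    rng = random.Random(0)
    d_k, d_v = 16, 8  # toy sizes; the specification table lists key_dim=256

    def slots(s, dim):
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(s)]

    tiers = [(slots(s, d_k), slots(s, d_v)) for s in (32, 64, 128)]
    query = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
    out, gates = hierarchical_read(query, tiers, slots(3, d_k))
    print(len(out), [round(g, 3) for g in gates])
```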
-
- #### 4. AdaptiveComputationBlock: Variable-Depth Processing
-
- Not all tokens require the same amount of computation. The AdaptiveComputationBlock processes each token for 1 to `max_adaptive_steps` iterations of SwiGLU FFN layers. A learned **halting mechanism** determines when a token's representation is sufficiently refined.
-
- After each step, a sigmoid halting probability is computed. The token's output is the weighted sum of its intermediate states, with weights determined by the halting probabilities. This enables:
- - **Fast processing** for simple, predictable tokens (e.g., articles, common suffixes)
- - **Deep processing** for ambiguous or information-rich tokens (e.g., rare words, punctuation at clause boundaries)
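One way to realise the halting-weighted sum described above is the remainder scheme from Adaptive Computation Time (ACT). The README only says that sigmoid halting probabilities determine the weights of a weighted sum. The exact weighting below, and the names `adaptive_forward` and `add_one`, are therefore assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_forward(state, steps, halt_logit):
    """Run up to len(steps) refinement steps and mix the intermediate states.

    Step i gets weight h_i * prod_{j<i} (1 - h_j), where h_i = sigmoid(halt_logit(state_i)).
    The final step receives whatever probability mass remains, so the weights sum to 1.
    """
    outputs, weights = [], []
    remaining = 1.0
    for i, step in enumerate(steps):
        state = step(state)
        outputs.append(state)
        if i == len(steps) - 1:
            weights.append(remaining)
        else:
            h = sigmoid(halt_logit(state))
            weights.append(remaining * h)
            remaining *= 1.0 - h
    mixed = [sum(w * out[k] for w, out in zip(weights, outputs)) for k in range(len(state))]
    return mixed, weights


def add_one(v):
    """Toy stand-in for one SwiGLU FFN step."""
    return [x + 1.0 for x in v]


if __name__ == "__main__":
    # Halting logit 0 gives h = 0.5, so the two steps are weighted equally.
    print(adaptive_forward([0.0, 0.0], [add_one, add_one], lambda s: 0.0))
    # -> ([1.5, 1.5], [0.5, 0.5])
    # A confident halt after step 1 puts almost all weight on the first state.
    print(adaptive_forward([0.0, 0.0], [add_one, add_one], lambda s: 20.0))
```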
-
- #### 5. CompositionalReasoner: Hyperdimensional Binding
-
- The CompositionalReasoner implements **role-filler binding** from hyperdimensional computing (HDC). It projects each token into a role vector and a filler vector, then binds them via element-wise multiplication. (Element-wise multiplication in the frequency domain is equivalent to circular convolution.) A shift-based unbinding operation adds positional awareness.
-
- This enables the model to represent compositional structures like "the **subject** of the sentence is **the cat**", where "subject" is the role and "the cat" is the filler. This is a fundamental capability for understanding linguistic structure without explicit syntax trees.
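The binding idea can be tried out with random bipolar hypervectors. This is a hedged illustration of generic HDC operations, not CogNet's CompositionalReasoner. The model learns its role and filler projections, whereas this sketch draws random ±1 vectors. The helpers `bind`, `bundle`, `circular_convolution`, and `cosine` are hypothetical. For comparison, the sketch also includes the circular-convolution binding used in holographic reduced representations.

```python
import math
import random


def random_bipolar(dim, rng):
    """Random hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Element-wise (Hadamard) binding; for bipolar vectors it is its own inverse."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose several bound pairs by element-wise addition."""
    return [sum(vals) for vals in zip(*vectors)]


def circular_convolution(a, b):
    """HRR-style binding in direct O(n^2) form: out[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 1024
    subject, obj = random_bipolar(dim, rng), random_bipolar(dim, rng)
    cat, mat = random_bipolar(dim, rng), random_bipolar(dim, rng)
    # "subject = the cat" and "object = the mat" stored in a single vector.
    sentence = bundle(bind(subject, cat), bind(obj, mat))
    query = bind(sentence, subject)  # unbind the subject role
    print(round(cosine(query, cat), 2), round(cosine(query, mat), 2))
```

With 1024 dimensions, unbinding the subject role should yield a vector whose similarity to `cat` is near 1/√2 ≈ 0.71, because the other pair acts as noise. Its similarity to `mat` should be near 0.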
-
- ---
-
- ## Complexity Analysis
-
- | Operation | Transformer | CogNet | Speedup Factor |
- |-----------|-------------|--------|----------------|
- | Token mixing | O(n² × d) | O(n × C × d) | **n / C** |
- | Memory access | O(n² × d) | O(n × S × d) | **n / S** |
- | FFN | O(n × d × ff) | O(n × d × ff) | 1× (same) |
- | **Total per layer** | **O(n² × d)** | **O(n × (C + S + ff) × d)** | **~n / (C + S)** |
-
- Consider a 256-token sequence with C=6 channels and S=224 memory slots. Here n / (C + S) = 256 / 230 ≈ 1.1, so the advantage over an equivalent Transformer layer is small at this length. The advantage grows linearly with sequence length: at 1024 tokens the ratio is about 4.5×.
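As a quick arithmetic check, the ~n / (C + S) column can be evaluated directly with the document's own values (C = 6 channels, S = 224 slots). The sketch ignores constant factors and the FFN term, which is the same for both architectures. `mixing_ratio` is a hypothetical helper.

```python
def mixing_ratio(n, channels=6, slots=224):
    """Ratio of O(n^2 * d) attention to O(n * (C + S) * d) routing plus memory.

    Mirrors the ~n / (C + S) column; the shared FFN term and constant factors are ignored.
    """
    return n / (channels + slots)


if __name__ == "__main__":
    for n in (256, 512, 1024, 2048):
        print(n, round(mixing_ratio(n), 2))
    # Prints: 256 1.11 / 512 2.23 / 1024 4.45 / 2048 8.9 (one pair per line)
```

Under this accounting, the break-even point is n = C + S = 230. Below that length, the routing-plus-memory term is not cheaper than pairwise attention.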
-
- ---
-
- ## Model Specifications
-
- | Parameter | Value |
- |-----------|-------|
- | **Architecture** | CogNet (Non-Transformer) |
- | **Total Parameters** | 39,725,784 (~40M) |
- | **Hidden Dimension** | 512 |
- | **Cognitive Blocks** | 6 |
- | **Cognitive Channels** | 6 |
- | **Channel Dimension** | 128 |
- | **FF Dimension** | 1024 |
- | **Working Memory Slots** | 32 |
- | **Episodic Memory Slots** | 64 |
- | **Semantic Memory Slots** | 128 |
- | **Key Dimension** | 256 |
- | **Max Sequence Length** | 256 |
- | **Vocabulary Size** | 136 (character-level) |
- | **Model Size (FP32)** | ~159 MB |
- | **Model Size (FP16)** | ~80 MB |
- | **Adaptive Steps** | 1–2 |
- | **Routing Iterations** | 1 |
- | **Composition** | Hyperdimensional binding |
-
- ### Character-Level Tokenizer
-
- CogNet uses a 136-character vocabulary that covers:
- - Standard ASCII (printable characters, digits, punctuation)
- - French accented characters (à, é, è, ê, ë, î, ï, ô, ù, û, ü, ÿ, ç, æ, œ)
- - Special formatting characters (tab, newline)
- - European typographic marks (guillemets « », inverted question mark ¿)
-
- Character-level tokenization ensures:
- - **No out-of-vocabulary tokens**: every string built from the covered characters is representable
- - **Cross-lingual capability**: no bias toward English subword units
- - **Compact vocabulary**: only 136 embedding vectors vs. 32K+ for BPE tokenizers
- - **Fine-grained generation**: the model learns orthographic patterns directly
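Below is a minimal sketch of how a character-level tokenizer of this kind typically works. `CharTokenizerSketch` and its `{"chars": [...]}` JSON layout are hypothetical. They are not the repository's `CharTokenizer`, and they do not reproduce the actual `tokenizer_v3.json` format. The character set approximates the categories listed above and does not try to reproduce the exact 136 entries.

```python
import json

ASCII_PRINTABLE = "".join(chr(c) for c in range(32, 127))  # space through '~'
FRENCH = "àâäéèêëîïôöùûüÿçæœÀÂÉÈÊËÎÏÔÙÛÜÇÆŒ"
EXTRA = "\t\n«»¿"


class CharTokenizerSketch:
    """Hypothetical character tokenizer; not the repository's CharTokenizer."""

    def __init__(self, chars):
        self.itos = list(dict.fromkeys(chars))  # de-duplicate, keep order
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}

    def encode(self, text):
        missing = sorted(set(text) - set(self.stoi))
        if missing:
            raise ValueError(f"characters outside the vocabulary: {missing}")
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

    def to_json(self):
        # Assumed schema {"chars": [...]}; tokenizer_v3.json may be laid out differently.
        return json.dumps({"chars": self.itos}, ensure_ascii=False)

    @classmethod
    def from_json(cls, payload):
        return cls(json.loads(payload)["chars"])


if __name__ == "__main__":
    tok = CharTokenizerSketch(ASCII_PRINTABLE + FRENCH + EXTRA)
    ids = tok.encode("Élève « très » bien.\n")
    print(len(tok.itos), ids[:5], repr(tok.decode(ids)))
```

Encoding raises an error for characters outside the vocabulary, such as CJK text. This is the practical limit of the no-OOV property.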
-
- ---
-
- ## Quick Start
-
- ### Installation
-
- ```bash
- pip install torch huggingface_hub
- ```
-
- ### Download Model
-
- ```python
- from huggingface_hub import hf_hub_download
-
- # Download the checkpoint, tokenizer, and source files into the current directory
- ckpt_path = hf_hub_download("AFKmoney/CogNet-40M", "cognet_best.pt", local_dir=".")
- tokenizer_path = hf_hub_download("AFKmoney/CogNet-40M", "tokenizer_v3.json", local_dir=".")
- model_code = hf_hub_download("AFKmoney/CogNet-40M", "cognet_1b.py", local_dir=".")
- infer_code = hf_hub_download("AFKmoney/CogNet-40M", "infer.py", local_dir=".")
- ```
-
- ### Inference
-
- ```python
- import sys
- import torch
-
- sys.path.insert(0, ".")  # make the downloaded cognet_1b.py / infer.py importable
-
- from cognet_1b import CogNet1B
- from infer import CharTokenizer
-
- # Load tokenizer
- tokenizer = CharTokenizer.load("tokenizer_v3.json")
-
- # Build model (hyperparameters must match the checkpoint)
- model = CogNet1B(
-     vocab_size=136, hidden_dim=512, num_blocks=6,
-     num_channels=6, channel_dim=128, ff_dim=1024,
-     routing_iters=1, max_adaptive_steps=2, max_seq_len=256,
-     working_slots=32, episodic_slots=64, semantic_slots=128,
-     key_dim=256, dropout=0.1,
- )
-
- # Load checkpoint, upcasting any FP16 tensors to FP32
- ckpt = torch.load("cognet_best.pt", map_location="cpu", weights_only=False)
- state = {k: v.float() if v.dtype == torch.float16 else v
-          for k, v in ckpt["model_state_dict"].items()}
- model.load_state_dict(state)
- model.eval()
-
- # Generate text
- prompt = "Once upon a time"
- ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
-
- with torch.no_grad():
-     gen = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
-
- print(tokenizer.decode(gen[0].tolist()))
- ```
-
- ### CUDA Inference
-
- ```python
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = model.to(device)
- ids = ids.to(device)
-
- with torch.no_grad():
-     gen = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
-
- print(tokenizer.decode(gen[0].tolist()))
- ```
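The `temperature` and `top_k` arguments passed to `model.generate` conventionally work as in the sketch below. CogNet's own `generate` is not shown in this README, so this sketch assumes standard top-k sampling rather than describing its code. `sample_top_k` is a hypothetical helper that operates on one row of logits.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=None):
    """Draw one token id: keep the top_k logits, divide by temperature, sample from the softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalised softmax is enough
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 2.0) for _ in range(136)]  # one row of character logits
    print([sample_top_k(logits, rng=rng) for _ in range(10)])
```

Lower temperatures sharpen the distribution toward the highest-scoring characters, while `top_k` caps how many candidates are ever considered.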
-
- ---
 
  ## Training
 
- ### Training Data
-
- | Dataset | Size | Domain | Language |
- |---------|------|--------|----------|
- | WikiText-2 (raw) | ~2M tokens | Encyclopedic | English |
- | TinyStories (50K) | ~15M tokens | Narrative | English |
- | Alpaca (52K) | ~5M tokens | Instructions | English |
- | **Total** | **~63M tokens** | **Mixed** | **English** |
-
- ### Training Configuration
-
- | Parameter | Value |
- |-----------|-------|
- | **Hardware** | NVIDIA GeForce RTX 5060 Ti (16 GB VRAM) |
- | **Sequence Length** | 256 |
- | **Batch Size** | 64 (gradient accumulation × 4, effective = 256) |
- | **Optimizer** | AdamW (β₁=0.9, β₂=0.95, weight_decay=0.01) |
- | **Learning Rate** | 5e-4 (cosine schedule with warmup) |
- | **Warmup Steps** | 500 |
- | **Precision** | FP16 (AMP) with TF32 enabled |
- | **Gradient Clipping** | 1.0 |
- | **Total Steps** | 30,000 |
- | **Throughput** | ~3M tokens/min |
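The schedule in the table can be written as a plain function. The README says only "cosine schedule with warmup", so the linear warmup shape and the decay floor (`min_lr=0.0`) are assumptions. `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, peak=5e-4, warmup=500, total=30_000, min_lr=0.0):
    """Linear warmup to `peak`, then cosine decay to `min_lr` by `total` steps."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return min_lr + 0.5 * (peak - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 499, 500, 15_250, 29_999):
        print(s, f"{lr_at(s):.2e}")
```

A function like this can be attached in PyTorch with `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: lr_at(s) / 5e-4)`, provided the optimizer's base learning rate is 5e-4. `LambdaLR` multiplies the base rate by the returned factor.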
-
- ### Training Curve
-
- Training loss descends smoothly from an initial value of ~4.5 (near-random predictions over the 136-character vocabulary) to ~0.007, and validation perplexity reaches 1.01. This means the model predicts the next character with high confidence. The hierarchical memory gates show interesting dynamics. Working memory dominates early in training (local character patterns), while the Semantic memory gates increase as the model learns abstract patterns.
-
- ---
-
- ## Benchmarks
-
- ### Perplexity
-
  | Metric | Value |
  |--------|-------|
- | Training Loss | 0.007 |
- | Training PPL | 1.01 |
- | Validation Loss | 0.008 |
- | Validation PPL | 1.01 |
 
- *Note: These perplexity scores are character-level and are measured on the training distribution. Comparing them with BPE-tokenized models requires adjusting for tokenization granularity.*
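The loss and perplexity columns are related by PPL = exp(loss). This assumes the losses are mean cross-entropy in nats, the default for PyTorch's `cross_entropy`. Dividing the loss by ln 2 gives bits per character, a common unit for comparing character-level models. `perplexity` and `bits_per_char` are hypothetical helpers.

```python
import math


def perplexity(loss_nats):
    """Per-character perplexity from mean cross-entropy in nats."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats):
    """Convert nats per character to bits per character (bpc)."""
    return loss_nats / math.log(2)


if __name__ == "__main__":
    for name, loss in (("train", 0.007), ("validation", 0.008)):
        print(name, round(perplexity(loss), 4), round(bits_per_char(loss), 4))
    # A uniform guess over the 136-character vocabulary, for reference.
    print("uniform", round(math.log(136), 2), "nats")
```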
-
- ### Generation Samples
-
- ```
- Prompt: "The "
- Output: "The little cat, Lily watched her. She day, and sorry"
-
- Prompt: "Once upon a time"
- Output: "Once upon a time there was a little girl named Lily. She"
 
- Prompt: "CogNet is"
- Output: "CogNet is a model that can help her and here children."
  ```
 
- ### Scaling Properties
-
- CogNet's O(n) complexity means it scales favorably with sequence length:
 
- | Sequence Length | Transformer O(n²) Ops | CogNet O(n) Ops | Ratio |
- |----------------|----------------------|-----------------|-------|
- | 256 | 65,536 | 256 | 256× |
- | 512 | 262,144 | 512 | 512× |
- | 1024 | 1,048,576 | 1,024 | 1,024× |
- | 2048 | 4,194,304 | 2,048 | 2,048× |
 
- *Asymptotic counts for a single layer, ignoring constant factors (d, C, S). If CogNet's routing and memory cost is counted as n × (C + S) with C + S = 230, the per-layer ratio is n / 230 rather than n.*
 
- ---
-
- ## Architecture Comparison
-
- | Feature | GPT-2 (Small) | CogNet-40M |
- |---------|---------------|------------|
- | Parameters | 117M | 40M |
- | Architecture | Transformer (decoder) | Cognitive Routing |
- | Sequence mixing | Self-Attention (O(n²)) | Coherence Routing (O(n)) |
- | Memory mechanism | Fixed context window | Hierarchical 3-tier memory |
- | Computation | Uniform per token | Adaptive (1-2 steps) |
- | Tokenizer | BPE (50,257 vocab) | Character (136 vocab) |
- | Max context | 1,024 tokens | 256 tokens |
- | Composition | None | Hyperdimensional binding |
- | Positional encoding | Learned | Learned |
-
- ---
-
- ## Limitations
-
- - **Context length**: Currently limited to 256 tokens. Extending to longer contexts requires architectural modifications to the memory read mechanism.
- - **Character-level tokenization**: While OOV-free, character-level models require more processing steps to build up word-level and phrase-level representations compared to subword tokenizers.
- - **Scale**: At 40M parameters, CogNet is a research proof-of-concept. Scaling to 1B+ parameters is the next milestone.
- - **Evaluation**: Benchmarks are computed on the training distribution. Zero-shot evaluation on standard NLP benchmarks is planned.
- - **Language coverage**: Currently trained on English text only, though the tokenizer supports French accented characters.
-
- ---
-
- ## Citation
-
- ```bibtex
- @software{cognet2026,
-   title   = {CogNet: A Non-Transformer Language Model with Cognitive Routing and Hierarchical Memory},
-   author  = {AFKmoney},
-   year    = {2026},
-   url     = {https://huggingface.co/AFKmoney/CogNet-40M},
-   license = {MIT}
- }
- ```
-
- ---
 
- <div align="center">
 
- **Built from scratch. No transformers. Just cognition.**
 
- </div>
 
  ---
+ license: mit
  language:
  - en
  - fr
+ - code
  tags:
  - non-transformer
  - cognitive-routing
  - hierarchical-memory
  - character-level
+ - aicl
+ - text-generation
+ - custom-architecture
  pipeline_tag: text-generation
+ library_name: pytorch
  ---
 
  # CogNet-40M
 
+ A 39.7M-parameter non-transformer language model with O(n) cognitive routing and hierarchical memory.
 
  ## Architecture
 
+ | Component | Detail |
+ |-----------|--------|
+ | Architecture | Non-transformer (Cognitive Routing) |
+ | Parameters | 39,718,536 (~40M) |
+ | Hidden Dim | 512 |
+ | Blocks | 6 cognitive blocks |
+ | Channels | 6 routing channels × 128 dim |
+ | FF Dim | 1024 |
+ | Max Seq Len | 256 |
+ | Tokenizer | Character-level (136 vocab) |
 
+ ## Hierarchical Memory
 
+ - Working Memory (32 slots): Active processing
+ - Episodic Memory (64 slots): Short-term recall
+ - Semantic Memory (128 slots): Long-term knowledge
 
  ## Training
 
  | Metric | Value |
  |--------|-------|
+ | Steps | 50,000 |
+ | Batch Size | 64 |
+ | LR | 3e-4 (cosine) |
+ | Precision | FP16 AMP |
+ | GPU | RTX 5060 Ti 16GB |
+ | Final Loss | ~0.005 |
+ | Final PPL | ~1.01 |
 
+ ## Quick Start
 
+ ```python
+ from inference import CogNetInference
+ ai = CogNetInference("cognet_best.pt", "tokenizer_v3.json")
+ print(ai.generate("Once upon a time"))
  ```
 
+ ## AICL Integration
 
+ CogNet powers AICL (Architecture Compilation Language) as its native AI engine for code generation, diagnosis, and repair.
 
+ ## Files
 
+ | File | Size | Description |
+ |------|------|-------------|
+ | cognet_best.pt | 152MB | FP32 checkpoint |
+ | cognet_fp16.pt | 77MB | FP16 checkpoint |
+ | tokenizer_v3.json | - | Char tokenizer (136 vocab) |
+ | config.json | - | Model config |
+ | cognet_model.py | - | Architecture source |
+ | inference.py | - | Inference script |
 
+ ## Roadmap
 
+ - [x] CogNet-40M (39.7M)
+ - [x] HuggingFace integration
+ - [x] AICL native engine
+ - [ ] CogNet-1B (1B params)
+ - [ ] ONNX export
 
+ MIT License. Built with PyTorch on RTX 5060 Ti via QuickPod.