Pomilon committed · verified
Commit 505558f · 1 Parent(s): a31a9ac

Upload folder using huggingface_hub

Files changed (2):
  1. README.md +32 -35
  2. checkpoints/checkpoint_12000.pth +3 -0
README.md CHANGED
@@ -16,70 +16,67 @@ tags:

  # Aetheris: Hybrid Mamba-MoE (294M)

- > **Developed by:** [Pomilon Intelligence Lab](https://huggingface.co/Pomilon-Lab)
- > **Status:** 🟡 Experimental / Research Preview
  > **Source Code:** [GitHub - Pomilon/Aetheris](https://github.com/Pomilon/Aetheris)

- **Aetheris** is an experimental language model exploring the intersection of **State Space Models (SSM)** and **Mixture-of-Experts (MoE)** architectures.

- We designed this model to investigate efficient scaling on consumer hardware. By combining Mamba's linear-time sequence modeling with the sparse capacity of MoE, Aetheris aims to maximize parameter count while minimizing inference latency.

- ## 🧪 Architecture & Design

- This project tests the hypothesis: *Can we interleave dense Mamba blocks with sparse MoE layers to create a model that is parameter-rich but computationally light?*

- The architecture follows a strict alternating pattern:
- 1. **SSM Blocks (Odd Layers):** Dense Mamba blocks responsible for sequence mixing and memory.
- 2. **MoE Blocks (Even Layers):** Sparse router layers that direct tokens to 1 of 4 experts.

- ### 📊 Technical Specifications

- Due to the sparse nature of the MoE layers, approximately **43% of the parameters remain inactive** during any given inference step.

- | Metric | Count (Millions) | Description |
  | :--- | :--- | :--- |
- | **Total Parameters** | **294.44M** | Storage footprint on disk. |
- | **Active Parameters** | **167.03M** | Computational cost per token (Inference). |

- ## 📉 Training Status

- Training is currently in progress on a single NVIDIA RTX 5000.

- * **Current Step:** 11,000
- * **Current Loss:** ~1.4167
- * **Dataset:** A subset of SlimPajama-627B

- > **⚠️ Performance Notice:** Aetheris is currently in a "proof-of-concept" state. While it generates coherent English syntax, it does not yet possess strong reasoning capabilities. It is intended for architectural analysis rather than downstream tasks.

- ## 🚀 Usage & Inference

- Since Aetheris utilizes a custom architecture not yet supported by standard Transformers, you must use the custom inference code provided in our repository.

  ```bash
- # 1. Clone the repository
  git clone https://github.com/Pomilon/Aetheris.git
  cd Aetheris

- # 2. Install dependencies
  pip install -r requirements.txt

- # 3. Run generation (Ensure you have downloaded the model weights first)
- python -m aetheris.cli.main generate --prompt "The future of AI is" --checkpoint_dir path/to/checkpoints_folder
  ```

  ## 📚 Acknowledgements

- This research builds upon foundational work in the field:

  * **Mamba:** Gu & Dao (2023)
  * **Mixture of Experts:** Shazeer et al. (2017)

- -----
-
- ### 📝 Author's Note (The "Scuffed" Reality)
-
- *Hi, Pomilon here! While the text above sounds official, here is the reality:*
-
- *This is a "learning by doing" experiment. I wanted to see if I could smash these two architectures together on my laptop without it exploding. I built this from scratch to learn, so don't expect GPT-4 performance! It's currently in the "babbling coherently" phase.*
-
- *If you manage to break it (or fix it), let me know!*
 
  # Aetheris: Hybrid Mamba-MoE (294M)

+ > **Status:** 🟡 Experimental / Proof-of-Concept
  > **Source Code:** [GitHub - Pomilon/Aetheris](https://github.com/Pomilon/Aetheris)

+ **Aetheris** is a "learning by doing" experiment where I attempted to smash together a **Mamba State Space Model** backbone with **Mixture-of-Experts (MoE)** layers.
+
+ I built this from scratch to see if I could combine Mamba's long-context efficiency with MoE's sparse capacity on consumer hardware. It is **not** a state-of-the-art foundation model; it's a fun architectural playground.
+
+ ## 🧪 The "What If" Experiment
+
+ The idea was simple: *Can I interleave dense Mamba blocks with sparse MoE layers to make a model that is big on disk but fast at inference?*
+
+ The architecture alternates between two block types (see the sketch after this list):
+ 1. **SSM Blocks (Odd Layers):** Dense Mamba blocks for handling memory and context.
+ 2. **MoE Blocks (Even Layers):** Sparse layers that route tokens to only 1 of 4 experts.

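+ Here's a rough PyTorch-flavored sketch of that interleaving. Toy code only: the class names, sizes, and the `Identity` stand-in for the Mamba block are made up for illustration, not lifted from the repo.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class ToyMoE(nn.Module):
+     """Toy top-1 MoE layer: each token goes to exactly 1 of 4 expert MLPs."""
+     def __init__(self, d_model: int, n_experts: int = 4):
+         super().__init__()
+         self.router = nn.Linear(d_model, n_experts)
+         self.experts = nn.ModuleList(
+             nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
+                           nn.Linear(4 * d_model, d_model))
+             for _ in range(n_experts)
+         )
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         # x: (batch, seq, d_model); pick one expert id per token
+         choice = self.router(x).argmax(dim=-1)
+         out = torch.zeros_like(x)
+         for i, expert in enumerate(self.experts):
+             mask = choice == i               # tokens assigned to expert i
+             if mask.any():
+                 out[mask] = expert(x[mask])  # the other 3 experts stay idle
+         return out
+
+ def build_stack(n_layers: int = 8, d_model: int = 512) -> nn.Sequential:
+     # Odd layers (1, 3, ...): dense Mamba blocks -- stubbed with Identity here.
+     # Even layers (2, 4, ...): sparse MoE blocks.
+     return nn.Sequential(*[
+         nn.Identity() if layer % 2 == 1 else ToyMoE(d_model)
+         for layer in range(1, n_layers + 1)
+     ])
+ ```
+
+ Swap the `Identity` for a real Mamba block and that's the whole trick: the SSM layers do the sequence mixing, and each token only pays for one expert in the MoE layers.
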
+ ### 📊 The Specs
+
+ Because of the hybrid design, ~43% of the model is "dormant" during inference (quick arithmetic below).
+
+ | Metric | Count (Millions) | What it means |
  | :--- | :--- | :--- |
+ | **Total Capacity** | **294.44M** | The size on disk. |
+ | **Active Params** | **167.03M** | The actual compute used per token. |

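+ Sanity-checking that number is just arithmetic on the table above:
+
+ ```python
+ total, active = 294.44, 167.03   # millions of parameters, from the table above
+ idle = total - active            # 127.41M parameters sit out of each step
+ print(f"{idle / total:.1%} of the weights are dormant per token")  # -> 43.3%
+ ```
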
+ ## 📉 Training Log (Live)
+
+ I am currently training this on a single NVIDIA RTX 5000. It's still cooking!
+
+ * **Latest Checkpoint:** Step 11,000
+ * **Loss:** ~1.4167
+ * **Dataset:** Subset of SlimPajama-627B
+
+ > **⚠️ Disclaimer:** This model is currently babbling coherent English but isn't very smart yet. Don't expect GPT-4 (or even GPT-2) level reasoning. It's a proof-of-concept for the code, not the weights! :D

+ ## 🚀 How to Run (The "Scuffed" Way)
+
+ Since this uses a custom architecture, `AutoModel.from_pretrained` won't work out of the box. You need the code from my repo.
+
+ Right now, the easiest way to run it is with the CLI tool in the repo:

  ```bash
+ # 1. Clone the repo
  git clone https://github.com/Pomilon/Aetheris.git
  cd Aetheris

+ # 2. Install requirements
  pip install -r requirements.txt

+ # 3. Run generation (download the weights first -- see the Python snippet
+ #    below -- and rename the checkpoint inside to checkpoint_current.pth)
+ python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir path/to/checkpoints_folder
  ```

+ *(I'll add a cleaner inference script later, but this works for now!)*
+
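+ If you'd rather grab the checkpoint programmatically, something like this should do it (heads up: the `repo_id` below is a placeholder; point it at wherever the weights actually live):
+
+ ```python
+ import shutil
+ from pathlib import Path
+
+ from huggingface_hub import hf_hub_download
+
+ # Fetch one checkpoint file from the Hub (repo_id is a placeholder -- adjust it).
+ src = hf_hub_download(repo_id="Pomilon/Aetheris",
+                       filename="checkpoints/checkpoint_12000.pth")
+
+ # The CLI above expects checkpoint_current.pth inside --checkpoint_dir.
+ ckpt_dir = Path("path/to/checkpoints_folder")
+ ckpt_dir.mkdir(parents=True, exist_ok=True)
+ shutil.copy(src, ckpt_dir / "checkpoint_current.pth")
+ ```
+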
  ## 📚 Acknowledgements

+ This project stands on the shoulders of giants. It is an implementation study based on:

  * **Mamba:** Gu & Dao (2023)
  * **Mixture of Experts:** Shazeer et al. (2017)

+ ## License
+
+ MIT
checkpoints/checkpoint_12000.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86793c523a0c2be249ec16e6ae9c0f932bdb7783e6b184d7431e94c2ccbc8de7
+ size 3533562641