---
title: README
emoji: 👁
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

# ShallowMind - Just like DeepMind, but way more stupid 🧠

Hi there! My name is Alessandro, and I'm an AI research engineer.
ShallowMind is my workspace for training and experimenting with language models.  
The name is playful, but the goal is straightforward: to build increasingly capable models while exploring new ideas in pretraining and reasoning.

---

## Research Interests

- **Information-theoretic pretraining**  
  Looking at ways to identify and prioritize the most informative tokens, to see whether data-scaling curves can be improved by training on them preferentially. (Work in progress; I'll share results once experiments are further along. A rough sketch of the core idea follows this list.)

- **Reasoning models**  
  Testing approaches that improve step-by-step and compositional reasoning.

- **Architectural variations**  
  Extending my training pipeline to support Mixture-of-Experts (MoE) and other non-standard components.
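
To make the first item concrete, here is a minimal sketch of what surprisal-weighted pretraining loss could look like. The scoring function (each token's own detached NLL, normalized to mean 1) is just one illustrative possibility, not the final selection criterion:

```python
import torch
import torch.nn.functional as F

def surprisal_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy where each token's loss is scaled by its own surprisal.

    logits:  (batch, seq, vocab) next-token predictions
    targets: (batch, seq) ground-truth token ids
    """
    # Per-token negative log-likelihood = the token's surprisal in nats.
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    # Detach the weights so "how informative is this token" does not itself
    # receive gradients; normalize to mean 1 to keep the loss scale stable.
    weights = nll.detach() / (nll.detach().mean() + 1e-8)
    return (weights * nll).mean()
```

Swapping in a different informativeness score (e.g., surprisal under a reference model) only changes how `weights` is computed; the rest of the training loop stays the same.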

---

## Current Work

- Built a **custom pretraining pipeline** and pretrained a first model from scratch (~1B scale) as a proof of concept.  
- Iterating on the pipeline to add **MoE layers** and **information-gain–based logic** (see the sketch after this list).  
- Next steps:  
  - Fine-tune the first model into **Promptasaurus-Zero**.  
  - Train **Blahblahthron-7B** as a larger-scale follow-up experiment.
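
For reference, the MoE extension amounts to replacing the dense feed-forward block in a transformer layer with a routed set of expert MLPs. Below is a minimal sketch of a standard top-k routed layer; the expert shapes, GELU activation, per-expert loop, and missing load-balancing loss are simplifications for illustration, not the pipeline's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Top-k routed Mixture-of-Experts feed-forward block (minimal sketch)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        tokens = x.reshape(-1, x.size(-1))                # (n_tokens, d_model)
        gate = F.softmax(self.router(tokens), dim=-1)     # routing probabilities
        topv, topi = gate.topk(self.top_k, dim=-1)        # keep the k best experts
        topv = topv / topv.sum(dim=-1, keepdim=True)      # renormalize the k gates
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = topi == e                              # tokens routed to expert e
            if mask.any():
                rows = mask.any(dim=-1)
                w = (topv * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(tokens[rows])
        return out.reshape_as(x)
```

A real training run would add an auxiliary load-balancing loss and vectorized dispatch, but the routing logic itself is the same.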

---

## Roadmap

- Share ablations and code from early experiments.  
- Scale training to larger models.  
- Document results on token selection and reasoning tasks.  

---