---
title: README
emoji: 👁
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---
# ShallowMind - Just like DeepMind, but way more stupid 🧠
Hi there! My name is Alessandro, and I'm an AI research engineer.
ShallowMind is my workspace for training and experimenting with language models.
The name is playful, but the goal is straightforward: to build increasingly capable models while exploring new ideas in pretraining and reasoning.
---
## Research Interests
- **Information-theoretic pretraining**
Looking at ways to identify and prioritize the most informative tokens, to see whether current scaling laws can be adjusted. (Work in progress — I’ll share results once experiments are further along.)
- **Reasoning models**
Testing approaches that improve step-by-step and compositional reasoning.
- **Architectural variations**
Extending my training pipeline to support Mixture-of-Experts (MoE) and other non-standard components.
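The token-prioritization idea above can be sketched in a few lines. This is a toy illustration, not the actual pipeline: it assumes an "informativeness" score of per-token surprisal (negative log-probability under a reference model) and keeps the top-k most surprising tokens. The function names and the toy probabilities are made up for the example.

```python
import math

def token_information(probs):
    """Per-token surprisal (-log p) under a reference model's
    next-token probabilities: higher = more informative."""
    return [-math.log(p) for p in probs]

def select_informative(tokens, probs, k):
    """Keep the k tokens the reference model found most surprising,
    preserving their original order in the sequence."""
    scores = token_information(probs)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore sequence order
    return [tokens[i] for i in keep]

# Toy example: frequent tokens get high probability, rare ones low.
tokens = ["the", "cat", "sat", "quantum", "mat"]
probs  = [0.9,   0.4,   0.5,   0.01,      0.3]
print(select_informative(tokens, probs, 2))  # → ['quantum', 'mat']
```

In a real pretraining loop the same ranking would typically reweight the loss rather than drop tokens outright, but the selection step looks the same.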
---
## Current Work
- Built a **custom pre-training pipeline** and pre-trained a first model from scratch (~1B scale) as a proof of concept.
- Iterating on the pipeline to add **MoE layers** and **information-gain–based logic**.
- Next steps:
- Fine-tune the first model into **Promptasaurus-Zero**.
- Train **Blahblahthron-7B** as a larger-scale follow-up experiment.
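As a rough picture of what the MoE extension involves, here is a minimal top-1-routed Mixture-of-Experts feed-forward layer in NumPy. It is a sketch only: the class name, sizes, and the choice of top-1 routing are assumptions for illustration, not the pipeline's actual design (which would also need load balancing and a learned gate).

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyMoE:
    """Minimal Mixture-of-Experts layer: a gating matrix routes each
    token to exactly one expert (a small linear map)."""
    def __init__(self, d_model, n_experts):
        self.gate = rng.normal(size=(d_model, n_experts)) * 0.02
        self.experts = [rng.normal(size=(d_model, d_model)) * 0.02
                        for _ in range(n_experts)]

    def forward(self, x):
        # x: (n_tokens, d_model)
        logits = x @ self.gate           # (n_tokens, n_experts)
        choice = logits.argmax(axis=-1)  # top-1 expert per token
        out = np.empty_like(x)
        for e, w in enumerate(self.experts):
            mask = choice == e
            out[mask] = x[mask] @ w      # only routed tokens hit expert e
        return out

moe = TinyMoE(d_model=8, n_experts=4)
x = rng.normal(size=(5, 8))
y = moe.forward(x)
print(y.shape)  # (5, 8)
```

The appeal of this structure is that each token only pays for one expert's compute, so parameter count can grow without a proportional increase in FLOPs per token.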
---
## Roadmap
- Share ablations and code from early experiments.
- Scale training to larger models.
- Document results on token selection and reasoning tasks.
---