---
title: README
emoji: 👁
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---
# ShallowMind - Just like DeepMind, but way more stupid 🧠

Hi there! My name is Alessandro, and I'm an AI research engineer.

ShallowMind is my workspace for training and experimenting with language models.

The name is playful, but the goal is straightforward: to build increasingly capable models while exploring new ideas in pretraining and reasoning.
---

## Research Interests

- **Information-theoretic pretraining**
  Looking at ways to identify and prioritize the most informative tokens, to see whether current scaling laws can be adjusted. (Work in progress; I'll share results once experiments are further along. A toy sketch follows this list.)
- **Reasoning models**
  Testing approaches that improve step-by-step and compositional reasoning.
- **Architectural variations**
  Extending my training pipeline to support Mixture-of-Experts (MoE) and other non-standard components.
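As a rough illustration of the token-selection idea, here is a toy PyTorch sketch that up-weights the training loss of high-surprisal tokens under a frozen reference model. Everything in it (the function name, the surprisal-based weighting, the `alpha` knob) is my own illustrative assumption, not the actual pipeline:

```python
import torch
import torch.nn.functional as F

def informativeness_weighted_loss(logits, targets, ref_logits, alpha=1.0):
    """Cross-entropy re-weighted by per-token surprisal under a reference model.

    logits:     (batch, seq, vocab) from the model being trained
    ref_logits: (batch, seq, vocab) from a frozen, cheaper reference model
    targets:    (batch, seq) next-token ids
    alpha:      sharpness of the weighting (0 recovers plain cross-entropy)
    """
    # Per-token cross-entropy of the model being trained, unreduced.
    ce = F.cross_entropy(
        logits.flatten(0, 1), targets.flatten(), reduction="none"
    ).view_as(targets)

    # Surprisal of each target token under the reference model:
    # high surprisal is used as a proxy for "informative" tokens.
    with torch.no_grad():
        ref_logp = F.log_softmax(ref_logits, dim=-1)
        surprisal = -ref_logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        # Softmax over the sequence keeps the mean weight at ~1,
        # so the overall loss scale stays comparable.
        weights = torch.softmax(alpha * surprisal, dim=-1) * targets.size(-1)

    return (weights * ce).mean()
```

The reference model here is just one possible informativeness proxy; other signals (pointwise mutual information, loss deltas between checkpoints) would slot into the same weighting hook.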
---

## Current Work

- Built a **custom pre-training pipeline** and pre-trained a first model from scratch (~1B scale) as a proof of concept.
- Iterating on the pipeline to add **MoE layers** and **information-gain–based logic** (a minimal MoE sketch follows this list).
- Next steps:
  - Fine-tune the first model into **Promptasaurus-Zero**.
  - Train **Blahblahthron-7B** as a larger-scale follow-up experiment.
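For the MoE direction, below is a minimal top-k routed feed-forward layer of the kind the pipeline would need to support. It's a generic textbook sketch (the layer sizes, top-2 routing, and omitted load-balancing loss are all placeholder assumptions), not code from my pipeline:

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Minimal Mixture-of-Experts FFN with learned top-k token routing."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (batch, seq, d_model); flatten tokens so routing is per token.
        tokens = x.reshape(-1, x.size(-1))
        gate = self.router(tokens).softmax(dim=-1)         # (n_tokens, n_experts)
        weights, idx = gate.topk(self.top_k, dim=-1)       # each token picks top-k experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize the kept gates

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e                      # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(tokens[mask])
        return out.view_as(x)
```

A production version would add a load-balancing auxiliary loss and batched expert dispatch; the explicit loop here just keeps the routing logic readable.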
---

## Roadmap

- Share ablations and code from early experiments.
- Scale training to larger models.
- Document results on token selection and reasoning tasks.

---