---
language:
- en
license: mit
library_name: transformers
tags:
- pytorch
- jax
- custom_code
- tiny-model
- experimental
- research
pipeline_tag: text-generation
inference: false
safetensors:
parameters:
F32: 40M
---
# MTLM2-40M (TinyGPT)
**MTLM2-40M** is a highly experimental, tiny language model (~40 Million parameters) designed as a research artifact to explore the lower bounds of language modeling capabilities.
The primary research question driving this model was:
> *Can a microscopic model (40M params), when saturated with a large amount of data (14B tokens), generate minimally coherent narrative text?*
The answer is **yes**. With a perplexity of **54.21** on WikiText-2, the model demonstrates surprising structural and narrative cohesion for its size.
## 📊 Model Details
- **Architecture:** TinyGPT (Custom `Llama`-style architecture with tweaks for small-scale efficiency).
- **Parameters:** ~41.5 Million.
- **Training Data:** 14 Billion Tokens (English).
- **Training Framework:** JAX / Flax (XLA).
- **Hardware:** Google Cloud TPU v5e-8.
- **Training Time:** ~6 Hours.
- **Precision:** FP32 (Exported to Safetensors).
## 📉 Benchmarks
| Benchmark | Metric | Result |
| :--- | :--- | :--- |
| **WikiText-2** | Perplexity (PPL) | **54.21** |
*Note: Evaluation was performed with a sliding-window approach. For a model of this size, a PPL this low indicates strong grammatical alignment despite the small parameter count.*
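A minimal sketch of the sliding-window perplexity computation (the model forward pass is replaced by a stand-in `token_nll` callable; the window and stride values are illustrative, not necessarily those used for the reported score):

```python
import math

def sliding_window_ppl(token_ids, token_nll, window=512, stride=256):
    """Perplexity over a long sequence using overlapping windows.

    token_nll(context, target) is a stand-in for a model forward pass:
    it returns the negative log-likelihood of `target` given `context`.
    Each token is scored exactly once, with up to `window - stride`
    tokens of re-read context carried over from the previous window.
    """
    total_nll, n_scored, prev_end = 0.0, 0, 0
    for begin in range(0, len(token_ids), stride):
        end = min(begin + window, len(token_ids))
        # Score only tokens not already counted in an earlier window;
        # the very first token has no context and is skipped.
        for i in range(max(prev_end, begin + 1), end):
            total_nll += token_nll(token_ids[begin:i], token_ids[i])
            n_scored += 1
        prev_end = end
        if end == len(token_ids):
            break
    return math.exp(total_nll / n_scored)
```

As a sanity check, a uniform stand-in that assigns every token probability 1/V makes the function return exactly V.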
## 🧪 Capabilities & Limitations
### What it CAN do:
* **Narrative Cohesion:** It can generate grammatically correct English sentences that flow logically.
* **High Saturation:** Trained on ~350 tokens per parameter, far beyond the Chinchilla-optimal budget of roughly 20 tokens per parameter, making it unusually data-saturated for its size.
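The saturation figure is plain arithmetic from the numbers in Model Details:

```python
# Tokens-per-parameter ratio from the Model Details section.
tokens = 14e9      # training tokens
params = 41.5e6    # model parameters
ratio = tokens / params
print(round(ratio))  # → 337, i.e. roughly 350 tokens per parameter
```

The commonly cited Chinchilla-optimal budget is on the order of 20 tokens per parameter, so this run overshoots it by more than 15x.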
### What it CANNOT do:
* **Reasoning/Logic:** It cannot solve math problems, code, or answer complex logical queries.
* **Instruction Following:** It has **NOT** undergone Supervised Fine-Tuning (SFT). It is a base completion model.
## Usage
This model requires `trust_remote_code=True`.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Madras1/MTLM2-40M"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# The model reuses the Mistral-7B-v0.1 tokenizer vocabulary.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
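To sanity-check the reported ~41.5M parameter count after loading, parameter tensors can be summed directly. The sketch below uses a small stand-in `nn.Linear`, since loading MTLM2-40M itself requires downloading the weights:

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    # Total number of elements across all parameter tensors.
    return sum(p.numel() for p in module.parameters())

# Stand-in module; pass the loaded MTLM2-40M model for the real check.
toy = nn.Linear(100, 50)  # 100*50 weights + 50 biases
print(count_params(toy))  # → 5050
```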
## Author
Developed by Madras1 (Gabriel).