---
license: mit
language:
- en
---
|
|
|
|
|
<img src="banner.png" alt="AgGPT Banner" width="100%"> |
|
|
|
|
|
# AgGPT-13 Mini (BETA)

## Light. Mini. Smart.
|
|
|
|
|
AgGPT-13 mini is a very lightweight beta release in the AgGPT-13 model series.
|
|
|
|
|
AgGPT-13 mini was trained from scratch on a small dataset of chat conversations using the GPT architecture. It is designed to be a fast, efficient, and easy-to-use model for various NLP tasks. Above all, it is a proof of concept for the AgGPT-13 series, demonstrating the potential of lightweight models in real-world applications: the model was trained in minutes on a MacBook Air M2 with 8 GB of RAM, showcasing both the efficiency of the training process and the headroom for more powerful models in the future.
|
|
|
|
|
This release serves as a foundation for the full AgGPT-13 model release.
|
|
|
|
|
The full model will include more advanced features, larger datasets, and improved performance. |
|
|
|
|
|
## Features |
|
|
|
|
|
- **Very Lightweight** – Optimized for lower memory usage. |
|
|
- **Flexible** – Works on CPU or GPU. |
|
|
- **Easy to Use** – Straightforward code. |
|
|
|
|
|
## Installation & Usage |
|
|
|
|
|
```bash |
|
|
pip install torch transformers safetensors |
|
|
``` |
|
|
|
|
|
To train the model:
|
|
|
|
|
```bash |
|
|
python train.py config/train_aggpt_char.py --device=cpu --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0 |
|
|
``` |
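To get a feel for how small this configuration is, the flags above can be turned into a rough parameter-count estimate. The sketch below assumes a standard GPT block (attention weights of about `4 * n_embd^2` plus an MLP of about `8 * n_embd^2` per layer) and a char-level vocabulary of 65 symbols; that vocabulary size is an assumption for illustration, not the actual AgGPT value.

```python
# Rough parameter-count estimate for the training config above
# (--n_layer=4, --n_embd=128, --block_size=64).
# vocab_size=65 is an assumed char-level vocabulary, not the real one.
n_layer, n_embd, block_size, vocab_size = 4, 128, 64, 65

per_layer = 12 * n_embd ** 2                      # attention + MLP weights per block
transformer = n_layer * per_layer                 # all transformer blocks
embeddings = (vocab_size + block_size) * n_embd   # token + positional embeddings
total = transformer + embeddings

print(f"~{total / 1e6:.2f}M parameters")
```

Under these assumptions the model lands well under a million parameters, which is consistent with training completing in minutes on a laptop CPU.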
|
|
|
|
|
To generate samples from the trained model:
|
|
|
|
|
```bash |
|
|
python sample.py --out_dir=out-aggpt --device=cpu |
|
|
``` |
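Since the training config is character-level (`train_aggpt_char`), text is encoded one character at a time. The snippet below is a minimal sketch of that encode/decode step; the tiny vocabulary here is illustrative only, as the real one is built from the training corpus.

```python
# Illustrative character-level tokenizer, the scheme implied by the
# char-level training config above. The vocabulary is built from a toy
# string here; the actual model derives it from its training data.
text = "hello aggpt"
chars = sorted(set(text))                    # unique characters, sorted
stoi = {ch: i for i, ch in enumerate(chars)} # char -> integer id
itos = {i: ch for ch, i in stoi.items()}     # integer id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
print(ids, decode(ids))
```

Round-tripping a string through `encode` and `decode` returns it unchanged, which is the only invariant the sampler relies on.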