TellerAIv2-0.003B-Raw / HowToUse.txt
Arthur Samuel Galego Panucci Figueiredo
📌 Teller-v2 — Usage Guide
Teller-v2 is a lightweight experimental language model designed for text generation and small-scale AI research.
This repository contains everything needed to train, finetune, and run inference locally.
🔧 Requirements
• Python 3.10–3.11 recommended
• PyTorch (CPU or CUDA build)
• Transformers
• tqdm
Install core dependencies:
pip install torch transformers tqdm
📂 Project Structure
teller_v2p/
├─ data/
│  └─ dataset.txt
├─ trainteller.py
├─ generateteller.py
└─ bytetokenizer.py
✏️ Dataset Format
Place your training text inside:
./data/dataset.txt
This file should contain plain text.
The model learns directly from this dataset.
The dataset should consist of alternating PROMPT:/OUTPUT: pairs, like this:
PROMPT: your prompt text
OUTPUT: the expected response
...
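Reading these pairs back out of dataset.txt is straightforward. The helper below is a sketch, not part of this repo: the function name and the pairing logic (each PROMPT: line matched with the next OUTPUT: line) are assumptions based on the format shown above.

```python
# Hypothetical parser for the PROMPT:/OUTPUT: dataset format described above.
# Not part of the repo -- the actual training script may read the file differently.

def load_pairs(path):
    """Read a dataset file and return a list of (prompt, output) tuples."""
    pairs = []
    prompt = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line.startswith("PROMPT:"):
                prompt = line[len("PROMPT:"):].strip()
            elif line.startswith("OUTPUT:") and prompt is not None:
                pairs.append((prompt, line[len("OUTPUT:"):].strip()))
                prompt = None  # wait for the next PROMPT: line
    return pairs
```

A prompt with no following OUTPUT: line is silently dropped here; you may prefer to raise an error instead.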
🚀 Training the Model
Run the training script:
python trainteller.py --epochs <number of epochs> --batch_size <batch size> ...
This will:
• load and tokenize the dataset
• initialize the model
• train for the configured number of steps
• save the weights (e.g., model.pt)
Modify training parameters inside trainteller.py if needed
(such as learning rate, batch size, or number of steps).
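To make the steps above concrete, here is a minimal sketch of a byte-level next-token training loop. The toy model, hyperparameters, and inline dataset are illustrative assumptions, not the actual contents of trainteller.py.

```python
# Minimal sketch of byte-level next-token training (NOT the real trainteller.py:
# the architecture and hyperparameters here are placeholders).
import torch
import torch.nn as nn

text = "PROMPT: hi\nOUTPUT: hello\n" * 50          # stand-in for data/dataset.txt
data = torch.tensor(list(text.encode("utf-8")), dtype=torch.long)

block_size = 16
model = nn.Sequential(                              # toy next-byte predictor
    nn.Embedding(256, 64),                          # one id per possible byte
    nn.Flatten(),
    nn.Linear(64 * block_size, 256),                # scores for the next byte
)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(100):
    ix = torch.randint(0, len(data) - block_size - 1, (32,))
    x = torch.stack([data[i:i + block_size] for i in ix])   # context windows
    y = data[ix + block_size]                               # next-byte targets
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

torch.save(model.state_dict(), "model.pt")          # matches the guide's model.pt
```

The real script presumably uses a more capable architecture; the point is the loop shape: sample context windows, predict the next byte, backpropagate, save weights.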
💬 Generating Text
After training, run:
python generateteller.py "your prompt here"
Example:
python generateteller.py "Hello AI,"
The script will load model weights and produce continuation text.
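The continuation step is autoregressive: generate one byte, append it to the context, and repeat. The sketch below shows that idea; the function name, left-padding, and greedy argmax decoding are assumptions, since generateteller.py itself is not reproduced here.

```python
# Sketch of the autoregressive loop a script like generateteller.py performs.
# Hypothetical helper -- the real script's decoding strategy may differ.
import torch

def generate(model, prompt, max_new_bytes=20, block_size=16):
    """Greedily extend a prompt one byte at a time."""
    ids = list(prompt.encode("utf-8"))
    for _ in range(max_new_bytes):
        ctx = ids[-block_size:]
        ctx = [0] * (block_size - len(ctx)) + ctx      # left-pad short contexts
        logits = model(torch.tensor([ctx]))            # (1, 256) next-byte scores
        ids.append(int(logits.argmax(dim=-1)))         # greedy pick
    return bytes(ids).decode("utf-8", errors="replace")
```

Swapping the argmax for sampling from a softmax over the logits would give more varied output.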
πŸ” Tokenizer Notes
The tokenizer used is a simple byte-level tokenizer located in:
bytetokenizer.py
It maps raw bytes directly to token ids, so any language or symbol is supported.
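In its simplest form, a byte-level tokenizer just treats each UTF-8 byte as a token id in the range 0–255. This is a sketch of that idea, not the actual code in bytetokenizer.py:

```python
# Minimal byte-level tokenizer sketch (illustrative, not bytetokenizer.py itself).

def encode(text):
    """Text -> list of byte ids; the vocabulary is fixed at 256 entries."""
    return list(text.encode("utf-8"))

def decode(ids):
    """Byte ids -> text; invalid byte sequences are replaced rather than raising."""
    return bytes(ids).decode("utf-8", errors="replace")
```

Note that non-ASCII characters span multiple bytes, so one visible character may cost several tokens.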
📌 Tips & Customization
• Replace dataset.txt with your own training corpus to create a new model.
• Edit the hyperparameters inside trainteller.py to adapt training quality/performance.
• If training stalls or produces low-quality results, try lowering the learning rate or training for longer.
⚠️ Disclaimer
Teller-v2 is an experimental project.
It is not intended for production inference without further optimization.
⭐ Credits
Created by Arthur.
Feel free to extend, redistribute, and improve as long as credits are preserved.