# 📌 Teller-v2 Usage Guide

Teller-v2 is a lightweight experimental language model for text generation and small-scale AI research. This repository contains everything needed to train, fine-tune, and run the model locally.

## 🔧 Requirements

- Python 3.10–3.11 (recommended)
- PyTorch (CPU or CUDA build)
- Transformers
- tqdm

Install the core dependencies:

```shell
pip install torch transformers tqdm
```

## 📂 Project Structure

```
teller_v2p/
├─ data/
│  └─ dataset.txt
├─ trainteller.py
├─ generateteller.py
└─ bytetokenizer.py
```

## ✏️ Dataset Format

Place your training text in `./data/dataset.txt`. The file should contain plain text; the model learns directly from it. Each record must follow this PROMPT/OUTPUT layout:

```
PROMPT: blabla
OUTPUT: blabla
...
```

## 🚀 Training the Model

Run the training script:

```shell
python trainteller.py --epochs <number of epochs> --batch-size <batch size> ...
```

This will:

- load and tokenize the dataset
- initialize the model
- train for the configured number of steps
- save the weights (e.g., `model.pt`)

Other training parameters (such as the learning rate) can be modified inside `trainteller.py` if needed.

## 💬 Generating Text

After training, run:

```shell
python generateteller.py "your prompt here"
```

Example:

```shell
python generateteller.py "Hello AI,"
```

The script loads the saved model weights and produces a continuation of the prompt.

## 🔁 Tokenizer Notes

The tokenizer is a simple byte-level tokenizer located in `bytetokenizer.py`. It maps raw bytes to model-usable tokens, so it supports any language or symbol.

## 📌 Tips & Customization

- Replace `dataset.txt` with your own training corpus to create a new model.
- Edit the hyperparameters inside `trainteller.py` to adapt training quality and performance.
- If training stalls or produces low-quality results, try reducing the learning rate or training for longer.

## ⚠️ Disclaimer

Teller-v2 is an experimental project. It is not intended for production inference without further optimization.

## ⭐ Credits

Created by Arthur.
Feel free to extend, redistribute, and improve it as long as credits are preserved.
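As a rough illustration, the byte-level tokenizer described in the Tokenizer Notes can be sketched like this. The class and method names here are assumptions for illustration only; see `bytetokenizer.py` for the actual implementation.

```python
class ByteTokenizer:
    """Hypothetical sketch: maps raw UTF-8 bytes to token ids 0-255 and back."""

    vocab_size = 256  # one token per possible byte value

    def encode(self, text: str) -> list[int]:
        # Each UTF-8 byte becomes one token id, so any language or symbol works.
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        # "replace" guards against invalid byte sequences mid-generation.
        return bytes(ids).decode("utf-8", errors="replace")
```

Because the vocabulary is just the 256 byte values, encode/decode round-trips losslessly for any valid UTF-8 text.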
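One plausible way a training script could read the PROMPT/OUTPUT records from `dataset.txt` is sketched below. The function name `load_pairs` and the exact splitting logic are assumptions, not necessarily what `trainteller.py` does.

```python
import re


def load_pairs(text: str) -> list[tuple[str, str]]:
    """Parse 'PROMPT: ... OUTPUT: ...' records into (prompt, output) pairs."""
    pairs = []
    # Split the corpus at each line that starts with 'PROMPT:'; drop the
    # leading chunk before the first record.
    for block in re.split(r"(?m)^PROMPT:", text)[1:]:
        if "OUTPUT:" in block:
            prompt, output = block.split("OUTPUT:", 1)
            pairs.append((prompt.strip(), output.strip()))
    return pairs
```

Malformed records without an `OUTPUT:` marker are simply skipped, which keeps a hand-edited dataset from crashing training.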
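The inference flow in `generateteller.py` can be approximated framework-free as the loop below. Here `model` stands in for the trained network (in the real script it would be loaded from `model.pt` with PyTorch); any callable that maps the token ids so far to the next byte id fits.

```python
def generate(model, encode, decode, prompt: str, max_new_tokens: int = 20) -> str:
    """Hedged sketch of autoregressive generation: append one predicted
    byte id per step, then decode the full sequence back to text."""
    ids = list(encode(prompt))
    for _ in range(max_new_tokens):
        next_id = model(ids)  # predict the next byte id (0-255)
        ids.append(next_id)
    return decode(ids)
```

For example, a stub model that always predicts byte 33 (`!`) would turn the prompt `"Hi"` into `"Hi!!!"` after three steps.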