# 📌 Teller-v2 Usage Guide

Teller-v2 is a lightweight experimental language model for text generation and small-scale AI research. This repository contains everything needed to train, fine-tune, and run the model locally.

## 🔧 Requirements

- Python 3.10–3.11 (recommended)
- PyTorch (CPU or CUDA build)
- Transformers
- tqdm

Install the core dependencies:

```shell
pip install torch transformers tqdm
```

## 📂 Project Structure

```
teller_v2p/
├─ data/
│  └─ dataset.txt
├─ trainteller.py
├─ generateteller.py
└─ bytetokenizer.py
```

## ✏️ Dataset Format

Place your training text in `./data/dataset.txt`. The file should contain plain text; the model learns directly from it. Each record must follow this PROMPT/OUTPUT layout:

```
PROMPT: blabla
OUTPUT: blabla
...
```

## 🚀 Training the Model

Run the training script:

```shell
python trainteller.py --epochs <number of epochs> --batch-size <batch size> ...
```

This will:

- load and tokenize the dataset
- initialize the model
- train for the configured number of steps
- save the weights (e.g., `model.pt`)

Other training parameters (such as the learning rate) can be modified inside `trainteller.py` if needed.

## 💬 Generating Text

After training, run:

```shell
python generateteller.py "your prompt here"
```

Example:

```shell
python generateteller.py "Hello AI,"
```

The script loads the saved model weights and produces a continuation of the prompt.

## 🔁 Tokenizer Notes

The tokenizer is a simple byte-level tokenizer located in `bytetokenizer.py`. It maps raw bytes to model-usable tokens, so it supports any language or symbol.

## 📌 Tips & Customization

- Replace `dataset.txt` with your own training corpus to create a new model.
- Edit the hyperparameters inside `trainteller.py` to adapt training quality and performance.
- If training stalls or produces low-quality results, try reducing the learning rate or training for longer.

## ⚠️ Disclaimer

Teller-v2 is an experimental project. It is not intended for production inference without further optimization.

## ⭐ Credits

Created by Arthur.
Feel free to extend, redistribute, and improve it as long as credits are preserved.
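As a rough illustration, the byte-level tokenizer described in the Tokenizer Notes can be sketched like this. The class and method names here are assumptions for illustration only; see `bytetokenizer.py` for the actual implementation.

```python
class ByteTokenizer:
    """Hypothetical sketch: maps raw UTF-8 bytes to token ids 0-255 and back."""

    vocab_size = 256  # one token per possible byte value

    def encode(self, text: str) -> list[int]:
        # Each UTF-8 byte becomes one token id, so any language or symbol works.
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        # "replace" guards against invalid byte sequences mid-generation.
        return bytes(ids).decode("utf-8", errors="replace")
```

Because the vocabulary is just the 256 byte values, encode/decode round-trips losslessly for any valid UTF-8 text.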
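One plausible way a training script could read the PROMPT/OUTPUT records from `dataset.txt` is sketched below. The function name `load_pairs` and the exact splitting logic are assumptions, not necessarily what `trainteller.py` does.

```python
import re


def load_pairs(text: str) -> list[tuple[str, str]]:
    """Parse 'PROMPT: ... OUTPUT: ...' records into (prompt, output) pairs."""
    pairs = []
    # Split the corpus at each line that starts with 'PROMPT:'; drop the
    # leading chunk before the first record.
    for block in re.split(r"(?m)^PROMPT:", text)[1:]:
        if "OUTPUT:" in block:
            prompt, output = block.split("OUTPUT:", 1)
            pairs.append((prompt.strip(), output.strip()))
    return pairs
```

Malformed records without an `OUTPUT:` marker are simply skipped, which keeps a hand-edited dataset from crashing training.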
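The inference flow in `generateteller.py` can be approximated framework-free as the loop below. Here `model` stands in for the trained network (in the real script it would be loaded from `model.pt` with PyTorch); any callable that maps the token ids so far to the next byte id fits.

```python
def generate(model, encode, decode, prompt: str, max_new_tokens: int = 20) -> str:
    """Hedged sketch of autoregressive generation: append one predicted
    byte id per step, then decode the full sequence back to text."""
    ids = list(encode(prompt))
    for _ in range(max_new_tokens):
        next_id = model(ids)  # predict the next byte id (0-255)
        ids.append(next_id)
    return decode(ids)
```

For example, a stub model that always predicts byte 33 (`!`) would turn the prompt `"Hi"` into `"Hi!!!"` after three steps.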