Dataset: lmsys/lmsys-chat-1m
A powerful and lightweight GPT-style language model built with PyTorch, featuring word-level tokenization and GRU-based architecture.
```bash
pip install torch tqdm
```
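Place your `.txt` training files in the `training_corpora/` folder, then start training:

```bash
python AgGPT21.py
```

The model will automatically:

- Read the `.txt` files from `training_corpora/`
- Save the trained model to `AgGPT21.pt`

Once trained, start chatting with your model:

```bash
python chat.py
```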
```
AgGPT-21-2/
├── banner.png              # Project banner image
├── AgGPT21.py              # Main training script
├── chat.py                 # Interactive chat interface
├── README.md               # This file
├── AgGPT21.pt              # Trained model (generated after training)
└── training_corpora/       # Training data folder
    ├── corpora_000.txt     # Training file 1
    ├── corpora_001.txt     # Training file 2
    ├── ...                 # More training files
    └── corpora_041.txt     # Training file N
```
Model architecture:

| Parameter | Default | Description |
|---|---|---|
| `SEQ_LEN` | 64 | Sequence length for training |
| `EMBED_SIZE` | 128 | Embedding dimension |
| `HIDDEN_SIZE` | 128 | GRU hidden dimension |
| `NUM_LAYERS` | 1 | Number of GRU layers |
| `DROPOUT` | 0.2 | Dropout rate |
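How `SEQ_LEN` turns the token stream into training examples is not spelled out here. A minimal sketch, assuming non-overlapping windows of `SEQ_LEN` tokens whose targets are the inputs shifted by one position (`make_training_pairs` is a hypothetical helper, not a function from `AgGPT21.py`):

```python
from typing import List, Tuple

SEQ_LEN = 64  # default from the table above


def make_training_pairs(
    token_ids: List[int], seq_len: int = SEQ_LEN
) -> List[Tuple[List[int], List[int]]]:
    """Split a flat token stream into non-overlapping (input, target) windows.

    The target is the input shifted left by one token, so the model learns to
    predict token t+1 from tokens 0..t.
    """
    pairs = []
    # Each window needs seq_len inputs plus one extra token for the last target.
    for start in range(0, len(token_ids) - seq_len, seq_len):
        chunk = token_ids[start:start + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


print(make_training_pairs(list(range(10)), seq_len=4))
```

With ten tokens and a window of four, this yields two pairs: `[0, 1, 2, 3] → [1, 2, 3, 4]` and `[4, 5, 6, 7] → [5, 6, 7, 8]`. Overlapping (strided) windows are an equally common choice and would produce more examples per epoch.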
Training:

| Parameter | Default | Description |
|---|---|---|
| `BATCH_SIZE` | 8 | Training batch size |
| `EPOCHS` | 6 | Maximum number of training epochs |
| `LR` | 2e-3 | Learning rate |
| `WEIGHT_DECAY` | 1e-4 | Weight decay (L2 regularization) |
| `CLIP_NORM` | 1.0 | Maximum gradient norm for clipping |
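`CLIP_NORM` caps the size of each update step. The sketch below shows the usual global-norm rule in plain Python, the same idea behind PyTorch's `torch.nn.utils.clip_grad_norm_`; whether `AgGPT21.py` calls that function is an assumption, and `clip_by_global_norm` is a hypothetical name:

```python
import math
from typing import List

CLIP_NORM = 1.0  # default from the table above


def clip_by_global_norm(
    grads: List[List[float]], max_norm: float = CLIP_NORM
) -> List[List[float]]:
    """Rescale all gradient tensors together so their combined L2 norm is at most max_norm."""
    # One norm over every gradient value of every tensor (the "global" norm).
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total_norm <= max_norm:
        return [list(tensor) for tensor in grads]  # already small enough: unchanged copy
    scale = max_norm / total_norm
    # Same scale for every tensor, so the update direction is preserved.
    return [[g * scale for g in tensor] for tensor in grads]


print(clip_by_global_norm([[3.0], [4.0]]))  # global norm 5.0 is scaled down to 1.0
```

Scaling all tensors by one shared factor keeps the gradient direction intact and only shrinks its length, which is why it is preferred over clipping each value independently.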
Generation:

| Parameter | Default | Description |
|---|---|---|
| `TEMPERATURE` | 0.9 | Sampling temperature (0.1-2.0) |
| `TOP_K` | 50 | Top-k sampling limit |
| `TOP_P` | 0.9 | Nucleus (top-p) sampling threshold |
| `GENERATE_LENGTH` | 200 | Default generation length |
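The three sampling parameters are typically combined as shown below. This is a minimal sketch, assuming temperature is applied first, then top-k, then top-p; the order `chat.py` actually uses is not stated, and `sample_next_token` is a hypothetical function working on a plain `{token_id: logit}` dict rather than a tensor:

```python
import math
import random
from typing import Dict, Optional


def sample_next_token(
    logits: Dict[int, float],
    temperature: float = 0.9,
    top_k: int = 50,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> int:
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # 1. Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # 2. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 3. Softmax over the survivors (subtract the max for numerical stability).
    max_logit = ranked[0][1]
    exps = [(tok, math.exp(v - max_logit)) for tok, v in ranked]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]
    # 4. Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for tok, p in probs:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 5. Draw one token; random.choices normalizes the remaining weights itself.
    tokens = [tok for tok, _ in nucleus]
    weights = [p for _, p in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]


print(sample_next_token({0: 1.0, 1: 2.0, 2: 3.0}, rng=random.Random(0)))
```

Lower `TEMPERATURE` or smaller `TOP_K`/`TOP_P` values make replies more predictable; larger values make them more varied.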
In the interactive chat mode, you can use these commands:
- `quit` / `exit` / `bye`: End the conversation
- `help`: Show available commands
- `clear`: Clear the screen
- `model`: Display model information
- `temp X`: Set temperature (e.g., `temp 0.8`)
- `length X`: Set response length (e.g., `length 150`)

```bash
# Train the model (automatic multi-file loading)
python AgGPT21.py
```
Output:

```
Found 42 training files
Reading corpora_000.txt...
Reading corpora_001.txt...
...
Total words loaded: 2,847,392
Vocabulary size: 30,000
Tokens used: 1,000,000 | device=mps
Model params: 4,099,200
Train batches per epoch: 1,562 | Val batches: 79
Epochs: 100%|████████████| 6/6 [05:23<00:00, 53.92s/it, train=2.1847, val=2.3456]
Saved AgGPT21.pt
```
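The multi-file loading shown in the log above can be approximated as follows. This is a sketch under assumptions: files are read in sorted name order and split on whitespace into words; the real tokenizer in `AgGPT21.py` may differ, and `load_corpora` is a hypothetical name:

```python
from pathlib import Path
from typing import List


def load_corpora(folder: str = "training_corpora") -> List[str]:
    """Read every .txt file in `folder` (sorted by name) and return all words in order."""
    files = sorted(Path(folder).glob("*.txt"))
    if not files:
        # Matches the "No .txt files found" situation described under troubleshooting.
        raise FileNotFoundError(f"No .txt files found in {folder}/")
    print(f"Found {len(files)} training files")
    words: List[str] = []
    for path in files:
        print(f"Reading {path.name}...")
        # Word-level tokenization (assumed): split on any whitespace.
        words.extend(path.read_text(encoding="utf-8").split())
    print(f"Total words loaded: {len(words):,}")
    return words
```

Sorting the file list makes the corpus order deterministic, so `corpora_000.txt` is always read before `corpora_001.txt` regardless of the filesystem's listing order.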
Example chat:

👤 You: Tell me about artificial intelligence

🤖 AgGPT-21: Artificial intelligence is a fascinating field that focuses on creating systems capable of performing tasks that typically require human intelligence. These systems can learn from data, recognize patterns, make decisions, and solve complex problems. AI has applications in many areas including natural language processing, computer vision, robotics, and machine learning...
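The chat commands listed earlier can be dispatched with a small parser. A minimal sketch, assuming commands are matched case-insensitively and anything else is treated as a chat message; `handle_command` and the `settings` dict are hypothetical, and `clear`/`model` are left out for brevity:

```python
from typing import Dict, Optional, Tuple


def handle_command(line: str, settings: Dict[str, float]) -> Tuple[bool, Optional[str]]:
    """Return (keep_running, reply); reply is None when the line is a chat message."""
    parts = line.strip().lower().split()
    if not parts:
        return True, None
    if len(parts) == 1 and parts[0] in {"quit", "exit", "bye"}:
        return False, "Goodbye!"
    if len(parts) == 1 and parts[0] == "help":
        return True, "Commands: quit/exit/bye, help, temp X, length X"
    if len(parts) == 2 and parts[0] in {"temp", "length"}:
        try:
            value = float(parts[1])
        except ValueError:
            return True, f"Invalid number: {parts[1]}"
        if parts[0] == "temp":
            if not 0.1 <= value <= 2.0:  # range listed for TEMPERATURE above
                return True, "Temperature must be between 0.1 and 2.0"
            settings["temperature"] = value
            return True, f"Temperature set to {value}"
        settings["length"] = int(value)
        return True, f"Response length set to {int(value)}"
    return True, None  # not a command: pass the text to the model


settings = {"temperature": 0.9, "length": 200}
print(handle_command("temp 0.8", settings))
```

Returning `None` for ordinary input keeps the chat loop simple: it only calls the model when the parser did not recognize a command.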
```python
MAX_VOCAB = 50000        # Increase vocabulary size
DATA_PERCENT = 0.5       # Use only 50% of available data
MAX_TOKENS = 500_000     # Limit to 500k tokens
```
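To see how these three settings might interact, here is a sketch of a word-level vocabulary builder. The assumptions are ours: data is trimmed to `DATA_PERCENT` first and then capped at `MAX_TOKENS`, the most frequent words fill the vocabulary, and out-of-vocabulary words map to a hypothetical `<unk>` token (only `<pad>` and `<eos>` appear in this README). `build_vocab_and_encode` is not a function from `AgGPT21.py`:

```python
from collections import Counter
from typing import Dict, List, Tuple

MAX_VOCAB = 50000       # vocabulary cap (the example log above shows 30,000)
DATA_PERCENT = 0.5      # fraction of the loaded words to keep
MAX_TOKENS = 500_000    # hard cap on training tokens
SPECIALS = ["<pad>", "<eos>", "<unk>"]  # <unk> is a hypothetical addition


def build_vocab_and_encode(
    words: List[str],
    max_vocab: int = MAX_VOCAB,
    data_percent: float = DATA_PERCENT,
    max_tokens: int = MAX_TOKENS,
) -> Tuple[Dict[str, int], List[int]]:
    # Keep the first data_percent of the corpus, then apply the token cap.
    keep = int(len(words) * data_percent)
    words = words[:min(keep, max_tokens)]
    # Count ordinary words only; special tokens always get the first ids.
    counts = Counter(w for w in words if w not in SPECIALS)
    frequent = [w for w, _ in counts.most_common(max_vocab - len(SPECIALS))]
    vocab = {tok: i for i, tok in enumerate(SPECIALS + frequent)}
    unk = vocab["<unk>"]
    ids = [vocab.get(w, unk) for w in words]
    return vocab, ids
```

Raising `MAX_VOCAB` lets rarer words keep their own ids instead of collapsing into the unknown token, at the cost of larger embedding and output layers.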
```python
# The model automatically detects and uses available accelerators:
# - CUDA (NVIDIA GPUs)
# - MPS (Apple Silicon)
# - CPU (fallback)
```
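The CUDA → MPS → CPU fallback order described above can be written with standard PyTorch availability checks. A minimal sketch; `pick_device` is a hypothetical helper, not necessarily how `AgGPT21.py` implements it:

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple-Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps only exists in PyTorch 1.12 and later, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
print(f"device={device}")
```

Move both the model and each batch to the chosen device (`model.to(device)`, `batch.to(device)`) so every tensor in a forward pass lives on the same hardware.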
```
Input → Embedding → Dropout → GRU → Dropout → [Projection] → Linear → Output
  ↓         ↓                  ↓                                        ↓
Token    Vector             Hidden                                   Logits
 IDs    (128-dim)           States                                (Vocab-size)
```
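The diagram maps onto a small PyTorch module. This is a sketch under assumptions: the bracketed `[Projection]` step is treated as optional and modeled as a hidden-to-embedding `nn.Linear`, and `GRULanguageModel` is a hypothetical class name rather than the one in `AgGPT21.py`:

```python
import torch
import torch.nn as nn


class GRULanguageModel(nn.Module):
    """Embedding -> Dropout -> GRU -> Dropout -> optional projection -> Linear over the vocabulary."""

    def __init__(self, vocab_size: int, embed_size: int = 128, hidden_size: int = 128,
                 num_layers: int = 1, dropout: float = 0.2, use_projection: bool = True):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.dropout = nn.Dropout(dropout)
        # nn.GRU's own dropout only acts between stacked layers, so disable it for one layer.
        self.gru = nn.GRU(embed_size, hidden_size, num_layers=num_layers,
                          batch_first=True, dropout=dropout if num_layers > 1 else 0.0)
        # Assumed form of "[Projection]": map hidden states back to the embedding size.
        self.projection = nn.Linear(hidden_size, embed_size) if use_projection else nn.Identity()
        self.output = nn.Linear(embed_size if use_projection else hidden_size, vocab_size)

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor = None):
        x = self.dropout(self.embedding(token_ids))   # (batch, seq, embed)
        out, hidden = self.gru(x, hidden)              # (batch, seq, hidden)
        out = self.projection(self.dropout(out))       # (batch, seq, embed)
        return self.output(out), hidden                # logits: (batch, seq, vocab)


model = GRULanguageModel(vocab_size=1000)
logits, h = model(torch.randint(0, 1000, (8, 64)))  # BATCH_SIZE=8, SEQ_LEN=64
print(logits.shape, h.shape)
```

For training, the logits would usually be flattened to `(batch * seq, vocab)` and passed to `nn.CrossEntropyLoss` together with the shifted target ids.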
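Key Features:

Troubleshooting:

- "No .txt files found": Make sure your training files are in `training_corpora/` with a `.txt` extension.
- "CUDA out of memory": Reduce `BATCH_SIZE` or `SEQ_LEN`, or set `DATA_PERCENT < 1.0` to train on less data.
- "Model file not found": Train the model first with `python AgGPT21.py`, and make sure `AgGPT21.pt` exists in the project directory.

Your training files should be plain text. The model will automatically:

- Handle special tokens such as `<pad>`, `<eos>`, etc.

Example format: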
```
user: how are you today
<pad>
ai: I'm doing well, thank you for asking! How are you?
<eos>
```
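Before training, it can help to check that a corpus file follows the example format. A minimal validation sketch, assuming each exchange is `user: ...`, then `<pad>`, then `ai: ...`, then `<eos>`; the training script may simply treat `<pad>` and `<eos>` as ordinary vocabulary tokens rather than parsing turns, and `parse_dialogues` is a hypothetical helper:

```python
from typing import List, Tuple


def _strip_role(text: str, role: str) -> str:
    """Remove a leading 'user:' / 'ai:' label if present."""
    text = text.strip()
    return text[len(role):].strip() if text.startswith(role) else text


def parse_dialogues(text: str) -> List[Tuple[str, str]]:
    """Split text in the example format into (user, ai) pairs."""
    pairs = []
    for block in text.split("<eos>"):
        if "<pad>" not in block:
            continue  # trailing whitespace or a malformed exchange
        user_part, ai_part = block.split("<pad>", 1)
        pairs.append((_strip_role(user_part, "user:"), _strip_role(ai_part, "ai:")))
    return pairs


sample = "user: how are you today\n<pad>\nai: I'm doing well, thank you for asking! How are you?\n<eos>\n"
print(parse_dialogues(sample))
```

Counting the pairs per file (and comparing with the number of `<eos>` markers) is a quick way to spot exchanges that are missing a `<pad>` separator.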
This project is open source. Feel free to use, modify, and distribute as needed.
If you encounter issues or have questions:
Made with ❤️ for the AI community
AgGPT-21 - Where conversation meets intelligence.
Base model: AGofficial/AgGPT-15