# EpsteinGPT - Minimal GPT Model

This repository contains a minimal GPT (MVT) model trained on the Epstein email threads dataset.

## Model Details

`MinimalGPT` is a custom causal (decoder-only) Transformer inspired by the nanoGPT and minGPT architectures. It was trained from scratch with a custom Byte-Pair Encoding (BPE) tokenizer.

### Configuration (`config.json`)

```json
{
  "vocab_size": 5000,
  "block_size": 256,
  "n_layer": 8,
  "n_head": 8,
  "n_embd": 512,
  "batch_size": 16,
  "dropout": 0.1,
  "bias": false
}
```

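
To make these values more concrete, the sketch below parses a configuration like the one above and derives two quantities: the per-head dimension and a rough weight count. `SketchConfig`, `config_from_json`, and `approx_param_count` are hypothetical names for illustration; the real configuration class is `MVTConfig` in `model.py`, and its interface may differ. The weight count assumes GPT-2-style blocks (four attention projections, an MLP with 4x expansion), no biases, a learned position embedding, and an output head tied to the token embedding; layer-norm weights are ignored. None of this is confirmed by the repository.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    """Hypothetical mirror of config.json (the real class is MVTConfig)."""
    vocab_size: int = 5000
    block_size: int = 256
    n_layer: int = 8
    n_head: int = 8
    n_embd: int = 512
    batch_size: int = 16
    dropout: float = 0.1
    bias: bool = False

    @property
    def head_size(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        assert self.n_embd % self.n_head == 0, "n_embd must be divisible by n_head"
        return self.n_embd // self.n_head


def config_from_json(text: str) -> SketchConfig:
    """Parse a JSON string, ignoring keys the dataclass does not know."""
    raw = json.loads(text)
    known = {f.name for f in fields(SketchConfig)}
    return SketchConfig(**{k: v for k, v in raw.items() if k in known})


def approx_param_count(cfg: SketchConfig) -> int:
    """Rough weight count under the assumptions stated in the text."""
    embeddings = cfg.vocab_size * cfg.n_embd + cfg.block_size * cfg.n_embd
    attention = 4 * cfg.n_embd * cfg.n_embd  # Q, K, V and output projections
    mlp = 8 * cfg.n_embd * cfg.n_embd        # two linear layers, 4x expansion
    return embeddings + cfg.n_layer * (attention + mlp)
```

With the values listed in `config.json`, `head_size` is 64 and `approx_param_count` returns 27,856,896, so the model is on the order of 28 million weights if those assumptions hold.
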
## Files Included

* `epsteingpt_tokenizer.json`: The custom BPE tokenizer used to encode and decode text.
* `EpsteinGPT.pt`: The PyTorch checkpoint containing the trained model weights.
* `EpsteinGPT.ptl`: A TorchScript Lite export of the trained model, intended for deployment.
* `model.py`: Defines the `MVTConfig` class and the `MinimalGPT` model architecture.
* `config.json`: The model configuration in JSON format.
* `README.md`: This file.

## How to Use

To use this model, you would typically:

1. Load the tokenizer:
   ```python
   from tokenizers import Tokenizer

   # Load the trained BPE tokenizer from its JSON file.
   tokenizer = Tokenizer.from_file("epsteingpt_tokenizer.json")
   ```
2. Build the model from the architecture in `model.py` and the configuration in `config.json`.
3. Load the trained weights from `EpsteinGPT.pt` into the model.
4. Run the model for text generation or other tasks.

For generation, refer to the `generate.py` script used during development.

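
Since `generate.py` is not reproduced here, the following is only a framework-agnostic sketch of what step 4 usually looks like for a causal model: crop the context to `block_size`, obtain logits for the next token, sample one token, append it, and repeat. `next_token_logits` is a hypothetical callable standing in for a forward pass of `MinimalGPT` that returns the logits of the last position; the function names and the temperature handling are assumptions, not the author's implementation.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def sample_next(logits: Sequence[float], temperature: float = 1.0,
                rng: Optional[random.Random] = None) -> int:
    """Pick a token id from raw logits (greedy when temperature <= 0)."""
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(next_token_logits: Callable[[List[int]], Sequence[float]],
             prompt_ids: Sequence[int], max_new_tokens: int,
             block_size: int = 256, temperature: float = 1.0,
             rng: Optional[random.Random] = None) -> List[int]:
    """Autoregressively extend prompt_ids by max_new_tokens token ids."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # the model only sees block_size tokens
        logits = next_token_logits(context)
        ids.append(sample_next(logits, temperature, rng))
    return ids
```

With the real model, the prompt ids would come from `tokenizer.encode(prompt).ids` and the result would be turned back into text with `tokenizer.decode(ids)`.
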
## Training

The model was trained on a dataset of Epstein email threads. The training process involved three steps:

1. **Tokenizer Training:** A BPE tokenizer was trained on the raw text data.
2. **Data Preparation:** The text data was tokenized and converted into sequences of integer token IDs.
3. **Model Training:** The `MinimalGPT` model was trained using a custom training loop.

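
The data preparation and training steps above do not spell out how batches are formed. A common approach for causal language models, shown here as a sketch and not necessarily what `train.py` does, is to cut random windows of `block_size` tokens from the token ID stream and use the same window shifted by one position as the target. The helper names `make_example` and `sample_batch` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def make_example(ids: Sequence[int], start: int,
                 block_size: int) -> Tuple[List[int], List[int]]:
    """Return (input, target) where target is input shifted by one token."""
    x = list(ids[start:start + block_size])
    y = list(ids[start + 1:start + block_size + 1])
    return x, y


def sample_batch(ids: Sequence[int], block_size: int, batch_size: int,
                 rng: random.Random) -> Tuple[List[List[int]], List[List[int]]]:
    """Draw batch_size random windows from the token stream."""
    if len(ids) < block_size + 1:
        raise ValueError("need at least block_size + 1 tokens")
    max_start = len(ids) - block_size - 1  # last start with a full target
    xs, ys = [], []
    for _ in range(batch_size):
        x, y = make_example(ids, rng.randint(0, max_start), block_size)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

With the configuration in `config.json`, this would be called with `block_size=256` and `batch_size=16`; the resulting nested lists would then be converted to tensors before the forward pass.
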
## Further Information

For more details on the model architecture and training process, see the `model.py` and `train.py` scripts.