Upload Tiny Epstein 100M model
- README.md +109 -0
- config.json +15 -0
- pytorch_model.bin +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +12 -0
README.md
ADDED
@@ -0,0 +1,109 @@
---
language: en
license: mit
tags:
- tiny-epstein
- epstein-files
- transformers
---

# tiny-epstein-100m

A small transformer model (~100M parameters) trained on the [teyler/epstein-files-20k](https://huggingface.co/datasets/teyler/epstein-files-20k) dataset.
The architecture is inspired by **Tiny Aya** modifications and is designed for efficient on-device inference.

## Model Details

- **Architecture**: Decoder-only transformer with parallel blocks, Grouped Query Attention (GQA), SwiGLU activation, and bias-free LayerNorm.
- **Sliding Window Attention**: 3:1 local:global ratio (the first 75% of layers use sliding-window attention with RoPE; the remaining layers use full attention with NoPE).
- **Parameters**: ~100 million
- **Context Length**: 1024 tokens (configurable)
- **Tokenizer**: GPT-2 (same as used during training)
- **Training Data**: [teyler/epstein-files-20k](https://huggingface.co/datasets/teyler/epstein-files-20k) – 20,000 documents related to the Epstein files.
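
The figures above can be sanity-checked from the config values. A rough parameter-count estimate, assuming tied input/output embeddings and a standard GQA + SwiGLU layer layout (an approximation, not the exact TinyAya implementation):

```python
# Rough parameter-count estimate from the config values above.
# Assumes tied embeddings; LayerNorms are negligible (bias-free).
vocab_size, emb_dim, hidden_dim = 50257, 768, 2048
num_layers, num_heads, num_kv_heads = 12, 12, 4

head_dim = emb_dim // num_heads   # 64
kv_dim = num_kv_heads * head_dim  # 256 (GQA: 4 KV heads shared by 12 query heads)

embedding = vocab_size * emb_dim                                      # tied with the LM head
attn = emb_dim * emb_dim + 2 * emb_dim * kv_dim + emb_dim * emb_dim   # Q, K/V, O projections
mlp = 2 * emb_dim * hidden_dim + hidden_dim * emb_dim                 # SwiGLU: gate + up, down
per_layer = attn + mlp

total = embedding + num_layers * per_layer
print(f"~{total / 1e6:.0f}M parameters")  # prints "~114M parameters"
```

The estimate lands in the same ballpark as the advertised ~100M; the exact count depends on implementation details not spelled out here.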

## Intended Use

This model is primarily for research and experimentation. It can generate continuations of a prompt, especially on topics related to the Epstein files.

## How to Use

### Installation

Make sure you have `torch` and `transformers` installed:

```bash
pip install torch transformers
```

### Loading the Model and Tokenizer

```python
import os

import torch
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download

# Download the model files from the Hugging Face Hub
model_path = snapshot_download(repo_id="liminerity/tiny-epstein-100m")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# The model uses a custom architecture, so transformers cannot load it
# directly: copy the TinyAya class definition from the training script
# before running the rest of this snippet.

# Model config (must match the saved config.json)
class ModelConfig:
    vocab_size = 50257
    emb_dim = 768
    hidden_dim = 2048
    num_layers = 12
    num_heads = 12
    num_kv_heads = 4
    max_seq_len = 1024
    window_size = 1024
    sliding_window_ratio = 0.75
    rope_theta = 10000.0
    dtype = torch.float16
    bias = False
    dropout = 0.0

model = TinyAya(ModelConfig())
state_dict = torch.load(os.path.join(model_path, "pytorch_model.bin"), map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```

### Text Generation Example

```python
prompt = "The Epstein files reveal"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.8,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
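
The `model.generate(...)` call above assumes the custom TinyAya class implements a Hugging Face-style `generate` method. If the class from the training script does not, a minimal temperature-sampling loop can stand in for it (a sketch, not this repo's API; `model` is any callable returning next-token logits of shape `[batch, seq, vocab]`):

```python
import torch

def sample(model, input_ids, max_new_tokens=50, temperature=0.8):
    """Minimal autoregressive temperature sampling (sketch)."""
    ids = input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids)                               # [batch, seq, vocab]
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)     # [batch, 1]
        ids = torch.cat([ids, next_id], dim=-1)
    return ids

# Toy stand-in model: uniform logits over a 10-token vocabulary,
# just to show the loop runs end to end.
toy = lambda ids: torch.zeros(ids.shape[0], ids.shape[1], 10)
out = sample(toy, torch.zeros(1, 3, dtype=torch.long), max_new_tokens=5)
print(out.shape)  # prints "torch.Size([1, 8])"
```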

## Training Details

The model was trained for one epoch on the full dataset using an L4 GPU in Google Colab.
Optimizer: AdamW (lr=1e-4) with gradient clipping (max norm 1.0). Mixed precision (float16) was used.

## Limitations

- The model is small and was trained on a limited dataset; it may produce repetitive or nonsensical outputs.
- It has not undergone any safety fine-tuning; use with caution.

## License

MIT
config.json
ADDED
@@ -0,0 +1,15 @@
{
  "vocab_size": 50257,
  "emb_dim": 768,
  "hidden_dim": 2048,
  "num_layers": 12,
  "num_heads": 12,
  "num_kv_heads": 4,
  "max_seq_len": 1024,
  "window_size": 1024,
  "sliding_window_ratio": 0.75,
  "rope_theta": 10000.0,
  "dtype": "torch.float16",
  "bias": false,
  "dropout": 0.0
}
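
Rather than hard-coding the `ModelConfig` class shown in the README, these values can be parsed straight from `config.json` into an attribute-style object (a sketch: the JSON is inlined here so it runs standalone; in practice read the file from the `snapshot_download()` directory, and `types.SimpleNamespace` is just one convenient container choice):

```python
import json
from types import SimpleNamespace

# config.json contents, inlined for a self-contained example
raw = '''{
  "vocab_size": 50257,
  "emb_dim": 768,
  "hidden_dim": 2048,
  "num_layers": 12,
  "num_heads": 12,
  "num_kv_heads": 4,
  "max_seq_len": 1024,
  "window_size": 1024,
  "sliding_window_ratio": 0.75,
  "rope_theta": 10000.0,
  "dtype": "torch.float16",
  "bias": false,
  "dropout": 0.0
}'''

# Attribute access mirrors the ModelConfig class from the README
config = SimpleNamespace(**json.loads(raw))
print(config.emb_dim, config.num_kv_heads)  # prints "768 4"
```

Note that `"dtype"` arrives as the string `"torch.float16"` and would need mapping to an actual `torch.dtype` before use.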
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9d45ba5b7dafe97cd332405e514ead294500c539dd188ab7d87eb9f9e0820f56
size 456456877
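
This is a Git LFS pointer: the ~456 MB weights file is stored separately and fetched automatically by `snapshot_download`. The `oid` is the SHA-256 of the real file, so a download can be verified against it with a streamed hash (a generic sketch, not part of this repo):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 (the Git LFS oid is this hex digest)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage: compare against the oid in the pointer above, e.g.
#   sha256_of(os.path.join(model_path, "pytorch_model.bin")) == "<oid hex>"
```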
tokenizer.json
ADDED
(diff too large to render)
tokenizer_config.json
ADDED
@@ -0,0 +1,12 @@
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}