gate369 committed on
Commit 989f378 · verified · 1 Parent(s): 714f3d9

Upload Tiny Epstein 100M model

Files changed (5)
  1. README.md +109 -0
  2. config.json +15 -0
  3. pytorch_model.bin +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +12 -0
README.md ADDED
@@ -0,0 +1,109 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - tiny-epstein
+ - epstein-files
+ - transformers
+ ---
+
+ # tiny-epstein-100m
+
+ A small transformer model (~100M parameters) trained on the [teyler/epstein-files-20k](https://huggingface.co/datasets/teyler/epstein-files-20k) dataset.
+ The architecture is inspired by **Tiny Aya** modifications and is designed for efficient on-device inference.
+
+ ## Model Details
+
+ - **Architecture**: Decoder-only transformer with parallel blocks, Grouped Query Attention (GQA), SwiGLU activation, and bias‑free LayerNorm.
+ - **Sliding Window Attention**: 3:1 local:global ratio (the first 75% of layers use sliding-window attention with RoPE; the remaining layers use full attention with NoPE).
+ - **Parameters**: ~100 million
+ - **Context Length**: 1024 tokens (configurable)
+ - **Tokenizer**: GPT‑2 (same as used during training)
+ - **Training Data**: [teyler/epstein-files-20k](https://huggingface.co/datasets/teyler/epstein-files-20k) – 20,000 documents related to the Epstein files.
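The local/global layer split described above follows directly from `num_layers` and `sliding_window_ratio`. A minimal sketch (the function name is illustrative, not from the training script; the parameter names follow `config.json`):

```python
def layer_attention_types(num_layers: int, sliding_window_ratio: float) -> list:
    """Assign each layer 'local' (sliding window + RoPE) or 'global' (full attention + NoPE)."""
    n_local = int(num_layers * sliding_window_ratio)  # first 75% of layers are local
    return ["local"] * n_local + ["global"] * (num_layers - n_local)

types = layer_attention_types(12, 0.75)
print(types)  # 9 local layers followed by 3 global layers: the 3:1 ratio
```

With the shipped config (12 layers, ratio 0.75), this yields 9 sliding-window layers and 3 full-attention layers.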
+
+ ## Intended Use
+
+ This model is primarily for research and experimentation. It can generate continuations of text given a prompt, especially on topics related to the Epstein files.
+
+ ## How to Use
+
+ ### Installation
+
+ To run inference, make sure `torch` and `transformers` are installed:
+
+ ```bash
+ pip install torch transformers
+ ```
+
+ ### Loading the Model and Tokenizer
+
+ ```python
+ import os
+
+ import torch
+ from transformers import AutoTokenizer
+ from huggingface_hub import snapshot_download
+
+ # Download the model from the Hugging Face Hub
+ model_path = snapshot_download(repo_id="liminerity/tiny-epstein-100m")
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ # Loading the model requires the custom model class definition; it is
+ # included in the training script. Copy it into this file if needed.
+
+ # Define the model config (must match the saved config.json)
+ class ModelConfig:
+     vocab_size = 50257
+     emb_dim = 768
+     hidden_dim = 2048
+     num_layers = 12
+     num_heads = 12
+     num_kv_heads = 4
+     max_seq_len = 1024
+     window_size = 1024
+     sliding_window_ratio = 0.75
+     rope_theta = 10000.0
+     dtype = torch.float16
+     bias = False
+     dropout = 0.0
+
+ # Instantiate the model (assumes the TinyAya class from the training script is in scope)
+ model = TinyAya(ModelConfig())
+ state_dict = torch.load(os.path.join(model_path, "pytorch_model.bin"), map_location="cpu")
+ model.load_state_dict(state_dict)
+ model.eval()
+ ```
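Rather than hard-coding the values, the same settings can be read from the repository's `config.json`. A sketch (here the JSON is inlined from this repo's `config.json` so the snippet is self-contained; `SimpleNamespace` is just a convenient stand-in for whatever config object the training script expects):

```python
import json
from types import SimpleNamespace

# In practice: with open(os.path.join(model_path, "config.json")) as f: raw = json.load(f)
raw = json.loads("""{
  "vocab_size": 50257, "emb_dim": 768, "hidden_dim": 2048,
  "num_layers": 12, "num_heads": 12, "num_kv_heads": 4,
  "max_seq_len": 1024, "window_size": 1024, "sliding_window_ratio": 0.75,
  "rope_theta": 10000.0, "dtype": "torch.float16", "bias": false, "dropout": 0.0
}""")
cfg = SimpleNamespace(**raw)
# Note: "dtype" is stored as a string in config.json and must be mapped
# to a real torch dtype (e.g. torch.float16) by hand before use.
print(cfg.num_layers, cfg.num_kv_heads)  # 12 4
```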
+
+ ### Text Generation Example
+
+ ```python
+ prompt = "The Epstein files reveal"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model.generate(
+         inputs.input_ids,
+         max_new_tokens=50,
+         temperature=0.8,
+         do_sample=True,
+     )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Training Details
+
+ The model was trained for one epoch on the full dataset using an L4 GPU in Google Colab.
+ Optimizer: AdamW (lr=1e-4) with gradient clipping (max norm = 1.0). Mixed precision (float16) was used.
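The optimizer setup above can be sketched as a single PyTorch training step. This is illustrative only: the `nn.Linear` model and random batch are stand-ins, not the actual TinyAya training loop, and on GPU the forward/backward pass would additionally be wrapped in `torch.autocast` with a `GradScaler` for float16.

```python
import torch
import torch.nn as nn

# Stand-in model and batch; the real loop trains TinyAya on the dataset.
model = nn.Linear(16, 50257)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 16)
targets = torch.randint(0, 50257, (8,))

optimizer.zero_grad()
logits = model(x)
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()

# Gradient clipping as described: max norm 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```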
+
+ ## Limitations
+
+ - The model is small and was trained on a limited dataset; it may produce repetitive or nonsensical outputs.
+ - It has not undergone any safety fine‑tuning; use with caution.
+
+ ## License
+
+ MIT
config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "vocab_size": 50257,
+   "emb_dim": 768,
+   "hidden_dim": 2048,
+   "num_layers": 12,
+   "num_heads": 12,
+   "num_kv_heads": 4,
+   "max_seq_len": 1024,
+   "window_size": 1024,
+   "sliding_window_ratio": 0.75,
+   "rope_theta": 10000.0,
+   "dtype": "torch.float16",
+   "bias": false,
+   "dropout": 0.0
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d45ba5b7dafe97cd332405e514ead294500c539dd188ab7d87eb9f9e0820f56
+ size 456456877
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "is_local": false,
+   "model_max_length": 1024,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>"
+ }