# american-stories-shuffled-d20

A GPT-style language model trained on the American Stories dataset using nanochat.

## Model Details

| Attribute | Value |
|---|---|
| Parameters | ~393M |
| Layers | 20 |
| Hidden size | 1280 |
| Attention heads | 10 |
| Vocab size | 65536 |
| Context length | 2048 |
| Training step | 21400 |
| Dataset | american_stories |
| Ordering | shuffled |
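As a sanity check, the ~393M figure matches the standard transformer-block estimate of 12·L·d² for depth 20 and hidden size 1280. This is a rough sketch, not the exact nanochat accounting; it assumes standard attention (4·d²) plus a 4x-expansion MLP (8·d²) per layer, with embedding/unembedding parameters counted separately:

```python
# Rough parameter estimate from the model details above.
depth, d_model, vocab = 20, 1280, 65536

block_params = 12 * depth * d_model**2  # attention + MLP weights across all layers
embed_params = vocab * d_model          # one embedding matrix (83,886,080 entries)

print(f"block params: {block_params/1e6:.1f}M")  # ~393.2M, matching the table
```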

## Usage

```python
import json

import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint and config from the Hub
model_path = hf_hub_download(repo_id="alanamarzoev/american-stories-shuffled-d20", filename="model.pt")
config_path = hf_hub_download(repo_id="alanamarzoev/american-stories-shuffled-d20", filename="config.json")

# Load config
with open(config_path) as f:
    config = json.load(f)

# Load model (requires nanochat)
from nanochat.gpt import GPT, GPTConfig

model_config = GPTConfig(**config["model_config"])
model = GPT(model_config)
state_dict = torch.load(model_path, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode
```

## Config

```json
{
  "run_name": "as_shuffled_d20",
  "device_type": "",
  "depth": 20,
  "max_seq_len": 2048,
  "num_iterations": -1,
  "target_flops": -1.0,
  "target_param_data_ratio": 20,
  "device_batch_size": 32,
  "total_batch_size": 524288,
  "embedding_lr": 0.2,
  "unembedding_lr": 0.004,
  "weight_decay": 0.0,
  "matrix_lr": 0.02,
  "grad_clip": 1.0,
  "warmup_ratio": 0.0,
  "warmdown_ratio": 0.2,
  "final_lr_frac": 0.0,
  "resume_from_step": -1,
  "eval_every": 250,
  "eval_tokens": 10485760,
  "core_metric_every": 2000,
  "core_metric_max_per_task": 500,
  "sample_every": 2000,
  "save_every": -1,
  "model_tag": "american_stories_shuffled_d20",
  "dataset": "american_stories",
  "ordering": "shuffled",
  "val_data_dir": "/data/lingo/marzoev/.cache/nanochat/american_stories_data_val"
}
```
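From this config, the gradient-accumulation factor and total training tokens can be back-computed. This is a sketch under two assumptions: that `total_batch_size` is measured in tokens (as in nanochat) and that training ran in a single process, so accumulation = total_batch_size / (device_batch_size × max_seq_len):

```python
# Back-of-envelope training arithmetic from the config above.
total_batch_size = 524288   # tokens per optimizer step
device_batch_size = 32      # sequences per device per micro-step
max_seq_len = 2048
steps = 21400               # final training step from the model details

tokens_per_micro_step = device_batch_size * max_seq_len    # 65,536 tokens
grad_accum = total_batch_size // tokens_per_micro_step     # 8 micro-steps per update
total_tokens = steps * total_batch_size                    # ~11.2B tokens seen

print(grad_accum, total_tokens)
```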