# american-stories-shuffled-d20

A GPT-style language model trained on the American Stories dataset using nanochat.
## Model Details
| Field | Value |
|---|---|
| Parameters | ~393M |
| Layers | 20 |
| Hidden size | 1280 |
| Attention heads | 10 |
| Vocab size | 65536 |
| Context length | 2048 |
| Training step | 21400 |
| Dataset | american_stories |
| Ordering | shuffled |
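The ~393M figure matches the standard count of non-embedding transformer parameters, 12·L·d² (Q/K/V and output projections plus a 4× MLP, no biases); the token embedding and an untied unembedding would add 2·V·d on top. The breakdown below is an assumption about how the figure was counted, not taken from nanochat's source:

```python
layers, d_model, n_heads, vocab = 20, 1280, 10, 65536

head_dim = d_model // n_heads        # 128

# Per layer, assuming no biases:
#   attention Q, K, V + output projection -> 4 * d^2
#   MLP with 4x expansion (up + down)     -> 8 * d^2
transformer_params = 12 * layers * d_model ** 2
print(transformer_params)            # 393,216,000 ~= 393M

# A token embedding plus an untied unembedding would add:
embedding_params = 2 * vocab * d_model
print(embedding_params)              # 167,772,160
```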
## Usage
```python
import json

import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint and config from the Hub
model_path = hf_hub_download(repo_id="alanamarzoev/american-stories-shuffled-d20", filename="model.pt")
config_path = hf_hub_download(repo_id="alanamarzoev/american-stories-shuffled-d20", filename="config.json")

# Load config
with open(config_path) as f:
    config = json.load(f)

# Load model (requires nanochat)
from nanochat.gpt import GPT, GPTConfig

model_config = GPTConfig(**config["model_config"])
model = GPT(model_config)
state_dict = torch.load(model_path, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # disable dropout etc. for inference
```
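Once the weights are loaded, text can be sampled with a simple autoregressive loop. This is a minimal greedy-decoding sketch, not nanochat's own sampling code; it assumes calling the model on a `(batch, seq)` tensor of token IDs returns logits of shape `(batch, seq, vocab)` (if nanochat's forward returns a `(logits, loss)` tuple, take the first element):

```python
import torch

@torch.no_grad()
def greedy_generate(model, idx, max_new_tokens, context_length=2048):
    """Greedy autoregressive decoding over raw token IDs."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_length:]               # crop to the context window
        logits = model(idx_cond)                          # assumed shape (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)            # append the chosen token
    return idx
```

Turning text into token IDs (and back) requires the tokenizer the model was trained with, which this sketch does not include.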
## Config
```json
{
  "run_name": "as_shuffled_d20",
  "device_type": "",
  "depth": 20,
  "max_seq_len": 2048,
  "num_iterations": -1,
  "target_flops": -1.0,
  "target_param_data_ratio": 20,
  "device_batch_size": 32,
  "total_batch_size": 524288,
  "embedding_lr": 0.2,
  "unembedding_lr": 0.004,
  "weight_decay": 0.0,
  "matrix_lr": 0.02,
  "grad_clip": 1.0,
  "warmup_ratio": 0.0,
  "warmdown_ratio": 0.2,
  "final_lr_frac": 0.0,
  "resume_from_step": -1,
  "eval_every": 250,
  "eval_tokens": 10485760,
  "core_metric_every": 2000,
  "core_metric_max_per_task": 500,
  "sample_every": 2000,
  "save_every": -1,
  "model_tag": "american_stories_shuffled_d20",
  "dataset": "american_stories",
  "ordering": "shuffled",
  "val_data_dir": "/data/lingo/marzoev/.cache/nanochat/american_stories_data_val"
}
```
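With `num_iterations` set to `-1`, the step count is derived from `target_param_data_ratio`, a Chinchilla-style tokens-per-parameter target. Assuming the ratio is applied to the full parameter count including the embedding and an untied unembedding (an assumption, though it reproduces the reported 21400 training steps exactly):

```python
layers, d_model, vocab = 20, 1280, 65536
total_batch_size = 524288      # tokens per optimizer step
ratio = 20                     # target_param_data_ratio

# Full parameter count: transformer matrices + embedding + untied unembedding
params = 12 * layers * d_model ** 2 + 2 * vocab * d_model   # 560,988,160
target_tokens = ratio * params                              # 11,219,763,200
num_iterations = target_tokens // total_batch_size
print(num_iterations)          # 21400, matching the reported training step
```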