{
  "attention_type": "gqa",
  "dropout": 0.1,
  "hidden_dim": 768,
  "intermediate_dim": 2048,
  "kv_latent_dim": null,
  "max_seq_len": 128,
  "n_head": 12,
  "n_kv_head": 6,
  "n_layer": 12,
  "q_latent_dim": null,
  "rope_base": 10000.0,
  "use_rope": true,
  "vocab_size": 1024
}
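
The config above only lists raw hyperparameters, so a short sanity check may help show how they plausibly fit together. The sketch below assumes the usual grouped-query attention convention: `hidden_dim` is split evenly across `n_head` query heads, and each of the `n_kv_head` key/value heads is shared by an equal group of query heads. That reading is an assumption, not something the config states, and `derive_attention_shapes` is a hypothetical helper, not part of the original model code.

```python
import json

# The config from above, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "attention_type": "gqa",
  "dropout": 0.1,
  "hidden_dim": 768,
  "intermediate_dim": 2048,
  "kv_latent_dim": null,
  "max_seq_len": 128,
  "n_head": 12,
  "n_kv_head": 6,
  "n_layer": 12,
  "q_latent_dim": null,
  "rope_base": 10000.0,
  "use_rope": true,
  "vocab_size": 1024
}
"""


def derive_attention_shapes(cfg: dict) -> dict:
    """Hypothetical helper: derive per-head sizes under the assumed GQA layout."""
    hidden_dim = cfg["hidden_dim"]
    n_head = cfg["n_head"]
    n_kv_head = cfg["n_kv_head"]

    # Assumption: every query head gets an equal slice of the hidden dimension.
    if hidden_dim % n_head != 0:
        raise ValueError("hidden_dim must be divisible by n_head")
    # Assumption: query heads are split into equal groups, one per KV head.
    if n_head % n_kv_head != 0:
        raise ValueError("n_head must be divisible by n_kv_head")

    head_dim = hidden_dim // n_head

    # Rotary embeddings rotate pairs of channels, so head_dim should be even.
    if cfg.get("use_rope") and head_dim % 2 != 0:
        raise ValueError("head_dim must be even when use_rope is true")

    return {
        "head_dim": head_dim,
        "queries_per_kv_head": n_head // n_kv_head,
        "kv_proj_dim": n_kv_head * head_dim,
    }


cfg = json.loads(CONFIG_TEXT)
shapes = derive_attention_shapes(cfg)
print(shapes)
# {'head_dim': 64, 'queries_per_kv_head': 2, 'kv_proj_dim': 384}
```

Under these assumptions each head is 64 channels wide, two query heads share each key/value head, and the key and value projections are 384 wide instead of 768. The two `null` latent dimensions are left uninterpreted here, since the config gives no indication of how they are used when `attention_type` is `"gqa"`.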