Zv3n committed on
Commit 2eb9cc5 · verified · 1 parent: ed8c04f

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,40 @@
+ ---
+ license: mit
+ tags:
+ - llama
+ - pre-trained
+ - from-scratch
+ - pytorch
+ ---
+
+ # LLaMA-355M Base Model
+
+ A 355M-parameter LLaMA-style language model trained from scratch.
+
+ ## Architecture
+ - **Type**: LLaMA-style decoder-only Transformer
+ - **Parameters**: ~355M (input and output embeddings tied)
+ - **Layers**: 24
+ - **Heads**: 16 (16 key/value heads, i.e. standard multi-head attention)
+ - **Hidden dim**: 1024
+ - **Context**: 512 tokens
+ - **Vocab**: 50257 (tiktoken GPT-2 BPE)
+
+ ## Features
+ - RMSNorm (instead of LayerNorm)
+ - Rotary Position Embeddings (RoPE)
+ - SwiGLU activation
+ - Flash Attention
+ - No bias terms
+
+ ## Training
+ - Pre-trained on a mix of OpenWebText, AutoMathText, WikiText, and HackerNews
+ - Fine-tuned on claude-reasoning
+ - Trained on an NVIDIA RTX 3080 Ti
+
+ ## Usage
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ model = AutoModelForCausalLM.from_pretrained("YOUR_USER/llama-355m")
+ tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ ```
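The README's Features list names three LLaMA building blocks but does not show them. Below is a minimal pure-Python sketch of each, written for readability rather than speed and not taken from this repository's training code.

It makes the following assumptions:
- It uses the `rms_norm_eps` (1e-6) and `rope_theta` (10000.0) values from `config.json`.
- It uses the half-split RoPE pairing of Hugging Face's Llama implementation. The README does not say which pairing the model was actually trained with.
- The helper names (`rms_norm`, `silu`, `matvec`, `swiglu_mlp`, `apply_rope`) are hypothetical.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by 1/sqrt(mean(x^2) + eps), then by a learned weight; no mean subtraction, no bias."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish), v * sigmoid(v), written to avoid overflow in math.exp."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(m, x):
    """Bias-free linear layer: m is a list of rows, returns m @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x)), no biases."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)


def apply_rope(x, pos, theta=10000.0):
    """Rotate one query/key head vector of even length d for position `pos`.

    Pairs element i with element i + d/2 (half-split layout, an assumption here)
    and rotates each pair by the angle pos * theta^(-2i/d).
    """
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        angle = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out
```

Each RoPE pair is rotated by an angle proportional to its position. As a result, the dot product of a rotated query and a rotated key depends only on their relative offset, which is why the rotation is applied to queries and keys but not to values. In the real model each head vector has 1024 / 16 = 64 elements.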
config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "model_type": "llama",
+   "hidden_size": 1024,
+   "intermediate_size": 2752,
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "num_key_value_heads": 16,
+   "max_position_embeddings": 512,
+   "vocab_size": 50257,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": true,
+   "torch_dtype": "float32",
+   "hidden_act": "silu",
+   "bos_token_id": 0,
+   "eos_token_id": 0
+ }
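The README's 355M figure can be checked against `config.json`. The sketch below is a back-of-the-envelope count with these assumptions:
- The model is a LLaMA-style decoder with no bias terms, per the README's Features list.
- Input and output embeddings are tied (`tie_word_embeddings: true`).
- It follows the usual Hugging Face Llama layout: q/k/v/o projections, a gate/up/down MLP, two RMSNorm weight vectors per layer and one final RMSNorm.

`count_llama_params` is a hypothetical helper.

```python
import json

# Values copied from config.json in this commit (only the fields the count needs).
CONFIG_JSON = """
{
  "hidden_size": 1024,
  "intermediate_size": 2752,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "vocab_size": 50257,
  "tie_word_embeddings": true
}
"""


def count_llama_params(cfg):
    """Count weights in a bias-free LLaMA-style decoder described by `cfg`."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim

    attn = 2 * h * h + 2 * h * kv_dim        # q and o are h x h; k and v are h x kv_dim
    mlp = 3 * h * cfg["intermediate_size"]   # SwiGLU gate, up and down projections
    norms = 2 * h                            # input and post-attention RMSNorm weights
    per_layer = attn + mlp + norms

    embed = cfg["vocab_size"] * h
    total = embed + cfg["num_hidden_layers"] * per_layer + h  # + final RMSNorm
    if not cfg["tie_word_embeddings"]:
        total += embed                       # a separate lm_head matrix
    return total


cfg = json.loads(CONFIG_JSON)
print(f"{count_llama_params(cfg):,}")  # 355,076,096
```

With `tie_word_embeddings` set to false, the same config would come to 406,539,264 parameters, because the 50257 × 1024 output matrix would be stored separately.

The `intermediate_size` of 2752 equals 8/3 × 1024 ≈ 2731 rounded up to a multiple of 64, a common way of sizing SwiGLU layers. The source does not say how the value was chosen.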
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "max_new_tokens": 256,
+   "temperature": 0.7,
+   "top_p": 0.9,
+   "top_k": 40,
+   "do_sample": true
+ }
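`generation_config.json` turns on sampling (`do_sample: true`) and combines three filters. To illustrate roughly how they interact on one decoding step, the pure-Python sketch below works in three stages:
1. Divide the logits by `temperature`.
2. Keep the `top_k` highest-scoring tokens.
3. Keep the smallest leading set of those whose probabilities sum to at least `top_p`, then draw a token.

It is a simplification for illustration, not the `transformers` implementation. The function names are hypothetical.

```python
import math
import random


def filter_candidates(logits, temperature=0.7, top_k=40, top_p=0.9):
    """Return (token_id, prob) pairs left after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    scaled = [x / temperature for x in logits]

    # Top-k: keep the k highest-scoring token ids, best first.
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the kept tokens (subtract the max for numerical stability).
    m = scaled[kept[0]]
    weights = [math.exp(scaled[i] - m) for i in kept]
    total = sum(weights)

    # Top-p: take tokens in order until their cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for i, w in zip(kept, weights):
        p = w / total
        nucleus.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    z = sum(p for _, p in nucleus)
    return [(i, p / z) for i, p in nucleus]


def sample_next_token(logits, rng, **kwargs):
    """Draw one token id from the filtered distribution."""
    candidates = filter_candidates(logits, **kwargs)
    r = rng.random()
    acc = 0.0
    for i, p in candidates:
        acc += p
        if r < acc:
            return i
    return candidates[-1][0]  # guard against floating-point round-off


rng = random.Random(0)
print(sample_next_token([2.0, 1.0, 0.5, -1.0], rng))
```

Temperature is applied before either cut-off. With the shipped settings, a step where one token already holds at least 90% of the temperature-scaled probability leaves a single candidate. A flat distribution can keep all 40.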
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2900bbe7bd6134ba51b7eb59d66b8a8337657198bbc389160dbbd5293501246
+ size 1420375090
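`pytorch_model.bin` is committed as a Git LFS pointer: the repository records only the object's SHA-256 and byte size, and the weights live in LFS storage. Below is a minimal sketch for parsing the pointer and checking a downloaded copy against it. It assumes the weights have been fetched to a local file, and the function names are hypothetical.

The recorded size also lines up with the config. 1,420,375,090 bytes divided by 4 bytes per float32 weight (`torch_dtype: float32`) is about 355.09M, in line with the README's 355M. The small excess is presumably serialization metadata.

```python
import hashlib
import io

# Pointer text exactly as committed for pytorch_model.bin.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b2900bbe7bd6134ba51b7eb59d66b8a8337657198bbc389160dbbd5293501246
size 1420375090
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(fp, pointer, chunk_size=1 << 20):
    """Stream a binary file object and compare its size and digest to the pointer."""
    digest = hashlib.new(pointer["algo"])
    n_bytes = 0
    for chunk in iter(lambda: fp.read(chunk_size), b""):
        digest.update(chunk)
        n_bytes += len(chunk)
    return n_bytes == pointer["size"] and digest.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"] / 4 / 1e6)  # float32 weights implied by the size, in millions

# Self-check on in-memory bytes; for the real file, pass
# open("pytorch_model.bin", "rb") (hypothetical local path) instead of io.BytesIO.
data = b"example bytes"
fake = {"algo": "sha256", "oid": hashlib.sha256(data).hexdigest(), "size": len(data)}
print(matches_pointer(io.BytesIO(data), fake))  # True
```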
tokenizer_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "model_type": "gpt2",
+   "tokenizer_class": "GPT2Tokenizer"
+ }
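`tokenizer_config.json` only names the tokenizer class, and this commit adds no `vocab.json` or `merges.txt`. That is presumably why the README loads the stock `gpt2` tokenizer.

One inconsistency is worth checking before generation. `config.json` sets `bos_token_id` and `eos_token_id` to 0, but in the GPT-2 BPE vocabulary id 0 is the ordinary `!` token and `<|endoftext|>` is 50256. The source does not say whether the model was trained with id 0 as a separator. The hypothetical helper below is a minimal sketch that flags such mismatches.

```python
# In the GPT-2 BPE vocabulary (tiktoken "gpt2" / transformers "gpt2"),
# <|endoftext|> has id 50256 and id 0 is the ordinary token "!".
GPT2_EOT_ID = 50256


def special_token_warnings(model_config, eot_id=GPT2_EOT_ID):
    """Hypothetical check: flag special-token ids that disagree with the tokenizer's EOT id."""
    warnings = []
    for key in ("bos_token_id", "eos_token_id"):
        value = model_config.get(key)
        if value is not None and value != eot_id:
            warnings.append(f"{key}={value}, but the GPT-2 tokenizer's <|endoftext|> is {eot_id}")
    vocab = model_config.get("vocab_size")
    if vocab is not None and eot_id >= vocab:
        warnings.append(f"vocab_size={vocab} is too small to contain id {eot_id}")
    return warnings


# Special-token ids and vocabulary size as committed in config.json.
committed = {"bos_token_id": 0, "eos_token_id": 0, "vocab_size": 50257}
for line in special_token_warnings(committed):
    print(line)
```

If generation should stop at end-of-text, one option is to override the ids at call time. For example: `model.generate(**inputs, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)`. The padding override is needed because `GPT2Tokenizer` defines no padding token by default.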