FlameF0X committed
Commit 0ade45d · verified · Parent: 2781c52

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,70 @@

---
language:
- en
license: apache-2.0
tags:
- pytorch
- causal-lm
- arxiv
- lfm2
datasets:
- FlameF0X/arXiv-AI-ML
---
# LFM2 Model - Pretrained on arXiv AI/ML Papers

## Model Description

This is an LFM2 (Liquid Foundation Model 2) architecture pretrained on the arXiv AI/ML dataset.
### Model Architecture

- **Model Type:** Causal Language Model
- **Architecture:** LFM2
- **Hidden Size:** 512
- **Layers:** 8
- **Attention Heads:** 8
- **KV Heads:** 4 (Grouped Query Attention)
- **Max Sequence Length:** 2048
- **Vocabulary Size:** 50257
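With a hidden size of 512 split across 8 attention heads, each head works in 64 dimensions, and grouped query attention shares each of the 4 KV heads between 2 query heads. A minimal sketch of that bookkeeping (hypothetical helper, not the actual LFM2 implementation):

```python
# Head bookkeeping implied by the hyperparameters above (sketch only).
hidden_size = 512
num_attention_heads = 8
num_key_value_heads = 4

head_dim = hidden_size // num_attention_heads                # 64
queries_per_kv = num_attention_heads // num_key_value_heads  # 2

def kv_head_for_query(q: int) -> int:
    """Return the KV head a given query head reads from under GQA."""
    return q // queries_per_kv

# Query heads 0-7 map onto KV heads 0,0,1,1,2,2,3,3.
mapping = [kv_head_for_query(q) for q in range(num_attention_heads)]
```

Relative to full multi-head attention, this halves the size of the KV cache while leaving the number of query projections unchanged.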

### Training Details

- **Dataset:** FlameF0X/arXiv-AI-ML
- **Training Samples:** 2500
- **Batch Size:** 4
- **Learning Rate:** 0.0003
- **Epochs:** 3
- **Final Loss:** 3.8660
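For orientation, these figures imply the rough optimizer step count below (a sketch assuming no gradient accumulation, which the card does not specify):

```python
# Back-of-the-envelope step count from the training details above.
samples, batch_size, epochs = 2500, 4, 3
steps_per_epoch = samples // batch_size  # 625
total_steps = steps_per_epoch * epochs   # 1875
```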

### Usage

```python
from transformers import AutoTokenizer
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("./lfm2-arxiv-model")

# Load model (requires the Lfm2ForCausalLM class definition)
# model = Lfm2ForCausalLM.from_pretrained("./lfm2-arxiv-model")

# Generate text
# input_text = "Attention mechanisms in neural networks"
# inputs = tokenizer(input_text, return_tensors="pt")
# outputs = model.generate(**inputs, max_length=100)
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Citation

If you use this model, please cite:

```bibtex
@misc{lfm2-arxiv-2025,
  title={LFM2 Pretrained on arXiv AI/ML},
  author={Your Name},
  year={2025},
  publisher={HuggingFace},
}
```
config.json ADDED
@@ -0,0 +1,24 @@

{
  "model_type": "lfm2",
  "architectures": [
    "Lfm2ForCausalLM"
  ],
  "vocab_size": 50257,
  "hidden_size": 512,
  "intermediate_size": 2048,
  "num_hidden_layers": 8,
  "num_attention_heads": 8,
  "num_key_value_heads": 4,
  "max_position_embeddings": 2048,
  "initializer_range": 0.02,
  "norm_eps": 1e-05,
  "pad_token_id": 0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "rope_theta": 1000000.0,
  "block_multiple_of": 256,
  "block_ffn_dim_multiplier": 1.0,
  "block_auto_adjust_ff_dim": true,
  "torch_dtype": "float32",
  "transformers_version": "4.36.0"
}
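A couple of invariants in this config can be sanity-checked. The sketch below mirrors the JSON as a Python dict instead of loading it from disk:

```python
# Mirror of the config above (not loaded from disk) plus consistency checks.
config = {
    "vocab_size": 50257,
    "hidden_size": 512,
    "intermediate_size": 2048,
    "num_hidden_layers": 8,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "max_position_embeddings": 2048,
}

# Hidden size must split evenly across attention heads,
# and query heads must group evenly onto KV heads.
assert config["hidden_size"] % config["num_attention_heads"] == 0
assert config["num_attention_heads"] % config["num_key_value_heads"] == 0
```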
generation_config.json ADDED
@@ -0,0 +1,9 @@

{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "max_length": 2048,
  "do_sample": true,
  "temperature": 0.7,
  "top_p": 0.9
}
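The sampling defaults above (temperature 0.7, then nucleus filtering at top-p 0.9) can be illustrated with a self-contained sampler. This is a sketch of the sampling math, not the Hugging Face implementation:

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random.Random(0)):
    """Temperature-scaled softmax followed by nucleus (top-p) sampling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest high-probability prefix whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    kept_mass = sum(probs[i] for i in kept)
    r = rng.random() * kept_mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

With `do_sample: true`, generation draws a token from this filtered distribution at every step rather than taking the argmax.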
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@

version https://git-lfs.github.com/spec/v1
oid sha256:8ed7ec33289f4e1b601bd300c9cd697d65283618f7085e9abedf2b45a97aa8e3
size 203632480
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@

{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
tied_weights.json ADDED
@@ -0,0 +1,3 @@

{
  "lm_head.weight": "model.embed_tokens.weight"
}
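This file declares weight tying: `lm_head.weight` is the same tensor as the token embedding matrix, so one 50257 × 512 matrix (about 25.7M parameters) serves both roles instead of two. A toy pure-Python sketch of what tying means (illustrative names, tiny sizes):

```python
# Sketch of the tying above: the LM head reuses the embedding matrix,
# so logits come from dotting the hidden state with each token embedding.
vocab, hidden = 5, 4
embed_tokens = [[0.1 * (v + h) for h in range(hidden)] for v in range(vocab)]
lm_head_weight = embed_tokens  # same object: one parameter matrix, two roles

def logits(state):
    """One score per vocabulary entry."""
    return [sum(s * w for s, w in zip(state, row)) for row in lm_head_weight]

scores = logits([1.0, 0.0, 0.0, 0.0])
```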
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,21 @@

{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "50256": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
vocab.json ADDED
The diff for this file is too large to render. See raw diff