FlameF0X committed on
Commit 4bc8dee · verified · 1 Parent(s): 902d260

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +13 -90
  2. config.json +1 -1
  3. model.safetensors +2 -2
README.md CHANGED
@@ -1,95 +1,18 @@
 ---
- language:
- - en
- tags:
- - causal-lm
- - arxiv
- - research-assistant
- - small-language-model
- datasets:
- - FlameF0X/arXiv-AI-ML
- library_name: transformers
- license: other
- license_name: lfm1.0
- license_link: LICENSE
 ---
 
 
- # LFM2-Research
-
- **A small causal language model pre-trained on arXiv AI/ML research papers.**
-
- ## Overview
-
- LFM2-Research is a compact language model built on the [LFM2 (Liquid Foundation Model 2)](https://github.com/huggingface/transformers/tree/main/src/transformers/models/lfm2) architecture (the modeling code was taken from `src/transformers/models/lfm2` in the `transformers` repository) and pre-trained exclusively on AI and machine learning research papers from arXiv. It is designed as a lightweight research assistant capable of engaging with technical literature in the AI/ML domain.
-
- > ⚠️ This model has not undergone RLHF, instruction tuning, or any alignment procedure. It is a raw pre-trained model and is best suited for experimentation and research purposes.
-
- ---
-
- ## Model Architecture
-
- | Parameter | Value |
 |---|---|
- | Model Type | Causal Language Model |
- | Architecture | LFM2 |
- | Hidden Size | 512 |
 | Layers | 8 |
- | Attention Heads | 8 |
- | KV Heads | 4 (Grouped Query Attention) |
- | Max Sequence Length | 2048 |
- | Vocabulary Size | 50,257 |
-
- ---
-
- ## Training Details
-
- | Parameter | Value |
- |---|---|
- | Dataset | [`FlameF0X/arXiv-AI-ML`](https://huggingface.co/datasets/FlameF0X/arXiv-AI-ML) |
- | Training Samples | 2,500 |
- | Batch Size | 4 |
- | Learning Rate | 3e-4 |
- | Epochs | 23 |
- | Final Loss | 0.3090 |
-
- ---
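The training rows above imply a modest optimization budget; a quick back-of-the-envelope check (assuming no gradient accumulation, which the card does not specify):

```python
import math

samples = 2500     # training samples (from the card)
batch_size = 4     # per-step batch size (from the card)
epochs = 23        # from the card

steps_per_epoch = math.ceil(samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 625 steps/epoch, 14375 total updates
```

Roughly 14k gradient updates over the same 2,500 documents, which is consistent with the overfitting caveat in the Limitations section.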
-
- ## Intended Use
-
- - Exploring AI/ML concepts in a research context
- - Prototyping lightweight domain-specific language model pipelines
- - Studying the effect of narrow-domain pre-training on small models
-
- ### Out-of-Scope Use
-
- This model is **not** intended for production use, general-purpose chat, or any application requiring safe, aligned, or factually reliable outputs. It has a very small training set (2,500 samples) and may produce repetitive, incoherent, or factually incorrect text.
-
- ---
-
- ## Limitations
-
- - **Small training set:** Only 2,500 samples were used for pre-training, which significantly limits generalization, although research papers are comparatively information-dense.
- - **No alignment:** The model has not been fine-tuned with human feedback or instruction tuning of any kind.
- - **Potential overfitting:** Given the high number of epochs (23) relative to the dataset size, the model may have overfit to training examples.
- - **Narrow domain:** The model has only been exposed to AI/ML research text and will likely perform poorly on out-of-domain inputs.
-
- ---
-
- ## Safety
-
- Because the training data consists solely of academic research papers, the risk of harmful content generation is low. However, the lack of any alignment procedure means outputs are unpredictable and should not be treated as authoritative or safe for end-user-facing applications.
-
- ---
-
- ## Citation
-
- If you use this model in your work, please cite it as:
- ```bibtex
- @misc{lfm2-research,
-   author = {FlameF0X},
-   title = {LFM2-Research: A Small Language Model Pre-trained on arXiv AI/ML Papers},
-   year = {2025},
-   url = {https://huggingface.co/FlameF0X/LFM2-Research}
- }
- ```
- Or don't. It's up to you.
 
 ---
+ language: [en]
+ license: apache-2.0
+ tags: [pytorch, causal-lm, arxiv, lfm2]
+ datasets: [FlameF0X/arXiv-AI-ML]
 ---
+ # LFM2 – Pretrained on arXiv AI/ML
 
+ | Param | Value |
 |---|---|
+ | Hidden size | 512 |
 | Layers | 8 |
+ | Attention heads | 8 |
+ | KV heads | 4 |
+ | Max seq len | 2048 |
+ | Vocab size | 50257 |
+ | Epochs | 12 |
+ | Final loss | 0.3772 |
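The architecture rows above pin down the attention shapes; a quick sketch of the grouped-query-attention bookkeeping they imply (variable names are illustrative, not taken from the model code):

```python
# Head bookkeeping for grouped-query attention, using the values from
# the model card's table.
hidden_size = 512
num_heads = 8        # query heads
num_kv_heads = 4     # key/value heads (GQA)

head_dim = hidden_size // num_heads      # width of one attention head
group_size = num_heads // num_kv_heads   # query heads sharing each KV head

# Per-layer attention projection output widths:
q_width = num_heads * head_dim           # query projection
kv_width = num_kv_heads * head_dim       # key (and value) projection

print(head_dim, group_size, q_width, kv_width)  # 64 2 512 256
```

With 4 KV heads against 8 query heads, each KV head is shared by 2 query heads, halving the KV-cache width relative to full multi-head attention.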
config.json CHANGED
@@ -20,5 +20,5 @@
  "block_ffn_dim_multiplier": 1.0,
  "block_auto_adjust_ff_dim": true,
  "torch_dtype": "float32",
- "transformers_version": "4.36.0"
+ "transformers_version": "4.50.0"
  }
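The config now records `transformers_version` 4.50.0 instead of 4.36.0. A loading script can compare the installed library against that value; a minimal stdlib sketch (real code might use `packaging.version` instead, and the helper names here are ours):

```python
def vtuple(v: str) -> tuple:
    """Turn a dotted version string like '4.50.0' into (4, 50, 0)."""
    return tuple(int(part) for part in v.split("."))

# transformers_version recorded in the updated config.json
SAVED_WITH = vtuple("4.50.0")

def is_compatible(installed: str) -> bool:
    """True if the installed transformers is at least the saved-with version."""
    return vtuple(installed) >= SAVED_WITH

print(is_compatible("4.36.0"))  # False
print(is_compatible("4.50.0"))  # True
```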
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:460a4af2c49a4151fc5b2858dfce727117b7c8fa23cbc9886b2f947fa6935574
- size 203632480
+ oid sha256:659ee87cd8c532c4344a41c2426bd37047a392fe54cdf46690cb739b904c217d
+ size 203638464
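What the repository stores for `model.safetensors` is a git-lfs pointer file, not the weights themselves. A sketch (helper names are ours) of parsing the pointer and checking a downloaded file against its recorded sha256:

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The new pointer content from this commit:
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:659ee87cd8c532c4344a41c2426bd37047a392fe54cdf46690cb739b904c217d
size 203638464
"""

fields = parse_lfs_pointer(pointer)
algo, _, digest = fields["oid"].partition(":")

def verify(path: str) -> bool:
    """Hash a downloaded file in chunks and compare it to the pointer's oid."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == digest

print(algo, fields["size"])  # sha256 203638464
```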