LH-Tech-AI committed
Commit a5f05f9 (verified) · 1 parent: e91767a

Create README.md

  1. README.md +60 -0
README.md ADDED
@@ -0,0 +1,60 @@
---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- llama
- tiny-model
- sub-1M
- cpu
- small
- tiny
- quark
- 1m
---

# Quark-0.5M

**Quark-0.5M** is an ultra-lightweight Llama-based model with only **465,504 parameters**.
It was trained from scratch to demonstrate what high-quality data (FineWeb-Edu) can achieve with an extremely small architecture.

## Model Details
- **Architecture:** Llama-based
- **Parameters:** 465,504
- **Vocabulary Size:** 500 (Custom Byte-Level BPE)
- **Hidden Size:** 96
- **Intermediate Size:** 192
- **Layers:** 4
- **Heads:** 4
- **Context Length:** 256 tokens
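
The parameter count can be reproduced from the dimensions listed above. The sketch below assumes a standard Llama layout that the card does not spell out: no bias terms, full multi-head attention (so the number of heads does not change the count), two RMSNorm vectors per layer plus a final one, and an output head that is not tied to the input embedding. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab, hidden, intermediate, layers, tie_embeddings=False):
    """Count parameters of a Llama-style decoder (assumes no biases, full MHA)."""
    embed = vocab * hidden              # token embedding matrix
    attn = 4 * hidden * hidden          # q, k, v and o projections
    mlp = 3 * hidden * intermediate     # gate, up and down projections
    norms = 2 * hidden                  # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    final_norm = hidden                 # RMSNorm before the output head
    lm_head = 0 if tie_embeddings else vocab * hidden
    return embed + layers * per_layer + final_norm + lm_head


total = llama_param_count(vocab=500, hidden=96, intermediate=192, layers=4)
print(f"{total:,}")  # 465,504
```

Under these assumptions the result matches the 465,504 figure stated above; with tied embeddings the same dimensions would give 417,504.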

## Training
- **Dataset:** 400 million tokens from `HuggingFaceFW/fineweb-edu` (`sample-10BT` subset)
- **Training Time:** ~42 minutes on a single Kaggle T4 GPU
- **Final Loss:** 2.46
- **Optimizer:** AdamW with cosine learning rate decay
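
For readers unfamiliar with cosine learning rate decay, here is a minimal, dependency-free sketch of such a schedule. The card does not state the peak learning rate, the minimum learning rate, the warmup length, or the number of steps, so every value below is a placeholder, and the function name `cosine_lr` is hypothetical. This is not necessarily the schedule used to train Quark.

```python
import math


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0, warmup_steps=0):
    """Learning rate at `step`: optional linear warmup, then cosine decay to `min_lr`."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Placeholder hyperparameters for illustration only; none come from the model card.
schedule = [
    cosine_lr(s, total_steps=1000, peak_lr=1e-3, min_lr=1e-4, warmup_steps=50)
    for s in range(0, 1001, 250)
]
print(schedule)
```

The rate rises during warmup, peaks when warmup ends, and then falls smoothly to `min_lr` at the final step. In PyTorch, a comparable decay is available through `torch.optim.lr_scheduler.CosineAnnealingLR`.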

## Intended Use
Quark is a research project exploring the limits of "Micro-LLMs". It can form grammatically correct English sentences and structured lists despite fitting into less than 1 MB of disk space.
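
The disk footprint can be estimated from the parameter count alone. The card does not say which precision the weights are stored in, so the data types below are assumptions, and the estimate ignores file headers and metadata. The helper `weight_bytes` is a hypothetical name.

```python
PARAMS = 465_504  # parameter count stated in the model card

# Bytes needed per parameter for some common storage precisions (assumed, not from the card).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_bytes(num_params, dtype):
    """Raw size of the weights in bytes, ignoring file headers and metadata."""
    return num_params * BYTES_PER_PARAM[dtype]


for dtype in BYTES_PER_PARAM:
    size = weight_bytes(PARAMS, dtype)
    print(f"{dtype}: {size:,} bytes ({size / 1e6:.2f} MB)")
```

By this estimate, 16-bit weights take 931,008 bytes, which is under 1 MB, while 32-bit weights would take about 1.86 MB.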

## Performance Example
> **Prompt:** "Artificial intelligence is "
>
> **Output:** "Artificial intelligence is very important. These are more likely to be adapted with the people of the following:
> - Subjects, evidence and social treatment for reduces costs..."

## How to Use
```python
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

# Load the model weights and the custom byte-level BPE tokenizer from the Hub.
model = LlamaForCausalLM.from_pretrained("LH-Tech-AI/Quark-0.5M")
tokenizer = PreTrainedTokenizerFast.from_pretrained("LH-Tech-AI/Quark-0.5M")

prompt = "The scientific method is"
# LlamaForCausalLM has no token_type_ids input, so make sure the tokenizer does not return them.
inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
# Sample up to 50 new tokens; a low temperature keeps the output more focused.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
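
The `temperature=0.4` argument in the snippet above rescales the model's logits before a token is sampled. The following dependency-free sketch illustrates that idea on made-up logits. It is not the `transformers` implementation, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]                  # made-up scores for three candidate tokens
p_default = softmax_with_temperature(toy_logits, temperature=1.0)
p_sharp = softmax_with_temperature(toy_logits, temperature=0.4)
print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_sharp])
```

Temperatures below 1 concentrate probability on the highest-scoring token, which tends to make the output of a very small model less erratic at the cost of variety.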