sloshywings commited on
Commit
f4321ba
·
verified ·
1 Parent(s): 4c01491

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +110 -87
README.md CHANGED
@@ -1,45 +1,52 @@
1
- library_name: transformers
2
- tags: [evaluate360m, smollm2, reasoning, lightweight, low-end-hardware]
3
- Model Card for Evaluate360M
4
- Model Details
5
- Model Description
 
6
  Evaluate360M is a lightweight large language model optimized for reasoning tasks. It is designed to run efficiently on low-end commercial hardware, such as mobile phones, while maintaining strong performance in logical reasoning and general-purpose applications.
7
 
8
- Developed by: [More Information Needed]
9
- Funded by [optional]: [More Information Needed]
10
- Shared by [optional]: [More Information Needed]
11
- Model type: Transformer-based decoder model
12
- Language(s) (NLP): English
13
- License: [More Information Needed]
14
- Finetuned from model [optional]: HuggingFaceTB/SmolLM2-360M-Instruct
15
- Model Sources
16
- Repository: [More Information Needed]
17
- Paper [optional]: [More Information Needed]
18
- Demo [optional]: [More Information Needed]
19
- Uses
20
- Direct Use
21
- Evaluate360M is intended for general-purpose reasoning tasks and can be used in applications that require lightweight LLMs, such as:
22
-
23
- Mobile-based AI assistants
24
- Low-power embedded systems
25
- Edge computing applications
26
- Downstream Use
 
 
 
 
27
  It can be further fine-tuned for specific domains, including code generation, summarization, or dialogue systems.
28
 
29
- Out-of-Scope Use
30
- Not optimized for handling very large context windows
31
- Not designed for generating high-fidelity creative text, such as poetry or fiction
32
- Bias, Risks, and Limitations
33
- Limitations
34
- Struggles with handling large context windows.
35
- Not evaluated for potential biases yet.
36
- Recommendations
37
- Users should be aware of the model’s limitations in context length and should evaluate its performance for their specific use cases.
38
-
39
- How to Get Started with the Model
40
- python
41
- Copy
42
- Edit
 
 
43
  from transformers import AutoModelForCausalLM, AutoTokenizer
44
 
45
  model_name = "evaluate360m"
@@ -49,52 +56,68 @@ model = AutoModelForCausalLM.from_pretrained(model_name)
49
  inputs = tokenizer("What is the capital of France?", return_tensors="pt")
50
  outputs = model.generate(**inputs)
51
  print(tokenizer.decode(outputs[0]))
52
- Training Details
53
- Training Data
54
- Dataset: HuggingFaceH4/Bespoke-Stratos-17k
55
- Preprocessing: Token packing enabled (--packing), sequence length up to 2048 tokens
56
- Training Procedure
57
- Optimizer & Precision:
58
- bf16 mixed precision
59
- gradient_accumulation_steps = 8
60
- Gradient checkpointing enabled
61
- Hyperparameters:
62
- Learning rate: 2e-5
63
- Epochs: 3
64
- Batch size: 4 (per device, both training and evaluation)
65
- Evaluation & Saving:
66
- Evaluation every 500 steps
67
- Model checkpoint saved every 1000 steps, keeping a max of 2 checkpoints
68
- Compute Infrastructure
69
- Hardware Used: A100 GPU
70
- Training Time: 6 hours
71
- Evaluation
72
- Benchmarks: No evaluation conducted yet.
73
- Metrics: Not available yet.
74
- Environmental Impact
75
- Hardware Type: A100 GPU
76
- Hours Used: 6 hours
77
- Cloud Provider: [More Information Needed]
78
- Compute Region: [More Information Needed]
79
- Carbon Emitted: [More Information Needed]
80
- Technical Specifications
81
- Model Architecture
82
- Similar to SmolLM2-360M
83
- Inspired by MobileLLM
84
- Uses Grouped-Query Attention (GQA)
85
- Prioritizes depth over width
86
- Citation [optional]
87
- BibTeX:
88
- [More Information Needed]
89
-
90
- APA:
91
- [More Information Needed]
92
-
93
- More Information
94
- [More Information Needed]
95
-
96
- Model Card Authors [optional]
97
- [More Information Needed]
98
-
99
- Model Card Contact
100
- [More Information Needed]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Model Card for Evaluate360M
2
+
3
+ ## Model Details
4
+
5
+ ### Model Description
6
+
7
  Evaluate360M is a lightweight large language model optimized for reasoning tasks. It is designed to run efficiently on low-end commercial hardware, such as mobile phones, while maintaining strong performance in logical reasoning and general-purpose applications.
8
 
9
+ - **Developed by:** [More Information Needed]
10
+ - **Funded by [optional]:** [More Information Needed]
11
+ - **Shared by [optional]:** [More Information Needed]
12
+ - **Model type:** Transformer-based decoder model
13
+ - **Language(s) (NLP):** English
14
+ - **License:** [More Information Needed]
15
+ - **Finetuned from model [optional]:** `HuggingFaceTB/SmolLM2-360M-Instruct`
16
+
17
+ ### Model Sources
18
+
19
+ - **Repository:** [More Information Needed]
20
+ - **Paper [optional]:** [More Information Needed]
21
+ - **Demo [optional]:** [More Information Needed]
22
+
23
+ ## Uses
24
+
25
+ ### Direct Use
26
+ Evaluate360M is intended for general-purpose reasoning tasks and can be used in applications that require lightweight LLMs, such as:
27
+ - Mobile-based AI assistants
28
+ - Low-power embedded systems
29
+ - Edge computing applications
30
+
31
+ ### Downstream Use
32
  It can be further fine-tuned for specific domains, including code generation, summarization, or dialogue systems.
33
 
34
+ ### Out-of-Scope Use
35
+ - Not optimized for handling very large context windows
36
+ - Not designed for generating high-fidelity creative text, such as poetry or fiction
37
+
38
+ ## Bias, Risks, and Limitations
39
+
40
+ ### Limitations
41
+ - Struggles with handling large context windows.
42
+ - Not evaluated for potential biases yet.
43
+
44
+ ### Recommendations
45
+ Users should be aware of the model’s limitations in context length and should evaluate its performance for their specific use cases.
46
+
47
+ ## How to Get Started with the Model
48
+
49
+ ```python
50
  from transformers import AutoModelForCausalLM, AutoTokenizer
51
 
52
  model_name = "evaluate360m"
 
56
  inputs = tokenizer("What is the capital of France?", return_tensors="pt")
57
  outputs = model.generate(**inputs)
58
  print(tokenizer.decode(outputs[0]))
59
+ ```
60
+
61
+ ## Training Details
62
+
63
+ ### Training Data
64
+ - **Dataset:** `HuggingFaceH4/Bespoke-Stratos-17k`
65
+ - **Preprocessing:** Token packing enabled (`--packing`), sequence length up to 2048 tokens
66
+
67
+ ### Training Procedure
68
+ - **Optimizer & Precision:**
69
+ - `bf16` mixed precision
70
+ - `gradient_accumulation_steps = 8`
71
+ - Gradient checkpointing enabled
72
+ - **Hyperparameters:**
73
+ - Learning rate: `2e-5`
74
+ - Epochs: `3`
75
+ - Batch size: `4` (per device, both training and evaluation)
76
+ - **Evaluation & Saving:**
77
+ - Evaluation every `500` steps
78
+ - Model checkpoint saved every `1000` steps, keeping a max of `2` checkpoints
79
+
80
+ ### Compute Infrastructure
81
+ - **Hardware Used:** A100 GPU
82
+ - **Training Time:** 6 hours
83
+
84
+ ## Evaluation
85
+
86
+ - **Benchmarks:** No evaluation conducted yet.
87
+ - **Metrics:** Not available yet.
88
+
89
+ ## Environmental Impact
90
+
91
+ - **Hardware Type:** A100 GPU
92
+ - **Hours Used:** 6 hours
93
+ - **Cloud Provider:** [More Information Needed]
94
+ - **Compute Region:** [More Information Needed]
95
+ - **Carbon Emitted:** [More Information Needed]
96
+
97
+ ## Technical Specifications
98
+
99
+ ### Model Architecture
100
+ - Similar to SmolLM2-360M
101
+ - Inspired by MobileLLM
102
+ - Uses **Grouped-Query Attention (GQA)**
103
+ - Prioritizes depth over width
104
+
105
+ ## Citation [optional]
106
+
107
+ **BibTeX:**
108
+ [More Information Needed]
109
+
110
+ **APA:**
111
+ [More Information Needed]
112
+
113
+ ## More Information
114
+
115
+ [More Information Needed]
116
+
117
+ ## Model Card Authors [optional]
118
+
119
+ [More Information Needed]
120
+
121
+ ## Model Card Contact
122
+
123
+ [More Information Needed]