---
base_model:
- Qwen/Qwen2.5-3B-Instruct
library_name: transformers
license: mit
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
tags:
- chemistry
- biology
- code
- text-generation-inference
- STEM
- unsloth
---
<div align="center">
<span style="font-family: default; font-size: 1.5em;">Athena-3</span>
<div>
🚀 Faster, Sharper, Smarter than Athena 1 and Athena 2 🌟
</div>
</div>
<br>
<div align="center" style="line-height: 1;">
<a href="https://github.com/Aayan-Mishra/Maverick-Search" style="margin: 2px;">
<img alt="GitHub Page" src="https://img.shields.io/badge/Toolkit-000000?style=for-the-badge&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://aayanmishra.com/blog/athena-3" target="_blank" style="margin: 2px;">
<img alt="Blogpost" src="https://img.shields.io/badge/Blogpost-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/Spestly/Athena-3-3B" style="margin: 2px;">
<img alt="HF Page" src="https://img.shields.io/badge/Athena-fcd022?style=for-the-badge&logo=huggingface&logoColor=000" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

# **Athena-3-3B Model Card**

*Athena generated this model card!*

## **Model Overview**

**Athena-3-3B** is a 3.09-billion-parameter causal language model fine-tuned from Qwen2.5-3B-Instruct. This model is designed to excel in various natural language processing tasks, offering enhanced reasoning and instruction-following capabilities.

## **Model Details**

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, attention QKV bias, and tied word embeddings
- **Parameters:** 3.09 billion total (2.77 billion non-embedding)
- **Layers:** 36
- **Attention Heads:** 16 for query and 2 for key-value (Grouped Query Attention)
- **Vocabulary Size:** 151,646 tokens
- **Context Length:** Supports up to 32,768 tokens
- **Languages Supported:** Primarily English, with basic support for other languages
- **License:** MIT

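These architectural figures can be read straight from the checkpoint's configuration. Below is a minimal verification sketch, assuming the `Spestly/Athena-3-3B` repository is reachable on the Hugging Face Hub:

```python
from transformers import AutoConfig

# Fetch only the model config (no weights) and print the headline numbers.
config = AutoConfig.from_pretrained("Spestly/Athena-3-3B")
print(config.num_hidden_layers)        # layers: 36
print(config.num_attention_heads)      # query heads: 16
print(config.num_key_value_heads)      # key-value heads: 2 (grouped-query attention)
print(config.vocab_size)               # embedding-table size; may exceed the 151,646 tokenizer vocabulary
print(config.max_position_embeddings)  # context length: up to 32,768
```
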
## **Training Details**

Athena-3-3B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The run took approximately 90 minutes over 60 epochs and used a curated dataset focused on instruction following and general NLP tasks, with the aim of improving the model's performance on complex reasoning and academic tasks. A sketch of a comparable setup is shown below.

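The card does not include the actual training script. As an illustration only, a comparable Unsloth LoRA fine-tune might look like the following; the dataset path, LoRA rank, and trainer hyperparameters are placeholders, and `SFTTrainer` argument names vary across `trl` versions:

```python
# Illustrative sketch, not the actual Athena-3 training script.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit to fit comfortably on a single A100.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections (rank is a placeholder).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset: one pre-formatted chat transcript per row in a "text" column.
dataset = load_dataset("json", data_files="instructions.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # argument name differs in newer trl releases
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=60,     # epoch count reported above
        learning_rate=2e-4,
        output_dir="athena-3-3b-sft",
    ),
)
trainer.train()
```
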
## **Intended Use**

Athena-3-3B is designed for a range of applications, including but not limited to:

- **General NLP Tasks:** Text completion, summarization, and question answering.
- **Academic Assistance:** Support for tutoring, essay composition, and research inquiries.
- **Data Analysis:** Insights and interpretation for data-centric queries.

While Athena-3-3B is a powerful tool for many applications, it is not intended for real-time, safety-critical systems or for processing sensitive personal information.

## **How to Use**

To use Athena-3-3B, make sure you have an up-to-date version of the `transformers` library installed:

```bash
pip install -U transformers
```

Here's an example of loading Athena-3-3B and generating a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Athena-3-3B"

# Load the model weights and tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto"     # place layers on the available GPU(s)/CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-style prompt and render it with the model's chat template.
prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Maverick, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
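
For interactive use, the reply can also be streamed token by token with the `TextStreamer` built into `transformers`. A small sketch reusing `model`, `tokenizer`, and `model_inputs` from the example above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```
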
### **Maverick Search Usage 🔍**

To use this model with Maverick Search, see the [Maverick-Search repository](https://github.com/Aayan-Mishra/Maverick-Search).

## **Limitations**

Users should be aware of the following limitations:

- **Biases:** Athena-3-3B may exhibit biases present in its training data. Users should critically assess outputs, especially in sensitive contexts.
- **Knowledge Cutoff:** The model's knowledge is current up to August 2024. It may not be aware of events or developments occurring after this date.
- **Language Support:** While primarily trained on English data, performance in other languages may be inconsistent.

## **Acknowledgements**

Athena-3-3B builds upon the work of the Qwen team. Gratitude is also extended to the open-source AI community for their contributions to the tools and frameworks that facilitated the development of Athena-3-3B.

## **License**

Athena-3-3B is released under the MIT License, permitting wide usage with proper attribution.

## **Contact**

- Email: maverick@aayanmishra.com