---
base_model:
- Qwen/Qwen2.5-14B-Instruct
license: mit
language:
- en
- zh
- fr
- es
- pt
- de
- it
- ru
- ja
- ko
- vi
- th
- ar
- fa
- he
- tr
- cs
- pl
- hi
- bn
- ur
- id
- ms
- lo
- my
- ceb
- km
- tl
- nl
tags:
- chemistry
- biology
- code
- text-generation-inference
- STEM
- unsloth
- transformers
- qwen2
- trl
---
<div align="center">
<span style="font-family: default; font-size: 1.5em;">Athena-3</span>
<div>
🚀 Faster, Sharper, Smarter than Athena 1 and Athena 2 🌟
</div>
</div>
<br>
<div align="center" style="line-height: 1;">
  <a href="https://github.com/Aayan-Mishra/Maverick-Search" style="margin: 2px;">
    <img alt="Github Page" src="https://img.shields.io/badge/Toolkit-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://aayanmishra.com/blog/athena-3" target="_blank" style="margin: 2px;">
    <img alt="Blogpost" src="https://img.shields.io/badge/Blogpost-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/Spestly/Athena-3-14B" style="margin: 2px;">
    <img alt="HF Page" src="https://img.shields.io/badge/Athena-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

## **Athena-3**

*Athena generated this model card!*

**Athena-3-14B** is a causal language model with roughly 14 billion parameters, fine-tuned from Qwen2.5-14B-Instruct. It is designed to produce fluent, context-aware, and logically coherent output across a broad range of NLP and reasoning tasks, balancing instruction following with generative flexibility.

## **Model Details**

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and attention QKV bias, inherited from Qwen2.5-14B-Instruct
- **Parameters:** 14.7 billion total (13.1 billion non-embedding)
- **Layers:** 48
- **Attention Heads:** 40 for query and 8 for key-value (Grouped Query Attention)
- **Vocabulary Size:** Approximately 151,646 tokens
- **Context Length:** Supports up to 131,072 tokens
- **Languages Supported:** Over 29 languages, including strong performance in English, Chinese, and multilingual instruction tasks
- **License:** MIT
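
The layer count, key-value head count, and context length above determine how much memory the key-value cache needs during generation. The following is a minimal sketch of that arithmetic, not part of the Athena-3 tooling: the helper names `kv_cache_bytes` and `queries_per_kv_head` are hypothetical, and the numbers in the example call are toy values chosen only to make the calculation easy to follow.

```python
def queries_per_kv_head(num_query_heads: int, num_kv_heads: int) -> int:
    """Return how many query heads share one key-value head under Grouped Query Attention."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of key-value heads")
    return num_query_heads // num_kv_heads


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,  # assumption: 16-bit cache entries (fp16 or bf16)
    batch_size: int = 1,
) -> int:
    """Estimate KV-cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values
    for every layer, key-value head, and token position.
    """
    return (
        2 * num_layers * num_kv_heads * head_dim
        * seq_len * bytes_per_value * batch_size
    )


# Toy values for illustration only; they are not the Athena-3 configuration.
toy_bytes = kv_cache_bytes(num_layers=2, num_kv_heads=2, head_dim=4, seq_len=10)
print(toy_bytes)  # 640
```

For a real estimate, pass the layer and key-value head counts listed above and set `seq_len` to the context length you plan to use. The per-head dimension is not stated in this card, so it would need to be read from the model's `config.json`.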

## **Training Details**

Athena-3-14B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. Fine-tuning took approximately 90 minutes over 60 epochs on a curated instruction-tuning dataset. The model is tailored for generalist NLP performance with a focus on reasoning, alignment, and fluency.

## **Intended Use**

Athena-3-14B is ideal for a wide variety of tasks, including:

- **Instruction Following:** Handling complex prompts with step-by-step logical output
- **Writing Assistance:** Generating essays, emails, and coherent narratives
- **NLP Tasks:** Summarization, question answering, translation, and text classification
- **STEM Support:** Reasoning through academic and technical content

While Athena-3-14B is a versatile model, it is not intended for safety-critical applications or the handling of private, sensitive information.

## **How to Use**

To use Athena-3-14B, install a recent version of the `transformers` library. The example below also relies on PyTorch and on `accelerate`, which `device_map="auto"` requires:

```bash
pip install -U transformers accelerate torch
```

Here's an example of how to load the Athena-3-14B model and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Athena-3-14B"

# Load the weights in their stored dtype and place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Maverick, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
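
The slicing step near the end of the example exists because `generate` returns each prompt followed by its continuation, so the prompt tokens have to be removed before decoding. The sketch below isolates that step using plain Python lists in place of tensors. The helper name `strip_prompt_tokens` and the token IDs are made up for illustration.

```python
def strip_prompt_tokens(input_ids, output_ids):
    """Remove each prompt's tokens from the front of its generated sequence.

    input_ids:  one token-ID sequence per prompt
    output_ids: one sequence per prompt, each starting with that prompt's tokens
    """
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Made-up token IDs: the prompt is [101, 7, 8] and the model appended [42, 43].
new_tokens = strip_prompt_tokens([[101, 7, 8]], [[101, 7, 8, 42, 43]])
print(new_tokens)  # [[42, 43]]
```

The same list comprehension works on the tensors in the example above because slicing a one-dimensional tensor by length behaves like slicing a list.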

### **Maverick Search usage 🔍**

To use this model with Maverick Search, please refer to this [repository](https://github.com/Aayan-Mishra/Maverick-Search).

## **Limitations**

Users should be aware of the following limitations:

- **Biases:** Athena-3-14B may reflect biases from its pretraining and fine-tuning data. Outputs should be reviewed for fairness and accuracy.
- **Knowledge Cutoff:** The model's knowledge is current as of August 2024.
- **Multilingual Performance:** Performance varies by language and is strongest in English and in languages well covered by the training data.

## **Acknowledgements**

Athena-3-14B builds upon the Qwen2.5-14B foundation. Special thanks to the open-source ecosystem and Unsloth for enabling efficient fine-tuning workflows.

## **License**

Athena-3-14B is released under the MIT License, permitting broad use and distribution with proper attribution.

## **Contact**

- Email: maverick@aayanmishra.com