---
base_model:
- Qwen/Qwen2.5-14B-Instruct
license: mit
language:
- en
- zh
- fr
- es
- pt
- de
- it
- ru
- ja
- ko
- vi
- th
- ar
- fa
- he
- tr
- cs
- pl
- hi
- bn
- ur
- id
- ms
- lo
- my
- ceb
- km
- tl
- nl
tags:
- chemistry
- biology
- code
- text-generation-inference
- STEM
- unsloth
- transformers
- qwen2
- trl
---
<div align="center">
<span style="font-family: default; font-size: 1.5em;">Athena-3</span>
<div>
Faster, Sharper, Smarter than Athena 1 and Athena 2
</div>
</div>
<br>
<div align="center" style="line-height: 1;">
<a href="https://github.com/Aayan-Mishra/Maverick-Search" style="margin: 2px;">
<img alt="Github Page" src="https://img.shields.io/badge/Toolkit-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://aayanmishra.com/blog/athena-3" target="_blank" style="margin: 2px;">
<img alt="Blogpost" src="https://img.shields.io/badge/Blogpost-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/Spestly/Athena-3-14B" style="margin: 2px;">
<img alt="HF Page" src="https://img.shields.io/badge/Athena-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
## **Athena-3**
*Athena generated this model card!*
**Athena-3-14B** is a 14-billion-parameter causal language model fine-tuned from Qwen2.5-14B-Instruct. It is designed to produce fluent, contextually aware, and logically sound outputs across a broad range of NLP and reasoning tasks, balancing instruction-following with generative flexibility.
## **Model Details**
- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and Attention QKV bias (inherited from Qwen2.5-14B)
- **Parameters:** 14.7 billion total (13.1 billion non-embedding)
- **Layers:** 48
- **Attention Heads:** 40 for query and 8 for key-value (Grouped Query Attention)
- **Vocabulary Size:** Approximately 151,646 tokens
- **Context Length:** Supports up to 131,072 tokens (see the long-context note after this list)
- **Languages Supported:** Over 29 languages, with the strongest performance in English, Chinese, and multilingual instruction-following tasks
- **License:** MIT
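Note on long context: checkpoints inherited from Qwen2.5 typically ship with a 32,768-token window in `config.json`, and reaching the full 131,072-token context relies on YaRN rope scaling, as documented for the Qwen2.5 base models. A minimal `config.json` addition, taken from the Qwen2.5 documentation (verify against the values actually shipped with this checkpoint):

```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```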
## **Training Details**
Athena-3-14B was fine-tuned with the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning run took approximately 90 minutes over 60 epochs on a curated instruction-tuned dataset, targeting generalist NLP performance with a focus on reasoning, alignment, and fluency.
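As a rough illustration, here is a minimal sketch of a comparable Unsloth LoRA setup. The hyperparameters shown (sequence length, 4-bit loading, LoRA rank and target modules) are illustrative assumptions, not the exact values used to train Athena-3-14B:

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit so it fits comfortably on a single A100.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-14B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these low-rank matrices are updated during training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```

Training itself is then typically driven by TRL's `SFTTrainer` over the instruction dataset.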
## **Intended Use**
Athena-3-14B is ideal for a wide variety of tasks, including:
- **Instruction Following:** Handling complex prompts with step-by-step logical output
- **Writing Assistance:** Generating essays, emails, and coherent narratives
- **NLP Tasks:** Summarization, question answering, translation, and text classification
- **STEM Support:** Reasoning through academic and technical content
While Athena-3-14B is a versatile model, it is not intended for safety-critical applications or the handling of private, sensitive information.
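As a quick illustration of one of these tasks, the following sketch runs a summarization prompt through the `transformers` pipeline API (installation steps are in the next section; the prompt and generation settings here are illustrative, not tuned values):

```python
from transformers import pipeline

# The text-generation pipeline applies the model's chat template automatically
# when given a list of chat messages.
pipe = pipeline(
    "text-generation",
    model="Spestly/Athena-3-14B",
    torch_dtype="auto",
    device_map="auto",
)

article = "..."  # any passage you want condensed
messages = [{"role": "user", "content": f"Summarize in two sentences:\n\n{article}"}]
result = pipe(messages, max_new_tokens=128)

# The pipeline returns the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```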
## **How to Use**
To use Athena-3-14B, make sure you have recent versions of the `transformers` and `accelerate` libraries installed (`accelerate` is required for `device_map="auto"` in the example below):
```bash
pip install transformers accelerate
```
Here's an example of how to load the Athena-3-14B model and generate a response:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Athena-3-14B"

# Load the model and tokenizer; device_map="auto" spreads the weights across
# available GPUs, and torch_dtype="auto" uses the checkpoint's native dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Maverick, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into the model's expected prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
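For interactive use, you can stream tokens to stdout as they are generated instead of waiting for the full completion. A minimal sketch using `TextStreamer` from `transformers`, reusing `model`, `tokenizer`, and `model_inputs` from the example above:

```python
from transformers import TextStreamer

# Prints decoded tokens as they are produced; skip_prompt avoids echoing
# the input prompt back to the console.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **model_inputs,
    max_new_tokens=512,
    streamer=streamer,
)
```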
### **Maverick Search usage**
To use this model with Maverick Search, please refer to this [repository](https://github.com/Aayan-Mishra/Maverick-Search).
## **Limitations**
Users should be aware of the following limitations:
- **Biases:** Athena-3-14B may reflect biases from its pretraining and fine-tuning data. Outputs should be reviewed for fairness and accuracy.
- **Knowledge Cutoff:** The model's knowledge is current as of August 2024.
- **Multilingual Performance:** Performance varies by language, with strongest capabilities in English and aligned datasets.
## **Acknowledgements**
Athena-3-14B builds upon the Qwen2.5-14B foundation. Special thanks to the open-source ecosystem and Unsloth for enabling efficient fine-tuning workflows.
## **License**
Athena-3-14B is released under the MIT License, permitting broad use and distribution with proper attribution.
## **Contact**
- Email: maverick@aayanmishra.com |