---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
license: mit
language:
- en
- zh
- fr
- es
- pt
- de
- it
- ru
- ja
- ko
- vi
- th
- ar
- fa
- he
- tr
- cs
- pl
- hi
- bn
- ur
- id
- ms
- lo
- my
- ceb
- km
- tl
- nl
library_name: transformers
extra_gated_prompt: >-
  By accessing this model, you agree to comply with ethical usage guidelines and
  accept full responsibility for its applications. You will not use this model
  for harmful, malicious, or illegal activities, and you understand that its use
  is subject to ongoing monitoring for misuse. The model is provided 'AS IS',
  and by agreeing you accept responsibility for all outputs you generate.
extra_gated_fields:
  Name: text
  Organization: text
  Country: country
  Date of Birth: date_picker
  Intended Use:
    type: select
    options:
    - Research
    - Education
    - Personal Development
    - Commercial Use
    - label: Other
      value: other
  I agree to use this model in accordance with all applicable laws and ethical guidelines: checkbox
  I agree to use this model under the MIT license: checkbox
---
<div align="center">
<span style="font-family: default; font-size: 1.5em;">Athena-R3</span>
<div>
🚀 Athena-R3: Think Deeper. Solve Smarter. 🤔 
</div>
</div>
<br>
<div align="center" style="line-height: 1;">
  <a href="https://github.com/Aayan-Mishra/Maverick-Search" style="margin: 2px;">
    <img alt="Github Page" src="https://img.shields.io/badge/Toolkit-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://aayanmishra.com/blog/athena-3" target="_blank" style="margin: 2px;">
    <img alt="Blogpost" src="https://img.shields.io/badge/Blogpost-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/Spestly/Athena-R3-1.5B" style="margin: 2px;">
    <img alt="HF Page" src="https://img.shields.io/badge/Athena-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

*Generated by Athena-3!*

## **Model Overview**

**Athena-R3-1.5B** is a 1.5-billion-parameter causal language model fine-tuned from [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B). This model is specifically tailored to enhance reasoning capabilities, making it adept at handling complex problem-solving tasks and providing coherent, contextually relevant responses.

## **Model Details**

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and Attention QKV bias
- **Parameters:** 1.5 billion total
- **Layers:** 24
- **Attention Heads:** 16 for query and 2 for key-value (Grouped Query Attention)
- **Vocabulary Size:** 151,646 tokens
- **Context Length:** Up to 128,000 tokens
- **Languages Supported:** Primarily English, with varying capability in the other languages listed in the metadata above
- **License:** MIT
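
To illustrate the memory benefit of Grouped Query Attention, the ratio of query heads to key-value heads determines how much smaller the KV cache is compared to full multi-head attention. A minimal sketch using the head counts above (the helper function is illustrative, not part of any library):

```python
# Grouped Query Attention (GQA): key/value projections are shared across
# groups of query heads, shrinking the KV cache proportionally.

def kv_cache_reduction(num_query_heads: int, num_kv_heads: int) -> float:
    """Factor by which GQA shrinks the KV cache vs. full multi-head attention."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be divisible by KV heads")
    return num_query_heads / num_kv_heads

# With 16 query heads and 2 KV heads, each KV head serves a group of
# 8 query heads, so the KV cache is 8x smaller than with full MHA.
print(kv_cache_reduction(16, 2))  # → 8.0
```

This reduction matters most for long contexts, since KV-cache memory grows linearly with sequence length.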

## **Training Details**

Athena-R3-1.5B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning process involved 60 epochs over approximately 90 minutes, utilizing a curated dataset focused on reasoning tasks, including mathematical problem-solving and logical inference. This approach aimed to bolster the model's proficiency in complex reasoning and analytical tasks.

## **Intended Use**

Athena-R3-1.5B is designed for a variety of applications, including but not limited to:

- **Advanced Reasoning:** Assisting with complex problem-solving and logical analysis.
- **Academic Support:** Providing explanations and solutions for mathematical and scientific queries.
- **General NLP Tasks:** Engaging in text completion, summarization, and question-answering tasks.
- **Data Interpretation:** Offering insights and explanations for data-centric inquiries.

While Athena-R3-1.5B is a powerful tool for various applications, it is not intended for real-time, safety-critical systems or for processing sensitive personal information.

## **How to Use**

To utilize Athena-R3-1.5B, ensure that you have the latest version of the `transformers` library installed:

```bash
pip install transformers
```

Here's an example of how to load the Athena-R3-1.5B model and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Athena-R3-1.5B"

# Load the weights and tokenizer, letting transformers pick the dtype
# and device placement automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Athena, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt}
]

# Render the chat with the model's template and append the assistant
# turn marker so generation starts a fresh reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
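
Because Athena-R3 is distilled from a DeepSeek-R1 model, its replies typically wrap chain-of-thought reasoning in `<think> ... </think>` tags before the final answer. A small helper like the following (illustrative, not part of `transformers`; it assumes the tags appear at most once per reply) can separate the two:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style reply into (reasoning, final_answer).

    Returns an empty reasoning string if no <think> block is present.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>Entropy counts microstates...</think>Entropy quantifies disorder."
)
print(answer)  # → Entropy quantifies disorder.
```

This is useful when you want to display or log the reasoning trace separately from the user-facing answer.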

## **Limitations**

Users should be aware of the following limitations:

- **Biases:** Athena-R3-1.5B may exhibit biases present in its training data. Users should critically assess outputs, especially in sensitive contexts.
- **Knowledge Cutoff:** The model's knowledge is current up to August 2024. It may not be aware of events or developments occurring after this date.
- **Language Support:** While the model supports multiple languages, performance is strongest in English.

## **Acknowledgements**

Athena-R3-1.5B builds upon the work of the DeepSeek team, particularly the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model. Gratitude is also extended to the open-source AI community for their contributions to tools and frameworks that facilitated the development of Athena-R3-1.5B.

## **License**

Athena-R3-1.5B is released under the MIT License, permitting wide usage with proper attribution.

## **Contact**

- Email: athena@aayanmishra.com