---
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
license: other
license_name: qwen-research
license_link: https://huggingface.co/Spestly/Athena-1-3B/blob/main/LICENSE
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
![Header](https://raw.githubusercontent.com/Aayan-Mishra/Images/refs/heads/main/Athena.png)

# Athena-1 3B

Athena-1 3B is a fine-tuned, instruction-following large language model derived from [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct). It is designed to provide efficient, high-quality text generation in a compact size. Athena-1 3B is optimized for lightweight applications, conversational AI, and structured-data tasks, making it well suited to real-world use cases where performance and resource efficiency are critical.

---

## Key Features

### ⚡ Lightweight and Efficient
- **Compact Size**: At just **3.09 billion parameters**, Athena-1 3B offers excellent performance with reduced computational requirements.
- **Instruction Following**: Fine-tuned for precise and reliable adherence to user prompts.
- **Coding and Mathematics**: Proficient in solving coding challenges and handling mathematical tasks.

### 📖 Long-Context Understanding
- **Context Length**: Supports up to **32,768 tokens**, enabling the processing of moderately lengthy documents or conversations.
- **Token Generation**: Can generate up to **8K tokens** of output.
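
When working near these limits, it helps to budget tokens before calling the model. A minimal sketch of such a check (the 4-characters-per-token ratio is a rough assumption for English text; for exact counts, tokenize the prompt with the model's own tokenizer):

```python
CONTEXT_LIMIT = 32_768   # maximum context length supported by the model
OUTPUT_BUDGET = 8_192    # ~8K-token output ceiling from the model card

def fits_in_context(prompt: str, max_new_tokens: int = OUTPUT_BUDGET) -> bool:
    """Rough check that a prompt plus its reply fits the context window.

    Assumes ~4 characters per token, which is only an estimate;
    use the model's tokenizer for an exact count.
    """
    estimated_prompt_tokens = len(prompt) // 4 + 1
    return estimated_prompt_tokens + max_new_tokens <= CONTEXT_LIMIT
```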

### 🌍 Multilingual Support
- Supports **29+ languages**, including:
  - English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
  - Japanese, Korean, Vietnamese, Thai, Arabic, and more.

### 📊 Structured Data & Outputs
- **Structured Data Interpretation**: Processes structured formats such as tables and JSON.
- **Structured Output Generation**: Generates well-formatted outputs, including JSON and other structured formats.
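
In practice, model responses sometimes wrap JSON in code fences or surrounding prose, so a small post-processing helper makes structured-output handling more robust. This is a hypothetical sketch, not part of the model or the transformers library:

```python
import json
import re

def extract_json(text: str):
    """Pull the first JSON object out of a model response.

    Handles responses wrapped in markdown code fences or surrounded
    by explanatory prose. Returns None if no valid JSON is found.
    """
    # Prefer a fenced block like ```json ... ```
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    candidates = [fenced.group(1)] if fenced else []
    # Fall back to the widest brace-delimited span
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        candidates.append(brace.group(0))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```

Calling `json.loads` directly would fail on fenced output; the helper tries the fenced candidate first, then falls back to the widest brace-delimited span.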

---

## Model Details

- **Base Model**: [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings.
- **Parameters**: 3.09B total (2.77B non-embedding).
- **Layers**: 36
- **Attention Heads**: 16 for Q, 2 for KV (grouped-query attention).
- **Context Length**: Up to **32,768 tokens**.

---

## Applications

Athena-1 3B is designed for a variety of real-world applications:
- **Conversational AI**: Build fast, responsive, and lightweight chatbots.
- **Code Generation**: Generate, debug, or explain code snippets.
- **Mathematical Problem Solving**: Assist with calculations and reasoning.
- **Document Processing**: Summarize and analyze moderately large documents.
- **Multilingual Applications**: Support global use cases with diverse language requirements.
- **Structured Data**: Process and generate structured data, such as tables and JSON.

---

## Quickstart

Here's how you can use Athena-1 3B for quick text generation:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Spestly/Athena-1-3B")
print(pipe(messages))

# Or load the model directly for finer control
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-3B")
model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-3B")

# Apply the chat template, generate, and decode only the new tokens
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```