Spestly committed on
Commit
bdc5e8a
·
verified ·
1 Parent(s): 598f0f5

Update README.md

Files changed (1)
  1. README.md +99 -0
README.md CHANGED
@@ -11,4 +11,103 @@ license: apache-2.0
11
  language:
12
  - en
13
  ---
14
+ ![Header](Maverick.png)
15
+
16
+ # **Maverick-1-14B Model Card**
17
+
18
+ ## **Model Overview**
19
+
20
+ **Maverick-1-14B** is a 14.0-billion-parameter causal language model fine-tuned from Qwen2.5-14B-Instruct. It targets fluent, context-aware, and logically consistent output across general NLP and reasoning tasks, balancing instruction following with open-ended generation.
21
+
22
+ ## **Model Details**
23
+
24
+ - **Model Developer:** Aayan Mishra
25
+ - **Model Type:** Causal Language Model
26
+ - **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, Attention QKV bias, and tied word embeddings
27
+ - **Parameters:** 14.0 billion total (12.84 billion non-embedding)
28
+ - **Layers:** 40
29
+ - **Attention Heads:** 40 for query and 4 for key-value (Grouped Query Attention)
30
+ - **Vocabulary Size:** Approximately 151,646 tokens
31
+ - **Context Length:** Supports up to 131,072 tokens
32
+ - **Languages Supported:** More than 29 languages, with the strongest performance in English and Chinese
33
+ - **License:** MIT
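
The Grouped Query Attention entry above can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes the head counts exactly as listed in this card (40 query heads, 4 key-value heads) and assumes the contiguous grouping used by common GQA implementations, in which consecutive query heads share one key-value head. The function name `kv_head_for` is hypothetical and not part of any library.

```python
# Head counts as quoted in the model card above (not read from the checkpoint).
NUM_QUERY_HEADS = 40
NUM_KV_HEADS = 4


def kv_head_for(query_head: int,
                num_query_heads: int = NUM_QUERY_HEADS,
                num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Return the index of the key-value head that a query head attends with.

    Assumes contiguous grouping: query heads 0..g-1 share KV head 0,
    heads g..2g-1 share KV head 1, and so on, where g is the group size.
    """
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of key-value heads")
    if not 0 <= query_head < num_query_heads:
        raise IndexError("query head index out of range")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


group_size = NUM_QUERY_HEADS // NUM_KV_HEADS
print(group_size)                                  # 10 query heads per KV head
print([kv_head_for(h) for h in (0, 9, 10, 39)])    # [0, 0, 1, 3]
```

With these figures the key-value cache stores 4 heads per layer instead of 40, which is the main memory saving that Grouped Query Attention provides at long context lengths.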
34
+
35
+ ## **Training Details**
36
+
37
+ Maverick-1-14B was fine-tuned with the Unsloth framework on a single NVIDIA A100 GPU. Fine-tuning took approximately 90 minutes over 60 epochs on a curated instruction-tuning dataset. The model is tailored for generalist NLP performance, with a focus on reasoning, alignment, and fluency.
38
+
39
+ ## **Intended Use**
40
+
41
+ Maverick-1-14B is suited to a wide variety of tasks, including:
42
+
43
+ - **Instruction Following:** Handling complex prompts with step-by-step logical output
44
+ - **Writing Assistance:** Generating essays, emails, and coherent narratives
45
+ - **NLP Tasks:** Summarization, question answering, translation, and text classification
46
+ - **STEM Support:** Reasoning through academic and technical content
47
+
48
+ While Maverick-1-14B is a versatile model, it is not intended for safety-critical applications or the handling of private, sensitive information.
49
+
50
+ ## **How to Use**
51
+
52
+ To use Maverick-1-14B, install an up-to-date version of the `transformers` library, together with `accelerate`, which the `device_map="auto"` argument in the example below relies on:
53
+
54
+ ```bash
55
+ pip install --upgrade transformers accelerate
56
+ ```
57
+
58
+ Here's an example of how to load the Maverick-1-14B model and generate a response:
59
+
60
+ ```python
61
+ from transformers import AutoModelForCausalLM, AutoTokenizer
62
+
63
+ model_name = "Spestly/Maverick-1-14B"
64
+ model = AutoModelForCausalLM.from_pretrained(
65
+     model_name,
66
+     torch_dtype="auto",  # use the dtype stored in the checkpoint
67
+     device_map="auto",  # place the weights on the available device(s)
68
+ )
69
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
70
+
71
+ prompt = "Explain the concept of entropy in thermodynamics."
72
+ messages = [
73
+     {"role": "system", "content": "You are Maverick, an AI assistant designed to be helpful."},
74
+     {"role": "user", "content": prompt},
75
+ ]
76
+ text = tokenizer.apply_chat_template(  # render the chat as one prompt string
77
+     messages,
78
+     tokenize=False,
79
+     add_generation_prompt=True,
80
+ )
81
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
82
+ generated_ids = model.generate(
83
+     **model_inputs,
84
+     max_new_tokens=512,
85
+ )
86
+ generated_ids = [  # drop the prompt tokens, keep only the completion
87
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
88
+ ]
89
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
90
+ print(response)
91
+ ```
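
The slicing step in the example above may be easier to follow in isolation. For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the completion is whatever comes after the first `len(input_ids)` positions. The sketch below reproduces that step with plain Python lists; the helper name `strip_prompt` and the token IDs are made up for illustration and do not come from the model or its tokenizer.

```python
def strip_prompt(input_ids, output_ids):
    """Return only the tokens that were generated after the prompt.

    Assumes output_ids begins with the full prompt, as is the case for
    decoder-only generation.
    """
    return output_ids[len(input_ids):]


# Made-up token IDs standing in for a tokenized prompt and a full generation.
prompt_ids = [101, 102, 103]
full_output = [101, 102, 103, 7, 8, 9]

print(strip_prompt(prompt_ids, full_output))  # [7, 8, 9]
```

Decoding only this remainder keeps the system and user messages out of the printed response.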
92
+
93
+ ## **Limitations**
94
+
95
+ Users should be aware of the following limitations:
96
+
97
+ - **Biases:** Maverick-1-14B may reflect biases from its pretraining and fine-tuning data. Outputs should be reviewed for fairness and accuracy.
98
+ - **Knowledge Cutoff:** The model's knowledge is current as of August 2024.
99
+ - **Multilingual Performance:** Performance varies by language and is strongest in English.
100
+
101
+ ## **Acknowledgements**
102
+
103
+ Maverick-1-14B builds upon Qwen2.5-14B-Instruct. Special thanks to the open-source ecosystem and to Unsloth for enabling efficient fine-tuning workflows.
104
+
105
+ ## **License**
106
+
107
+ Maverick-1-14B is released under the MIT License, permitting broad use and distribution with proper attribution.
108
+
109
+ ## **Contact**
110
+
111
+ - Email: maverick@aayanmishra.com
112
+
113