Spestly committed on
Commit 79144eb · verified · 1 Parent(s): ddb768e

Update README.md

Files changed (1): README.md (+87 −78)

README.md CHANGED
@@ -122,131 +122,140 @@ language:
  - sw

  ---
- ![Header](./Nous-V1-Banner.png)
- # Nous-V1 4B
-
- ## Overview
-
- **Nous-V1 4B** is a cutting-edge 4 billion parameter language model developed by Apexion AI, based on the architecture of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). Designed for versatility across diverse NLP tasks, Nous-V1 4B delivers strong performance in conversational AI, knowledge reasoning, code generation, and content creation.
-
- **Key Features:**
-
- - **⚡ Efficient 4B Parameter Scale:** Balances model capability with practical deployment on modern hardware
- - **🧠 Enhanced Contextual Understanding:** Supports a 128k token context window, enabling complex multi-turn conversations and document analysis
- - **🌐 Multilingual & Multi-domain:** Trained on a diverse dataset for broad language and domain coverage
- - **🤖 Instruction-Following & Adaptability:** Fine-tuned to respond accurately and adaptively across tasks
- - **🚀 Optimized Inference:** Suitable for GPU environments such as NVIDIA A100, T4, and P100 for low-latency applications
  ---

- ## Why Choose Nous-V1 4B?
-
- While larger models can offer more raw power, Nous-V1 4B strikes a practical balance — optimized for deployment efficiency without significant compromise on language understanding or generation quality. It’s ideal for applications requiring:
-
- - Real-time conversational agents
- - Code completion and programming assistance
- - Content generation and summarization
- - Multilingual natural language understanding

  ---

- ## 🖥️ How to Run Locally
-
- You can easily integrate Nous-V1 4B via the Hugging Face Transformers library or deploy it on popular serving platforms.
-
- ### Using Hugging Face Transformers
  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "apexion-ai/Nous-1-4B"
-
- # load the tokenizer and the model
- tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
  )
-
- # prepare the model input
- prompt = "Give me a short introduction to large language model."
  messages = [
-     {"role": "user", "content": prompt}
  ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True,
-     enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- # conduct text completion
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=32768
- )
- output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
-
- # parsing thinking content
- try:
-     # rindex finding 151668 (</think>)
-     index = len(output_ids) - output_ids[::-1].index(151668)
- except ValueError:
-     index = 0
-
- thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
- content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
-
- print("thinking content:", thinking_content)
- print("content:", content)
- ```
- ### Deployment Options
-
- - Compatible with [vLLM](https://github.com/vllm-project/vllm) for efficient serving
- - Works with [llama.cpp](https://github.com/ggerganov/llama.cpp) for lightweight inference

  ---

- ## Recommended Sampling Parameters
-
- ```yaml
- Temperature: 0.7
- Top-p: 0.9
- Top-k: 40
- Min-p: 0.0
- ```

  ---
- ## FAQ
-
- - **Q:** Can I fine-tune Nous-V1 4B on my custom data?
-   **A:** Yes, the model supports fine-tuning workflows via Hugging Face Trainer or custom scripts.
-
- - **Q:** What hardware is recommended?
-   **A:** NVIDIA GPUs with at least 16GB VRAM (e.g., A100, 3090) are optimal for inference and fine-tuning.
-
- - **Q:** Is the model safe to use for production?
-   **A:** Nous-V1 4B includes safety mitigations but should be used with human oversight and proper filtering for sensitive content.

  ---
- ## 📄 Citation

  ```bibtex
- @misc{apexion2025nousv14b,
-   title={Nous-V1 4B: Efficient Large Language Model for Versatile NLP Applications},
-   author={Apexion AI Team},
  year={2025},
-   url={https://huggingface.co/apexion-ai/Nous-V1-4B}
  }
  ```

  ---

- *Nous-V1 4B — Powering practical AI applications with intelligent language understanding.*
  - sw

  ---
+ # Apollo-1-4B
+
+ [![Model](https://img.shields.io/badge/Model-Apollo--1--4B-blue)](https://huggingface.co/NoemaResearch/Apollo-1-4B)
+ [![Base](https://img.shields.io/badge/Base-Qwen3--4B-green)](https://huggingface.co/Qwen/Qwen3-4B)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-yellow)](LICENSE)
+
+ Apollo-1-4B is a **4 billion parameter instruction-tuned model** developed by **Noema Research**.
+ It is based on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) and optimized for **reasoning, instruction following, and lightweight deployment at scale**.
+
+ This model represents the **mid-size member** of the Apollo series, balancing performance and efficiency for a broad range of use cases.

  ---
+ ## Model Overview
+
+ - **Base model:** `Qwen3-4B`
+ - **Architecture:** Decoder-only transformer
+ - **Parameters:** ~4B
+ - **Context length:** up to 32k tokens (inherits Qwen3 long-context support)
+ - **Domain:** General-purpose reasoning and instruction following
+ - **Primary applications:**
+   - Conversational AI
+   - Multi-step reasoning tasks
+   - Education and tutoring systems
+   - Knowledge assistants and prototyping agents
+ - **License:** Apache 2.0

  ---
+ ## Key Features
+
+ - **Instruction tuning** for consistent conversational and task-oriented responses
+ - **Improved reasoning depth** compared to Apollo-1-2B, enabling stronger performance on complex queries
+ - **Long-context handling**, inherited from the Qwen3 architecture
+ - **Multilingual coverage**, retaining broad knowledge across languages
+ - **Balanced resource requirements**, deployable on high-end consumer hardware and cloud GPUs
+
+ ---
+
+ ## Usage
+
+ The model is available in Hugging Face Transformers format. Example:
  ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "NoemaResearch/Apollo-1-4B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True
  )

  messages = [
+     {"role": "system", "content": "You are Apollo, a helpful reasoning assistant."},
+     {"role": "user", "content": "Summarize the main differences between reinforcement learning and supervised learning."}
  ]

+ # return_dict=True makes apply_chat_template return input_ids and attention_mask,
+ # so the result can be unpacked into generate(); do_sample=True enables sampling.
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=768, do_sample=True, temperature=0.6, top_p=0.9)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
+ **Recommended settings:**
+
+ * `temperature=0.4–0.8`
+ * `top_p=0.9–0.95`
+ * Lower temperatures yield more factual and concise answers
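To make the recommended settings concrete, here is a minimal, model-free sketch of what `temperature` and `top_p` control during sampling. The function name and toy logits are illustrative, not part of the model's API; real decoders apply the same two steps to the model's vocabulary-sized logits.

```python
import math

def sample_filter(logits, temperature=0.7, top_p=0.9):
    """Toy illustration of temperature scaling + nucleus (top-p) filtering.

    Returns the renormalized probabilities of the tokens that survive
    top-p truncation; sampling would then draw from this distribution.
    """
    # 1. Temperature: divide logits before softmax; <1.0 sharpens, >1.0 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Top-p: keep the smallest set of tokens whose cumulative mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # 3. Renormalize over the kept set.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]
# At low temperature the distribution sharpens, so fewer tokens survive the cutoff.
print(sorted(sample_filter(logits, temperature=0.4, top_p=0.9)))  # [0]
print(sorted(sample_filter(logits, temperature=1.0, top_p=0.9)))  # [0, 1, 2]
```

This is why the lower end of the recommended range gives more deterministic, factual-sounding output: the candidate pool shrinks toward the single most likely token.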
+ ---
+
+ ## Evaluation
+
+ Apollo-1-4B demonstrates stronger reasoning capabilities relative to Apollo-1-2B, with internal evaluations indicating:
+
+ * Higher accuracy on step-by-step reasoning tasks
+ * More robust **instruction adherence**
+ * Reduced **hallucinations** in factual settings
+ * Effective balance between performance and efficiency
+
+ A full benchmark report will be provided in a future update.
+ For upstream performance details, see the [Qwen3-4B model card](https://huggingface.co/Qwen/Qwen3-4B).
  ---

+ ## Limitations
+
+ * **Reasoning scale:** While improved, Apollo-1-4B cannot match larger models (14B+) on complex or open-ended tasks
+ * **Knowledge breadth:** Some specialized or domain-specific knowledge remains limited
+ * **Hallucinations:** May generate plausible but incorrect information
+ * **Prompt sensitivity:** Outputs remain dependent on careful prompt formulation
  ---

+ ## Responsible Use
+
+ * Do not rely on Apollo-1-4B for critical decisions without human oversight
+ * Verify outputs before applying them in factual, legal, or safety-critical contexts
+ * Avoid providing personal or sensitive data in prompts
+ * The model should not be used to generate unsafe, harmful, or disallowed content
+
+ ---
+ ## Model Variants
+
+ * **Full precision (safetensors)** — research and high-fidelity inference
+ * **bf16 / fp16** — efficient inference on modern accelerators
+ * **Quantized versions (int8 / int4)** — deployment in resource-constrained environments
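The int8/int4 trade-off can be illustrated with a standalone sketch of symmetric per-tensor int8 quantization. This is illustrative only; released quantized variants may use different schemes (e.g. GPTQ or AWQ with per-group scales).

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.003, 0.9991]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Each weight is stored in 1 byte instead of 4 (fp32) or 2 (bf16),
# at the cost of at most half a quantization step of error per weight.
max_err = max(abs(w - a) for w, a in zip(weights, approx))
print(q)        # int8 codes, e.g. [52, -127, 0, 100]
print(max_err <= scale / 2 + 1e-9)
```

int4 halves the storage again but quadruples the step size, which is why quantized variants trade a little accuracy for much smaller memory footprints.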
  ---

+ ## Citation
+
+ If you use this model, please cite both Apollo-1-4B and the Qwen3 base model:

  ```bibtex
+ @misc{noema2025apollo4b,
+   title={Apollo-1-4B},
+   author={Noema Research},
  year={2025},
+   howpublished={\url{https://huggingface.co/NoemaResearch/Apollo-1-4B}}
  }
  ```
  ---

+ ## Acknowledgements
+
+ Apollo-1-4B builds upon the [Qwen3](https://huggingface.co/Qwen) family of models.
+ We thank the Qwen team for open-sourcing their models and enabling derivative research.
+
+ ---