Spestly committed on
Commit 8ac20f8 · verified · 1 Parent(s): 770bb1d

Update README.md

Files changed (1)
  1. README.md +82 -78
README.md CHANGED
@@ -122,131 +122,135 @@ language:
  - sw

  ---
- ![Header](./Nous-V1-Banner.png)
- # Nous-V1 8B

- ## Overview

- **Nous-V1 8B** is a cutting-edge 8-billion-parameter language model developed by Apexion AI, based on the architecture of [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). Designed for versatility across diverse NLP tasks, Nous-V1 8B delivers strong performance in conversational AI, knowledge reasoning, code generation, and content creation.

- **Key Features:**

- - **⚡ Efficient 8B Parameter Scale:** Balances model capability with practical deployment on modern hardware
- - **🧠 Enhanced Contextual Understanding:** Supports a 128k-token context window, enabling complex multi-turn conversations and document analysis
- - **🌐 Multilingual & Multi-domain:** Trained on a diverse dataset for broad language and domain coverage
- - **🤖 Instruction-Following & Adaptability:** Fine-tuned to respond accurately and adaptively across tasks
- - **🚀 Optimized Inference:** Suitable for GPU environments such as NVIDIA A100, T4, and P100 for low-latency applications
  ---

- ## Why Choose Nous-V1 8B?

- While larger models can offer more raw power, Nous-V1 8B strikes a practical balance — optimized for deployment efficiency without significant compromise on language understanding or generation quality. It’s ideal for applications requiring:

- - Real-time conversational agents
- - Code completion and programming assistance
- - Content generation and summarization
- - Multilingual natural language understanding

  ---

- ## 🖥️ How to Run Locally

- You can easily integrate Nous-V1 8B via the Hugging Face Transformers library or deploy it on popular serving platforms.

- ### Using Hugging Face Transformers
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "apexion-ai/Nous-1-8B"
-
- # load the tokenizer and the model
- tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
  )
-
- # prepare the model input
- prompt = "Give me a short introduction to large language models."
  messages = [
-     {"role": "user", "content": prompt}
  ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True,
-     enable_thinking=True  # switches between thinking and non-thinking modes; default is True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- # conduct text completion
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=32768
- )
- output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
-
- # parse out the thinking content
- try:
-     # rindex finding 151668 (</think>)
-     index = len(output_ids) - output_ids[::-1].index(151668)
- except ValueError:
-     index = 0
-
- thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
- content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
-
- print("thinking content:", thinking_content)
- print("content:", content)
- ```

- ### Deployment Options

- - Compatible with [vLLM](https://github.com/vllm-project/vllm) for efficient serving
- - Works with [llama.cpp](https://github.com/ggerganov/llama.cpp) for lightweight inference

  ---

- ## Recommended Sampling Parameters

- ```yaml
- Temperature: 0.7
- Top-p: 0.9
- Top-k: 40
- Min-p: 0.0
- ```

  ---

- ## FAQ

- - **Q:** Can I fine-tune Nous-V1 8B on my custom data?
-   **A:** Yes, the model supports fine-tuning workflows via Hugging Face Trainer or custom scripts.

- - **Q:** What hardware is recommended?
-   **A:** NVIDIA GPUs with at least 16 GB of VRAM (e.g., A100, RTX 3090) are optimal for inference and fine-tuning.

- - **Q:** Is the model safe to use for production?
-   **A:** Nous-V1 8B includes safety mitigations but should be used with human oversight and proper filtering for sensitive content.

  ---

- ## 📄 Citation

  ```bibtex
- @misc{apexion2025nousv18b,
-   title={Nous-V1 8B: Efficient Large Language Model for Versatile NLP Applications},
-   author={Apexion AI Team},
    year={2025},
-   url={https://huggingface.co/apexion-ai/Nous-V1-8B}
  }
  ```

  ---

- *Nous-V1 8B — Powering practical AI applications with intelligent language understanding.*
+ # Apollo-1-8B

+ [![Model](https://img.shields.io/badge/Model-Apollo--1--8B-blue)](https://huggingface.co/NoemaResearch/Apollo-1-8B)
+ [![Base](https://img.shields.io/badge/Base-Qwen3--8B-green)](https://huggingface.co/Qwen/Qwen3-8B)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-yellow)](LICENSE)

+ Apollo-1-8B is an **8-billion-parameter instruction-tuned model** developed by **Noema Research**.
+ It is based on [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) and optimized for **advanced reasoning, instruction following, and high-performance deployment**.

+ This model is the **large-scale member** of the Apollo series, balancing strong reasoning capabilities with efficiency for multi-domain applications.

  ---

+ ## Model Overview

+ * **Base model:** `Qwen3-8B`
+ * **Architecture:** Decoder-only transformer
+ * **Parameters:** \~8B
+ * **Context length:** up to 32k tokens (inherits Qwen3 long-context support)
+ * **Domain:** General-purpose reasoning, instruction following, and code generation
+ * **Primary applications:**
+   * Advanced conversational AI
+   * Multi-step reasoning and problem solving
+   * Knowledge assistants and tutoring systems
+   * Software development and code generation
+ * **License:** anvdl-1.0

  ---

+ ## Key Features

+ * **Instruction tuning** for reliable multi-step reasoning and task completion
+ * **Extended reasoning depth** compared to Apollo-1-4B for complex queries
+ * **Long-context handling**, inherited from the Qwen3 architecture
+ * **Multilingual coverage**, supporting diverse languages and domains
+ * **Balanced resource requirements**, deployable on high-end consumer hardware and cloud GPUs

+ ---
+ ## Usage

+ The model is available in the Hugging Face Transformers format. Example:

+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "NoemaResearch/Apollo-1-8B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True
  )

  messages = [
+     {"role": "system", "content": "You are Apollo, a reasoning assistant."},
+     {"role": "user", "content": "Explain the differences between supervised, unsupervised, and reinforcement learning with examples."}
  ]
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.9)
+ # decode only the newly generated tokens, skipping the prompt
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+ ```
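Since Apollo-1-8B inherits Qwen3's thinking mode, a generation may begin with a reasoning block terminated by a `</think>` token (id 151668 in the Qwen3 tokenizer, as used in the Nous-V1 snippet removed by this commit). A minimal sketch of splitting that block out of a raw token list:

```python
# Sketch: separate Qwen3-style "thinking" tokens from the final answer.
# Assumes 151668 is the </think> token id, as in the Qwen3 tokenizer.
THINK_END_ID = 151668

def split_thinking(output_ids):
    """Return (thinking_ids, content_ids), split after the last </think>."""
    try:
        # Search from the end: position just past the last </think> token.
        index = len(output_ids) - output_ids[::-1].index(THINK_END_ID)
    except ValueError:
        index = 0  # no thinking block was emitted
    return output_ids[:index], output_ids[index:]
```

Decode each half with `tokenizer.decode(..., skip_special_tokens=True)` to recover the reasoning trace and the user-facing answer separately.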
 
 
 
+ **Recommended settings:**

+ * `temperature=0.4–0.8`
+ * `top_p=0.9–0.95`
+ * Lower temperatures yield more factual and concise answers
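As a small illustrative sketch (the helper name is hypothetical, not part of this repo), the recommended settings map onto `generate()` keyword arguments like so, clamping the temperature into the suggested band:

```python
# Hypothetical helper: build generate() kwargs from the recommended ranges.
# The clamp keeps temperature inside the suggested 0.4-0.8 band.
def sampling_kwargs(temperature=0.6, top_p=0.9):
    temperature = min(max(temperature, 0.4), 0.8)
    return {"temperature": temperature, "top_p": top_p, "do_sample": True}
```

Used, for example, as `model.generate(**inputs, max_new_tokens=1024, **sampling_kwargs(0.5))`.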
 
+ ---

+ ## Evaluation

+ Apollo-1-8B demonstrates stronger reasoning and instruction-following capabilities relative to Apollo-1-4B, with internal evaluations indicating:

+ * Higher accuracy on complex multi-step reasoning tasks
+ * More robust **instruction adherence**
+ * Reduced **hallucinations** in factual and structured outputs
+ * High efficiency for large-context tasks

+ A full benchmark report will be provided in a future update.
+ For upstream performance details, see the [Qwen3-8B model card](https://huggingface.co/Qwen/Qwen3-8B).

  ---
+ ## Limitations

+ * **Reasoning scale**: While improved, Apollo-1-8B cannot match ultra-large models (14B+) on extremely complex or open-ended tasks
+ * **Knowledge breadth**: Some highly specialized or niche knowledge may be limited
+ * **Hallucinations**: May generate plausible but incorrect information
+ * **Prompt sensitivity**: Outputs remain dependent on careful prompt formulation

  ---

+ ## Responsible Use

+ * Do not rely on Apollo-1-8B for critical decisions without human oversight
+ * Verify outputs before applying them in factual, legal, or safety-critical contexts
+ * Avoid providing personal or sensitive data in prompts
+ * The model should not be used to generate unsafe, harmful, or disallowed content

+ ---
+ ## Model Variants

+ * **Full precision (safetensors)** — research and high-fidelity inference
+ * **bf16 / fp16** — efficient inference on modern accelerators
+ * **Quantized versions (int8 / int4)** — deployment in resource-constrained environments

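As an illustrative sketch only (the helper and mapping are hypothetical, not an official API of this repo), the variant choices above correspond roughly to `from_pretrained` keyword arguments, with the int8/int4 paths assuming the bitsandbytes integration in Transformers:

```python
# Hypothetical mapping from the variant list above to Hugging Face
# from_pretrained keyword arguments. The int8/int4 entries assume the
# bitsandbytes integration; "full" loads the safetensors weights as-is.
VARIANT_KWARGS = {
    "full": {"torch_dtype": "auto"},
    "bf16": {"torch_dtype": "bfloat16"},
    "fp16": {"torch_dtype": "float16"},
    "int8": {"load_in_8bit": True},
    "int4": {"load_in_4bit": True},
}

def loading_kwargs(variant="bf16"):
    """Return keyword arguments for the chosen variant (illustrative helper)."""
    return {"device_map": "auto", **VARIANT_KWARGS[variant]}
```

Newer Transformers releases prefer passing a `BitsAndBytesConfig` via `quantization_config` for the quantized paths; the flags above are the older shorthand.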
  ---

+ ## Citation
+
+ If you use this model, please cite both Apollo-1-8B and the Qwen3 base model:

  ```bibtex
+ @misc{noema2025apollo8b,
+   title={Apollo-1-8B},
+   author={Noema Research},
    year={2025},
+   howpublished={\url{https://huggingface.co/NoemaResearch/Apollo-1-8B}}
  }
  ```

  ---

+ ## Acknowledgements
+
+ Apollo-1-8B builds upon the [Qwen3](https://huggingface.co/Qwen) family of models.
+ We thank the Qwen team for open-sourcing their models and enabling derivative research.