Release: 32K Context Variant
README.md CHANGED
@@ -10,14 +10,16 @@ tags:
 - code
 - enterprise
 - 0.6b
+- long-context
+base_model: Qwen/Qwen3-0.6B
 library_name: transformers
 ---

 # DeepBrainz-R1-0.6B

-**DeepBrainz-R1-0.6B** is a compact, high-performance reasoning model engineered by **DeepBrainz AI & Labs**.
+**DeepBrainz-R1-0.6B** is a compact, high-performance reasoning model engineered by **DeepBrainz AI & Labs**. It is part of the **DeepBrainz-R1 Series**, designed to deliver frontier-class reasoning capabilities at cost-effective parameter sizes.

-This
+This variant features a **32,768-token context window**, optimized for processing medium-to-long documents and codebases.

 ---

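The headline change in this commit is the **32,768-token context window**, so it is worth showing how to verify it. A minimal sketch, assuming the checkpoint lives at the hub id `DeepBrainz/DeepBrainz-R1-0.6B` (the commit names the model but not its repo path) and that the config follows its Qwen3 base model:

```python
from transformers import AutoConfig, AutoTokenizer

MODEL_ID = "DeepBrainz/DeepBrainz-R1-0.6B"  # assumed hub id, not stated in the diff

# Qwen3-style configs expose the context window as max_position_embeddings;
# it should match the advertised 32,768 tokens.
config = AutoConfig.from_pretrained(MODEL_ID)
print(config.max_position_embeddings)  # expected: 32768

# Count tokens before submitting a medium-to-long document, so nothing is
# silently truncated away from the model.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
document = open("report.txt").read()  # hypothetical input file
n_tokens = len(tokenizer(document)["input_ids"])
print(f"{n_tokens} tokens ({'fits' if n_tokens <= 32_768 else 'exceeds window'})")
```

Budgeting tokens up front matters at 32K, since the prompt and the generation budget share the same window.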
@@ -26,7 +28,7 @@ This model is part of the **DeepBrainz-R1 Series**, built to deliver frontier-cl
 - **Parameter Count:** ~0.6B
 - **Context Window:** 32,768 tokens
 - **Specialization:** STEM Reasoning, Logic, Code Analysis
-- **Architecture:** Optimized Dense Transformer
+- **Architecture:** Optimized Dense Transformer
 - **Deployment:** Ready for vLLM, TGI, and local inference

 ---

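The **Deployment** bullet above names vLLM and TGI, but the card itself shows no serving code. Two hedged sketches follow; the hub id, endpoint, and sampling values are illustrative assumptions, not taken from the commit. First, offline batch inference with vLLM, reserving the full window via `max_model_len`:

```python
from vllm import LLM, SamplingParams

# Offline vLLM inference; max_model_len reserves the full 32K window.
llm = LLM(model="DeepBrainz/DeepBrainz-R1-0.6B", max_model_len=32768)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

outputs = llm.generate(["Prove that the sum of two even integers is even."], params)
print(outputs[0].outputs[0].text)
```

For TGI, assuming a server is already running on a local port (8080 here), the `huggingface_hub` client can query it:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI endpoint
reply = client.text_generation(
    "Review this function for off-by-one errors: ...",  # elided example prompt
    max_new_tokens=256,
)
print(reply)
```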
@@ -65,9 +67,11 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

 ---

-##
+## 🏗️ Technical Summary

-
+The model was produced using a **multi-stage optimization process** involving large-scale supervision and iterative refinement. It is designed to maximize reasoning quality while maintaining instruction-following robustness.
+
+*Specific training methodologies and dataset compositions are proprietary.*

 ---
