Delete README.md
Browse files
README.md
DELETED
|
@@ -1,81 +0,0 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
base_model: Qwen/Qwen3-0.6B
|
| 4 |
-
tags:
|
| 5 |
-
- qwen3
|
| 6 |
-
- true-evolving
|
| 7 |
-
- infinite-context
|
| 8 |
-
- hierarchical-flow-anchoring
|
| 9 |
-
- model-surgery
|
| 10 |
-
- attention-mechanism
|
| 11 |
-
---
|
| 12 |
-
|
| 13 |
-
# QuasarV4: TrueEvolving Qwen3-0.6B
|
| 14 |
-
|
| 15 |
-
π **Revolutionary Model Surgery Achievement!**
|
| 16 |
-
|
| 17 |
-
This model combines the 33 trillion token pretraining of Qwen3-0.6B with our breakthrough TrueEvolving attention mechanism featuring Hierarchical Flow Anchoring.
|
| 18 |
-
|
| 19 |
-
## π― Key Features
|
| 20 |
-
|
| 21 |
-
- **Infinite Context**: No fixed sequence length limits
|
| 22 |
-
- **TrueEvolving Attention**: Temporal evolution with memory retention
|
| 23 |
-
- **Hierarchical Flow Anchoring**: Perfect memory retention (100%!)
|
| 24 |
-
- **Preserved Pretraining**: All 33T token knowledge retained
|
| 25 |
-
- **Grouped Query Attention**: Optimized for efficiency
|
| 26 |
-
|
| 27 |
-
## π¬ Architecture
|
| 28 |
-
|
| 29 |
-
- **Base Model**: Qwen3-0.6B (596M parameters)
|
| 30 |
-
- **Attention**: TrueEvolving with Hierarchical Flow Anchoring
|
| 31 |
-
- **Context Length**: Infinite (theoretically unlimited)
|
| 32 |
-
- **Memory Mechanism**: Positional Memory Bank + Checkpoints
|
| 33 |
-
|
| 34 |
-
## π Performance Breakthrough
|
| 35 |
-
|
| 36 |
-
Our Hierarchical Flow Anchoring achieves:
|
| 37 |
-
- **100% memory retention** across all positions
|
| 38 |
-
- **No degradation** at longer sequences
|
| 39 |
-
- **Perfect recall** for both early and late positions
|
| 40 |
-
- **3233% improvement** over original TrueEvolving
|
| 41 |
-
|
| 42 |
-
## π οΈ Model Surgery Process
|
| 43 |
-
|
| 44 |
-
1. Loaded pretrained Qwen3-0.6B with full language modeling head
|
| 45 |
-
2. Replaced standard attention with TrueEvolving attention
|
| 46 |
-
3. Preserved all non-attention weights (embeddings, MLP, LM head)
|
| 47 |
-
4. Fine-tuned only attention parameters for adaptation
|
| 48 |
-
|
| 49 |
-
## π Next Token Prediction Test
|
| 50 |
-
|
| 51 |
-
```
|
| 52 |
-
Input: "who are"
|
| 53 |
-
Top predictions:
|
| 54 |
-
1. " the" (score: 20.75)
|
| 55 |
-
2. " you" (score: 19.91)
|
| 56 |
-
3. " some" (score: 17.76)
|
| 57 |
-
4. " we" (score: 17.67)
|
| 58 |
-
5. " going" (score: 17.60)
|
| 59 |
-
```
|
| 60 |
-
|
| 61 |
-
## π Usage
|
| 62 |
-
|
| 63 |
-
```python
|
| 64 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 65 |
-
|
| 66 |
-
model = AutoModelForCausalLM.from_pretrained("eyad-silx/QuasarV4")
|
| 67 |
-
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
|
| 68 |
-
|
| 69 |
-
# Infinite context generation
|
| 70 |
-
text = "Your very long context here..."
|
| 71 |
-
inputs = tokenizer(text, return_tensors="pt")
|
| 72 |
-
outputs = model.generate(**inputs, max_new_tokens=100)
|
| 73 |
-
```
|
| 74 |
-
|
| 75 |
-
## ποΈ Citation
|
| 76 |
-
|
| 77 |
-
This represents a breakthrough in attention mechanism design, combining the best of pretrained language models with infinite context capabilities.
|
| 78 |
-
|
| 79 |
-
---
|
| 80 |
-
|
| 81 |
-
*Built with revolutionary model surgery techniques - preserving 33T tokens of pretraining while adding infinite context!*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|