---
license: apache-2.0
language:
- en
base_model:
- microsoft/Phi-4-mini-instruct
- facebook/dinov2-with-registers-giant
- google/siglip2-so400m-patch14-224
pipeline_tag: visual-question-answering
---

# Aurea: Adaptive Multimodal Fusion for Vision-Language Models

Aurea is an open-source research project aimed at advancing vision-language model (VLM) pretraining by leveraging cutting-edge vision encoders, DINOv2 and SigLIP2. The core of Aurea is a novel adaptive **spatial-range attention mechanism** that fuses spatial and semantic information from encoder-derived visual features, enabling richer and more context-aware representations for downstream tasks.
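
The README does not spell out the fusion mechanism itself, so purely as a rough illustration of what attention-based fusion of two encoders' patch features could look like, here is a minimal NumPy sketch. All names, dimensions, and projection matrices below are hypothetical placeholders, not Aurea's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_range_fusion(dino_feats, siglip_feats, d_key=64, seed=0):
    """Illustrative cross-attention fusion of two encoders' patch tokens.

    dino_feats:   (N, Dd) patch features from a DINOv2-style encoder
    siglip_feats: (N, Ds) patch features from a SigLIP2-style encoder
    Returns an (N, Dd) fused feature map.
    """
    rng = np.random.default_rng(seed)
    N, Dd = dino_feats.shape
    Ds = siglip_feats.shape[1]
    # Stand-ins for learned projections; drawn randomly for illustration.
    Wq = rng.standard_normal((Dd, d_key)) / np.sqrt(Dd)
    Wk = rng.standard_normal((Ds, d_key)) / np.sqrt(Ds)
    Wv = rng.standard_normal((Ds, Dd)) / np.sqrt(Ds)
    q = dino_feats @ Wq        # queries from the spatially strong encoder
    k = siglip_feats @ Wk      # keys from the semantically strong encoder
    v = siglip_feats @ Wv      # values projected into the query space
    # (N, N) attention lets every patch weigh every other patch's semantics.
    attn = softmax(q @ k.T / np.sqrt(d_key), axis=-1)
    return dino_feats + attn @ v  # residual fusion keeps the original signal
```

In this sketch one encoder supplies queries and the other supplies keys and values, so each spatial location attends over the full range of the second encoder's semantic features; a residual connection preserves the original representation.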