Update README.md
# CLIP-RS: Vision-Language Pre-training with Data Purification for Remote Sensing

CLIP-RS is a pre-trained model based on CLIP (Contrastive Language-Image Pre-training), tailored for remote sensing applications. It is trained on a large-scale dataset of 10M remote sensing image-text pairs, providing powerful perception capabilities for remote sensing image tasks.

### 2. Data Filtering
To refine the coarse dataset, we propose a data filtering strategy using a CLIP-based model, $\text{CLIP}_{\text{Sem}}$, which is pre-trained on high-quality captions. A similarity score (SS) is computed for each image-text pair with this model, and captions with low similarity are discarded, so that only semantically accurate image-text pairs are retained.
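
As a rough illustration of this filtering step, the sketch below scores an image-caption pair with a generic CLIP checkpoint from Hugging Face `transformers` and keeps only pairs whose score clears a threshold. The checkpoint name `openai/clip-vit-base-patch32`, the example `pairs` list, and the `SS_THRESHOLD` value are illustrative assumptions, not the actual $\text{CLIP}_{\text{Sem}}$ weights or release settings.

```python
# Minimal sketch of similarity-based caption filtering.
# NOTE: checkpoint, pairs, and threshold are placeholders, not the CLIP_Sem release.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def similarity_score(image_path: str, caption: str) -> float:
    """Cosine similarity (SS) between the image and caption embeddings."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).item())

# Hypothetical (image, caption) pairs; replace with the coarse dataset.
pairs = [("example_scene.jpg", "an aerial view of an airport with parked planes")]

SS_THRESHOLD = 0.25  # illustrative cutoff; tune against held-out high-quality captions
kept = [(p, c) for p, c in pairs if similarity_score(p, c) >= SS_THRESHOLD]
```
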
*Figure 1: Data Refinement Process of the CLIP-RS Dataset. Left: Workflow for filtering and refining low-quality captions. Right: Examples of low-quality captions and their refined versions.*
### 3. Data Refinement