Upload 2 files

index.html  CHANGED  (+135 -286)
@@ -3,10 +3,10 @@
 <head>
   <meta charset="utf-8">
   <meta name="description"
-        content="
-  <meta name="keywords" content="
   <meta name="viewport" content="width=device-width, initial-scale=1">
-  <title>

   <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
         rel="stylesheet">
@@ -33,39 +33,29 @@
   <div class="container is-max-desktop">
     <div class="columns is-centered">
       <div class="column has-text-centered">
-        <h1 class="title is-1 publication-title">
         <div class="is-size-5 publication-authors">
           <span class="author-block">
-            <a href="
           <span class="author-block">
-            <a href="
           <span class="author-block">
-            <a href="
           </span>
           <span class="author-block">
-            <a href="
-          </span>
-          <span class="author-block">
-            <a href="https://www.danbgoldman.com" target="_blank">Dan B Goldman</a><sup>2</sup>,
-          </span>
-          <span class="author-block">
-            <a href="https://homes.cs.washington.edu/~seitz/" target="_blank">Steven M. Seitz</a><sup>1,2</sup>,
-          </span>
-          <span class="author-block">
-            <a href="http://www.ricardomartinbrualla.com" target="_blank">Ricardo Martin-Brualla</a><sup>2</sup>
           </span>
         </div>

         <div class="is-size-5 publication-authors">
-          <span class="author-block"><sup>1</sup>University of
-          <span class="author-block"><sup>2</sup>Google Research</span>
         </div>

         <div class="column has-text-centered">
           <div class="publication-links">
             <!-- PDF Link. -->
             <span class="link-block">
-              <a href="
                  class="external-link button is-normal is-rounded is-dark">
                 <span class="icon">
                   <i class="fas fa-file-pdf"></i>
@@ -74,7 +64,7 @@
               </a>
             </span>
             <span class="link-block">
-              <a href="
                  class="external-link button is-normal is-rounded is-dark">
                 <span class="icon">
                   <i class="ai ai-arxiv"></i>
@@ -82,19 +72,9 @@
                 <span>arXiv</span>
               </a>
             </span>
-            <!-- Video Link. -->
-            <span class="link-block">
-              <a href="https://www.youtube.com/watch?v=MrKrnHhk8IA" target="_blank"
-                 class="external-link button is-normal is-rounded is-dark">
-                <span class="icon">
-                  <i class="fab fa-youtube"></i>
-                </span>
-                <span>Video</span>
-              </a>
-            </span>
             <!-- Code Link. -->
             <span class="link-block">
-              <a href="
                  class="external-link button is-normal is-rounded is-dark">
                 <span class="icon">
                   <i class="fab fa-github"></i>
@@ -102,17 +82,7 @@
                 <span>Code</span>
               </a>
             </span>
-            <!-- Dataset Link. -->
-            <span class="link-block">
-              <a href="https://github.com/google/nerfies/releases/tag/0.1" target="_blank"
-                 class="external-link button is-normal is-rounded is-dark">
-                <span class="icon">
-                  <i class="far fa-images"></i>
-                </span>
-                <span>Data</span>
-              </a>
           </div>
-
         </div>
       </div>
     </div>
@@ -123,78 +93,14 @@
 <section class="hero teaser">
   <div class="container is-max-desktop">
     <div class="hero-body">
-      <
-        <source src="./static/videos/teaser.mp4"
-                type="video/mp4">
-      </video>
       <h2 class="subtitle has-text-centered">
-
-        free-viewpoint
-        portraits.
       </h2>
     </div>
   </div>
 </section>

-
-<section class="hero is-light is-small">
-  <div class="hero-body">
-    <div class="container">
-      <div id="results-carousel" class="carousel results-carousel">
-        <div class="item item-steve">
-          <video poster="" id="steve" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/steve.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-        <div class="item item-chair-tp">
-          <video poster="" id="chair-tp" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/chair-tp.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-        <div class="item item-shiba">
-          <video poster="" id="shiba" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/shiba.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-        <div class="item item-fullbody">
-          <video poster="" id="fullbody" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/fullbody.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-        <div class="item item-blueshirt">
-          <video poster="" id="blueshirt" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/blueshirt.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-        <div class="item item-mask">
-          <video poster="" id="mask" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/mask.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-        <div class="item item-coffee">
-          <video poster="" id="coffee" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/coffee.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-        <div class="item item-toby">
-          <video poster="" id="toby" autoplay controls muted loop playsinline height="100%">
-            <source src="./static/videos/toby2.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-      </div>
-    </div>
-  </div>
-</section>
-
-
 <section class="section">
   <div class="container is-max-desktop">
     <!-- Abstract. -->
@@ -203,233 +109,176 @@
       <h2 class="title is-3">Abstract</h2>
       <div class="content has-text-justified">
         <p>
-
-          deforming scene using photos/videos captured casually from mobile phones.
         </p>
         <p>
-
-          (NeRF) by optimizing an
-          additional continuous volumetric deformation field that warps each observed point into a
-          canonical 5D NeRF.
-          We observe that these NeRF-like deformation fields are prone to local minima, and
-          propose a coarse-to-fine optimization method for coordinate-based models that allows for
-          more robust optimization.
-          By adapting principles from geometry processing and physical simulation to NeRF-like
-          models, we propose an elastic regularization of the deformation field that further
-          improves robustness.
         </p>
         <p>
-
-          photos/videos into deformable NeRF
-          models that allow for photorealistic renderings of the subject from arbitrary
-          viewpoints, which we dub <i>"nerfies"</i>. We evaluate our method by collecting data
-          using a
-          rig with two mobile phones that take time-synchronized photos, yielding train/validation
-          images of the same pose at different viewpoints. We show that our method faithfully
-          reconstructs non-rigidly deforming scenes and reproduces unseen views with high
-          fidelity.
         </p>
       </div>
     </div>
   </div>
   <!--/ Abstract. -->
-
-  <!-- Paper video. -->
-  <div class="columns is-centered has-text-centered">
-    <div class="column is-four-fifths">
-      <h2 class="title is-3">Video</h2>
-      <div class="publication-video">
-        <iframe src="https://www.youtube.com/embed/MrKrnHhk8IA?rel=0&showinfo=0"
-                frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
-      </div>
-    </div>
-  </div>
-  <!--/ Paper video. -->
 </div>
 </section>

-
 <section class="section">
   <div class="container is-max-desktop">
-
-
-
-    <!-- Visual Effects. -->
-    <div class="column">
-      <div class="content">
-        <h2 class="title is-3">Visual Effects</h2>
-        <p>
-          Using <i>nerfies</i> you can create fun visual effects. This Dolly zoom effect
-          would be impossible without nerfies since it would require going through a wall.
-        </p>
-        <video id="dollyzoom" autoplay controls muted loop playsinline height="100%">
-          <source src="./static/videos/dollyzoom-stacked.mp4"
-                  type="video/mp4">
-        </video>
-      </div>
-    </div>
-    <!--/ Visual Effects. -->
-
-    <!-- Matting. -->
-    <div class="column">
-      <h2 class="title is-3">Matting</h2>
-      <div class="columns is-centered">
-        <div class="column content">
-          <p>
-            As a byproduct of our method, we can also solve the matting problem by ignoring
-            samples that fall outside of a bounding box during rendering.
-          </p>
-          <video id="matting-video" controls playsinline height="100%">
-            <source src="./static/videos/matting.mp4"
-                    type="video/mp4">
-          </video>
-        </div>
-
-      </div>
-    </div>
-    </div>
-    <!--/ Matting. -->
-
-    <!-- Animation. -->
     <div class="columns is-centered">
       <div class="column is-full-width">
-        <
-
-        <!-- Interpolating. -->
-        <h3 class="title is-4">Interpolating states</h3>
         <div class="content has-text-justified">
           <p>
-
-            frames. Use the slider here to linearly interpolate between the left frame and the right
-            frame.
           </p>
-
-
-          <
-
-
-
-          <
-
-
-
-
-
-            <input class="slider is-fullwidth is-large is-info"
-                   id="interpolation-slider"
-                   step="1" min="0" max="100" value="0" type="range">
-          </div>
-          <div class="column is-3 has-text-centered">
-            <img src="./static/images/interpolate_end.jpg"
-                 class="interpolation-image"
-                 alt="Interpolation end reference image."/>
-            <p class="is-bold">End Frame</p>
-          </div>
-        </div>
-        <br/>
-        <!--/ Interpolating. -->
-
-        <!-- Re-rendering. -->
-        <h3 class="title is-4">Re-rendering the input video</h3>
-        <div class="content has-text-justified">
           <p>
-
-            viewpoint such as a stabilized camera by playing back the training deformations.
           </p>
         </div>
         <div class="content has-text-centered">
-          <
-
-               muted
-               preload
-               playsinline
-               width="75%">
-            <source src="./static/videos/replay.mp4"
-                    type="video/mp4">
-          </video>
         </div>
-        <!--/ Re-rendering. -->
-
      </div>
    </div>
-
-

-
    <div class="columns is-centered">
-      <div class="column
-        <
-
      <div class="content has-text-justified">
-          <p>
-
-
-
-            <
-
-
-            <a href="https://www.albertpumarola.com/research/D-NeRF/index.html" target="_blank">D-NeRF</a> and <a href="https://gvv.mpi-inf.mpg.de/projects/nonrigid_nerf/" target="_blank">NR-NeRF</a>
-            both use deformation fields to model non-rigid scenes.
-          </p>
-          <p>
-            Some works model videos with a NeRF by directly modulating the density, such as <a href="https://video-nerf.github.io/" target="_blank">Video-NeRF</a>, <a href="https://www.cs.cornell.edu/~zl548/NSFF/" target="_blank">NSFF</a>, and <a href="https://neural-3d-video.github.io/" target="_blank">DyNeRF</a>
-          </p>
-          <p>
-            There are probably many more by the time you are reading this. Check out <a href="https://dellaert.github.io/NeRF/" target="_blank">Frank Dellart's survey on recent NeRF papers</a>, and <a href="https://github.com/yenchenlin/awesome-NeRF" target="_blank">Yen-Chen Lin's curated list of NeRF papers</a>.
-          </p>
      </div>
    </div>
-

  </div>
 </section>

 <section class="section" id="BibTeX">
   <div class="container is-max-desktop content">
     <h2 class="title">BibTeX</h2>
-    <pre><code>@article{
-
-
-      journal
-      year
 }</code></pre>
   </div>
 </section>

-
 <footer class="footer">
   <div class="container">
     <div class="content has-text-centered">
-      <
-
-
-
-
-        <i class="fab fa-github"></i>
-      </a>
-    </div>
-    <div class="columns is-centered">
-      <div class="column is-8">
-        <div class="content">
-          <p>
-            This website is licensed under a <a rel="license" target="_blank"
-               href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
-            Commons Attribution-ShareAlike 4.0 International License</a>.
-          </p>
-          <p>
-            This means you are free to borrow the <a target="_blank"
-               href="https://github.com/nerfies/nerfies.github.io">source code</a> of this website,
-            we just ask that you link back to this page in the footer.
-            Please remember to remove the analytics code included in the header of the website which
-            you do not want on your website.
-          </p>
-        </div>
-      </div>
     </div>
   </div>
 </footer>

 </body>
-</html>
@@ -3,10 +3,10 @@
 <head>
   <meta charset="utf-8">
   <meta name="description"
+        content="RISE: Enhancing VLM Image Annotation with Self-Supervised Reasoning">
+  <meta name="keywords" content="RISE, VLM, Vision-Language Models, Image Annotation, Chain of Thought, CoT">
   <meta name="viewport" content="width=device-width, initial-scale=1">
+  <title>RISE: Enhancing VLM Image Annotation with Self-Supervised Reasoning</title>

   <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
         rel="stylesheet">
@@ -33,39 +33,29 @@
   <div class="container is-max-desktop">
     <div class="columns is-centered">
       <div class="column has-text-centered">
+        <h1 class="title is-1 publication-title">RISE: Enhancing VLM Image Annotation with Self-Supervised Reasoning</h1>
         <div class="is-size-5 publication-authors">
           <span class="author-block">
+            <a href="#" target="_blank">Suhang Hu</a><sup>1,*</sup>,</span>
           <span class="author-block">
+            <a href="#" target="_blank">Wei Hu</a><sup>†</sup>,</span>
           <span class="author-block">
+            <a href="#" target="_blank">Yuhang Su</a>,
           </span>
           <span class="author-block">
+            <a href="#" target="_blank">Fan Zhang</a>
           </span>
         </div>

         <div class="is-size-5 publication-authors">
+          <span class="author-block"><sup>1</sup>Beijing University of Chemical Technology</span>
         </div>

         <div class="column has-text-centered">
           <div class="publication-links">
             <!-- PDF Link. -->
             <span class="link-block">
+              <a href="#" target="_blank"
                  class="external-link button is-normal is-rounded is-dark">
                 <span class="icon">
                   <i class="fas fa-file-pdf"></i>
@@ -74,7 +64,7 @@
               </a>
             </span>
             <span class="link-block">
+              <a href="#" target="_blank"
                  class="external-link button is-normal is-rounded is-dark">
                 <span class="icon">
                   <i class="ai ai-arxiv"></i>
@@ -82,19 +72,9 @@
                 <span>arXiv</span>
               </a>
             </span>
             <!-- Code Link. -->
             <span class="link-block">
+              <a href="#" target="_blank"
                  class="external-link button is-normal is-rounded is-dark">
                 <span class="icon">
                   <i class="fab fa-github"></i>
@@ -102,17 +82,7 @@
                 <span>Code</span>
               </a>
             </span>
           </div>
         </div>
       </div>
     </div>
@@ -123,78 +93,14 @@
 <section class="hero teaser">
   <div class="container is-max-desktop">
     <div class="hero-body">
+      <img src="#" alt="RISE Framework Overview" style="width: 100%;">
       <h2 class="subtitle has-text-centered">
+        RISE: A two-stage framework for self-supervised reasoning in Vision-Language Models
       </h2>
     </div>
   </div>
 </section>

 <section class="section">
   <div class="container is-max-desktop">
     <!-- Abstract. -->
@@ -203,233 +109,176 @@
       <h2 class="title is-3">Abstract</h2>
       <div class="content has-text-justified">
         <p>
+          Vision-Language Models (VLMs) struggle with complex image annotation tasks, such as emotion classification and context-driven object detection, which demand sophisticated reasoning. Standard Supervised Fine-Tuning (SFT) focuses solely on annotation outcomes, ignoring underlying rationales, while Visual Reinforcement Fine-Tuning (Visual-RFT) produces inconsistent Chains of Thought (CoTs) due to the absence of high-quality, verified CoTs during pre-training.
         </p>
         <p>
+          We introduce <strong>RISE</strong> (Reason-Inspire-Strengthen-Expertise), a two-stage framework that overcomes these limitations. In the <strong>Reason</strong> stage (RISE-CoT), a reinforcement learning-driven "annotation-reasoning-annotation" closed loop generates visually grounded, logically consistent CoTs by verifying their ability to reconstruct the original annotations without direct leakage. The <strong>Inspire</strong> and <strong>Strengthen</strong> stage (RISE-R1) leverages a high-quality CoT subset for supervised fine-tuning, followed by reinforcement fine-tuning, to produce interpretable reasoning and accurate annotations.
         </p>
         <p>
+          Evaluated on complex and simple image annotation tasks, RISE-trained Qwen2-VL-2B outperforms SFT and Visual-RFT, achieving robust performance and enhanced explainability. RISE offers a self-supervised solution for advancing VLM reasoning without requiring manually annotated CoTs.
         </p>
       </div>
     </div>
   </div>
   <!--/ Abstract. -->
 </div>
 </section>

 <section class="section">
   <div class="container is-max-desktop">
+    <h2 class="title is-3">RISE Framework</h2>
+
     <div class="columns is-centered">
       <div class="column is-full-width">
+        <h3 class="title is-4">Two-Stage Approach</h3>
+
         <div class="content has-text-justified">
           <p>
+            RISE operates through two stages to enhance VLM reasoning capabilities for image annotation tasks:
           </p>
+
+          <h4 class="title is-5">1. RISE-CoT: Closed-Loop Reasoning Generation</h4>
+          <p>
+            This stage generates high-quality, visually grounded Chains of Thought (CoTs) for image-annotation pairs in a self-supervised manner. The process involves:
+          </p>
+          <ul>
+            <li><strong>Reasoning Generation:</strong> the VLM produces a CoT justifying the annotation without leaking its specifics</li>
+            <li><strong>Annotation Reconstruction:</strong> the VLM reconstructs the annotation from the generated CoT</li>
+            <li><strong>Consistency Validation:</strong> a reward function evaluates CoT quality based on reconstruction accuracy</li>
+          </ul>
+
+          <h4 class="title is-5">2. RISE-R1: Training VLM for Enhanced CoTs</h4>
           <p>
+            This stage trains the VLM to produce structured "think-answer" outputs:
           </p>
+          <ul>
+            <li><strong>Inspire (SFT):</strong> supervised fine-tuning on the high-quality CoT subset</li>
+            <li><strong>Strengthen (RFT):</strong> reinforcement fine-tuning on the full dataset to optimize task-specific outputs</li>
+          </ul>
         </div>
+
         <div class="content has-text-centered">
+          <img src="#" alt="RISE Framework Diagram" style="width: 80%;">
+          <p class="has-text-centered">Figure 1: RISE two-stage framework</p>
         </div>
       </div>
     </div>
+  </div>
+</section>
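
To make the closed loop above concrete, here is a minimal Python sketch of one "annotation-reasoning-annotation" cycle. The `generate_cot` and `reconstruct` callables are hypothetical stand-ins for the two VLM inference passes, and the leakage and format checks are simplified placeholders for the reward components named in the ablation study; this page does not specify RISE's actual prompts or scoring.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for the two VLM inference passes; RISE's actual
# prompts, models, and reward are not specified on this page.
VlmCotFn = Callable[[object, str], str]    # (image, annotation) -> CoT
VlmReconFn = Callable[[object, str], str]  # (image, CoT) -> annotation

@dataclass
class Example:
    image: object      # decoded image (placeholder type)
    annotation: str    # original annotation, e.g. an emotion label

def rise_cot_step(ex: Example,
                  generate_cot: VlmCotFn,
                  reconstruct: VlmReconFn) -> tuple[str, float]:
    """One annotation -> reasoning -> annotation cycle with a toy reward."""
    # 1. Reasoning Generation: justify the annotation without quoting it.
    cot = generate_cot(ex.image, ex.annotation)

    # Leakage prevention: a CoT that copies the annotation verbatim would be
    # trivially "consistent", so it earns zero reward.
    if ex.annotation.lower() in cot.lower():
        return cot, 0.0

    # Format constraint (illustrative): require a non-empty, multi-step CoT.
    if len(cot.strip().splitlines()) < 2:
        return cot, 0.0

    # 2. Annotation Reconstruction: recover the annotation from image + CoT.
    predicted = reconstruct(ex.image, cot)

    # 3. Consistency Validation: exact-match reward here; a task metric such
    # as IoU would replace this for detection-style annotations.
    reward = 1.0 if predicted.strip().lower() == ex.annotation.strip().lower() else 0.0
    return cot, reward
```

In the full framework this reward would drive reinforcement-learning updates of the CoT generator; the sketch only scores a single sample.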

+<section class="section">
+  <div class="container is-max-desktop">
+    <h2 class="title is-3">Experiments &amp; Results</h2>
+
     <div class="columns is-centered">
+      <div class="column">
+        <h3 class="title is-4">Datasets</h3>
         <div class="content has-text-justified">
+          <p>We evaluated RISE on four image annotation datasets of varying complexity:</p>
+          <ul>
+            <li><strong>Emotion6:</strong> emotion classification with probability distributions</li>
+            <li><strong>LISA:</strong> context-driven object detection</li>
+            <li><strong>ImageNet-Sub:</strong> simple classification</li>
+            <li><strong>COCO-Sub:</strong> multi-target object detection</li>
+          </ul>
         </div>
       </div>
+
+      <div class="column">
+        <h3 class="title is-4">Key Results</h3>
+        <div class="content has-text-justified">
+          <p>RISE demonstrates superior performance across both complex and simple tasks:</p>
+          <ul>
+            <li>Outperforms SFT and Visual-RFT on Emotion6 and LISA</li>
+            <li>Achieves robust performance on ImageNet-Sub and COCO-Sub</li>
+            <li>Generates high-quality, interpretable Chains of Thought</li>
+            <li>Provides a self-supervised solution without manual CoT annotation</li>
+          </ul>
+        </div>
+      </div>
+    </div>
+
+    <div class="columns is-centered">
+      <div class="column content has-text-centered">
+        <img src="#" alt="Results Table" style="width: 90%;">
+        <p>Table 1: Performance comparison on complex tasks (Emotion6 and LISA)</p>
+      </div>
     </div>
+
+    <div class="columns is-centered">
+      <div class="column content has-text-centered">
+        <img src="#" alt="Qualitative Results" style="width: 90%;">
+        <p>Figure 2: Qualitative examples of RISE's "think-answer" outputs</p>
+      </div>
+    </div>
+  </div>
+</section>

+<section class="section">
+  <div class="container is-max-desktop">
+    <h2 class="title is-3">Ablation Studies</h2>
+
+    <div class="content has-text-justified">
+      <p>Our ablation studies confirm the importance of key RISE components:</p>
+      <ul>
+        <li><strong>CoT Quality:</strong> RISE-CoT generates higher-quality CoTs than Base-Model and GPT-4o</li>
+        <li><strong>SFT Initialization:</strong> SFT on the high-quality CoT subset is crucial for RFT success</li>
+        <li><strong>Reward Function:</strong> the full reward function (with leakage prevention and format constraints) achieves the best performance</li>
+        <li><strong>Threshold Selection:</strong> τ=0.75 optimally balances CoT quality and dataset size</li>
+      </ul>
+    </div>
+
+    <div class="columns is-centered">
+      <div class="column content has-text-centered">
+        <img src="#" alt="Ablation Results" style="width: 80%;">
+        <p>Table 2: Ablation study on CoT quality</p>
+      </div>
+    </div>
   </div>
 </section>
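
The threshold ablation above suggests a simple gate between the two stages: a CoT reaches the Inspire (SFT) phase only if its RISE-CoT reward clears τ. A minimal sketch, assuming an illustrative in-memory record layout; only the τ = 0.75 value comes from the ablation.

```python
# Illustrative record layout; only the tau value is taken from the ablation.
TAU = 0.75

def split_by_reward(scored_cots: list[dict],
                    tau: float = TAU) -> tuple[list[dict], list[dict]]:
    """Partition scored CoTs: reward >= tau feeds Inspire (SFT); the rest is
    left for the Strengthen (RFT) phase, which trains on the full dataset."""
    sft_subset = [r for r in scored_cots if r["reward"] >= tau]
    rest = [r for r in scored_cots if r["reward"] < tau]
    return sft_subset, rest

if __name__ == "__main__":
    scored = [
        {"image": "img_001.jpg", "annotation": "joy", "cot": "...", "reward": 0.92},
        {"image": "img_002.jpg", "annotation": "fear", "cot": "...", "reward": 0.40},
    ]
    sft, rest = split_by_reward(scored)
    print(len(sft), len(rest))  # 1 1
```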

+<section class="section">
+  <div class="container is-max-desktop">
+    <h2 class="title is-3">Conclusion</h2>
+
+    <div class="content has-text-justified">
+      <p>
+        We introduced RISE, a novel two-stage framework that significantly enhances VLMs for complex image annotation tasks.
+        RISE autonomously generates high-quality CoTs by verifying their ability to reconstruct the original annotations, then uses
+        these CoTs to train VLMs to produce accurate and interpretable "think-answer" outputs directly from images.
+      </p>
+      <p>
+        Through its verifiable, self-supervised CoT generation, RISE improves annotation accuracy and interpretability while
+        uniquely enabling implicit evaluation and refinement of dataset annotation quality. The framework effectively boosts
+        the reasoning capabilities of lower-capacity VLMs across various image annotation tasks, allowing them to perform on par with larger models.
+      </p>
+    </div>
+  </div>
+</section>

 <section class="section" id="BibTeX">
   <div class="container is-max-desktop content">
     <h2 class="title">BibTeX</h2>
+    <pre><code>@article{hu2024rise,
+  title={RISE: Enhancing VLM Image Annotation with Self-Supervised Reasoning},
+  author={Hu, Suhang and Hu, Wei and Su, Yuhang and Zhang, Fan},
+  journal={arXiv preprint},
+  year={2024}
 }</code></pre>
   </div>
 </section>

 <footer class="footer">
   <div class="container">
     <div class="content has-text-centered">
+      <p>
+        This website is licensed under a <a rel="license" target="_blank"
+           href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
+        Commons Attribution-ShareAlike 4.0 International License</a>.
+      </p>
     </div>
   </div>
 </footer>

 </body>
+</html>