Update index.html

index.html  (+11, -97)  CHANGED
--- a/index.html
+++ b/index.html
@@ -61,81 +61,6 @@
       <div class="columns is-centered">
         <div class="column is-10">
 
-          <!-- Native Sparse Attention -->
-          <div class="card paper-card">
-            <div class="card-content">
-              <h3 class="title is-4">
-                <a href="https://arxiv.org/abs/2502.11089">Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention</a>
-                <span class="coming-soon-badge">Deep Dive Coming Soon</span>
-              </h3>
-              <p class="release-date">Released: February 2025</p>
-              <p class="paper-description">
-                Introduces a new approach to sparse attention that is both hardware-efficient and natively trainable,
-                improving the performance of large language models.
-              </p>
-            </div>
-          </div>
-
-          <!-- DeepSeek-R1 -->
-          <div class="card paper-card">
-            <div class="card-content">
-              <h3 class="title is-4">
-                DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
-                <span class="coming-soon-badge">Deep Dive Coming Soon</span>
-              </h3>
-              <p class="release-date">Released: January 20, 2025</p>
-              <p class="paper-description">
-                The R1 model builds on previous work to enhance reasoning capabilities through large-scale
-                reinforcement learning, competing directly with leading models like OpenAI's o1.
-              </p>
-            </div>
-          </div>
-
-          <!-- DeepSeek-V3 -->
-          <div class="card paper-card">
-            <div class="card-content">
-              <h3 class="title is-4">
-                DeepSeek-V3 Technical Report
-                <span class="coming-soon-badge">Deep Dive Coming Soon</span>
-              </h3>
-              <p class="release-date">Released: December 2024</p>
-              <p class="paper-description">
-                Discusses the scaling of sparse MoE networks to 671 billion parameters, utilizing mixed precision
-                training and high-performance computing (HPC) co-design strategies.
-              </p>
-            </div>
-          </div>
-
-          <!-- DeepSeek-V2 -->
-          <div class="card paper-card">
-            <div class="card-content">
-              <h3 class="title is-4">
-                DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
-                <span class="coming-soon-badge">Deep Dive Coming Soon</span>
-              </h3>
-              <p class="release-date">Released: May 2024</p>
-              <p class="paper-description">
-                Introduces a Mixture-of-Experts (MoE) architecture, enhancing performance while reducing
-                training costs by 42%. Emphasizes strong performance characteristics and efficiency improvements.
-              </p>
-            </div>
-          </div>
-
-          <!-- DeepSeekMath -->
-          <div class="card paper-card">
-            <div class="card-content">
-              <h3 class="title is-4">
-                DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
-                <span class="coming-soon-badge">Deep Dive Coming Soon</span>
-              </h3>
-              <p class="release-date">Released: April 2024</p>
-              <p class="paper-description">
-                This paper presents methods to improve mathematical reasoning in LLMs, introducing the
-                Group Relative Policy Optimization (GRPO) algorithm during reinforcement learning stages.
-              </p>
-            </div>
-          </div>
-
           <!-- DeepSeekLLM -->
           <div class="card paper-card">
             <div class="card-content">
@@ -151,48 +76,37 @@
             </div>
           </div>
 
-          <!--
-          <!-- DeepSeek-Prover -->
+          <!-- DeepSeek-V2 -->
           <div class="card paper-card">
             <div class="card-content">
               <h3 class="title is-4">
-                DeepSeek-
+                DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
                 <span class="coming-soon-badge">Deep Dive Coming Soon</span>
               </h3>
+              <p class="release-date">Released: May 2024</p>
               <p class="paper-description">
-
-
+                Introduces a Mixture-of-Experts (MoE) architecture, enhancing performance while reducing
+                training costs by 42%. Emphasizes strong performance characteristics and efficiency improvements.
               </p>
             </div>
           </div>
 
-          <!--
+          <!-- Continue with other papers... -->
           <div class="card paper-card">
             <div class="card-content">
               <h3 class="title is-4">
-                DeepSeek-
+                DeepSeek-V3 Technical Report
                 <span class="coming-soon-badge">Deep Dive Coming Soon</span>
              </h3>
+              <p class="release-date">Released: December 2024</p>
               <p class="paper-description">
-
-
+                Discusses the scaling of sparse MoE networks to 671 billion parameters, utilizing mixed precision
+                training and high-performance computing (HPC) co-design strategies.
               </p>
             </div>
           </div>
 
-          <!--
-          <div class="card paper-card">
-            <div class="card-content">
-              <h3 class="title is-4">
-                DeepSeekMoE: Advancing Mixture-of-Experts Architecture
-                <span class="coming-soon-badge">Deep Dive Coming Soon</span>
-              </h3>
-              <p class="paper-description">
-                Discusses the integration and benefits of the Mixture-of-Experts approach within the
-                DeepSeek framework, focusing on scalability and efficiency improvements.
-              </p>
-            </div>
-          </div>
+          <!-- Add remaining papers following the same pattern -->
 
         </div>
       </div>
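
The comments "Continue with other papers..." and "Add remaining papers following the same pattern" refer to the paper-card markup repeated throughout this diff. For reference, a minimal sketch of that pattern, reconstructed from the entries above (the class names come from the existing markup; the title, date, and description shown here are placeholders, not content from the commit):

<!-- One card per paper, placed inside <div class="column is-10"> -->
<div class="card paper-card">
  <div class="card-content">
    <h3 class="title is-4">
      <!-- Paper title; may be wrapped in an <a href="..."> link to the paper -->
      Placeholder Paper Title
      <span class="coming-soon-badge">Deep Dive Coming Soon</span>
    </h3>
    <p class="release-date">Released: Month Year</p>
    <p class="paper-description">
      One- or two-sentence summary of the paper.
    </p>
  </div>
</div>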