ByteDance-Seed/Seed-Coder-8B-Reasoning


yuyuzhang committed on May 9, 2025

Commit 02ce519 · verified · 1 Parent(s): 2d926d0

Update README.md

Files changed (1)

README.md +20 -5


@@ -7,16 +7,31 @@ base_model:
 # Seed-Coder-8B-Reasoning
 ## Introduction
-**Seed-Coder-8B-Reasoning** is an 8-billion-parameter model further optimized for **code reasoning**, **problem-solving**, and **algorithmic thinking** tasks.
-Built upon the strong base of Seed-Coder, it undergoes additional training in sandbox environments to significantly enhance its ability to tackle complex coding problems and competitions. It features:
-- Trained on a **massively curated corpus**, filtered using an **LLM-based method** to ensure high-quality real-world code, text-code alignment, and synthetic datasets.
-- **Sandbox fine-tuning** to specifically strengthen **multi-step reasoning**, **algorithm design**, and **competitive programming** capabilities.
-- Maintains **long-context handling** up to 32K tokens, enabling it to reason over extended problem descriptions and large input-output examples.
 <p align="center">
   <img width="100%" src="imgs/seed-coder_intro_performance.jpg">
 </p>
 ## Model Downloads
 | Model Name                  | Length | Download   |    Notes |
 |---------------------------------------------------------|-----------|------------------------------------|-----------------------|

 # Seed-Coder-8B-Reasoning
 ## Introduction
+We are thrilled to introduce Seed-Coder, a powerful, transparent, and parameter-efficient family of open-source code models at the 8B scale, featuring base, instruct, and reasoning variants. Seed-Coder contributes to promote the evolution of open code models through the following highlights.
+- Model-centric: Seed-Coder predominantly leverages LLMs instead of hand-crafted rules for code data filtering, minimizing manual effort in pretraining data construction.
+- Transparent: We openly share detailed insights into our model-centric data pipeline, including methods for curating GitHub data, commits data, and code-related web data.
+- Powerful: Seed-Coder achieves state-of-the-art performance among open-source models of comparable size across a diverse range of coding tasks.
 <p align="center">
   <img width="100%" src="imgs/seed-coder_intro_performance.jpg">
 </p>
+This repo contains Seed-Coder-8B-Base model, which has the following features:
+- Type: Causal Language Models
+- Data source: Public Dataset
+- Training Stage: Pretraining & Post-training
+- Context Length: 32,768
+## Highlight
+**Seed-Coder-8B-Reasoning** is an 8-billion-parameter model further optimized for **code reasoning**, **problem-solving**, and **algorithmic thinking** tasks.
+- Trained on a **massively curated corpus**, filtered using an **LLM-based method** to ensure high-quality real-world code, text-code alignment, and synthetic datasets.
+- **Sandbox fine-tuning** to specifically strengthen **multi-step reasoning**, **algorithm design**, and **competitive programming** capabilities.
 ## Model Downloads
 | Model Name                  | Length | Download   |    Notes |
 |---------------------------------------------------------|-----------|------------------------------------|-----------------------|