ByteDance-Seed/Seed-Coder-8B-Instruct

Text Generation

text-generation-inference

yuyuzhang committed on May 9, 2025

Commit 15d8bdf · verified · 1 Parent(s): 01e01a1

Update README.md

Files changed (1)

README.md +12 -1

README.md CHANGED

@@ -7,8 +7,19 @@ base_model:
 # Seed-Coder-8B-Instruct
 ## Introduction
 Seed-Coder-8B-Instruct is an 8-billion-parameter model instruction-tuned specifically for code generation, code reasoning, and code understanding. It is built to empower developers with high-quality, efficient code assistance. It features:
-- Trained on a **massively curated corpus**, where **an LLM-based filter** is applied to select **high-quality real-world code**, **text-code alignment data**, and **synthetic datasets** — ensuring cleaner and more useful data compared to traditional heuristic-based curation.
 - Achieves superior performance across **code generation**, **bug fixing**, and **reasoning** tasks, rivaling or surpassing larger open-source code models.
 - **Instruction-tuned** to reliably follow user intents across a diverse range of coding and reasoning prompts.
 - Supports **long-context handling** up to 32K tokens, enabling processing of complex multi-file projects and detailed coding tasks.

 # Seed-Coder-8B-Instruct
 ## Introduction
+We are thrilled to introduce Seed-Coder, a powerful, transparent, and parameter-efficient family of open-source code models at the 8B scale, featuring base, instruct, and reasoning variants. Seed-Coder helps advance open code models through the following highlights:
+- Model-centric: Seed-Coder predominantly leverages LLMs instead of hand-crafted rules for code data filtering, minimizing manual effort in pretraining data construction.
+- Transparent: We openly share detailed insights into our model-centric data pipeline, including methods for curating GitHub data, commits data, and code-related web data.
+- Powerful: Seed-Coder achieves state-of-the-art performance among open-source models of comparable size across a diverse range of coding tasks.
+<p align="center">
+  <img width="100%" src="imgs/seed-coder_intro_performance.jpg">
+</p>
+## Highlight
 Seed-Coder-8B-Instruct is an 8-billion-parameter model instruction-tuned specifically for code generation, code reasoning, and code understanding. It is built to empower developers with high-quality, efficient code assistance. It features:
+- Trained on **large-scale synthetic data**, emphasizing diversity, difficulty, scalability, and quality.
 - Achieves superior performance across **code generation**, **bug fixing**, and **reasoning** tasks, rivaling or surpassing larger open-source code models.
 - **Instruction-tuned** to reliably follow user intents across a diverse range of coding and reasoning prompts.
 - Supports **long-context handling** up to 32K tokens, enabling processing of complex multi-file projects and detailed coding tasks.