yuyuzhang committed on
Commit 15d8bdf · verified · 1 Parent(s): 01e01a1

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -7,8 +7,19 @@ base_model:
7
  # Seed-Coder-8B-Instruct
8
 
9
  ## Introduction
10
  Seed-Coder-8B-Instruct is an 8-billion-parameter model instruction-tuned specifically for code generation, code reasoning, and code understanding. It is built to empower developers with high-quality, efficient code assistance. It features:
11
- - Trained on a **massively curated corpus**, where **an LLM-based filter** is applied to select **high-quality real-world code**, **text-code alignment data**, and **synthetic datasets**, ensuring cleaner and more useful data than traditional heuristic-based curation.
12
  - Achieves superior performance across **code generation**, **bug fixing**, and **reasoning** tasks, rivaling or surpassing larger open-source code models.
13
  - **Instruction-tuned** to reliably follow user intents across a diverse range of coding and reasoning prompts.
14
  - Supports **long-context handling** up to 32K tokens, enabling processing of complex multi-file projects and detailed coding tasks.
 
7
  # Seed-Coder-8B-Instruct
8
 
9
  ## Introduction
10
+ We are thrilled to introduce Seed-Coder, a powerful, transparent, and parameter-efficient family of open-source code models at the 8B scale, featuring base, instruct, and reasoning variants. Seed-Coder helps promote the evolution of open code models through the following highlights.
11
+
12
+ - Model-centric: Seed-Coder predominantly leverages LLMs instead of hand-crafted rules for code data filtering, minimizing manual effort in pretraining data construction.
13
+ - Transparent: We openly share detailed insights into our model-centric data pipeline, including methods for curating GitHub data, commit data, and code-related web data.
14
+ - Powerful: Seed-Coder achieves state-of-the-art performance among open-source models of comparable size across a diverse range of coding tasks.
15
+
16
+ <p align="center">
17
+ <img width="100%" src="imgs/seed-coder_intro_performance.jpg">
18
+ </p>
19
+
20
+ ## Highlights
21
  Seed-Coder-8B-Instruct is an 8-billion-parameter model instruction-tuned specifically for code generation, code reasoning, and code understanding. It is built to empower developers with high-quality, efficient code assistance. It features:
22
+ - Trained on **large-scale synthetic data**, emphasizing diversity, difficulty, scalability, and quality.
23
  - Achieves superior performance across **code generation**, **bug fixing**, and **reasoning** tasks, rivaling or surpassing larger open-source code models.
24
  - **Instruction-tuned** to reliably follow user intents across a diverse range of coding and reasoning prompts.
25
  - Supports **long-context handling** up to 32K tokens, enabling processing of complex multi-file projects and detailed coding tasks.