---
license: apache-2.0
base_model:
- ByteDance-Seed/Seed-Coder-8B-Base
---

# Seed-Coder-8B-Instruct

## Introduction
Seed-Coder-8B-Instruct is an 8-billion-parameter model, instruction-tuned for code generation, code reasoning, and code understanding, and built to give developers high-quality, efficient code assistance. Key characteristics:
- Trained on a **heavily curated corpus** in which **an LLM-based filter** selects **high-quality real-world code**, **text-code alignment data**, and **synthetic datasets**, yielding cleaner and more useful data than traditional heuristic-based curation.
- Achieves strong performance on **code generation**, **bug fixing**, and **reasoning** tasks, rivaling or surpassing larger open-source code models.
- **Instruction-tuned** to follow user intent reliably across a diverse range of coding and reasoning prompts.
- Supports **long-context handling** of up to 32K tokens, enough to process complex multi-file projects and detailed coding tasks.
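
To make the long-context limit concrete, here is a minimal sketch of budgeting generation length against the window. It assumes that "32K tokens" means 32 × 1024 = 32768 positions and that the prompt and the generated continuation share that window; the helper name `max_new_tokens_for` is hypothetical and not part of any library.

```python
# Assumption: "32K tokens" is read as 32 * 1024 = 32768 positions, shared by
# the prompt and the tokens generated after it.
CONTEXT_LIMIT = 32 * 1024


def max_new_tokens_for(prompt_token_count: int,
                       requested: int = 512,
                       context_limit: int = CONTEXT_LIMIT) -> int:
    """Clamp a requested generation length so prompt + output fit the window.

    Hypothetical helper for illustration only.
    """
    if prompt_token_count < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_limit - prompt_token_count
    # Never return a negative budget when the prompt alone overflows the window.
    return max(0, min(requested, remaining))
```

The prompt's token count would typically come from the model's tokenizer; the clamped value can then be passed as `max_new_tokens` when generating.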

## Requirements
You will need to install the latest versions of `transformers` and `accelerate`:

```bash
pip install -U transformers accelerate
```

## Quickstart

Here is a simple example demonstrating how to load the model and generate code using the Hugging Face `pipeline` API:

```python
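import transformers
import torch

model_id = "ByteDance-Seed/Seed-Coder-8B-Instruct"

# Build a text-generation pipeline; bfloat16 weights reduce memory use and
# device_map="auto" lets accelerate place the model on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat-style input: a list of {"role", "content"} messages.
messages = [
    {"role": "user", "content": "Write a quick sort algorithm."},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
)
# The returned conversation ends with the assistant's reply.
print(outputs[0]["generated_text"][-1]["content"])
```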
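
Replies to coding prompts usually mix explanation with fenced code. As a minimal sketch, assuming the reply uses Markdown-style triple-backtick fences (the model card does not specify the output format), the code can be pulled out with the standard library. The helper name `extract_code_blocks` is hypothetical.

```python
import re

# The fence marker is built indirectly so this snippet can itself sit inside
# a fenced block without confusing Markdown renderers.
FENCE = "`" * 3

# Optional language tag after the opening fence, then the body up to the
# next fence. Assumption: fences in the reply are not nested.
FENCE_RE = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(reply: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for each fenced block in a reply.

    Hypothetical helper for illustration; the language is "" when untagged.
    """
    return [(lang, body.rstrip("\n")) for lang, body in FENCE_RE.findall(reply)]
```

With the Quickstart above, the input would be the string stored at `outputs[0]["generated_text"][-1]["content"]`; a reply with no fenced blocks yields an empty list.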

## Evaluation

Seed-Coder-8B-Instruct demonstrates strong performance across a variety of coding benchmarks, showing:
- Competitive or superior results compared to similarly sized open-source code models.
- Robustness across different programming languages and domains.
- Ability to understand, reason about, and repair complex code snippets.

For detailed results, please check our [📑 paper](https://arxiv.org/pdf/xxx.xxxxx).

## Citation

If you find our work helpful, please consider citing it.

```bibtex
@article{zhang2025seedcoder,
    title={Seed-Coder: Let the Code Model Curate Data for Itself},
    author={Xxx},
    year={2025},
    eprint={2504.xxxxx},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/xxxx.xxxxx},
}
```