---
base_model:
- ByteDance-Seed/Seed-Coder-8B-Reasoning
tags:
- text-generation-inference
- transformers
- unsloth
- llama
license: mit
language:
- en
---

![Banner](https://huggingface.co/NoemaResearch/Daedalus-1-8B/resolve/main/img/banner.png)

# Daedalus-1-8B

[![Model](https://img.shields.io/badge/Model-Daedalus--1--8B-blue)](https://huggingface.co/NoemaResearch/Daedalus-1-8B)
[![Base](https://img.shields.io/badge/Base-Seed--Coder--8B--Reasoning-green)](https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Reasoning)
[![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)

Daedalus-1-8B is an 8-billion-parameter language model for code generation and reasoning, developed by **Noema Research**. It is a finetuned derivative of [Seed-Coder-8B-Reasoning](https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Reasoning), with enhancements for instruction following, structured code generation, and improved safety alignment.

---

## Model Overview

- **Base model:** `ByteDance-Seed/Seed-Coder-8B-Reasoning`
- **Architecture:** Decoder-only transformer
- **Parameters:** ~8.25B
- **Context length:** Long-context support (up to ~64k tokens)
- **Domain:** Programming and natural language reasoning
- **Primary applications:**
  - Code generation and completion
  - Debugging and error explanation
  - Unit test generation
  - Structured outputs (e.g., JSON, function calls)
- **License:** MIT

---

## Key Improvements

Relative to the base model, Daedalus introduces targeted post-training improvements:

- **Instruction tuning** for developer-oriented tasks
- **Structured output fidelity**, supporting JSON and schema-constrained responses
- **Enhanced reasoning** for debugging and multi-step problem solving
- **Reduced error rate** in code execution benchmarks
- **Safety-oriented adjustments**, including avoidance of unsafe coding patterns

---

## Usage

The model is released in Hugging Face Transformers format. Example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoemaResearch/Daedalus-1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are Daedalus, a coding assistant."},
    {"role": "user", "content": "Write a memory-efficient quicksort in Python with unit tests."}
]

# return_dict=True yields a dict with input_ids and attention_mask,
# which can be unpacked directly into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.2,
    top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

**Recommended settings:**

* `temperature=0.2–0.6` for focused, low-variance code generation
* `top_p=0.9–0.95` for balanced creativity and correctness

---

## Evaluation

Daedalus inherits strong performance on competitive programming and reasoning tasks from Seed-Coder-8B-Reasoning. Internal evaluations indicate:

* Higher **unit test pass rates**
* Improved **structured output validity**
* Reduced incidence of **hallucinated APIs**

A comprehensive benchmark report will be released in future updates. For upstream benchmarks, please refer to the [Seed-Coder-8B-Reasoning model card](https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Reasoning).
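As a quick local sanity check of the structured-output claim above, the sketch below asks the model for a JSON reply and validates it with Python's standard `json` module. It is a minimal illustration, not part of any official evaluation harness: it reuses `tokenizer` and `model` from the Usage example, and the prompt wording and the `extract_json` helper are assumptions of this sketch.

```python
import json

# Hypothetical helper: pull the first JSON object out of a model reply.
def extract_json(text: str) -> dict:
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])

messages = [
    {"role": "system", "content": "You are Daedalus. Reply with a single JSON object only."},
    {"role": "user", "content": 'Describe `def add(a, b): return a + b` as JSON with keys "name", "params", "returns".'}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.95)

# Decode only the newly generated tokens, then check that the JSON parses.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(extract_json(reply))
```

Running such a check over a batch of prompts gives a rough, local estimate of structured-output validity; `extract_json` raising or `json.loads` failing counts as an invalid output.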
---

## Limitations

Daedalus remains subject to common limitations of large language models:

* **Hallucinated libraries or functions:** the model may generate non-existent APIs
* **Insecure coding patterns:** suggestions should be reviewed for security and safety
* **Reasoning errors:** multi-step solutions may fail on complex edge cases
* **Dependence on prompt quality:** outputs are sensitive to phrasing and context

All generated code should be verified, linted, and tested before use in production.

---

## Responsible Use

* Do not provide secrets or credentials in prompts.
* Use outputs only in controlled, sandboxed, or reviewed environments.
* The model must not be used to generate malicious software or otherwise unsafe code.
* We encourage additional guardrails (static analyzers, test harnesses, execution sandboxes) in deployment contexts.

---

## Model Variants

* **Full-precision (safetensors):** for research and high-fidelity inference
* **bf16 / fp16:** for efficient inference on modern accelerators
* **Quantized variants (int8, int4):** for resource-constrained environments (a loading sketch appears at the end of this card)

---

## Citation

If you use this model, please cite Daedalus and the underlying Seed-Coder base model (see the Seed-Coder model card for its citation):

```bibtex
@misc{noema2025daedalus,
      title={Daedalus-1-8B},
      author={Noema Research},
      year={2025},
      howpublished={\url{https://huggingface.co/NoemaResearch/Daedalus-1-8B}}
}
```

---

## Acknowledgements

Daedalus builds upon the [Seed-Coder](https://huggingface.co/ByteDance-Seed) family of models developed by ByteDance-Seed. We thank the Seed team for releasing their models under permissive terms, enabling further research and refinement.
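For the quantized path referenced under Model Variants, the sketch below loads the model with on-the-fly 4-bit quantization via Transformers' `BitsAndBytesConfig`. This is a generic quantization recipe, not an official pre-quantized release: it assumes a CUDA GPU with the `bitsandbytes` package installed, and the specific settings (`nf4`, bf16 compute) are illustrative defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "NoemaResearch/Daedalus-1-8B"

# On-the-fly 4-bit quantization; requires a CUDA GPU and `pip install bitsandbytes`.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit, a common default
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```

For int8, swap `load_in_4bit=True` for `load_in_8bit=True`; actual memory savings and quality trade-offs depend on hardware and workload.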