DNA-2.0-14B-FP8

Overview

This is an FP8-quantized version of dnotitia/DNA-2.0-14B, produced by DLM (Data Science Lab., Ltd.) for efficient inference.

FP8 (8-bit floating point) quantization with static per-tensor scaling reduces the model size by roughly 35% while maintaining near-original accuracy. The quantized model is fully compatible with vLLM for high-throughput production serving.

Model Details

| Attribute         | Value                                        |
|-------------------|----------------------------------------------|
| Base Model        | dnotitia/DNA-2.0-14B                         |
| Architecture      | Qwen3ForCausalLM                             |
| Parameters        | ~14B                                         |
| Quantization      | FP8 W8A8 (static per-tensor)                 |
| Quantization Tool | llm-compressor                               |
| Calibration Data  | HuggingFaceH4/ultrachat_200k (512 samples)   |
| Model Size        | ~19 GB (vs. ~30 GB in BF16)                  |
| Context Length    | 32K native / up to 131K with YaRN            |
| Vocabulary        | 151,936 tokens                               |
| License           | Apache 2.0                                   |
| Quantized By      | DLM (Data Science Lab., Ltd.)                |

Quantization Details

  • Method: Static FP8 quantization via llm-compressor oneshot
  • Precision: FP8_E4M3 for weights, FP8_E4M3 for input activations
  • Strategy: Per-tensor symmetric scaling with MinMax observer
  • Calibration: 512 samples from HuggingFaceH4/ultrachat_200k (train_sft split), max sequence length 2048
  • Format: compressed-tensors (safetensors)
  • Preserved layers: lm_head kept in full precision (BF16)
  • Targets: All Linear layers (except lm_head)
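
The recipe above can be sketched with llm-compressor's `oneshot` API. This is a minimal illustration, not the exact script used to produce this checkpoint; the argument values mirror the settings listed above, and the `"FP8"` preset scheme corresponds to static per-tensor W8A8 FP8_E4M3:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# "FP8" is llm-compressor's preset for static per-tensor FP8_E4M3
# weights and activations; lm_head is ignored so it stays in BF16,
# matching the "Preserved layers" note above.
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

oneshot(
    model="dnotitia/DNA-2.0-14B",
    dataset="HuggingFaceH4/ultrachat_200k",
    splits={"calibration": "train_sft[:512]"},
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="DNA-2.0-14B-FP8",
)
```

The resulting directory contains the compressed-tensors safetensors shards that vLLM and Transformers load directly.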

Usage

vLLM (Recommended)

```bash
vllm serve dataslab/DNA-2.0-14B-FP8 \
  --dtype auto \
  --max-model-len 32768 \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```
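
Once the server is up it exposes an OpenAI-compatible API (host and port below are vLLM's defaults), so a quick smoke test looks like:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dataslab/DNA-2.0-14B-FP8",
    "messages": [{"role": "user", "content": "Hello! /no_think"}],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```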

Extended context (up to 131K with YaRN):

```bash
vllm serve dataslab/DNA-2.0-14B-FP8 \
  --dtype auto \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
  --max-model-len 131072 \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```
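
Either server can also be queried with the official `openai` Python client. A sketch, assuming the default local endpoint; `api_key` is a placeholder because vLLM does not require one unless configured:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="dataslab/DNA-2.0-14B-FP8",
    messages=[
        {"role": "user", "content": "Explain FP8 quantization briefly. /think"}
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=1024,
)
print(resp.choices[0].message.content)
```

When the server runs with `--reasoning-parser`, vLLM separates the reasoning trace from the final answer in its response, so the printed content is the answer alone.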

Python (vLLM)

```python
from vllm import LLM, SamplingParams

llm = LLM(model="dataslab/DNA-2.0-14B-FP8")
sampling_params = SamplingParams(
    temperature=0.6, top_p=0.95, top_k=20, max_tokens=4096
)

# "Please explain the process of South Korea's economic development."
messages = [
    {"role": "user", "content": "한국의 경제 발전 과정에 대해 설명해주세요."}
]
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```

Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dataslab/DNA-2.0-14B-FP8")
model = AutoModelForCausalLM.from_pretrained(
    "dataslab/DNA-2.0-14B-FP8",
    torch_dtype="auto",
    device_map="auto",
)

# "Analyze a complex ethical dilemma from multiple perspectives."
messages = [
    {"role": "user", "content": "복잡한 윤리적 딜레마에 대해 다각도로 분석해줘."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Dynamic Thinking Mode

This model inherits DNA 2.0's dynamic thinking capability:

  • Thinking mode: Add /think to enable detailed step-by-step reasoning (temperature=0.6)
  • Non-thinking mode: Add /no_think for concise, direct responses (temperature=0.7)
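
The switch is just a suffix on the user turn, so it is easy to wrap in a helper. A minimal sketch; `with_mode` is a hypothetical convenience function, and the returned temperatures follow the recommendations in the bullets above:

```python
def with_mode(prompt: str, thinking: bool) -> tuple[dict, float]:
    """Return a chat message with DNA 2.0's mode switch appended,
    plus the temperature recommended for that mode."""
    if thinking:
        # /think enables detailed step-by-step reasoning.
        return {"role": "user", "content": f"{prompt} /think"}, 0.6
    # /no_think requests a concise, direct response.
    return {"role": "user", "content": f"{prompt} /no_think"}, 0.7

msg, temp = with_mode("Prove that the square root of 2 is irrational.", thinking=True)
print(msg["content"], temp)
```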

Base Model

DNA 2.0 is developed by Dnotitia Inc. and features:

  • Smoothie Qwen3 foundation with balanced multilingual optimization
  • Uncensored reasoning training for objective, unbiased responses
  • Advanced RL post-training for enhanced mathematical reasoning and Korean language capabilities

For more details, see the arXiv paper (2507.05686).

License

Apache 2.0 — Same as the base model.


Quantized and released by DLM (Data Science Lab., Ltd.)
