---
license: mit
library_name: transformers
base_model:
  - deepseek-ai/DeepSeek-V3-0324
  - deepseek-ai/DeepSeek-R1
  - deepseek-ai/DeepSeek-R1-0528
pipeline_tag: text-generation
---

DeepSeek-TNG-R1T2-Chimera



Model Merge of DeepSeek-R1-0528, DeepSeek-R1 and DeepSeek-V3-0324

An open weights model combining the intelligence of R1-0528 and R1 with the token efficiency of V3.

For details on the construction process, which is an extension of that for the original Chimera model, please read our paper.

Paper on arXiv | Announcement on X | LinkedIn post

Model Details

  • Architecture: DeepSeek-MoE transformer-based language model
  • Combination Method: Merged model weights from DeepSeek-R1-0528, DeepSeek-R1 and DeepSeek-V3-0324
  • Release Date: 2025-07-0x
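
To illustrate the combination method named above, here is a minimal, hypothetical sketch of merging parent checkpoints by linear interpolation of their weights. TNG's actual construction (described in their paper as an extension of the original Chimera approach) is more selective than a uniform blend; the function and toy tensors below are illustrative assumptions, not their implementation.

```python
# Naive linear merge of several parent checkpoints.
# Real model weights are tensors; plain lists of floats stand in here.

def merge_state_dicts(parents, weights):
    """Linearly combine parent state dicts (name -> list of floats)
    using the given mixing weights, which should sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "mixing weights should sum to 1"
    merged = {}
    for name in parents[0]:
        merged[name] = [
            sum(w * p[name][i] for w, p in zip(weights, parents))
            for i in range(len(parents[0][name]))
        ]
    return merged

# Toy example with two hypothetical parents and one 2-element parameter:
r1 = {"layer.weight": [1.0, 2.0]}
v3 = {"layer.weight": [3.0, 4.0]}
print(merge_state_dicts([r1, v3], [0.5, 0.5]))  # {'layer.weight': [2.0, 3.0]}
```

In practice a merge like this operates tensor-by-tensor over full state dicts, and the mixing weights (or the choice of which tensors to take from which parent) are what distinguish one merged model from another.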

Use, Out-of-scope Use, Limitations, Risks, and Recommendations

For R1T2-Chimera, we ask that you follow the usage guidelines Microsoft has published for its "MAI-DS-R1" DeepSeek-based model.

These guidelines are available here on Hugging Face.

Contact

Citation

@misc{tng_technology_consulting_gmbh_2025_07_0x,
    author       = { TNG Technology Consulting GmbH },
    title        = { DeepSeek-TNG-R1T2-Chimera },
    year         = 2025,
    month        = { July },
    url          = { https://huggingface.co/tngtech/DeepSeek-TNG-R1T2-Chimera },
    doi          = { xxx },
    publisher    = { Hugging Face }
}