| | --- |
| | base_model: |
| | - meta-llama/Meta-Llama-3-8B |
| | datasets: |
| | - IgnoraZ/SynthQuestions |
| | language: |
| | - en |
| | license: cc-by-4.0 |
| | library_name: transformers |
| | pipeline_tag: text-generation |
| | --- |
| | |
| | # Model Card for Model ID |
| |
|
| | <!-- Provide a quick summary of what the model is/does. --> |
| |
|
| | This is the model from the paper **From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding**. |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| | <!-- Provide a longer summary of what this model is. --> |
| |
|
| | - **Model type:** Chat Model |
| | - **Language(s) (NLP):** English |
| | - **License:** CC-BY-4.0 |
| | - **Finetuned from model:** LLaMA-3-8B |
| | - **Finetuned with data:** 1M dataset from `IgnoraZ/SynthQuestions` |
| |
|
| | For more details like hyper-parameters, please refer to our paper. |
| |
|
| | ### Model Sources |
| |
|
| | <!-- Provide the basic links for the model. --> |
| |
|
| | - **Repository:** https://github.com/Ignoramus0817/SynthQuestions |
| | - **Paper:** https://www.arxiv.org/abs/2506.03968 |
| |
|
| | ## How to Get Started with the Model |
| |
|
| | This is a model in HF format, which can be deployed with common inference frameworks like Transformers, vLLM, SGLang and so on. |
| |
|
| | We finetuned it with custom chat template instead of the default one from LLaMA. **Please make sure to use the chat template in the `tokenizer_config.json` when inferring.** |
| | |
| | ## Evaluation |
| | |
| | <!-- This section describes the evaluation protocols and provides the results. --> |
| | |
| | ### Alignment Benchmarks |
| | | Model | Arena Hard (WR%) | Alpaca Eval 2.0 (LC) | |
| | | :------------: | :--------------: | :------------------: | |
| | | SynthQuestions | 15.4 | 18.87 | |
| | |
| | |
| | ### Closed-form Benchmarks |
| | | Model | IFEVAL | MMLU | ARC-C | GPQA | GSM8K | MATH | |
| | | :------------: | :----: | :---: | :---: | :--: | :---: | :---: | |
| | | SynthQuestions | 57.05 | 65.79 | 63.92 | 30.3 | 70.53 | 22.71 | |
| | |
| | ## Citation |
| | |
| | <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
| | |
| | ``` |
| | @misc{zhu2025realsyntheticsynthesizingmillions, |
| | title={From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding}, |
| | author={Chiwei Zhu and Benfeng Xu and Xiaorui Wang and Zhendong Mao}, |
| | year={2025}, |
| | eprint={2506.03968}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CL}, |
| | url={https://arxiv.org/abs/2506.03968}, |
| | } |
| | ``` |
| | |
| | ## Model Card Contact |
| | |
| | Please contact tanz@mail.ustc.edu.cn. |