---
license: mit
datasets:
- cheapresearch/CheapResearch-DS-33k
---
|
|
|
|
|
|
|
|
|
|
|
# FlashResearch-4B-Thinking |
|
|
|
|
|
<img src='cheap.png' width='700'> |
|
|
|
|
|
[Model: FlashResearch-4B-Thinking](https://huggingface.co/flashresearch/FlashResearch-4B-Thinking)


[License](#license)


[Dataset: CheapResearch-DS-33k](https://huggingface.co/datasets/cheapresearch/CheapResearch-DS-33k)
|
|
|
|
|
**A 4B-parameter Qwen model distilled from Tongyi DeepResearch-30B A3B**, optimized for web-scale “deep research” tasks and inference with **[Alibaba-NLP/DeepResearch](https://github.com/Alibaba-NLP/DeepResearch)**. |
|
|
|
|
|
* **Base**: Qwen 4B (dense) |
|
|
* **Teacher**: Tongyi DeepResearch 30B A3B (MoE) |
|
|
* **Method**: SFT distillation on **33k** curated deep-research examples |
|
|
* **Dataset**: [`cheapresearch/CheapResearch-DS-33k`](https://huggingface.co/datasets/cheapresearch/CheapResearch-DS-33k)
|
|
* **Primary Use**: Fast, low-cost **DeepResearch** agent runs (browsing, multi-step reasoning, source-grounded answers) |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
Results on Humanity's Last Exam (HLE) and SimpleQA:

<img src='hle.png' alt="Humanity's Last Exam (HLE) results" width='500'>


<img src='simpleqa.png' alt="SimpleQA results" width='500'>
|
|
|
|
|
## Training Data |
|
|
|
|
|
* **Primary dataset**: [`flashresearch/FlashResearch-DS-33k`](https://huggingface.co/datasets/flashresearch/FlashResearch-DS-33k) |
|
|
|
|
|
## Inference with Alibaba-NLP/DeepResearch (Recommended) |
|
|
|
|
|
This model is intended to be used **directly** with the DeepResearch repo. |
|
|
|
|
|
### 1) Install & set up |
|
|
|
|
|
```bash |
|
|
git clone https://github.com/Alibaba-NLP/DeepResearch |
|
|
cd DeepResearch |
|
|
# Create env (example) |
|
|
python -m venv .venv && source .venv/bin/activate |
|
|
pip install -e . # or pip install -r requirements.txt if provided |
|
|
``` |
|
|
|
|
|
### 2) Point DeepResearch to this model |
|
|
|
|
|
Edit the DeepResearch config so it points at this checkpoint, for example:
|
|
|
|
|
```bash |
|
|
MODEL_PATH=flashresearch/FlashResearch-4B-Thinking |
|
|
``` |
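If your DeepResearch setup talks to an OpenAI-compatible endpoint rather than loading the weights in-process (check the repo's README for which mode applies), the model can first be served with vLLM. This is a sketch; the flag values are illustrative and should be tuned for your GPU and context budget:

```shell
# Requires vLLM (pip install vllm). Exposes an OpenAI-compatible API
# at http://localhost:8000/v1 serving this 4B checkpoint.
vllm serve flashresearch/FlashResearch-4B-Thinking \
  --port 8000 \
  --max-model-len 32768
```

Any OpenAI-compatible client, including an agent stack configured with that base URL, can then query the model.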
|
|
|
|
|
### Hardware notes |
|
|
|
|
|
* **A single 12–16 GB GPU** is enough for the 4B model in FP16; FP8/INT4 quantization reduces VRAM requirements further. If you quantize, the summary model can also run locally.
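The FP16 figure above can be sanity-checked with a back-of-the-envelope weight-memory estimate. This is a sketch: the 20% overhead factor is an assumed allowance for activations and KV cache, not a measured number, and real usage grows with context length:

```python
# Rough VRAM estimate for serving a dense LLM:
# weight memory = parameter count x bytes per parameter,
# plus a fixed fractional overhead for activations / KV cache.

def estimate_vram_gb(n_params: float, bytes_per_param: float,
                     overhead: float = 0.2) -> float:
    """Approximate GiB needed to hold the weights plus runtime overhead."""
    weights_gib = n_params * bytes_per_param / 1024**3
    return round(weights_gib * (1 + overhead), 1)

# 4B parameters at common precisions (FP16 = 2 bytes, FP8 = 1, INT4 = 0.5)
for label, bpp in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    print(f"{label}: ~{estimate_vram_gb(4e9, bpp)} GiB")
```

At FP16 this lands around 9 GiB, consistent with the 12–16 GB single-GPU note; INT4 brings the weights under 3 GiB.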
|
|
|
|
|
|
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
* Qwen team for the base 4B architecture |
|
|
* Alibaba-NLP for **DeepResearch** |
|
|
* CheapResearch contributors for the 33k dataset |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex |
|
|
@software{cheapresearch_thinking_2025, |
|
|
title = {CheapResearch 4B Thinking}, |
|
|
author = {Artem Y.}, |
|
|
year = {2025}, |
|
|
url = {https://huggingface.co/flashresearch/FlashResearch-4B-Thinking} |
|
|
} |
|
|
``` |
|
|
|
|
|
And the dataset: |
|
|
|
|
|
```bibtex |
|
|
@dataset{cheapresearch_ds_33k, |
|
|
title = {CheapResearch-DS-33k}, |
|
|
author = {Artem Y.}, |
|
|
year = {2025}, |
|
|
url = {https://huggingface.co/datasets/flashresearch/FlashResearch-DS-33k} |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Changelog |
|
|
|
|
|
* **v1.0.0 (2025-10-04)** — First public release (33k distillation, DeepResearch-ready) |
|
|
|
|
|
|
|
|
|
|
|
### Model Card Metadata (Hugging Face) |
|
|
|
|
|
```yaml |
|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
library_name: transformers |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- qwen |
|
|
- deep-research |
|
|
- browsing |
|
|
- citation |
|
|
- reasoning |
|
|
- distillation |
|
|
- agent |
|
|
- vllm |
|
|
- cheapresearch |
|
|
datasets: |
|
|
- flashresearch/FlashResearch-DS-33k |
|
|
base_model: |
|
|
- Qwen/Qwen3-4B-Thinking-2507 |
|
|
model-index: |
|
|
- name: FlashResearch-4B-Thinking |
|
|
results: [] |
|
|
--- |
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|