	---
	language:
	- en
	license: apache-2.0
	base_model: Qwen/Qwen3-4B-Instruct-2507
	tags:
	- scientific-evaluation
	- citation-prediction
	- preference-learning
	- GRPO
	pipeline_tag: text-generation
	library_name: transformers
	---

	# SciJudge-Qwen3-4B

	SciJudge-Qwen3-4B is a fine-tuned language model for scientific paper evaluation. Given the metadata of two academic papers (title, abstract, publication date), it predicts which paper has the higher citation count. Citation count serves here as a proxy for research impact and "scientific taste."

	This model is part of the paper: [AI Can Learn Scientific Taste](https://arxiv.org/abs/2603.14473).

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "OpenMOSS-Team/SciJudge-4B"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16", device_map="auto")

	messages = [
	    {"role": "system", "content": "You are a helpful assistant. You first think about the reasoning process in your mind and then provide the user with the answer."},
	    {"role": "user", "content": "Today is 2025-12-10. Based on the titles, abstracts, and publication dates of the following two papers A and B, determine which paper has a higher citation count.\nShow your reasoning process in <reason> </reason> tags. And return the final answer in <answer> </answer> tags. The final answer should contain only 'A' or 'B'.\n\nPaper A:\nTitle: ...\nAbstract: ...\nDate: ...\n\nPaper B:\nTitle: ...\nAbstract: ...\nDate: ..."},
	]

	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
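	# do_sample=True ensures temperature/top_p/top_k take effect even if the loaded generation config defaults to greedy decoding.
	outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)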
	response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
	print(response)
	```
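
The placeholders in the prompt above can be filled programmatically, and the verdict can be pulled out of the `<answer>` tags. The following is a minimal sketch, not part of the released code: `Paper`, `build_user_prompt`, and `extract_answer` are hypothetical helper names, the date format is an assumption, and the template simply mirrors the user message shown in the usage example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    # Hypothetical container for the three metadata fields the prompt expects.
    title: str
    abstract: str
    date: str  # e.g. "2024-05-01"; the exact date format is an assumption


def build_user_prompt(today: str, paper_a: Paper, paper_b: Paper) -> str:
    """Fill the user-message template from the usage example."""
    def block(label: str, p: Paper) -> str:
        return f"Paper {label}:\nTitle: {p.title}\nAbstract: {p.abstract}\nDate: {p.date}"

    return (
        f"Today is {today}. Based on the titles, abstracts, and publication dates "
        "of the following two papers A and B, determine which paper has a higher "
        "citation count.\n"
        "Show your reasoning process in <reason> </reason> tags. And return the "
        "final answer in <answer> </answer> tags. The final answer should contain "
        "only 'A' or 'B'.\n\n"
        f"{block('A', paper_a)}\n\n{block('B', paper_b)}"
    )


def extract_answer(response: str) -> Optional[str]:
    """Return 'A' or 'B' from the last <answer> tag, or None if absent or malformed."""
    matches = re.findall(r"<answer>\s*([AB])\s*</answer>", response)
    return matches[-1] if matches else None
```

With these helpers, the user message becomes `build_user_prompt("2025-12-10", paper_a, paper_b)` and the decoded `response` can be passed to `extract_answer`. Because generation is sampled, a `None` result (no well-formed answer tag) is possible and should be handled by the caller, for example by resampling.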

	## Training Details

	- Base model: Qwen3-4B-Instruct-2507
	- Training method: GRPO (Group Relative Policy Optimization) with DAPO loss
	- Training data: 720,341 preference pairs from arXiv papers
	- Learning rate: 8e-7 (cosine schedule, 5% warmup)
	- Batch size: 8 per device × 64 GPUs × 2 gradient accumulation = 1024 effective
	- Optimizer: AdamW (β1=0.9, β2=0.95, weight decay=0.1)
	- Precision: bfloat16
	- KL coefficient (β): 0.03
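
The batch-size arithmetic and the learning-rate schedule listed above can be sketched as follows. This is an illustration only: the total step count (`total_steps`) is a hypothetical input not given in this model card, and linear warmup from zero followed by cosine decay to zero is an assumed shape, since the card states only "cosine schedule, 5% warmup."

```python
import math

# Values taken from the Training Details list.
PER_DEVICE_BATCH = 8
NUM_GPUS = 64
GRAD_ACCUM_STEPS = 2
PEAK_LR = 8e-7
WARMUP_FRACTION = 0.05


def effective_batch_size() -> int:
    """Per-device batch x number of GPUs x gradient accumulation steps."""
    return PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM_STEPS


def lr_at_step(step: int, total_steps: int) -> float:
    """Assumed schedule: linear warmup over the first 5% of steps, then cosine decay to 0."""
    warmup_steps = max(1, round(WARMUP_FRACTION * total_steps))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


print(effective_batch_size())  # 8 * 64 * 2 = 1024
```

Under these assumptions, a hypothetical run of 1,000 optimizer steps would warm up for 50 steps, reach the peak of 8e-7 at step 50, and decay toward zero by the final step.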

	## Citation

	```bibtex
	@article{scijudge2025,
	  title={AI Can Learn Scientific Taste},
	  year={2025}
	}
	```