ft-Qwen3-8B-5epoch

Fine-tuned Qwen3-8B for research question generation and literature review synthesis, trained on academic corpora in MIS, Finance, and Economics

Overview

ft-Qwen3-8B-5epoch is a domain-specialized language model fine-tuned on a curated corpus of academic papers in Management Information Systems (MIS), Finance, and Economics. The model is optimized for:

generating high-quality research questions (RQs)
synthesizing literature-review-style texts
identifying gaps in academic literature
performing structured, coherent academic reasoning

This model follows the pipeline of the LLM Literature Review framework. It is trained on literature review sections from papers published in leading journals (e.g., QJE, JF, JFE, ISR), together with metadata of the papers they reference. It is designed for researchers who need automated support for early-stage idea discovery, project formulation, or academic writing.

Key Features

Specialized for social science academic writing: optimized on MIS/Finance/Econ paper metadata and literature reviews.
High-quality research question generation: the model is evaluated using RQSim, a metric measuring semantic similarity between generated and actual research questions in out-of-sample papers.
Literature gap identification: the model reads metadata of referenced papers and proposes research questions addressing missing links, unresolved tensions, and opportunities for future work.
64K context window: suitable for large batches of paper abstracts and metadata.
Deterministic, organized output: produces structured lists of RQs following common academic conventions.
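Because the model returns its research questions as a numbered list, downstream tools usually need to split that list into individual questions. Below is a minimal Python sketch, assuming items are numbered like `1.`, `2)`, or `3:`. The function name `parse_numbered_rqs` and the handling of wrapped continuation lines are illustrative assumptions, not part of the model's tooling, and the sample output is made up. Note that output is only reproducible from run to run when decoding itself is deterministic (for example, greedy decoding without sampling).

```python
import re

# Matches lines such as "1. ...", "2) ...", or "3: ..." (assumed list styles).
_NUMBERED_LINE = re.compile(r"^\s*(\d+)\s*[.)\-:]\s*(.+?)\s*$")


def parse_numbered_rqs(text: str) -> list[str]:
    """Extract research questions from a numbered list in model output.

    Non-empty lines that do not start with a number are treated as wrapped
    continuations of the previous question (an assumption).
    """
    questions: list[str] = []
    for line in text.splitlines():
        match = _NUMBERED_LINE.match(line)
        if match:
            questions.append(match.group(2))
        elif questions and line.strip():
            # Join wrapped text onto the most recent question.
            questions[-1] = f"{questions[-1]} {line.strip()}"
    return questions


# Made-up sample output, for illustration only.
output = """1. How does platform openness affect developer entry?
2. Do algorithmic credit scores reduce
   lending discrimination?
"""
print(parse_numbered_rqs(output))
```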

Training Data

Training data consist of the following components for each academic paper:

Instruction prompt
The paper’s literature review section
The paper’s research question
Metadata of referenced papers:
- title
- abstract
- author names
- publication year
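These components map naturally onto one nested record per paper. Below is a minimal Python sketch, assuming each record is stored as a JSON line. The class and field names (`TrainingExample`, `ReferencedPaper`, `to_record`) are hypothetical, and the card does not specify how these fields are combined into the prompt and training target during fine-tuning.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReferencedPaper:
    """Metadata of one referenced paper (title, abstract, authors, year)."""
    title: str
    abstract: str
    authors: list[str]
    year: int


@dataclass
class TrainingExample:
    """One record per academic paper; field names are hypothetical."""
    instruction: str
    literature_review: str
    research_question: str
    references: list[ReferencedPaper] = field(default_factory=list)

    def to_record(self) -> dict:
        # asdict() also converts the nested ReferencedPaper objects to dicts.
        return asdict(self)


example = TrainingExample(
    instruction="Using this Literature, identify gaps and propose research questions.",
    literature_review="(literature review section of the paper)",
    research_question="(research question of the paper)",
    references=[ReferencedPaper("Paper A", "Abstract of A", ["A. Author"], 2015)],
)
# One JSON line per paper, e.g. for a JSONL training file.
print(json.dumps(example.to_record()))
```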

All metadata are collected through the Semantic Scholar API. Pre-2020 papers are used for training and post-2020 papers for evaluation.
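Below is a minimal Python sketch of this collection and split step, using only the standard library. It assumes the public Semantic Scholar Graph API `paper/{paper_id}/references` endpoint with a `fields` parameter, where each returned item carries a `citedPaper` object. The helper names, the handling of missing abstracts, and the choice to drop papers from 2020 itself are assumptions; the card says only "pre-2020" and "post-2020". This is not the authors' pipeline.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.semanticscholar.org/graph/v1"
FIELDS = "title,abstract,authors,year"  # the metadata fields listed in the card


def references_url(paper_id: str) -> str:
    """Build the Graph API URL that lists a paper's references."""
    query = urllib.parse.urlencode({"fields": FIELDS})
    return f"{API_BASE}/paper/{paper_id}/references?{query}"


def fetch_references(paper_id: str) -> dict:
    """Network call; unauthenticated requests may be rate-limited."""
    with urllib.request.urlopen(references_url(paper_id), timeout=30) as resp:
        return json.load(resp)


def parse_references(payload: dict) -> list[dict]:
    """Flatten a references response into title/abstract/authors/year dicts."""
    papers = []
    for item in payload.get("data", []):
        cited = item.get("citedPaper") or {}
        if not cited.get("title"):
            continue  # skip references Semantic Scholar could not resolve
        papers.append({
            "title": cited["title"],
            "abstract": cited.get("abstract") or "",  # abstracts can be missing
            "authors": [a.get("name", "") for a in cited.get("authors") or []],
            "year": cited.get("year"),
        })
    return papers


def split_by_year(papers: list[dict], cutoff: int = 2020) -> tuple[list[dict], list[dict]]:
    """Split focal papers: before the cutoff -> train, after it -> evaluation.

    Papers from the cutoff year itself, or without a year, are dropped
    (assumption, since the card does not say where they belong).
    """
    dated = [p for p in papers if p.get("year") is not None]
    train = [p for p in dated if p["year"] < cutoff]
    evaluation = [p for p in dated if p["year"] > cutoff]
    return train, evaluation
```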

Evaluation & RQSim Benchmark

To evaluate research question generation, we use RQSim, a cosine-similarity-based benchmark that compares generated questions with the actual RQs of out-of-sample papers.

A higher RQSim score indicates closer semantic agreement with the real research questions.

On this benchmark, the fine-tuned Qwen3-8B achieves significantly higher RQSim scores than foundation models without domain adaptation.
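The core of a cosine-similarity benchmark like RQSim can be sketched in a few lines of Python. The card does not name the embedding model or say how several generated questions are scored against one actual RQ, so this sketch takes precomputed embedding vectors as input and treats the aggregation (`"max"` or `"mean"`) as a labeled assumption. The function `rq_sim` is hypothetical, not the benchmark's reference implementation.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors: u.v / (|u| |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rq_sim(generated: Sequence[Sequence[float]], actual: Sequence[float],
           agg: str = "max") -> float:
    """Hypothetical RQSim: compare generated-RQ embeddings with the actual RQ.

    The aggregation over multiple generated questions is an assumption;
    the card does not specify it.
    """
    scores = [cosine_similarity(g, actual) for g in generated]
    if not scores:
        raise ValueError("need at least one generated question")
    if agg == "max":
        return max(scores)
    if agg == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown aggregation: {agg}")
```

In practice the vectors would come from a sentence-embedding model applied to the generated and actual question texts. Whichever model is chosen, the same one must embed both sides for the scores to be comparable.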

Intended Use

Recommended Use Cases

Generating research questions from a set of paper abstracts
Producing literature review drafts
Identifying gaps, inconsistencies, and open problems
Supporting academic workflows in MIS / Finance / Economics
Assisting early-stage idea generation and project formulation

Example Prompt

The following is a list of academic papers' metadata. We call this list 'Literature'.
Each paper includes: title, abstract, authors, year.

Using this Literature, identify gaps and propose research questions.
Your output should only contain numbered research questions.

Literature:
1. [Title] by [Authors] ([Year]) - Abstract: [...]
2. ...
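Below is a minimal Python sketch that fills this prompt from paper metadata and runs the model with Hugging Face `transformers`. It assumes the model is prompted with the plain-text template above (no chat template), uses greedy decoding, and loads the BF16 weights with `device_map="auto"`, which requires the `accelerate` package. The helper names `build_prompt` and `generate_rqs` are hypothetical.

```python
PROMPT_HEADER = (
    "The following is a list of academic papers' metadata. We call this list 'Literature'.\n"
    "Each paper includes: title, abstract, authors, year.\n"
    "\n"
    "Using this Literature, identify gaps and propose research questions.\n"
    "Your output should only contain numbered research questions.\n"
    "\n"
    "Literature:\n"
)


def build_prompt(papers: list[dict]) -> str:
    """Fill the example prompt with title/authors/year/abstract dicts."""
    lines = []
    for i, p in enumerate(papers, start=1):
        authors = ", ".join(p["authors"])
        lines.append(f"{i}. {p['title']} by {authors} ({p['year']}) - Abstract: {p['abstract']}")
    return PROMPT_HEADER + "\n".join(lines)


def generate_rqs(prompt: str, model_id: str = "ScorpioDai/ft-Qwen3-8B-5epoch",
                 max_new_tokens: int = 512) -> str:
    """Greedy generation (needs torch, transformers, and accelerate installed)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage: `print(generate_rqs(build_prompt(papers)))`, where `papers` is a list of dicts with `title`, `authors`, `year`, and `abstract` keys. The first call downloads the full model weights.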

Limitations

Has no access to full paper texts; it uses only metadata and literature reviews.
Specialized for social science domains; performance in unrelated fields may degrade.
Does not yet include a reinforcement learning stage (PPO); this checkpoint comes from the supervised fine-tuning stage only.


Model Details

Repository: ScorpioDai/ft-Qwen3-8B-5epoch
Base model: Qwen/Qwen3-8B-Base
Model size: 8B params
Tensor type: BF16 (Safetensors)
