ft-Qwen3-8B-5epoch

Fine-tuned Qwen3-8B for Research Question Generation & Literature Review Synthesis Based on MIS / Finance / Economics academic corpora

Overview

ft-Qwen3-8B-5epoch is a domain-specialized language model fine-tuned on a curated corpus of academic papers in Management Information Systems (MIS), Finance, and Economics. The model is optimized for:

  • generating high-quality research questions (RQ)
  • synthesizing literature-review-style texts
  • identifying gaps in academic literature
  • performing structured, coherent academic reasoning

This model follows the pipeline used in the LLM Literature Review framework and is trained with professional paper metadata and literature review sections collected from high-quality journals (e.g., QJE, JF, JFE, ISR). It is designed for researchers who need automated support for early-stage idea discovery, project formulation, or academic writing.


Key Features

  • Specialized for Social Science Academic Writing Optimized on MIS/Finance/Econ paper metadata and literature reviews.

  • High-quality Research Question Generation Model is evaluated using RQSim, a metric measuring semantic similarity between generated and actual research questions in out-of-sample papers.

  • Literature Gap Identification The model reads metadata of referenced papers and proposes research questions addressing missing links, unresolved tensions, and opportunities for future work.

  • 64K Context Window Suitable for large batches of paper abstracts/metadata.

  • Deterministic, Organized Output Produces structured lists of RQs following common academic conventions.


Training Data

Training data consist of the following components for each academic paper:

  • Instruction prompt

  • The paper’s literature review section

  • The paper’s research question

  • Metadata of referenced papers:

    • title
    • abstract
    • author names
    • publication year

All metadata are collected through Semantic Scholar’s API. The dataset covers pre-2020 papers for training and post-2020 papers for evaluation.


Evaluation & RQSim Benchmark

To evaluate the model’s ability in research question generation, we use RQSim, a cosine-similarity-based benchmark comparing generated questions with actual RQs from out-of-sample papers.

Higher RQSim indicates higher semantic convergence with real academic research questions.

The benchmark shows that the fine-tuned Qwen3-8B significantly outperforms foundation models without domain adaptation.


Intended Use

Recommended Use Cases

  • Generating research questions from a set of paper abstracts
  • Producing literature review drafts
  • Identifying gaps, inconsistencies, and open problems
  • Supporting academic workflow in MIS / Finance / Economics
  • Assisting early-stage idea generation and project formulation

Example Prompt

The following is a list of academic papers' metadata. We call this list 'Literature'.
Each paper includes: title, abstract, authors, year.

Using this Literature, identify gaps and propose research questions.
Your output should only contain numbered research questions.

Literature:
1. [Title] by [Authors] ([Year]) - Abstract: [...]
2. ...

Limitations

  • Does not access full paper texts, only metadata and literature reviews.
  • Specialized for social science domains; performance in unrelated fields may degrade.
  • Does not include reinforcement learning stage (PPO) yet; this model is from the supervised fine-tuning stage only.

Downloads last month
2
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ScorpioDai/ft-Qwen3-8B-5epoch

Finetuned
(376)
this model