ft-Qwen3-8B-5epoch
Fine-tuned Qwen3-8B for research question generation and literature review synthesis, trained on MIS / Finance / Economics academic corpora
Overview
ft-Qwen3-8B-5epoch is a domain-specialized language model fine-tuned on a curated corpus of academic papers in Management Information Systems (MIS), Finance, and Economics. The model is optimized for:
- generating high-quality research questions (RQ)
- synthesizing literature-review-style texts
- identifying gaps in academic literature
- performing structured, coherent academic reasoning
This model follows the pipeline used in the LLM Literature Review framework and is trained on paper metadata and literature review sections collected from high-quality journals (e.g., QJE, JF, JFE, ISR). It is designed for researchers who need automated support for early-stage idea discovery, project formulation, or academic writing.
Key Features
- Specialized for Social Science Academic Writing: Optimized on MIS/Finance/Econ paper metadata and literature reviews.
- High-quality Research Question Generation: The model is evaluated using RQSim, a metric measuring semantic similarity between generated and actual research questions in out-of-sample papers.
- Literature Gap Identification: The model reads metadata of referenced papers and proposes research questions addressing missing links, unresolved tensions, and opportunities for future work.
- 64K Context Window: Suitable for large batches of paper abstracts/metadata.
- Deterministic, Organized Output: Produces structured lists of RQs following common academic conventions.
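Because the output is a numbered list of research questions, downstream code usually needs to split it back into individual items. The following is a minimal sketch of such a parser; the function name `parse_numbered_questions` and the accepted marker styles (`1.`, `2)`, `(3)`) are assumptions for illustration, not a documented output format of this model.

```python
import re

# Matches list markers such as "1.", "2)" or "(3)" at the start of a line.
# The accepted marker styles are an assumption, not a documented format.
_ITEM_START = re.compile(r"^\s*\(?(\d+)[.)]\s+(.*)$")


def parse_numbered_questions(text: str) -> list[str]:
    """Split model output into individual research questions."""
    questions: list[str] = []
    for line in text.splitlines():
        match = _ITEM_START.match(line)
        if match:
            # A new numbered item starts on this line.
            questions.append(match.group(2).strip())
        elif questions and line.strip():
            # Non-empty line without a marker: treat it as a wrapped
            # continuation of the previous item.
            questions[-1] += " " + line.strip()
    return questions
```

Text that appears before the first numbered item is ignored, so a stray preamble does not end up in the list.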
Training Data
Training data consist of the following components for each academic paper:
- Instruction prompt
- The paper’s literature review section
- The paper’s research question
- Metadata of referenced papers:
  - title
  - abstract
  - author names
  - publication year
All metadata are collected through Semantic Scholar’s API. The dataset covers pre-2020 papers for training and post-2020 papers for evaluation.
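One way to picture a single training record and the temporal split is the sketch below. The class names, field names, and the `paper_year` field are hypothetical and chosen only for illustration; the actual dataset schema is not described here. The source also does not say which side of the split papers from 2020 itself fall on, so the cutoff is a parameter and this sketch assumes 2020 goes to evaluation.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencedPaper:
    """Metadata for one referenced paper, mirroring the four listed fields."""
    title: str
    abstract: str
    authors: list[str]
    year: int


@dataclass
class TrainingExample:
    """One training record for a single academic paper (hypothetical schema)."""
    instruction: str          # instruction prompt
    literature_review: str    # the paper's literature review section
    research_question: str    # the paper's research question
    references: list[ReferencedPaper] = field(default_factory=list)
    paper_year: int = 0       # hypothetical field, used only for the split


def split_by_year(examples, cutoff=2020):
    """Temporal split: papers before `cutoff` train, the rest evaluate.

    Assumption: papers published in the cutoff year itself are placed
    in the evaluation set.
    """
    train = [ex for ex in examples if ex.paper_year < cutoff]
    evaluation = [ex for ex in examples if ex.paper_year >= cutoff]
    return train, evaluation
```

Splitting by publication year rather than at random keeps evaluation papers strictly later than training papers, which is what makes the evaluation out-of-sample in time.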
Evaluation & RQSim Benchmark
To evaluate the model’s ability to generate research questions, we use RQSim, a cosine-similarity-based benchmark that compares generated questions with the actual RQs of out-of-sample papers.
Higher RQSim indicates higher semantic convergence with real academic research questions.
The benchmark shows that the fine-tuned Qwen3-8B significantly outperforms foundation models without domain adaptation.
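The cosine-similarity comparison behind a metric like RQSim can be sketched as follows. This is not the benchmark's actual implementation: the embedding model and the way per-question scores are aggregated are not specified here, so `embed` is a toy bag-of-words stand-in for a real sentence encoder, and `score_questions` is a hypothetical helper that simply returns one score per generated question.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with an empty text as zero
    return dot / (norm_a * norm_b)


def score_questions(generated: list[str], actual: str) -> list[float]:
    """Similarity of each generated question to the paper's actual RQ."""
    target = embed(actual)
    return [cosine_similarity(embed(q), target) for q in generated]
```

With a real encoder, only `embed` would change; the cosine computation stays the same for dense vectors once the dot product and norms are taken over all dimensions.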
Intended Use
Recommended Use Cases
- Generating research questions from a set of paper abstracts
- Producing literature review drafts
- Identifying gaps, inconsistencies, and open problems
- Supporting academic workflow in MIS / Finance / Economics
- Assisting early-stage idea generation and project formulation
Example Prompt
The following is a list of academic papers' metadata. We call this list 'Literature'.
Each paper includes: title, abstract, authors, year.
Using this Literature, identify gaps and propose research questions.
Your output should only contain numbered research questions.
Literature:
1. [Title] by [Authors] ([Year]) - Abstract: [...]
2. ...
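The prompt layout above can be assembled programmatically from a list of paper metadata. The sketch below is illustrative only: the function name `build_prompt`, the dictionary keys, and joining author names with commas are assumptions, while the instruction text and the per-paper line format are copied from the example prompt.

```python
PROMPT_HEADER = (
    "The following is a list of academic papers' metadata. "
    "We call this list 'Literature'.\n"
    "Each paper includes: title, abstract, authors, year.\n"
    "Using this Literature, identify gaps and propose research questions.\n"
    "Your output should only contain numbered research questions.\n"
    "Literature:\n"
)


def build_prompt(papers: list[dict]) -> str:
    """Render paper metadata into the example prompt layout.

    Each dict is assumed to carry the keys "title", "abstract",
    "authors" (a list of names), and "year".
    """
    lines = []
    for index, paper in enumerate(papers, start=1):
        authors = ", ".join(paper["authors"])  # joiner is an assumption
        lines.append(
            f"{index}. {paper['title']} by {authors} ({paper['year']}) "
            f"- Abstract: {paper['abstract']}"
        )
    return PROMPT_HEADER + "\n".join(lines)
```

The returned string can then be passed to whichever inference stack hosts the model.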
Limitations
- Does not access full paper texts, only metadata and literature reviews.
- Specialized for social science domains; performance in unrelated fields may degrade.
- Does not yet include a reinforcement learning stage (PPO); this model comes from the supervised fine-tuning stage only.
Model tree for ScorpioDai/ft-Qwen3-8B-5epoch
Base model
Qwen/Qwen3-8B-Base