---
tags:
- text-generation
- gpt2
- language-modeling
- academic
library_name: transformers
license: mit
datasets:
- arxiv
metrics:
- perplexity
---

# AcademicAbstractGenerator: DistilGPT2 Fine-tuned for Scientific Text

## 📖 Overview

This model is a fine-tuned version of **DistilGPT2**, optimized for generating short, high-quality, and structurally consistent academic abstract drafts. It was trained exclusively on a corpus of arXiv abstracts, primarily from Computer Science and Physics.

## 🤖 Model Architecture

The model uses the **GPT-2** decoder-only transformer architecture; the distilled base model's reduced size makes it fast and memory-efficient at inference time.

* **Base Model:** `distilgpt2` (a distilled, smaller version of GPT-2).
* **Architecture:** Decoder-only transformer stack.
* **Layers:** 6 transformer layers (see the configuration sketch after this list).
* **Task:** Causal language modeling (text generation).
* **Training Objective:** Minimizing perplexity on academic text, enabling the model to better capture formal structure, technical vocabulary, and the typical flow of scientific summaries (Introduction → Method → Result → Conclusion).
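
To verify these architecture details, you can read them straight off the base configuration. A minimal sketch, using only the public `distilgpt2` checkpoint (substitute your fine-tuned repo id to inspect the uploaded model):

```python
from transformers import AutoConfig

# Configuration of the distilgpt2 base model this checkpoint inherits.
config = AutoConfig.from_pretrained("distilgpt2")

print(config.n_layer)      # 6 transformer layers
print(config.n_head)       # 12 attention heads per layer
print(config.n_embd)       # 768-dimensional hidden states
print(config.n_positions)  # 1024-token maximum context
```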

## 🎯 Intended Use

This model is intended for:

1. **Drafting:** Assisting researchers in generating initial abstract drafts for new papers.
2. **Ideation:** Exploring potential research directions by prompting the model with a topic sentence.
3. **Educational Purposes:** Learning about generative model capabilities in a specialized domain.

## ⚠️ Limitations

* **Factuality:** The model is a text generator, not a knowledge base. Generated content may contain plausible-sounding but **factually incorrect** claims or results. **Human review is mandatory.**
* **Length:** Due to its base architecture and training data, it performs best on short sequences (under 256 tokens); see the sketch after this list for one way to keep prompts within that budget.
* **Overfitting:** It may occasionally repeat boilerplate phrases common in academic writing.
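
Because quality drops on longer sequences, it can help to pre-truncate prompts before generation. A minimal sketch, assuming the tokenizer is shared with `distilgpt2` and using the 256-token budget noted above:

```python
from transformers import AutoTokenizer

# The fine-tuned model shares its tokenizer with the distilgpt2 base.
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

prompt = "We propose a novel attention mechanism for transformer models."

# Truncate so the prompt stays within the model's short-sequence sweet spot.
encoded = tokenizer(prompt, truncation=True, max_length=256)
print(len(encoded["input_ids"]), "prompt tokens after truncation")
```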

## 💻 Example Code

Use the `TextGenerationPipeline` for drafting abstracts:

```python
from transformers import pipeline, set_seed

set_seed(42)

# Load the fine-tuned model and its tokenizer
generator = pipeline('text-generation', model='[YOUR_HF_USERNAME]/AcademicAbstractGenerator')

prompt = "We propose a novel attention mechanism for transformer models that significantly improves training efficiency."

# Generate up to 150 new tokens for the abstract draft
output = generator(
    prompt,
    max_new_tokens=150,
    num_return_sequences=1,
    temperature=0.7,
    do_sample=True,
    truncation=True
)

print(output[0]['generated_text'])
```
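
Sampling with `do_sample=True` and a moderate `temperature` trades determinism for variety; if you want only the generated continuation without the prompt echoed back, pass `return_full_text=False` to the pipeline call.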