---
tags:
- text-generation
- gpt2
- language-modeling
- academic
library_name: transformers
license: mit
datasets:
- arxiv
metrics:
- perplexity
---

# AcademicAbstractGenerator: DistilGPT2 Fine-tuned for Scientific Text

## 📑 Overview

This model is a fine-tuned version of **DistilGPT2**, optimized for generating short, high-quality, structurally consistent academic abstract drafts. It was trained exclusively on a corpus of arXiv abstracts, focusing on fields such as Computer Science and Physics.

## 🤖 Model Architecture

The model uses the **GPT-2** decoder-only transformer architecture; the distilled base's reduced size makes it fast and memory-efficient.

* **Base Model:** `distilgpt2` (a distilled, smaller version of GPT-2).
* **Architecture:** Decoder-only transformer stack.
* **Layers:** 6 transformer layers.
* **Task:** Causal language modeling (text generation).
* **Training Objective:** Minimizing perplexity on academic text, which helps the model capture the formal structure, technical vocabulary, and typical flow of scientific summaries (Introduction → Method → Result → Conclusion); a quick way to measure this yourself is sketched below.
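
Since perplexity is the reported metric, here is a minimal sketch of how to measure it for a causal LM with `transformers`. It uses the public `distilgpt2` checkpoint as a stand-in; swap in this model's repo id to evaluate the fine-tuned weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "distilgpt2" stands in for the fine-tuned checkpoint here.
model_id = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = (
    "We propose a novel attention mechanism for transformer models "
    "that significantly improves training efficiency."
)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean
    # cross-entropy loss over the shifted token predictions.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity = exp(average negative log-likelihood per token)
print(f"Perplexity: {torch.exp(loss).item():.2f}")
```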

## 🎯 Intended Use

This model is intended for:
1. **Drafting:** Assisting researchers in generating initial abstract drafts for new papers.
2. **Ideation:** Exploring potential research directions by prompting the model with a topic sentence.
3. **Educational Purposes:** Learning about generative model capabilities in a specialized domain.

## ⚠️ Limitations

* **Factuality:** The model is a text generator, not a knowledge base. Generated content may contain plausible-sounding but **factually incorrect** claims or results. **Human review is mandatory.**
* **Length:** Because of its base architecture and training data, it performs best on short sequences (under 256 tokens).
* **Repetition:** It may occasionally repeat boilerplate phrases common in academic writing; standard decoding controls can dampen this, as sketched after this list.
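
Repetition can usually be reduced at generation time rather than by retraining. A minimal sketch using the generic `repetition_penalty` and `no_repeat_ngram_size` options from `transformers` (shown on the base `distilgpt2` as a stand-in for this model):

```python
from transformers import pipeline, set_seed

set_seed(42)

# Base checkpoint used as a stand-in for the fine-tuned model.
generator = pipeline('text-generation', model='distilgpt2')

output = generator(
    "We propose a novel attention mechanism for transformer models.",
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,   # down-weight tokens that already appeared
    no_repeat_ngram_size=3,   # never repeat the same 3-gram verbatim
)

print(output[0]['generated_text'])
```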

## 💻 Example Code

Use the `TextGenerationPipeline` for drafting abstracts:

```python
from transformers import pipeline, set_seed

set_seed(42)

# Load the fine-tuned model and its tokenizer
generator = pipeline('text-generation', model='[YOUR_HF_USERNAME]/AcademicAbstractGenerator')

prompt = "We propose a novel attention mechanism for transformer models that significantly improves training efficiency."

# Generate up to 150 new tokens for the abstract draft
output = generator(
    prompt,
    max_new_tokens=150,
    num_return_sequences=1,
    temperature=0.7,
    do_sample=True,
    truncation=True
)

print(output[0]['generated_text'])
```
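
Lower `temperature` values (around 0.3–0.5) yield more conservative, predictable drafts, while higher values add variety at the cost of coherence; raising `num_return_sequences` lets you compare several candidate drafts for the same prompt.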