---
tags:
- text-generation
- gpt2
- language-modeling
- academic
library_name: transformers
license: mit
datasets:
- arxiv
metrics:
- perplexity
---

# AcademicAbstractGenerator: DistilGPT2 Fine-tuned for Scientific Text

## 📑 Overview

This model is a fine-tuned version of **DistilGPT2**, optimized for generating short, high-quality, and structurally consistent academic abstract drafts. It was trained exclusively on a corpus of arXiv abstracts, focused on fields such as Computer Science and Physics.

## 🤖 Model Architecture

The model uses the **GPT-2** decoder-only transformer architecture; the distilled base's reduced size makes it fast and efficient to run.

* **Base Model:** `distilgpt2` (a distilled, smaller version of GPT-2).
* **Architecture:** Decoder-only transformer stack.
* **Layers:** 6 transformer layers.
* **Task:** Causal Language Modeling (Text Generation).
* **Training Objective:** Minimizing perplexity on academic text, enabling the model to better capture the formal structure, complex vocabulary, and typical flow of scientific summaries (Introduction -> Method -> Result -> Conclusion); a perplexity-check sketch follows this list.

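As a minimal sketch of how that objective translates to evaluation, assuming the placeholder repository ID is replaced with the real one, the snippet below computes perplexity on a single illustrative passage:

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID -- substitute your actual model name.
model_id = "[YOUR_HF_USERNAME]/AcademicAbstractGenerator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Any short academic passage works; this sentence is illustrative.
text = "We study the convergence of stochastic gradient descent under mild smoothness assumptions."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity is the exponential of the average negative log-likelihood.
print(f"Perplexity: {math.exp(loss.item()):.2f}")
```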

## 🎯 Intended Use

This model is intended for:
1. **Drafting:** Assisting researchers in generating initial abstract drafts for new papers.
2. **Ideation:** Exploring potential research directions by prompting the model with a topic sentence.
3. **Educational Purposes:** Learning about generative model capabilities in a specialized domain.

## ⚠️ Limitations

* **Factuality:** The model is a text generator, not a knowledge base. Generated content may contain plausible-sounding but **factually incorrect** claims or results. **Human review is mandatory.**
* **Length:** Due to its base architecture and training data, it performs best on short sequences (under 256 tokens).
* **Overfitting:** May occasionally repeat boilerplate phrases common in academic writing; the sketch after this list shows one way to dampen that.

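If repeated boilerplate shows up in practice, generation-time penalties usually help. The values below are illustrative starting points rather than tuned settings, and the repository ID is again a placeholder:

```python
from transformers import pipeline, set_seed

set_seed(42)

# Placeholder repository ID -- substitute your actual model name.
generator = pipeline("text-generation", model="[YOUR_HF_USERNAME]/AcademicAbstractGenerator")

output = generator(
    "In this paper, we present a method for",
    max_length=150,
    do_sample=True,
    temperature=0.7,
    no_repeat_ngram_size=3,  # forbid any 3-gram from repeating verbatim
    repetition_penalty=1.2,  # mildly down-weight already-generated tokens
    truncation=True,
)
print(output[0]["generated_text"])
```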

## 💻 Example Code

Use the `TextGenerationPipeline` for drafting abstracts:

```python
from transformers import pipeline, set_seed

set_seed(42)

# Load the model and tokenizer
generator = pipeline('text-generation', model='[YOUR_HF_USERNAME]/AcademicAbstractGenerator')

prompt = "We propose a novel attention mechanism for transformer models that significantly improves training efficiency."

# Generate a draft of up to 150 tokens (max_length counts the prompt too)
output = generator(
    prompt,
    max_length=150,
    num_return_sequences=1,
    temperature=0.7,
    do_sample=True,
    truncation=True
)

print(output[0]['generated_text'])
```
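
Since `max_length` includes the prompt tokens, you may prefer to bound only the continuation, or need finer control than the pipeline exposes. A direct `generate` call is one option; the sketch below makes the usual assumptions (placeholder repository ID, GPT-2's missing pad token mapped to EOS):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID -- substitute your actual model name.
model_id = "[YOUR_HF_USERNAME]/AcademicAbstractGenerator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "We propose a novel attention mechanism for transformer models that significantly improves training efficiency."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    generated = model.generate(
        **inputs,
        max_new_tokens=150,  # bounds only the newly generated tokens
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
    )

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```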