---
language: en
license: apache-2.0
tags:
  - keyword-extraction
  - research-papers
  - t5
  - text-generation
  - academic
datasets:
  - custom
widget:
  - text: "extract keywords: Deep Learning for Computer Vision Applications"
    example_title: "Computer Vision Example"
  - text: "extract keywords: Quantum Machine Learning for Drug Discovery"
    example_title: "Quantum Computing Example"
  - text: "extract keywords: Blockchain Technology for Supply Chain Management"
    example_title: "Blockchain Example"
---

# Research Paper Keyword Extractor

## Model Description

This is a T5-small model fine-tuned to extract keywords from research paper titles. Given a title as input, it generates keywords that capture the paper's main topics, methodologies, and application domains.

## Training Data

- **Training Examples**: 35
- **Validation Examples**: 9
- **Data Sources**: Manually curated and synthetically generated examples
- **Domains Covered**: Computer Science, Healthcare, Physics, Engineering, Mathematics, Biology, and others

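The card does not describe how the 35 training and 9 validation examples are stored or split. Purely as an illustration, assuming each example is a title paired with a list of reference keywords, the sketch below turns a record into the source/target strings a T5 fine-tune would consume (reusing the `extract keywords: ` prefix from the usage code) and makes a reproducible 35/9 split of 44 records. The record layout, the helper names `to_seq2seq_pair` and `split_records`, and the seed are hypothetical.

```python
from __future__ import annotations

import random


def to_seq2seq_pair(title: str, keywords: list[str]) -> tuple[str, str]:
    """Map one hypothetical (title, keywords) record to T5 source/target strings."""
    source = f"extract keywords: {title}"
    target = ", ".join(keywords)
    return source, target


def split_records(records: list, n_val: int, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of the records and hold out n_val of them for validation."""
    shuffled = records[:]  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]


# Illustrative record built from the example predictions table (not real training data)
record = ("IoT and Edge Computing for Smart Cities",
          ["IoT", "Edge Computing", "Smart Cities"])
print(to_seq2seq_pair(*record))
# ('extract keywords: IoT and Edge Computing for Smart Cities', 'IoT, Edge Computing, Smart Cities')

# 44 placeholder records -> 35 training, 9 validation
placeholder = [(f"title {i}", ["kw"]) for i in range(44)]
train, val = split_records(placeholder, n_val=9)
print(len(train), len(val))  # 35 9
```

A fixed seed keeps the validation set stable across runs, which matters when it holds only 9 examples.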
## Training Configuration

- **Base Model**: t5-small
- **Epochs**: 3
- **Batch Size**: 2
- **Learning Rate**: 0.0005
- **Max Input Length**: 96 tokens
- **Max Output Length**: 48 tokens

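For a sense of scale: assuming a single device, no gradient accumulation, and that the last partial batch is kept (the default in the Hugging Face `Trainer`), 35 examples at batch size 2 give ceil(35 / 2) = 18 optimizer steps per epoch, so 3 epochs is 54 steps. The sketch below reproduces that arithmetic and records the listed hyperparameters under the matching `Seq2SeqTrainingArguments` field names. It is not the author's training script; the mapping and the constant names are assumptions.

```python
import math

# Hyperparameters from the list above, keyed by the corresponding
# transformers Seq2SeqTrainingArguments field names (assumed mapping).
training_args = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 2,
    "learning_rate": 5e-4,
}

# The length limits are tokenization settings rather than trainer arguments,
# e.g. tokenizer(text, truncation=True, max_length=MAX_INPUT_TOKENS).
MAX_INPUT_TOKENS = 96
MAX_TARGET_TOKENS = 48

NUM_TRAIN_EXAMPLES = 35


def total_optimizer_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """Steps on one device with no gradient accumulation; the last partial batch is kept."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * epochs


steps = total_optimizer_steps(
    NUM_TRAIN_EXAMPLES,
    training_args["per_device_train_batch_size"],
    training_args["num_train_epochs"],
)
print(steps)  # 54
```

With so few steps, each epoch passes over every training example only once, which is worth keeping in mind when reading the example predictions.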
## Usage

```python
# Requires: transformers, torch, and sentencepiece (needed by T5Tokenizer)
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = T5Tokenizer.from_pretrained("ZoeDuan/research-keyword-extractor")
model = T5ForConditionalGeneration.from_pretrained("ZoeDuan/research-keyword-extractor")

def extract_keywords(title):
    # Use the same task prefix as in training; truncate to the 96-token input limit
    input_text = f"extract keywords: {title}"
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=96)

    # Beam search with sampling (do_sample=True), so results can differ between runs;
    # drop do_sample and temperature for deterministic beam search.
    outputs = model.generate(
        **inputs,
        max_length=48,
        num_beams=4,
        no_repeat_ngram_size=2,
        early_stopping=True,
        do_sample=True,
        temperature=0.8,
    )

    keywords = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return keywords

# Example usage
title = "Machine Learning for Natural Language Processing Applications"
keywords = extract_keywords(title)
print(keywords)
# Example output (varies between runs because sampling is enabled):
# Machine Learning, Natural Language Processing, NLP, AI, Text Processing
```

## Example Predictions

| Input Title | Generated Keywords |
|-------------|--------------------|
| Deep Learning for Computer Vision Applications | Deep Learning, Computer Vision, Neural Networks, AI, Image Processing |
| Quantum Computing in Cryptography and Security | Quantum Computing, Cryptography, Security, Quantum Algorithms, Cybersecurity |
| IoT and Edge Computing for Smart Cities | IoT, Edge Computing, Smart Cities, Internet of Things, Urban Technology |

## Model Performance

The training data spans several research domains, and the model is intended to extract:
- **Technical methodologies** (e.g., Machine Learning, Deep Learning)
- **Application domains** (e.g., Healthcare, Finance)
- **Specific technologies** (e.g., Transformer, CNN, Blockchain)
- **Research areas** (e.g., Computer Vision, NLP)

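This section lists the kinds of keywords the model targets but gives no quantitative scores. One common way to measure keyword extraction, given a labelled test set, is set-based precision, recall, and F1 over case-normalized keywords. The sketch below assumes exact string matching after lowercasing; `keyword_prf` is a hypothetical helper, and the reference keywords in the example are invented for illustration, not part of any released evaluation.

```python
from __future__ import annotations


def keyword_prf(predicted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 with case-insensitive exact matching."""
    pred = {kw.strip().lower() for kw in predicted if kw.strip()}
    ref = {kw.strip().lower() for kw in reference if kw.strip()}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    hits = len(pred & ref)
    precision = hits / len(pred)
    recall = hits / len(ref)
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Prediction from the example table; the reference list is hypothetical.
predicted = ["Deep Learning", "Computer Vision", "Neural Networks", "AI", "Image Processing"]
reference = ["deep learning", "computer vision", "convolutional neural networks"]
p, r, f = keyword_prf(predicted, reference)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.40 R=0.67 F1=0.50
```

Exact matching is strict: "Neural Networks" gets no credit against "convolutional neural networks", so a real evaluation might also report a fuzzier match.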
## Limitations

- Intended for English-language research paper titles
- May perform poorly on highly specialized or emerging domains not covered by the training data
- Works best on titles of 5 to 15 words
- May occasionally generate overlapping or redundant keywords (for example, both "IoT" and "Internet of Things")

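Two of these limitations can be partly handled outside the model. Assuming the comma-separated output format seen in the example predictions, the sketch below drops case-insensitive duplicate keywords while keeping first-seen order, and flags titles outside the 5 to 15 word range. It cannot merge synonyms such as "IoT" and "Internet of Things"; that would need a separate alias list. The helper names are hypothetical.

```python
from __future__ import annotations


def dedupe_keywords(decoded: str) -> list[str]:
    """Split comma-separated output and drop case-insensitive repeats, keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for kw in (part.strip() for part in decoded.split(",")):
        key = kw.lower()
        if kw and key not in seen:
            seen.add(key)
            result.append(kw)
    return result


def title_in_recommended_range(title: str, low: int = 5, high: int = 15) -> bool:
    """True if the title's whitespace-separated word count lies within [low, high]."""
    return low <= len(title.split()) <= high


print(dedupe_keywords("Quantum Computing, Cryptography, quantum computing, Security"))
# ['Quantum Computing', 'Cryptography', 'Security']
print(title_in_recommended_range("IoT and Edge Computing for Smart Cities"))  # True
```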
## License

This model is released under the Apache 2.0 license.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{research-keyword-extractor,
  title={Research Paper Keyword Extractor},
  author={Zoe Duan},
  year={2025},
  url={https://huggingface.co/ZoeDuan/research-keyword-extractor}
}
```