NextGenC committed
Commit 5ed226f · verified · 1 Parent(s): 64b5d29

Update README.md

Files changed (1)
  1. README.md +68 -0
README.md CHANGED
@@ -118,6 +118,74 @@ The system is modular, consisting of several Python components:
  - **Visualization: Customize graph appearance in src/visualization/plotting.py.**
  - **Data Storage: Modify src/data_management/storage.py to use different formats or databases.**
 
+ ## 🚧 Limitations
+
+ - **Language**
+ Optimized for English. Performance may degrade significantly on other languages.
+
+ - **Domain Specificity**
+ Achieves best results in AI/ML domains. Adaptation (e.g., domain-specific rules or keywords) is required for other fields.
+
+ - **PDF Quality**
+ Heavily reliant on clean text extraction. Scanned PDFs, complex layouts, or poor OCR significantly reduce accuracy.
+
+ - **Scalability**
+ Processing very large corpora (e.g., >10,000 papers) may require significant computational resources or distributed infrastructure.
+
+ - **Relationship Nuance**
+ Relationships are extracted based on co-occurrence and semantic similarity. Logical or causal connections may not be captured.
+
+ - **Temporal Accuracy**
+ Depends on accurate publication date extraction from metadata or filenames. Errors may affect timeline analysis.
+
+ - **Visualization Clutter**
+ Interactive graph visualizations become cluttered and less interpretable when node count exceeds ~1000.
+
+ ---
+
+ ## 🌱 Future Work
+
+ - **Multi-language Support**
+ Integration of multilingual NLP models to support non-English documents.
+
+ - **Citation Integration**
+ Incorporating citation links and citation graph data into network analysis.
+
+ - **ML-based Extraction**
+ Training supervised or semi-supervised models to improve concept and relation extraction quality.
+
+ - **Advanced Visualizations**
+ Implementation of timeline views, dashboards, and alternative graph layouts (e.g., hierarchical, clustered).
+
+ - **Improved Temporal Modeling**
+ Use of advanced time-series techniques to detect emerging trends and historical shifts.
+
+ - **Web Interface**
+ A user-friendly UI for uploading documents, viewing visualizations, and downloading results.
+
+ - **Knowledge Graph Export**
+ Export capabilities for standard knowledge graph formats like RDF, OWL, or JSON-LD.
+
+ - **Concept Disambiguation**
+ Methods to differentiate between identically named but contextually distinct concepts.
+
+ ---
+
+ ## 📋 Citation
+
+ If you use **ChronoSense** in your research or projects, please cite the following:
+
+ ```bibtex
+ @software{chronosense2025,
+   author = {Abdullah Kocaman (Zayn)},
+   title = {ChronoSense: Scientific Concept Analysis and Visualization System},
+   year = {2025},
+   version = {1.0},
+   url = {https://huggingface.co/NextGenC/ChronoSense},
+   note = {A system for extracting, analyzing, and visualizing concepts and trends from scientific documents using NLP and Network Analysis}
+ }
+ ```
+
  ## 📁 Project Structure
 
  ```bash