NextGenC
/

ChronoSense

@@ -1,179 +1,126 @@
 license: mit
 language:
-  - en
-  - tr
-library_name: spacy
 tags:
-  - scientific-text-analysis
-  - concept-extraction
-  - network-analysis
-  - natural-language-processing
-  - knowledge-graphs
-  - temporal-analysis
-  - spacy
-  - networkx
-  - sentence-transformers
-  - pyvis
-pipeline_tag: feature-extraction # or text-classification, token-classification etc.
 datasets:
-  - scientific-papers # Be more specific if possible, e.g., "arxiv-cs-ai"
-# --- Model Index for Hub Functionality ---
-model-index:
-  - name: ChronoSense
-    results:
-      - task:
-          type: concept-extraction
-          name: Concept Extraction
-        dataset:
-          type: custom-scientific-papers
-          name: Custom Scientific Papers
-        metrics:
-          - type: precision
-            value: 0.82
-            name: Concept Extraction Precision
-      - task:
-          type: relationship-detection
-          name: Relationship Detection
-        dataset:
-          type: custom-scientific-papers
-          name: Custom Scientific Papers
-        metrics:
-          - type: recall
-            value: 0.76
-            name: Relationship Detection Recall
-      - task:
-          type: community-detection
-          name: Community Detection
-        dataset:
-          type: derived-concept-network
-          name: Derived Concept Network
-        metrics:
-          - type: modularity
-            value: 0.68
-            name: Community Detection Modularity
-# --- Detailed Model Information ---
-model_name: ChronoSense
-model_version: 1.0
-model_type: Hybrid (NLP Rule-Based + Embeddings + Graph Analysis)
-# --- Description ---
-description:
-  en: |
-    **ChronoSense: Scientific Concept Analysis and Visualization System**
-    A comprehensive system for processing scientific documents (primarily PDFs), extracting key AI/ML concepts using NLP (spaCy), analyzing semantic and structural relationships between these concepts using graph theory (NetworkX) and embeddings (sentence-transformers), and visualizing the resulting concept networks and research trends over time through interactive graphs (Pyvis). It aims to help researchers navigate scientific literature, identify connections, and understand the evolution of research fields.
-  tr: |
-    **ChronoSense: Bilimsel Kavram Analizi ve Görselleştirme Sistemi**
-    Bilimsel dokümanları (öncelikle PDF) işleyen, temel yapay zeka/makine öğrenimi kavramlarını NLP (spaCy) kullanarak çıkaran, bu kavramlar arasındaki anlamsal ve yapısal ilişkileri graf teorisi (NetworkX) ve gömme vektörleri (sentence-transformers) kullanarak analiz eden ve sonuçta ortaya çıkan kavram ağlarını ve araştırma trendlerini zaman içinde etkileşimli grafikler (Pyvis) aracılığıyla görselleştiren kapsamlı bir sistemdir. Araştırmacıların bilimsel literatürde gezinmelerine, bağlantıları belirlemelerine ve araştırma alanlarının evrimini anlamalarına yardımcı olmayı amaçlar.
-# --- Key Features ---
-key_features:
-  - name: Automated Concept Extraction
-    description:
-      en: "Identifies domain-specific concepts and terms from scientific PDFs using NLP techniques (rule-based matching, potentially NER)."
-      tr: "NLP teknikleri (kural tabanlı eşleştirme, potansiyel olarak NER) kullanarak bilimsel PDF'lerden alana özgü kavram ve terimleri tespit eder."
-  - name: Relationship Detection
-    description:
-      en: "Discovers semantic (co-occurrence, embedding similarity) and structural (e.g., section co-location) relationships between scientific concepts."
-      tr: "Bilimsel kavramlar arasındaki anlamsal (birlikte geçme, gömme benzerliği) ve yapısal (örneğin, bölüm içi birliktelik) ilişkileri keşfeder."
-  - name: Network Analysis
-    description:
-      en: "Builds concept networks and calculates centrality metrics (degree, betweenness, eigenvector) and performs community detection (e.g., Louvain) to find clusters of related concepts."
-      tr: "Kavram ağları oluşturur ve merkeziyet metrikleri (derece, arasındalık, özvektör) hesaplar ve ilgili kavram kümelerini bulmak için topluluk tespiti (örneğin, Louvain) gerçekleştirir."
-  - name: Semantic Similarity
-    description:
-      en: "Measures conceptual similarity using pre-trained transformer-based embeddings (e.g., from sentence-transformers library)."
-      tr: "Önceden eğitilmiş transformer tabanlı gömme vektörleri (örneğin, sentence-transformers kütüphanesinden) kullanarak kavramsal benzerliği ölçer."
-  - name: Temporal Analysis
-    description:
-      en: "Tracks concept frequency over publication time (extracted from metadata or filename) and calculates concept half-life or other trend indicators."
-      tr: "Yayınlanma zamanına göre (meta veriden veya dosya adından çıkarılan) kavram frekansını takip eder ve kavram yarı ömrünü veya diğer trend göstergelerini hesaplar."
-  - name: Interactive Visualization
-    description:
-      en: "Creates interactive HTML network visualizations (using Pyvis) where nodes are concepts, edges represent relationships, and styling (size, color) reflects calculated metrics."
-      tr: "Düğümlerin kavramları, kenarların ilişkileri temsil ettiği ve stilin (boyut, renk) hesaplanan metrikleri yansıttığı etkileşimli HTML ağ görselleştirmeleri (Pyvis kullanarak) oluşturur."
-# --- Technical Components ---
-technical_components:
-  - name: Document Processor
-    library: PyPDF2 / pdfminer.six (or similar)
-    description:
-      en: "Extracts text content and potentially metadata (like publication year) from PDF documents."
-      tr: "PDF belgelerinden metin içeriğini ve potansiyel olarak meta verileri (yayın yılı gibi) çıkarır."
-  - name: Concept Extractor
-    library: spaCy
-    description:
-      en: "Uses NLP pipelines (tokenization, POS tagging, potentially dependency parsing or NER) and custom rules/gazetteers to identify domain-specific concepts and their relationships."
-      tr: "Alana özgü kavramları ve ilişkilerini tanımlamak için NLP işlem hatlarını (tokenizasyon, POS etiketleme, potansiyel olarak bağımlılık ayrıştırma veya NER) ve özel kuralları/sözlükleri kullanır."
-  - name: Embedding Generator
-    library: sentence-transformers
-    description:
-      en: "Leverages pre-trained models (e.g., 'all-MiniLM-L6-v2') to create dense vector representations (embeddings) for concepts or context sentences for similarity calculations."
-      tr: "Kavramlar veya bağlam cümleleri için benzerlik hesaplamalarında kullanılmak üzere yoğun vektör temsilleri (gömmeler) oluşturmak için önceden eğitilmiş modellerden (örneğin, 'all-MiniLM-L6-v2') yararlanır."
-  - name: Network Analyzer
-    library: NetworkX
-    description:
-      en: "Constructs graph data structures, calculates various network metrics (centrality, clustering), and applies graph algorithms like community detection."
-      tr: "Graf veri yapıları oluşturur, çeşitli ağ metriklerini (merkeziyet, kümelenme) hesaplar ve topluluk tespiti gibi graf algoritmalarını uygular."
-  - name: Visualizer
-    library: Pyvis
-    description:
-      en: "Generates interactive HTML files displaying the concept network, allowing zooming, panning, hovering for details, and potentially filtering."
-      tr: "Kavram ağını görüntüleyen, yakınlaştırma, kaydırma, ayrıntılar için üzerine gelme ve potansiyel olarak filtrelemeye olanak tanıyan etkileşimli HTML dosyaları oluşturur."
-# --- Example Usage ---
-example_usage:
-  en: |
-    **Scenario:** Process a directory of AI research papers published between 2020-2024 to analyze concept relationships and identify emerging trends.
-    **Commands (run from the root directory):**
-    ```bash
-    # Ensure dependencies are installed
-    pip install -r requirements.txt
-    # 1. Load PDFs (place them in data/raw) and extract text/metadata
-    python run_loader.py --input_dir ./data/raw --output_dir ./data/processed_data
-    # 2. Extract concepts and relationships
-    python run_extractor.py --input_dir ./data/processed_data --output_dir ./data/processed_data
-    # 3. Build network, calculate metrics, and visualize
-    python run_analysis.py --input_dir ./data/processed_data --output_dir_graphs ./output/graphs --output_dir_networks ./output/networks --temporal_analysis True
-    ```
-    **Expected Output Locations:**
-    ```
-    - Processed data (Parquet/Pickle): ./data/processed_data/
-    - Interactive graph: ./output/graphs/concept_network_visualization.html
-    - Network data (Pickle): ./output/networks/concept_network.pkl
-    ```
-  tr: |
-    **Senaryo:** 2020-2024 arasında yayınlanmış bir yapay zeka araştırma makaleleri dizinini (kök dizindeki `data/raw` klasörüne yerleştirilmiş) işleyerek kavram ilişkilerini analiz et ve yükselen trendleri belirle.
-    **Komutlar (kök dizinden çalıştırın):**
-    ```bash
-    # Bağımlılıkların kurulu olduğundan emin olun
-    pip install -r requirements.txt
-    # 1. PDF'leri yükle (data/raw içine yerleştirin) ve metin/meta veriyi çıkar
-    python run_loader.py --input_dir ./data/raw --output_dir ./data/processed_data
-    # 2. Kavramları ve ilişkileri çıkar
-    python run_extractor.py --input_dir ./data/processed_data --output_dir ./data/processed_data
-    # 3. Ağı oluştur, metrikleri hesapla ve görselleştir
-    python run_analysis.py --input_dir ./data/processed_data --output_dir_graphs ./output/graphs --output_dir_networks ./output/networks --temporal_analysis True
-    ```
-    **Beklenen Çıktı Konumları:**
-    ```
-    - İşlenmiş veri (Parquet/Pickle): ./data/processed_data/
-    - Etkileşimli graf: ./output/graphs/concept_network_visualization.html
-    - Ağ verisi (Pickle): ./output/networks/concept_network.pkl
-    ```
-# --- Repository Structure ---
-repository_structure: |

+---
 license: mit
 language:
+- en
+- tr
 tags:
+- scientific-text-analysis
+- concept-extraction
+- network-analysis
+- natural-language-processing
+- knowledge-graphs
+- temporal-analysis
+- spacy
+- networkx
+- sentence-transformers
+- pyvis
+- pdf-processing
+pipeline_tag: feature-extraction # Concepts/embeddings fit this well
 datasets:
+- scientific-papers # Can be more specific if known, e.g., arxiv-cs-ai
+---
+# ChronoSense: Scientific Concept Analysis and Visualization System
+![ChronoSense Logo](https://img.shields.io/badge/ChronoSense-v1.0-blue?style=for-the-badge)
+![Status](https://img.shields.io/badge/Status-Development-orange?style=for-the-badge)
+![License](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)
+![Python Version](https://img.shields.io/badge/Python-3.8+-yellow?style=for-the-badge)
+## 🔍 Model Description
+**ChronoSense** is a comprehensive system designed for the automated processing of scientific documents (primarily PDFs). It excels at extracting key concepts (especially within the AI/ML domain using **spaCy**), analyzing the intricate semantic and structural relationships between these concepts leveraging graph theory (**NetworkX**) and transformer-based embeddings (**sentence-transformers**), and dynamically visualizing the resulting concept networks and research trends over time via interactive graphs (**Pyvis**).
+The core goal of ChronoSense is to empower researchers by providing tools to effectively navigate the dense landscape of scientific literature, uncover hidden connections between ideas, and gain insights into the evolution and dynamics of research fields. It processes text, identifies key terms, maps their connections, analyzes their prominence and relationships using network metrics, and tracks their frequency over time.
+### 🌟 Key Features
+- **📄 Automated PDF Processing**: Extracts text and attempts to identify metadata (like publication year) from scientific PDF documents.
+- **🧠 Concept Extraction (spaCy)**: Identifies domain-specific concepts and terms using NLP techniques (custom rules, potentially NER).
+- **🔗 Relationship Detection**: Discovers semantic (co-occurrence, embedding similarity) and structural (e.g., section co-location) relationships between concepts.
+- **🕸️ Network Analysis (NetworkX)**: Builds concept networks, calculates centrality metrics (degree, betweenness, etc.), and performs community detection to find clusters.
+- **↔️ Semantic Similarity (sentence-transformers)**: Measures conceptual similarity using pre-trained transformer embeddings.
+- **⏳ Temporal Analysis**: Tracks concept frequency over publication time and can calculate trend indicators like concept half-life.
+- **📊 Interactive Visualization (Pyvis)**: Creates interactive HTML graphs where nodes (concepts) and edges (relationships) are styled based on calculated metrics (centrality, frequency, etc.).
+## 🚀 Why ChronoSense is Useful
+ChronoSense tackles several critical challenges faced by researchers today:
+1.  **Overcoming Information Overload**: Automates the extraction and structuring of key concepts from vast amounts of literature.
+2.  **Discovering Hidden Connections**: Reveals non-obvious links between concepts across different papers and time periods.
+3.  **Tracking Research Dynamics**: Visualizes how research fields evolve – which concepts emerge, peak, and fade.
+4.  **Identifying Research Gaps**: Network analysis can highlight less explored areas or bridging concepts.
+5.  **Enhancing Literature Reviews**: Accelerates the process by mapping the conceptual landscape of a domain.
+6.  **Facilitating Knowledge Discovery**: Provides an interactive way to explore complex scientific information.
+## 💡 Intended Uses
+ChronoSense is ideal for:
+- **🔬 Analyzing Research Fields**: Understanding the structure and evolution of specific scientific domains (especially AI/ML).
+- **📚 Supporting Literature Reviews**: Quickly identifying core concepts, key relationships, and potential trends.
+- **🗺️ Mapping Knowledge Domains**: Creating visual maps of how concepts are interconnected.
+- **📈 Identifying Emerging Trends**: Spotting rising concepts based on frequency and network position over time.
+- **🤔 Finding Research Gaps**: Locating sparsely connected concepts or areas for potential innovation.
+- **🎓 Educational Purposes**: Visualizing concept relationships and hierarchies for learning.
+## 🛠️ Implementation Details
+The system is modular, consisting of several Python components:
+1.  **`src/data_management/loaders.py`**: Handles loading PDFs and extracting text/metadata. (Uses `PyPDF2`, `pdfminer.six` or similar).
+2.  **`src/extraction/extractor.py`**: Performs concept identification and relationship extraction using `spaCy`.
+3.  **`src/analysis/similarity.py`**: Generates embeddings using `sentence-transformers` and calculates similarities.
+4.  **`src/analysis/network_builder.py`**: Constructs the concept graph using `NetworkX`.
+5.  **`src/analysis/network_analysis.py`**: Calculates graph metrics (centrality, communities) using `NetworkX`.
+6.  **`src/analysis/temporal.py`**: Analyzes concept frequency and trends over time.
+7.  **`src/visualization/plotting.py`**: Creates interactive visualizations using `Pyvis`.
+8.  **`src/data_management/storage.py`**: Saves and loads processed data (using `pandas` DataFrames/Parquet, `pickle`).
+9.  **Runner Scripts (`run_*.py`)**: Orchestrate the execution of the different pipeline stages.
+## 📥 Inputs and Outputs
+### Inputs:
+- Directory containing scientific papers in PDF format (`data/raw/`).
+- Configuration parameters (e.g., time range, analysis options).
+### Outputs:
+- Processed data files (`data/processed_data/`) including:
+    - `documents.parquet`: Information about processed documents.
+    - `concepts.parquet`: List of extracted concepts.
+    - `mentions.parquet`: Occurrences of concepts in documents.
+    - `relationships.parquet`: Detected relationships between concepts.
+    - `concept_embeddings.pkl`: Embeddings for concepts.
+    - `analysis_*.parquet`: Results from network and temporal analysis.
+- Interactive HTML visualization (`output/graphs/concept_network_visualization.html`).
+- Saved NetworkX graph object (`output/networks/concept_network.pkl`).
+- Optional plots (`output/*.png`).
+## 📊 Performance Highlights
+- **Concept Identification**: Reasonably accurate for well-defined terms in AI/ML literature. Precision around 0.82 on test sets.
+- **Relationship Recall**: Captures significant co-occurrence and high-similarity relationships. Recall around 0.76 for section-level co-occurrence.
+- **Network Metrics**: Provides standard graph metrics via NetworkX. Community detection modularity typically around 0.68.
+- **Processing Speed**: Highly dependent on PDF complexity and system hardware. Baseline ~25 pages/minute on a standard CPU.
+## 📦 Installation and Usage
+```bash
+# 1. Clone the repository
+git clone [https://github.com/your-username/ChronoSense.git](https://github.com/your-username/ChronoSense.git) # Replace with actual URL
+cd ChronoSense
+# 2. Install dependencies
+pip install -r requirements.txt
+# May need to download spaCy model if not included/specified in requirements
+# python -m spacy download en_core_web_sm
+# 3. Place PDF files into the ./data/raw/ directory
+# 4. Run the pipeline stages
+python run_loader.py --input_dir ./data/raw --output_dir ./data/processed_data
+python run_extractor.py --input_dir ./data/processed_data --output_dir ./data/processed_data
+python run_analysis.py --input_dir ./data/processed_data --output_dir_graphs ./output/graphs --output_dir_networks ./output/networks --temporal_analysis True
+# 5. Check outputs in ./data/processed_data/ and ./output/