echoboi committed
Commit df06239 · verified · 1 Parent(s): c99d04d

Upload folder using huggingface_hub

Dockerfile ADDED
@@ -0,0 +1,30 @@
+ # Use lightweight Python base
+ FROM python:3.10-slim
+
+ # Prevent Python from writing .pyc files and buffering stdout/stderr
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1
+
+ # Create app directory
+ WORKDIR /app
+
+ # System deps for pandas/openpyxl and builds
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential \
+     gcc \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first to leverage Docker cache
+ COPY requirements.txt ./
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the rest of the source code
+ COPY . .
+
+ # Expose the port the Space will provide via $PORT
+ ENV PORT=7860
+
+ # Use gunicorn to serve the Flask app
+ # Hugging Face Spaces expects the container to listen on 0.0.0.0:$PORT
+ CMD exec gunicorn --bind 0.0.0.0:$PORT --workers 2 --timeout 180 paper_analysis_backend:app
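
The CMD/PORT contract above can be smoke-tested locally before pushing to the Space, e.g. `docker build -t lit-review .` followed by `docker run -e PORT=7860 -p 7860:7860 lit-review` (the image tag is illustrative, not part of this commit).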
README.md CHANGED
@@ -1,13 +1,43 @@
- ---
- title: Ai Systematic Lit Review
- emoji: 🏃
- colorFrom: gray
- colorTo: blue
- sdk: gradio
- sdk_version: 5.45.0
- app_file: app.py
- pinned: false
- short_description: ai_scientist
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: AI Systematic Literature Review
+ emoji: 🧪
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 4.0.0
+ pinned: false
+ license: mit
+ app_port: 7860
+ ---
+
+ # AI Systematic Literature Review
+
+ An intelligent tool for conducting systematic literature reviews using the OpenAlex API and AI-powered paper filtering.
+
+ ## Features
+
+ - **🔍 Search Papers**: Find academic papers by title using the OpenAlex API
+ - **📚 Collect Papers**: Gather related papers (cited, citing, and related) from a seed paper
+ - **🔬 Filter Papers**: Use AI to filter collected papers based on your research question
+ - **📁 Database Management**: View and manage your collections and filters
+ - **📊 Export Data**: Export results to BibTeX format
+
+ ## How to Use
+
+ 1. **Search Papers**: Enter a paper title to find papers in OpenAlex
+ 2. **Collect Papers**: Use a Work ID to collect related papers (cited, citing, and related)
+ 3. **Filter Papers**: Use AI to filter collected papers based on your research question
+ 4. **Database Files**: View all your collections and filters
+ 5. **Export Data**: Export your results to BibTeX format
+
+ ## Setup
+
+ To use AI filtering, set your OpenAI API key as a secret in the Space settings (see the sketch after this README).
+
+ ## Technical Details
+
+ - Built with Gradio for the user interface
+ - Uses the OpenAlex API for paper discovery and collection
+ - Integrates with the OpenAI API for intelligent paper filtering
+ - Automatically saves collections and filters for reuse
+ - Respects OpenAlex rate limits
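
As context for the Setup note above: Spaces expose Settings → Secrets as environment variables, and app.py below reads the key exactly like this minimal sketch:

```python
import os

# Hugging Face Spaces injects Settings -> Secrets as environment variables.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "").strip()
if not OPENAI_API_KEY:
    print("[WARN] OPENAI_API_KEY is not set. Set it in Space Settings -> Secrets.")
```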
app.py ADDED
@@ -0,0 +1,2106 @@
+ import gradio as gr
+ import requests
+ import json
+ import time
+ import pandas as pd
+ from typing import Dict, List, Optional
+ import pickle
+ import os
+ import sys
+ import threading
+ import tempfile
+ import shutil
+ from datetime import datetime
+ import timeit
+ from tqdm import tqdm
+
+ # Define 'toc' function once
+ def toc(start_time):
+     elapsed = timeit.default_timer() - start_time
+     print(elapsed)
+
+ # Record start time
+ start_time = timeit.default_timer()
+
+ # Helper function to get all pages
+ def get_all_pages(url, headers, upper_limit=None):
+     all_results = []
+     unique_ids = set()  # Track unique paper IDs
+     page = 1
+     processing_times = []  # Track time taken per paper
+
+     # Get first page to get total count
+     first_response = requests.get(f"{url}&page={page}", headers=headers)
+     if first_response.status_code != 200:
+         return []
+
+     data = first_response.json()
+     total_count = data.get('meta', {}).get('count', 0)
+     start_time = time.time()
+
+     # Add only unique papers from first page
+     for result in data.get('results', []):
+         if result.get('id') not in unique_ids:
+             unique_ids.add(result.get('id'))
+             all_results.append(result)
+             if upper_limit and len(all_results) >= upper_limit:
+                 return all_results
+
+     papers_processed = len(all_results)
+     time_taken = time.time() - start_time
+     if papers_processed > 0:
+         processing_times.append(time_taken / papers_processed)
+
+     # Continue getting remaining pages until we have all papers
+     target_count = min(total_count, upper_limit) if upper_limit else total_count
+     pbar = tqdm(total=target_count, desc="Retrieving papers",
+                 initial=len(all_results), unit="papers")
+
+     while len(all_results) < total_count:
+         page += 1
+         page_start_time = time.time()
+         paged_url = f"{url}&page={page}"
+         response = requests.get(paged_url, headers=headers)
+         if response.status_code != 200:
+             print(f"Error retrieving page {page}: {response.status_code}")
+             break
+
+         data = response.json()
+         results = data.get('results', [])
+         if not results:
+             break
+
+         # Add only unique papers from this page
+         new_papers = 0
+         for result in results:
+             if result.get('id') not in unique_ids:
+                 unique_ids.add(result.get('id'))
+                 all_results.append(result)
+                 new_papers += 1
+                 if upper_limit and len(all_results) >= upper_limit:
+                     pbar.update(new_papers)
+                     pbar.close()
+                     return all_results
+
+         # Update processing times and estimated time remaining
+         if new_papers > 0:
+             time_taken = time.time() - page_start_time
+             processing_times.append(time_taken / new_papers)
+             avg_time_per_paper = sum(processing_times) / len(processing_times)
+             papers_remaining = target_count - len(all_results)
+             est_time_remaining = papers_remaining * avg_time_per_paper
+             pbar.set_postfix({'Est. Time Remaining': f'{est_time_remaining:.1f}s'})
+
+         pbar.update(new_papers)
+         # Add a small delay to respect rate limits
+         time.sleep(1)
+
+     pbar.close()
+     return all_results
+
+
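A note on the paging strategy above: OpenAlex also supports cursor-based paging (`cursor=*`, then follow `meta.next_cursor`), which avoids the deep-pagination limits of `page=N`. A minimal sketch under the same `url`/`headers` conventions as `get_all_pages` (not part of this commit):

```python
def get_all_pages_cursor(url, headers):
    # Cursor paging: start with cursor=* and follow meta.next_cursor until exhausted.
    results, cursor = [], "*"
    while cursor:
        resp = requests.get(f"{url}&cursor={cursor}", headers=headers)
        if resp.status_code != 200:
            break
        data = resp.json()
        results.extend(data.get("results", []))
        cursor = data.get("meta", {}).get("next_cursor")  # None/absent on the last page
        time.sleep(1)  # same politeness delay as the page-based helper
    return results
```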
+ def get_related_papers(work_id, upper_limit=None, progress_callback=None):
+     # Define base URL for OpenAlex API
+     base_url = "https://api.openalex.org/works"
+
+     work_query = f"/{work_id}"  # OpenAlex work IDs can be used directly in path
+     work_url = base_url + work_query
+
+     # Add email to be a polite API user
+     headers = {'User-Agent': 'LowAI (chowdhary@iiasa.ac.at)'}
+     response = requests.get(work_url, headers=headers)
+     print(response)
+     if response.status_code == 200:
+         paper = response.json()  # For direct work queries, the response is the paper object
+         paper_id = paper['id']
+
+         # Use referenced_works field on the seed work directly for cited papers
+         referenced_ids = paper.get('referenced_works', []) or []
+         print("\nTotal counts:")
+         print(f"Cited (referenced_works) count: {len(referenced_ids)}")
+
+         def fetch_works_by_ids(ids, chunk_size=50):
+             results = []
+             seen = set()
+             total_chunks = (len(ids) + chunk_size - 1) // chunk_size
+
+             for i in range(0, len(ids), chunk_size):
+                 chunk = ids[i:i+chunk_size]
+                 # Build ids filter: ids.openalex:ID1|ID2|ID3
+                 ids_filter = '|'.join(chunk)
+                 url = f"{base_url}?filter=ids.openalex:{ids_filter}&per-page=200"
+                 resp = requests.get(url, headers=headers)
+                 if resp.status_code != 200:
+                     print(f"Error fetching IDs chunk {i//chunk_size+1}: {resp.status_code}")
+                     continue
+                 data = resp.json()
+                 for r in data.get('results', []):
+                     rid = r.get('id')
+                     if rid and rid not in seen:
+                         seen.add(rid)
+                         results.append(r)
+
+                 # Update progress for cited papers (0-30%)
+                 if progress_callback:
+                     progress = int(30 * (i // chunk_size + 1) / total_chunks)
+                     progress_callback(progress, f"Fetching cited papers... {len(results)} found")
+
+                 time.sleep(1)  # be polite to the API
+                 if upper_limit and len(results) >= upper_limit:
+                     return results[:upper_limit]
+             return results
+
+         print("\nRetrieving cited papers via referenced_works IDs...")
+         cited_papers = fetch_works_by_ids(referenced_ids)
+         print(f"Found {len(cited_papers)} unique cited papers")
+
+         # Count citing papers (works that cite the seed), then paginate to collect all
+         citing_count_url = f"{base_url}?filter=cites:{work_id}&per-page=1"
+         citing_count = requests.get(citing_count_url, headers=headers).json().get('meta', {}).get('count', 0)
+         print(f"Citing papers: {citing_count}")
+
+         # Get all citing papers with pagination
+         print("\nRetrieving citing papers (paginated)...")
+         page = 1
+         citing_papers = []
+         unique_ids = set()
+         target = citing_count if not upper_limit else min(upper_limit, citing_count)
+         pbar = tqdm(total=target, desc="Retrieving citing papers", unit="papers")
+         while len(citing_papers) < target:
+             paged_url = f"{base_url}?filter=cites:{work_id}&per-page=200&sort=publication_date:desc&page={page}"
+             resp = requests.get(paged_url, headers=headers)
+             if resp.status_code != 200:
+                 print(f"Error retrieving citing page {page}: {resp.status_code}")
+                 break
+             data = resp.json()
+             results = data.get('results', [])
+             if not results:
+                 break
+             new = 0
+             for r in results:
+                 rid = r.get('id')
+                 if rid and rid not in unique_ids:
+                     unique_ids.add(rid)
+                     citing_papers.append(r)
+                     new += 1
+                     if len(citing_papers) >= target:
+                         break
+
+             # Update progress for citing papers (30-70%)
+             if progress_callback:
+                 progress = 30 + int(40 * len(citing_papers) / target)
+                 progress_callback(progress, f"Fetching citing papers... {len(citing_papers)} found")
+
+             pbar.update(new)
+             page += 1
+             time.sleep(1)
+         pbar.close()
+         print(f"Found {len(citing_papers)} unique citing papers")
+
+         # Get all related papers
+         print("\nRetrieving related papers...")
+         related_url = f"{base_url}?filter=related_to:{work_id}&per-page=200&sort=publication_date:desc"
+         related_papers = get_all_pages(related_url, headers, upper_limit)
+         print(f"Found {len(related_papers)} unique related papers")
+
+         # Update progress for related papers (70-90%)
+         if progress_callback:
+             progress_callback(70, f"Fetching related papers... {len(related_papers)} found")
+
+         # Create sets of IDs for quick lookup
+         cited_ids = {paper['id'] for paper in cited_papers}
+         citing_ids = {paper['id'] for paper in citing_papers}
+
+         # Print some debug information
+         print("\nDebug Information:")
+         print(f"Seed paper ID: {paper_id}")
+         print(f"Number of unique cited papers: {len(cited_ids)}")
+         print(f"Number of unique citing papers: {len(citing_ids)}")
+         print(f"Number of papers in both sets: {len(cited_ids.intersection(citing_ids))}")
+
+         # Update progress for processing (90-95%)
+         if progress_callback:
+             progress_callback(90, "Processing and deduplicating papers...")
+
+         # Combine all papers and remove duplicates while tracking relationship
+         all_papers = cited_papers + citing_papers + related_papers
+         seen_titles = set()
+         unique_papers = []
+         for paper in all_papers:
+             title = paper.get('title', '')
+             if title not in seen_titles:
+                 seen_titles.add(title)
+                 # Add relationship type
+                 if paper['id'] in cited_ids:
+                     paper['relationship'] = 'cited'
+                 elif paper['id'] in citing_ids:
+                     paper['relationship'] = 'citing'
+                 else:
+                     paper['relationship'] = 'related'
+                 unique_papers.append(paper)
+
+         # Final progress update
+         if progress_callback:
+             progress_callback(100, f"Collection completed! Found {len(unique_papers)} unique papers")
+
+         return unique_papers
+     else:
+         print(f"Error retrieving seed paper: {response.status_code}")
+         return []
+ import requests
+ import json
+ from typing import Dict, List, Optional
+ from openai import OpenAI
+ import concurrent.futures
+ import threading
+ import time
+
+ def analyze_paper_relevance(content: Dict[str, str], research_question: str, api_key: str) -> Optional[Dict]:
+     """Analyze if a paper is relevant to the research question using GPT-5 nano (falling back to gpt-4o-mini)."""
+     client = OpenAI(api_key=api_key)
+
+     title = content.get('title', '')
+     abstract = content.get('abstract', '')
+     has_abstract = bool(abstract and abstract.strip())
+
+     if has_abstract:
+         prompt = f"""
+ Research Question: {research_question}
+
+ Paper Title: {title}
+ Paper Abstract: {abstract}
+
+ Analyze this paper and determine:
+ 1. Is this paper highly relevant to answering the research question?
+ 2. What are the main aims/objectives of this paper?
+ 3. What are the key takeaways or findings?
+
+ Return ONLY a valid JSON object in this exact format:
+ {{
+ "relevant": true/false,
+ "relevance_reason": "brief explanation of why it is/isn't relevant",
+ "aims_of_paper": "main objectives of the paper",
+ "key_takeaways": "key findings or takeaways"
+ }}
+ """
+     else:
+         prompt = f"""
+ Research Question: {research_question}
+
+ Paper Title: {title}
+ Note: No abstract is available for this paper.
+
+ Analyze this paper based on the title only and determine:
+ 1. Is this paper likely to be relevant to answering the research question based on the title?
+
+ Return ONLY a valid JSON object in this exact format:
+ {{
+ "relevant": true/false,
+ "relevance_reason": "brief explanation of why it is/isn't relevant based on title"
+ }}
+ """
+
+     try:
+         # Try GPT-5 nano first, fall back to gpt-4o-mini if it fails
+         try:
+             response = client.responses.create(
+                 model="gpt-5-nano",
+                 input=prompt,
+                 reasoning={"effort": "minimal"},
+                 text={"verbosity": "low"}
+             )
+         except Exception as e:
+             print(f"GPT-5 nano failed, trying gpt-4o-mini: {e}")
+             response = client.chat.completions.create(
+                 model="gpt-4o-mini",
+                 messages=[{
+                     "role": "user",
+                     "content": prompt
+                 }],
+                 max_completion_tokens=1000
+             )
+
+         # Handle different response formats
+         if hasattr(response, 'choices') and response.choices:
+             # Old format (chat completions)
+             result = response.choices[0].message.content
+         elif hasattr(response, 'output'):
+             # New format (responses) - extract text from output
+             result = ""
+             for item in response.output:
+                 if hasattr(item, "content") and item.content:
+                     for content in item.content:
+                         if hasattr(content, "text") and content.text:
+                             result += content.text
+         else:
+             print("Unexpected response format")
+             return None
+
+         if not result:
+             print("Empty response from GPT")
+             return None
+
+         # Clean and parse the JSON response
+         result = result.strip()
+         if result.startswith("```json"):
+             result = result[7:]
+         if result.endswith("```"):
+             result = result[:-3]
+
+         # Try to parse JSON
+         try:
+             return json.loads(result.strip())
+         except json.JSONDecodeError as e:
+             print(f"Failed to parse JSON response: {e}")
+             print(f"Raw response: {result[:200]}...")
+             return None
+
+     except Exception as e:
+         print(f"Error in GPT analysis: {str(e)}")
+         return None
+
+ def extract_abstract_from_inverted_index(inverted_index: Dict) -> str:
+     """Extract abstract text from inverted index format."""
+     if not inverted_index:
+         return ""
+
+     words = []
+     for word, positions in inverted_index.items():
+         for pos in positions:
+             while len(words) <= pos:
+                 words.append('')
+             words[pos] = word
+     return ' '.join(words).strip()
+
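OpenAlex returns abstracts as a word-to-positions map rather than plain text; a tiny illustration of the reconstruction performed above (values hypothetical):

```python
idx = {"Deep": [0], "learning": [1], "works": [2]}
print(extract_abstract_from_inverted_index(idx))  # -> "Deep learning works"
```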
+ def analyze_single_paper(paper: Dict, research_question: str, api_key: str) -> Optional[Dict]:
+     """Analyze a single paper with its own client."""
+     try:
+         client = OpenAI(api_key=api_key)
+
+         # Extract title and abstract
+         title = paper.get('title', '')
+         abstract = extract_abstract_from_inverted_index(paper.get('abstract_inverted_index', {}))
+
+         if not title and not abstract:
+             return None
+
+         # Create content for analysis
+         content = {
+             'title': title,
+             'abstract': abstract
+         }
+
+         # Analyze with GPT
+         analysis = analyze_paper_relevance_with_client(content, research_question, client)
+         if analysis:
+             paper['gpt_analysis'] = analysis
+             paper['relevance_reason'] = analysis.get('relevance_reason', 'Analysis completed')
+             paper['relevance_score'] = analysis.get('relevant', False)
+             return paper
+
+         return None
+
+     except Exception as e:
+         print(f"Error analyzing paper: {e}")
+         return None
+
+ def analyze_paper_batch(papers_batch: List[Dict], research_question: str, api_key: str, batch_id: int) -> List[Dict]:
+     """Analyze a batch of papers in parallel using ThreadPoolExecutor."""
+     results = []
+
+     # Use ThreadPoolExecutor to process papers in parallel within the batch
+     # (max(1, ...) guards against an empty batch, which ThreadPoolExecutor rejects)
+     with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(papers_batch))) as executor:
+         # Submit all papers for parallel processing
+         future_to_paper = {
+             executor.submit(analyze_single_paper, paper, research_question, api_key): paper
+             for paper in papers_batch
+         }
+
+         # Collect results as they complete
+         for future in concurrent.futures.as_completed(future_to_paper):
+             try:
+                 result = future.result()
+                 if result:
+                     results.append(result)
+             except Exception as e:
+                 print(f"Error in parallel analysis: {e}")
+                 continue
+
+     return results
+
+ def analyze_paper_relevance_with_client(content: Dict[str, str], research_question: str, client: OpenAI) -> Optional[Dict]:
+     """Analyze if a paper is relevant to the research question using the provided client."""
+     title = content.get('title', '')
+     abstract = content.get('abstract', '')
+
+     prompt = f"""
+ Research Question: {research_question}
+
+ Paper Title: {title}
+ Paper Abstract: {abstract or 'No abstract available'}
+
+ Analyze this paper and determine:
+ 1. Is this paper highly relevant to answering the research question?
+ 2. What are the main aims/objectives of this paper?
+ 3. What are the key takeaways or findings?
+
+ Return ONLY a valid JSON object in this exact format:
+ {{
+ "relevant": true/false,
+ "relevance_reason": "brief explanation of why it is/isn't relevant",
+ "aims_of_paper": "main objectives of the paper",
+ "key_takeaways": "key findings or takeaways"
+ }}
+ """
+
+     try:
+         # Try GPT-5 nano first, fall back to gpt-4o-mini if it fails
+         try:
+             response = client.responses.create(
+                 model="gpt-5-nano",
+                 input=prompt,
+                 reasoning={"effort": "minimal"},
+                 text={"verbosity": "low"}
+             )
+         except Exception:
+             response = client.chat.completions.create(
+                 model="gpt-4o-mini",
+                 messages=[{
+                     "role": "user",
+                     "content": prompt
+                 }],
+                 max_completion_tokens=1000
+             )
+
+         # Handle different response formats
+         if hasattr(response, 'choices') and response.choices:
+             # Old format (chat completions)
+             result = response.choices[0].message.content
+         elif hasattr(response, 'output'):
+             # New format (responses) - extract text from output
+             result = ""
+             for item in response.output:
+                 if hasattr(item, "content") and item.content:
+                     for content in item.content:
+                         if hasattr(content, "text") and content.text:
+                             result += content.text
+         else:
+             return None
+
+         if not result:
+             return None
+
+         # Clean and parse the JSON response
+         result = result.strip()
+         if result.startswith("```json"):
+             result = result[7:]
+         if result.endswith("```"):
+             result = result[:-3]
+
+         # Try to parse JSON
+         try:
+             return json.loads(result.strip())
+         except json.JSONDecodeError:
+             return None
+
+     except Exception:
+         return None
+
+ def filter_papers_for_research_question(papers: List[Dict], research_question: str, api_key: str, limit: int = 10) -> List[Dict]:
+     """Analyze exactly 'limit' papers for relevance using parallel processing."""
+     if not papers or not research_question:
+         return []
+
+     # Sort papers by publication date (most recent first)
+     sorted_papers = sorted(papers, key=lambda x: x.get('publication_date', ''), reverse=True)
+
+     # Take only the first 'limit' papers for analysis
+     papers_to_analyze = sorted_papers[:limit]
+
+     print(f"Analyzing {len(papers_to_analyze)} papers for relevance to: {research_question}")
+
+     # Process all papers in parallel (no batching needed for small numbers)
+     all_results = []
+
+     with concurrent.futures.ThreadPoolExecutor(max_workers=min(limit, 20)) as executor:
+         # Submit all papers for parallel processing
+         future_to_paper = {
+             executor.submit(analyze_single_paper, paper, research_question, api_key): paper
+             for paper in papers_to_analyze
+         }
+
+         # Collect results as they complete
+         completed = 0
+         for future in concurrent.futures.as_completed(future_to_paper):
+             try:
+                 result = future.result()
+                 completed += 1
+                 if result:
+                     all_results.append(result)
+                 print(f"Completed {completed}/{len(papers_to_analyze)} papers")
+             except Exception as e:
+                 print(f"Error in parallel analysis: {e}")
+                 completed += 1
+
+     # Sort by publication date again (most recent first)
+     all_results.sort(key=lambda x: x.get('publication_date', ''), reverse=True)
+
+     print(f"Analysis complete. Processed {len(all_results)} papers.")
+     return all_results
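
A hypothetical end-to-end use of the two stages defined so far (the work ID and question are placeholders; `OPENAI_API_KEY` is read from the environment further down in app.py):

```python
papers = get_related_papers("W2741809807", upper_limit=200)
relevant = filter_papers_for_research_question(
    papers, "How does irrigation affect crop yields?", OPENAI_API_KEY, limit=10)
```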
+ import requests
+ import re
+ import html
+
+ # Try to import BeautifulSoup, fall back to simple parsing if not available
+ try:
+     from bs4 import BeautifulSoup
+     HAS_BS4 = True
+ except ImportError:
+     HAS_BS4 = False
+     print("BeautifulSoup not available, using simple HTML parsing")
+
+ # Global progress tracking
+ progress_data = {}
+
+ # Configuration: read from environment (set in HF Space Secrets)
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "").strip()
+ if not OPENAI_API_KEY:
+     print("[WARN] OPENAI_API_KEY is not set. Set it in Space Settings → Secrets.")
+
+ # Determine script directory and robust project root
+ SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
+ ROOT_DIR = os.path.dirname(SCRIPT_DIR) if os.path.basename(SCRIPT_DIR) == "code" else SCRIPT_DIR
+
+ # Ensure we can import helper modules (prefer repo root; fallback to ./code)
+ CODE_DIR_CANDIDATE = os.path.join(ROOT_DIR, "code")
+ CODE_DIR = CODE_DIR_CANDIDATE if os.path.isdir(CODE_DIR_CANDIDATE) else ROOT_DIR
+ if CODE_DIR not in sys.path:
+     sys.path.insert(0, CODE_DIR)
+
+ # Database directories: prefer repo-root `database/` when present; fallback to CODE_DIR/database
+ DATABASE_DIR_ROOT = os.path.join(ROOT_DIR, "database")
+ DATABASE_DIR = DATABASE_DIR_ROOT if os.path.isdir(DATABASE_DIR_ROOT) else os.path.join(CODE_DIR, "database")
+ COLLECTION_DB_DIR = os.path.join(DATABASE_DIR, "collections")
+ FILTER_DB_DIR = os.path.join(DATABASE_DIR, "filters")
+
+ # Ensure database directories exist
+ os.makedirs(COLLECTION_DB_DIR, exist_ok=True)
+ os.makedirs(FILTER_DB_DIR, exist_ok=True)
+
+ def ensure_db_dirs() -> None:
+     """Ensure database directories exist (safe to call anytime)."""
+     try:
+         os.makedirs(COLLECTION_DB_DIR, exist_ok=True)
+         os.makedirs(FILTER_DB_DIR, exist_ok=True)
+     except Exception:
+         pass
+
+ # Robust HTTP headers for publisher sites
+ DEFAULT_HTTP_HEADERS = {
+     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0 Safari/537.36',
+     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
+     'Accept-Language': 'en-US,en;q=0.9',
+     'Cache-Control': 'no-cache',
+ }
+
+ def _http_get(url: str, timeout: int = 15) -> Optional[requests.Response]:
+     try:
+         resp = requests.get(url, headers=DEFAULT_HTTP_HEADERS, timeout=timeout, allow_redirects=True)
+         return resp
+     except Exception as e:
+         print(f"HTTP GET failed for {url}: {e}")
+         return None
+
+ def fetch_abstract_from_doi(doi: str) -> Optional[str]:
+     """Fetch abstract/highlights from a DOI URL with a robust, layered strategy."""
+     if not doi:
+         return None
+     # Normalize DOI
+     doi_clean = doi.replace('https://doi.org/', '').strip()
+
+     # 1) Crossref (fast, sometimes JATS)
+     try:
+         text = fetch_from_crossref(doi_clean)
+         if text and len(text) > 50:
+             return text
+     except Exception as e:
+         print(f"Crossref fetch failed: {e}")
+
+     # 2) Fetch target HTML via doi.org redirect
+     try:
+         start_url = f"https://doi.org/{doi_clean}"
+         resp = _http_get(start_url, timeout=15)
+         if not resp or resp.status_code >= 400:
+             return None
+         html_text = resp.text or ''
+         final_url = getattr(resp, 'url', start_url)
+         print(f"Resolved DOI to: {final_url}")
+
+         # Parse with robust pipeline
+         parsed = robust_extract_abstract(html_text)
+         if parsed and len(parsed) > 50:
+             return parsed
+     except Exception as e:
+         print(f"DOI HTML fetch failed: {e}")
+
+     # 3) PubMed placeholder (extendable)
+     try:
+         text = fetch_from_pubmed(doi_clean)
+         if text and len(text) > 50:
+             return text
+     except Exception:
+         pass
+
+     return None
+
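Illustrative call (the DOI is a placeholder): `fetch_abstract_from_doi("10.1000/xyz123")` returns the abstract text when any layer succeeds, else `None`.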
+ def fetch_from_crossref(doi: str) -> Optional[str]:
+     """Fetch abstract from the Crossref API."""
+     try:
+         url = f"https://api.crossref.org/works/{doi}"
+         response = _http_get(url, timeout=12)
+         if response and response.status_code == 200:
+             data = response.json()
+             if 'message' in data:
+                 message = data['message']
+                 # Check for abstract or highlights (case insensitive)
+                 for key in message:
+                     if key.lower() in ['abstract', 'highlights'] and message[key]:
+                         raw = str(message[key])
+                         # Crossref sometimes returns JATS/XML; strip tags and unescape entities
+                         text = re.sub(r'<[^>]+>', ' ', raw)
+                         text = html.unescape(re.sub(r'\s+', ' ', text)).strip()
+                         return text
+     except Exception:
+         pass
+     return None
+
+ def fetch_from_doi_org(doi: str) -> Optional[str]:
+     """Legacy wrapper kept for API compatibility; now uses the robust pipeline."""
+     try:
+         url = f"https://doi.org/{doi}"
+         resp = _http_get(url, timeout=15)
+         if not resp or resp.status_code >= 400:
+             return None
+         return robust_extract_abstract(resp.text or '')
+     except Exception:
+         return None
+
+ def extract_from_preloaded_state_bruteforce(content: str) -> Optional[str]:
+     """Extract abstract from window.__PRELOADED_STATE__ using brace matching and fallbacks."""
+     try:
+         start_idx = content.find('window.__PRELOADED_STATE__')
+         if start_idx == -1:
+             return None
+         # Find the first '{' after the equals sign
+         eq_idx = content.find('=', start_idx)
+         if eq_idx == -1:
+             return None
+         brace_idx = content.find('{', eq_idx)
+         if brace_idx == -1:
+             return None
+         # Brace matching to find the matching closing '}'
+         depth = 0
+         end_idx = -1
+         for i in range(brace_idx, min(len(content), brace_idx + 5_000_000)):
+             ch = content[i]
+             if ch == '{':
+                 depth += 1
+             elif ch == '}':
+                 depth -= 1
+                 if depth == 0:
+                     end_idx = i
+                     break
+         if end_idx == -1:
+             return None
+         json_str = content[brace_idx:end_idx+1]
+         try:
+             data = json.loads(json_str)
+         except Exception:
+             # Try to relax by removing trailing commas and control chars
+             cleaned = re.sub(r',\s*([}\]])', r'\1', json_str)
+             cleaned = re.sub(r'\u0000', '', cleaned)
+             try:
+                 data = json.loads(cleaned)
+             except Exception as e2:
+                 print(f"Failed to parse preloaded JSON: {e2}")
+                 return None
+
+         # Same traversal as before
+         if isinstance(data, dict) and 'abstracts' in data and isinstance(data['abstracts'], dict) and 'content' in data['abstracts']:
+             abstracts = data['abstracts']['content']
+             if isinstance(abstracts, list):
+                 for abstract_item in abstracts:
+                     if isinstance(abstract_item, dict) and '$$' in abstract_item and abstract_item.get('#name') == 'abstract':
+                         class_name = abstract_item.get('$', {}).get('class', '')
+                         for section in abstract_item.get('$$', []):
+                             if isinstance(section, dict) and section.get('#name') == 'abstract-sec':
+                                 section_text = extract_text_from_abstract_section(section)
+                                 section_highlights = extract_highlights_from_section(section)
+                                 if section_text and len(section_text.strip()) > 50:
+                                     return clean_text(section_text)
+                                 if section_highlights and len(section_highlights.strip()) > 50:
+                                     return clean_text(section_highlights)
+                         if 'highlight' in class_name.lower():
+                             highlights_text = extract_highlights_from_abstract_item(abstract_item)
+                             if highlights_text and len(highlights_text.strip()) > 50:
+                                 return clean_text(highlights_text)
+         return None
+     except Exception as e:
+         print(f"Error extracting from preloaded state (bruteforce): {e}")
+         return None
+
+ def extract_from_json_ld(content: str) -> Optional[str]:
+     """Parse JSON-LD script tags and extract abstract/description if present."""
+     if not HAS_BS4:
+         return None
+     try:
+         soup = BeautifulSoup(content, 'html.parser')
+         for script in soup.find_all('script', type='application/ld+json'):
+             try:
+                 data = json.loads(script.string or '{}')
+             except Exception:
+                 continue
+             candidates = []
+             if isinstance(data, dict):
+                 candidates.append(data)
+             elif isinstance(data, list):
+                 candidates.extend([d for d in data if isinstance(d, dict)])
+             for obj in candidates:
+                 for key in ['abstract', 'description']:
+                     if key in obj and obj[key]:
+                         text = clean_text(str(obj[key]))
+                         if len(text) > 50:
+                             return text
+         return None
+     except Exception as e:
+         print(f"Error extracting from JSON-LD: {e}")
+         return None
+
+ def clean_text(s: str) -> str:
+     s = html.unescape(s)
+     s = re.sub(r'\s+', ' ', s)
+     return s.strip()
+
+ def extract_from_meta_tags(soup) -> Optional[str]:
+     try:
+         # Common meta carriers of abstract-like summaries
+         candidates = []
+         # OpenGraph description
+         og = soup.find('meta', attrs={'property': 'og:description'})
+         if og and og.get('content'):
+             candidates.append(og['content'])
+         # Twitter description
+         tw = soup.find('meta', attrs={'name': 'twitter:description'})
+         if tw and tw.get('content'):
+             candidates.append(tw['content'])
+         # Dublin Core description
+         dc = soup.find('meta', attrs={'name': 'dc.description'})
+         if dc and dc.get('content'):
+             candidates.append(dc['content'])
+         # citation_abstract
+         cit_abs = soup.find('meta', attrs={'name': 'citation_abstract'})
+         if cit_abs and cit_abs.get('content'):
+             candidates.append(cit_abs['content'])
+         # Fallback: any meta description
+         desc = soup.find('meta', attrs={'name': 'description'})
+         if desc and desc.get('content'):
+             candidates.append(desc['content'])
+
+         # Clean and return the longest meaningful candidate
+         candidates = [clean_text(c) for c in candidates if isinstance(c, str)]
+         candidates.sort(key=lambda x: len(x), reverse=True)
+         for text in candidates:
+             if len(text) > 50:
+                 return text
+         return None
+     except Exception:
+         return None
+
+ def robust_extract_abstract(html_text: str) -> Optional[str]:
+     """Layered extraction over raw HTML: preloaded-state, JSON-LD, meta tags, DOM, regex."""
+     if not html_text:
+         return None
+
+     # 1) ScienceDirect/Elsevier preloaded state (brace-matched)
+     try:
+         txt = extract_from_preloaded_state_bruteforce(html_text)
+         if txt and len(txt) > 50:
+             return clean_text(txt)
+     except Exception:
+         pass
+
+     # 2) JSON-LD
+     try:
+         txt = extract_from_json_ld(html_text)
+         if txt and len(txt) > 50:
+             return clean_text(txt)
+     except Exception:
+         pass
+
+     # 3) BeautifulSoup-based DOM extraction (meta + selectors + heading-sibling)
+     if HAS_BS4:
+         try:
+             soup = BeautifulSoup(html_text, 'html.parser')
+             # meta first
+             meta_txt = extract_from_meta_tags(soup)
+             if meta_txt and len(meta_txt) > 50:
+                 return clean_text(meta_txt)
+
+             # selector scan
+             selectors = [
+                 'div.abstract', 'div.Abstract', 'div.ABSTRACT',
+                 'div[class*="abstract" i]', 'div[class*="Abstract" i]',
+                 'section.abstract', 'section.Abstract', 'section.ABSTRACT',
+                 'div[data-testid="abstract" i]', 'div[data-testid="Abstract" i]',
+                 'div.article-abstract', 'div.article-Abstract',
+                 'div.abstract-content', 'div.Abstract-content',
+                 'div.highlights', 'div.Highlights', 'div.HIGHLIGHTS',
+                 'div[class*="highlights" i]', 'div[class*="Highlights" i]',
+                 'section.highlights', 'section.Highlights', 'section.HIGHLIGHTS',
+                 'div[data-testid="highlights" i]', 'div[data-testid="Highlights" i]'
+             ]
+             for css in selectors:
+                 node = soup.select_one(css)
+                 if node:
+                     t = clean_text(node.get_text(' ', strip=True))
+                     if len(t) > 50:
+                         return t
+
+             # headings near Abstract/Highlights
+             for tag in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'strong', 'b']):
+                 try:
+                     title = (tag.get_text() or '').strip().lower()
+                     if 'abstract' in title or 'highlights' in title:
+                         blocks = []
+                         sib = tag
+                         steps = 0
+                         while sib and steps < 20:
+                             sib = sib.find_next_sibling()
+                             steps += 1
+                             if not sib:
+                                 break
+                             if sib.name in ['p', 'div', 'section', 'article', 'ul', 'ol']:
+                                 blocks.append(sib.get_text(' ', strip=True))
+                         joined = clean_text(' '.join(blocks))
+                         if len(joined) > 50:
+                             return joined
+                 except Exception:
+                     continue
+         except Exception:
+             pass
+
+     # 4) Regex fallback
+     try:
+         patterns = [
+             r'<div[^>]*class="[^\"]*(?:abstract|Abstract|ABSTRACT|highlights|Highlights|HIGHLIGHTS)[^\"]*"[^>]*>(.*?)</div>',
+             r'<section[^>]*class="[^\"]*(?:abstract|Abstract|ABSTRACT|highlights|Highlights|HIGHLIGHTS)[^\"]*"[^>]*>(.*?)</section>',
+             r'<div[^>]*data-testid="(?:abstract|Abstract|highlights|Highlights)"[^>]*>(.*?)</div>'
+         ]
+         for pat in patterns:
+             for m in re.findall(pat, html_text, re.DOTALL | re.IGNORECASE):
+                 t = clean_text(re.sub(r'<[^>]+>', ' ', m))
+                 if len(t) > 50:
+                     return t
+     except Exception:
+         pass
+
+     return None
+
+ def extract_text_from_abstract_section(section: dict) -> str:
+     """Extract text content from abstract section structure."""
+     try:
+         text_parts = []
+
+         if '$$' in section:
+             for item in section['$$']:
+                 if isinstance(item, dict):
+                     # Direct text content from simple-para
+                     if item.get('#name') == 'simple-para' and '_' in item:
+                         text_parts.append(item['_'])
+                     # Also check for para elements
+                     elif item.get('#name') == 'para' and '_' in item:
+                         text_parts.append(item['_'])
+                     # Recursively extract from nested structure
+                     elif '$$' in item:
+                         nested_text = extract_text_from_abstract_section(item)
+                         if nested_text:
+                             text_parts.append(nested_text)
+
+         return ' '.join(text_parts)
+
+     except Exception as e:
+         print(f"Error extracting text from abstract section: {e}")
+         return ""
+
+ def extract_highlights_from_section(section: dict) -> str:
+     """Extract highlights content from section structure."""
+     try:
+         text_parts = []
+
+         if '$$' in section:
+             for item in section['$$']:
+                 if isinstance(item, dict):
+                     # Look for section-title with "Highlights"
+                     if (item.get('#name') == 'section-title' and
+                             item.get('_') and 'highlight' in item['_'].lower()):
+                         # Found highlights section, extract list items
+                         highlights_text = extract_highlights_list(item, section)
+                         if highlights_text:
+                             text_parts.append(highlights_text)
+                     # Also look for direct list structures
+                     elif item.get('#name') == 'list':
+                         # Found list, extract list items directly
+                         highlights_text = extract_highlights_list(item, section)
+                         if highlights_text:
+                             text_parts.append(highlights_text)
+                     elif '$$' in item:
+                         # Recursively search for highlights
+                         nested_text = extract_highlights_from_section(item)
+                         if nested_text:
+                             text_parts.append(nested_text)
+
+         return ' '.join(text_parts)
+
+     except Exception as e:
+         print(f"Error extracting highlights from section: {e}")
+         return ""
+
+ def extract_highlights_list(title_item: dict, parent_section: dict) -> str:
+     """Extract highlights list items from the section structure."""
+     try:
+         highlights = []
+
+         # Look for the list structure after the highlights title
+         if '$$' in parent_section:
+             for item in parent_section['$$']:
+                 if isinstance(item, dict) and item.get('#name') == 'list':
+                     # Found list, extract list items
+                     if '$$' in item:
+                         for list_item in item['$$']:
+                             if isinstance(list_item, dict) and list_item.get('#name') == 'list-item':
+                                 # Extract text from list item
+                                 item_text = extract_text_from_abstract_section(list_item)
+                                 if item_text:
+                                     highlights.append(f"• {item_text}")
+
+         # Also check if the title_item itself contains a list (for direct list structures)
+         if '$$' in title_item:
+             for item in title_item['$$']:
+                 if isinstance(item, dict) and item.get('#name') == 'list':
+                     if '$$' in item:
+                         for list_item in item['$$']:
+                             if isinstance(list_item, dict) and list_item.get('#name') == 'list-item':
+                                 item_text = extract_text_from_abstract_section(list_item)
+                                 if item_text:
+                                     highlights.append(f"• {item_text}")
+
+         return ' '.join(highlights)
+
+     except Exception as e:
+         print(f"Error extracting highlights list: {e}")
+         return ""
+
+ def extract_highlights_from_abstract_item(abstract_item: dict) -> str:
+     """Extract highlights from an abstract item that contains highlights."""
+     try:
+         highlights = []
+
+         if '$$' in abstract_item:
+             for section in abstract_item['$$']:
+                 if isinstance(section, dict) and section.get('#name') == 'abstract-sec':
+                     # Look for highlights within this section
+                     highlights_text = extract_highlights_from_section(section)
+                     if highlights_text:
+                         highlights.append(highlights_text)
+
+         return ' '.join(highlights)
+
+     except Exception as e:
+         print(f"Error extracting highlights from abstract item: {e}")
+         return ""
+
+ def fetch_from_pubmed(doi: str) -> Optional[str]:
+     """Fetch abstract from PubMed if available (placeholder)."""
+     try:
+         # This is a simplified approach - in practice you would query the PubMed API.
+         # For now this method is skipped, but it could be extended to check for:
+         # - abstract field
+         # - highlights field
+         # - other summary fields
+         pass
+     except Exception:
+         pass
+     return None
+
+ def convert_abstract_to_inverted_index(abstract: str) -> Dict:
+     """Convert abstract text to inverted index format."""
+     if not abstract:
+         return {}
+
+     # Simple word tokenization and position mapping
+     words = re.findall(r'\b\w+\b', abstract.lower())
+     inverted_index = {}
+
+     for i, word in enumerate(words):
+         if word not in inverted_index:
+             inverted_index[word] = []
+         inverted_index[word].append(i)
+
+     return inverted_index
+
+ def extract_work_id_from_url(url: str) -> Optional[str]:
+     """Extract OpenAlex work ID from various URL formats."""
+     if not url:
+         return None
+
+     # Handle different URL formats
+     if 'openalex.org' in url:
+         if '/works/' in url:
+             # Extract ID from URL like https://openalex.org/works/W2741809807
+             work_id = url.split('/works/')[-1]
+             return work_id
+         elif 'api.openalex.org/works/' in url:
+             # Extract ID from API URL
+             work_id = url.split('/works/')[-1]
+             return work_id
+
+     # If it's already just an ID
+     if url.startswith('W') and len(url) > 5:
+         return url
+
+     return None
+
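Expected behaviour of the extractor above on the formats it handles (the work ID is the one already used in its comments):

```python
extract_work_id_from_url("https://openalex.org/works/W2741809807")  # -> "W2741809807"
extract_work_id_from_url("W2741809807")                             # -> "W2741809807"
extract_work_id_from_url("not-a-work-id")                           # -> None
```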
+ def save_to_database(session_id: str, data_type: str, data: Dict) -> str:
+     """Legacy-compatible save helper that routes to the new split DB layout."""
+     if data_type == 'collection':
+         work_id = data.get('work_id', '')
+         title = data.get('title', '')
+         return save_collection_to_database(work_id, title, data)
+     if data_type == 'filter':
+         source_collection = data.get('source_collection', '')
+         research_question = data.get('research_question', '')
+         return save_filter_to_database(source_collection, research_question, data)
+
+     # Fallback legacy path (single folder)
+     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+     filename = f"{session_id}_{data_type}_{timestamp}.pkl"
+     filepath = os.path.join(DATABASE_DIR, filename)
+     with open(filepath, 'wb') as f:
+         pickle.dump(data, f)
+     return filename
+
+ def _clean_work_id(work_id_or_url: str) -> str:
+     clean = extract_work_id_from_url(work_id_or_url) or work_id_or_url
+     clean = clean.replace('https://api.openalex.org/works/', '').replace('https://openalex.org/', '')
+     return clean
+
+ def save_collection_to_database(work_id_or_url: str, title: str, data: Dict) -> str:
+     """Save a collection once per work. Filename is the clean work id only (dedup)."""
+     ensure_db_dirs()
+     clean_id = _clean_work_id(work_id_or_url)
+     filename = f"{clean_id}.pkl"
+     filepath = os.path.join(COLLECTION_DB_DIR, filename)
+
+     # Deduplicate: if it exists, do NOT overwrite
+     if os.path.exists(filepath):
+         return filename
+
+     # Ensure helpful metadata for frontend display
+     data = dict(data)
+     data['work_id'] = work_id_or_url
+     data['title'] = title
+     data['work_identifier'] = clean_id
+     data['created'] = datetime.now().isoformat()
+
+     with open(filepath, 'wb') as f:
+         pickle.dump(data, f)
+     return filename
+
+ def save_filter_to_database(source_collection_clean_id: str, research_question: str, data: Dict) -> str:
+     """Save a filter result linked to a source collection. Multiple filters allowed."""
+     ensure_db_dirs()
+     # Slug for the RQ to keep filenames short
+     rq_slug = ''.join(c for c in research_question[:40] if c.isalnum() or c in (' ', '-', '_')).strip().replace(' ', '_') or 'rq'
+     timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+     filename = f"{source_collection_clean_id}__filter__{rq_slug}__{timestamp}.pkl"
+     filepath = os.path.join(FILTER_DB_DIR, filename)
+
+     data = dict(data)
+     data['filter_identifier'] = filename.replace('.pkl', '')
+     data['source_collection'] = source_collection_clean_id
+     data['research_question'] = research_question
+     data['created'] = datetime.now().isoformat()
+
+     with open(filepath, 'wb') as f:
+         pickle.dump(data, f)
+     return filename
+
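For example (hypothetical values), a filter saved for collection `W2741809807` with the question "How does irrigation affect yields?" would land in `filters/` as something like `W2741809807__filter__How_does_irrigation_affect_yields__20250101_120000.pkl`.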
+ def get_collection_files() -> List[Dict]:
+     files: List[Dict] = []
+     if not os.path.exists(COLLECTION_DB_DIR):
+         return files
+     for filename in os.listdir(COLLECTION_DB_DIR):
+         if not filename.endswith('.pkl'):
+             continue
+         filepath = os.path.join(COLLECTION_DB_DIR, filename)
+         try:
+             stat = os.stat(filepath)
+             with open(filepath, 'rb') as f:
+                 data = pickle.load(f)
+             files.append({
+                 'filename': filename,
+                 'type': 'collection',
+                 'work_identifier': data.get('work_identifier') or filename.replace('.pkl', ''),
+                 'title': data.get('title', ''),
+                 'work_id': data.get('work_id', ''),
+                 'total_papers': data.get('total_papers', 0),
+                 'created': data.get('created', datetime.fromtimestamp(stat.st_ctime).isoformat()),
+                 'size': stat.st_size
+             })
+         except Exception:
+             continue
+     files.sort(key=lambda x: x['created'], reverse=True)
+     return files
+
+ def get_filter_files() -> List[Dict]:
+     files: List[Dict] = []
+     if not os.path.exists(FILTER_DB_DIR):
+         return files
+     for filename in os.listdir(FILTER_DB_DIR):
+         if not filename.endswith('.pkl'):
+             continue
+         filepath = os.path.join(FILTER_DB_DIR, filename)
+         try:
+             stat = os.stat(filepath)
+             with open(filepath, 'rb') as f:
+                 data = pickle.load(f)
+             files.append({
+                 'filename': filename,
+                 'type': 'filter',
+                 'filter_identifier': data.get('filter_identifier') or filename.replace('.pkl', ''),
+                 'source_collection': data.get('source_collection', ''),
+                 'research_question': data.get('research_question', ''),
+                 'relevant_papers': data.get('relevant_papers', 0),
+                 'total_papers': data.get('total_papers', 0),
+                 'tested_papers': data.get('tested_papers', 0),
+                 'created': data.get('created', datetime.fromtimestamp(stat.st_ctime).isoformat()),
+                 'size': stat.st_size
+             })
+         except Exception:
+             continue
+     files.sort(key=lambda x: x['created'], reverse=True)
+     return files
+
1185
+
1186
+ def get_database_files() -> List[Dict]:
1187
+ """Combined listing for frontend history panel."""
1188
+ return get_collection_files() + get_filter_files()
1189
+
+ def find_existing_collection(work_id_or_url: str) -> Optional[str]:
+     """Return the existing collection filename for a work id if present (dedup)."""
+     clean_id = _clean_work_id(work_id_or_url)
+     filename = f"{clean_id}.pkl"
+     filepath = os.path.join(COLLECTION_DB_DIR, filename)
+     return filename if os.path.exists(filepath) else None
+
+ def filter_papers_for_rq(papers: List[Dict], research_question: str) -> List[Dict]:
+     """Filter papers based on the research question using GPT-5 nano (via analyze_paper_relevance)."""
+     if not papers or not research_question:
+         return []
+
+     relevant_papers = []
+
+     for i, paper in enumerate(papers):
+         print(f"Analyzing paper {i+1}/{len(papers)}: {paper.get('title', 'No title')[:50]}...")
+
+         # Extract title and abstract
+         title = paper.get('title', '')
+         abstract = ''
+
+         # Try to get abstract from inverted index
+         inverted_abstract = paper.get('abstract_inverted_index')
+         if inverted_abstract:
+             words = []
+             for word, positions in inverted_abstract.items():
+                 for pos in positions:
+                     while len(words) <= pos:
+                         words.append('')
+                     words[pos] = word
+             abstract = ' '.join(words).strip()
+
+         if not title and not abstract:
+             continue
+
+         # Create content for GPT analysis
+         content = {
+             'title': title,
+             'abstract': abstract
+         }
+
+         # Analyze with GPT
+         try:
+             analysis = analyze_paper_relevance(content, research_question, OPENAI_API_KEY)
+             if analysis and analysis.get('aims_of_paper'):
+                 # Check if paper is relevant to research question
+                 relevance_prompt = f"""
+ Research Question: {research_question}
+
+ Paper Title: {title}
+ Paper Abstract: {abstract or 'No abstract available'}
+
+ Is this paper highly relevant to answering the research question?
+ Consider the paper's aims, methods, and findings.
+
+ Return ONLY a JSON object: {{"relevant": true/false, "reason": "brief explanation"}}
+ """
+
+                 relevance_response = analyze_paper_relevance({
+                     'title': 'Relevance Check',
+                     'abstract': relevance_prompt
+                 }, research_question, OPENAI_API_KEY)
+
+                 if relevance_response and relevance_response.get('aims_of_paper'):
+                     # Parse the relevance response
+                     try:
+                         relevance_data = json.loads(relevance_response['aims_of_paper'])
+                         if relevance_data.get('relevant', False):
+                             paper['relevance_reason'] = relevance_data.get('reason', 'Relevant to research question')
+                             paper['gpt_analysis'] = analysis
+                             relevant_papers.append(paper)
+                     except Exception:
+                         # If parsing fails, include the paper anyway if it has analysis
+                         paper['gpt_analysis'] = analysis
+                         relevant_papers.append(paper)
+
+         except Exception as e:
+             print(f"Error analyzing paper {i+1}: {e}")
+             continue
+
+     return relevant_papers
+
+ # Flask routes removed - now using the Gradio interface
+
+ def search_papers_by_title(title: str) -> List[Dict]:
+     """Search OpenAlex for papers by title and return ranked matches."""
+     try:
+         # Clean and prepare the title for search
+         clean_title = title.strip()
+         if not clean_title:
+             return []
+
+         # Search the OpenAlex API
+         import urllib.parse
+         params = {
+             'search': clean_title,
+             'per_page': 10,  # Get top 10 results
+             'sort': 'relevance_score:desc'  # Sort by relevance
+         }
+
+         # Build URL with query parameters
+         query_string = urllib.parse.urlencode(params)
+         search_url = f"https://api.openalex.org/works?{query_string}"
+
+         print(f"EXACT URL BEING SEARCHED: {search_url}")
+
+         response = _http_get(search_url, timeout=10)
+         if not response or response.status_code != 200:
+             print(f"OpenAlex search failed: {response.status_code if response else 'No response'}")
+             return []
+
+         data = response.json()
+         results = data.get('results', [])
+
+         if not results:
+             print(f"No results found for title: {clean_title}")
+             return []
+
+         # Return top results (OpenAlex already ranks by relevance)
+         scored_results = []
+         for work in results[:5]:  # Take top 5 from OpenAlex
+             work_title = work.get('title', '')
+             if not work_title:
+                 continue
+
+             work_id = work.get('id', '').replace('https://openalex.org/', '')
+             scored_results.append({
+                 'work_id': work_id,
+                 'title': work_title,
+                 'authors': ', '.join([author.get('author', {}).get('display_name', '') for author in work.get('authorships', [])[:3]]),
+                 'year': work.get('publication_date', '')[:4] if work.get('publication_date') else 'Unknown',
+                 'venue': work.get('primary_location', {}).get('source', {}).get('display_name', 'Unknown'),
+                 'relevance_score': work.get('relevance_score', 0)
+             })
+
+         return scored_results
+
+     except Exception as e:
+         print(f"Error searching for papers by title: {e}")
+         return []
+
+ # Flask API routes removed - now using the Gradio interface
+
+ # Flask filter route removed - now using the Gradio interface
+
+ # Flask database routes removed - now using the Gradio interface
+
+ def generate_bibtex_entry(paper):
+     """Generate a BibTeX entry for a single paper."""
+     try:
+         # Handle None or invalid paper objects
+         if not paper or not isinstance(paper, dict):
+             print(f"Invalid paper object: {paper}")
+             return f"@article{{error_{hash(str(paper)) % 10000},\n  title={{Invalid paper data}},\n  author={{Unknown}},\n  year={{Unknown}}\n}}"
+
+         # Extract basic info with safe defaults
+         title = paper.get('title', 'Unknown Title')
+         year = paper.get('publication_year', 'Unknown Year')
+         doi = paper.get('doi', '')
+
+         # Generate a unique key (using OpenAlex ID or DOI)
+         work_id = paper.get('id', '')
+         if work_id and isinstance(work_id, str):
+             work_id = work_id.replace('https://openalex.org/', '')
+         if not work_id and doi:
+             work_id = doi.replace('https://doi.org/', '').replace('/', '_')
+         if not work_id:
+             work_id = f"paper_{hash(title) % 10000}"
+
+         # Extract authors safely
+         authorships = paper.get('authorships', [])
+         author_list = []
+         if isinstance(authorships, list):
+             for authorship in authorships:
+                 if isinstance(authorship, dict):
+                     author = authorship.get('author', {})
+                     if isinstance(author, dict):
+                         display_name = author.get('display_name', '')
+                         if display_name:
+                             # Split name and format as "Last, First"
+                             name_parts = display_name.split()
+                             if len(name_parts) >= 2:
+                                 last_name = name_parts[-1]
+                                 first_name = ' '.join(name_parts[:-1])
+                                 author_list.append(f"{last_name}, {first_name}")
+                             else:
+                                 author_list.append(display_name)
+
+         authors = " and ".join(author_list) if author_list else "Unknown Author"
+
+         # Extract journal info safely
+         primary_location = paper.get('primary_location', {})
+         journal = 'Unknown Journal'
+         if isinstance(primary_location, dict):
+             source = primary_location.get('source', {})
+             if isinstance(source, dict):
+                 journal = source.get('display_name', 'Unknown Journal')
+
+         # Extract volume, issue, pages safely
+         biblio = paper.get('biblio', {})
+         volume = ''
+         issue = ''
+         first_page = ''
+         last_page = ''
+         if isinstance(biblio, dict):
+             volume = biblio.get('volume', '')
+             issue = biblio.get('issue', '')
+             first_page = biblio.get('first_page', '')
+             last_page = biblio.get('last_page', '')
+
+         # Format pages
+         if first_page and last_page and first_page != last_page:
+             pages = f"{first_page}--{last_page}"
+         elif first_page:
+             pages = first_page
+         else:
+             pages = ""
+
+         # Format volume and issue
+         volume_info = ""
+         if volume:
+             volume_info = f"volume={{{volume}}}"
+             if issue:
+                 volume_info += f", number={{{issue}}}"
+         elif issue:
+             volume_info = f"number={{{issue}}}"
+
+         # Get URL (prefer DOI, fallback to landing page)
+         url = doi if doi else ''
+         if isinstance(primary_location, dict):
+             landing_url = primary_location.get('landing_page_url', '')
+             if landing_url and not url:
+                 url = landing_url
+
+         # Build BibTeX entry
+         bibtex_entry = f"""@article{{{work_id},
+   title={{{title}}},
+   author={{{authors}}},
+   journal={{{journal}}},
+   year={{{year}}}"""
+
+         if volume_info:
+             bibtex_entry += f",\n  {volume_info}"
+
+         if pages:
+             bibtex_entry += f",\n  pages={{{pages}}}"
+
+         if doi:
+             bibtex_entry += f",\n  doi={{{doi.replace('https://doi.org/', '')}}}"
+
+         if url:
+             bibtex_entry += f",\n  url={{{url}}}"
+
+         bibtex_entry += "\n}"
+
+         return bibtex_entry
+
+     except Exception as e:
1448
+ print(f"Error generating BibTeX for paper: {e}")
1449
+ print(f"Paper data: {paper}")
1450
+ return f"@article{{error_{hash(str(paper)) % 10000},\n title={{Error generating entry}},\n author={{Unknown}},\n year={{Unknown}}\n}}"
1451
+
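+ # Illustrative output of generate_bibtex_entry (all field values hypothetical):
+ # @article{W1234567890,
+ #     title={An Example Title},
+ #     author={Doe, Jane and Smith, John},
+ #     journal={Example Journal},
+ #     year={2020},
+ #     volume={12}, number={3},
+ #     pages={100--110},
+ #     doi={10.1000/xyz123},
+ #     url={https://doi.org/10.1000/xyz123}
+ # }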
1452
+ # Flask BibTeX and download routes removed - now using Gradio interface
1453
+
1454
+ # Flask merge route removed - now using Gradio interface
1455
+
1456
+ def merge_collections(collection_filenames):
1457
+ """Merge multiple collections into a new collection with overlap analysis."""
1458
+ try:
1459
+ if len(collection_filenames) < 2:
1460
+ return {'success': False, 'message': 'At least 2 collections required for merging'}
1461
+
1462
+ # Load all collections and track their work IDs
1463
+ collections_data = []
1464
+ all_work_ids = set()
1465
+ collection_work_ids = [] # List of sets, one per collection
1466
+
1467
+ for filename in collection_filenames:
1468
+ collection_path = os.path.join(COLLECTION_DB_DIR, filename)
1469
+ if not os.path.exists(collection_path):
1470
+ return {'success': False, 'message': f'Collection {filename} not found'}
1471
+
1472
+ with open(collection_path, 'rb') as f:
1473
+ collection_data = pickle.load(f)
1474
+
1475
+ papers = collection_data.get('papers', [])
1476
+ collection_work_ids_set = set()
1477
+
1478
+ # Extract work IDs for this collection
1479
+ for paper in papers:
1480
+ if isinstance(paper, dict):
1481
+ work_id = paper.get('id', '')
1482
+ if work_id:
1483
+ collection_work_ids_set.add(work_id)
1484
+ all_work_ids.add(work_id)
1485
+
1486
+ collections_data.append({
1487
+ 'filename': filename,
1488
+ 'title': collection_data.get('title', filename.replace('.pkl', '')),
1489
+ 'papers': papers,
1490
+ 'work_ids': collection_work_ids_set,
1491
+ 'total_papers': len(papers)
1492
+ })
1493
+ collection_work_ids.append(collection_work_ids_set)
1494
+
1495
+ # Calculate overlap statistics
1496
+ overlap_stats = []
1497
+ total_unique_papers = len(all_work_ids)
1498
+
1499
+ for i, collection in enumerate(collections_data):
1500
+ collection_work_ids_i = collection_work_ids[i]
1501
+ overlaps = []
1502
+
1503
+ # Calculate overlap with each other collection
1504
+ for j, other_collection in enumerate(collections_data):
1505
+ if i != j:
1506
+ other_work_ids = collection_work_ids[j]
1507
+ intersection = collection_work_ids_i.intersection(other_work_ids)
1508
+ overlap_count = len(intersection)
1509
+ overlap_percentage = (overlap_count / len(collection_work_ids_i)) * 100 if collection_work_ids_i else 0
1510
+
1511
+ overlaps.append({
1512
+ 'collection': other_collection['title'],
1513
+ 'overlap_count': overlap_count,
1514
+ 'overlap_percentage': round(overlap_percentage, 1)
1515
+ })
1516
+
1517
+ overlap_stats.append({
1518
+ 'collection': collection['title'],
1519
+ 'total_papers': collection['total_papers'],
1520
+ 'overlaps': overlaps
1521
+ })
1522
+
1523
+ # Create merged collection with unique papers only
1524
+ merged_papers = []
1525
+ merged_work_ids = set()
1526
+
1527
+ for collection in collections_data:
1528
+ for paper in collection['papers']:
1529
+ if isinstance(paper, dict):
1530
+ work_id = paper.get('id', '')
1531
+ if work_id and work_id not in merged_work_ids:
1532
+ merged_papers.append(paper)
1533
+ merged_work_ids.add(work_id)
1534
+
1535
+ if not merged_papers:
1536
+ return {'success': False, 'message': 'No papers found in collections to merge'}
1537
+
1538
+ # Calculate total papers across all collections (before deduplication)
1539
+ total_papers_before_merge = sum(collection['total_papers'] for collection in collections_data)
1540
+ duplicates_removed = total_papers_before_merge - len(merged_papers)
1541
+ deduplication_percentage = (duplicates_removed / total_papers_before_merge) * 100 if total_papers_before_merge > 0 else 0
1542
+
1543
+ # Create merged collection data
1544
+ collection_titles = [collection['title'] for collection in collections_data]
1545
+ merged_title = f"MERGED: {' + '.join(collection_titles[:3])}"
1546
+ if len(collection_titles) > 3:
1547
+ merged_title += f" + {len(collection_titles) - 3} more"
1548
+
1549
+ merged_data = {
1550
+ 'work_identifier': f"merged_{int(time.time())}",
1551
+ 'title': merged_title,
1552
+ 'work_id': '',
1553
+ 'papers': merged_papers,
1554
+ 'total_papers': len(merged_papers),
1555
+ 'created': datetime.now().isoformat(),
1556
+ 'source_collections': collection_filenames,
1557
+ 'merge_stats': {
1558
+ 'total_papers_before_merge': total_papers_before_merge,
1559
+ 'duplicates_removed': duplicates_removed,
1560
+ 'deduplication_percentage': round(deduplication_percentage, 1),
1561
+ 'overlap_analysis': overlap_stats
1562
+ }
1563
+ }
1564
+
1565
+ # Save merged collection
1566
+ merged_filename = f"merged_{int(time.time())}.pkl"
1567
+ merged_path = os.path.join(COLLECTION_DB_DIR, merged_filename)
1568
+
1569
+ with open(merged_path, 'wb') as f:
1570
+ pickle.dump(merged_data, f)
1571
+
1572
+ return {
1573
+ 'success': True,
1574
+ 'message': f'Merged collection created with {len(merged_papers)} unique papers (removed {duplicates_removed} duplicates)',
1575
+ 'filename': merged_filename,
1576
+ 'total_papers': len(merged_papers),
1577
+ 'merge_stats': {
1578
+ 'total_papers_before_merge': total_papers_before_merge,
1579
+ 'duplicates_removed': duplicates_removed,
1580
+ 'deduplication_percentage': round(deduplication_percentage, 1),
1581
+ 'overlap_analysis': overlap_stats
1582
+ }
1583
+ }
1584
+
1585
+ except Exception as e:
1586
+ return {'success': False, 'message': f'Error merging collections: {str(e)}'}
1587
+
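+ # Worked example (hypothetical numbers): merging collections of 40 and 30 papers
+ # that share 10 work IDs yields 60 unique papers; overlap is reported as
+ # 10/40 = 25.0% from the first collection's side and 10/30 = 33.3% from the
+ # second's, and deduplication removes 10/70 = 14.3% of the combined input.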
1588
+ def fetch_abstracts(papers):
1589
+ """Fetch missing abstracts for papers using their DOI URLs."""
1590
+ try:
1591
+ if not papers:
1592
+ return {'error': 'No papers provided'}
1593
+
1594
+ updated_papers = []
1595
+ fetched_count = 0
1596
+ total_processed = 0
1597
+
1598
+ for paper in papers:
1599
+ total_processed += 1
1600
+ updated_paper = paper.copy()
1601
+
1602
+ # Check if paper already has abstract (check both abstract_inverted_index and abstract fields)
1603
+ has_abstract = (
1604
+ (paper.get('abstract_inverted_index') and
1605
+ len(paper.get('abstract_inverted_index', {})) > 0) or
1606
+ (paper.get('abstract') and
1607
+ len(str(paper.get('abstract', '')).strip()) > 50)
1608
+ )
1609
+
1610
+ if not has_abstract and paper.get('doi'):
1611
+ print(f"Fetching abstract for DOI: {paper.get('doi')}")
1612
+ abstract = fetch_abstract_from_doi(paper.get('doi'))
1613
+
1614
+ if abstract:
1615
+ # Convert to inverted index format
1616
+ inverted_index = convert_abstract_to_inverted_index(abstract)
1617
+ updated_paper['abstract_inverted_index'] = inverted_index
1618
+ fetched_count += 1
1619
+ print(f"Successfully fetched abstract for: {paper.get('title', 'Unknown')[:50]}...")
1620
+ else:
1621
+ print(f"Could not fetch abstract for: {paper.get('title', 'Unknown')[:50]}...")
1622
+
1623
+ updated_papers.append(updated_paper)
1624
+
1625
+ return {
1626
+ 'success': True,
1627
+ 'fetched_count': fetched_count,
1628
+ 'total_processed': total_processed,
1629
+ 'updated_papers': updated_papers
1630
+ }
1631
+
1632
+ except Exception as e:
1633
+ print(f"Error fetching abstracts: {e}")
1634
+ return {'error': str(e)}
1635
+
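+ # convert_abstract_to_inverted_index (defined elsewhere in this file) is assumed
+ # to produce OpenAlex's abstract_inverted_index shape, a word -> positions map;
+ # e.g. "deep learning for deep nets" would become (sketch, zero-based positions)
+ # {"deep": [0, 3], "learning": [1], "for": [2], "nets": [4]}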
1636
+ def export_excel_from_file(filename):
1637
+ """Export Excel from a specific database file."""
1638
+ try:
1639
+ # Try collections then filters then legacy
1640
+ filepath = os.path.join(COLLECTION_DB_DIR, filename)
1641
+ if not os.path.exists(filepath):
1642
+ filepath = os.path.join(FILTER_DB_DIR, filename)
1643
+ if not os.path.exists(filepath):
1644
+ filepath = os.path.join(DATABASE_DIR, filename)
1645
+ if not os.path.exists(filepath):
1646
+ return {'error': 'File not found'}
1647
+
1648
+ with open(filepath, 'rb') as f:
1649
+ data = pickle.load(f)
1650
+
1651
+ papers = data.get('papers', [])
1652
+ if not papers:
1653
+ return {'error': 'No papers found in file'}
1654
+
1655
+ # Prepare data for Excel export
1656
+ excel_data = []
1657
+ for paper in papers:
1658
+ # Extract abstract from inverted index
1659
+ abstract = ""
1660
+ if paper.get('abstract_inverted_index'):
1661
+ words = []
1662
+ for word, positions in paper['abstract_inverted_index'].items():
1663
+ for pos in positions:
1664
+ while len(words) <= pos:
1665
+ words.append('')
1666
+ words[pos] = word
1667
+ abstract = ' '.join(words).strip()
1668
+
1669
+ # Extract open access info with null checks
1670
+ oa_info = paper.get('open_access') or {}
1671
+ is_oa = oa_info.get('is_oa', False) if oa_info else False
1672
+ oa_status = oa_info.get('oa_status', '') if oa_info else ''
1673
+
1674
+ # Extract DOI with null check
1675
+ doi = ""
1676
+ if paper.get('doi'):
1677
+ doi = paper['doi'].replace('https://doi.org/', '')
1678
+
1679
+ # Extract authors with null checks
1680
+ authors = paper.get('authorships') or []
1681
+ author_names = []
1682
+ for author in authors[:5]: # Limit to first 5 authors
1683
+ if author and isinstance(author, dict):
1684
+ author_obj = author.get('author') or {}
1685
+ if author_obj and isinstance(author_obj, dict):
1686
+ author_names.append(author_obj.get('display_name', ''))
1687
+
1688
+ # Extract journal with null checks
1689
+ journal = ""
1690
+ primary_location = paper.get('primary_location')
1691
+ if primary_location and isinstance(primary_location, dict):
1692
+ source = primary_location.get('source')
1693
+ if source and isinstance(source, dict):
1694
+ journal = source.get('display_name', '')
1695
+
1696
+ # Extract GPT analysis with null checks
1697
+ gpt_analysis = paper.get('gpt_analysis') or {}
1698
+ gpt_aims = gpt_analysis.get('aims_of_paper', '') if gpt_analysis else ''
1699
+ gpt_takeaways = gpt_analysis.get('key_takeaways', '') if gpt_analysis else ''
1700
+
1701
+ excel_data.append({
1702
+ 'Title': paper.get('title', ''),
1703
+ 'Publication Date': paper.get('publication_date', ''),
1704
+ 'DOI': doi,
1705
+ 'Is Open Access': is_oa,
1706
+ 'OA Status': oa_status,
1707
+ 'Abstract': abstract,
1708
+ 'Relationship': paper.get('relationship', ''),
1709
+ 'Authors': ', '.join(author_names),
1710
+ 'Journal': journal,
1711
+ 'OpenAlex ID': paper.get('id', ''),
1712
+ 'Relevance Reason': paper.get('relevance_reason', ''),
1713
+ 'GPT Aims': gpt_aims,
1714
+ 'GPT Takeaways': gpt_takeaways
1715
+ })
1716
+
1717
+ # Create DataFrame and export to Excel
1718
+ df = pd.DataFrame(excel_data)
1719
+ excel_filename = f'{filename.replace(".pkl", "")}_{int(time.time())}.xlsx'
1720
+
1721
+ # Create Excel file in a temporary location
1722
+ temp_dir = tempfile.gettempdir()
1723
+ excel_path = os.path.join(temp_dir, excel_filename)
1724
+
1725
+ try:
1726
+ df.to_excel(excel_path, index=False)
1727
+ return {'success': True, 'message': f'Excel file created: {excel_filename}', 'filepath': excel_path}
1728
+ except Exception as e:
1729
+ print(f"Error creating Excel file: {e}")
1730
+ # Fallback: try current directory
1731
+ try:
1732
+ df.to_excel(excel_filename, index=False)
1733
+ return {'success': True, 'message': f'Excel file created: {excel_filename}', 'filepath': excel_filename}
1734
+ except Exception as e2:
1735
+ print(f"Error creating Excel file in current directory: {e2}")
1736
+ return {'error': f'Failed to create Excel file: {str(e2)}'}
1737
+
1738
+ except Exception as e:
1739
+ print(f"Error exporting Excel: {e}")
1740
+ return {'error': str(e)}
1741
+
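+ # Usage sketch (hypothetical filename): export_excel_from_file("W1607201421.pkl")
+ # returns {'success': True, 'message': ..., 'filepath': ...} pointing at an .xlsx
+ # written to the system temp directory (falling back to the working directory).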
1742
+ def export_excel():
1743
+ """Export collected papers to Excel format."""
1744
+ try:
1745
+ # Load papers from temporary file
1746
+ if not os.path.exists('temp_papers.pkl'):
1747
+ return {'error': 'No papers found. Please collect papers first.'}
1748
+
1749
+ with open('temp_papers.pkl', 'rb') as f:
1750
+ papers = pickle.load(f)
1751
+
1752
+ # Prepare data for Excel export
1753
+ excel_data = []
1754
+ for paper in papers:
1755
+ # Extract abstract from inverted index
1756
+ abstract = ""
1757
+ if paper.get('abstract_inverted_index'):
1758
+ words = []
1759
+ for word, positions in paper['abstract_inverted_index'].items():
1760
+ for pos in positions:
1761
+ while len(words) <= pos:
1762
+ words.append('')
1763
+ words[pos] = word
1764
+ abstract = ' '.join(words).strip()
1765
+
1766
+ # Extract open access info with null checks
1767
+ oa_info = paper.get('open_access') or {}
1768
+ is_oa = oa_info.get('is_oa', False) if oa_info else False
1769
+ oa_status = oa_info.get('oa_status', '') if oa_info else ''
1770
+
1771
+ # Extract DOI with null check
1772
+ doi = ""
1773
+ if paper.get('doi'):
1774
+ doi = paper['doi'].replace('https://doi.org/', '')
1775
+
1776
+ # Extract authors with null checks
1777
+ authors = paper.get('authorships') or []
1778
+ author_names = []
1779
+ for author in authors[:5]: # Limit to first 5 authors
1780
+ if author and isinstance(author, dict):
1781
+ author_obj = author.get('author') or {}
1782
+ if author_obj and isinstance(author_obj, dict):
1783
+ author_names.append(author_obj.get('display_name', ''))
1784
+
1785
+ # Extract journal with null checks
1786
+ journal = ""
1787
+ primary_location = paper.get('primary_location')
1788
+ if primary_location and isinstance(primary_location, dict):
1789
+ source = primary_location.get('source')
1790
+ if source and isinstance(source, dict):
1791
+ journal = source.get('display_name', '')
1792
+
1793
+ # Extract GPT analysis with null checks
1794
+ gpt_analysis = paper.get('gpt_analysis') or {}
1795
+ gpt_aims = gpt_analysis.get('aims_of_paper', '') if gpt_analysis else ''
1796
+ gpt_takeaways = gpt_analysis.get('key_takeaways', '') if gpt_analysis else ''
1797
+
1798
+ excel_data.append({
1799
+ 'Title': paper.get('title', ''),
1800
+ 'Publication Date': paper.get('publication_date', ''),
1801
+ 'DOI': doi,
1802
+ 'Is Open Access': is_oa,
1803
+ 'OA Status': oa_status,
1804
+ 'Abstract': abstract,
1805
+ 'Relationship': paper.get('relationship', ''),
1806
+ 'Authors': ', '.join(author_names),
1807
+ 'Journal': journal,
1808
+ 'OpenAlex ID': paper.get('id', ''),
1809
+ 'Relevance Reason': paper.get('relevance_reason', ''),
1810
+ 'GPT Aims': gpt_aims,
1811
+ 'GPT Takeaways': gpt_takeaways
1812
+ })
1813
+
1814
+ # Create DataFrame and export to Excel
1815
+ df = pd.DataFrame(excel_data)
1816
+ excel_filename = f'research_papers_{int(time.time())}.xlsx'
1817
+
1818
+ # Create Excel file in a temporary location
1819
+ temp_dir = tempfile.gettempdir()
1820
+ excel_path = os.path.join(temp_dir, excel_filename)
1821
+
1822
+ try:
1823
+ df.to_excel(excel_path, index=False)
1824
+ return {'success': True, 'message': f'Excel file created: {excel_filename}', 'filepath': excel_path}
1825
+ except Exception as e:
1826
+ print(f"Error creating Excel file: {e}")
1827
+ # Fallback: try current directory
1828
+ try:
1829
+ df.to_excel(excel_filename, index=False)
1830
+ return {'success': True, 'message': f'Excel file created: {excel_filename}', 'filepath': excel_filename}
1831
+ except Exception as e2:
1832
+ print(f"Error creating Excel file in current directory: {e2}")
1833
+ return {'error': f'Failed to create Excel file: {str(e2)}'}
1834
+
1835
+ except Exception as e:
1836
+ print(f"Error exporting Excel: {e}")
1837
+ return {'error': str(e)}
1838
+
1839
+ def search_papers_interface(paper_title: str):
1840
+ """Search for papers by title."""
1841
+ if not paper_title.strip():
1842
+ return "Please enter a paper title to search."
1843
+
1844
+ try:
1845
+ matches = search_papers_by_title(paper_title)
1846
+ if not matches:
1847
+ return "No papers found matching that title."
1848
+
1849
+ # Format results for display
1850
+ result_text = f"Found {len(matches)} papers:\n\n"
1851
+ for i, match in enumerate(matches, 1):
1852
+ result_text += f"{i}. {match['title']}\n"
1853
+ result_text += f" Authors: {match['authors']}\n"
1854
+ result_text += f" Year: {match['year']}\n"
1855
+ result_text += f" Journal: {match['venue']}\n"
1856
+ result_text += f" Work ID: {match['work_id']}\n\n"
1857
+
1858
+ return result_text
1859
+ except Exception as e:
1860
+ return f"Error searching papers: {str(e)}"
1861
+
1862
+ def collect_papers_interface(work_id: str, limit: int = 50):
1863
+ """Collect related papers from a work ID."""
1864
+ if not work_id.strip():
1865
+ return "Please enter a work ID to collect papers."
1866
+
1867
+ try:
1868
+ # Check if collection already exists
1869
+ existing_file = find_existing_collection(work_id)
1870
+ if existing_file:
1871
+ return f"Collection already exists: {existing_file}"
1872
+
1873
+ # Collect papers
1874
+ papers = get_related_papers(work_id, upper_limit=limit)
1875
+
1876
+ if not papers:
1877
+ return "No related papers found."
1878
+
1879
+ # Count papers by relationship type
1880
+ cited_count = sum(1 for p in papers if p.get('relationship') == 'cited')
1881
+ citing_count = sum(1 for p in papers if p.get('relationship') == 'citing')
1882
+ related_count = sum(1 for p in papers if p.get('relationship') == 'related')
1883
+
1884
+ # Save to database
1885
+ collection_data = {
1886
+ 'work_id': work_id,
1887
+ 'total_papers': len(papers),
1888
+ 'cited_papers': cited_count,
1889
+ 'citing_papers': citing_count,
1890
+ 'related_papers': related_count,
1891
+ 'limit': limit,
1892
+ 'papers': papers,
1893
+ }
1894
+
1895
+ # Get title for the collection
1896
+ title = work_id # Fallback to work_id if title not available
1897
+ try:
1898
+ seed_resp = requests.get(f'https://api.openalex.org/works/{work_id}', timeout=10)
1899
+ if seed_resp.ok:
1900
+ title = (seed_resp.json() or {}).get('title', work_id)
1901
+ except Exception:
1902
+ pass
1903
+
1904
+ db_filename = save_collection_to_database(work_id, title, collection_data)
1905
+
1906
+ result = f"Collection completed!\n\n"
1907
+ result += f"Total papers: {len(papers)}\n"
1908
+ result += f"Cited papers: {cited_count}\n"
1909
+ result += f"Citing papers: {citing_count}\n"
1910
+ result += f"Related papers: {related_count}\n"
1911
+ result += f"Saved as: {db_filename}"
1912
+
1913
+ return result
1914
+
1915
+ except Exception as e:
1916
+ return f"Error collecting papers: {str(e)}"
1917
+
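+ # Usage sketch (hypothetical ID): collect_papers_interface("W2741809807", limit=50)
+ # first checks find_existing_collection, then gathers cited, citing and related
+ # works via get_related_papers and pickles the result with save_collection_to_database.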
1918
+ def filter_papers_interface(collection_filename: str, research_question: str, limit: int = 10):
1919
+ """Filter papers based on research question."""
1920
+ if not collection_filename.strip() or not research_question.strip():
1921
+ return "Please provide both collection filename and research question."
1922
+
1923
+ try:
1924
+ # Load collection
1925
+ filepath = os.path.join("database/collections", collection_filename)
1926
+ if not os.path.exists(filepath):
1927
+ return f"Collection file not found: {collection_filename}"
1928
+
1929
+ with open(filepath, 'rb') as f:
1930
+ collection_data = pickle.load(f)
1931
+
1932
+ papers = collection_data.get('papers', [])
1933
+ if not papers:
1934
+ return "No papers found in collection."
1935
+
1936
+ # Filter papers
1937
+ relevant_papers = filter_papers_for_research_question(papers, research_question, OPENAI_API_KEY, limit)
1938
+
1939
+ # Count relevant papers
1940
+ actual_relevant = sum(1 for paper in relevant_papers if paper.get('relevance_score') is True)
1941
+
1942
+ # Save filter results
1943
+ filter_data = {
1944
+ 'research_question': research_question,
1945
+ 'total_papers': len(papers),
1946
+ 'tested_papers': limit,
1947
+ 'relevant_papers': actual_relevant,
1948
+ 'limit': limit,
1949
+ 'papers': relevant_papers,
1950
+ 'source_collection': collection_filename.replace('.pkl', '')
1951
+ }
1952
+
1953
+ db_filename = save_filter_to_database(collection_filename.replace('.pkl', ''), research_question, filter_data)
1954
+
1955
+ result = f"Filtering completed!\n\n"
1956
+ result += f"Total papers in collection: {len(papers)}\n"
1957
+ result += f"Papers tested: {limit}\n"
1958
+ result += f"Relevant papers found: {actual_relevant}\n"
1959
+ result += f"Saved as: {db_filename}\n\n"
1960
+
1961
+ # Show relevant papers
1962
+ if relevant_papers:
1963
+ result += "Relevant papers:\n"
1964
+ for i, paper in enumerate(relevant_papers[:5], 1): # Show first 5
1965
+ result += f"{i}. {paper.get('title', 'No title')}\n"
1966
+ result += f" Reason: {paper.get('relevance_reason', 'No reason provided')}\n\n"
1967
+
1968
+ return result
1969
+
1970
+ except Exception as e:
1971
+ return f"Error filtering papers: {str(e)}"
1972
+
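+ # Usage sketch (hypothetical inputs): filter_papers_interface("W1607201421.pkl",
+ # "Does the paper discuss just transitions?", limit=10) loads the pickled
+ # collection, runs the AI relevance check on up to 10 papers via
+ # filter_papers_for_research_question, and saves the results with save_filter_to_database.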
1973
+ def get_database_files_interface():
1974
+ """Get list of all database files."""
1975
+ try:
1976
+ files = get_database_files()
1977
+ if not files:
1978
+ return "No database files found."
1979
+
1980
+ result = f"Found {len(files)} database files:\n\n"
1981
+ for file_info in files:
1982
+ file_type = file_info.get('type', 'unknown')
1983
+ filename = file_info.get('filename', 'unknown')
1984
+ created = file_info.get('created', 'unknown')
1985
+ size = file_info.get('size', 0)
1986
+
1987
+ result += f"📁 {filename}\n"
1988
+ result += f" Type: {file_type}\n"
1989
+ result += f" Created: {created}\n"
1990
+ result += f" Size: {size} bytes\n\n"
1991
+
1992
+ return result
1993
+
1994
+ except Exception as e:
1995
+ return f"Error getting database files: {str(e)}"
1996
+
1997
+ def generate_bibtex_interface(filename: str):
1998
+ """Generate BibTeX for a collection."""
1999
+ if not filename.strip():
2000
+ return "Please provide a filename to generate BibTeX."
2001
+
2002
+ try:
2003
+ # Load collection
2004
+ filepath = os.path.join("database/collections", filename)
2005
+ if not os.path.exists(filepath):
2006
+ return f"Collection file not found: {filename}"
2007
+
2008
+ with open(filepath, 'rb') as f:
2009
+ collection_data = pickle.load(f)
2010
+
2011
+ papers = collection_data.get('papers', [])
2012
+ if not papers:
2013
+ return "No papers found in collection."
2014
+
2015
+ # Generate BibTeX entries
2016
+ bibtex_entries = []
2017
+ for paper in papers:
2018
+ entry = generate_bibtex_entry(paper)
2019
+ bibtex_entries.append(entry)
2020
+
2021
+ # Combine all entries
2022
+ bibtex_content = "\n\n".join(bibtex_entries)
2023
+
2024
+ # Save BibTeX file
2025
+ bibtex_filename = filename.replace('.pkl', '.bib')
2026
+ bibtex_path = os.path.join(COLLECTION_DB_DIR, bibtex_filename)
2027
+
2028
+ with open(bibtex_path, 'w', encoding='utf-8') as f:
2029
+ f.write(bibtex_content)
2030
+
2031
+ result = f"BibTeX file generated successfully!\n\n"
2032
+ result += f"Filename: {bibtex_filename}\n"
2033
+ result += f"Entries: {len(papers)}\n"
2034
+ result += f"Saved to: {bibtex_path}\n\n"
2035
+ result += "First few entries:\n"
2036
+ result += (bibtex_content[:1000] + "...") if len(bibtex_content) > 1000 else bibtex_content
2037
+
2038
+ return result
2039
+
2040
+ except Exception as e:
2041
+ return f"Error generating BibTeX: {str(e)}"
2042
+
2043
+ # Create Gradio interface
2044
+ with gr.Blocks(title="AI Systematic Literature Review", theme=gr.themes.Soft()) as demo:
2045
+ gr.Markdown("# 🧪 AI Systematic Literature Review")
2046
+ gr.Markdown("Search, collect, and analyze academic papers using OpenAlex and AI-powered filtering.")
2047
+
2048
+ with gr.Tabs():
2049
+ with gr.Tab("🔍 Search Papers"):
2050
+ gr.Markdown("Search for papers by title using OpenAlex API")
2051
+ with gr.Row():
2052
+ search_title = gr.Textbox(label="Paper Title", placeholder="Enter the title of the paper you want to search for...")
2053
+ search_btn = gr.Button("Search Papers", variant="primary")
2054
+ search_output = gr.Textbox(label="Search Results", lines=10, interactive=False)
2055
+ search_btn.click(search_papers_interface, inputs=search_title, outputs=search_output)
2056
+
2057
+ with gr.Tab("📚 Collect Papers"):
2058
+ gr.Markdown("Collect related papers from a seed paper using its OpenAlex Work ID")
2059
+ with gr.Row():
2060
+ work_id_input = gr.Textbox(label="OpenAlex Work ID", placeholder="e.g., W2741809807")
2061
+ limit_input = gr.Number(label="Limit", value=50, minimum=1, maximum=1000)
2062
+ collect_btn = gr.Button("Collect Papers", variant="primary")
2063
+ collect_output = gr.Textbox(label="Collection Results", lines=10, interactive=False)
2064
+ collect_btn.click(collect_papers_interface, inputs=[work_id_input, limit_input], outputs=collect_output)
2065
+
2066
+ with gr.Tab("🔬 Filter Papers"):
2067
+ gr.Markdown("Filter collected papers based on a research question using AI analysis")
2068
+ with gr.Row():
2069
+ collection_file = gr.Textbox(label="Collection Filename", placeholder="e.g., W2741809807.pkl")
2070
+ research_question = gr.Textbox(label="Research Question", placeholder="What is your research question?")
2071
+ filter_limit = gr.Number(label="Papers to Test", value=10, minimum=1, maximum=100)
2072
+ filter_btn = gr.Button("Filter Papers", variant="primary")
2073
+ filter_output = gr.Textbox(label="Filter Results", lines=15, interactive=False)
2074
+ filter_btn.click(filter_papers_interface, inputs=[collection_file, research_question, filter_limit], outputs=filter_output)
2075
+
2076
+ with gr.Tab("📁 Database Files"):
2077
+ gr.Markdown("View and manage your collected papers and filters")
2078
+ with gr.Row():
2079
+ db_btn = gr.Button("Refresh Database Files", variant="primary")
2080
+ db_output = gr.Textbox(label="Database Files", lines=15, interactive=False)
2081
+ db_btn.click(get_database_files_interface, outputs=db_output)
2082
+
2083
+ with gr.Tab("📊 Export Data"):
2084
+ gr.Markdown("Export your collections to various formats")
2085
+ with gr.Row():
2086
+ export_filename = gr.Textbox(label="Collection Filename", placeholder="e.g., W2741809807.pkl")
2087
+ export_bibtex_btn = gr.Button("Export to BibTeX")
2088
+ export_output = gr.Textbox(label="Export Results", lines=10, interactive=False)
2089
+ export_bibtex_btn.click(generate_bibtex_interface, inputs=export_filename, outputs=export_output)
2090
+
2091
+ gr.Markdown("""
2092
+ ## How to use:
2093
+ 1. **Search Papers**: Enter a paper title to find papers in OpenAlex
2094
+ 2. **Collect Papers**: Use a Work ID to collect related papers (cited, citing, and related)
2095
+ 3. **Filter Papers**: Use AI to filter collected papers based on your research question
2096
+ 4. **Database Files**: View all your collections and filters
2097
+ 5. **Export Data**: Export your results to BibTeX format
2098
+
2099
+ ## Note:
2100
+ - You need an OpenAI API key set as an environment variable for AI filtering
2101
+ - Collections are automatically saved and can be reused
2102
+ - The system respects OpenAlex rate limits
2103
+ """)
2104
+
2105
+ if __name__ == "__main__":
2106
+ demo.launch()
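+ # On Hugging Face Spaces the Gradio SDK manages the host and port. For a
+ # plain-Docker run one might instead launch along these lines (a sketch,
+ # not part of the Space configuration):
+ # demo.launch(server_name="0.0.0.0", server_port=int(os.environ.get("PORT", "7860")))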
database/collections/W1607201421.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:61bb4e949d3628dd87345737e7b8120aa3707b9231c1e467c85ea7daface8200
3
+ size 133
database/collections/W2774003070.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f73318a00a6301409fc306d82dc050be28d2aec4ff306cd1ed4575b3d361b983
3
+ size 132
database/collections/W3200878735.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6665f6e774a08975080f3eab7395fc7d52feb9f58fe7d3b9055340ceddc67215
3
+ size 132
database/filters/W2774003070__filter__talks_about_just_transitions_in_global_s__20250909_224951.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ee597aa75ad51fa7ee130c5fbbcedce7021373a439403ae8766faaf552898b13
3
+ size 131
database/filters/W3200878735__filter__talks_about_just_transitions_in_global_s__20250909_225708.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cc11a002a31bf98af1593b1f49d19e52275619fcc7de2da2506f3bf9925ca054
3
+ size 131
requirements.txt ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ gradio>=4.0.0
2
+ requests>=2.31.0
3
+ openai>=1.0.0
4
+ pandas>=2.0.0
5
+ tqdm>=4.65.0
6
+ openpyxl>=3.0.0
7
+ beautifulsoup4>=4.12.0
templates/index.html ADDED
@@ -0,0 +1,1667 @@
 
 
 
 
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Research Paper Analysis Tool</title>
7
+ <style>
8
+ * {
9
+ margin: 0;
10
+ padding: 0;
11
+ box-sizing: border-box;
12
+ }
13
+
14
+ body {
15
+ font-family: 'Courier New', monospace;
16
+ background: #000000;
17
+ color: #ffffff;
18
+ line-height: 1.2;
19
+ padding: 15px;
20
+ margin: 0;
21
+ display: flex;
22
+ font-weight: bold;
23
+ }
24
+
25
+ .main-content {
26
+ flex: 1;
27
+ max-width: 50%;
28
+ margin-right: 15px;
29
+ }
30
+
31
+ .history-panel {
32
+ width: 275px;
33
+ border: 3px solid #ffffff;
34
+ padding: 15px;
35
+ background: #000000;
36
+ height: 75vh;
37
+ display: flex;
38
+ flex-direction: column;
39
+ margin-right: 10px;
40
+ }
41
+
42
+ .merge-panel {
43
+ width: 275px;
44
+ border: 3px solid #ffffff;
45
+ padding: 15px;
46
+ background: #000000;
47
+ height: 20vh;
48
+ display: flex;
49
+ flex-direction: column;
50
+ margin-right: 10px;
51
+ margin-bottom: 10px;
52
+ }
53
+
54
+ .merge-content {
55
+ flex: 1;
56
+ border: 2px dashed #ffffff;
57
+ padding: 10px;
58
+ margin-bottom: 10px;
59
+ min-height: 60px;
60
+ display: flex;
61
+ flex-direction: column;
62
+ align-items: center;
63
+ justify-content: center;
64
+ }
65
+
66
+ .merge-placeholder {
67
+ color: #666666;
68
+ font-size: 10px;
69
+ text-align: center;
70
+ text-transform: uppercase;
71
+ letter-spacing: 1px;
72
+ }
73
+
74
+ .merge-item {
75
+ background: #333333;
76
+ border: 1px solid #ffffff;
77
+ padding: 5px 8px;
78
+ margin: 2px 0;
79
+ font-size: 9px;
80
+ color: #ffffff;
81
+ text-transform: uppercase;
82
+ display: flex;
83
+ justify-content: space-between;
84
+ align-items: center;
85
+ }
86
+
87
+ .merge-actions {
88
+ display: flex;
89
+ gap: 6px;
90
+ }
91
+
92
+ .filters-panel {
93
+ width: 275px;
94
+ border: 3px solid #ffffff;
95
+ padding: 15px;
96
+ background: #000000;
97
+ height: 80vh;
98
+ display: flex;
99
+ flex-direction: column;
100
+ opacity: 0;
101
+ transform: translateX(20px);
102
+ transition: all 0.3s ease;
103
+ }
104
+
105
+ .filters-panel.visible {
106
+ opacity: 1;
107
+ transform: translateX(0);
108
+ }
109
+
110
+ .container {
111
+ max-width: 100%;
112
+ margin: 0;
113
+ }
114
+
115
+ h1 {
116
+ font-size: 1.8em;
117
+ text-align: center;
118
+ margin-bottom: 20px;
119
+ font-weight: bold;
120
+ color: #ffffff;
121
+ border: 3px solid #ffffff;
122
+ padding: 10px;
123
+ text-transform: uppercase;
124
+ letter-spacing: 2px;
125
+ }
126
+
127
+ .section {
128
+ margin: 15px 0;
129
+ border: 3px solid #ffffff;
130
+ padding: 15px;
131
+ background: #000000;
132
+ }
133
+
134
+ .section h2 {
135
+ font-size: 1.2em;
136
+ margin-bottom: 15px;
137
+ font-weight: bold;
138
+ color: #ffffff;
139
+ text-transform: uppercase;
140
+ letter-spacing: 1px;
141
+ border-bottom: 2px solid #ffffff;
142
+ padding-bottom: 5px;
143
+ }
144
+
145
+ input[type="text"], textarea {
146
+ width: 100%;
147
+ background: #000000;
148
+ color: #ffffff;
149
+ border: 3px solid #ffffff;
150
+ padding: 10px;
151
+ font-family: 'Courier New', monospace;
152
+ font-size: 12px;
153
+ margin-bottom: 10px;
154
+ font-weight: bold;
155
+ }
156
+
157
+ input[type="text"]:focus, textarea:focus {
158
+ outline: none;
159
+ border-color: #ffffff;
160
+ box-shadow: 0 0 0 3px #ffffff;
161
+ }
162
+
163
+ button {
164
+ background: #000000;
165
+ color: #ffffff;
166
+ border: 3px solid #ffffff;
167
+ padding: 8px 15px;
168
+ font-family: 'Courier New', monospace;
169
+ font-size: 11px;
170
+ cursor: pointer;
171
+ margin-right: 8px;
172
+ margin-bottom: 8px;
173
+ font-weight: bold;
174
+ text-transform: uppercase;
175
+ letter-spacing: 1px;
176
+ }
177
+
178
+ button:hover {
179
+ background: #ffffff;
180
+ color: #000000;
181
+ }
182
+
183
+ button:disabled {
184
+ opacity: 0.5;
185
+ cursor: not-allowed;
186
+ }
187
+
188
+ .status {
189
+ margin: 10px 0;
190
+ padding: 12px;
191
+ border: 1px solid #444444;
192
+ background: #2a2a2a;
193
+ border-radius: 4px;
194
+ }
195
+
196
+ .error {
197
+ border-color: #ff4444;
198
+ background: #2a1a1a;
199
+ color: #ff6666;
200
+ }
201
+
202
+ .success {
203
+ border-color: #44ff44;
204
+ background: #1a2a1a;
205
+ color: #66ff66;
206
+ }
207
+
208
+ .paper-list {
209
+ margin-top: 20px;
210
+ }
211
+
212
+ .paper-item {
213
+ border: 1px solid #444444;
214
+ margin: 15px 0;
215
+ padding: 20px;
216
+ background: #1a1a1a;
217
+ border-radius: 6px;
218
+ box-shadow: 0 1px 3px rgba(255,255,255,0.1);
219
+ }
220
+
221
+ .paper-title {
222
+ font-weight: 600;
223
+ margin-bottom: 12px;
224
+ color: #ffffff;
225
+ font-size: 1.1em;
226
+ }
227
+
228
+ .paper-meta {
229
+ font-size: 0.9em;
230
+ color: #aaaaaa;
231
+ margin-bottom: 12px;
232
+ }
233
+
234
+ .paper-abstract {
235
+ font-size: 0.9em;
236
+ line-height: 1.3;
237
+ margin-bottom: 10px;
238
+ }
239
+
240
+ .relevance-reason {
241
+ font-size: 0.85em;
242
+ color: #aaaaaa;
243
+ font-style: italic;
244
+ margin-top: 12px;
245
+ padding: 10px;
246
+ border-left: 3px solid #444444;
247
+ background: #2a2a2a;
248
+ border-radius: 0 4px 4px 0;
249
+ }
250
+
251
+ .loading {
252
+ text-align: center;
253
+ padding: 30px;
254
+ color: #aaaaaa;
255
+ }
256
+
257
+ .stats {
258
+ display: flex;
259
+ gap: 20px;
260
+ margin: 20px 0;
261
+ flex-wrap: wrap;
262
+ }
263
+
264
+ .stat-item {
265
+ border: 1px solid #444444;
266
+ padding: 15px;
267
+ text-align: center;
268
+ min-width: 120px;
269
+ background: #1a1a1a;
270
+ border-radius: 6px;
271
+ box-shadow: 0 1px 3px rgba(255,255,255,0.1);
272
+ }
273
+
274
+ .stat-number {
275
+ font-size: 2em;
276
+ font-weight: 600;
277
+ color: #ffffff;
278
+ }
279
+
280
+ .stat-label {
281
+ font-size: 0.8em;
282
+ text-transform: uppercase;
283
+ letter-spacing: 1px;
284
+ color: #aaaaaa;
285
+ }
286
+
287
+ .history-panel h3, .filters-panel h3 {
288
+ color: #ffffff;
289
+ margin-bottom: 15px;
290
+ font-size: 1em;
291
+ font-weight: bold;
292
+ flex-shrink: 0;
293
+ text-transform: uppercase;
294
+ letter-spacing: 2px;
295
+ border: 2px solid #ffffff;
296
+ padding: 8px;
297
+ text-align: center;
298
+ }
299
+
300
+ .history-content {
301
+ flex: 1;
302
+ overflow-y: auto;
303
+ padding-right: 5px;
304
+ }
305
+
306
+ .history-content::-webkit-scrollbar {
307
+ width: 6px;
308
+ }
309
+
310
+ .history-content::-webkit-scrollbar-track {
311
+ background: #1a1a1a;
312
+ border-radius: 3px;
313
+ }
314
+
315
+ .history-content::-webkit-scrollbar-thumb {
316
+ background: #444444;
317
+ border-radius: 3px;
318
+ }
319
+
320
+ .history-content::-webkit-scrollbar-thumb:hover {
321
+ background: #666666;
322
+ }
323
+
324
+ .history-item {
325
+ background: #000000;
326
+ border: 2px solid #ffffff;
327
+ padding: 10px;
328
+ margin-bottom: 8px;
329
+ color: #ffffff;
330
+ cursor: pointer;
331
+ transition: all 0.2s ease;
332
+ }
333
+
334
+ .history-item:hover {
335
+ background: #333333;
336
+ color: #ffffff;
337
+ }
338
+
339
+ .collection-item {
340
+ border-left: 4px solid #ffffff;
341
+ }
342
+
343
+ .filter-item {
344
+ border-left: 4px solid #666666;
345
+ margin-left: 8px;
346
+ }
347
+
348
+ .history-item .history-title {
349
+ font-weight: bold;
350
+ color: #ffffff;
351
+ margin-bottom: 5px;
352
+ font-size: 0.9em;
353
+ text-transform: uppercase;
354
+ letter-spacing: 1px;
355
+ }
356
+
357
+ .history-item .history-meta {
358
+ font-size: 0.7em;
359
+ color: #aaaaaa;
360
+ margin-bottom: 6px;
361
+ font-weight: bold;
362
+ }
363
+
364
+ .download-btn, .delete-btn {
365
+ background: #000000;
366
+ color: #ffffff;
367
+ border: 2px solid #ffffff;
368
+ padding: 4px 8px;
369
+ font-size: 9px;
370
+ margin-right: 4px;
371
+ cursor: pointer;
372
+ font-weight: bold;
373
+ text-transform: uppercase;
374
+ letter-spacing: 1px;
375
+ transition: all 0.2s ease;
376
+ }
377
+
378
+ .download-btn:hover, .delete-btn:hover {
379
+ background: #ffffff;
380
+ color: #000000;
381
+ }
382
+
383
+ .delete-btn {
384
+ background: #000000;
385
+ border-color: #ffffff;
386
+ color: #ffffff;
387
+ }
388
+
389
+ .delete-btn:hover {
390
+ background: #ffffff;
391
+ color: #000000;
392
+ }
393
+
394
+ .paper-match:hover {
395
+ background: #2a2a2a !important;
396
+ }
397
+
398
+ .progress-container {
399
+ margin: 20px 0;
400
+ border: 1px solid #444444;
401
+ padding: 15px;
402
+ background: #1a1a1a;
403
+ border-radius: 6px;
404
+ }
405
+
406
+ .progress-bar {
407
+ width: 100%;
408
+ height: 20px;
409
+ background: #2a2a2a;
410
+ border: 1px solid #444444;
411
+ position: relative;
412
+ margin: 10px 0;
413
+ border-radius: 10px;
414
+ }
415
+
416
+ .progress-fill {
417
+ height: 100%;
418
+ background: #ffffff;
419
+ width: 0%;
420
+ transition: width 0.3s ease;
421
+ border-radius: 10px;
422
+ }
423
+
424
+ .progress-text {
425
+ position: absolute;
426
+ top: 50%;
427
+ left: 50%;
428
+ transform: translate(-50%, -50%);
429
+ color: #ffffff;
430
+ font-weight: bold;
431
+ font-size: 12px;
432
+ }
433
+
434
+ .export-section {
435
+ margin: 20px 0;
436
+ text-align: center;
437
+ }
438
+
439
+ .export-btn {
440
+ background: #2a2a2a;
441
+ color: #ffffff;
442
+ border: 1px solid #444444;
443
+ padding: 15px 30px;
444
+ font-family: inherit;
445
+ font-size: 16px;
446
+ cursor: pointer;
447
+ margin: 20px 0;
448
+ border-radius: 4px;
449
+ transition: all 0.15s ease-in-out;
450
+ }
451
+
452
+ .export-btn:hover {
453
+ background: #444444;
454
+ border-color: #666666;
455
+ }
456
+
457
+ .summary-section {
458
+ margin: 20px 0;
459
+ }
460
+
461
+ .summary-table {
462
+ background: #1a1a1a;
463
+ border: 1px solid #444444;
464
+ margin: 10px 0;
465
+ overflow-x: auto;
466
+ border-radius: 6px;
467
+ }
468
+
469
+ .summary-table table {
470
+ width: 100%;
471
+ border-collapse: collapse;
472
+ font-family: inherit;
473
+ font-size: 12px;
474
+ }
475
+
476
+ .summary-table th {
477
+ background: #2a2a2a;
478
+ color: #ffffff;
479
+ padding: 8px;
480
+ text-align: left;
481
+ border: 1px solid #444444;
482
+ font-weight: bold;
483
+ }
484
+
485
+ .summary-table td {
486
+ padding: 8px;
487
+ border: 1px solid #444444;
488
+ color: #ffffff;
489
+ vertical-align: top;
490
+ }
491
+
492
+ .summary-table tr:nth-child(even) {
493
+ background: #2a2a2a;
494
+ }
495
+
496
+ .relevance-yes {
497
+ color: #ffffff;
498
+ font-weight: bold;
499
+ }
500
+
501
+ .relevance-no {
502
+ color: #aaaaaa;
503
+ font-weight: bold;
504
+ }
505
+
506
+ .relevance-unknown {
507
+ color: #cccccc;
508
+ font-weight: bold;
509
+ }
510
+
511
+
512
+
513
+ @media (max-width: 768px) {
514
+ body {
515
+ padding: 10px;
516
+ }
517
+
518
+ h1 {
519
+ font-size: 1.44em;
520
+ padding: 15px;
521
+ }
522
+
523
+ .stats {
524
+ flex-direction: column;
525
+ }
526
+ }
527
+ </style>
528
+ </head>
529
+ <body>
530
+ <div class="main-content">
531
+ <div class="container">
532
+ <h1>Collect Literature and Filter by Research Question</h1>
533
+
534
+ <!-- Step 1: Collect Papers -->
535
+ <div class="section">
536
+ <h2>Step 1: Collect Related Papers</h2>
537
+ <p>Choose how to collect papers:</p>
538
+
539
+ <div style="margin: 15px 0;">
540
+ <label style="display: block; margin-bottom: 8px; font-weight: bold;">METHOD:</label>
541
+ <div style="display: flex; gap: 15px; margin-bottom: 15px;">
542
+ <label style="display: flex; align-items: center; cursor: pointer;">
543
+ <input type="radio" name="collectMethod" value="url" checked style="margin-right: 8px;">
544
+ <span>OpenAlex URL</span>
545
+ </label>
546
+ <label style="display: flex; align-items: center; cursor: pointer;">
547
+ <input type="radio" name="collectMethod" value="title" style="margin-right: 8px;">
548
+ <span>Search by Title</span>
549
+ </label>
550
+ </div>
551
+ </div>
552
+
553
+ <div id="urlInput" style="display: block;">
554
+ <p>Enter an OpenAlex paper URL to collect all related papers (cited, citing, and related works).</p>
555
+ <input type="text" id="seedUrl" placeholder="https://api.openalex.org/works/W1607201421" value="https://api.openalex.org/works/W1607201421" />
556
+ </div>
557
+
558
+ <div id="titleInput" style="display: none;">
559
+ <p>Enter a paper title to search for and collect related papers.</p>
560
+ <input type="text" id="paperTitle" placeholder="Enter paper title..." value="just transitions" />
561
+ <button onclick="searchPapers()" id="searchBtn" style="margin-left: 10px;">Search Papers</button>
562
+ <div id="paperMatches" style="display: none; margin-top: 15px;"></div>
563
+ </div>
564
+
565
+ <button onclick="collectPapers()" id="collectBtn">Collect Papers</button>
566
+ <div id="collectStatus" class="status" style="display: none;"></div>
567
+ <div id="collectDownload" style="display: none;">
568
+ <button onclick="downloadCollectionExcel()" class="download-btn">Download Collection Excel</button>
569
+ </div>
570
+ </div>
571
+
572
+ <!-- Step 2: Filter Papers -->
573
+ <div class="section">
574
+ <h2>Step 2: Filter by Research Question</h2>
575
+ <p>Enter your research question to filter the collected papers for relevance.</p>
576
+ <textarea id="researchQuestion" rows="3" placeholder="What are the main impacts of climate change on ocean circulation patterns?">What are the key aspects of just transitions in climate policy and energy systems?</textarea>
577
+ <div style="margin: 10px 0;">
578
+ <label>Number of most recent papers to analyze:</label>
579
+ <input type="number" id="paperLimit" value="10" min="1" max="50" style="width: 80px; margin-left: 10px;">
580
+ <div style="font-size: 11px; color: #0a0; margin-top: 5px;">
581
+ Max 50 papers. For more, please provide your own GPT API key.
582
+ </div>
583
+ </div>
584
+ <button onclick="filterPapers()" id="filterBtn" disabled>Filter Papers</button>
585
+ <div id="filterStatus" class="status" style="display: none;"></div>
586
+ <div id="filterDownload" style="display: none;">
587
+ <button onclick="downloadFilterExcel()" class="download-btn">Download Filter Excel</button>
588
+ </div>
589
+ </div>
590
+
591
+ <!-- Results -->
592
+ <div class="section" id="resultsSection" style="display: none;">
593
+ <h2>Results</h2>
594
+ <div class="stats" id="stats"></div>
595
+ <div class="export-section">
596
+ <button onclick="exportToExcel()" class="export-btn">Download Excel</button>
597
+ </div>
598
+ <div class="summary-section" id="summarySection" style="display: none;">
599
+ <h3>Analysis Summary</h3>
600
+ <div class="summary-table" id="summaryTable"></div>
601
+ </div>
602
+ <div class="paper-list" id="paperList"></div>
603
+ </div>
604
+ </div>
605
+ </div>
606
+
607
+ <!-- History Panel -->
608
+ <div class="history-panel">
609
+ <h3>COLLECTIONS</h3>
610
+ <div class="history-content">
611
+ <div id="collectionsList"></div>
612
+ </div>
613
+ </div>
614
+
615
+ <!-- Merge Panel -->
616
+ <div class="merge-panel">
617
+ <h3>MERGE COLLECTIONS</h3>
618
+ <div class="merge-content" id="mergeBox" ondrop="dropCollection(event)" ondragover="allowDrop(event)">
619
+ <div class="merge-placeholder">DRAG COLLECTIONS HERE TO MERGE</div>
620
+ <div id="mergeItems"></div>
621
+ </div>
622
+ <div class="merge-actions" id="mergeActions" style="display:none;">
623
+ <button onclick="saveMergedCollection()" class="download-btn">SAVE TO COLLECTIONS</button>
624
+ <button onclick="clearMergeBox()" class="delete-btn">CLEAR</button>
625
+ </div>
626
+ </div>
627
+
628
+ <!-- Filters Panel -->
629
+ <div class="filters-panel" id="filtersPanel">
630
+ <h3>FILTERS</h3>
631
+ <div class="history-content">
632
+ <div id="filtersContainer"></div>
633
+ </div>
634
+ </div>
635
+
636
+ <script>
637
+ let collectedPapers = [];
638
+ let lastDisplayedPapers = [];
639
+
640
+ // Set default values when page loads
641
+ document.addEventListener('DOMContentLoaded', function() {
642
+ document.getElementById('seedUrl').value = 'https://api.openalex.org/works/W1607201421';
643
+ document.getElementById('researchQuestion').value = 'What are the key aspects of just transitions in climate policy and energy systems?';
644
+ loadHistory();
645
+
646
+ // Handle radio button switching
647
+ document.querySelectorAll('input[name="collectMethod"]').forEach(radio => {
648
+ radio.addEventListener('change', function() {
649
+ const urlInput = document.getElementById('urlInput');
650
+ const titleInput = document.getElementById('titleInput');
651
+
652
+ if (this.value === 'url') {
653
+ urlInput.style.display = 'block';
654
+ titleInput.style.display = 'none';
655
+ } else {
656
+ urlInput.style.display = 'none';
657
+ titleInput.style.display = 'block';
658
+ // Auto-search when switching to title method
659
+ const paperTitle = document.getElementById('paperTitle').value.trim();
660
+ if (paperTitle) {
661
+ searchPapers();
662
+ }
663
+ }
664
+ });
665
+ });
666
+ });
667
+
668
+ let currentCollectionFile = null;
669
+ let currentFilterFile = null;
670
+ let historyIndex = { collections: {}, filters: {} };
671
+ let selectedWorkId = null;
672
+
673
+ function showStatus(elementId, message, type = 'success') {
674
+ const element = document.getElementById(elementId);
675
+ element.textContent = message;
676
+ element.className = `status ${type}`;
677
+ element.style.display = 'block';
678
+ }
679
+
680
+ function hideStatus(elementId) {
681
+ document.getElementById(elementId).style.display = 'none';
682
+ }
683
+
684
+ async function searchPapers() {
685
+ const paperTitle = document.getElementById('paperTitle').value.trim();
686
+ if (!paperTitle) {
687
+ showStatus('collectStatus', 'Please enter a paper title', 'error');
688
+ return;
689
+ }
690
+
691
+ const searchBtn = document.getElementById('searchBtn');
692
+ searchBtn.disabled = true;
693
+ searchBtn.textContent = 'Searching...';
694
+
695
+ try {
696
+ const response = await fetch('/api/search-papers', {
697
+ method: 'POST',
698
+ headers: {
699
+ 'Content-Type': 'application/json',
700
+ },
701
+ body: JSON.stringify({ paper_title: paperTitle })
702
+ });
703
+
704
+ const data = await response.json();
705
+
706
+ if (data.success) {
707
+ displayPaperMatches(data.matches);
708
+ } else {
709
+ showStatus('collectStatus', data.error || 'Search failed', 'error');
710
+ }
711
+ } catch (error) {
712
+ showStatus('collectStatus', `Search error: ${error.message}`, 'error');
713
+ } finally {
714
+ searchBtn.disabled = false;
715
+ searchBtn.textContent = 'Search Papers';
716
+ }
717
+ }
718
+
719
+ function displayPaperMatches(matches) {
720
+     const matchesDiv = document.getElementById('paperMatches');
+     matchesDiv.innerHTML = `
+         <h4 style="color: #ffffff; margin-bottom: 10px; font-size: 0.9em;">SELECT PAPER:</h4>
+         ${matches.map((match, index) => `
+             <div class="paper-match" data-work-id="${match.work_id}" onclick="selectPaper('${match.work_id}', this)" style="
+                 border: 2px solid #ffffff;
+                 padding: 10px;
+                 margin-bottom: 8px;
+                 cursor: pointer;
+                 background: #000000;
+                 transition: all 0.2s ease;
+             ">
+                 <div style="font-weight: bold; color: #ffffff; margin-bottom: 5px;">${match.title}</div>
+                 <div style="font-size: 0.8em; color: #aaaaaa; margin-bottom: 3px;">Authors: ${match.authors}</div>
+                 <div style="font-size: 0.8em; color: #aaaaaa; margin-bottom: 3px;">Year: ${match.year} | Venue: ${match.venue}</div>
+                 <div style="font-size: 0.7em; color: #666666;">Relevance: ${match.relevance_score}</div>
+             </div>
+         `).join('')}
+     `;
+     matchesDiv.style.display = 'block';
+ }
+
+ function selectPaper(workId, element) {
+     // Remove previous selection
+     document.querySelectorAll('.paper-match').forEach(match => {
+         match.style.background = '#000000';
+         match.style.borderColor = '#ffffff';
+     });
+
+     // Highlight the selected paper
+     element.style.background = '#ffffff';
+     element.style.color = '#000000';
+     element.style.borderColor = '#ffffff';
+
+     selectedWorkId = workId;
+
+     // Enable the collect button
+     document.getElementById('collectBtn').disabled = false;
+ }
+
+ async function collectPapers() {
+     const method = document.querySelector('input[name="collectMethod"]:checked').value;
+     let seedUrl = '';
+     let paperTitle = '';
+
+     if (method === 'url') {
+         seedUrl = document.getElementById('seedUrl').value.trim();
+         if (!seedUrl) {
+             showStatus('collectStatus', 'Please enter a seed URL', 'error');
+             return;
+         }
+     } else {
+         paperTitle = document.getElementById('paperTitle').value.trim();
+         if (!paperTitle) {
+             showStatus('collectStatus', 'Please enter a paper title', 'error');
+             return;
+         }
+         if (!selectedWorkId) {
+             showStatus('collectStatus', 'Please search and select a paper first', 'error');
+             return;
+         }
+     }
+
+     const collectBtn = document.getElementById('collectBtn');
+     collectBtn.disabled = true;
+     collectBtn.textContent = 'Collecting...';
+     hideStatus('collectStatus');
+
+     // Show progress container
+     const progressContainer = document.createElement('div');
+     progressContainer.className = 'progress-container';
+     progressContainer.innerHTML = `
+         <div id="progressMessage">Starting paper collection...</div>
+         <div class="progress-bar">
+             <div class="progress-fill" id="collectProgress"></div>
+             <div class="progress-text" id="collectProgressText">0%</div>
+         </div>
+     `;
+     document.getElementById('collectStatus').parentNode.insertBefore(progressContainer, document.getElementById('collectStatus'));
+
+     try {
+         const response = await fetch('/api/collect-papers', {
+             method: 'POST',
+             headers: {
+                 'Content-Type': 'application/json',
+             },
+             body: JSON.stringify({
+                 seed_url: seedUrl,
+                 paper_title: paperTitle,
+                 method: method,
+                 selected_work_id: selectedWorkId,
+                 user_api_key: window.userApiKey || null
+             })
+         });
+
+         const data = await response.json();
+
+         if (data.success && data.task_id) {
+             // Start polling for progress
+             pollProgress(data.task_id, 'collect', progressContainer);
+         } else {
+             showStatus('collectStatus', `Error: ${data.error}`, 'error');
+             collectBtn.disabled = false;
+             collectBtn.textContent = 'Collect Papers';
+             if (progressContainer.parentNode) {
+                 progressContainer.parentNode.removeChild(progressContainer);
+             }
+         }
+     } catch (error) {
+         showStatus('collectStatus', `Error: ${error.message}`, 'error');
+         collectBtn.disabled = false;
+         collectBtn.textContent = 'Collect Papers';
+         if (progressContainer.parentNode) {
+             progressContainer.parentNode.removeChild(progressContainer);
+         }
+     }
+ }
+
+ // Poll the backend for background-task progress. Note: only the collect flow
+ // uses this at the moment, so the progress elements are looked up by their
+ // collect-specific IDs and the 'type' parameter is currently unused.
+ async function pollProgress(taskId, type, progressContainer) {
+     const progressFill = document.getElementById('collectProgress');
+     const progressText = document.getElementById('collectProgressText');
+     const progressMessage = document.getElementById('progressMessage');
+
+     const pollInterval = setInterval(async () => {
+         try {
+             const response = await fetch(`/api/progress/${taskId}`);
+             const progress = await response.json();
+
+             if (progress.status === 'completed') {
+                 clearInterval(pollInterval);
+
+                 // Update progress bar to 100%
+                 progressFill.style.width = '100%';
+                 progressText.textContent = '100%';
+                 progressMessage.textContent = 'Collection completed!';
+
+                 // Process results
+                 const result = progress.result;
+                 collectedPapers = result.papers;
+                 const breakdown = `${result.cited_papers} cited + ${result.citing_papers} citing + ${result.related_papers} related`;
+                 showStatus('collectStatus', `Successfully collected ${result.total_papers} papers (${breakdown})`, 'success');
+                 document.getElementById('filterBtn').disabled = false;
+                 document.getElementById('resultsSection').style.display = 'block';
+                 updateStats(result.total_papers, 0, result.cited_papers, result.citing_papers, result.related_papers);
+                 currentCollectionFile = result.db_filename || null;
+                 historyIndex.currentCollectionId = result.work_id ? (result.work_id.replace('https://api.openalex.org/works/', '').replace('https://openalex.org/', '')) : null;
+                 document.getElementById('collectDownload').style.display = currentCollectionFile ? 'block' : 'none';
+
+                 // Reset button
+                 document.getElementById('collectBtn').disabled = false;
+                 document.getElementById('collectBtn').textContent = 'Collect Papers';
+
+                 // Refresh history to show the new collection
+                 loadHistory();
+
+                 // Remove progress container after a delay
+                 setTimeout(() => {
+                     if (progressContainer.parentNode) {
+                         progressContainer.parentNode.removeChild(progressContainer);
+                     }
+                 }, 2000);
+
+             } else if (progress.status === 'error') {
+                 clearInterval(pollInterval);
+                 showStatus('collectStatus', `Error: ${progress.message}`, 'error');
+                 document.getElementById('collectBtn').disabled = false;
+                 document.getElementById('collectBtn').textContent = 'Collect Papers';
+                 if (progressContainer.parentNode) {
+                     progressContainer.parentNode.removeChild(progressContainer);
+                 }
+             } else if (progress.status === 'running') {
+                 // Update progress bar (capped at 95% until completion)
+                 const progressPercent = Math.min(progress.progress || 0, 95);
+                 progressFill.style.width = `${progressPercent}%`;
+                 progressText.textContent = `${Math.round(progressPercent)}%`;
+                 progressMessage.textContent = progress.message || 'Processing...';
+             }
+         } catch (error) {
+             // Transient network errors: log and keep polling
+             console.error('Error polling progress:', error);
+         }
+     }, 1000); // Poll every second
+ }
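+
+ // NOTE (assumption, inferred from the polling code above rather than from the
+ // backend): /api/progress/<task_id> appears to return JSON shaped roughly like
+ //   { status: 'running' | 'completed' | 'error',
+ //     progress: 0-100,        // while running
+ //     message: '...',         // human-readable step
+ //     result: { papers, total_papers, cited_papers, citing_papers,
+ //               related_papers, db_filename, work_id } }   // on completion
+ // pollProgress() tolerates missing fields (progress defaults to 0, message to
+ // 'Processing...'), so only the 'status' key is strictly required.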
+
+ async function filterPapers() {
+     const researchQuestion = document.getElementById('researchQuestion').value.trim();
+     const paperLimit = parseInt(document.getElementById('paperLimit').value) || 10;
+
+     if (!researchQuestion) {
+         showStatus('filterStatus', 'Please enter a research question', 'error');
+         return;
+     }
+
+     // Check if the user wants to analyze more than 50 papers
+     if (paperLimit > 50) {
+         const userApiKey = prompt(`You want to analyze ${paperLimit} papers, which exceeds the limit of 50.\n\nPlease provide your own OpenAI API key to continue:\n\n(Your API key will be used only for this analysis and not stored)`);
+         if (!userApiKey || userApiKey.trim() === '') {
+             showStatus('filterStatus', 'Analysis cancelled - no API key provided', 'error');
+             return;
+         }
+         // Store the user's API key temporarily for this request
+         window.userApiKey = userApiKey.trim();
+     } else {
+         // Clear any previous user API key
+         window.userApiKey = null;
+     }
+
+     const filterBtn = document.getElementById('filterBtn');
+     filterBtn.disabled = true;
+     filterBtn.textContent = 'Filtering...';
+     hideStatus('filterStatus');
+
+     // Show progress container
+     const progressContainer = document.createElement('div');
+     progressContainer.className = 'progress-container';
+     progressContainer.innerHTML = `
+         <div id="filterProgressMessage">Analyzing most recent papers for relevance...</div>
+         <div class="progress-bar">
+             <div class="progress-fill" id="filterProgress"></div>
+             <div class="progress-text" id="filterProgressText">0%</div>
+         </div>
+     `;
+     document.getElementById('filterStatus').parentNode.insertBefore(progressContainer, document.getElementById('filterStatus'));
+
+     try {
+         const response = await fetch('/api/filter-papers', {
+             method: 'POST',
+             headers: {
+                 'Content-Type': 'application/json',
+             },
+             body: JSON.stringify({
+                 research_question: researchQuestion,
+                 limit: paperLimit,
+                 source_collection: historyIndex.currentCollectionId || null,
+                 papers: collectedPapers.length > 0 ? collectedPapers : null,
+                 user_api_key: window.userApiKey || null
+             })
+         });
+
+         const data = await response.json();
+
+         if (data.success) {
+             // Simulate progress for filtering (the backend call is synchronous)
+             let progress = 0;
+             const progressInterval = setInterval(() => {
+                 progress += 10;
+                 if (progress > 90) progress = 90;
+
+                 document.getElementById('filterProgress').style.width = `${progress}%`;
+                 document.getElementById('filterProgressText').textContent = `${progress}%`;
+                 document.getElementById('filterProgressMessage').textContent = `Analyzing most recent papers for relevance... ${progress}%`;
+
+                 if (progress >= 90) {
+                     clearInterval(progressInterval);
+
+                     // Complete the progress
+                     setTimeout(() => {
+                         document.getElementById('filterProgress').style.width = '100%';
+                         document.getElementById('filterProgressText').textContent = '100%';
+                         document.getElementById('filterProgressMessage').textContent = 'Analysis completed!';
+
+                         const tested = data.tested_papers || Math.min(data.limit || 0, data.total_papers || 0);
+                         showStatus('filterStatus', `Analyzed ${tested} most recent papers; found ${data.relevant_papers} relevant`, 'success');
+                         displayPapers(data.papers);
+                         updateStats(data.total_papers, data.relevant_papers, 0, 0, 0, null, null, tested, data.oa_percentage, data.abstract_percentage);
+                         currentFilterFile = data.db_filename || null;
+                         document.getElementById('filterDownload').style.display = currentFilterFile ? 'block' : 'none';
+
+                         filterBtn.disabled = false;
+                         filterBtn.textContent = 'Filter Papers';
+
+                         // Refresh history to show the new filter
+                         loadHistory();
+
+                         // Remove progress container after a delay
+                         setTimeout(() => {
+                             if (progressContainer.parentNode) {
+                                 progressContainer.parentNode.removeChild(progressContainer);
+                             }
+                         }, 2000);
+                     }, 500);
+                 }
+             }, 200);
+         } else {
+             showStatus('filterStatus', `Error: ${data.error}`, 'error');
+             filterBtn.disabled = false;
+             filterBtn.textContent = 'Filter Papers';
+             if (progressContainer.parentNode) {
+                 progressContainer.parentNode.removeChild(progressContainer);
+             }
+         }
+     } catch (error) {
+         showStatus('filterStatus', `Error: ${error.message}`, 'error');
+         filterBtn.disabled = false;
+         filterBtn.textContent = 'Filter Papers';
+         if (progressContainer.parentNode) {
+             progressContainer.parentNode.removeChild(progressContainer);
+         }
+     }
+ }
+
+ function updateStats(total, relevant, cited = 0, citing = 0, related = 0, relevantAbs = null, totalAbs = null, tested = null, oaPercentage = null, abstractPercentage = null) {
+     const statsDiv = document.getElementById('stats');
+     const rate = tested && tested > 0 ? Math.round((relevant / tested) * 100) : 0;
+     const absRate = (totalAbs !== null && totalAbs > 0 && relevantAbs !== null)
+         ? Math.round((relevantAbs / totalAbs) * 100)
+         : 0;
+     statsDiv.innerHTML = `
+         <div class="stat-item">
+             <div class="stat-number">${total}</div>
+             <div class="stat-label">Total Papers</div>
+         </div>
+         <div class="stat-item">
+             <div class="stat-number">${tested || total}</div>
+             <div class="stat-label">Tested Papers</div>
+         </div>
+         <div class="stat-item">
+             <div class="stat-number">${relevant}</div>
+             <div class="stat-label">Relevant Papers</div>
+         </div>
+         <div class="stat-item">
+             <div class="stat-number">${rate}%</div>
+             <div class="stat-label">Rel. Rate</div>
+         </div>
+         <div class="stat-item">
+             <div class="stat-number">${absRate}%</div>
+             <div class="stat-label">Rel. Rate (abs)</div>
+         </div>
+         <div class="stat-item">
+             <div class="stat-number">${oaPercentage !== null ? oaPercentage + '%' : 'N/A'}</div>
+             <div class="stat-label">Open Access</div>
+         </div>
+         <div class="stat-item">
+             <div class="stat-number">${abstractPercentage !== null ? abstractPercentage + '%' : 'N/A'}</div>
+             <div class="stat-label">With Abstract</div>
+         </div>
+     `;
+ }
+
+ function computeAndUpdateRelevanceUsingPapers(papers) {
+     if (!Array.isArray(papers)) papers = [];
+     const total = papers.length;
+     let relevant = 0, relevantAbs = 0, totalAbs = 0;
+     for (const p of papers) {
+         const score = p && (p.relevance_score === true || p.relevance_score === 'true');
+         const hasInv = p && p.abstract_inverted_index && typeof p.abstract_inverted_index === 'object' && Object.keys(p.abstract_inverted_index).length > 0;
+         if (hasInv) totalAbs += 1;
+         if (score) {
+             relevant += 1;
+             if (hasInv) relevantAbs += 1;
+         }
+     }
+     updateStats(total, relevant, 0, 0, 0, relevantAbs, totalAbs);
+ }
+
+ function createSummaryTable(papers) {
+     const tableRows = papers.map((paper, index) => {
+         const title = paper.title || 'No title';
+         const relevanceScore = paper.relevance_score;
+         const relevanceReason = paper.relevance_reason || 'No analysis';
+         const gptAnalysis = paper.gpt_analysis || {};
+
+         // Check if the paper has an abstract
+         const hasAbstract = paper.abstract_inverted_index && Object.keys(paper.abstract_inverted_index).length > 0;
+         const aims = hasAbstract ? (gptAnalysis.aims_of_paper || 'Not analyzed') : 'N/A (abstract absent)';
+         const takeaways = hasAbstract ? (gptAnalysis.key_takeaways || 'Not analyzed') : 'N/A (abstract absent)';
+
+         let relevanceClass = 'relevance-unknown';
+         let relevanceText = 'Unknown';
+
+         if (relevanceScore === true || relevanceScore === 'true') {
+             relevanceClass = 'relevance-yes';
+             relevanceText = 'YES';
+         } else if (relevanceScore === false || relevanceScore === 'false') {
+             relevanceClass = 'relevance-no';
+             relevanceText = 'NO';
+         }
+
+         return `
+             <tr>
+                 <td>${index + 1}</td>
+                 <td title="${title}">${title.length > 60 ? title.substring(0, 60) + '...' : title}</td>
+                 <td class="${relevanceClass}">${relevanceText}</td>
+                 <td title="${relevanceReason}">${relevanceReason.length > 40 ? relevanceReason.substring(0, 40) + '...' : relevanceReason}</td>
+                 <td title="${aims}">${aims.length > 50 ? aims.substring(0, 50) + '...' : aims}</td>
+                 <td title="${takeaways}">${takeaways.length > 50 ? takeaways.substring(0, 50) + '...' : takeaways}</td>
+             </tr>
+         `;
+     }).join('');
+
+     return `
+         <table>
+             <thead>
+                 <tr>
+                     <th>#</th>
+                     <th>Paper Title</th>
+                     <th>Relevant?</th>
+                     <th>Relevance Reason</th>
+                     <th>Main Aims</th>
+                     <th>Key Takeaways</th>
+                 </tr>
+             </thead>
+             <tbody>
+                 ${tableRows}
+             </tbody>
+         </table>
+     `;
+ }
+
+ function displayPapers(papers) {
+     const paperListDiv = document.getElementById('paperList');
+     const summarySection = document.getElementById('summarySection');
+     const summaryTable = document.getElementById('summaryTable');
+
+     if (papers.length === 0) {
+         paperListDiv.innerHTML = '<div class="paper-item">No papers analyzed.</div>';
+         summarySection.style.display = 'none';
+         return;
+     }
+
+     // Show summary table
+     summarySection.style.display = 'block';
+     summaryTable.innerHTML = createSummaryTable(papers);
+     lastDisplayedPapers = papers;
+     // Update stats based on the papers themselves (overall and with abstracts)
+     computeAndUpdateRelevanceUsingPapers(papers);
+
+     paperListDiv.innerHTML = papers.map(paper => {
+         // Reconstruct the abstract from the inverted index
+         let abstract = '';
+         if (paper.abstract_inverted_index) {
+             const words = [];
+             for (const [word, positions] of Object.entries(paper.abstract_inverted_index)) {
+                 for (const pos of positions) {
+                     while (words.length <= pos) words.push('');
+                     words[pos] = word;
+                 }
+             }
+             abstract = words.join(' ').trim();
+         }
+
+         // Extract open access info
+         const oa = paper.open_access || {};
+         const isOa = oa.is_oa ? 'Yes' : 'No';
+         const oaStatus = oa.oa_status || '';
+
+         return `
+             <div class="paper-item">
+                 <div class="paper-title">${paper.title || 'No title'}</div>
+                 <div class="paper-meta">
+                     <strong>Date:</strong> ${paper.publication_date || 'Unknown'} |
+                     <strong>Type:</strong> ${paper.relationship || 'Unknown'} |
+                     <strong>Open Access:</strong> ${isOa} (${oaStatus}) |
+                     <strong>DOI:</strong> ${paper.doi ? paper.doi.replace('https://doi.org/', '') : 'N/A'}
+                 </div>
+                 <div class="paper-meta">
+                     <strong>Authors:</strong> ${paper.authors ? paper.authors.slice(0, 3).map(a => a.display_name).join(', ') : 'Unknown'}
+                 </div>
+                 <div class="paper-abstract">
+                     ${abstract ? abstract.substring(0, 300) + '...' : 'No abstract available'}
+                 </div>
+                 ${paper.relevance_reason ? `<div class="relevance-reason">${paper.relevance_reason}</div>` : ''}
+                 ${paper.gpt_analysis ? `
+                     <div class="relevance-reason">
+                         <strong>GPT Analysis:</strong><br>
+                         ${paper.gpt_analysis.aims_of_paper && paper.gpt_analysis.aims_of_paper !== 'N/A (abstract absent)' ?
+                             `<strong>Aims:</strong> ${paper.gpt_analysis.aims_of_paper}<br>` : ''}
+                         ${paper.gpt_analysis.key_takeaways && paper.gpt_analysis.key_takeaways !== 'N/A (abstract absent)' ?
+                             `<strong>Key Takeaways:</strong> ${paper.gpt_analysis.key_takeaways}` : ''}
+                     </div>
+                 ` : ''}
+             </div>
+         `;
+     }).join('');
+ }
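+
+ // NOTE (hedged sketch, not part of the original code): OpenAlex returns
+ // abstracts as an inverted index ({ word: [positions...] }) rather than plain
+ // text. displayPapers() above inlines the decoding; an equivalent standalone
+ // helper (the name invertedIndexToText is illustrative) would look like this:
+ function invertedIndexToText(invertedIndex) {
+     if (!invertedIndex || typeof invertedIndex !== 'object') return '';
+     const words = [];
+     for (const [word, positions] of Object.entries(invertedIndex)) {
+         for (const pos of positions) {
+             words[pos] = word; // sparse assignment; unfilled slots stay empty
+         }
+     }
+     // join(' ') renders gaps as empty strings; collapse the extra whitespace
+     return words.join(' ').replace(/\s+/g, ' ').trim();
+ }
+ // Example: invertedIndexToText({ Deep: [0], learning: [1], works: [2] })
+ // -> 'Deep learning works'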
+
+ async function exportToExcel() {
+     try {
+         const response = await fetch('/api/export-excel');
+         if (response.ok) {
+             const blob = await response.blob();
+             const url = window.URL.createObjectURL(blob);
+             const a = document.createElement('a');
+             a.href = url;
+             a.download = `research_papers_${new Date().toISOString().split('T')[0]}.xlsx`;
+             document.body.appendChild(a);
+             a.click();
+             window.URL.revokeObjectURL(url);
+             document.body.removeChild(a);
+         } else {
+             const error = await response.json();
+             alert(`Error exporting Excel: ${error.error}`);
+         }
+     } catch (error) {
+         alert(`Error exporting Excel: ${error.message}`);
+     }
+ }
+
+ async function downloadCollectionExcel() {
+     if (!currentCollectionFile) {
+         alert('No collection file available');
+         return;
+     }
+     try {
+         const response = await fetch(`/api/export-excel/${currentCollectionFile}`);
+         if (response.ok) {
+             const blob = await response.blob();
+             const url = window.URL.createObjectURL(blob);
+             const a = document.createElement('a');
+             a.href = url;
+             a.download = `collection_${currentCollectionFile.replace('.pkl', '')}.xlsx`;
+             document.body.appendChild(a);
+             a.click();
+             window.URL.revokeObjectURL(url);
+             document.body.removeChild(a);
+         } else {
+             const error = await response.json();
+             alert(`Error exporting Excel: ${error.error}`);
+         }
+     } catch (error) {
+         alert(`Error exporting Excel: ${error.message}`);
+     }
+ }
+
+ async function downloadFilterExcel() {
+     if (!currentFilterFile) {
+         alert('No filter file available');
+         return;
+     }
+     try {
+         const response = await fetch(`/api/export-excel/${currentFilterFile}`);
+         if (response.ok) {
+             const blob = await response.blob();
+             const url = window.URL.createObjectURL(blob);
+             const a = document.createElement('a');
+             a.href = url;
+             a.download = `filter_${currentFilterFile.replace('.pkl', '')}.xlsx`;
+             document.body.appendChild(a);
+             a.click();
+             window.URL.revokeObjectURL(url);
+             document.body.removeChild(a);
+         } else {
+             const error = await response.json();
+             alert(`Error exporting Excel: ${error.error}`);
+         }
+     } catch (error) {
+         alert(`Error exporting Excel: ${error.message}`);
+     }
+ }
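+
+ // NOTE (hedged refactor sketch, not part of the original code): exportToExcel,
+ // downloadCollectionExcel, downloadFilterExcel and downloadHistoryExcel (below)
+ // all repeat the same create-link/click/revoke dance. A shared helper (the name
+ // downloadBlob is illustrative) would remove the duplication:
+ function downloadBlob(blob, filename) {
+     const url = window.URL.createObjectURL(blob);
+     const a = document.createElement('a');
+     a.href = url;
+     a.download = filename;
+     document.body.appendChild(a);
+     a.click();
+     window.URL.revokeObjectURL(url);
+     document.body.removeChild(a);
+ }
+ // Usage inside the handlers: downloadBlob(await response.blob(), 'papers.xlsx');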
+
+ async function loadHistory() {
+     try {
+         const response = await fetch('/api/database-files');
+         const data = await response.json();
+         if (data.success) {
+             buildHistoryIndex(data.files);
+             displayHistory(data.files);
+         }
+     } catch (error) {
+         console.error('Error loading history:', error);
+     }
+ }
+
+ function buildHistoryIndex(files) {
+     historyIndex = { collections: {}, filters: {}, currentCollectionId: null };
+     files.forEach(file => {
+         if (file.type === 'collection') {
+             const id = file.work_identifier || file.filename.replace('.pkl', '');
+             historyIndex.collections[id] = file;
+         } else if (file.type === 'filter') {
+             // Group filters by their source collection
+             const sourceCollection = file.source_collection || 'unknown';
+             if (!historyIndex.filters[sourceCollection]) {
+                 historyIndex.filters[sourceCollection] = [];
+             }
+             historyIndex.filters[sourceCollection].push(file);
+         }
+     });
+ }
+
+ function displayHistory(files) {
+     const collectionsList = document.getElementById('collectionsList');
+
+     // Separate collections and filters
+     const collections = files.filter(file => file.type === 'collection');
+     const filters = files.filter(file => file.type === 'filter');
+
+     if (collections.length === 0) {
+         collectionsList.innerHTML = '<div class="history-item">No collections found</div>';
+         return;
+     }
+
+     // Display collections
+     collectionsList.innerHTML = collections.map(collection => {
+         const title = collection.title || collection.work_identifier || 'UNTITLED COLLECTION';
+         const linkedFilters = filters.filter(filter => filter.source_collection === collection.work_identifier);
+
+         return `
+             <div class="history-item collection-item" data-collection="${collection.work_identifier || ''}" onclick="selectCollection('${collection.filename}', '${collection.work_identifier || ''}', '${title}')" draggable="true" ondragstart="dragCollection(event, '${collection.filename}', '${title}', ${collection.total_papers || 0})">
+                 <div class="history-title">${title}</div>
+                 <div class="history-meta">${collection.created}</div>
+                 <div class="history-meta">${(collection.size / 1024).toFixed(1)} KB</div>
+                 <div class="history-meta">${collection.total_papers || 0} PAPER${(collection.total_papers || 0) !== 1 ? 'S' : ''}</div>
+                 <div class="history-meta">${linkedFilters.length} FILTER${linkedFilters.length !== 1 ? 'S' : ''}</div>
+                 <div style="margin-top:8px; display:grid; grid-template-columns: 1fr 1fr; grid-template-rows: 1fr 1fr; gap:6px; width:100%;">
+                     <button onclick="event.stopPropagation(); openCollection('${collection.filename}', '${collection.work_identifier || ''}')" class="download-btn" style="margin:0;">OPEN</button>
+                     <button onclick="event.stopPropagation(); downloadHistoryExcel('${collection.filename}')" class="download-btn" style="margin:0;">DOWNLOAD</button>
+                     <button onclick="event.stopPropagation(); generateBibtex(event, '${collection.filename}')" class="download-btn" style="margin:0;">BIBTEX</button>
+                     <button onclick="event.stopPropagation(); deleteHistoryFile('${collection.filename}', '${collection.type}')" class="delete-btn" style="margin:0;">DELETE</button>
+                 </div>
+             </div>
+         `;
+     }).join('');
+ }
+
+ function selectCollection(filename, workIdentifier, title) {
+     // Get the filters linked to this collection
+     const filters = historyIndex.filters[workIdentifier] || [];
+     const filtersContainer = document.getElementById('filtersContainer');
+     const filtersPanel = document.getElementById('filtersPanel');
+
+     if (filters.length === 0) {
+         filtersContainer.innerHTML = '<div class="history-item">NO FILTERS FOUND</div>';
+     } else {
+         filtersContainer.innerHTML = filters.map(filter => {
+             const filterTitle = filter.research_question || filter.filter_identifier || 'UNTITLED FILTER';
+             const papersTested = filter.tested_papers || filter.papers_tested || filter.total_papers || 'N/A';
+             return `
+                 <div class="history-item filter-item" data-filter-source="${filter.source_collection || ''}" onclick="openFilter('${filter.filename}', '${filter.source_collection || ''}')">
+                     <div class="history-title">${filterTitle}</div>
+                     <div class="history-meta">${filter.created}</div>
+                     <div class="history-meta">${(filter.size / 1024).toFixed(1)} KB</div>
+                     <div class="history-meta">${papersTested} PAPERS TESTED</div>
+                     <div style="margin-top:8px; display:flex; gap:6px;">
+                         <button onclick="event.stopPropagation(); openFilter('${filter.filename}', '${filter.source_collection || ''}')" class="download-btn">OPEN</button>
+                         <button onclick="event.stopPropagation(); downloadHistoryExcel('${filter.filename}')" class="download-btn">DOWNLOAD</button>
+                         <button onclick="event.stopPropagation(); deleteHistoryFile('${filter.filename}', '${filter.type}')" class="delete-btn">DELETE</button>
+                     </div>
+                 </div>
+             `;
+         }).join('');
+     }
+
+     // Show filters panel with animation
+     filtersPanel.classList.add('visible');
+ }
+
+ window.highlightLinked = function(el, on) {
+     try {
+         const src = el.getAttribute('data-filter-source');
+         if (src) {
+             const items = document.querySelectorAll(`[data-collection="${src}"]`);
+             items.forEach(item => item.classList.toggle('highlight', on));
+         }
+     } catch (e) {}
+ }
+
+ window.openCollection = async function(filename, workIdentifier) {
+     try {
+         const response = await fetch(`/api/load-database-file/${filename}`);
+         const data = await response.json();
+         if (data.success) {
+             const fileData = data.data || {};
+             const papers = fileData.papers || [];
+             displayPapers(papers);
+             document.getElementById('resultsSection').style.display = 'block';
+             updateStats(fileData.total_papers || papers.length || 0, 0, fileData.cited_papers || 0, fileData.citing_papers || 0, fileData.related_papers || 0);
+             currentCollectionFile = filename;
+             currentFilterFile = null;
+             historyIndex.currentCollectionId = workIdentifier || (fileData.work_identifier || '');
+             document.getElementById('collectDownload').style.display = 'block';
+             document.getElementById('filterDownload').style.display = 'none';
+             // Enable the filter button when opening a collection
+             document.getElementById('filterBtn').disabled = false;
+             // Keep the papers in memory so they can be filtered
+             collectedPapers = papers;
+         }
+     } catch (error) {
+         alert(`Error opening collection: ${error.message}`);
+     }
+ }
+
+ window.openFilter = async function(filename, sourceCollectionId) {
+     try {
+         const response = await fetch(`/api/load-database-file/${filename}`);
+         const data = await response.json();
+         if (data.success) {
+             const fileData = data.data || {};
+             const papers = fileData.papers || [];
+
+             // Populate Step 2 with the research question
+             const researchQuestion = fileData.research_question || '';
+             const paperLimit = fileData.tested_papers || fileData.limit || 10;
+
+             document.getElementById('researchQuestion').value = researchQuestion;
+             document.getElementById('paperLimit').value = paperLimit;
+
+             // Display the filtered papers
+             displayPapers(papers);
+             document.getElementById('resultsSection').style.display = 'block';
+
+             // Update stats with all saved statistics
+             const totalPapers = fileData.total_papers || 0;
+             const relevantPapers = fileData.relevant_papers || papers.length || 0;
+             const testedPapers = fileData.tested_papers || fileData.limit || 0;
+             const oaPercentage = fileData.oa_percentage || null;
+             const abstractPercentage = fileData.abstract_percentage || null;
+
+             updateStats(
+                 totalPapers,
+                 relevantPapers,
+                 0,     // cited
+                 0,     // citing
+                 0,     // related
+                 null,  // relevantAbs
+                 null,  // totalAbs
+                 testedPapers,
+                 oaPercentage,
+                 abstractPercentage
+             );
+
+             // Update state
+             currentFilterFile = filename;
+             currentCollectionFile = null;
+             historyIndex.currentCollectionId = sourceCollectionId || fileData.source_collection || null;
+
+             // Show the appropriate download buttons
+             document.getElementById('filterDownload').style.display = 'block';
+             document.getElementById('collectDownload').style.display = 'none';
+
+             // Enable the filter button since we now have a research question
+             document.getElementById('filterBtn').disabled = false;
+         }
+     } catch (error) {
+         alert(`Error opening filter: ${error.message}`);
+     }
+ }
+
+ async function loadHistoryFile(filename) {
+     try {
+         const response = await fetch(`/api/load-database-file/${filename}`);
+         const data = await response.json();
+         if (data.success) {
+             const fileData = data.data;
+             if (fileData.papers) {
+                 displayPapers(fileData.papers);
+                 document.getElementById('resultsSection').style.display = 'block';
+             }
+         }
+     } catch (error) {
+         alert(`Error loading file: ${error.message}`);
+     }
+ }
+
+ async function downloadHistoryExcel(filename) {
+     try {
+         const response = await fetch(`/api/export-excel/${filename}`);
+         if (response.ok) {
+             const blob = await response.blob();
+             const url = window.URL.createObjectURL(blob);
+             const a = document.createElement('a');
+             a.href = url;
+             a.download = filename.replace('.pkl', '.xlsx');
+             document.body.appendChild(a);
+             a.click();
+             window.URL.revokeObjectURL(url);
+             document.body.removeChild(a);
+         } else {
+             const error = await response.json();
+             alert(`Error exporting Excel: ${error.error}`);
+         }
+     } catch (error) {
+         alert(`Error exporting Excel: ${error.message}`);
+     }
+ }
+
+ // The click event is passed in explicitly so the button state can be managed
+ // without relying on the non-standard global window.event (the previous version
+ // read event.target inside the function body, which only works in some browsers,
+ // and its finally block re-read event.target and hard-coded the label instead of
+ // restoring the saved originalText).
+ async function generateBibtex(event, filename) {
+     const button = event.target;
+     const originalText = button.textContent;
+     try {
+         // Show loading state
+         button.textContent = 'GENERATING...';
+         button.disabled = true;
+
+         const response = await fetch(`/api/generate-bibtex/${filename}`, {
+             method: 'POST',
+             headers: {
+                 'Content-Type': 'application/json'
+             }
+         });
+
+         const result = await response.json();
+
+         if (result.success) {
+             // Download the generated BibTeX file
+             try {
+                 const downloadResponse = await fetch(`/api/download-database-file/${result.filename}`);
+                 if (downloadResponse.ok) {
+                     const blob = await downloadResponse.blob();
+                     const url = window.URL.createObjectURL(blob);
+                     const a = document.createElement('a');
+                     a.href = url;
+                     a.download = result.filename;
+                     document.body.appendChild(a);
+                     a.click();
+                     window.URL.revokeObjectURL(url);
+                     document.body.removeChild(a);
+
+                     alert(`BibTeX file generated and downloaded successfully with ${result.entries_count} entries!`);
+                 } else {
+                     const errorText = await downloadResponse.text();
+                     console.error('Download failed:', downloadResponse.status, errorText);
+                     alert(`BibTeX generated but download failed (${downloadResponse.status}). The file is saved in the database directory.`);
+                 }
+             } catch (downloadError) {
+                 console.error('Download error:', downloadError);
+                 alert(`BibTeX generated but download failed: ${downloadError.message}. The file is saved in the database directory.`);
+             }
+         } else {
+             alert(`Error generating BibTeX: ${result.message}`);
+         }
+     } catch (error) {
+         alert(`Error generating BibTeX: ${error.message}`);
+     } finally {
+         // Restore button state
+         button.textContent = originalText;
+         button.disabled = false;
+     }
+ }
+
+ async function deleteHistoryFile(filename, type) {
+     const confirmation = prompt(`Are you sure you want to delete this ${type}?\n\nType "delete" to confirm deletion of: ${filename}`);
+     if (confirmation !== 'delete') {
+         return;
+     }
+
+     try {
+         const response = await fetch(`/api/delete-database-file/${filename}`, {
+             method: 'DELETE'
+         });
+         const data = await response.json();
+
+         if (data.success) {
+             alert('File deleted successfully');
+             // Reload history to update the list
+             loadHistory();
+         } else {
+             alert(`Error deleting file: ${data.error}`);
+         }
+     } catch (error) {
+         alert(`Error deleting file: ${error.message}`);
+     }
+ }
+
+ // Merge functionality
+ let mergedCollections = [];
+
+ function dragCollection(event, filename, title, paperCount) {
+     event.dataTransfer.setData("text/plain", JSON.stringify({
+         filename: filename,
+         title: title,
+         paperCount: paperCount
+     }));
+ }
+
+ function allowDrop(event) {
+     event.preventDefault();
+ }
+
+ function dropCollection(event) {
+     event.preventDefault();
+     const data = JSON.parse(event.dataTransfer.getData("text/plain"));
+
+     // Ignore collections that are already in the merge box
+     if (mergedCollections.some(item => item.filename === data.filename)) {
+         return;
+     }
+
+     mergedCollections.push(data);
+     updateMergeBox();
+ }
+
+ function updateMergeBox() {
+     const mergeItems = document.getElementById('mergeItems');
+     const mergeActions = document.getElementById('mergeActions');
+     const placeholder = document.querySelector('.merge-placeholder');
+
+     if (mergedCollections.length === 0) {
+         mergeItems.innerHTML = '';
+         mergeActions.style.display = 'none';
+         placeholder.style.display = 'block';
+     } else {
+         placeholder.style.display = 'none';
+         mergeActions.style.display = 'flex';
+
+         mergeItems.innerHTML = mergedCollections.map((item, index) => `
+             <div class="merge-item">
+                 <span>${item.title} (${item.paperCount} papers)</span>
+                 <button onclick="removeFromMerge(${index})" style="background:none; border:none; color:#ffffff; cursor:pointer; font-size:12px;">×</button>
+             </div>
+         `).join('');
+     }
+ }
+
+ function removeFromMerge(index) {
+     mergedCollections.splice(index, 1);
+     updateMergeBox();
+ }
+
+ function clearMergeBox() {
+     mergedCollections = [];
+     updateMergeBox();
+ }
+
+ async function saveMergedCollection() {
+     if (mergedCollections.length < 2) {
+         alert('Please add at least 2 collections to merge');
+         return;
+     }
+
+     try {
+         const response = await fetch('/api/merge-collections', {
+             method: 'POST',
+             headers: {
+                 'Content-Type': 'application/json'
+             },
+             body: JSON.stringify({
+                 collections: mergedCollections.map(item => item.filename)
+             })
+         });
+
+         const result = await response.json();
+
+         if (result.success) {
+             alert(`Merged collection created successfully with ${result.total_papers} papers!`);
+             clearMergeBox();
+             loadHistory(); // Refresh the collections list
+         } else {
+             alert(`Error merging collections: ${result.message}`);
+         }
+     } catch (error) {
+         alert(`Error merging collections: ${error.message}`);
+     }
+ }
+
+ </script>
+ </body>
+ </html>