echoboi committed on
Commit 7f63498 · verified · 1 parent: e8f05bd

Upload folder using huggingface_hub

Files changed (19)
  1. .gitattributes +3 -0
  2. PROGRESS.md +16 -0
  3. __pycache__/app.cpython-312.pyc +2 -2
  4. app.py +216 -31
  5. database/collections/multi_seed_multi_seed_1758155685.json +3 -0
  6. database/collections/multi_seed_multi_seed_1758155809.json +3 -0
  7. database/collections/multi_seed_multi_seed_1758156063.json +3 -0
  8. database/collections/multi_seed_multi_seed_1758156344.json +0 -0
  9. database/collections/multi_seed_multi_seed_1758156664.json +0 -0
  10. database/filters/multi_seed_multi_seed_1758155809__filter__What_are_the_key_aspects_of_just_transit__20250918_122330.json +0 -0
  11. database/filters/multi_seed_multi_seed_1758155809__filter__What_are_the_key_aspects_of_just_transit__20250918_123000.json +0 -0
  12. database/filters/multi_seed_multi_seed_1758156664__filter__What_are_the_key_aspects_of_just_transit__20250918_025849.json +13 -0
  13. database/filters/multi_seed_multi_seed_1758156926__filter__What_are_the_key_aspects_of_just_transit__20250918_120128.json +13 -0
  14. database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_115925.json +13 -0
  15. database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_122404.json +0 -0
  16. database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_122747.json +0 -0
  17. database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_122824.json +0 -0
  18. requirements.txt +1 -0
  19. templates/index.html +459 -53
.gitattributes CHANGED
@@ -37,3 +37,6 @@ ai_slr/__pycache__/app.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
  ai_slr/__pycache__/app_backup.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
  __pycache__/app.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
  __pycache__/app_backup.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
+ database/collections/multi_seed_multi_seed_1758155685.json filter=lfs diff=lfs merge=lfs -text
+ database/collections/multi_seed_multi_seed_1758155809.json filter=lfs diff=lfs merge=lfs -text
+ database/collections/multi_seed_multi_seed_1758156063.json filter=lfs diff=lfs merge=lfs -text
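The `.gitattributes` hunk routes three new collection files through Git LFS. As a rough aid for auditing which paths a `.gitattributes` file sends to LFS, here is a minimal Python sketch. `lfs_tracked_patterns` is a hypothetical helper, and it assumes unquoted, whitespace-separated patterns; it ignores quoted patterns, macro attributes, and negations such as `-filter`.

```python
def lfs_tracked_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns whose attribute list includes filter=lfs (simplified)."""
    patterns = []
    for raw_line in gitattributes_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # First token is the pattern; the rest are attributes.
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


example = """\
__pycache__/app.cpython-312.pyc filter=lfs diff=lfs merge=lfs -text
database/collections/multi_seed_multi_seed_1758155685.json filter=lfs diff=lfs merge=lfs -text
*.txt text
"""
print(lfs_tracked_patterns(example))
```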
PROGRESS.md CHANGED
@@ -52,3 +52,19 @@
  **Title:** Updated multi-seed collection results to show same detailed breakdown as regular collections
  **Summary:** Enhanced progress display to show detailed seed-by-seed progress with current seed count and remaining seeds, updated completion messages to distinguish between "Collection" and "Multi-Seed Collection", added deduplication statistics display showing duplicates removed, and ensured multi-seed collections display the same comprehensive breakdown (cited + citing + related papers) as regular collections for consistency.
 
+ ## 2025-01-09 15:43:00 - Pre-GPT Filtering and Local Development Environment
+ **Title:** Added pre-filtering functionality and complete local development setup
+ **Summary:** Implemented pre-GPT filtering by publication date range and keyword search, added Step 2 for optional pre-filtering before GPT analysis, created complete local development environment with virtual environment setup, automated scripts for easy local running, environment configuration, and comprehensive documentation. Users can now filter papers by date/keywords before GPT analysis, and developers can run the app locally for testing and development.
+
+ ## 2025-01-09 15:44:00 - Collection Loading Enhancements and UI Improvements
+ **Title:** Enhanced collection loading with visual indicators and improved seed paper display
+ **Summary:** Added glowing green dot loading indicator with pulsing animation when collections are loaded, shows "Loading collection..." then "Collection loaded successfully!" status, displays seed papers from multi-seed collections in the SELECTED SEED PAPERS box when collections are opened, removed automatic "just transitions" search suggestions on page load, and improved user experience with clear visual feedback during collection loading operations.
+
+ ## 2025-01-09 15:45:00 - Detailed Seed Paper Information and Collection Stats
+ **Title:** Enhanced seed paper display with real author details and paper counts
+ **Summary:** Added get_paper_details function to fetch actual author names, publication years, and venues for seed papers, updated backend to store detailed seed information including paper counts (cited, citing, related), enhanced frontend to display real author details instead of placeholder text, added collection statistics box below seed papers showing total papers and breakdown, and improved seed paper display to show "Papers found: X (Y cited, Z citing, W related)" for each seed.
+
+ ## 2025-01-09 15:46:00 - Smart Collection Display with Seed Information
+ **Title:** Enhanced collection history display with intelligent seed paper information
+ **Summary:** Updated collection display logic to show different formats based on collection type: single seed collections show the actual paper title, multi-seed collections show "Paper1, Paper2 + X others" format, merged collections show "Merged Collection" label, added hover tooltips showing full seed details for multi-seed collections, implemented async loading of collection details to display actual seed paper titles instead of generic labels, and enhanced user experience with clear visual distinction between collection types.
+
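The 15:43 entry describes pre-GPT filtering by publication date range and keyword. The diff does not include that code, so the following is only a sketch of how such a pre-filter could work. It assumes each paper dict carries an ISO `publication_date` string (`YYYY-MM-DD`, the format OpenAlex returns) and a plain-text `title`. `prefilter_papers` and its parameters are hypothetical names, and keywords are matched case-insensitively against titles only, since the app stores abstracts as OpenAlex inverted indexes.

```python
from datetime import date


def prefilter_papers(papers, start_date=None, end_date=None, keywords=None):
    """Keep papers published within [start_date, end_date] whose title mentions any keyword.

    Hypothetical helper: assumes ISO "YYYY-MM-DD" publication_date strings.
    """
    lowered = [k.lower() for k in (keywords or [])]
    kept = []
    for paper in papers:
        try:
            published = date.fromisoformat(paper.get("publication_date") or "")
        except ValueError:
            published = None
        # When a date bound is set, papers without a parseable date are dropped.
        if (start_date or end_date) and published is None:
            continue
        if start_date and published < start_date:
            continue
        if end_date and published > end_date:
            continue
        # Case-insensitive substring match against the title only.
        title = (paper.get("title") or "").lower()
        if lowered and not any(k in title for k in lowered):
            continue
        kept.append(paper)
    return kept


papers = [
    {"title": "Just transitions in coal regions", "publication_date": "2021-05-03"},
    {"title": "Solar panel efficiency", "publication_date": "2019-01-10"},
    {"title": "A just transition for workers", "publication_date": "2015-07-01"},
]
recent = prefilter_papers(papers, start_date=date(2018, 1, 1), keywords=["just transition"])
print([p["title"] for p in recent])  # ['Just transitions in coal regions']
```

Running a cheap local filter like this before any API call keeps the number of papers sent to the model, and therefore cost and latency, proportional to what survives the filter.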
__pycache__/app.cpython-312.pyc CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8305c27cdf38c4c9dd47539d0f12448b5f5acac45fa4c2281b7410515f4a8028
- size 109414
+ oid sha256:7ec346baa18a6de961142ca6f27630bca2b861f655858529f619fe957fcee7ec
+ size 125221
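Each LFS-tracked file in this commit is stored in Git as a three-line pointer (`version`, `oid`, `size`) rather than as its content. A minimal sketch of reading such a pointer in Python, assuming the `key value` line layout shown in the diff; `parse_lfs_pointer` is a hypothetical helper, not part of `huggingface_hub` or `git-lfs`.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    # The oid has the form "<algorithm>:<hex digest>", e.g. "sha256:...".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7ec346baa18a6de961142ca6f27630bca2b861f655858529f619fe957fcee7ec
size 125221
"""
info = parse_lfs_pointer(pointer)
print(info["algorithm"], info["size"])  # sha256 125221
```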
app.py CHANGED
@@ -105,6 +105,86 @@ def get_all_pages(url, headers, upper_limit=None):
  return all_results
 
 
+ def get_paper_details(work_id):
+ """Get detailed information about a specific paper."""
+ try:
+ work_url = f"https://api.openalex.org/works/{work_id}"
+ work_response = requests.get(work_url, timeout=30)
+ work_data = work_response.json()
+
+ if not work_data or 'id' not in work_data:
+ return {}
+
+ # Extract authors
+ authors = []
+ if work_data.get('authorships'):
+ for authorship in work_data['authorships']:
+ author = authorship.get('author', {})
+ if author.get('display_name'):
+ authors.append(author['display_name'])
+
+ # Extract venue
+ venue = ''
+ if work_data.get('primary_location', {}).get('source', {}).get('display_name'):
+ venue = work_data['primary_location']['source']['display_name']
+
+ return {
+ 'authors': authors,
+ 'publication_date': work_data.get('publication_date', ''),
+ 'venue': venue,
+ 'title': work_data.get('title', ''),
+ 'id': work_data.get('id', '')
+ }
+
+ except Exception as e:
+ print(f"Error getting paper details for {work_id}: {e}")
+ return {}
+
+ def find_existing_single_seed_collection(work_id):
+ """Check if we already have a single-seed collection for this work_id."""
+ try:
+ collections_dir = os.path.join(os.getcwd(), 'collections')
+ if not os.path.exists(collections_dir):
+ return None
+
+ # Look for files that might contain this work_id
+ for filename in os.listdir(collections_dir):
+ if filename.endswith('.pkl'):
+ file_path = os.path.join(collections_dir, filename)
+ try:
+ with open(file_path, 'rb') as f:
+ data = pickle.load(f)
+
+ # Check if this is a single-seed collection for this work_id
+ if (data.get('work_identifier') == work_id and
+ data.get('collection_type') == 'single_seed'):
+ print(f"Found existing single-seed collection for {work_id}: {filename}")
+ return file_path
+ except Exception as e:
+ print(f"Error reading {filename}: {e}")
+ continue
+ except Exception as e:
+ print(f"Error searching for existing collections: {e}")
+
+ return None
+
+ def load_existing_single_seed_collection(file_path):
+ """Load an existing single-seed collection."""
+ try:
+ with open(file_path, 'rb') as f:
+ data = pickle.load(f)
+
+ # Extract papers and add relationship info
+ papers = data.get('papers', [])
+ for paper in papers:
+ if 'relationship' not in paper:
+ paper['relationship'] = 'unknown' # Default relationship
+
+ return papers
+ except Exception as e:
+ print(f"Error loading existing collection from {file_path}: {e}")
+ return []
+
  def get_related_papers(work_id, upper_limit=None, progress_callback=None):
  # Define base URL for OpenAlex API
  base_url = "https://api.openalex.org/works"
@@ -264,7 +344,27 @@ import time
 
  def analyze_paper_relevance(content: Dict[str, str], research_question: str, api_key: str) -> Optional[Dict]:
  """Analyze if a paper is relevant to the research question using GPT-5 mini."""
- client = OpenAI(api_key=api_key)
+ print(f"DEBUG: Starting analyze_paper_relevance with API key length: {len(api_key) if api_key else 0}")
+
+ try:
+ print("DEBUG: Attempting to create OpenAI client...")
+ # Try to create client with minimal parameters to avoid proxies issue
+ client = OpenAI(api_key=api_key)
+ print("DEBUG: OpenAI client created successfully")
+ except Exception as e:
+ print(f"DEBUG: Error creating OpenAI client: {e}")
+ print(f"DEBUG: Error type: {type(e)}")
+ print(f"DEBUG: Error args: {e.args}")
+ # If there's any error with client creation, try with explicit parameters
+ try:
+ print("DEBUG: Trying alternative client creation with timeout...")
+ client = OpenAI(api_key=api_key, timeout=30.0)
+ print("DEBUG: Alternative OpenAI client created successfully")
+ except Exception as e2:
+ print(f"DEBUG: Failed to create OpenAI client with alternative method: {e2}")
+ print(f"DEBUG: Alternative error type: {type(e2)}")
+ print(f"DEBUG: Alternative error args: {e2.args}")
+ return None
 
  title = content.get('title', '')
  abstract = content.get('abstract', '')
@@ -314,11 +414,11 @@ def analyze_paper_relevance(content: Dict[str, str], research_question: str, api
  # Try GPT-5 mini first, fallback to gpt-4o-mini if it fails
  try:
  print("DEBUG: Trying GPT-5 nano...")
- response = client.responses.create(
+ response = client.chat.completions.create(
  model="gpt-5-nano",
- input=prompt,
- reasoning={"effort": "minimal"},
- text={"verbosity": "low"}
+ messages=[{"role": "user", "content": prompt}],
+ max_tokens=1000,
+ temperature=0.1
  )
  print("DEBUG: GPT-5 nano response received")
  except Exception as e:
@@ -338,18 +438,9 @@ def analyze_paper_relevance(content: Dict[str, str], research_question: str, api
  print(f"DEBUG: Response attributes: {dir(response)}")
 
  if hasattr(response, 'choices') and response.choices:
- # Old format (chat completions)
+ # Standard chat completions format
  print("DEBUG: Using chat completions format")
  result = response.choices[0].message.content
- elif hasattr(response, 'output'):
- # New format (responses) - extract text from output
- print("DEBUG: Using responses format")
- result = ""
- for item in response.output:
- if hasattr(item, "content") and item.content:
- for content in item.content:
- if hasattr(content, "text") and content.text:
- result += content.text
  else:
  print("DEBUG: Unexpected response format")
  print(f"DEBUG: Response: {response}")
@@ -403,13 +494,43 @@ def extract_abstract_from_inverted_index(inverted_index: Dict) -> str:
  def analyze_single_paper(paper: Dict, research_question: str, api_key: str) -> Optional[Dict]:
  """Analyze a single paper with its own client."""
  try:
- client = OpenAI(api_key=api_key)
+ print(f"DEBUG: Starting analysis for paper: {paper.get('title', 'No title')[:50]}...")
+ print(f"DEBUG: API key length: {len(api_key) if api_key else 0}")
+
+ try:
+ print("DEBUG: Attempting to create OpenAI client in analyze_single_paper...")
+ client = OpenAI(api_key=api_key)
+ print("DEBUG: OpenAI client created successfully in analyze_single_paper")
+ except TypeError as e:
+ print(f"DEBUG: TypeError in analyze_single_paper: {e}")
+ print(f"DEBUG: Error type: {type(e)}")
+ print(f"DEBUG: Error args: {e.args}")
+ if 'proxies' in str(e):
+ print(f"DEBUG: Caught proxies error, trying alternative client creation...")
+ # Remove any proxies parameter and try again
+ try:
+ client = OpenAI(api_key=api_key)
+ print("DEBUG: Alternative client creation successful")
+ except Exception as e2:
+ print(f"DEBUG: Alternative client creation failed: {e2}")
+ raise
+ else:
+ raise
+ except Exception as e:
+ print(f"DEBUG: Unexpected error in analyze_single_paper client creation: {e}")
+ print(f"DEBUG: Error type: {type(e)}")
+ print(f"DEBUG: Error args: {e.args}")
+ raise
 
  # Extract title and abstract
  title = paper.get('title', '')
  abstract = extract_abstract_from_inverted_index(paper.get('abstract_inverted_index', {}))
 
+ print(f"DEBUG: Title: {title[:50]}...")
+ print(f"DEBUG: Abstract length: {len(abstract)}")
+
  if not title and not abstract:
+ print("DEBUG: No title or abstract, skipping paper")
  return None
 
  # Create content for analysis
@@ -419,30 +540,39 @@ def analyze_single_paper(paper: Dict, research_question: str, api_key: str) -> O
  }
 
  # Analyze with GPT
+ print(f"DEBUG: Calling analyze_paper_relevance_with_client...")
  analysis = analyze_paper_relevance_with_client(content, research_question, client)
+ print(f"DEBUG: Analysis result: {analysis}")
 
  if analysis:
  paper['gpt_analysis'] = analysis
  paper['relevance_reason'] = analysis.get('relevance_reason', 'Analysis completed')
  paper['relevance_score'] = analysis.get('relevant', False)
+ print(f"DEBUG: Paper marked as relevant: {analysis.get('relevant', False)}")
  return paper
 
+ print("DEBUG: No analysis returned, skipping paper")
  return None
 
  except Exception as e:
+ print(f"DEBUG: Exception in analyze_single_paper: {e}")
  return None
 
  def analyze_paper_batch(papers_batch: List[Dict], research_question: str, api_key: str, batch_id: int) -> List[Dict]:
  """Analyze a batch of papers in parallel using ThreadPoolExecutor."""
+ print(f"DEBUG: Starting analyze_paper_batch with {len(papers_batch)} papers, batch_id: {batch_id}")
+ print(f"DEBUG: API key length: {len(api_key) if api_key else 0}")
  results = []
 
  # Use ThreadPoolExecutor to process papers in parallel within the batch
  with concurrent.futures.ThreadPoolExecutor(max_workers=len(papers_batch)) as executor:
+ print(f"DEBUG: Created ThreadPoolExecutor with {len(papers_batch)} workers")
  # Submit all papers for parallel processing
  future_to_paper = {
  executor.submit(analyze_single_paper, paper, research_question, api_key): paper
  for paper in papers_batch
  }
+ print(f"DEBUG: Submitted {len(future_to_paper)} papers for parallel processing")
 
  # Collect results as they complete
  for future in concurrent.futures.as_completed(future_to_paper):
@@ -486,15 +616,19 @@ def analyze_paper_relevance_with_client(content: Dict[str, str], research_questi
  """
 
  try:
+ print(f"DEBUG: Making API call for paper: {title[:30]}...")
  # Try GPT-5 nano first, fallback to gpt-4o-mini if it fails
  try:
- response = client.responses.create(
+ print("DEBUG: Trying GPT-5 nano...")
+ response = client.chat.completions.create(
  model="gpt-5-nano",
- input=prompt,
- reasoning={"effort": "minimal"},
- text={"verbosity": "low"}
+ messages=[{"role": "user", "content": prompt}],
+ max_tokens=1000,
+ temperature=0.1
  )
+ print("DEBUG: GPT-5 nano response received")
  except Exception as e:
+ print(f"DEBUG: GPT-5 nano failed: {e}, trying gpt-4o-mini...")
  response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[{
@@ -503,37 +637,45 @@ def analyze_paper_relevance_with_client(content: Dict[str, str], research_questi
  }],
  max_completion_tokens=1000
  )
+ print("DEBUG: gpt-4o-mini response received")
 
  # Handle different response formats
  result = None
+ print(f"DEBUG: Response type: {type(response)}")
+ print(f"DEBUG: Response attributes: {dir(response)}")
+
  if hasattr(response, 'choices') and response.choices:
- # Old format (chat completions)
+ # Standard chat completions format
+ print("DEBUG: Using chat completions format")
  result = response.choices[0].message.content
- elif hasattr(response, 'output'):
- # New format (responses) - extract text from output
- result = ""
- for item in response.output:
- if hasattr(item, "content") and item.content:
- for content in item.content:
- if hasattr(content, "text") and content.text:
- result += content.text
+ print(f"DEBUG: Chat completions result: {result[:100]}...")
  else:
+ print("DEBUG: Unknown response format, returning None")
  return None
 
  if not result:
+ print("DEBUG: No result extracted, returning None")
  return None
 
  # Clean and parse the JSON response
  result = result.strip()
+ print(f"DEBUG: Raw result: {result[:200]}...")
+
  if result.startswith("```json"):
  result = result[7:]
  if result.endswith("```"):
  result = result[:-3]
 
+ print(f"DEBUG: Cleaned result: {result[:200]}...")
+
  # Try to parse JSON
  try:
- return json.loads(result.strip())
- except json.JSONDecodeError:
+ parsed = json.loads(result.strip())
+ print(f"DEBUG: Successfully parsed JSON: {parsed}")
+ return parsed
+ except json.JSONDecodeError as e:
+ print(f"DEBUG: JSON parsing failed: {e}")
+ print(f"DEBUG: Failed to parse: {result}")
  return None
 
  except Exception as e:
@@ -541,7 +683,13 @@ def analyze_paper_relevance_with_client(content: Dict[str, str], research_questi
 
  def filter_papers_for_research_question(papers: List[Dict], research_question: str, api_key: str, limit: int = 10) -> List[Dict]:
  """Analyze exactly 'limit' number of papers for relevance using parallel processing."""
+ print(f"DEBUG: filter_papers_for_research_question called with {len(papers) if papers else 0} papers")
+ print(f"DEBUG: Research question: {research_question}")
+ print(f"DEBUG: Limit: {limit}")
+ print(f"DEBUG: API key length: {len(api_key) if api_key else 0}")
+
  if not papers or not research_question:
+ print("DEBUG: No papers or research question provided")
  return []
 
  if not api_key:
@@ -557,13 +705,16 @@ def filter_papers_for_research_question(papers: List[Dict], research_question: s
 
  # Process all papers in parallel (no batching needed for small numbers)
  all_results = []
+ print(f"DEBUG: Processing {len(papers_to_analyze)} papers in parallel")
 
  with concurrent.futures.ThreadPoolExecutor(max_workers=min(limit, 20)) as executor:
+ print(f"DEBUG: Created ThreadPoolExecutor with max_workers={min(limit, 20)}")
  # Submit all papers for parallel processing
  future_to_paper = {
  executor.submit(analyze_single_paper, paper, research_question, api_key): paper
  for paper in papers_to_analyze
  }
+ print(f"DEBUG: Submitted {len(future_to_paper)} papers for parallel processing")
 
  # Collect results as they complete
  for future in concurrent.futures.as_completed(future_to_paper):
@@ -1525,9 +1676,15 @@ def collect_multiple_seeds_async(seed_papers, limit, task_id):
  existing_paper['contributing_seeds'].append(i + 1)
  break
 
+ # Get detailed seed paper information
+ seed_paper_details = get_paper_details(work_id)
+
  seed_results.append({
  'work_id': work_id,
  'title': seed_title,
+ 'authors': seed_paper_details.get('authors', []),
+ 'year': seed_paper_details.get('publication_date', '').split('-')[0] if seed_paper_details.get('publication_date') else '',
+ 'venue': seed_paper_details.get('venue', ''),
  'papers_found': new_papers,
  'total_papers_from_seed': len(papers),
  'cited_papers': cited_count,
@@ -1588,11 +1745,23 @@ def collect_multiple_seeds_async(seed_papers, limit, task_id):
  with open(temp_path, 'w', encoding='utf-8') as f:
  json.dump(all_papers, f, indent=2, ensure_ascii=False)
 
+ # Create display title based on seed count
+ if total_seeds == 1:
+ display_title = seed_results[0]['title'] if seed_results else "Single Seed Collection"
+ elif total_seeds == 2:
+ display_title = f"{seed_results[0]['title']} & {seed_results[1]['title']}"
+ else:
+ # Show first 2 + count for multiple seeds
+ first_two = f"{seed_results[0]['title']}, {seed_results[1]['title']}"
+ remaining = total_seeds - 2
+ display_title = f"{first_two} + {remaining} others"
+
  # Create combined collection data
  combined_title = f"Multi-Seed Collection ({total_seeds} seeds)"
  collection_data = {
  'work_id': f"multi_seed_{task_id}",
  'title': combined_title,
+ 'display_title': display_title, # Add display title for immediate use
  'total_papers': len(all_papers),
  'cited_papers': final_cited_count,
  'citing_papers': final_citing_count,
@@ -1601,6 +1770,7 @@ def collect_multiple_seeds_async(seed_papers, limit, task_id):
  'papers': all_papers,
  'seed_results': seed_results,
  'total_seeds': total_seeds,
+ 'collection_type': 'multiseed',
  'deduplication_stats': {
  'total_papers_before_dedup': total_papers_before_dedup,
  'duplicates_removed': duplicates_removed,
@@ -1765,6 +1935,7 @@ def collect_papers():
  def filter_papers():
  """Filter papers based on research question."""
  try:
+ print("DEBUG: Starting filter_papers endpoint")
  data = request.get_json()
  research_question = data.get('research_question', '').strip()
  limit = data.get('limit', 10) # Default to 10 most recent relevant papers
@@ -1772,6 +1943,11 @@ def filter_papers():
  papers_data = data.get('papers') # Papers passed directly from frontend
  user_api_key = data.get('user_api_key') # User's own API key for large analyses
 
+ print(f"DEBUG: Research question: {research_question}")
+ print(f"DEBUG: Limit: {limit}")
+ print(f"DEBUG: User API key provided: {bool(user_api_key)}")
+ print(f"DEBUG: Papers data provided: {bool(papers_data)}")
+
  if not research_question:
  return jsonify({'error': 'Research question is required'}), 400
 
@@ -1793,6 +1969,8 @@ def filter_papers():
 
  # Use user's API key if provided, otherwise use default
  api_key_to_use = user_api_key if user_api_key else OPENAI_API_KEY
+ print(f"DEBUG: Using API key length: {len(api_key_to_use) if api_key_to_use else 0}")
+ print(f"DEBUG: API key source: {'user provided' if user_api_key else 'default'}")
 
  if not api_key_to_use:
  return jsonify({
@@ -1800,8 +1978,15 @@ def filter_papers():
  }), 400
 
  # Filter papers using custom analyzer (returns top N most recent relevant papers)
+ print(f"DEBUG: About to call filter_papers_for_research_question with {len(papers)} papers")
+ print(f"DEBUG: Research question: {research_question}")
+ print(f"DEBUG: Limit: {limit}")
+ print(f"DEBUG: API key length: {len(api_key_to_use) if api_key_to_use else 0}")
+
  relevant_papers = filter_papers_for_research_question(papers, research_question, api_key_to_use, limit)
 
+ print(f"DEBUG: filter_papers_for_research_question returned {len(relevant_papers) if relevant_papers else 0} papers")
+
  # Determine source collection id for linkage
  source_collection_id = None
  if provided_source_collection:
database/collections/multi_seed_multi_seed_1758155685.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e55986181ab7bec02034998c2da445afc5bd113632cf23b6dcd4a8511e1482b9
3
+ size 47616821
database/collections/multi_seed_multi_seed_1758155809.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5c532804aeb3b99f06b7cd3650330780905bcc46c4c1406881c1738a56b079ea
3
+ size 57878366
database/collections/multi_seed_multi_seed_1758156063.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7b2a24846b5af508f38a1937c914161729f4f6184cea41f836d804d9219c54a6
3
+ size 12273368
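The three collection JSON files above are committed as Git LFS pointers. The `version https://git-lfs.github.com/spec/v1` line marks each file as a pointer, `oid` is the SHA-256 of the real content, and `size` is that content's length in bytes (47616821 bytes, roughly 47.6 MB, for the first file). This is why the diff shows only three lines per file rather than the collected papers. The minimal sketch below parses one pointer into its fields. `parseLfsPointer` is a hypothetical helper name, and the sketch assumes the well-formed `key value` layout shown above.

```javascript
// Hypothetical helper: parse a Git LFS pointer file into its fields.
// Assumes the "key value" line layout shown in the pointers above.
function parseLfsPointer(text) {
  const fields = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue; // skip blank lines
    const spaceAt = trimmed.indexOf(' ');
    if (spaceAt === -1) continue; // not a key/value line
    fields[trimmed.slice(0, spaceAt)] = trimmed.slice(spaceAt + 1);
  }
  // "oid" is "<algorithm>:<hex digest>"; "size" is the real file size in bytes.
  const [algorithm, digest] = (fields.oid || '').split(':');
  return {
    version: fields.version,
    algorithm,
    digest,
    sizeBytes: Number(fields.size),
  };
}

// Pointer text copied from multi_seed_multi_seed_1758156063.json above.
const pointer = [
  'version https://git-lfs.github.com/spec/v1',
  'oid sha256:7b2a24846b5af508f38a1937c914161729f4f6184cea41f836d804d9219c54a6',
  'size 12273368',
].join('\n');

const info = parseLfsPointer(pointer);
console.log(info.algorithm, (info.sizeBytes / 1e6).toFixed(1) + ' MB'); // sha256 12.3 MB
```

To get the actual collection JSON rather than the pointer, the repository has to be checked out with Git LFS installed.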
database/collections/multi_seed_multi_seed_1758156344.json ADDED
The diff for this file is too large to render.
 
database/collections/multi_seed_multi_seed_1758156664.json ADDED
The diff for this file is too large to render.
 
database/filters/multi_seed_multi_seed_1758155809__filter__What_are_the_key_aspects_of_just_transit__20250918_122330.json ADDED
The diff for this file is too large to render.
 
database/filters/multi_seed_multi_seed_1758155809__filter__What_are_the_key_aspects_of_just_transit__20250918_123000.json ADDED
The diff for this file is too large to render.
 
database/filters/multi_seed_multi_seed_1758156664__filter__What_are_the_key_aspects_of_just_transit__20250918_025849.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "research_question": "What are the key aspects of just transitions in climate policy and energy systems?",
3
+ "total_papers": 68,
4
+ "tested_papers": 10,
5
+ "relevant_papers": 0,
6
+ "oa_percentage": 43,
7
+ "abstract_percentage": 62,
8
+ "limit": 10,
9
+ "papers": [],
10
+ "source_collection": "multi_seed_multi_seed_1758156664",
11
+ "filter_identifier": "multi_seed_multi_seed_1758156664__filter__What_are_the_key_aspects_of_just_transit__20250918_025849",
12
+ "created": "2025-09-18T02:58:49.151898"
13
+ }
database/filters/multi_seed_multi_seed_1758156926__filter__What_are_the_key_aspects_of_just_transit__20250918_120128.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "research_question": "What are the key aspects of just transitions in climate policy and energy systems?",
3
+ "total_papers": 14,
4
+ "tested_papers": 10,
5
+ "relevant_papers": 0,
6
+ "oa_percentage": 43,
7
+ "abstract_percentage": 71,
8
+ "limit": 10,
9
+ "papers": [],
10
+ "source_collection": "multi_seed_multi_seed_1758156926",
11
+ "filter_identifier": "multi_seed_multi_seed_1758156926__filter__What_are_the_key_aspects_of_just_transit__20250918_120128",
12
+ "created": "2025-09-18T12:01:28.210811"
13
+ }
database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_115925.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "research_question": "What are the key aspects of just transitions in climate policy and energy systems?",
3
+ "total_papers": 346,
4
+ "tested_papers": 10,
5
+ "relevant_papers": 0,
6
+ "oa_percentage": 58,
7
+ "abstract_percentage": 64,
8
+ "limit": 10,
9
+ "papers": [],
10
+ "source_collection": "multi_seed_multi_seed_1758157170",
11
+ "filter_identifier": "multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_115925",
12
+ "created": "2025-09-18T11:59:25.053828"
13
+ }
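Each saved filter record stores raw counts (`tested_papers`, `relevant_papers`) rather than a rate. The "Rel. Rate" stat in the page is computed by `updateStats` in `templates/index.html` as `relevant / tested`, rounded to a whole percent, with 0 when nothing was tested. The minimal sketch below applies that same formula to the three records above; `relevanceRate` is a hypothetical helper name, and only the counts shown in the records are used.

```javascript
// Same formula as the rate in updateStats: relevant / tested as a whole percent,
// falling back to 0 when no papers were tested.
function relevanceRate(record) {
  const tested = record.tested_papers;
  const relevant = record.relevant_papers;
  return tested && tested > 0 ? Math.round((relevant / tested) * 100) : 0;
}

// Counts copied from the three filter records above.
const records = [
  { source_collection: 'multi_seed_multi_seed_1758156664', total_papers: 68, tested_papers: 10, relevant_papers: 0 },
  { source_collection: 'multi_seed_multi_seed_1758156926', total_papers: 14, tested_papers: 10, relevant_papers: 0 },
  { source_collection: 'multi_seed_multi_seed_1758157170', total_papers: 346, tested_papers: 10, relevant_papers: 0 },
];

for (const r of records) {
  console.log(`${r.source_collection}: ${relevanceRate(r)}% of ${r.tested_papers} tested (of ${r.total_papers} total)`);
}

// Hypothetical counts, only to show the rounding: 3 of 8 is 37.5%, shown as 38%.
console.log(relevanceRate({ tested_papers: 8, relevant_papers: 3 })); // 38
```

All three records tested 10 papers and found none relevant, so each reports 0%, regardless of how large the source collection was.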
database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_122404.json ADDED
The diff for this file is too large to render.
 
database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_122747.json ADDED
The diff for this file is too large to render.
 
database/filters/multi_seed_multi_seed_1758157170__filter__What_are_the_key_aspects_of_just_transit__20250918_122824.json ADDED
The diff for this file is too large to render.
 
requirements.txt CHANGED
@@ -3,6 +3,7 @@ Flask-CORS==4.0.1
3
  gunicorn==21.2.0
4
  requests==2.32.3
5
  openai==1.54.4
6
  pandas==2.2.3
7
  tqdm==4.66.5
8
  openpyxl==3.1.5
 
3
  gunicorn==21.2.0
4
  requests==2.32.3
5
  openai==1.54.4
6
+ httpx==0.27.2
7
  pandas==2.2.3
8
  tqdm==4.66.5
9
  openpyxl==3.1.5
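The `applyPreFilter` function this commit adds to `templates/index.html` keeps a paper only if it passes both tests: the year range (a paper with no usable year is dropped whenever either bound is set) and the keyword test (any one of the comma-separated keywords appearing, case-insensitively, in the title, abstract, authors or venue). The minimal sketch below pulls that predicate out of the DOM code so it can be exercised on its own; `paperYear` and `matchesPreFilter` are hypothetical names. It assumes `publication_date` is an ISO `YYYY-MM-DD` string and reads the year with `getUTCFullYear()`, because date-only ISO strings are parsed as UTC midnight and `getFullYear()` can report the previous year in time zones west of UTC. It also accepts `authors` as either an array or a string, since the seed display code in the same file handles both shapes.

```javascript
// Hypothetical standalone version of the pre-filter test in applyPreFilter.
// Returns the publication year, or null when none can be determined.
function paperYear(paper) {
  if (paper.publication_date) {
    // Date-only ISO strings parse as UTC midnight, so read the UTC year.
    const year = new Date(paper.publication_date).getUTCFullYear();
    return Number.isNaN(year) ? null : year;
  }
  return paper.year ? parseInt(paper.year, 10) : null;
}

function matchesPreFilter(paper, { startYear = null, endYear = null, keywords = '' } = {}) {
  // Year range: papers without a usable year fail whenever a bound is set.
  if (startYear !== null || endYear !== null) {
    const year = paperYear(paper);
    if (!year) return false;
    if (startYear !== null && year < startYear) return false;
    if (endYear !== null && year > endYear) return false;
  }

  // Keywords: comma-separated, case-insensitive, any single match is enough.
  const keywordList = keywords.split(',').map(k => k.trim().toLowerCase()).filter(k => k.length > 0);
  if (keywordList.length === 0) return true;

  const authors = Array.isArray(paper.authors) ? paper.authors.join(' ') : (paper.authors || '');
  const searchText = [paper.title || '', paper.abstract || '', authors, paper.venue || '']
    .join(' ')
    .toLowerCase();
  return keywordList.some(keyword => searchText.includes(keyword));
}

// Illustrative papers (not taken from any collection in this commit).
const papers = [
  { title: 'Just transitions in coal regions', publication_date: '2020-01-01', authors: ['A. Author'] },
  { title: 'Ocean circulation patterns', year: '2015', authors: 'B. Author' },
  { title: 'Energy justice and policy', authors: [] },
];

const kept = papers.filter(p => matchesPreFilter(p, { startYear: 2018, keywords: 'just transition, energy' }));
console.log(kept.map(p => p.title)); // [ 'Just transitions in coal regions' ]
```

The second paper fails the year bound, and the third is dropped because it has no year at all, even though its title matches "energy". Separating the predicate from the input-reading code is only for illustration and testing; the committed version inlines the same checks inside `collectedPapers.filter(...)`.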
templates/index.html CHANGED
@@ -24,12 +24,12 @@
24
 
25
  .main-content {
26
  flex: 1;
27
- max-width: 50%;
28
  margin-right: 15px;
29
  }
30
 
31
  .history-panel {
32
- width: 275px;
33
  border: 3px solid #ffffff;
34
  padding: 15px;
35
  background: #000000;
@@ -40,7 +40,7 @@
40
  }
41
 
42
  .merge-panel {
43
- width: 275px;
44
  border: 3px solid #ffffff;
45
  padding: 15px;
46
  background: #000000;
@@ -90,7 +90,7 @@
90
  }
91
 
92
  .filters-panel {
93
- width: 275px;
94
  border: 3px solid #ffffff;
95
  padding: 15px;
96
  background: #000000;
@@ -514,7 +514,7 @@
514
 
515
  <div id="titleInput">
516
  <p>Enter a paper title to search for and collect related papers.</p>
517
- <input type="text" id="paperTitle" placeholder="Enter paper title..." value="just transitions" />
518
  <button onclick="searchPapers()" id="searchBtn" style="margin-left: 10px;">Search Papers</button>
519
  <div id="paperMatches" style="display: none; margin-top: 15px;"></div>
520
 
@@ -527,10 +527,18 @@
527
  </h4>
528
  <div id="selectedSeeds" style="min-height: 60px; border: 1px dashed #666666; padding: 10px; background: #000000;">
529
  <div style="color: #666666; text-align: center; font-size: 0.8em;">No seed papers selected. Search and click papers to add them.</div>
530
- </div>
531
  <div style="margin-top: 10px;">
532
  <button onclick="clearAllSeeds()" style="background: #333333; color: #ffffff; border: 1px solid #666666; padding: 5px 10px; font-size: 10px; margin-right: 10px;">Clear All</button>
533
  <span style="font-size: 0.8em; color: #aaaaaa;">Click papers above to add/remove them from collection</span>
534
  </div>
535
  </div>
536
  </div>
@@ -542,9 +550,36 @@
542
  </div>
543
  </div>
544
 
545
- <!-- Step 2: Filter Papers -->
546
  <div class="section">
547
- <h2>Step 2: Filter by Research Question</h2>
548
  <p>Enter your research question to filter the collected papers for relevance.</p>
549
  <textarea id="researchQuestion" rows="3" placeholder="What are the main impacts of climate change on ocean circulation patterns?">What are the key aspects of just transitions in climate policy and energy systems?</textarea>
550
  <div style="margin: 10px 0;">
@@ -607,17 +642,15 @@
607
  let collectedPapers = [];
608
  let lastDisplayedPapers = [];
609
  let selectedSeeds = []; // Array to store multiple selected seed papers
610
 
611
  // Set default values when page loads
612
  document.addEventListener('DOMContentLoaded', function() {
613
  document.getElementById('researchQuestion').value = 'What are the key aspects of just transitions in climate policy and energy systems?';
614
  loadHistory();
615
 
616
- // Auto-search on page load
617
- const paperTitle = document.getElementById('paperTitle').value.trim();
618
- if (paperTitle) {
619
- searchPapers();
620
- }
621
  });
622
 
623
  let currentCollectionFile = null;
@@ -674,20 +707,20 @@
674
  function displayPaperMatches(matches) {
675
  const matchesDiv = document.getElementById('paperMatches');
676
  matchesDiv.innerHTML = `
677
- <h4 style="color: #ffffff; margin-bottom: 10px; font-size: 0.9em;">SELECT PAPER:</h4>
678
  ${matches.map((match, index) => `
679
  <div class="paper-match" data-work-id="${match.work_id}" onclick="selectPaper('${match.work_id}', this)" style="
680
  border: 2px solid #ffffff;
681
- padding: 10px;
682
- margin-bottom: 8px;
683
  cursor: pointer;
684
  background: #000000;
685
  transition: all 0.2s ease;
686
  ">
687
- <div style="font-weight: bold; color: #ffffff; margin-bottom: 5px;">${match.title}</div>
688
- <div style="font-size: 0.8em; color: #aaaaaa; margin-bottom: 3px;">Authors: ${match.authors}</div>
689
- <div style="font-size: 0.8em; color: #aaaaaa; margin-bottom: 3px;">Year: ${match.year} | Venue: ${match.venue}</div>
690
- <div style="font-size: 0.7em; color: #666666;">Relevance: ${match.relevance_score}</div>
691
  </div>
692
  `).join('')}
693
  `;
@@ -718,9 +751,10 @@
718
  venue: element.querySelectorAll('div')[2].textContent.split(' | ')[1].replace('Venue: ', '')
719
  };
720
  selectedSeeds.push(paperData);
721
- element.style.background = '#ffffff';
722
- element.style.color = '#000000';
723
  element.style.borderColor = '#ffffff';
724
  }
725
 
726
  updateSeedCollectionDisplay();
@@ -736,15 +770,31 @@
736
  if (selectedSeeds.length === 0) {
737
  selectedSeedsDiv.innerHTML = '<div style="color: #666666; text-align: center; font-size: 0.8em;">No seed papers selected. Search and click papers to add them.</div>';
738
  } else {
739
- selectedSeedsDiv.innerHTML = selectedSeeds.map((seed, index) => `
740
- <div style="background: #333333; border: 1px solid #666666; padding: 8px; margin: 5px 0; display: flex; justify-content: space-between; align-items: center;">
741
- <div style="flex: 1;">
742
- <div style="font-weight: bold; color: #ffffff; font-size: 0.9em;">${seed.title}</div>
743
- <div style="font-size: 0.7em; color: #aaaaaa;">${seed.authors} | ${seed.year} | ${seed.venue}</div>
744
  </div>
745
- <button onclick="removeSeed(${index})" style="background: #ff4444; color: #ffffff; border: none; padding: 4px 8px; font-size: 10px; cursor: pointer; margin-left: 10px;">×</button>
746
- </div>
747
- `).join('');
748
  }
749
  }
750
 
@@ -757,8 +807,17 @@
757
  document.querySelectorAll('.paper-match').forEach(match => {
758
  const workId = match.getAttribute('data-work-id');
759
  const isSelected = selectedSeeds.some(seed => seed.work_id === workId);
760
- match.style.background = isSelected ? '#ffffff' : '#000000';
761
- match.style.color = isSelected ? '#000000' : '#ffffff';
762
  });
763
  }
764
 
@@ -783,16 +842,146 @@
783
  alert('SEED PAPERS INFO:\n\n• Search for papers by title\n• Click papers to add them to your seed collection\n• Each selected paper will be used to find related papers (cited, citing, and related works)\n• You can select multiple seed papers for a comprehensive collection\n• Click the × button to remove papers from your selection');
784
  }
785
 
786
- async function collectPapers() {
787
- const paperTitle = document.getElementById('paperTitle').value.trim();
788
 
789
- if (!paperTitle) {
790
- showStatus('collectStatus', 'Please enter a paper title', 'error');
791
  return;
792
  }
793
  if (selectedSeeds.length === 0) {
794
  showStatus('collectStatus', 'Please search and select at least one paper first', 'error');
795
- return;
796
  }
797
 
798
  const collectBtn = document.getElementById('collectBtn');
@@ -871,7 +1060,7 @@
871
 
872
  // Show appropriate completion message
873
  if (result && result.total_seeds && result.total_seeds > 1) {
874
- progressMessage.textContent = `Multi-Seed Collection completed! (${result.total_seeds} seeds)`;
875
  } else {
876
  progressMessage.textContent = 'Collection completed!';
877
  }
@@ -886,7 +1075,7 @@
886
  let breakdown = `${result.cited_papers} cited + ${result.citing_papers} citing + ${result.related_papers} related`;
887
 
888
  if (result.total_seeds && result.total_seeds > 1) {
889
- collectionType = 'Multi-Seed Collection';
890
  const dedupStats = result.deduplication_stats;
891
  if (dedupStats) {
892
  breakdown += ` (${dedupStats.duplicates_removed} duplicates removed)`;
@@ -895,12 +1084,26 @@
895
 
896
  showStatus('collectStatus', `Successfully completed ${collectionType} - ${result.total_papers} papers (${breakdown})`, 'success');
897
  document.getElementById('filterBtn').disabled = false;
898
  document.getElementById('resultsSection').style.display = 'block';
899
  updateStats(result.total_papers, 0, result.cited_papers, result.citing_papers, result.related_papers);
900
  currentCollectionFile = result.db_filename || null;
901
  historyIndex.currentCollectionId = result.work_id ? (result.work_id.replace('https://api.openalex.org/works/','').replace('https://openalex.org/','')) : null;
902
  document.getElementById('collectDownload').style.display = currentCollectionFile ? 'block' : 'none';
903

904
  // Reset button
905
  document.getElementById('collectBtn').disabled = false;
906
  document.getElementById('collectBtn').textContent = 'Collect Papers';
@@ -929,14 +1132,14 @@
929
  progressFill.style.width = `${progressPercent}%`;
930
  progressText.textContent = `${Math.round(progressPercent)}%`;
931
 
932
- // Show detailed progress for multi-seed collections
933
  if (progress.total_seeds && progress.total_seeds > 1) {
934
  const currentSeed = progress.current_seed || 0;
935
  const remainingSeeds = progress.remaining_seeds || 0;
936
  const totalSeeds = progress.total_seeds || 0;
937
  progressMessage.textContent = `${progress.message || 'Processing...'} (Seed ${currentSeed}/${totalSeeds}, ${remainingSeeds} remaining)`;
938
  } else {
939
- progressMessage.textContent = progress.message || 'Processing...';
940
  }
941
  }
942
  } catch (error) {
@@ -954,6 +1157,14 @@
954
  return;
955
  }
956

957
  // Check if user wants to analyze more than 50 papers
958
  if (paperLimit > 50) {
959
  const userApiKey = prompt(`You want to analyze ${paperLimit} papers, which exceeds the limit of 50.\n\nPlease provide your own OpenAI API key to continue:\n\n(Your API key will be used only for this analysis and not stored)`);
@@ -995,7 +1206,7 @@
995
  research_question: researchQuestion,
996
  limit: paperLimit,
997
  source_collection: historyIndex.currentCollectionId || null,
998
- papers: collectedPapers.length > 0 ? collectedPapers : null,
999
  user_api_key: window.userApiKey || null
1000
  })
1001
  });
@@ -1003,6 +1214,10 @@
1003
  const data = await response.json();
1004
 
1005
  if (data.success) {
1006
  // Simulate progress for filtering (since it's synchronous in backend)
1007
  let progress = 0;
1008
  const progressInterval = setInterval(() => {
@@ -1065,29 +1280,33 @@
1065
  function updateStats(total, relevant, cited = 0, citing = 0, related = 0, relevantAbs = null, totalAbs = null, tested = null, oaPercentage = null, abstractPercentage = null) {
1066
  const statsDiv = document.getElementById('stats');
1067
  const rate = tested && tested > 0 ? Math.round((relevant / tested) * 100) : 0;
1068
  statsDiv.innerHTML = `
1069
  <div class="stat-item">
1070
  <div class="stat-number">${total}</div>
1071
  <div class="stat-label">Total Papers</div>
1072
  </div>
1073
  <div class="stat-item">
1074
- <div class="stat-number">${tested || total}</div>
1075
  <div class="stat-label">Tested Papers</div>
1076
  </div>
1077
  <div class="stat-item">
1078
- <div class="stat-number">${relevant}</div>
1079
  <div class="stat-label">Relevant Papers</div>
1080
  </div>
1081
  <div class="stat-item">
1082
- <div class="stat-number">${rate}%</div>
1083
  <div class="stat-label">Rel. Rate</div>
1084
  </div>
1085
  <div class="stat-item">
1086
- <div class="stat-number">${oaPercentage !== null ? oaPercentage + '%' : 'N/A'}</div>
1087
  <div class="stat-label">Open Access</div>
1088
  </div>
1089
  <div class="stat-item">
1090
- <div class="stat-number">${abstractPercentage !== null ? abstractPercentage + '%' : 'N/A'}</div>
1091
  <div class="stat-label">With Abstract</div>
1092
  </div>
1093
  `;
@@ -1294,6 +1513,8 @@
1294
  if (data.success) {
1295
  buildHistoryIndex(data.files);
1296
  displayHistory(data.files);
1297
  }
1298
  } catch (error) {
1299
  console.error('Error loading history:', error);
@@ -1333,12 +1554,43 @@
1333
 
1334
  // Display collections
1335
  collectionsList.innerHTML = collections.map(collection => {
1336
- const title = collection.title || collection.work_identifier || 'UNTITLED COLLECTION';
1337
  const linkedFilters = filters.filter(filter => filter.source_collection === collection.work_identifier);
1338
 
1339
  return `
1340
- <div class="history-item collection-item" data-collection="${collection.work_identifier || ''}" onclick="selectCollection('${collection.filename}', '${collection.work_identifier || ''}', '${title}')" draggable="true" ondragstart="dragCollection(event, '${collection.filename}', '${title}', ${collection.total_papers || 0})">
1341
- <div class="history-title">${title}</div>
1342
  <div class="history-meta">${collection.created}</div>
1343
  <div class="history-meta">${(collection.size / 1024).toFixed(1)} KB</div>
1344
  <div class="history-meta">${collection.total_papers || 0} PAPER${(collection.total_papers || 0) !== 1 ? 'S' : ''}</div>
@@ -1354,6 +1606,70 @@
1354
  }).join('');
1355
  }
1356

1357
  function selectCollection(filename, workIdentifier, title) {
1358
  // Get filters for this collection
1359
  const filters = historyIndex.filters[workIdentifier] || [];
@@ -1397,24 +1713,114 @@
1397
  }
1398
 
1399
  window.openCollection = async function(filename, workIdentifier) {
1400
  try {
1401
  const response = await fetch(`/api/load-database-file/${filename}`);
1402
  const data = await response.json();
1403
  if (data.success) {
1404
  const fileData = data.data || {};
1405
  const papers = fileData.papers || [];
1406
- displayPapers(papers);
1407
- document.getElementById('resultsSection').style.display = 'block';
1408
- updateStats(fileData.total_papers || papers.length || 0, 0, fileData.cited_papers || 0, fileData.citing_papers || 0, fileData.related_papers || 0);
1409
  currentCollectionFile = filename; currentFilterFile = null; historyIndex.currentCollectionId = workIdentifier || (fileData.work_identifier || '');
1410
  document.getElementById('collectDownload').style.display = 'block';
1411
  document.getElementById('filterDownload').style.display = 'none';
1412
  // Enable filter button when opening a collection
1413
  document.getElementById('filterBtn').disabled = false;
1414
  // Save papers to temp file for filtering
1415
  collectedPapers = papers;
1416
  }
1417
  } catch (error) {
1418
  alert(`Error opening collection: ${error.message}`);
1419
  }
1420
  }
 
24
 
25
  .main-content {
26
  flex: 1;
27
+ max-width: 60%;
28
  margin-right: 15px;
29
  }
30
 
31
  .history-panel {
32
+ width: 220px;
33
  border: 3px solid #ffffff;
34
  padding: 15px;
35
  background: #000000;
 
40
  }
41
 
42
  .merge-panel {
43
+ width: 220px;
44
  border: 3px solid #ffffff;
45
  padding: 15px;
46
  background: #000000;
 
90
  }
91
 
92
  .filters-panel {
93
+ width: 220px;
94
  border: 3px solid #ffffff;
95
  padding: 15px;
96
  background: #000000;
 
514
 
515
  <div id="titleInput">
516
  <p>Enter a paper title to search for and collect related papers.</p>
517
+ <input type="text" id="paperTitle" placeholder="Enter paper title..." value="" />
518
  <button onclick="searchPapers()" id="searchBtn" style="margin-left: 10px;">Search Papers</button>
519
  <div id="paperMatches" style="display: none; margin-top: 15px;"></div>
520
 
 
527
  </h4>
528
  <div id="selectedSeeds" style="min-height: 60px; border: 1px dashed #666666; padding: 10px; background: #000000;">
529
  <div style="color: #666666; text-align: center; font-size: 0.8em;">No seed papers selected. Search and click papers to add them.</div>
530
+ </div>
531
  <div style="margin-top: 10px;">
532
  <button onclick="clearAllSeeds()" style="background: #333333; color: #ffffff; border: 1px solid #666666; padding: 5px 10px; font-size: 10px; margin-right: 10px;">Clear All</button>
533
  <span style="font-size: 0.8em; color: #aaaaaa;">Click papers above to add/remove them from collection</span>
534
+ </div>
535
+ </div>
536
+
537
+ <!-- Collection Stats Box -->
538
+ <div id="collectionStatsBox" style="display: none; margin-top: 15px; border: 1px solid #666666; padding: 12px; background: #2a2a2a; border-radius: 5px;">
539
+ <h5 style="color: #ffffff; margin: 0 0 8px 0; font-size: 0.85em; font-weight: bold;">COLLECTION STATISTICS</h5>
540
+ <div id="collectionStatsContent" style="display: flex; gap: 15px; flex-wrap: wrap; font-size: 0.8em; color: #aaaaaa;">
541
+ <!-- Stats will be populated here -->
542
  </div>
543
  </div>
544
  </div>
 
550
  </div>
551
  </div>
552
 
553
+ <!-- Step 2: Pre-Filter Papers -->
554
  <div class="section">
555
+ <h2>Step 2: Pre-Filter Papers (Optional)</h2>
556
+ <p>Optionally filter papers by publication date and keywords before GPT analysis. Leave blank to analyze all papers.</p>
557
+
558
+ <div style="display: flex; gap: 20px; margin: 15px 0; flex-wrap: wrap;">
559
+ <div style="flex: 1; min-width: 200px;">
560
+ <label style="display: block; margin-bottom: 5px; font-weight: bold;">Publication Date Range:</label>
561
+ <div style="display: flex; gap: 10px; align-items: center;">
562
+ <input type="number" id="startYear" placeholder="Start Year" min="1900" max="2025" style="width: 80px; padding: 5px;">
563
+ <span>to</span>
564
+ <input type="number" id="endYear" placeholder="End Year" min="1900" max="2025" style="width: 80px; padding: 5px;">
565
+ </div>
566
+ </div>
567
+ <div style="flex: 1; min-width: 200px;">
568
+ <label style="display: block; margin-bottom: 5px; font-weight: bold;">Keyword Search:</label>
569
+ <input type="text" id="keywordFilter" placeholder="Enter keywords (comma-separated)" style="width: 100%; padding: 5px;">
570
+ </div>
571
+ </div>
572
+
573
+ <div style="margin: 10px 0;">
574
+ <button onclick="applyPreFilter()" id="preFilterBtn" disabled style="background: #333333; color: #ffffff; border: 1px solid #666666; padding: 8px 15px; margin-right: 10px;">Apply Pre-Filter</button>
575
+ <button onclick="clearPreFilter()" id="clearPreFilterBtn" disabled style="background: #666666; color: #ffffff; border: 1px solid #888888; padding: 8px 15px;">Clear Filter</button>
576
+ <span id="preFilterStatus" style="margin-left: 15px; font-size: 0.9em; color: #aaaaaa;"></span>
577
+ </div>
578
+ </div>
579
+
580
+ <!-- Step 3: Filter Papers -->
581
+ <div class="section">
582
+ <h2>Step 3: Filter by Research Question</h2>
583
  <p>Enter your research question to filter the collected papers for relevance.</p>
584
  <textarea id="researchQuestion" rows="3" placeholder="What are the main impacts of climate change on ocean circulation patterns?">What are the key aspects of just transitions in climate policy and energy systems?</textarea>
585
  <div style="margin: 10px 0;">
 
642
  let collectedPapers = [];
643
  let lastDisplayedPapers = [];
644
  let selectedSeeds = []; // Array to store multiple selected seed papers
645
+ let preFilteredPapers = []; // Array to store pre-filtered papers
646
+ let isPreFiltered = false; // Flag to track if pre-filtering is active
647
 
648
  // Set default values when page loads
649
  document.addEventListener('DOMContentLoaded', function() {
650
  document.getElementById('researchQuestion').value = 'What are the key aspects of just transitions in climate policy and energy systems?';
651
  loadHistory();
652
 
653
+ // Don't auto-search on page load - let user search manually
654
  });
655
 
656
  let currentCollectionFile = null;
 
707
  function displayPaperMatches(matches) {
708
  const matchesDiv = document.getElementById('paperMatches');
709
  matchesDiv.innerHTML = `
710
+ <h4 style="color: #ffffff; margin-bottom: 7px; font-size: 0.63em;">SELECT PAPER:</h4>
711
  ${matches.map((match, index) => `
712
  <div class="paper-match" data-work-id="${match.work_id}" onclick="selectPaper('${match.work_id}', this)" style="
713
  border: 2px solid #ffffff;
714
+ padding: 7px;
715
+ margin-bottom: 6px;
716
  cursor: pointer;
717
  background: #000000;
718
  transition: all 0.2s ease;
719
  ">
720
+ <div style="font-weight: bold; color: #ffffff; margin-bottom: 3px; font-size: 0.7em;">${match.title}</div>
721
+ <div style="font-size: 0.56em; color: #aaaaaa; margin-bottom: 2px;">Authors: ${match.authors}</div>
722
+ <div style="font-size: 0.56em; color: #aaaaaa; margin-bottom: 2px;">Year: ${match.year} | Venue: ${match.venue}</div>
723
+ <div style="font-size: 0.49em; color: #666666;">Relevance: ${match.relevance_score}</div>
724
  </div>
725
  `).join('')}
726
  `;
 
751
  venue: element.querySelectorAll('div')[2].textContent.split(' | ')[1].replace('Venue: ', '')
752
  };
753
  selectedSeeds.push(paperData);
754
+ element.style.background = '#444444';
755
+ element.style.color = '#ffffff';
756
  element.style.borderColor = '#ffffff';
757
+ element.style.boxShadow = '0 0 10px rgba(255, 255, 255, 0.3)';
758
  }
759
 
760
  updateSeedCollectionDisplay();
 
770
  if (selectedSeeds.length === 0) {
771
  selectedSeedsDiv.innerHTML = '<div style="color: #666666; text-align: center; font-size: 0.8em;">No seed papers selected. Search and click papers to add them.</div>';
772
  } else {
773
+ selectedSeedsDiv.innerHTML = selectedSeeds.map((seed, index) => {
774
+ // Format authors
775
+ const authorsText = Array.isArray(seed.authors) ? seed.authors.join(', ') : (seed.authors || 'Unknown authors');
776
+ const yearText = seed.year || 'Unknown year';
777
+ const venueText = seed.venue || 'Unknown venue';
778
+
779
+ // Show paper counts if available
780
+ let countsText = '';
781
+ if (seed.papers_found !== undefined) {
782
+ countsText = ` | Papers found: ${seed.papers_found}`;
783
+ if (seed.cited_papers !== undefined) {
784
+ countsText += ` (${seed.cited_papers} cited, ${seed.citing_papers} citing, ${seed.related_papers} related)`;
785
+ }
786
+ }
787
+
788
+ return `
789
+ <div style="background: #444444; border: 1px solid #888888; padding: 8px; margin: 5px 0; display: flex; justify-content: space-between; align-items: center; box-shadow: 0 0 8px rgba(255, 255, 255, 0.2);">
790
+ <div style="flex: 1;">
791
+ <div style="font-weight: bold; color: #ffffff; font-size: 0.9em;">${seed.title}</div>
792
+ <div style="font-size: 0.7em; color: #cccccc;">${authorsText} | ${yearText} | ${venueText}${countsText}</div>
793
+ </div>
794
+ <button onclick="removeSeed(${index})" style="background: #ff4444; color: #ffffff; border: none; padding: 4px 8px; font-size: 10px; cursor: pointer; margin-left: 10px;">×</button>
795
  </div>
796
+ `;
797
+ }).join('');
798
  }
799
  }
800
 
 
807
  document.querySelectorAll('.paper-match').forEach(match => {
808
  const workId = match.getAttribute('data-work-id');
809
  const isSelected = selectedSeeds.some(seed => seed.work_id === workId);
810
+ if (isSelected) {
811
+ match.style.background = '#444444';
812
+ match.style.color = '#ffffff';
813
+ match.style.borderColor = '#ffffff';
814
+ match.style.boxShadow = '0 0 10px rgba(255, 255, 255, 0.3)';
815
+ } else {
816
+ match.style.background = '#000000';
817
+ match.style.color = '#ffffff';
818
+ match.style.borderColor = '#ffffff';
819
+ match.style.boxShadow = 'none';
820
+ }
821
  });
822
  }
823
 
 
842
  alert('SEED PAPERS INFO:\n\n• Search for papers by title\n• Click papers to add them to your seed collection\n• Each selected paper will be used to find related papers (cited, citing, and related works)\n• You can select multiple seed papers for a comprehensive collection\n• Click the × button to remove papers from your selection');
843
  }
844
 
845
+ function applyPreFilter() {
846
+ console.log('applyPreFilter called. collectedPapers length:', collectedPapers.length);
847
+ if (collectedPapers.length === 0) {
848
+ alert('Please collect papers first before applying pre-filters.');
849
+ return;
850
+ }
851
 
852
+ const startYear = document.getElementById('startYear').value;
853
+ const endYear = document.getElementById('endYear').value;
854
+ const keywords = document.getElementById('keywordFilter').value.trim();
855
+
856
+ // Check if any filter is applied
857
+ if (!startYear && !endYear && !keywords) {
858
+ alert('Please enter at least one filter criterion (year range or keywords).');
859
  return;
860
  }
861
+
862
+ // Validate year range
863
+ if (startYear && endYear && parseInt(startYear) > parseInt(endYear)) {
864
+ alert('Start year cannot be greater than end year.');
865
+ return;
866
+ }
867
+
868
+ preFilteredPapers = collectedPapers.filter(paper => {
869
+ let matchesDate = true;
870
+ let matchesKeywords = true;
871
+
872
+ // Check date filter
873
+ if (startYear || endYear) {
874
+ const paperYear = paper.publication_date ? new Date(paper.publication_date).getUTCFullYear() :
875
+ (paper.year ? parseInt(paper.year) : null);
876
+
877
+ if (paperYear) {
878
+ if (startYear && paperYear < parseInt(startYear)) matchesDate = false;
879
+ if (endYear && paperYear > parseInt(endYear)) matchesDate = false;
880
+ } else {
881
+ matchesDate = false; // Exclude papers without publication date
882
+ }
883
+ }
884
+
885
+ // Check keyword filter
886
+ if (keywords) {
887
+ const keywordList = keywords.split(',').map(k => k.trim().toLowerCase()).filter(k => k.length > 0);
888
+ if (keywordList.length > 0) {
889
+ const searchText = [
890
+ paper.title || '',
891
+ paper.abstract || '',
892
+ (Array.isArray(paper.authors) ? paper.authors.join(' ') : (paper.authors || '')),
893
+ (paper.venue || '')
894
+ ].join(' ').toLowerCase();
895
+
896
+ matchesKeywords = keywordList.some(keyword => searchText.includes(keyword));
897
+ }
898
+ }
899
+
900
+ return matchesDate && matchesKeywords;
901
+ });
902
+
903
+ isPreFiltered = true;
904
+ document.getElementById('preFilterBtn').disabled = true;
905
+ document.getElementById('clearPreFilterBtn').disabled = false;
906
+ document.getElementById('filterBtn').disabled = false;
907
+
908
+ const filterStatus = document.getElementById('preFilterStatus');
909
+ filterStatus.textContent = `Pre-filtered: ${preFilteredPapers.length} papers (from ${collectedPapers.length} total)`;
910
+ filterStatus.style.color = '#0a0';
911
+
912
+ // Update the paper limit to not exceed filtered papers
913
+ const paperLimit = document.getElementById('paperLimit');
914
+ const maxLimit = Math.min(50, preFilteredPapers.length);
915
+ if (parseInt(paperLimit.value) > maxLimit) {
916
+ paperLimit.value = maxLimit;
917
+ }
918
+ paperLimit.max = maxLimit;
919
+ }
+
+    function clearPreFilter() {
+        preFilteredPapers = [];
+        isPreFiltered = false;
+
+        document.getElementById('startYear').value = '';
+        document.getElementById('endYear').value = '';
+        document.getElementById('keywordFilter').value = '';
+        document.getElementById('preFilterBtn').disabled = false;
+        document.getElementById('clearPreFilterBtn').disabled = true;
+        document.getElementById('filterBtn').disabled = false;
+
+        const filterStatus = document.getElementById('preFilterStatus');
+        filterStatus.textContent = '';
+
+        // Reset paper limit
+        const paperLimit = document.getElementById('paperLimit');
+        paperLimit.max = 50;
+    }
+
+    function showCollectionStats(fileData) {
+        const statsBox = document.getElementById('collectionStatsBox');
+        const statsContent = document.getElementById('collectionStatsContent');
+
+        const totalPapers = fileData.total_papers || 0;
+        const citedPapers = fileData.cited_papers || 0;
+        const citingPapers = fileData.citing_papers || 0;
+        const relatedPapers = fileData.related_papers || 0;
+        const totalSeeds = fileData.total_seeds || 0;
+
+        let statsHTML = `
+            <div><strong>Total Papers:</strong> ${totalPapers}</div>
+            <div><strong>Cited:</strong> ${citedPapers}</div>
+            <div><strong>Citing:</strong> ${citingPapers}</div>
+            <div><strong>Related:</strong> ${relatedPapers}</div>
+        `;
+
+        if (totalSeeds > 0) {
+            statsHTML += `<div><strong>Seeds:</strong> ${totalSeeds}</div>`;
+        }
+
+        if (fileData.deduplication_stats) {
+            const dedupStats = fileData.deduplication_stats;
+            statsHTML += `<div><strong>Duplicates Removed:</strong> ${dedupStats.duplicates_removed || 0}</div>`;
+        }
+
+        statsContent.innerHTML = statsHTML;
+        statsBox.style.display = 'block';
+    }
+
+    function hideCollectionStats() {
+        const statsBox = document.getElementById('collectionStatsBox');
+        statsBox.style.display = 'none';
+    }
+
+    async function collectPapers() {
+        const paperTitle = document.getElementById('paperTitle').value.trim();
+
+        if (!paperTitle) {
+            showStatus('collectStatus', 'Please enter a paper title', 'error');
+            return;
+        }
         if (selectedSeeds.length === 0) {
             showStatus('collectStatus', 'Please search and select at least one paper first', 'error');
+            return;
         }
 
         const collectBtn = document.getElementById('collectBtn');
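`showCollectionStats` in the hunk above always renders the four count rows, adds a Seeds row only when `total_seeds` is positive, and adds a Duplicates Removed row only when `deduplication_stats` is present. Separating that decision from the DOM makes it testable. The helper names `collectionStatsRows` and `collectionStatsHTML` are hypothetical; the field names are the ones the hunk reads from the collection file, and the values are assumed to be numbers.

```javascript
// Hypothetical helper: decides which rows showCollectionStats renders.
// Field names match those read in the diff; missing values default to 0.
function collectionStatsRows(fileData) {
  const rows = [
    ['Total Papers', fileData.total_papers || 0],
    ['Cited', fileData.cited_papers || 0],
    ['Citing', fileData.citing_papers || 0],
    ['Related', fileData.related_papers || 0]
  ];
  const totalSeeds = fileData.total_seeds || 0;
  if (totalSeeds > 0) {
    rows.push(['Seeds', totalSeeds]);
  }
  if (fileData.deduplication_stats) {
    rows.push(['Duplicates Removed', fileData.deduplication_stats.duplicates_removed || 0]);
  }
  return rows;
}

// Rendering stays a thin layer over the rows. Values are assumed numeric,
// so they are interpolated without HTML escaping.
function collectionStatsHTML(fileData) {
  return collectionStatsRows(fileData)
    .map(([label, value]) => `<div><strong>${label}:</strong> ${value}</div>`)
    .join('');
}
```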
 
 
         // Show appropriate completion message
         if (result && result.total_seeds && result.total_seeds > 1) {
+            progressMessage.textContent = `Multi-Seed completed! (${result.total_seeds} seeds)`;
         } else {
             progressMessage.textContent = 'Collection completed!';
         }
 
         let breakdown = `${result.cited_papers} cited + ${result.citing_papers} citing + ${result.related_papers} related`;
 
         if (result.total_seeds && result.total_seeds > 1) {
+            collectionType = 'Multi-Seed ';
             const dedupStats = result.deduplication_stats;
             if (dedupStats) {
                 breakdown += ` (${dedupStats.duplicates_removed} duplicates removed)`;
 
 
         showStatus('collectStatus', `Successfully completed ${collectionType} - ${result.total_papers} papers (${breakdown})`, 'success');
         document.getElementById('filterBtn').disabled = false;
+        document.getElementById('preFilterBtn').disabled = false;
         document.getElementById('resultsSection').style.display = 'block';
         updateStats(result.total_papers, 0, result.cited_papers, result.citing_papers, result.related_papers);
         currentCollectionFile = result.db_filename || null;
         historyIndex.currentCollectionId = result.work_id ? (result.work_id.replace('https://api.openalex.org/works/','').replace('https://openalex.org/','')) : null;
         document.getElementById('collectDownload').style.display = currentCollectionFile ? 'block' : 'none';
 
+        // Clear any existing pre-filter status when a new collection completes
+        const preFilterStatus = document.getElementById('preFilterStatus');
+        preFilterStatus.textContent = '';
+        preFilterStatus.style.color = '#aaaaaa';
+
+        // Reset pre-filter state
+        preFilteredPapers = [];
+        isPreFiltered = false;
+        document.getElementById('clearPreFilterBtn').disabled = true;
+
+        // Debug: log the collectedPapers length
+        console.log('Collection completed. collectedPapers length:', collectedPapers.length);
+
         // Reset button
         document.getElementById('collectBtn').disabled = false;
         document.getElementById('collectBtn').textContent = 'Collect Papers';
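The success message above is assembled from `collectionType`, the total, and a cited/citing/related breakdown, with a duplicates note for multi-seed runs. Because the multi-seed label is set to `'Multi-Seed '` with a trailing space, the template produces a double space before the dash. A sketch that builds the same message from the `result` fields; the labels `'collection'` and `'Multi-Seed collection'` are assumptions, since the default value of `collectionType` is set outside the visible hunks, and `completionSummary` is a hypothetical name.

```javascript
// Hypothetical helper: builds the collection success message from the
// result fields used in the diff. Both labels are assumed, not from the source.
function completionSummary(result) {
  const isMultiSeed = Boolean(result.total_seeds && result.total_seeds > 1);
  const collectionType = isMultiSeed ? 'Multi-Seed collection' : 'collection';

  let breakdown = `${result.cited_papers} cited + ${result.citing_papers} citing + ${result.related_papers} related`;
  if (isMultiSeed && result.deduplication_stats) {
    breakdown += ` (${result.deduplication_stats.duplicates_removed} duplicates removed)`;
  }
  return `Successfully completed ${collectionType} - ${result.total_papers} papers (${breakdown})`;
}
```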
 
                 progressFill.style.width = `${progressPercent}%`;
                 progressText.textContent = `${Math.round(progressPercent)}%`;
 
+                // Show detailed progress for Multi-Seed collections
                 if (progress.total_seeds && progress.total_seeds > 1) {
                     const currentSeed = progress.current_seed || 0;
                     const remainingSeeds = progress.remaining_seeds || 0;
                     const totalSeeds = progress.total_seeds || 0;
                     progressMessage.textContent = `${progress.message || 'Processing...'} (Seed ${currentSeed}/${totalSeeds}, ${remainingSeeds} remaining)`;
                 } else {
+                    progressMessage.textContent = progress.message || 'Processing...';
                 }
             }
         } catch (error) {
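The progress handler above appends a seed counter only when more than one seed is being processed, and falls back to `'Processing...'` when the server sends no message. A sketch of that formatting as a pure function; `formatProgress` is a hypothetical name, and `progressPercent` is passed in because it is computed outside the visible hunk.

```javascript
// Hypothetical helper mirroring the progress-update branch in the diff.
// Only the progress fields read there are used.
function formatProgress(progress, progressPercent) {
  const percentText = `${Math.round(progressPercent)}%`;
  const base = progress.message || 'Processing...';
  if (progress.total_seeds && progress.total_seeds > 1) {
    const currentSeed = progress.current_seed || 0;
    const remainingSeeds = progress.remaining_seeds || 0;
    return {
      percentText,
      message: `${base} (Seed ${currentSeed}/${progress.total_seeds}, ${remainingSeeds} remaining)`
    };
  }
  return { percentText, message: base };
}
```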
 
            return;
        }
 
+        // Use pre-filtered papers if available, otherwise use all collected papers
+        const papersToAnalyze = isPreFiltered ? preFilteredPapers : collectedPapers;
+
+        if (papersToAnalyze.length === 0) {
+            showStatus('filterStatus', 'No papers match the pre-filter criteria', 'error');
+            return;
+        }
+
         // Check if user wants to analyze more than 50 papers
         if (paperLimit > 50) {
             const userApiKey = prompt(`You want to analyze ${paperLimit} papers, which exceeds the limit of 50.\n\nPlease provide your own OpenAI API key to continue:\n\n(Your API key will be used only for this analysis and not stored)`);
 
                     research_question: researchQuestion,
                     limit: paperLimit,
                     source_collection: historyIndex.currentCollectionId || null,
+                    papers: papersToAnalyze.length > 0 ? papersToAnalyze : null,
                     user_api_key: window.userApiKey || null
                 })
             });
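The filter request sends the pre-filtered subset when a pre-filter is active and the full collection otherwise, and returns early with an error when that list is empty. Because of that early return, the later `papersToAnalyze.length > 0 ? papersToAnalyze : null` always takes its first branch. A sketch of the payload construction under those rules; `buildFilterPayload` is a hypothetical name, the field names are the ones in the request body above, and the endpoint URL is outside the visible hunk, so it is not assumed here.

```javascript
// Hypothetical helper: picks the papers to send and builds the JSON body
// for the filter request. Both paper lists are expected to be arrays.
function buildFilterPayload({ researchQuestion, paperLimit, collectionId, isPreFiltered, preFilteredPapers, collectedPapers, userApiKey }) {
  const papersToAnalyze = isPreFiltered ? preFilteredPapers : collectedPapers;
  if (papersToAnalyze.length === 0) {
    // Mirrors the early return in the diff: nothing to send.
    return null;
  }
  return {
    research_question: researchQuestion,
    limit: paperLimit,
    source_collection: collectionId || null,
    papers: papersToAnalyze, // always non-empty at this point
    user_api_key: userApiKey || null
  };
}
```

The empty-list error in the diff reads "No papers match the pre-filter criteria" even when no pre-filter is active and the collection itself is empty; a caller of a helper like this could choose the message based on `isPreFiltered`.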
 
            const data = await response.json();
 
            if (data.success) {
+                // Hide collection stats box and show results section when filtering
+                hideCollectionStats();
+                document.getElementById('resultsSection').style.display = 'block';
+
                // Simulate progress for filtering (since it's synchronous in backend)
                let progress = 0;
                const progressInterval = setInterval(() => {
 
     function updateStats(total, relevant, cited = 0, citing = 0, related = 0, relevantAbs = null, totalAbs = null, tested = null, oaPercentage = null, abstractPercentage = null) {
         const statsDiv = document.getElementById('stats');
         const rate = tested && tested > 0 ? Math.round((relevant / tested) * 100) : 0;
+
+        // Show N/A for unfiltered collections (no relevance filter has run, so tested is null, undefined, or 0)
+        const showNA = relevant === 0 && !tested;
+
         statsDiv.innerHTML = `
             <div class="stat-item">
                 <div class="stat-number">${total}</div>
                 <div class="stat-label">Total Papers</div>
             </div>
             <div class="stat-item">
+                <div class="stat-number">${showNA ? 'N/A' : (tested || total)}</div>
                 <div class="stat-label">Tested Papers</div>
             </div>
             <div class="stat-item">
+                <div class="stat-number">${showNA ? 'N/A' : relevant}</div>
                 <div class="stat-label">Relevant Papers</div>
             </div>
             <div class="stat-item">
+                <div class="stat-number">${showNA ? 'N/A' : rate + '%'}</div>
                 <div class="stat-label">Rel. Rate</div>
             </div>
             <div class="stat-item">
+                <div class="stat-number">${showNA ? 'N/A' : (oaPercentage !== null ? oaPercentage + '%' : 'N/A')}</div>
                 <div class="stat-label">Open Access</div>
             </div>
             <div class="stat-item">
+                <div class="stat-number">${showNA ? 'N/A' : (abstractPercentage !== null ? abstractPercentage + '%' : 'N/A')}</div>
                 <div class="stat-label">With Abstract</div>
             </div>
         `;
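`updateStats` is called right after a collection completes with only the count arguments, so `tested` keeps its default of `null` there. The comment on `showNA` says unfiltered collections should show N/A, which means `null` has to count as untested, not only `0`. A sketch of the six cell values under that reading; `statCells` is a hypothetical name, and it returns plain strings so the rendering can stay a template.

```javascript
// Hypothetical helper: computes the six stat-cell strings rendered by updateStats.
// Assumption: zero relevant papers together with a tested count of
// null, undefined, or 0 means no relevance filter has run yet.
function statCells({ total, relevant = 0, tested = null, oaPercentage = null, abstractPercentage = null }) {
  const unfiltered = relevant === 0 && !tested;
  const rate = tested && tested > 0 ? Math.round((relevant / tested) * 100) : 0;
  const pct = value => (value !== null && value !== undefined ? `${value}%` : 'N/A');
  return {
    'Total Papers': String(total),
    'Tested Papers': unfiltered ? 'N/A' : String(tested || total),
    'Relevant Papers': unfiltered ? 'N/A' : String(relevant),
    'Rel. Rate': unfiltered ? 'N/A' : `${rate}%`,
    'Open Access': unfiltered ? 'N/A' : pct(oaPercentage),
    'With Abstract': unfiltered ? 'N/A' : pct(abstractPercentage)
  };
}
```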
 
            if (data.success) {
                buildHistoryIndex(data.files);
                displayHistory(data.files);
+                // Load detailed seed information for Multi-Seed collections
+                setTimeout(() => updateCollectionDisplayWithSeeds(), 100);
            }
        } catch (error) {
            console.error('Error loading history:', error);
 
 
         // Display collections
         collectionsList.innerHTML = collections.map(collection => {
+            // Determine collection type and display format
+            let displayTitle = '';
+            let tooltipText = '';
+
+            if (collection.type === 'merged') {
+                displayTitle = 'Merged Collection';
+                tooltipText = 'Multiple collections merged together';
+            } else if (collection.type === 'multiseed') {
+                // Always use display_title if available, otherwise create from seed results
+                if (collection.display_title) {
+                    displayTitle = collection.display_title;
+                } else if (collection.seed_results && collection.seed_results.length > 0) {
+                    // Create display title from seed results
+                    const seedTitles = collection.seed_results.map(seed => seed.title);
+                    if (seedTitles.length === 1) {
+                        displayTitle = seedTitles[0];
+                    } else if (seedTitles.length === 2) {
+                        displayTitle = `${seedTitles[0]} & ${seedTitles[1]}`;
+                    } else {
+                        displayTitle = `${seedTitles[0]}, ${seedTitles[1]} + ${seedTitles.length - 2} others`;
+                    }
+                } else {
+                    // Fallback: try to get title from collection title or work_identifier
+                    displayTitle = collection.title || collection.work_identifier || 'Collection';
+                }
+                tooltipText = 'Collection from multiple seed papers';
+            } else {
+                // Single seed - show the paper title
+                displayTitle = collection.title || collection.work_identifier || 'UNTITLED COLLECTION';
+                tooltipText = 'Single seed paper collection';
+            }
+
             const linkedFilters = filters.filter(filter => filter.source_collection === collection.work_identifier);
 
             return `
+                <div class="history-item collection-item" data-collection="${collection.work_identifier || ''}" onclick="selectCollection('${collection.filename}', '${collection.work_identifier || ''}', '${displayTitle}')" draggable="true" ondragstart="dragCollection(event, '${collection.filename}', '${displayTitle}', ${collection.total_papers || 0})" title="${tooltipText}">
+                    <div class="history-title">${displayTitle}</div>
                     <div class="history-meta">${collection.created}</div>
                     <div class="history-meta">${(collection.size / 1024).toFixed(1)} KB</div>
                     <div class="history-meta">${collection.total_papers || 0} PAPER${(collection.total_papers || 0) !== 1 ? 'S' : ''}</div>
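The collection item interpolates `displayTitle`, which is usually a paper title, into a single-quoted JavaScript string inside a double-quoted `onclick` attribute, and into element text. A title containing an apostrophe (a possessive such as "Workers'") ends the JS string early and breaks the click handler, and `<` or `&` can corrupt the markup. One way to make the interpolation safe is to escape in two layers, JS string first and HTML second, because the browser decodes the attribute's character references before parsing it as script. The helper names below are hypothetical, and `selectCollection('file.json', 'W123', ...)` in the usage lines uses placeholder arguments. Attaching handlers with `addEventListener` and reading `data-*` attributes avoids the problem altogether.

```javascript
// Hypothetical helpers for interpolating untrusted text into the
// collection-item template from the diff.

// Escape text for HTML element content or a quoted attribute value.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Escape text for a single-quoted JS string literal that sits inside a
// double-quoted HTML attribute, e.g. onclick="f('...')".
// JS-escape first, then HTML-escape: the browser decodes the attribute's
// character references before the script is parsed.
function jsStringInAttribute(text) {
  const jsEscaped = String(text)
    .replace(/\\/g, '\\\\')
    .replace(/'/g, "\\'")
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
  return escapeHtml(jsEscaped);
}

// Usage with placeholder arguments.
const title = `Workers' "voice" & <transition>`;
const onclick = `selectCollection('file.json', 'W123', '${jsStringInAttribute(title)}')`;
const itemHtml = `<div class="history-item" onclick="${onclick}"><div class="history-title">${escapeHtml(title)}</div></div>`;
```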
 
         }).join('');
     }
 
+    // Load a collection file and summarize its seed papers as a display title and tooltip
+    async function loadCollectionSeedDetails(filename, workIdentifier) {
+        try {
+            const response = await fetch(`/api/load-database-file/${filename}`);
+            const data = await response.json();
+            if (data.success) {
+                const fileData = data.data || {};
+                const seedResults = fileData.seed_results || [];
+
+                if (seedResults.length === 0) {
+                    return { title: 'No seed information available', tooltip: '' };
+                }
+
+                if (seedResults.length === 1) {
+                    // Single seed
+                    const seed = seedResults[0];
+                    return {
+                        title: `${seed.title} (${seed.year || 'Unknown year'})`,
+                        tooltip: `Single seed: ${seed.title}`
+                    };
+                } else if (seedResults.length === 2) {
+                    // Two seeds
+                    const seed1 = seedResults[0];
+                    const seed2 = seedResults[1];
+                    return {
+                        title: `${seed1.title} & ${seed2.title}`,
+                        tooltip: `Two seeds: ${seed1.title} & ${seed2.title}`
+                    };
+                } else {
+                    // Multiple seeds - show first 2 + count
+                    const seed1 = seedResults[0];
+                    const seed2 = seedResults[1];
+                    const remaining = seedResults.length - 2;
+                    const allTitles = seedResults.map(s => s.title).join('\n• ');
+                    return {
+                        title: `${seed1.title}, ${seed2.title} + ${remaining} others`,
+                        tooltip: `Multi-Seed (${seedResults.length} seeds):\n• ${allTitles}`
+                    };
+                }
+            }
+        } catch (error) {
+            console.error('Error loading collection details:', error);
+        }
+        return { title: 'Multi-Seed ', tooltip: 'Collection from multiple seed papers' };
+    }
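The same "one title / A & B / A, B + N others" rule appears twice: in the `multiseed` branch of the collection list and in `loadCollectionSeedDetails`. The two differ only in the single-seed case, where `loadCollectionSeedDetails` appends the year. A shared helper would keep them in step. The name `summarizeSeedTitles` is hypothetical; the formatting follows the diff, except that this sketch writes "1 other" instead of "1 others" when there are exactly three seeds.

```javascript
// Hypothetical shared helper for the seed-title summary used in two places
// in the diff. Returns null when there are no seeds so callers can fall back.
function summarizeSeedTitles(seeds) {
  const titles = (seeds || []).map(seed => seed.title);
  if (titles.length === 0) return null;
  if (titles.length === 1) return titles[0];
  if (titles.length === 2) return `${titles[0]} & ${titles[1]}`;
  const rest = titles.length - 2;
  // Deviation from the diff: singular "other" when exactly one title is hidden.
  return `${titles[0]}, ${titles[1]} + ${rest} ${rest === 1 ? 'other' : 'others'}`;
}
```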
+
+    // Replace 'Multi-Seed ' placeholder titles in the history list with seed details
+    async function updateCollectionDisplayWithSeeds() {
+        const collectionItems = document.querySelectorAll('.collection-item[data-collection]');
+
+        for (const item of collectionItems) {
+            const filename = item.getAttribute('onclick').match(/'([^']+)'/)[1];
+            const workIdentifier = item.getAttribute('data-collection');
+
+            // Check whether this item still shows the 'Multi-Seed ' placeholder title
+            const titleElement = item.querySelector('.history-title');
+            if (titleElement.textContent === 'Multi-Seed ') {
+                const seedDetails = await loadCollectionSeedDetails(filename, workIdentifier);
+                titleElement.textContent = seedDetails.title;
+                titleElement.title = seedDetails.tooltip;
+            }
+        }
+    }
+
     function selectCollection(filename, workIdentifier, title) {
         // Get filters for this collection
         const filters = historyIndex.filters[workIdentifier] || [];
 
     }
 
     window.openCollection = async function(filename, workIdentifier) {
+        // Show loading indicator
+        const loadingIndicator = document.createElement('div');
+        loadingIndicator.id = 'collectionLoadingIndicator';
+        loadingIndicator.innerHTML = `
+            <div style="display: flex; align-items: center; justify-content: center; padding: 20px; background: #1a1a1a; border: 2px solid #00ff00; border-radius: 10px; margin: 20px 0;">
+                <div style="width: 12px; height: 12px; background: #00ff00; border-radius: 50%; margin-right: 15px; animation: pulse 1.5s infinite;"></div>
+                <span style="color: #00ff00; font-weight: bold; font-size: 16px;">Loading collection...</span>
+            </div>
+        `;
+
+        // Add CSS for pulsing animation
+        if (!document.getElementById('loadingAnimationCSS')) {
+            const style = document.createElement('style');
+            style.id = 'loadingAnimationCSS';
+            style.textContent = `
+                @keyframes pulse {
+                    0% { opacity: 1; transform: scale(1); }
+                    50% { opacity: 0.5; transform: scale(1.2); }
+                    100% { opacity: 1; transform: scale(1); }
+                }
+            `;
+            document.head.appendChild(style);
+        }
+
+        // Insert loading indicator before results section
+        const resultsSection = document.getElementById('resultsSection');
+        resultsSection.parentNode.insertBefore(loadingIndicator, resultsSection);
+
         try {
             const response = await fetch(`/api/load-database-file/${filename}`);
             const data = await response.json();
             if (data.success) {
                 const fileData = data.data || {};
                 const papers = fileData.papers || [];
+
+                // Clear any existing collection status messages
+                const collectStatus = document.getElementById('collectStatus');
+                collectStatus.textContent = '';
+                collectStatus.style.color = '#aaaaaa';
+
+                // Show collection stats box instead of results section
+                showCollectionStats(fileData);
+
+                // Hide results section for unfiltered collections
+                document.getElementById('resultsSection').style.display = 'none';
+
                 currentCollectionFile = filename; currentFilterFile = null; historyIndex.currentCollectionId = workIdentifier || (fileData.work_identifier || '');
                 document.getElementById('collectDownload').style.display = 'block';
                 document.getElementById('filterDownload').style.display = 'none';
                 // Enable filter button when opening a collection
                 document.getElementById('filterBtn').disabled = false;
+                document.getElementById('preFilterBtn').disabled = false;
                 // Save papers to temp file for filtering
                 collectedPapers = papers;
+
+                // Populate seed papers if this is a Multi-Seed collection
+                if (fileData.seed_results && fileData.seed_results.length > 0) {
+                    selectedSeeds = fileData.seed_results.map(seed => ({
+                        work_id: seed.work_id,
+                        title: seed.title,
+                        authors: seed.authors || [],
+                        year: seed.year || '',
+                        venue: seed.venue || '',
+                        papers_found: seed.papers_found || 0,
+                        cited_papers: seed.cited_papers || 0,
+                        citing_papers: seed.citing_papers || 0,
+                        related_papers: seed.related_papers || 0
+                    }));
+                    updateSeedCollectionDisplay();
+                    updateCollectButton();
+                } else {
+                    // Clear seed papers for single-seed collections
+                    selectedSeeds = [];
+                    updateSeedCollectionDisplay();
+                    updateCollectButton();
+                }
+
+                // Update loading indicator to show loaded status
+                loadingIndicator.innerHTML = `
+                    <div style="display: flex; align-items: center; justify-content: center; padding: 20px; background: #1a1a1a; border: 2px solid #00ff00; border-radius: 10px; margin: 20px 0;">
+                        <div style="width: 12px; height: 12px; background: #00ff00; border-radius: 50%; margin-right: 15px;"></div>
+                        <span style="color: #00ff00; font-weight: bold; font-size: 16px;">Collection loaded successfully!</span>
+                    </div>
+                `;
+
+                // Remove loading indicator after 2 seconds
+                setTimeout(() => {
+                    if (loadingIndicator.parentNode) {
+                        loadingIndicator.parentNode.removeChild(loadingIndicator);
+                    }
+                }, 2000);
             }
         } catch (error) {
+            // Update loading indicator to show error
+            loadingIndicator.innerHTML = `
+                <div style="display: flex; align-items: center; justify-content: center; padding: 20px; background: #1a1a1a; border: 2px solid #ff0000; border-radius: 10px; margin: 20px 0;">
+                    <div style="width: 12px; height: 12px; background: #ff0000; border-radius: 50%; margin-right: 15px;"></div>
+                    <span style="color: #ff0000; font-weight: bold; font-size: 16px;">Error loading collection</span>
+                </div>
+            `;
+
+            // Remove error indicator after 3 seconds
+            setTimeout(() => {
+                if (loadingIndicator.parentNode) {
+                    loadingIndicator.parentNode.removeChild(loadingIndicator);
+                }
+            }, 3000);
+
             alert(`Error opening collection: ${error.message}`);
         }
     }
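In `openCollection`, the loading indicator is replaced on success and when an error is thrown, but if the server answers with `success: false` neither branch runs and the "Loading collection..." banner stays on the page. Each call also inserts a new element with the same `id`, so repeated clicks can stack banners. A sketch that maps every response to a terminal indicator state; `indicatorStateFor` and its return shape are hypothetical, while the messages, colors, and 2 s / 3 s timeouts come from the diff.

```javascript
// Hypothetical helper: maps the /api/load-database-file/<filename> JSON
// (or a thrown error) to the final indicator state, so no path leaves
// the "Loading collection..." banner in place.
function indicatorStateFor(data, error = null) {
  if (!error && data && data.success) {
    return { text: 'Collection loaded successfully!', color: '#00ff00', removeAfterMs: 2000 };
  }
  // success: false, a missing body, or a network/parse error all end here.
  return { text: 'Error loading collection', color: '#ff0000', removeAfterMs: 3000 };
}
```

In the browser, removing any existing `#collectionLoadingIndicator` element before inserting a new one would also keep at most one banner on screen.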