---
title: PC-Bench
emoji: ๐
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: mit
short_description: Paper Discovery Benchmark
tags:
  - leaderboard
  - research
  - multi-agent
  - paper-retrieval
---
# PC-Bench: Paper Discovery Benchmark

Leaderboard for evaluating AI agents on academic paper retrieval and analysis.
## Benchmarks

| Benchmark | Queries | Description |
|-----------|---------|-------------|
| SemanticBench | 50 | Template-based semantic queries |
| RAbench | 500 | LLM-perturbed natural queries |
## Metrics

- **MRR** - Mean Reciprocal Rank
- **R@K** - Recall at K (K=1,5,10,20,50)
- **Hit Rate** - Successful retrieval percentage
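
The metrics listed above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, not the benchmark's actual scoring code. The function names, the `(ranked, relevant)` input shape, and the paper IDs are hypothetical, and the hit-rate rule used here (a query counts as a hit if any relevant paper appears anywhere in the returned list) is an assumption.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical input shape: (ranked paper IDs returned by the agent, relevant paper IDs).
Run = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1 / rank of the first relevant paper, or 0.0 if none was retrieved."""
    for rank, paper_id in enumerate(ranked, start=1):
        if paper_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of relevant papers found in the top-k results."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked[:k])) / len(relevant)


def evaluate(runs: List[Run], ks: Sequence[int] = (1, 5, 10, 20, 50)) -> Dict[str, float]:
    """Average each metric over all queries."""
    if not runs:
        raise ValueError("at least one query is required")
    n = len(runs)
    metrics = {
        "MRR": sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n,
        # Assumption: a hit means at least one relevant paper was returned.
        "HitRate": sum(1 for ranked, rel in runs if rel.intersection(ranked)) / n,
    }
    for k in ks:
        metrics[f"R@{k}"] = sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / n
    return metrics


# Toy data for illustration only; these are not benchmark queries or results.
example_runs: List[Run] = [
    (["p3", "p1", "p7"], {"p1"}),        # first relevant paper at rank 2
    (["p2", "p9"], {"p2", "p4"}),        # first relevant paper at rank 1
    (["p5", "p6"], {"p8"}),              # nothing relevant retrieved
]

for name, value in evaluate(example_runs).items():
    print(f"{name}: {value:.3f}")
```

MRR rewards placing the first relevant paper near the top of the list, while R@K measures how much of the relevant set is covered within the first K results, so the two can rank systems differently.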
## Top Results

| Model | Hit Rate | MRR | Time |
|-------|----------|-----|------|
| Qwen3-Coder-30B | 80% | 0.627 | 22s |
| BM25 Baseline | 78% | 0.541 | - |

## Links

- [GitHub Repository](https://github.com/MAXNORM8650/papercircle)