sebsigma committed on
Commit 138c11c · verified · 1 Parent(s): a3a4124

Update README.md

Files changed (1)
  1. README.md +93 -2
README.md CHANGED
@@ -1,11 +1,102 @@
1
  ---
2
  title: SemanticCite
3
- emoji: 👀
4
  colorFrom: green
5
  colorTo: green
6
  sdk: static
7
  pinned: false
8
  license: mit
 
9
  ---
10
 
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

1
  ---
2
  title: SemanticCite
 
3
  colorFrom: green
4
  colorTo: green
5
  sdk: static
6
  pinned: false
7
  license: mit
8
+ short_description: 'AI system for automated full-text citation verification'
9
  ---
10
 
11
+ # SemanticCite
12
+
13
+ **The first AI system for automated full-text citation verification**
14
+
15
+ 📄 **[Paper](https://arxiv.org/abs/xxxx-xxxx)** | 💻 **[GitHub Repository](https://github.com/sebhaan/SemanticCite)** | 🌐 **[Project Homepage](https://semanticcite.com)**
16
+
17
+ This Hugging Face Space hosts the complete SemanticCite project, including the fine-tuned models, the training dataset, and an interactive demo for AI-powered citation verification.
18
+
19
+ <div align="center">
20
+ <img src="assets/io.png" alt="SemanticCite Input/Output" width="900"/>
21
+ </div>
22
+
23
+ SemanticCite transforms citation verification by analysing complete source documents and providing nuanced classification through four categories: Supported, Partially Supported, Unsupported, and Uncertain. Beyond simple validation, the system delivers detailed reasoning, confidence scores, and evidence reference snippets that show researchers exactly how their claims connect to the supporting literature.
24
+
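The four categories and the accompanying reasoning, confidence score, and evidence snippets described above could be represented roughly as follows. This is a minimal sketch, not SemanticCite's actual output schema: the class names, field names, the `[0, 1]` confidence range, and the fallback to `Uncertain` for unknown labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    """The four categories named in the README."""
    SUPPORTED = "Supported"
    PARTIALLY_SUPPORTED = "Partially Supported"
    UNSUPPORTED = "Unsupported"
    UNCERTAIN = "Uncertain"


@dataclass
class VerificationResult:
    # Hypothetical container; field names are illustrative, not SemanticCite's API.
    verdict: Verdict
    confidence: float  # assumed to lie in [0, 1]
    reasoning: str
    evidence: list[str] = field(default_factory=list)  # ranked text snippets

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def parse_verdict(label: str) -> Verdict:
    """Map a free-text label to a Verdict, ignoring case and extra spaces."""
    normalized = " ".join(label.strip().split()).lower()
    for verdict in Verdict:
        if verdict.value.lower() == normalized:
            return verdict
    # Assumption: unknown labels fall back to UNCERTAIN instead of raising.
    return Verdict.UNCERTAIN
```

Validating the confidence in `__post_init__` keeps malformed model output from propagating silently; a real pipeline might prefer to log and clamp instead.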
25
+
26
+ ## ✨ Key Features
27
+
28
+ - **Deep Semantic Analysis**: Full-text document analysis with 4-class classification (Supported, Partially Supported, Unsupported, Uncertain)
29
+ - **Lightweight AI Models**: Fine-tuned Qwen3 models (1.7B & 4B parameters) with performance comparable to GPT-4
30
+ - **Triple Retrieval System**: Dense vector search + sparse BM25 matching + neural reranking with FlashRank
31
+ - **Evidence-Based Reasoning**: Ranked text snippets with transparent explanations and confidence scores
32
+ - **Multiple Deployment Options**: Web interface, Python API, local/cloud deployment
33
+
34
+ ## 🤗 Hugging Face Resources
35
+
36
+ ### Models
37
+ - **[SemanticCite-Refiner-Qwen3-1B](https://huggingface.co/sebsigma/SemanticCite-Refiner-Qwen3-1B)** - Claim extraction model (1.7B parameters)
38
+ - **[SemanticCite-Checker-Qwen3-4B](https://huggingface.co/sebsigma/SemanticCite-Checker-Qwen3-4B)** - Citation verification model (4B parameters)
39
+
40
+ ### Dataset
41
+ - **[SemanticCite-Dataset](https://huggingface.co/datasets/sebsigma/SemanticCite-Dataset)** - 1,111 citation-reference pairs across 8 academic fields with expert annotations
42
+
43
+ ## 🔧 Technical Architecture
44
+
45
+ - **Hybrid Retrieval:** BM25 + Dense Vector Search
46
+ - **Reranking:** FlashRank neural reranking
47
+ - **Classification:** Fine-tuned Qwen3 models with structured output
48
+ - **Embeddings:** Local SentenceTransformers or OpenAI embeddings
49
+ - **Storage:** ChromaDB vector database
50
+
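The README lists BM25 and dense vector search as the two retrieval signals but does not say how their results are combined. One common option is reciprocal rank fusion (RRF), sketched below purely as an illustration. RRF is a swapped-in technique here, not a confirmed part of SemanticCite; the function name, the constant `k=60`, and the passage IDs are assumptions, and the FlashRank reranking step is not shown.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked highly by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical passage IDs, best match first, from each retriever.
bm25_ranking = ["p3", "p1", "p7"]
dense_ranking = ["p1", "p4", "p3"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['p1', 'p3', 'p4', 'p7']
```

Because RRF uses only ranks, it avoids having to normalise BM25 scores and cosine similarities onto a common scale; the fused list would then be the input to a reranker.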
51
+ ## 📦 Installation
52
+
53
+ ```bash
54
+ # Clone repository
55
+ git clone https://github.com/sebhaan/SemanticCite
56
+ cd SemanticCite
57
+
58
+ # Set up environment
59
+ conda env create -f environment.yaml
60
+ conda activate cite
61
+
62
+ # Run web interface
63
+ streamlit run src/app.py
64
+ ```
65
+
66
+ For local deployment with Ollama:
67
+ ```bash
68
+ # Install models
69
+ ollama pull sebsigma/semanticcite-refiner-qwen3-1b
70
+ ollama pull sebsigma/semanticcite-checker-qwen3-4b
71
+ ```
72
+
73
+ Full documentation is available in the [GitHub repository](https://github.com/sebhaan/SemanticCite).
74
+
75
+ ## 💼 Tailored Solutions
76
+
77
+ Need to verify entire documents automatically? Visit [semanticcite.com](https://semanticcite.com) for:
78
+ - Complete citation system with automatic extraction and verification
79
+ - Batch processing for large-scale workflows
80
+ - API integration for editorial and publishing systems
81
+ - On-premise deployment with custom model training
82
+
83
+ ## 📄 Citation
84
+
85
+ If you use SemanticCite in your research, please cite:
86
+
87
+ ```bibtex
88
+ @article{semanticcite2025,
89
+ title={SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning},
90
+ author={Sebastian Haan},
91
+ journal={ArXiv Preprint},
92
+ year={2025},
93
+ url={https://arxiv.org/abs/xxxx-xxxx}
94
+ }
95
+ ```
96
+
97
+
98
+ ---
99
+
100
+ <div align="center">
101
+ <p><strong>SemanticCite</strong> - Enhancing research quality through AI-powered citation verification</p>
102
+ </div>