iitolstykh committed on
Commit 9efb58c · verified · 1 parent: a00c0fa

Update README.md

Files changed (1)
  1. README.md +78 -3
README.md CHANGED
@@ -1,3 +1,78 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - ru
+ library_name: gigacheck
+ tags:
+ - token-classification
+ - detr
+ - ai-detection
+ - multilingual
+ - gigacheck
+ datasets:
+ - iitolstykh/LLMTrace_detection
+ ---
+
+ # GigaCheck-Detector-Multi
+
+ <p style="text-align: center;">
+ <div align="center">
+ <img src= width="40%"/>
+ </div>
+ <p align="center">
+ <a href=""> 🌐 LLMTrace Website </a> |
+ <a href=""> 📜 LLMTrace Paper on arXiv </a> |
+ <a href="https://huggingface.co/datasets/iitolstykh/LLMTrace_detection"> 🤗 LLMTrace - Detection Dataset </a>
+ </p>
+
+ ## Model Card
+
+ ### Model Description
+
+ This is the official `GigaCheck-Detector-Multi` model from the `LLMTrace` project. It is a multilingual, transformer-based model trained for **AI interval detection**: it identifies and localizes the specific spans of text within a document that were generated by AI.
+
+ The model was trained jointly on the English and Russian portions of the `LLMTrace Detection dataset`, which contains human-written, fully AI-generated, and mixed-authorship texts with character-level annotations.
+
+ For complete details on the training data, methodology, and evaluation, please refer to our research paper (link coming soon).
+
+ ### Intended Use & Limitations
+
+ This model is intended for fine-grained analysis of document authorship, for academic-integrity tools, and for research into human-AI collaboration.
+
+ **Limitations:**
+ * Performance may degrade on text generated by LLMs released after the model's training date (September 2025).
+ * The model is not infallible: it may miss some AI-generated spans or incorrectly flag human-written passages.
+ * Predicted span boundaries may not always align exactly with the true boundaries.
+
+ ## Evaluation
+
+ The model was evaluated on the test split of the `LLMTrace Detection dataset`. Performance is measured with the standard object-detection mean Average Precision (mAP) metrics, adapted to text spans.
+
+ | Metric             | Value  |
+ |--------------------|--------|
+ | mAP @ IoU=0.5      | 0.8976 |
+ | mAP @ IoU=0.5:0.95 | 0.7921 |
+
+
+ ## Citation
+
+ If you use this model in your research, please cite our papers:
+
+ ```bibtex
+ @article{Layer2025LLMTrace,
+ title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
+ author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
+ eprint={},
+ journal={},
+ archivePrefix={},
+ primaryClass={},
+ url={}
+ }
+ @article{tolstykh2024gigacheck,
+ title={{GigaCheck: Detecting LLM-generated Content}},
+ author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
+ journal={arXiv preprint arXiv:2410.23728},
+ year={2024}
+ }
+ ```
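The README's Evaluation section reports detection-style mAP computed over text spans rather than image boxes. The Python sketch below illustrates one way such a metric can be computed. It rests on several assumptions that are not in the source: annotations are per-character 0/1 labels (1 = AI-written), predictions are scored `[start, end)` character spans, there is a single "AI-written" class, and AP is the all-point interpolated area under the precision-recall curve. All function names are hypothetical and not part of the `gigacheck` library, and the LLMTrace paper's exact protocol (for example COCO-style 101-point interpolation) may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical helpers for illustration only; not the gigacheck API.
Span = Tuple[int, int]                 # [start, end) character offsets
Prediction = Tuple[str, float, Span]   # (doc_id, confidence, span)


def labels_to_spans(labels: List[int]) -> List[Span]:
    """Turn per-character 0/1 labels (1 = AI-written) into [start, end) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i                      # a run of AI-written characters begins
        elif not lab and start is not None:
            spans.append((start, i))       # the run ends just before index i
            start = None
    if start is not None:                  # the run extends to the end of the text
        spans.append((start, len(labels)))
    return spans


def span_iou(a: Span, b: Span) -> float:
    """1D intersection-over-union of two [start, end) spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Prediction], gts: Dict[str, List[Span]],
                      iou_thr: float) -> float:
    """All-point interpolated AP for a single 'AI-written' class (assumed)."""
    n_gt = sum(len(v) for v in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {doc: [False] * len(spans) for doc, spans in gts.items()}
    hits: List[int] = []
    # Greedily match predictions to ground-truth spans, most confident first.
    for doc, _, span in sorted(preds, key=lambda p: -p[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(doc, [])):
            iou = span_iou(span, gt)
            if not matched[doc][j] and iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[doc][best_j] = True    # true positive
            hits.append(1)
        else:
            hits.append(0)                 # false positive
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Interpolation: replace each precision with the best precision at any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p        # area under the step-wise PR curve
        prev_recall = r
    return ap


def mean_ap(preds: List[Prediction], gts: Dict[str, List[Span]],
            thresholds: List[float]) -> float:
    """Average AP over IoU thresholds, e.g. [0.5] or 0.50, 0.55, ..., 0.95."""
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.50 ... 0.95

# Usage: one document with two AI-written spans, (2, 5) and (7, 9).
gts = {"doc1": labels_to_spans([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])}
preds = [("doc1", 0.9, (2, 5)), ("doc1", 0.4, (0, 1))]
print(mean_ap(preds, gts, [0.5]))  # 0.5: one span found, one missed
```

Reading "IoU=0.5:0.95" as the COCO convention (thresholds 0.50 to 0.95 in steps of 0.05) is itself an assumption here; the greedy, confidence-ordered matching mirrors how box detectors are usually scored.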