iitolstykh
/

GigaCheck-Detector-Multi

Token Classification


iitolstykh committed on Sep 25, 2025

Commit

9efb58c

verified

1 Parent(s): a00c0fa

Update README.md

Files changed (1)

README.md +78 -3

@@ -1,3 +1,78 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- en
+- ru
+library_name: gigacheck
+tags:
+- token-classification
+- detr
+- ai-detection
+- multilingual
+- gigacheck
+datasets:
+- iitolstykh/LLMTrace_detection
+---
+# GigaCheck-Detector-Multi
+<p style="text-align: center;">
+  <div align="center">
+  <img src= width="40%"/>
+  </div>
+  <p align="center">
+  <a href=""> 🌐 LLMTrace Website </a> |
+  <a href=""> 📜 LLMTrace Paper on arXiv </a> |
+  <a href="https://huggingface.co/datasets/iitolstykh/LLMTrace_detection"> 🤗 LLMTrace - Detection Dataset </a>
+</p>
+## Model Card
+### Model Description
+This is the official `GigaCheck-Detector-Multi` model from the `LLMTrace` project. It is a multilingual transformer-based model trained for **AI interval detection**. Its purpose is to identify and localize the specific spans of text within a document that were generated by an AI.
+The model was trained jointly on the English and Russian portions of the `LLMTrace Detection dataset`, which includes human, fully AI, and mixed-authorship texts with character-level annotations.
+For complete details on the training data, methodology, and evaluation, please refer to our research paper (link coming soon).
+### Intended Use & Limitations
+This model is intended for fine-grained analysis of documents, academic integrity tools, and research into human-AI collaboration.
+**Limitations:**
+*   The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
+*   It is not infallible and may miss some AI-generated spans or incorrectly flag human-written parts.
+*   The boundary predictions may not be perfectly precise in all cases.
+## Evaluation
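+The model was evaluated on the test split of the `LLMTrace Detection dataset`. Performance is reported as mean Average Precision (mAP), the standard object-detection metric, adapted to text spans: a predicted span is scored against the annotated spans by their Intersection over Union (IoU) at the stated threshold.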
+| Metric             | Value  |
+|--------------------|--------|
+| mAP @ IoU=0.5      | 0.8976 |
+| mAP @ IoU=0.5:0.95 | 0.7921 |
+## Citation
+If you use this model in your research, please cite our papers:
+```bibtex
+@article{Layer2025LLMTrace,
+  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
+  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
+  eprint={},
+  journal={},
+  archivePrefix={},
+  primaryClass={},
+  url={}
+}
+@article{tolstykh2024gigacheck,
+  title={{GigaCheck: Detecting LLM-generated Content}},
+  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
+  journal={arXiv preprint arXiv:2410.23728},
+  year={2024}
+}
+```
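
The Evaluation section of the README above reports mAP at IoU thresholds adapted from object detection to text spans. The following is a minimal sketch of what IoU between two character spans could look like, not the project's evaluation code. It assumes spans are half-open `(start, end)` character offsets and that `0.5:0.95` follows the COCO-style convention of thresholds from 0.50 to 0.95 in steps of 0.05; the README states neither explicitly. The names `span_iou`, `iou_thresholds`, and `matched_at` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a span is a half-open character interval (start, end).
Span = Tuple[int, int]


def span_iou(pred: Span, gold: Span) -> float:
    """Intersection over Union of two half-open character intervals."""
    # Overlap length is zero when the intervals are disjoint.
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds() -> List[float]:
    """Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_at(pred: Span, gold: Span) -> List[float]:
    """Thresholds at which `pred` would count as a match for `gold`."""
    iou = span_iou(pred, gold)
    return [t for t in iou_thresholds() if iou >= t]


if __name__ == "__main__":
    predicted = (0, 100)   # hypothetical predicted AI-written span
    annotated = (12, 100)  # hypothetical ground-truth span
    print(span_iou(predicted, annotated))
    print(matched_at(predicted, annotated))
```

Under these assumptions, a prediction whose boundary is off by a few characters still matches at the looser thresholds but fails the strictest ones, which is consistent with the score at IoU=0.5:0.95 being lower than the score at IoU=0.5.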