---
license: apache-2.0
language:
- en
- ru
library_name: gigacheck
tags:
- token-classification
- detr
- ai-detection
- multilingual
- gigacheck
datasets:
- iitolstykh/LLMTrace_detection
base_model:
  - mistralai/Mistral-7B-v0.3
---

# GigaCheck-Detector-Multi

<div align="center">
  <img src="https://raw.githubusercontent.com/sweetdream779/LLMTrace-info/refs/heads/main/images/logo/GigaCheck-detector-multi.PNG" width="40%"/>
</div>
<p align="center">
  <a href="https://sweetdream779.github.io/LLMTrace-info"> 🌐 LLMTrace Website </a> |
  <a href="http://arxiv.org/abs/2509.21269"> 📜 LLMTrace Paper on arXiv </a> |
  <a href="https://huggingface.co/datasets/iitolstykh/LLMTrace_detection"> 🤗 LLMTrace - Detection Dataset </a> |
  <a href="https://github.com/ai-forever/gigacheck"> GitHub </a>
</p>

## Model Card

### Model Description

This is the official `GigaCheck-Detector-Multi` model from the `LLMTrace` project. It is a multilingual transformer-based model trained for **AI interval detection**. Its purpose is to identify and localize the specific spans of text within a document that were generated by an AI.

The model was trained jointly on the English and Russian portions of the `LLMTrace Detection dataset`, which includes human, fully AI, and mixed-authorship texts with character-level annotations.

For complete details on the training data, methodology, and evaluation, please refer to our research paper: [LLMTrace on arXiv](http://arxiv.org/abs/2509.21269).

### Intended Use & Limitations

This model is intended for fine-grained analysis of documents, academic integrity tools, and research into human-AI collaboration.

**Limitations:**
*   The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
*   It is not infallible and may miss some AI-generated spans or incorrectly flag human-written parts.
*   The boundary predictions may not be perfectly precise in all cases.

## Evaluation

The model was evaluated on the test split of the `LLMTrace Detection dataset`. The performance is measured using standard mean Average Precision (mAP) metrics for object detection, adapted for text spans.

| Metric             | Value  |
|--------------------|--------|
| mAP @ IoU=0.5      | 0.8976 |
| mAP @ IoU=0.5:0.95 | 0.7921 |
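
For character-level text spans, IoU reduces to the 1-D interval case: intersection length over union length. A minimal sketch (the helper name `span_iou` is illustrative, not part of the `gigacheck` API):

```python
def span_iou(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Intersection-over-union of two half-open character spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

# A predicted span counts as a true positive at threshold t if it matches
# a ground-truth span with IoU >= t (e.g. t = 0.5 for mAP @ IoU=0.5).
print(span_iou((0, 60), (10, 60)))  # 50 / 60 ≈ 0.833
```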

## Quick start

Requirements:
- Python 3.11
- [gigacheck](https://github.com/ai-forever/gigacheck)

```bash
pip install git+https://github.com/ai-forever/gigacheck
```

### Inference with transformers (`trust_remote_code=True`)

```python
from transformers import AutoModel
import torch

model_name = "iitolstykh/GigaCheck-Detector-Multi"

# Load the detector together with its custom modeling code from the Hub
gigacheck_model = AutoModel.from_pretrained(
    model_name, trust_remote_code=True, device_map="cuda:0", torch_dtype=torch.float32
)

text = "The critic's review of the recent publication was scathing. The book failed miserably in portraying the harmful subjective discourses associated with the hegemony of the political system."

# Keep only detected AI intervals with confidence >= 0.5
output = gigacheck_model([text], conf_interval_thresh=0.5)

# List of (start_char, end_char, score) tuples over the input characters
print(output.ai_intervals)
```
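
The `(start_char, end_char, score)` intervals can be mapped back onto the input text to recover the flagged substrings. A minimal post-processing sketch (the helper `extract_ai_spans` and the example intervals are illustrative, not part of the `gigacheck` API):

```python
def extract_ai_spans(text: str, intervals, min_score: float = 0.5):
    """Return the substrings flagged as AI-generated, paired with their scores."""
    return [
        (text[start:end], score)
        for start, end, score in intervals
        if score >= min_score
    ]

# Hypothetical intervals in the (start_char, end_char, score) format:
sample = "Human-written intro. AI-generated continuation follows here."
spans = extract_ai_spans(sample, [(21, 60, 0.91)])
print(spans)  # [('AI-generated continuation follows here.', 0.91)]
```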

### Inference with gigacheck

```python
from transformers import AutoConfig
from gigacheck.inference.src.mistral_detector import MistralDetector
import torch

model_name = "iitolstykh/GigaCheck-Detector-Multi"

# Read the detector configuration from the Hub
config = AutoConfig.from_pretrained(model_name)

# Build the detector and load the pretrained weights
model = MistralDetector(
    max_seq_len=config.max_length,
    with_detr=config.with_detr,
    id2label=config.id2label,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    conf_interval_thresh=0.5,
).from_pretrained(model_name)

text = "The critic's review of the recent publication was scathing. The book failed miserably in portraying the harmful subjective discourses associated with the hegemony of the political system."
output = model.predict(text)
print(output)
```

## Citation

If you use this model in your research, please cite our papers:

```bibtex
@article{Layer2025LLMTrace,
  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2509.21269},
  year={2025}
}
@article{tolstykh2024gigacheck,
  title={{GigaCheck: Detecting LLM-generated Content}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2410.23728},
  year={2024}
}
```