	---
	license: apache-2.0
	language:
	- en
	- ru
	library_name: gigacheck
	tags:
	- token-classification
	- detr
	- ai-detection
	- multilingual
	- gigacheck
	datasets:
	- iitolstykh/LLMTrace_detection
	base_model:
	- mistralai/Mistral-7B-v0.3
	---

	# GigaCheck-Detector-Multi

	<p style="text-align: center;">
	<div align="center">
	<img src="https://raw.githubusercontent.com/sweetdream779/LLMTrace-info/refs/heads/main/images/logo/GigaCheck-detector-multi.PNG" width="40%"/>
	</div>
	<p align="center">
	<a href="https://sweetdream779.github.io/LLMTrace-info"> 🌐 LLMTrace Website </a> |
	<a href="http://arxiv.org/abs/2509.21269"> 📜 LLMTrace Paper on arXiv </a> |
	<a href="https://huggingface.co/datasets/iitolstykh/LLMTrace_detection"> 🤗 LLMTrace - Detection Dataset </a> |
	<a href="https://github.com/ai-forever/gigacheck"> GitHub </a>
	</p>

	## Model Card

	### Model Description

	This is the official `GigaCheck-Detector-Multi` model from the `LLMTrace` project. It is a multilingual transformer-based model trained for AI interval detection. Its purpose is to identify and localize the specific spans of text within a document that were generated by an AI.

	The model was trained jointly on the English and Russian portions of the `LLMTrace Detection dataset`, which includes human, fully AI, and mixed-authorship texts with character-level annotations.

	For complete details on the training data, methodology, and evaluation, please refer to our research paper: [LLMTrace on arXiv](http://arxiv.org/abs/2509.21269).

	### Intended Use & Limitations

	This model is intended for fine-grained analysis of documents, academic integrity tools, and research into human-AI collaboration.

	Limitations:
	* The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
	* It is not infallible and may miss some AI-generated spans or incorrectly flag human-written parts.
	* The boundary predictions may not be perfectly precise in all cases.
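
Because predicted boundaries can land in the middle of a word, downstream tools may want to adjust them before display. The following is a minimal post-processing sketch, not part of `gigacheck`: the helper name `snap_to_word_boundaries` is hypothetical, and it assumes intervals are half-open character offsets `[start, end)`.

```python
def snap_to_word_boundaries(text, start, end):
    """Expand [start, end) outward so it does not cut a word in half.

    Hypothetical helper for illustration; assumes half-open character offsets.
    Punctuation attached to a word is treated as part of that word.
    """
    n = len(text)
    start = max(0, min(start, n))
    end = max(start, min(end, n))
    # Move the start left while it sits between two non-whitespace characters.
    while 0 < start < n and not text[start - 1].isspace() and not text[start].isspace():
        start -= 1
    # Move the end right while it sits between two non-whitespace characters.
    while 0 < end < n and not text[end - 1].isspace() and not text[end].isspace():
        end += 1
    return start, end


sample = "Human part. Generated part here."
snapped = snap_to_word_boundaries(sample, 14, 24)
print(snapped, repr(sample[snapped[0]:snapped[1]]))
```

Expanding outward favors recall over precision; shrinking inward would be the opposite trade-off.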

	## Evaluation

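	The model was evaluated on the test split of the `LLMTrace Detection dataset`. Performance is reported as mean Average Precision (mAP), the standard object-detection metric, adapted to text spans: predicted and reference spans are compared by Intersection over Union (IoU), once at a single threshold of 0.5 and once averaged over thresholds from 0.5 to 0.95.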

	| Metric             | Value  |
	|--------------------|--------|
	| mAP @ IoU=0.5      | 0.8976 |
	| mAP @ IoU=0.5:0.95 | 0.7921 |
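
To make the IoU criterion concrete, here is a rough sketch of how overlap between character spans can be computed and how predictions might be matched to reference spans at a given threshold. It is not the authors' evaluation code: the function names are hypothetical, spans are assumed to be half-open `(start, end)` offsets, and the greedy matching is one common convention, not necessarily the one used for the numbers above.

```python
def span_iou(pred, gold):
    """IoU of two half-open character intervals given as (start, end)."""
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds, golds, iou_thresh=0.5):
    """Greedily match predictions (start, end, score) to gold spans (start, end).

    Predictions are visited from highest to lowest score, and each gold span
    can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for start, end, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_idx, best_iou = None, 0.0
        for i, gold in enumerate(golds):
            if i in matched:
                continue
            iou = span_iou((start, end), gold)
            if iou > best_iou:
                best_idx, best_iou = i, iou
        if best_idx is not None and best_iou >= iou_thresh:
            matched.add(best_idx)
            true_positives += 1
    return true_positives


# Hypothetical spans for illustration only.
predictions = [(0, 10, 0.9), (20, 30, 0.8)]
references = [(0, 12), (40, 50)]
print(span_iou((0, 10), (5, 15)))
print(count_true_positives(predictions, references))
```

Average Precision is then derived from the precision-recall curve obtained by sweeping the score threshold; see the paper for the exact protocol.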

	## Quick start

	Requirements:
	- Python 3.11
	- [gigacheck](https://github.com/ai-forever/gigacheck)

	```bash
	pip install git+https://github.com/ai-forever/gigacheck
	```

	### Inference with `transformers` (requires `trust_remote_code=True`)

	```python
	from transformers import AutoModel
	import torch

	model_name = "iitolstykh/GigaCheck-Detector-Multi"
	gigacheck_model = AutoModel.from_pretrained(
	    model_name, trust_remote_code=True, device_map="cuda:0", torch_dtype=torch.float32
	)

	text = "The critic's review of the recent publication was scathing. The book failed miserably in portraying the harmful subjective discourses associated with the hegemony of the political system."

	output = gigacheck_model([text], conf_interval_thresh=0.5)

	# [(start_char, end_char, score)]
	print(output.ai_intervals)
	```
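
Once intervals are available, the flagged text can be recovered by slicing the input string. The sketch below is illustrative only: `extract_spans` is a hypothetical helper, the interval values are made up, and it assumes a flat list of `(start_char, end_char, score)` tuples for a single text with an exclusive end offset. The actual nesting of `output.ai_intervals` for batched input should be checked against the model's output.

```python
def extract_spans(text, intervals, min_score=0.5):
    """Return (substring, score) pairs for intervals scoring at least min_score.

    Hypothetical helper; assumes (start_char, end_char, score) tuples with an
    exclusive end offset.
    """
    spans = []
    for start, end, score in intervals:
        if score >= min_score:
            spans.append((text[int(start):int(end)], float(score)))
    return spans


# Made-up intervals for illustration, not real model output.
example_text = "I wrote this sentence myself. This one was drafted by a chatbot."
example_intervals = [(30, 64, 0.91), (0, 10, 0.12)]

for snippet, score in extract_spans(example_text, example_intervals):
    print(f"{score:.2f}  {snippet!r}")
```

Raising `min_score` keeps only the most confident spans, at the cost of missing weaker detections.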

	### Inference with gigacheck

	```python
	from transformers import AutoConfig
	from gigacheck.inference.src.mistral_detector import MistralDetector
	import torch

	model_name = "iitolstykh/GigaCheck-Detector-Multi"

	config = AutoConfig.from_pretrained(model_name)
	model = MistralDetector(
	    max_seq_len=config.max_length,
	    with_detr=config.with_detr,
	    id2label=config.id2label,
	    device="cpu" if not torch.cuda.is_available() else "cuda:0",
	    conf_interval_thresh=0.5,
	).from_pretrained(model_name)

	text = "The critic's review of the recent publication was scathing. The book failed miserably in portraying the harmful subjective discourses associated with the hegemony of the political system."
	output = model.predict(text)
	print(output)
	```

	## Citation

	If you use this model in your research, please cite our papers:

	```bibtex
	@article{Layer2025LLMTrace,
	  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
	  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
	  year={2025},
	  eprint={arXiv:2509.21269}
	}
	@article{tolstykh2024gigacheck,
	  title={{GigaCheck: Detecting LLM-generated Content}},
	  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
	  journal={arXiv preprint arXiv:2410.23728},
	  year={2024}
	}
	```