	---
	base_model: meta-llama/Llama-2-7b-chat-hf
	library_name: peft
	license: cc-by-nc-sa-4.0
	tags:
	- audio
	- video
	- segmentation
	- mask-quality-assessment
	- audio-visual-segmentation
	- lora
	---

	# MQ-Auditor HyperLoRA Weights

	This repository contains the released MQ-Auditor pretrained weights for reference-free mask quality assessment in language-referred audio-visual segmentation.

	The checkpoint corresponds to:

	```text
	epochs96_lr1e-4_bs4_gradacc8_lora_r32alpha64_pos0.5_ioulosswei0
	```
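
The run name appears to encode the main training hyperparameters. The sketch below splits it into fields under an assumed reading of each token (`bs` as per-device batch size, `gradacc` as gradient accumulation steps, `r`/`alpha` as the LoRA rank and scaling, `ioulosswei` as an IoU loss weight). The README does not define these tokens, and `pos` is kept as a raw number because its meaning is not stated.

```python
import re

RUN_NAME = "epochs96_lr1e-4_bs4_gradacc8_lora_r32alpha64_pos0.5_ioulosswei0"

# Assumed token layout; the field names are illustrative, not taken from the codebase.
PATTERN = re.compile(
    r"epochs(?P<epochs>\d+)"
    r"_lr(?P<lr>[0-9.e+-]+)"
    r"_bs(?P<batch_size>\d+)"
    r"_gradacc(?P<grad_accum>\d+)"
    r"_lora_r(?P<lora_r>\d+)alpha(?P<lora_alpha>\d+)"
    r"_pos(?P<pos>[0-9.]+)"
    r"_ioulosswei(?P<iou_loss_weight>[0-9.]+)"
)


def parse_run_name(name: str) -> dict:
    """Split a run name of the form shown above into typed fields."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised run name: {name!r}")
    raw = match.groupdict()
    return {
        "epochs": int(raw["epochs"]),
        "lr": float(raw["lr"]),
        "batch_size": int(raw["batch_size"]),
        "grad_accum": int(raw["grad_accum"]),
        "lora_r": int(raw["lora_r"]),
        "lora_alpha": int(raw["lora_alpha"]),
        "pos": float(raw["pos"]),  # meaning not documented in the README
        "iou_loss_weight": float(raw["iou_loss_weight"]),
    }


config = parse_run_name(RUN_NAME)
# Samples consumed per optimizer step on one device, if the reading above holds.
samples_per_step = config["batch_size"] * config["grad_accum"]
print(config, samples_per_step)
```

If this reading is right, each optimizer step sees 4 × 8 = 32 samples per device. The number of devices is not given here, so the global batch size remains unknown.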

	## Model

	MQ-Auditor takes a video clip, audio, a referring expression, a frame, and a candidate segmentation mask, then predicts mask quality attributes such as mask type, IoU, and recommended action.
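
For readers unfamiliar with the IoU attribute, the sketch below computes intersection-over-union between a candidate mask and a reference mask. It only illustrates the quantity being estimated. MQ-Auditor is reference-free, so at inference time it has no reference mask and predicts this value instead. The helper `mask_iou` and the toy masks are hypothetical and not part of the MQ-Auditor codebase.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks with identical shape."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
candidate = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
iou = mask_iou(candidate, reference)
print(iou)  # 4 shared pixels / 8 pixels in the union = 0.5
```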

	The released weights are intended to be used with the MQ-Auditor codebase and MQ-RAVSBench dataset. The base LLM checkpoint and external encoders are not included in this package.

	## Release Contents

	The public weight package should include:

	```text
	adapter_config.json
	adapter_model.safetensors
	config.json
	model.txt
	model_trainable_params.txt
	non_lora_trainables.bin
	saved_config.json
	trainer_state.json
	checkpoint-960/
	  config.json
	  finetune_weights.bin
	```

	Intermediate epoch checkpoints and TensorBoard logs are not part of the release package.
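
After downloading, a quick completeness check can catch a partial copy. This is a minimal sketch using only the standard library. The file list is copied from this README, and placing the last two entries under `checkpoint-960/` is an assumption based on the repeated `config.json`. The helper `missing_release_files` is hypothetical.

```python
from pathlib import Path
from typing import List, Union

# Copied from the README listing; the checkpoint-960/ nesting is assumed.
EXPECTED_FILES = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "config.json",
    "model.txt",
    "model_trainable_params.txt",
    "non_lora_trainables.bin",
    "saved_config.json",
    "trainer_state.json",
    "checkpoint-960/config.json",
    "checkpoint-960/finetune_weights.bin",
]


def missing_release_files(root: Union[str, Path]) -> List[str]:
    """Return the expected files that are absent under `root`, in listing order."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_release_files("path/to/download")` returns an empty list when every listed file is present.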

	## Training Data

	The model was trained on MQ-RAVSBench with:

	```text
	train_test_meta_files/metadata.csv
	train_test_meta_files/train_audit_only_filtered.json
	```

	`null` masks are used during training as empty-mask examples. They are excluded from the default test-time evaluation protocol and from the reported results.

	## Evaluation

	Evaluation is reported on the seen (`s`) and unseen (`u`) MQ-RAVSBench test splits, each with an image and a video variant:

	```text
	test_s_image_filtered.json
	test_u_image_filtered.json
	test_s_video_filtered.json
	test_u_video_filtered.json
	```
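
The four file names follow a regular pattern, which makes it easy to loop over the evaluation grid. The sketch below maps each name to a `(split, modality)` pair, assuming `s` and `u` stand for the seen and unseen splits named above. The helper `describe_test_file` is hypothetical, and the JSON schema inside the files is not covered.

```python
import re

TEST_FILES = [
    "test_s_image_filtered.json",
    "test_u_image_filtered.json",
    "test_s_video_filtered.json",
    "test_u_video_filtered.json",
]

# Assumed expansion of the one-letter split codes.
SPLIT_NAMES = {"s": "seen", "u": "unseen"}
FILE_PATTERN = re.compile(
    r"test_(?P<split>[su])_(?P<modality>image|video)_filtered\.json"
)


def describe_test_file(filename: str) -> tuple:
    """Map a test metadata file name to a (split, modality) pair."""
    match = FILE_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected test file name: {filename!r}")
    return SPLIT_NAMES[match["split"]], match["modality"]


grid = {name: describe_test_file(name) for name in TEST_FILES}
for name, (split, modality) in grid.items():
    print(f"{name}: split={split}, modality={modality}")
```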

	Reported mask types focus on non-empty candidate masks: `perfect`, `cutout`, `erode`, `dilate`, `merge`, and `full_neg`.
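
When reproducing the reported numbers, records with other mask types (for example the `null` masks used in training) would need to be filtered out first. The sketch below shows one way to do that. The `mask_type` key and the sample records are hypothetical, since the annotation schema is not described here.

```python
from collections import Counter

REPORTED_MASK_TYPES = ("perfect", "cutout", "erode", "dilate", "merge", "full_neg")


def reported_only(records):
    """Keep only records whose mask type belongs to the reported set."""
    return [r for r in records if r.get("mask_type") in REPORTED_MASK_TYPES]


# Hypothetical records; the real annotation files may use different keys.
records = [
    {"mask_type": "perfect"},
    {"mask_type": "null"},
    {"mask_type": "erode"},
    {"mask_type": "erode"},
]
kept = reported_only(records)
counts = Counter(r["mask_type"] for r in kept)
print(dict(counts))
```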

	## License

	The released MQ-Auditor weights are provided for non-commercial research purposes only under CC BY-NC-SA 4.0-style terms. The weights depend on the Llama-2 base model and other pretrained encoders, so users must also comply with the applicable upstream model licenses and access terms.

	## Citation

	```bibtex
	@article{zhou2026audit,
	  title={Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation},
	  author={Zhou, Jinxing and Zhou, Yanghao and Wang, Yaoting and Han, Zongyan and Ma, Jiaqi and Ding, Henghui and Anwer, Rao Muhammad and Cholakkal, Hisham},
	  journal={arXiv preprint arXiv:2602.03892},
	  year={2026}
	}
	```

	Paper: https://arxiv.org/pdf/2602.03892