---
base_model: meta-llama/Llama-2-7b-chat-hf
library_name: peft
license: cc-by-nc-sa-4.0
tags:
- audio
- video
- segmentation
- mask-quality-assessment
- audio-visual-segmentation
- lora
---

# MQ-Auditor HyperLoRA Weights

This repository contains the released MQ-Auditor pretrained weights for reference-free mask quality assessment in language-referred audio-visual segmentation.

The checkpoint corresponds to:

```text
epochs96_lr1e-4_bs4_gradacc8_lora_r32alpha64_pos0.5_ioulosswei0
```

## Model

MQ-Auditor takes a video clip, audio, a referring expression, a frame, and a candidate segmentation mask, then predicts mask quality attributes such as mask type, IoU, and recommended action.

The released weights are intended to be used with the MQ-Auditor codebase and MQ-RAVSBench dataset. The base LLM checkpoint and external encoders are not included in this package.

## Release Contents

The public weight package should include:

```text
adapter_config.json
adapter_model.safetensors
config.json
model.txt
model_trainable_params.txt
non_lora_trainables.bin
saved_config.json
trainer_state.json
checkpoint-960/
  config.json
  finetune_weights.bin
```

Intermediate epoch checkpoints and TensorBoard logs are not part of the release package.

## Training Data

The model was trained on MQ-RAVSBench with:

```text
train_test_meta_files/metadata.csv
train_test_meta_files/train_audit_only_filtered.json
```

`null` masks are used during training as empty-mask examples. They are not part of the default/reported test-time evaluation protocol.

## Evaluation

Evaluation is reported on the seen and unseen MQ-RAVSBench test splits:

```text
test_s_image_filtered.json
test_u_image_filtered.json
test_s_video_filtered.json
test_u_video_filtered.json
```

Reported mask types focus on non-empty candidate masks: `perfect`, `cutout`, `erode`, `dilate`, `merge`, and `full_neg`.

## License

The released MQ-Auditor weights are provided for non-commercial research purposes only under CC BY-NC-SA 4.0-style terms.
The weights depend on the Llama-2 base model and other pretrained encoders, so users must also comply with the applicable upstream model licenses and access terms.

## Citation

```bibtex
@article{zhou2026audit,
  title={Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation},
  author={Zhou, Jinxing and Zhou, Yanghao and Wang, Yaoting and Han, Zongyan and Ma, Jiaqi and Ding, Henghui and Anwer, Rao Muhammad and Cholakkal, Hisham},
  journal={arXiv preprint arXiv:2602.03892},
  year={2026}
}
```

Paper: https://arxiv.org/pdf/2602.03892
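
As a convenience for anyone downloading the package, the file list under Release Contents can be checked against a local copy. The following is a minimal sketch, not part of the MQ-Auditor codebase: the helper name `missing_release_files` is hypothetical, and it assumes that the second `config.json` and `finetune_weights.bin` sit inside `checkpoint-960/`.

```python
from pathlib import Path

# Expected files, copied from the Release Contents list in this card.
# Assumption: the last two entries live inside checkpoint-960/.
EXPECTED_FILES = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "config.json",
    "model.txt",
    "model_trainable_params.txt",
    "non_lora_trainables.bin",
    "saved_config.json",
    "trainer_state.json",
    "checkpoint-960/config.json",
    "checkpoint-960/finetune_weights.bin",
]


def missing_release_files(root):
    """Return the expected release files that are absent under `root`.

    Hypothetical helper for illustration only; `root` is the local
    directory the weight package was downloaded to.
    """
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_release_files("path/to/weights")` with the local download directory returns an empty list when every listed file is present, and otherwise the relative paths that were not found.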