This model combines text and video features using Late Fusion with Cross-Attention to classify TikTok content as safe or harmful.
Text Backbone (XLM-RoBERTa) → Text Features (768-dim)
Video Backbone (VideoMAE) → Video Features (768-dim)
↓
Cross-Attention Fusion
↓
Gating Mechanism
↓
Classifier (2 classes)
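The fusion path in this diagram (cross-attention, then a gate, then a 2-class classifier) could look roughly like the sketch below. This is not the repository's LateFusionModel: the class name `CrossAttentionGatedFusion`, the head count, the sigmoid gate, the mean pooling, and the assumption that both backbones return token-level sequences of 768-dim vectors are all illustrative guesses.

```python
import torch
import torch.nn as nn


class CrossAttentionGatedFusion(nn.Module):
    """Illustrative sketch only; NOT the repository's LateFusionModel."""

    def __init__(self, dim: int = 768, num_heads: int = 8, num_classes: int = 2):
        super().__init__()
        # Each modality queries the other one (bidirectional cross-attention).
        self.text_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Assumed gate: per-dimension weights in (0, 1) that mix the two streams.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_seq: torch.Tensor, video_seq: torch.Tensor) -> torch.Tensor:
        # text_seq: (batch, text_len, dim); video_seq: (batch, video_len, dim)
        t_attn, _ = self.text_to_video(query=text_seq, key=video_seq, value=video_seq)
        v_attn, _ = self.video_to_text(query=video_seq, key=text_seq, value=text_seq)
        # Mean-pool each attended sequence to one vector per sample.
        t_vec = t_attn.mean(dim=1)
        v_vec = v_attn.mean(dim=1)
        # Gated mix of the two pooled vectors, then classification.
        g = self.gate(torch.cat([t_vec, v_vec], dim=-1))
        fused = g * t_vec + (1.0 - g) * v_vec
        return self.classifier(fused)


model = CrossAttentionGatedFusion().eval()
with torch.no_grad():
    # Random stand-ins for backbone outputs: 4 samples, 16 text tokens, 8 video tokens.
    logits = model(torch.randn(4, 16, 768), torch.randn(4, 8, 768))
print(tuple(logits.shape))  # (4, 2)
```

The gate lets the model lean on whichever modality is more informative for a given sample; the actual checkpoint may implement the fusion and gate differently, so check fusion_config.json and the original class before relying on this layout.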
This is a custom model. You need to download and use the LateFusionModel class:
import json

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download model weights and fusion config
repo_id = "KhoiBui/tiktok-multimodal-fusion-classifier"
weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")
config_path = hf_hub_download(repo_id=repo_id, filename="fusion_config.json")

# Load config
with open(config_path) as f:
    config = json.load(f)

# Initialize and load model (using LateFusionModel class from your codebase)
# model = LateFusionModel(config)
# model.load_state_dict(load_file(weights_path))
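Once the model is loaded, its 2-class output still has to be turned into a label. The sketch below shows one way to do that with plain Python. The label order (`safe` at index 0, `harmful` at index 1) and the helper names `softmax` and `decode` are assumptions for illustration; confirm the real mapping against fusion_config.json or the training code.

```python
import math

# ASSUMPTION: index 0 = "safe", index 1 = "harmful". The model card does not state the order.
ASSUMED_LABELS = ("safe", "harmful")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=ASSUMED_LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for one video, e.g. from model(...)[0].tolist()
label, prob = decode([0.2, 1.3])
print(label, round(prob, 3))
```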