Files changed (1)
  1. README.md +114 -3
README.md CHANGED
@@ -1,3 +1,114 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ language: en
3
+ tags:
4
+ - multiple-instance-learning
5
+ - fraud-detection
6
+ - risk-assessment
7
+ - anomaly-detection
8
+ - graph-transformer
9
+ - interpretability
10
+ license: apache-2.0
11
+ library_name: pytorch
12
+ ---
13
+
14
+ # AC-MIL
15
+
16
+ AC-MIL (Action-Aware Capsule Multiple Instance Learning for Live-Streaming Room Risk Assessment) is a weakly supervised model for **room-level risk assessment** on live-streaming platforms. It is designed for scenarios where only **binary room-level labels** are available, while risk evidence is often **sparse, localized, and manifested through coordinated behaviors** across users and time.
17
+
18
+ AC-MIL formulates each live room as a **Multiple Instance Learning (MIL)** bag, where each instance is a **user–timeslot capsule**—a short action subsequence performed by a particular user within a fixed time window. The model produces:
19
+ - a **room-level risk score**, indicating the probability that the room is risky, and
20
+ - **capsule-level attributions**, providing interpretable evidence by highlighting suspicious user–time segments that contribute most to the prediction.
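As a rough illustration of how the capsule-level attributions might be consumed downstream, the sketch below ranks capsules by attribution weight and keeps the top few as evidence. The function name `top_k_capsules`, the `(u_idx, t)` keying, and the scores are all hypothetical and are not part of the released model.

```python
from typing import Dict, List, Tuple

# A capsule is identified here by (user index, timeslot index); this keying is an assumption.
Capsule = Tuple[int, int]

def top_k_capsules(attributions: Dict[Capsule, float], k: int = 3) -> List[Tuple[Capsule, float]]:
    """Return the k capsules with the largest attribution, highest first."""
    ranked = sorted(attributions.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Illustrative attribution weights only, not real model output.
attributions = {(0, 0): 0.05, (1, 2): 0.40, (2, 2): 0.35, (3, 1): 0.20}
evidence = top_k_capsules(attributions, k=2)
```

Here `evidence` holds the two highest-weighted user–time segments, which a reviewer could inspect first.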
21
+
22
+
23
+ ---
24
+
25
+ ## Key idea
26
+
27
+ Given a room’s action stream, we construct a 2D grid of capsules over **users × timeslots**. Each capsule summarizes localized behavioral patterns within a specific user–time window. AC-MIL then models:
28
+ - **temporal dynamics**: how users’ behaviors evolve over time,
29
+ - **cross-user dependencies**: interactions between viewers and the streamer, as well as coordination patterns among viewers,
30
+ - **multi-level signals**: evidence captured at the action, capsule, user, and timeslot levels,
31
+ and fuses these signals to produce robust room-level risk predictions.
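The grid construction described above can be sketched as a simple grouping step. This is a minimal sketch, assuming each action is a dict carrying the `u_idx`, `t`, `timestamp`, and `action_id` fields; the helper name `build_capsule_grid` is hypothetical, and the real pipeline may pad or truncate capsules differently.

```python
from collections import defaultdict

def build_capsule_grid(actions):
    """Group actions into capsules keyed by (u_idx, t), keeping time order inside each capsule."""
    grid = defaultdict(list)
    for action in sorted(actions, key=lambda a: a["timestamp"]):
        grid[(action["u_idx"], action["t"])].append(action["action_id"])
    return dict(grid)

# Illustrative actions only; the values are made up.
actions = [
    {"u_idx": 0, "t": 0, "timestamp": 1.0, "action_id": 5},
    {"u_idx": 1, "t": 0, "timestamp": 2.5, "action_id": 2},
    {"u_idx": 1, "t": 0, "timestamp": 0.5, "action_id": 1},
    {"u_idx": 1, "t": 1, "timestamp": 7.0, "action_id": 3},
]
grid = build_capsule_grid(actions)
```

Each key of `grid` is one cell of the `users × timeslots` grid, and each value is the short action subsequence that forms the capsule.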
32
+
33
+ ---
34
+
35
+ ## Architecture overview
36
+
37
+ AC-MIL follows a hierarchical serial–parallel design:
38
+
39
+ <p align="center">
40
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/671b2ede3fd1d03dc687c641/Vry7GWMLjGKLsCBqq5B6g.png" width="80%"/>
41
+ </p>
42
+
43
+
44
+ 1. **Action Field Encoder**
45
+ - Encodes the full action sequence with a Transformer to produce contextualized action embeddings.
46
+ - Produces an action-level room representation via a learnable `[CLS]` token.
47
+
48
+ 2. **Capsule Constructor**
49
+ - Partitions actions into **user–timeslot capsules**.
50
+ - Encodes each capsule with an LSTM (final hidden state as capsule embedding).
51
+
52
+ 3. **Relational Capsule Reasoner**
53
+ - Builds an **adaptive relation-aware graph** over capsules using semantic similarity and relation masks.
54
+ - Runs a **graph-aware Transformer** to refine capsule embeddings.
55
+ - Provides **capsule-level interpretability** via `[CLS] → capsule` attention.
56
+
57
+ 4. **Dual-View Integrator**
58
+ - **User-view**: GRU over each user’s capsule sequence and attention pooling across users.
59
+ - **Timeslot-view**: attention pooling within each timeslot and GRU across timeslots.
60
+
61
+ 5. **Cross-Level Risk Decoder**
62
+ - Learns gates over multi-level room representations.
63
+ - Produces the final room embedding and risk score.
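To make the gating step of the decoder more concrete, here is a minimal sketch of one possible form of gated fusion: softmax gates over several room representations, a weighted sum, and a sigmoid risk score. The exact gating function used by AC-MIL is not specified here, so the softmax form, the fixed `gate_logits`, and the linear scoring weights `w` are all assumptions; in a trained model the gates would be learned.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gated_fusion(reps, gate_logits, w, b=0.0):
    """Fuse room representations with softmax gates, then score with a sigmoid."""
    gates = softmax(gate_logits)
    dim = len(reps[0])
    fused = [sum(g * r[i] for g, r in zip(gates, reps)) for i in range(dim)]
    logit = sum(wi * fi for wi, fi in zip(w, fused)) + b
    score = 1.0 / (1.0 + math.exp(-logit))
    return fused, gates, score

# Four toy 2-d representations standing in for the action, capsule, user, and timeslot levels.
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
fused, gates, score = gated_fusion(reps, gate_logits=[0.0, 0.0, 0.0, 0.0], w=[1.0, -1.0])
```

With equal gate logits the fused embedding is simply the mean of the inputs; unequal logits would let one level dominate the room embedding.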
64
+
65
+
66
+ ---
67
+
68
+
69
+ ## Input / output specification
70
+
71
+ ### Input (conceptual)
72
+
73
+ Each dataset sample corresponds to a live room `room_id` with a room-level label `room_label ∈ {0,1,2,3}`, which is treated as binary (`room_label > 0` means risky).
74
+ A room is represented as an **action sequence** `patch_list = {α_i}` sorted by `(user, time)`, where each action follows the paper’s definition:
75
+ `α = (u, t, a, x)` (user, timestamp, action type, and optional textual/multimodal feature).
76
+
77
+ In our May/June datasets, each action record is stored with the following fields:
78
+
79
+ - `u_idx` (int): user index within the room, used to build the `users × timeslots` grid (e.g., 0 = streamer, 1..U = selected viewers).
80
+ - `global_user_idx` (int/str): global user identifier across the whole dataset (before remapping to `u_idx`).
81
+ - `timestamp` (int/float): the action timestamp (the `t` in `α = (u, t, a, x)`). In the formulation, timestamps lie within a window `[0, T]` after the room starts.
82
+ - `t` (int): timeslot index derived by discretizing `timestamp` into fixed-length windows. This is the column index when constructing the `users × timeslots` capsule grid.
83
+ - `l` (int): role indicator (recommended convention: `0 = viewer`, `1 = streamer`).
84
+ - `action_id` (int): the action type id `a` (e.g., enter, comment, like, gift, share; streamer-side actions may include stream start, ASR text, OCR text, etc.).
85
+ - `action_desc` (str / null): raw textual content associated with the action (e.g., comment text, ASR transcript, OCR text).
86
+ - `action_vec` (numpy array): pre-encoded feature vector for `action_desc`.
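The relationship between `timestamp` and the timeslot index `t` can be sketched as a floor division by the window length. This is only a sketch: the window length (`slot_len`), the horizon `T` (`horizon`), and the choice to clamp the final boundary into the last slot are assumptions, since the actual discretization settings are not stated here.

```python
import math

def timeslot_index(timestamp, slot_len, horizon):
    """Map a timestamp in [0, horizon] to a fixed-length timeslot index."""
    if not 0 <= timestamp <= horizon:
        raise ValueError("timestamp must lie within [0, horizon]")
    n_slots = math.ceil(horizon / slot_len)
    # Clamp so that timestamp == horizon falls into the last slot (an assumption).
    return min(int(timestamp // slot_len), n_slots - 1)

# Hypothetical settings: 3-unit slots over a 30-unit window.
first = timeslot_index(0, slot_len=3, horizon=30)
middle = timeslot_index(4, slot_len=3, horizon=30)
last = timeslot_index(30, slot_len=3, horizon=30)
```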
87
+
88
+
89
+ Example (JSONL-like):
90
+ ```json
91
+ {
92
+ "room_id": "1",
93
+ "room_label": "2",
94
+ "patch_list": [
95
+ (u_idx, t, l, action_id, action_vec, timestamp, action_desc, user_id),
96
+ (0, 1, 0, 5, [0.0, 0.3, ...], 4, "Streamer speech: ...", 5415431),
97
+ ...
98
+ ]
99
+ }
100
+ ```
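For readers loading such records, the sketch below unpacks each `patch_list` tuple into named fields and derives the binary risk label. It assumes the tuple order shown in the example and that `room_label` may arrive as a string; the `Action` class and `parse_room` helper are hypothetical names, not part of any released loader.

```python
from typing import NamedTuple, Optional, Sequence

class Action(NamedTuple):
    """One action record, in the tuple order used by the example above."""
    u_idx: int
    t: int
    l: int
    action_id: int
    action_vec: Sequence[float]
    timestamp: float
    action_desc: Optional[str]
    user_id: int

def parse_room(sample):
    """Return the room's actions and a binary label (1 if room_label > 0)."""
    actions = [Action(*record) for record in sample["patch_list"]]
    is_risky = int(int(sample["room_label"]) > 0)
    return actions, is_risky

# Toy sample with a shortened feature vector; values are illustrative only.
sample = {
    "room_id": "1",
    "room_label": "2",
    "patch_list": [(0, 1, 0, 5, [0.0, 0.3], 4, "streamer speech", 5415431)],
}
actions, is_risky = parse_room(sample)
```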
101
+
102
+ ---
103
+
104
+ ## Intended use
105
+
106
+ **Primary use cases**
107
+ - Early detection of risky rooms (fraud, collusion, policy-violating coordinated behaviors)
108
+ - Evidence-based moderation: highlight localized suspicious segments (user–time capsules)
109
+
110
+ **Out of scope**
111
+ - Identifying or tracking specific individuals
112
+ - Any use that violates privacy laws, platform policies, or user consent requirements
113
+
114
+ ---