koreashin committed · Commit 5a64d6e · verified · 1 Parent(s): e161a98

Upload 6 files

Files changed (6)
  1. LICENSE +190 -0
  2. README.md +301 -1
  3. baramnuri_beta.pth +3 -0
  4. config.json +92 -0
  5. model.py +365 -0
  6. requirements.txt +3 -0
LICENSE ADDED
@@ -0,0 +1,190 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to the Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ Copyright 2025 C-Team
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md CHANGED
@@ -1,3 +1,303 @@
  ---
- license: apache-2.0
  ---
+ # BaramNuri (바람누리) - Driver Behavior Detection Model
+
+ <div align="center">
+
+ **바람누리 (BaramNuri)** | *Wind that watches over the world*
+
+ A lightweight AI model for detecting abnormal driver behavior
+
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+ [![Python](https://img.shields.io/badge/Python-3.8+-green.svg)](https://python.org)
+ [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org)
+
+ </div>
+
  ---
+
+ ## Model Description
+
+ **BaramNuri (바람누리)** is a lightweight deep learning model that detects abnormal driver behavior in real time from in-vehicle camera footage.
+
+ ### Key Features
+
+ - **Lightweight**: **49% fewer parameters** (14.20M) than the teacher model (27.86M)
+ - **High performance**: retains **98% of the teacher's accuracy** through knowledge distillation
+ - **Real time**: deployable on edge devices (INT8: ~13 MB)
+ - **Five classes**: normal (정상), drowsy driving (졸음운전), searching for an object (물건찾기), phone use (휴대폰 사용), driver assault (운전자 폭행)
+
  ---
+
+ ## Architecture
+
+ ```
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                     BaramNuri Architecture                      │
+ ├─────────────────────────────────────────────────────────────────┤
+ │                                                                 │
+ │   Input: [B, 3, 30, 224, 224]  (1-second clip, 30 fps)          │
+ │                      │                                          │
+ │                      ▼                                          │
+ │   ┌─────────────────────────────────────┐                       │
+ │   │   Video Swin-T (Stage 1-3)          │  ← Kinetics-400       │
+ │   │   Shifted Window Attention          │     Pretrained        │
+ │   │   Output: 384 dim features          │                       │
+ │   └─────────────────────────────────────┘                       │
+ │                      │                                          │
+ │                      ▼                                          │
+ │   ┌─────────────────────────────────────┐                       │
+ │   │   Selective SSM Block (x2)          │  ← Mamba-style        │
+ │   │   - 1D Conv for local context       │     Temporal          │
+ │   │   - Selective state space           │     Modeling          │
+ │   │   - Input-dependent B, C, delta     │                       │
+ │   └─────────────────────────────────────┘                       │
+ │                      │                                          │
+ │                      ▼                                          │
+ │   ┌─────────────────────────────────────┐                       │
+ │   │   Classification Head               │                       │
+ │   │   LayerNorm → Dropout → Linear      │                       │
+ │   └─────────────────────────────────────┘                       │
+ │                      │                                          │
+ │                      ▼                                          │
+ │   Output: [B, 5]  (5-class logits)                              │
+ │                                                                 │
+ └─────────────────────────────────────────────────────────────────┘
+ ```
+
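As a rough check of the tensor shapes in the architecture diagram, the sketch below traces one clip through the patch embedding and the two patch-merging steps that precede Stage 3. It assumes the torchvision `swin3d_t` defaults (patch size 2x4x4, embedding width 96, each merge halving height and width and doubling channels); `trace_shapes` is a hypothetical helper and is not part of `model.py`.

```python
def trace_shapes(frames=30, height=224, width=224,
                 patch=(2, 4, 4), embed_dim=96, merges=2):
    """Return the (T, H, W, C) token grid seen by each Swin stage."""
    # Patch embedding divides each axis by the patch size (assumed defaults).
    t, h, w = frames // patch[0], height // patch[1], width // patch[2]
    c = embed_dim
    shapes = [(t, h, w, c)]              # Stage 1 works at this resolution
    for _ in range(merges):              # one merge before Stage 2, one before Stage 3
        h, w, c = h // 2, w // 2, c * 2  # halve space, double channels; time untouched
        shapes.append((t, h, w, c))
    return shapes


for stage, shape in enumerate(trace_shapes(), start=1):
    print(f"Stage {stage}: (T, H, W, C) = {shape}")
```

Under these assumptions Stage 3 emits a 15 x 14 x 14 grid of 384-dimensional tokens, so after spatial averaging the SSM block would see 15 temporal steps per clip rather than 30.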
+ ### Why This Architecture?
+
+ | Component | Purpose | Benefit |
+ |-----------|---------|---------|
+ | **Video Swin (Stage 1-3)** | Spatial feature extraction | Proven performance on video |
+ | **Stage 4 Removal** | 55% parameter reduction | Lightweight without quality loss |
+ | **Selective SSM** | Temporal modeling | O(n) complexity vs O(n²) attention |
+ | **Knowledge Distillation** | Performance retention | Learn from larger teacher model |
+
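The O(n) entry for the Selective SSM row follows from its recurrence: every time step updates a fixed-size state exactly once. The sketch below runs that recurrence for a single channel in plain Python. The name `selective_scan` and the toy inputs are illustrative assumptions; this is not the batched implementation in `model.py`, only the same update rule written out.

```python
import math


def selective_scan(x, delta, B, C, A, D=1.0):
    """Single-channel selective scan.

    x, delta: length-T lists (input and positive step size per time step).
    B, C:     length-T lists of length-N lists (input-dependent projections).
    A:        length-N list of negative floats (diagonal state matrix).
    """
    n_state = len(A)
    h = [0.0] * n_state
    ys = []
    for t in range(len(x)):                    # one pass over time: O(T)
        for n in range(n_state):
            a_bar = math.exp(delta[t] * A[n])  # discretised decay, in (0, 1)
            h[n] = a_bar * h[n] + B[t][n] * x[t]
        # Read out the state and add the skip connection D * x
        ys.append(sum(C[t][n] * h[n] for n in range(n_state)) + D * x[t])
    return ys


# Toy run: an impulse at t=0 decays through a 2-dimensional state
out = selective_scan(
    x=[1.0, 0.0, 0.0],
    delta=[0.5, 0.5, 0.5],
    B=[[1.0, 1.0]] * 3,
    C=[[1.0, 1.0]] * 3,
    A=[-1.0, -2.0],
)
print([round(v, 4) for v in out])
```

Because `delta`, `B` and `C` vary per time step, the model can choose how quickly each input fades from the state, which is the "selective" part of the design.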
+ ---
+
+ ## Performance
+
+ ### Classification Metrics
+
+ | Metric | Score |
+ |--------|-------|
+ | **Accuracy** | 96.17% |
+ | **Macro F1** | 0.9504 |
+ | **Precision** | 0.95 |
+ | **Recall** | 0.95 |
+
+ ### Per-Class Performance
+
+ | Class | Precision | Recall | F1-Score |
+ |-------|:---------:|:------:|:--------:|
+ | 정상 (Normal) | 0.93 | 0.93 | 0.93 |
+ | 졸음운전 (Drowsy) | 0.98 | 0.97 | 0.97 |
+ | 물건찾기 (Searching) | 0.93 | 0.95 | 0.94 |
+ | 휴대폰 사용 (Phone) | 0.94 | 0.93 | 0.94 |
+ | 운전자 폭행 (Assault) | 0.99 | 0.99 | 0.99 |
+
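Macro F1 is the unweighted mean of the per-class F1 scores, so the two tables above can be cross-checked. The sketch below recomputes it from the rounded per-class values; the English dictionary keys are shorthand chosen here, and the tolerance assumes each published score was rounded to two decimals.

```python
def macro_f1(per_class_scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_scores.values()) / len(per_class_scores)


per_class = {
    "normal": 0.93,
    "drowsy": 0.97,
    "searching": 0.94,
    "phone": 0.94,
    "assault": 0.99,
}
reported = 0.9504
recomputed = macro_f1(per_class)

# Two-decimal rounding can shift each score, and hence the mean, by up to 0.005.
consistent = abs(recomputed - reported) <= 0.005
print(f"recomputed macro F1: {recomputed:.4f}, consistent with {reported}: {consistent}")
```

The recomputed mean differs slightly from the reported figure, which is expected when the inputs are rounded; the check only shows the two tables do not contradict each other.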
+ ### Comparison with Teacher
+
+ | Metric | Teacher | BaramNuri | Comparison |
+ |--------|---------|-----------|------------|
+ | **Parameters** | 27.86M | 14.20M | **-49%** |
+ | **Model Size (FP32)** | ~106 MB | ~54 MB | **-49%** |
+ | **Model Size (INT8)** | ~26 MB | ~13 MB | **-50%** |
+ | **Accuracy** | 98.05% | 96.17% | 98.1% retained |
+ | **Macro F1** | 0.9757 | 0.9504 | 97.4% retained |
+
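The comparison column can be reproduced with a few lines of arithmetic. The sketch below uses the exact student parameter count from `config.json` (14,203,205) and the rounded teacher count from the table; it assumes the size figures count 4 bytes per FP32 parameter and are expressed in MiB, which is a guess about how the table was produced.

```python
TEACHER_PARAMS = 27.86e6      # rounded value from the table
STUDENT_PARAMS = 14_203_205   # total_parameters in config.json


def reduction(teacher, student):
    """Fraction removed relative to the teacher."""
    return 1 - student / teacher


def retention(teacher, student):
    """Fraction of the teacher's score kept by the student."""
    return student / teacher


param_cut = reduction(TEACHER_PARAMS, STUDENT_PARAMS)
acc_kept = retention(98.05, 96.17)
f1_kept = retention(0.9757, 0.9504)
fp32_mib = STUDENT_PARAMS * 4 / 2**20   # 4 bytes per FP32 weight (assumption)

print(f"parameter reduction: {param_cut:.1%}")
print(f"accuracy retained:   {acc_kept:.1%}")
print(f"macro F1 retained:   {f1_kept:.1%}")
print(f"FP32 weights:        {fp32_mib:.1f} MiB")
```

The results line up with the table's -49%, 98.1% and 97.4% entries and with the ~54 MB FP32 estimate.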
+ ---
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install torch torchvision
+ ```
+
+ ### Inference
+
+ ```python
+ import torch
+ from model import BaramNuri
+
+ # Load model
+ model = BaramNuri(num_classes=5, pretrained=False)
+ checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model.eval()
+
+ # Prepare input (1 second video, 30fps, 224x224)
+ # Shape: [batch, channels, frames, height, width]
+ video = torch.randn(1, 3, 30, 224, 224)
+
+ # Inference
+ with torch.no_grad():
+     logits = model(video)
+     probs = torch.softmax(logits, dim=-1)
+     pred_class = probs.argmax(dim=-1).item()
+
+ # Class names
+ class_names = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
+ print(f"Predicted: {class_names[pred_class]} ({probs[0, pred_class]:.2%})")
+ ```
+
+ ### With Prediction Helper
+
+ ```python
+ # Single prediction with confidence
+ result = model.predict(video)
+ print(f"Class: {result['class_name']}")
+ print(f"Confidence: {result['confidence']:.2%}")
+ ```
+
+ ---
+
+ ## Input Specification
+
+ | Parameter | Value |
+ |-----------|-------|
+ | **Format** | `[B, C, T, H, W]` (BCTHW) |
+ | **Channels** | 3 (RGB) |
+ | **Frames** | 30 (1 second at 30fps) |
+ | **Resolution** | 224 x 224 |
+ | **Normalization** | ImageNet mean/std |
+
+ ### Preprocessing
+
+ ```python
+ from torchvision import transforms
+
+ # Applied to each frame (PIL image) separately; stack the results along
+ # the time axis to build the [C, T, H, W] clip
+ transform = transforms.Compose([
+     transforms.Resize((224, 224)),
+     transforms.ToTensor(),
+     transforms.Normalize(
+         mean=[0.485, 0.456, 0.406],
+         std=[0.229, 0.224, 0.225]
+     ),
+ ])
+ ```
+
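The input specification fixes 30 frames per one-second clip, so footage recorded at another frame rate has to be resampled before the per-frame transform is applied. The sketch below shows one possible way to choose frame indices. `sample_frame_indices` is a hypothetical helper, and the nearest-earlier-frame strategy is an assumption rather than the authors' documented pipeline.

```python
def sample_frame_indices(source_fps, num_frames=30, start_frame=0):
    """Pick `num_frames` evenly spaced frame indices covering one second.

    Frames are skipped when the source is faster than the target rate and
    repeated when it is slower.
    """
    step = source_fps / num_frames
    last = int(source_fps) - 1            # last frame inside the one-second window
    return [start_frame + min(int(i * step), last) for i in range(num_frames)]


print(sample_frame_indices(60)[:5])   # every second frame of a 60 fps source
print(sample_frame_indices(15)[:5])   # each frame of a 15 fps source used twice
```

Repeating frames keeps the tensor shape valid but does not add temporal information, so accuracy on low-frame-rate sources may still differ from the reported numbers.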
+ ---
+
+ ## Training Details
+
+ ### Knowledge Distillation
+
+ ```
+ Teacher: Video Swin-T (27.86M, 98.05% acc)
+
+ │ Soft Labels (Temperature=4.0)
+
+ Student: BaramNuri (14.20M)
+
+ │ L = 0.5 * L_hard + 0.5 * L_soft
+
+ Result: 96.17% acc (98% of teacher performance)
+ ```
+
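The combined loss `L = 0.5 * L_hard + 0.5 * L_soft` can be written out for a single sample as follows. This is a framework-free sketch, not the training code: `distillation_loss` is a hypothetical function, the hard term is taken to be cross-entropy and the soft term KL divergence as listed in the training configuration, and the `T**2` scaling of the soft term is the common convention from Hinton et al. that the source does not confirm was used.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    # Hard term: cross-entropy of the student against the ground-truth label
    hard = -math.log(softmax(student_logits)[label])

    # Soft term: KL(teacher || student) on temperature-softened distributions
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    soft = temperature ** 2 * kl   # T^2 scaling is an assumption, see lead-in

    return alpha * hard + (1 - alpha) * soft


student = [2.0, 0.5, 0.1, -1.0, 0.3]
teacher = [3.0, 0.2, 0.0, -2.0, 0.1]
print(f"loss = {distillation_loss(student, teacher, label=0):.4f}")
```

With `alpha=0.5` both terms carry equal weight, matching the `Alpha (hard/soft)` entry in the training configuration.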
+ ### Training Configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Optimizer | AdamW |
+ | Learning Rate | 1e-4 |
+ | Weight Decay | 0.05 |
+ | Batch Size | 96 (effective) |
+ | Epochs | 6 |
+ | Loss | CE + KL Divergence |
+ | Temperature | 4.0 |
+ | Alpha (hard/soft) | 0.5 |
+
+ ---
+
+ ## Deployment
+
+ ### Server Deployment (GPU)
+
+ ```python
+ import torch
+ from model import BaramNuri
+
+ # pretrained=False skips the Kinetics-400 download; the checkpoint supplies all weights
+ model = BaramNuri(num_classes=5, pretrained=False)
+ checkpoint = torch.load('baramnuri_beta.pth', map_location='cpu')
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model = model.cuda().eval()
+
+ # FP16 for faster inference (inputs must also be converted with .half())
+ model = model.half()
+ ```
+
+ ### Edge Deployment (INT8 Quantization)
+
+ ```python
+ import torch
+ import torch.quantization as quant
+
+ # `model` must be the FP32 model on CPU, not the .half() GPU copy
+ model_int8 = quant.quantize_dynamic(
+     model, {torch.nn.Linear}, dtype=torch.qint8
+ )
+ # Model size: ~13MB
+ ```
+
+ ### ONNX Export
+
+ ```python
+ import torch
+
+ # `model` is a BaramNuri instance in eval mode
+ dummy_input = torch.randn(1, 3, 30, 224, 224)
+ torch.onnx.export(
+     model, dummy_input, "baramnuri.onnx",
+     input_names=['video'],
+     output_names=['logits'],
+     dynamic_axes={'video': {0: 'batch'}}
+ )
+ ```
+
+ ---
+
+ ## Use Cases
+
+ 1. **Fleet Management**: Monitor driver behavior in commercial vehicles
+ 2. **Insurance Telematics**: Risk assessment based on driving behavior
+ 3. **ADAS Integration**: Advanced driver assistance systems
+ 4. **Safety Research**: Analyze driving patterns and fatigue
+
+ ---
+
+ ## Limitations
+
+ - Trained on Korean driving environment data
+ - Requires frontal camera facing the driver
+ - Optimal performance at 30fps input
+ - May require fine-tuning for different camera angles
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{baramnuri2025,
+   title={BaramNuri: Lightweight Driver Behavior Detection with Knowledge Distillation},
+   author={C-Team},
+   year={2025},
+   howpublished={\url{https://huggingface.co/c-team/baramnuri-beta}}
+ }
+ ```
+
+ ---
+
+ ## License
+
+ This model is released under the [Apache 2.0 License](LICENSE).
+
+ ---
+
+ ## Acknowledgments
+
+ - Video Swin Transformer: Liu et al. (CVPR 2022)
+ - Knowledge Distillation: Hinton et al. (2015)
+ - Mamba: Gu & Dao (2023); S4: Gu et al. (2022)
+
+ ---
+
+ <div align="center">
+
+ **BaramNuri (바람누리)** - AI for safer driving
+
+ Made with care by C-Team
+
+ </div>
baramnuri_beta.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb367c1643c1d53b08f703224f6174fb336ece611a0c4ca295e41befa9aca760
+ size 182925595
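The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of the checkpoint, not the weights themselves. A downloaded file can be checked against the pointer with the standard library, as sketched below. `matches_lfs_pointer` is a hypothetical helper, and the local file name in the usage comment is an assumption.

```python
import hashlib
import os


def matches_lfs_pointer(path, expected_sha256, expected_size, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 from the pointer."""
    if os.path.getsize(path) != expected_size:
        return False                      # cheap check first, avoids hashing
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so a large checkpoint is never fully loaded into memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


# Usage with the values from the pointer (file name assumed):
# ok = matches_lfs_pointer(
#     "baramnuri_beta.pth",
#     "cb367c1643c1d53b08f703224f6174fb336ece611a0c4ca295e41befa9aca760",
#     182925595,
# )
```

The pointer's size (about 183 MB) is larger than the ~54 MB FP32 estimate in the README, which would be consistent with the checkpoint holding more than the model weights, although the commit does not say what else it contains.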
config.json ADDED
@@ -0,0 +1,92 @@
+ {
+   "model_type": "baramnuri",
+   "architectures": ["BaramNuri"],
+   "model_name": "baramnuri-beta",
+   "version": "0.1.0-beta",
+
+   "num_classes": 5,
+   "class_names": ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"],
+   "class_names_en": ["normal", "drowsy_driving", "searching_object", "phone_usage", "driver_assault"],
+
+   "backbone": {
+     "type": "video_swin_t",
+     "pretrained_on": "kinetics-400",
+     "stages_used": [1, 2, 3],
+     "feature_dim": 384
+   },
+
+   "ssm_block": {
+     "type": "selective_ssm",
+     "d_state": 16,
+     "d_conv": 4,
+     "expand": 2,
+     "n_layers": 2,
+     "dropout": 0.2
+   },
+
+   "input_spec": {
+     "channels": 3,
+     "num_frames": 30,
+     "height": 224,
+     "width": 224,
+     "fps": 30,
+     "format": "BCTHW"
+   },
+
+   "model_stats": {
+     "total_parameters": 14203205,
+     "total_parameters_readable": "14.20M",
+     "model_size_fp32_mb": 54,
+     "model_size_fp16_mb": 27,
+     "model_size_int8_mb": 13
+   },
+
+   "training": {
+     "method": "knowledge_distillation",
+     "teacher_model": "Video Swin-T (27.86M)",
+     "teacher_accuracy": 0.9805,
+     "teacher_f1": 0.9757,
+     "epochs_trained": 6,
+     "best_accuracy": 0.9617,
+     "best_macro_f1": 0.9504,
+     "optimizer": "AdamW",
+     "learning_rate": 1e-4,
+     "weight_decay": 0.05,
+     "batch_size": 96,
+     "data_augmentation": ["resize", "normalize"]
+   },
+
+   "performance": {
+     "accuracy": 0.9617,
+     "macro_f1": 0.9504,
+     "per_class_f1": {
+       "정상": 0.93,
+       "졸음운전": 0.97,
+       "물건찾기": 0.94,
+       "휴대폰 사용": 0.94,
+       "운전자 폭행": 0.99
+     }
+   },
+
+   "comparison_with_teacher": {
+     "parameter_reduction": "49%",
+     "size_reduction": "49%",
+     "accuracy_retention": "98.1%",
+     "f1_retention": "97.4%",
+     "training_speed_improvement": "40%"
+   },
+
+   "license": "Apache-2.0",
+   "language": ["ko", "en"],
+   "tags": [
+     "video-classification",
+     "driver-behavior",
+     "knowledge-distillation",
+     "video-swin-transformer",
+     "state-space-model",
+     "ssm",
+     "mamba-style",
+     "lightweight",
+     "korean"
+   ]
+ }
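One way a caller might use the `input_spec` block is to derive the expected tensor shape instead of hard-coding it. The sketch below parses a fragment copied from the config and expands the `format` string axis by axis. `expected_shape` is a hypothetical helper; nothing in the commit shows `model.py` reading this file.

```python
import json

# Fragment copied from the input_spec section of config.json
CONFIG_FRAGMENT = """
{
  "input_spec": {
    "channels": 3,
    "num_frames": 30,
    "height": 224,
    "width": 224,
    "fps": 30,
    "format": "BCTHW"
  }
}
"""


def expected_shape(config, batch_size=1):
    """Expand the axis letters in `format` into a concrete shape tuple."""
    spec = config["input_spec"]
    sizes = {
        "B": batch_size,
        "C": spec["channels"],
        "T": spec["num_frames"],
        "H": spec["height"],
        "W": spec["width"],
    }
    return tuple(sizes[axis] for axis in spec["format"])


config = json.loads(CONFIG_FRAGMENT)
print(expected_shape(config))
```

Comparing this tuple with an input tensor's shape before inference would catch a clip that is laid out as `[B, T, C, H, W]` or has the wrong frame count.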
model.py ADDED
@@ -0,0 +1,365 @@
+ """
+ BaramNuri (바람누리) - Lightweight Driver Behavior Detection Model
+
+ A hybrid architecture combining:
+ - Video Swin Transformer (Stage 1-3) for spatial features
+ - Selective State Space Model (SSM) for temporal modeling
+
+ Trained via Knowledge Distillation from Video Swin-T teacher.
+
+ Author: C-Team
+ License: Apache-2.0
+ """
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from torchvision.models.video import swin3d_t, Swin3D_T_Weights
+ from typing import Dict, Tuple
+
+
+ class SelectiveSSM(nn.Module):
+     """
+     Selective State Space Model (Mamba-style)
+
+     Key: Dynamically generates B, C, delta based on input
+     - Important information is remembered
+     - Less important information is quickly forgotten
+     """
+
+     def __init__(self, d_model: int, d_state: int = 16, d_conv: int = 4, expand: int = 2, dropout: float = 0.1):
+         super().__init__()
+
+         self.d_model = d_model
+         self.d_state = d_state
+         self.d_conv = d_conv
+         self.expand = expand
+         self.d_inner = d_model * expand
+
+         # Input projection (expansion)
+         self.in_proj = nn.Linear(d_model, self.d_inner * 2, bias=False)
+
+         # 1D convolution (local context)
+         self.conv1d = nn.Conv1d(
+             self.d_inner, self.d_inner,
+             kernel_size=d_conv,
+             padding=d_conv - 1,
+             groups=self.d_inner
+         )
+
+         # SSM parameter generator (selective!)
+         self.x_proj = nn.Linear(self.d_inner, d_state * 2 + 1, bias=False)
+
+         # A parameter (learnable diagonal matrix)
+         self.A_log = nn.Parameter(torch.log(torch.arange(1, d_state + 1, dtype=torch.float32)))
+         self.D = nn.Parameter(torch.ones(self.d_inner))
+
+         # Output projection
+         self.out_proj = nn.Linear(self.d_inner, d_model, bias=False)
+
+         self.dropout = nn.Dropout(dropout)
+         self.layer_norm = nn.LayerNorm(d_model)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         """
+         Args:
+             x: [B, T, D]
+         Returns:
+             y: [B, T, D]
+         """
+         residual = x
+         x = self.layer_norm(x)
+
+         B, T, D = x.shape
+
+         # Input projection -> [B, T, 2*d_inner]
+         xz = self.in_proj(x)
+         x, z = xz.chunk(2, dim=-1)
+
+         # 1D Conv (capture local context)
+         x = x.transpose(1, 2)
+         x = self.conv1d(x)[:, :, :T]
+         x = x.transpose(1, 2)
+
+         x = F.silu(x)
+
+         # Selective SSM parameter generation
+         x_ssm = self.x_proj(x)
+         B_t = x_ssm[:, :, :self.d_state]
+         C_t = x_ssm[:, :, self.d_state:self.d_state*2]
+         delta = F.softplus(x_ssm[:, :, -1:])
+
+         # A parameter (negative for stability)
+         A = -torch.exp(self.A_log)
+
+         # Discretization: A_bar = exp(delta * A)
+         A_bar = torch.exp(delta * A.view(1, 1, -1))
+
+         # SSM scan
+         h = torch.zeros(B, self.d_inner, self.d_state, device=x.device, dtype=x.dtype)
+         outputs = []
+
+         for t in range(T):
+             x_t = x[:, t, :]
+             B_t_t = B_t[:, t, :]
+             C_t_t = C_t[:, t, :]
+             A_bar_t = A_bar[:, t, :]
+
+             # h = A_bar * h + B_t * x
+             h = h * A_bar_t.unsqueeze(1) + B_t_t.unsqueeze(1) * x_t.unsqueeze(-1)
+
+             # y = C_t * h + D * x
+             y_t = (C_t_t.unsqueeze(1) * h).sum(dim=-1) + self.D * x_t
+             outputs.append(y_t)
+
+         y = torch.stack(outputs, dim=1)
+
+         # Gating
+         y = y * F.silu(z)
+
+         # Output projection
+         y = self.out_proj(y)
+         y = self.dropout(y)
+
+         return y + residual
+
+
+ class TemporalSSMBlock(nn.Module):
+     """
+     Temporal SSM Block for video
+
+     Takes [B, T, C] sequence and applies SSM layers
+     """
+
+     def __init__(self, d_model: int, d_state: int = 16, n_layers: int = 2, dropout: float = 0.1):
+         super().__init__()
+
+         self.ssm_layers = nn.ModuleList([
+             SelectiveSSM(d_model, d_state=d_state, dropout=dropout)
+             for _ in range(n_layers)
+         ])
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         """
+         Args:
+             x: [B, T, D] sequence
+         Returns:
+             y: [B, D] final representation
+         """
+         for ssm in self.ssm_layers:
+             x = ssm(x)
+
+         return x.mean(dim=1)
+
+
+ class BaramNuri(nn.Module):
+     """
+     BaramNuri (바람누리) - Lightweight Driver Behavior Detection Model
+
+     Architecture:
+         1. Video Swin-T Stages 1-3 (spatial features, 384 dim)
+         2. Selective SSM Block (temporal modeling)
+         3. Classification Head
+
+     Parameters: 14.20M (49% reduction from teacher's 27.86M)
+     Performance: 96.17% accuracy, 0.9504 Macro F1
+     """
+
+     CLASS_NAMES = ["정상", "졸음운전", "물건찾기", "휴대폰 사용", "운전자 폭행"]
+     CLASS_NAMES_EN = ["normal", "drowsy_driving", "searching_object", "phone_usage", "driver_assault"]
+
+     def __init__(
+         self,
+         num_classes: int = 5,
+         pretrained: bool = True,
+         d_state: int = 16,
+         ssm_layers: int = 2,
+         dropout: float = 0.2,
+     ):
+         super().__init__()
+
+         self.num_classes = num_classes
+
+         # Load Video Swin-T backbone (only Stage 1-3)
+         if pretrained:
+             print("Loading Swin backbone (Kinetics-400 pretrained)...")
+             full_swin = swin3d_t(weights=Swin3D_T_Weights.KINETICS400_V1)
+         else:
+             full_swin = swin3d_t(weights=None)
+
+         # Patch embedding
+         self.patch_embed = full_swin.patch_embed
+
+         # Use only Stage 1-3 (features[0:5]) for 384 dim output
+         self.features = nn.Sequential(*[full_swin.features[i] for i in range(5)])
+
+         # Stage 3 output: 384 dim
+         self.feature_dim = 384
+
+         # Global average pooling
+         self.avgpool = nn.AdaptiveAvgPool3d(output_size=1)
+
+         # SSM temporal modeling block
+         self.temporal_ssm = TemporalSSMBlock(
+             d_model=self.feature_dim,
+             d_state=d_state,
+             n_layers=ssm_layers,
+             dropout=dropout,
+         )
+
+         # Classification head
+         self.head = nn.Sequential(
+             nn.LayerNorm(self.feature_dim),
+             nn.Dropout(p=dropout),
+             nn.Linear(self.feature_dim, num_classes),
+         )
+
+         # Initialize head
+         self._init_head()
+
+         # Delete Stage 4 parameters (memory saving)
+         del full_swin
+
+     def _init_head(self):
+         """Initialize head weights"""
+         for m in self.head.modules():
+             if isinstance(m, nn.Linear):
+                 nn.init.trunc_normal_(m.weight, std=0.02)
+                 if m.bias is not None:
+                     nn.init.zeros_(m.bias)
+
+     def extract_features(self, x: torch.Tensor) -> torch.Tensor:
+         """
+         Extract features (for knowledge distillation)
+
+         Args:
+             x: [B, C, T, H, W]
+         Returns:
+             features: [B, feature_dim]
+         """
+         # Patch embedding
+         x = self.patch_embed(x)
+
+         # Swin Stages
+         x = self.features(x)
+
+         B, T, H, W, C = x.shape
+
+         # Spatial average -> [B, T, C] sequence
+         x = x.mean(dim=[2, 3])
+
+         # SSM temporal modeling
+         x = self.temporal_ssm(x)
+
+         return x
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         """
+         Forward pass
+
+         Args:
+             x: [B, C, T, H, W] video tensor
+         Returns:
+             logits: [B, num_classes]
+         """
+         features = self.extract_features(x)
+         logits = self.head(features)
+         return logits
+
+     def forward_with_features(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+         """
+         Return both features and logits (for knowledge distillation)
+         """
+         features = self.extract_features(x)
+         logits = self.head(features)
+         return logits, features
+
+     def predict(self, x: torch.Tensor, return_english: bool = False) -> Dict:
+         """
+         Inference prediction
+
+         Args:
+             x: [1, C, T, H, W] single video
+             return_english: Return English class names
+         Returns:
+             dict with class, confidence, class_name
+         """
+         self.eval()
+         with torch.no_grad():
+             logits = self.forward(x)
+             probs = F.softmax(logits, dim=-1)[0]
+             class_idx = probs.argmax().item()
+
+         class_names = self.CLASS_NAMES_EN if return_english else self.CLASS_NAMES
+
+         return {
+             "class": class_idx,
+             "confidence": probs[class_idx].item(),
+             "class_name": class_names[class_idx],
+             "all_probs": {
+                 name: probs[i].item()
+                 for i, name in enumerate(class_names)
+             }
+         }
+
+     @classmethod
+     def from_pretrained(cls, checkpoint_path: str, device: str = 'cpu'):
+         """
+         Load pretrained model from checkpoint
+
+         Args:
+             checkpoint_path: Path to .pth file
+             device: 'cpu' or 'cuda'
+         Returns:
+             Loaded model in eval mode
+         """
+         # The checkpoint supplies every weight, so skip the Kinetics-400 download
+         model = cls(num_classes=5, pretrained=False)
+         checkpoint = torch.load(checkpoint_path, map_location=device)
+
+         if 'model_state_dict' in checkpoint:
+             model.load_state_dict(checkpoint['model_state_dict'])
+         else:
+             model.load_state_dict(checkpoint)
+
+         model = model.to(device)
+         model.eval()
+
+         return model
+
+
+ def count_parameters(model: nn.Module) -> int:
+     """Count total model parameters"""
+     return sum(p.numel() for p in model.parameters())
+
+
+ if __name__ == "__main__":
+     print("=" * 60)
+     print("BaramNuri Model Test")
+     print("=" * 60)
+
+     # Create model
+     model = BaramNuri(num_classes=5, pretrained=True)
+
+     # Parameter count
+     total_params = count_parameters(model)
+     print(f"\nTotal parameters: {total_params:,} ({total_params/1e6:.2f}M)")
+
+     # Test with dummy input
+     dummy_input = torch.randn(2, 3, 30, 224, 224)
+     print(f"\nInput shape: {dummy_input.shape}")
+
+     # Forward pass
+     model.eval()
+     with torch.no_grad():
+         output = model(dummy_input)
+     print(f"Output shape: {output.shape}")
+
+     # Single sample prediction test
+     single_input = torch.randn(1, 3, 30, 224, 224)
+     prediction = model.predict(single_input)
+     print(f"\nPrediction (Korean): {prediction['class_name']} ({prediction['confidence']:.2%})")
+
+     prediction_en = model.predict(single_input, return_english=True)
+     print(f"Prediction (English): {prediction_en['class_name']} ({prediction_en['confidence']:.2%})")
+
+     print("\nModel test passed!")
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ torch>=2.0.0
+ torchvision>=0.15.0
+ numpy>=1.21.0