<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# TimeSformer [[timesformer]]

## Overview [[overview]]

TimeSformer λͺ¨λΈμ€ Facebook Researchμ—μ„œ μ œμ•ˆν•œ [TimeSformer: Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095)μ—μ„œ μ†Œκ°œλ˜μ—ˆμŠ΅λ‹ˆλ‹€. 이 μ—°κ΅¬λŠ” 첫 번째 λΉ„λ””μ˜€ Transformerλ‘œμ„œ, 행동 인식 λΆ„μ•Όμ—μ„œ μ€‘μš”ν•œ μ΄μ •ν‘œκ°€ λ˜μ—ˆμŠ΅λ‹ˆλ‹€. λ˜ν•œ Transformer 기반의 λΉ„λ””μ˜€ 이해 및 λΆ„λ₯˜ 논문에 λ§Žμ€ μ˜κ°μ„ μ£Όμ—ˆμŠ΅λ‹ˆλ‹€.

λ…Όλ¬Έμ˜ μ΄ˆλ‘μ€ λ‹€μŒκ³Ό κ°™μŠ΅λ‹ˆλ‹€. 

*We present a convolution-free approach to video classification built exclusively on self-attention over space and time. Our method, named "TimeSformer," adapts the standard Transformer architecture to video by enabling spatiotemporal feature learning directly from a sequence of frame-level patches. Our experimental study compares different self-attention schemes and suggests that "divided attention," where temporal attention and spatial attention are separately applied within each block, leads to the best video classification accuracy among the design choices considered. Despite the radically new design, TimeSformer achieves state-of-the-art results on several action recognition benchmarks, including the best reported accuracy on Kinetics-400 and Kinetics-600. Finally, compared to 3D convolutional networks, our model is faster to train, it can achieve dramatically higher test efficiency (at a small drop in accuracy), and it can also be applied to much longer video clips (over one minute long). Code and models are available at: [this https URL](https://github.com/facebookresearch/TimeSformer).*

이 λͺ¨λΈμ€ [fcakyon](https://huggingface.co/fcakyon)이 κΈ°μ—¬ν•˜μ˜€μŠ΅λ‹ˆλ‹€.
원본 μ½”λ“œλŠ” [μ—¬κΈ°](https://github.com/facebookresearch/TimeSformer)μ—μ„œ 확인할 수 μžˆμŠ΅λ‹ˆλ‹€.

## Usage tips [[usage-tips]]

λ‹€μ–‘ν•œ 사전 ν•™μŠ΅λœ λͺ¨λΈμ˜ λ³€ν˜•λ“€μ΄ μžˆμŠ΅λ‹ˆλ‹€. μ‚¬μš©ν•˜λ €λŠ” 데이터셋에 맞좰 사전 ν•™μŠ΅λœ λͺ¨λΈμ„ 선택해야 ν•©λ‹ˆλ‹€. λ˜ν•œ, λͺ¨λΈ 크기에 따라 클립당 μž…λ ₯ ν”„λ ˆμž„ μˆ˜κ°€ λ‹¬λΌμ§€λ―€λ‘œ, 사전 ν•™μŠ΅λœ λͺ¨λΈμ„ 선택할 λ•Œ 이 λ§€κ°œλ³€μˆ˜λ₯Ό κ³ λ €ν•΄μ•Ό ν•©λ‹ˆλ‹€.


## Resources [[resources]]

- [Video classification task guide](../tasks/video_classification)

## TimesformerConfig [[transformers.TimesformerConfig]]

[[autodoc]] TimesformerConfig

## TimesformerModel [[transformers.TimesformerModel]]

[[autodoc]] TimesformerModel
    - forward

## TimesformerForVideoClassification [[transformers.TimesformerForVideoClassification]]

[[autodoc]] TimesformerForVideoClassification
    - forward