<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# Audio classification[[audio_classification]]

[[open-in-colab]]

<Youtube id="KWwzcmG98Ds"/>

Audio classification, just like text classification, assigns a class label to the input data. The only difference is that instead of text you have raw audio waveforms. Some practical applications of audio classification include identifying speaker intent, classifying languages, and even recognizing animal species by their sounds.

This guide will show you how to:

1. Finetune [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base) on the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset to classify speaker intent.
2. Use your finetuned model for inference.
<Tip>

The task illustrated in this tutorial is supported by the following model architectures:

<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->

[Audio Spectrogram Transformer](../model_doc/audio-spectrogram-transformer), [Data2VecAudio](../model_doc/data2vec-audio), [Hubert](../model_doc/hubert), [SEW](../model_doc/sew), [SEW-D](../model_doc/sew-d), [UniSpeech](../model_doc/unispeech), [UniSpeechSat](../model_doc/unispeech-sat), [Wav2Vec2](../model_doc/wav2vec2), [Wav2Vec2-Conformer](../model_doc/wav2vec2-conformer), [WavLM](../model_doc/wavlm), [Whisper](../model_doc/whisper)

<!--End of the generated tip-->

</Tip>
Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install transformers datasets evaluate
```
We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:

```py
>>> from huggingface_hub import notebook_login

>>> notebook_login()
```
## Load MInDS-14 dataset[[load_minds_14_dataset]]

Start by loading the MInDS-14 dataset from the 🤗 Datasets library:

```py
>>> from datasets import load_dataset, Audio

>>> minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
```
Split the dataset's `train` split into a smaller train and test set with the [`~datasets.Dataset.train_test_split`] method. This gives you a chance to experiment and make sure everything works before spending more time on the full dataset.

```py
>>> minds = minds.train_test_split(test_size=0.2)
```
Then take a look at the dataset:

```py
>>> minds
DatasetDict({
    train: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 450
    })
    test: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 113
    })
})
```
While the dataset contains a lot of useful information, like `lang_id` and `english_transcription`, you'll focus on `audio` and `intent_class` in this guide. Remove the other columns with the [`~datasets.Dataset.remove_columns`] method:

```py
>>> minds = minds.remove_columns(["path", "transcription", "english_transcription", "lang_id"])
```
Take a look at an example:

```py
>>> minds["train"][0]
{'audio': {'array': array([ 0.        ,  0.        ,  0.        , ..., -0.00048828,
        -0.00024414, -0.00024414], dtype=float32),
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602b9a5fbb1e6d0fbce91f52.wav',
  'sampling_rate': 8000},
 'intent_class': 2}
```
There are two fields:

- `audio`: a 1-dimensional `array` of the speech signal that must be called to load and resample the audio file.
- `intent_class`: represents the class id of the speaker's intent.

To make it easier for the model to get the label name from the label id, create a dictionary that maps the label name to an integer and vice versa:

```py
>>> labels = minds["train"].features["intent_class"].names
>>> label2id, id2label = dict(), dict()
>>> for i, label in enumerate(labels):
...     label2id[label] = str(i)
...     id2label[str(i)] = label
```
Now you can convert the label id to a label name:

```py
>>> id2label[str(2)]
'app_error'
```
## Preprocess[[preprocess]]

The next step is to load a Wav2Vec2 feature extractor to process the audio signal:

```py
>>> from transformers import AutoFeatureExtractor

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
```
The MInDS-14 dataset has a sampling rate of 8kHz (you can find this information in its [dataset card](https://huggingface.co/datasets/PolyAI/minds14)), which means you'll need to resample it to 16kHz to use the pretrained Wav2Vec2 model:

```py
>>> minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
>>> minds["train"][0]
{'audio': {'array': array([ 2.2098757e-05,  4.6582241e-05, -2.2803260e-05, ...,
        -2.8419291e-04, -2.3305941e-04, -1.1425107e-04], dtype=float32),
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602b9a5fbb1e6d0fbce91f52.wav',
  'sampling_rate': 16000},
 'intent_class': 2}
```
Now create a preprocessing function that:

1. Calls the `audio` column to load, and if necessary, resample the audio file.
2. Checks that the sampling rate of the audio file matches the sampling rate of the audio data the model was pretrained with. You can find this information in the Wav2Vec2 [model card](https://huggingface.co/facebook/wav2vec2-base).
3. Sets a maximum input length so longer inputs can be batched without being truncated arbitrarily.

```py
>>> def preprocess_function(examples):
...     audio_arrays = [x["array"] for x in examples["audio"]]
...     inputs = feature_extractor(
...         audio_arrays, sampling_rate=feature_extractor.sampling_rate, max_length=16000, truncation=True
...     )
...     return inputs
```
To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [`~datasets.Dataset.map`] function. You can speed up `map` by setting `batched=True` to process multiple elements of the dataset at once. Remove the columns you don't need, and rename `intent_class` to `label` because that's the name the model expects:

```py
>>> encoded_minds = minds.map(preprocess_function, remove_columns="audio", batched=True)
>>> encoded_minds = encoded_minds.rename_column("intent_class", "label")
```
## Evaluate[[evaluate]]

Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):

```py
>>> import evaluate

>>> accuracy = evaluate.load("accuracy")
```
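If you're curious what this metric actually computes, accuracy is just the fraction of predictions that match the references. Here is a minimal NumPy sketch of the same calculation (the `manual_accuracy` name is illustrative and not part of 🤗 Evaluate):

```python
import numpy as np

def manual_accuracy(predictions, references):
    # Accuracy is the fraction of predictions that exactly match the references.
    predictions = np.asarray(predictions)
    references = np.asarray(references)
    return {"accuracy": float((predictions == references).mean())}

# Three of the four predicted class ids match the reference labels.
print(manual_accuracy([2, 0, 1, 1], [2, 0, 1, 0]))  # {'accuracy': 0.75}
```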
Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the accuracy:

```py
>>> import numpy as np

>>> def compute_metrics(eval_pred):
...     predictions = np.argmax(eval_pred.predictions, axis=1)
...     return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)
```
Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
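To sanity-check the argmax logic before training, you can run a stand-in version of the function on a couple of made-up logit rows. In the sketch below, `EvalPred` is a hypothetical stand-in for the `EvalPrediction` object the [`Trainer`] passes in, and a plain comparison replaces the loaded `accuracy` metric so the example runs without downloading anything:

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for transformers.EvalPrediction: raw logits plus the true label ids.
EvalPred = namedtuple("EvalPred", ["predictions", "label_ids"])

def compute_metrics_check(eval_pred):
    # argmax over the class axis turns each row of logits into a predicted class id.
    predictions = np.argmax(eval_pred.predictions, axis=1)
    # Plain accuracy in place of evaluate.load("accuracy").compute(...).
    return {"accuracy": float((predictions == eval_pred.label_ids).mean())}

# Two examples, three classes: the first row predicts class 2 (correct),
# the second predicts class 0 (wrong, the label is 1).
logits = np.array([[0.1, 0.2, 0.9], [1.5, 0.3, 0.2]])
print(compute_metrics_check(EvalPred(predictions=logits, label_ids=np.array([2, 1]))))
# {'accuracy': 0.5}
```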
## Train[[train]]

<frameworkcontent>
<pt>
<Tip>

If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!

</Tip>

You're ready to start training your model now! Load Wav2Vec2 with [`AutoModelForAudioClassification`] along with the number of expected labels, and the label mappings:
```py
>>> from transformers import AutoModelForAudioClassification, TrainingArguments, Trainer

>>> num_labels = len(id2label)
>>> model = AutoModelForAudioClassification.from_pretrained(
...     "facebook/wav2vec2-base", num_labels=num_labels, label2id=label2id, id2label=id2label
... )
```
At this point, only three steps remain:

1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir`, which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the accuracy and save the training checkpoint.
2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and the `compute_metrics` function.
3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
...     output_dir="my_awesome_mind_model",
...     evaluation_strategy="epoch",
...     save_strategy="epoch",
...     learning_rate=3e-5,
...     per_device_train_batch_size=32,
...     gradient_accumulation_steps=4,
...     per_device_eval_batch_size=32,
...     num_train_epochs=10,
...     warmup_ratio=0.1,
...     logging_steps=10,
...     load_best_model_at_end=True,
...     metric_for_best_model="accuracy",
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=encoded_minds["train"],
...     eval_dataset=encoded_minds["test"],
...     tokenizer=feature_extractor,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()
```
Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:

```py
>>> trainer.push_to_hub()
```
</pt>
</frameworkcontent>

<Tip>

For a more in-depth example of how to finetune a model for audio classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/audio_classification.ipynb).

</Tip>
## Inference[[inference]]

Great, now that you've finetuned a model, you can use it for inference!

Load an audio file you'd like to run inference on. Remember to resample the sampling rate of the audio file to match the sampling rate of the model if you need to!

```py
>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> audio_file = dataset[0]["audio"]["path"]
```
The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for audio classification with your model, and pass your audio file to it:

```py
>>> from transformers import pipeline

>>> classifier = pipeline("audio-classification", model="stevhliu/my_awesome_minds_model")
>>> classifier(audio_file)
[
    {'score': 0.09766869246959686, 'label': 'cash_deposit'},
    {'score': 0.07998877018690109, 'label': 'app_error'},
    {'score': 0.0781070664525032, 'label': 'joint_account'},
    {'score': 0.07667109370231628, 'label': 'pay_bill'},
    {'score': 0.0755252093076706, 'label': 'balance'}
]
```
You can also manually replicate the results of the `pipeline` if you'd like:

<frameworkcontent>
<pt>
Load a feature extractor to preprocess the audio file and return the input as PyTorch tensors:

```py
>>> from transformers import AutoFeatureExtractor

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("stevhliu/my_awesome_minds_model")
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
```
Pass your inputs to the model and return the logits:

```py
>>> import torch
>>> from transformers import AutoModelForAudioClassification

>>> model = AutoModelForAudioClassification.from_pretrained("stevhliu/my_awesome_minds_model")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
```
Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a label:

```py
>>> import torch

>>> predicted_class_ids = torch.argmax(logits).item()
>>> predicted_label = model.config.id2label[predicted_class_ids]
>>> predicted_label
'cash_deposit'
```
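The ranked score/label output of the `pipeline` can also be reproduced from the logits with a softmax. Here is a NumPy sketch with made-up logits and a made-up four-class `id2label` mapping (in practice, the logits come from `model(**inputs).logits` and the mapping from `model.config.id2label`):

```python
import numpy as np

# Hypothetical logits for a 4-class model and a matching id2label mapping.
logits = np.array([1.2, 0.3, 2.1, -0.5])
id2label = {0: "app_error", 1: "atm_limit", 2: "cash_deposit", 3: "pay_bill"}

# Softmax turns logits into probabilities (subtracting the max improves numerical stability).
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

# Sort classes by descending probability, pipeline-style.
ranked = [
    {"score": float(probs[i]), "label": id2label[int(i)]}
    for i in np.argsort(probs)[::-1]
]
print(ranked[0]["label"])  # cash_deposit
```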
</pt>
</frameworkcontent>