---
language:
- ko
library_name: transformers
license: apache-2.0
metrics:
- f1
pipeline_tag: text-classification
---

# roberta-base-infringement-detect

## Model Details

### Model Description
This model uses [klue/roberta-base](https://huggingface.co/klue/roberta-base) to determine whether two pieces of content are similar.

## Train
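The model was trained on a dataset built from 1,310 in-house pairs of truly similar content. The pairs were shuffled to produce a dataset with a true-to-false ratio of 1:2.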
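
The exact procedure used to create the false pairs is not described here. As a rough illustration only, the sketch below assumes that each original is re-paired with two randomly chosen non-matching contents, and that labels are `1` for true and `0` for false. The function name `build_pair_dataset` and the toy data are hypothetical.

```python
import random


def build_pair_dataset(true_pairs, negatives_per_positive=2, seed=0):
    """Build (original, candidate, label) examples with a 1:2 true/false ratio.

    Assumption: false pairs are made by matching an original with the
    candidate side of a different true pair. Label 1 = similar, 0 = not.
    """
    rng = random.Random(seed)
    n = len(true_pairs)
    examples = []
    for i, (original, similar) in enumerate(true_pairs):
        # Keep the true pair as a positive example.
        examples.append((original, similar, 1))
        # Sample distinct non-matching candidates as negative examples.
        other_indices = [j for j in range(n) if j != i]
        for j in rng.sample(other_indices, negatives_per_positive):
            examples.append((original, true_pairs[j][1], 0))
    rng.shuffle(examples)
    return examples


# Toy stand-in for the 1,310 in-house pairs.
toy_pairs = [
    ("original A", "copy of A"),
    ("original B", "copy of B"),
    ("original C", "copy of C"),
    ("original D", "copy of D"),
]
dataset = build_pair_dataset(toy_pairs)
```

With 1,310 true pairs, this scheme would yield 3,930 examples in total, which may or may not match the size of the dataset actually used.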

The other training parameters are as follows.

| Parameter          | Value |
| ------------------ | ----- |
| `train_batch_size` | 16    |
| `num_train_epochs` | 5     |
| `weight_decay`     | 0.01  |
| `learning_rate`    | 2e-5  |
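
For anyone reproducing the setup with the `transformers` `Trainer`, the values in the table could be collected as keyword arguments for `TrainingArguments`. This is a sketch: mapping `train_batch_size` to `per_device_train_batch_size` is an assumption, and the original training script may have been configured differently.

```python
# Hyperparameters from the table, keyed by TrainingArguments argument names.
# Assumption: `train_batch_size` in the table is the per-device batch size.
hyperparams = {
    "per_device_train_batch_size": 16,
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "learning_rate": 2e-5,
}

# Possible usage (requires `transformers`; the output directory is hypothetical):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="./outputs", **hyperparams)
```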

## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "kms7530/roberta-base-infringement-detect"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

At inference time, the input to the model must be formatted as follows.

```plain
[CLS]\
[unused0]<ORIGINAL_CONTENT_TITLE>\
[unused1]<ORIGINAL_CONTENT>[SEP] \
[unused0]<TEST_CONTENT_TITLE>\
[unused1]<TEST_CONTENT>[SEP]
```
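
The template above can be assembled with a small helper. The sketch below treats the trailing backslashes as line continuations, so the result is a single string; the function name `build_input` and the sample values are hypothetical.

```python
def build_input(original_title, original_content, test_title, test_content):
    """Join the four fields into the single-line input format shown above."""
    return (
        "[CLS]"
        f"[unused0]{original_title}"
        f"[unused1]{original_content}[SEP] "
        f"[unused0]{test_title}"
        f"[unused1]{test_content}[SEP]"
    )


text = build_input(
    "Original title",
    "Original body text",
    "Candidate title",
    "Candidate body text",
)
```

Because the string already contains `[CLS]` and `[SEP]`, it may be appropriate to call the tokenizer with `add_special_tokens=False`, for example `tokenizer(text, add_special_tokens=False, truncation=True, return_tensors="pt")`, so that these tokens are not added a second time. Whether this matches the preprocessing used during training is an assumption and worth verifying against the model's outputs.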