---
library_name: transformers
tags:
- BERT
- Transformers
- BETO
- Clickbait
license: mit
language:
- es
pipeline_tag: text-classification
---

# BETO Spanish Clickbait Model

This clickbait classification model is based on BETO, a Spanish variant of BERT.

## Model Details

BETO is a BERT model trained on a large Spanish corpus. It is similar in size to BERT-Base and was trained with the Whole Word Masking technique.

[BETO on Hugging Face](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased)

The model was fine-tuned on roughly 30k news items from several Spanish newspapers.

## Training and Evaluation

The model was fine-tuned with the `transformers` library. The hyperparameters and the evaluation metrics after the fifth epoch were:

```
BATCH_SIZE = 100
NUM_PROCS = 32
LR = 0.00005
EPOCHS = 5
MAX_LENGTH = 25
MODEL = 'dccuchile/bert-base-spanish-wwm-cased'

{'eval_loss': 0.0386480949819088,
 'eval_accuracy': 0.9872786230980294,
 'eval_runtime': 10.0476,
 'eval_samples_per_second': 398.999,
 'eval_steps_per_second': 4.081,
 'epoch': 5.0}
```
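
As a sanity check on the figures above, the size of the evaluation set can be estimated from the reported throughput. This is an inference from the logged numbers, not something the authors state. The sketch assumes that `eval_samples_per_second` and `eval_steps_per_second` are averages over `eval_runtime`, and that evaluation used the same batch size of 100 as training. The variable names are illustrative.

```python
import math

# Values copied from the evaluation log above.
eval_runtime = 10.0476
eval_samples_per_second = 398.999
eval_steps_per_second = 4.081
eval_accuracy = 0.9872786230980294
batch_size = 100  # assumption: evaluation used the training BATCH_SIZE

# Throughput multiplied by runtime gives the approximate totals.
n_samples = round(eval_samples_per_second * eval_runtime)
n_steps = round(eval_steps_per_second * eval_runtime)

# The accuracy implies how many examples were misclassified.
n_errors = round((1 - eval_accuracy) * n_samples)

print(n_samples, n_steps, n_errors)  # 4009 41 51

# The step count is consistent with batches of 100 examples.
assert n_steps == math.ceil(n_samples / batch_size)
```

Under these assumptions the evaluation set holds about 4,009 examples, of which about 51 were misclassified (3958 / 4009 ≈ 0.98728, which matches the reported accuracy).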

## Uses

This model is designed to classify newspaper news items as clickbait or not clickbait.

You can see a use case at this URL:
[Spanish Newspapers](https://clickbait.taniwa.es/)

### Direct Use

```
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TextClassificationPipeline,
)

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("taniwasl/clickbait_es")
model = AutoModelForSequenceClassification.from_pretrained("taniwasl/clickbait_es")

# Example news text to classify.
text = 'La explosión destruye parcialmente el edificio, Egipto'

# max_length matches the MAX_LENGTH used for fine-tuning;
# longer inputs are truncated to 25 tokens.
nlp = TextClassificationPipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    max_length=25,
    truncation=True,
    add_special_tokens=True,
)

# Prints a list with one dict holding the predicted label and its score.
print(nlp(text))
```
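
The pipeline returns a list of dicts with `label` and `score` keys, so downstream code usually has to turn that into a yes/no decision. A minimal sketch follows, assuming a two-class softmax output and that the clickbait class is named `LABEL_1`. That label name is an assumption; check `model.config.id2label` for the actual names. The helpers `clickbait_probability` and `is_clickbait` and the default threshold are hypothetical, not part of the model or of `transformers`.

```python
from typing import Dict, Union

Prediction = Dict[str, Union[str, float]]


def clickbait_probability(pred: Prediction, clickbait_label: str = "LABEL_1") -> float:
    """Return the probability of the clickbait class for one pipeline result.

    Assumes a binary classifier with softmax scores, so the two class
    probabilities sum to 1. "LABEL_1" is an assumed label name.
    """
    score = float(pred["score"])
    return score if pred["label"] == clickbait_label else 1.0 - score


def is_clickbait(pred: Prediction, threshold: float = 0.5) -> bool:
    """Flag a result as clickbait when its probability reaches the threshold."""
    return clickbait_probability(pred) >= threshold


# Hand-written results in the pipeline's output format (not real model output).
examples = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_0", "score": 0.88},
]
print([is_clickbait(p) for p in examples])  # [True, False]
```

With the pipeline object `nlp` from the example above, a single input could be checked with `is_clickbait(nlp("...")[0])`. Raising the threshold trades recall for precision.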

## License Disclaimer

The MIT license best describes our intentions for our work.
However, we are not sure that all the datasets used to train BETO have licenses compatible with MIT (especially for commercial use).
Please use at your own discretion and for non-commercial purposes only.