	---
	library_name: transformers
	tags:
	- BERT
	- Transformers
	- BETO
	- Clickbait
	license: mit
	language:
	- es
	pipeline_tag: text-classification
	---

	# BETO Spanish Clickbait Model

	This clickbait detection model is based on BETO, a Spanish variant of BERT.

	## Model Details

	BETO is a BERT model trained on a large Spanish corpus. It is similar in size to BERT-Base and was trained with the Whole Word Masking technique.

	[BETO on Hugging Face](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased)
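As a rough illustration of Whole Word Masking, the sketch below masks all WordPiece pieces of a word together instead of masking pieces independently. It is a simplified toy, not BETO's actual pre-training code: the function name `whole_word_mask`, the `mask_prob` default, and the example pieces are assumptions made for illustration, and the usual BERT details (random or unchanged replacements, a cap on masked positions) are left out.

```python
import random

def whole_word_mask(pieces, mask_prob=0.15, seed=0):
    """Mask whole words: every piece of a selected word becomes [MASK]."""
    # Group piece indices into words; a "##" prefix marks a continuation piece.
    words = []
    for i, piece in enumerate(pieces):
        if piece.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    rng = random.Random(seed)
    masked = list(pieces)
    for word in words:
        # The masking decision is made once per word, not once per piece.
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = "[MASK]"
    return masked

# Hypothetical WordPiece split, for illustration only.
pieces = ["la", "explo", "##sion", "des", "##tru", "##ye", "el", "edificio"]
print(whole_word_mask(pieces, mask_prob=0.5))
```

Because the decision is taken per word, a word such as `des ##tru ##ye` is either fully masked or left fully intact.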

	The model was fine-tuned on roughly 30k news items from several Spanish newspapers.

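	## Training and Evaluation

	The model was trained with the transformers library. The hyperparameters and the evaluation metrics after the final epoch were: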

	```
	BATCH_SIZE = 100
	NUM_PROCS = 32
	LR = 0.00005
	EPOCHS = 5
	MAX_LENGTH = 25
	MODEL = 'dccuchile/bert-base-spanish-wwm-cased'

	{'eval_loss': 0.0386480949819088,
	'eval_accuracy': 0.9872786230980294,
	'eval_runtime': 10.0476,
	'eval_samples_per_second': 398.999,
	'eval_steps_per_second': 4.081,
	'epoch': 5.0}
	```
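The evaluation log can be cross-checked with a little arithmetic. The sketch below recovers the approximate size of the evaluation set, the number of evaluation steps, and the number of misclassified items from the reported figures. It assumes that `BATCH_SIZE` was also used as the evaluation batch size on a single device, which the card does not state.

```python
import math

# Values copied from the evaluation log above.
eval_runtime = 10.0476
samples_per_second = 398.999
steps_per_second = 4.081
eval_accuracy = 0.9872786230980294
batch_size = 100  # assumption: BATCH_SIZE was also the evaluation batch size

n_samples = round(samples_per_second * eval_runtime)  # evaluation set size
n_steps = round(steps_per_second * eval_runtime)      # evaluation batches
n_errors = round((1 - eval_accuracy) * n_samples)     # misclassified items

print(n_samples, n_steps, n_errors)
# Under the batch-size assumption, the step count should match ceil(samples / batch).
print(math.ceil(n_samples / batch_size) == n_steps)
```

Under these assumptions the figures are mutually consistent: about 4009 evaluation items processed in 41 batches, of which 51 were misclassified.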

	## Uses

	This model is designed to classify newspaper news items as clickbait or not.

	A live use case is available at:
	[Spanish Newspapers](https://clickbait.taniwa.es/)

	### Direct Use

	```
	from transformers import (
	    AutoTokenizer,
	    AutoModelForSequenceClassification,
	    TextClassificationPipeline,
	)

	# Load the fine-tuned tokenizer and classifier from the Hub
	tokenizer = AutoTokenizer.from_pretrained("taniwasl/clickbait_es")
	model = AutoModelForSequenceClassification.from_pretrained("taniwasl/clickbait_es")

	# Example news text to classify
	news_text = 'La explosión destruye parcialmente el edificio, Egipto'

	# Truncate inputs to 25 tokens, matching MAX_LENGTH used in training
	nlp = TextClassificationPipeline(
	    task="text-classification",
	    model=model,
	    tokenizer=tokenizer,
	    max_length=25,
	    truncation=True,
	    add_special_tokens=True,
	)

	print(nlp(news_text))
	```
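When called on a list of texts, the pipeline returns one dict per text with `label` and `score` keys. The sketch below shows one possible way to keep only the texts flagged as clickbait above a confidence threshold. The helper `flag_clickbait` is hypothetical, and so is the label name: the card does not document the label names, so `"LABEL_1"` is a placeholder assumption that should be checked against `model.config.id2label`.

```python
# Assumption: the clickbait class name is not documented in the card.
# Inspect model.config.id2label and adjust this placeholder accordingly.
CLICKBAIT_LABEL = "LABEL_1"

def flag_clickbait(texts, classify, threshold=0.5, label=CLICKBAIT_LABEL):
    """Return the texts whose top prediction is `label` with score >= threshold.

    `classify` is any callable that maps a list of texts to a list of
    {"label": str, "score": float} dicts, such as the pipeline built above.
    """
    predictions = classify(texts)
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == label and pred["score"] >= threshold
    ]
```

With the pipeline from the previous snippet this would be called as `flag_clickbait(list_of_texts, nlp, threshold=0.9)`.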

	## License Disclaimer

	The MIT license best describes our intentions for this work.
	However, we are not sure that all the datasets used to train BETO have licenses compatible with MIT (especially for commercial use).
	Please use this model at your own discretion and for non-commercial purposes only.