Russian Telegram news detection

About

This model is a fine-tuned version of cointegrated/rubert-tiny2.
It classifies Russian texts into two classes: 'news' and 'not_news'.
Model performance on the validation set:

Accuracy	Precision	Recall	F1-score
0.965223	0.965222	0.986871	0.971459
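
For reference, the four figures above can be computed from validation labels and predictions roughly as in the sketch below. This is a minimal illustration, not the author's evaluation script: the function name `binary_metrics` and the choice of 'news' as the positive class are assumptions, and the card does not say whether any averaging across classes was applied. The reported F1-score is not the plain harmonic mean of the reported precision and recall (that would be about 0.976), so some form of averaging may have been involved.

```python
def binary_metrics(y_true, y_pred, positive="news"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not the model's validation data)
y_true = ["news", "news", "news", "not_news", "not_news"]
y_pred = ["news", "news", "not_news", "not_news", "news"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Swapping `positive` to 'not_news' gives the per-class figures for the other label, which is one way to check how sensitive the numbers are to the choice of positive class.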

Getting started

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import pickle

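# Run on the GPU when one is available, otherwise on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_path = 'desllre/ru_telegram_news_detection'

# Download the pickled label encoder that maps class ids back to label names.
# pickle.load can execute arbitrary code, so only unpickle files you trust.
encoder_path = hf_hub_download(repo_id=model_path, filename="encoder.pkl")
with open(encoder_path, "rb") as f:
    encoder = pickle.load(f)

# Load the tokenizer and the fine-tuned classifier from the same repository
tokenizer = AutoTokenizer.from_pretrained(model_path)
classifier = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)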

# Sample Russian Telegram post: Tesla changes its Bug Bounty terms so that
# researchers can hack its cars without voiding the warranty
text = 'Tesla дала добро на взлом ПО своих автомобилей\n\nКомпания  изменила условия программы Bug Bounty, предусматривающей выплату вознаграждений за поиск уязвимостей. Теперь энтузиасты могут взламывать электрокары Tesla, не боясь отзыва гарантии. Более того, в соответствии с новой политикой компании, автопроизводитель будет перепрошивать автомобили, ПО которых вышло из строя в процессе экспериментов специалистов кибербезопасности.\n\nИзменения в политике компании Telsa очень тепло встретили представители индустрии.'

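# Tokenize the text, truncating anything beyond 512 tokens
tokenized = tokenizer(
    text,
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="pt"
)
# Move the input tensors to the same device as the model
tokenized = {key: value.to(device) for key, value in tokenized.items()}
with torch.no_grad():  # inference only, no gradients needed
    output = classifier(**tokenized)

# Pick the class with the highest logit and map its id back to a label name
predicted_class_id = torch.argmax(output.logits, dim=1).item()
label = encoder.inverse_transform([predicted_class_id])[0]

print(label)  # prints the predicted class label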
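
The snippet above prints only the winning label. If a confidence score is also useful, the two logits can be turned into probabilities with a softmax. The sketch below does this in plain Python so that it runs without downloading the model. `softmax` and `label_with_confidence` are hypothetical helper names, the logits are made-up numbers, and the label order in `class_names` is an assumption: the real order would come from the encoder (for example `encoder.classes_`, if it is a scikit-learn `LabelEncoder`, which the card does not confirm). With the real model, `output.logits[0].tolist()` would supply the logits, and `torch.softmax(output.logits, dim=1)` computes the same probabilities.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, class_names):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Made-up logits and an assumed label order, for illustration only
example_logits = [2.0, 0.0]
class_names = ["news", "not_news"]
label, confidence = label_with_confidence(example_logits, class_names)
print(label, round(confidence, 3))
```

A caller could then treat low-confidence predictions separately, for instance by routing texts whose top probability falls below a chosen threshold to manual review.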


Model size: 29.2M params (Safetensors)
Tensor type: F32
