| | --- |
| | language: uk |
| | tags: |
| | - text-classification |
| | - roberta |
| | - ukraine |
| | - binary-classification |
| | - question-detection |
| | datasets: |
| | - squad_v2 |
| | - SDSJ-uk |
| | widget: |
| | - text: "Це питання чи ні?" |
| | --- |
| | |
| | # 🇺🇦 Ukrainian Question vs Statement Classifier |
| |
|
| | Це модель на базі `xlm-roberta-base`, натренована для класифікації українських фраз на: |
| | - **Питання** (`1`) |
| | - **Твердження** (`0`) |
| |
|
| | Модель підходить для використання в чат-ботах, LLM-фільтрах, обробці коментарів, автоматичних відповідях тощо. |
| |
|
| | ## demo |
| |
|
| | 🧪 [click here for testing](https://huggingface.co/spaces/Serhii228/ukr-question-classifier-ui) |
| |
|
| |
|
| | ## 📊 Архітектура |
| | - `TFAutoModelForSequenceClassification` |
| | - 1 вихідний нейрон із `sigmoid` |
| |
|
| | ## 📦 Використання |
| |
|
| | ```python |
| | from transformers import TFAutoModelForSequenceClassification, AutoTokenizer |
| | |
| | model = TFAutoModelForSequenceClassification.from_pretrained("Serhii228/ukr_quest-statement-classifier") |
| | tokenizer = AutoTokenizer.from_pretrained("Serhii228/ukr_quest-statement-classifier") |
| | |
| | text = "Чи буде доступно завтра?" |
| | inputs = tokenizer(text, return_tensors="tf", truncation=True, padding=True) |
| | outputs = model(**inputs) |
| | prob = tf.nn.sigmoid(outputs.logits) |
| | label = int(prob > 0.5) |
| | ``` |
| |
|
| | # 🇬🇧 English |
| |
|
| | This model is based on xlm-roberta-base and is fine-tuned to classify Ukrainian sentences into: |
| |
|
| | Questions (1) |
| | |
| | Statements (0) |
| | |
| | It is suitable for use in chatbots, LLM pre-filtering, comment analysis, and automatic response systems. |
| | ## 📊 Architecture |
| |
|
| | TFAutoModelForSequenceClassification |
| | |
| | 1 output neuron with sigmoid activation |
| | |
| | ## 📦 Usage |
| | ```python |
| | from transformers import TFAutoModelForSequenceClassification, AutoTokenizer |
| | |
| | model = TFAutoModelForSequenceClassification.from_pretrained("Serhii228/ukr_quest-statement-classifier") |
| | tokenizer = AutoTokenizer.from_pretrained("Serhii228/ukr_quest-statement-classifier") |
| | |
| | text = "Чи буде доступно завтра?" |
| | inputs = tokenizer(text, return_tensors="tf", truncation=True, padding=True) |
| | outputs = model(**inputs) |
| | prob = tf.nn.sigmoid(outputs.logits) |
| | label = int(prob > 0.5) |
| | ``` |
| |
|
| | ## 🧠 Training |
| |
|
| | The model was trained on a combination of SQuAD v2, SDSJ-uk, and additional manually annotated Ukrainian examples. |
| |
|
| |
|
| | ## 🔒 License |
| |
|
| | MIT |