---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- question-answering
- deberta
- deberta-v3
- natural-questions
- pytorch
- transformers
- kaggle
- tensorflow2-qa
- nq
datasets:
- google/natural_questions
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-small
model-index:
- name: deberta-v3-nq-classification
  results:
  - task:
      type: text-classification
      name: Question Answering Classification
    dataset:
      name: Natural Questions (Simplified)
      type: natural_questions
      config: simplified
      split: validation
    metrics:
    - type: accuracy
      value: 85.42
      name: Accuracy
    - type: f1
      value: 82.34
      name: Macro F1
    - type: precision
      value: 84.21
      name: Macro Precision
    - type: recall
      value: 83.67
      name: Macro Recall
widget:
- text: "Question: What is the capital of France? Context: Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents as of 1 January 2023."
  example_title: "Factual Question"
- text: "Question: Is Paris the capital of France? Context: Paris is the capital and most populous city of France."
  example_title: "Yes/No Question"
- text: "Question: What is the population of Mars? Context: Earth is the third planet from the Sun and the only astronomical object known to harbor life."
  example_title: "No Answer"
---
|
|
|
|
|
# DeBERTa-v3-Small for Natural Questions Classification |
|
|
|
|
|
|
|
|
|
|
This model is a fine-tuned version of [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the Natural Questions dataset. It classifies question-context pairs into three categories: **No Answer**, **Has Answer**, or **Yes/No**, achieving 85.42% accuracy and 82.34% macro F1 score. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
|
|
|
|
|
This is a DeBERTa-v3-Small model fine-tuned for question-answering classification. Given a question and a context passage, it predicts one of:

- 🔴 **No Answer** (Label 0): The context doesn't contain an answer
- 🟢 **Has Answer** (Label 1): The context contains a specific answer
- 🔵 **Yes/No** (Label 2): The question requires a YES/NO response
|
|
|
|
|
The model was trained on the Natural Questions dataset as part of the TensorFlow 2.0 Question Answering Kaggle competition. |
|
|
|
|
|
- **Developed by:** [Your Name] |
|
|
- **Funded by:** Self-funded / Academic Project
|
|
- **Shared by:** [Your Organization/University]
|
|
- **Model type:** Transformer-based Sequence Classification (DeBERTa-v3) |
|
|
- **Language(s) (NLP):** English (en) |
|
|
- **License:** MIT |
|
|
- **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
|
|
|
|
|
- **Repository:** [GitHub](https://github.com/yourusername/deberta-nq-classification) |
|
|
- **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543) |
|
|
- **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo) |
|
|
|
|
|
## Uses |
|
|
|
|
|
|
|
|
|
|
### Direct Use |
|
|
|
|
|
|
|
|
|
|
The model can be used directly for: |
|
|
- **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive downstream processing (see the sketch after this list)
|
|
- **Search Result Classification**: Determine if search results contain relevant answers |
|
|
- **Customer Support Routing**: Route questions based on answer availability |
|
|
- **Educational Assessment**: Evaluate if reading passages can answer questions |
|
|
- **Information Retrieval**: Assess document relevance for QA tasks |
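
For the pre-filtering use case, a minimal sketch using the `pipeline` API (this assumes the checkpoint id from the usage example below, and that the checkpoint's `id2label` config maps class 0 to `No Answer`):

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mohamedsa1/deberta-v3-nq-classification",
)

def worth_running_qa(question: str, context: str, threshold: float = 0.5) -> bool:
    """Return False when the classifier is confident the context holds no answer."""
    pred = classifier(f"Question: {question} Context: {context}")[0]
    return not (pred["label"] == "No Answer" and pred["score"] >= threshold)

# Only invoke an expensive extractive/generative QA model when this returns True
print(worth_running_qa(
    "What is the population of Mars?",
    "Earth is the third planet from the Sun.",
))
```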
|
|
|
|
|
### Downstream Use |
|
|
|
|
|
|
|
|
|
|
The model serves as a foundation for: |
|
|
- **Multi-stage QA Pipelines**: First stage before extractive/generative QA models |
|
|
- **Hybrid QA Systems**: Combine with span extraction for end-to-end QA |
|
|
- **Dialog Systems**: Determine whether a chatbot has sufficient context to answer
|
|
- **Domain Adaptation**: Fine-tune on domain-specific datasets (a fine-tuning sketch follows this list)
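
For domain adaptation, a minimal fine-tuning sketch with the `Trainer` API (the checkpoint id, toy data, and hyperparameters are placeholders, not a tested recipe):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "mohamedsa1/deberta-v3-nq-classification"  # checkpoint from this card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Toy in-memory domain data; replace with your own labeled examples
train = Dataset.from_dict({
    "text": [
        "Question: Is aspirin an NSAID? Context: Aspirin is a nonsteroidal anti-inflammatory drug.",
        "Question: What causes tides? Context: Paris is the capital of France.",
    ],
    "label": [2, 0],  # 0 = No Answer, 1 = Has Answer, 2 = Yes/No
})
train = train.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="deberta-nq-domain", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```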
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
|
|
|
|
|
❌ **Not suitable for:**
|
|
- Extractive answer span prediction (only classifies, doesn't extract) |
|
|
- Generative question answering |
|
|
- Non-English languages |
|
|
- Very long documents (inputs are truncated to 256 tokens)
|
|
- Medical/legal decision-making |
|
|
- Fact verification |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
|
|
|
|
|
**Limitations:** |
|
|
- Context limited to 256 tokens (a sliding-window workaround is sketched after this list)
|
|
- Wikipedia-biased training data |
|
|
- Trained on 10,000 examples (subset of full dataset) |
|
|
- May struggle with complex reasoning questions |
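
To stretch past the 256-token limit, one workaround is to classify overlapping windows of a long document and aggregate the per-window predictions. A minimal sketch; the checkpoint id matches the usage example below, and the stride and cutoff values are arbitrary placeholders to tune on validation data:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)  # fast tokenizer, needed for windowing
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def classify_long(question: str, document: str) -> int:
    enc = tokenizer(
        f"Question: {question} Context: {document}",
        max_length=256, truncation=True,
        stride=64,                       # 64-token overlap between windows (arbitrary)
        return_overflowing_tokens=True,  # one encoded row per 256-token window
        padding=True, return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping field, not a model input
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)
    # Report "No Answer" (0) only if no window finds an answer; otherwise take the
    # non-zero label from the most confident window.
    window_conf, window_label = probs[:, 1:].max(dim=-1)
    if window_conf.max().item() < 0.5:   # arbitrary cutoff, tune on held-out data
        return 0
    return int(window_label[window_conf.argmax()]) + 1
```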
|
|
|
|
|
**Biases:** |
|
|
- Better on factual "what/when/where" questions |
|
|
- Inherits biases from Wikipedia and base model |
|
|
- Performance varies across domains |
|
|
|
|
|
**Risks:** |
|
|
- May be overconfident on ambiguous inputs |
|
|
- False negatives on complex phrasings |
|
|
- Vulnerable to adversarial examples |
|
|
|
|
|
### Recommendations |
|
|
|
|
|
|
|
|
|
|
Users should:

- ✅ Implement human review for critical applications
- ✅ Monitor performance across different domains
- ✅ Calibrate confidence thresholds for their use case (see the sketch below)
- ✅ Test on representative samples
- ✅ Use the model as one component in multi-model systems
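
As a starting point for threshold calibration, a minimal sketch that escalates low-confidence predictions to human review (the 0.7 cutoff is an arbitrary placeholder; tune it on held-out data):

```python
import torch

LABELS = ["No Answer", "Has Answer", "Yes/No"]

def classify_or_escalate(logits: torch.Tensor, threshold: float = 0.7):
    """Map model logits (shape [1, 3], as produced in the example below) to a
    label, escalating when the top probability falls under the threshold."""
    probs = torch.softmax(logits, dim=-1)[0]
    confidence, pred = probs.max(dim=-1)
    if confidence.item() < threshold:
        return "needs_human_review", confidence.item()
    return LABELS[int(pred)], confidence.item()
```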
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
Use the code below to get started with the model. |
|
|
|
|
|
```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load model and tokenizer
model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Prepare input in the "Question: ... Context: ..." format used during training
question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
text = f"Question: {question} Context: {context}"

# Inference
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
prediction = torch.argmax(probs).item()

# Results
labels = ["No Answer", "Has Answer", "Yes/No"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probs[prediction]:.2%}")
```
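
The same objects also support batched scoring; a small sketch reusing `tokenizer`, `model`, and `labels` from the example above:

```python
# Score several question-context pairs in one forward pass
texts = [
    "Question: Is Paris the capital of France? Context: Paris is the capital of France.",
    "Question: What is the population of Mars? Context: Earth is the third planet from the Sun.",
]
batch = tokenizer(texts, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    batch_probs = torch.softmax(model(**batch).logits, dim=-1)
for text, p in zip(texts, batch_probs):
    print(f"{labels[int(p.argmax())]:<10} {p.max():.2%}  {text[:60]}")
```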