metadata
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- question-answering
- deberta
- deberta-v3
- natural-questions
- pytorch
- transformers
- kaggle
- tensorflow2-qa
- nq
datasets:
- google/natural_questions
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-small
model-index:
- name: deberta-v3-nq-classification
  results:
  - task:
      type: text-classification
      name: Question Answering Classification
    dataset:
      name: Natural Questions (Simplified)
      type: natural_questions
      config: simplified
      split: validation
    metrics:
    - type: accuracy
      value: 85.42
      name: Accuracy
    - type: f1
      value: 82.34
      name: Macro F1
    - type: precision
      value: 84.21
      name: Macro Precision
    - type: recall
      value: 83.67
      name: Macro Recall
widget:
- text: >-
    Question: What is the capital of France? Context: Paris is the capital and
    most populous city of France, with an estimated population of 2,102,650
    residents as of 1 January 2023.
  example_title: Factual Question
- text: >-
    Question: Is Paris the capital of France? Context: Paris is the capital
    and most populous city of France.
  example_title: Yes/No Question
- text: >-
    Question: What is the population of Mars? Context: Earth is the third
    planet from the Sun and the only astronomical object known to harbor life.
  example_title: No Answer
DeBERTa-v3-Small for Natural Questions Classification
This model is a fine-tuned version of microsoft/deberta-v3-small on the Natural Questions dataset. It classifies question-context pairs into three categories (No Answer, Has Answer, or Yes/No) and reaches 85.42% accuracy and 82.34% macro F1 on the validation split.
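For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy and macro F1 are typically computed for a three-class problem. The function names and the toy labels are illustrative assumptions, not the evaluation script used for this model.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # gold class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes

# Toy labels (made up): 0 = No Answer, 1 = Has Answer, 2 = Yes/No
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"accuracy={accuracy(y_true, y_pred):.4f}, macro_f1={macro_f1(y_true, y_pred):.4f}")
```

Because macro F1 weights every class equally, it can fall below accuracy when a rare class (such as Yes/No) is predicted poorly.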
Model Details
Model Description
This is a DeBERTa-v3-Small model fine-tuned for question-answering classification. Given a question and a context, it predicts one of three labels:
- 🔴 No Answer (Label 0): The context doesn't contain an answer
- 🟢 Has Answer (Label 1): The context contains a specific answer
- 🔵 Yes/No (Label 2): The question requires a YES/NO response
The model was trained on the Natural Questions dataset as part of the TensorFlow 2.0 Question Answering Kaggle competition.
- Developed by: [Your Name]
- Funded by: Self-funded / Academic Project
- Shared by: [Your Organization/University]
- Model type: Transformer-based Sequence Classification (DeBERTa-v3)
- Language(s) (NLP): English (en)
- License: MIT
- Finetuned from model: microsoft/deberta-v3-small
Model Sources
- Repository: GitHub
- Paper: DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training
- Demo: Gradio Space
Uses
Direct Use
The model can be used directly for:
- Question Answering System Pre-filtering: Filter out unanswerable questions before expensive processing
- Search Result Classification: Determine if search results contain relevant answers
- Customer Support Routing: Route questions based on answer availability
- Educational Assessment: Evaluate if reading passages can answer questions
- Information Retrieval: Assess document relevance for QA tasks
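As one possible illustration of the pre-filtering use case, the sketch below drops question-context pairs whose "No Answer" probability exceeds a cutoff before they reach a more expensive stage. The function name `should_answer`, the 0.5 cutoff, and the probability values are assumptions made for the example; the probabilities are assumed to be ordered as [No Answer, Has Answer, Yes/No].

```python
def should_answer(probs, no_answer_threshold=0.5):
    """Return True if the pair should be passed on to the expensive QA stage.

    probs: three probabilities ordered as [No Answer, Has Answer, Yes/No].
    """
    return probs[0] < no_answer_threshold

# Made-up classifier outputs for three question-context pairs
candidates = [
    ("q1", [0.91, 0.06, 0.03]),
    ("q2", [0.10, 0.85, 0.05]),
    ("q3", [0.30, 0.10, 0.60]),
]

kept = [name for name, probs in candidates if should_answer(probs)]
print(kept)
```

The cutoff trades recall for cost: a lower value filters more aggressively but risks discarding answerable questions.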
Downstream Use
The model serves as a foundation for:
- Multi-stage QA Pipelines: First stage before extractive/generative QA models
- Hybrid QA Systems: Combine with span extraction for end-to-end QA
- Dialog Systems: Determine whether a chatbot has sufficient context to respond
- Domain Adaptation: Fine-tune on domain-specific datasets
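A multi-stage pipeline built on this classifier could look roughly like the sketch below: the predicted label decides which downstream component, if any, handles the pair. The `run_pipeline` function and both handlers are hypothetical placeholders; a real system would plug in an extractive or generative QA model instead.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str, str], str]

def run_pipeline(label_id: int, question: str, context: str,
                 handlers: Dict[int, Handler]) -> Optional[str]:
    """Dispatch a classified pair to the next stage.

    label_id follows this card's mapping: 0 = No Answer, 1 = Has Answer, 2 = Yes/No.
    Returns None when the classifier says the context has no answer.
    """
    if label_id == 0:
        return None  # skip the expensive stages entirely
    return handlers[label_id](question, context)

# Placeholder stages (assumptions for illustration only)
def fake_span_extractor(question: str, context: str) -> str:
    return context.split()[0]  # stand-in for an extractive QA model

def fake_yes_no_model(question: str, context: str) -> str:
    return "YES"  # stand-in for a yes/no classifier

handlers = {1: fake_span_extractor, 2: fake_yes_no_model}
print(run_pipeline(1, "What is the capital of France?",
                   "Paris is the capital of France.", handlers))
```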
Out-of-Scope Use
Not suitable for:
- Extractive answer span prediction (only classifies, doesn't extract)
- Generative question answering
- Non-English languages
- Documents longer than 256 tokens (longer inputs are truncated)
- Medical/legal decision-making
- Fact verification
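If longer contexts must be handled anyway, one common workaround is to split the tokenized context into overlapping windows, classify each window, and aggregate the results. The sketch below shows only the windowing and a simple aggregation rule; the window and stride sizes and both function names are assumptions, and the model was not evaluated this way. In practice the question and special tokens also consume part of the 256-token budget, so the context window would need to be smaller.

```python
def sliding_windows(token_ids, window=256, stride=128):
    """Split a token sequence into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # the last window reached the end of the sequence
        start += stride
    return windows

def aggregate(window_probs):
    """Take the window least likely to be 'No Answer' and return its argmax label.

    window_probs: one [No Answer, Has Answer, Yes/No] probability list per window.
    """
    best = min(window_probs, key=lambda p: p[0])
    return max(range(len(best)), key=best.__getitem__)

tokens = list(range(600))  # stand-in for real token ids
print([len(w) for w in sliding_windows(tokens)])
```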
Bias, Risks, and Limitations
Limitations:
- Input (question plus context) is truncated at 256 tokens
- Wikipedia-biased training data
- Trained on 10,000 examples (a subset of the full dataset)
- May struggle with complex reasoning questions
Biases:
- Better on factual "what/when/where" questions
- Inherits biases from Wikipedia and base model
- Performance varies across domains
Risks:
- May be overconfident on ambiguous inputs
- False negatives on complex phrasings
- Vulnerable to adversarial examples
Recommendations
Users should:
- Implement human review for critical applications
- Monitor performance across different domains
- Calibrate confidence thresholds for the target use case
- Test on representative samples
- Use the model as one component in multi-model systems
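Calibrating a confidence threshold can be done in several ways; the sketch below shows one simple option, assuming a held-out set with the model's confidence (maximum softmax probability) and a correctness flag per example. It returns the smallest threshold at which the accepted predictions reach a target accuracy. The function name, the 0.9 target, and the toy data are illustrative assumptions.

```python
def pick_threshold(confidences, correct, target_accuracy=0.9):
    """Smallest confidence threshold whose accepted predictions reach the target accuracy.

    confidences: max softmax probability for each held-out example
    correct: True where the predicted label matched the gold label
    Returns None if no threshold reaches the target.
    """
    for threshold in sorted(set(confidences)):
        accepted = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
        if accepted and sum(accepted) / len(accepted) >= target_accuracy:
            return threshold
    return None

# Made-up held-out results
confidences = [0.55, 0.60, 0.72, 0.80, 0.91, 0.97]
correct = [False, True, False, True, True, True]
print(pick_threshold(confidences, correct))
```

Predictions below the chosen threshold can then be sent to human review or to a fallback system.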
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch
# Load model
model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
# Prepare input
question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
text = f"Question: {question} Context: {context}"
# Inference
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
prediction = torch.argmax(probs).item()
# Results
labels = ["No Answer", "Has Answer", "Yes/No"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probs[prediction]:.2%}")
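To make the post-processing step concrete without loading the model, here is a dependency-free sketch of what the softmax and argmax lines above compute, together with a helper that builds the "Question: ... Context: ..." input string. The helper names `format_pair` and `decode` and the example logits are assumptions for illustration.

```python
import math

LABELS = ["No Answer", "Has Answer", "Yes/No"]

def format_pair(question, context):
    """Build the single-string input format used in this card."""
    return f"Question: {question} Context: {context}"

def decode(logits):
    """Turn three raw logits into (label, confidence) via softmax and argmax."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]

text = format_pair("What is the capital of France?",
                   "Paris is the capital and most populous city of France.")
label, confidence = decode([0.0, 2.0, 0.0])  # made-up logits
print(text)
print(f"Prediction: {label}, Confidence: {confidence:.2%}")
```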