---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- question-answering
- deberta
- deberta-v3
- natural-questions
- pytorch
- transformers
- kaggle
- tensorflow2-qa
- nq
datasets:
- google/natural_questions
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-small
model-index:
- name: deberta-v3-nq-classification
  results:
  - task:
      type: text-classification
      name: Question Answering Classification
    dataset:
      name: Natural Questions (Simplified)
      type: natural_questions
      config: simplified
      split: validation
    metrics:
    - type: accuracy
      value: 85.42
      name: Accuracy
    - type: f1
      value: 82.34
      name: Macro F1
    - type: precision
      value: 84.21
      name: Macro Precision
    - type: recall
      value: 83.67
      name: Macro Recall
widget:
- text: "Question: What is the capital of France? Context: Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents as of 1 January 2023."
  example_title: "Factual Question"
- text: "Question: Is Paris the capital of France? Context: Paris is the capital and most populous city of France."
  example_title: "Yes/No Question"
- text: "Question: What is the population of Mars? Context: Earth is the third planet from the Sun and the only astronomical object known to harbor life."
  example_title: "No Answer"
---

# DeBERTa-v3-Small for Natural Questions Classification

This model is a fine-tuned version of [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the Natural Questions dataset. It classifies question-context pairs into three categories: **No Answer**, **Has Answer**, or **Yes/No**, achieving 85.42% accuracy and an 82.34% macro F1 score.

## Model Details

### Model Description

This is a DeBERTa-v3-Small model fine-tuned for question-answering classification. Given a question and a context passage, it predicts one of three labels:

- 🔴 **No Answer** (Label 0): The context does not contain an answer
- 🟢 **Has Answer** (Label 1): The context contains a specific answer
- 🔵 **Yes/No** (Label 2): The question requires a YES/NO response

The model was trained on the Natural Questions dataset as part of the TensorFlow 2.0 Question Answering Kaggle competition.
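These class indices can also be attached to the model config so that generic tooling such as `transformers.pipeline` reports readable label names instead of `LABEL_0`/`LABEL_1`/`LABEL_2`. A minimal sketch, assuming the published checkpoint does not already ship with this mapping:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Hypothetical: attach readable names to the three class indices,
# matching the label scheme described above.
model_name = "mohamedsa1/deberta-v3-nq-classification"
config = AutoConfig.from_pretrained(model_name)
config.id2label = {0: "No Answer", 1: "Has Answer", 2: "Yes/No"}
config.label2id = {label: idx for idx, label in config.id2label.items()}

model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
model.save_pretrained("./deberta-v3-nq-local")  # local copy with the mapping baked in
```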
- **Developed by:** [Your Name]
- **Funded by [optional]:** Self-funded / Academic Project
- **Shared by [optional]:** [Your Organization/University]
- **Model type:** Transformer-based Sequence Classification (DeBERTa-v3)
- **Language(s) (NLP):** English (en)
- **License:** MIT
- **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)

### Model Sources

- **Repository:** [GitHub](https://github.com/yourusername/deberta-nq-classification)
- **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543)
- **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo)

## Uses

### Direct Use

The model can be used directly for:

- **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive downstream processing (see the batch pre-filtering sketch after the getting-started example below)
- **Search Result Classification**: Determine whether search results contain relevant answers
- **Customer Support Routing**: Route questions based on answer availability
- **Educational Assessment**: Evaluate whether a reading passage can answer a given question
- **Information Retrieval**: Assess document relevance for QA tasks

### Downstream Use

The model can serve as a foundation for:

- **Multi-stage QA Pipelines**: First stage before extractive/generative QA models
- **Hybrid QA Systems**: Combine with span extraction for end-to-end QA
- **Dialog Systems**: Determine whether a chatbot has sufficient context to answer
- **Domain Adaptation**: Fine-tune further on domain-specific datasets

### Out-of-Scope Use

❌ **Not suitable for:**

- Extractive answer span prediction (the model only classifies; it does not extract spans)
- Generative question answering
- Non-English languages
- Long documents (inputs beyond 256 tokens are truncated)
- Medical or legal decision-making
- Fact verification

## Bias, Risks, and Limitations

**Limitations:**

- Context is truncated to 256 tokens
- Training data is biased toward Wikipedia
- Trained on a 10,000-example subset of the full dataset
- May struggle with questions requiring complex reasoning

**Biases:**

- Performs better on factual "what/when/where" questions
- Inherits biases from Wikipedia and the base model
- Performance varies across domains

**Risks:**

- May be overconfident on ambiguous inputs
- Can produce false negatives on complex phrasings
- Vulnerable to adversarial examples

### Recommendations

Users should:

- ✅ Implement human review for critical applications
- ✅ Monitor performance across different domains
- ✅ Calibrate confidence thresholds for their use case (a threshold-sweep sketch follows the getting-started example below)
- ✅ Test on representative samples before deployment
- ✅ Use the model as one component in a multi-model system

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load the fine-tuned classifier and its tokenizer
model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Prepare input in the same "Question: ... Context: ..." format used during training
question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
text = f"Question: {question} Context: {context}"

# Inference
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
prediction = torch.argmax(probs).item()

# Results
labels = ["No Answer", "Has Answer", "Yes/No"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probs[prediction].item():.2%}")
```
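The pre-filtering use case from Direct Use can be implemented as a small gate in front of a more expensive QA stage. A minimal sketch, assuming the checkpoint loads as above; the `0.8` threshold and the `classify_batch`/`should_run_expensive_qa` helpers are illustrative names, not part of the released model:

```python
import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification

model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["No Answer", "Has Answer", "Yes/No"]

def classify_batch(questions, contexts):
    """Classify question-context pairs in one padded batch."""
    texts = [f"Question: {q} Context: {c}" for q, c in zip(questions, contexts)]
    inputs = tokenizer(texts, return_tensors="pt", max_length=256,
                       truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    return [(LABELS[p], c) for p, c in zip(pred.tolist(), conf.tolist())]

def should_run_expensive_qa(question, context, threshold=0.8):
    """Gate: only pass pairs confidently classified as answerable.

    The 0.8 threshold is a placeholder; calibrate it on a held-out
    sample from your own domain (see Recommendations above).
    """
    [(label, confidence)] = classify_batch([question], [context])
    return label != "No Answer" and confidence >= threshold
```

Batching the classifier this way keeps the gate cheap relative to the extractive or generative stage it protects.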
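Because the model may be overconfident on ambiguous inputs (see Risks above), the threshold-calibration recommendation is worth automating. A minimal sketch of a threshold sweep, assuming you have collected labeled `(question, context, is_answerable)` triples from your own domain; the two entries in `val_set` are placeholders:

```python
import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification

model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Placeholder validation data: replace with labeled pairs from your domain.
val_set = [
    ("What is the capital of France?",
     "Paris is the capital and most populous city of France.", True),
    ("What is the population of Mars?",
     "Earth is the third planet from the Sun.", False),
]

def p_answerable(question, context):
    """Probability mass on the two answerable classes (labels 1 and 2)."""
    text = f"Question: {question} Context: {context}"
    inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    return (probs[1] + probs[2]).item()

# Sweep candidate thresholds and report precision/recall at each one.
for threshold in (0.5, 0.6, 0.7, 0.8, 0.9):
    preds = [p_answerable(q, c) >= threshold for q, c, _ in val_set]
    gold = [y for _, _, y in val_set]
    tp = sum(p and g for p, g in zip(preds, gold))
    precision = tp / max(sum(preds), 1)
    recall = tp / max(sum(gold), 1)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Pick the smallest threshold that meets your precision target, then re-check it whenever the input domain shifts.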