---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- question-answering
- deberta
- deberta-v3
- natural-questions
- pytorch
- transformers
- kaggle
- tensorflow2-qa
- nq
datasets:
- google/natural_questions
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-small
model-index:
- name: deberta-v3-nq-classification
  results:
  - task:
      type: text-classification
      name: Question Answering Classification
    dataset:
      name: Natural Questions (Simplified)
      type: natural_questions
      config: simplified
      split: validation
    metrics:
    - type: accuracy
      value: 85.42
      name: Accuracy
    - type: f1
      value: 82.34
      name: Macro F1
    - type: precision
      value: 84.21
      name: Macro Precision
    - type: recall
      value: 83.67
      name: Macro Recall
widget:
- text: "Question: What is the capital of France? Context: Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents as of 1 January 2023."
  example_title: "Factual Question"
- text: "Question: Is Paris the capital of France? Context: Paris is the capital and most populous city of France."
  example_title: "Yes/No Question"
- text: "Question: What is the population of Mars? Context: Earth is the third planet from the Sun and the only astronomical object known to harbor life."
  example_title: "No Answer"
---

# DeBERTa-v3-Small for Natural Questions Classification

This model is a fine-tuned version of [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the Natural Questions dataset. It classifies question-context pairs into three categories: **No Answer**, **Has Answer**, or **Yes/No**, achieving 85.42% accuracy and an 82.34% macro F1 score.

## Model Details

### Model Description

This is a DeBERTa-v3-Small model fine-tuned for question-answering classification. Given a question and a context passage, it predicts one of three labels:

- 🔴 **No Answer** (Label 0): The context does not contain an answer
- 🟢 **Has Answer** (Label 1): The context contains a specific answer
- 🔵 **Yes/No** (Label 2): The question requires a YES/NO response

The model was trained on the Natural Questions dataset as part of the TensorFlow 2.0 Question Answering Kaggle competition.
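These class indices can also be attached to the model config so that generic tooling such as `transformers.pipeline` reports readable label names instead of `LABEL_0`/`LABEL_1`/`LABEL_2`. A minimal sketch, assuming the published checkpoint does not already ship with this mapping:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Hypothetical: attach readable names to the three class indices,
# matching the label scheme described above.
model_name = "mohamedsa1/deberta-v3-nq-classification"
config = AutoConfig.from_pretrained(model_name)
config.id2label = {0: "No Answer", 1: "Has Answer", 2: "Yes/No"}
config.label2id = {label: idx for idx, label in config.id2label.items()}

model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
model.save_pretrained("./deberta-v3-nq-local")  # local copy with the mapping baked in
```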
- **Developed by:** [Your Name]
- **Funded by [optional]:** Self-funded / Academic Project
- **Shared by [optional]:** [Your Organization/University]
- **Model type:** Transformer-based Sequence Classification (DeBERTa-v3)
- **Language(s) (NLP):** English (en)
- **License:** MIT
- **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)

### Model Sources

- **Repository:** [GitHub](https://github.com/yourusername/deberta-nq-classification)
- **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543)
- **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo)

## Uses

### Direct Use

The model can be used directly for:

- **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive downstream processing (see the batch pre-filtering sketch after the getting-started example below)
- **Search Result Classification**: Determine whether search results contain relevant answers
- **Customer Support Routing**: Route questions based on answer availability
- **Educational Assessment**: Evaluate whether a reading passage can answer a given question
- **Information Retrieval**: Assess document relevance for QA tasks

### Downstream Use

The model can serve as a foundation for:

- **Multi-stage QA Pipelines**: First stage before extractive/generative QA models
- **Hybrid QA Systems**: Combine with span extraction for end-to-end QA
- **Dialog Systems**: Determine whether a chatbot has sufficient context to answer
- **Domain Adaptation**: Fine-tune further on domain-specific datasets

### Out-of-Scope Use

❌ **Not suitable for:**

- Extractive answer span prediction (the model only classifies; it does not extract spans)
- Generative question answering
- Non-English languages
- Long documents (inputs beyond 256 tokens are truncated)
- Medical or legal decision-making
- Fact verification

## Bias, Risks, and Limitations

**Limitations:**

- Context is truncated to 256 tokens
- Training data is biased toward Wikipedia
- Trained on a 10,000-example subset of the full dataset
- May struggle with questions requiring complex reasoning

**Biases:**

- Performs better on factual "what/when/where" questions
- Inherits biases from Wikipedia and the base model
- Performance varies across domains

**Risks:**

- May be overconfident on ambiguous inputs
- Can produce false negatives on complex phrasings
- Vulnerable to adversarial examples

### Recommendations

Users should:

- ✅ Implement human review for critical applications
- ✅ Monitor performance across different domains
- ✅ Calibrate confidence thresholds for their use case (a threshold-sweep sketch follows the getting-started example below)
- ✅ Test on representative samples before deployment
- ✅ Use the model as one component in a multi-model system

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load the fine-tuned classifier and its tokenizer
model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Prepare input in the same "Question: ... Context: ..." format used during training
question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
text = f"Question: {question} Context: {context}"

# Inference
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
prediction = torch.argmax(probs).item()

# Results
labels = ["No Answer", "Has Answer", "Yes/No"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probs[prediction].item():.2%}")
```
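The pre-filtering use case from Direct Use can be implemented as a small gate in front of a more expensive QA stage. A minimal sketch, assuming the checkpoint loads as above; the `0.8` threshold and the `classify_batch`/`should_run_expensive_qa` helpers are illustrative names, not part of the released model:

```python
import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification

model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["No Answer", "Has Answer", "Yes/No"]

def classify_batch(questions, contexts):
    """Classify question-context pairs in one padded batch."""
    texts = [f"Question: {q} Context: {c}" for q, c in zip(questions, contexts)]
    inputs = tokenizer(texts, return_tensors="pt", max_length=256,
                       truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    return [(LABELS[p], c) for p, c in zip(pred.tolist(), conf.tolist())]

def should_run_expensive_qa(question, context, threshold=0.8):
    """Gate: only pass pairs confidently classified as answerable.

    The 0.8 threshold is a placeholder; calibrate it on a held-out
    sample from your own domain (see Recommendations above).
    """
    [(label, confidence)] = classify_batch([question], [context])
    return label != "No Answer" and confidence >= threshold
```

Batching the classifier this way keeps the gate cheap relative to the extractive or generative stage it protects.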
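Because the model may be overconfident on ambiguous inputs (see Risks above), the threshold-calibration recommendation is worth automating. A minimal sketch of a threshold sweep, assuming you have collected labeled `(question, context, is_answerable)` triples from your own domain; the two entries in `val_set` are placeholders:

```python
import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification

model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Placeholder validation data: replace with labeled pairs from your domain.
val_set = [
    ("What is the capital of France?",
     "Paris is the capital and most populous city of France.", True),
    ("What is the population of Mars?",
     "Earth is the third planet from the Sun.", False),
]

def p_answerable(question, context):
    """Probability mass on the two answerable classes (labels 1 and 2)."""
    text = f"Question: {question} Context: {context}"
    inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    return (probs[1] + probs[2]).item()

# Sweep candidate thresholds and report precision/recall at each one.
for threshold in (0.5, 0.6, 0.7, 0.8, 0.9):
    preds = [p_answerable(q, c) >= threshold for q, c, _ in val_set]
    gold = [y for _, _, y in val_set]
    tp = sum(p and g for p, g in zip(preds, gold))
    precision = tp / max(sum(preds), 1)
    recall = tp / max(sum(gold), 1)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Pick the smallest threshold that meets your precision target, then re-check it whenever the input domain shifts.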