mohamedsa1 committed on
Commit ea53bd1
·
verified ·
1 Parent(s): e0074a1

Update README.md

Files changed (1)
  1. README.md +125 -95
README.md CHANGED
@@ -1,160 +1,190 @@
  ---

  # DeBERTa-v3-Small for Natural Questions Classification

- <div align="center">
- <img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm.svg" alt="Hugging Face Model">
- <img src="https://img.shields.io/badge/PyTorch-2.0+-red.svg" alt="PyTorch">
- <img src="https://img.shields.io/badge/Transformers-4.30+-blue.svg" alt="Transformers">
- <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License">
- </div>
-
- ## Model Summary
-
- This model is a fine-tuned version of **microsoft/deberta-v3-small** specifically trained for question-answering classification on the **Natural Questions** dataset. It classifies question-context pairs into three distinct categories, helping determine whether a given context contains an answer to a question and what type of answer it is.
-
- The model achieves **85.42% accuracy** and **82.34% macro F1 score** on the validation set, making it highly effective for question-answering classification tasks in production environments.

- ### Key Features
-
- - 🎯 **Three-way Classification**: Distinguishes between no answer, factual answers, and yes/no questions
- - ⚡ **Fast Inference**: ~45ms per query on GPU, ~38ms on quantized CPU
- - 🔧 **Production-Ready**: Optimized with mixed precision training and dynamic quantization
- - 📊 **High Performance**: 85%+ accuracy on diverse question types
- - 🌐 **Real-world Training**: Trained on actual user queries from Google Search

  ## Model Details

  ### Model Description

- This model performs **question-answering classification** by analyzing a question-context pair and predicting one of three outcomes:

- - 🔴 **Label 0 - No Answer**: The provided context does not contain sufficient information to answer the question
- - 🟢 **Label 1 - Has Answer**: The context contains a specific answer (either short span or longer passage)
- - 🔵 **Label 2 - Yes/No**: The question requires a binary YES or NO response

- The model was developed as part of the **TensorFlow 2.0 Question Answering** Kaggle competition and represents a practical approach to pre-filtering question-answering systems in production environments.

- - **Developed by:** [Your Name/Organization]
- - **Model type:** DeBERTa-v3 (Sequence Classification)
- - **Language(s):** English
  - **License:** MIT
  - **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)
- - **Parameters:** ~140 million
- - **Model size:** ~540 MB (full precision), ~280 MB (quantized)

  ### Model Sources

- - **Repository:** [GitHub Repository](https://github.com/yourusername/deberta-nq-classification)
  - **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543)
- - **Dataset:** [Natural Questions](https://ai.google.com/research/NaturalQuestions)
  - **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo)
- - **Competition:** [TensorFlow 2.0 Question Answering](https://www.kaggle.com/c/tensorflow2-question-answering)

  ## Uses

  ### Direct Use

- The model can be directly used for:

- 1. **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive span extraction
- 2. **Search Result Classification**: Determine if search results contain answers to user queries
- 3. **Customer Support Routing**: Route questions based on whether knowledge base contains answers
- 4. **Educational Assessment**: Classify whether reading passages can answer comprehension questions
- 5. **Information Retrieval**: Evaluate document relevance for question-answering tasks

  ### Downstream Use

- This model serves as an excellent foundation for:

- - **Multi-stage QA Pipelines**: Use as first stage before extractive or generative QA models
- - **Hybrid QA Systems**: Combine with span extraction models for end-to-end question answering
- - **Dialog Systems**: Determine if chatbot has sufficient context to answer user queries
- - **Domain Adaptation**: Fine-tune further on domain-specific question-answering datasets
- - **Active Learning**: Prioritize annotation of examples where model is uncertain

  ### Out-of-Scope Use

- The model is **not suitable** for:

- - ❌ **Extractive Answer Span Prediction**: This model only classifies, it doesn't extract specific answer text
- - ❌ **Generative Question Answering**: Cannot generate free-form answers to questions
- - ❌ **Non-English Languages**: Trained exclusively on English text
- - ❌ **Long-Form Context**: Limited to 256 tokens; very long documents require truncation
- - ❌ **Real-time Medical/Legal Advice**: Should not be used for critical decision-making
- - ❌ **Fact Verification**: Not designed to validate factual accuracy of statements

  ## Bias, Risks, and Limitations

- ### Known Limitations

- 1. **Context Length Restriction**: Maximum 256 tokens may truncate important information in long documents
- 2. **Wikipedia Bias**: Training on Wikipedia-based questions may not generalize perfectly to other domains
- 3. **Binary Yes/No Ambiguity**: Complex questions requiring nuanced answers may be misclassified as yes/no
- 4. **Temporal Knowledge Cutoff**: Training data reflects knowledge up to a certain date
- 5. **Language Variety**: May perform differently across English dialects and formal/informal language
- 6. **Sample Size**: Trained on 10,000 examples; full dataset training could improve performance

- ### Potential Biases

- - **Topic Bias**: Better performance on Wikipedia-common topics (history, geography, science)
- - **Question Type Bias**: May favor factual "what/when/where" questions over complex "why/how" questions
- - **Cultural Bias**: Inherits biases from DeBERTa pre-training and Wikipedia content
- - **Length Bias**: Performance may vary based on context and question length
- - **Demographic Representation**: Training data may not equally represent all perspectives
-
- ### Risks
-
- - **Overconfidence**: Model may confidently predict "has answer" even when context is ambiguous
- - **False Negatives**: May miss valid answers in complex or indirect phrasings
- - **Adversarial Vulnerability**: Can be fooled by carefully crafted misleading contexts
- - **Downstream Amplification**: Errors in classification stage cascade to downstream QA components

  ### Recommendations

- Users should:

- - ✅ **Validate Critical Applications**: Implement human-in-the-loop for high-stakes decisions
- - ✅ **Monitor Performance**: Track metrics across different question types and domains
- - ✅ **Calibrate Thresholds**: Adjust confidence thresholds based on use case requirements
- - ✅ **Test Diverse Inputs**: Evaluate on representative samples from target domain
- - ✅ **Combine with Other Signals**: Use as one component in multi-model systems
- - ✅ **Regular Updates**: Retrain periodically with new data to maintain performance

  ## How to Get Started with the Model

- ### Quick Start

  ```python
  from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
  import torch

- # Load model and tokenizer
  model_name = "mohamedsa1/deberta-v3-nq-classification"
  tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
  model = DebertaV2ForSequenceClassification.from_pretrained(model_name)

  # Prepare input
  question = "What is the capital of France?"
- context = "Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents."
  text = f"Question: {question} Context: {context}"

- # Tokenize
- inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
-
  # Inference
- model.eval()
  with torch.no_grad():
      outputs = model(**inputs)
- logits = outputs.logits
- probabilities = torch.nn.functional.softmax(logits, dim=-1)[0]
- predicted_class = torch.argmax(probabilities).item()

- # Interpret results
  labels = ["No Answer", "Has Answer", "Yes/No"]
- print(f"Question: {question}")
- print(f"Prediction: {labels[predicted_class]}")
- print(f"Confidence: {probabilities[predicted_class]:.2%}")
- print(f"\nAll Probabilities:")
- for label, prob in zip(labels, probabilities):
-     print(f" {label}: {prob:.2%}")

  ---
+ language:
+ - en
+ license: mit
+ library_name: transformers
+ tags:
+ - text-classification
+ - question-answering
+ - deberta
+ - deberta-v3
+ - natural-questions
+ - pytorch
+ - transformers
+ - kaggle
+ - tensorflow2-qa
+ - nq
+ datasets:
+ - google/natural_questions
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ pipeline_tag: text-classification
+ base_model: microsoft/deberta-v3-small
+ model-index:
+ - name: deberta-v3-nq-classification
+   results:
+   - task:
+       type: text-classification
+       name: Question Answering Classification
+     dataset:
+       name: Natural Questions (Simplified)
+       type: natural_questions
+       config: simplified
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 85.42
+       name: Accuracy
+     - type: f1
+       value: 82.34
+       name: Macro F1
+     - type: precision
+       value: 84.21
+       name: Macro Precision
+     - type: recall
+       value: 83.67
+       name: Macro Recall
+ widget:
+ - text: "Question: What is the capital of France? Context: Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents as of 1 January 2023."
+   example_title: "Factual Question"
+ - text: "Question: Is Paris the capital of France? Context: Paris is the capital and most populous city of France."
+   example_title: "Yes/No Question"
+ - text: "Question: What is the population of Mars? Context: Earth is the third planet from the Sun and the only astronomical object known to harbor life."
+   example_title: "No Answer"
+ ---

  # DeBERTa-v3-Small for Natural Questions Classification

+ <!-- Provide a quick summary of what the model is/does. -->

+ This model is a fine-tuned version of [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the Natural Questions dataset. It classifies question-context pairs into three categories: **No Answer**, **Has Answer**, or **Yes/No**, achieving 85.42% accuracy and 82.34% macro F1 score.

  ## Model Details

  ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ This is a DeBERTa-v3-Small model fine-tuned for question-answering classification. Given a question and context, it predicts whether:
+ - 🔴 **No Answer** (Label 0): The context doesn't contain an answer
+ - 🟢 **Has Answer** (Label 1): The context contains a specific answer
+ - 🔵 **Yes/No** (Label 2): The question requires a YES/NO response

+ The model was trained on the Natural Questions dataset as part of the TensorFlow 2.0 Question Answering Kaggle competition.
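
+ For downstream code, the label scheme above is just a small lookup table. A minimal sketch (the names mirror the list above; verify against the `id2label` mapping stored in the model config, which may differ):

```python
# Label ids as documented in this model card; illustrative only.
# Check model.config.id2label for the authoritative mapping.
ID2LABEL = {0: "No Answer", 1: "Has Answer", 2: "Yes/No"}
LABEL2ID = {name: i for i, name in ID2LABEL.items()}

print(ID2LABEL[2])             # Yes/No
print(LABEL2ID["Has Answer"])  # 1
```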

+ - **Developed by:** [Your Name]
+ - **Funded by [optional]:** Self-funded / Academic Project
+ - **Shared by [optional]:** [Your Organization/University]
+ - **Model type:** Transformer-based Sequence Classification (DeBERTa-v3)
+ - **Language(s) (NLP):** English (en)
  - **License:** MIT
  - **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)

  ### Model Sources

+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [GitHub](https://github.com/yourusername/deberta-nq-classification)
  - **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543)
  - **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo)

  ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
  ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ The model can be used directly for:
+ - **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive processing
+ - **Search Result Classification**: Determine if search results contain relevant answers
+ - **Customer Support Routing**: Route questions based on answer availability
+ - **Educational Assessment**: Evaluate if reading passages can answer questions
+ - **Information Retrieval**: Assess document relevance for QA tasks
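
+ The pre-filtering use case amounts to a gate on the classifier's softmax output: run the expensive extractive QA stage only when the model confidently predicts an answer is present. A sketch under illustrative assumptions; `answer_gate` and the 0.5 threshold are not part of the released model:

```python
LABELS = ["No Answer", "Has Answer", "Yes/No"]

def answer_gate(probs, threshold=0.5):
    """Decide from one softmax distribution whether to run a downstream QA model."""
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best]
    # Only pay for span extraction when "Has Answer" is confident enough.
    return label, (label == "Has Answer" and probs[best] >= threshold)

print(answer_gate([0.05, 0.90, 0.05]))  # ('Has Answer', True)
print(answer_gate([0.70, 0.20, 0.10]))  # ('No Answer', False)
```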

  ### Downstream Use

+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ The model serves as a foundation for:
+ - **Multi-stage QA Pipelines**: First stage before extractive/generative QA models
+ - **Hybrid QA Systems**: Combine with span extraction for end-to-end QA
+ - **Dialog Systems**: Determine if chatbot has sufficient context
+ - **Domain Adaptation**: Fine-tune on domain-specific datasets

  ### Out-of-Scope Use

+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ ❌ **Not suitable for:**
+ - Extractive answer span prediction (only classifies, doesn't extract)
+ - Generative question answering
+ - Non-English languages
+ - Very long documents (>256 tokens without truncation)
+ - Medical/legal decision-making
+ - Fact verification
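
+ For documents over the 256-token limit, a common workaround (not something the model does internally) is to classify overlapping windows of the context and aggregate the per-window predictions. A sketch of the windowing step, with illustrative `size`/`stride` values:

```python
def windows(tokens, size=256, stride=128):
    """Split a token sequence into overlapping windows of at most `size` tokens."""
    out = []
    start = 0
    while True:
        out.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window reaches the end
            break
        start += stride
    return out

chunks = windows(list(range(300)))
print(len(chunks))   # 2
print(chunks[1][0])  # 128
```

Each window is then classified separately; a simple aggregation rule is to predict "Has Answer" if any window does.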

  ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ **Limitations:**
+ - Context limited to 256 tokens
+ - Wikipedia-biased training data
+ - Trained on 10,000 examples (subset of full dataset)
+ - May struggle with complex reasoning questions

+ **Biases:**
+ - Better on factual "what/when/where" questions
+ - Inherits biases from Wikipedia and base model
+ - Performance varies across domains

+ **Risks:**
+ - May be overconfident on ambiguous inputs
+ - False negatives on complex phrasings
+ - Vulnerable to adversarial examples

 
  ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users should:
+ - ✅ Implement human review for critical applications
+ - ✅ Monitor performance across different domains
+ - ✅ Calibrate confidence thresholds for use case
+ - ✅ Test on representative samples
+ - ✅ Use as one component in multi-model systems
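
+ Threshold calibration can be done on a held-out set of (confidence, was-the-prediction-correct) pairs, picking the lowest cutoff that meets a target precision. An illustrative sketch with made-up numbers; `pick_threshold` is not shipped with the model:

```python
def pick_threshold(confidences, correct, min_precision=0.9):
    """Lowest confidence cutoff whose retained predictions meet min_precision."""
    for t in sorted(set(confidences)):
        kept = [ok for c, ok in zip(confidences, correct) if c >= t]
        if kept and sum(kept) / len(kept) >= min_precision:
            return t
    return None  # no threshold reaches the target precision

# Made-up validation confidences and correctness flags (1 = correct).
t = pick_threshold([0.55, 0.65, 0.80, 0.90, 0.95], [0, 1, 1, 1, 1])
print(t)  # 0.65
```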

  ## How to Get Started with the Model

+ Use the code below to get started with the model.

  ```python
  from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
  import torch

+ # Load model
  model_name = "mohamedsa1/deberta-v3-nq-classification"
  tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
  model = DebertaV2ForSequenceClassification.from_pretrained(model_name)

  # Prepare input
  question = "What is the capital of France?"
+ context = "Paris is the capital and most populous city of France."
  text = f"Question: {question} Context: {context}"

  # Inference
+ inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
  with torch.no_grad():
      outputs = model(**inputs)
+ probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
+ prediction = torch.argmax(probs).item()

+ # Results
  labels = ["No Answer", "Has Answer", "Yes/No"]
+ print(f"Prediction: {labels[prediction]}")
+ print(f"Confidence: {probs[prediction]:.2%}")
  ```
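
+ If you need the logits-to-probabilities step outside of torch (for example when serving an exported model), softmax is a few lines of plain Python. A sketch; the logit values below are made up:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 3.2, -1.0])
print(max(range(len(probs)), key=probs.__getitem__))  # 1
```

Whether index 1 means "Has Answer" depends on the `id2label` mapping stored in the model config.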