mohamedsa1 committed on
Commit ea53bd1
·
verified ·
1 Parent(s): e0074a1

Update README.md

Files changed (1)
  1. README.md +125 -95
README.md CHANGED
@@ -1,160 +1,190 @@
  ---

  # DeBERTa-v3-Small for Natural Questions Classification

- <div align="center">
- <img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm.svg" alt="Hugging Face Model">
- <img src="https://img.shields.io/badge/PyTorch-2.0+-red.svg" alt="PyTorch">
- <img src="https://img.shields.io/badge/Transformers-4.30+-blue.svg" alt="Transformers">
- <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License">
- </div>
-
- ## Model Summary
-
- This model is a fine-tuned version of **microsoft/deberta-v3-small** specifically trained for question-answering classification on the **Natural Questions** dataset. It classifies question-context pairs into three distinct categories, helping determine whether a given context contains an answer to a question and what type of answer it is.
-
- The model achieves **85.42% accuracy** and **82.34% macro F1 score** on the validation set, making it highly effective for question-answering classification tasks in production environments.

- ### Key Features
-
- - 🎯 **Three-way Classification**: Distinguishes between no answer, factual answers, and yes/no questions
- - ⚡ **Fast Inference**: ~45ms per query on GPU, ~38ms on quantized CPU
- - 🔧 **Production-Ready**: Optimized with mixed precision training and dynamic quantization
- - 📊 **High Performance**: 85%+ accuracy on diverse question types
- - 🌐 **Real-world Training**: Trained on actual user queries from Google Search

  ## Model Details

  ### Model Description

- This model performs **question-answering classification** by analyzing a question-context pair and predicting one of three outcomes:

- - 🔴 **Label 0 - No Answer**: The provided context does not contain sufficient information to answer the question
- - 🟢 **Label 1 - Has Answer**: The context contains a specific answer (either short span or longer passage)
- - 🔵 **Label 2 - Yes/No**: The question requires a binary YES or NO response

- The model was developed as part of the **TensorFlow 2.0 Question Answering** Kaggle competition and represents a practical approach to pre-filtering question-answering systems in production environments.

- - **Developed by:** [Your Name/Organization]
- - **Model type:** DeBERTa-v3 (Sequence Classification)
- - **Language(s):** English
  - **License:** MIT
  - **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)
- - **Parameters:** ~140 million
- - **Model size:** ~540 MB (full precision), ~280 MB (quantized)

  ### Model Sources

- - **Repository:** [GitHub Repository](https://github.com/yourusername/deberta-nq-classification)
  - **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543)
- - **Dataset:** [Natural Questions](https://ai.google.com/research/NaturalQuestions)
  - **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo)
- - **Competition:** [TensorFlow 2.0 Question Answering](https://www.kaggle.com/c/tensorflow2-question-answering)

  ## Uses

  ### Direct Use

- The model can be directly used for:

- 1. **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive span extraction
- 2. **Search Result Classification**: Determine if search results contain answers to user queries
- 3. **Customer Support Routing**: Route questions based on whether knowledge base contains answers
- 4. **Educational Assessment**: Classify whether reading passages can answer comprehension questions
- 5. **Information Retrieval**: Evaluate document relevance for question-answering tasks

  ### Downstream Use

- This model serves as an excellent foundation for:

- - **Multi-stage QA Pipelines**: Use as first stage before extractive or generative QA models
- - **Hybrid QA Systems**: Combine with span extraction models for end-to-end question answering
- - **Dialog Systems**: Determine if chatbot has sufficient context to answer user queries
- - **Domain Adaptation**: Fine-tune further on domain-specific question-answering datasets
- - **Active Learning**: Prioritize annotation of examples where model is uncertain

  ### Out-of-Scope Use

- The model is **not suitable** for:

- - ❌ **Extractive Answer Span Prediction**: This model only classifies, it doesn't extract specific answer text
- - ❌ **Generative Question Answering**: Cannot generate free-form answers to questions
- - ❌ **Non-English Languages**: Trained exclusively on English text
- - ❌ **Long-Form Context**: Limited to 256 tokens; very long documents require truncation
- - ❌ **Real-time Medical/Legal Advice**: Should not be used for critical decision-making
- - ❌ **Fact Verification**: Not designed to validate factual accuracy of statements

  ## Bias, Risks, and Limitations

- ### Known Limitations

- 1. **Context Length Restriction**: Maximum 256 tokens may truncate important information in long documents
- 2. **Wikipedia Bias**: Training on Wikipedia-based questions may not generalize perfectly to other domains
- 3. **Binary Yes/No Ambiguity**: Complex questions requiring nuanced answers may be misclassified as yes/no
- 4. **Temporal Knowledge Cutoff**: Training data reflects knowledge up to a certain date
- 5. **Language Variety**: May perform differently across English dialects and formal/informal language
- 6. **Sample Size**: Trained on 10,000 examples; full dataset training could improve performance

- ### Potential Biases

- - **Topic Bias**: Better performance on Wikipedia-common topics (history, geography, science)
- - **Question Type Bias**: May favor factual "what/when/where" questions over complex "why/how" questions
- - **Cultural Bias**: Inherits biases from DeBERTa pre-training and Wikipedia content
- - **Length Bias**: Performance may vary based on context and question length
- - **Demographic Representation**: Training data may not equally represent all perspectives
-
- ### Risks
-
- - **Overconfidence**: Model may confidently predict "has answer" even when context is ambiguous
- - **False Negatives**: May miss valid answers in complex or indirect phrasings
- - **Adversarial Vulnerability**: Can be fooled by carefully crafted misleading contexts
- - **Downstream Amplification**: Errors in classification stage cascade to downstream QA components

  ### Recommendations

- Users should:

- - ✅ **Validate Critical Applications**: Implement human-in-the-loop for high-stakes decisions
- - ✅ **Monitor Performance**: Track metrics across different question types and domains
- - ✅ **Calibrate Thresholds**: Adjust confidence thresholds based on use case requirements
- - ✅ **Test Diverse Inputs**: Evaluate on representative samples from target domain
- - ✅ **Combine with Other Signals**: Use as one component in multi-model systems
- - ✅ **Regular Updates**: Retrain periodically with new data to maintain performance

  ## How to Get Started with the Model

- ### Quick Start

  ```python
  from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
  import torch

- # Load model and tokenizer
  model_name = "mohamedsa1/deberta-v3-nq-classification"
  tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
  model = DebertaV2ForSequenceClassification.from_pretrained(model_name)

  # Prepare input
  question = "What is the capital of France?"
- context = "Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents."
  text = f"Question: {question} Context: {context}"

- # Tokenize
- inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
-
  # Inference
- model.eval()
  with torch.no_grad():
      outputs = model(**inputs)
- logits = outputs.logits
- probabilities = torch.nn.functional.softmax(logits, dim=-1)[0]
- predicted_class = torch.argmax(probabilities).item()

- # Interpret results
  labels = ["No Answer", "Has Answer", "Yes/No"]
- print(f"Question: {question}")
- print(f"Prediction: {labels[predicted_class]}")
- print(f"Confidence: {probabilities[predicted_class]:.2%}")
- print(f"\nAll Probabilities:")
- for label, prob in zip(labels, probabilities):
-     print(f" {label}: {prob:.2%}")

  ---
+ language:
+ - en
+ license: mit
+ library_name: transformers
+ tags:
+ - text-classification
+ - question-answering
+ - deberta
+ - deberta-v3
+ - natural-questions
+ - pytorch
+ - transformers
+ - kaggle
+ - tensorflow2-qa
+ - nq
+ datasets:
+ - google/natural_questions
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ pipeline_tag: text-classification
+ base_model: microsoft/deberta-v3-small
+ model-index:
+ - name: deberta-v3-nq-classification
+   results:
+   - task:
+       type: text-classification
+       name: Question Answering Classification
+     dataset:
+       name: Natural Questions (Simplified)
+       type: natural_questions
+       config: simplified
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 85.42
+       name: Accuracy
+     - type: f1
+       value: 82.34
+       name: Macro F1
+     - type: precision
+       value: 84.21
+       name: Macro Precision
+     - type: recall
+       value: 83.67
+       name: Macro Recall
+ widget:
+ - text: "Question: What is the capital of France? Context: Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents as of 1 January 2023."
+   example_title: "Factual Question"
+ - text: "Question: Is Paris the capital of France? Context: Paris is the capital and most populous city of France."
+   example_title: "Yes/No Question"
+ - text: "Question: What is the population of Mars? Context: Earth is the third planet from the Sun and the only astronomical object known to harbor life."
+   example_title: "No Answer"
+ ---

  # DeBERTa-v3-Small for Natural Questions Classification

+ <!-- Provide a quick summary of what the model is/does. -->

+ This model is a fine-tuned version of [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the Natural Questions dataset. It classifies question-context pairs into three categories: **No Answer**, **Has Answer**, or **Yes/No**, achieving 85.42% accuracy and 82.34% macro F1 score.

  ## Model Details

  ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ This is a DeBERTa-v3-Small model fine-tuned for question-answering classification. Given a question and context, it predicts whether:
+ - 🔴 **No Answer** (Label 0): The context doesn't contain an answer
+ - 🟢 **Has Answer** (Label 1): The context contains a specific answer
+ - 🔵 **Yes/No** (Label 2): The question requires a YES/NO response

+ The model was trained on the Natural Questions dataset as part of the TensorFlow 2.0 Question Answering Kaggle competition.
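
+ For downstream code, the label scheme above is just a small lookup table. A minimal sketch (the names mirror the list above; verify against the `id2label` mapping stored in the model config, which may differ):

```python
# Label ids as documented in this model card; illustrative only.
# Check model.config.id2label for the authoritative mapping.
ID2LABEL = {0: "No Answer", 1: "Has Answer", 2: "Yes/No"}
LABEL2ID = {name: i for i, name in ID2LABEL.items()}

print(ID2LABEL[2])             # Yes/No
print(LABEL2ID["Has Answer"])  # 1
```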

+ - **Developed by:** [Your Name]
+ - **Funded by [optional]:** Self-funded / Academic Project
+ - **Shared by [optional]:** [Your Organization/University]
+ - **Model type:** Transformer-based Sequence Classification (DeBERTa-v3)
+ - **Language(s) (NLP):** English (en)
  - **License:** MIT
  - **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)

  ### Model Sources

+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [GitHub](https://github.com/yourusername/deberta-nq-classification)
  - **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543)
  - **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo)

  ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
  ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ The model can be used directly for:
+ - **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive processing
+ - **Search Result Classification**: Determine if search results contain relevant answers
+ - **Customer Support Routing**: Route questions based on answer availability
+ - **Educational Assessment**: Evaluate if reading passages can answer questions
+ - **Information Retrieval**: Assess document relevance for QA tasks
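
+ The pre-filtering use case amounts to a gate on the classifier's softmax output: run the expensive extractive QA stage only when the model confidently predicts an answer is present. A sketch under illustrative assumptions; `answer_gate` and the 0.5 threshold are not part of the released model:

```python
LABELS = ["No Answer", "Has Answer", "Yes/No"]

def answer_gate(probs, threshold=0.5):
    """Decide from one softmax distribution whether to run a downstream QA model."""
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best]
    # Only pay for span extraction when "Has Answer" is confident enough.
    return label, (label == "Has Answer" and probs[best] >= threshold)

print(answer_gate([0.05, 0.90, 0.05]))  # ('Has Answer', True)
print(answer_gate([0.70, 0.20, 0.10]))  # ('No Answer', False)
```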

  ### Downstream Use

+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ The model serves as a foundation for:
+ - **Multi-stage QA Pipelines**: First stage before extractive/generative QA models
+ - **Hybrid QA Systems**: Combine with span extraction for end-to-end QA
+ - **Dialog Systems**: Determine if chatbot has sufficient context
+ - **Domain Adaptation**: Fine-tune on domain-specific datasets

  ### Out-of-Scope Use

+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ ❌ **Not suitable for:**
+ - Extractive answer span prediction (only classifies, doesn't extract)
+ - Generative question answering
+ - Non-English languages
+ - Very long documents (>256 tokens without truncation)
+ - Medical/legal decision-making
+ - Fact verification
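
+ For documents over the 256-token limit, a common workaround (not something the model does internally) is to classify overlapping windows of the context and aggregate the per-window predictions. A sketch of the windowing step, with illustrative `size`/`stride` values:

```python
def windows(tokens, size=256, stride=128):
    """Split a token sequence into overlapping windows of at most `size` tokens."""
    out = []
    start = 0
    while True:
        out.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window reaches the end
            break
        start += stride
    return out

chunks = windows(list(range(300)))
print(len(chunks))   # 2
print(chunks[1][0])  # 128
```

Each window is then classified separately; a simple aggregation rule is to predict "Has Answer" if any window does.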

  ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ **Limitations:**
+ - Context limited to 256 tokens
+ - Wikipedia-biased training data
+ - Trained on 10,000 examples (subset of full dataset)
+ - May struggle with complex reasoning questions

+ **Biases:**
+ - Better on factual "what/when/where" questions
+ - Inherits biases from Wikipedia and base model
+ - Performance varies across domains

+ **Risks:**
+ - May be overconfident on ambiguous inputs
+ - False negatives on complex phrasings
+ - Vulnerable to adversarial examples

 
  ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users should:
+ - ✅ Implement human review for critical applications
+ - ✅ Monitor performance across different domains
+ - ✅ Calibrate confidence thresholds for use case
+ - ✅ Test on representative samples
+ - ✅ Use as one component in multi-model systems
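
+ Threshold calibration can be done on a held-out set of (confidence, was-the-prediction-correct) pairs, picking the lowest cutoff that meets a target precision. An illustrative sketch with made-up numbers; `pick_threshold` is not shipped with the model:

```python
def pick_threshold(confidences, correct, min_precision=0.9):
    """Lowest confidence cutoff whose retained predictions meet min_precision."""
    for t in sorted(set(confidences)):
        kept = [ok for c, ok in zip(confidences, correct) if c >= t]
        if kept and sum(kept) / len(kept) >= min_precision:
            return t
    return None  # no threshold reaches the target precision

# Made-up validation confidences and correctness flags (1 = correct).
t = pick_threshold([0.55, 0.65, 0.80, 0.90, 0.95], [0, 1, 1, 1, 1])
print(t)  # 0.65
```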

  ## How to Get Started with the Model

+ Use the code below to get started with the model.

  ```python
  from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
  import torch

+ # Load model
  model_name = "mohamedsa1/deberta-v3-nq-classification"
  tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
  model = DebertaV2ForSequenceClassification.from_pretrained(model_name)

  # Prepare input
  question = "What is the capital of France?"
+ context = "Paris is the capital and most populous city of France."
  text = f"Question: {question} Context: {context}"

  # Inference
+ inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
  with torch.no_grad():
      outputs = model(**inputs)
+ probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
+ prediction = torch.argmax(probs).item()

+ # Results
  labels = ["No Answer", "Has Answer", "Yes/No"]
+ print(f"Prediction: {labels[prediction]}")
+ print(f"Confidence: {probs[prediction]:.2%}")
  ```
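
+ If you need the logits-to-probabilities step outside of torch (for example when serving an exported model), softmax is a few lines of plain Python. A sketch; the logit values below are made up:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 3.2, -1.0])
print(max(range(len(probs)), key=probs.__getitem__))  # 1
```

Whether index 1 means "Has Answer" depends on the `id2label` mapping stored in the model config.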