---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- question-answering
- deberta
- deberta-v3
- natural-questions
- pytorch
- transformers
- kaggle
- tensorflow2-qa
- nq
datasets:
- google/natural_questions
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-small
model-index:
- name: deberta-v3-nq-classification
  results:
  - task:
      type: text-classification
      name: Question Answering Classification
    dataset:
      name: Natural Questions (Simplified)
      type: natural_questions
      config: simplified
      split: validation
    metrics:
    - type: accuracy
      value: 85.42
      name: Accuracy
    - type: f1
      value: 82.34
      name: Macro F1
    - type: precision
      value: 84.21
      name: Macro Precision
    - type: recall
      value: 83.67
      name: Macro Recall
widget:
- text: "Question: What is the capital of France? Context: Paris is the capital and most populous city of France, with an estimated population of 2,102,650 residents as of 1 January 2023."
  example_title: "Factual Question"
- text: "Question: Is Paris the capital of France? Context: Paris is the capital and most populous city of France."
  example_title: "Yes/No Question"
- text: "Question: What is the population of Mars? Context: Earth is the third planet from the Sun and the only astronomical object known to harbor life."
  example_title: "No Answer"
---
|
|
|
|
|
# DeBERTa-v3-Small for Natural Questions Classification |
|
|
|
|
|
|
|
|
|
|
This model is a fine-tuned version of [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the Natural Questions dataset. It classifies question-context pairs into three categories: **No Answer**, **Has Answer**, or **Yes/No**, achieving 85.42% accuracy and 82.34% macro F1 score. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
|
|
|
|
|
This is a DeBERTa-v3-Small model fine-tuned for question-answering classification. Given a question and a context passage, it predicts one of:

- 🔴 **No Answer** (Label 0): The context doesn't contain an answer
- 🟢 **Has Answer** (Label 1): The context contains a specific answer
- 🔵 **Yes/No** (Label 2): The question requires a YES/NO response
|
|
|
|
|
The model was trained on the Natural Questions dataset as part of the TensorFlow 2.0 Question Answering Kaggle competition. |
|
|
|
|
|
- **Developed by:** [Your Name] |
|
|
- **Funded by:** Self-funded / Academic Project
|
|
- **Shared by:** [Your Organization/University]
|
|
- **Model type:** Transformer-based Sequence Classification (DeBERTa-v3) |
|
|
- **Language(s) (NLP):** English (en) |
|
|
- **License:** MIT |
|
|
- **Finetuned from model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
|
|
|
|
|
- **Repository:** [GitHub](https://github.com/yourusername/deberta-nq-classification) |
|
|
- **Paper:** [DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training](https://arxiv.org/abs/2111.09543) |
|
|
- **Demo:** [Gradio Space](https://huggingface.co/spaces/your-username/nq-qa-demo) |
|
|
|
|
|
## Uses |
|
|
|
|
|
|
|
|
|
|
### Direct Use |
|
|
|
|
|
|
|
|
|
|
The model can be used directly for: |
|
|
- **Question Answering System Pre-filtering**: Filter out unanswerable questions before expensive downstream processing (see the sketch after this list)
|
|
- **Search Result Classification**: Determine if search results contain relevant answers |
|
|
- **Customer Support Routing**: Route questions based on answer availability |
|
|
- **Educational Assessment**: Evaluate if reading passages can answer questions |
|
|
- **Information Retrieval**: Assess document relevance for QA tasks |
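
For the pre-filtering use case, a minimal sketch using the `pipeline` API (this assumes the checkpoint id from the usage example below, and that the checkpoint's `id2label` config maps class 0 to `No Answer`):

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mohamedsa1/deberta-v3-nq-classification",
)

def worth_running_qa(question: str, context: str, threshold: float = 0.5) -> bool:
    """Return False when the classifier is confident the context holds no answer."""
    pred = classifier(f"Question: {question} Context: {context}")[0]
    return not (pred["label"] == "No Answer" and pred["score"] >= threshold)

# Only invoke an expensive extractive/generative QA model when this returns True
print(worth_running_qa(
    "What is the population of Mars?",
    "Earth is the third planet from the Sun.",
))
```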
|
|
|
|
|
### Downstream Use |
|
|
|
|
|
|
|
|
|
|
The model serves as a foundation for: |
|
|
- **Multi-stage QA Pipelines**: First stage before extractive/generative QA models |
|
|
- **Hybrid QA Systems**: Combine with span extraction for end-to-end QA |
|
|
- **Dialog Systems**: Determine whether a chatbot has sufficient context to answer
|
|
- **Domain Adaptation**: Fine-tune on domain-specific datasets (a fine-tuning sketch follows this list)
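
For domain adaptation, a minimal fine-tuning sketch with the `Trainer` API (the checkpoint id, toy data, and hyperparameters are placeholders, not a tested recipe):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "mohamedsa1/deberta-v3-nq-classification"  # checkpoint from this card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Toy in-memory domain data; replace with your own labeled examples
train = Dataset.from_dict({
    "text": [
        "Question: Is aspirin an NSAID? Context: Aspirin is a nonsteroidal anti-inflammatory drug.",
        "Question: What causes tides? Context: Paris is the capital of France.",
    ],
    "label": [2, 0],  # 0 = No Answer, 1 = Has Answer, 2 = Yes/No
})
train = train.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="deberta-nq-domain", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```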
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
|
|
|
|
|
❌ **Not suitable for:**
|
|
- Extractive answer span prediction (only classifies, doesn't extract) |
|
|
- Generative question answering |
|
|
- Non-English languages |
|
|
- Very long documents (inputs are truncated to 256 tokens)
|
|
- Medical/legal decision-making |
|
|
- Fact verification |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
|
|
|
|
|
**Limitations:** |
|
|
- Context limited to 256 tokens (a sliding-window workaround is sketched after this list)
|
|
- Wikipedia-biased training data |
|
|
- Trained on 10,000 examples (subset of full dataset) |
|
|
- May struggle with complex reasoning questions |
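
To stretch past the 256-token limit, one workaround is to classify overlapping windows of a long document and aggregate the per-window predictions. A minimal sketch; the checkpoint id matches the usage example below, and the stride and cutoff values are arbitrary placeholders to tune on validation data:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)  # fast tokenizer, needed for windowing
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def classify_long(question: str, document: str) -> int:
    enc = tokenizer(
        f"Question: {question} Context: {document}",
        max_length=256, truncation=True,
        stride=64,                       # 64-token overlap between windows (arbitrary)
        return_overflowing_tokens=True,  # one encoded row per 256-token window
        padding=True, return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping field, not a model input
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)
    # Report "No Answer" (0) only if no window finds an answer; otherwise take the
    # non-zero label from the most confident window.
    window_conf, window_label = probs[:, 1:].max(dim=-1)
    if window_conf.max().item() < 0.5:   # arbitrary cutoff, tune on held-out data
        return 0
    return int(window_label[window_conf.argmax()]) + 1
```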
|
|
|
|
|
**Biases:** |
|
|
- Better on factual "what/when/where" questions |
|
|
- Inherits biases from Wikipedia and base model |
|
|
- Performance varies across domains |
|
|
|
|
|
**Risks:** |
|
|
- May be overconfident on ambiguous inputs |
|
|
- False negatives on complex phrasings |
|
|
- Vulnerable to adversarial examples |
|
|
|
|
|
### Recommendations |
|
|
|
|
|
|
|
|
|
|
Users should:

- ✅ Implement human review for critical applications
- ✅ Monitor performance across different domains
- ✅ Calibrate confidence thresholds for their use case (see the sketch below)
- ✅ Test on representative samples
- ✅ Use the model as one component in multi-model systems
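
As a starting point for threshold calibration, a minimal sketch that escalates low-confidence predictions to human review (the 0.7 cutoff is an arbitrary placeholder; tune it on held-out data):

```python
import torch

LABELS = ["No Answer", "Has Answer", "Yes/No"]

def classify_or_escalate(logits: torch.Tensor, threshold: float = 0.7):
    """Map model logits (shape [1, 3], as produced in the example below) to a
    label, escalating when the top probability falls under the threshold."""
    probs = torch.softmax(logits, dim=-1)[0]
    confidence, pred = probs.max(dim=-1)
    if confidence.item() < threshold:
        return "needs_human_review", confidence.item()
    return LABELS[int(pred)], confidence.item()
```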
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
Use the code below to get started with the model. |
|
|
|
|
|
```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load model and tokenizer
model_name = "mohamedsa1/deberta-v3-nq-classification"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Prepare input in the "Question: ... Context: ..." format used during training
question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
text = f"Question: {question} Context: {context}"

# Inference
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
prediction = torch.argmax(probs).item()

# Results
labels = ["No Answer", "Has Answer", "Yes/No"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probs[prediction]:.2%}")
```
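
The same objects also support batched scoring; a small sketch reusing `tokenizer`, `model`, and `labels` from the example above:

```python
# Score several question-context pairs in one forward pass
texts = [
    "Question: Is Paris the capital of France? Context: Paris is the capital of France.",
    "Question: What is the population of Mars? Context: Earth is the third planet from the Sun.",
]
batch = tokenizer(texts, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    batch_probs = torch.softmax(model(**batch).logits, dim=-1)
for text, p in zip(texts, batch_probs):
    print(f"{labels[int(p.argmax())]:<10} {p.max():.2%}  {text[:60]}")
```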