**🧠 Q&AMODEL-SQUAD**

An extractive question answering model based on `roberta-base-squad2` and fine-tuned on the SQuAD v2.0 dataset. It predicts answer spans from a context passage and can flag questions that the passage does not answer.

---

✨ **Model Highlights**

- Based on `roberta-base-squad2`
- Fine-tuned on SQuAD v2.0 (or your custom QA dataset)
- ⚡ Supports extractive question answering: finds precise answers in context passages
- 💾 Suitable for real-time, low-latency inference on both CPU and GPU
- 🛠️ Easily integrated into web apps, enterprise tools, and virtual assistants
- Handles unanswerable questions through no-answer detection (learned from SQuAD v2.0)
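
The no-answer detection mentioned above can be illustrated with a minimal sketch. It assumes the common SQuAD v2.0 decoding recipe, in which the logits at index 0 (the `<s>` token) score the "no answer" option; the card does not confirm that this model is decoded this way. The function names and `null_threshold` are hypothetical, and the sketch ignores the detail that real spans must lie inside the context rather than the question.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (score, start, end) for the highest-scoring non-null span.

    Assumption: index 0 is reserved for the "no answer" prediction,
    so candidate spans start at index 1.
    """
    best = (float("-inf"), -1, -1)
    for i in range(1, len(start_logits)):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best[0]:
                best = (score, i, j)
    return best


def predict_span(start_logits, end_logits, null_threshold=0.0):
    """Return (start, end) token indices, or None when "no answer" wins."""
    null_score = start_logits[0] + end_logits[0]
    span_score, start, end = best_span(start_logits, end_logits)
    # Predict "no answer" when the null score beats the best span by the threshold.
    if null_score - span_score > null_threshold:
        return None
    return (start, end)
```

Raising `null_threshold` makes the sketch more willing to return a span; lowering it makes "no answer" more likely. In the Transformers question-answering pipeline, empty answers are only returned when `handle_impossible_answer=True` is passed.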

---

**Intended Uses**

- ✅ Customer support bots that extract answers from product manuals or FAQs
- ✅ Educational tools that answer student queries based on textbooks or syllabi
- ✅ Legal, financial, or technical document analysis
- ✅ Search engines with context-aware question answering
- ✅ Chatbots that require contextual comprehension for precise responses

---

🚫 **Limitations**

- ❌ Trained primarily on formal text; performance may degrade on informal or slang-heavy input
- ❌ Does not support multi-hop questions that require reasoning across multiple paragraphs
- ❌ May struggle with ambiguous questions or contexts that contain multiple possible answers
- ❌ Not designed for very long documents (performance may drop for inputs longer than 512 tokens)
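
One common way to work around the 512-token limit is to split a long context into overlapping windows and run the model on each window. The sketch below shows only the windowing step on a plain list of token IDs. The function name and default sizes are illustrative assumptions, not part of this repository, and the sketch ignores the question tokens that also count toward the limit.

```python
def sliding_windows(token_ids, max_len=384, stride=128):
    """Split token_ids into windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens, so an answer cut off
    at one window boundary appears whole in a neighbouring window.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows
```

The Transformers question-answering pipeline performs a similar split internally and exposes it through its `max_seq_len` and `doc_stride` arguments, so this helper is only needed when calling the model directly.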

---

🏋️ **Training Details**

| Field          | Value                          |
| -------------- | ------------------------------ |
| **Base Model** | `roberta-base-squad2`          |
| **Dataset**    | SQuAD v2.0                     |
| **Framework**  | PyTorch with Transformers      |
| **Epochs**     | 3                              |
| **Batch Size** | 16                             |
| **Optimizer**  | AdamW                          |
| **Loss**       | CrossEntropyLoss (token-level) |
| **Device**     | Trained on CUDA-enabled GPU    |
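
To make the loss row concrete: in extractive QA the token-level cross-entropy is usually computed twice, once over the start-position logits and once over the end-position logits, and the two values are averaged. The pure-Python sketch below illustrates that calculation under this assumption. It is not the training code for this model, and the function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def span_loss(start_logits, end_logits, start_pos, end_pos):
    """Average of the start-position and end-position cross-entropy."""
    start_loss = cross_entropy(start_logits, start_pos)
    end_loss = cross_entropy(end_logits, end_pos)
    return 0.5 * (start_loss + end_loss)
```

For an unanswerable SQuAD v2.0 example, both target positions are typically set to index 0, so the same loss also trains the no-answer prediction.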

---

**Evaluation Metrics**

| Metric    | Score |
| --------- | ----- |
| Accuracy  | 0.80  |
| F1-Score  | 0.78  |
| Precision | 0.79  |
| Recall    | 0.78  |
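
The card does not state how these scores were computed. As an illustration only, the sketch below shows the token-overlap precision, recall, and F1 conventionally used for SQuAD-style answers, with a simplified version of the usual answer normalization (lowercasing, stripping punctuation and English articles). The function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, and split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def overlap_scores(prediction, reference):
    """Return (precision, recall, f1) based on shared tokens."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty answers count as a correct "no answer" prediction.
        match = float(pred == ref)
        return match, match, match
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0, 0.0, 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Dataset-level scores are then the mean of the per-question values, taking the best score over all reference answers for each question.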

---

**Usage**

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "AventIQ-AI/QA-Squad-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
model.eval()

# Inference: build a question-answering pipeline from the loaded model
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Provide a context and a question
context = """
The Amazon rainforest, also known as Amazonia, is a moist broadleaf tropical rainforest in the Amazon biome
that covers most of the Amazon basin of South America. This region includes territory belonging to nine nations.
"""
question = "What is the Amazon rainforest also known as?"

# Run inference
result = qa_pipeline(question=question, context=context)

# Print the result
print(f"Question: {question}")
print(f"Answer: {result['answer']}")
print(f"Score: {result['score']:.4f}")
```

---

🧩 **Quantization**

Post-training static quantization was applied with PyTorch to reduce model size and accelerate inference on edge devices.
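
For readers unfamiliar with the idea, the sketch below shows the affine int8 mapping that underlies static quantization. A scale and zero point are calibrated from an observed value range ahead of time, then used to map floats to 8-bit integers and back. This is a pure-Python illustration under that assumption, not PyTorch's API and not the procedure used to produce the files in this repository. All names are hypothetical.

```python
def calibrate(values, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the observed value range."""
    # The range is widened to include 0 so that 0.0 maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]
```

Each value is stored in one byte instead of four, and the round-trip error stays within about half a quantization step (`scale / 2`) for values inside the calibrated range.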

---

**Repository Structure**

```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
└── README.md            # Model card
```

---

🤝 **Contributing**

Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.