+---
+license: mit
+datasets:
+- ntphuc149/ViBidLQA
+language:
+- vi
+metrics:
+- exact_match
+- f1
+base_model: nguyenvulebinh/vi-mrc-large
+pipeline_tag: question-answering
+library_name: transformers
+tags:
+- legal
+- question-answering
+- machine-reading-comprehension
+- vietnamese
+- bidding-law
+---
+# ViBidLEQA_large: A Vietnamese Bidding Law Extractive Question Answering Model
+## Overview
+ViBidLEQA_large is an extractive question answering (EQA) model for the Vietnamese bidding law domain. It is fine-tuned from the nguyenvulebinh/vi-mrc-large checkpoint on the ViBidLQA dataset and extracts answer spans from legal passages in response to bidding law questions, reaching 88.30 Exact Match and 94.25 F1 on the ViBidLQA test set.
+## Model Description
+- **Task**: Extractive Question Answering
+- **Domain**: Vietnamese Bidding Law
+- **Base Model**: nguyenvulebinh/vi-mrc-large
+- **Approach**: Fine-tuning
+- **Language**: Vietnamese
+## Dataset
+The ViBidLQA dataset consists of:
+- **Training set**: 5,300 samples
+- **Test set**: 1,000 samples
+- **Data Creation Process**:
+  - Training data was automatically generated using Claude 3.5 Sonnet and validated by two legal experts
+  - The test set was manually created and verified by two Vietnamese legal experts
+  - All samples focus on Vietnamese bidding law content
+## Performance
+Results on the ViBidLQA test set (1,000 samples):
+| Metric | Score |
+|--------|-------|
+| Exact Match | 88.30 |
+| F1-Score | 94.25 |
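For readers unfamiliar with these metrics, the sketch below shows one common way to compute Exact Match and token-level F1 for a single prediction, in the style of SQuAD evaluation. The normalization (lowercasing, removing ASCII punctuation, splitting on whitespace) is an assumption made for illustration; this card does not state the exact evaluation script behind the scores above, and the function names are hypothetical.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace (assumed normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall against the reference."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # If either side is empty, the score is 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are typically the mean of these per-sample values, multiplied by 100.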
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForQuestionAnswering
+import torch
+
+# Load model and tokenizer
+model_id = "ntphuc149/ViBidLEQA_large"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForQuestionAnswering.from_pretrained(model_id)
+model.eval()
+
+# Example question ("What is limited bidding?") and a context passage that answers it
+question = "Thế nào là đấu thầu hạn chế?"
+context = "Đấu thầu hạn chế là phương thức lựa chọn nhà thầu trong đó chỉ một số nhà thầu đáp ứng yêu cầu về năng lực và kinh nghiệm được bên mời thầu mời tham gia."
+
+# Tokenize the question-context pair; inputs longer than 512 tokens are truncated
+inputs = tokenizer(
+    question,
+    context,
+    return_tensors="pt",
+    max_length=512,
+    truncation=True,
+)
+
+# Run inference without tracking gradients
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Most likely start and end token positions (the end index is inclusive)
+answer_start = int(torch.argmax(outputs.start_logits))
+answer_end = int(torch.argmax(outputs.end_logits))
+
+# Decode the span; an end position before the start means no valid span was found
+if answer_end >= answer_start:
+    span_ids = inputs["input_ids"][0][answer_start : answer_end + 1]
+    answer = tokenizer.decode(span_ids, skip_special_tokens=True).strip()
+else:
+    answer = ""
+
+print(f"Question: {question}")
+print(f"Answer: {answer}")
+```
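Taking the argmax of the start and end logits independently can yield an end position that precedes the start. A common alternative is to score candidate pairs jointly and keep the best valid one. The sketch below is a minimal pure-Python version of that idea; `best_span`, `max_answer_len`, and `top_k` are hypothetical names and defaults chosen for illustration, not part of this model or of the transformers API. It does not mask out question tokens, which a production decoder usually would.

```python
def best_span(start_logits, end_logits, max_answer_len=64, top_k=20):
    """Return (start, end, score) maximizing start + end logit with start <= end.

    Only the top_k highest-scoring start and end positions are considered.
    Returns None if no valid pair exists among the candidates.
    """
    top_starts = sorted(
        range(len(start_logits)), key=lambda i: start_logits[i], reverse=True
    )[:top_k]
    top_ends = sorted(
        range(len(end_logits)), key=lambda i: end_logits[i], reverse=True
    )[:top_k]

    best = None
    for s in top_starts:
        for e in top_ends:
            # Skip spans that end before they start or exceed the length limit.
            if e < s or e - s + 1 > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best
```

With the tensors from the snippet above, the inputs would be `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`, and the returned indices slice `inputs["input_ids"][0]` with an inclusive end.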
+## Applications
+This model is suited to:
+- Legal document analysis systems
+- Bidding law information retrieval systems
+- Legal advisory chatbots
+- Automated question-answering systems for bidding law
+- Legal research and documentation tools
+## Limitations
+- **Domain Specificity**: The model is trained specifically for Vietnamese bidding law and may not generalize well to other legal domains
+- **Language Constraint**: Optimized for Vietnamese only
+- **Context Length**: Maximum input length is 512 tokens for the question and context combined
+- **Legal Disclaimer**: Should be used as a reference tool, not as a replacement for professional legal advice
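One common way to work around the 512-token limit is to split a long context into overlapping windows, run the model on each, and keep the highest-scoring answer. The sketch below shows only the windowing step on a plain list of tokens. `sliding_windows` is a hypothetical helper, and the default `window` and `stride` values are illustrative assumptions; the window must leave room for the question and special tokens. Hugging Face tokenizers offer the `stride` and `return_overflowing_tokens` arguments for the same purpose.

```python
def sliding_windows(tokens, window=384, stride=128):
    """Split tokens into overlapping windows.

    Consecutive windows share `stride` tokens, so an answer that straddles
    one window boundary is fully contained in a neighbouring window as long
    as it is no longer than `stride`. Returns (start_offset, window_tokens) pairs.
    """
    if window <= stride:
        raise ValueError("window must be larger than stride")

    windows = []
    start = 0
    while True:
        end = min(start + window, len(tokens))
        windows.append((start, tokens[start:end]))
        if end == len(tokens):
            break
        # Advance so that the next window overlaps the current one by `stride` tokens.
        start += window - stride
    return windows
```

The start offsets let a predicted span inside a window be mapped back to positions in the full context.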
+## Citation
+```bibtex
+Coming soon...
+```
+## Contact
+For questions, feedback, or collaborations:
+- Email: nguyentruongphuc_12421TN@utehy.edu.vn
+- GitHub: [@ntphuc149](https://github.com/ntphuc149)
+- HuggingFace: [@ntphuc149](https://huggingface.co/ntphuc149)
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.