---
license: apache-2.0
tags:
- text-classification
- cybersecurity
- data-validation
- form-validation
language:
- en
pipeline_tag: text-classification
---

# Cybersecurity Data Validation Model

## 🛡️ Overview
This model validates user input against a fixed set of formatting rules for personal information fields. It performs binary classification (VALID/INVALID) on a single text string that contains all four fields: `firstName`, `address`, `mobile`, and `pincode`.

## 🎯 Model Purpose
- **Task**: Binary Text Classification (VALID/INVALID)
- **Domain**: Cybersecurity & Data Validation
- **Use Case**: Form validation, data quality checking, input sanitization

## Validation Rules

The model checks whether the input follows these formatting rules:

- **firstName**: Must be in proper case (first letter capitalized, the rest lowercase)
  - ✅ Valid: "John", "Alice", "Maria"
  - ❌ Invalid: "john", "ALICE", "mArIa"

- **address**: Each word must be capitalized (first letter uppercase, the rest lowercase)
  - ✅ Valid: "123 Main Street", "789 Pine Road"
  - ❌ Invalid: "123 main street", "789 PINE ROAD"

- **mobile**: Must be exactly 10 digits
  - ✅ Valid: "9876543210"
  - ❌ Invalid: "98765", "98765432109"

- **pincode**: Must be exactly 6 digits
  - ✅ Valid: "560001", "400001"
  - ❌ Invalid: "560", "5600012"

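For comparison with the model's predictions, the four rules above can also be written as a small deterministic checker. This is a minimal sketch of one reading of the rules, not the model's internal logic. The function name `check_record` and the helper `_address_ok` are hypothetical, and the handling of address tokens that contain digits (such as "123" or "5B") is an assumption: they are accepted as-is, and only purely alphabetic words are checked for capitalization.

```python
import re

# Interpretation of the rules listed above (assumption: ASCII letters and digits only).
PROPER_CASE = re.compile(r"[A-Z][a-z]*")  # one capital, then only lowercase letters
TEN_DIGITS = re.compile(r"[0-9]{10}")
SIX_DIGITS = re.compile(r"[0-9]{6}")


def _address_ok(address: str) -> bool:
    """Return True if every purely alphabetic word is in proper case."""
    words = address.split()
    alpha_words = [w for w in words if w.isalpha()]
    # Tokens containing digits (e.g. "123", "5B") are accepted as-is (assumption).
    return bool(words) and all(PROPER_CASE.fullmatch(w) for w in alpha_words)


def check_record(first_name: str, address: str, mobile: str, pincode: str) -> list:
    """Return the names of the fields that violate a rule (empty list if none)."""
    violations = []
    if not PROPER_CASE.fullmatch(first_name):
        violations.append("firstName")
    if not _address_ok(address):
        violations.append("address")
    if not TEN_DIGITS.fullmatch(mobile):
        violations.append("mobile")
    if not SIX_DIGITS.fullmatch(pincode):
        violations.append("pincode")
    return violations


print(check_record("John", "123 Main Street", "9876543210", "560001"))
# []
print(check_record("john", "main street", "98765", "123"))
# ['firstName', 'address', 'mobile', 'pincode']
```

A checker like this reports which field failed, which the binary VALID/INVALID label does not, so it can serve as a reference when inspecting model predictions.
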
## Usage

### Basic Usage
```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="abinashv29gmailcom/cybersec-validation-model-v1")

# Test input: all four fields in a single comma-separated string
text = "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
result = classifier(text)

print(result)
# Example output (the exact score will vary): [{'label': 'VALID', 'score': 0.95}]
```

### Batch Processing
```python
# Multiple inputs (reuses `classifier` from Basic Usage)
inputs = [
    "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001",
    "firstName: bob, address: main street, mobile: 98765, pincode: 123",
]

# Passing a list returns one result dict per input, in the same order
results = classifier(inputs)
for i, result in enumerate(results):
    status = "✅ VALID" if result['label'] == 'VALID' else "❌ INVALID"
    print(f"Input {i+1}: {status} (Confidence: {result['score']:.3f})")
```

### Integration Function
```python
def validate_user_data(firstname, address, mobile, pincode):
    """Classify one record (reuses `classifier` from Basic Usage)."""
    # Build the input in the same "field: value" format used in the examples
    input_text = f"firstName: {firstname}, address: {address}, mobile: {mobile}, pincode: {pincode}"
    result = classifier(input_text)[0]

    return {
        'is_valid': result['label'] == 'VALID',
        'confidence': result['score'],
        'status': result['label'],
        'input': input_text
    }

# Example usage
validation_result = validate_user_data("John", "123 Main Street", "9876543210", "560001")
print(validation_result)
```

## Examples

### ✅ Valid Examples
```
Input: "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
Output: VALID (High Confidence)

Input: "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001"
Output: VALID (High Confidence)
```

### ❌ Invalid Examples
```
Input: "firstName: john, address: main street, mobile: 98765, pincode: 123"
Output: INVALID (Multiple violations)

Input: "firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567"
Output: INVALID (Formatting issues)
```

## 🔧 Technical Details
- **Base Model**: distilbert-base-uncased
- **Architecture**: DistilBERT for Sequence Classification
- **Labels**: 2 classes (VALID, INVALID)
- **Max Sequence Length**: 128 tokens
- **Framework**: Transformers, PyTorch

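Given the 128-token maximum sequence length listed above, inputs with very long addresses may need to be truncated explicitly. The sketch below assumes that the text-classification pipeline forwards tokenizer keyword arguments such as `truncation` and `max_length` at call time. The helper name `classify_with_limit` is hypothetical.

```python
def classify_with_limit(classifier, text, max_length=128):
    """Classify `text`, asking the tokenizer to truncate to `max_length` tokens.

    `classifier` is expected to be a transformers text-classification pipeline.
    Assumption: extra keyword arguments are passed through to the tokenizer.
    """
    return classifier(text, truncation=True, max_length=max_length)[0]
```

With the `classifier` from Basic Usage, `classify_with_limit(classifier, text)` returns a single result dict with `label` and `score` keys. Note that truncation can cut off fields at the end of the string, such as `pincode`, so very long inputs may be better rejected before classification.
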
## 🎯 Intended Applications
- **Web Form Validation**: Real-time validation of user registration forms
- **Data Quality Assurance**: Batch processing of existing datasets
- **API Integration**: RESTful services for validation endpoints
- **Mobile Apps**: Client-side or server-side validation
- **Compliance Checking**: Ensure data meets cybersecurity standards

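## ⚠️ Limitations
- Designed for English-language inputs
- Specific to the four validation rules defined above
- May require fine-tuning for domain-specific requirements
- Performance may vary with inputs that differ significantly from the training examples
- The base model is uncased, so its default tokenizer lowercases input; the case-based `firstName` and `address` rules may therefore not be reliably detected unless the tokenizer configuration preserves case
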
## 👨‍💻 Created By
**Abinash V** - Cybersecurity Data Validation System

## License
Apache 2.0

## Version History
- v1.0: Initial model with basic cybersecurity validation rules