---
license: apache-2.0
tags:
- text-classification
- cybersecurity
- data-validation
- form-validation
language:
- en
pipeline_tag: text-classification
---
# Cybersecurity Data Validation Model
## 🛡️ Overview
This model validates user input against the formatting rules described under Validation Rules. It performs binary classification to determine whether the personal-information fields in a record (first name, address, mobile number, pincode) meet those requirements.
## 🎯 Model Purpose
- **Task**: Binary Text Classification (VALID/INVALID)
- **Domain**: Cybersecurity & Data Validation
- **Use Case**: Form validation, data quality checking, input sanitization
## Validation Rules
The model checks if input data follows these cybersecurity standards:
- **firstName**: Must be proper case (First letter capital, rest lowercase)
  - ✅ Valid: "John", "Alice", "Maria"
  - ❌ Invalid: "john", "ALICE", "mArIa"
- **address**: Each word should be properly capitalized
  - ✅ Valid: "123 Main Street", "789 Pine Road"
  - ❌ Invalid: "123 main street", "789 PINE ROAD"
- **mobile**: Must be exactly 10 digits
  - ✅ Valid: "9876543210"
  - ❌ Invalid: "98765", "98765432109"
- **pincode**: Must be exactly 6 digits
  - ✅ Valid: "560001", "400001"
  - ❌ Invalid: "560", "5600012"
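
As a reference point, the rules above can also be written as deterministic checks. The following is a minimal sketch, not the model's implementation: the function name `check_rules` and its edge-case handling are assumptions, since the rules above do not specify them. In particular, address tokens that begin with a digit (such as "123") are accepted as-is, and only ASCII digits count for `mobile` and `pincode`.

```python
import re


def _is_proper_case(word: str) -> bool:
    """True if the first character is uppercase and the rest is lowercase."""
    return word[:1].isupper() and word[1:] == word[1:].lower()


def check_rules(first_name: str, address: str, mobile: str, pincode: str) -> dict:
    """Return a per-field pass/fail report for the four documented rules.

    Hypothetical helper for illustration; it does not call the model.
    """
    words = address.split()
    return {
        # Letters only, proper case (e.g. "John")
        "firstName": first_name.isalpha() and _is_proper_case(first_name),
        # Assumption: tokens starting with a digit are not case-checked
        "address": bool(words)
        and all(_is_proper_case(w) for w in words if w[0].isalpha()),
        # Exactly 10 ASCII digits
        "mobile": re.fullmatch(r"[0-9]{10}", mobile) is not None,
        # Exactly 6 ASCII digits
        "pincode": re.fullmatch(r"[0-9]{6}", pincode) is not None,
    }


report = check_rules("John", "123 Main Street", "9876543210", "560001")
print(report)
```

A per-field report like this can help explain why a record fails, which the model's single VALID/INVALID label does not show.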
## Usage
### Basic Usage
```python
from transformers import pipeline
# Load the model
classifier = pipeline("text-classification", model="abinashv29gmailcom/cybersec-validation-model-v1")
# Test input
text = "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
result = classifier(text)
print(result)
# Example output (the exact score will vary): [{'label': 'VALID', 'score': 0.95}]
```
### Batch Processing
```python
# Multiple inputs (reuses the `classifier` pipeline created in Basic Usage)
inputs = [
    "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001",
    "firstName: bob, address: main street, mobile: 98765, pincode: 123",
]
# Passing a list returns one result dict per input, in the same order
results = classifier(inputs)
for i, result in enumerate(results):
    status = "✅ VALID" if result['label'] == 'VALID' else "❌ INVALID"
    print(f"Input {i+1}: {status} (Confidence: {result['score']:.3f})")
```
### Integration Function
```python
def validate_user_data(firstname, address, mobile, pincode):
    # Build the same "field: value" string format used in the examples
    input_text = f"firstName: {firstname}, address: {address}, mobile: {mobile}, pincode: {pincode}"
    # The pipeline returns a list with one dict per input; take the first
    result = classifier(input_text)[0]
    return {
        'is_valid': result['label'] == 'VALID',
        'confidence': result['score'],
        'status': result['label'],
        'input': input_text
    }

# Example usage (requires the `classifier` pipeline from Basic Usage)
validation_result = validate_user_data("John", "123 Main Street", "9876543210", "560001")
print(validation_result)
```
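
Because the result carries a confidence score, a caller may want to treat low-confidence predictions differently from confident ones. The sketch below shows one way to do that on a dict shaped like the return value of `validate_user_data`. The function name `decide`, the `"review"` outcome, and the 0.9 threshold are illustrative assumptions, not part of the model.

```python
def decide(validation_result: dict, threshold: float = 0.9) -> str:
    """Map a validate_user_data-style dict to 'accept', 'reject', or 'review'.

    Hypothetical helper: the threshold is an arbitrary example value and
    should be tuned on real data.
    """
    # Predictions below the threshold are routed to manual review
    if validation_result["confidence"] < threshold:
        return "review"
    return "accept" if validation_result["is_valid"] else "reject"


# Example with a hand-written result dict (no model call needed)
sample = {"is_valid": True, "confidence": 0.95, "status": "VALID", "input": "..."}
print(decide(sample))
```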
## Examples
### ✅ Valid Examples
```
Input: "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
Output: VALID (High Confidence)
Input: "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001"
Output: VALID (High Confidence)
```
### ❌ Invalid Examples
```
Input: "firstName: john, address: main street, mobile: 98765, pincode: 123"
Output: INVALID (Multiple violations)
Input: "firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567"
Output: INVALID (Formatting issues)
```
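
The four examples above can double as a small regression set for any prediction function. The following is a minimal sketch under that assumption: `EXAMPLES` and `accuracy` are hypothetical names, and `predict` is any callable that takes the input string and returns `"VALID"` or `"INVALID"`.

```python
# Labeled inputs copied from the Valid and Invalid examples in this section
EXAMPLES = [
    ("firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001", "VALID"),
    ("firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001", "VALID"),
    ("firstName: john, address: main street, mobile: 98765, pincode: 123", "INVALID"),
    ("firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567", "INVALID"),
]


def accuracy(predict, examples=EXAMPLES) -> float:
    """Fraction of examples for which predict(text) matches the expected label."""
    correct = sum(1 for text, expected in examples if predict(text) == expected)
    return correct / len(examples)


# A trivial baseline that always answers "VALID" gets half of them right
print(accuracy(lambda text: "VALID"))
```

With the pipeline from Basic Usage loaded, the model itself could be checked with `accuracy(lambda text: classifier(text)[0]["label"])`.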
## 🔧 Technical Details
- **Base Model**: distilbert-base-uncased
- **Architecture**: DistilBERT for Sequence Classification
- **Labels**: 2 classes (VALID, INVALID)
- **Max Sequence Length**: 128 tokens
- **Framework**: Transformers, PyTorch
## 🎯 Intended Applications
- **Web Form Validation**: Real-time validation of user registration forms
- **Data Quality Assurance**: Batch processing of existing datasets
- **API Integration**: RESTful services for validation endpoints
- **Mobile Apps**: Client-side or server-side validation
- **Compliance Checking**: Ensure data meets cybersecurity standards
## ⚠️ Limitations
- Designed for English language inputs
- Specific to the defined validation rules
- May require fine-tuning for domain-specific requirements
- Performance may vary with inputs significantly different from training examples
## 👨‍💻 Created By
**Abinash V** - Cybersecurity Data Validation System
## License
Apache 2.0
## Version History
- v1.0: Initial model with basic cybersecurity validation rules