Cybersecurity Data Validation Model
Overview
This model validates user input data against a fixed set of formatting rules. It performs binary classification (VALID or INVALID) on a single text record containing four personal information fields: firstName, address, mobile, and pincode.
Model Purpose
- Task: Binary Text Classification (VALID/INVALID)
- Domain: Cybersecurity & Data Validation
- Use Case: Form validation, data quality checking, input sanitization
Validation Rules
The model checks whether each field in the input follows these formatting standards:
firstName: Must be in proper case (first letter capitalized, remaining letters lowercase)
- ✅ Valid: "John", "Alice", "Maria"
- ❌ Invalid: "john", "ALICE", "mArIa"
address: Each word must be capitalized (first letter uppercase, remaining letters lowercase)
- ✅ Valid: "123 Main Street", "789 Pine Road"
- ❌ Invalid: "123 main street", "789 PINE ROAD"
mobile: Must be exactly 10 digits
- ✅ Valid: "9876543210"
- ❌ Invalid: "98765", "98765432109"
pincode: Must be exactly 6 digits
- ✅ Valid: "560001", "400001"
- ❌ Invalid: "560", "5600012"
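The four rules above are simple enough to check deterministically, which can serve as a cross-check on the model's predictions. This matters most for the case-based rules: the card lists distilbert-base-uncased as the base model, and that tokenizer lowercases text by default unless its configuration was changed. The sketch below is one interpretation of the rules, not the model's internal logic. The names `check_fields` and `is_capitalized_word` are hypothetical, the patterns are ASCII-only, and accepting digit-led tokens such as "123" or "5B" is an assumption the card does not spell out.

```python
import re

# ASCII-only patterns that mirror the rules as written in this card.
# They are an interpretation for illustration, not the model's internals.
NAME_RE = re.compile(r"[A-Z][a-z]*")
MOBILE_RE = re.compile(r"[0-9]{10}")
PINCODE_RE = re.compile(r"[0-9]{6}")


def is_capitalized_word(word):
    """Words starting with a letter need one capital followed by lowercase.

    Tokens that start with a digit (house numbers such as "123") are accepted
    as-is; that choice is an assumption, since the card does not cover them.
    """
    if not word[0].isalpha():
        return True
    return word[0].isupper() and word[1:] == word[1:].lower()


def check_fields(first_name, address, mobile, pincode):
    """Return a per-field pass/fail map for the four documented rules."""
    words = address.split()
    return {
        "firstName": NAME_RE.fullmatch(first_name) is not None,
        "address": bool(words) and all(is_capitalized_word(w) for w in words),
        "mobile": MOBILE_RE.fullmatch(mobile) is not None,
        "pincode": PINCODE_RE.fullmatch(pincode) is not None,
    }


print(check_fields("John", "123 Main Street", "9876543210", "560001"))
print(check_fields("MARY", "APARTMENT 5B", "1234567890", "1234567"))
```

A per-field result also shows which rule failed, something the single VALID/INVALID label does not report.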
Usage
Basic Usage
from transformers import pipeline
# Load the model
classifier = pipeline("text-classification", model="abinashv29gmailcom/cybersec-validation-model-v1")
# Test input
text = "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
result = classifier(text)
print(result)
# Example output (the exact score will vary): [{'label': 'VALID', 'score': 0.95}]
Batch Processing
# Multiple inputs
inputs = [
    "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001",
    "firstName: bob, address: main street, mobile: 98765, pincode: 123"
]
results = classifier(inputs)
for i, result in enumerate(results):
    # Prefix each prediction with a pass/fail marker
    status = "✅ VALID" if result['label'] == 'VALID' else "❌ INVALID"
    print(f"Input {i+1}: {status} (Confidence: {result['score']:.3f})")
Integration Function
def validate_user_data(firstname, address, mobile, pincode):
    # Build the input in the "field: value" format shown in the examples above
    input_text = f"firstName: {firstname}, address: {address}, mobile: {mobile}, pincode: {pincode}"
    # The pipeline returns a list with one prediction per input
    result = classifier(input_text)[0]
    return {
        'is_valid': result['label'] == 'VALID',
        'confidence': result['score'],
        'status': result['label'],
        'input': input_text
    }
# Example usage
validation_result = validate_user_data("John", "123 Main Street", "9876543210", "560001")
print(validation_result)
Examples
✅ Valid Examples
Input: "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
Output: VALID (High Confidence)
Input: "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001"
Output: VALID (High Confidence)
❌ Invalid Examples
Input: "firstName: john, address: main street, mobile: 98765, pincode: 123"
Output: INVALID (Multiple violations)
Input: "firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567"
Output: INVALID (Formatting issues)
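The four examples above can double as a small smoke test. The sketch below compares predictions against the expected labels and reports any mismatches. The function name `evaluate` is hypothetical, and `classifier` is assumed to be any callable that takes a list of strings and returns one `{'label', 'score'}` dict per input, as the text-classification pipeline in the Usage section does.

```python
# Labeled examples copied from this card.
EXAMPLES = [
    ("firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001", "VALID"),
    ("firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001", "VALID"),
    ("firstName: john, address: main street, mobile: 98765, pincode: 123", "INVALID"),
    ("firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567", "INVALID"),
]


def evaluate(classifier, examples=EXAMPLES):
    """Return (accuracy, mismatches) for a list of (text, expected_label) pairs."""
    texts = [text for text, _ in examples]
    predictions = classifier(texts)
    mismatches = [
        (text, expected, prediction["label"])
        for (text, expected), prediction in zip(examples, predictions)
        if prediction["label"] != expected
    ]
    accuracy = 1 - len(mismatches) / len(examples)
    return accuracy, mismatches


# With the pipeline from the Usage section loaded as `classifier`:
# accuracy, mismatches = evaluate(classifier)
```

Four examples are far too few to measure quality. The check only confirms that the model loads and behaves as the card describes.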
Technical Details
- Base Model: distilbert-base-uncased
- Architecture: DistilBERT for Sequence Classification
- Labels: 2 classes (VALID, INVALID)
- Max Sequence Length: 128 tokens
- Framework: Transformers, PyTorch
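Given the 128-token maximum sequence length, a long address could push an input past the limit. One hedged way to handle this is to ask the tokenizer to truncate at call time, since the text-classification pipeline forwards extra keyword arguments such as `truncation` and `max_length` to its tokenizer. The wrapper name `classify_with_limit` is hypothetical, and whether truncated inputs are still classified reliably is not stated in this card.

```python
def classify_with_limit(classifier, text, max_length=128):
    """Classify one input, truncating it to at most max_length tokens.

    `classifier` is assumed to be a transformers text-classification
    pipeline; truncation and max_length are passed through to its tokenizer.
    """
    return classifier(text, truncation=True, max_length=max_length)[0]


# With the pipeline from the Usage section loaded as `classifier`:
# result = classify_with_limit(classifier, "firstName: John, address: ...")
```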
Intended Applications
- Web Form Validation: Real-time validation of user registration forms
- Data Quality Assurance: Batch processing of existing datasets
- API Integration: RESTful services for validation endpoints
- Mobile Apps: Client-side or server-side validation
- Compliance Checking: Ensure data meets cybersecurity standards
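For the data quality use case, a batch run over an existing dataset might look like the sketch below. It assumes a CSV with the column names `firstName`, `address`, `mobile`, and `pincode`, which is a hypothetical layout, and a `classifier` callable that accepts a list of strings. The helper names `row_to_text` and `validate_csv` are illustrative only.

```python
import csv
import io

FIELDS = ("firstName", "address", "mobile", "pincode")


def row_to_text(row):
    """Format one record in the 'field: value' layout used in this card."""
    return ", ".join(f"{name}: {row[name]}" for name in FIELDS)


def validate_csv(classifier, csv_text):
    """Classify every row of a CSV string and attach label and score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    predictions = classifier([row_to_text(row) for row in rows])
    return [
        {**row, "label": prediction["label"], "score": prediction["score"]}
        for row, prediction in zip(rows, predictions)
    ]


# With the pipeline from the Usage section loaded as `classifier`:
# with open("users.csv", newline="") as f:   # hypothetical file name
#     report = validate_csv(classifier, f.read())
```

Reading values through `csv.DictReader` keeps them as strings, so a mobile number or pincode with leading zeros is not altered before it reaches the model.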
Limitations
- Designed for English language inputs
- Specific to the defined validation rules
- May require fine-tuning for domain-specific requirements
- Performance may vary with inputs significantly different from training examples
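Because performance may vary on unfamiliar inputs, one possible mitigation is to act automatically only on confident predictions and send the rest to manual review. In the sketch below, the function name `triage` and the 0.9 threshold are hypothetical. A suitable cutoff would need to be chosen from validation data, which this card does not provide.

```python
def triage(result, threshold=0.9):
    """Map one pipeline prediction to 'accept', 'reject', or 'review'.

    `result` is a dict with 'label' and 'score' keys, as returned by the
    text-classification pipeline. The threshold is an illustrative value.
    """
    if result["score"] < threshold:
        return "review"
    return "accept" if result["label"] == "VALID" else "reject"


print(triage({"label": "VALID", "score": 0.95}))
print(triage({"label": "INVALID", "score": 0.62}))
```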
Created By
Abinash V - Cybersecurity Data Validation System
License
Apache 2.0
Version History
- v1.0: Initial model with basic cybersecurity validation rules