Cybersecurity Data Validation Model

🛡️ Overview

This model performs binary classification on a single string of personal-information fields (first name, address, mobile number, PIN code) and labels it VALID or INVALID according to a fixed set of formatting rules intended to support cybersecurity-oriented input validation.

🎯 Model Purpose

  • Task: Binary Text Classification (VALID/INVALID)
  • Domain: Cybersecurity & Data Validation
  • Use Case: Form validation, data quality checking, input sanitization

📋 Validation Rules

The model checks whether each field in the input follows these formatting rules:

  • firstName: Must be proper case (First letter capital, rest lowercase)

    • ✅ Valid: "John", "Alice", "Maria"
    • ❌ Invalid: "john", "ALICE", "mArIa"
  • address: Each word should be properly capitalized

    • ✅ Valid: "123 Main Street", "789 Pine Road"
    • ❌ Invalid: "123 main street", "789 PINE ROAD"
  • mobile: Must be exactly 10 digits

    • ✅ Valid: "9876543210"
    • ❌ Invalid: "98765", "98765432109"
  • pincode: Must be exactly 6 digits

    • ✅ Valid: "560001", "400001"
    • ❌ Invalid: "560", "5600012"
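
For comparison, the rules above can also be written as deterministic checks. The following is a minimal sketch, not the model's actual logic: the regular expressions, the function names, and the treatment of mixed tokens such as "5B" are assumptions made for illustration.

```python
import re

# Assumed regex equivalents of the documented rules (illustrative only).
FIRST_NAME_RE = re.compile(r"[A-Z][a-z]*")   # first letter capital, rest lowercase
MOBILE_RE = re.compile(r"[0-9]{10}")         # exactly 10 digits
PINCODE_RE = re.compile(r"[0-9]{6}")         # exactly 6 digits


def word_is_capitalized(word: str) -> bool:
    """Check one address word against the capitalization rule."""
    if not any(ch.isalpha() for ch in word):
        return True  # tokens without letters, such as "123", always pass
    if word[0].isalpha():
        return FIRST_NAME_RE.fullmatch(word) is not None
    # Mixed tokens such as "5B" are not covered by the documented rules;
    # this sketch accepts them (assumption).
    return True


def check_fields(first_name: str, address: str, mobile: str, pincode: str) -> dict:
    """Return a per-field pass/fail report for the four documented rules."""
    words = address.split()
    return {
        "firstName": FIRST_NAME_RE.fullmatch(first_name) is not None,
        "address": bool(words) and all(word_is_capitalized(w) for w in words),
        "mobile": MOBILE_RE.fullmatch(mobile) is not None,
        "pincode": PINCODE_RE.fullmatch(pincode) is not None,
    }


report = check_fields("John", "123 Main Street", "9876543210", "560001")
print(report)
```

A per-field report like this can be useful next to the classifier, since the model returns only a single VALID/INVALID label for the whole string.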

🚀 Usage

Basic Usage

from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="abinashv29gmailcom/cybersec-validation-model-v1")

# Test input
text = "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
result = classifier(text)

print(result)
# Example output (the exact score will vary): [{'label': 'VALID', 'score': 0.95}]

Batch Processing

# Multiple inputs
inputs = [
    "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001",
    "firstName: bob, address: main street, mobile: 98765, pincode: 123"
]

results = classifier(inputs)
for i, result in enumerate(results):
    status = "✅ VALID" if result['label'] == 'VALID' else "❌ INVALID"
    print(f"Input {i+1}: {status} (Confidence: {result['score']:.3f})")
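
When the data to check lives in structured records rather than ready-made strings, each record first has to be rendered in the "field: value" layout used in the examples. The helper below is a hypothetical sketch of that step; `FIELD_ORDER` and `format_record` are illustrative names, and the field order is assumed from the example inputs in this card.

```python
# Field order assumed from the example inputs shown in this card.
FIELD_ORDER = ("firstName", "address", "mobile", "pincode")


def format_record(record: dict) -> str:
    """Render one record as 'firstName: ..., address: ..., mobile: ..., pincode: ...'."""
    missing = [key for key in FIELD_ORDER if key not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return ", ".join(f"{key}: {record[key]}" for key in FIELD_ORDER)


records = [
    {"firstName": "Alice", "address": "789 Pine Road", "mobile": "7654321098", "pincode": "400001"},
    {"firstName": "bob", "address": "main street", "mobile": "98765", "pincode": "123"},
]

# These strings can be passed to the classifier as a list.
inputs = [format_record(r) for r in records]
print(inputs[0])
```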

Integration Function

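def validate_user_data(firstname, address, mobile, pincode):
    """Classify one set of fields using the `classifier` pipeline loaded above."""
    # Build the input in the same "field: value" layout as the examples
    input_text = f"firstName: {firstname}, address: {address}, mobile: {mobile}, pincode: {pincode}"
    # The pipeline returns a list with one prediction per input
    result = classifier(input_text)[0]

    return {
        'is_valid': result['label'] == 'VALID',
        'confidence': result['score'],
        'status': result['label'],
        'input': input_text
    }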

# Example usage
validation_result = validate_user_data("John", "123 Main Street", "9876543210", "560001")
print(validation_result)
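
The confidence score can also drive a three-way decision instead of a plain accept/reject. The sketch below is one possible policy, not part of the model: the function name `decide`, the "review" outcome, and the 0.9 threshold are assumptions, and this card does not state how well the scores are calibrated.

```python
def decide(prediction: dict, threshold: float = 0.9) -> str:
    """Map a {'label', 'score'} prediction to 'accept', 'reject', or 'review'.

    The threshold is a hypothetical value; tune it on your own data.
    """
    if prediction["score"] < threshold:
        return "review"  # low confidence: defer to a person or a rule-based check
    return "accept" if prediction["label"] == "VALID" else "reject"


# Predictions in the shape returned by the text-classification pipeline
print(decide({"label": "VALID", "score": 0.95}))
print(decide({"label": "INVALID", "score": 0.99}))
print(decide({"label": "VALID", "score": 0.60}))
```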

📊 Examples

✅ Valid Examples

Input: "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
Output: VALID (High Confidence)

Input: "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001"
Output: VALID (High Confidence)

❌ Invalid Examples

Input: "firstName: john, address: main street, mobile: 98765, pincode: 123"
Output: INVALID (Multiple violations)

Input: "firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567"
Output: INVALID (Formatting issues)

🔧 Technical Details

  • Base Model: distilbert-base-uncased
  • Architecture: DistilBERT for Sequence Classification
  • Labels: 2 classes (VALID, INVALID)
  • Max Sequence Length: 128 tokens
  • Framework: Transformers, PyTorch

🎯 Intended Applications

  • Web Form Validation: Real-time validation of user registration forms
  • Data Quality Assurance: Batch processing of existing datasets
  • API Integration: RESTful services for validation endpoints
  • Mobile Apps: Client-side or server-side validation
  • Compliance Checking: Ensure data meets cybersecurity standards

⚠️ Limitations

  • Designed for English-language inputs
  • Specific to the defined validation rules
  • May require fine-tuning for domain-specific requirements
  • Performance may vary with inputs significantly different from training examples

👨‍💻 Created By

Abinash V - Cybersecurity Data Validation System

📄 License

Apache 2.0

🔄 Version History

  • v1.0: Initial model with basic cybersecurity validation rules