---
license: apache-2.0
tags:
- text-classification
- cybersecurity
- data-validation
- form-validation
language:
- en
pipeline_tag: text-classification
---

# Cybersecurity Data Validation Model

## 🛡️ Overview
This model checks whether user-supplied personal information fields follow a fixed set of formatting rules. It is a binary classifier: each input string is labeled VALID or INVALID.

## 🎯 Model Purpose
- **Task**: Binary Text Classification (VALID/INVALID)
- **Domain**: Cybersecurity & Data Validation
- **Use Case**: Form validation, data quality checking, input sanitization

## 📋 Validation Rules

The model checks if input data follows these cybersecurity standards:

- **firstName**: Must be proper case (first letter capital, rest lowercase)
  - ✅ Valid: "John", "Alice", "Maria"
  - ❌ Invalid: "john", "ALICE", "mArIa"

- **address**: Each word should be properly capitalized
  - ✅ Valid: "123 Main Street", "789 Pine Road"
  - ❌ Invalid: "123 main street", "789 PINE ROAD"

- **mobile**: Must be exactly 10 digits
  - ✅ Valid: "9876543210"
  - ❌ Invalid: "98765", "98765432109"

- **pincode**: Must be exactly 6 digits
  - ✅ Valid: "560001", "400001"
  - ❌ Invalid: "560", "5600012"
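
The rules above can also be written as deterministic checks, which may be useful for cross-checking model predictions or labeling synthetic training data. The following is a minimal sketch, not the author's training code: the helper names (`is_proper_case`, `is_valid_address`, `check_fields`) are hypothetical, the patterns are ASCII-only, and address tokens containing digits (such as "123" or "5B") are assumed to be exempt from the capitalization rule.

```python
import re

# Patterns transcribed from the rules listed above (assumption: ASCII letters only).
PROPER_CASE = re.compile(r"[A-Z][a-z]*")
MOBILE = re.compile(r"\d{10}")
PINCODE = re.compile(r"\d{6}")


def is_proper_case(word: str) -> bool:
    """Return True if the first letter is uppercase and the rest are lowercase."""
    return PROPER_CASE.fullmatch(word) is not None


def is_valid_address(address: str) -> bool:
    """Return True if every purely alphabetic word in the address is proper case.

    Tokens that contain digits (house numbers, unit codes) are skipped;
    this exemption is an assumption, not something the model card specifies.
    """
    words = address.split()
    return bool(words) and all(is_proper_case(w) for w in words if w.isalpha())


def check_fields(first_name: str, address: str, mobile: str, pincode: str) -> dict:
    """Apply each rule independently and report the result per field."""
    return {
        "firstName": is_proper_case(first_name),
        "address": is_valid_address(address),
        "mobile": MOBILE.fullmatch(mobile) is not None,
        "pincode": PINCODE.fullmatch(pincode) is not None,
    }
```

Unlike the classifier, this reports which field failed, so the two approaches can complement each other.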

## 🚀 Usage

### Basic Usage
```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="abinashv29gmailcom/cybersec-validation-model-v1")

# Test input
text = "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
result = classifier(text)

print(result)
# Example output (the exact score will vary): [{'label': 'VALID', 'score': 0.95}]
```

### Batch Processing
```python
# Multiple inputs (reuses the `classifier` pipeline created in Basic Usage)
inputs = [
    "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001",
    "firstName: bob, address: main street, mobile: 98765, pincode: 123"
]

results = classifier(inputs)
for i, result in enumerate(results, start=1):
    status = "✅ VALID" if result['label'] == 'VALID' else "❌ INVALID"
    print(f"Input {i}: {status} (Confidence: {result['score']:.3f})")
```

### Integration Function
```python
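def validate_user_data(firstname, address, mobile, pincode):
    """Format the four fields as one input string and classify it."""
    # Relies on the `classifier` pipeline created in Basic Usage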
    input_text = f"firstName: {firstname}, address: {address}, mobile: {mobile}, pincode: {pincode}"
    result = classifier(input_text)[0]
    
    return {
        'is_valid': result['label'] == 'VALID',
        'confidence': result['score'],
        'status': result['label'],
        'input': input_text
    }

# Example usage
validation_result = validate_user_data("John", "123 Main Street", "9876543210", "560001")
print(validation_result)
```

## 📊 Examples

### ✅ Valid Examples
```
Input: "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
Output: VALID (High Confidence)

Input: "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001"
Output: VALID (High Confidence)
```

### ❌ Invalid Examples
```
Input: "firstName: john, address: main street, mobile: 98765, pincode: 123"
Output: INVALID (Multiple violations)

Input: "firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567"
Output: INVALID (Formatting issues)
```

## 🔧 Technical Details
- **Base Model**: distilbert-base-uncased
- **Architecture**: DistilBERT for Sequence Classification
- **Labels**: 2 classes (VALID, INVALID)
- **Max Sequence Length**: 128 tokens
- **Framework**: Transformers, PyTorch

## 🎯 Intended Applications
- **Web Form Validation**: Real-time validation of user registration forms
- **Data Quality Assurance**: Batch processing of existing datasets
- **API Integration**: RESTful services for validation endpoints
- **Mobile Apps**: Client-side or server-side validation
- **Compliance Checking**: Ensure data meets cybersecurity standards
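
For the batch data-quality use case, one possible shape is a small helper that serializes records into the input format shown in Usage and attaches the prediction to each record. This is a sketch under assumptions: `format_record` and `validate_records` are hypothetical names, records are assumed to be dictionaries with the four keys used in this card, and `classify` is any callable that takes a list of strings and returns a list of `{'label': ..., 'score': ...}` dictionaries, which is what a `text-classification` pipeline returns for list input.

```python
from typing import Callable, Iterable

# Field order matches the input strings shown in the Usage section.
FIELDS = ("firstName", "address", "mobile", "pincode")


def format_record(record: dict) -> str:
    """Serialize a record into the 'key: value, ...' string used in this card."""
    return ", ".join(f"{name}: {record[name]}" for name in FIELDS)


def validate_records(records: Iterable[dict], classify: Callable[[list], list]) -> list:
    """Classify all records in one call and attach label and score to each."""
    records = list(records)
    texts = [format_record(r) for r in records]
    # Skip the classifier call entirely when there is nothing to classify.
    predictions = classify(texts) if texts else []
    return [
        {**record, "label": pred["label"], "score": pred["score"]}
        for record, pred in zip(records, predictions)
    ]
```

Passing the `classifier` pipeline from Basic Usage as `classify` should work as-is; injecting it as a parameter also makes the helper easy to test with a stub.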

## ⚠️ Limitations
- Designed for English language inputs
- Specific to the defined validation rules
- May require fine-tuning for domain-specific requirements
- Performance may vary with inputs significantly different from training examples
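
Because accuracy may drop on unfamiliar inputs, one common mitigation is to act only on confident predictions and send the rest for manual review. The sketch below illustrates the idea; the `triage` function and the 0.9 threshold are hypothetical and would need tuning against real data.

```python
def triage(prediction: dict, threshold: float = 0.9) -> str:
    """Map a pipeline prediction to 'accept', 'reject', or 'review'.

    `prediction` is a dict with 'label' and 'score' keys, as returned by
    the text-classification pipeline. The default threshold is an
    illustrative assumption, not a recommended value.
    """
    if prediction["score"] < threshold:
        return "review"
    return "accept" if prediction["label"] == "VALID" else "reject"
```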

## 👨‍💻 Created By
**Abinash V** - Cybersecurity Data Validation System

## 📄 License
Apache 2.0

## 🔄 Version History
- v1.0: Initial model with basic cybersecurity validation rules