	---
	license: apache-2.0
	tags:
	- text-classification
	- cybersecurity
	- data-validation
	- form-validation
	language:
	- en
	pipeline_tag: text-classification
	---

	# Cybersecurity Data Validation Model

	## 🛡️ Overview
	This model validates user input data against the formatting rules listed under Validation Rules. It performs binary classification (VALID/INVALID) on a record of personal information fields: firstName, address, mobile, and pincode.

	## 🎯 Model Purpose
	- Task: Binary Text Classification (VALID/INVALID)
	- Domain: Cybersecurity & Data Validation
	- Use Case: Form validation, data quality checking, input sanitization

	## 📋 Validation Rules

	The model checks if input data follows these cybersecurity standards:

	- firstName: Must be proper case (first letter capital, rest lowercase)
	  - ✅ Valid: "John", "Alice", "Maria"
	  - ❌ Invalid: "john", "ALICE", "mArIa"

	- address: Each word should be properly capitalized
	  - ✅ Valid: "123 Main Street", "789 Pine Road"
	  - ❌ Invalid: "123 main street", "789 PINE ROAD"

	- mobile: Must be exactly 10 digits
	  - ✅ Valid: "9876543210"
	  - ❌ Invalid: "98765", "98765432109"

	- pincode: Must be exactly 6 digits
	  - ✅ Valid: "560001", "400001"
	  - ❌ Invalid: "560", "5600012"
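
For comparison, the four rules above can also be written as deterministic checks. The following is a minimal sketch, not the model's implementation: the function names are hypothetical, and accepting address tokens that begin with a digit (such as "123") is an assumption made so that the valid address examples above pass.

```python
import re


def is_proper_case(word: str) -> bool:
    """True when the first letter is uppercase and the rest are lowercase."""
    return re.fullmatch(r"[A-Z][a-z]*", word) is not None


def is_valid_address(address: str) -> bool:
    """True when every word is proper case.

    Assumption: tokens that start with a digit (house numbers) are accepted as-is.
    """
    words = address.split()
    return bool(words) and all(w[0].isdigit() or is_proper_case(w) for w in words)


def is_valid_mobile(mobile: str) -> bool:
    """True when the value is exactly 10 ASCII digits."""
    return re.fullmatch(r"[0-9]{10}", mobile) is not None


def is_valid_pincode(pincode: str) -> bool:
    """True when the value is exactly 6 ASCII digits."""
    return re.fullmatch(r"[0-9]{6}", pincode) is not None


def check_record(first_name: str, address: str, mobile: str, pincode: str) -> dict:
    """Return a per-field pass/fail map for one record."""
    return {
        "firstName": is_proper_case(first_name),
        "address": is_valid_address(address),
        "mobile": is_valid_mobile(mobile),
        "pincode": is_valid_pincode(pincode),
    }


if __name__ == "__main__":
    print(check_record("John", "123 Main Street", "9876543210", "560001"))
    print(check_record("john", "789 PINE ROAD", "98765", "5600012"))
```

A per-field map like this can be useful next to the classifier, since the model only reports a single VALID/INVALID label for the whole record.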

	## 🚀 Usage

	### Basic Usage
	```python
	from transformers import pipeline

	# Load the model
	classifier = pipeline("text-classification", model="abinashv29gmailcom/cybersec-validation-model-v1")

	# Test input
	text = "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
	result = classifier(text)

	print(result)
	# Example output (the exact score will vary): [{'label': 'VALID', 'score': 0.95}]
	```

	### Batch Processing
	```python
	# Multiple inputs (reuses the `classifier` pipeline loaded in Basic Usage)
	inputs = [
	    "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001",
	    "firstName: bob, address: main street, mobile: 98765, pincode: 123",
	]

	# Passing a list returns one result dict per input, in the same order
	results = classifier(inputs)
	for i, result in enumerate(results):
	    status = "✅ VALID" if result['label'] == 'VALID' else "❌ INVALID"
	    print(f"Input {i+1}: {status} (Confidence: {result['score']:.3f})")
	```

	### Integration Function
	```python
	def validate_user_data(firstname, address, mobile, pincode):
	    """Classify one record with the `classifier` pipeline from Basic Usage."""
	    # Build the single-string input format the model expects
	    input_text = f"firstName: {firstname}, address: {address}, mobile: {mobile}, pincode: {pincode}"
	    result = classifier(input_text)[0]

	    return {
	        'is_valid': result['label'] == 'VALID',
	        'confidence': result['score'],
	        'status': result['label'],
	        'input': input_text
	    }

	# Example usage
	validation_result = validate_user_data("John", "123 Main Street", "9876543210", "560001")
	print(validation_result)
	```

	## 📊 Examples

	### ✅ Valid Examples
	```
	Input: "firstName: John, address: 123 Main Street, mobile: 9876543210, pincode: 560001"
	Output: VALID (High Confidence)

	Input: "firstName: Alice, address: 789 Pine Road, mobile: 7654321098, pincode: 400001"
	Output: VALID (High Confidence)
	```

	### ❌ Invalid Examples
	```
	Input: "firstName: john, address: main street, mobile: 98765, pincode: 123"
	Output: INVALID (Multiple violations)

	Input: "firstName: MARY, address: APARTMENT 5B, mobile: 1234567890, pincode: 1234567"
	Output: INVALID (Formatting issues)
	```

	## 🔧 Technical Details
	- Base Model: distilbert-base-uncased
	- Architecture: DistilBERT for Sequence Classification
	- Labels: 2 classes (VALID, INVALID)
	- Max Sequence Length: 128 tokens
	- Framework: Transformers, PyTorch

	## 🎯 Intended Applications
	- Web Form Validation: Real-time validation of user registration forms
	- Data Quality Assurance: Batch processing of existing datasets
	- API Integration: RESTful services for validation endpoints
	- Mobile Apps: Client-side or server-side validation
	- Compliance Checking: Ensure data meets cybersecurity standards
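
As an illustration of the batch-processing application listed above, here is a minimal sketch that runs every row of a CSV dataset through a classifier. The helper names `format_record` and `validate_csv` are hypothetical, the CSV column names are assumed to match the model's field names, and `classify` stands for any callable that takes a list of strings and returns pipeline-style result dicts (for example, the `classifier` from the Usage section).

```python
import csv
import io
from typing import Callable, Dict, List


def format_record(row: Dict[str, str]) -> str:
    """Build the single-string input format shown in the Usage section."""
    return (
        f"firstName: {row['firstName']}, address: {row['address']}, "
        f"mobile: {row['mobile']}, pincode: {row['pincode']}"
    )


def validate_csv(csv_text: str, classify: Callable[[List[str]], List[dict]]) -> List[dict]:
    """Classify all CSV rows in one batch and attach label and score to each row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []  # nothing to classify; avoid calling the model with an empty batch
    results = classify([format_record(row) for row in rows])
    return [
        {**row, "label": res["label"], "score": res["score"]}
        for row, res in zip(rows, results)
    ]


if __name__ == "__main__":
    # Stand-in classifier so the sketch runs without downloading the model
    def fake_classify(texts: List[str]) -> List[dict]:
        return [{"label": "VALID", "score": 1.0} for _ in texts]

    data = (
        "firstName,address,mobile,pincode\n"
        "John,123 Main Street,9876543210,560001\n"
    )
    for record in validate_csv(data, fake_classify):
        print(record)
```

Passing the classifier in as an argument keeps the dataset handling separate from the model, so the same function can be exercised with a stub during testing.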

	## ⚠️ Limitations
	- Designed for English language inputs
	- Specific to the defined validation rules
	- May require fine-tuning for domain-specific requirements
	- Performance may vary with inputs significantly different from training examples
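
Because performance may vary on unfamiliar inputs, one possible mitigation is to route low-confidence predictions to manual review instead of trusting the label. The sketch below is only an illustration: the `triage` function is hypothetical and the 0.8 threshold is an arbitrary assumption, not a value derived from this model.

```python
def triage(result: dict, threshold: float = 0.8) -> str:
    """Map one pipeline-style result dict to 'accept', 'reject', or 'review'.

    Assumption: `result` has 'label' ('VALID' or 'INVALID') and 'score' keys,
    and the 0.8 default threshold is a placeholder to be tuned on real data.
    """
    if result["score"] < threshold:
        return "review"  # the model is unsure, so defer to a human or a rule check
    return "accept" if result["label"] == "VALID" else "reject"


if __name__ == "__main__":
    print(triage({"label": "VALID", "score": 0.95}))
    print(triage({"label": "INVALID", "score": 0.55}))
```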

	## 👨‍💻 Created By
	Abinash V - Cybersecurity Data Validation System

	## 📄 License
	Apache 2.0

	## 🔄 Version History
	- v1.0: Initial model with basic cybersecurity validation rules