AventIQ-AI
/

BERT-Spam-Job-Posting-Detection-Model

Safetensors

bert


vishal1364 committed on May 26, 2025

Commit

16a5de0

verified ·

1 Parent(s): c0789e8

Create README.md


Files changed (1)

README.md +99 -0

README.md ADDED

	@@ -0,0 +1,99 @@

+# 🧠 BERT-Spam-Job-Posting-Detection-Model
+A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. Ideal for job portals, recruitment platforms, and fraud detection in job advertisements.
+---
+## ✨ Model Highlights
+- 📌 Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
+- 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
+- ⚡ Binary classification: Fake Job Posting vs Real Job Posting
+- 💾 Lightweight and optimized for CPU and GPU inference
+---
+## 🧠 Intended Uses
+- Automated detection of fraudulent job postings
+- Job board moderation and quality control
+- Enhancing recruitment platform security
+- Improving user trust in job marketplaces
+- Regulatory compliance monitoring for job ads
+---
+## 🚫 Limitations
+- Trained primarily on English-language job postings
+- May underperform on postings from less-represented industries or regions
+- Inputs are truncated to 128 tokens, so the model does not see content beyond that point in longer job descriptions
+- Not suitable for multilingual or multimedia job posting content
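One common way to work around the 128-token limit listed above is to split a long posting into overlapping windows, score each window separately, and combine the scores. The sketch below is not part of this repository: `split_into_windows` and `aggregate_fake_probability` are hypothetical helper names, the default of 126 content tokens assumes two positions are reserved for BERT's `[CLS]` and `[SEP]` tokens, and taking the maximum fake probability across windows is an assumed aggregation rule, not something the model card specifies.

```python
from typing import List


def split_into_windows(token_ids: List[int], max_len: int = 126, stride: int = 32) -> List[List[int]]:
    """Split token ids into overlapping windows of at most `max_len` tokens.

    `stride` is the number of tokens shared between consecutive windows.
    Hypothetical helper; max_len=126 assumes [CLS] and [SEP] are added later.
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("need max_len > 0 and 0 <= stride < max_len")
    if len(token_ids) <= max_len:
        return [list(token_ids)]

    step = max_len - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # the last window already reaches the end of the text
    return windows


def aggregate_fake_probability(window_probs: List[float]) -> float:
    """Combine per-window fake probabilities (assumed rule: take the maximum)."""
    if not window_probs:
        raise ValueError("window_probs must not be empty")
    return max(window_probs)
```

Taking the maximum flags a posting if any single window looks fraudulent, which favors recall. Averaging the window scores would be a more conservative alternative.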
+---
+## 🏋️‍♂️ Training Details
+| Field          | Value                         |
+| -------------- | ----------------------------- |
+| **Base Model** | `bert-base-uncased`           |
+| **Dataset**    | Custom labeled job postings   |
+| **Framework**  | PyTorch with Transformers     |
+| **Epochs**     | 3                             |
+| **Batch Size** | 16                            |
+| **Max Length** | 128 tokens                    |
+| **Optimizer**  | AdamW                         |
+| **Loss**       | CrossEntropyLoss              |
+| **Device**     | CUDA-enabled GPU              |
+---
+## 📊 Evaluation Metrics
+| Metric    | Score  |
+| --------- | ------ |
+| Accuracy  | 0.97   |
+| Precision | 0.81   |
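The table reports accuracy and precision only; it does not give recall or F1, and it does not say which class is treated as positive. As a rough sketch of how such figures are usually computed, the function below treats class `1` (fake, following the label mapping in the usage snippet) as the positive class. `binary_metrics` is a hypothetical helper, and the labels in the example are made up for illustration, not taken from the model's evaluation set.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only (1 = fake, 0 = real); not the model's real results.
example_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
example_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
print(binary_metrics(example_true, example_pred))
```

When fake postings are rare, accuracy can stay high even if many flagged postings are false alarms, so precision and recall for the fake class are worth reading alongside it.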
+---
+## 🚀 Usage
+```python
+from transformers import BertTokenizerFast, BertForSequenceClassification
+import torch
+model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
+tokenizer = BertTokenizerFast.from_pretrained(model_name)
+model = BertForSequenceClassification.from_pretrained(model_name)
+model.eval()
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+def predict_with_bert(text):
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
+    device = next(model.parameters()).device  # Get model device (cpu or cuda)
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    with torch.no_grad():
+        logits = model(**inputs).logits
+    predicted_class_id = logits.argmax(dim=-1).item()
+    return "Fake Job" if predicted_class_id == 1 else "Real Job"
+# Example
+print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
+print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
+```
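The snippet above returns only a hard label. For moderation workflows it can help to see a confidence score as well and to tune how aggressively postings are flagged. The following is a minimal sketch under stated assumptions: it takes the two raw logits as plain floats, reuses the label mapping from the snippet (`1` = fake, `0` = real), and introduces a hypothetical `label_with_confidence` helper with an assumed `fake_threshold` parameter that is not part of the model or the Transformers API.

```python
import math
from typing import List, Tuple

# Label mapping taken from the usage snippet: class 1 = fake, class 0 = real.
LABELS = {0: "Real Job", 1: "Fake Job"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits: List[float], fake_threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, fake probability) for one posting's two logits.

    Hypothetical helper: a posting is labeled fake only when its fake
    probability is strictly above `fake_threshold`.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits (real, fake)")
    fake_prob = softmax(logits)[1]
    label = LABELS[1] if fake_prob > fake_threshold else LABELS[0]
    return label, fake_prob
```

Inside `predict_with_bert`, the logits for a single posting could be passed in as `logits[0].tolist()`. With the default threshold of 0.5 the label matches the argmax decision; raising the threshold trades missed fakes for fewer false alarms.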
+---
+## 🗂 Repository Structure
+```
+.
+├── model/               # Quantized model files
+├── tokenizer_config/    # Tokenizer and vocab files
+├── model.safetensors    # Fine-tuned model in safetensors format
+└── README.md            # Model card
+```
+---
+## 🤝 Contributing
+Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.