--- language: en license: mit tags: - text-classification - survey-classification - james-river - bert datasets: - custom metrics: - accuracy - f1 model-index: - name: james-river-classifier results: - task: type: text-classification name: Text Classification dataset: type: custom name: James River Survey Classification metrics: - type: accuracy value: 0.996 # Based on test prediction confidence --- # James River Survey Classifier This model classifies survey-related text messages into different job types for James River surveying services. ## Model Description - **Model Type**: BERT-based text classification - **Base Model**: bert-base-uncased - **Language**: English - **Task**: Multi-class text classification - **Classes**: 6 survey job types ## Classes The model can classify text into the following survey job types: - **Boundary Survey** (ID: 0) - **Construction Survey** (ID: 1) - **Fence Staking** (ID: 2) - **Other/General** (ID: 3) - **Real Estate Survey** (ID: 4) - **Subdivision Survey** (ID: 5) ## Usage ```python from transformers import AutoTokenizer, AutoModelForSequenceClassification import torch import json # Load model and tokenizer model_name = "ityndall/james-river-classifier" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name) # Load label mapping import requests label_mapping_url = f"https://huggingface.co/{model_name}/resolve/main/label_mapping.json" label_mapping = requests.get(label_mapping_url).json() def classify_text(text): # Tokenize input inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128) # Get prediction with torch.no_grad(): outputs = model(**inputs) predictions = torch.nn.functional.softmax(outputs.logits, dim=-1) predicted_class_id = predictions.argmax().item() confidence = predictions[0][predicted_class_id].item() # Get label predicted_label = label_mapping["id2label"][str(predicted_class_id)] return { "label": predicted_label, "confidence": confidence, "class_id": predicted_class_id } # Example usage text = "I need a boundary survey for my property" result = classify_text(text) print(f"Predicted: {result['label']} (confidence: {result['confidence']:.3f})") ``` ## Training Data The model was trained on 1,000 survey-related text messages with the following distribution: - **Other/General**: 919 samples (91.9%) - **Real Estate Survey**: 49 samples (4.9%) - **Fence Staking**: 21 samples (2.1%) - **Subdivision Survey**: 4 samples (0.4%) - **Boundary Survey**: 4 samples (0.4%) - **Construction Survey**: 3 samples (0.3%) ## Training Details - **Training Framework**: Hugging Face Transformers - **Base Model**: bert-base-uncased - **Training Epochs**: 3 - **Batch Size**: 8 - **Learning Rate**: 5e-05 - **Optimizer**: AdamW - **Training Loss**: 0.279 - **Training Time**: ~19.5 minutes ## Model Performance The model achieved a training loss of 0.279 after 3 epochs. However, note that this is a highly imbalanced dataset, and performance on minority classes may vary. ## Limitations - The model was trained on a small, imbalanced dataset - Performance on minority classes (Construction Survey, Boundary Survey, Subdivision Survey) may be limited due to few training examples - The model may have a bias toward predicting "Other/General" due to class imbalance ## Intended Use This model is specifically designed for classifying survey-related inquiries for James River surveying services. It should not be used for other domains without additional training. ## Files - `config.json`: Model configuration - `model.safetensors`: Model weights - `tokenizer.json`, `tokenizer_config.json`, `vocab.txt`: Tokenizer files - `label_encoder.pkl`: Original scikit-learn label encoder - `label_mapping.json`: Human-readable label mappings ## Citation If you use this model, please cite: ``` @misc{james-river-classifier, title={James River Survey Classifier}, author={James River Surveying}, year={2025}, url={https://huggingface.co/ityndall/james-river-classifier} } ```