James River Survey Classifier
This model classifies survey-related text messages into different job types for James River surveying services.
Model Description
- Model Type: BERT-based text classification
- Base Model: bert-base-uncased
- Language: English
- Task: Multi-class text classification
- Classes: 6 survey job types
Classes
The model can classify text into the following survey job types:
- Boundary Survey (ID: 0)
- Construction Survey (ID: 1)
- Fence Staking (ID: 2)
- Other/General (ID: 3)
- Real Estate Survey (ID: 4)
- Subdivision Survey (ID: 5)
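The usage code in the Usage section looks up a label with `label_mapping["id2label"][str(predicted_class_id)]`. JSON object keys are always strings, so the integer class ID has to be converted with `str()` before the lookup. The sketch below assumes `label_mapping.json` contains an `id2label` object shaped like the Classes list above. The inline JSON is a hypothetical stand-in for the real file, and the `label2id` inverse is built locally for illustration.

```python
import json

# Hypothetical contents of label_mapping.json, assuming an "id2label" object
# keyed by the class IDs from the Classes list (JSON keys are always strings).
raw = """
{
  "id2label": {
    "0": "Boundary Survey",
    "1": "Construction Survey",
    "2": "Fence Staking",
    "3": "Other/General",
    "4": "Real Estate Survey",
    "5": "Subdivision Survey"
  }
}
"""
label_mapping = json.loads(raw)
id2label = label_mapping["id2label"]

# Integer IDs (e.g. from argmax) must be converted to str before lookup.
print(id2label[str(4)])  # Real Estate Survey

# Inverse mapping (label -> integer ID), built locally for convenience.
label2id = {label: int(idx) for idx, label in id2label.items()}
print(label2id["Fence Staking"])  # 2
```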
Usage
import requests
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "ityndall/james-river-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Load label mapping
label_mapping_url = f"https://huggingface.co/{model_name}/resolve/main/label_mapping.json"
response = requests.get(label_mapping_url, timeout=30)
response.raise_for_status()
label_mapping = response.json()

def classify_text(text):
    # Tokenize a single input string, truncated to 128 tokens
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

    # Get prediction without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_id = probabilities[0].argmax().item()
    confidence = probabilities[0][predicted_class_id].item()

    # Map the class ID to its label (JSON object keys are strings)
    predicted_label = label_mapping["id2label"][str(predicted_class_id)]
    return {
        "label": predicted_label,
        "confidence": confidence,
        "class_id": predicted_class_id,
    }

# Example usage
text = "I need a boundary survey for my property"
result = classify_text(text)
print(f"Predicted: {result['label']} (confidence: {result['confidence']:.3f})")
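`classify_text` reports only the single most likely class. Given the class imbalance described under Limitations, it can help to look at the runner-up classes too. A minimal pure-Python sketch of top-k post-processing; `top_k_labels` is a hypothetical helper, and the logits below are made up for illustration, not real model output. With the real model you would pass `outputs.logits[0].tolist()`.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one example.

    `logits` is a list of raw scores indexed by class ID; `id2label` maps
    string IDs to label names, as assumed for label_mapping.json.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class IDs by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[str(i)], probs[i]) for i in ranked[:k]]

id2label = {
    "0": "Boundary Survey", "1": "Construction Survey", "2": "Fence Staking",
    "3": "Other/General", "4": "Real Estate Survey", "5": "Subdivision Survey",
}
# Made-up logits for illustration only.
example_logits = [1.0, -0.5, 0.2, 2.5, 0.1, -1.0]
for label, p in top_k_labels(example_logits, id2label):
    print(f"{label}: {p:.3f}")
```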
Training Data
The model was trained on 1,000 survey-related text messages with the following distribution:
- Other/General: 919 samples (91.9%)
- Real Estate Survey: 49 samples (4.9%)
- Fence Staking: 21 samples (2.1%)
- Subdivision Survey: 4 samples (0.4%)
- Boundary Survey: 4 samples (0.4%)
- Construction Survey: 3 samples (0.3%)
Training Details
- Training Framework: Hugging Face Transformers
- Base Model: bert-base-uncased
- Training Epochs: 3
- Batch Size: 8
- Learning Rate: 5e-05
- Optimizer: AdamW
- Training Loss: 0.279
- Training Time: ~19.5 minutes
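For a rough sense of training scale, the settings above can be turned into an optimizer step count. This assumes all 1,000 messages were in the training split (the card does not describe a train/evaluation split), a single device, and no gradient accumulation; `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    # One optimizer step per mini-batch; a final partial batch still counts as a step.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs

# Assumption: all 1,000 messages used for training, single device, no gradient accumulation.
print(optimizer_steps(1000, 8, 3))  # 375
```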
Model Performance
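The model reached a training loss of 0.279 after 3 epochs. Training loss measures fit to the training data, not generalization. The dataset is also highly imbalanced: 91.9% of the messages are Other/General, so a classifier that always predicts that class would already score 0.919 accuracy on data with the same distribution. Overall accuracy, including the self-reported 0.996 listed under Evaluation results, therefore says little about the minority classes; per-class precision and recall are more informative.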
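To see why accuracy alone can mislead here, the sketch below scores a trivial baseline that always predicts Other/General against labels with the card's training distribution. `accuracy_and_macro_recall` is a hypothetical helper; in practice you would compute these metrics on held-out data with real model predictions (scikit-learn's `classification_report` also reports per-class precision, recall, and F1).

```python
from collections import Counter

# Training distribution from the card (label -> count).
counts = {
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Fence Staking": 21,
    "Subdivision Survey": 4,
    "Boundary Survey": 4,
    "Construction Survey": 3,
}

def accuracy_and_macro_recall(y_true, y_pred):
    """Overall accuracy, unweighted mean of per-class recall, and per-class recall."""
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)
    accuracy = sum(correct.values()) / len(y_true)
    recalls = {label: correct[label] / n for label, n in support.items()}
    macro_recall = sum(recalls.values()) / len(recalls)
    return accuracy, macro_recall, recalls

# A degenerate baseline that always predicts the majority class.
y_true = [label for label, n in counts.items() for _ in range(n)]
y_pred = ["Other/General"] * len(y_true)
acc, macro, per_class = accuracy_and_macro_recall(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_recall={macro:.3f}")  # accuracy=0.919 macro_recall=0.167
```

The baseline gets every Other/General message right and every other message wrong, so its macro-averaged recall is 1/6 even though its accuracy looks high.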
Limitations
- The model was trained on a small, imbalanced dataset
- Performance on minority classes (Construction Survey, Boundary Survey, and Subdivision Survey, with only 3 or 4 training examples each) may be limited
- The model may have a bias toward predicting "Other/General" due to class imbalance
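One common way to counter this bias when retraining is to weight the loss by inverse class frequency. The card does not say whether this was done; the sketch below only illustrates the "balanced" heuristic n_samples / (n_classes * count), the formula scikit-learn uses for `class_weight="balanced"`, applied to the training counts listed under Training Data. `balanced_class_weights` is a hypothetical helper.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}

# Training distribution from the card (label -> count).
counts = {
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Fence Staking": 21,
    "Subdivision Survey": 4,
    "Boundary Survey": 4,
    "Construction Survey": 3,
}
weights = balanced_class_weights(counts)
# Print from smallest to largest weight: rarer classes get larger weights.
for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label}: {w:.2f}")
```

Such weights could, for example, be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, ordered by class ID (0 to 5).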
Intended Use
This model is specifically designed for classifying survey-related inquiries for James River surveying services. It should not be used for other domains without additional training.
Files
- config.json: Model configuration
- model.safetensors: Model weights
- tokenizer.json, tokenizer_config.json, vocab.txt: Tokenizer files
- label_encoder.pkl: Original scikit-learn label encoder
- label_mapping.json: Human-readable label mappings
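The class IDs listed under Classes follow the alphabetical order of the label names. That is consistent with `label_encoder.pkl` being a scikit-learn `LabelEncoder`, which assigns IDs to classes in sorted order, although the card does not state how the IDs were produced. A pure-Python sketch of that ordering, with a hypothetical unsorted input list:

```python
# Hypothetical unsorted label list, e.g. as it might appear in raw training data.
labels = [
    "Real Estate Survey", "Other/General", "Fence Staking",
    "Boundary Survey", "Subdivision Survey", "Construction Survey",
]
# LabelEncoder-style encoding: sort the unique labels, then number them from 0.
classes = sorted(set(labels))
id2label = {i: label for i, label in enumerate(classes)}
for i, label in id2label.items():
    print(i, label)
```

If you load `label_encoder.pkl` itself, keep in mind that unpickling can execute arbitrary code, so only load it from a source you trust.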
Citation
If you use this model, please cite:
@misc{james-river-classifier,
title={James River Survey Classifier},
author={James River Surveying},
year={2025},
url={https://huggingface.co/ityndall/james-river-classifier}
}
Evaluation results
- Accuracy on James River Survey Classification: 0.996 (self-reported)