James River Survey Classifier

This model classifies survey-related text messages into different job types for James River surveying services.

Model Description

  • Model Type: BERT-based text classification
  • Base Model: bert-base-uncased
  • Language: English
  • Task: Multi-class text classification
  • Classes: 6 survey job types

Classes

The model can classify text into the following survey job types:

  • Boundary Survey (ID: 0)
  • Construction Survey (ID: 1)
  • Fence Staking (ID: 2)
  • Other/General (ID: 3)
  • Real Estate Survey (ID: 4)
  • Subdivision Survey (ID: 5)

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import requests
import torch

# Load model and tokenizer
model_name = "ityndall/james-river-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load label mapping (fail loudly if the download does not succeed)
label_mapping_url = f"https://huggingface.co/{model_name}/resolve/main/label_mapping.json"
response = requests.get(label_mapping_url, timeout=30)
response.raise_for_status()
label_mapping = response.json()

def classify_text(text):
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    
    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class_id = predictions[0].argmax().item()
        confidence = predictions[0][predicted_class_id].item()
    
    # Get label
    predicted_label = label_mapping["id2label"][str(predicted_class_id)]
    
    return {
        "label": predicted_label,
        "confidence": confidence,
        "class_id": predicted_class_id
    }

# Example usage
text = "I need a boundary survey for my property"
result = classify_text(text)
print(f"Predicted: {result['label']} (confidence: {result['confidence']:.3f})")
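
Because the training data is dominated by Other/General, it may be worth handling low-confidence predictions separately. The following is a minimal sketch, not part of the model or its repository: the helper name `needs_review` and the 0.8 threshold are illustrative assumptions, and the example dictionaries are hand-written in the shape that `classify_text` returns.

```python
def needs_review(result, threshold=0.8):
    """Return True when a prediction should be checked by a person.

    `result` is a dict shaped like the one classify_text returns.
    The default threshold is an illustrative assumption, not a tuned value.
    """
    return result["confidence"] < threshold


# Hand-written example dicts (not real model output)
confident = {"label": "Boundary Survey", "confidence": 0.93, "class_id": 0}
uncertain = {"label": "Other/General", "confidence": 0.55, "class_id": 3}

print(needs_review(confident))  # False
print(needs_review(uncertain))  # True
```

A suitable threshold would have to be chosen on held-out data for the intended workload.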

Training Data

The model was trained on 1,000 survey-related text messages with the following distribution:

  • Other/General: 919 samples (91.9%)
  • Real Estate Survey: 49 samples (4.9%)
  • Fence Staking: 21 samples (2.1%)
  • Subdivision Survey: 4 samples (0.4%)
  • Boundary Survey: 4 samples (0.4%)
  • Construction Survey: 3 samples (0.3%)
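
One common way to counter an imbalance like this is to weight each class inversely to its frequency. The sketch below computes such weights from the counts listed above, using the same formula as scikit-learn's "balanced" heuristic (total samples divided by the number of classes times the class count). This card does not state whether any weighting was used in training, so treat it as an illustration rather than a description of the actual setup.

```python
# Sample counts copied from the distribution listed above
class_counts = {
    "Boundary Survey": 4,
    "Construction Survey": 3,
    "Fence Staking": 21,
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Subdivision Survey": 4,
}

total = sum(class_counts.values())
num_classes = len(class_counts)

# "Balanced" weighting: total / (num_classes * count)
class_weights = {
    label: total / (num_classes * count)
    for label, count in class_counts.items()
}

for label, weight in class_weights.items():
    print(f"{label}: {weight:.2f}")
```

Weights of this kind could be passed, ordered by class ID, as the `weight` tensor of `torch.nn.CrossEntropyLoss` when retraining.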

Training Details

  • Training Framework: Hugging Face Transformers
  • Base Model: bert-base-uncased
  • Training Epochs: 3
  • Batch Size: 8
  • Learning Rate: 5e-05
  • Optimizer: AdamW
  • Training Loss: 0.279
  • Training Time: ~19.5 minutes
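
For a rough sense of scale, the number of optimizer steps can be estimated from the figures above. This is a back-of-the-envelope sketch under assumptions that the card does not confirm: all 1,000 samples were used for training (no held-out split), on a single device, without gradient accumulation.

```python
import math

num_samples = 1000       # from the Training Data section
batch_size = 8
epochs = 3
training_minutes = 19.5  # approximate, as reported above

# Assumption: no held-out split, single device, no gradient accumulation
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Average wall-clock time per optimizer step under the same assumptions
seconds_per_step = training_minutes * 60 / total_steps

print(steps_per_epoch, total_steps, round(seconds_per_step, 2))
```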

Model Performance

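The model reached a training loss of 0.279 after 3 epochs. Training loss alone does not show how well the model generalizes: because 91.9% of the training samples are Other/General, a low loss can be reached largely by predicting that class. This card reports no validation or test metrics, so performance on the minority classes should be treated as unverified.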
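
To illustrate why accuracy would be a misleading metric here, the sketch below scores a trivial baseline that always predicts the most frequent class. It uses the training distribution from this card and assumes, hypothetically, that evaluation data follows the same distribution. It says nothing about the actual model's scores.

```python
# Sample counts from the Training Data section
counts = {
    "Boundary Survey": 4,
    "Construction Survey": 3,
    "Fence Staking": 21,
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Subdivision Survey": 4,
}
total = sum(counts.values())
majority_label = max(counts, key=counts.get)

# Accuracy of a baseline that always predicts the majority class
accuracy = counts[majority_label] / total


def baseline_f1(label):
    """F1 of the always-majority baseline for one class."""
    if label != majority_label:
        return 0.0  # the class is never predicted, so recall is 0
    precision = counts[label] / total  # every sample is predicted as majority
    recall = 1.0                       # all majority samples are recovered
    return 2 * precision * recall / (precision + recall)


# Macro F1 averages per-class F1 without weighting by class size
macro_f1 = sum(baseline_f1(label) for label in counts) / len(counts)

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

The gap between the two numbers is why per-class precision and recall, or macro-averaged F1, are more informative than accuracy for a dataset like this one.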

Limitations

  • The model was trained on a small, imbalanced dataset
  • Performance on the smallest classes (Construction Survey, Boundary Survey, Subdivision Survey) is likely to be limited, since each has only 3 or 4 training examples
  • The model may have a bias toward predicting "Other/General" due to class imbalance

Intended Use

This model is specifically designed for classifying survey-related inquiries for James River surveying services. It should not be used for other domains without additional training.

Files

  • config.json: Model configuration
  • model.safetensors: Model weights
  • tokenizer.json, tokenizer_config.json, vocab.txt: Tokenizer files
  • label_encoder.pkl: Original scikit-learn label encoder
  • label_mapping.json: Human-readable label mappings
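
The class IDs listed under Classes match alphabetical order, which is how a scikit-learn LabelEncoder numbers its classes. The sketch below rebuilds a mapping of that kind. The "id2label" key with string IDs mirrors what the Usage code reads; the "label2id" key is an assumption, and the actual contents of label_mapping.json may differ.

```python
import json

# Class names in arbitrary order
labels = [
    "Other/General",
    "Real Estate Survey",
    "Fence Staking",
    "Subdivision Survey",
    "Boundary Survey",
    "Construction Survey",
]

# LabelEncoder assigns IDs by sorted class name; sorted() mirrors that here
sorted_labels = sorted(labels)

id2label = {str(i): name for i, name in enumerate(sorted_labels)}
label2id = {name: i for i, name in enumerate(sorted_labels)}  # assumed key

mapping = {"id2label": id2label, "label2id": label2id}

# JSON object keys are always strings, hence the str(...) lookup in Usage
round_tripped = json.loads(json.dumps(mapping))
print(round_tripped["id2label"]["3"])
```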

Citation

If you use this model, please cite:

@misc{james-river-classifier,
  title={James River Survey Classifier},
  author={James River Surveying},
  year={2025},
  url={https://huggingface.co/ityndall/james-river-classifier}
}