---

language: en
license: mit
tags:
- text-classification
- survey-classification
- james-river
- bert
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: james-river-classifier
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: custom
      name: James River Survey Classification
    metrics:
    - type: accuracy
      value: 0.996  # Based on test prediction confidence
---


# James River Survey Classifier

This model classifies survey-related text messages into one of six job types for James River surveying services.

## Model Description

- **Model Type**: BERT-based text classification
- **Base Model**: bert-base-uncased
- **Language**: English
- **Task**: Multi-class text classification
- **Classes**: 6 survey job types

## Classes

The model can classify text into the following survey job types:

- **Boundary Survey** (ID: 0)
- **Construction Survey** (ID: 1)
- **Fence Staking** (ID: 2)
- **Other/General** (ID: 3)
- **Real Estate Survey** (ID: 4)
- **Subdivision Survey** (ID: 5)
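
The IDs follow the alphabetical order of the class names, which is consistent with how a scikit-learn `LabelEncoder` assigns integer codes (the repository ships `label_encoder.pkl`). A minimal sketch that rebuilds the mapping in plain Python, assuming the six names above are exactly the strings used at training time; `label_mapping.json` remains the authoritative source:

```python
# Class names as listed in this card (assumed to match the training labels exactly).
class_names = [
    "Other/General",
    "Real Estate Survey",
    "Fence Staking",
    "Subdivision Survey",
    "Boundary Survey",
    "Construction Survey",
]

# Sorting alphabetically reproduces the ID order shown in this card.
id2label = dict(enumerate(sorted(class_names)))
label2id = {label: idx for idx, label in id2label.items()}

for idx, label in id2label.items():
    print(idx, label)
```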

## Usage

```python
import requests
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ityndall/james-river-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load label mapping (fail early if the download does not succeed)
label_mapping_url = f"https://huggingface.co/{model_name}/resolve/main/label_mapping.json"
response = requests.get(label_mapping_url, timeout=30)
response.raise_for_status()
label_mapping = response.json()


def classify_text(text):
    """Classify a single text and return its label, confidence, and class ID."""
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class_id = predictions.argmax(dim=-1).item()
        confidence = predictions[0][predicted_class_id].item()

    # Get label (JSON object keys are strings, so convert the ID)
    predicted_label = label_mapping["id2label"][str(predicted_class_id)]

    return {
        "label": predicted_label,
        "confidence": confidence,
        "class_id": predicted_class_id,
    }


# Example usage
text = "I need a boundary survey for my property"
result = classify_text(text)
print(f"Predicted: {result['label']} (confidence: {result['confidence']:.3f})")
```
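
The `confidence` value returned above is the softmax probability of the highest-scoring class. The following sketch shows that step in plain Python, without loading the model. The logits are made-up numbers for illustration, not real model output:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the six classes (illustration only).
example_logits = [0.2, -1.0, 0.5, 4.0, 1.1, -0.7]
probs = softmax(example_logits)

# The predicted class is the index with the largest probability.
predicted_class_id = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted_class_id]
print(predicted_class_id, round(confidence, 3))
```

A high softmax probability is not a guarantee of correctness, especially for classes with very few training examples.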

## Training Data

The model was trained on 1,000 survey-related text messages with the following distribution:

- **Other/General**: 919 samples (91.9%)
- **Real Estate Survey**: 49 samples (4.9%)
- **Fence Staking**: 21 samples (2.1%)
- **Subdivision Survey**: 4 samples (0.4%)
- **Boundary Survey**: 4 samples (0.4%)
- **Construction Survey**: 3 samples (0.3%)
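
To make the imbalance concrete, the sketch below computes inverse-frequency class weights from the counts above, using the same formula as scikit-learn's `class_weight="balanced"` option: `n_samples / (n_classes * count)`. This is one common mitigation, shown as an assumption about what a retraining run could use. The card does not state that weighting was applied when this model was trained.

```python
# Sample counts per class, copied from the distribution above.
class_counts = {
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Fence Staking": 21,
    "Subdivision Survey": 4,
    "Boundary Survey": 4,
    "Construction Survey": 3,
}

total = sum(class_counts.values())
n_classes = len(class_counts)

# "Balanced" heuristic: rare classes receive proportionally larger weights.
class_weights = {
    label: total / (n_classes * count) for label, count in class_counts.items()
}

for label, weight in class_weights.items():
    print(f"{label}: {weight:.2f}")
```

Weights like these could be passed to a weighted cross-entropy loss so that errors on rare classes count for more during training.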

## Training Details

- **Training Framework**: Hugging Face Transformers
- **Base Model**: bert-base-uncased
- **Training Epochs**: 3
- **Batch Size**: 8
- **Learning Rate**: 5e-05
- **Optimizer**: AdamW
- **Training Loss**: 0.279
- **Training Time**: ~19.5 minutes
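
As a rough sanity check on these numbers, the sketch below derives the number of optimizer steps and the average time per step. It assumes all 1,000 samples were used for training, a single device, and no gradient accumulation; none of these is stated in this card.

```python
import math

# Values taken from this card.
num_samples = 1000
batch_size = 8
num_epochs = 3
training_minutes = 19.5

# Assumption: one optimizer step per batch, with the last partial batch kept.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
seconds_per_step = training_minutes * 60 / total_steps

print(steps_per_epoch, total_steps, round(seconds_per_step, 2))
```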

## Model Performance

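The model reached a training loss of 0.279 after 3 epochs. Training loss alone does not measure generalization, and because 91.9% of the training samples are "Other/General", overall accuracy is dominated by that class. Per-class precision, recall, and F1 are more informative for the minority classes.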
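
The sketch below illustrates why accuracy can mislead on data this skewed. It scores a trivial baseline that always predicts "Other/General" against the training distribution, which is used here only as a stand-in because no held-out test set is described. The helper `per_class_f1` is written for this example and is not part of the repository.

```python
def per_class_f1(y_true, y_pred, labels):
    """Compute the F1 score for each label from paired true/predicted labels."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Class counts from the Training Data section.
counts = {
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Fence Staking": 21,
    "Subdivision Survey": 4,
    "Boundary Survey": 4,
    "Construction Survey": 3,
}

y_true = [label for label, n in counts.items() for _ in range(n)]
y_pred = ["Other/General"] * len(y_true)  # majority-class baseline

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_scores = per_class_f1(y_true, y_pred, list(counts))
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

The baseline scores high on accuracy while its macro-averaged F1 stays low, because five of the six classes are never predicted. Reporting macro F1 alongside accuracy would expose that gap.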

## Limitations

- The model was trained on a small, imbalanced dataset
- Performance on minority classes (Construction Survey, Boundary Survey, Subdivision Survey) may be limited due to few training examples
- The model may have a bias toward predicting "Other/General" due to class imbalance
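
Given these limitations, one possible safeguard is to send low-confidence predictions to a person for review. The sketch below assumes result dictionaries shaped like the output of `classify_text` in the Usage section. The `needs_review` helper and the 0.8 threshold are hypothetical and would need tuning on real data.

```python
def needs_review(result, threshold=0.8):
    """Return True when a prediction should be checked by a person."""
    return result["confidence"] < threshold


# Hypothetical results for illustration; not real model output.
sample_results = [
    {"label": "Boundary Survey", "confidence": 0.55, "class_id": 0},
    {"label": "Other/General", "confidence": 0.97, "class_id": 3},
]

flagged = [r for r in sample_results if needs_review(r)]
for r in flagged:
    print(f"Review: {r['label']} (confidence: {r['confidence']:.2f})")
```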

## Intended Use

This model is specifically designed for classifying survey-related inquiries for James River surveying services. It should not be used for other domains without additional training.

## Files

- `config.json`: Model configuration
- `model.safetensors`: Model weights
- `tokenizer.json`, `tokenizer_config.json`, `vocab.txt`: Tokenizer files
- `label_encoder.pkl`: Original scikit-learn label encoder
- `label_mapping.json`: Human-readable label mappings

## Citation

If you use this model, please cite:

```
@misc{james-river-classifier,
  title={James River Survey Classifier},
  author={James River Surveying},
  year={2025},
  url={https://huggingface.co/ityndall/james-river-classifier}
}
```