File size: 4,380 Bytes
b261e4e
a8002b1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b261e4e
a8002b1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
---

language: en
license: mit
tags:
- text-classification
- survey-classification
- james-river
- bert
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: james-river-classifier
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: custom
      name: James River Survey Classification
    metrics:
    - type: accuracy
      value: 0.996  # Based on test prediction confidence
---


# James River Survey Classifier

This model classifies survey-related text messages into different job types for James River surveying services.

## Model Description

- **Model Type**: BERT-based text classification
- **Base Model**: bert-base-uncased
- **Language**: English
- **Task**: Multi-class text classification
- **Classes**: 6 survey job types

## Classes

The model can classify text into the following survey job types:

- **Boundary Survey** (ID: 0)
- **Construction Survey** (ID: 1)
- **Fence Staking** (ID: 2)
- **Other/General** (ID: 3)
- **Real Estate Survey** (ID: 4)
- **Subdivision Survey** (ID: 5)

## Usage

```python

from transformers import AutoTokenizer, AutoModelForSequenceClassification

import torch

import json



# Load model and tokenizer

model_name = "ityndall/james-river-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(model_name)



# Load label mapping

import requests

label_mapping_url = f"https://huggingface.co/{model_name}/resolve/main/label_mapping.json"

label_mapping = requests.get(label_mapping_url).json()



def classify_text(text):

    # Tokenize input

    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

    

    # Get prediction

    with torch.no_grad():

        outputs = model(**inputs)

        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

        predicted_class_id = predictions.argmax().item()

        confidence = predictions[0][predicted_class_id].item()

    

    # Get label

    predicted_label = label_mapping["id2label"][str(predicted_class_id)]

    

    return {

        "label": predicted_label,

        "confidence": confidence,

        "class_id": predicted_class_id

    }



# Example usage

text = "I need a boundary survey for my property"

result = classify_text(text)

print(f"Predicted: {result['label']} (confidence: {result['confidence']:.3f})")

```

## Training Data

The model was trained on 1,000 survey-related text messages with the following distribution:

- **Other/General**: 919 samples (91.9%)
- **Real Estate Survey**: 49 samples (4.9%)
- **Fence Staking**: 21 samples (2.1%)
- **Subdivision Survey**: 4 samples (0.4%)
- **Boundary Survey**: 4 samples (0.4%)
- **Construction Survey**: 3 samples (0.3%)

## Training Details

- **Training Framework**: Hugging Face Transformers
- **Base Model**: bert-base-uncased
- **Training Epochs**: 3
- **Batch Size**: 8
- **Learning Rate**: 5e-05
- **Optimizer**: AdamW
- **Training Loss**: 0.279
- **Training Time**: ~19.5 minutes

## Model Performance

The model achieved a training loss of 0.279 after 3 epochs. However, note that this is a highly imbalanced dataset, and performance on minority classes may vary.

## Limitations

- The model was trained on a small, imbalanced dataset
- Performance on minority classes (Construction Survey, Boundary Survey, Subdivision Survey) may be limited due to few training examples
- The model may have a bias toward predicting "Other/General" due to class imbalance

## Intended Use

This model is specifically designed for classifying survey-related inquiries for James River surveying services. It should not be used for other domains without additional training.

## Files

- `config.json`: Model configuration
- `model.safetensors`: Model weights
- `tokenizer.json`, `tokenizer_config.json`, `vocab.txt`: Tokenizer files
- `label_encoder.pkl`: Original scikit-learn label encoder
- `label_mapping.json`: Human-readable label mappings

## Citation

If you use this model, please cite:

```

@misc{james-river-classifier,

  title={James River Survey Classifier},

  author={James River Surveying},

  year={2025},

  url={https://huggingface.co/ityndall/james-river-classifier}

}

```