
	---
	language: en
	license: mit
	tags:
	- text-classification
	- survey-classification
	- james-river
	- bert
	datasets:
	- custom
	metrics:
	- accuracy
	- f1
	model-index:
	- name: james-river-classifier
	  results:
	  - task:
	      type: text-classification
	      name: Text Classification
	    dataset:
	      type: custom
	      name: James River Survey Classification
	    metrics:
	    - type: accuracy
	      value: 0.996 # Based on test prediction confidence
	---

	# James River Survey Classifier

	This model classifies survey-related text messages into different job types for James River surveying services.

	## Model Description

	- Model Type: BERT-based text classification
	- Base Model: bert-base-uncased
	- Language: English
	- Task: Multi-class text classification
	- Classes: 6 survey job types

	## Classes

	The model can classify text into the following survey job types:

	- Boundary Survey (ID: 0)
	- Construction Survey (ID: 1)
	- Fence Staking (ID: 2)
	- Other/General (ID: 3)
	- Real Estate Survey (ID: 4)
	- Subdivision Survey (ID: 5)
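The class list above can be written as an ID-to-label dictionary. The sketch below is a hypothetical helper that mirrors that list (the exact label strings stored in the repository's `config.json` and `label_mapping.json` may differ slightly). It builds the inverse mapping and shows that JSON object keys are always strings, which is why the usage example looks up `str(predicted_class_id)`.

```python
import json

# Hypothetical mirror of the class list in this card; verify the exact
# strings against the repository's label_mapping.json before relying on them.
ID2LABEL = {
    0: "Boundary Survey",
    1: "Construction Survey",
    2: "Fence Staking",
    3: "Other/General",
    4: "Real Estate Survey",
    5: "Subdivision Survey",
}

# Inverse mapping: label string -> class ID
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

# Serializing to JSON turns the integer keys into strings
round_tripped = json.loads(json.dumps({"id2label": ID2LABEL}))
print(round_tripped["id2label"]["2"])  # Fence Staking
```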

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	from huggingface_hub import hf_hub_download
	import torch
	import json

	# Load model and tokenizer
	model_name = "ityndall/james-river-classifier"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Load label mapping (downloaded and cached from the model repository)
	label_mapping_path = hf_hub_download(repo_id=model_name, filename="label_mapping.json")
	with open(label_mapping_path) as f:
	    label_mapping = json.load(f)

	def classify_text(text):
	    # Tokenize input, truncating long messages to 128 tokens
	    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

	    # Run the model without tracking gradients
	    with torch.no_grad():
	        outputs = model(**inputs)

	    # Convert logits to probabilities for the single input in the batch
	    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
	    predicted_class_id = probabilities.argmax().item()
	    confidence = probabilities[predicted_class_id].item()

	    # Map the class ID to its label (JSON object keys are strings)
	    predicted_label = label_mapping["id2label"][str(predicted_class_id)]

	    return {
	        "label": predicted_label,
	        "confidence": confidence,
	        "class_id": predicted_class_id,
	    }

	# Example usage
	text = "I need a boundary survey for my property"
	result = classify_text(text)
	print(f"Predicted: {result['label']} (confidence: {result['confidence']:.3f})")
	```

	## Training Data

	The model was trained on 1,000 survey-related text messages with the following distribution:

	- Other/General: 919 samples (91.9%)
	- Real Estate Survey: 49 samples (4.9%)
	- Fence Staking: 21 samples (2.1%)
	- Subdivision Survey: 4 samples (0.4%)
	- Boundary Survey: 4 samples (0.4%)
	- Construction Survey: 3 samples (0.3%)
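The counts above can be turned into shares and per-class weights to make the imbalance concrete. This is an illustrative sketch only: the card does not say that class weighting was used in training. The weights follow the common "balanced" heuristic `n_samples / (n_classes * count)`, the same formula scikit-learn uses for `class_weight="balanced"`.

```python
# Class counts from the training-data distribution in this card
counts = {
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Fence Staking": 21,
    "Subdivision Survey": 4,
    "Boundary Survey": 4,
    "Construction Survey": 3,
}

n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" weights: rarer classes receive proportionally larger weights
weights = {label: n_samples / (n_classes * count) for label, count in counts.items()}

for label, count in counts.items():
    print(f"{label:<20} {count / n_samples:6.1%}  weight={weights[label]:7.2f}")
```

With these counts, a Construction Survey example would carry about 306 times the weight of an Other/General example (919 / 3).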

	## Training Details

	- Training Framework: Hugging Face Transformers
	- Base Model: bert-base-uncased
	- Training Epochs: 3
	- Batch Size: 8
	- Learning Rate: 5e-05
	- Optimizer: AdamW
	- Training Loss: 0.279
	- Training Time: ~19.5 minutes

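As a rough check on the training schedule, the hyperparameters above determine the number of optimizer steps. The sketch assumes all 1,000 messages were used for training with no gradient accumulation; the card does not state a train/test split, so the real count may be lower. In Hugging Face `TrainingArguments`, these settings correspond to `num_train_epochs`, `per_device_train_batch_size`, and `learning_rate`.

```python
import math

# Hyperparameters from the Training Details list
num_train_epochs = 3
batch_size = 8

# Assumption: the full dataset was used for training (split not documented)
num_train_samples = 1000

# The last batch of an epoch may be partial, hence the ceiling
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(steps_per_epoch, total_steps)  # 125 375
```

If, hypothetically, 20% of the messages had been held out, the 800 remaining samples would give 100 steps per epoch and 300 steps in total.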
	## Model Performance

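	The model reached a training loss of 0.279 after 3 epochs. Training loss does not measure performance on unseen data: the accuracy of 0.996 listed in the metadata is based on prediction confidence on test inputs rather than on a labeled evaluation, and no per-class F1 scores are reported. Because the dataset is highly imbalanced, performance on the minority classes may be considerably worse than these figures suggest.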
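Because of the imbalance, accuracy alone is a weak signal for this task. The pure-Python sketch below scores a trivial baseline that always predicts Other/General against the training-set label distribution: it reaches 91.9% accuracy yet a macro-averaged F1 of about 0.16, since it never recovers any minority class. It is an illustration of the metric gap, not the evaluation used for this model.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); treated as 0 if the class never occurs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    # Macro average: every class counts equally, regardless of its size
    return sum(scores) / len(scores)

# Label distribution from the Training Data section
counts = {
    "Other/General": 919,
    "Real Estate Survey": 49,
    "Fence Staking": 21,
    "Subdivision Survey": 4,
    "Boundary Survey": 4,
    "Construction Survey": 3,
}
y_true = [label for label, n in counts.items() for _ in range(n)]
y_pred = ["Other/General"] * len(y_true)  # majority-class baseline

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, list(counts))
print(f"accuracy={acc:.3f}  macro-F1={f1:.3f}")
```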

	## Limitations

	- The model was trained on a small, imbalanced dataset
	- Performance on minority classes (Construction Survey, Boundary Survey, Subdivision Survey) may be limited, since each had only 3 or 4 training examples
	- The model may have a bias toward predicting "Other/General" due to class imbalance
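One way to work around these limitations in practice is to send low-confidence predictions, and predictions of the rarely seen classes, to a person for review. The sketch below is a hypothetical post-processing step on the dictionary returned by `classify_text` in the Usage section; the 0.7 threshold and the `needs_review` field are illustrative choices, not values tuned or recommended for this model.

```python
# Hypothetical review threshold; tune it on labeled messages from real traffic
REVIEW_THRESHOLD = 0.7

# Classes with only 3 or 4 training examples in this card
LOW_SUPPORT_LABELS = {"Boundary Survey", "Construction Survey", "Subdivision Survey"}

def route_prediction(result, threshold=REVIEW_THRESHOLD):
    """Return a copy of a classify_text-style result with a needs_review flag."""
    low_confidence = result["confidence"] < threshold
    low_support = result["label"] in LOW_SUPPORT_LABELS
    return {**result, "needs_review": low_confidence or low_support}

# Example with a hand-written result dict (no model call)
print(route_prediction({"label": "Other/General", "confidence": 0.55, "class_id": 3}))
```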

	## Intended Use

	This model is specifically designed for classifying survey-related inquiries for James River surveying services. It should not be used for other domains without additional training.

	## Files

	- `config.json`: Model configuration
	- `model.safetensors`: Model weights
	- `tokenizer.json`, `tokenizer_config.json`, `vocab.txt`: Tokenizer files
	- `label_encoder.pkl`: Original scikit-learn label encoder
	- `label_mapping.json`: Human-readable label mappings

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{james-river-classifier,
	title={James River Survey Classifier},
	author={James River Surveying},
	year={2025},
	url={https://huggingface.co/ityndall/james-river-classifier}
	}
	```