---
language:
- en
license: apache-2.0
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- sentiment-analysis
- few-shot-learning
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: SetFit Sentiment Analysis
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9
    - name: F1 (Weighted)
      type: f1
      value: 0.8984430773904458
    - name: Precision (Weighted)
      type: precision
      value: 0.9060606060606061
    - name: Recall (Weighted)
      type: recall
      value: 0.9
---

# SetFit Sentiment Analysis Model

This is a [SetFit](https://github.com/huggingface/setfit) model fine-tuned for five-class sentiment classification of customer feedback.

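## Model Description

| Property | Value |
|----------|-------|
| Base Model | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) |
| Total Parameters | 109,482,240 |
| Trainable Parameters | 109,482,240 |
| Body Parameters | 109,482,240 |
| Head Parameters | 0 |
| Model Size | 417.64 MiB (float32) |
| Labels | [0, 1, 2, 3, 4] |
| Number of Classes | 5 |
| Serialization | safetensors |

All counted parameters belong to the sentence-transformer body, and all of them are trainable. A head count of 0 most likely means the classification head is not a PyTorch module (SetFit's default head is a scikit-learn `LogisticRegression`), so its weights are not included in these totals.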
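
The reported size matches the parameter count stored as 32-bit floats, expressed in binary megabytes (MiB, 2^20 bytes) rather than decimal MB. The quick check below assumes float32 weights and ignores the small JSON header that safetensors adds:

```python
# Assumption: every parameter is stored as float32 (4 bytes).
# The safetensors header is small and is ignored here.
TOTAL_PARAMS = 109_482_240
BYTES_PER_PARAM = 4  # float32

size_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
size_mib = size_bytes / 2**20  # binary megabytes (MiB)
size_mb = size_bytes / 10**6   # decimal megabytes (MB)

print(f"{size_bytes:,} bytes = {size_mib:.2f} MiB = {size_mb:.2f} MB")
# prints "437,928,960 bytes = 417.64 MiB = 437.93 MB"
```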

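## Training Configuration

| Parameter | Value |
|-----------|-------|
| Batch Size | 4 |
| Epochs | 1 (embedding fine-tuning), 16 (classifier training) |
| Training Samples | 540 |
| Test Samples | 100 |
| Loss Function | `CosineSimilarityLoss` |
| Metric for Best Model | `embedding_loss` |

SetFit's `num_epochs` setting takes an (embedding, classifier) pair, here `(1, 16)`. The classifier value applies only when the head is a differentiable PyTorch head.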
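
SetFit's embedding phase fine-tunes the body on sentence pairs rather than on single examples. Pairs from the same class get a target similarity of 1.0, pairs from different classes get 0.0, and `CosineSimilarityLoss` is the mean squared error between the cosine similarity of the two embeddings and that target. The sketch below illustrates this idea. The helper `make_pairs` and the 3-dimensional toy vectors are hypothetical stand-ins for real sentence embeddings, and SetFit's own sampler selects pairs differently (it does not simply enumerate every combination):

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def make_pairs(examples):
    """Hypothetical helper: every unordered pair, target 1.0 if same label else 0.0."""
    return [
        (x1, x2, 1.0 if y1 == y2 else 0.0)
        for (x1, y1), (x2, y2) in itertools.combinations(examples, 2)
    ]

def cosine_similarity_loss(pairs):
    """Mean squared error between each pair's cosine similarity and its target."""
    errors = [(cosine(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)

# Toy "embeddings" standing in for encoder outputs, tagged with sentiment label ids.
examples = [
    ([1.0, 0.0, 0.0], 4),  # Positive
    ([0.9, 0.1, 0.0], 4),  # Positive
    ([0.0, 1.0, 0.0], 0),  # Negative
]

pairs = make_pairs(examples)
loss = cosine_similarity_loss(pairs)
print(f"{len(pairs)} pairs, loss = {loss:.4f}")
```

Training drives this loss down by moving same-label embeddings together and different-label embeddings apart. The classification head is then fit on the resulting embeddings.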

### Training Progress

- Initial Loss: 0.1474
- Final Loss: 0.0648
- Eval Loss: 0.0918
- Training Runtime: 2943.9747 seconds (about 49 minutes)
- Samples/Second: 3.6690
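
The reported figures can be turned into more readable quantities with simple arithmetic. Runtime multiplied by throughput gives roughly 10,800 items processed, far more than the 540 labeled examples. This is consistent with SetFit's embedding phase iterating over generated sentence pairs rather than the raw examples, although that reading is an inference from the numbers and not stated in the training log:

```python
# Reported training statistics, copied from the list above.
runtime_s = 2943.9747
samples_per_second = 3.6690
initial_loss, final_loss = 0.1474, 0.0648

minutes = runtime_s / 60
items_processed = runtime_s * samples_per_second  # approximate; throughput is rounded
loss_reduction = (initial_loss - final_loss) / initial_loss

print(f"Runtime: {minutes:.1f} min")
print(f"Items processed: ~{items_processed:,.0f}")
print(f"Training loss reduction: {loss_reduction:.1%}")
```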

## Evaluation Results

| Metric | Score |
|--------|-------|
| Accuracy | 0.9000 |
| F1 (Weighted) | 0.8984 |
| F1 (Macro) | 0.8984 |
| Precision (Weighted) | 0.9061 |
| Precision (Macro) | 0.9061 |
| Recall (Weighted) | 0.9000 |
| Recall (Macro) | 0.9000 |
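
The weighted and macro rows match because every class has exactly 20 test examples, so support weighting gives each class the same weight. Weighted recall also always equals accuracy, because weighting each class's recall by its support simply counts the correct predictions. The check below uses the per-class recalls and supports from the per-class report; `macro_average` and `weighted_average` are illustrative helpers:

```python
# Per-class recall and support, copied from the per-class report.
recall = [0.95, 0.75, 1.00, 0.80, 1.00]
support = [20, 20, 20, 20, 20]

def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)

def weighted_average(values, weights):
    """Mean weighted by each class's support (number of test examples)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# recall x support recovers the number of correct predictions in each class
correct = sum(round(r * s) for r, s in zip(recall, support))
accuracy = correct / sum(support)

print(f"macro={macro_average(recall):.4f} "
      f"weighted={weighted_average(recall, support):.4f} accuracy={accuracy:.4f}")
```

With unequal supports the two averages generally differ, so the duplicated rows are specific to this balanced test set.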

### Per-Class Performance

```
              precision    recall  f1-score   support

           0       0.86      0.95      0.90        20
           1       0.83      0.75      0.79        20
           2       0.83      1.00      0.91        20
           3       1.00      0.80      0.89        20
           4       1.00      1.00      1.00        20

    accuracy                           0.90       100
   macro avg       0.91      0.90      0.90       100
weighted avg       0.91      0.90      0.90       100
```
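
The rounded per-class figures pin down the underlying counts. With 20 examples per class, the recalls give 19, 15, 20, 16 and 20 correct predictions. The precisions then imply 22, 18, 24, 16 and 20 predictions per label (for example, class 0: 19/22 ≈ 0.86), which sum to the 100 test examples. These counts are inferred from the report, not read from the confusion matrix image. Recomputing the metrics from them reproduces the unrounded precision and F1 values in the metadata at the top of this card:

```python
# Counts inferred from the rounded per-class report (a reconstruction,
# not read from the saved confusion matrix).
true_positives = [19, 15, 20, 16, 20]  # recall x 20 test examples per class
predicted = [22, 18, 24, 16, 20]       # number of times each label was predicted
support = [20, 20, 20, 20, 20]         # test examples per class

precision = [tp / p for tp, p in zip(true_positives, predicted)]
recall = [tp / s for tp, s in zip(true_positives, support)]
# F1 = 2PR / (P + R), which simplifies to 2TP / (predicted + support)
f1 = [2 * tp / (p + s) for tp, p, s in zip(true_positives, predicted, support)]

def macro(values):
    """Unweighted mean over the five classes."""
    return sum(values) / len(values)

print(f"macro precision = {macro(precision):.16f}")
print(f"macro recall    = {macro(recall):.16f}")
print(f"macro F1        = {macro(f1):.16f}")
```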

## Visualizations

### Evaluation Metrics Overview
<p align="center">
<img src="evaluation_metrics.png" alt="Evaluation Metrics" width="800"/>
</p>

### Confusion Matrix
<p align="center">
<img src="confusion_matrix.png" alt="Confusion Matrix" width="600"/>
</p>

### Training Loss Curve
<p align="center">
<img src="loss_curve.png" alt="Training Loss Curve" width="600"/>
</p>

### Learning Rate Schedule
<p align="center">
<img src="learning_rate.png" alt="Learning Rate Schedule" width="600"/>
</p>

## Usage

```python
from setfit import SetFitModel

# Load the model from the Hugging Face Hub
model = SetFitModel.from_pretrained("loganh274/nlp-testing-setfit")

# Single prediction: predict() takes a list of texts and returns one label id per text
text = "This product exceeded my expectations!"
prediction = model.predict([text])
print(f"Sentiment: {int(prediction[0])}")

# Batch prediction
texts = [
    "Amazing quality, highly recommend!",
    "It's okay, nothing special.",
    "Terrible experience, very disappointed.",
]
predictions = model.predict(texts)
# predict_proba() returns one row of class probabilities per text (columns follow label ids 0-4)
probabilities = model.predict_proba(texts)

for text, pred, prob in zip(texts, predictions, probabilities):
    print(f"Text: {text}")
    # The highest class probability serves as a confidence score
    print(f"  Prediction: {int(pred)}, Confidence: {float(prob.max()):.2%}")
```

## Label Mapping

| Label | Sentiment |
|-------|-----------|
| 0 | Negative |
| 1 | Somewhat Negative |
| 2 | Neutral |
| 3 | Somewhat Positive |
| 4 | Positive |
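
The model returns integer label ids. A small lookup based on the mapping above turns predictions and probability rows into readable results. `ID2LABEL` and `describe` are hypothetical helpers. The probability rows are plain lists here so the sketch runs without SetFit installed; a tensor returned by `predict_proba` can be converted with `.tolist()`:

```python
# Label ids -> sentiment names, following the table above.
ID2LABEL = {
    0: "Negative",
    1: "Somewhat Negative",
    2: "Neutral",
    3: "Somewhat Positive",
    4: "Positive",
}

def describe(prediction_ids, probability_rows):
    """Pair each predicted id's sentiment name with the row's highest probability."""
    results = []
    for label_id, probs in zip(prediction_ids, probability_rows):
        results.append((ID2LABEL[int(label_id)], max(probs)))
    return results

# Illustrative inputs only; these are not outputs of the published model.
preds = [4, 2]
probs = [
    [0.01, 0.02, 0.05, 0.12, 0.80],
    [0.10, 0.15, 0.60, 0.10, 0.05],
]
for name, confidence in describe(preds, probs):
    print(f"{name}: {confidence:.0%}")
```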

## Environment

| Package | Version |
|---------|---------|
| Python | 3.11.14 |
| SetFit | 1.1.3 |
| PyTorch | 2.9.1 |
| scikit-learn | 1.8.0 |
| Transformers | N/A (not recorded) |
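
Because results can shift between library versions, a quick comparison against the recorded environment can be useful before reproducing the numbers above. The following is a minimal standard-library sketch; `compare_environment` is a hypothetical helper. PyTorch wheels often carry a local suffix such as `+cu121`, which this exact-string comparison reports as a difference:

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions recorded in the table above, keyed by PyPI distribution name.
RECORDED = {
    "setfit": "1.1.3",
    "torch": "2.9.1",
    "scikit-learn": "1.8.0",
}

def compare_environment(recorded=RECORDED):
    """Return {name: (recorded, installed)}; installed is None when missing."""
    report = {"python": ("3.11.14", platform.python_version())}
    for dist, expected in recorded.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None
        report[dist] = (expected, installed)
    return report

for name, (expected, installed) in compare_environment().items():
    status = "match" if installed == expected else "differs"
    print(f"{name:<13} recorded={expected:<8} installed={installed} ({status})")
```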

## Citation

If you use this model, please cite the SetFit paper:

```bibtex
@article{tunstall2022efficient,
  title={Efficient Few-Shot Learning Without Prompts},
  author={Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  journal={arXiv preprint arXiv:2209.11055},
  year={2022}
}
```

## License

Apache 2.0