---
language: en
license: apache-2.0
tags:
- distilbert
- text-classification
- animals
- education
- final-project
datasets:
- Isamu136/big-animal-dataset
metrics:
- accuracy
- f1
---

# ZooGuide-BERT Animal Fact Assistant

This model is a DistilBERT-based animal text classification model created for an INFOST 470 final project.

## Model Details

- **Base model:** distilbert-base-uncased
- **Fine-tuned model:** cloudwoowoo/finalprojectanimal
- **Task:** Multi-class text classification
- **Dataset:** Isamu136/big-animal-dataset
- **Dataset link:** https://huggingface.co/datasets/Isamu136/big-animal-dataset

## Dataset

This project uses the public Hugging Face dataset `Isamu136/big-animal-dataset`. The original dataset contains animal image examples and caption/class labels. Because DistilBERT is a text model, the public caption/class labels were converted into short animal clue and fact sentences for text classification.

## Training Details

- **Model type:** DistilBERT sequence classifier
- **Epochs:** 5
- **Learning rate:** 2e-5
- **Batch size:** 16
- **Train/validation/test split:** 70/15/15
- **Focused classes:** 10
- **Training examples:** 2100
- **Validation examples:** 450
- **Test examples:** 450

## Evaluation

Final evaluation results after running the notebook:

- **Accuracy:** 1.0000
- **Macro F1:** 1.0000

## Intended Uses

This model is intended for educational demonstrations, animal learning activities, classroom examples, and small text-classification projects.

## Limitations

This model is not a general animal expert. It only predicts among the focused animal classes selected from the public dataset. It may perform poorly on animals not included in the selected training labels, vague clues, misspellings, or prompts outside the project domain.

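
Because the classifier always returns one of the focused classes, even for out-of-scope input, a caller may want to treat low-scoring predictions as "unknown". The following is a minimal sketch of that idea, not part of the original project: the helper name `flag_uncertain` and the 0.8 threshold are hypothetical, and the input is assumed to be a single result dict in the `{"label": ..., "score": ...}` shape that a `text-classification` pipeline returns. A high score does not guarantee a correct answer, so this only filters the most obviously uncertain cases.

```python
def flag_uncertain(prediction, threshold=0.8):
    """Return the predicted label, or "unknown" if the score is below threshold.

    `prediction` is one result dict from a text-classification pipeline,
    for example {"label": "dog", "score": 0.97}.
    The default threshold of 0.8 is an illustrative value, not a tuned one.
    """
    if prediction["score"] < threshold:
        return "unknown"
    return prediction["label"]


# Hand-written example results (not real model output).
print(flag_uncertain({"label": "dog", "score": 0.97}))  # prints "dog"
print(flag_uncertain({"label": "cat", "score": 0.41}))  # prints "unknown"
```
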
## How to Load the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "cloudwoowoo/finalprojectanimal"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

classifier("Tell me a fact about a dog.")
```
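
The accuracy and macro F1 metrics listed under Evaluation can be computed from lists of true and predicted labels. The sketch below shows the definitions in plain Python; it is an illustration only, since the card does not say how the project notebook computed them. The function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels for illustration (not project data).
y_true = ["dog", "cat", "dog", "fox"]
y_pred = ["dog", "cat", "cat", "fox"]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro F1 weights every class equally regardless of how many examples it has, which is why it is commonly reported next to accuracy for multi-class tasks.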