---
language: en
datasets: yahoo_answers_topics
tags:
- text-classification
- topic-classification
- yahoo-answers
- distilbert
- transformers
- pytorch
license: apache-2.0
model-index:
- name: DistilBERT Yahoo Answers Classifier
  results:
  - task:
      name: Topic Classification
      type: text-classification
    dataset:
      name: Yahoo Answers Topics
      type: yahoo_answers_topics
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.71
---

# DistilBERT Fine-Tuned on Yahoo Answers Topics

This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It assigns each question to one of 10 predefined categories, such as "Science & Mathematics", "Health", and "Business & Finance".

## 🧠 Model Details

- **Base model**: `distilbert-base-uncased`
- **Task**: Multi-class text classification (10 classes)
- **Dataset**: Yahoo Answers Topics
- **Training samples**: 50,000 (subset)
- **Evaluation samples**: 5,000 (subset)
- **Metric**: Accuracy (0.71, as reported in the card metadata)

## 🧪 How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")
model.eval()  # inference mode (disables dropout)

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# No gradients are needed for prediction.
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class (see the label list below).
predicted_class = outputs.logits.argmax(dim=-1).item()
print("Predicted class:", predicted_class)
```

## 📊 Classes (Labels)

0. Society & Culture
1. Science & Mathematics
2. Health
3. Education & Reference
4. Computers & Internet
5. Sports
6. Business & Finance
7. Entertainment & Music
8. Family & Relationships
9. Politics & Government

## 📦 Training Details

* Optimizer: AdamW
* Learning rate: 2e-5
* Batch size: 16 (train), 32 (eval)
* Epochs: 3
* Weight decay: 0.01
* Framework: PyTorch + 🤗 Transformers

## 📁 Repository Structure

* `config.json` – Model config
* `pytorch_model.bin` – Trained model weights
* `tokenizer.json`, `vocab.txt` – Tokenizer files

## ✍️ Author

* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
* Model trained using the `transformers.Trainer` API

## 📄 License

Apache 2.0
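
## 🏷️ Mapping Class Indices to Label Names

The usage snippet in this card prints only a class index. The sketch below shows one way to turn a row of logits into a readable label and a softmax probability. It is an illustration and not part of the released code: the names `YAHOO_LABELS`, `softmax`, and `describe_prediction` are hypothetical, the example logits are made up, and it assumes the model's output indices follow the order of the "Classes (Labels)" list.

```python
import math

# Label order copied from the "Classes (Labels)" list in this card.
# Assumption: the model's output indices follow this same order.
YAHOO_LABELS = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits, labels=YAHOO_LABELS):
    """Return (label name, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for a single question, not real model output.
example_logits = [0.1, 3.2, 0.4, 1.5, 0.2, -1.0, 0.0, -0.5, 0.3, -0.2]
label, confidence = describe_prediction(example_logits)
print(f"{label} ({confidence:.2f})")
```

With the model loaded as in "How to Use", the logits for a single input can be passed as `outputs.logits[0].tolist()`. If the checkpoint's `config.json` defines `id2label`, `model.config.id2label` may already carry these names; this card does not state whether it does.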