# Text classification demo (Hugging Face)

This repo contains a minimal example for fine-tuning a Hugging Face model for text classification.

Quick start (PowerShell):

1. Activate your venv (adjust the path to your own environment):

```powershell
& "C:\Users\Humberto Arias\recipe_bot\venv\Scripts\Activate.ps1"
```

2. Install dependencies:

```powershell
python -m pip install --upgrade pip
pip install transformers datasets accelerate evaluate huggingface-hub
```

3. Run the smoke test:

```powershell
python text_classification_demo.py --smoke-test
```

4. Prepare `data/train.csv` with `text,label` columns and run training:

```powershell
python text_classification_demo.py --train_file data/train.csv --model_name_or_path bert-base-uncased --output_dir ./outputs
```
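
To try the training command without a real dataset, a tiny `data/train.csv` can be generated with the standard library. This is only a sketch: the example rows are invented, and the integer label scheme (0 = negative, 1 = positive) is an assumption, since this README does not say which label format `text_classification_demo.py` expects.

```python
import csv
from pathlib import Path

# Hypothetical example rows; the label scheme (0 = negative, 1 = positive) is an assumption.
ROWS = [
    ("The pizza was great", 1),
    ("The soup was cold and bland", 0),
    ("Loved the fresh pasta", 1),
    ("The bread was stale", 0),
]


def write_train_csv(path="data/train.csv", rows=ROWS):
    """Write a CSV with a `text,label` header, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings (avoids blank rows on Windows).
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    print(f"Wrote {write_train_csv()}")
```

Four rows are enough to check that the file loads, not to train anything useful.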

Notes:

- This example is intentionally minimal and meant for learning. For larger runs, use `accelerate` and a GPU instance.
- To push the model to the Hub, run `huggingface-cli login` first, then add a `trainer.push_to_hub()` call after training.
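
The push step from the last note could look roughly like the helper below. `Trainer.push_to_hub()` is the actual `transformers` method; the `push_if_requested` wrapper and its `enabled` flag are hypothetical and not part of the demo script. The sketch assumes `huggingface-cli login` has already been run.

```python
def push_if_requested(trainer, enabled, commit_message="End of training"):
    """Upload the trained model via trainer.push_to_hub() when `enabled` is true.

    `trainer` is expected to be a transformers.Trainer (or any object with a
    compatible push_to_hub method). Returns True if a push was attempted.
    """
    if not enabled:
        return False
    # Needs a stored token, i.e. a prior `huggingface-cli login`.
    trainer.push_to_hub(commit_message=commit_message)
    return True
```

Keeping the push behind a flag means a smoke test or a local experiment never uploads anything by accident.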

Model on the Hub
-----------------

The demo model was pushed to: https://huggingface.co/x2-world/recipe-bert

Example inference (after pushing to Hub):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "x2-world/recipe-bert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

clf = pipeline('text-classification', model=model, tokenizer=tokenizer)
print(clf('The pizza was great'))
```
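
A `text-classification` pipeline returns a list of dicts with `label` and `score` keys. The sketch below shows one way to turn that into readable text. The `LABEL_NAMES` mapping and the `min_score` threshold are hypothetical; the label names the demo model actually emits depend on how it was trained and configured.

```python
# Hypothetical mapping; the real label names depend on how the model was trained.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def summarize(predictions, label_names=LABEL_NAMES, min_score=0.5):
    """Turn pipeline output (a list of {'label', 'score'} dicts) into readable strings."""
    lines = []
    for pred in predictions:
        # Fall back to the raw label when it is not in the mapping.
        name = label_names.get(pred["label"], pred["label"])
        flag = "" if pred["score"] >= min_score else " (low confidence)"
        lines.append(f"{name}: {pred['score']:.2f}{flag}")
    return lines


# Hand-written input in the pipeline's output shape, not a real model result.
print(summarize([{"label": "LABEL_1", "score": 0.93}]))
```

With the model loaded as in the inference example, `summarize(clf('The pizza was great'))` would apply the same formatting to real predictions.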