How to use FedCal/expense-categorizer-it with Scikit-learn:

```python
from huggingface_hub import hf_hub_download
import joblib

# Only load pickle files from sources you trust.
# Read more about it here: https://skops.readthedocs.io/en/stable/persistence.html
model = joblib.load(
    hf_hub_download("FedCal/expense-categorizer-it", "sklearn_model.joblib")
)
```
---
license: apache-2.0
language:
- it
library_name: sklearn
pipeline_tag: text-classification
tags:
- fiscal
- italian
- expense-categorization
- tfidf
- random-forest
- on-prem
---
# Expense Categorizer IT v1

A **scikit-learn** pipeline (`TfidfVectorizer` + `RandomForestClassifier`) that classifies
expense descriptions written in **Italian** into fiscal categories. Pure machine learning:
**no LLM**, on-prem, deterministic, ~1 ms per inference. Macro-F1 ≥ 0.80 on the test set.
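Macro-F1 is the unweighted mean of the per-class F1 scores, so a rare fiscal category weighs as much as a frequent one. The following is a minimal pure-Python sketch of that metric, not the project's evaluation code: the function name `macro_f1` and the category labels in the usage lines are assumptions made up for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted this class, but it was wrong
            fn[true_label] += 1  # missed the true class
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only (not the model's real category set).
y_true = ["vitto", "trasporti", "vitto", "ufficio"]
y_pred = ["vitto", "trasporti", "ufficio", "ufficio"]
score = macro_f1(y_true, y_pred)
```

With scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.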
## Input / Output

- **Input:** text description of the expense (in Italian) + amount in EUR (used as an order-of-magnitude bucket, a weak signal).
- **Output:** predicted fiscal category.
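The exact encoding of the amount bucket is defined by the training script and is not shown on this card. As a rough sketch of the idea, assuming the bucket is the base-10 order of magnitude appended to the description as an extra token, it could look like the code below. The helper names `amount_bucket` and `build_text` and the token format `importo_e<n>` are hypothetical.

```python
import math


def amount_bucket(amount_eur: float) -> str:
    """Map an amount to a coarse order-of-magnitude token (hypothetical format)."""
    if amount_eur <= 0:
        return "importo_e0"
    # Amounts below 1 EUR share the lowest bucket with 1-9.99 EUR.
    exponent = max(0, math.floor(math.log10(amount_eur)))
    return f"importo_e{exponent}"


def build_text(description: str, amount_eur: float) -> str:
    """Append the bucket token so a text vectorizer sees it as one more term."""
    return f"{description} {amount_bucket(amount_eur)}"


example = build_text("cena di lavoro con cliente", 84.50)
```

Encoding the amount as a single token keeps the whole input a plain string, so it can pass through a text vectorizer without a separate numeric feature branch.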
## Usage

```python
import joblib

# Load the serialized pipeline from a local file.
model = joblib.load("expense_categorizer_it_v1.joblib")

# The input text combines the description + amount bucket (see the training script).
pred = model.predict(["cena di lavoro con cliente"])
print(pred)
```
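When a single label is not enough, for example to flag low-confidence predictions for manual review, class probabilities can be ranked instead. The sketch below assumes the loaded object exposes `predict_proba` and `classes_`, as a scikit-learn `Pipeline` ending in a `RandomForestClassifier` does; the helper name `top_k_categories` is hypothetical.

```python
def top_k_categories(model, texts, k=3):
    """Return, for each text, the k most probable (category, probability) pairs."""
    probabilities = model.predict_proba(texts)
    results = []
    for row in probabilities:
        # Pair each class label with its probability and sort by probability, descending.
        ranked = sorted(zip(model.classes_, row), key=lambda pair: pair[1], reverse=True)
        results.append([(str(label), float(prob)) for label, prob in ranked[:k]])
    return results
```

Calling `top_k_categories(model, ["cena di lavoro con cliente"], k=2)` on the loaded model would return one list with the two highest-scoring categories for that description.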
## Training

`TfidfVectorizer` on `descrizione` (+ `importo` bucket) → `RandomForestClassifier`.
Reproducible with the project's `train_expense_categorizer.py` script
(CSV with columns `descrizione, importo, categoria`).
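The structure described above can be sketched as follows. This is not the project's `train_expense_categorizer.py`: the toy rows, the category names, and the hyperparameters (`n_estimators=50`, `random_state=0`) are assumptions for illustration, and the amount bucket is left out to keep the example short.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy rows invented for illustration; real training reads a CSV with
# columns descrizione, importo, categoria.
rows = [
    ("cena di lavoro con cliente", 84.5, "vitto"),
    ("pranzo con fornitore", 42.0, "vitto"),
    ("biglietto treno milano roma", 59.9, "trasporti"),
    ("taxi aeroporto", 35.0, "trasporti"),
]
texts = [descrizione for descrizione, _importo, _categoria in rows]
labels = [categoria for _descrizione, _importo, categoria in rows]

# Hyperparameters are assumptions; a fixed random_state makes repeated fits identical.
pipeline = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
pipeline.fit(texts, labels)
prediction = pipeline.predict(["cena di lavoro con cliente"])
```

Keeping the vectorizer and the classifier in one pipeline object means a single `joblib.dump` captures both the vocabulary and the forest, so inference only needs raw strings.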
## Source & Attribution

- **Author:** Federico Calò, https://federicocalo.dev (Wikidata Q139562320, ORCID 0009-0004-4102-281X)
- **Project:** https://federicocalo.dev (on-prem fiscal dev tools)
- **License:** Apache-2.0
## Citation

```
Federico Calò, "Expense Categorizer IT v1", federicocalo.dev, 2026. https://huggingface.co/FedCal/expense-categorizer-it
```