---
Developed by: GivingTuesday Data Commons
license: apache-2.0
Model Type: Regex Classifier
Training Data: 3.6k examples from the 990 database
Accuracy: Weighted F1 score of 0.948 on a test dataset of 500 examples.
language:
- en
---

## Notebooks

- [Schedule O classifier notebook](https://huggingface.co/GivingTuesday/schedule_o/blob/main/notebooks/schedule_o_classifier_notebook.ipynb)

# Details

This model classifies open-ended Schedule O text from IRS Forms 990 and 990-EZ into the specific part of the return that the filer is referencing (see the [Schedule O classifier notebook](https://huggingface.co/GivingTuesday/schedule_o/blob/main/notebooks/schedule_o_classifier_notebook.ipynb) in this repository). Given the filer’s narrative description of the section for which they are providing supplemental information, the model returns a single standardized label from the following set: `I EZ`, `II EZ`, `III EZ`, `V EZ`, `III`, `V`, `VI`, `VII`, `IX`, `XI`, `XII`, or `Unknown`. This enables consistent tagging, aggregation, and analysis of Schedule O content across both Form 990 and Form 990-EZ filings.
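To illustrate how a regex classifier of this kind might map a filer's reference to one of the labels above, here is a minimal Python sketch. It is not the model's actual rule set: the patterns, the function name `classify_reference`, and the way a 990-EZ filing is detected are all assumptions made for illustration. The rules actually used are in the notebook linked in the Notebooks section.

```python
import re

# Label set taken from the model card; everything else below is hypothetical.
EZ_PARTS = {"I", "II", "III", "V"}
FULL_PARTS = {"III", "V", "VI", "VII", "IX", "XI", "XII"}

# Assumed pattern: the word "Part" followed by a Roman numeral.
# Longer numerals are listed first so that "XII" is not read as "XI".
PART_RE = re.compile(r"\bPART\s+(XII|XI|IX|VII|VI|V|III|II|I)\b", re.IGNORECASE)

# Assumed pattern: any mention of "990-EZ", "990 EZ", or "990EZ".
EZ_RE = re.compile(r"\b990\s*-?\s*EZ\b", re.IGNORECASE)


def classify_reference(text: str) -> str:
    """Return one standardized label for a Schedule O reference string."""
    if not text:
        return "Unknown"

    match = PART_RE.search(text)
    if match is None:
        return "Unknown"

    part = match.group(1).upper()

    # Text that mentions Form 990-EZ is mapped to the "<part> EZ" labels.
    if EZ_RE.search(text):
        return f"{part} EZ" if part in EZ_PARTS else "Unknown"

    # Otherwise the reference is treated as a Form 990 part.
    return part if part in FULL_PARTS else "Unknown"


if __name__ == "__main__":
    samples = [
        "Form 990, Part VI, Section B, Line 11b",
        "FORM 990-EZ, PART III, LINE 28",
        "General explanation of program services",
    ]
    for sample in samples:
        print(f"{sample!r} -> {classify_reference(sample)}")
```

In this sketch, any part that falls outside the label set (for example, Part VIII of Form 990) is returned as `Unknown` rather than being forced into the nearest label, so unrecognized references stay visible for later review.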

## Author

**The model was developed by: Zilun Lin - GivingTuesday Data Commons**

*Note*: In implementation, be sure to adjust any source and target table references to match your specific environment.