GivingTuesday/religious_org_v1


hassaanulhaq01 committed on Nov 19, 2025

Commit e4484d7 (verified) · 1 parent: 30b417a

Update README.md

Files changed (1): README.md +10 -10

@@ -11,21 +11,20 @@ language:
 ---
 **Technical Specifications Document is available at**: https://docs.google.com/document/d/1eLUFC-8FtJkaQT9dUhjwRRKn8bXrHaZsXdMlIvCoeT4/edit?usp=sharing
 # Non-Profit Mapping Project Documentation: Religious Orgs Segmentation
-Author: Zilun Lin - GivingTuesday Data Commons
-**Note for external readers**: Databricks links in this document point to internal notebooks and may not be accessible to people outside GivingTuesday.
 # 1\. Approach
 ## Definition
 We use the following definition for categorizing religious orgs provided in the academic literature:
-“Religious organizations are organizations whose identity and mission are derived from a religious or spiritual tradition and which operate as registered or unregistered, nonprofit, voluntary entities.” ([source](https://www.montclair.edu/profilepages/media/11259/user/religiousorganizationsglobalencyclope.pdf))
 This definition is operationalized in how we prompt GPT 4 to classify the training and testing datasets. Namely, we give it information on the name, mission statement and key activities and prompt it to find mentions/wording/terminology that reveal an org’s religious affiliations.
 ## Religious Recipient Orgs
@@ -46,17 +45,17 @@ All of the notebooks should be reasonably documented. Please message Zilun Lin i
 This notebook randomly samples from the 990 datamart and classifies the sample orgs using GPT4. It also generates a curated dataset of artificial orgs that are associated with under-represented religions. These two datasets are combined, formatted into an appropriate instruct-prompt-output format for fine-tuning and uploaded to HuggingFace. The final dataset has over 2k examples for training and validation, and 500 examples for testing.
-[https://dbc-3a4d04f2-8cab.cloud.databricks.com/editor/notebooks/1182041857993717?o=4203893953353865](https://dbc-3a4d04f2-8cab.cloud.databricks.com/editor/notebooks/1182041857993717?o=4203893953353865)
 ## Fine-tuning the LLM and testing for accuracy
-We downloaded the fine-tuning dataset from HuggingFace and fine-tune a set of LLMs. The resulting models are uploaded to HuggingFace. We also test these model’s accuracy on an unseen testing dataset.
 (Llama Models)
-[https://colab.research.google.com/drive/1tZBVcQ\_XQeb11HUBKxKjPTGBhMwJCiDF?usp=sharing](https://colab.research.google.com/drive/1tZBVcQ_XQeb11HUBKxKjPTGBhMwJCiDF?usp=sharing)
 (Bert models)
-[https://colab.research.google.com/drive/1OaV9wwqCzWqRXFmKzzDYW3Hwq\_zwaUE5?usp=sharing](https://colab.research.google.com/drive/1OaV9wwqCzWqRXFmKzzDYW3Hwq_zwaUE5?usp=sharing)
 # 3\. Outputs and Results
@@ -89,7 +88,7 @@ In comparison, BERT is much faster for inference, thanks to its streamlined mode
 # 4\. Deployment
-The chosen BERT model is now hosted on MLFlow (Databricks) in the model registry under the name \`religious\_orgs\_model\`, and has been released to the public under the apache-2 license on [Huggingface](https://huggingface.co/GivingTuesday/religious_org_v1). The processed data will be available for download in a data mart or API.
 The API endpoint will output five fields, three for BERT classification and two based on 1023 EZ data availability:
 BERT Natural Language Outputs:
@@ -98,3 +97,4 @@ BERT Natural Language Outputs:
 (3) Classification probability for whether the organisation is religious or not (and probability)

 ---
 **Technical Specifications Document is available at**: https://docs.google.com/document/d/1eLUFC-8FtJkaQT9dUhjwRRKn8bXrHaZsXdMlIvCoeT4/edit?usp=sharing
+---------------------------------------------------------------------------------------------------------------------------------------------------------
 # Non-Profit Mapping Project Documentation: Religious Orgs Segmentation
+**Author**: Zilun Lin \- GivingTuesday Data Commons
+**Note for external readers:** Some Databricks links in this document point to internal notebooks and may not be accessible to people outside GivingTuesday.
 # 1\. Approach
 ## Definition
 We use the following definition for categorizing religious orgs provided in the academic literature:
+“Religious organizations are organizations whose identity and mission are derived from a religious or spiritual tradition and which operate as registered or unregistered, nonprofit, voluntary entities.” ([Source: Montclair.edu](https://www.montclair.edu/profilepages/media/11259/user/religiousorganizationsglobalencyclope.pdf))
 This definition is operationalized in how we prompt GPT 4 to classify the training and testing datasets. Namely, we give it information on the name, mission statement and key activities and prompt it to find mentions/wording/terminology that reveal an org’s religious affiliations.
 ## Religious Recipient Orgs
 This notebook randomly samples from the 990 datamart and classifies the sample orgs using GPT4. It also generates a curated dataset of artificial orgs that are associated with under-represented religions. These two datasets are combined, formatted into an appropriate instruct-prompt-output format for fine-tuning and uploaded to HuggingFace. The final dataset has over 2k examples for training and validation, and 500 examples for testing.
+[Link to EDA Notebook (Databricks)](https://dbc-3a4d04f2-8cab.cloud.databricks.com/editor/notebooks/1182041857993717?o=4203893953353865)
 ## Fine-tuning the LLM and testing for accuracy
+We downloaded the fine-tuning dataset from HuggingFace and fine-tuned a set of LLMs. The resulting models were uploaded to HuggingFace. We also tested these models’ accuracy on an unseen test dataset.
 (Llama Models)
+[Llama Model Fine-tuning (Google Colab)](https://colab.research.google.com/drive/1tZBVcQ_XQeb11HUBKxKjPTGBhMwJCiDF?usp=sharing)
 (Bert models)
+[BERT Model Fine-tuning (Google Colab)](https://colab.research.google.com/drive/1OaV9wwqCzWqRXFmKzzDYW3Hwq_zwaUE5?usp=sharing)
 # 3\. Outputs and Results
 # 4\. Deployment
+The curated BERT model is now hosted on MLflow (Databricks) in the model registry under the name \`religious\_orgs\_model\`, and has been released to the public under the Apache-2.0 license on [HuggingFace](https://huggingface.co/GivingTuesday/religious_org_v1). The processed data will be available for download in a data mart or API.
 The API endpoint will output five fields, three for BERT classification and two based on 1023 EZ data availability:
 BERT Natural Language Outputs:
 (3) Classification probability for whether the organisation is religious or not (and probability)
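
The classification probability in output (3) can be illustrated with a minimal sketch. It assumes the classifier produces two raw scores (logits), one per class, and converts them to a probability with a softmax. The function name `classify_from_logits` and the label strings are hypothetical; they are not taken from the README or from the model's configuration.

```python
import math


def classify_from_logits(logits, labels=("not_religious", "religious")):
    """Turn raw two-class logits into a label and its probability.

    The label strings are placeholders chosen for illustration only.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Pick the class with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "probability": probs[best]}


# Example with made-up logits, not real model output.
result = classify_from_logits([-1.2, 2.3])
print(result["label"], round(result["probability"], 3))
```

The softmax guarantees that the two class probabilities sum to 1, so reporting the winning label together with its probability is enough to recover the other class's probability.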