Update README.md
For a comprehensive breakdown of the model's performance, including Overall Metrics, Metrics per Category across both validation and test sets, and Metrics per Funder across the validation set, please refer to the detailed evaluation spreadsheet included in this repository.

**[Download/View the Evaluation Results](https://huggingface.co/NIHRDataInsights/HRCSResearchActivityCodes/resolve/main/evaluation/health_category_rac_evaluation_results.xlsx)** *(Located in the `Files and versions` tab of this repository)*.
## Intended use
This model is intended for:
* **Annotation Ambiguity and Niche Categories:** The model's performance reflects the historical consistency of human coding within the training data. Categories that are historically difficult for human coders to classify consistently under HRCS guidelines (such as 7.1, 8.1 and 8.3) are naturally more challenging for the model.
## Inference / How to use

A ready-to-use Python script is provided that runs both this model (Research Activity Codes) and the companion Health Categories model on new award data. The script:

1. Loads the trained model and tokenizer
2. Applies a sigmoid to the logits to obtain probabilities
3. Converts probabilities to labels using the per-category thresholds stored in `metadata.json`
4. Outputs a CSV containing the predicted Health Categories and confidence indicators
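The steps above can be sketched as follows. This is an illustration only: the threshold values, category codes, and the layout of `metadata.json` shown here are assumptions, not the script's actual contents.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-category thresholds, in the spirit of the values
# stored in metadata.json (illustrative numbers only).
thresholds = {"1.1": 0.50, "2.1": 0.35, "7.1": 0.60}

# Hypothetical raw logits for one award, one per category.
logits = {"1.1": 1.2, "2.1": -0.8, "7.1": 0.3}

probs = {c: sigmoid(z) for c, z in logits.items()}            # step 2
labels = [c for c, p in probs.items() if p >= thresholds[c]]  # step 3
print(labels)  # → ['1.1']
```

A category is assigned only when its probability clears its own threshold, which is why the thresholds are stored per category rather than as a single global cutoff.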

See the inference script in this repository for full usage details.
You can download the script and a sample dataset directly from the `inference` subfolder in the **Files and versions** tab of this repository.

### Instructions

**Prerequisites:**

1. Download the script and test data to your computer from the `inference` subfolder.
2. Open your terminal or command prompt and install the required libraries by running:
   `pip install torch pandas numpy tqdm transformers huggingface_hub`
3. Use the provided `test_data.csv` or prepare a CSV file in the same format containing your grant data. It **must** include two columns named exactly `AwardTitle` and `AwardAbstract`.
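For reference, a correctly formatted input file can be generated like this. The file name and row contents are made-up examples; only the two column names are required:

```python
import csv

# Write a minimal input CSV with the two required columns.
rows = [
    {"AwardTitle": "Example gene therapy award",
     "AwardAbstract": "An illustrative abstract describing the research."},
]
with open("my_grants.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["AwardTitle", "AwardAbstract"])
    writer.writeheader()
    writer.writerows(rows)
```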

**Running the Code:**

1. Open the script.
2. Under the `# --- USER SETTINGS ---` section, update `DATA_FOLDERS` to point to the folder containing your CSV. *(Leave it as `["./"]` if your CSV is in the same folder as the script.)*
3. Update `TEST_FILENAME` to match the name of your CSV.
4. Run the script.
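For example, the edited settings block might look like this (the variable names come from the script; the values shown are examples only):

```python
# --- USER SETTINGS ---
# Example values only; edit these to match your own setup.
DATA_FOLDERS = ["./"]            # folder(s) containing your input CSV
TEST_FILENAME = "test_data.csv"  # name of the CSV file to process
```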
The script will automatically download the necessary AI models, process your text, and output a new CSV containing the predicted categories and an "AI Certainty Score" (`SmallestLogitDiff`) to help you identify which borderline grants require human review.
## Selective automation and human-in-the-loop use
In addition to predicted labels, the inference script reports how close each prediction is to the model’s decision boundary in logit space. This is computed as the smallest absolute difference between any category’s logit and its corresponding decision threshold.
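A minimal sketch of this calculation, assuming the per-category probability thresholds are mapped into logit space with the inverse sigmoid (all numbers here are illustrative, not taken from the model):

```python
import math

def logit(p: float) -> float:
    """Inverse sigmoid: map a probability threshold into logit space."""
    return math.log(p / (1.0 - p))

# Illustrative raw logits for one award and per-category thresholds.
logits = {"1.1": 1.2, "2.1": -0.8, "7.1": 0.3}
thresholds = {"1.1": 0.50, "2.1": 0.35, "7.1": 0.60}

# Distance of each category's logit from its decision boundary.
diffs = {c: abs(z - logit(thresholds[c])) for c, z in logits.items()}
smallest_logit_diff = min(diffs.values())  # small => borderline prediction
```

Awards with a small value here sit close to a decision boundary for at least one category, making them the natural candidates for human review under a selective-automation workflow.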