lrschuman17 committed on
Commit 75cd848 · verified · 1 Parent(s): 1687794

Update README.md

Files changed (1)
  1. README.md +161 -8
README.md CHANGED
@@ -1,19 +1,172 @@
  ---
- title: InjuryDetection
- emoji: 🚀
  colorFrom: red
  colorTo: red
  sdk: docker
  app_port: 8501
  tags:
- - streamlit
  pinned: false
- short_description: Streamlit template space
  ---

- # Welcome to Streamlit!

- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
  ---
+ title: 🏀 InjuryDetection
+ emoji: 🏀
  colorFrom: red
  colorTo: red
  sdk: docker
  app_port: 8501
  tags:
+ - streamlit
+ - transformers
+ - nlp
+ - pytorch
+ - nba
+ - healthcare
+ - sports
  pinned: false
+ description: Predict NBA injury type and duration using a fine-tuned DistilBERT + structured features.
+
+ ---
+
+ # 🏀 Injury Detection & Recovery Duration Estimator
+
+ A **Streamlit app** built on a fine-tuned **DistilBERT** transformer that predicts:
+
+ - 🔍 **Injury Type** (e.g., bone, muscle, joint, illness, concussion)
+ - ⏳ **Recovery Duration**:
+   - `short` (< 7 days)
+   - `medium` (7–45 days)
+   - `long` (> 45 days)
+
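The duration thresholds listed above can be read as a simple bucketing rule. The following is a minimal sketch, assuming durations are whole days and that both boundary values (7 and 45) belong to `medium`; the helper name `duration_bucket` is hypothetical and not taken from the repository.

```python
def duration_bucket(days: int) -> str:
    """Map a recovery time in days to the README's duration classes.

    Assumption: 7 and 45 days are both treated as `medium`.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    if days < 7:
        return "short"
    if days <= 45:
        return "medium"
    return "long"


if __name__ == "__main__":
    for d in (3, 7, 45, 46):
        print(d, duration_bucket(d))
```

A rule like this would typically be applied once when labeling the training data, so the model only ever sees the three class names.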
+ ---
+
+ ## 🚀 Why This Project?
+
+ NBA injuries are unpredictable, and doctors often rely on vague reports or historical intuition. I wanted to go beyond zero-shot text classification by:
+
+ - Cleaning and normalizing raw injury logs (1950–2022)
+ - Labeling 8K+ examples by type and duration
+ - Incrementally testing feature combinations
+ - Analyzing attention weights and feature influence
+ - Making predictions explainable and interactive via Streamlit
+
+ ---
+
+ ## 🧠 Model Highlights
+
+ | Component    | Description                                |
+ |--------------|--------------------------------------------|
+ | Model        | `distilbert-base-uncased`                  |
+ | Task         | Dual classification                        |
+ | Inputs       | Text + prior injuries, position, type ID   |
+ | Output Heads | `label_type_id` and `label_duration_id`    |
+ | Loss         | Weighted cross-entropy (multi-task)        |
+ | Extras       | Attention score input for interpretability |
+
  ---

+ ## 💡 Features
+
+ - 📝 **Free-text input** for injury reports
+ - 🧱 **Structured context**: prior injuries, position, injury category
+ - 🎯 **Fine-tuned DistilBERT** with dual-head classification
+ - 🔁 **Live predictions** with **confidence scores**
+ - 📊 **Built-in feature importance + attention hooks**
+ - 🌐 **Streamlit UI** with dropdowns, metrics, and sample cases
+
+ ---
+
+ ## 🗂️ File Structure
+ ```text
+ project/
+ ├── src/
+ │   ├── streamlit_app.py        # main Streamlit UI app
+ │   ├── predict_utils.py        # logic for prediction function
+ │   └── final_injury_model.pt   # fine-tuned dual-head transformer model (DistilBERT)
+ ├── requirements.txt            # all dependencies (Torch, HF Transformers, Streamlit, etc.)
+ ├── modeling_notebooks/         # experimentation, feature importance, attention
+ ├── cleaned_data/               # cleaned dataset used for training
+ ├── raw_data/                   # original source CSVs
+ └── README.md                   # this file
+ ```
+
+ ---
+
+ ## ⚙️ Setup
+
+ 1. **Install dependencies**
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 2. **Run App**
+ ```bash
+ streamlit run src/streamlit_app.py
+ ```
+ ---
+
+ ## ⚙️ Model Overview
+
+ | **Component**          | **Description**                                           |
+ | ---------------------- | --------------------------------------------------------- |
+ | **Base Model**         | `distilbert-base-uncased`                                 |
+ | **Input**              | Injury description text + structured inputs               |
+ | **Structured Inputs**  | Prior injuries, position ID, injury type ID               |
+ | **Output Heads**       | `label_type_id`, `label_duration_id`                      |
+ | **Optimization**       | Multi-task cross-entropy loss                             |
+ | **Performance Boosts** | Attention score injection, feature dropout, class weights |
+
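The table above mentions a multi-task cross-entropy loss with class weights. As a rough illustration of that idea, here is a dependency-free sketch for a single example. The function names, the `dur_coef` mixing factor, and the way the two head losses are summed are assumptions for illustration, not the repository's training code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_ce(logits: Sequence[float], target: int,
                class_weights: Sequence[float]) -> float:
    """Cross-entropy for one example, scaled by the target class weight."""
    return -class_weights[target] * log_softmax(logits)[target]


def multitask_loss(type_logits: Sequence[float], type_target: int,
                   dur_logits: Sequence[float], dur_target: int,
                   type_weights: Sequence[float],
                   dur_weights: Sequence[float],
                   dur_coef: float = 1.0) -> float:
    """Sum of the two head losses (assumed combination; dur_coef is hypothetical)."""
    type_loss = weighted_ce(type_logits, type_target, type_weights)
    dur_loss = weighted_ce(dur_logits, dur_target, dur_weights)
    return type_loss + dur_coef * dur_loss
```

In a real PyTorch training loop the same structure would usually be expressed with two `torch.nn.CrossEntropyLoss(weight=...)` instances, one per output head, with their results added together.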
+ ---
+
+ ## 📈 Results Summary
+
+ | Metric                     | Value              | Description                                                 |
+ |----------------------------|--------------------|-------------------------------------------------------------|
+ | **Type Accuracy**          | 99.5%              | Nearly perfect prediction for general injury type           |
+ | **Duration Accuracy**      | 65.0%              | More challenging task due to overlap in medium/long classes |
+ | **Macro F1 (Duration)**    | ~0.64              | Balanced F1 across duration classes                         |
+ | **Most Confused Pair**     | `long` vs `medium` | Long and medium often overlapped in symptoms and context    |
+ | **Evaluation Set Size**    | 200 samples        | Held-out test subset from full dataset                      |
+ | **Unknown Labels Removed** | ✅                 | Improved class balance and duration accuracy                |
+ | **Feature Importance**     | moderate           | Prior injuries most useful, attention least influential     |
+
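For readers unfamiliar with the macro F1 metric in the table above, the sketch below shows how it is computed: one F1 score per class, then an unweighted mean. The labels in the usage example are toy values made up for illustration and are not the project's evaluation data.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data only.
    truth = ["short", "medium", "long", "long"]
    preds = ["short", "long", "long", "medium"]
    print(macro_f1(truth, preds, ["short", "medium", "long"]))
```

Because every class counts equally, macro F1 penalizes a model that does well on a frequent class but confuses two rarer ones, which matches the `long` vs `medium` confusion noted in the table.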
+ ---
+
+ ## What I Learned
+
+ Fusing structured and textual inputs drastically improves performance over text-only input
+
+ Class balancing + weighted loss helped fix bias toward short injuries
+
+ Adding features (injury type, position, prior injuries) one at a time allowed modular experimentation
+
+ Attention scores showed low influence, but modeling them helped validate model interpretability
+
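The README does not say how the class weights for the weighted loss were derived. One common choice is inverse-frequency weighting, the same heuristic scikit-learn uses for `class_weight="balanced"`. The sketch below assumes that scheme; the function name and the toy label counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


if __name__ == "__main__":
    # Toy distribution skewed toward short injuries.
    toy = ["short"] * 6 + ["medium"] * 3 + ["long"] * 1
    print(inverse_frequency_weights(toy))
```

Rare classes receive larger weights, so their errors contribute more to the loss, which is one way to counter a bias toward the most frequent class.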
+ ---
+
+ ## 💬 Sample Predictions
+
+ "torn ACL expected to miss rest of season"
+ → Type: **joint** (99%), Duration: **long** (91%)
+
+ "minor hamstring strain"
+ → Type: **muscle** (96%), Duration: **short** (87%)
+
+ "fractured tibia, placed on IL"
+ → Type: **bone** (98%), Duration: **long** (89%)
+
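Percentages like those in the sample predictions above are typically the softmax probability of the top class for each output head. The sketch below assumes that convention; the README does not confirm how the app derives its confidence scores, and `predict_with_confidence` is a hypothetical helper rather than a function from `predict_utils.py`.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits: Sequence[float],
                            labels: Sequence[str]) -> Tuple[str, float]:
    """Return the top label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Made-up logits for the duration head.
    label, conf = predict_with_confidence([2.0, 0.0, 0.0],
                                          ["short", "medium", "long"])
    print(f"Duration: {label} ({conf:.0%})")
```

With two output heads, the same helper would be called once per head, giving an independent label and confidence for injury type and for duration.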
+ ---
+
+ ## Hugging Face Transformers & Datasets
+
+ PyTorch, scikit-learn, and Streamlit provide the rest of the underlying stack
+
+ ---
+
+ ## Datasets
+ - Player_Info: [NBA Players stats since 1950](https://www.kaggle.com/datasets/drgilermo/nba-players-stats?select=player_data.csv)
+ - Player_Info: [NBA Players data (1950 to 2022)](https://www.kaggle.com/datasets/blitzapurv/nba-players-data-1950-to-2021?select=player_data.csv)
+ - Injury Info: [🏀 NBA Injury Stats (1951–2023)](https://www.kaggle.com/datasets/loganlauton/nba-injury-stats-1951-2023)
+
+ ---
+
+ ## 🧠 Try it Out Now
+ Enter a short injury description, and get:
+
+ ✅ Predicted injury type
+
+ ⏳ Estimated recovery time

+ 📈 Confidence scores

+ ✨ Fork this space and customize it to your league, team, or medical use case.