---
license: mit
tags:
- sklearn
- classification
- healthcare
- lung-cancer
- streamlit
library_name: scikit-learn
model_name: Datathon Lung Cancer Detector
datasets:
- custom
language: en
---
# 🫁 Datathon Lung Cancer Detector
This model predicts whether a patient is likely to have lung cancer based on clinical and behavioral risk factors.
It was trained on a dataset of 309 entries with 15 input features and a binary diagnosis label.
---
## 📊 Input Features
| Feature | Type | Description |
|------------------------|----------|---------------------------------|
| `GENDER` | 0/1 | Biological sex (0 = Female, 1 = Male) |
| `AGE` | Integer | Patient age |
| `SMOKING` | 0/1 | Smoking habit |
| `YELLOW_FINGERS` | 0/1 | Stained fingers from smoking |
| `ANXIETY` | 0/1 | Anxiety symptoms |
| `PEER_PRESSURE` | 0/1 | Influence from peers |
| `CHRONIC DISEASE` | 0/1 | History of chronic illness |
| `FATIGUE` | 0/1 | Feeling of tiredness |
| `ALLERGY` | 0/1 | Known allergies |
| `WHEEZING` | 0/1 | Wheezing symptoms |
| `ALCOHOL CONSUMING` | 0/1 | Alcohol consumption |
| `COUGHING` | 0/1 | Persistent coughing |
| `SHORTNESS OF BREATH` | 0/1 | Difficulty breathing |
| `SWALLOWING DIFFICULTY`| 0/1 | Trouble swallowing |
| `CHEST PAIN` | 0/1 | Pain in chest area |
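
The table above fixes both the column names and the allowed values, so a record can be checked before it reaches the model. The following is a minimal sketch, not part of the published app: `validate_row` is a hypothetical helper, and the 0 to 120 age bound is an assumption borrowed from the `max_value=120` limit in the Streamlit form.

```python
# Column names and order as listed in the feature table.
FEATURE_COLUMNS = [
    "GENDER", "AGE", "SMOKING", "YELLOW_FINGERS", "ANXIETY",
    "PEER_PRESSURE", "CHRONIC DISEASE", "FATIGUE", "ALLERGY",
    "WHEEZING", "ALCOHOL CONSUMING", "COUGHING",
    "SHORTNESS OF BREATH", "SWALLOWING DIFFICULTY", "CHEST PAIN",
]


def validate_row(row):
    """Hypothetical helper: check one record and return its values in column order."""
    missing = [name for name in FEATURE_COLUMNS if name not in row]
    if missing:
        raise ValueError(f"missing features: {missing}")
    for name in FEATURE_COLUMNS:
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if name == "AGE":
            if not 0 <= value <= 120:  # assumed bound, taken from the app form
                raise ValueError(f"AGE out of range: {value}")
        elif value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value}")
    return [row[name] for name in FEATURE_COLUMNS]


# Example record: start from all zeros, then set a few fields.
example = {name: 0 for name in FEATURE_COLUMNS}
example.update({"GENDER": 1, "AGE": 62, "SMOKING": 1, "COUGHING": 1})
values = validate_row(example)
```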
---
## 🧠 Model Info
- **Algorithm**: XGBoost classifier (highest-scoring model)
- **Framework**: Scikit-learn
- **Target**: `DIAGNOSIS_LUNG_CANCER` (`YES` = Lung Cancer, `NO` = No Cancer)
- **Dataset Size**: 309 samples
- **Preprocessing**: Label encoding, binary encoding for yes/no inputs
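
The preprocessing step is only named above, so here is one way it could look. This is a sketch under stated assumptions, not the author's pipeline: it assumes the raw data stores gender as `M`/`F` and every yes/no field (including the target) as `YES`/`NO`, and `encode_record` is a hypothetical helper. The mapping matches what alphabetical label encoding (as in scikit-learn's `LabelEncoder`) would produce: `F`/`NO` become 0, `M`/`YES` become 1.

```python
# Assumed raw codes; the README only documents the encoded values.
YES_NO = {"YES": 1, "NO": 0}
GENDER = {"F": 0, "M": 1}


def encode_record(raw):
    """Hypothetical helper: turn one raw record into integer features."""
    encoded = {}
    for key, value in raw.items():
        if key == "AGE":
            encoded[key] = int(value)
        elif key == "GENDER":
            encoded[key] = GENDER[str(value).strip().upper()]
        else:
            # Every other column, including the target, is a yes/no field.
            encoded[key] = YES_NO[str(value).strip().upper()]
    return encoded


raw = {
    "GENDER": "M",
    "AGE": "62",
    "SMOKING": "yes",
    "DIAGNOSIS_LUNG_CANCER": "NO",
}
encoded = encode_record(raw)
```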
---
## 🚀 Try It in Streamlit
This model is also available as a web app built with Streamlit: https://datathonlungcancer.streamlit.app/
```python
import joblib
import pandas as pd
import streamlit as st

# Load the trained classifier; model.pkl must sit next to this script.
model = joblib.load('model.pkl')

st.title('🫁 Lung Cancer Diagnosis')
st.write("Please fill out the following information to assess the likelihood of lung cancer.")

# Gender is stored numerically (0 = Female, 1 = Male); age is an integer.
gender = st.selectbox('Gender', [0, 1], format_func=lambda x: "Female" if x == 0 else "Male")
age = st.number_input('Age', min_value=0, max_value=120, value=0)

# The remaining inputs are yes/no answers.
smoking = st.selectbox('Smoking', ['Yes', 'No'])
yellow_fingers = st.selectbox('Yellow Fingers', ['Yes', 'No'])
anxiety = st.selectbox('Anxiety', ['Yes', 'No'])
peer_pressure = st.selectbox('Peer Pressure', ['Yes', 'No'])
chronic_disease = st.selectbox('Chronic Disease', ['Yes', 'No'])
fatigue = st.selectbox('Fatigue', ['Yes', 'No'])
allergy = st.selectbox('Allergy', ['Yes', 'No'])
wheezing = st.selectbox('Wheezing', ['Yes', 'No'])
alcohol = st.selectbox('Alcohol Consuming', ['Yes', 'No'])
coughing = st.selectbox('Coughing', ['Yes', 'No'])
shortness_of_breath = st.selectbox('Shortness of Breath', ['Yes', 'No'])
swallowing_difficulty = st.selectbox('Swallowing Difficulty', ['Yes', 'No'])
chest_pain = st.selectbox('Chest Pain', ['Yes', 'No'])


def binary_encode(value):
    """Map a 'Yes'/'No' answer to 1/0."""
    return 1 if value == 'Yes' else 0


# Build a single-row frame; column names follow the input feature table.
data = pd.DataFrame(
    [[
        gender, age,
        binary_encode(smoking),
        binary_encode(yellow_fingers),
        binary_encode(anxiety),
        binary_encode(peer_pressure),
        binary_encode(chronic_disease),
        binary_encode(fatigue),
        binary_encode(allergy),
        binary_encode(wheezing),
        binary_encode(alcohol),
        binary_encode(coughing),
        binary_encode(shortness_of_breath),
        binary_encode(swallowing_difficulty),
        binary_encode(chest_pain),
    ]],
    columns=['GENDER', 'AGE', 'SMOKING', 'YELLOW_FINGERS', 'ANXIETY',
             'PEER_PRESSURE', 'CHRONIC DISEASE', 'FATIGUE', 'ALLERGY',
             'WHEEZING', 'ALCOHOL CONSUMING', 'COUGHING',
             'SHORTNESS OF BREATH', 'SWALLOWING DIFFICULTY', 'CHEST PAIN'],
)

if st.button('Predict'):
    prediction = model.predict(data)[0]
    # A prediction of 1 is treated as a positive diagnosis.
    if prediction == 1:
        st.error("⚠️ High risk of lung cancer. Please consult a doctor.")
    else:
        st.success("✅ No Lung Cancer.")
```