---
license: mit
tags:
  - sklearn
  - classification
  - healthcare
  - lung-cancer
  - streamlit
library_name: scikit-learn
model_name: Datathon Lung Cancer Detector
datasets:
  - custom
language: en
---

# 🫁 Datathon Lung Cancer Detector

This model predicts whether a patient is likely to have lung cancer based on clinical and behavioral risk factors.  
It was trained on a dataset of 309 entries with 15 input features and a binary diagnosis label.

---

## 📊 Input Features

| Feature                 | Type    | Description                           |
|-------------------------|---------|---------------------------------------|
| `GENDER`                | 0/1     | Biological sex (0 = Female, 1 = Male) |
| `AGE`                   | Integer | Patient age                           |
| `SMOKING`               | 0/1     | Smoking habit                         |
| `YELLOW_FINGERS`        | 0/1     | Stained fingers from smoking          |
| `ANXIETY`               | 0/1     | Anxiety symptoms                      |
| `PEER_PRESSURE`         | 0/1     | Influence from peers                  |
| `CHRONIC DISEASE`       | 0/1     | History of chronic illness            |
| `FATIGUE`               | 0/1     | Feeling of tiredness                  |
| `ALLERGY`               | 0/1     | Known allergies                       |
| `WHEEZING`              | 0/1     | Wheezing symptoms                     |
| `ALCOHOL CONSUMING`     | 0/1     | Alcohol consumption                   |
| `COUGHING`              | 0/1     | Persistent coughing                   |
| `SHORTNESS OF BREATH`   | 0/1     | Difficulty breathing                  |
| `SWALLOWING DIFFICULTY` | 0/1     | Trouble swallowing                    |
| `CHEST PAIN`            | 0/1     | Pain in chest area                    |

For every 0/1 feature other than `GENDER`, 1 = Yes and 0 = No.
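The columns in the table can be assembled into a one-row model input outside the web app as well. This is a minimal sketch assuming pandas is installed; `FEATURES`, `NUMERIC`, and `to_feature_row` are hypothetical names for illustration, not part of the published model. It keeps `GENDER` and `AGE` as given and maps `Yes`/`No` answers to 1/0.

```python
import pandas as pd

# Column names and order as listed in the Input Features table
FEATURES = [
    'GENDER', 'AGE', 'SMOKING', 'YELLOW_FINGERS', 'ANXIETY',
    'PEER_PRESSURE', 'CHRONIC DISEASE', 'FATIGUE', 'ALLERGY',
    'WHEEZING', 'ALCOHOL CONSUMING', 'COUGHING',
    'SHORTNESS OF BREATH', 'SWALLOWING DIFFICULTY', 'CHEST PAIN',
]
# Features that are already numeric (0 = Female / 1 = Male, integer age)
NUMERIC = {'GENDER', 'AGE'}


def to_feature_row(record: dict) -> pd.DataFrame:
    """Hypothetical helper: build a one-row DataFrame in the table's column order."""
    missing = [name for name in FEATURES if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    row = {}
    for name in FEATURES:
        value = record[name]
        if name in NUMERIC:
            row[name] = int(value)
        elif value in ('Yes', 'No'):
            row[name] = 1 if value == 'Yes' else 0  # binary encoding: Yes = 1, No = 0
        elif value in (0, 1):
            row[name] = int(value)  # already encoded
        else:
            raise ValueError(f"{name} must be 'Yes', 'No', 0 or 1, got {value!r}")
    return pd.DataFrame([row], columns=FEATURES)


# Example: a 63-year-old male who smokes and coughs, with all other answers "No"
patient = {name: 'No' for name in FEATURES}
patient.update({'GENDER': 1, 'AGE': 63, 'SMOKING': 'Yes', 'COUGHING': 'Yes'})
X = to_feature_row(patient)
print(X.shape)  # (1, 15)
```

Raising on unexpected values, rather than silently mapping them to 0, keeps a typo such as `'yes'` from being read as "No".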

---

## 🧠 Model Info

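- **Algorithm**: XGBoost classifier (the highest-scoring of the models compared)
- **Framework**: XGBoost through its scikit-learn-compatible interface; the fitted model is saved with joblib as `model.pkl`
- **Target**: `DIAGNOSIS_LUNG_CANCER` (`YES` = lung cancer, `NO` = no cancer; predicted as 1 and 0 respectively)
- **Dataset Size**: 309 samples
- **Preprocessing**: Label encoding for categorical values, binary encoding (Yes = 1, No = 0) for yes/no inputs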
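The label-encoding step for the target can be illustrated with scikit-learn's `LabelEncoder`. This is a sketch assuming the raw target column holds the strings `YES` and `NO`; the sample values are made up for illustration and are not from the training data. Because `LabelEncoder` sorts its classes, `NO` becomes 0 and `YES` becomes 1, which matches the app's convention that a prediction of 1 means high risk.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative target values only (not the real dataset)
target = pd.Series(['YES', 'NO', 'YES', 'YES', 'NO'], name='DIAGNOSIS_LUNG_CANCER')

encoder = LabelEncoder()
y = encoder.fit_transform(target)  # classes are sorted: 'NO' -> 0, 'YES' -> 1

print(list(encoder.classes_))  # ['NO', 'YES']
print(y.tolist())              # [1, 0, 1, 1, 0]

# Encoded predictions can be mapped back to the original labels
print(encoder.inverse_transform([1, 0]).tolist())  # ['YES', 'NO']
```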

---

## 🚀 Try It in Streamlit

This model is also available as a web app built with [Streamlit](https://streamlit.io), accessible at https://datathonlungcancer.streamlit.app/.

```python
import joblib
import pandas as pd
import streamlit as st

# Load the trained classifier (serialized with joblib)
model = joblib.load('model.pkl')

st.title('🫁 Lung Cancer Diagnosis')
st.write("Please fill out the following information to assess the likelihood of lung cancer.")

# GENDER is already numeric: 0 = Female, 1 = Male
gender = st.selectbox('Gender', [0, 1], format_func=lambda x: "Female" if x == 0 else "Male")
age = st.number_input('Age', min_value=0, max_value=120, value=0)

# Yes/No answers, converted to 1/0 before prediction
smoking = st.selectbox('Smoking', ['Yes', 'No'])
yellow_fingers = st.selectbox('Yellow Fingers', ['Yes', 'No'])
anxiety = st.selectbox('Anxiety', ['Yes', 'No'])
peer_pressure = st.selectbox('Peer Pressure', ['Yes', 'No'])
chronic_disease = st.selectbox('Chronic Disease', ['Yes', 'No'])
fatigue = st.selectbox('Fatigue', ['Yes', 'No'])
allergy = st.selectbox('Allergy', ['Yes', 'No'])
wheezing = st.selectbox('Wheezing', ['Yes', 'No'])
alcohol = st.selectbox('Alcohol Consuming', ['Yes', 'No'])
coughing = st.selectbox('Coughing', ['Yes', 'No'])
shortness_of_breath = st.selectbox('Shortness of Breath', ['Yes', 'No'])
swallowing_difficulty = st.selectbox('Swallowing Difficulty', ['Yes', 'No'])
chest_pain = st.selectbox('Chest Pain', ['Yes', 'No'])

def binary_encode(value):
    """Map a 'Yes'/'No' answer to 1/0."""
    return 1 if value == 'Yes' else 0

# One-row input; column names and order should match the training data
data = pd.DataFrame([[gender, age,
                      binary_encode(smoking),
                      binary_encode(yellow_fingers),
                      binary_encode(anxiety),
                      binary_encode(peer_pressure),
                      binary_encode(chronic_disease),
                      binary_encode(fatigue),
                      binary_encode(allergy),
                      binary_encode(wheezing),
                      binary_encode(alcohol),
                      binary_encode(coughing),
                      binary_encode(shortness_of_breath),
                      binary_encode(swallowing_difficulty),
                      binary_encode(chest_pain)]],
                    columns=['GENDER', 'AGE', 'SMOKING', 'YELLOW_FINGERS', 'ANXIETY',
                             'PEER_PRESSURE', 'CHRONIC DISEASE', 'FATIGUE', 'ALLERGY',
                             'WHEEZING', 'ALCOHOL CONSUMING', 'COUGHING',
                             'SHORTNESS OF BREATH', 'SWALLOWING DIFFICULTY', 'CHEST PAIN'])

if st.button('Predict'):
    prediction = model.predict(data)[0]
    # Class 1 = lung cancer, class 0 = no lung cancer
    if prediction == 1:
        st.error("⚠️ High risk of lung cancer. Please consult a doctor.")
    else:
        st.success("✅ Low risk of lung cancer.")
```