---
license: mit
tags:
- sklearn
- classification
- healthcare
- lung-cancer
- streamlit
library_name: scikit-learn
model_name: Datathon Lung Cancer Detector
datasets:
- custom
language: en
---

# 🫁 Datathon Lung Cancer Detector

This model predicts whether a patient is likely to have lung cancer based on clinical and behavioral risk factors. It was trained on a dataset of 309 entries with 15 input features and a binary diagnosis label.

---

## 📊 Input Features

| Feature                 | Type                       | Description                  |
|-------------------------|----------------------------|------------------------------|
| `GENDER`                | 0/1 (0 = Female, 1 = Male) | Biological sex               |
| `AGE`                   | Integer                    | Patient age                  |
| `SMOKING`               | 0/1                        | Smoking habit                |
| `YELLOW_FINGERS`        | 0/1                        | Stained fingers from smoking |
| `ANXIETY`               | 0/1                        | Anxiety symptoms             |
| `PEER_PRESSURE`         | 0/1                        | Influence from peers         |
| `CHRONIC DISEASE`       | 0/1                        | History of chronic illness   |
| `FATIGUE`               | 0/1                        | Feeling of tiredness         |
| `ALLERGY`               | 0/1                        | Known allergies              |
| `WHEEZING`              | 0/1                        | Wheezing symptoms            |
| `ALCOHOL CONSUMING`     | 0/1                        | Alcohol consumption          |
| `COUGHING`              | 0/1                        | Persistent coughing          |
| `SHORTNESS OF BREATH`   | 0/1                        | Difficulty breathing         |
| `SWALLOWING DIFFICULTY` | 0/1                        | Trouble swallowing           |
| `CHEST PAIN`            | 0/1                        | Pain in chest area           |

---

## 🧠 Model Info

- **Algorithm**: XGBoost Classifier (highest-scoring model)
- **Framework**: scikit-learn
- **Target**: `DIAGNOSIS_LUNG_CANCER` (`YES` = Lung Cancer, `NO` = No Cancer)
- **Dataset Size**: 309 samples
- **Preprocessing**: Label encoding, binary encoding for yes/no inputs

---

## 🚀 Try It in Streamlit

This model is also available as a web app built using [Streamlit](https://streamlit.io).
Access it at https://datathonlungcancer.streamlit.app/

```python
import streamlit as st
import pandas as pd
import joblib

# Load the trained classifier from the working directory
model = joblib.load('model.pkl')

st.title('🫁 Lung Cancer Diagnosis')
st.write("Please fill out the following information to assess the likelihood of lung cancer.")

# Collect the 15 input features; min_value keeps the age non-negative
gender = st.selectbox('Gender', [0, 1], format_func=lambda x: "Female" if x == 0 else "Male")
age = st.number_input('Age', min_value=0, max_value=120, value=0)
smoking = st.selectbox('Smoking', ['Yes', 'No'])
yellow_fingers = st.selectbox('Yellow Fingers', ['Yes', 'No'])
anxiety = st.selectbox('Anxiety', ['Yes', 'No'])
peer_pressure = st.selectbox('Peer Pressure', ['Yes', 'No'])
chronic_disease = st.selectbox('Chronic Disease', ['Yes', 'No'])
fatigue = st.selectbox('Fatigue', ['Yes', 'No'])
allergy = st.selectbox('Allergy', ['Yes', 'No'])
wheezing = st.selectbox('Wheezing', ['Yes', 'No'])
alcohol = st.selectbox('Alcohol Consuming', ['Yes', 'No'])
coughing = st.selectbox('Coughing', ['Yes', 'No'])
shortness_of_breath = st.selectbox('Shortness of Breath', ['Yes', 'No'])
swallowing_difficulty = st.selectbox('Swallowing Difficulty', ['Yes', 'No'])
chest_pain = st.selectbox('Chest Pain', ['Yes', 'No'])


def binary_encode(value):
    """Map a 'Yes'/'No' answer to 1/0."""
    return 1 if value == 'Yes' else 0


# Build a single-row DataFrame whose columns match the feature table above
data = pd.DataFrame(
    [[gender, age, binary_encode(smoking), binary_encode(yellow_fingers),
      binary_encode(anxiety), binary_encode(peer_pressure),
      binary_encode(chronic_disease), binary_encode(fatigue),
      binary_encode(allergy), binary_encode(wheezing),
      binary_encode(alcohol), binary_encode(coughing),
      binary_encode(shortness_of_breath),
      binary_encode(swallowing_difficulty), binary_encode(chest_pain)]],
    columns=['GENDER', 'AGE', 'SMOKING', 'YELLOW_FINGERS', 'ANXIETY',
             'PEER_PRESSURE', 'CHRONIC DISEASE', 'FATIGUE', 'ALLERGY',
             'WHEEZING', 'ALCOHOL CONSUMING', 'COUGHING',
             'SHORTNESS OF BREATH', 'SWALLOWING DIFFICULTY', 'CHEST PAIN'])

if st.button('Predict'):
    # The model returns the encoded label: 1 = YES, 0 = NO
    prediction = model.predict(data)[0]
    if prediction == 1:
        st.error("⚠️ High risk of lung cancer. Please consult a doctor.")
    else:
        st.success("✅ No Lung Cancer.")
```
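
The encoding step can also be sketched outside Streamlit, for example when scoring records from a script. The block below is a minimal illustration, not the code used to train the model: `FEATURE_COLUMNS` and `encode_patient` are hypothetical names, the column order is assumed to follow the feature table in this card, and the 0 to 120 age range simply mirrors the limit used in the app.

```python
# Column order assumed to follow the feature table in this card.
FEATURE_COLUMNS = [
    "GENDER", "AGE", "SMOKING", "YELLOW_FINGERS", "ANXIETY", "PEER_PRESSURE",
    "CHRONIC DISEASE", "FATIGUE", "ALLERGY", "WHEEZING", "ALCOHOL CONSUMING",
    "COUGHING", "SHORTNESS OF BREATH", "SWALLOWING DIFFICULTY", "CHEST PAIN",
]

GENDER_CODES = {"female": 0, "male": 1}
YES_NO_CODES = {"no": 0, "yes": 1}


def encode_patient(answers):
    """Hypothetical helper: turn raw answers into one numeric feature row.

    `answers` maps each column name to a raw value: "Male"/"Female" for
    GENDER, an integer for AGE, and "Yes"/"No" for every other column.
    """
    missing = [c for c in FEATURE_COLUMNS if c not in answers]
    if missing:
        raise ValueError(f"missing features: {missing}")

    row = []
    for column in FEATURE_COLUMNS:
        value = answers[column]
        if column == "AGE":
            age = int(value)
            if not 0 <= age <= 120:
                raise ValueError(f"age out of range: {age}")
            row.append(age)
        elif column == "GENDER":
            row.append(GENDER_CODES[str(value).strip().lower()])
        else:
            row.append(YES_NO_CODES[str(value).strip().lower()])
    return row


# Made-up example record, for illustration only.
example = {column: "No" for column in FEATURE_COLUMNS}
example.update({"GENDER": "Male", "AGE": 61, "SMOKING": "Yes", "COUGHING": "Yes"})
row = encode_patient(example)
print(row)
```

The resulting list can be wrapped as `pd.DataFrame([row], columns=FEATURE_COLUMNS)` and passed to `model.predict`, in the same way the app builds its input.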