DeepActionPotential committed on
Commit 4127688 · verified · 1 Parent(s): e6ba27f

🚀 Initial upload of my app
README.md CHANGED
@@ -1,7 +1,8 @@
- # PIIDetector 🔒
- Detecting Personally Identifiable Information (PII) using BiLSTM-CRF model
-
- ## 🚀 Demo
-
  ![Demo Screenshot](./demo/demo.png)

@@ -9,20 +10,24 @@ Detecting Personally Identifiable Information (PII) using BiLSTM-CRF model
  ## ✨ Features

- - **PII Detection**: Identify various types of Personally Identifiable Information in text
- - **BiLSTM-CRF Model**: Utilizes a powerful deep learning model for sequence labeling
- - **Streamlit Web Interface**: User-friendly interface for easy interaction
- - **Multiple PII Types**: Detects various PII entities including names, addresses, financial information, and more
-
- ## 📦 Installation
-
- 1. **Clone the repository**
  ```bash
- git clone https://github.com/yourusername/PIIDetector.git
- cd PIIDetector
  ```
- 2. **Create and activate a virtual environment**
  ```bash
  # Create a virtual environment
  python -m venv .venv
@@ -34,31 +39,34 @@ Detecting Personally Identifiable Information (PII) using BiLSTM-CRF model
  .venv\Scripts\activate
  ```
- 3. **Install dependencies**
  ```bash
  pip install -r requirements.txt
  ```
- ## 🚀 Usage
-
- 1. **Run the Streamlit app**
  ```bash
- streamlit run app.py
  ```
- 2. **Enter text** in the text area and click "Analyze" to detect PII entities
- 3. **View results** in the table showing tokens and their predicted PII labels
-
- ## 🛠 Configuration
-
- The application uses a pre-trained BiLSTM-CRF model located in the `models/` directory. The model supports the following PII entity types:
-
- - Personal Information (names, age, gender, etc.)
- - Contact Information (emails, phone numbers, addresses)
- - Financial Information (credit cards, account numbers, IBAN, etc.)
- - Identification Numbers (SSN, passport numbers, etc.)
- - And many more...

  ## 🤝 Contributing

@@ -76,6 +84,6 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
  ## 🙏 Acknowledgements

- - [Hugging Face Transformers](https://huggingface.co/transformers/)
- - [PyTorch](https://pytorch.org/)
- - [Streamlit](https://streamlit.io/)

+ # Sales Forecasting with LightGBM
+
+ A retail sales prediction application built with LightGBM and Gradio for interactive forecasting.
+
+ ## 📊 Demo
+
  ![Demo Screenshot](./demo/demo.png)

  ## ✨ Features

+ - Interactive web interface for sales prediction
+ - Takes into account various features including:
+   - Promotional events
+   - Holiday status
+   - Historical sales data (various lags and rolling means)
+   - Temporal features (day, month, year, day of week)
+ - Built with LightGBM for fast and accurate predictions
+ - Simple and intuitive user interface
+
+ ## 🚀 Installation
+
+ 1. Clone the repository:
  ```bash
+ git clone https://github.com/yourusername/sales-forecasting.git
+ cd sales-forecasting
  ```
+ 2. Create and activate a virtual environment:
  ```bash
  # Create a virtual environment
  python -m venv .venv
  .venv\Scripts\activate
  ```
+ 3. Install the required dependencies:
  ```bash
  pip install -r requirements.txt
  ```
+ ## 🛠️ Usage
+
+ 1. Run the application:
  ```bash
+ python app.py
  ```
+ 2. Open your web browser and navigate to the URL shown in the terminal (typically http://localhost:7860)
+ 3. Input the required information:
+   - Promo status (0 or 1)
+   - Holiday status (0 or 1)
+   - Date in YYYY-MM-DD format
+   - Sales lags and rolling means
+ 4. Click "Predict Sales" to see the prediction
+
+ ## 📦 Dependencies
+
+ - gradio >= 3.50.0
+ - joblib >= 1.3.0
+ - lightgbm >= 4.0.0
+ - pandas >= 2.0.0

  ## 🤝 Contributing

  ## 🙏 Acknowledgements

+ - [LightGBM](https://github.com/microsoft/LightGBM) - The gradient boosting framework used for predictions
+ - [Gradio](https://gradio.app/) - For the simple web interface
+ - [Pandas](https://pandas.pydata.org/) - For data manipulation and analysis
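The lag and rolling-mean inputs listed in the new README come from the store's sales history; the app takes them as user input rather than computing them. As a sketch only (the training pipeline is not part of this commit, and the series below is invented data), such features can be derived with pandas:

```python
import pandas as pd

# Hypothetical daily sales history; the real app expects these values as user input.
sales = pd.Series(
    [100, 110, 120, 115, 130, 125, 140, 135],
    index=pd.date_range("2023-10-25", periods=8, freq="D"),
)

lag_1 = sales.shift(1).iloc[-1]            # yesterday's sales
lag_7 = sales.shift(7).iloc[-1]            # sales one week ago
mean_3 = sales.rolling(3).mean().iloc[-1]  # mean over the last 3 days
mean_7 = sales.rolling(7).mean().iloc[-1]  # mean over the last 7 days
```

These four numbers are exactly what the Gradio form asks for (`Sales Lag 1 Day`, `Sales Lag 7 Days`, and the two rolling means).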
__pycache__/ui.cpython-311.pyc CHANGED
Binary files a/__pycache__/ui.cpython-311.pyc and b/__pycache__/ui.cpython-311.pyc differ
 
__pycache__/utils.cpython-311.pyc CHANGED
Binary files a/__pycache__/utils.cpython-311.pyc and b/__pycache__/utils.cpython-311.pyc differ
 
app.py CHANGED
@@ -1,25 +1,7 @@
- import streamlit as st
- from utils import load_full_model_and_tokenizer
- from ui import render_ui
- from model import BiLSTMCRF
-
- # Cache model and tokenizer
- @st.cache_resource
- def get_model_and_tokenizer():
-     return load_full_model_and_tokenizer("models/best_bilstm_crf_model.pt")
-
- model, tokenizer, idx2tag = get_model_and_tokenizer()
-
- def main():
-     st.title("🔒 Detecting PII with BiLSTM-CRF")
-
-     text = st.text_area("Enter text to analyze:", height=200)
-
-     if st.button("Analyze"):
-         if text.strip():
-             render_ui(text, model, tokenizer, idx2tag)
-         else:
-             st.warning("⚠️ Please enter some text.")
-
  if __name__ == "__main__":
-     main()

+ from utils import load_artifacts, predict_sales
+ from ui import build_ui

  if __name__ == "__main__":
+     model, feature_cols = load_artifacts()
+     iface = build_ui(model, feature_cols, predict_sales)
+     iface.launch()
demo/demo.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:79e2d0b8ad23dfd91431fb1299a7c3c380cefccfa6eacb91e28d8c7921ccaf61
- size 1011984
+ oid sha256:709f027723ef11b7699671bfb67b904580a63b70330dbef4069ffed351f4af8f
+ size 896228
demo/demo.png CHANGED
requirements.txt CHANGED
@@ -1,5 +1,4 @@
- streamlit==1.31.0
- torch==2.2.1
- transformers==4.38.2
- pandas==2.1.4
- pytorch-crf==0.7.2
+ gradio>=3.50.0
+ joblib>=1.3.0
+ lightgbm>=4.0.0
+ pandas>=2.0.0
ui.py CHANGED
@@ -1,26 +1,28 @@
- import streamlit as st
- from utils import prepare_inputs
- import torch
- import pandas as pd
-
- def render_ui(text, model, tokenizer, idx2tag):
-     # Prepare inputs
-     input_ids, mask = prepare_inputs(text, tokenizer)
-
-     # Run model
-     with torch.no_grad():
-         predictions = model(input_ids, mask=mask)
-
-     tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
-     labels = [idx2tag.get(tag, "O") for tag in predictions[0]]
-
-     # Build table data
-     rows = []
-     for token, label in zip(tokens, labels):
-         rows.append({"Token": token, "Predicted Label": label})
-
-     df = pd.DataFrame(rows)
-
-     # Show in Streamlit
-     st.subheader("🔍 Predictions")
-     st.dataframe(df, use_container_width=True)  # or st.table(df) for static table

+ import gradio as gr
+
+ def build_ui(model, feature_cols, predict_fn):
+     with gr.Blocks() as demo:
+         gr.Markdown("## 🛒 Retail Sales Prediction App")
+
+         with gr.Row():
+             promo = gr.Radio([0, 1], label="Promo", value=0)
+             holiday = gr.Radio([0, 1], label="Holiday", value=0)
+
+         date = gr.Textbox(label="Date (YYYY-MM-DD)", value="2023-11-01")
+
+         with gr.Row():
+             lag_1 = gr.Number(label="Sales Lag 1 Day", value=100)
+             lag_7 = gr.Number(label="Sales Lag 7 Days", value=120)
+             mean_3 = gr.Number(label="Rolling Mean (3 Days)", value=110)
+             mean_7 = gr.Number(label="Rolling Mean (7 Days)", value=115)
+
+         predict_btn = gr.Button("Predict Sales")
+         output = gr.Number(label="Predicted Sales", precision=2)
+
+         predict_btn.click(
+             fn=lambda p, h, d, l1, l7, m3, m7: predict_fn(model, feature_cols, p, h, d, l1, l7, m3, m7),
+             inputs=[promo, holiday, date, lag_1, lag_7, mean_3, mean_7],
+             outputs=output,
+         )
+
+     return demo
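The lambda in `predict_btn.click` exists to bind `model` and `feature_cols` into the handler, so Gradio only needs to supply the seven widget values. The same binding can be written with `functools.partial`; a minimal Gradio-free sketch of the pattern (the stand-in function and values below are hypothetical, not part of the repo):

```python
from functools import partial

# Hypothetical function with the same argument shape as predict_sales:
# fixed artifacts first, then the per-request inputs.
def predict_fn(model, feature_cols, promo, holiday, date, lag_1, lag_7, mean_3, mean_7):
    return (model, len(feature_cols), promo + holiday)

model = "booster"                  # stand-in for the LightGBM Booster
feature_cols = ["promo", "holiday"]

# partial(...) pins model and feature_cols, leaving only the UI inputs free;
# this is equivalent to the lambda used in build_ui.
handler = partial(predict_fn, model, feature_cols)
result = handler(1, 0, "2023-11-01", 100, 120, 110, 115)
```

Either form works; `partial` avoids re-listing every parameter when the form grows.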
utils.py CHANGED
@@ -1,141 +1,53 @@
- import torch
- from transformers import BertTokenizerFast
- from model import BiLSTMCRF  # make sure model.py exists
-
- def load_full_model_and_tokenizer(path):
-     """
-     Loads the FULL BiLSTM-CRF model (torch.save(model, ...)) and tokenizer.
-     """
-     tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
-
-     # Load full model
-     model = torch.load(path, map_location="cpu", weights_only=False)
-     model.eval()
-
-     # Define tag mapping (must match training)
-     idx2tag = {
-         0: 'B-ACCOUNTNAME', 1: 'B-ACCOUNTNUMBER', 2: 'B-AGE', 3: 'B-AMOUNT',
-         4: 'B-BIC', 5: 'B-BITCOINADDRESS', 6: 'B-BUILDINGNUMBER', 7: 'B-CITY',
-         8: 'B-COMPANYNAME', 9: 'B-COUNTY', 10: 'B-CREDITCARDCVV', 11: 'B-CREDITCARDISSUER',
-         12: 'B-CREDITCARDNUMBER', 13: 'B-CURRENCY', 14: 'B-CURRENCYCODE', 15: 'B-CURRENCYNAME',
-         16: 'B-CURRENCYSYMBOL', 17: 'B-DATE', 18: 'B-DOB', 19: 'B-EMAIL',
-         20: 'B-ETHEREUMADDRESS', 21: 'B-EYECOLOR', 22: 'B-FIRSTNAME', 23: 'B-GENDER',
-         24: 'B-HEIGHT', 25: 'B-IBAN', 26: 'B-IP', 27: 'B-IPV4',
-         28: 'B-IPV6', 29: 'B-JOBAREA', 30: 'B-JOBTITLE', 31: 'B-JOBTYPE',
-         32: 'B-LASTNAME', 33: 'B-LITECOINADDRESS', 34: 'B-MAC', 35: 'B-MASKEDNUMBER',
-         36: 'B-MIDDLENAME', 37: 'B-NEARBYGPSCOORDINATE', 38: 'B-ORDINALDIRECTION', 39: 'B-PASSWORD',
-         40: 'B-PHONEIMEI', 41: 'B-PHONENUMBER', 42: 'B-PIN', 43: 'B-PREFIX',
-         44: 'B-SECONDARYADDRESS', 45: 'B-SEX', 46: 'B-SSN', 47: 'B-STATE',
-         48: 'B-STREET', 49: 'B-TIME', 50: 'B-URL', 51: 'B-USERAGENT',
-         52: 'B-USERNAME', 53: 'B-VEHICLEVIN', 54: 'B-VEHICLEVRM', 55: 'B-ZIPCODE',
-         56: 'I-ACCOUNTNAME', 57: 'I-ACCOUNTNUMBER', 58: 'I-AGE', 59: 'I-AMOUNT',
-         60: 'I-BIC', 61: 'I-BITCOINADDRESS', 62: 'I-BUILDINGNUMBER', 63: 'I-CITY',
-         64: 'I-COMPANYNAME', 65: 'I-COUNTY', 66: 'I-CREDITCARDCVV', 67: 'I-CREDITCARDISSUER',
-         68: 'I-CREDITCARDNUMBER', 69: 'I-CURRENCY', 70: 'I-CURRENCYCODE', 71: 'I-CURRENCYNAME',
-         72: 'I-CURRENCYSYMBOL', 73: 'I-DATE', 74: 'I-DOB', 75: 'I-EMAIL',
-         76: 'I-ETHEREUMADDRESS', 77: 'I-EYECOLOR', 78: 'I-FIRSTNAME', 79: 'I-GENDER',
-         80: 'I-HEIGHT', 81: 'I-IBAN', 82: 'I-IP', 83: 'I-IPV4',
-         84: 'I-IPV6', 85: 'I-JOBAREA', 86: 'I-JOBTITLE', 87: 'I-JOBTYPE',
-         88: 'I-LASTNAME', 89: 'I-LITECOINADDRESS', 90: 'I-MAC', 91: 'I-MASKEDNUMBER',
-         92: 'I-MIDDLENAME', 93: 'I-NEARBYGPSCOORDINATE', 94: 'I-PASSWORD', 95: 'I-PHONEIMEI',
-         96: 'I-PHONENUMBER', 97: 'I-PIN', 98: 'I-PREFIX', 99: 'I-SECONDARYADDRESS',
-         100: 'I-SSN', 101: 'I-STATE', 102: 'I-STREET', 103: 'I-TIME',
-         104: 'I-URL', 105: 'I-USERAGENT', 106: 'I-USERNAME', 107: 'I-VEHICLEVIN',
-         108: 'I-VEHICLEVRM', 109: 'I-ZIPCODE', 110: 'O',
-     }
-
-     return model, tokenizer, idx2tag
-
- def prepare_inputs(text, tokenizer, max_length=128):
-     encoding = tokenizer(
-         text.split(),
-         is_split_into_words=True,
-         padding="max_length",
-         truncation=True,
-         max_length=max_length,
-         return_tensors="pt"
-     )
-     input_ids = encoding["input_ids"]
-     mask = encoding["attention_mask"].bool()
-     return input_ids, mask

+ import joblib
+ import lightgbm as lgb
+ import pandas as pd
+
+ # Load artifacts
+ def load_artifacts():
+     model = lgb.Booster(model_file="models/lgb_sales_model.txt")
+     feature_cols = joblib.load("models/feature_cols.pkl")
+     return model, feature_cols
+
+ # Preprocess new input row into model-ready features
+ def preprocess_input(promo, holiday, date, past_sales):
+     """
+     Args:
+         promo: int (0/1)
+         holiday: int (0/1)
+         date: datetime-like
+         past_sales: dict with keys ['lag_1','lag_7','mean_3','mean_7']
+
+     Returns:
+         pd.DataFrame with a single row ready for prediction
+     """
+     date = pd.to_datetime(date)
+
+     features = {
+         "promo": promo,
+         "holiday": holiday,
+         "day": date.day,
+         "month": date.month,
+         "year": date.year,
+         "day_of_week": date.weekday(),
+         "is_weekend": 1 if date.weekday() >= 5 else 0,
+         "sales_lag_1": past_sales.get("lag_1", 0),
+         "sales_lag_7": past_sales.get("lag_7", 0),
+         "rolling_mean_3": past_sales.get("mean_3", 0),
+         "rolling_mean_7": past_sales.get("mean_7", 0),
+     }
+
+     return pd.DataFrame([features])
+
+ # Prediction
+ def predict_sales(model, feature_cols, promo, holiday, date, lag_1, lag_7, mean_3, mean_7):
+     past_sales = {
+         "lag_1": lag_1,
+         "lag_7": lag_7,
+         "mean_3": mean_3,
+         "mean_7": mean_7,
+     }
+     X = preprocess_input(promo, holiday, date, past_sales)
+     X = X[feature_cols]  # ensure correct column order
+     prediction = model.predict(X)[0]
+     return round(prediction, 2)
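The new `predict_sales` path can be exercised without the saved artifacts by duck-typing the booster. A minimal sketch, with the feature construction from `utils.py` inlined for self-containment; the stub model and the column list are assumptions for illustration, not files in this repo:

```python
import pandas as pd

# Inlined copy of preprocess_input from utils.py, so the sketch runs standalone.
def preprocess_input(promo, holiday, date, past_sales):
    date = pd.to_datetime(date)
    return pd.DataFrame([{
        "promo": promo,
        "holiday": holiday,
        "day": date.day,
        "month": date.month,
        "year": date.year,
        "day_of_week": date.weekday(),
        "is_weekend": 1 if date.weekday() >= 5 else 0,
        "sales_lag_1": past_sales.get("lag_1", 0),
        "sales_lag_7": past_sales.get("lag_7", 0),
        "rolling_mean_3": past_sales.get("mean_3", 0),
        "rolling_mean_7": past_sales.get("mean_7", 0),
    }])

class StubBooster:
    """Stand-in for lgb.Booster; 'predicts' the 7-day rolling mean."""
    def predict(self, X):
        return X["rolling_mean_7"].to_numpy()

# Hypothetical column order; the real one is loaded from models/feature_cols.pkl.
feature_cols = ["promo", "holiday", "day", "month", "year", "day_of_week",
                "is_weekend", "sales_lag_1", "sales_lag_7",
                "rolling_mean_3", "rolling_mean_7"]

X = preprocess_input(1, 0, "2023-11-01",
                     {"lag_1": 100, "lag_7": 120, "mean_3": 110, "mean_7": 115})
X = X[feature_cols]                      # same column-order guard as predict_sales
pred = round(StubBooster().predict(X)[0], 2)
```

Reindexing with `feature_cols` matters: a LightGBM Booster matches features positionally, so the DataFrame must present columns in the order used at training time.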