---
title: Tona's App Challenge Cstores
emoji: 🏢
colorFrom: yellow
colorTo: gray
sdk: docker
pinned: false
license: apache-2.0
app_port: 8501
short_description: The challenge hosting space. Develop in the GitHub repo first.
---

Data → Dashboard

Welcome to the Data → Dashboard project! This repository hosts an interactive data science application built with Streamlit and packaged with Docker. The app helps store owners explore our Cstore data for the Rigby and Ririe stores and assess the reliability of its underlying data sources.


Overview

This project demonstrates how to combine modern data science tools in a containerized environment. It incorporates:

  • Streamlit: For building interactive dashboards.

  • Docker: For containerizing the app to ensure a consistent runtime environment.

  • Polars: For fast data manipulation.

  • Plotly Express (optional): For interactive visualizations.

  • Modular Web Development: The app logic is packaged as wheel files whose modules are imported and combined in the final main.py app.

  • API and Secrets Usage: The app retrieves all of its data (both Parquet files and image datasets) through Hugging Face's Secrets environment.


The application answers key business questions:

  • Top 5 Products by Weekly Sales (Excluding Fuels): Identify the highest performing products.

  • Packaged Beverage Analysis: Determine which brands might be dropped.

  • Customer Comparison: Compare purchasing patterns between cash and credit customers.


Each dashboard view includes:

  • Data caching for efficient processing.

  • Key Performance Indicators (KPIs) using st.metric().

  • Clean summary tables powered by Great Tables.

  • At least two dynamic graphs (using Plotly or Altair) with temporal comparisons.

  • Filter options for month selection and variable levels.

  • Layouts and container features of Streamlit for a polished, responsive design.

  • Customizable inputs (e.g., vertical/horizontal reference lines) to enhance visualizations.


Getting Started

Prerequisites

  • Docker: Ensure Docker is installed on your system.

Running the Application

You can launch the application using either Docker Compose or Docker's build-and-run commands.

Using Docker Compose

  1. Clone the Repository:

    git clone https://github.com/yourusername/your-repo.git
    cd your-repo
    
  2. Start the Application:

    docker compose up
    
  3. View the App:

    Open your browser and navigate to http://localhost:8501.
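The `docker compose up` step assumes a `docker-compose.yml` at the repository root along these lines (the service name and volume mount are illustrative, not necessarily the repo's exact file):

```yaml
services:
  streamlit:
    build: .
    ports:
      - "8501:8501"
    volumes:
      - .:/app:rw
```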

Using Docker Build and Run

  1. Clone the Repository:

    git clone https://github.com/yourusername/your-repo.git
    cd your-repo
    
  2. Build the Docker Image:

    docker build -t streamlit .
    
  3. Run the Container:

    docker run -p 8501:8501 -v "$(pwd):/app:rw" streamlit
    
  4. View the App:

    Open your browser and go to http://localhost:8501.
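The build step above assumes a `Dockerfile` at the repository root. A minimal sketch (the dependency file name is an assumption):

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

The `-v "$(pwd):/app:rw"` flag in the run command mounts your working copy over the image's `/app`, so local edits appear in the running container without a rebuild.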


Vocabulary / Lingo Challenge

This section addresses key vocabulary and concepts as part of the challenge.

1. The Added Value of DataBricks in Data Science

DataBricks is a unified analytics platform that enhances your data science process by:

  • Scalability and Performance: Efficiently handles large-scale data and complex computations.
  • Collaboration: Enables data scientists, engineers, and analysts to work together on a single platform.
  • Integrated Tools: Provides seamless integration with Apache Spark, MLflow, and Delta Lake for robust data processing and machine learning.
  • Simplified Data Management: Offers reliable, managed data lakes and streamlined access to cloud services.

2. PySpark vs. Pandas/Tidyverse

| Aspect | PySpark | Pandas / Tidyverse |
| --- | --- | --- |
| Scalability | Designed for distributed computing across clusters. | Best suited for in-memory operations on a single machine. |
| Performance | Excels at processing very large datasets using parallelism. | Efficient for small to medium-sized datasets; can face performance issues with larger data. |
| Ease of Use | Requires understanding of distributed computing concepts. | Intuitive API for data manipulation and exploration. |
| Ecosystem | Integrated with the Hadoop ecosystem and scalable systems. | Rich libraries for in-depth data wrangling and visualization in Python (Pandas) or R (Tidyverse). |
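The contrast shows up even in a toy aggregation. Below, the Pandas version runs as-is; the PySpark equivalent is shown in comments because it needs a running SparkSession (the data and column names are made up for illustration):

```python
import pandas as pd

# The same weekly aggregation in both APIs. Pandas runs in memory on one machine:
sales = pd.DataFrame({"brand": ["A", "A", "B"], "units": [10, 5, 7]})
weekly = sales.groupby("brand", as_index=False)["units"].sum()

# The PySpark equivalent distributes the same work across a cluster.
# It requires a SparkSession, so it is shown here for comparison only:
#
#   from pyspark.sql import SparkSession, functions as F
#   spark = SparkSession.builder.getOrCreate()
#   weekly = (spark.createDataFrame(sales)
#             .groupBy("brand")
#             .agg(F.sum("units").alias("units")))
```

The APIs look similar at this scale; the difference is where the work happens, which is why PySpark pays off only once the data outgrows one machine's memory.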

3. Explaining Docker in Simple Terms

Docker is like a shipping container for your software. It packages your application along with everything it needs to run (libraries, dependencies, etc.) so that it works exactly the same way on any computer. This ensures you can share your application easily without worrying about differences in system configurations.


Links and Resources


License

This project is licensed under the Apache License 2.0.