Khedhar committed on
Commit ee14b8f · 0 Parent(s)

project start
.dockerignore ADDED
@@ -0,0 +1,33 @@
+ # Virtual environment
+ .venv/
+ venv/
+ env/
+
+ # Git
+ .git/
+ .gitignore
+
+ # Cache
+ __pycache__/
+ *.pyc
+ *.pyo
+ .cache/
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # UV files (not needed in Docker)
+ uv.lock
+ pyproject.toml
+
+ # Docs (optional, reduce image size)
+ docs/
+
+ # Test files
+ tests/
+ *.test.py
+
+ # OS files
+ .DS_Store
+ Thumbs.db
.gitignore ADDED
@@ -0,0 +1,10 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
.python-version ADDED
@@ -0,0 +1 @@
+ 3.12
Dockerfile ADDED
@@ -0,0 +1,32 @@
+ # Use Python 3.11 slim for a smaller image
+ FROM python:3.11-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies for audio processing
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     ffmpeg \
+     libsndfile1 \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install uv for fast Python package management
+ RUN pip install uv
+
+ # Copy project files
+ COPY . .
+
+ # Install Python dependencies
+ RUN uv pip install --system --no-cache-dir -r requirements.txt
+
+ # Expose the Hugging Face Spaces default port
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1
+ ENV STREAMLIT_SERVER_PORT=7860
+ ENV STREAMLIT_SERVER_ADDRESS=0.0.0.0
+
+ # Run the Streamlit app
+ CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
HUGGINGFACE_DEPLOYMENT.md ADDED
@@ -0,0 +1,113 @@
+ # Hugging Face Spaces Deployment Guide
+
+ This guide explains how to deploy the **Pharma Voice Orders** application to Hugging Face Spaces using a Docker Space.
+
+ ---
+
+ ## Prerequisites
+
+ 1. A [Hugging Face account](https://huggingface.co/join).
+ 2. A Hugging Face Space created with the **Docker SDK**.
+ 3. Git installed on your local machine.
+
+ ---
+
+ ## Step 1: Install the Hugging Face CLI
+
+ Install the CLI:
+
+ ```bash
+ pip install huggingface_hub
+ ```
+
+ ---
+
+ ## Step 2: Log in to Hugging Face
+
+ Authenticate with your HF token (get one from [Settings > Tokens](https://huggingface.co/settings/tokens)):
+
+ ```bash
+ huggingface-cli login
+ ```
+
+ Enter your token when prompted. This saves your credentials for Git operations.
+
+ ---
+
+ ## Step 3: Add the HF Space as a Git Remote
+
+ Navigate to your project folder and add the Space as a remote:
+
+ ```bash
+ cd pharma-voice-orders
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/pharma-voice-orders
+ ```
+
+ Replace `YOUR_USERNAME` with your actual Hugging Face username (e.g., `Khedhar`).
+
+ ---
+
+ ## Step 4: Push to Hugging Face
+
+ Force-push your code to the Space. **Important:** Hugging Face Spaces uses `main` as the default branch.
+
+ If your local branch is `master`:
+ ```bash
+ git push hf master:main --force
+ ```
+
+ If your local branch is already `main`:
+ ```bash
+ git push hf main --force
+ ```
+
+ > **Tip:** To check your current branch name, run `git branch`.
+
+ ---
+
+ ## Step 5: Verify Deployment
+
+ 1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/pharma-voice-orders`
+ 2. Wait for the build to complete (check the **Logs** tab).
+ 3. Once running, the app will be live at the Space URL.
+
+ ---
+
+ ## Dockerfile Notes
+
+ The `Dockerfile` in this project:
+ - Uses the Python 3.11 slim image.
+ - Installs system dependencies for audio processing.
+ - Installs Python dependencies with `uv`.
+ - Exposes port `7860` (the HF Spaces default).
+
+ ---
+
+ ## Environment Variables (Optional)
+
+ If your app requires secrets (e.g., `HF_TOKEN`), configure them in Space Settings > Repository Secrets.
+
+ ---
+
+ ## Troubleshooting
+
+ | Issue | Solution |
+ |-------|----------|
+ | `Permission denied` | Run `huggingface-cli login` again |
+ | `Build failed` | Check the Logs tab for error details |
+ | `Port not accessible` | Ensure the `Dockerfile` exposes port `7860` |
+
+ ---
+
+ ## Useful Commands
+
+ ```bash
+ # Check current remotes
+ git remote -v
+
+ # Remove the HF remote
+ git remote remove hf
+
+ # Re-add the HF remote
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/pharma-voice-orders
+ ```
README.md ADDED
@@ -0,0 +1,45 @@
+ # 🏥 Pharma Voice Orders
+
+ > **Accent-Aware Speech-to-Text Engine for Distributor Order Processing**
+
+ This application helps pharmaceutical manufacturers process voice orders from primary distributors efficiently. It simulates an end-to-end pipeline:
+
+ 1. **Distributor Input**: Voice recording of orders (e.g., "Send 20 strips of Augmentin 625").
+ 2. **AI Processing**: Transcription using OpenAI Whisper and entity extraction.
+ 3. **Simulation**: Routing orders to specific manufacturer boxes (Sun Pharma, GSK, etc.).
+ 4. **Export**: Generating structured Excel sheets for ERP systems.
+
+ ## 🚀 Quick Start
+
+ 1. **Install Dependencies**:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 2. **Run the Application**:
+    ```bash
+    streamlit run app.py
+    ```
+
+ ## 📂 Project Structure
+
+ - `app.py`: Main Streamlit application entry point.
+ - `core/`: ASR engine, preprocessor, and entity extractor.
+ - `simulation/`: Mock database and order routing logic.
+ - `data/`: Sample medicine and manufacturer databases.
+ - `evaluation/`: Scripts to calculate WER, accuracy, and latency.
+
+ ## 🛠️ Tech Stack
+
+ - **Frontend**: Streamlit
+ - **AI Model**: OpenAI Whisper (via Hugging Face Transformers)
+ - **Data Processing**: Pandas, OpenPyXL
+ - **Matching**: RapidFuzz (fuzzy string matching)
+ - **Audio**: Librosa, SoundFile
+
+ ## 🎓 University Use
+
+ This project demonstrates the "Minor Project" proposal deliverables:
+
+ - Noise Reduction & Preprocessing
+ - Accent-Aware STT (simulated via Whisper)
+ - Entity Extraction (Medicine/Dosage/Quantity)
+ - Performance Evaluation (WER Report)
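
The entity-extraction step in the pipeline above (pulling medicine, quantity, and unit out of a transcript) can be sketched with a single regex pass. This is a hedged illustration only, not the project's actual `core/entity_extractor.py`; the pattern, unit list, and function name are assumptions:

```python
import re

# One order phrase is assumed to look like: "<quantity> <unit> of <medicine> [<dosage>]"
ORDER_RE = re.compile(
    r"(?P<qty>\d+)\s+(?P<unit>strips?|bottles?|tubes?|packs?|vials?|pcs)\s+of\s+"
    r"(?P<medicine>[A-Za-z][\w-]*(?:\s+\d+)?)",
    re.IGNORECASE,
)

def extract_orders(transcript: str) -> list:
    """Pull (medicine, quantity, unit) triples out of a raw transcription."""
    return [
        {
            "medicine": m["medicine"].strip(),
            "quantity": int(m["qty"]),
            "unit": m["unit"].lower(),
        }
        for m in ORDER_RE.finditer(transcript)
    ]

orders = extract_orders("Send 20 strips of Augmentin 625 and 10 tubes of Betnovate")
```

A real extractor would additionally normalize misheard names against the catalog (the README names RapidFuzz for that) rather than trusting the raw transcription.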
app.py ADDED
@@ -0,0 +1,623 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Pharma Voice Orders - Main Application
3
+ Streamlit UI for simulating Distributor -> Manufacturer Voice Ordering System
4
+ """
5
+
6
+ import streamlit as st
7
+ import pandas as pd
8
+ import time
9
+ import os
10
+ from pathlib import Path
11
+
12
+ # Page Config
13
+ st.set_page_config(
14
+ page_title="Pharma Voice Orders",
15
+ page_icon="🏥",
16
+ layout="wide",
17
+ initial_sidebar_state="expanded"
18
+ )
19
+
20
+ # Custom CSS - Avant-Garde Glassmorphic Design
21
+ def load_css(file_name):
22
+ with open(file_name) as f:
23
+ st.markdown(f'<style>{f.read()}</style>', unsafe_allow_html=True)
24
+
25
+ load_css("assets/styles.css")
26
+
27
+ # --- Session State Initialization ---
28
+ if 'model_ready' not in st.session_state:
29
+ st.session_state.model_ready = False
30
+ if 'orders' not in st.session_state:
31
+ st.session_state.orders = []
32
+ if 'last_transcription' not in st.session_state:
33
+ st.session_state.last_transcription = ""
34
+
35
+ # --- Sidebar ---
36
+ with st.sidebar:
37
+ st.image("https://cdn-icons-png.flaticon.com/512/3063/3063167.png", width=50)
38
+ st.title("PharmaVoice")
39
+ st.caption("v1.0.0 | Minor Project")
40
+
41
+ st.markdown("---")
42
+ st.header("⚙️ Configuration")
43
+
44
+ distributor = st.selectbox(
45
+ "Select Distributor",
46
+ ["Apollo Pharmacy", "MedPlus", "Frank Ross", "Online Pharma", "Local Chemist"]
47
+ )
48
+
49
+ asr_model = st.selectbox(
50
+ "ASR Model",
51
+ [
52
+ "google/medasr",
53
+ "openai/whisper-tiny",
54
+ "openai/whisper-small",
55
+ "openai/whisper-medium",
56
+ "openai/whisper-large-v3",
57
+ ]
58
+ )
59
+
60
+ # Note about MedASR - now enabled!
61
+ if "medasr" in asr_model:
62
+ st.success("✅ MedASR enabled (transformers from GitHub installed)")
63
+
64
+ st.markdown("---")
65
+
66
+ # Inference Mode Toggle
67
+ st.subheader("⚡ Inference Mode")
68
+
69
+ # HF Token Configuration
70
+ # Token should be set via environment variable or entered by user
71
+ hf_token_input = st.text_input(
72
+ "🔑 HF Token",
73
+ value=os.environ.get("HF_TOKEN", ""),
74
+ type="password",
75
+ help="Required for Cloud mode and gated models. Set via HF_TOKEN env var or enter here.",
76
+ )
77
+
78
+ # Check for token from input or environment
79
+ hf_token = hf_token_input or os.environ.get("HF_TOKEN", "")
80
+
81
+ # Mode selection based on token availability
82
+ if hf_token:
83
+ inference_mode = st.radio(
84
+ "Select Mode",
85
+ ["💻 Local (Faster)", "☁️ Cloud (No Download)"],
86
+ index=0,
87
+ help="Cloud uses HF servers. Local downloads model to your PC."
88
+ )
89
+ use_cloud = "Cloud" in inference_mode
90
+ st.success("🔓 Token configured" + (" • Cloud Mode" if use_cloud else " • Local Mode"))
91
+ else:
92
+ use_cloud = False
93
+ st.warning("⚠️ No token → Local mode only (requires download)")
94
+ inference_mode = "💻 Local (Faster)"
95
+
96
+ st.markdown("---")
97
+ st.info("""
98
+ **Instructions:**
99
+ 1. Select a distributor.
100
+ 2. Record your voice order.
101
+ 3. Watch orders route to manufacturers!
102
+ """)
103
+
104
+ if st.button("🔄 Clear Session", type="secondary"):
105
+ st.session_state.clear()
106
+ st.rerun()
107
+
108
+ # --- Cloud Inference (HuggingFace Inference API) ---
109
+ def transcribe_cloud(audio_data, model_name: str, token: str):
110
+ """Transcribe audio using HuggingFace Inference API (no local download)."""
111
+ from huggingface_hub import InferenceClient
112
+ import io
113
+
114
+ client = InferenceClient(token=token)
115
+
116
+ # Get audio bytes
117
+ if hasattr(audio_data, 'read'):
118
+ audio_bytes = audio_data.read()
119
+ audio_data.seek(0) # Reset for replay
120
+ else:
121
+ audio_bytes = audio_data
122
+
123
+ # Call HuggingFace Inference API
124
+ result = client.automatic_speech_recognition(
125
+ audio=audio_bytes,
126
+ model=model_name
127
+ )
128
+
129
+ # Result is either a string or dict with 'text' key
130
+ if isinstance(result, str):
131
+ return result
132
+ else:
133
+ return result.get("text", str(result))
134
+
135
+ # --- Local ASR Engine (Downloads Model) ---
136
+ @st.cache_resource(show_spinner=False)
137
+ def load_asr_engine(model_name: str, token: str = None):
138
+ """Load ASR engine locally with proper status handling."""
139
+ import torch
140
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
141
+
142
+ device = "cuda" if torch.cuda.is_available() else "cpu"
143
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
144
+
145
+ # Login if token provided
146
+ if token:
147
+ from huggingface_hub import login
148
+ login(token=token)
149
+
150
+ # Determine model class based on model name
151
+ if "medasr" in model_name:
152
+ from transformers import AutoModelForCTC
153
+ model_class = AutoModelForCTC
154
+ else:
155
+ model_class = AutoModelForSpeechSeq2Seq
156
+
157
+ # Load Model with support for custom code (trust_remote_code=True)
158
+ try:
159
+ model = model_class.from_pretrained(
160
+ model_name,
161
+ dtype=torch_dtype,
162
+ low_cpu_mem_usage=True,
163
+ use_safetensors=True,
164
+ trust_remote_code=True
165
+ )
166
+ except OSError:
167
+ # Fallback for models that might not support safetensors or other issues
168
+ model = model_class.from_pretrained(
169
+ model_name,
170
+ dtype=torch_dtype,
171
+ low_cpu_mem_usage=True,
172
+ trust_remote_code=True
173
+ )
174
+
175
+ model.to(device)
176
+
177
+ processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
178
+
179
+ pipe = pipeline(
180
+ "automatic-speech-recognition",
181
+ model=model,
182
+ tokenizer=processor.tokenizer,
183
+ feature_extractor=processor.feature_extractor,
184
+ dtype=torch_dtype,
185
+ device=device,
186
+ trust_remote_code=True
187
+ )
188
+
189
+ return pipe
190
+
191
+ # --- Other Components (Lazy Load) ---
192
+ @st.cache_resource
193
+ def get_db():
194
+ from simulation.manufacturer_db import ManufacturerDB
195
+ return ManufacturerDB(data_dir="data")
196
+
197
+ @st.cache_resource
198
+ def get_preprocessor():
199
+ from core.preprocessor import AudioPreprocessor
200
+ return AudioPreprocessor()
201
+
202
+ @st.cache_resource
203
+ def get_extractor(_db):
204
+ from core.entity_extractor import EntityExtractor
205
+ return EntityExtractor(_db)
206
+
207
+ # --- Model Cache Checker ---
208
+ def check_model_status(model_name: str) -> dict:
209
+ """Check if model is cached locally and get disk space info."""
210
+ import os
211
+ import shutil
212
+ from pathlib import Path
213
+
214
+ # HuggingFace cache directory
215
+ cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
216
+ model_folder_name = f"models--{model_name.replace('/', '--')}"
217
+ model_cache_path = cache_dir / model_folder_name
218
+
219
+ # Check if model is cached
220
+ is_cached = model_cache_path.exists() and any(model_cache_path.iterdir()) if model_cache_path.exists() else False
221
+
222
+ # Check snapshots folder for actual model files
223
+ snapshots_path = model_cache_path / "snapshots" if model_cache_path.exists() else None
224
+ has_model_files = False
225
+ if snapshots_path and snapshots_path.exists():
226
+ for snapshot in snapshots_path.iterdir():
227
+ # Check for safetensors or bin files
228
+ if any(f.suffix in ['.safetensors', '.bin'] for f in snapshot.iterdir() if f.is_file()):
229
+ has_model_files = True
230
+ break
231
+
232
+ # Get free disk space (C: drive on Windows)
233
+ try:
234
+ disk_usage = shutil.disk_usage(cache_dir if cache_dir.exists() else Path.home())
235
+ free_gb = disk_usage.free / (1024 ** 3)
236
+ except:
237
+ free_gb = -1
238
+
239
+ # Model sizes (approximate)
240
+ model_sizes = {
241
+ "openai/whisper-tiny": 0.15,
242
+ "openai/whisper-small": 0.5,
243
+ "openai/whisper-medium": 1.5,
244
+ "openai/whisper-large-v3": 3.1,
245
+ "google/medasr": 0.3, # ~300MB
246
+ }
247
+ required_gb = model_sizes.get(model_name, 2.0)
248
+
249
+ return {
250
+ "is_cached": is_cached and has_model_files,
251
+ "free_gb": round(free_gb, 1),
252
+ "required_gb": required_gb,
253
+ "has_space": free_gb >= required_gb or is_cached,
254
+ "cache_path": str(model_cache_path) if model_cache_path.exists() else None
255
+ }
256
+
257
+ # Load non-blocking components
258
+ db = get_db()
259
+ preprocessor = get_preprocessor()
260
+ extractor = get_extractor(db)
261
+
262
+ # --- Main Content ---
263
+ st.markdown('<h1 class="main-header">🏥 Order Processing Center</h1>', unsafe_allow_html=True)
264
+ st.markdown(f'<p class="sub-header">Reviewing orders from: <strong>{distributor}</strong></p>', unsafe_allow_html=True)
265
+
266
+ # Smart Model Status Indicator
267
+ model_status = check_model_status(asr_model)
268
+
269
+ # Use a flex container for the status badge (aligned right) to avoid empty column artifacts
270
+ status_html = ""
271
+ if use_cloud:
272
+ status_html = '''
273
+ <div class="status-ready" style="border-color: rgba(139, 92, 246, 0.3); background: rgba(139, 92, 246, 0.1); color: #a78bfa;">
274
+ <span class="status-dot" style="background: #a78bfa;"></span>
275
+ ☁️ Cloud Ready
276
+ </div>
277
+ '''
278
+ elif model_status["is_cached"]:
279
+ status_html = '''
280
+ <div class="status-ready">
281
+ <span class="status-dot green"></span>
282
+ ✅ Cached (Local)
283
+ </div>
284
+ '''
285
+ elif model_status["has_space"]:
286
+ status_html = f'''
287
+ <div class="status-loading" style="border-color: rgba(251, 191, 36, 0.3); background: rgba(251, 191, 36, 0.1); color: #fbbf24;">
288
+ <span class="status-dot" style="background: #fbbf24;"></span>
289
+ ⬇️ Download ({model_status["required_gb"]}GB)
290
+ </div>
291
+ '''
292
+ else:
293
+ status_html = f'''
294
+ <div class="status-loading" style="border-color: rgba(239, 68, 68, 0.3); background: rgba(239, 68, 68, 0.1); color: #ef4444;">
295
+ <span class="status-dot" style="background: #ef4444;"></span>
296
+ ⚠️ Low Space ({model_status["free_gb"]}GB free)
297
+ </div>
298
+ '''
299
+ st.warning(f"Need {model_status['required_gb']}GB, only {model_status['free_gb']}GB free. Choose a smaller model or free disk space.")
300
+
301
+ # Render Status aligned to right
302
+ if status_html:
303
+ st.markdown(f'<div style="display: flex; justify-content: flex-end; margin-bottom: 20px;">{status_html}</div>', unsafe_allow_html=True)
304
+
305
+ # Download confirmation state
306
+ if 'download_approved' not in st.session_state:
307
+ st.session_state.download_approved = {}
308
+
309
+ # Show download confirmation ONLY if Local mode AND model not cached AND not yet approved
310
+ if not use_cloud and not model_status["is_cached"] and asr_model not in st.session_state.download_approved:
311
+ with st.container():
312
+ st.markdown("---")
313
+ st.markdown(f"### ⬇️ Download Required")
314
+ st.info(f"""**{asr_model}** is not cached locally.
315
+
316
+ 📦 Size: **{model_status['required_gb']}GB**
317
+ 💾 Free space: **{model_status['free_gb']}GB**
318
+ 📂 Cache location: `C:\\Users\\{os.environ.get('USERNAME', 'User')}\\.cache\\huggingface\\hub\\`
319
+
320
+ 💡 **Tip:** Switch to Cloud Mode to avoid downloading!
321
+ """)
322
+
323
+ col_yes, col_no = st.columns(2)
324
+ with col_yes:
325
+ if st.button("✅ Yes, Download", type="primary", use_container_width=True):
326
+ # UI Update: Show "Downloading" in the badge
327
+ status_placeholder.markdown('''
328
+ <div class="status-loading" style="border-color: rgba(59, 130, 246, 0.3); background: rgba(59, 130, 246, 0.1); color: #60a5fa;">
329
+ <span class="status-dot" style="background: #3b82f6; animation: pulse 0.5s infinite;"></span>
330
+ ⏳ Downloading...
331
+ </div>
332
+ ''', unsafe_allow_html=True)
333
+
334
+ with st.spinner(f"⬇️ Downloading {asr_model}... This may take a while."):
335
+ try:
336
+ # Trigger download and load into cache
337
+ load_asr_engine(asr_model, hf_token)
338
+ st.session_state.download_approved[asr_model] = True
339
+ st.session_state.model_ready = True
340
+ st.success("✅ Download complete! Model is ready.")
341
+ time.sleep(1)
342
+ st.rerun()
343
+ except Exception as e:
344
+ st.error(f"❌ Download failed: {e}")
345
+ with col_no:
346
+ if st.button("❌ Cancel", type="secondary", use_container_width=True):
347
+ st.info("Download cancelled. Select a cached model or use Cloud Mode.")
348
+
349
+ # Layout: Input (Left) vs Output (Right)
350
+ col1, col2 = st.columns([1, 2])
351
+
352
+ with col1:
353
+ # Voice Container
354
+ st.markdown('<div class="voice-container">', unsafe_allow_html=True)
355
+ st.markdown('<h3 style="color: #4facfe; margin: 0 0 10px 0;">Voice Input</h3>', unsafe_allow_html=True)
356
+
357
+ # Example Prompt Tagline
358
+ example_prompt = "Send me 50 strips of Paracetamol, 20 bottles of Ascoril syrup, and also 10 tubes of Betnovate cream."
359
+ st.markdown(f'''
360
+ <div style="background: rgba(79, 172, 254, 0.1); border: 1px dashed rgba(79, 172, 254, 0.4); border-radius: 8px; padding: 12px; margin-bottom: 16px;">
361
+ <span style="color: #4facfe; font-weight: 600; font-size: 0.75rem;">💡 TRY SAYING:</span>
362
+ <p style="color: rgba(255,255,255,0.9); font-style: italic; margin: 8px 0 0 0; font-size: 0.9rem; line-height: 1.5;">"{example_prompt}"</p>
363
+ </div>
364
+ ''', unsafe_allow_html=True)
365
+
366
+ st.markdown('<div class="mic-icon">🎙️</div>', unsafe_allow_html=True)
367
+
368
+ tab1, tab2 = st.tabs(["🔴 Record", "📁 Upload"])
369
+
370
+ audio_data = None
371
+
372
+ with tab1:
373
+ try:
374
+ audio_val_rec = st.audio_input("Click to record", label_visibility="collapsed")
375
+ if audio_val_rec:
376
+ audio_data = audio_val_rec
377
+ except AttributeError:
378
+ st.warning("Update Streamlit to use `st.audio_input`.")
379
+
380
+ with tab2:
381
+ audio_val_up = st.file_uploader("Upload Audio", type=['wav', 'mp3'], label_visibility="collapsed")
382
+ if audio_val_up:
383
+ audio_data = audio_val_up
384
+
385
+ st.markdown('</div>', unsafe_allow_html=True)
386
+
387
+ # Process Audio
388
+ if audio_data:
389
+ st.success("✅ Audio captured!")
390
+ st.audio(audio_data)
391
+
392
+ if st.button("🚀 Process Order", type="primary", use_container_width=True):
393
+ transcription_text = ""
394
+
395
+ if use_cloud:
396
+ # CLOUD MODE - Use HuggingFace Inference API (no download)
397
+ with st.spinner("☁️ Transcribing via Cloud..."):
398
+ try:
399
+ transcription_text = transcribe_cloud(audio_data, asr_model, hf_token)
400
+ st.toast("✅ Cloud Transcription Complete!")
401
+ except Exception as e:
402
+ st.error(f"❌ Cloud API failed: {e}")
403
+ st.info("💡 Try Local mode or check your token/model.")
404
+ st.stop()
405
+ else:
406
+ # LOCAL MODE - Download and run model locally
407
+ with st.spinner("🔄 Loading Local ASR Model..."):
408
+ try:
409
+ asr = load_asr_engine(asr_model, hf_token)
410
+ st.session_state.model_ready = True
411
+ except Exception as e:
412
+ st.error(f"❌ Model load failed: {e}")
413
+ st.stop()
414
+
415
+ with st.spinner("🎧 Transcribing Locally..."):
416
+ processed_audio = preprocessor.process(audio_data)
417
+ result = asr(processed_audio)
418
+ transcription_text = result["text"].replace("</s>", "").strip()
419
+ st.toast("✅ Local Transcription Complete!")
420
+
421
+ # Store transcription
422
+ st.session_state.last_transcription = transcription_text
423
+
424
+ with st.spinner("📦 Extracting Orders..."):
425
+ extracted_orders = extractor.extract(transcription_text)
426
+
427
+ if extracted_orders:
428
+ st.success(f"Found {len(extracted_orders)} items!")
429
+ for order in extracted_orders:
430
+ st.session_state.orders.append(order)
431
+ st.rerun()
432
+ else:
433
+ st.warning("No medicines found. Try: 'Send 20 strips of Augmentin'")
434
+
435
+ st.markdown("---")
436
+ st.markdown("### 📝 Transcription")
437
+
438
+ current_text = st.session_state.get('last_transcription', "")
439
+ st.text_area(
440
+ "Transcription Output",
441
+ current_text,
442
+ height=120,
443
+ disabled=True,
444
+ placeholder="Transcription will appear here...",
445
+ label_visibility="collapsed"
446
+ )
447
+
448
+ with col2:
449
+ st.markdown("### 🏭 Manufacturer Routing")
450
+
451
+ # Get grouped orders from session state
452
+ from simulation.order_queue import OrderQueue
453
+ queue = OrderQueue()
454
+ grouped_orders = queue.get_grouped_orders(db)
455
+ all_manufacturers = db.get_all_manufacturers()
456
+
457
+ # Grid Layout
458
+ import textwrap
459
+ row1_cols = st.columns(2)
460
+ row2_cols = st.columns(2)
461
+ row3_cols = st.columns(2)
462
+
463
+ # 6 Manufacturers -> 3 Rows of 2
464
+ for idx, mfr in enumerate(all_manufacturers):
465
+ if idx < 2:
466
+ col = row1_cols[idx]
467
+ elif idx < 4:
468
+ col = row2_cols[idx - 2]
469
+ elif idx < 6:
470
+ col = row3_cols[idx - 4]
471
+ else:
472
+ continue
473
+
474
+ with col:
475
+ mfr_name = mfr['name']
476
+ orders = grouped_orders.get(mfr_name, [])
477
+ order_count = len(orders)
478
+
479
+ # Determine Visual State
480
+ is_active = order_count > 0
481
+ active_class = "active" if is_active else ""
482
+ badge_class = "active" if is_active else ""
483
+
484
+ # Generate HTML - Single line to prevent Markdown parsing issues
485
+ html_parts = []
486
+
487
+ # 1. Header & Open Body
488
+ html_parts.append(f'<div class="node-card {active_class}">')
489
+ html_parts.append('<div class="node-header">')
490
+ html_parts.append(f'<span class="node-title"><span style="opacity:0.7">🏭</span> {mfr_name}</span>')
491
+ html_parts.append(f'<span class="node-badge {badge_class}">{order_count} Items</span>')
492
+ html_parts.append('</div><div class="node-body">')
493
+
494
+ # 2. Body Content
495
+ if is_active:
496
+ for order in orders:
497
+ # Confidence Logic
498
+ conf = order.get('confidence', 0)
499
+ conf_class = "conf-low"
500
+ if conf >= 90: conf_class = "conf-high"
501
+ elif conf >= 75: conf_class = "conf-med"
502
+
503
+ med_name = order.get('medicine_standardized', order['medicine'])
504
+ dosage = order.get('dosage', '-')
505
+
506
+ html_parts.append(f'<div class="order-chip {conf_class}">')
507
+ html_parts.append('<div class="chip-main">')
508
+ html_parts.append(f'<span class="chip-med">{med_name}</span>')
509
+ html_parts.append(f'<span class="chip-meta">{dosage}</span>')
510
+ html_parts.append('</div>')
511
+ html_parts.append(f'<span class="chip-qty">{order["quantity"]}</span>')
512
+ html_parts.append('</div>')
513
+ else:
514
+ html_parts.append('<div style="color: rgba(255,255,255,0.2); font-style: italic; font-size: 0.85rem; text-align: center; padding: 10px;">Waiting for data...</div>')
515
+
516
+ # 3. Close Body & Card
517
+ html_parts.append('</div></div>')
518
+
519
+ st.markdown("".join(html_parts), unsafe_allow_html=True)
520
+
521
+ # Unknown Orders (Quarantine Node)
522
+ unknowns = grouped_orders.get('Unknown', [])
523
+ if unknowns:
524
+ html_parts = []
525
+ html_parts.append('<div class="node-card active" style="border-color: rgba(255, 51, 102, 0.3); box-shadow: 0 0 20px rgba(255, 51, 102, 0.1);">')
526
+ html_parts.append('<div class="node-header">')
527
+ html_parts.append('<span class="node-title" style="color: #ff3366;"><span>⚠️</span> Quarantine / Unmapped</span>')
528
+ html_parts.append(f'<span class="node-badge" style="background: rgba(255, 51, 102, 0.1); color: #ff3366; border: 1px solid rgba(255, 51, 102, 0.2);">{len(unknowns)} Items</span>')
529
+ html_parts.append('</div><div class="node-body">')
530
+
531
+ for order in unknowns:
532
+ html_parts.append('<div class="order-chip conf-low">')
533
+ html_parts.append('<div class="chip-main">')
534
+ html_parts.append(f'<span class="chip-med" style="color: #ff3366;">{order["medicine"]} (Raw)</span>')
535
+ html_parts.append(f'<span class="chip-meta">Confidence: {order.get("confidence", 0)}%</span>')
536
+ html_parts.append('</div>')
537
+ html_parts.append(f'<span class="chip-qty">{order["quantity"]}</span>')
538
+ html_parts.append('</div>')
539
+
540
+ html_parts.append('</div></div>')
541
+ st.markdown("".join(html_parts), unsafe_allow_html=True)
542
+
543
+ st.markdown("---")
544
+
545
+ # Export Buttons
546
+ if st.session_state.orders:
547
+ from core.excel_exporter import ExcelExporter
548
+
549
+ col_excel, col_csv = st.columns(2)
550
+
551
+ with col_excel:
552
+ excel_data = ExcelExporter.export(st.session_state.orders, db=db)
553
+ st.download_button(
554
+ label="📥 Export to Excel",
555
+ data=excel_data,
556
+ file_name="pharma_orders.xlsx",
557
+ mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
558
+ use_container_width=True
559
+ )
560
+
561
+ with col_csv:
562
+ csv_data = ExcelExporter.export_csv(st.session_state.orders, db=db)
563
+ st.download_button(
564
+ label="📄 Export to CSV",
565
+ data=csv_data,
566
+ file_name="pharma_orders.csv",
567
+ mime="text/csv",
568
+ use_container_width=True
569
+ )
570
+
571
+ # --- Informational Footer (New Section) ---
572
+ # --- Informational Footer (New Section) ---
573
+ footer_html = []
574
+ footer_html.append('<div class="info-container">')
575
+ footer_html.append('<div class="info-grid">')
576
+
577
+ # 1. How to Use Section
578
+ footer_html.append('<div class="info-section">')
579
+ footer_html.append('<h4>💡 How to use it</h4>')
580
+ footer_html.append('<ul class="info-list">')
581
+ footer_html.append('<li class="info-item">')
582
+ footer_html.append('<span>🔹 <span class="info-highlight">Mixed Manufacturers:</span></span>')
583
+ footer_html.append('<span class="info-example">"Send Paracetamol tablet 300 strips, also Azithromycin 50 strips and Volini spray 20 pieces."</span>')
584
+ footer_html.append('</li>')
585
+ footer_html.append('<li class="info-item">')
586
+ footer_html.append('<span>🔹 <span class="info-highlight">Forms & Units:</span></span>')
587
+     footer_html.append('<span class="info-example">"Order 50 bottles of Ascoril syrup, 20 tubes of Betnovate cream, and 10 packs of Prega News."</span>')
+     footer_html.append('</li>')
+     footer_html.append('<li class="info-item">')
+     footer_html.append('<span>🔹 <span class="info-highlight">Pronunciation/Noisy:</span></span>')
+     footer_html.append('<span class="info-example">"Uh, give me some Combiflam... maybe 20 strips? And... Zinetac 150."</span>')
+     footer_html.append('</li>')
+     footer_html.append('</ul></div>')
+
+     # 2. Medical Areas Section
+     footer_html.append('<div class="info-section">')
+     footer_html.append('<h4>🏥 Medical Areas Covered</h4>')
+     footer_html.append('<ul class="info-list">')
+     footer_html.append('<li class="info-item">')
+     footer_html.append('<span>🍬 <span class="info-highlight">Syrups</span></span>')
+     footer_html.append('<span class="info-example">("50 bottles of Ascoril")</span>')
+     footer_html.append('</li>')
+     footer_html.append('<li class="info-item">')
+     footer_html.append('<span>🧴 <span class="info-highlight">Creams/Gels</span></span>')
+     footer_html.append('<span class="info-example">("20 tubes of Betnovate")</span>')
+     footer_html.append('</li>')
+     footer_html.append('<li class="info-item">')
+     footer_html.append('<span>💉 <span class="info-highlight">Injections</span></span>')
+     footer_html.append('<span class="info-example">("10 vials of Amikacin")</span>')
+     footer_html.append('</li>')
+     footer_html.append('<li class="info-item">')
+     footer_html.append('<span>💨 <span class="info-highlight">Sprays/Inhalers</span></span>')
+     footer_html.append('<span class="info-example">("5 pcs of Volini spray")</span>')
+     footer_html.append('</li>')
+     footer_html.append('<li class="info-item">')
+     footer_html.append('<span>💊 <span class="info-highlight">Tablets/Capsules</span></span>')
+     footer_html.append('<span class="info-example">("100 strips of Paracetamol")</span>')
+     footer_html.append('</li>')
+     footer_html.append('</ul></div>')
+
+     footer_html.append('</div></div>')
+
+     st.markdown("".join(footer_html), unsafe_allow_html=True)
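The footer above is assembled by appending fragments to a list and joining once at render time, which avoids repeated string concatenation. A minimal stdlib sketch of the same pattern (the function and CSS class names here are illustrative, not taken verbatim from the app):

```python
# Build an HTML fragment list, then join once when rendering.
def build_info_list(items):
    parts = ['<ul class="info-list">']
    for label, example in items:
        parts.append('<li class="info-item">')
        parts.append(f'<span class="info-highlight">{label}</span>')
        parts.append(f'<span class="info-example">{example}</span>')
        parts.append('</li>')
    parts.append('</ul>')
    return "".join(parts)

html = build_info_list([("Syrups", '"50 bottles of Ascoril"')])
```

The joined string can then be handed to `st.markdown(..., unsafe_allow_html=True)` exactly as the app does.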
assets/styles.css ADDED
@@ -0,0 +1,333 @@
+ /* =========================================
+    PHARMA MATRIX DESIGN SYSTEM
+    ========================================= */
+ :root {
+     --glass-bg: rgba(13, 17, 23, 0.7);
+     --glass-border: rgba(255, 255, 255, 0.08);
+     --neon-cyan: #00f2ea;
+     --neon-purple: #ff0099;
+     --success-green: #00f260;
+     --warning-yellow: #f1c40f;
+     --danger-red: #ff3366;
+     --text-primary: #e6edf3;
+     --text-secondary: #8b95a5;
+ }
+
+ .stApp {
+     background: radial-gradient(circle at 50% 10%, #1a1f2e 0%, #0a0f1a 100%);
+     font-family: 'Inter', sans-serif;
+ }
+
+ /* --- ACTIVE DATA NODE (Manufacturer Card) --- */
+ .node-card {
+     background: rgba(21, 26, 36, 0.6);
+     backdrop-filter: blur(12px);
+     -webkit-backdrop-filter: blur(12px);
+     border: 1px solid var(--glass-border);
+     border-radius: 16px;
+     padding: 20px;
+     margin-bottom: 20px;
+     transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1);
+     position: relative;
+     overflow: hidden;
+ }
+
+ .node-card.active {
+     border-color: rgba(0, 242, 234, 0.3);
+     box-shadow: 0 0 20px rgba(0, 242, 234, 0.05);
+ }
+
+ .node-card.active::before {
+     content: '';
+     position: absolute;
+     top: 0;
+     left: 0;
+     width: 100%;
+     height: 2px;
+     background: linear-gradient(90deg, var(--neon-cyan), var(--neon-purple));
+     animation: scanline 2s linear infinite;
+ }
+
+ .node-header {
+     display: flex;
+     justify-content: space-between;
+     align-items: center;
+     margin-bottom: 12px;
+     border-bottom: 1px solid rgba(255, 255, 255, 0.05);
+     padding-bottom: 10px;
+ }
+
+ .node-title {
+     font-size: 1.1rem;
+     font-weight: 700;
+     color: var(--text-primary);
+     letter-spacing: 0.5px;
+     display: flex;
+     align-items: center;
+     gap: 8px;
+ }
+
+ .node-badge {
+     font-size: 0.75rem;
+     background: rgba(255, 255, 255, 0.05);
+     padding: 4px 8px;
+     border-radius: 12px;
+     color: var(--text-secondary);
+ }
+
+ .node-badge.active {
+     background: rgba(0, 242, 234, 0.1);
+     color: var(--neon-cyan);
+     border: 1px solid rgba(0, 242, 234, 0.2);
+ }
+
+ /* --- ORDER CHIPS (Data Units) --- */
+ .order-chip {
+     display: flex;
+     justify-content: space-between;
+     align-items: center;
+     background: rgba(0, 0, 0, 0.3);
+     border-left: 3px solid #555;
+     padding: 10px 12px;
+     border-radius: 6px;
+     margin-bottom: 8px;
+     position: relative;
+ }
+
+ /* Confidence Levels */
+ .conf-high {
+     border-left-color: var(--success-green);
+ }
+
+ .conf-med {
+     border-left-color: var(--warning-yellow);
+ }
+
+ .conf-low {
+     border-left-color: var(--danger-red);
+ }
+
+ .chip-main {
+     display: flex;
+     flex-direction: column;
+ }
+
+ .chip-med {
+     font-size: 0.95rem;
+     font-weight: 600;
+     color: var(--text-primary);
+ }
+
+ .chip-meta {
+     font-size: 0.75rem;
+     color: var(--text-secondary);
+     display: flex;
+     gap: 8px;
+ }
+
+ .chip-qty {
+     color: var(--neon-cyan);
+     font-weight: 600;
+ }
+
+ /* --- ANIMATIONS --- */
+ @keyframes scanline {
+     0% {
+         transform: translateX(-100%);
+         opacity: 0;
+     }
+
+     50% {
+         opacity: 1;
+     }
+
+     100% {
+         transform: translateX(100%);
+         opacity: 0;
+     }
+ }
+
+ @keyframes pulse-ring {
+     0% {
+         box-shadow: 0 0 0 0 rgba(0, 242, 234, 0.4);
+     }
+
+     70% {
+         box-shadow: 0 0 0 10px rgba(0, 242, 234, 0);
+     }
+
+     100% {
+         box-shadow: 0 0 0 0 rgba(0, 242, 234, 0);
+     }
+ }
+
+ .pulse {
+     animation: pulse-ring 2s infinite;
+ }
+
+ /* Headers */
+ .main-header {
+     font-family: 'Inter', 'Helvetica Neue', sans-serif;
+     font-weight: 800;
+     font-size: 2.2rem;
+     background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
+     background-clip: text;
+     -webkit-background-clip: text;
+     -webkit-text-fill-color: transparent;
+     margin-bottom: 0;
+ }
+
+ .sub-header {
+     font-family: 'Inter', sans-serif;
+     color: #8b95a5;
+     font-size: 1rem;
+     margin-bottom: 1.5rem;
+ }
+
+ /* Status Indicators */
+ .status-ready {
+     display: inline-flex;
+     align-items: center;
+     gap: 8px;
+     background: rgba(16, 185, 129, 0.1);
+     border: 1px solid rgba(16, 185, 129, 0.3);
+     color: #10b981;
+     padding: 8px 16px;
+     border-radius: 20px;
+     font-size: 0.85rem;
+     font-weight: 500;
+ }
+
+ .status-loading {
+     display: inline-flex;
+     align-items: center;
+     gap: 8px;
+     background: rgba(251, 191, 36, 0.1);
+     border: 1px solid rgba(251, 191, 36, 0.3);
+     color: #fbbf24;
+     padding: 8px 16px;
+     border-radius: 20px;
+     font-size: 0.85rem;
+     font-weight: 500;
+ }
+
+ .status-dot {
+     width: 8px;
+     height: 8px;
+     border-radius: 50%;
+     animation: pulse 2s infinite;
+ }
+
+ .status-dot.green {
+     background: #10b981;
+ }
+
+ .status-dot.yellow {
+     background: #fbbf24;
+ }
+
+ @keyframes pulse {
+
+     0%,
+     100% {
+         opacity: 1;
+     }
+
+     50% {
+         opacity: 0.5;
+     }
+ }
+
+ .box-title {
+     font-size: 1rem;
+     font-weight: 600;
+     color: #4facfe;
+     margin-bottom: 12px;
+     padding-bottom: 8px;
+     border-bottom: 1px solid rgba(79, 172, 254, 0.2);
+ }
+
+ .order-item {
+     background: rgba(38, 44, 61, 0.8);
+     border-radius: 8px;
+     padding: 8px 12px;
+     margin-bottom: 6px;
+     font-size: 0.85rem;
+     display: flex;
+     justify-content: space-between;
+     align-items: center;
+ }
+
+ /* Voice Input Section */
+ .voice-container {
+     background: linear-gradient(135deg, rgba(79, 172, 254, 0.05) 0%, rgba(0, 242, 254, 0.05) 100%);
+     border: 1px solid rgba(79, 172, 254, 0.2);
+     border-radius: 16px;
+     padding: 24px;
+     text-align: center;
+ }
+
+ .mic-icon {
+     font-size: 3rem;
+     margin-bottom: 12px;
+ }
+
+ /* Hide Streamlit Branding */
+ #MainMenu {
+     visibility: hidden;
+ }
+
+ footer {
+     visibility: hidden;
+ }
+
+ /* --- INFO FOOTER --- */
+ .info-container {
+     margin-top: 40px;
+     padding: 24px;
+     background: rgba(13, 17, 23, 0.4);
+     border: 1px solid rgba(255, 255, 255, 0.05);
+     border-radius: 16px;
+     backdrop-filter: blur(10px);
+ }
+
+ .info-grid {
+     display: grid;
+     grid-template-columns: 1fr 1fr;
+     gap: 24px;
+ }
+
+ .info-section h4 {
+     color: var(--text-primary);
+     font-size: 1.1rem;
+     margin-bottom: 16px;
+     border-bottom: 2px solid var(--neon-cyan);
+     display: inline-block;
+     padding-bottom: 4px;
+ }
+
+ .info-list {
+     list-style: none;
+     padding: 0;
+     margin: 0;
+ }
+
+ .info-item {
+     margin-bottom: 12px;
+     color: var(--text-secondary);
+     font-size: 0.9rem;
+     display: flex;
+     align-items: start;
+     gap: 8px;
+ }
+
+ .info-highlight {
+     color: var(--neon-cyan);
+     font-weight: 600;
+ }
+
+ .info-example {
+     color: var(--text-secondary);
+     font-style: italic;
+     opacity: 0.8;
+ }
core/asr_engine.py ADDED
@@ -0,0 +1,46 @@
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ import streamlit as st
+
+
+ class ASREngine:
+     def __init__(self, model_id: str = "openai/whisper-tiny"):
+         self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+         self.model_id = model_id
+         self.pipe = self._load_model()
+
+     @st.cache_resource(show_spinner=False)
+     def _load_model(_self):
+         """Load model with caching to avoid reloading on every run.
+
+         The `_self` name (leading underscore) tells Streamlit not to hash the instance.
+         """
+         model = AutoModelForSpeechSeq2Seq.from_pretrained(
+             _self.model_id,
+             torch_dtype=_self.torch_dtype,
+             low_cpu_mem_usage=True,
+             use_safetensors=True
+         )
+         model.to(_self.device)
+
+         processor = AutoProcessor.from_pretrained(_self.model_id)
+
+         pipe = pipeline(
+             "automatic-speech-recognition",
+             model=model,
+             tokenizer=processor.tokenizer,
+             feature_extractor=processor.feature_extractor,
+             max_new_tokens=128,
+             chunk_length_s=30,
+             batch_size=16,
+             return_timestamps=True,
+             torch_dtype=_self.torch_dtype,
+             device=_self.device,
+         )
+         return pipe
+
+     def transcribe(self, audio_array) -> str:
+         """Transcribe an audio array or file path."""
+         try:
+             result = self.pipe(audio_array)
+             return result["text"]
+         except Exception as e:
+             return f"Error: {str(e)}"
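`_load_model` relies on `st.cache_resource` so the Whisper pipeline is built once per process rather than on every Streamlit rerun. The same idea in stdlib form, as a rough analogy using `functools.lru_cache` (the `build_pipeline` function and its return value are hypothetical stand-ins for the expensive model load):

```python
from functools import lru_cache

calls = {"n": 0}  # count how many times the "expensive" load actually runs

@lru_cache(maxsize=1)
def build_pipeline(model_id: str):
    # Stand-in for downloading weights and constructing the ASR pipeline.
    calls["n"] += 1
    return f"pipeline<{model_id}>"

p1 = build_pipeline("openai/whisper-tiny")
p2 = build_pipeline("openai/whisper-tiny")  # served from cache, no second load
```

`st.cache_resource` adds Streamlit-specific behavior on top of this (sharing across sessions, skipping unhashable arguments like `_self`), but the caching intent is the same.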
core/entity_extractor.py ADDED
@@ -0,0 +1,162 @@
+ import re
+ import json
+ from typing import List, Dict
+ from pathlib import Path
+
+ from rapidfuzz import process, fuzz
+
+ from simulation.manufacturer_db import ManufacturerDB
+
+
+ class EntityExtractor:
+     def __init__(self, db: ManufacturerDB):
+         self.db = db
+         self.aliases = self._load_aliases()
+
+         # Form keywords that indicate a medicine nearby
+         self.form_keywords = {
+             'tablet': ['tablet', 'tab', 'tabs', 'capsule', 'cap', 'caps'],
+             'syrup': ['syrup', 'liquid', 'suspension'],
+             'injection': ['injection', 'inj', 'vial', 'ampoule'],
+             'cream': ['cream', 'gel', 'ointment', 'tube'],
+             'spray': ['spray', 'inhaler', 'puff'],
+             'drops': ['drops', 'eye drops', 'ear drops'],
+             'sachet': ['sachet', 'powder', 'granules']
+         }
+
+         # Unit keywords for quantity extraction
+         self.unit_keywords = ['strips', 'strip', 'slips', 'slip', 'bottles', 'bottle',
+                               'tablets', 'tabs', 'pieces', 'pcs', 'boxes', 'box',
+                               'packs', 'pack', 'vials', 'vial', 'ampoules']
+
+         # Spoken number mapping
+         self.spoken_numbers = {
+             'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
+             'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10,
+             'eleven': 11, 'twelve': 12, 'fifteen': 15, 'twenty': 20,
+             'twenty-five': 25, 'thirty': 30, 'forty': 40, 'fifty': 50,
+             'sixty': 60, 'seventy': 70, 'eighty': 80, 'ninety': 90,
+             'hundred': 100, 'two hundred': 200, 'three hundred': 300,
+             'five hundred': 500, 'thousand': 1000
+         }
+
+     def _load_aliases(self) -> Dict:
+         """Load pronunciation aliases from JSON file."""
+         alias_path = Path("data/aliases.json")
+         if alias_path.exists():
+             with open(alias_path, 'r') as f:
+                 return json.load(f)
+         return {}
+
+     def _normalize_text(self, text: str) -> str:
+         """Normalize input text for parsing."""
+         text = text.lower()
+         # Remove common ASR artifacts
+         text = re.sub(r'</s>|<unk>|<s>', '', text)
+         # Remove filler words
+         text = re.sub(r'\b(uh|um|like|maybe|please|kindly)\b', '', text)
+         # Normalize punctuation
+         text = text.replace(",", " , ").replace(".", " ")
+         # Convert spoken numbers to digits
+         for word, num in self.spoken_numbers.items():
+             text = re.sub(rf'\b{word}\b', str(num), text)
+         return text.strip()
+
+     def _resolve_alias(self, word: str) -> str:
+         """Check if word is an alias for a known medicine."""
+         word_lower = word.lower()
+         for canonical, aliases in self.aliases.items():
+             if word_lower in aliases or word_lower == canonical:
+                 return canonical
+         return word
+
+     def _extract_form(self, segment: str) -> str:
+         """Extract form type from segment."""
+         segment_lower = segment.lower()
+         for form_type, keywords in self.form_keywords.items():
+             for kw in keywords:
+                 if kw in segment_lower:
+                     return form_type
+         return "tablet"  # Default
+
+     def _extract_quantity(self, segment: str) -> tuple:
+         """Extract quantity and unit from segment."""
+         # Pattern: number optionally followed by a unit word,
+         # e.g. "300 strips", "20 bottles"
+         qty_pattern = r'(\d+)\s*(' + '|'.join(self.unit_keywords) + r')?'
+         match = re.search(qty_pattern, segment, re.IGNORECASE)
+
+         if match:
+             num = match.group(1)
+             unit = match.group(2) if match.group(2) else "units"
+             # Normalize common ASR mishearings
+             if unit in ['slips', 'slip']:
+                 unit = 'strips'
+             return num, unit
+
+         return "1", "units"  # Default
+
+     def _extract_dosage(self, segment: str) -> str:
+         """Extract dosage from segment."""
+         # Pattern: number followed by mg/ml/gm/mcg
+         dosage_match = re.search(r'(\d+)\s*(mg|ml|gm|mcg)', segment, re.IGNORECASE)
+         if dosage_match:
+             return f"{dosage_match.group(1)}{dosage_match.group(2)}"
+         return "-"
+
+     def extract(self, text: str) -> List[Dict]:
+         """
+         Extract medicine entities from text.
+         Returns: List of dicts {'medicine': str, 'form': str, 'quantity': str, 'dosage': str}
+         """
+         if not text:
+             return []
+
+         # Normalize text
+         text = self._normalize_text(text)
+
+         found_orders = []
+
+         # Get all known medicines from DB for matching
+         known_meds = self.db.medicines['medicine_name'].tolist()
+
+         # Split by multiple delimiters for multi-item orders
+         # Handles: "send", "add", "want", "need", "order", "also", "plus", "then", "and", comma
+         delimiters = r'\b(?:send|add|want|need|order|also|plus|then)\b|,|\band\b'
+         segments = re.split(delimiters, text)
+
+         for segment in segments:
+             segment = segment.strip()
+             if not segment or len(segment) < 3:
+                 continue
+
+             # Try to find a medicine match in this segment:
+             # first, replace any word that is a known pronunciation alias
+             words = segment.split()
+             resolved_segment = ' '.join([self._resolve_alias(w) for w in words])
+
+             # Fuzzy match against known medicines
+             match = process.extractOne(resolved_segment, known_meds, scorer=fuzz.partial_ratio)
+
+             if match and match[1] > 75:  # Confidence threshold
+                 med_name = match[0]
+
+                 # Extract form, quantity, dosage
+                 form = self._extract_form(segment)
+                 num, unit = self._extract_quantity(segment)
+                 quantity = f"{num} {unit}"
+
+                 dosage = self._extract_dosage(segment)
+                 if dosage == "-":
+                     # Look up default dosage from DB
+                     med_row = self.db.medicines[self.db.medicines['medicine_name'] == med_name].iloc[0]
+                     dosage = med_row['dosage']
+
+                 found_orders.append({
+                     "medicine": med_name,
+                     "form": form,
+                     "quantity": quantity,
+                     "dosage": dosage,
+                     "confidence": match[1],
+                     "original_segment": segment.strip()
+                 })
+
+         return found_orders
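The normalization and quantity steps above boil down to two regex passes: map spoken numbers to digits, then pair the digit with a following unit word. A condensed, self-contained sketch of that logic (the small `SPOKEN`/`UNITS` tables are trimmed-down illustrations of the full dictionaries in the class):

```python
import re

SPOKEN = {"twenty": 20, "fifty": 50, "hundred": 100}
UNITS = ["strips", "strip", "bottles", "bottle", "packs", "pack"]

def normalize(text: str) -> str:
    """Lowercase and replace spoken numbers with digits."""
    text = text.lower()
    for word, num in SPOKEN.items():
        text = re.sub(rf"\b{word}\b", str(num), text)
    return text

def extract_quantity(segment: str) -> tuple:
    """Find a digit optionally followed by a unit word."""
    m = re.search(r"(\d+)\s*(" + "|".join(UNITS) + r")?", segment)
    if m:
        return m.group(1), m.group(2) or "units"
    return "1", "units"

seg = normalize("send fifty bottles of ascoril")
qty = extract_quantity(seg)  # ("50", "bottles")
```

Note that listing `"strips"` before `"strip"` in the alternation matters: the regex engine tries alternatives left to right, so the longer form wins.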
core/excel_exporter.py ADDED
@@ -0,0 +1,81 @@
+ import io
+
+ import pandas as pd
+ from openpyxl.utils import get_column_letter
+
+
+ class ExcelExporter:
+     @staticmethod
+     def _prepare_dataframe(orders: list, db=None) -> pd.DataFrame:
+         """Helper to prepare enriched DataFrame."""
+         if not orders:
+             return pd.DataFrame()
+
+         # Enrich data if DB is provided
+         enriched_orders = []
+         for order in orders:
+             row = order.copy()
+             if db:
+                 mfr_info = db.get_manufacturer_by_medicine(order['medicine'])
+                 if mfr_info:
+                     row['Manufacturer'] = mfr_info['name']
+                     row['Standardized Medicine'] = mfr_info['medicine_match']
+                 else:
+                     row['Manufacturer'] = "Unknown"
+                     row['Standardized Medicine'] = "-"
+             enriched_orders.append(row)
+
+         df = pd.DataFrame(enriched_orders)
+
+         # Rename columns for better readability
+         column_map = {
+             "medicine": "Medicine Name (Extracted)",
+             "quantity": "Quantity",
+             "dosage": "Dosage",
+             "original_segment": "Raw Voice Segment",
+             "Manufacturer": "Manufacturer",
+             "Standardized Medicine": "Standardized Name"
+         }
+         df = df.rename(columns=column_map)
+
+         # Reorder columns if possible
+         desired_order = [
+             "Manufacturer",
+             "Standardized Name",
+             "Medicine Name (Extracted)",
+             "Quantity",
+             "Dosage",
+             "Raw Voice Segment"
+         ]
+
+         cols_to_keep = [c for c in desired_order if c in df.columns]
+         remaining = [c for c in df.columns if c not in cols_to_keep]
+
+         return df[cols_to_keep + remaining]
+
+     @staticmethod
+     def export(orders: list, db=None) -> bytes:
+         """Convert list of order dicts to Excel bytes (None if there are no orders)."""
+         df = ExcelExporter._prepare_dataframe(orders, db)
+         if df.empty:
+             return None
+
+         output = io.BytesIO()
+         with pd.ExcelWriter(output, engine='openpyxl') as writer:
+             df.to_excel(writer, index=False, sheet_name='Orders')
+
+             # Auto-adjust column widths
+             worksheet = writer.sheets['Orders']
+             for idx, col in enumerate(df.columns):
+                 max_len = max(
+                     df[col].astype(str).map(len).max(),
+                     len(col)
+                 ) + 2
+                 # get_column_letter handles sheets wider than 26 columns
+                 # (plain chr(65 + idx) arithmetic only covers A-Z)
+                 worksheet.column_dimensions[get_column_letter(idx + 1)].width = min(max_len, 50)
+
+         return output.getvalue()
+
+     @staticmethod
+     def export_csv(orders: list, db=None) -> bytes:
+         """Convert list of order dicts to UTF-8 encoded CSV bytes."""
+         df = ExcelExporter._prepare_dataframe(orders, db)
+         if df.empty:
+             return b""
+         return df.to_csv(index=False).encode('utf-8')
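`export_csv` returns UTF-8 bytes so the result can be handed straight to a download button. A stdlib-only sketch of the same idea, without pandas (the `orders_to_csv_bytes` helper and its field layout are illustrative, not part of the app):

```python
import csv
import io

def orders_to_csv_bytes(orders: list) -> bytes:
    """Serialize a list of order dicts to UTF-8 CSV bytes."""
    if not orders:
        return b""
    buf = io.StringIO()
    # Use the first order's keys as the header row.
    writer = csv.DictWriter(buf, fieldnames=list(orders[0].keys()))
    writer.writeheader()
    writer.writerows(orders)
    return buf.getvalue().encode("utf-8")

data = orders_to_csv_bytes([{"medicine": "Crocin", "quantity": "20 strips"}])
```

Returning bytes (rather than `str`) matters because download handlers and HTTP responses ultimately ship raw bytes, and encoding once at the boundary keeps the rest of the code in text space.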
core/preprocessor.py ADDED
@@ -0,0 +1,35 @@
+ import librosa
+ import noisereduce as nr
+ import numpy as np
+
+
+ class AudioPreprocessor:
+     def __init__(self, target_sr: int = 16000):
+         self.target_sr = target_sr
+
+     def process(self, audio_file) -> np.ndarray:
+         """
+         Process audio file (path or bytes) for ASR.
+         Returns: 16kHz mono audio array.
+         """
+         # Load audio (handles both paths and file-like objects)
+         try:
+             audio, sr = librosa.load(audio_file, sr=self.target_sr, mono=True)
+         except Exception as e:
+             # Fallback for file-like objects if librosa fails directly
+             if hasattr(audio_file, 'read'):
+                 audio_file.seek(0)
+                 audio, sr = librosa.load(audio_file, sr=self.target_sr, mono=True)
+             else:
+                 raise e
+
+         # Noise reduction (spectral gating); only apply if the clip is
+         # long enough (> 0.5 s) to estimate a noise profile
+         if len(audio) > self.target_sr * 0.5:
+             audio = nr.reduce_noise(y=audio, sr=self.target_sr, stationary=True)
+
+         # Peak normalization
+         audio = librosa.util.normalize(audio)
+
+         return audio
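`librosa.util.normalize` peak-normalizes the waveform so the loudest sample sits at ±1.0 before it reaches the ASR model. The operation itself is simple; a pure-Python sketch of peak normalization (list-based for illustration, whereas the app works on NumPy arrays):

```python
def peak_normalize(samples):
    """Scale samples so the largest absolute value becomes 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    return [s / peak for s in samples]

out = peak_normalize([0.1, -0.5, 0.25])
```

The zero-peak guard matters: an all-silent buffer would otherwise trigger a division by zero.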
data/aliases.json ADDED
@@ -0,0 +1,83 @@
+ {
+     "paracetamol": [
+         "paraacetamole",
+         "parcetamol",
+         "paracetmal",
+         "paracetmol"
+     ],
+     "metformin": [
+         "metformine",
+         "metforman",
+         "metphormin"
+     ],
+     "augmentin": [
+         "augmentine",
+         "agmentin",
+         "augmuntin"
+     ],
+     "azithromycin": [
+         "azithromicin",
+         "azithro",
+         "azith"
+     ],
+     "cetirizine": [
+         "cetirizin",
+         "cetrizine",
+         "cetriz"
+     ],
+     "pantoprazole": [
+         "pantoprazol"
+     ],
+     "omeprazole": [
+         "omeprazol"
+     ],
+     "calpol": [
+         "calpole",
+         "calpool"
+     ],
+     "combiflam": [
+         "combiflem",
+         "combiflame",
+         "combiflm"
+     ],
+     "volini": [
+         "volinee",
+         "voliny"
+     ],
+     "ascoril": [
+         "ascorill",
+         "ascoryl"
+     ],
+     "dolo": [
+         "dollo"
+     ],
+     "clavam": [
+         "clavame",
+         "clavaum"
+     ],
+     "nise": [
+         "nice",
+         "nisee"
+     ],
+     "telekast": [
+         "telekastl"
+     ],
+     "zinetac": [
+         "zinetack",
+         "zynetac"
+     ],
+     "manforce": [
+         "manforse"
+     ],
+     "unwanted": [
+         "unwantd",
+         "unwanteed"
+     ],
+     "moxikind": [
+         "moxykind"
+     ],
+     "crocin": [
+         "crosin",
+         "crokeen"
+     ]
+ }
data/manufacturers.csv ADDED
@@ -0,0 +1,7 @@
+ id,name,code
+ mfr_001,Sun Pharma,SUN
+ mfr_002,Cipla,CIP
+ mfr_003,GlaxoSmithKline,GSK
+ mfr_004,Dr. Reddy's,RED
+ mfr_005,Lupin,LUP
+ mfr_006,Mankind,MAN
data/medicines.csv ADDED
@@ -0,0 +1,37 @@
+ medicine_name,dosage,unit,manufacturer_id
+ Augmentin,625mg,strips,mfr_003
+ Calpol,500mg,strips,mfr_003
+ Crocin,650mg,strips,mfr_001
+ Volini,Spray,pcs,mfr_001
+ Azithromycin,500mg,strips,mfr_002
+ Cetirizine,10mg,strips,mfr_002
+ Omez,20mg,strips,mfr_004
+ Metformin,500mg,strips,mfr_004
+ Pantop,40mg,strips,mfr_001
+ Dolo,650mg,strips,mfr_005
+ Manforce,50mg,tablets,mfr_006
+ Unwanted-72,1.5mg,tablets,mfr_006
+ Telekast-L,10mg,strips,mfr_005
+ Combiflam,400mg,strips,mfr_001
+ Ascoril,Syrup,bottles,mfr_002
+ Zinetac,150mg,strips,mfr_003
+ Nise,100mg,strips,mfr_004
+ Clavam,625mg,strips,mfr_005
+ Moxikind-CV,625mg,strips,mfr_006
+ Pan-40,40mg,strips,mfr_005
+ Revital,Capsule,bottles,mfr_001
+ Foracort,Inhaler,pcs,mfr_002
+ Asthalin,Inhaler,pcs,mfr_002
+ Betnovate,Cream,tube,mfr_003
+ Stamlo,5mg,strips,mfr_004
+ Gluconorm,500mg,strips,mfr_005
+ Prega News,Kit,pack,mfr_006
+ Gas-O-Fast,Sachet,pack,mfr_006
+ Becosules,Capsule,strips,mfr_003
+ Shelcal,500mg,strips,mfr_004
+ Allegra,120mg,strips,mfr_002
+ Sinarest,Tablet,strips,mfr_006
+ Meftal-Spas,Tablet,strips,mfr_006
+ Omnigel,Gel,tube,mfr_002
+ Digene,Gel,bottle,mfr_001
+ Paracetamol,500mg,strips,mfr_001
docs/DEPLOYMENT_GUIDE.md ADDED
@@ -0,0 +1,190 @@
+ # Pharma Voice Orders - Deployment Guide
+
+ This guide explains how to deploy and configure the application for production use with large AI models.
+
+ ---
+
+ ## 🚀 Deployment Options Comparison
+
+ | Feature | Streamlit Cloud (Deploy Button) | Hugging Face Spaces |
+ |---------|--------------------------------|---------------------|
+ | **Ease of Use** | ⭐⭐⭐⭐⭐ One-click | ⭐⭐⭐⭐ Simple |
+ | **Free Tier** | 1GB RAM, limited | 16GB RAM (with GPU upgrade) |
+ | **GPU Support** | ❌ No | ✅ Yes (paid: T4, A10G) |
+ | **Large Models (Whisper Medium+)** | ⚠️ May timeout | ✅ Works well |
+ | **Privacy/Secrets** | ✅ Secrets Manager | ✅ Secrets Manager |
+ | **Best For** | Quick demos (tiny model) | Production + Large Models |
+
+ ---
+
+ ## 📱 Option 1: Streamlit Cloud (The "Deploy" Button)
+
+ The **Deploy** button in the local Streamlit UI deploys directly to **Streamlit Community Cloud**.
+
+ ### How It Works:
+ 1. Click **Deploy** → **Streamlit Community Cloud**
+ 2. Connect your GitHub account
+ 3. Select your repository and branch
+ 4. Streamlit Cloud builds and hosts your app
+
+ ### ⚠️ Limitations for Your Use Case:
+ - **1GB RAM limit** on free tier → Whisper Medium (3GB) will **fail**
+ - **No GPU** → Slow inference
+ - **Good for**: Demo with `whisper-tiny` only
+
+ ### Setup:
+ ```bash
+ # Push your code to GitHub first
+ git add .
+ git commit -m "Deploy to Streamlit Cloud"
+ git push origin main
+ ```
+ Then click **Deploy** in the Streamlit UI.
+
+ ---
+
+ ## ☁️ Option 2: Hugging Face Spaces (Recommended)
+
+ **Best for**: Large models (Whisper Medium, Large, Google Med SR) with an HF token.
+
+ ### Step-by-Step Deployment:
+
+ #### 1. Create a Hugging Face Space
+ 1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
+ 2. Click **Create new Space**
+ 3. Select:
+    - **SDK**: Streamlit
+    - **Hardware**: CPU Basic (free) or upgrade for GPU
+    - **Visibility**: Public or Private
+
+ #### 2. Create `app.py` (Already Done ✅)
+
+ #### 3. Create `requirements.txt` for HF Spaces
+ Create a file **specifically for Spaces** (different from the local setup):
+
+ ```txt
+ streamlit
+ pandas
+ openpyxl
+ torch
+ transformers
+ librosa
+ noisereduce
+ soundfile
+ rapidfuzz
+ jiwer
+ regex
+ webrtcvad
+ numpy<2
+ huggingface_hub
+ ```
+
+ #### 4. Add Your HF Token as a Secret
+ 1. Go to your Space → **Settings** → **Repository secrets**
+ 2. Add a new secret:
+    - **Name**: `HF_TOKEN`
+    - **Value**: Your Hugging Face read token (from [hf.co/settings/tokens](https://huggingface.co/settings/tokens))
+
+ #### 5. Update Code to Use Token
+ In `core/asr_engine.py`, the model will automatically use `HF_TOKEN`:
+
+ ```python
+ import os
+ from huggingface_hub import login
+
+ # Auto-login with Space secret
+ token = os.environ.get("HF_TOKEN")
+ if token:
+     login(token=token)
+ ```
+
+ #### 6. Push Code to the Space
+ ```bash
+ # Clone your Space
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/pharma-voice-orders
+ cd pharma-voice-orders
+
+ # Copy your files
+ cp -r /path/to/your/local/project/* .
+
+ # Push
+ git add .
+ git commit -m "Initial deployment"
+ git push
+ ```
+
+ ---
+
+ ## 🔑 Using Gated Models (Google Med SR, etc.)
+
+ Some models require you to accept terms on the model page before use.
+
+ ### Steps:
+ 1. Visit the model page (e.g., `google/med-sr-model`)
+ 2. Click **Agree and access model**
+ 3. Add your `HF_TOKEN` to the Space secrets (as shown above)
+ 4. Update your code to specify the model ID:
+
+ ```python
+ # In core/asr_engine.py
+ model_id = "google/med-speech-recognition"  # Example
+ ```
+
+ ---
+
+ ## 🎯 Recommended Strategy for Your Project
+
+ | Phase | Platform | Model | Why |
+ |-------|----------|-------|-----|
+ | **Development** | Local (`uv run start`) | `whisper-tiny` | Fast iteration |
+ | **University Demo** | Hugging Face Spaces (Free CPU) | `whisper-small` | Balance of quality + speed |
+ | **Production Demo** | HF Spaces + GPU (T4) | `whisper-medium` or Google Med SR | Best quality |
+
+ ---
+
+ ## 🔄 Pre-Caching Models (Avoid First-Run Download)
+
+ To make the model load instantly for visitors, add a **pre-download script** in your Space.
+
+ Create `preload.py`:
+ ```python
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
+
+ # Pre-download during build
+ model_id = "openai/whisper-medium"
+ AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
+ AutoProcessor.from_pretrained(model_id)
+ print("Model pre-cached!")
+ ```
+
+ Then add to your Space's `README.md`:
+ ```yaml
+ ---
+ title: Pharma Voice Orders
+ sdk: streamlit
+ sdk_version: 1.53.0
+ app_file: app.py
+ pinned: false
+ preload: preload.py
+ ---
+ ```
+
+ ---
+
+ ## 📁 Final File Structure for HF Spaces
+
+ ```
+ pharma-voice-orders/
+ ├── app.py               # Main Streamlit app
+ ├── requirements.txt     # Python dependencies
+ ├── preload.py           # Model pre-download script
+ ├── README.md            # Space metadata (YAML frontmatter)
+ ├── core/                # Your modules
+ ├── simulation/
+ ├── evaluation/
+ └── data/
+ ```
+
+ ---
+
+ *Last Updated: January 2026*
docs/GETTING_STARTED.md ADDED
@@ -0,0 +1,79 @@
+ # Pharma Voice Orders - Getting Started
+
+ This document explains how to set up and run the **Pharma Voice Orders** application.
+
+ ---
+
+ ## 📋 Prerequisites
+
+ - **Python** 3.12+
+ - **[uv](https://github.com/astral-sh/uv)** (Modern Python package manager)
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ### 1. Install Dependencies
+ ```bash
+ cd pharma-voice-orders
+ uv sync
+ ```
+
+ ### 2. Run the Application
+ ```bash
+ uv run start
+ ```
+ This launches the Streamlit app at `http://localhost:8501`.
+
+ ---
+
+ ## 📦 Available Commands
+
+ ```bash
+ # Run the app
+ uv run start
+
+ # Add a new dependency
+ uv add <package-name>
+
+ # Sync dependencies (install/update)
+ uv sync
+
+ # Run streamlit directly (alternative)
+ uv run streamlit run app.py
+ ```
+
+ ---
+
+ ## 🔧 Project Structure
+
+ ```
+ pharma-voice-orders/
+ ├── app.py               # Main Streamlit entry point
+ ├── main.py              # Script wrapper (for `uv run start`)
+ ├── pyproject.toml       # Project config & dependencies
+ ├── core/                # Preprocessing, ASR, Entity Extraction, Export
+ ├── simulation/          # Manufacturer DB, Order Queue
+ ├── evaluation/          # Metrics (WER, Accuracy)
+ └── data/                # CSV files for medicines & manufacturers
+ ```
+
+ ---
+
+ ## ❓ Why Use `uv run`?
+
+ Using `uv run` ensures the command executes within the project's **isolated virtual environment** (`.venv`), avoiding conflicts with globally installed packages (like Anaconda). This is the recommended way to run Python projects managed by `uv`.
+
+ ---
+
+ ## 🧪 Testing Your Setup
+
+ After running `uv run start`:
+ 1. Open `http://localhost:8501` in your browser.
+ 2. Select a distributor from the sidebar.
+ 3. Record or upload an audio file (e.g., "Send 20 strips of Augmentin").
+ 4. Watch orders get routed to manufacturer boxes.
+
+ ---
+
+ *Last Updated: January 2026*
docs/HUGGINGFACE_SPACE_SETUP.md ADDED
@@ -0,0 +1,63 @@
+ # Hugging Face Spaces - Docker Setup Guide
+
+ ## 📋 Fill the Form (Screenshot Reference)
+
+ | Field | Suggested Value |
+ |-------|-----------------|
+ | **Owner** | `Khedhar` (your account) ✅ |
+ | **Space name** | `pharma-voice-orders` |
+ | **Short description** | `Voice-to-Order: Speech-to-text pharmaceutical ordering system using Whisper ASR` |
+ | **License** | `MIT` (or leave blank for now) |
+ | **Select the Space SDK** | 🐳 **Docker** |
+ | **Space hardware** | `CPU basic` (free) or `T4 GPU` for faster inference |
+ | **Visibility** | `Public` (for demo) or `Private` |
+
+ ---
+
+ ## 🐳 Why Docker?
+
+ 1. **Full control** over dependencies and environment
+ 2. **Pre-download models** during build (instant startup for users)
+ 3. **Consistent behavior** across local and cloud
+ 4. **Streamlit works perfectly** with Docker on HF Spaces
+
+ ---
+
+ ## 📁 Files You Need in Your Space
+
+ After creating the Space, you'll push these files:
+
+ ```
+ pharma-voice-orders/
+ ├── Dockerfile          # Build instructions
+ ├── requirements.txt    # Python packages
+ ├── app.py              # Your Streamlit app
+ ├── core/               # Your modules
+ ├── simulation/
+ ├── evaluation/
+ └── data/
+ ```
+
+ ---
+
+ ## ⚙️ Adding Secrets (HF Token)
+
+ After creating the Space:
+ 1. Go to **Settings** → **Repository secrets**
+ 2. Add:
+    - **Name**: `HF_TOKEN`
+    - **Value**: Your token from https://huggingface.co/settings/tokens
+
+ The app will automatically use this token for gated models.
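At runtime the app can read this secret from the environment; a minimal sketch (the helper name and fallback behaviour are assumptions, not the app's actual code — Spaces injects repository secrets as environment variables):

```python
import os

def get_hf_token(default=None):
    """Read the HF_TOKEN repository secret, which Spaces injects as an env var."""
    return os.environ.get("HF_TOKEN", default)

# Typical use: pass the token to model-loading code; None means public access only
token = get_hf_token()
```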
+
+ ---
+
+ ## 🚀 Next Steps
+
+ 1. Fill the form with the values above
+ 2. Click **Create Space**
+ 3. Clone the Space repo to your local machine
+ 4. Copy all project files
+ 5. Push to the Space
+
+ The `Dockerfile` in this repo handles the build, and `app.py` surfaces loading-status indicators.
evaluation/metrics.py ADDED
@@ -0,0 +1,33 @@
+ import jiwer
+ from rapidfuzz import fuzz
+
+ class MetricsEvaluator:
+     @staticmethod
+     def calculate_wer(reference: str, hypothesis: str) -> float:
+         """Calculate Word Error Rate (0.0 = perfect; higher is worse)."""
+         if not reference or not hypothesis:
+             return 1.0
+         return jiwer.wer(reference, hypothesis)
+
+     @staticmethod
+     def calculate_entity_accuracy(expected_entities: list, extracted_entities: list) -> float:
+         """
+         Calculate accuracy of extracted entities vs ground truth.
+         Simple logic: (matches / total_expected)
+         """
+         if not expected_entities:
+             return 0.0
+
+         matches = 0
+         for exp in expected_entities:
+             # Check if this expected medicine was found in the extracted list
+             found = any(
+                 fuzz.ratio(exp['medicine'].lower(), ext['medicine'].lower()) > 85
+                 for ext in extracted_entities
+             )
+             if found:
+                 matches += 1
+
+         return matches / len(expected_entities)
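For intuition, the WER that `jiwer` computes is (substitutions + insertions + deletions) divided by the number of reference words; a stdlib-only sketch of the same quantity via word-level edit distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Classic dynamic-programming edit distance, over words instead of characters
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, hearing "send 20 slips of augmentin" against the reference "send 20 strips of augmentin" is one substitution over five words, i.e. a WER of 0.2.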
main.py ADDED
@@ -0,0 +1,11 @@
+ import sys
+ import subprocess
+
+ def main():
+     """Entry point for the `uv run start` script."""
+     # Run streamlit as a module so it uses the current interpreter's environment
+     cmd = [sys.executable, "-m", "streamlit", "run", "app.py"]
+     result = subprocess.run(cmd)
+     sys.exit(result.returncode)
+
+ if __name__ == "__main__":
+     main()
prompts/asr_prompt_guide.md ADDED
@@ -0,0 +1,145 @@
+ # ASR Prompt Engineering Guide for Pharma Voice Orders
+
+ > **Purpose**: Define expected voice order formats, sample patterns, and entity schema to improve transcription accuracy and structured data extraction.
+
+ ---
+
+ ## Expected Order Format
+
+ The ASR model should recognize **medicine orders** in the following patterns:
+
+ ### Pattern 1: Medicine First
+ ```
+ <Medicine Name> <Form> <Quantity> <Unit>
+ ```
+ **Example**: "Paracetamol tablet 300 strips"
+
+ ### Pattern 2: Form First
+ ```
+ <Form> <Medicine Name> <Quantity> <Unit>
+ ```
+ **Example**: "Tablet Paracetamol 300 strips"
+
+ ### Pattern 3: Quantity First
+ ```
+ <Quantity> <Unit> <Medicine Name> [<Dosage>]
+ ```
+ **Example**: "20 strips Augmentin 625"
+
+ ### Pattern 4: Comma-Separated List
+ ```
+ <Order1>, <Order2>, <Order3>
+ ```
+ **Example**: "Paracetamol 100 strips, Metformin 50 strips, Crocin 30 strips"
+
+ ### Pattern 5: Connector Words
+ ```
+ <Order1> and/also/plus/then <Order2>
+ ```
+ **Example**: "Send Paracetamol 100 also Metformin 50"
+
+ ---
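As an illustration (not the project's actual extractor, which lives in `core/`), Patterns 1 and 3 could be matched with plain regular expressions; the group names and keyword alternations below are assumptions:

```python
import re
from typing import Optional

# Hypothetical regexes for two of the patterns above (illustration only)
PATTERN_MEDICINE_FIRST = re.compile(
    r"(?P<medicine>[A-Za-z]+)\s+(?P<form>tablet|syrup|injection|cream)\s+"
    r"(?P<qty>\d+)\s+(?P<unit>strips?|bottles?|vials?)",
    re.IGNORECASE,
)
PATTERN_QUANTITY_FIRST = re.compile(
    r"(?P<qty>\d+)\s+(?P<unit>strips?|bottles?|vials?)\s+(?P<medicine>[A-Za-z]+)"
    r"(?:\s+(?P<dosage>\d+))?",
    re.IGNORECASE,
)

def parse_order(text: str) -> Optional[dict]:
    """Try each pattern in turn; return the named groups of the first match."""
    for pattern in (PATTERN_MEDICINE_FIRST, PATTERN_QUANTITY_FIRST):
        m = pattern.search(text)
        if m:
            return m.groupdict()
    return None
```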
+
+ ## Entity Schema
+
+ Each extracted order should contain:
+
+ | Field | Type | Description | Example |
+ |-------|------|-------------|---------|
+ | `medicine` | string | Medicine name (as heard) | "Paracetamol" |
+ | `form` | string | Tablet, Syrup, Injection, etc. | "tablet" |
+ | `quantity` | string | Number + Unit | "300 strips" |
+ | `dosage` | string | Strength (mg, ml, etc.) | "500mg" |
+
+ ---
+
+ ## Sample Voice Orders (for Testing)
+
+ ### Simple Orders
+ 1. "Send 20 strips of Augmentin 625"
+ 2. "Paracetamol tablet 100 strips"
+ 3. "Tablet Metformin 500mg 50 strips"
+ 4. "Order Crocin 650 30 strips"
+ 5. "50 bottles of Ascoril syrup"
+ 6. "20 tubes of Betnovate cream"
+ 7. "10 vials of Amikacin injection"
+
+ ### Multi-Item Orders
+ 1. "Paracetamol 100 strips, Metformin 50 strips, Crocin 30 strips"
+ 2. "Send Augmentin 20 strips also Calpol 15 strips and Dolo 10 strips"
+ 3. "I need 50 Azithromycin, 30 Cetirizine, and 20 Omez"
+
+ ### Complex/Noisy Orders
+ 1. "Uh, send me Paracetamol, maybe 100? And also some Metformin"
+ 2. "Tablet Paraacetamole 300 slips" (mishearing of "Paracetamol"; "slips" instead of "strips")
+ 3. "Give me twenty strips of Aug-mentin six two five"
+
+ ---
+
+ ## Form Keywords
+
+ The model should recognize these form indicators:
+
+ | Form Type | Keywords |
+ |-----------|----------|
+ | Tablet | tablet, tab, tabs, capsule, cap, caps |
+ | Syrup | syrup, liquid, bottle, suspension |
+ | Injection | injection, inj, vial, ampoule |
+ | Cream/Gel | cream, gel, ointment, tube |
+ | Spray | spray, inhaler, puff |
+ | Drops | drops, eye drops, ear drops |
+ | Sachet | sachet, powder, granules |
+
+ ---
+
+ ## Unit Keywords
+
+ | Unit Type | Keywords |
+ |-----------|----------|
+ | Strips | strips, strip, slips, slip |
+ | Bottles | bottles, bottle, btl |
+ | Tablets | tablets, tabs, pieces, pcs |
+ | Boxes | boxes, box, packs, pack |
+ | Vials | vials, vial, ampoules |
+
+ ---
+
+ ## Common Pronunciation Variations
+
+ | Correct Name | Common Variations |
+ |--------------|-------------------|
+ | Paracetamol | paraacetamole, parcetamol, paracetmal |
+ | Metformin | metformine, metforman, metphormin |
+ | Augmentin | augmentine, agmentin, augmuntin |
+ | Azithromycin | azithromicin, azithro, azith |
+ | Cetirizine | cetirizin, cetrizine, cetriz |
+ | Pantoprazole | pantop, pantoprazol |
+
+ ---
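These variations can be resolved with a string-similarity score. The project uses `rapidfuzz`; a stdlib-only sketch with `difflib` behaves similarly (the 0.8 cutoff and the medicine list below are assumptions):

```python
import difflib
from typing import List, Optional

KNOWN_MEDICINES = ["Paracetamol", "Metformin", "Augmentin", "Azithromycin", "Cetirizine"]

def closest_medicine(heard: str, known: List[str] = KNOWN_MEDICINES,
                     cutoff: float = 0.8) -> Optional[str]:
    """Return the known medicine most similar to the heard token, or None."""
    lowered = {k.lower(): k for k in known}
    matches = difflib.get_close_matches(heard.lower(), list(lowered), n=1, cutoff=cutoff)
    # Map the lowercase match back to its original casing
    return lowered[matches[0]] if matches else None
```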
+
+ ## Structured Output Target
+
+ After processing, each order should be structured as follows (`medicine_standardized` is the canonical name matched from the DB):
+
+ ```json
+ {
+   "medicine": "Paracetamol",
+   "medicine_standardized": "Crocin",
+   "form": "tablet",
+   "quantity": "300 strips",
+   "dosage": "650mg",
+   "manufacturer": "Sun Pharma",
+   "original_segment": "Paracetamol tablet 300 strips"
+ }
+ ```
+
+ ---
+
+ ## Tips for Model Training
+
+ 1. **Normalize Numbers**: Convert "twenty" → 20, "hundred" → 100
+ 2. **Handle Filler Words**: Ignore "uh", "um", "like", "maybe"
+ 3. **Fuzzy Match Medicine Names**: Use an 80%+ confidence threshold
+ 4. **Default Values**: If no unit is specified, use the DB default
+ 5. **Case Insensitive**: Lowercase everything before matching
+
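Tip 1 can be sketched with a small word-to-digit map (the vocabulary below is an assumption; compound numbers like "one hundred twenty" would need extra handling):

```python
# Hypothetical spoken-number vocabulary (extend as needed)
WORD_TO_NUM = {
    "one": 1, "two": 2, "three": 3, "five": 5, "ten": 10,
    "twenty": 20, "thirty": 30, "fifty": 50, "hundred": 100,
}

def normalize_numbers(text: str) -> str:
    """Replace spelled-out number words with digits, leaving other tokens alone."""
    return " ".join(str(WORD_TO_NUM.get(tok.lower(), tok)) for tok in text.split())
```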
pyproject.toml ADDED
@@ -0,0 +1,34 @@
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [project]
+ name = "pharma-voice-orders"
+ version = "1.0.0"
+ description = "Voice-to-Order: Streamlit app for pharmaceutical distributors using Speech-to-Text (Whisper) and entity extraction."
+ readme = "README.md"
+ requires-python = ">=3.12"
+ dependencies = [
+     "jiwer>=4.0.0",
+     "librosa>=0.11.0",
+     "noisereduce>=3.0.3",
+     "numpy<2",
+     "openpyxl>=3.1.5",
+     "pandas>=2.3.3",
+     "rapidfuzz>=3.14.3",
+     "regex>=2026.1.15",
+     "soundfile>=0.13.1",
+     "streamlit>=1.53.0",
+     "torch>=2.9.1",
+     "transformers>=4.57.6",
+     "webrtcvad>=2.0.10",
+ ]
+
+ [project.scripts]
+ start = "main:main"
+
+ [tool.hatch.build.targets.wheel]
+ packages = ["core", "simulation", "evaluation"]
+
+ [tool.uv.sources]
+ transformers = { git = "https://github.com/huggingface/transformers.git", rev = "65dc261512cbdb1ee72b88ae5b222f2605aad8e5" }
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ streamlit
+ pandas
+ openpyxl
+ torch
+ transformers
+ librosa
+ noisereduce
+ soundfile
+ rapidfuzz
+ jiwer
+ regex
+ webrtcvad
+ numpy<2.0.0
+ huggingface_hub
simulation/manufacturer_db.py ADDED
@@ -0,0 +1,98 @@
+ import pandas as pd
+ import json
+ from pathlib import Path
+ from typing import Optional
+ from rapidfuzz import process, fuzz
+
+ class ManufacturerDB:
+     def __init__(self, data_dir: str = "data"):
+         self.data_dir = Path(data_dir)
+         self.manufacturers = self._load_manufacturers()
+         self.medicines = self._load_medicines()
+         self.aliases = self._load_aliases()
+
+     def _load_manufacturers(self) -> pd.DataFrame:
+         path = self.data_dir / "manufacturers.csv"
+         if not path.exists():
+             return pd.DataFrame(columns=["id", "name", "code"])
+         return pd.read_csv(path)
+
+     def _load_medicines(self) -> pd.DataFrame:
+         path = self.data_dir / "medicines.csv"
+         if not path.exists():
+             return pd.DataFrame(columns=["medicine_name", "dosage", "unit", "manufacturer_id"])
+         return pd.read_csv(path)
+
+     def _load_aliases(self) -> dict:
+         """Load pronunciation aliases from a JSON file."""
+         path = self.data_dir / "aliases.json"
+         if path.exists():
+             with open(path, "r") as f:
+                 return json.load(f)
+         return {}
+
+     def _resolve_alias(self, name: str) -> str:
+         """Map a heard name to its canonical medicine name if it is a known alias."""
+         name_lower = name.lower()
+         for canonical, aliases in self.aliases.items():
+             if name_lower == canonical or name_lower in aliases:
+                 return canonical
+         return name
+
+     def get_all_manufacturers(self) -> list:
+         """Return the manufacturers as a list of dicts."""
+         return self.manufacturers.to_dict("records")
+
+     def get_manufacturer_by_medicine(self, medicine_name: str) -> Optional[dict]:
+         """
+         Find the manufacturer for a given medicine name using fuzzy matching.
+         Returns a manufacturer dict, or None if no confident match is found.
+         """
+         # Resolve potential alias first
+         resolved_name = self._resolve_alias(medicine_name)
+
+         # Fuzzy match against the list of known medicines
+         known_meds = self.medicines["medicine_name"].tolist()
+         match = process.extractOne(resolved_name, known_meds, scorer=fuzz.WRatio)
+
+         if not match or match[1] < 75:  # Confidence threshold
+             return None
+
+         dataset_med_name = match[0]
+
+         # Look up the manufacturer via the matched medicine row
+         med_row = self.medicines[self.medicines["medicine_name"] == dataset_med_name].iloc[0]
+         mfr_id = med_row["manufacturer_id"]
+         mfr_row = self.manufacturers[self.manufacturers["id"] == mfr_id].iloc[0]
+
+         return {
+             "id": mfr_id,
+             "name": mfr_row["name"],
+             "medicine_match": dataset_med_name,  # Standardized name
+             "confidence": match[1],
+         }
+
+     def get_orders_by_manufacturer(self, current_orders: list) -> dict:
+         """
+         Group a list of extracted orders by manufacturer.
+         Returns: { "Sun Pharma": [orders...], "Cipla": [orders...], "Unknown": [...] }
+         """
+         grouped = {mfr: [] for mfr in self.manufacturers["name"].tolist()}
+         grouped["Unknown"] = []  # For unmapped medicines
+
+         for order in current_orders:
+             med_name = order.get("medicine")
+             mfr_info = self.get_manufacturer_by_medicine(med_name)
+
+             if mfr_info:
+                 # Update the order with the standardized name
+                 order["medicine_standardized"] = mfr_info["medicine_match"]
+                 grouped[mfr_info["name"]].append(order)
+             else:
+                 grouped["Unknown"].append(order)
+
+         return grouped
+
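Stripped of the pandas and rapidfuzz machinery, the routing in `get_orders_by_manufacturer` reduces to a dictionary lookup per order; a self-contained sketch with a hypothetical in-memory mapping (the real data lives in `data/medicines.csv`, and the manufacturer names here are placeholders):

```python
# Hypothetical medicine → manufacturer mapping (illustration only)
MEDICINE_TO_MANUFACTURER = {
    "paracetamol": "Sun Pharma",
    "metformin": "USV",
    "augmentin": "GSK",
}

def group_orders(orders: list) -> dict:
    """Route each order into its manufacturer's bucket; unmatched go to 'Unknown'."""
    grouped = {mfr: [] for mfr in set(MEDICINE_TO_MANUFACTURER.values())}
    grouped["Unknown"] = []
    for order in orders:
        mfr = MEDICINE_TO_MANUFACTURER.get(order["medicine"].lower(), "Unknown")
        grouped[mfr].append(order)
    return grouped
```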
simulation/order_queue.py ADDED
@@ -0,0 +1,26 @@
+ from typing import List, Dict
+ import streamlit as st
+ from .manufacturer_db import ManufacturerDB
+
+ class OrderQueue:
+     def __init__(self):
+         # Initialize session state for orders if it does not exist yet
+         if "orders" not in st.session_state:
+             st.session_state.orders = []
+
+     def add_order(self, order: Dict):
+         """
+         Add a new order to the queue.
+         The order dict should contain: {'medicine': str, 'quantity': str, 'dosage': str}
+         """
+         st.session_state.orders.append(order)
+
+     def get_all_orders(self) -> List[Dict]:
+         return st.session_state.orders
+
+     def clear_queue(self):
+         st.session_state.orders = []
+
+     def get_grouped_orders(self, db: ManufacturerDB) -> Dict[str, List[Dict]]:
+         """Group all current orders by manufacturer."""
+         return db.get_orders_by_manufacturer(st.session_state.orders)
uv.lock ADDED
The diff for this file is too large to render. See raw diff