github-actions[bot] committed on
Commit
041edd5
·
1 Parent(s): 19363d6

chore: sync from GitHub 2026-03-05 15:48:45 UTC

Browse files
models.py CHANGED
@@ -29,7 +29,7 @@ def get_image_pipeline():
29
  print("Loading image model (quantized)...")
30
  _image_pipeline = pipeline(
31
  task="zero-shot-image-classification",
32
- model="openai/clip-vit-base-patch32",
33
  torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
34
  )
35
  # Apply dynamic quantization on CPU to reduce memory ~2-3x
 
29
  print("Loading image model (quantized)...")
30
  _image_pipeline = pipeline(
31
  task="zero-shot-image-classification",
32
+ model="google/siglip-base-patch16-224",
33
  torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
34
  )
35
  # Apply dynamic quantization on CPU to reduce memory ~2-3x
recommenders/content_based.py CHANGED
@@ -2,10 +2,9 @@
2
  Content-Based Recommender (Embedding-Based)
3
  ============================================
4
 
5
- Uses the 'products_recommend' ChromaDB collection (title + description + tags)
6
  and sentence-transformer embeddings to build user profiles and recommend
7
- similar products. Descriptions are kept here (unlike search) so that
8
- related accessories/peripherals surface as cross-sell recommendations.
9
 
10
  How it works:
11
  1. Gather all user interactions grouped by user_id from 4 tables:
 
2
  Content-Based Recommender (Embedding-Based)
3
  ============================================
4
 
5
+ Uses the ChromaDB 'products' collection (title + description + tags)
6
  and sentence-transformer embeddings to build user profiles and recommend
7
+ similar products.
 
8
 
9
  How it works:
10
  1. Gather all user interactions grouped by user_id from 4 tables:
smart_search/Documentation.md ADDED
@@ -0,0 +1,183 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Smart Search — Technical Documentation
2
+
3
+ Multi-modal product search system supporting **text**, **image**, and **audio** queries. Uses a two-stage pipeline: tag-based filtering followed by semantic similarity search.
4
+
5
+ ---
6
+
7
+ ## Architecture
8
+
9
+ ```
10
+ routes.py → FastAPI endpoints (text / image / audio / product details)
11
+ smart_search.py → Core search logic (tag filter, semantic search, data helpers)
12
+ categories.txt → Category labels for zero-shot image classification
13
+ whisper_finetuned_ct2/ → Fine-tuned Faster-Whisper model for Arabic/English audio
14
+ utils.py → Shared helpers (Supabase clients, ChromaDB, vector DB management)
15
+ models.py → Lazy-loaded ML model singletons (embedder, SigLIP, Whisper)
16
+ ```
17
+
18
+ ---
19
+
20
+ ## Search Pipeline
21
+
22
+ ```
23
+ User Query (text / image / audio)
24
+
25
+ ├── [Image] Zero-shot SigLIP classification → predicted category label
26
+ ├── [Audio] Faster-Whisper transcription → text caption
27
+ └── [Text] Used directly
28
+
29
+
30
+ ┌─────────────────────────────────────────────────┐
31
+ │ Stage 1 — Tag Filter (Supabase) │
32
+ │ Query: WHERE tags && ['token1', 'token2', ...] │
33
+ │ Returns: list of product IDs │
34
+ │ │
35
+ │ • Tokenizes query into individual words │
36
+ │ • Uses Supabase .overlaps() on the tags column │
37
+ │ • Hard-filters to only categorically relevant │
38
+ │ products, eliminating cross-category bleed │
39
+ └─────────────────────┬───────────────────────────┘
40
+
41
+ Has matches?
42
+ ╱ ╲
43
+ Yes No (fallback)
44
+ │ │
45
+ ▼ ▼
46
+ ┌───────────────────┐ ┌───────────────────────┐
47
+ │ Scoped Semantic │ │ Global Semantic Search │
48
+ │ Search (ChromaDB) │ │ (ChromaDB, full k) │
49
+ │ filter={id: $in} │ │ No filter applied │
50
+ │ k=min(top_k, n) │ │ │
51
+ └─────────┬─────────┘ └──────────┬──────────────┘
52
+ │ │
53
+ └───────────┬───────────┘
54
+
55
+
56
+ Ranked Results
57
+ (product_ids, titles, distances)
58
+ ```
59
+
60
+ ### Why Two Stages?
61
+
62
+ Pure semantic search on `title + description + tags` embeddings causes **cross-category bleed** — a "smartphone" query returns phone cases and chargers because their descriptions mention "smartphone". The tag filter eliminates this:
63
+
64
+ | Query | Without Tag Filter | With Tag Filter |
65
+ |-------|--------------------|-----------------|
66
+ | "smartphone" | Smartphones, phone cases, chargers, screen protectors | Only actual smartphones |
67
+ | "laptop bag" | Laptop bags, laptops, backpacks | Only products tagged "laptop bag" |
68
+
69
+ The **fallback** ensures specific brand queries like "Samsung Galaxy A15 ceramic white" still work — if no tags match, global semantic search handles it.
70
+
71
+ ---
72
+
73
+ ## Search Modalities
74
+
75
+ ### 1. Text Search (`POST /search/text`)
76
+
77
+ Direct text-to-product search.
78
+
79
+ | Parameter | Type | Default | Description |
80
+ |-----------|------|---------|-------------|
81
+ | `query` | string | required | Search query text |
82
+ | `top_k` | int | 100 | Max results to return |
83
+
84
+ ### 2. Image Search (`POST /search/image`)
85
+
86
+ Zero-shot image classification → text search pipeline.
87
+
88
+ 1. User uploads an image
89
+ 2. SigLIP model (`google/siglip-base-patch16-224`) classifies it against category labels from `categories.txt`
90
+ 3. The predicted category becomes the text query for the search pipeline
91
+
92
+ | Parameter | Type | Default | Description |
93
+ |-----------|------|---------|-------------|
94
+ | `image` | file | required | Product image (JPEG/PNG) |
95
+ | `top_k` | int | 100 | Max results to return |
96
+
97
+ **Response** includes `predicted_category` and `confidence_score` alongside results.
98
+
99
+ ### 3. Audio Search (`POST /search/audio`)
100
+
101
+ Speech-to-text → text search pipeline.
102
+
103
+ 1. User uploads an audio clip
104
+ 2. Fine-tuned Faster-Whisper model transcribes it (supports Arabic and English)
105
+ 3. The transcription becomes the text query for the search pipeline
106
+
107
+ | Parameter | Type | Default | Description |
108
+ |-----------|------|---------|-------------|
109
+ | `audio` | file | required | Audio file (WAV/MP3/etc.) |
110
+ | `language` | string | "en" | Language code ("en" or "ar") |
111
+ | `top_k` | int | 100 | Max results to return |
112
+
113
+ **Response** includes `caption` (transcription) alongside results.
114
+
115
+ ---
116
+
117
+ ## Vector Database
118
+
119
+ ### Embedding Model
120
+
121
+ - **Model**: `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` (384-dim)
122
+ - **Storage**: ChromaDB (persisted at `src/chroma_db/`)
123
+ - **Collection**: `products` — each document is `title + description + tags`
124
+
125
+ ### Document Content
126
+
127
+ Each product is embedded as:
128
+ ```
129
+ "{title} {description} {tag1 tag2 tag3 ...}"
130
+ ```
131
+
132
+ Metadata stored per document:
133
+ | Field | Description |
134
+ |-------|-------------|
135
+ | `id` | Product UUID |
136
+ | `title` | Product title |
137
+ | `tags` | Space-separated tags string |
138
+
139
+ ### Adding Products
140
+
141
+ Products are indexed in two ways:
142
+
143
+ 1. **Bulk at startup** — `update_vectordb()` in `app.py` syncs all Supabase products to ChromaDB on server start. Only new products (not already indexed) are added.
144
+
145
+ 2. **Single product via API** — `POST /vectordb/add?product_id=<uuid>` adds one product's embedding without restarting the server. Useful when a new product is created.
146
+
147
+ ---
148
+
149
+ ## Other Endpoints
150
+
151
+ ### Random Products (`GET /products/random`)
152
+
153
+ Returns products for initial display before the user searches.
154
+
155
+ | Parameter | Type | Default | Description |
156
+ |-----------|------|---------|-------------|
157
+ | `limit` | int | 20 | Number of products |
158
+
159
+ ### Product Details (`GET /product/{product_id}`)
160
+
161
+ Returns full product info: title, description, price, old_price, sku, stock, seller name, and all images.
162
+
163
+ ---
164
+
165
+ ## Database Tables Used
166
+
167
+ | Table | Fields Used | Purpose |
168
+ |-------|-------------|---------|
169
+ | `products` | id, title, description, tags, price, old_price, sku, stock, store_id, status | Product catalog + tag filter |
170
+ | `product_images` | product_id, url | Product images |
171
+ | `stores` | id, name | Store/seller name |
172
+
173
+ ---
174
+
175
+ ## Models Used
176
+
177
+ | Component | Model | Size | Purpose |
178
+ |-----------|-------|------|---------|
179
+ | Text Embeddings | `paraphrase-multilingual-MiniLM-L12-v2` | ~120 MB | Semantic similarity (384-dim vectors) |
180
+ | Image Classification | `google/siglip-base-patch16-224` | ~600 MB | Zero-shot image → category |
181
+ | Speech-to-Text | Fine-tuned Faster-Whisper (CTranslate2) | ~150 MB | Arabic/English audio transcription |
182
+
183
+ All models run on CPU with no GPU requirement.
smart_search/routes.py CHANGED
@@ -3,7 +3,8 @@ import uvicorn
3
  from PIL import Image
4
  from fastapi import FastAPI, UploadFile, File
5
  from models import IMAGE_PIPELINE, AUDIO_MODEL
6
- from utils import similarity_search, load_categories, load_audio_bytes_ffmpeg, get_product_images, get_product_prices, get_product_details, get_random_products
 
7
 
8
  def register_search_routes(app: FastAPI):
9
 
@@ -106,3 +107,10 @@ def register_search_routes(app: FastAPI):
106
  if product:
107
  return product
108
  return {"error": "Product not found"}
 
 
 
 
 
 
 
 
3
  from PIL import Image
4
  from fastapi import FastAPI, UploadFile, File
5
  from models import IMAGE_PIPELINE, AUDIO_MODEL
6
+ from utils import get_product_images, add_product_to_vectordb, get_product_prices, get_product_details, get_random_products, load_categories, load_audio_bytes_ffmpeg
7
+ from smart_search.smart_search import similarity_search
8
 
9
  def register_search_routes(app: FastAPI):
10
 
 
107
  if product:
108
  return product
109
  return {"error": "Product not found"}
110
+
111
+ ## Add Product Embedding (called when a new product is created)
112
+ @app.post('/vectordb/add')
113
+ def add_product_embedding(product_id: str):
114
+ """Add a single product's embedding to ChromaDB without restarting the server."""
115
+ result = add_product_to_vectordb(product_id)
116
+ return result
smart_search/smart_search.py ADDED
@@ -0,0 +1,76 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Smart Search — Core Search Functions
3
+ ======================================
4
+
5
+ Two-stage search pipeline:
6
+ 1. Tag filter (Supabase) — restrict to products whose tags overlap with query tokens.
7
+ 2. Semantic search (ChromaDB) — vector similarity within the filtered set.
8
+ Fallback: if no tag matches, run unrestricted semantic search.
9
+ """
10
+
11
+ from utils import supabase, get_vector_db
12
+
13
+
14
+ # ═══════════════════════ Search Pipeline ════════════════════════
15
+
16
+ def _tag_search(query_tokens: list) -> list:
17
+ """
18
+ Stage 1 — Tag filter.
19
+ Query Supabase for products whose tags array overlaps with any query token.
20
+ Returns a list of matching product IDs, or [] if none / on error.
21
+ """
22
+ if supabase is None or not query_tokens:
23
+ return []
24
+ try:
25
+ response = (
26
+ supabase.table("products")
27
+ .select("id")
28
+ .overlaps("tags", query_tokens)
29
+ .execute()
30
+ )
31
+ return [row["id"] for row in response.data]
32
+ except Exception as e:
33
+ print(f"⚠️ Tag search failed, falling back to pure semantic: {e}")
34
+ return []
35
+
36
+
37
+ def similarity_search(query, top_k):
38
+ """
39
+ Two-stage search pipeline:
40
+ 1. Tag filter — restrict to products whose tags overlap with query tokens
41
+ 2. Semantic search — vector similarity within the filtered set
42
+ 3. Gap fill — if tag-filtered results < top_k, pad with global semantic results
43
+ Fallback: if no tag matches at all, run unrestricted semantic search.
44
+ """
45
+ query_tokens = [t.lower() for t in query.split()]
46
+ tag_filtered_ids = _tag_search(query_tokens)
47
+
48
+ if tag_filtered_ids:
49
+ # Semantic search scoped to tag-matched products only
50
+ k = min(top_k, len(tag_filtered_ids))
51
+ where_filter = {"id": {"$in": tag_filtered_ids}}
52
+ primary_results = get_vector_db().similarity_search_with_score(
53
+ query, k=k, filter=where_filter
54
+ )
55
+
56
+ # Gap fill: if we got fewer than top_k, pad with global semantic results
57
+ if len(primary_results) < top_k:
58
+ gap = top_k - len(primary_results)
59
+ seen_ids = {doc.metadata['id'] for doc, _ in primary_results}
60
+ fallback_results = get_vector_db().similarity_search_with_score(query, k=top_k)
61
+ extras = [
62
+ (doc, dist) for doc, dist in fallback_results
63
+ if doc.metadata['id'] not in seen_ids
64
+ ][:gap]
65
+ relevant_products = primary_results + extras
66
+ else:
67
+ relevant_products = primary_results
68
+
69
+ else:
70
+ # Fallback: no tag matches (e.g. brand-only query) → global semantic search
71
+ relevant_products = get_vector_db().similarity_search_with_score(query, k=top_k)
72
+
73
+ product_ids = [doc.metadata['id'] for doc, _ in relevant_products]
74
+ titles = [doc.metadata['title'] for doc, _ in relevant_products]
75
+ distances = [dist for _, dist in relevant_products]
76
+ return product_ids, titles, distances
utils.py CHANGED
@@ -31,38 +31,22 @@ if not SUPABASE_URL or not SUPABASE_SERVICE_KEY:
31
  else:
32
  supabase_service: Client = create_client(SUPABASE_URL, SUPABASE_SERVICE_KEY)
33
 
34
- ## Loading the Vector Databases (lazy — created on first use)
35
  CHROMA_DB_PATH = str(BASE_DIR / "chroma_db")
36
- _search_db = None # title + tags only (precise search)
37
- _recommend_db = None # title + description + tags (cross-sell recommendations)
38
 
39
- def get_search_db():
40
- """ChromaDB collection for search — title + tags only."""
41
- global _search_db
42
- if _search_db is None:
43
- from models import get_embedder
44
- _search_db = Chroma(
45
- collection_name='products_search',
46
- embedding_function=get_embedder(),
47
- persist_directory=CHROMA_DB_PATH
48
- )
49
- return _search_db
50
-
51
- def get_recommend_db():
52
- """ChromaDB collection for recommendations — title + description + tags."""
53
- global _recommend_db
54
- if _recommend_db is None:
55
  from models import get_embedder
56
- _recommend_db = Chroma(
57
- collection_name='products_recommend',
58
  embedding_function=get_embedder(),
59
  persist_directory=CHROMA_DB_PATH
60
  )
61
- return _recommend_db
62
 
63
- # Backward-compat alias used by content_based recommender
64
- def get_vector_db():
65
- return get_recommend_db()
66
 
67
  def update_vectordb():
68
 
@@ -73,95 +57,61 @@ def update_vectordb():
73
  print("Fetching products from Supabase...")
74
  products = supabase.table("products").select("id, title, description, tags").execute().data
75
 
76
- # --- Determine which products are already indexed in each collection ---
77
- search_existing = {m["id"] for m in get_search_db().get(include=["metadatas"])["metadatas"]}
78
- recommend_existing = {m["id"] for m in get_recommend_db().get(include=["metadatas"])["metadatas"]}
79
 
80
- search_contents, search_metas = [], []
81
- recommend_contents, recommend_metas = [], []
82
 
83
  for product in products:
84
  pid = product['id']
85
- tags = product.get('tags') or []
86
- tags_str = ' '.join(tags)
87
- title = product.get('title') or ''
88
- description = product.get('description') or ''
89
-
90
- meta = {"id": pid, "title": title, "tags": tags_str}
91
-
92
- # Search collection: title + tags only (precise)
93
- if pid not in search_existing:
94
- search_contents.append(f"{title} {tags_str}")
95
- search_metas.append(meta)
96
-
97
- # Recommend collection: title + description + tags (cross-sell)
98
- if pid not in recommend_existing:
99
- recommend_contents.append(f"{title} {description} {tags_str}")
100
- recommend_metas.append(meta)
101
-
102
- # --- Persist search collection ---
103
- if search_contents:
104
- get_search_db().add_texts(texts=search_contents, metadatas=search_metas)
105
- get_search_db().persist()
106
- print(f"✅ Added {len(search_contents)} products to search collection")
107
  else:
108
- print("✅ Search collection is up to date")
109
-
110
- # --- Persist recommend collection ---
111
- if recommend_contents:
112
- get_recommend_db().add_texts(texts=recommend_contents, metadatas=recommend_metas)
113
- get_recommend_db().persist()
114
- print(f"✅ Added {len(recommend_contents)} products to recommend collection")
115
- else:
116
- print("✅ Recommend collection is up to date")
117
 
118
 
119
- def _tag_search(query_tokens: list) -> list:
120
  """
121
- Stage 1 Tag filter.
122
- Query Supabase for products whose tags array overlaps with any query token.
123
- Returns a list of matching product IDs, or [] if none / on error.
124
  """
125
- if supabase is None or not query_tokens:
126
- return []
127
- try:
128
- response = (
129
- supabase.table("products")
130
- .select("id")
131
- .overlaps("tags", query_tokens)
132
- .execute()
133
- )
134
- return [row["id"] for row in response.data]
135
- except Exception as e:
136
- print(f"⚠️ Tag search failed, falling back to pure semantic: {e}")
137
- return []
138
 
 
 
 
 
139
 
140
- def similarity_search(query, top_k):
141
- """
142
- Two-stage search pipeline:
143
- 1. Tag filter — restrict to products whose tags overlap with query tokens
144
- 2. Semantic search — vector similarity within the filtered set
145
- Fallback: if no tag matches, run unrestricted semantic search.
146
- """
147
- query_tokens = [t.lower() for t in query.split()]
148
- tag_filtered_ids = _tag_search(query_tokens)
149
-
150
- if tag_filtered_ids:
151
- # Semantic search scoped to tag-matched products only
152
- k = min(top_k, len(tag_filtered_ids))
153
- where_filter = {"id": {"$in": tag_filtered_ids}}
154
- relevant_products = get_search_db().similarity_search_with_score(
155
- query, k=k, filter=where_filter
156
- )
157
- else:
158
- # Fallback: no tag matches (e.g. brand-only query) → global semantic search
159
- relevant_products = get_search_db().similarity_search_with_score(query, k=top_k)
160
 
161
- product_ids = [doc.metadata['id'] for doc, _ in relevant_products]
162
- titles = [doc.metadata['title'] for doc, _ in relevant_products]
163
- distances = [dist for _, dist in relevant_products]
164
- return product_ids, titles, distances
165
 
166
 
167
  def get_product_images(product_ids: list) -> dict:
@@ -198,19 +148,19 @@ def get_product_prices(product_ids: list) -> dict:
198
  """
199
  if not product_ids:
200
  return {}
201
-
202
  try:
203
  response = supabase.table("products").select("id, price").in_("id", list(product_ids)).execute()
204
-
205
  prices_map = {}
206
  for row in response.data:
207
  pid = row.get("id")
208
  price = row.get("price")
209
  if pid:
210
  prices_map[pid] = price
211
-
212
  return prices_map
213
-
214
  except Exception as e:
215
  print(f"Error fetching product prices: {e}")
216
  return {}
@@ -222,18 +172,17 @@ def get_product_details(product_id: str) -> dict:
222
  Returns product info including title, description, price, old_price, sku, stock, store name, etc.
223
  """
224
  try:
225
- # Get product details
226
  response = supabase.table("products").select("*").eq("id", product_id).execute()
227
-
228
  if not response.data:
229
  return None
230
-
231
  product = response.data[0]
232
-
233
  # Get product images
234
  images_response = supabase.table("product_images").select("url").eq("product_id", product_id).execute()
235
  images = [img.get("url") for img in images_response.data if img.get("url")]
236
-
237
  # Get store name
238
  store_name = None
239
  store_id = product.get("store_id")
@@ -241,7 +190,7 @@ def get_product_details(product_id: str) -> dict:
241
  store_response = supabase.table("stores").select("name").eq("id", store_id).execute()
242
  if store_response.data:
243
  store_name = store_response.data[0].get("name")
244
-
245
  return {
246
  "id": product.get("id"),
247
  "title": product.get("title"),
@@ -253,7 +202,7 @@ def get_product_details(product_id: str) -> dict:
253
  "sold_by": store_name,
254
  "images": images,
255
  }
256
-
257
  except Exception as e:
258
  print(f"Error fetching product details: {e}")
259
  return None
@@ -265,18 +214,15 @@ def get_random_products(limit: int = 10) -> list:
265
  Returns a list of products with id, title, price, and image_url.
266
  """
267
  try:
268
- # Get first N products
269
  response = supabase.table("products").select("id, title, price").limit(limit).execute()
270
-
271
  if not response.data:
272
  return []
273
-
274
  products = response.data
275
  product_ids = [p.get("id") for p in products]
276
-
277
- # Get images for these products
278
  images_map = get_product_images(product_ids)
279
-
280
  return [
281
  {
282
  "id": p.get("id"),
@@ -286,26 +232,25 @@ def get_random_products(limit: int = 10) -> list:
286
  }
287
  for p in products
288
  ]
289
-
290
  except Exception as e:
291
  print(f"Error fetching random products: {e}")
292
  return []
293
 
294
 
295
- def load_categories(file_name = None):
 
296
  if file_name is None:
297
- file_name = str(BASE_DIR / "smart_search" / "categories.txt")
298
  try:
299
  with open(file_name, 'r') as file:
300
  return [line.strip() for line in file.readlines() if line.strip()]
301
-
302
  except FileNotFoundError:
303
  print("Categories.txt file is not found")
304
- return ["Product", "Electronics", "Fashion", "Home"]
305
 
306
 
307
  def load_audio_bytes_ffmpeg(audio_bytes):
308
-
309
  process = subprocess.Popen(
310
  [
311
  "ffmpeg", "-i", "pipe:0",
@@ -314,11 +259,9 @@ def load_audio_bytes_ffmpeg(audio_bytes):
314
  "-ar", "16000",
315
  "pipe:1"
316
  ],
317
- stdin = subprocess.PIPE,
318
- stdout = subprocess.PIPE,
319
- stderr = subprocess.PIPE
320
  )
321
-
322
  out, _ = process.communicate(input=audio_bytes)
323
- audio_np = np.frombuffer(out, dtype=np.float32)
324
- return audio_np
 
31
  else:
32
  supabase_service: Client = create_client(SUPABASE_URL, SUPABASE_SERVICE_KEY)
33
 
34
+ ## Loading the Vector Database (lazy — created on first use)
35
  CHROMA_DB_PATH = str(BASE_DIR / "chroma_db")
36
+ _vector_db = None
 
37
 
38
+ def get_vector_db():
39
+ """Single ChromaDB collection — title + description + tags."""
40
+ global _vector_db
41
+ if _vector_db is None:
 
 
 
 
 
 
 
 
 
 
 
 
42
  from models import get_embedder
43
+ _vector_db = Chroma(
44
+ collection_name='products',
45
  embedding_function=get_embedder(),
46
  persist_directory=CHROMA_DB_PATH
47
  )
48
+ return _vector_db
49
 
 
 
 
50
 
51
  def update_vectordb():
52
 
 
57
  print("Fetching products from Supabase...")
58
  products = supabase.table("products").select("id, title, description, tags").execute().data
59
 
60
+ existing_ids = {m["id"] for m in get_vector_db().get(include=["metadatas"])["metadatas"]}
 
 
61
 
62
+ contents = []
63
+ metadatas = []
64
 
65
  for product in products:
66
  pid = product['id']
67
+ if pid not in existing_ids:
68
+ tags = product.get('tags') or []
69
+ tags_str = ' '.join(tags)
70
+ title = product.get('title') or ''
71
+ description = product.get('description') or ''
72
+
73
+ contents.append(f"{title} {description} {tags_str}")
74
+ metadatas.append({"id": pid, "title": title, "tags": tags_str})
75
+
76
+ if contents:
77
+ get_vector_db().add_texts(texts=contents, metadatas=metadatas)
78
+ get_vector_db().persist()
79
+ print(f"✅ Added {len(contents)} new products to ChromaDB")
 
 
 
 
 
 
 
 
 
80
  else:
81
+ print("✅ No new products to add, ChromaDB is up to date")
 
 
 
 
 
 
 
 
82
 
83
 
84
+ def add_product_to_vectordb(product_id: str):
85
  """
86
+ Add a single product's embedding to ChromaDB.
87
+ Called via API when a new product is created — no need to restart the server.
 
88
  """
89
+ if supabase is None:
90
+ return {"error": "Supabase not configured"}
 
 
 
 
 
 
 
 
 
 
 
91
 
92
+ # Check if already indexed
93
+ existing_ids = {m["id"] for m in get_vector_db().get(include=["metadatas"])["metadatas"]}
94
+ if product_id in existing_ids:
95
+ return {"status": "already_indexed", "product_id": product_id}
96
 
97
+ # Fetch product from Supabase
98
+ response = supabase.table("products").select("id, title, description, tags").eq("id", product_id).execute()
99
+ if not response.data:
100
+ return {"error": f"Product {product_id} not found in Supabase"}
101
+
102
+ product = response.data[0]
103
+ tags = product.get('tags') or []
104
+ tags_str = ' '.join(tags)
105
+ title = product.get('title') or ''
106
+ description = product.get('description') or ''
107
+
108
+ content = f"{title} {description} {tags_str}"
109
+ meta = {"id": product_id, "title": title, "tags": tags_str}
 
 
 
 
 
 
 
110
 
111
+ get_vector_db().add_texts(texts=[content], metadatas=[meta])
112
+ get_vector_db().persist()
113
+
114
+ return {"status": "added", "product_id": product_id, "title": title}
115
 
116
 
117
  def get_product_images(product_ids: list) -> dict:
 
148
  """
149
  if not product_ids:
150
  return {}
151
+
152
  try:
153
  response = supabase.table("products").select("id, price").in_("id", list(product_ids)).execute()
154
+
155
  prices_map = {}
156
  for row in response.data:
157
  pid = row.get("id")
158
  price = row.get("price")
159
  if pid:
160
  prices_map[pid] = price
161
+
162
  return prices_map
163
+
164
  except Exception as e:
165
  print(f"Error fetching product prices: {e}")
166
  return {}
 
172
  Returns product info including title, description, price, old_price, sku, stock, store name, etc.
173
  """
174
  try:
 
175
  response = supabase.table("products").select("*").eq("id", product_id).execute()
176
+
177
  if not response.data:
178
  return None
179
+
180
  product = response.data[0]
181
+
182
  # Get product images
183
  images_response = supabase.table("product_images").select("url").eq("product_id", product_id).execute()
184
  images = [img.get("url") for img in images_response.data if img.get("url")]
185
+
186
  # Get store name
187
  store_name = None
188
  store_id = product.get("store_id")
 
190
  store_response = supabase.table("stores").select("name").eq("id", store_id).execute()
191
  if store_response.data:
192
  store_name = store_response.data[0].get("name")
193
+
194
  return {
195
  "id": product.get("id"),
196
  "title": product.get("title"),
 
202
  "sold_by": store_name,
203
  "images": images,
204
  }
205
+
206
  except Exception as e:
207
  print(f"Error fetching product details: {e}")
208
  return None
 
214
  Returns a list of products with id, title, price, and image_url.
215
  """
216
  try:
 
217
  response = supabase.table("products").select("id, title, price").limit(limit).execute()
218
+
219
  if not response.data:
220
  return []
221
+
222
  products = response.data
223
  product_ids = [p.get("id") for p in products]
 
 
224
  images_map = get_product_images(product_ids)
225
+
226
  return [
227
  {
228
  "id": p.get("id"),
 
232
  }
233
  for p in products
234
  ]
235
+
236
  except Exception as e:
237
  print(f"Error fetching random products: {e}")
238
  return []
239
 
240
 
241
+ def load_categories(file_name=None):
242
+ categories_path = BASE_DIR / "smart_search" / "categories.txt"
243
  if file_name is None:
244
+ file_name = str(categories_path)
245
  try:
246
  with open(file_name, 'r') as file:
247
  return [line.strip() for line in file.readlines() if line.strip()]
 
248
  except FileNotFoundError:
249
  print("Categories.txt file is not found")
250
+ return ["Product", "Electronics", "Fashion", "Home"]
251
 
252
 
253
  def load_audio_bytes_ffmpeg(audio_bytes):
 
254
  process = subprocess.Popen(
255
  [
256
  "ffmpeg", "-i", "pipe:0",
 
259
  "-ar", "16000",
260
  "pipe:1"
261
  ],
262
+ stdin=subprocess.PIPE,
263
+ stdout=subprocess.PIPE,
264
+ stderr=subprocess.PIPE
265
  )
 
266
  out, _ = process.communicate(input=audio_bytes)
267
+ return np.frombuffer(out, dtype=np.float32)