Spaces: decodingdatascience/insuranceomantel

sunifjagirdar1989 committed on Sep 28, 2025

Commit 60eb159 (verified) · 1 Parent(s): 26b4abe

Upload 5 files

Files changed (6)

.gitattributes +1 -0
README.md +183 -13
app.py +125 -125
dds_logo.png +3 -0
insurance.pdf +3 -0
requirements.txt +8 -8

.gitattributes CHANGED

@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 insurance.pdf filter=lfs diff=lfs merge=lfs -text
 data/insurance.pdf filter=lfs diff=lfs merge=lfs -text
 data/dds_logo.png filter=lfs diff=lfs merge=lfs -text
+dds_logo.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,13 +1,183 @@
----
-title: Insuranceomantel
-emoji: 🐨
-colorFrom: purple
-colorTo: blue
-sdk: gradio
-sdk_version: 5.47.0
-app_file: app.py
-pinned: false
-short_description: insuranceomantel
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+DDS Insurance Q&A — RAG Assistant (Pinecone + OpenAI + Gradio)
+Summary: A beginner-friendly, document-grounded insurance bot that you can replicate and deploy on Hugging Face Spaces. It answers only from your uploaded insurance documents using LlamaIndex + Pinecone (serverless) + OpenAI with a simple, polite system prompt.
+What You’ll Get
+Deployed Space URL you can share.
+Grounded answers (if the documents do not cover a question, the bot politely says it couldn’t find it).
+Simple UI with an FAQ dropdown + free-text question box.
+Clean structure designed for easy replication.
+Features
+Answers strictly from your data/ documents (RAG).
+Pinecone serverless index (AWS us-east-1, cosine, 1536-dim).
+OpenAI for embeddings (text-embedding-3-small) and LLM (gpt-4o-mini).
+Gradio interface with a centered, required logo (data/dds_logo.png).
+Beginner-friendly defaults and error messages.
+Repository Structure
+.
+├─ data/                     # Your insurance docs + required logo
+│  └─ dds_logo.png           # REQUIRED (shown in header)
+├─ app.py                    # Main app: indexing + query + Gradio UI
+├─ requirements.txt          # Dependencies
+└─ README.md                 # This file
+Configuration (in app.py)
+EMBED_MODEL = "text-embedding-3-small"   # 1536-dim
+LLM_MODEL   = "gpt-4o-mini"
+TOP_K       = 4                          # retrieval depth
+System Prompt (keeps answers grounded + polite):
+SYSTEM_PROMPT = """You are Aisha, a polite and professional Insurance assistant.
+Answer ONLY using the information found in the indexed insurance document(s).
+If the answer is not in the document(s), say: "I couldn’t find that in the document."
+Keep responses concise, helpful, and courteous.
+"""
+FAQ List (editable):
+FAQS = [
+    "",
+    "What benefits are covered under the policy?",
+    "How do I file a claim and what documents are required?",
+    "What are the exclusions and limitations?",
+    "Is pre-authorization needed for hospitalization?",
+    "What is the reimbursement timeline?",
+    "How are outpatient vs inpatient services handled?",
+    "How can I check my network hospitals/clinics?",
+    "What is the co-pay or deductible policy?",
+]
+Deploy to Hugging Face Spaces (Beginner-Friendly)
+1) Create a Space
+Go to Hugging Face → Spaces → New Space
+SDK: Gradio
+Visibility/licensing: your choice
+2) Add Project Files
+Upload these into your Space:
+app.py
+requirements.txt
+README.md
+Create folder data/ and upload:
+Your insurance documents (PDF/TXT/MD…)
+dds_logo.png (mandatory; exact filename)
+Tip: Your Space file tree should match the Repository Structure above.
+3) Set Secrets (Environment Variables)
+In Space → Settings → Variables and secrets, add:
+OPENAI_API_KEY → your OpenAI key
+PINECONE_API_KEY → your Pinecone key
+No legacy Pinecone environment URL needed. This app uses pinecone-client ≥ 5 with serverless.
+4) Build & Run
+Spaces installs the dependencies from requirements.txt automatically.
+The default CPU hardware is sufficient.
+app.py is used as the entry point.
+On first start, the app will:
+Ensure a Pinecone serverless index:
+dds-insurance-index · cosine · 1536-dim · aws/us-east-1
+Read and index documents from data/
+Launch the Gradio UI
+Your deployed link is simply the Space URL once its status is Running.
+5) Updating Documents Later
+Upload/change files in data/
+Click Restart on the Space so it re-indexes your documents
+Troubleshooting (Common Issues)
+“Missing PINECONE_API_KEY or OPENAI_API_KEY”
+Add both secrets in Space → Settings → Variables and secrets.
+Pinecone 401 / “Malformed domain”
+Ensure you’re on pinecone-client>=5.0.1 (already in requirements.txt).
+Use a valid Pinecone API key; no environment URL needed for serverless.
+“Logo not found: data/dds_logo.png”
+Upload an image named exactly dds_logo.png into the data/ folder.
+“No documents found in data/”
+Upload at least one doc (PDF/TXT/MD) into data/, then Restart the Space.
+OpenAI authorization/rate-limit errors
+Confirm key validity and model access; reduce usage if rate-limited.
+Slow first load
+First run installs dependencies and builds the index; later runs are faster.
+Manual Test Checklist
+Ask a question that is clearly answered in your docs → the response should reflect that content.
+Ask something not in your docs → bot should say it can’t find it.
+Adjust TOP_K in app.py to see how answer completeness changes.
+Requirements (from requirements.txt)
+gradio>=4.44.0
+pinecone-client>=5.0.1
+openai>=1.51.0
+llama-index>=0.11.0
+llama-index-vector-stores-pinecone>=0.3.0
+llama-index-embeddings-openai>=0.3.0
+llama-index-llms-openai>=0.2.0
+tiktoken>=0.7.0
+Customization Ideas
+Swap LLMs by editing LLM_MODEL.
+Add a file uploader to refresh docs from the UI.
+Add metadata filters (e.g., policy type).
+Log queries to refine the FAQ list.
+License
+Add your chosen license (e.g., MIT) as LICENSE.
+Acknowledgments
+Thanks to LlamaIndex, Pinecone, OpenAI, and Gradio for the tooling that makes this simple and reproducible.
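
The troubleshooting entries in the README above (missing secrets, missing logo, empty data/ folder) can be checked locally before pushing to a Space. The following is a minimal sketch, not part of the commit: the helper name `preflight` and the suffix set are hypothetical, and the suffix set is an assumption based on the README's "PDF/TXT/MD" wording.

```python
import os
from pathlib import Path

# Assumption: the document types the README mentions ("PDF/TXT/MD").
DOC_SUFFIXES = {".pdf", ".txt", ".md"}


def preflight(env, data_dir):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Both secrets named in the README must be present and non-empty.
    for key in ("OPENAI_API_KEY", "PINECONE_API_KEY"):
        if not env.get(key):
            problems.append(f"Missing secret: {key}")
    data = Path(data_dir)
    if not data.is_dir():
        problems.append(f"No '{data_dir}/' directory found")
        return problems  # nothing else can be checked without the folder
    if not (data / "dds_logo.png").is_file():
        problems.append(f"Logo not found: {data_dir}/dds_logo.png")
    docs = [p for p in data.iterdir() if p.suffix.lower() in DOC_SUFFIXES]
    if not docs:
        problems.append(f"No documents found in {data_dir}/")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ, "data"):
        print(problem)
```

Returning a list instead of raising on the first failure lets a beginner see every missing piece in one run, whereas app.py stops at the first error.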

app.py CHANGED

@@ -1,125 +1,125 @@
-# app.py — Insurance Q&A (RAG) with system prompt + simple config
-import os
-import gradio as gr
-from pinecone import Pinecone, ServerlessSpec
-from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, Settings
-from llama_index.vector_stores.pinecone import PineconeVectorStore
-from llama_index.embeddings.openai import OpenAIEmbedding
-from llama_index.llms.openai import OpenAI
-# --- System Prompt (polite + answer-from-document constraint) ---
-SYSTEM_PROMPT = """You are Aisha, a polite and professional Insurance assistant.
-Answer ONLY using the information found in the indexed insurance document(s).
-If the answer is not in the document(s), say: "I couldn’t find that in the document."
-Keep responses concise, helpful, and courteous.
-"""
-# ===== Minimal CONFIG (only necessary keys) =====
-PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
-OPENAI_API_KEY   = os.getenv("OPENAI_API_KEY")
-if not PINECONE_API_KEY or not OPENAI_API_KEY:
-    raise RuntimeError("Missing PINECONE_API_KEY or OPENAI_API_KEY (set them in Space → Settings → Variables).")
-DATA_DIR = "data"                         # Put insurance docs here (e.g., data/insurance.pdf)
-LOGO_PATH = os.path.join(DATA_DIR, "dds_logo.png")  # Mandatory logo
-if not os.path.exists(LOGO_PATH):
-    raise RuntimeError("Logo not found: data/dds_logo.png (commit it to your Space repo).")
-EMBED_MODEL = "text-embedding-3-small"    # 1536-dim
-LLM_MODEL   = "gpt-4o-mini"
-TOP_K       = 4                            # internal similarity_top_k
-# ===== LlamaIndex / Pinecone (simple, fixed serverless: aws/us-east-1) =====
-Settings.embed_model = OpenAIEmbedding(model=EMBED_MODEL, api_key=OPENAI_API_KEY)
-Settings.llm = OpenAI(model=LLM_MODEL, api_key=OPENAI_API_KEY, system_prompt=SYSTEM_PROMPT)
-pc = Pinecone(api_key=PINECONE_API_KEY)
-def ensure_index(name: str, dim: int = 1536):
-    names = [i["name"] for i in pc.list_indexes()]
-    if name not in names:
-        pc.create_index(
-            name=name, dimension=dim, metric="cosine",
-            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
-        )
-    return pc.Index(name)
-# Fixed index name for simplicity
-pinecone_index = ensure_index("dds-insurance-index", dim=1536)
-vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
-def bootstrap_index():
-    if not os.path.isdir(DATA_DIR):
-        raise RuntimeError("No 'data/' directory found. Commit your documents to data/ in the Space repo.")
-    docs = SimpleDirectoryReader(DATA_DIR).load_data()
-    if not docs:
-        raise RuntimeError("No documents found in data/. Add e.g., data/insurance.pdf")
-    storage_ctx = StorageContext.from_defaults(vector_store=vector_store)
-    VectorStoreIndex.from_documents(docs, storage_context=storage_ctx, show_progress=True)
-bootstrap_index()
-def answer(query: str) -> str:
-    if not query.strip():
-        return "Please enter a question (or select one from the FAQ list)."
-    index = VectorStoreIndex.from_vector_store(vector_store)
-    resp = index.as_query_engine(similarity_top_k=TOP_K).query(query)
-    return str(resp)
-FAQS = [
-    "",
-    "What benefits are covered under the policy?",
-    "How do I file a claim and what documents are required?",
-    "What are the exclusions and limitations?",
-    "Is pre-authorization needed for hospitalization?",
-    "What is the reimbursement timeline?",
-    "How are outpatient vs inpatient services handled?",
-    "How can I check my network hospitals/clinics?",
-    "What is the co-pay or deductible policy?",
-]
-def use_faq(selected_faq: str, free_text: str):
-    prompt = (selected_faq or "").strip() or (free_text or "").strip()
-    if not prompt:
-        return "", "Please select a FAQ or type your question."
-    return prompt, answer(prompt)
-# ===== UI =====
-CSS = """
-.header { display:flex; flex-direction:column; align-items:center; gap:6px; }
-.logo img { width:300px; height:300px; object-fit:contain; }  /* fixed 300x300 */
-.title { text-align:center; font-weight:700; font-size:1.4rem; margin:6px 0 0 0; }
-.subnote { text-align:center; margin-top:-2px; opacity:0.8; }
-"""
-with gr.Blocks(css=CSS, theme=gr.themes.Soft()) as demo:
-    with gr.Row():
-        with gr.Column():
-            gr.Markdown("<div class='header'>")
-            gr.Image(value=LOGO_PATH, show_label=False, elem_classes=["logo"])
-            gr.Markdown(
-                "<h1 class='title'>DDS Insurance Q&A — RAG Assistant</h1>"
-                "<p class='subnote'>Answers strictly from your insurance document(s)</p>"
-            )
-            gr.Markdown("</div>")
-    with gr.Row():
-        with gr.Column(scale=1):
-            gr.Markdown("### Ask from Frequently Asked Questions")
-            faq = gr.Dropdown(choices=FAQS, value=FAQS[0], label="Select a common question")
-            gr.Markdown("### Or type your question")
-            user_q = gr.Textbox(
-                label="Your question",
-                placeholder="e.g., What is covered under outpatient benefits?",
-                lines=2
-            )
-            ask_btn = gr.Button("Ask", variant="primary")
-        with gr.Column(scale=1):
-            chosen_prompt = gr.Textbox(label="Query sent", interactive=False)
-            answer_box = gr.Markdown()
-    ask_btn.click(use_faq, inputs=[faq, user_q], outputs=[chosen_prompt, answer_box])
-if __name__ == "__main__":
-    demo.launch()

+# app.py — Insurance Q&A (RAG) with system prompt + simple config
+import os
+import gradio as gr
+from pinecone import Pinecone, ServerlessSpec
+from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, Settings
+from llama_index.vector_stores.pinecone import PineconeVectorStore
+from llama_index.embeddings.openai import OpenAIEmbedding
+from llama_index.llms.openai import OpenAI
+# --- System Prompt (polite + answer-from-document constraint) ---
+SYSTEM_PROMPT = """You are Aisha, a polite and professional Insurance assistant.
+Answer ONLY using the information found in the indexed insurance document(s).
+If the answer is not in the document(s), say: "I couldn’t find that in the document."
+Keep responses concise, helpful, and courteous.
+"""
+# ===== Minimal CONFIG (only necessary keys) =====
+PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
+OPENAI_API_KEY   = os.getenv("OPENAI_API_KEY")
+if not PINECONE_API_KEY or not OPENAI_API_KEY:
+    raise RuntimeError("Missing PINECONE_API_KEY or OPENAI_API_KEY (set them in Space → Settings → Variables).")
+DATA_DIR = "data"                         # Put insurance docs here (e.g., data/insurance.pdf)
+LOGO_PATH = os.path.join(DATA_DIR, "dds_logo.png")  # Mandatory logo
+if not os.path.exists(LOGO_PATH):
+    raise RuntimeError("Logo not found: data/dds_logo.png (commit it to your Space repo).")
+EMBED_MODEL = "text-embedding-3-small"    # 1536-dim
+LLM_MODEL   = "gpt-4o-mini"
+TOP_K       = 4                            # internal similarity_top_k
+# ===== LlamaIndex / Pinecone (simple, fixed serverless: aws/us-east-1) =====
+Settings.embed_model = OpenAIEmbedding(model=EMBED_MODEL, api_key=OPENAI_API_KEY)
+Settings.llm = OpenAI(model=LLM_MODEL, api_key=OPENAI_API_KEY, system_prompt=SYSTEM_PROMPT)
+pc = Pinecone(api_key=PINECONE_API_KEY)
+def ensure_index(name: str, dim: int = 1536):
+    names = [i["name"] for i in pc.list_indexes()]
+    if name not in names:
+        pc.create_index(
+            name=name, dimension=dim, metric="cosine",
+            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
+        )
+    return pc.Index(name)
+# Fixed index name for simplicity
+pinecone_index = ensure_index("dds-insurance-index", dim=1536)
+vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
+def bootstrap_index():
+    if not os.path.isdir(DATA_DIR):
+        raise RuntimeError("No 'data/' directory found. Commit your documents to data/ in the Space repo.")
+    docs = SimpleDirectoryReader(DATA_DIR).load_data()
+    if not docs:
+        raise RuntimeError("No documents found in data/. Add e.g., data/insurance.pdf")
+    storage_ctx = StorageContext.from_defaults(vector_store=vector_store)
+    VectorStoreIndex.from_documents(docs, storage_context=storage_ctx, show_progress=True)
+bootstrap_index()
+def answer(query: str) -> str:
+    if not query.strip():
+        return "Please enter a question (or select one from the FAQ list)."
+    index = VectorStoreIndex.from_vector_store(vector_store)
+    resp = index.as_query_engine(similarity_top_k=TOP_K).query(query)
+    return str(resp)
+FAQS = [
+    "",
+    "What benefits are covered under the policy?",
+    "How do I file a claim and what documents are required?",
+    "What are the exclusions and limitations?",
+    "Is pre-authorization needed for hospitalization?",
+    "What is the reimbursement timeline?",
+    "How are outpatient vs inpatient services handled?",
+    "How can I check my network hospitals/clinics?",
+    "What is the co-pay or deductible policy?",
+]
+def use_faq(selected_faq: str, free_text: str):
+    prompt = (selected_faq or "").strip() or (free_text or "").strip()
+    if not prompt:
+        return "", "Please select a FAQ or type your question."
+    return prompt, answer(prompt)
+# ===== UI =====
+CSS = """
+.header { display:flex; flex-direction:column; align-items:center; gap:6px; }
+.logo img { width:300px; height:300px; object-fit:contain; }  /* fixed 300x300 */
+.title { text-align:center; font-weight:700; font-size:1.4rem; margin:6px 0 0 0; }
+.subnote { text-align:center; margin-top:-2px; opacity:0.8; }
+"""
+with gr.Blocks(css=CSS, theme=gr.themes.Soft()) as demo:
+    with gr.Row():
+        with gr.Column():
+            gr.Markdown("<div class='header'>")
+            gr.Image(value=LOGO_PATH, show_label=False, elem_classes=["logo"])
+            gr.Markdown(
+                "<h1 class='title'>DDS Insurance Q&A — RAG Assistant</h1>"
+                "<p class='subnote'>Answers strictly from your insurance document(s)</p>"
+            )
+            gr.Markdown("</div>")
+    with gr.Row():
+        with gr.Column(scale=1):
+            gr.Markdown("### Ask from Frequently Asked Questions")
+            faq = gr.Dropdown(choices=FAQS, value=FAQS[0], label="Select a common question")
+            gr.Markdown("### Or type your question")
+            user_q = gr.Textbox(
+                label="Your question",
+                placeholder="e.g., What is covered under outpatient benefits?",
+                lines=2
+            )
+            ask_btn = gr.Button("Ask", variant="primary")
+        with gr.Column(scale=1):
+            chosen_prompt = gr.Textbox(label="Query sent", interactive=False)
+            answer_box = gr.Markdown()
+    ask_btn.click(use_faq, inputs=[faq, user_q], outputs=[chosen_prompt, answer_box])
+if __name__ == "__main__":
+    demo.launch()

dds_logo.png ADDED

Git LFS Details

SHA256: b42f21a6a20156eabe67a0b0bfe99984b05ca38324186c5a1277d1d0a51e20a8
Pointer size: 132 Bytes
Size of remote file: 1.42 MB

insurance.pdf ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:536603a97eea5752c1447b7411ad4c03054d6d0f3a3bc1c887f3dc26de8e7892
+size 1341586
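
The three added lines above are a Git LFS pointer, not the PDF itself: the repository stores this small text file and the binary lives in LFS storage. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical and the parser handles only the three keys shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:536603a97eea5752c1447b7411ad4c03054d6d0f3a3bc1c887f3dc26de8e7892
size 1341586
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte count of the actual insurance.pdf, which is why the diff reports only three changed lines for a file of roughly 1.3 MB.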

requirements.txt CHANGED

@@ -1,8 +1,8 @@
-gradio>=4.44.0
-pinecone-client>=5.0.1
-openai>=1.51.0
-llama-index>=0.11.0
-llama-index-vector-stores-pinecone>=0.3.0
-llama-index-embeddings-openai>=0.3.0
-llama-index-llms-openai>=0.2.0
-tiktoken>=0.7.0

+gradio>=4.44.0
+pinecone-client>=5.0.1
+openai>=1.51.0
+llama-index>=0.11.0
+llama-index-vector-stores-pinecone>=0.3.0
+llama-index-embeddings-openai>=0.3.0
+llama-index-llms-openai>=0.2.0
+tiktoken>=0.7.0
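
Every entry in requirements.txt is a lower bound (`>=`) with no upper bound, so a rebuild of the Space can install newer releases than the ones the app was tested with. The sketch below reads those minimums into version tuples so they can be compared with whatever is installed. The helper names `parse_minimums` and `meets` are hypothetical, and the parser deliberately handles only the `name>=X.Y.Z` form used in this file.

```python
def parse_minimums(requirements_text: str) -> dict:
    """Map package name -> minimum version tuple for 'name>=X.Y.Z' lines."""
    minimums = {}
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and any specifier form this sketch does not cover.
        if not line or line.startswith("#") or ">=" not in line:
            continue
        name, _, version = line.partition(">=")
        minimums[name.strip()] = tuple(int(p) for p in version.strip().split("."))
    return minimums


def meets(installed: tuple, minimum: tuple) -> bool:
    """True when the installed version tuple is at least the minimum."""
    return installed >= minimum


REQS = """gradio>=4.44.0
pinecone-client>=5.0.1
openai>=1.51.0
llama-index>=0.11.0
llama-index-vector-stores-pinecone>=0.3.0
llama-index-embeddings-openai>=0.3.0
llama-index-llms-openai>=0.2.0
tiktoken>=0.7.0
"""

mins = parse_minimums(REQS)
print(mins["gradio"], meets((5, 47, 0), mins["gradio"]))
```

Comparing integer tuples avoids the string-comparison trap where "4.9.0" sorts after "4.44.0"; pre-release or local version suffixes are outside the scope of this sketch.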