Upload ChatCM-RAG usage codes.py

This Python script demonstrates a minimal setup for searching biomedical literature with natural language queries through the ChatMed-RAG (Retrieval-Augmented Generation) system. It combines a sentence-transformer encoder, a pre-built FAISS index, and a structured metadata table to return the most relevant research abstracts from a curated dataset.

The code first downloads the full ChatMed-RAG repository from Hugging Face using snapshot_download(repo_id="fc28/ChatMed-RAG"). It then loads three components: a sentence encoder (all-mpnet-base-v2) that embeds queries, a FAISS index (faiss_index/index.faiss) that finds semantically similar documents, and a metadata file (metadata/medllm_metadata.csv) that holds titles, abstracts, PMIDs, and publication years.
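Because the search maps positions returned by the FAISS index onto rows of the metadata CSV, a quick consistency check after loading can catch a mismatched or partial download early. The following is a minimal sketch and not part of the uploaded script. `check_components` is a hypothetical helper, and it assumes that vector i in the index corresponds to row i of the metadata, which the script implies but does not state.

```python
def check_components(index_size, index_dim, n_metadata_rows, encoder_dim):
    """Return a list of problems; an empty list means the sizes agree.

    Hypothetical helper. It takes plain integers so it can be exercised
    without loading FAISS or the encoder.
    """
    problems = []
    # Assumption: vector i in the index describes metadata row i.
    if index_size != n_metadata_rows:
        problems.append(
            f"index holds {index_size} vectors but metadata has {n_metadata_rows} rows"
        )
    # Query embeddings must have the same dimension as the indexed vectors.
    if index_dim != encoder_dim:
        problems.append(
            f"index dimension {index_dim} differs from encoder dimension {encoder_dim}"
        )
    return problems
```

With the objects from the script, this could be called as `check_components(index.ntotal, index.d, len(metadata), encoder.get_sentence_embedding_dimension())`.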

The search() function encodes a query, retrieves the top-k nearest vectors from the index (k defaults to 5), and returns the matching metadata rows. The example loop then prints each result's title, year, PMID, and the first 200 characters of its abstract.
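To make the retrieval step concrete, here is a small pure-Python sketch of what a top-k vector search followed by a metadata lookup amounts to. It is an illustration under stated assumptions, not the script's implementation: it uses brute-force squared Euclidean distance in place of FAISS (the metric of the actual index is not stated in the description), the helper names `brute_force_search` and `format_result` are hypothetical, and the toy vectors and rows are invented.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query_vec, doc_vecs, k=5):
    """Return positions of the k nearest document vectors, nearest first."""
    scored = ((squared_l2(query_vec, vec), pos) for pos, vec in enumerate(doc_vecs))
    return [pos for _, pos in heapq.nsmallest(k, scored)]


def format_result(row, max_chars=200):
    """Render one metadata row with a truncated abstract snippet."""
    abstract = row.get("abstract") or ""  # tolerate a missing abstract
    snippet = abstract[:max_chars] + ("..." if len(abstract) > max_chars else "")
    return f"{row['title']} ({row['year']}, PMID {row['pmid']}): {snippet}"


# Toy data, invented for illustration (not taken from the ChatMed-RAG dataset).
doc_vecs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
rows = [
    {"title": "Doc A", "year": 2021, "pmid": "A", "abstract": "First toy abstract."},
    {"title": "Doc B", "year": 2022, "pmid": "B", "abstract": "Second toy abstract."},
    {"title": "Doc C", "year": 2023, "pmid": "C", "abstract": None},
]

# Position i in doc_vecs is assumed to describe rows[i].
hits = brute_force_search([0.9, 1.2], doc_vecs, k=2)
for rank, pos in enumerate(hits, start=1):
    print(f"{rank}. {format_result(rows[pos])}")
```

A FAISS index performs the same nearest-neighbor lookup far more efficiently on large collections. The positional mapping from returned indices to metadata rows is the part this sketch shares with the script.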

Files changed (1)

ChatCM-RAG usage codes.py +33 -0

	@@ -0,0 +1,33 @@

+# Minimal usage example
+from sentence_transformers import SentenceTransformer
+from huggingface_hub import snapshot_download
+import pandas as pd
+import faiss
+# 1. Download the repository snapshot (FAISS index and metadata)
+print("Downloading ChatMed model...")
+model_path = snapshot_download(repo_id="fc28/ChatMed-RAG")
+# 2. Load components
+encoder = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
+index = faiss.read_index(f"{model_path}/faiss_index/index.faiss")
+metadata = pd.read_csv(f"{model_path}/metadata/medllm_metadata.csv")
+# 3. Search function
+def search(query, k=5):
+    query_vec = encoder.encode([query])
+    distances, indices = index.search(query_vec, k)
+    results = []
+    for idx in indices[0]:
+        # FAISS returns -1 when fewer than k neighbors are found, so skip out-of-range positions
+        if 0 <= idx < len(metadata):
+            results.append(metadata.iloc[idx])
+    return results
+# 4. Example usage
+results = search("ChatGPT medical education applications")
+for i, result in enumerate(results):
+    print(f"\n{i + 1}. {result['title']}")
+    print(f"   PMID: {result['pmid']}, Year: {result['year']}")
+    print(f"   Abstract: {str(result['abstract'])[:200]}...")  # str() guards against missing (NaN) abstracts