fc28 committed on
Commit 3990efd · verified · 1 Parent(s): 90cfa35

Upload ChatCM-RAG usage codes.py

This Python script is a minimal example of using the ChatCM-RAG (Retrieval-Augmented Generation) system to search biomedical literature with natural-language queries. It combines a sentence-transformer encoder, a pre-built FAISS index, and structured metadata to return the most relevant research abstracts from a curated dataset.

The code first downloads the full ChatMed-RAG package from Hugging Face using snapshot_download(repo_id="fc28/ChatMed-RAG"). It then initializes three components: a sentence encoder (all-mpnet-base-v2) to embed queries, a FAISS index (index.faiss) to search for semantically similar documents, and a metadata file (medllm_metadata.csv) that contains information like titles, abstracts, PMIDs, and publication years.
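Because the FAISS index and the metadata CSV are loaded independently, it can be worth confirming that they line up before searching. The sketch below is an assumption about how one might do that and is not part of the upload: `check_alignment` is a hypothetical helper, and it presumes that row i of the CSV describes vector i of the index, which this commit does not state explicitly.

```python
def check_alignment(index_ntotal, index_dim, n_rows, encoder_dim):
    """Return a list of human-readable problems; an empty list means no mismatch was found.

    Hypothetical helper: assumes row i of the metadata describes vector i of the index.
    """
    problems = []
    if index_dim != encoder_dim:
        problems.append(
            f"dimension mismatch: index has {index_dim}, encoder produces {encoder_dim}"
        )
    if index_ntotal != n_rows:
        problems.append(
            f"size mismatch: index holds {index_ntotal} vectors, metadata has {n_rows} rows"
        )
    return problems


# With the objects from the script, the call would look like this
# (ntotal and d are attributes of a FAISS index):
#   check_alignment(index.ntotal, index.d, len(metadata),
#                   encoder.get_sentence_embedding_dimension())
print(check_alignment(1000, 768, 1000, 768))  # illustrative numbers with matching sizes
```

The numbers in the last line are made up for illustration; the real index size is whatever the downloaded files contain.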

The search() function encodes the query, runs a vector similarity search against the index, and returns the metadata rows for the top-k matches (k defaults to 5). The example loop then prints each result's title, PMID, year, and the first 200 characters of its abstract.
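To make the retrieval step concrete, here is a dependency-free sketch of what a top-k similarity search does conceptually. It assumes an exact search ranked by squared L2 distance, which is what a flat FAISS L2 index computes; the actual index type shipped in the repository is not stated in this commit. The names `squared_l2` and `top_k` and the toy vectors are illustrative only.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, doc_vecs, k=5):
    """Return (distances, indices) of the k document vectors closest to the query."""
    scored = sorted((squared_l2(query_vec, v), i) for i, v in enumerate(doc_vecs))
    scored = scored[:k]
    return [d for d, _ in scored], [i for _, i in scored]


# Toy two-dimensional "embeddings"; real ones from all-mpnet-base-v2 have 768 dimensions.
doc_vecs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
titles = ["doc A", "doc B", "doc C", "doc D"]

distances, indices = top_k([0.9, 0.1], doc_vecs, k=2)
for d, i in zip(distances, indices):
    print(f"{titles[i]} (squared L2 distance {d:.2f})")
```

A real FAISS index returns the same kind of (distances, indices) pair, which is why the script can use the indices directly as row positions in the metadata table.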

Files changed (1)
  1. ChatCM-RAG usage codes.py +33 -0
ChatCM-RAG usage codes.py ADDED
@@ -0,0 +1,33 @@
+ # Simplest usage codes
+ from sentence_transformers import SentenceTransformer
+ from huggingface_hub import snapshot_download
+ import pandas as pd
+ import faiss
+ import pickle
+
+ # 1. Download the repository snapshot (FAISS index and metadata)
+ print("Downloading ChatMed model...")
+ model_path = snapshot_download(repo_id="fc28/ChatMed-RAG")
+
+ # 2. Load components
+ encoder = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
+ index = faiss.read_index(f"{model_path}/faiss_index/index.faiss")
+ metadata = pd.read_csv(f"{model_path}/metadata/medllm_metadata.csv")
+
+ # 3. Search function
+ def search(query, k=5):
+     query_vec = encoder.encode([query])  # 2-D array: one row per query
+     distances, indices = index.search(query_vec, k)
+
+     results = []
+     for idx in indices[0]:
+         if 0 <= idx < len(metadata):  # FAISS pads with -1 when fewer than k hits exist
+             results.append(metadata.iloc[idx])
+     return results
+
+ # 4. Example usage
+ results = search("ChatGPT medical education applications")
+ for i, result in enumerate(results):
+     print(f"\n{i + 1}. {result['title']}")
+     print(f" PMID: {result['pmid']}, Year: {result['year']}")
+     print(f" Abstract: {str(result['abstract'])[:200]}...")  # str() guards against missing abstracts
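The `search()` function above discards the distances that FAISS returns. If a relevance score is useful, one option is to keep each distance next to its metadata row. This is a sketch under assumptions, not the author's method: `pair_with_distances` is a hypothetical helper, `rows` stands in for the metadata table as a plain list of dicts, and smaller distances are assumed to mean closer matches (true for an L2 index, not for an inner-product one).

```python
def pair_with_distances(distances, indices, rows):
    """Pair each valid hit with its distance.

    distances, indices: the first row of FAISS-style search output (one query).
    FAISS pads with index -1 when fewer than k neighbours exist, so those are skipped.
    """
    hits = []
    for dist, idx in zip(distances, indices):
        if 0 <= idx < len(rows):
            hits.append({"distance": float(dist), **rows[idx]})
    return hits


# Made-up rows and scores, purely for illustration.
rows = [{"title": "Paper one", "pmid": 111}, {"title": "Paper two", "pmid": 222}]
hits = pair_with_distances([0.31, 0.58, 3.4e38], [1, 0, -1], rows)
for hit in hits:
    print(f"{hit['distance']:.2f}  {hit['title']}")
```

With the script's own objects, `rows` could be obtained with `metadata.to_dict("records")`, and the first two arguments would be `distances[0]` and `indices[0]` from `index.search`.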