Final_Assignment_Template

dgsilvia committed on Jun 29, 2025

Commit

a64f470

verified

1 Parent(s): 1c0ea74

Upload 4 files

Files changed (4)

build_knowledge.py +62 -0
metadata.jsonl +0 -0
requirements.txt +10 -1
system_prompt.txt +5 -0

build_knowledge.py ADDED

	@@ -0,0 +1,62 @@

+# Build a Chroma vector store from metadata.jsonl
+import json
+from langchain_chroma import Chroma
+from langchain_huggingface import HuggingFaceEmbeddings
+import chromadb
+# Intended to disable Chroma telemetry
+chromadb.config.Settings.telemetry_enabled = False
+if __name__ == '__main__':
+    # Parse each non-empty line of metadata.jsonl into a dict
+    with open('metadata.jsonl', 'r', encoding='utf-8') as jsonl_file:
+        json_QA = [json.loads(line) for line in jsonl_file if line.strip()]
+    # Use the same embeddings
+    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
+    from langchain.schema import Document
+    # Prepare the list of documents
+    docs = []
+    print("orig:", len(json_QA))
+    # One document per question/answer pair, keyed by task_id
+    for sample in json_QA:
+        content = f"Question : {sample['Question']}\n\nFinal answer : {sample['Final answer']}"
+        metadata = {"source": sample['task_id']}
+        doc = Document(page_content=content, metadata=metadata)
+        docs.append(doc)
+    # Initialize the Chroma vector store
+    vector_store = Chroma.from_documents(
+        documents=docs,
+        embedding=embeddings,
+        persist_directory="./chroma_db"
+    )
+'''
+# Recreate the same embeddings object used at creation time
+embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
+# Load the previously saved vector store
+vector_store = Chroma(
+    embedding_function=embeddings,
+    persist_directory="./chroma_db"  # same path used when saving
+)
+# Get the retriever
+retriever = vector_store.as_retriever()
+query = "How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?"
+results = retriever.invoke(query)
+print(results[0].page_content)
+'''
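
The conversion step in build_knowledge.py (one JSONL line in, one document out) can be checked without installing LangChain or Chroma. The following is a minimal sketch, not the committed code: it uses plain dicts in place of `Document`, and the helper names `jsonl_to_records` and `to_doc` as well as the sample line are hypothetical. Only the field names (`Question`, `Final answer`, `task_id`) and the content template come from the script above.

```python
import json


def jsonl_to_records(lines):
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def to_doc(sample):
    """Build the page content and metadata that the script stores per sample."""
    content = f"Question : {sample['Question']}\n\nFinal answer : {sample['Final answer']}"
    return {"page_content": content, "metadata": {"source": sample["task_id"]}}


# Hypothetical sample line; real lines would come from metadata.jsonl
sample_lines = [
    '{"task_id": "demo-1", "Question": "What is 2 + 2?", "Final answer": "4"}\n',
    "\n",
]

docs = [to_doc(record) for record in jsonl_to_records(sample_lines)]
print(docs[0]["page_content"])
```

Keeping the answer inside `page_content` means a similarity hit on the question returns the reference answer in the same string, which appears to be what the retriever test in the commented-out block relies on.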

metadata.jsonl ADDED

The diff for this file is too large to render.

requirements.txt CHANGED

@@ -15,4 +15,13 @@ arxiv
 pymupdf
 wikipedia
 pgvector
-python-dotenv
+python-dotenv
+langgraph
+langchain
+langchain-core
+langchain-community
+duckduckgo-search
+sentence-transformers
+chromadb
+arxiv
+wikipedia
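
In this hunk `arxiv` and `wikipedia` appear both as existing lines and as added lines, so the resulting requirements.txt lists them twice. A small sketch for spotting repeated package names is given below. The function name `duplicate_requirements` and the example text are hypothetical, and the name parsing is deliberately simple (it assumes one requirement per line and ignores URLs and `-r` includes).

```python
import re
from collections import Counter


def duplicate_requirements(text):
    """Return the sorted package names that occur more than once."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Keep the package name that precedes any version specifier or extras
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
        names.append(name.lower().replace("_", "-"))
    return sorted(n for n, count in Counter(names).items() if count > 1)


# Hypothetical excerpt modeled on the hunk above
example = """arxiv
pymupdf
wikipedia
pgvector
python-dotenv
langgraph
arxiv
wikipedia
"""

print(duplicate_requirements(example))
```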

system_prompt.txt ADDED

	@@ -0,0 +1,5 @@

+You are a helpful assistant tasked with answering questions using a set of tools.
+Now, I will ask you a question. Report your thoughts, and finish your answer with the following template:
+FINAL ANSWER: [YOUR FINAL ANSWER].
+YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma-separated list of numbers and/or strings. If you are asked for a number, don't use commas in the number, and don't use units such as $ or a percent sign unless specified otherwise. If you are asked for a string, don't use articles or abbreviations (e.g. for cities), and write digits in plain text unless specified otherwise. If you are asked for a comma-separated list, apply the above rules depending on whether each element of the list is a number or a string. If I provide you with a similar question and answer for reference, use this information before using any other tools.
+Your answer should start only with "FINAL ANSWER: ", followed by the answer.
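
Because the prompt fixes the `FINAL ANSWER:` template, the calling code can pull the answer out of a model reply with a small parser. This is a sketch under assumptions: the function name `extract_final_answer` and the sample reply are hypothetical, and the commit itself does not show how replies are parsed.

```python
import re


def extract_final_answer(text):
    """Return the text after the last 'FINAL ANSWER:' marker, or None if absent."""
    matches = re.findall(r"FINAL ANSWER:\s*(.*)", text)
    return matches[-1].strip() if matches else None


# Hypothetical model reply that follows the template
reply = "BERT base has 12 layers; the original encoder has 6.\nFINAL ANSWER: 6"
print(extract_final_answer(reply))
```

Taking the last match rather than the first tolerates replies that mention the marker while reasoning before giving the actual answer.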