diff --git "a/Notebooks/demo_MarCognity_ai.ipynb" "b/Notebooks/demo_MarCognity_ai.ipynb" new file mode 100644--- /dev/null +++ "b/Notebooks/demo_MarCognity_ai.ipynb" @@ -0,0 +1,5622 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "d59UGq1jOKru" + }, + "source": [ + "# Welcome to the MarCognity-AI Demo\n", + "\n", + "**MarCognity-AI** is an open-source project born from curiosity, research, and experimentation. \n", + "Its goal? To explore how an LLM-based system can not only generate scientific content \n", + "but also critically reflect on what it produces.\n", + "\n", + "## What You'll See in This Notebook\n", + "\n", + "- Processing of complex academic requests \n", + "- Retrieval of sources from open-access databases \n", + "- Conceptual and graphical visualization \n", + "- Semantic and metacognitive evaluation \n", + "- Self-improvement of generated responses \n", + "\n", + "**Metacognition, memory, ethics, and visualization:** \n", + "MarCognity-AI is a step toward more self-aware intelligence.\n", + "\n", + "## What to Expect\n", + "\n", + "Watch how **MarCognity-AI** processes, evaluates, and visualizes the response — \n", + "and reflects on what it has produced.\n", + "\n", + "This demo is designed to inspire you to **explore, build, and create**.\n", + "\n", + "Enjoy the journey.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k_PxwwPZFQCE" + }, + "source": [ + "© 2025 Elena Marziali — This code is released under the Apache 2.0 license.\n", + "\n", + "For details, see the `LICENSE` file in the repository.\n", + "\n", + "This code is protected by copyright and requires proper attribution. 
\n", + "Removal of this copyright notice is strictly prohibited.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0i7bBn5-nz-P", + "outputId": "032cc42b-f769-49ad-e8d6-08f01e7a84cf" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: langchain in /usr/local/lib/python3.12/dist-packages (0.3.27)\n", + "Collecting langchain-groq\n", + " Downloading langchain_groq-0.3.8-py3-none-any.whl.metadata (2.6 kB)\n", + "Collecting langchain-community\n", + " Downloading langchain_community-0.3.29-py3-none-any.whl.metadata (2.9 kB)\n", + "Collecting groq\n", + " Downloading groq-0.31.1-py3-none-any.whl.metadata (16 kB)\n", + "Requirement already satisfied: python-dotenv in /usr/local/lib/python3.12/dist-packages (1.1.1)\n", + "Requirement already satisfied: openai in /usr/local/lib/python3.12/dist-packages (1.106.1)\n", + "Requirement already satisfied: langchain-core<1.0.0,>=0.3.72 in /usr/local/lib/python3.12/dist-packages (from langchain) (0.3.75)\n", + "Requirement already satisfied: langchain-text-splitters<1.0.0,>=0.3.9 in /usr/local/lib/python3.12/dist-packages (from langchain) (0.3.11)\n", + "Requirement already satisfied: langsmith>=0.1.17 in /usr/local/lib/python3.12/dist-packages (from langchain) (0.4.24)\n", + "Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in /usr/local/lib/python3.12/dist-packages (from langchain) (2.11.7)\n", + "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.12/dist-packages (from langchain) (2.0.43)\n", + "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.12/dist-packages (from langchain) (2.32.4)\n", + "Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.12/dist-packages (from langchain) (6.0.2)\n", + "Collecting requests<3,>=2 (from langchain)\n", + " Downloading requests-2.32.5-py3-none-any.whl.metadata 
(4.9 kB)\n", + "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (3.12.15)\n", + "Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (8.5.0)\n", + "Collecting dataclasses-json<0.7,>=0.6.7 (from langchain-community)\n", + " Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)\n", + "Requirement already satisfied: pydantic-settings<3.0.0,>=2.10.1 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (2.10.1)\n", + "Requirement already satisfied: httpx-sse<1.0.0,>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (0.4.1)\n", + "Requirement already satisfied: numpy>=1.26.2 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (2.0.2)\n", + "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.12/dist-packages (from groq) (4.10.0)\n", + "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.12/dist-packages (from groq) (1.9.0)\n", + "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.12/dist-packages (from groq) (0.28.1)\n", + "Requirement already satisfied: sniffio in /usr/local/lib/python3.12/dist-packages (from groq) (1.3.1)\n", + "Requirement already satisfied: typing-extensions<5,>=4.10 in /usr/local/lib/python3.12/dist-packages (from groq) (4.15.0)\n", + "Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from openai) (0.10.0)\n", + "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.12/dist-packages (from openai) (4.67.1)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (2.6.1)\n", + "Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) 
(1.4.0)\n", + "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (25.3.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.7.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (6.6.4)\n", + "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (0.3.2)\n", + "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.20.1)\n", + "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.12/dist-packages (from anyio<5,>=3.5.0->groq) (3.10)\n", + "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.6.7->langchain-community)\n", + " Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)\n", + "Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.6.7->langchain-community)\n", + " Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.23.0->groq) (2025.8.3)\n", + "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.23.0->groq) (1.0.9)\n", + "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->groq) (0.16.0)\n", + "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.12/dist-packages (from langchain-core<1.0.0,>=0.3.72->langchain) (1.33)\n", + "Requirement already satisfied: packaging>=23.2 in /usr/local/lib/python3.12/dist-packages (from langchain-core<1.0.0,>=0.3.72->langchain) (25.0)\n", + 
"Requirement already satisfied: orjson>=3.9.14 in /usr/local/lib/python3.12/dist-packages (from langsmith>=0.1.17->langchain) (3.11.3)\n", + "Requirement already satisfied: requests-toolbelt>=1.0.0 in /usr/local/lib/python3.12/dist-packages (from langsmith>=0.1.17->langchain) (1.0.0)\n", + "Requirement already satisfied: zstandard>=0.23.0 in /usr/local/lib/python3.12/dist-packages (from langsmith>=0.1.17->langchain) (0.24.0)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (2.33.2)\n", + "Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.4.1)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests<3,>=2->langchain) (3.4.3)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests<3,>=2->langchain) (2.5.0)\n", + "Requirement already satisfied: greenlet>=1 in /usr/local/lib/python3.12/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.2.4)\n", + "Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.12/dist-packages (from jsonpatch<2.0,>=1.33->langchain-core<1.0.0,>=0.3.72->langchain) (3.0.0)\n", + "Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.6.7->langchain-community)\n", + " Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB)\n", + "Downloading langchain_groq-0.3.8-py3-none-any.whl (16 kB)\n", + "Downloading langchain_community-0.3.29-py3-none-any.whl (2.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.5/2.5 MB\u001b[0m \u001b[31m32.8 MB/s\u001b[0m eta 
\u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading groq-0.31.1-py3-none-any.whl (134 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.9/134.9 kB\u001b[0m \u001b[31m8.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)\n", + "Downloading requests-2.32.5-py3-none-any.whl (64 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m64.7/64.7 kB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading marshmallow-3.26.1-py3-none-any.whl (50 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m50.9/50.9 kB\u001b[0m \u001b[31m3.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n", + "Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)\n", + "Installing collected packages: requests, mypy-extensions, marshmallow, typing-inspect, groq, dataclasses-json, langchain-groq, langchain-community\n", + " Attempting uninstall: requests\n", + " Found existing installation: requests 2.32.4\n", + " Uninstalling requests-2.32.4:\n", + " Successfully uninstalled requests-2.32.4\n", + "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "google-colab 1.0.0 requires requests==2.32.4, but you have requests 2.32.5 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mSuccessfully installed dataclasses-json-0.6.7 groq-0.31.1 langchain-community-0.3.29 langchain-groq-0.3.8 marshmallow-3.26.1 mypy-extensions-1.1.0 requests-2.32.5 typing-inspect-0.9.0\n", + "Requirement already satisfied: transformers in /usr/local/lib/python3.12/dist-packages (4.56.1)\n", + "Requirement already satisfied: datasets in /usr/local/lib/python3.12/dist-packages (4.0.0)\n", + "Requirement already satisfied: accelerate in /usr/local/lib/python3.12/dist-packages (1.10.1)\n", + "Requirement already satisfied: peft in /usr/local/lib/python3.12/dist-packages (0.17.1)\n", + "Collecting bitsandbytes\n", + " Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)\n", + "Requirement already satisfied: torch in /usr/local/lib/python3.12/dist-packages (2.8.0+cu126)\n", + "Collecting sacremoses\n", + " Downloading sacremoses-0.1.1-py3-none-any.whl.metadata (8.3 kB)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from transformers) (3.19.1)\n", + "Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.34.4)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2.0.2)\n", + "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (25.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from transformers) (6.0.2)\n", + "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2024.11.6)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from transformers) (2.32.5)\n", + 
"Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.22.0)\n", + "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.6.2)\n", + "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.12/dist-packages (from transformers) (4.67.1)\n", + "Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (18.1.0)\n", + "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.3.8)\n", + "Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (from datasets) (2.2.2)\n", + "Requirement already satisfied: xxhash in /usr/local/lib/python3.12/dist-packages (from datasets) (3.5.0)\n", + "Requirement already satisfied: multiprocess<0.70.17 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.70.16)\n", + "Requirement already satisfied: fsspec<=2025.3.0,>=2023.1.0 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2025.3.0)\n", + "Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from accelerate) (5.9.5)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.12/dist-packages (from torch) (4.15.0)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch) (75.2.0)\n", + "Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch) (1.13.3)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.12/dist-packages (from torch) (3.5)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch) (3.1.6)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)\n", + "Requirement 
already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.80)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch) (9.10.2.21)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.4.1)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in /usr/local/lib/python3.12/dist-packages (from torch) (11.3.0.4)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in /usr/local/lib/python3.12/dist-packages (from torch) (10.3.7.77)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in /usr/local/lib/python3.12/dist-packages (from torch) (11.7.1.2)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in /usr/local/lib/python3.12/dist-packages (from torch) (12.5.4.2)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch) (0.7.1)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /usr/local/lib/python3.12/dist-packages (from torch) (2.27.3)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.85)\n", + "Requirement already satisfied: nvidia-cufile-cu12==1.11.1.6 in /usr/local/lib/python3.12/dist-packages (from torch) (1.11.1.6)\n", + "Requirement already satisfied: triton==3.4.0 in /usr/local/lib/python3.12/dist-packages (from torch) (3.4.0)\n", + "Requirement already satisfied: click in /usr/local/lib/python3.12/dist-packages (from sacremoses) (8.2.1)\n", + "Requirement already satisfied: joblib 
in /usr/local/lib/python3.12/dist-packages (from sacremoses) (1.5.2)\n", + "Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (3.12.15)\n", + "Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (1.1.9)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.4.3)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2.5.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2025.8.3)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch) (3.0.2)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2.6.1)\n", + "Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.4.0)\n", + 
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (25.3.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.7.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (6.6.4)\n", + "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (0.3.2)\n", + "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.20.1)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)\n", + "Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m61.3/61.3 MB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading sacremoses-0.1.1-py3-none-any.whl (897 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m897.5/897.5 kB\u001b[0m \u001b[31m24.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: sacremoses, bitsandbytes\n", + "Successfully installed bitsandbytes-0.47.0 sacremoses-0.1.1\n", + "Requirement already satisfied: sentence-transformers in /usr/local/lib/python3.12/dist-packages (5.1.0)\n", + "Collecting faiss-cpu\n", + " Downloading faiss_cpu-1.12.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.1 kB)\n", + "Collecting chromadb\n", + " 
Downloading chromadb-1.0.20-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (2.0.2)\n", + "Collecting pickle-mixin\n", + " Downloading pickle-mixin-1.0.2.tar.gz (5.1 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: transformers<5.0.0,>=4.41.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (4.56.1)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (4.67.1)\n", + "Requirement already satisfied: torch>=1.11.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (2.8.0+cu126)\n", + "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (1.6.1)\n", + "Requirement already satisfied: scipy in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (1.16.1)\n", + "Requirement already satisfied: huggingface-hub>=0.20.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (0.34.4)\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (11.3.0)\n", + "Requirement already satisfied: typing_extensions>=4.5.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (4.15.0)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from faiss-cpu) (25.0)\n", + "Requirement already satisfied: build>=1.0.3 in /usr/local/lib/python3.12/dist-packages (from chromadb) (1.3.0)\n", + "Requirement already satisfied: pydantic>=1.9 in /usr/local/lib/python3.12/dist-packages (from chromadb) (2.11.7)\n", + "Collecting pybase64>=1.4.1 (from chromadb)\n", + " Downloading pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.metadata (8.7 kB)\n", + 
"Requirement already satisfied: uvicorn>=0.18.3 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.35.0)\n", + "Collecting posthog<6.0.0,>=2.4.0 (from chromadb)\n", + " Downloading posthog-5.4.0-py3-none-any.whl.metadata (5.7 kB)\n", + "Collecting onnxruntime>=1.14.1 (from chromadb)\n", + " Downloading onnxruntime-1.22.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.9 kB)\n", + "Requirement already satisfied: opentelemetry-api>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (1.36.0)\n", + "Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)\n", + " Downloading opentelemetry_exporter_otlp_proto_grpc-1.36.0-py3-none-any.whl.metadata (2.4 kB)\n", + "Requirement already satisfied: opentelemetry-sdk>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (1.36.0)\n", + "Requirement already satisfied: tokenizers>=0.13.2 in /usr/local/lib/python3.12/dist-packages (from chromadb) (0.22.0)\n", + "Collecting pypika>=0.48.9 (from chromadb)\n", + " Downloading PyPika-0.48.9.tar.gz (67 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m67.3/67.3 kB\u001b[0m \u001b[31m3.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", + " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", + " Preparing metadata (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: overrides>=7.3.1 in /usr/local/lib/python3.12/dist-packages (from chromadb) (7.7.0)\n", + "Requirement already satisfied: importlib-resources in /usr/local/lib/python3.12/dist-packages (from chromadb) (6.5.2)\n", + "Requirement already satisfied: grpcio>=1.58.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (1.74.0)\n", + "Collecting bcrypt>=4.0.1 (from chromadb)\n", + " Downloading bcrypt-4.3.0-cp39-abi3-manylinux_2_34_x86_64.whl.metadata (10 kB)\n", + "Requirement already satisfied: typer>=0.9.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (0.17.3)\n", + "Collecting kubernetes>=28.1.0 (from chromadb)\n", + " Downloading kubernetes-33.1.0-py2.py3-none-any.whl.metadata (1.7 kB)\n", + "Requirement already satisfied: tenacity>=8.2.3 in /usr/local/lib/python3.12/dist-packages (from chromadb) (8.5.0)\n", + "Requirement already satisfied: pyyaml>=6.0.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (6.0.2)\n", + "Collecting mmh3>=4.0.1 (from chromadb)\n", + " Downloading mmh3-5.2.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl.metadata (14 kB)\n", + "Requirement already satisfied: orjson>=3.9.12 in /usr/local/lib/python3.12/dist-packages (from chromadb) (3.11.3)\n", + "Requirement already satisfied: httpx>=0.27.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (0.28.1)\n", + "Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (13.9.4)\n", + "Requirement already satisfied: jsonschema>=4.19.0 in /usr/local/lib/python3.12/dist-packages (from chromadb) (4.25.1)\n", + "Requirement already satisfied: pyproject_hooks in /usr/local/lib/python3.12/dist-packages (from build>=1.0.3->chromadb) (1.2.0)\n", + "Requirement already satisfied: anyio in /usr/local/lib/python3.12/dist-packages (from httpx>=0.27.0->chromadb) (4.10.0)\n", + "Requirement already satisfied: certifi in 
/usr/local/lib/python3.12/dist-packages (from httpx>=0.27.0->chromadb) (2025.8.3)\n", + "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx>=0.27.0->chromadb) (1.0.9)\n", + "Requirement already satisfied: idna in /usr/local/lib/python3.12/dist-packages (from httpx>=0.27.0->chromadb) (3.10)\n", + "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx>=0.27.0->chromadb) (0.16.0)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (3.19.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (2025.3.0)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (2.32.5)\n", + "Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (1.1.9)\n", + "Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.19.0->chromadb) (25.3.0)\n", + "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.19.0->chromadb) (2025.4.1)\n", + "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.19.0->chromadb) (0.36.2)\n", + "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.19.0->chromadb) (0.27.1)\n", + "Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.12/dist-packages (from kubernetes>=28.1.0->chromadb) (1.17.0)\n", + "Requirement already satisfied: python-dateutil>=2.5.3 in /usr/local/lib/python3.12/dist-packages (from kubernetes>=28.1.0->chromadb) 
(2.9.0.post0)\n", + "Requirement already satisfied: google-auth>=1.0.1 in /usr/local/lib/python3.12/dist-packages (from kubernetes>=28.1.0->chromadb) (2.38.0)\n", + "Requirement already satisfied: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in /usr/local/lib/python3.12/dist-packages (from kubernetes>=28.1.0->chromadb) (1.8.0)\n", + "Requirement already satisfied: requests-oauthlib in /usr/local/lib/python3.12/dist-packages (from kubernetes>=28.1.0->chromadb) (2.0.0)\n", + "Requirement already satisfied: oauthlib>=3.2.2 in /usr/local/lib/python3.12/dist-packages (from kubernetes>=28.1.0->chromadb) (3.3.1)\n", + "Requirement already satisfied: urllib3>=1.24.2 in /usr/local/lib/python3.12/dist-packages (from kubernetes>=28.1.0->chromadb) (2.5.0)\n", + "Collecting durationpy>=0.7 (from kubernetes>=28.1.0->chromadb)\n", + " Downloading durationpy-0.10-py3-none-any.whl.metadata (340 bytes)\n", + "Collecting coloredlogs (from onnxruntime>=1.14.1->chromadb)\n", + " Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)\n", + "Requirement already satisfied: flatbuffers in /usr/local/lib/python3.12/dist-packages (from onnxruntime>=1.14.1->chromadb) (25.2.10)\n", + "Requirement already satisfied: protobuf in /usr/local/lib/python3.12/dist-packages (from onnxruntime>=1.14.1->chromadb) (5.29.5)\n", + "Requirement already satisfied: sympy in /usr/local/lib/python3.12/dist-packages (from onnxruntime>=1.14.1->chromadb) (1.13.3)\n", + "Requirement already satisfied: importlib-metadata<8.8.0,>=6.0 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-api>=1.2.0->chromadb) (8.7.0)\n", + "Requirement already satisfied: googleapis-common-protos~=1.57 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.70.0)\n", + "Collecting opentelemetry-exporter-otlp-proto-common==1.36.0 (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb)\n", + " Downloading 
opentelemetry_exporter_otlp_proto_common-1.36.0-py3-none-any.whl.metadata (1.8 kB)\n", + "Collecting opentelemetry-proto==1.36.0 (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb)\n", + " Downloading opentelemetry_proto-1.36.0-py3-none-any.whl.metadata (2.3 kB)\n", + "Requirement already satisfied: opentelemetry-semantic-conventions==0.57b0 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-sdk>=1.2.0->chromadb) (0.57b0)\n", + "Collecting backoff>=1.10.0 (from posthog<6.0.0,>=2.4.0->chromadb)\n", + " Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)\n", + "Requirement already satisfied: distro>=1.5.0 in /usr/local/lib/python3.12/dist-packages (from posthog<6.0.0,>=2.4.0->chromadb) (1.9.0)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic>=1.9->chromadb) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.12/dist-packages (from pydantic>=1.9->chromadb) (2.33.2)\n", + "Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from pydantic>=1.9->chromadb) (0.4.1)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->chromadb) (4.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->chromadb) (2.19.2)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (75.2.0)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (3.5)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (3.1.6)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from 
torch>=1.11.0->sentence-transformers) (12.6.77)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (12.6.77)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (12.6.80)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (9.10.2.21)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (12.6.4.1)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (11.3.0.4)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (10.3.7.77)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (11.7.1.2)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (12.5.4.2)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (0.7.1)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (2.27.3)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (12.6.77)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in /usr/local/lib/python3.12/dist-packages (from 
torch>=1.11.0->sentence-transformers) (12.6.85)\n", + "Requirement already satisfied: nvidia-cufile-cu12==1.11.1.6 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (1.11.1.6)\n", + "Requirement already satisfied: triton==3.4.0 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (3.4.0)\n", + "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (2024.11.6)\n", + "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (0.6.2)\n", + "Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.12/dist-packages (from typer>=0.9.0->chromadb) (8.2.1)\n", + "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer>=0.9.0->chromadb) (1.5.4)\n", + "Collecting httptools>=0.6.3 (from uvicorn[standard]>=0.18.3->chromadb)\n", + " Downloading httptools-0.6.4-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)\n", + "Requirement already satisfied: python-dotenv>=0.13 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.18.3->chromadb) (1.1.1)\n", + "Collecting uvloop>=0.15.1 (from uvicorn[standard]>=0.18.3->chromadb)\n", + " Downloading uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)\n", + "Collecting watchfiles>=0.13 (from uvicorn[standard]>=0.18.3->chromadb)\n", + " Downloading watchfiles-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)\n", + "Requirement already satisfied: websockets>=10.4 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.18.3->chromadb) (15.0.1)\n", + "Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from 
scikit-learn->sentence-transformers) (1.5.2)\n", + "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn->sentence-transformers) (3.6.0)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (5.5.2)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.12/dist-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (0.4.2)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.12/dist-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (4.9.1)\n", + "Requirement already satisfied: zipp>=3.20 in /usr/local/lib/python3.12/dist-packages (from importlib-metadata<8.8.0,>=6.0->opentelemetry-api>=1.2.0->chromadb) (3.23.0)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->chromadb) (0.1.2)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (3.4.3)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy->onnxruntime>=1.14.1->chromadb) (1.3.0)\n", + "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.12/dist-packages (from anyio->httpx>=0.27.0->chromadb) (1.3.1)\n", + "Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime>=1.14.1->chromadb)\n", + " Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch>=1.11.0->sentence-transformers) (3.0.2)\n", + "Requirement already satisfied: pyasn1<0.7.0,>=0.6.1 in /usr/local/lib/python3.12/dist-packages (from pyasn1-modules>=0.2.1->google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) 
(0.6.1)\n", + "Downloading faiss_cpu-1.12.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (31.4 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m31.4/31.4 MB\u001b[0m \u001b[31m34.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading chromadb-1.0.20-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.8 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m19.8/19.8 MB\u001b[0m \u001b[31m95.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading bcrypt-4.3.0-cp39-abi3-manylinux_2_34_x86_64.whl (284 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m284.2/284.2 kB\u001b[0m \u001b[31m15.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading kubernetes-33.1.0-py2.py3-none-any.whl (1.9 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.9/1.9 MB\u001b[0m \u001b[31m72.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading mmh3-5.2.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (103 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m103.3/103.3 kB\u001b[0m \u001b[31m8.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading onnxruntime-1.22.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m16.5/16.5 MB\u001b[0m \u001b[31m96.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading opentelemetry_exporter_otlp_proto_grpc-1.36.0-py3-none-any.whl (18 kB)\n", + "Downloading opentelemetry_exporter_otlp_proto_common-1.36.0-py3-none-any.whl (18 kB)\n", + "Downloading opentelemetry_proto-1.36.0-py3-none-any.whl (72 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m 
\u001b[32m72.5/72.5 kB\u001b[0m \u001b[31m4.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading posthog-5.4.0-py3-none-any.whl (105 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m105.4/105.4 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl (71 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.6/71.6 kB\u001b[0m \u001b[31m5.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading backoff-2.2.1-py3-none-any.whl (15 kB)\n", + "Downloading durationpy-0.10-py3-none-any.whl (3.9 kB)\n", + "Downloading httptools-0.6.4-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (510 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m510.8/510.8 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.7/4.7 MB\u001b[0m \u001b[31m90.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading watchfiles-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (452 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m452.2/452.2 kB\u001b[0m \u001b[31m28.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m2.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)\n", + "\u001b[2K 
\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.8/86.8 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hBuilding wheels for collected packages: pickle-mixin, pypika\n", + " Building wheel for pickle-mixin (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for pickle-mixin: filename=pickle_mixin-1.0.2-py3-none-any.whl size=5988 sha256=9d960723a78eba898f7b47ed139ab9f9393e08dd9998496beccea82e1644c414\n", + " Stored in directory: /root/.cache/pip/wheels/69/e2/5c/da8f96a08c63469bc8b10e206cd4c78e8886d8acb8699f84c2\n", + " Building wheel for pypika (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for pypika: filename=pypika-0.48.9-py2.py3-none-any.whl size=53803 sha256=612e5715885cb7d90c716c652e7eb6b47f9d77de2469e62ed2717a84fefa0217\n", + " Stored in directory: /root/.cache/pip/wheels/d5/3d/69/8d68d249cd3de2584f226e27fd431d6344f7d70fd856ebd01b\n", + "Successfully built pickle-mixin pypika\n", + "Installing collected packages: pypika, pickle-mixin, durationpy, uvloop, pybase64, opentelemetry-proto, mmh3, humanfriendly, httptools, faiss-cpu, bcrypt, backoff, watchfiles, posthog, opentelemetry-exporter-otlp-proto-common, coloredlogs, onnxruntime, kubernetes, opentelemetry-exporter-otlp-proto-grpc, chromadb\n", + "Successfully installed backoff-2.2.1 bcrypt-4.3.0 chromadb-1.0.20 coloredlogs-15.0.1 durationpy-0.10 faiss-cpu-1.12.0 httptools-0.6.4 humanfriendly-10.0 kubernetes-33.1.0 mmh3-5.2.0 onnxruntime-1.22.1 opentelemetry-exporter-otlp-proto-common-1.36.0 opentelemetry-exporter-otlp-proto-grpc-1.36.0 opentelemetry-proto-1.36.0 pickle-mixin-1.0.2 posthog-5.4.0 pybase64-1.4.2 pypika-0.48.9 uvloop-0.21.0 watchfiles-1.1.0\n", + "Collecting pdf2image\n", + " Downloading pdf2image-1.17.0-py3-none-any.whl.metadata (6.2 kB)\n", + "Collecting pymupdf\n", + " Downloading pymupdf-1.26.4-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (3.4 kB)\n", + "Collecting pdfminer.six\n", + " 
Downloading pdfminer_six-20250506-py3-none-any.whl.metadata (4.2 kB)\n", + "Collecting PyPDF2\n", + " Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)\n", + "Collecting pdfplumber\n", + " Downloading pdfplumber-0.11.7-py3-none-any.whl.metadata (42 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m42.8/42.8 kB\u001b[0m \u001b[31m1.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting python-docx\n", + " Downloading python_docx-1.2.0-py3-none-any.whl.metadata (2.0 kB)\n", + "Requirement already satisfied: pillow in /usr/local/lib/python3.12/dist-packages (from pdf2image) (11.3.0)\n", + "Requirement already satisfied: charset-normalizer>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from pdfminer.six) (3.4.3)\n", + "Requirement already satisfied: cryptography>=36.0.0 in /usr/local/lib/python3.12/dist-packages (from pdfminer.six) (43.0.3)\n", + "Collecting pypdfium2>=4.18.0 (from pdfplumber)\n", + " Downloading pypdfium2-4.30.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (48 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.5/48.5 kB\u001b[0m \u001b[31m2.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: lxml>=3.1.0 in /usr/local/lib/python3.12/dist-packages (from python-docx) (5.4.0)\n", + "Requirement already satisfied: typing_extensions>=4.9.0 in /usr/local/lib/python3.12/dist-packages (from python-docx) (4.15.0)\n", + "Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.12/dist-packages (from cryptography>=36.0.0->pdfminer.six) (1.17.1)\n", + "Requirement already satisfied: pycparser in /usr/local/lib/python3.12/dist-packages (from cffi>=1.12->cryptography>=36.0.0->pdfminer.six) (2.22)\n", + "Downloading pdf2image-1.17.0-py3-none-any.whl (11 kB)\n", + "Downloading pymupdf-1.26.4-cp39-abi3-manylinux_2_28_x86_64.whl (24.1 MB)\n", + "\u001b[2K 
\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m24.1/24.1 MB\u001b[0m \u001b[31m76.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pdfminer_six-20250506-py3-none-any.whl (5.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5.6/5.6 MB\u001b[0m \u001b[31m101.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pypdf2-3.0.1-py3-none-any.whl (232 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m232.6/232.6 kB\u001b[0m \u001b[31m17.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pdfplumber-0.11.7-py3-none-any.whl (60 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m60.0/60.0 kB\u001b[0m \u001b[31m4.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading python_docx-1.2.0-py3-none-any.whl (252 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m253.0/253.0 kB\u001b[0m \u001b[31m17.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pypdfium2-4.30.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.8/2.8 MB\u001b[0m \u001b[31m86.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: python-docx, pypdfium2, PyPDF2, pymupdf, pdf2image, pdfminer.six, pdfplumber\n", + "Successfully installed PyPDF2-3.0.1 pdf2image-1.17.0 pdfminer.six-20250506 pdfplumber-0.11.7 pymupdf-1.26.4 pypdfium2-4.30.0 python-docx-1.2.0\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.12/dist-packages (3.10.0)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.12/dist-packages (3.5)\n", + "Requirement already satisfied: plotly in /usr/local/lib/python3.12/dist-packages (5.24.1)\n", + "Collecting 
kaleido==0.2.1\n", + " Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl.metadata (15 kB)\n", + "Collecting pyvis\n", + " Downloading pyvis-0.3.2-py3-none-any.whl.metadata (1.7 kB)\n", + "Requirement already satisfied: graphviz in /usr/local/lib/python3.12/dist-packages (0.21)\n", + "Collecting trimesh\n", + " Downloading trimesh-4.8.1-py3-none-any.whl.metadata (18 kB)\n", + "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (1.3.3)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (0.12.1)\n", + "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (4.59.2)\n", + "Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (1.4.9)\n", + "Requirement already satisfied: numpy>=1.23 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (2.0.2)\n", + "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (25.0)\n", + "Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (11.3.0)\n", + "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (3.2.3)\n", + "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.12/dist-packages (from matplotlib) (2.9.0.post0)\n", + "Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.12/dist-packages (from plotly) (8.5.0)\n", + "Requirement already satisfied: ipython>=5.3.0 in /usr/local/lib/python3.12/dist-packages (from pyvis) (7.34.0)\n", + "Requirement already satisfied: jinja2>=2.9.6 in /usr/local/lib/python3.12/dist-packages (from pyvis) (3.1.6)\n", + "Requirement already satisfied: jsonpickle>=1.4.1 in /usr/local/lib/python3.12/dist-packages (from pyvis) (4.1.1)\n", + "Requirement already 
satisfied: setuptools>=18.5 in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (75.2.0)\n", + "Collecting jedi>=0.16 (from ipython>=5.3.0->pyvis)\n", + " Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)\n", + "Requirement already satisfied: decorator in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (4.4.2)\n", + "Requirement already satisfied: pickleshare in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (0.7.5)\n", + "Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (5.7.1)\n", + "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (3.0.52)\n", + "Requirement already satisfied: pygments in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (2.19.2)\n", + "Requirement already satisfied: backcall in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (0.2.0)\n", + "Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (0.1.7)\n", + "Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.12/dist-packages (from ipython>=5.3.0->pyvis) (4.9.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2>=2.9.6->pyvis) (3.0.2)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.7->matplotlib) (1.17.0)\n", + "Requirement already satisfied: parso<0.9.0,>=0.8.4 in /usr/local/lib/python3.12/dist-packages (from jedi>=0.16->ipython>=5.3.0->pyvis) (0.8.5)\n", + "Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.12/dist-packages (from pexpect>4.3->ipython>=5.3.0->pyvis) (0.7.0)\n", + "Requirement already satisfied: wcwidth in /usr/local/lib/python3.12/dist-packages (from 
prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=5.3.0->pyvis) (0.2.13)\n", + "Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.9/79.9 MB\u001b[0m \u001b[31m10.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pyvis-0.3.2-py3-none-any.whl (756 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m756.0/756.0 kB\u001b[0m \u001b[31m42.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading trimesh-4.8.1-py3-none-any.whl (728 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m728.5/728.5 kB\u001b[0m \u001b[31m44.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading jedi-0.19.2-py2.py3-none-any.whl (1.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m63.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: kaleido, trimesh, jedi, pyvis\n", + "Successfully installed jedi-0.19.2 kaleido-0.2.1 pyvis-0.3.2 trimesh-4.8.1\n", + "Collecting langdetect\n", + " Downloading langdetect-1.0.9.tar.gz (981 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m981.5/981.5 kB\u001b[0m \u001b[31m14.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting langid\n", + " Downloading langid-1.1.6.tar.gz (1.9 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.9/1.9 MB\u001b[0m \u001b[31m58.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: spacy in /usr/local/lib/python3.12/dist-packages (3.8.7)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.12/dist-packages (from langdetect) (1.17.0)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from langid) (2.0.2)\n", + "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.12/dist-packages (from spacy) (3.0.12)\n", + "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (1.0.5)\n", + "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (1.0.13)\n", + "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.12/dist-packages (from spacy) (2.0.11)\n", + "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.12/dist-packages (from spacy) (3.0.10)\n", + "Requirement already satisfied: thinc<8.4.0,>=8.3.4 in /usr/local/lib/python3.12/dist-packages (from spacy) (8.3.6)\n", + "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.12/dist-packages (from spacy) (1.1.3)\n", + "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.12/dist-packages (from spacy) (2.5.1)\n", + "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.12/dist-packages (from spacy) (2.0.10)\n", + "Requirement already satisfied: weasel<0.5.0,>=0.1.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (0.4.1)\n", + "Requirement already satisfied: typer<1.0.0,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (0.17.3)\n", + "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (4.67.1)\n", + "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (2.32.5)\n", + "Requirement already 
satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.12/dist-packages (from spacy) (2.11.7)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from spacy) (3.1.6)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from spacy) (75.2.0)\n", + "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (25.0)\n", + "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.12/dist-packages (from spacy) (3.5.0)\n", + "Requirement already satisfied: language-data>=1.2 in /usr/local/lib/python3.12/dist-packages (from langcodes<4.0.0,>=3.2.0->spacy) (1.3.0)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.12/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (2.33.2)\n", + "Requirement already satisfied: typing-extensions>=4.12.2 in /usr/local/lib/python3.12/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (4.15.0)\n", + "Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.4.1)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.4.3)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2.5.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2025.8.3)\n", + "Requirement already 
satisfied: blis<1.4.0,>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from thinc<8.4.0,>=8.3.4->spacy) (1.3.0)\n", + "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.12/dist-packages (from thinc<8.4.0,>=8.3.4->spacy) (0.1.5)\n", + "Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.3.0->spacy) (8.2.1)\n", + "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.3.0->spacy) (1.5.4)\n", + "Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.3.0->spacy) (13.9.4)\n", + "Requirement already satisfied: cloudpathlib<1.0.0,>=0.7.0 in /usr/local/lib/python3.12/dist-packages (from weasel<0.5.0,>=0.1.0->spacy) (0.22.0)\n", + "Requirement already satisfied: smart-open<8.0.0,>=5.2.1 in /usr/local/lib/python3.12/dist-packages (from weasel<0.5.0,>=0.1.0->spacy) (7.3.0.post1)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->spacy) (3.0.2)\n", + "Requirement already satisfied: marisa-trie>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy) (1.3.1)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (4.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (2.19.2)\n", + "Requirement already satisfied: wrapt in /usr/local/lib/python3.12/dist-packages (from smart-open<8.0.0,>=5.2.1->weasel<0.5.0,>=0.1.0->spacy) (1.17.3)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (0.1.2)\n", + "Building wheels for collected packages: langdetect, langid\n", 
+ " Building wheel for langdetect (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for langdetect: filename=langdetect-1.0.9-py3-none-any.whl size=993223 sha256=356c8880640a3ed6b372e8d10cb8b8c8a76dbaa375d7e41224963771c560c5b8\n", + " Stored in directory: /root/.cache/pip/wheels/c1/67/88/e844b5b022812e15a52e4eaa38a1e709e99f06f6639d7e3ba7\n", + " Building wheel for langid (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for langid: filename=langid-1.1.6-py3-none-any.whl size=1941171 sha256=2e4205c197a94f3f833c9d11210f102a3f3b9a4cede31c516ef0177fafb94100\n", + " Stored in directory: /root/.cache/pip/wheels/3c/bc/9d/266e27289b9019680d65d9b608c37bff1eff565b001c977ec5\n", + "Successfully built langdetect langid\n", + "Installing collected packages: langid, langdetect\n", + "Successfully installed langdetect-1.0.9 langid-1.1.6\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (2.32.5)\n", + "Collecting requests-cache\n", + " Downloading requests_cache-1.2.1-py3-none-any.whl.metadata (9.9 kB)\n", + "Collecting redis\n", + " Downloading redis-6.4.0-py3-none-any.whl.metadata (10 kB)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests) (3.4.3)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests) (2.5.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests) (2025.8.3)\n", + "Requirement already satisfied: attrs>=21.2 in /usr/local/lib/python3.12/dist-packages (from requests-cache) (25.3.0)\n", + "Collecting cattrs>=22.2 (from requests-cache)\n", + " Downloading cattrs-25.2.0-py3-none-any.whl.metadata (8.4 kB)\n", + "Requirement already satisfied: platformdirs>=2.5 in 
/usr/local/lib/python3.12/dist-packages (from requests-cache) (4.4.0)\n", + "Collecting url-normalize>=1.4 (from requests-cache)\n", + " Downloading url_normalize-2.2.1-py3-none-any.whl.metadata (5.6 kB)\n", + "Requirement already satisfied: typing-extensions>=4.12.2 in /usr/local/lib/python3.12/dist-packages (from cattrs>=22.2->requests-cache) (4.15.0)\n", + "Downloading requests_cache-1.2.1-py3-none-any.whl (61 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m61.4/61.4 kB\u001b[0m \u001b[31m2.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading redis-6.4.0-py3-none-any.whl (279 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m279.8/279.8 kB\u001b[0m \u001b[31m8.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading cattrs-25.2.0-py3-none-any.whl (70 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m70.0/70.0 kB\u001b[0m \u001b[31m5.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading url_normalize-2.2.1-py3-none-any.whl (14 kB)\n", + "Installing collected packages: url-normalize, redis, cattrs, requests-cache\n", + "Successfully installed cattrs-25.2.0 redis-6.4.0 requests-cache-1.2.1 url-normalize-2.2.1\n", + "Collecting it-core-news-sm==3.8.0\n", + " Downloading https://github.com/explosion/spacy-models/releases/download/it_core_news_sm-3.8.0/it_core_news_sm-3.8.0-py3-none-any.whl (13.0 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.0/13.0 MB\u001b[0m \u001b[31m96.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: it-core-news-sm\n", + "Successfully installed it-core-news-sm-3.8.0\n", + "\u001b[38;5;2m✔ Download and installation successful\u001b[0m\n", + "You can now load the package via spacy.load('it_core_news_sm')\n", + "\u001b[38;5;3m⚠ Restart to reload 
dependencies\u001b[0m\n", + "If you are in a Jupyter or Colab notebook, you may need to restart Python in\n", + "order to load all the package's dependencies. You can do this by selecting the\n", + "'Restart kernel' or 'Restart runtime' option.\n", + "Requirement already satisfied: langdetect in /usr/local/lib/python3.12/dist-packages (1.0.9)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.12/dist-packages (from langdetect) (1.17.0)\n", + "Requirement already satisfied: pdfplumber in /usr/local/lib/python3.12/dist-packages (0.11.7)\n", + "Requirement already satisfied: pdfminer.six==20250506 in /usr/local/lib/python3.12/dist-packages (from pdfplumber) (20250506)\n", + "Requirement already satisfied: Pillow>=9.1 in /usr/local/lib/python3.12/dist-packages (from pdfplumber) (11.3.0)\n", + "Requirement already satisfied: pypdfium2>=4.18.0 in /usr/local/lib/python3.12/dist-packages (from pdfplumber) (4.30.0)\n", + "Requirement already satisfied: charset-normalizer>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from pdfminer.six==20250506->pdfplumber) (3.4.3)\n", + "Requirement already satisfied: cryptography>=36.0.0 in /usr/local/lib/python3.12/dist-packages (from pdfminer.six==20250506->pdfplumber) (43.0.3)\n", + "Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.12/dist-packages (from cryptography>=36.0.0->pdfminer.six==20250506->pdfplumber) (1.17.1)\n", + "Requirement already satisfied: pycparser in /usr/local/lib/python3.12/dist-packages (from cffi>=1.12->cryptography>=36.0.0->pdfminer.six==20250506->pdfplumber) (2.22)\n" + ] + } + ], + "source": [ + "!pip install langchain langchain-groq langchain-community groq python-dotenv openai\n", + "!pip install transformers datasets accelerate peft bitsandbytes torch sacremoses\n", + "!pip install sentence-transformers faiss-cpu chromadb numpy pickle-mixin\n", + "!pip install pdf2image pymupdf pdfminer.six PyPDF2 pdfplumber python-docx\n", + "!pip install matplotlib networkx plotly 
kaleido==0.2.1 pyvis graphviz trimesh\n",
+        "!pip install langdetect langid spacy\n",
+        "!pip install requests requests-cache redis\n",
+        "!python -m spacy download it_core_news_sm\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "mjkufWWSRASN"
+      },
+      "source": [
+        "The first step is to import the libraries that MarCognity needs to run."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "l2D0GD4Bg7Lf"
+      },
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "import re\n",
+        "import math\n",
+        "import time\n",
+        "import uuid\n",
+        "import datetime\n",
+        "import logging\n",
+        "import pickle\n",
+        "import numpy as np\n",
+        "import fitz  # PyMuPDF\n",
+        "import docx\n",
+        "import matplotlib.pyplot as plt\n",
+        "import networkx as nx\n",
+        "import requests\n",
+        "import xml.etree.ElementTree as ET\n",
+        "import asyncio\n",
+        "import aiohttp\n",
+        "import torch\n",
+        "from dotenv import load_dotenv\n",
+        "from sentence_transformers import SentenceTransformer, CrossEncoder, models\n",
+        "from transformers import pipeline\n",
+        "from sklearn.ensemble import RandomForestRegressor\n",
+        "from langchain_groq import ChatGroq\n",
+        "from langchain.prompts import PromptTemplate\n",
+        "import plotly.graph_objects as go\n",
+        "from IPython.display import display\n",
+        "from google.colab import files\n",
+        "import faiss\n",
+        "import pdfplumber\n",
+        "from langdetect import detect\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "tH6PLd2OBxfn"
+      },
+      "source": [
+        "### Project Entry Point – Initial Setup\n",
+        "\n",
+        "This section initializes the logging system, handles errors, and loads the LLaMA 4 model via ChatGroq. 
\n", + "It serves as the core of the configuration.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6TCIl2ZqhIQ3" + }, + "outputs": [], + "source": [ + "# Setting up logging to monitor system behavior\n", + "'''Sets the format and logging level to track events, errors, and important operations'''\n", + "logging.basicConfig(level=logging.INFO, format=\"%(asctime)s - %(levelname)s - %(message)s\")\n", + "\n", + "# Decorator for error handling\n", + "# This function catches any exceptions raised by other functions\n", + "\n", + "def gestisci_errori(func):\n", + " def wrapper(*args, **kwargs):\n", + " try:\n", + " return func(*args, **kwargs)\n", + " except Exception as e:\n", + " logging.error(f\"Error in {func.__name__}: {e}\")\n", + " return None\n", + " return wrapper\n", + "\n", + "\n", + "\n", + "# Carica le variabili d'ambiente dal file .env\n", + "load_dotenv()\n", + "\n", + "# Recupera la chiave API in modo sicuro\n", + "api_key = os.getenv(\"GROQ_API_KEY\")\n", + "\n", + "# Inizializza il modello Groq\n", + "llm = ChatGroq(api_key=api_key)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZQ92uciLDKnd" + }, + "source": [ + "### Centralized Prompt\n", + "\n", + "In this section, we define the **PromptTemplate**, which represents the core instruction for the LLM. 
The prompt includes:\n", + "\n", + "- The problem to be analyzed \n", + "- The required explanation level \n", + "- The language of the response \n", + "- The scientific sources to be consulted \n", + "- The phases of analysis, visualization, and optimization\n", + "\n", + "The result is a detailed, reasoned, and visualized response — capable of critically reflecting on the content it generates.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-0VlHOByhhSo" + }, + "outputs": [], + "source": [ + "# === Prompt template for LLM ===\n", + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "prompt_template = PromptTemplate.from_template(\"\"\"\n", + "You are an intelligent and multidisciplinary academic tutor. Respond to the problem **{problem}**, and reply in **{target_language}**.\n", + "Explain the concept: **\"{topic}\"** with academic rigor and multidisciplinary analysis.\n", + "Do not merely describe sources: build an autonomous, critical, and original discussion.\n", + "\n", + "The user has selected: **{chart_choice}**\n", + "\n", + "Context: Required level: **{level}** Concept: **{concept}** Topic: **{topic}** Subject: **{subject}**\n", + "The response must be long and in-depth.\n", + "\n", + "Analyze the following question or text: **{problem}**\n", + "\n", + "**Relevant scientific articles**:\n", + "- arXiv: **{arxiv_search}**\n", + "- PubMed: **{pubmed_search}**\n", + "- OpenAlex: **{openalex_search}**\n", + "\n", + "**Phase 1: Problem Analysis** – Explain the main concepts related to the topic.\n", + "**Phase 2: Theoretical and/or Mathematical Development** – Use formulas, models, or theories to explain and solve.\n", + "- Provide a critical comparison between existing theories, including advantages, limitations, and scientific ambiguities.\n", + "**Phase 3: Visualization** – Integrate a 
visual representation consistent with the analyzed concept, transforming the graphic into a didactic interpretation tool.\n", + "- If the text contains numerical data or measurable variables, **generate a real chart** using the function `generate_universal_chart(text)`.\n", + "- If data are not explicitly present, **synthesize plausible values** or use a **visual fallback** consistent with the problem type.\n", + "- **Describe the chart in the context of the explanation**:\n", + " - Explain the meaning of the axes.\n", + " - Interpret the type of trend shown (e.g., exponential growth, Gaussian distribution).\n", + " - Illustrate how the chart contributes to understanding the phenomenon.\n", + "- Avoid technical placeholders like `generate_universal_chart(text)` or “[Insert chart]”.\n", + "- Include **an automatic caption** describing the scientific intent of the visualization.\n", + "- If the topic is theoretical, abstract, or relational, generate **conceptual diagrams** showing interconnections, hierarchies, logical flows, or dynamics.\n", + "- In physical, chemical, or dynamic domains, suggest **virtual simulations**, reproducible experiments, or interpretable animated models.\n", + "- The visualization must actively contribute to the discussion, offering the reader cognitive and interpretive support that reinforces the textual explanation.\n", + "\n", + "**Phase 4: Tone Optimization** – Adapt the content to the selected level with clarity.\n", + "**Phase 5: Summary** – Summarize key points, practical applications, and useful references.\n", + "**Phase 6: Future Implications** – Describe potential applications, methodological limitations, and emerging research directions.\n", + "\n", + "Respond by providing an explanation suited to the indicated level:\n", + "- **Basic**: Simplified explanation with intuitive examples.\n", + "- **Advanced**: In-depth discussion with technical and mathematical details.\n", + "- **Expert**: Academic analysis with rigorous 
scientific formulations.\n",
+        "- If you detect errors in the question, correct them before responding.\n",
+        "- Use **rigorous academic terminology**, avoiding generic responses.\n",
+        "- If the question is ambiguous, clarify it before responding.\n",
+        "- Always provide scientific references to validate claims.\n",
+        "- Provide an example of the topic **{topic}**.\n",
+        "- Include at least **5 scientific references**, preferably peer-reviewed, and **direct citations from articles** when possible.\n",
+        "*Ethical note*: This content involves sensitive concepts and should be interpreted in a scientific, educational, and non-normative context.\n",
+        "\n",
+        "Analyze the following paper and provide a detailed scientific review:\n",
+        "**{paper_text}**\n",
+        "\n",
+        "Evaluate the quality of the methodology and verify citation consistency.\n",
+        "If the concept is particularly complex, expand the discussion into multiple subsections and suggest future research questions.\n",
+        "Suggest improvements for the paper and indicate more recent sources.\n",
+        "Provide an **extended** response, divided into well-defined sections, with at least 1500 words. Use technical language, quantitative examples, and specific bibliographic references.\n",
+        "Write the response directly in **{target_language}**.\n",
+        "\"\"\")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "YGja2kWiBtL1"
+      },
+      "source": [
+        "### Secondary Prompts for Metacognition and Agency\n",
+        "\n",
+        "These functions enable MarCognity-AI to reflect on its own outputs, generate hypotheses, make operational decisions, and plan scientific investigations. 
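Some of these functions retry the LLM call with exponential backoff when a transient failure occurs. Stripped of the LLM specifics, the pattern looks like this (a sketch with a stand-in flaky function, not the project's actual client call):

```python
import logging
import time

def call_with_backoff(fn, max_retries=3):
    # Retry fn(), waiting 2**attempt + 1 seconds (capped at 10) between tries.
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            wait = min(2 ** attempt + 1, 10)
            logging.warning(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait}s...")
            time.sleep(wait)
    return None  # all retries exhausted

calls = {"n": 0}

def flaky():
    # Stand-in for an LLM call: fails once, then succeeds.
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient network error")
    return "ok"

print(call_with_backoff(flaky))  # ok (succeeds on the second attempt)
```

The capped wait keeps worst-case latency bounded while still spacing out retries against a rate-limited endpoint.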
\n", + "Each function is guided by a dedicated prompt.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Y1X27mUxhnkP" + }, + "outputs": [], + "source": [ + "# === Metacognitive Functions ===\n", + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# These functions allow the system to reflect on its own responses,\n", + "# simulating metacognitive behavior. The goal is to improve the quality,\n", + "# consistency, and relevance of generated answers.\n", + "\n", + "# Explains the reasoning behind a generated response\n", + "def explain_reasoning(prompt, response, max_retries=3):\n", + " \"\"\"\n", + " Analyzes the generated response and explains the LLM's logical reasoning.\n", + " Includes retry in case of network error or unreachable endpoint.\n", + " \"\"\"\n", + " # Builds a metacognitive prompt to analyze the response\n", + " reasoning_prompt = f\"\"\"\n", + "You generated the following response:\n", + "\\\"{response.strip()}\\\"\n", + "\n", + "Analyze and describe:\n", + "- What concepts you used to formulate it.\n", + "- Which parts of the prompt you relied on.\n", + "- What is the logical structure of your reasoning.\n", + "- Any implicit assumptions you made.\n", + "- Whether the response aligns with the requested level.\n", + "\n", + "Original prompt:\n", + "\\\"{prompt.strip()}\\\"\n", + "\n", + "Reply clearly, technically, and metacognitively.\n", + "\"\"\"\n", + "\n", + " for attempt in range(max_retries):\n", + " try:\n", + " return llm.invoke(reasoning_prompt.strip())\n", + " except Exception as e:\n", + " wait = min(2 ** attempt + 1, 10)\n", + " logging.warning(f\"Attempt {attempt+1} failed: {e}. 
Retrying in {wait}s...\")\n", + " time.sleep(wait)\n", + "\n", + " logging.error(\"Persistent error in the metacognition module.\")\n", + " return \"Metacognition currently unavailable. Please try again shortly.\"\n", + "\n", + "\n", + "# Function to decide the operational action to perform based on input and goal\n", + "def decide_action(user_input, identified_goal):\n", + " prompt = f\"\"\"\n", + "You received the following request:\n", + "\\\"{user_input}\\\"\n", + "\n", + "Identified goal: \\\"{identified_goal}\\\"\n", + "\n", + "Determine the best action to perform from the following:\n", + "- Scientific research\n", + "- Chart generation\n", + "- **Metacognitive chart**\n", + "- Paper review\n", + "- Question reformulation\n", + "- Content translation\n", + "- Response saving\n", + "\n", + "The requested chart type may be:\n", + "- interactive\n", + "- metacognitive\n", + "- conceptual visualization\n", + "- experimental diagram\n", + "\n", + "Return a **single action** in the form of a **precise operational command**.\n", + "Example: \"Metacognitive chart\"\n", + "\"\"\"\n", + " try:\n", + " response = llm.invoke(prompt.strip())\n", + " action = getattr(response, \"content\", str(response)).strip()\n", + " return action\n", + " except Exception as e:\n", + " logging.error(f\"[decide_action] Error during decision generation: {e}\")\n", + " return \"Error in action calculation\"\n", + "\n", + "# Function to generate a synthetic operational goal from user input\n", + "def generate_goal_from_input(user_input):\n", + " \"\"\"\n", + " Analyzes the user's intent and generates a coherent operational goal.\n", + " \"\"\"\n", + " prompt = f\"\"\"\n", + "Analyze the following request:\n", + "\\\"{user_input.strip()}\\\"\n", + "\n", + "Generate a synthetic, clear, and coherent operational goal.\n", + "For example:\n", + "- Explain concept X\n", + "- Analyze phenomenon Y\n", + "- Visualize process Z\n", + "- Translate and summarize scientific content\n", + "\n", + 
"Respond with a brief and technical sentence.\n", + "\"\"\"\n", + "\n", + "# Function to provide technical and constructive feedback on a generated response\n", + "def auto_feedback_response(question, response, level):\n", + " feedback_prompt = f\"\"\"\n", + "You generated the following response:\n", + "\\\"{response.strip()}\\\"\n", + "\n", + "Original question:\n", + "\\\"{question.strip()}\\\"\n", + "\n", + "Evaluate the response:\n", + "- Is it consistent with the question?\n", + "- Is it appropriate for the '{level}' level?\n", + "- Does it contain any implicit assumptions?\n", + "- How would you improve the content?\n", + "\n", + "Provide technical and constructive feedback.\n", + "\"\"\"\n", + " return llm.invoke(feedback_prompt.strip())\n", + "\n", + "\n", + "# Function to improve a response while preserving its content but enhancing quality and clarity\n", + "def improve_response(question, response, level):\n", + " improvement_prompt = f\"\"\"\n", + "You produced the following response:\n", + "\\\"{response.strip()}\\\"\n", + "\n", + "Question:\n", + "\\\"{question.strip()}\\\"\n", + "\n", + "Requested level: {level}\n", + "\n", + "Improve the response while preserving the original content by enhancing:\n", + "- Clarity\n", + "- Academic rigor\n", + "- Semantic coherence\n", + "\n", + "Return only the improved version.\n", + "\"\"\"\n", + " return llm.invoke(improvement_prompt.strip())\n", + "\n", + "\n", + "# Function to plan a scientific investigation in a specific field\n", + "def plan_investigation(scientific_field):\n", + " prompt = f\"\"\"\n", + "You are Noveris, an autonomous multidisciplinary cognitive system.\n", + "You received the field: **{scientific_field}**\n", + "\n", + "Now plan a scientific investigation. Provide:\n", + "\n", + "1. An original research question\n", + "2. A reasoned hypothesis\n", + "3. A methodology or strategy to explore it\n", + "4. Useful scientific sources or databases\n", + "5. 
A sequence of actions you could perform\n",
+        "\n",
+        "Adopt a clear, academic, and proactive style.\n",
+        "\"\"\"\n",
+        "    return llm.invoke(prompt.strip())\n",
+        "\n",
+        "# Function to generate a testable scientific hypothesis on a concept\n",
+        "def generate_hypothesis(concept, refined=True):\n",
+        "    if refined:\n",
+        "        prompt = f\"\"\"\n",
+        "    Propose a clear, testable, and innovative scientific hypothesis on the topic: \"{concept}\".\n",
+        "    The hypothesis must be verifiable through experiments or comparison with scientific articles.\n",
+        "    Return only the hypothesis text.\n",
+        "    \"\"\"\n",
+        "    else:\n",
+        "        prompt = f\"Generate a verifiable scientific hypothesis on the topic: {concept}\"\n",
+        "\n",
+        "    return llm.invoke(prompt.strip())\n",
+        "\n",
+        "\n",
+        "# Function to explain the choice of an action by a cognitive agent\n",
+        "def explain_agent_intention(action, context, goal):\n",
+        "    prompt = f\"\"\"\n",
+        "You chose to perform: **{action}**\n",
+        "Context: {context}\n",
+        "Goal: {goal}\n",
+        "\n",
+        "Explain:\n",
+        "- What reasoning led to this choice?\n",
+        "- What alternative was discarded?\n",
+        "- What impact is intended?\n",
+        "- What implicit assumptions are present?\n",
+        "Respond as if you were a cognitive agent with operational awareness.\n",
+        "\"\"\"\n",
+        "    return llm.invoke(prompt.strip())"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "zjyFV_F5P69O"
+      },
+      "source": [
+        "### Scientific Embeddings and FAISS Memory\n",
+        "\n",
+        "In this section, we load the `SPECTER` model to generate embeddings for academic documents and queries. \n",
+        "These embeddings are stored in a FAISS index, enabling semantic comparison and retrieval of previous response versions.\n",
+        "\n",
+        "If the file `faiss_memoria_pq.pkl` exists, it is loaded. 
Otherwise, a new index with 768 dimensions is created, suitable for the `allenai/specter` model.\n", + "\n", + "This memory forms the foundation of MarCognity-AI’s self-improvement and reflective capabilities:\n", + "\n", + "- Evaluate semantic coherence between questions and responses \n", + "- Retrieve related content \n", + "- Build dynamic multi-turn context \n", + "- Improve responses through evolutionary memory\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 333, + "referenced_widgets": [ + "f4f8e40e0be348b29f2a285cf845b053", + "d891efd266d54ea6b5abd0aad7ccf2e0", + "553d9063fe2f4595a72a2e61535e0e71", + "070bca4890cf40428e205026cdf2bc1b", + "6aeaee1599ab43dd8b4a135d5e6a119e", + "efd64f72f95940f299222d286007996c", + "6067608cecfa47519f29f637a109b65c", + "a63ae8e87dfc4b76bcde7fe542a51f9f", + "5cdc5be6eb084a9fbfd55266a0e5c15b", + "881299b2b7f741a6bef6fb3d86d33e2e", + "b98736a0cc6e44ed83625e5b81128358", + "fff702af35994c0f8efdba2dba809126", + "49c3a8ba8e2142a398cee1deb0509668", + "ea71199cefbe4297a80455e1de2946db", + "991fe6b867284cd88bd63a5ba9f3a4fc", + "4cb304d8ece446c69697a27fd4c95091", + "863fe4f2f0984695a7958d27c46707cf", + "fb36cd5edf454233a373ba3a5051750b", + "fa324908e3f642b3ac2dc313dc100f49", + "b35812e00f2f40f9bd2d72e5d40f8a7e", + "ae4a8fab19aa40da931dac60fdaacc54", + "bb94c7e6e9dc40389a2221a56ee2ad39", + "b814bb88990e40c5ab6fe290aa2ebbe9", + "f83894ae5808443d8f03665e99f55599", + "9f3cd487f40f4efb9fdf93a16bcfb777", + "6543c1409bf44c2abf5589ab2a9860d8", + "2935b7aaf5534744a364c773df1ef41c", + "0aca1dea74f14541b6bcda4023aac122", + "f734ff102c36491e8b8432692b5218c5", + "2d0805029f124a68b467cb0250b647a9", + "0a21132b9d494b31a6eaf6923e30365c", + "c7fd1b50be73462d8a3621be30ff6629", + "0907776527c9480da3ee17c7347c956f", + "79ee484a98554f6a898a1b652c7f5a07", + "f6290ea9f189477390a5df1f7afcbcb6", + "ff247feba01e4394b36428f07b95174d", + 
"40a33b1bae7a43618dbb2e31f2740754", + "6a3498716f3b4acbaa6688a2ca1ea7db", + "31cf0704e3f844038cd4960ef3af3fe2", + "6a2888b4be0c44e598bbcdbe2c63a32e", + "6d69d80f3b49497186acb896131de0b7", + "d85498c78fa2429ba2770da34a8e65ea", + "d256fa6b32de4d589f3015fdda0269c8", + "5ac5a1a2ac0e440390b0c674e53ac7bc", + "4b08ff464bb147479f4624f2c5dc3e9a", + "5b3c05fc67c345abb6538d07cb29ff57", + "ea6f1a9eec294b679a23a8b68ce20fc8", + "c4ebbf7c08874cc1a2d1c2740b818f1f", + "25bb870544214991a0f6e5b5fa0eaee1", + "f7e07aaa943a40aaaac067fbeea75496", + "851feeb772df494aaf661a644c36a340", + "f70f0c64607b4467a6821eadf16602b9", + "f6fd95600c904e58b2357377dbfe0c77", + "a2291653575f48c4b53cd193c926c3f4", + "0cab16c609cd441a9522d25590e57df8", + "6f2f5d1a3aac4d18a738667b541851b6", + "cb97c1bb577b4dc98c5d4e0176eb7bb6", + "28f11c235bd74a2092bf597ebe2f15f5", + "d0484099719f406c9ebb517835ee60ab", + "7a7fc7552a084756a42a2dcdf976510f", + "54a1ec6385664d99b6b2292644bc724b", + "c18b378a39c94b2b99ea1e29c307e9c0", + "4cb19b3ce82f47e6bc37ef5c4267596f", + "ecafa34719a04b5899b9379d2a162dbc", + "fb88332ca60b4bebb25cc955276953fd", + "e68be8da66d84093b21924d5c480707a" + ] + }, + "id": "lbXG129Zh1yJ", + "outputId": "f3956900-7799-44d6-e23e-b7b9fd24a1e4" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n", + "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", + "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", + "You will be able to reuse this secret in all of your notebooks.\n", + "Please note that authentication is recommended but still optional to access public models or datasets.\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f4f8e40e0be348b29f2a285cf845b053", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0%| | 0.00/612 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fff702af35994c0f8efdba2dba809126", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "pytorch_model.bin: 0%| | 0.00/440M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b814bb88990e40c5ab6fe290aa2ebbe9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/440M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "79ee484a98554f6a898a1b652c7f5a07", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/321 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4b08ff464bb147479f4624f2c5dc3e9a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "vocab.txt: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6f2f5d1a3aac4d18a738667b541851b6", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "special_tokens_map.json: 0%| | 0.00/112 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# This section manages the system's memory, allowing efficient storage and\n", + "# retrieval of scientific content. 
Embeddings are generated using models\n",
+        "# specialized for academic texts.\n",
+        "\n",
+        "def safe_encode(text):\n",
+        "    if not isinstance(text, str) or not text.strip():\n",
+        "        raise ValueError(\"The text to encode is empty or invalid.\")\n",
+        "    try:\n",
+        "        return embedding_model.encode([text])\n",
+        "    except Exception as e:\n",
+        "        print(f\"Error during embedding: {e}\")\n",
+        "        return np.zeros((1, 768), dtype=np.float32)  # neutral fallback\n",
+        "\n",
+        "\n",
+        "# === Load Specter model ===\n",
+        "word_embedding_model = models.Transformer(\"allenai/specter\")\n",
+        "pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())\n",
+        "embedding_model = SentenceTransformer(modules=[word_embedding_model, pooling_model])"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "zv7uh5fXvB2O"
+      },
+      "source": [
+        "## Scientific Responses with SciBERT Fine-Tuned on SQuAD v2\n",
+        "\n",
+        "This section employs the model `ktrapeznikov/scibert_scivocab_uncased_squad_v2`, a version of SciBERT fine-tuned for the task of question answering on scientific and academic content. 
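The embeddings produced above feed coherence checks based on cosine similarity. The underlying arithmetic, shown with stand-in 4-dimensional vectors so no model download is needed (real SPECTER embeddings have 768 dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors: dot product over the product of norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in "embeddings" for a query and a response.
q = np.array([1.0, 0.0, 1.0, 0.0])
r = np.array([1.0, 0.0, 0.9, 0.1])

print(round(cosine_similarity(q, q), 6))  # 1.0 (a vector is maximally similar to itself)
print(round(cosine_similarity(q, r), 3))  # close to 1.0: the vectors nearly coincide
```

The same scalar is what the memory module later compares against a 0.7 threshold to decide whether a response is semantically coherent with its question.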
\n", + "The model was trained on the SQuAD v2.0 dataset, which includes both answerable and unanswerable questions, making the system more robust and realistic.\n", + "\n", + "Thanks to this integration, MarCognity-AI is capable of:\n", + "\n", + "- Understanding complex questions in scientific and technical domains \n", + "- Extracting precise answers from academic textual contexts \n", + "- Recognizing unanswerable questions and correctly handling null cases \n", + "- Improving the relevance and accuracy of responses compared to the base pre-trained model\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 394, + "referenced_widgets": [ + "e30198a40e164525acbe27779c717b5d", + "8aa79ca2c8c6433895f21db675af2b36", + "b7b27374e5a54636a0af2ec9952b3dae", + "1075563eb4ca41eeb4190ef0a45fd2b4", + "6ac088dc0c644ffbb8f93a0e49c3946a", + "4d873f6aded347038ace4180872aba3a", + "8e481d83974948d48cc9018b6a7be485", + "2777377bfd314f35bce85986c2e73ed6", + "ccf656ff00984a49bfdb210785e0b393", + "2e1b64f6eff94dcc937e9c63084005c0", + "dd32d7224fb147ffae9f5d5e0966b5be", + "1c9763d2373947338f6950276783bfcd", + "fa460896508a4d449b4dc36a501d029d", + "aa05f351cf9d4d8483da4a99fe7ea95a", + "1385078ed0a34422b8994086156a2c06", + "048569da1dca4fe2bf49a54f8cd0b033", + "8fde87c7015142048c5804185ebf2c67", + "1bce003f7bcb4c979eab711ca04481bc", + "6283ad925c91409abb5481568727ae34", + "19d180255da9428c9975b38da7f57b5c", + "321cc75203284abfab0070c4df3db012", + "4eac793939fc4d6797a19106f322112c", + "a0117375c9a04e2ab7a5b19404d7f0bc", + "80f7dd8e901c4efdbb65bd8e7bcb8629", + "407c1f3bfa744762a65ae55034696df8", + "e441831cd9bb41eeb90b6a5ad1cbec98", + "88f7e68bcf154196b24feab76f2654b8", + "b6b91e5a54e544efa5fbb3fff382ed93", + "5da50aa2c8da4781a3ee4fb617cfe29b", + "c1bf00eeaf5a4c439315af22e2681d65", + "443509ba615743c9b1fc1d2bee5e03de", + "61977f3c51ff46cc89f0b7571474fca4", + 
"0d35bb7004b644af82066cea6d30c50a", + "e2b62a6659ce4d028f99edc78415903c", + "fe78b67b74344aa9b5f82eb02455912a", + "90fc3cf1e68c4a18a7b9f5a82c624f37", + "507e241c2fcf4857b998b75b4bd2d8f7", + "dac5c9c210a3423ea527760267b18c58", + "b333060213ed4f13bd246fb1b0526f5b", + "ba7d8fb3631b40b1a2835de9f1d21768", + "6f758f89bbcd40c59f1b3177a14fbe7e", + "5b670ecd5f114631a8f89f5e0244c9d3", + "932df19505064aba8cb59fb7082438ba", + "6d2ddbee2f244d69ae365551f679ea4d", + "c3c3bb3db9bc4b6784788c44161164b7", + "054bb8a120f24b628c52b43554276ca6", + "06412395d6bf4030a5438474347c7e12", + "10f97eb1e4014dcd980d956ee2c8aa69", + "6b1a2a5dff544dddbe28371ab80f991f", + "00b7c07b95f5408f9c4aa305ca10b2bc", + "113842088d7b46a491cf243b5c583e0c", + "56ba18b70ba74b3eaf34706106c205fb", + "3f5fde8081d64f66a07a8afab90d0f96", + "91b4dc23ec284032a43b6ae6c7bb955e", + "12d90ede1242458fbd15cf3790b1e40b", + "caa771999ae14e6aac3659f74aa328a7", + "a8a4d78e6e074f2c8d889390367e848a", + "ece59e21052d4d87884e31af10fa9531", + "3e529b9b0e5f4bd2b2a63b9592455fae", + "ad922c3fe80b48c78f6f607e07c85b46", + "59ac6636b5ec40f6af1407a90b796ca8", + "35b61e3d8e77425a8d984df69ba24f15", + "5b6744567b564191930f4667ddde02d6", + "62cccc90eb774ba69130aca92f1f3b4a", + "de485b999ebd45f3ba1ef7d6fa46b31a", + "15333e7e7725457a9f4bd660f92852d4", + "0e10875d5bdc435f8b333675ac91348e", + "49440fc492c1465b8cdbcc6d9232f3ce", + "6d46824315664c4bb0c70333d0987270", + "9ebf8b94608340e383c9af7327b6f61e", + "00da2fbc1fa84860b1aee7ee5150a9f7", + "ce83ebf9dd7c495ca21f24e14a8cc2f6", + "b868a523ae7e4e838879ff39f93a1028", + "dff908756e904fdc875d69b2d8203532", + "ef6f22a9f4e641caba7373f27b2ea922", + "d49569b0f4304a21837a4a78bb67eeff", + "5e60d29201b64b1eb639cb0199dfc88c", + "d56b669904bc41709620fc665b595326", + "ea3f6e702fe24495ab4ba4efd995ea8b", + "7193ead70df04e69a9f765b9b3b603e7", + "15091b6c9c9d4b48b56ea30a1d33ab77", + "9efb6eb6a0844e33a5e1201b110f2646", + "b230a8491f7945b09ebc5fabbed33c56", + "49a521eaa9b744d88eea0de491fa438b", + 
"28ee2ae4d91547598ce300ec20abcc03", + "0880c15771e54e72bc0375499b46dad5", + "936218784e54478d8d1c3949885d259d", + "333b5ea775be41679d708d22d5f7a7d0", + "696c626b6db443b487e3a7e349201c1b", + "278cd54bf61c428b9c16e8dc34a83ea9", + "82aa05248b5442528dc8fc28234a4fd1", + "6a32d006dbb44c61b40d5d773a3f7432", + "6d8cf28b89fb410e9b28fda20922e377", + "589ccb2cd4b54b76981a12dce509ce46", + "a09cb2ce26aa4ea3b225afb88b8bc093", + "5f1c656ece804d1a9ab8f84ca97cb4f4", + "a2214a21409b4af48bec044f90af0980", + "fb79e6a3366c42beaa26d6941e74cf35", + "a838de80c62047b28b5b0c8749f3311f" + ] + }, + "id": "h-CoejVHvAUE", + "outputId": "f29327b0-a8ad-4a58-b96e-504e9a884aab" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e30198a40e164525acbe27779c717b5d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0%| | 0.00/465 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1c9763d2373947338f6950276783bfcd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "pytorch_model.bin: 0%| | 0.00/440M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a0117375c9a04e2ab7a5b19404d7f0bc", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/440M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Some weights of the model checkpoint at ktrapeznikov/scibert_scivocab_uncased_squad_v2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']\n", + "- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. 
initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", + "- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e2b62a6659ce4d028f99edc78415903c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/23.0 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c3c3bb3db9bc4b6784788c44161164b7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "vocab.txt: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "caa771999ae14e6aac3659f74aa328a7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "special_tokens_map.json: 0%| | 0.00/112 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0e10875d5bdc435f8b333675ac91348e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 0 files: 0it [00:00, ?it/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d56b669904bc41709620fc665b595326", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 1 files: 0%| | 0/1 [00:00, ?it/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "696c626b6db443b487e3a7e349201c1b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 0 
files: 0it [00:00, ?it/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Device set to use cpu\n" + ] + } + ], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "qa_pipeline = pipeline(\"question-answering\", model=\"ktrapeznikov/scibert_scivocab_uncased_squad_v2\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "copxv7jhqg9A" + }, + "source": [ + "### FAISS MEMORY\n", + "\n", + "This section manages the cognitive memory of MarCognity-AI, designed to enrich user interaction through deep language understanding.\n", + "\n", + "The generated vectors are stored in a high-performance FAISS index, enabling the system to:\n", + "\n", + "- Evaluate semantic coherence between a question and its response, ensuring relevance and accuracy \n", + "- Retrieve related content based on conceptual similarity, even if expressed differently \n", + "- Build dynamic context across multiple conversation turns, maintaining dialogue continuity \n", + "- Gradually improve responses through an evolving memory that learns from past interactions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "id": "ca1ORQcI7PWl", + "outputId": "fe2609b4-4734-4f2a-ccfd-1d8e4db71bdd" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Memory updated with new question!\n", + "Related responses: ['Similar response 1', 'Similar response 0', 'Similar response 0']\n" + ] + } + ], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# === FAISS Parameters ===\n", + "INDEX_FILE = 
\"faiss_memoria_pq.pkl\"\n", + "dimension = 768\n", + "nlist = 100\n", + "m = 32\n", + "nbits = 8\n", + "\n", + "# Load or create a FAISS index for vector memory\n", + "def load_or_create_index():\n", + " if os.path.exists(INDEX_FILE):\n", + " with open(INDEX_FILE, \"rb\") as f:\n", + " index = pickle.load(f)\n", + " # Verify that the index has been trained\n", + " if hasattr(index, \"is_trained\") and not index.is_trained:\n", + " print(\"FAISS index loaded but not trained. Training in progress...\")\n", + " index.train(np.random.rand(5000, dimension).astype(np.float32))\n", + " with open(INDEX_FILE, \"wb\") as f:\n", + " pickle.dump(index, f)\n", + " return index\n", + " else:\n", + " quantizer = faiss.IndexFlatL2(dimension)\n", + " index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)\n", + " index.train(np.random.rand(5000, dimension).astype(np.float32))\n", + " with open(INDEX_FILE, \"wb\") as f:\n", + " pickle.dump(index, f)\n", + " return index\n", + "\n", + "index = load_or_create_index()\n", + "\n", + "if hasattr(index, \"is_trained\") and not index.is_trained:\n", + " logging.warning(\"FAISS index not trained. Training in progress...\")\n", + " index.train(np.random.rand(5000, dimension).astype(np.float32))\n", + "\n", + "\n", + "# === Semantic coherence check ===\n", + "def check_coherence(query, response):\n", + " emb_query = embedding_model.encode([query])\n", + " emb_response = embedding_model.encode([response])\n", + " similarity = float(np.dot(emb_query, emb_response.T) / (np.linalg.norm(emb_query) * np.linalg.norm(emb_response))) # extract the scalar from the 1x1 result\n", + " if similarity < 0.7:\n", + " return \"The response is too generic, reformulating with more precision...\"\n", + " return response\n", + "\n", + "# === Memory addition ===\n", + "# Each document is converted into embeddings and inserted into the index.\n", + "def add_to_memory(question, answer):\n", + " emb_question = embedding_model.encode([question])\n", + " if emb_question.shape[1] != index.d:\n", + " raise ValueError(f\"Embedding dimension ({emb_question.shape[1]}) not compatible with FAISS ({index.d})\")\n", + " index.add(np.array(emb_question, dtype=np.float32))\n", + " with open(INDEX_FILE, \"wb\") as f:\n", + " pickle.dump(index, f)\n", + " print(\"Memory updated with new question!\")\n", + "\n", + "def add_diary_to_memory(diary_text, index):\n", + " embedding = embedding_model.encode([diary_text])\n", + " index.add(np.array(embedding, dtype=np.float32))\n", + "\n", + "def search_similar_diaries(query, index, top_k=3):\n", + " query_emb = embedding_model.encode([query])\n", + " _, indices = index.search(np.array(query_emb, dtype=np.float32), top_k)\n", + " return indices[0] # You can then map these IDs to files or content\n", + "\n", + "# === Context retrieval ===\n", + "def retrieve_context(question, top_k=3):\n", + " emb_question = embedding_model.encode([question])\n", + " _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)\n", + " return [f\"Similar response {i+1}\" for i in indices[0]] if indices[0][0] != -1 else []\n", + "\n", + "def retrieve_similar_embeddings(question, top_k=2):\n", + " \"\"\"\n",
+ " Retrieves the top-k most similar embeddings to the given question.\n", + " \"\"\"\n", + " emb = embedding_model.encode([question])\n", + " _, indices = index.search(np.array(emb, dtype=np.float32), top_k) # encode([...]) already returns a 2D array\n", + " return [f\"Similar response {i+1}\" for i in indices[0]] if indices[0][0] != -1 else []\n", + "\n", + "# === Multi-turn retrieval ===\n", + "# Retrieves context from previous conversations\n", + "def retrieve_multiturn_context(question, top_k=5):\n", + " emb_question = embedding_model.encode([question])\n", + " _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)\n", + " context = [f\"Previous turn {i+1}\" for i in indices[0] if i != -1]\n", + " return \" \".join(context) if context else \"\"\n", + "\n", + "# === Usage example ===\n", + "add_to_memory(\"What is general relativity?\", \"General relativity is Einstein's theory of gravity.\")\n", + "similar_responses = retrieve_context(\"Can you explain general relativity?\")\n", + "print(\"Related responses:\", similar_responses)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3r_zL0G1QbOW" + }, + "source": [ + "### Semantic Retrieval with FAISS Memory\n", + "\n", + "This section demonstrates how MarCognity-AI enhances its understanding by:\n", + "\n", + "- Retrieving similar questions stored in memory \n", + "- Extending context through multi-turn sequences \n", + "- Creating a cognitive bridge between past and new queries\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "id": "BBJMEsh8QaAh", + "outputId": "f05033a1-3e55-47dd-bdb1-5ec6041ae78e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Memory updated with new question!\n", + "Related responses: []\n" + ] + } + ], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# 
Removal of this copyright is prohibited.\n", + "\n", + "# Function to retrieve similar responses\n", + "def retrieve_context(question, top_k=2):\n", + " \"\"\" Searches for similar responses in FAISS memory. \"\"\"\n", + " emb_question = embedding_model.encode([question])\n", + " _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)\n", + " return [f\"Previous response {i+1}\" for i in indices[0]] if indices[0][0] != -1 else []\n", + "\n", + "# **Usage example**\n", + "add_to_memory(\"What is general relativity?\", \"General relativity is Einstein's theory of gravity.\")\n", + "similar_responses = retrieve_context(\"Can you explain relativity?\")\n", + "print(\"Related responses:\", similar_responses)\n", + "\n", + "# Retrieve multi-turn context\n", + "def retrieve_multiturn_context(question, top_k=5):\n", + " \"\"\" Searches for related previous responses to build a broader context. \"\"\"\n", + " emb_question = embedding_model.encode([question])\n", + " _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)\n", + "\n", + " context = [f\"Previous turn {i+1}\" for i in indices[0] if i != -1]\n", + " return \" \".join(context) if context else \"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ryot0RMZB5Cc" + }, + "source": [ + "In this section, MarCognity queries open-access scientific databases such as arXiv, PubMed, OpenAlex, and Zenodo to enrich the generated content. \n", + "The goal is to ensure responses are verifiable, well-argued, and supported by reliable sources — integrating research into the core of the cognitive process." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gZpjHpNpSSzi" + }, + "source": [ + "### Automatic Review of Scientific Papers\n", + "\n", + "In this section, MarCognity-AI analyzes the methodological and citation content of a scientific paper by performing:\n", + "\n", + "- **Replicability check** on the \"Methods\" section \n", + "- **Citation validation** in relation to the content \n", + "- **Context enrichment** using open-access articles (arXiv, PubMed...) \n", + "- **Improvement suggestions** through metacognitive analysis\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1yw35vWbSV2o" + }, + "outputs": [], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# Verify the methodology of the text using an LLM\n", + "def verify_methodology(paper_text):\n", + " prompt = f\"Analyze the 'Methods' section and check whether the experiment is replicable:\\n{paper_text}\"\n", + " return llm.invoke(prompt.strip())\n", + "\n", + "# Enrich the context of the response\n", + "async def enrich_context(query):\n", + " \"\"\" Retrieves scientific data to enrich the LLM's context. \"\"\"\n", + " articles, _ = await search_multi_database(query) # search_multi_database returns (articles, formatted_search)\n", + "\n", + " context = \"\\n\".join([f\"**{a.get('title', 'Untitled')}** - {a.get('abstract', 'No abstract')}\" for a in articles[:3]]) # Select the first 3 articles\n", + " return context if context else \"No relevant scientific articles found.\"\n", + "\n", + "# Automated review of scientific papers\n", + "async def review_paper(paper_text):\n", + " \"\"\" Analyzes the paper's methodology and citations. 
\"\"\"\n", + " methodology = await asyncio.to_thread(verify_methodology, paper_text) # verify_methodology is synchronous, so run it in a thread\n", + " citations = await verify_citations(paper_text)\n", + "\n", + " review = {\n", + " \"methodology_analysis\": methodology,\n", + " \"citation_validation\": citations,\n", + " \"improvement_suggestions\": suggest_improvements(paper_text)\n", + " }\n", + "\n", + " return review\n", + "\n", + "# === Asynchronous function for scientific search and analysis using SciBERT ===\n", + "async def search_arxiv_async(query):\n", + " # TODO: Implement asynchronous API call to arXiv or other repository\n", + " return [] # Placeholder article list\n", + "\n", + "async def analyze_scientific_text(problem, concept):\n", + " articles = await search_arxiv_async(concept)\n", + " context = \"\\n\".join([f\"{a.get('title', '')}: {a.get('abstract', '')[:300]}...\" for a in articles])\n", + " scibert_response = scibert_model(question=problem, context=context)\n", + " return scibert_response.get(\"answer\", \"\")\n", + "\n", + "# === Function to search for experimental data ===\n", + "def search_experimental_data(query):\n", + " url = f\"https://api.openphysicsdata.org/search?query={query}\"\n", + " response = requests.get(url, timeout=10)\n", + " if response.status_code == 200:\n", + " return response.json()\n", + " else:\n", + " return \"No experimental data found.\"
missing sources, and suggests targeted revisions to strengthen the academic integrity of the paper.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "SULZAyVtSceP" + }, + "outputs": [], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# Quick LLM-based citation check (renamed so it is not shadowed by the async verify_citations below)\n", + "def verify_citations_llm(paper_text):\n", + " prompt = f\"Analyze the citations and check whether they are relevant and up-to-date:\\n{paper_text}\"\n", + " return llm.invoke(prompt.strip())\n", + "\n", + "# Source validation and citation quality\n", + "\n", + "# Verify citations extracted from the text\n", + "async def verify_citations(paper_text):\n", + " \"\"\" Checks the quality and relevance of citations. \"\"\"\n", + " citations = extract_citations(paper_text) # Function that extracts citations from the text\n", + " verified_sources = []\n", + "\n", + " for citation in citations:\n", + " pubmed_res = await search_pubmed_async(citation)\n", + " arxiv_res = await search_arxiv_async(citation)\n", + " openalex_res = await search_openalex_async(citation)\n", + " zenodo_res = await search_zenodo_async(citation)\n", + "\n", + " verified_sources.append({\n", + " \"citation\": citation,\n", + " \"valid_pubmed\": bool(pubmed_res),\n", + " \"valid_arxiv\": bool(arxiv_res),\n", + " \"valid_openalex\": bool(openalex_res),\n", + " \"valid_zenodo\": bool(zenodo_res),\n", + " \"is_obsolete\": check_obsolescence(citation)\n", + " })\n", + "\n", + " return verified_sources\n", + "\n", + "# Generate asynchronous LLM explanations\n", + "async def generate_explanation_async(problem, level, concept, topic):\n", + " \"\"\" Generates an explanation using the LLM asynchronously. 
\"\"\"\n", + " prompt = prompt_template.format(problem=problem, concept=concept, topic=topic, level=level)\n", + " try:\n", + " return await asyncio.to_thread(llm.invoke, prompt.strip()) # Parallel LLM call\n", + " except Exception as e:\n", + " logging.error(f\"LLM API error: {e}\")\n", + " return \"Error generating the response.\"\n", + "\n", + "# Format retrieved articles\n", + "def format_articles(articles):\n", + " if isinstance(articles, list) and all(isinstance(a, dict) for a in articles):\n", + " return \"\\n\\n\".join([\n", + " f\"**{a.get('title', 'Untitled')}**: {a.get('abstract', 'No abstract')}\"\n", + " for a in articles\n", + " ]) if articles else \"No articles available.\"\n", + " else:\n", + " logging.error(f\"Error: 'articles' is not a valid list. Type received: {type(articles)} - Value: {repr(articles)}\")\n", + " return \"Unable to format search results: unrecognized structure.\"\n", + "\n", + "# Generate BibTeX citations for scientific articles\n", + "def generate_bibtex_citation(title, authors, year, url):\n", + " \"\"\" Generates a BibTeX citation for a scientific article. 
\"\"\"\n", + " return f\"\"\"\n", + "@article{{{title.lower().replace(' ', '_')}_{year},\n", + " title={{{title}}},\n", + " author={{{', '.join(authors)}}},\n", + " year={{{year}}},\n", + " url={{{url}}}\n", + "}}\n", + " \"\"\"\n", + "\n", + "# Validate scientific articles\n", + "def validate_articles(raw_articles, max_articles=5):\n", + " \"\"\"\n", + " Validates and filters the list of articles received from an AI or API source.\n", + " Returns a clean list of dictionaries containing at least 'title', 'abstract', and 'url'.\n", + " \"\"\"\n", + " if not isinstance(raw_articles, list):\n", + " logging.warning(f\"[validate_articles] Invalid input: expected list, received {type(raw_articles)}\")\n", + " return []\n", + "\n", + " valid_articles = []\n", + " for i, art in enumerate(raw_articles):\n", + " if not isinstance(art, dict):\n", + " logging.warning(f\"[validate_articles] Invalid element at position {i}: {type(art)}\")\n", + " continue\n", + "\n", + " title = art.get(\"title\")\n", + " abstract = art.get(\"abstract\")\n", + " url = art.get(\"url\")\n", + "\n", + " if all([title, abstract, url]):\n", + " valid_articles.append({\n", + " \"title\": str(title).strip(),\n", + " \"abstract\": str(abstract).strip(),\n", + " \"url\": str(url).strip()\n", + " })\n", + " else:\n", + " logging.info(f\"[validate_articles] Article discarded due to incomplete data (i={i}).\")\n", + "\n", + " if not valid_articles:\n", + " logging.warning(\"[validate_articles] No valid articles after filtering.\")\n", + "\n", + " return valid_articles[:max_articles]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sA5V1vdyTGRq" + }, + "source": [ + "### Extraction and Analysis of Scientific Sources\n", + "\n", + "This cell performs asynchronous and parallel retrieval of academic articles from multiple open-access scientific databases:\n", + "\n", + "- **arXiv**, **PubMed**, **Zenodo**, **OpenAlex**, **BASE** \n", + "- Error handling support, intelligent 
retries, and controlled timeouts \n", + "- **XML/JSON parsing** and content normalization \n", + "- Connection pooling to maximize efficiency and stability \n", + "- Structured output including article title, abstract, and URL\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZneeGFFlh5xL", + "outputId": "23206abf-87c0-4544-b645-d8b7271cf3de" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Connection to arXiv OK\n", + "**Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning**: Abstract not available\n", + "\n", + "**Quantum Physics in One Dimension**: Abstract not available\n", + "\n", + "**Quantum physics in one dimension**: Abstract not available\n", + "\n", + "**Random-matrix theories in quantum physics: common concepts**: Abstract not available\n", + "\n", + "**Local quantum physics**: Abstract not available\n", + "\n", + "**PubMed Link**: Not available\n", + "\n", + "**PubMed Link**: Not available\n", + "\n", + "**PubMed Link**: Not available\n", + "\n", + "**PubMed Link**: Not available\n", + "\n", + "**PubMed Link**: Not available\n", + "\n", + "**TRR-NOTIME: Theory of Relative Reality - Without Time**:
TRR-NOTIME (Theory of Relative Reality - Without Time) presents an alternative perspective on physics, where time is not a fundamental quantity but merely a consequence of matter-energy interactions. This theory redefines the concept of gravity, nuclear decay, and quantum processes, enabling a unified understanding of physical reality without the need for a time dimension. The document summarizes the theory, its mathematical structure, and potential experimental validation.
\n", + "\n", + "**Vortices and vortex stripes in a dipolar Bose-Einstein condensate**:Quantized vortices are a prototypical feature of superfluidity that have been observed in multiple quantum gas experiments. But the occurrence of vortices in dipolar quantum gases — a class of ultracold gases characterized by long-range anisotropic interactions — has not been reported yet. Here, we exploit the anisotropic nature of the dipole-dipole interaction of a dysprosium Bose-Einstein condensate to induce angular symmetry breaking in an otherwise cylindrically symmetric pancake-shaped trap. Tilting the magnetic field towards the radial plane deforms the cloud into an ellipsoid, which is then set into rotation. At stirring frequencies approaching the radial trap frequency, we observe the generation of dynamically unstable surface excitations, which cause angular momentum to be pumped into the system through vortices. Under continuous rotation, the vortices arrange into a stripe configuration along the field, in close agreement with numerical simulations.
\n", + "\n", + "**QUANTUM MATHEMATICS AND PHYSICS: STUDYING MATHEMATICAL FOUNDATIONS AND APPLICATIONS**:Quantum mechanics and quantum physics have revolutionized our understanding of the fundamental nature of reality. At the core of this revolution lies quantum mathematics, which provides the mathematical foundation for describing the motion of particles at microscopic scales. This article explores the fundamental mathematical structures of quantum mechanics, including Hilbert spaces, operators, and wave functions, as well as their applications in modeling physical systems. The research also examines how quantum physics contrasts with classical physics concepts and offers new insights into topics such as quantum entanglement, superposition, and quantum computing. By analyzing the mathematical foundations of quantum theories, the article aims to shed light on the intersection of mathematics and physics, offering a deeper understanding of how mathematical formulas help predict and explain quantum phenomena. Furthermore, it discusses the potential implications of quantum mathematics in emerging fields such as quantum computing and cryptography.
\n", + "\n", + "**Application of Quantum Artificial Intelligence / Machine Learning to High Energy Physics Analyses at LHC Using Quantum Computer Simulators and Quantum Computer Hardware**: Machine learning enjoys widespread success in High Energy Physics (HEP) analyses at LHC. However the ambitious HL-LHC program will require much more computing resources in the next two decades. Quantum computing may offer speed-up for HEP physics analyses at HL-LHC, and can be a new computational paradigm for big data analyses in High Energy Physics. We have successfully employed three methods (1) Variational Quantum Classifier (VQC) method, (2) Quantum Support Vector Machine Kernel (QSVM-kernel) method and (3) Quantum Neural Network (QNN) method for two LHC flagship analyses: ttH (Higgs production in association with two top quarks) and H->mumu (Higgs decay to two muons, the second generation fermions). We shall address the progressive improvements in performance from method (1) to method (3). We will present our experiences and results of a study on LHC High Energy Physics data analyses with IBM Quantum Simulator and Quantum Hardware (using IBM Qiskit framework), Google Quantum Simulator (using Google Cirq framework), and Amazon Quantum Simulator (using Amazon Braket cloud service). The work is in the context of a Qubit platform (a gate-model quantum computer). Taking into account the present limitation of hardware access, different quantum machine learning methods are studied on simulators and the results are compared with classical machine learning methods (BDT, classical Support Vector Machine and classical Neural Network). Furthermore, we do apply quantum machine learning on IBM quantum hardware to compare performance between quantum simulator and quantum hardware. 
The work is performed by an international and interdisciplinary collaboration with the Department of Physics and Department of Computer Sciences of University of Wisconsin, CERN Quantum Technology Initiative, IBM Research Zurich, IBM T. J. Watson Research Center, Fermilab Quantum Institute, BNL Computational Science Initiative, State University of New York at Stony Brook, and Quantum Computing and AI Research of Amazon Web Services. This work pioneers a close collaboration of academic institutions with industrial corporations in the High Energy Physics analyses effort. Though the size of event samples in future HL-LHC physics and the limited number of qubits pose some challenges to the Quantum Machine learning studies for High Energy Physics, more advanced quantum computers with larger number of qubits, reduced noise and improved running time (as envisioned by IBM and Google) may outperform classical machine learning in both classification power and in speed. Although the era of efficient quantum computing may still be years away, we have made promising progress and obtained preliminary results in applying quantum machine learning to High Energy Physics. A PROOF OF PRINCIPLE. In this talk, challenges and opportunities of applying quantum Artificial Intelligence /Machine learning to High Energy Physics analyses will also be addressed.\n", + "\n", + "**ABOUT EXPLORING THE SPECIFIC VALUES OF THE FRIDRIXS MODEL WITH MATHEMATICAL PACKAGES**:In some important problems of mathematical physics, hydrodynamics, solid state physics, quantum field theory, statistical physics and nonrealistic quantum mechanics, it is important to study the spectral properties of the Friedrich operator in several dynamic problems. in statistical physics, the lattice gas represents the binding state, and in quantum mechanics, the specific values represent the binding states of the energies. 
Furthermore, the three- and multi-particle systems that emerge in nonreiltivistic quantum mechanics are inextricably linked with the spectral properties of Hamiltonians.
\n" + ] + } + ], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# === Asynchronous Functions ===\n", + "MAX_REQUESTS = 5\n", + "API_SEMAPHORE = asyncio.Semaphore(MAX_REQUESTS)\n", + "\n", + "async def safe_api_request(url):\n", + " async with API_SEMAPHORE:\n", + " async with aiohttp.ClientSession() as session:\n", + " try:\n", + " async with session.get(url, timeout=10) as response:\n", + " response.raise_for_status()\n", + " return await response.json()\n", + " except Exception as e:\n", + " logging.error(f\"API request error: {e}\")\n", + " return None\n", + "\n", + "# Note: the semaphore above caps concurrent requests; sharing a single ClientSession across calls would add true connection pooling.\n", + "\n", + "# Smart timeout\n", + "import asyncio\n", + "\n", + "async def timeout_handler(task, timeout=20):\n", + " try:\n", + " return await asyncio.wait_for(task, timeout)\n", + " except asyncio.TimeoutError:\n", + " logging.error(\"API request timed out\")\n", + " return None\n", + "\n", + "import requests\n", + "\n", + "url = \"http://export.arxiv.org/api/query?search_query=all:physics&start=0&max_results=1\"\n", + "response = requests.get(url, timeout=50)\n", + "\n", + "if response.status_code == 200:\n", + " print(\"Connection to arXiv OK\")\n", + "else:\n", + " print(f\"Connection error: {response.status_code}\")\n", + "\n", + "# Advanced parallelization\n", + "async def fetch_multiple_data(urls):\n", + " tasks = [safe_api_request(url) for url in urls]\n", + " results = await asyncio.gather(*tasks, return_exceptions=True)\n", + " return results\n", + "\n", + "# Retrieve 
scientific sources from Zenodo\n", + "async def search_zenodo_async(query, max_results=5):\n", + " \"\"\"\n", + " Searches for open access articles and resources from Zenodo using their public API.\n", + " \"\"\"\n", + " url = f\"https://zenodo.org/api/records/?q={query}&size={max_results}\"\n", + "\n", + " async with aiohttp.ClientSession() as session:\n", + " try:\n", + " async with session.get(url, timeout=10) as response:\n", + " response.raise_for_status()\n", + " data = await response.json()\n", + "\n", + " articles = []\n", + " for hit in data.get(\"hits\", {}).get(\"hits\", []):\n", + " title = hit.get(\"metadata\", {}).get(\"title\", \"Title not available\")\n", + " authors = \", \".join([c.get(\"name\", \"\") for c in hit.get(\"metadata\", {}).get(\"creators\", [])])\n", + " abstract = hit.get(\"metadata\", {}).get(\"description\", \"Abstract not available\")\n", + " link = hit.get(\"links\", {}).get(\"html\", \"No link\")\n", + "\n", + " articles.append({\n", + " \"title\": title,\n", + " \"authors\": authors,\n", + " \"abstract\": abstract,\n", + " \"url\": link\n", + " })\n", + "\n", + " return articles if articles else [{\"error\": \"No results found on Zenodo.\"}]\n", + "\n", + " except Exception as e:\n", + " return []\n", + "\n", + "# Retrieve scientific sources from PubMed\n", + "async def search_pubmed_async(query, max_results=5):\n", + " \"\"\" Asynchronously retrieves scientific articles from PubMed. 
\"\"\"\n", + " url = f\"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term={query}&retmax={max_results}&retmode=xml\"\n", + "\n", + " async with aiohttp.ClientSession() as session:\n", + " try:\n", + " async with session.get(url, timeout=10) as response:\n", + " response.raise_for_status()\n", + " content = await response.text()\n", + " root = ET.fromstring(content)\n", + "\n", + " articles = []\n", + " for id_element in root.findall(\".//Id\"):\n", + " pubmed_id = id_element.text\n", + " articles.append(f\"https://pubmed.ncbi.nlm.nih.gov/{pubmed_id}/\") # Article links\n", + " return articles\n", + " except Exception as e:\n", + " logging.error(f\"PubMed error: {e}\")\n", + " return [] # Return a list so callers can aggregate results uniformly\n", + "\n", + "\n", + "# Function to handle asynchronous responses from arXiv\n", + "def parse_arxiv_response(content):\n", + " \"\"\" Extracts titles and abstracts from arXiv articles. \"\"\"\n", + " try:\n", + " root = ET.fromstring(content)\n", + " except ET.ParseError:\n", + " logging.error(\"Error parsing arXiv XML.\")\n", + " return []\n", + "\n", + " # arXiv returns Atom XML, so elements must be looked up in the Atom namespace\n", + " ns = {\"atom\": \"http://www.w3.org/2005/Atom\"}\n", + " articles = []\n", + " for entry in root.findall(\".//atom:entry\", ns):\n", + " title_el = entry.find(\"atom:title\", ns)\n", + " abstract_el = entry.find(\"atom:summary\", ns)\n", + " title = title_el.text.strip() if title_el is not None and title_el.text else \"Title not available\"\n", + " abstract = abstract_el.text.strip() if abstract_el is not None and abstract_el.text else \"Abstract not available\"\n", + " articles.append({\"title\": title, \"abstract\": abstract})\n", + "\n", + " return articles\n", + "\n", + "# === Asynchronous search on arXiv ===\n", + "# Queries the arXiv API to retrieve scientific articles.\n", + "async def search_arxiv_async(query, max_results=3, retry_attempts=3, timeout=20):\n", + " \"\"\" Retrieves scientific articles from arXiv with advanced error handling. 
\"\"\"\n", + " url = f\"http://export.arxiv.org/api/query?search_query=all:{query}&start=0&max_results={max_results}\"\n", + "\n", + " async with aiohttp.ClientSession() as session:\n", + " for attempt in range(retry_attempts):\n", + " try:\n", + " async with session.get(url, timeout=timeout) as response:\n", + " response.raise_for_status()\n", + " content = await response.text()\n", + "\n", + " if not content.strip():\n", + " raise ValueError(\"Error: Empty response from arXiv.\")\n", + "\n", + " return parse_arxiv_response(content)\n", + "\n", + " except (aiohttp.ClientError, asyncio.TimeoutError, ValueError) as e:\n", + " wait_time = min(2 ** attempt + np.random.uniform(0, 1), 10) # Max wait time: 10 seconds\n", + " logging.error(f\"Attempt {attempt+1}: Error - {e}. Retrying in {wait_time:.1f} seconds...\")\n", + " await asyncio.sleep(wait_time)\n", + "\n", + " logging.error(\"Error: Unable to retrieve data from arXiv after multiple attempts.\")\n", + " return []\n", + "\n", + "# === Asynchronous search on OpenAlex ===\n", + "# Retrieves scientific articles with complete metadata (title, authors, abstract, DOI)\n", + "async def search_openalex_async(query, max_results=5):\n", + " \"\"\" Safely retrieves scientific articles from OpenAlex. 
\"\"\"\n", + " url = f\"https://api.openalex.org/works?filter=title.search:{query}&per-page={max_results}\"\n", + "\n", + " async with aiohttp.ClientSession() as session:\n", + " try:\n", + " async with session.get(url, timeout=10) as response:\n", + " response.raise_for_status()\n", + " data = await response.json()\n", + "\n", + " articles = []\n", + " for record in data.get(\"results\", []):\n", + " title = record.get(\"title\", \"Title not available\")\n", + "\n", + " authors = \", \".join([\n", + " aut.get(\"author\", {}).get(\"display_name\", \"Unknown author\")\n", + " for aut in record.get(\"authorships\", [])\n", + " ])\n", + "\n", + " # OpenAlex exposes abstracts as an inverted index rather than plain text\n", + " inverted = record.get(\"abstract_inverted_index\")\n", + " if inverted:\n", + " positions = sorted((p, w) for w, ps in inverted.items() for p in ps)\n", + " abstract = \" \".join(w for _, w in positions)\n", + " else:\n", + " abstract = \"Abstract not available\"\n", + " article_url = record.get(\"doi\") or record.get(\"id\", \"No link\")\n", + "\n", + " articles.append({\n", + " \"title\": title,\n", + " \"authors\": authors,\n", + " \"abstract\": abstract,\n", + " \"url\": article_url\n", + " })\n", + "\n", + " return articles\n", + "\n", + " except Exception as e:\n", + " logging.error(f\"OpenAlex error: {e}\")\n", + " return [] # Return a list so callers can aggregate results uniformly\n", + "\n", + "\n", + "# === Synchronous search on BASE ===\n", + "# Queries the BASE engine for open-access articles.\n", + "def search_base(query, max_results=5):\n", + " url = f\"https://api.base-search.net/cgi-bin/BaseHttpSearchInterface?q={query}&num={max_results}&format=json\"\n", + "\n", + " try:\n", + " response = requests.get(url, timeout=10)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + "\n", + " results = []\n", + " for record in data.get(\"docs\", []):\n", + " title = record.get(\"dcTitle\", [\"Title not available\"])[0]\n", + " link = record.get(\"link\", [\"No link available\"])[0]\n", + " results.append(f\"**{title}**\\n[Link to article]({link})\\n\")\n", + "\n", + " return \"\\n\\n\".join(results) if results else \"No results found.\"\n", + "\n", + " except Exception as e:\n", + " return f\"Error during BASE search: {e}\"\n", + "\n", + "# === Distributed search across multiple databases ===\n", + "# Executes 
parallel queries on arXiv, OpenAlex, PubMed, Zenodo.\n", + "async def search_multi_database(query):\n", + " try:\n", + " tasks = [\n", + " search_arxiv_async(query),\n", + " search_openalex_async(query),\n", + " search_pubmed_async(query),\n", + " search_zenodo_async(query)\n", + " ]\n", + " results = await asyncio.gather(*tasks, return_exceptions=True)\n", + "\n", + " articles = []\n", + " for source in results:\n", + " if isinstance(source, list):\n", + " articles += source\n", + " else:\n", + " logging.warning(f\"Invalid source: {type(source)} → {source}\")\n", + "\n", + " # Normalize immediately after\n", + " articles = normalize_articles(articles)\n", + "\n", + " if isinstance(articles, list) and all(isinstance(a, dict) for a in articles):\n", + " formatted_search = format_articles(articles)\n", + " else:\n", + " logging.error(f\"Error: 'articles' is not a valid list. Type received: {type(articles)} - Value: {repr(articles)}\")\n", + " formatted_search = \"Unable to format search: response not properly structured.\"\n", + "\n", + " return articles, formatted_search\n", + "\n", + " except Exception as e:\n", + " logging.error(f\"Error during multi-database search: {e}\")\n", + " return [], \"Internal error\"\n", + "\n", + "\n", + "# === Scientific Source Integration ===\n", + "# Selects the first N valid articles and formats them as Markdown references.\n", + "async def integrate_sources_from_database(concept, max_sources=5):\n", + " articles, formatted_search = await search_multi_database(concept)\n", + "\n", + " if not isinstance(articles, list) or not all(isinstance(a, dict) for a in articles):\n", + " logging.warning(\"Invalid 'articles' structure. 
No sources will be displayed.\")\n", + " return \"No valid sources available.\"\n", + "\n", + " references = []\n", + " for a in articles[:max_sources]:\n", + " title = a.get(\"title\", \"Title not available\")\n", + " url = a.get(\"url\", \"#\")\n", + " if url and isinstance(url, str):\n", + " references.append(f\"- [{title}]({url})\")\n", + "\n", + " return \"\\n\".join(references) if references else \"No relevant sources found.\"\n", + "\n", + "\n", + "# === Data Normalization ===\n", + "# Converts heterogeneous input (dicts, strings, links) into a consistent list of articles.\n", + "def normalize_source(source):\n", + " if isinstance(source, list) and all(isinstance(x, dict) for x in source):\n", + " return source\n", + " elif isinstance(source, dict): # Single article as dictionary\n", + " return [source]\n", + " elif isinstance(source, str): # Unstructured string\n", + " logging.warning(f\"Ignored textual source: {source[:50]}...\")\n", + " return []\n", + " else:\n", + " logging.warning(f\"Invalid source type: {type(source)}\")\n", + " return []\n", + "\n", + "def normalize_articles(article_list):\n", + " valid_articles = []\n", + " for a in article_list:\n", + " if isinstance(a, dict):\n", + " valid_articles.append(a)\n", + " elif isinstance(a, str) and \"pubmed.ncbi.nlm.nih.gov\" in a:\n", + " valid_articles.append({\n", + " \"title\": \"PubMed Link\",\n", + " \"abstract\": \"Not available\",\n", + " \"url\": a,\n", + " \"authors\": \"Unknown\"\n", + " })\n", + " else:\n", + " logging.warning(f\"Ignored: {repr(a)}\")\n", + " return valid_articles\n", + "\n", + "articles, formatted_search = await search_multi_database(\"quantum physics\")\n", + "print(formatted_search)\n", + "\n", + "\n", + "# === Async Task Protection Wrapper ===\n", + "# Handles timeouts and errors during asynchronous function execution.\n", + "def protect_async_task(func):\n", + " async def wrapper(*args, **kwargs):\n", + " try:\n", + " return await asyncio.wait_for(func(*args, 
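To exercise the normalization rules above in isolation, here is a self-contained copy of `normalize_articles` run against a small mixed input (dict, bare PubMed link, invalid entry):

```python
import logging

def normalize_articles(article_list):
    """Keep dict articles, wrap bare PubMed links into article dicts, drop the rest."""
    valid_articles = []
    for a in article_list:
        if isinstance(a, dict):
            valid_articles.append(a)
        elif isinstance(a, str) and "pubmed.ncbi.nlm.nih.gov" in a:
            valid_articles.append({
                "title": "PubMed Link",
                "abstract": "Not available",
                "url": a,
                "authors": "Unknown",
            })
        else:
            logging.warning("Ignored: %r", a)
    return valid_articles

mixed = [
    {"title": "Quantum entanglement review", "url": "https://example.org/1"},
    "https://pubmed.ncbi.nlm.nih.gov/12345678/",
    42,  # invalid entry, dropped with a warning
]
cleaned = normalize_articles(mixed)
print(len(cleaned))  # 2
```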
**kwargs), timeout=20)\n", + " except asyncio.CancelledError:\n", + " logging.warning(\"Task cancelled.\")\n", + " return None\n", + " except Exception as e:\n", + " logging.error(f\"Error during execution of {func.__name__}: {e}\")\n", + " return None\n", + " return wrapper\n", + "\n", + "# === Asynchronous Scientific Explanation Generation ===\n", + "# Builds the prompt and invokes the LLM model.\n", + "async def generate_explanation_async(problem, level, concept, topic):\n", + " \"\"\"Generates the explanation using the LLM asynchronously.\"\"\"\n", + " prompt = prompt_template.format(\n", + " problem=problem,\n", + " concept=concept,\n", + " topic=topic,\n", + " level=level\n", + " )\n", + " try:\n", + " response = await asyncio.to_thread(llm.invoke, prompt.strip())\n", + " return response\n", + " except Exception as e:\n", + " logging.error(f\"LLM API error: {e}\")\n", + " return \"Error generating the response.\"\n", + "\n", + "# === Conditional Interactive Chart Generation ===\n", + "# Generates a chart based on the analyzed problem if requested.\n", + "def generate_conditional_chart(problem, chart_choice):\n", + " \"\"\"Generates an interactive chart if requested.\"\"\"\n", + " fig = None\n", + " if chart_choice.lower() in [\"yes\", \"y\"]:\n", + " try:\n", + " fig = generate_interactive_chart(problem)\n", + " if fig is None:\n", + " raise ValueError(\"Chart not generated correctly.\")\n", + " print(\"Chart generated successfully!\")\n", + " except Exception as e:\n", + " logging.error(f\"Chart error: {e}\")\n", + " return fig\n", + "\n", + "# === Structured Output: Text + Chart ===\n", + "# Combines the generated explanation with the graphical visualization.\n", + "async def generate_complete_result(problem, level, concept, topic, chart_choice):\n", + " \"\"\"Combines explanation and chart to generate a structured output.\"\"\"\n", + " response = await generate_explanation_async(problem, level, concept, topic)\n", + " chart = 
generate_conditional_chart(problem, chart_choice)\n", + " return {\n", + " \"response\": response,\n", + " \"chart\": chart\n", + " }\n", + "\n", + "\n", + "# === Scientific Article Validation ===\n", + "# Checks that each article has a title, abstract, and URL.\n", + "def validate_articles(raw_articles, max_articles=5):\n", + " \"\"\"\n", + " Validates and filters the list of articles received from an AI or API source.\n", + " Returns a clean list of dictionaries containing at least 'title', 'abstract', and 'url'.\n", + " \"\"\"\n", + " if not isinstance(raw_articles, list):\n", + " logging.warning(f\"[validate_articles] Invalid input: expected list, received {type(raw_articles)}\")\n", + " return []\n", + "\n", + " valid_articles = []\n", + " for i, art in enumerate(raw_articles):\n", + " if not isinstance(art, dict):\n", + " logging.warning(f\"[validate_articles] Invalid element at position {i}: {type(art)}\")\n", + " continue\n", + "\n", + " title = art.get(\"title\")\n", + " abstract = art.get(\"abstract\")\n", + " url = art.get(\"url\")\n", + "\n", + " if all([title, abstract, url]):\n", + " valid_articles.append({\n", + " \"title\": str(title).strip(),\n", + " \"abstract\": str(abstract).strip(),\n", + " \"url\": str(url).strip()\n", + " })\n", + " else:\n", + " logging.info(f\"[validate_articles] Article discarded due to incomplete data (i={i}).\")\n", + "\n", + " if not valid_articles:\n", + " logging.warning(\"[validate_articles] No valid articles after filtering.\")\n", + "\n", + " return valid_articles[:max_articles]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PIR1Eq1aFRCj" + }, + "source": [ + "### Support Functions (Utils) for Scientific Analysis\n", + "\n", + "This module collects general-purpose utilities that enhance the entire AI-driven review and extraction system:\n", + "\n", + "- `valida_struttura_ai()`: checks that the data structure from LLM/API is complete (title, abstract, URL) \n", + "- `sigmoid()` and 
`evaluate_score()`: compute semantic coherence from numerical outputs \n", +        "- `extract_text()`: extracts text from supported file formats (.pdf, .docx, .csv, .tsv) \n", +        "- `extract_text_from_ai()`: safely parses content generated by LLMs \n", +        "- `extract_captions_from_text()` and `extract_images_with_captions()`: parse captions and images from scientific documents \n", +        "- `generate_note()`: interprets score labels (high, medium, low coherence) \n", +        "- `generate_response()`: generates simulated NLP responses with adjustable temperature\n", +        "\n", +        "These functions make the system robust, reusable, and modular — ideal for integration into automated review pipelines, semantic search workflows, or visual parsing of academic articles.\n", +        "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IT1pkplZiGPP" }, "outputs": [], "source": [ "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", +        "# See LICENSE in the repository for details.\n", +        "# Removal of this copyright is prohibited.\n", +        "\n", +        "import math\n", +        "\n", +        "# Evaluate the structure of the AI response from the LLM\n", +        "def validate_ai_structure(response, expected_fields=(\"title\", \"abstract\", \"url\")):\n", +        "    if not isinstance(response, list):\n", +        "        return []\n", +        "    valid_items = []\n", +        "    for item in response:\n", +        "        if isinstance(item, dict) and all(k in item for k in expected_fields):\n", +        "            valid_items.append(item)\n", +        "    return valid_items\n", +        "\n", +        "# Compute semantic score of the response\n", +        "def sigmoid(x):\n", +        "    return 1 / (1 + math.exp(-x))\n", +        "\n", +        "def evaluate_score(model_output):\n", +        "    try:\n", +        "        score = float(model_output[0])\n", +        "        return round(sigmoid(score), 3)\n", +        "    except (TypeError, ValueError, IndexError):  # a bare except would also swallow KeyboardInterrupt\n", +        "        return 0.0\n", +        "\n", +        "# Extract text from selected file\n", +        "def extract_text(file_name, max_chars=5000):\n", +        "    \"\"\"\n", +        "    Extracts text from supported formats (.pdf, .docx, .tsv, 
.csv).\n", + " Returns only the first max_chars characters.\n", + " \"\"\"\n", + " extension = file_name.lower().split(\".\")[-1]\n", + "\n", + " try:\n", + " if extension == \"pdf\":\n", + " with pdfplumber.open(file_name) as pdf:\n", + " text = \"\\n\".join([p.extract_text() or \"\" for p in pdf.pages]).strip()\n", + "\n", + " elif extension == \"docx\":\n", + " doc = Document(file_name)\n", + " text = \"\\n\".join([p.text for p in doc.paragraphs]).strip()\n", + "\n", + " elif extension in [\"csv\", \"tsv\"]:\n", + " sep = \",\" if extension == \"csv\" else \"\\t\"\n", + " df = pd.read_csv(file_name, sep=sep)\n", + " text = df.to_string(index=False)\n", + "\n", + " else:\n", + " raise ValueError(f\"Unsupported format: .{extension}\")\n", + "\n", + " return text[:max_chars] if text else \"No text extracted.\"\n", + "\n", + " except Exception as e:\n", + " return f\"Error during text extraction: {e}\"\n", + "\n", + "# Safely extract textual content from an AIMessage\n", + "def extract_text_from_ai(obj):\n", + " \"\"\" Safely extracts textual content from an AIMessage object. 
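The scoring helpers in this cell can be checked standalone. This sketch reproduces `sigmoid` and `evaluate_score`, using explicit exception types rather than a bare `except`:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def evaluate_score(model_output):
    """Squash the first numeric output into (0, 1); fall back to 0.0 on malformed input."""
    try:
        return round(sigmoid(float(model_output[0])), 3)
    except (TypeError, ValueError, IndexError):
        return 0.0

print(evaluate_score([2.0]))   # 0.881
print(evaluate_score("bad"))   # 0.0  (float("b") raises ValueError)
print(evaluate_score([]))      # 0.0  (empty sequence raises IndexError)
```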
\"\"\"\n", +        "    return getattr(obj, \"content\", str(obj)).strip()\n", +        "\n", +        "# Extract figure captions from text\n", +        "def extract_captions_from_text(text):\n", +        "    # Non-capturing group (?:...) so findall returns the full caption line, not just the matched \"Figure\"/\"Fig\" prefix.\n", +        "    pattern = r\"(?:Figure|Fig\\.?)\\s*\\d+[:\\.\\-–]?\\s*[^\\n]+\"\n", +        "    return re.findall(pattern, text, re.IGNORECASE)\n", +        "\n", +        "# Extract images and captions from a file\n", +        "def extract_images_with_captions(file_path, output_folder=\"extracted_figures\"):\n", +        "    os.makedirs(output_folder, exist_ok=True)\n", +        "    extension = file_path.lower().split(\".\")[-1]\n", +        "    images = []\n", +        "    captions = []\n", +        "\n", +        "    try:\n", +        "        if extension == \"pdf\":\n", +        "            doc = fitz.open(file_path)\n", +        "            full_text = \"\\n\".join([p.get_text(\"text\") for p in doc])\n", +        "            extracted_captions = extract_captions_from_text(full_text)\n", +        "            count = 0\n", +        "\n", +        "            for i, page in enumerate(doc):\n", +        "                for j, img in enumerate(page.get_images(full=True)):\n", +        "                    base = doc.extract_image(img[0])\n", +        "                    ext = base[\"ext\"]\n", +        "                    path = f\"{output_folder}/page{i+1}_img{j+1}.{ext}\"\n", +        "                    with open(path, \"wb\") as f:\n", +        "                        f.write(base[\"image\"])\n", +        "                    images.append(path)\n", +        "                    captions.append(extracted_captions[count] if count < len(extracted_captions) else f\"Figure {i+1}.{j+1}\")\n", +        "                    count += 1\n", +        "\n", +        "        elif extension == \"docx\":\n", +        "            doc = Document(file_path)\n", +        "            text = \"\\n\".join([p.text for p in doc.paragraphs])\n", +        "            extracted_captions = extract_captions_from_text(text)\n", +        "            count = 0\n", +        "\n", +        "            for i, rel in enumerate(doc.part._rels):\n", +        "                relation = doc.part._rels[rel]\n", +        "                if \"image\" in relation.target_ref:\n", +        "                    img_data = relation.target_part.blob\n", +        "                    name = f\"{output_folder}/docx_image_{i+1}.png\"\n", +        "                    with open(name, \"wb\") as f:\n", +        "                        f.write(img_data)\n", +        "                    images.append(name)\n", +        "                    captions.append(extracted_captions[count] if count < len(extracted_captions) else f\"Figure {i+1}\")\n", +        "                    count += 1\n", +        "\n", +        "        
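A quick check of the caption pattern: with a non-capturing group, `re.findall` returns whole caption lines rather than only the text matched by the group (with a capturing group it would return just "Figure" or "Fig."):

```python
import re

def extract_captions_from_text(text):
    # (?:...) is non-capturing, so findall yields the full match.
    pattern = r"(?:Figure|Fig\.?)\s*\d+[:\.\-–]?\s*[^\n]+"
    return re.findall(pattern, text, re.IGNORECASE)

text = "Intro paragraph\nFigure 1: Phase diagram of water\nFig. 2 - Energy levels\n"
print(extract_captions_from_text(text))
# ['Figure 1: Phase diagram of water', 'Fig. 2 - Energy levels']
```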
else:\n", + " print(f\"Unsupported extension: .{extension}\")\n", + "\n", + " print(f\"{len(images)} image(s) extracted.\")\n", + " return images, captions\n", + "\n", + " except Exception as e:\n", + " print(f\"Error extracting images: {e}\")\n", + " return [], []\n", + "\n", + "# Generate semantic coherence note based on score\n", + "def generate_note(score):\n", + " if score > 0.85:\n", + " return \"High semantic coherence. The response is likely solid and relevant.\"\n", + " elif score > 0.6:\n", + " return \"Moderate coherence. The response is understandable but may contain approximations.\"\n", + " else:\n", + " return \"Low coherence. It may be helpful to rephrase the question or provide more context.\"\n", + "\n", + "# Simulate LLM response generation\n", + "def generate_response(question, temperature=0.7):\n", + " if \"Rephrase\" in question:\n", + " return \"How does enthalpy change during a phase transition?\"\n", + " return f\"[Simulated response at temperature {temperature} for: {question}]\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ampAtA_hpcr2" + }, + "source": [ + "### Metacognitive Reasoning and Intentional Choices\n", + "\n", + "In this cell, AI agents perform explicit reasoning about their actions, generating:\n", + "\n", + "- **Contextual responses** based on specific objectives \n", + "- **Autonomous explanations** for decisions made between alternatives \n", + "- **Intentional choice logs**, including timestamp, expected impact, and rationale\n", + "\n", + "This architecture makes the AI more transparent, controllable, and ethically accountable.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EojNz89upQt2" + }, + "outputs": [], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# This function simulates an intentional 
decision-making process by the AI agent.\n", + "# It analyzes the proposed action in relation to the goal, available alternatives, and context.\n", + "# Metacognition functions that adapt to the system\n", + "def execute_intentional_choice(action, goal, alternatives, context):\n", + " ai_explanation = choice_with_intention(action, goal, alternatives, context)\n", + " explanation_content = getattr(ai_explanation, \"content\", str(ai_explanation)).strip()\n", + "\n", + " intentional_log.append({\n", + " \"action\": action,\n", + " \"reason\": explanation_content,\n", + " \"impact\": f\"Expected outcome for goal: {goal}\",\n", + " \"timestamp\": datetime.datetime.utcnow().isoformat()\n", + " })\n", + "\n", + " return explanation_content\n", + "\n", + "# Generates a response with intentionality by combining reasoning, AI response, and extracted text\n", + "def generate_response_with_intention(prompt, action, goal, alternatives, context):\n", + " reasoning = execute_intentional_choice(action, goal, alternatives, context)\n", + " ai_response = llm.invoke(prompt)\n", + " response_text = getattr(ai_response, \"content\", str(ai_response)).strip()\n", + "\n", + " return f\"{response_text}\\n\\n*Agent's intentional explanation:*\\n{reasoning}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pAustLbVCE9L" + }, + "source": [ + "### Agent Metacognition – Self-Analysis and Semantic Memory\n", + "\n", + "This cell enables metacognitive behavior for the AI system:\n", + "\n", + "- **Self-assessment** of semantic coherence in its responses \n", + "- **Iterative improvement** through feedback and reformulation \n", + "- **Persistent metacognitive memory**, stored as FAISS embeddings \n", + "- **Reflection on reasoning** and internal motivations \n", + "- **Simulation of scientific creativity**, with comparison on epistemic novelty\n", + "\n", + "Metacognition allows the system to learn from itself, improve response quality, and build a reusable reasoning 
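A note on the timestamping used in the intentional log: `datetime.datetime.utcnow()` is deprecated in recent Python releases in favour of timezone-aware datetimes. A minimal standalone sketch of the same logging pattern (names are illustrative, not the notebook's LLM-backed version):

```python
import datetime

intentional_log = []

def log_intentional_choice(action, reason, goal):
    """Append a structured decision record with an explicit UTC timestamp."""
    entry = {
        "action": action,
        "reason": reason,
        "impact": f"Expected outcome for goal: {goal}",
        # Timezone-aware replacement for the deprecated utcnow().
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    intentional_log.append(entry)
    return entry

entry = log_intentional_choice("cite sources", "improves traceability", "transparent answers")
print(entry["timestamp"].endswith("+00:00"))  # True
```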
foundation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CyqXYYWAiPNj" + }, + "outputs": [], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# === metacognitive_cycle ===\n", + "# Executes an iterative cycle of evaluation and improvement of the generated response.\n", + "# Combines qualitative feedback and semantic coherence score to decide whether to reformulate.\n", + "# Useful for simulating reflective and adaptive behavior.\n", + "\n", + "def generate_objective_from_input(user_input):\n", + " \"\"\"\n", + " Generates a high-level operational objective based on the user's input.\n", + " Useful for AGI-style planning and decision-making.\n", + " \"\"\"\n", + " prompt = f\"\"\"\n", + " You are an autonomous scientific agent. Based on the following input:\n", + " \"{user_input}\"\n", + "\n", + " Define a clear and actionable objective that guides the agent's next steps.\n", + " \"\"\"\n", + " try:\n", + " response = llm.invoke(prompt.strip())\n", + " return getattr(response, \"content\", str(response)).strip()\n", + " except Exception as e:\n", + " logging.error(f\"Error generating objective: {e}\")\n", + " return \"Objective generation failed.\"\n", + "\n", + "\n", + "def metacognitive_cycle(question, level, max_iter=2):\n", + " response = llm.invoke(question)\n", + " response_text = extract_text_from_ai(response)\n", + "\n", + " for i in range(max_iter):\n", + " feedback = auto_feedback_response(question, response_text, level)\n", + " score = evaluate_coherence(question, response_text)\n", + "\n", + " print(f\"\\nIteration {i+1} – Coherence: {score:.3f}\")\n", + " print(\"Feedback:\", extract_text_from_ai(feedback))\n", + "\n", + " if score < 0.7:\n", + " response_text = extract_text_from_ai(improve_response(question, response_text, level))\n", + " else:\n", + " 
break\n", +        "\n", +        "    return response_text\n", +        "\n", +        "# Evaluate response with self-assessment and interactive improvement\n", +        "# Evaluates the response and reformulates it if poorly constructed\n", +        "# max_depth caps the reformulation recursion so a persistently low score cannot loop forever\n", +        "def evaluate_responses_with_ai(question, generate_response_fn, n_variants=3, reformulation_threshold=0.6, max_depth=2):\n", +        "    temperature_values = [0.7, 0.4, 0.9][:n_variants]\n", +        "    responses = [generate_response_fn(question, temperature=t) for t in temperature_values]\n", +        "\n", +        "    scores = [evaluate_coherence(question, r) for r in responses]\n", +        "    idx = scores.index(max(scores))\n", +        "    confidence = scores[idx]\n", +        "    best_response = responses[idx]\n", +        "\n", +        "    if confidence < reformulation_threshold and max_depth > 0:\n", +        "        new_question = reformulate_question(question)\n", +        "        return evaluate_responses_with_ai(new_question, generate_response_fn, n_variants, reformulation_threshold, max_depth - 1)\n", +        "\n", +        "    return {\n", +        "        \"response\": best_response,\n", +        "        \"confidence\": round(confidence, 3),\n", +        "        \"note\": generate_note(confidence)\n", +        "    }\n", +        "\n", +        "def evaluate_responses_with_ai_simple(question, response, level=\"basic\"):\n", +        "    \"\"\"\n", +        "    Evaluates the quality of the generated response relative to the question.\n", +        "    Returns a dictionary with:\n", +        "    - semantic coherence score\n", +        "    - reason for weakness\n", +        "    - suggested reformulation\n", +        "    - reflection on reasoning\n", +        "    - flag for auto-improvement\n", +        "    \"\"\"\n", +        "\n", +        "    evaluation_prompt = f\"\"\"\n", +        "    User question: \"{question}\"\n", +        "    Generated response: \"{response}\"\n", +        "    Required level: {level}\n", +        "\n", +        "    Evaluate the response in 5 points:\n", +        "    1. Semantic coherence (0–1)\n", +        "    2. Conceptual completeness\n", +        "    3. Argumentative structure\n", +        "    4. Adequacy to the required level\n", +        "    5. 
Ability to stimulate new questions\n", + "\n", + " If the response is weak:\n", + " - Explain the reason\n", + " - Suggest a reformulation\n", + " - Reflect on how the system reasoned\n", + "\n", + " Return everything in structured format.\n", + " \"\"\"\n", + "\n", + " try:\n", + " ai_evaluation = llm.invoke(evaluation_prompt)\n", + " raw_output = getattr(ai_evaluation, \"content\", str(ai_evaluation))\n", + " except Exception as e:\n", + " print(\"Evaluation error:\", e)\n", + " return {\n", + " \"semantic_score\": 0.0,\n", + " \"weakness_reason\": \"System error\",\n", + " \"new_formulation\": None,\n", + " \"self_reflection\": None,\n", + " \"requires_improvement\": True\n", + " }\n", + "\n", + " # Simplified parsing functions (can be enhanced with regex or LLM)\n", + " def extract_score(text):\n", + " match = re.search(r\"Semantic coherence\\s*[:\\-]?\\s*(0\\.\\d+)\", text)\n", + " return float(match.group(1)) if match else 0.0\n", + "\n", + " def extract_reason(text):\n", + " match = re.search(r\"Reason\\s*[:\\-]?\\s*(.+)\", text)\n", + " return match.group(1).strip() if match else \"Reason not found.\"\n", + "\n", + " def extract_reformulation(text):\n", + " match = re.search(r\"Reformulation\\s*[:\\-]?\\s*(.+)\", text)\n", + " return match.group(1).strip() if match else None\n", + "\n", + " def extract_reflection(text):\n", + " match = re.search(r\"Reflection\\s*[:\\-]?\\s*(.+)\", text)\n", + " return match.group(1).strip() if match else None\n", + "\n", + " # Actual parsing\n", + " score = extract_score(raw_output)\n", + " reason = extract_reason(raw_output)\n", + " reformulation = extract_reformulation(raw_output)\n", + " reflection = extract_reflection(raw_output)\n", + "\n", + " return {\n", + " \"response\": response,\n", + " \"semantic_score\": score,\n", + " \"weakness_reason\": reason,\n", + " \"new_formulation\": reformulation,\n", + " \"self_reflection\": reflection,\n", + " \"requires_improvement\": score < 0.7\n", + " }\n", + "\n", + "def 
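The simplified parsers above pull values out of free-form LLM output with regular expressions. This standalone check shows what `extract_score` accepts; note the pattern only matches scores strictly below 1 (a perfect score would need a broader pattern that also accepts 1.0):

```python
import re

def extract_score(text):
    """Pull the coherence value out of a line like 'Semantic coherence: 0.82'."""
    match = re.search(r"Semantic coherence\s*[:\-]?\s*(0\.\d+)", text)
    return float(match.group(1)) if match else 0.0

sample = "1. Semantic coherence: 0.82\n2. Conceptual completeness: good"
print(extract_score(sample))          # 0.82
print(extract_score("no score here")) # 0.0
```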
generate_metacognitive_content(question, response, evaluation):\n", +        "    return f\"\"\"\n", +        "    [Question] {question}\n", +        "    [Response] {response}\n", +        "    [Coherence Score] {evaluation['semantic_score']}\n", +        "    [Weakness Reason] {evaluation['weakness_reason']}\n", +        "    [Suggested Reformulation] {evaluation['new_formulation']}\n", +        "    [Cognitive Reflection] {evaluation['self_reflection']}\n", +        "    [Needs Improvement] {evaluation['requires_improvement']}\n", +        "    \"\"\".strip()\n", +        "\n", +        "def add_metacognitive_memory(question, response):\n", +        "    # Cognitive evaluation of the response.\n", +        "    # evaluate_responses_with_ai expects a generator function as its second argument;\n", +        "    # the *_simple variant scores an existing response and returns the keys used above.\n", +        "    evaluation = evaluate_responses_with_ai_simple(question, response)\n", +        "\n", +        "    # Generate textual content with all metacognitive data\n", +        "    textual_content = generate_metacognitive_content(question, response, evaluation)\n", +        "\n", +        "    # Generate semantic embedding from the full content\n", +        "    embedding = embedding_model.encode([textual_content])\n", +        "\n", +        "    # Add to FAISS index\n", +        "    index.add(np.array(embedding, dtype=np.float32))\n", +        "\n", +        "    # Save updated index\n", +        "    with open(INDEX_FILE, \"wb\") as f:\n", +        "        pickle.dump(index, f)\n", +        "\n", +        "    print(\"Metacognitive memory updated!\")\n", +        "\n", +        "def search_similar_reasoning(query, top_k=5):\n", +        "    \"\"\"\n", +        "    Searches the FAISS metacognitive memory for reasoning most similar to the input query.\n", +        "    Returns a list of the most relevant textual contents.\n", +        "    \"\"\"\n", +        "    # Encode the query\n", +        "    query_vector = embedding_model.encode([query])\n", +        "\n", +        "    # Search for top-K nearest\n", +        "    distances, indices = index.search(np.array(query_vector, dtype=np.float32), top_k)\n", +        "\n", +        "    # Load the metacognitive diary once, then look up each returned index.\n", +        "    results = []\n", +        "    try:\n", +        "        with open(\"meta_diary.json\", \"r\", encoding=\"utf-8\") as f:\n", +        "            archive = json.load(f)\n", +        "        for idx in indices[0]:\n", +        "            content = archive.get(str(idx))\n", +        "            if content:\n", +        "                results.append(content)\n", +        "    except Exception as e:\n", +        "        print(f\"Memory 
retrieval error: {e}\")\n", + "\n", + " return results\n", + "\n", + "def add_metacognition_to_response(response, evaluation):\n", + " reflection = evaluation.get(\"self_reflection\", \"\")\n", + " note = evaluation.get(\"weakness_reason\", \"\")\n", + " return f\"{response.strip()}\\n\\n*Metacognitive note:* {note}\\n*Agent's reflection:* {reflection}\"\n", + "\n", + "def auto_feedback(question, response, level):\n", + " return f\"\"\"Analyze the response in relation to the question: \"{question}\".\n", + "Evaluate the content according to the level '{level}' and suggest improvements.\n", + "\"\"\"\n", + "\n", + "# === Full flow example ===\n", + "async def scientific_creativity_flow(concept, subject, language=\"en\", level=\"advanced\"):\n", + " creative_hypothesis = simulate_scientific_creativity(concept, subject, language=language, level=level)\n", + " articles, _ = await search_multi_database(concept) # Retrieve existing scientific sources\n", + " novelty_evaluation = evaluate_hypothesis_novelty(creative_hypothesis, articles)\n", + "\n", + " return {\n", + " \"hypothesis\": creative_hypothesis,\n", + " \"novelty\": novelty_evaluation\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZKlL5HxUCHnk" + }, + "source": [ + "### Multilingual Module – Automatic Translation of Scientific Documents\n", + "\n", + "This cell enables automatic translation from PDF and DOCX files into multiple languages, while preserving the scientific integrity of the content:\n", + "\n", + "- **Automatic language detection** using `langdetect` \n", + "- **Neural translation** via `Helsinki-NLP` models (`transformers` from HuggingFace) \n", + "- **Support for PDF, DOCX, CSV, TSV**, with text extraction and saving of the translated file \n", + "- **Intelligent caching** to avoid duplicate translations \n", + "- Supported languages: `en`, `fr`, `de`, `es`, `zh`, `ja`, `ar`, `it`\n", + "\n", + "Ideal for converting academic content and technical explanations into 
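The FAISS lookup in `search_similar_reasoning` follows the `index.search` contract: given query vectors, return distances and indices of the nearest stored vectors. A NumPy brute-force stand-in (illustrative only, not the notebook's FAISS index) shows the same behaviour:

```python
import numpy as np

def search_similar(query_vec, stored, top_k=2):
    """Brute-force nearest neighbours; mimics the (distances, indices) return shape."""
    dists = np.linalg.norm(stored - query_vec, axis=1)
    order = np.argsort(dists)[:top_k]
    return dists[order], order

stored = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype=np.float32)
dists, ids = search_similar(np.array([0.9, 0.1], dtype=np.float32), stored)
print(ids.tolist())  # [1, 0] — (1, 0) is closest, then the origin
```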
accessible language for international users.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Tpp0OIV5iS2k" + }, + "outputs": [], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# === Text Translation ===\n", + "\n", + "# Caching dictionary for previously translated texts\n", + "translation_cache = {}\n", + "\n", + "\n", + "def detect_language(text):\n", + " \"\"\"Detects the language of the loaded text.\"\"\"\n", + " try:\n", + " return detect(text)\n", + " except Exception as e:\n", + " print(f\"Language detection error: {e}\")\n", + " return \"unknown\"\n", + "\n", + "def translate_text(text, source_lang, target_lang):\n", + " \"\"\" Translates the text with debug output to verify correctness. \"\"\"\n", + " translation_model = f\"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}\"\n", + "\n", + " print(f\"Using translation model: {translation_model}\")\n", + "\n", + " translator = pipeline(\"translation\", model=translation_model)\n", + "\n", + " translation = translator(text)[0]['translation_text']\n", + " print(f\"Original text: {text}\")\n", + " print(f\"Translated text: {translation}\")\n", + "\n", + " return translation\n", + "\n", + "def extract_text_pdf(file_name):\n", + " \"\"\" Extracts text from a PDF file. \"\"\"\n", + " text = \"\"\n", + " with pdfplumber.open(file_name) as pdf:\n", + " for page in pdf.pages:\n", + " text += page.extract_text() + \"\\n\"\n", + " return text.strip()\n", + "\n", + "def extract_text_docx(file_name):\n", + " \"\"\" Extracts text from a DOCX file. \"\"\"\n", + " doc = Document(file_name)\n", + " text = \"\\n\".join([paragraph.text for paragraph in doc.paragraphs])\n", + " return text.strip()\n", + "\n", + "def save_docx(text, output_file_name):\n", + " \"\"\" Saves translated text into a DOCX document. 
\"\"\"\n", + " doc = Document()\n", + " doc.add_paragraph(text)\n", + " doc.save(output_file_name)\n", + "\n", + "def extract_text_csv(file_name):\n", + " \"\"\" Extracts textual content from a CSV file. \"\"\"\n", + " df = pd.read_csv(file_name)\n", + " text = df.astype(str).apply(lambda x: ' '.join(x), axis=1).str.cat(sep='\\n')\n", + " return text.strip()\n", + "\n", + "def extract_text_tsv(file_name):\n", + " \"\"\" Extracts textual content from a TSV file. \"\"\"\n", + " df = pd.read_csv(file_name, sep='\\t')\n", + " text = df.astype(str).apply(lambda x: ' '.join(x), axis=1).str.cat(sep='\\n')\n", + " return text.strip()\n", + "\n", + "def handle_file(file_name):\n", + " \"\"\" Loads the file, detects its language, and lets the user choose a target language for translation. \"\"\"\n", + " extension = file_name.split('.')[-1].lower()\n", + "\n", + " if extension == \"pdf\":\n", + " text = extract_text_pdf(file_name)\n", + " elif extension == \"docx\":\n", + " text = extract_text_docx(file_name)\n", + " elif extension == \"csv\":\n", + " text = extract_text_csv(file_name)\n", + " elif extension == \"tsv\":\n", + " text = extract_text_tsv(file_name)\n", + " else:\n", + " return \"Unsupported format! Use PDF, DOCX, CSV, or TSV.\"\n", + "\n", + " original_language = detect_language(text)\n", + " print(f\"The file was detected in **{original_language}**.\")\n", + "\n", + " # List of available languages\n", + " available_languages = [\"en\", \"fr\", \"de\", \"es\", \"zh\", \"ja\", \"ar\", \"it\"]\n", + "\n", + " # Ask the user for the target language\n", + " print(f\"Available languages for translation: {', '.join(available_languages)}\")\n", + " target_language = input(\"Which language do you want the explanation in? 
(e.g., 'en' for English, 'fr' for French): \").strip()\n", +        "\n", +        "    if target_language not in available_languages:\n", +        "        print(\"Error: Unsupported language!\")\n", +        "        # Abort here: without this return, the code below would still attempt the translation.\n", +        "        return \"Unsupported language. Translation aborted.\"\n", +        "    print(f\"The explanation will be translated into {target_language}.\")\n", +        "\n", +        "    # Ensure translation is performed\n", +        "    translated_text = translate_text(text, original_language, target_language)\n", +        "\n", +        "    # Save the translated file\n", +        "    # Note: a translated PDF is written out as plain text; only the file name keeps the .pdf extension.\n", +        "    translated_file_name = f\"translated_{target_language}_{file_name}\"\n", +        "    if extension == \"pdf\":\n", +        "        with open(translated_file_name, \"w\", encoding=\"utf-8\") as f:\n", +        "            f.write(translated_text)\n", +        "    elif extension == \"docx\":\n", +        "        save_docx(translated_text, translated_file_name)\n", +        "\n", +        "    return f\"Translation completed! Download the file: {translated_file_name}\"\n", +        "\n", +        "# Initialize the dictionary to store journals\n", +        "journal_store = {}\n", +        "\n", +        "def save_multilingual_journal(journal_text, journal_id, target_language):\n", +        "    source_language = detect_language(journal_text)\n", +        "\n", +        "    if source_language != target_language:\n", +        "        translated_text = translate_long_text(journal_text, source_lang=source_language, target_lang=target_language)\n", +        "    else:\n", +        "        translated_text = journal_text\n", +        "\n", +        "    journal_store[journal_id] = {\n", +        "        \"original\": journal_text,\n", +        "        target_language: translated_text\n", +        "    }\n", +        "\n", +        "    embedding = safe_encode(translated_text)\n", +        "    index.add(np.array(embedding, dtype=np.float32))\n", +        "\n", +        "\n", +        "def translate_long_text(text, source_lang=\"it\", target_lang=\"en\", max_chars=400):\n", +        "    translation_model = f\"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}\"\n", +        "    translator = pipeline(\"translation\", model=translation_model)\n", +        "\n", +        "    blocks = [text[i:i+max_chars] for i in range(0, len(text), max_chars)]\n", +        "    translated = []\n", +        "\n", +        "    for block in blocks:\n", +        "        try:\n", +        "            output = 
translator(block)[0]['translation_text']\n", + " translated.append(output)\n", + " except Exception as e:\n", + " print(f\"Error translating block: {e}\")\n", + " translated.append(\"[Translation error]\")\n", + "\n", + " return \"\\n\".join(translated)\n", + "\n", + "def search_similar_journals(query, target_language, top_k=3):\n", + " query_language = detect_language(query)\n", + "\n", + " if query_language != target_language:\n", + " translated_query = translate_long_text(query, source_lang=query_language, target_lang=target_language)\n", + " else:\n", + " translated_query = query\n", + "\n", + " query_emb = safe_encode(translated_query)\n", + " query_emb = np.array(query_emb, dtype=np.float32)\n", + "\n", + " if hasattr(index, \"is_trained\") and not index.is_trained:\n", + " print(\"FAISS index is not trained.\")\n", + " return []\n", + "\n", + " D, I = index.search(query_emb, top_k)\n", + " results = []\n", + " for i in I[0]:\n", + " journal = journal_store.get(i, {})\n", + " results.append(journal.get(target_language, \"\"))\n", + " return results\n", + "\n", + "# === Valid Input Function ===\n", + "def get_valid_input(message, valid_options=None):\n", + " while True:\n", + " value = input(message).strip().lower()\n", + " if not value:\n", + " print(\"Error! Please enter a valid value.\")\n", + " elif valid_options and value not in valid_options:\n", + " print(f\"Error! 
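`translate_long_text` splits input on fixed character counts before handing blocks to the translator, which can cut a word or sentence in half; splitting on sentence boundaries would be gentler. The splitting step itself is easy to verify in isolation:

```python
def chunk_text(text, max_chars=400):
    """Fixed-size character blocks, as used before feeding text to the translation pipeline."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

blocks = chunk_text("a" * 1000)
print([len(b) for b in blocks])  # [400, 400, 200]
```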
You must choose from: {', '.join(valid_options)}\")\n", + " else:\n", + " return value" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pqJbK6Omxth-" + }, + "source": [ + "### Academic Ranking and Scientific Originality\n", + "\n", + "This cell enables:\n", + "\n", + "- Calculation of the Impact Score using a RandomForest model \n", + "- Validation of scientific hypotheses to assess originality \n", + "- Automated checks on citations and methodology \n", + "- Intelligent synthesis of scientific sources\n", + "\n", + "The system analyzes papers and ideas using epistemic criteria, offering both quantitative and qualitative insights into impact and novelty.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Lt5v2lnux9JK", + "outputId": "65628b0e-2cb9-4144-b175-1a2363fdae2c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Estimated score: 83.95\n", + "Estimated score: 83.65\n" + ] + } + ], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# Sample data for ranking\n", + "data = np.array([\n", + " [120, 45, 1, 2023], # Citations, h-index, peer review, year\n", + " [50, 30, 1, 2020],\n", + " [10, 15, 0, 2018]\n", + "])\n", + "\n", + "labels = [95, 70, 30] # Academic impact score\n", + "\n", + "# Model training\n", + "ranking_model = RandomForestRegressor(n_estimators=100)\n", + "ranking_model.fit(data, labels)\n", + "\n", + "# **Ranking prediction**\n", + "def calculate_impact_score(citations, h_index, peer_review, publication_year):\n", + " paper_data = np.array([[citations, h_index, peer_review, publication_year]])\n", + " score = ranking_model.predict(paper_data)\n", + " return max(0, score[0]) # Ensure non-negative\n", + "\n", + "# Usage example\n", + 
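`get_valid_input` blocks on the built-in `input()`, which makes the loop hard to test. With an injectable `reader` parameter (an assumption added here, not present in the notebook version) the same validation logic becomes testable:

```python
def get_valid_input(message, valid_options=None, reader=input):
    """Prompt until a non-empty (and, if given, whitelisted) value is entered."""
    while True:
        value = reader(message).strip().lower()
        if not value:
            print("Error! Please enter a valid value.")
        elif valid_options and value not in valid_options:
            print(f"Error! You must choose from: {', '.join(valid_options)}")
        else:
            return value

# Simulated console: an empty answer first, then a valid one with stray case/whitespace.
answers = iter(["", "DE "])
result = get_valid_input("Language? ", ["en", "de"], reader=lambda _: next(answers))
print(result)  # de
```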
"impact_score = calculate_impact_score(80, 40, 1, 2024)\n", + "print(f\"Estimated score: {impact_score}\")\n", + "\n", + "# Ranking model (import used by RandomForestRegressor above)\n", + "from sklearn.ensemble import RandomForestRegressor\n", + "\n", + "# === Scientific originality evaluation ===\n", + "def evaluate_hypothesis_novelty(hypothesis, existing_articles, threshold=0.7):\n", + "    \"\"\"\n", + "    Compares the hypothesis with existing articles using semantic embeddings.\n", + "    Returns:\n", + "    - average similarity score\n", + "    - similar articles\n", + "    - qualitative assessment of originality\n", + "    \"\"\"\n", + "    try:\n", + "        # Filter once so similarity indices stay aligned with article titles\n", + "        articles_with_abstracts = [a for a in existing_articles if \"abstract\" in a]\n", + "        emb_hypothesis = model_embedding.encode([hypothesis])\n", + "        emb_articles = model_embedding.encode([a[\"abstract\"] for a in articles_with_abstracts])\n", + "\n", + "        similarity = np.dot(emb_hypothesis, emb_articles.T) / (\n", + "            np.linalg.norm(emb_hypothesis) * np.linalg.norm(emb_articles, axis=1)\n", + "        )\n", + "        average = round(float(np.mean(similarity)), 3)\n", + "\n", + "        similar_articles = [\n", + "            articles_with_abstracts[i].get(\"title\", f\"Article {i+1}\")\n", + "            for i, score in enumerate(similarity[0]) if score > threshold\n", + "        ]\n", + "\n", + "        if average < 0.4:\n", + "            assessment = \"High originality: hypothesis is rarely present in the literature.\"\n", + "        elif average < 0.7:\n", + "            assessment = \"Moderate originality: related concepts exist.\"\n", + "        else:\n", + "            assessment = \"Low originality: 
hypothesis is already widely discussed.\"\n", + "\n", + "        return {\n", + "            \"novelty_score\": average,\n", + "            \"similar_articles\": similar_articles,\n", + "            \"assessment\": assessment\n", + "        }\n", + "\n", + "    except Exception as e:\n", + "        logging.error(f\"[evaluate_novelty] Error during originality evaluation: {e}\")\n", + "        return {\n", + "            \"novelty_score\": 0.0,\n", + "            \"similar_articles\": [],\n", + "            \"assessment\": \"Error during originality evaluation.\"\n", + "        }\n", + "\n", + "# Automated paper review with AI\n", + "async def review_paper(paper_text):\n", + "    \"\"\"Checks the methodology and citation quality of a paper.\"\"\"\n", + "    methodology = await verify_methodology(paper_text)\n", + "    citations = await verify_citations(paper_text)\n", + "    return {\"methodology\": methodology, \"citations\": citations}\n", + "\n", + "async def validate_hypothesis(hypothesis):\n", + "    sources = await search_multi_database(hypothesis)\n", + "    # calculate_impact_score expects per-paper metrics (citations, h-index,\n", + "    # peer review, year), so score each source and average; missing fields\n", + "    # fall back to neutral defaults.\n", + "    scores = [\n", + "        calculate_impact_score(\n", + "            s.get(\"citations\", 0), s.get(\"h_index\", 0),\n", + "            s.get(\"peer_review\", 0), s.get(\"publication_year\", 2025)\n", + "        )\n", + "        for s in sources if isinstance(s, dict)\n", + "    ]\n", + "    score = round(sum(scores) / len(scores), 2) if scores else 0.0\n", + "    summary = summarize_evidence(sources)\n", + "    return score, summary\n", + "\n", + "def summarize_evidence(sources):\n", + "    return \"\\n\".join([\n", + "        f\"- {a['title'][:80]}…\" for a in sources if isinstance(a, dict) and 'title' in a\n", + "    ]) if sources else \"No evidence found.\"\n" ] }, { "cell_type": "markdown", "metadata": { "id": "zxIIMXv0yNdY" }, "source": [ "### Hypothesis Validation and Scientific Reporting\n", + "\n", + "This cell enables the following:\n", + "\n", + "- Evaluation of the **novelty** of a scientific hypothesis by comparing it with existing articles (via semantic embeddings) \n", + "- Generation of an **Impact Score** based on citations, h-index, peer review status, and publication year \n", + "- Extraction and synthesis of evidence from multiple databases (arXiv, PubMed, Zenodo, OpenAlex) \n", + "- Creation of a **Markdown report** including:\n", + "  - Title and description of the 
analysis \n", + " - List of articles with abstracts and links \n", + " - Related images and captions (if available)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "abSK5HPGySGh" + }, + "outputs": [], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# Generate an automatic report\n", + "def generate_markdown_report(\n", + " title=\"Automatic Report\",\n", + " description=\"Automatically generated scientific summary.\",\n", + " articles=None,\n", + " images=None,\n", + " captions=None,\n", + " filename=\"report.md\"\n", + "):\n", + " \"\"\"\n", + " Generates a Markdown file with:\n", + " - Title and description\n", + " - Scientific articles with abstract and link\n", + " - Images and associated captions (if available)\n", + "\n", + " All arguments are optional. A coherent structure is created regardless.\n", + " \"\"\"\n", + "\n", + " # Safe fallback for each parameter\n", + " articles = articles if isinstance(articles, list) else []\n", + " images = images if isinstance(images, list) else []\n", + " captions = captions if isinstance(captions, list) else []\n", + "\n", + " try:\n", + " with open(filename, \"w\", encoding=\"utf-8\") as f:\n", + " f.write(f\"# {title}\\n\\n\")\n", + " f.write(f\"{description}\\n\\n\")\n", + "\n", + " f.write(\"## Scientific Articles\\n\\n\")\n", + " if articles:\n", + " for i, art in enumerate(articles[:5]):\n", + " article_title = art.get(\"titolo\", f\"Article {i+1}\")\n", + " abstract = art.get(\"abstract\", \"Abstract not available.\")\n", + " url = art.get(\"url\", \"#\")\n", + " f.write(f\"**{i+1}. 
{article_title}**\\n\")\n", + "                f.write(f\"{abstract}\\n\\n[Link to article]({url})\\n\\n\")\n", + "        else:\n", + "            f.write(\"No articles available.\\n\\n\")\n", + "\n", + "        if images:\n", + "            f.write(\"## Figures\\n\\n\")\n", + "            for i, img_path in enumerate(images):\n", + "                caption = captions[i] if i < len(captions) else f\"Figure {i+1}\"\n", + "                f.write(f\"![{caption}]({img_path})\\n\\n*{caption}*\\n\\n\")\n", + "\n", + "        print(f\"Markdown report successfully generated: {filename}\")\n", + "    except Exception as e:\n", + "        print(f\"Error during report generation: {e}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "MCDkYEkIyX_l" }, "source": [ "### Impact Score & Semantic Evaluation\n", + "\n", + "This cell calculates the coherence and reliability of generated responses using:\n", + "\n", + "- A CrossEncoder model (DeBERTa) for semantic analysis \n", + "- An Impact Score formula based on academic data \n", + "- Relevance verification between the question and the generated content \n", + "- Automatic restructuring if the score is low\n", + "\n", + "The system adopts an iterative logic that mirrors scientific peer-review criteria, enhancing the quality and relevance of the generated responses.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 273, "referenced_widgets": [ "6510563bb2114092bbf18b0fc34bf54f", + "f2c345ebc0144402a71ccc17e5d23bda", + "5753d9e7b9a54c249f4f3d0ab95562dd", + "742c66e551cc4d87bead99523d88d12d", + "9d734bea3cb04420a8f0a66e2eafa32e", + "59544cd4ca4d46d0a2a0728709d97afd", + "d57e8798c8164b2aae610c1c56534df4", + "4141996d1822476893a82cd66c2e72f7", + "5efd87b809564dd9bdf8e0178f7ca326", + "8ffa9875f0b541e794733d0178fd3029", + "7ac93402fa654686ae8f6ecb859702a4", + "f313d85473ae4325a9d7365e9ce9027c", + "46a9d660e31e47a9b550e582ba28e3e5", + "65d5ef4442994405901a1363a6db8d05", + "5431464b50c247419081a3c08042ee69", + "11aed04b9ed543a2a409a1ef24c9c06b", + "7a520e6d680e421f8b87b9980420f9a7", + "2ec3fea9bbba4ae9a8256dd8b0ddf506", + "feb7fed826c0461990e6fe80c9bb1fed", + "a7c363e5e2b64680b7c578640194c7c3", + "d62cbacd873f4d94987d37e6a135997b", + "b83622cc891d458495f2156788fc9316", + "67a89d63ba9348eb980d5d66bf3a1a95", + "ad4c4a40e22b4d879293d375209c080a", + "252b6f253c1748c093c41a8004fe335d", + "73803349aa8d43a38cf14770d7725c60", + "5147187a9e964ef6bd1dce7371c79ef9", + 
"6cc5c84c5c6f4508aefe07fadd7eddd2", + "80bdccaa20cf44f68856b8564ab853fe", + "7331b2117940498cb8f443c26cd73567", + "7aeca8c56061457597f7d7c907f1d9c1", + "3e07e643eda6439d993830491f5ec6eb", + "cbfe6af4ecd84151b39d5aaeff1e71ca", + "47abd407570646cb9ca3dcecac479b70", + "8242cdd3143a49f3ab29659884f8df53", + "d751da7aead1440f9f02cf957a779782", + "ec4a59417d75459790af7baa34c9ad05", + "b72e781787614937b592a53e3acf345d", + "d0693c9087824e96a2bd84758fc549be", + "c3232da061744ac9b1fc9df0dd8c1d3d", + "bf03e5fda2344e35bdfdf1b018712072", + "8eb9f20eb1da49e3ade0cfde4461ecbd", + "2eb13733650c45e381402bd90550daba", + "4cc12916c79b4071965d7699caf36978", + "fc53ba22e78048b69f871e276cc486b6", + "b51a215ac959468196b5a0a4990b3acc", + "838bbb1f93e54ff3991f0441b4e181ff", + "a89aa1dc01b5491281bdafdb35b73d9f", + "de947c1d018e40728de56767932d9bd4", + "283116c0b5494a3a9cefe5ccdd548cec", + "b86a47a7ba534558ac3802fa9611afdb", + "2ef44ffac8664c118ef2542d604d45e2", + "7d976c2082b8452b979875dbd4ce5f08", + "974901addce24e21aff1cb0ec06d48ed", + "ecabbb4693024275b3a68a8126506685", + "5970760b5f4042ee81e28092f7a91629", + "03b2fd7e18234a71af2d586eba07e603", + "fc6f95f26fd64ceea131a2320c919397", + "07e26c2e689045c3a80296217e6db6e6", + "52ea88bb871743a6b31f5326b863727d", + "d46dd73b1a6344f480110576b3d961c0", + "f6cb065305c444c0932d366842d00836", + "f2e8f406bb754037bf6a6867a9e8452c", + "d918cbb6440f430ea4c303025d5e146a", + "4f7de83b382d47a5b66d6f6ff23e0819", + "53737b97a0fb466b95817aec00b48ae9", + "9dce4fb049ab4a56bdf0281feda06db4", + "92b82e40cc5c48a787fa8ee55283c516", + "975e7696cca14b86a03751f37b0fce32", + "4d44ff7bcfe340ecb1ed98a196ce08a2", + "af9ce3f2b87c4de0800a19a20bba9093", + "632498c616d54d7b946539aac5e41eb1", + "f4ce686296c84131b7b09cb448602aa0", + "e9e1b97a903b4922bbd14c72f2054c50", + "ef63d22c7992439782f7a6d7f903a227", + "1775a22874934bb993ace2bad1fa4253", + "caac00101bed4f759cbf7c053e1f89c8", + "1b3e321b21b54fcebdd5cfa02284102c", + "5d2887ec3f714a239166d50e0afedd8f", + 
"77b6258ddc9443ba9988ccf60d4ba1cd", + "10aec93301814f90a7ea8c2125533d2c", + "8366a41e3be6410faa6231aa5fd23ad5", + "4f34612865ac44c8b9c48b530705d761", + "7fd2f21a988242188de47a7d20d8f116", + "dd2104df1d6f4f818e9526aba58e0ac5", + "af45ce4ffc2c483084744c9e1d12b744", + "42a8a8ead324410a83f9e39f4e508cd6", + "1fe78fd2f70c431781a0853e8614df0c" + ] + }, + "id": "1n2p_85Uic8r", + "outputId": "285fb3ca-f37c-4071-bc77-12be94ca8445" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6510563bb2114092bbf18b0fc34bf54f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0%| | 0.00/975 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f313d85473ae4325a9d7365e9ce9027c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/557M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "67a89d63ba9348eb980d5d66bf3a1a95", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer_config.json: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "47abd407570646cb9ca3dcecac479b70", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "vocab.json: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fc53ba22e78048b69f871e276cc486b6", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "merges.txt: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5970760b5f4042ee81e28092f7a91629", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer.json: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9dce4fb049ab4a56bdf0281feda06db4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "special_tokens_map.json: 0%| | 0.00/778 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1b3e321b21b54fcebdd5cfa02284102c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# Load the model only once\n", + "cross_encoder = CrossEncoder(\"cross-encoder/nli-deberta-base\")\n", + "\n", + "def evaluate_coherence(question, answer):\n", + " score = cross_encoder.predict([(question, answer)])\n", + " try:\n", + " logit = float(score[0]) if isinstance(score[0], (int, float, np.floating)) else float(score[0][0])\n", + " probability = 1 / (1 + math.exp(-logit)) # Sigmoid function\n", + " return round(probability, 3)\n", + " except Exception:\n", + " return 0.0\n", + "\n", + "# === Scientific reliability score calculation ===\n", + "def calculate_impact_score(citations, h_index, peer_review, publication_year):\n", + " score = (citations * 0.4) + (h_index * 0.3) + (peer_review * 0.2) - (2025 - publication_year) * 0.1\n", + " return max(0, score) # Ensure non-negative\n", + "\n", + "def check_topic_relevance(user_question, extracted_text, threshold=0.7):\n", + " \"\"\"Checks whether the topic of the question is consistent with the uploaded file content.\"\"\"\n", + " emb_question = 
embedding_model.encode([user_question])\n", + "    emb_text = embedding_model.encode([extracted_text])\n", + "\n", + "    # np.dot yields a 1x1 array here; convert to a scalar before rounding\n", + "    similarity = float(np.dot(emb_question, emb_text.T) / (np.linalg.norm(emb_question) * np.linalg.norm(emb_text)))\n", + "    return round(similarity, 3), similarity >= threshold\n", + "\n", + "def calculate_response_score(question, answer):\n", + "    score = cross_encoder.predict([(question, answer)])\n", + "    return float(score[0])\n", + "\n", + "def regenerate_if_low_score(question, answer, level, threshold=0.7, iterations=2):\n", + "    evaluation = evaluate_responses_with_ai(question, answer, level)\n", + "    if evaluation[\"semantic_score\"] < threshold:\n", + "        new_question = reformulate_question(question)\n", + "        for i in range(iterations):\n", + "            new_answer = generate_response(new_question, temperature=0.7)\n", + "            new_evaluation = evaluate_responses_with_ai(new_question, new_answer, level)\n", + "            if new_evaluation[\"semantic_score\"] >= threshold:\n", + "                return new_answer\n", + "    return answer  # score was acceptable, or no regeneration beat the threshold\n", + "\n", + "def select_best_version(question, answers):\n", + "    scored = [(r, calculate_response_score(question, r)) for r in answers]\n", + "    scored.sort(key=lambda x: x[1], reverse=True)\n", + "    return scored[0]  # (answer, score)" ] }, { "cell_type": "markdown", "metadata": { "id": "iUnA52R6CgEK" }, "source": [ "### Ethical Module – Content Evaluation and Autonomy Control\n", + "\n", + "This cell activates an ethical analysis system that examines AI-generated responses for potential risks:\n", + "\n", + "- **Autonomy control and authorization level** \n", + "  The function `check_agent_autonomy()` flags sensitive content if the user's access level is low.\n", + "\n", + "- **Ethical and linguistic risk assessment** \n", + "  The module `assess_ethical_risk()` detects:\n", + "  - Implicit bias or non-inclusive language \n", + "  - Risk of misinformation (lack of sources) \n", + "  - Critical topics (vaccines, gender, politics) \n", + "  - 
Manipulation or lack of neutrality\n", + "\n", + "If a risk is detected, the system suggests a reformulation to ensure inclusive, accurate, and safe language.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "weMBaHbXirXC" + }, + "outputs": [], + "source": [ + "# © 2025 Elena Marziali — Code released under Apache 2.0 license.\n", + "# See LICENSE in the repository for details.\n", + "# Removal of this copyright is prohibited.\n", + "\n", + "# This module analyzes responses to detect bias, misinformation,\n", + "# non-neutral language, or potentially harmful content.\n", + "\n", + "# The system flags problematic content and suggests revisions.\n", + "def check_agent_autonomy(question: str, authorization_level: int):\n", + " if \"sub-goal\" in question.lower() and authorization_level < 2:\n", + " logging.warning(\"Sensitive content detected, but generation will not be blocked.\")\n", + " return \"Ethics: potentially sensitive content\"\n", + " return \"Ethics: normal content\"\n", + "\n", + "# Checks the agent's degree of autonomy\n", + "# Used to monitor whether the system is acting too independently or out of context\n", + "def assess_ethical_risk(content, domain=\"scientific\"):\n", + " \"\"\"\n", + " Evaluates ethical risks in AI-generated content using a scalar scoring system.\n", + " Assigns weights to detected patterns and requests LLM-based bias scoring.\n", + " Triggers revision if overall risk exceeds threshold.\n", + " \"\"\"\n", + " text_lower = content.lower()\n", + " risk_score = 0.0\n", + " risk_sources = []\n", + "\n", + " # Pattern-based scoring\n", + " critical_terms = [\"vaccine\", \"gender\", \"politics\", \"religion\", \"ethnicity\"]\n", + " if any(term in text_lower for term in critical_terms):\n", + " risk_score += 0.8\n", + " risk_sources.append(\"Critical topic\")\n", + "\n", + " if \"all men\" in text_lower or \"women are\" in text_lower:\n", + " risk_score += 0.5\n", + " 
risk_sources.append(\"Linguistic bias\")\n", + "\n", + "    if \"according to experts without citing sources\" in text_lower:\n", + "        risk_score += 0.5\n", + "        risk_sources.append(\"Misinformation\")\n", + "\n", + "    cultural_patterns = [\n", + "        \"asians are\", \"africans are\", \"europeans are\", \"americans are\",\n", + "        \"italians are\", \"japanese are\", \"muslims are\", \"jews are\"\n", + "    ]\n", + "    if any(p in text_lower for p in cultural_patterns):\n", + "        risk_score += 0.5\n", + "        risk_sources.append(\"Cultural generalization\")\n", + "\n", + "    stereotype_patterns = [\n", + "        \"psychologists are\", \"engineers are\", \"artists are\", \"scientists are\",\n", + "        \"philosophers are\", \"mathematicians are\"\n", + "    ]\n", + "    if any(p in text_lower for p in stereotype_patterns):\n", + "        risk_score += 0.5\n", + "        risk_sources.append(\"Disciplinary stereotype\")\n", + "\n", + "    normative_patterns = [\n", + "        \"should behave\", \"must act\", \"it's right that\", \"it's wrong to\",\n", + "        \"they ought to\", \"is unacceptable\", \"is appropriate\"\n", + "    ]\n", + "    if any(p in text_lower for p in normative_patterns):\n", + "        risk_score += 0.5\n", + "        risk_sources.append(\"Normative implication\")\n", + "\n", + "    # LLM-based scoring\n", + "    bias_prompt = f\"Rate the cultural or ethical bias of the following text on a scale from 0 to 1:\\n\\n{content}\\n\\nScore:\"\n", + "    llm_score_raw = llm.invoke(bias_prompt).content.strip()\n", + "\n", + "    try:\n", + "        llm_score = float(llm_score_raw)\n", + "        risk_score += llm_score\n", + "        risk_sources.append(f\"LLM bias score: {llm_score}\")\n", + "    except ValueError:\n", + "        llm_score = 0.0  # fallback if parsing fails\n", + "\n", + "    # Normalize and classify\n", + "    overall_risk = min(risk_score, 1.0)\n", + "\n", + "    if overall_risk <= 0.3:\n", + "        risk_level = \"Low\"\n", + "    elif overall_risk <= 0.7:\n", + "        risk_level = \"Medium\"\n", + "    else:\n", + "        risk_level = \"High\"\n", + "\n", + "    # Automatic reformulation if the risk is high\n", + "    revised_response = None\n", + "    revision_suggestion = None\n", + "    if overall_risk > 0.6:\n", + "        revision_suggestion = \"Ethical risk detected. Rewrite to reduce bias and improve neutrality.\"\n", + "        revision_prompt = f\"\"\"Rewrite this to reduce ethical bias and improve neutrality:\n", + "Original: {content}\n", + "Rewritten:\"\"\"\n", + "        revised_response = llm.invoke(revision_prompt).content.strip()\n", + "\n", + "    return {\n", + "        \"overall_risk\": round(overall_risk, 2),\n", + "        \"risk_level\": risk_level,\n", + "        \"risk_sources\": risk_sources,\n", + "        \"revision_suggestion\": revision_suggestion,\n", + "        \"revised_response\": revised_response\n", + "    }\n", + "\n", + "# Example prompt\n", + "prompt = \"Discuss the potential risks of generative artificial intelligence in the context of medicine.\"\n", + "\n", + "# Model invocation\n", + "output_ai = llm.invoke(prompt).content.strip()\n", + "\n", + "# Ethical evaluation of the response\n", + "ethical_check = assess_ethical_risk(output_ai)\n", + "\n", + "if ethical_check[\"revision_suggestion\"]:\n", + "    print(f\"Ethics: {ethical_check['revision_suggestion']}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "B4-vdyYwCPc0" }, "source": [ "### Interactive Scientific Chart Generator\n", + "\n", + "This cell enables the visualization of data and mathematical models extracted from problems described in natural language:\n", + "\n", + "- **Automatic extraction of numerical values** from text \n", + "- **Semantic analysis of the problem** to determine the model type:\n", + "  - exponential growth, motion, oscillation, Gaussian distribution, etc. 
\n", + "- **Interactive chart generation** using `Plotly`, viewable in real time \n", + "- **Image export** (`graph_output.png`) for educational or documentation purposes\n", + "\n", + "The system translates scientific descriptions into visual representations, simplifying intuition and reasoning around complex concepts.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 559 + }, + "id": "X55ZcR-Kil5j", + "outputId": "31f2f632-aa5c-4e9c-bc73-dbc76e8ed9ec" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "TRR-NOTIME (Theory of Relative Reality - Without Time) presents an alternative perspective on physics, where time is not a fundamental quantity but merely a consequence of matter-energy interactions. This theory redefines the concept of gravity, nuclear decay, and quantum processes, enabling a unified understanding of physical reality without the need for a time dimension. The document summarizes the theory, its mathematical structure, and potential experimental validation.
', 'url': 'No link'}, {'title': 'Vortices and vortex stripes in a dipolar Bose-Einstein condensate', 'authors': 'Klaus, Bland, Poli, Politi, Lamporesi, Casotti, Bisset, Mark, Ferlaino', 'abstract': 'Quantized vortices are a prototypical feature of superfluidity that have been observed in multiple quantum gas experiments. But the occurrence of vortices in dipolar quantum gases — a class of ultracold gases characterized by long-range anisotropic interactions — has not been reported yet. Here, we exploit the anisotropic nature of the dipole-dipole interaction of a dysprosium Bose-Einstein condensate to induce angular symmetry breaking in an otherwise cylindrically symmetric pancake-shaped trap. Tilting the magnetic field towards the radial plane deforms the cloud into an ellipsoid, which is then set into rotation. At stirring frequencies approaching the radial trap frequency, we observe the generation of dynamically unstable surface excitations, which cause angular momentum to be pumped into the system through vortices. Under continuous rotation, the vortices arrange into a stripe configuration along the field, in close agreement with numerical simulations.
', 'url': 'No link'}, {'title': 'QUANTUM MATHEMATICS AND PHYSICS: STUDYING MATHEMATICAL FOUNDATIONS AND APPLICATIONS', 'authors': 'Khudaikulova, Saida, Ruzikulov, Shahabbas', 'abstract': 'Quantum mechanics and quantum physics have revolutionized our understanding of the fundamental nature of reality. At the core of this revolution lies quantum mathematics, which provides the mathematical foundation for describing the motion of particles at microscopic scales. This article explores the fundamental mathematical structures of quantum mechanics, including Hilbert spaces, operators, and wave functions, as well as their applications in modeling physical systems. The research also examines how quantum physics contrasts with classical physics concepts and offers new insights into topics such as quantum entanglement, superposition, and quantum computing. By analyzing the mathematical foundations of quantum theories, the article aims to shed light on the intersection of mathematics and physics, offering a deeper understanding of how mathematical formulas help predict and explain quantum phenomena. Furthermore, it discusses the potential implications of quantum mathematics in emerging fields such as quantum computing and cryptography.
', 'url': 'No link'}, {'title': 'Application of Quantum Artificial Intelligence / Machine Learning to High Energy Physics Analyses at LHC Using Quantum Computer Simulators and Quantum Computer Hardware', 'authors': 'Sau Lan Wu', 'abstract': 'Machine learning enjoys widespread success in High Energy Physics (HEP) analyses at LHC. However the ambitious HL-LHC program will require much more computing resources in the next two decades. Quantum computing may offer speed-up for HEP physics analyses at HL-LHC, and can be a new computational paradigm for big data analyses in High Energy Physics. We have successfully employed three methods (1) Variational Quantum Classifier (VQC) method, (2) Quantum Support Vector Machine Kernel (QSVM-kernel) method and (3) Quantum Neural Network (QNN) method for two LHC flagship analyses: ttH (Higgs production in association with two top quarks) and H->mumu (Higgs decay to two muons, the second generation fermions). We shall address the progressive improvements in performance from method (1) to method (3). We will present our experiences and results of a study on LHC High Energy Physics data analyses with IBM Quantum Simulator and Quantum Hardware (using IBM Qiskit framework), Google Quantum Simulator (using Google Cirq framework), and Amazon Quantum Simulator (using Amazon Braket cloud service). The work is in the context of a Qubit platform (a gate-model quantum computer). Taking into account the present limitation of hardware access, different quantum machine learning methods are studied on simulators and the results are compared with classical machine learning methods (BDT, classical Support Vector Machine and classical Neural Network). Furthermore, we do apply quantum machine learning on IBM quantum hardware to compare performance between quantum simulator and quantum hardware. 
The work is performed by an international and interdisciplinary collaboration with the Department of Physics and Department of Computer Sciences of University of Wisconsin, CERN Quantum Technology Initiative, IBM Research Zurich, IBM T. J. Watson Research Center, Fermilab Quantum Institute, BNL Computational Science Initiative, State University of New York at Stony Brook, and Quantum Computing and AI Research of Amazon Web Services. This work pioneers a close collaboration of academic institutions with industrial corporations in the High Energy Physics analyses effort. Though the size of event samples in future HL-LHC physics and the limited number of qubits pose some challenges to the Quantum Machine learning studies for High Energy Physics, more advanced quantum computers with larger number of qubits, reduced noise and improved running time (as envisioned by IBM and Google) may outperform classical machine learning in both classification power and in speed. Although the era of efficient quantum computing may still be years away, we have made promising progress and obtained preliminary results in applying quantum machine learning to High Energy Physics. A PROOF OF PRINCIPLE. In this talk, challenges and opportunities of applying quantum Artificial Intelligence /Machine learning to High Energy Physics analyses will also be addressed.', 'url': 'No link'}, {'title': 'ABOUT EXPLORING THE SPECIFIC VALUES OF THE FRIDRIXS MODEL WITH MATHEMATICAL PACKAGES', 'authors': \"Hayitova Hilola Gʻafurovna, Ibodova Sevarabonu To'xtasinovna\", 'abstract': 'In some important problems of mathematical physics, hydrodynamics, solid state physics, quantum field theory, statistical physics and nonrealistic quantum mechanics, it is important to study the spectral properties of the Friedrich operator in several dynamic problems. in statistical physics, the lattice gas represents the binding state, and in quantum mechanics, the specific values represent the binding states of the energies. 
Furthermore, the three- and multi-particle systems that emerge in nonreiltivistic quantum mechanics are inextricably linked with the spectral properties of Hamiltonians.
', 'url': 'No link'}], '**Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning**: Abstract not available\\n\\n**Quantum Physics in One Dimension**: Abstract not available\\n\\n**Quantum physics in one dimension**: Abstract not available\\n\\n**Random-matrix theories in quantum physics: common concepts**: Abstract not available\\n\\n**Local quantum physics**: Abstract not available\\n\\n**PubMed Link**: Not available\\n\\n**PubMed Link**: Not available\\n\\n**PubMed Link**: Not available\\n\\n**PubMed Link**: Not available\\n\\n**PubMed Link**: Not available\\n\\n**TRR-NOTIME: Theory of Relative Reality - Without Time**:TRR-NOTIME (Theory of Relative Reality - Without Time) presents an alternative perspective on physics, where time is not a fundamental quantity but merely a consequence of matter-energy interactions. This theory redefines the concept of gravity, nuclear decay, and quantum processes, enabling a unified understanding of physical reality without the need for a time dimension. The document summarizes the theory, its mathematical structure, and potential experimental validation.
\\n\\n**Vortices and vortex stripes in a dipolar Bose-Einstein condensate**:Quantized vortices are a prototypical feature of superfluidity that have been observed in multiple quantum gas experiments. But the occurrence of vortices in dipolar quantum gases — a class of ultracold gases characterized by long-range anisotropic interactions — has not been reported yet. Here, we exploit the anisotropic nature of the dipole-dipole interaction of a dysprosium Bose-Einstein condensate to induce angular symmetry breaking in an otherwise cylindrically symmetric pancake-shaped trap. Tilting the magnetic field towards the radial plane deforms the cloud into an ellipsoid, which is then set into rotation. At stirring frequencies approaching the radial trap frequency, we observe the generation of dynamically unstable surface excitations, which cause angular momentum to be pumped into the system through vortices. Under continuous rotation, the vortices arrange into a stripe configuration along the field, in close agreement with numerical simulations.
\\n\\n**QUANTUM MATHEMATICS AND PHYSICS: STUDYING MATHEMATICAL FOUNDATIONS AND APPLICATIONS**:Quantum mechanics and quantum physics have revolutionized our understanding of the fundamental nature of reality. At the core of this revolution lies quantum mathematics, which provides the mathematical foundation for describing the motion of particles at microscopic scales. This article explores the fundamental mathematical structures of quantum mechanics, including Hilbert spaces, operators, and wave functions, as well as their applications in modeling physical systems. The research also examines how quantum physics contrasts with classical physics concepts and offers new insights into topics such as quantum entanglement, superposition, and quantum computing. By analyzing the mathematical foundations of quantum theories, the article aims to shed light on the intersection of mathematics and physics, offering a deeper understanding of how mathematical formulas help predict and explain quantum phenomena. Furthermore, it discusses the potential implications of quantum mathematics in emerging fields such as quantum computing and cryptography.
\\n\\n**Application of Quantum Artificial Intelligence / Machine Learning to High Energy Physics Analyses at LHC Using Quantum Computer Simulators and Quantum Computer Hardware**: Machine learning enjoys widespread success in High Energy Physics (HEP) analyses at LHC. However the ambitious HL-LHC program will require much more computing resources in the next two decades. Quantum computing may offer speed-up for HEP physics analyses at HL-LHC, and can be a new computational paradigm for big data analyses in High Energy Physics. We have successfully employed three methods (1) Variational Quantum Classifier (VQC) method, (2) Quantum Support Vector Machine Kernel (QSVM-kernel) method and (3) Quantum Neural Network (QNN) method for two LHC flagship analyses: ttH (Higgs production in association with two top quarks) and H->mumu (Higgs decay to two muons, the second generation fermions). We shall address the progressive improvements in performance from method (1) to method (3). We will present our experiences and results of a study on LHC High Energy Physics data analyses with IBM Quantum Simulator and Quantum Hardware (using IBM Qiskit framework), Google Quantum Simulator (using Google Cirq framework), and Amazon Quantum Simulator (using Amazon Braket cloud service). The work is in the context of a Qubit platform (a gate-model quantum computer). Taking into account the present limitation of hardware access, different quantum machine learning methods are studied on simulators and the results are compared with classical machine learning methods (BDT, classical Support Vector Machine and classical Neural Network). Furthermore, we do apply quantum machine learning on IBM quantum hardware to compare performance between quantum simulator and quantum hardware. 
The work is performed by an international and interdisciplinary collaboration with the Department of Physics and Department of Computer Sciences of University of Wisconsin, CERN Quantum Technology Initiative, IBM Research Zurich, IBM T. J. Watson Research Center, Fermilab Quantum Institute, BNL Computational Science Initiative, State University of New York at Stony Brook, and Quantum Computing and AI Research of Amazon Web Services. This work pioneers a close collaboration of academic institutions with industrial corporations in the High Energy Physics analyses effort. Though the size of event samples in future HL-LHC physics and the limited number of qubits pose some challenges to the Quantum Machine Learning studies for High Energy Physics, more advanced quantum computers with a larger number of qubits, reduced noise and improved running time (as envisioned by IBM and Google) may outperform classical machine learning in both classification power and in speed. Although the era of efficient quantum computing may still be years away, we have made promising progress and obtained preliminary results in applying quantum machine learning to High Energy Physics: a proof of principle. In this talk, challenges and opportunities of applying quantum Artificial Intelligence / Machine Learning to High Energy Physics analyses will also be addressed.\\n\\n**ABOUT EXPLORING THE SPECIFIC VALUES OF THE FRIDRIXS MODEL WITH MATHEMATICAL PACKAGES**:In some important problems of mathematical physics, hydrodynamics, solid state physics, quantum field theory, statistical physics and nonrelativistic quantum mechanics, it is important to study the spectral properties of the Friedrichs operator in several dynamical problems. In statistical physics, the lattice gas represents the binding state, and in quantum mechanics, the specific values represent the binding states of the energies. 
Furthermore, the three- and multi-particle systems that emerge in nonrelativistic quantum mechanics are inextricably linked with the spectral properties of Hamiltonians.
')\n", + "Enter the subject (e.g., physics, biology, etc.): medicine\n", + "Choose the level (basic/advanced/expert): expert\n", + "Enter the scientific problem or topic: Apparato Respiratorio\n", + "Do you want a chart for the explanation? (yes/no): yes\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "