gaojintao01 commited on
Commit
f8b5d42
·
1 Parent(s): 18b32f0

Add files using Git LFS

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. .devcontainer/README.md +73 -0
  2. .devcontainer/devcontainer.json +211 -0
  3. .editorconfig +17 -0
  4. .gitattributes +3 -0
  5. .github/FUNDING.yml +1 -0
  6. .github/ISSUE_TEMPLATE/01_bug.yml +42 -0
  7. .github/ISSUE_TEMPLATE/02_feature.yml +19 -0
  8. .github/ISSUE_TEMPLATE/03_documentation.yml +13 -0
  9. .github/ISSUE_TEMPLATE/config.yml +5 -0
  10. .github/workflows/build-and-push-image-semver.yaml +117 -0
  11. .github/workflows/build-and-push-image.yaml +138 -0
  12. .github/workflows/check-package-versions.yaml +37 -0
  13. .github/workflows/check-translations.yaml +37 -0
  14. .github/workflows/dev-build.yaml +124 -0
  15. .github/workflows/run-tests.yaml +77 -0
  16. .github/workflows/sponsors.yaml +44 -0
  17. .gitignore +12 -0
  18. .gitmodules +7 -0
  19. .hadolint.yaml +8 -0
  20. .nvmrc +1 -0
  21. .prettierignore +17 -0
  22. .prettierrc +38 -0
  23. .vscode/launch.json +74 -0
  24. .vscode/settings.json +63 -0
  25. .vscode/tasks.json +94 -0
  26. BARE_METAL.md +147 -0
  27. CONTRIBUTING.md +105 -0
  28. LICENSE +21 -0
  29. SECURITY.md +15 -0
  30. cloud-deployments/aws/cloudformation/DEPLOY.md +49 -0
  31. cloud-deployments/aws/cloudformation/aws_https_instructions.md +118 -0
  32. cloud-deployments/aws/cloudformation/cloudformation_create_anythingllm.json +234 -0
  33. cloud-deployments/digitalocean/terraform/DEPLOY.md +44 -0
  34. cloud-deployments/digitalocean/terraform/main.tf +52 -0
  35. cloud-deployments/digitalocean/terraform/outputs.tf +4 -0
  36. cloud-deployments/digitalocean/terraform/user_data.tp1 +22 -0
  37. cloud-deployments/gcp/deployment/DEPLOY.md +54 -0
  38. cloud-deployments/gcp/deployment/gcp_deploy_anything_llm.yaml +45 -0
  39. cloud-deployments/huggingface-spaces/Dockerfile +31 -0
  40. cloud-deployments/k8/manifest.yaml +214 -0
  41. collector/.env.example +1 -0
  42. collector/.gitignore +6 -0
  43. collector/.nvmrc +1 -0
  44. collector/__tests__/utils/extensions/YoutubeTranscript/YoutubeLoader/youtube-transcript.test.js +16 -0
  45. collector/extensions/index.js +207 -0
  46. collector/extensions/resync/index.js +153 -0
  47. collector/hotdir/__HOTDIR__.md +3 -0
  48. collector/index.js +188 -0
  49. collector/middleware/setDataSigner.js +41 -0
  50. collector/middleware/verifyIntegrity.js +26 -0
.devcontainer/README.md ADDED
@@ -0,0 +1,73 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # AnythingLLM Development Container Setup
2
+
3
+ Welcome to the AnythingLLM development container configuration, designed to create a seamless and feature-rich development environment for this project.
4
+
5
+ <center><h1><b>PLEASE READ THIS</b></h1></center>
6
+
7
+ ## Prerequisites
8
+
9
+ - [Docker](https://www.docker.com/get-started)
10
+ - [Visual Studio Code](https://code.visualstudio.com/)
11
+ - [Remote - Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) VS Code extension
12
+
13
+ ## Features
14
+
15
+ - **Base Image**: Built on `mcr.microsoft.com/devcontainers/javascript-node:1-18-bookworm`, thus Node.JS LTS v18.
16
+ - **Additional Tools**: Includes `hadolint`, and essential apt-packages such as `curl`, `gnupg`, and more.
17
+ - **Ports**: Configured to auto-forward ports `3000` (Frontend) and `3001` (Backend).
18
+ - **Environment Variables**: Sets `NODE_ENV` to `development` and `ESLINT_USE_FLAT_CONFIG` to `true`.
19
+ - **VS Code Extensions**: A suite of extensions such as `Prettier`, `Docker`, `ESLint`, and more are automatically installed. Please revise if you do not agree with any of these extensions. AI-powered extensions and time trackers are (for now) not included to avoid any privacy concerns, but you can install them later in your own environment.
20
+
21
+ ## Getting Started
22
+
23
+ 1. Using GitHub Codespaces. Just select to create a new workspace, and the devcontainer will be created for you.
24
+
25
+ 2. Using your Local VSCode (Release or Insiders). We suggest you first make a fork of the repo and then clone it to your local machine using VSCode tools. Then open the project folder in VSCode, which will prompt you to open the project in a devcontainer. Select yes, and the devcontainer will be created for you. If this does not happen, you can open the command palette and select "Remote-Containers: Reopen in Container".
26
+
27
+ ## On Creation:
28
+
29
+ When the container is built for the first time, it will automatically run `yarn setup` to ensure everything is in place for the Collector, Server and Frontend. This command is expected to be automatically re-run if there is a content change on next reboot.
30
+
31
+ ## Work in the Container:
32
+
33
+ Once the container is up, be patient. Some extensions may complain because dependencies are still being installed, and in the Extensions tab, some may ask you to "Reload" the project. Don't do that yet. First, wait until everything settles down for the first time. We suggest you create a new VSCode profile for this devcontainer, so any configuration and extensions you change won't affect your default profile.
34
+
35
+ Checklist:
36
+
37
+ - [ ] The usual messages asking you to start the Server and Frontend in different windows are now "hidden" in the building process of the devcontainer. Don't forget to do as suggested.
38
+ - [ ] Open a JavaScript file, for example "server/index.js" and check if `eslint` is working. It will complain that `'err' is defined but never used.`. This means it is working.
39
+ - [ ] Open a React File, for example, "frontend/src/main.jsx," and check if `eslint` complains about `Fast refresh only works when a file has exports. Move your component(s) to a separate file.`. Again, it means `eslint` is working. Now check at the status bar if the `Prettier` has a double checkmark :heavy_check_mark: (double). It means Prettier is working. You will see a nice extension `Formatting:`:heavy_check_mark: that can be used to disable the `Format on Save` feature temporarily.
40
+ - [ ] Check if, on the left pane, you have the NPM Scripts (this may be disabled; look at the "Explorer" tree-dots up-right). There will be scripts inside the `package.json` files. You will basically need to run the `dev:collector`, `dev:server` and the `dev:frontend` in this order. When the frontend finishes starting, a window browser will open **inside** the VSCode. Still, you can open it outside.
41
+
42
+ :warning: **Important for all developers** :warning:
43
+
44
+ - [ ] When you are using `NODE_ENV=development`, the server will not store the configurations you set, for security reasons. Please set the proper config in the file `.env.development`. The side-effect if you don't is that every time you restart the server, you will be sent to the "Onboarding" page again.
45
+
46
+ **Note when using GitHub Codespaces**
47
+
48
+ - [ ] When running the "Server" for the first time, it will automatically configure its port to be publicly accessible by default, as this is required for the front end to reach the server backend. To know more, read the content of the `.env` file in the frontend folder about this, and if any issues occur, make sure the port "Visibility" of the "Server" is manually set to "Public" if needed. Again, this is only needed for developing on GitHub Codespaces.
49
+
50
+
51
+ **For the Collector:**
52
+
53
+ - [x] In the past, the Collector dwelled within the Python domain, but now it has journeyed to the splendid realm of Node.JS. Consequently, the configuration complexities of bygone versions are no longer a concern.
54
+
55
+ ### Now it is ready to start
56
+
57
+ In the status bar you will see three shortcuts named `Collector`, `Server` and `Frontend`. Just click and wait in that order (don't forget to set the Server port 3001 to Public if you are using GH Codespaces **_before_** starting the Frontend).
58
+
59
+ Now you can enjoy your time developing instead of reconfiguring everything.
60
+
61
+ ## Debugging with the devcontainers
62
+
63
+ ### For debugging the collector, server and frontend
64
+
65
+ First, make sure the built-in extension (ms-vscode.js-debug) is active (I don't know why it would not be, but just in case). If you want, you can install the nightly version (ms-vscode.js-debug-nightly)
66
+
67
+ Then, in the "Run and Debug" tab (Ctrl+shift+D), you can select on the menu:
68
+
69
+ - Collector debug. This will start the collector in debug mode and attach the debugger. Works very well.
70
+ - Server debug. This will start the server in debug mode and attach the debugger. Works very well.
71
+ - Frontend debug. This will start the frontend in debug mode and attach the debugger. I am still struggling with this one. I don't know if VSCode can handle the .jsx files seamlessly as the pure .js on the server. Maybe there is a need for a particular configuration for Vite or React. Anyway, it starts. Another two configurations launch Chrome and Edge, and I think we could add breakpoints on .jsx files somehow. The best scenario would be always to use the embedded browser. WIP.
72
+
73
+ Please leave comments on the Issues tab or the [![](https://img.shields.io/discord/1114740394715004990?logo=Discord&logoColor=white&label=Discord&labelColor=%235568ee&color=%2355A2DD&link=https%3A%2F%2Fdiscord.gg%2F6UyHPeGZAC)](https://discord.gg/6UyHPeGZAC)
.devcontainer/devcontainer.json ADDED
@@ -0,0 +1,211 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ // For format details, see https://aka.ms/devcontainer.json. For config options, see the
2
+ // README at: https://github.com/devcontainers/templates/tree/main/src/javascript-node
3
+ {
4
+ "name": "Node.js",
5
+ // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
6
+ // "build": {
7
+ // "args": {
8
+ // "ARG_UID": "1000",
9
+ // "ARG_GID": "1000"
10
+ // },
11
+ // "dockerfile": "Dockerfile"
12
+ // },
13
+ // "containerUser": "anythingllm",
14
+ // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
15
+ "image": "mcr.microsoft.com/devcontainers/javascript-node:1-18-bookworm",
16
+ // Features to add to the dev container. More info: https://containers.dev/features.
17
+ "features": {
18
+ // Docker very useful linter
19
+ "ghcr.io/dhoeric/features/hadolint:1": {
20
+ "version": "latest"
21
+ },
22
+ // Terraform support
23
+ "ghcr.io/devcontainers/features/terraform:1": {},
24
+ // Just a wrap to install needed packages
25
+ "ghcr.io/devcontainers-contrib/features/apt-packages:1": {
26
+ // Dependencies copied from ../docker/Dockerfile plus some dev stuff
27
+ "packages": [
28
+ "build-essential",
29
+ "ca-certificates",
30
+ "curl",
31
+ "ffmpeg",
32
+ "fonts-liberation",
33
+ "git",
34
+ "gnupg",
35
+ "htop",
36
+ "less",
37
+ "libappindicator1",
38
+ "libasound2",
39
+ "libatk-bridge2.0-0",
40
+ "libatk1.0-0",
41
+ "libc6",
42
+ "libcairo2",
43
+ "libcups2",
44
+ "libdbus-1-3",
45
+ "libexpat1",
46
+ "libfontconfig1",
47
+ "libgbm1",
48
+ "libgcc1",
49
+ "libgfortran5",
50
+ "libglib2.0-0",
51
+ "libgtk-3-0",
52
+ "libnspr4",
53
+ "libnss3",
54
+ "libpango-1.0-0",
55
+ "libpangocairo-1.0-0",
56
+ "libstdc++6",
57
+ "libx11-6",
58
+ "libx11-xcb1",
59
+ "libxcb1",
60
+ "libxcomposite1",
61
+ "libxcursor1",
62
+ "libxdamage1",
63
+ "libxext6",
64
+ "libxfixes3",
65
+ "libxi6",
66
+ "libxrandr2",
67
+ "libxrender1",
68
+ "libxss1",
69
+ "libxtst6",
70
+ "locales",
71
+ "lsb-release",
72
+ "procps",
73
+ "tzdata",
74
+ "wget",
75
+ "xdg-utils"
76
+ ]
77
+ }
78
+ },
79
+ "updateContentCommand": "cd server && yarn && cd ../collector && PUPPETEER_DOWNLOAD_BASE_URL=https://storage.googleapis.com/chrome-for-testing-public yarn && cd ../frontend && yarn && cd .. && yarn setup:envs && yarn prisma:setup && echo \"Please run yarn dev:server, yarn dev:collector, and yarn dev:frontend in separate terminal tabs.\"",
80
+ // Use 'postCreateCommand' to run commands after the container is created.
81
+ // This configures VITE for github codespaces and installs gh cli
82
+ "postCreateCommand": "if [ \"${CODESPACES}\" = \"true\" ]; then echo 'VITE_API_BASE=\"https://$CODESPACE_NAME-3001.$GITHUB_CODESPACES_PORT_FORWARDING_DOMAIN/api\"' > ./frontend/.env && (type -p wget >/dev/null || (sudo apt update && sudo apt-get install wget -y)) && sudo mkdir -p -m 755 /etc/apt/keyrings && wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null && sudo chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg && echo \"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main\" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null && sudo apt update && sudo apt install gh -y; fi",
83
+ "portsAttributes": {
84
+ "3001": {
85
+ "label": "Backend",
86
+ "onAutoForward": "notify"
87
+ },
88
+ "3000": {
89
+ "label": "Frontend",
90
+ "onAutoForward": "openPreview"
91
+ }
92
+ },
93
+ "capAdd": [
94
+ "SYS_ADMIN" // needed for puppeteer using headless chrome in sandbox
95
+ ],
96
+ "remoteEnv": {
97
+ "NODE_ENV": "development",
98
+ "ESLINT_USE_FLAT_CONFIG": "true",
99
+ "ANYTHING_LLM_RUNTIME": "docker"
100
+ },
101
+ // "initializeCommand": "echo Initialize....",
102
+ "shutdownAction": "stopContainer",
103
+ // Configure tool-specific properties.
104
+ "customizations": {
105
+ "codespaces": {
106
+ "openFiles": [
107
+ "README.md",
108
+ ".devcontainer/README.md"
109
+ ]
110
+ },
111
+ "vscode": {
112
+ "openFiles": [
113
+ "README.md",
114
+ ".devcontainer/README.md"
115
+ ],
116
+ "extensions": [
117
+ "bierner.github-markdown-preview",
118
+ "bradlc.vscode-tailwindcss",
119
+ "dbaeumer.vscode-eslint",
120
+ "editorconfig.editorconfig",
121
+ "esbenp.prettier-vscode",
122
+ "exiasr.hadolint",
123
+ "flowtype.flow-for-vscode",
124
+ "gamunu.vscode-yarn",
125
+ "hashicorp.terraform",
126
+ "mariusschulz.yarn-lock-syntax",
127
+ "ms-azuretools.vscode-docker",
128
+ "streetsidesoftware.code-spell-checker",
129
+ "actboy168.tasks",
130
+ "tombonnike.vscode-status-bar-format-toggle",
131
+ "ms-vscode.js-debug"
132
+ ],
133
+ "settings": {
134
+ "[css]": {
135
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
136
+ },
137
+ "[dockercompose]": {
138
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
139
+ },
140
+ "[dockerfile]": {
141
+ "editor.defaultFormatter": "ms-azuretools.vscode-docker"
142
+ },
143
+ "[html]": {
144
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
145
+ },
146
+ "[javascript]": {
147
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
148
+ },
149
+ "[javascriptreact]": {
150
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
151
+ },
152
+ "[json]": {
153
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
154
+ },
155
+ "[jsonc]": {
156
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
157
+ },
158
+ "[markdown]": {
159
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
160
+ },
161
+ "[postcss]": {
162
+ "editor.defaultFormatter": "esbenp.prettier-vscode"
163
+ },
164
+ "[toml]": {
165
+ "editor.defaultFormatter": "tamasfe.even-better-toml"
166
+ },
167
+ "eslint.debug": true,
168
+ "eslint.enable": true,
169
+ "eslint.experimental.useFlatConfig": true,
170
+ "eslint.run": "onSave",
171
+ "files.associations": {
172
+ ".*ignore": "ignore",
173
+ ".editorconfig": "editorconfig",
174
+ ".env*": "properties",
175
+ ".flowconfig": "ini",
176
+ ".prettierrc": "json",
177
+ "*.css": "tailwindcss",
178
+ "*.md": "markdown",
179
+ "*.sh": "shellscript",
180
+ "docker-compose.*": "dockercompose",
181
+ "Dockerfile*": "dockerfile",
182
+ "yarn.lock": "yarnlock"
183
+ },
184
+ "javascript.format.enable": false,
185
+ "javascript.inlayHints.enumMemberValues.enabled": true,
186
+ "javascript.inlayHints.functionLikeReturnTypes.enabled": true,
187
+ "javascript.inlayHints.parameterTypes.enabled": true,
188
+ "javascript.inlayHints.variableTypes.enabled": true,
189
+ "js/ts.implicitProjectConfig.module": "CommonJS",
190
+ "json.format.enable": false,
191
+ "json.schemaDownload.enable": true,
192
+ "npm.autoDetect": "on",
193
+ "npm.packageManager": "yarn",
194
+ "prettier.useEditorConfig": false,
195
+ "tailwindCSS.files.exclude": [
196
+ "**/.git/**",
197
+ "**/node_modules/**",
198
+ "**/.hg/**",
199
+ "**/.svn/**",
200
+ "**/dist/**"
201
+ ],
202
+ "typescript.validate.enable": false,
203
+ "workbench.editorAssociations": {
204
+ "*.md": "vscode.markdown.preview.editor"
205
+ }
206
+ }
207
+ }
208
+ }
209
+ // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
210
+ // "remoteUser": "root"
211
+ }
.editorconfig ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # EditorConfig is awesome: https://EditorConfig.org
2
+
3
+ # top-most EditorConfig file
4
+ root = true
5
+
6
+ [*]
7
+ # Non-configurable Prettier behaviors
8
+ charset = utf-8
9
+ insert_final_newline = true
10
+ trim_trailing_whitespace = true
11
+
12
+ # Configurable Prettier behaviors
13
+ # (change these if your Prettier config differs)
14
+ end_of_line = lf
15
+ indent_style = space
16
+ indent_size = 2
17
+ max_line_length = 80
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.ttf filter=lfs diff=lfs merge=lfs -text
37
+ *.png filter=lfs diff=lfs merge=lfs -text
38
+ *.webm filter=lfs diff=lfs merge=lfs -text
.github/FUNDING.yml ADDED
@@ -0,0 +1 @@
 
 
1
+ github: Mintplex-Labs
.github/ISSUE_TEMPLATE/01_bug.yml ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: 🐛 Bug Report
2
+ description: File a bug report for AnythingLLM
3
+ title: "[BUG]: "
4
+ labels: [possible bug]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: |
9
+ Use this template to file a bug report for AnythingLLM. Please be as descriptive as possible to allow everyone to replicate and solve your issue.
10
+ - type: dropdown
11
+ id: runtime
12
+ attributes:
13
+ label: How are you running AnythingLLM?
14
+ description: AnythingLLM can be run in many environments, pick the one that best represents where you encounter the bug.
15
+ options:
16
+ - Docker (local)
17
+ - Docker (remote machine)
18
+ - Local development
19
+ - AnythingLLM desktop app
20
+ - All versions
21
+ - Not listed
22
+ default: 0
23
+ validations:
24
+ required: true
25
+
26
+ - type: textarea
27
+ id: what-happened
28
+ attributes:
29
+ label: What happened?
30
+ description: Also tell us, what did you expect to happen?
31
+ validations:
32
+ required: true
33
+
34
+ - type: textarea
35
+ id: reproduction
36
+ attributes:
37
+ label: Are there known steps to reproduce?
38
+ description: |
39
+ Let us know how to reproduce the bug and we may be able to fix it more
40
+ quickly. This is not required, but it is helpful.
41
+ validations:
42
+ required: false
.github/ISSUE_TEMPLATE/02_feature.yml ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: ✨ New Feature suggestion
2
+ description: Suggest a new feature for AnythingLLM!
3
+ title: "[FEAT]: "
4
+ labels: [enhancement, feature request]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: |
9
+ Share a new idea for a feature or improvement. Be sure to search existing
10
+ issues first to avoid duplicates.
11
+
12
+ - type: textarea
13
+ id: description
14
+ attributes:
15
+ label: What would you like to see?
16
+ description: |
17
+ Describe the feature and why it would be useful to your use-case as well as others.
18
+ validations:
19
+ required: true
.github/ISSUE_TEMPLATE/03_documentation.yml ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: 📚 Documentation improvement
2
+ title: "[DOCS]: "
3
+ description: Report an issue or problem with the documentation.
4
+ labels: [documentation]
5
+
6
+ body:
7
+ - type: textarea
8
+ id: description
9
+ attributes:
10
+ label: Description
11
+ description: Describe the issue with the documentation that is giving you trouble or causing confusion.
12
+ validations:
13
+ required: true
.github/ISSUE_TEMPLATE/config.yml ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ blank_issues_enabled: true
2
+ contact_links:
3
+ - name: 🧑‍🤝‍🧑 Community Discord
4
+ url: https://discord.gg/6UyHPeGZAC
5
+ about: Interact with the Mintplex Labs community here by asking for help, discussing and more!
.github/workflows/build-and-push-image-semver.yaml ADDED
@@ -0,0 +1,117 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: Publish AnythingLLM Docker image on Release (amd64 & arm64)
2
+
3
+ concurrency:
4
+ group: build-${{ github.ref }}
5
+ cancel-in-progress: true
6
+
7
+ on:
8
+ release:
9
+ types: [published]
10
+
11
+ jobs:
12
+ push_multi_platform_to_registries:
13
+ name: Push Docker multi-platform image to multiple registries
14
+ runs-on: ubuntu-latest
15
+ permissions:
16
+ packages: write
17
+ contents: read
18
+ steps:
19
+ - name: Check out the repo
20
+ uses: actions/checkout@v4
21
+
22
+ - name: Check if DockerHub build needed
23
+ shell: bash
24
+ run: |
25
+ # Check if the secret for USERNAME is set (don't even check for the password)
26
+ if [[ -z "${{ secrets.DOCKER_USERNAME }}" ]]; then
27
+ echo "DockerHub build not needed"
28
+ echo "enabled=false" >> $GITHUB_OUTPUT
29
+ else
30
+ echo "DockerHub build needed"
31
+ echo "enabled=true" >> $GITHUB_OUTPUT
32
+ fi
33
+ id: dockerhub
34
+
35
+ - name: Set up QEMU
36
+ uses: docker/setup-qemu-action@v3
37
+
38
+ - name: Set up Docker Buildx
39
+ uses: docker/setup-buildx-action@v3
40
+ with:
41
+ version: v0.22.0
42
+
43
+ - name: Log in to Docker Hub
44
+ uses: docker/login-action@f4ef78c080cd8ba55a85445d5b36e214a81df20a
45
+ # Only login to the Docker Hub if the repo is mintplex/anythingllm, to allow for forks to build on GHCR
46
+ if: steps.dockerhub.outputs.enabled == 'true'
47
+ with:
48
+ username: ${{ secrets.DOCKER_USERNAME }}
49
+ password: ${{ secrets.DOCKER_PASSWORD }}
50
+
51
+ - name: Log in to the Container registry
52
+ uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
53
+ with:
54
+ registry: ghcr.io
55
+ username: ${{ github.actor }}
56
+ password: ${{ secrets.GITHUB_TOKEN }}
57
+
58
+ - name: Extract metadata (tags, labels) for Docker
59
+ id: meta
60
+ uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
61
+ with:
62
+ images: |
63
+ ${{ steps.dockerhub.outputs.enabled == 'true' && 'mintplexlabs/anythingllm' || '' }}
64
+ ghcr.io/${{ github.repository }}
65
+ tags: |
66
+ type=semver,pattern={{version}}
67
+ type=semver,pattern={{major}}.{{minor}}
68
+
69
+ - name: Build and push multi-platform Docker image
70
+ uses: docker/build-push-action@v6
71
+ with:
72
+ context: .
73
+ file: ./docker/Dockerfile
74
+ push: true
75
+ sbom: true
76
+ provenance: mode=max
77
+ platforms: linux/amd64,linux/arm64
78
+ tags: ${{ steps.meta.outputs.tags }}
79
+ labels: ${{ steps.meta.outputs.labels }}
80
+ cache-from: type=gha
81
+ cache-to: type=gha,mode=max
82
+
83
+ # For Docker scout there are some intermediary reported CVEs which exists outside
84
+ # of execution content or are unreachable by an attacker but exist in image.
85
+ # We create VEX files for these so they don't show in scout summary.
86
+ - name: Collect known and verified CVE exceptions
87
+ id: cve-list
88
+ run: |
89
+ # Collect CVEs from filenames in vex folder
90
+ CVE_NAMES=""
91
+ for file in ./docker/vex/*.vex.json; do
92
+ [ -e "$file" ] || continue
93
+ filename=$(basename "$file")
94
+ stripped_filename=${filename%.vex.json}
95
+ CVE_NAMES+=" $stripped_filename"
96
+ done
97
+ echo "CVE_EXCEPTIONS=$CVE_NAMES" >> $GITHUB_OUTPUT
98
+ shell: bash
99
+
100
+ # About VEX attestations https://docs.docker.com/scout/explore/exceptions/
101
+ # Justifications https://github.com/openvex/spec/blob/main/OPENVEX-SPEC.md#status-justifications
102
+ - name: Add VEX attestations
103
+ env:
104
+ CVE_EXCEPTIONS: ${{ steps.cve-list.outputs.CVE_EXCEPTIONS }}
105
+ run: |
106
+ echo $CVE_EXCEPTIONS
107
+ curl -sSfL https://raw.githubusercontent.com/docker/scout-cli/main/install.sh | sh -s --
108
+ for cve in $CVE_EXCEPTIONS; do
109
+ for tag in "${{ join(fromJSON(steps.meta.outputs.json).tags, ' ') }}"; do
110
+ echo "Attaching VEX exception $cve to $tag"
111
+ docker scout attestation add \
112
+ --file "./docker/vex/$cve.vex.json" \
113
+ --predicate-type https://openvex.dev/ns/v0.2.0 \
114
+ $tag
115
+ done
116
+ done
117
+ shell: bash
.github/workflows/build-and-push-image.yaml ADDED
@@ -0,0 +1,138 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # This GitHub action is for publishing of the primary image for AnythingLLM
2
+ # It will publish a linux/amd64 and linux/arm64 image at the same time
3
+ # This file should ONLY BY USED FOR `master` BRANCH.
4
+ # TODO: GitHub now has an ubuntu-24.04-arm64 runner, but we still need
5
+ # to use QEMU to build the arm64 image because Chromium is not available for Linux arm64
6
+ # so builds will still fail, or fail much more often. Its inconsistent and frustrating.
7
+ name: Publish AnythingLLM Primary Docker image (amd64/arm64)
8
+
9
+ concurrency:
10
+ group: build-${{ github.ref }}
11
+ cancel-in-progress: true
12
+
13
+ on:
14
+ push:
15
+ branches: ['master'] # master branch only. Do not modify.
16
+ paths-ignore:
17
+ - '**.md'
18
+ - '.gitmodules'
19
+ - 'cloud-deployments/**/*'
20
+ - 'images/**/*'
21
+ - '.vscode/**/*'
22
+ - '**/.env.example'
23
+ - '.github/ISSUE_TEMPLATE/**/*'
24
+ - '.devcontainer/**/*'
25
+ - 'embed/**/*' # Embed is submodule
26
+ - 'browser-extension/**/*' # Chrome extension is submodule
27
+ - 'server/utils/agents/aibitat/example/**/*' # Do not push new image for local dev testing of new aibitat images.
28
+ - 'extras/**/*' # Extra is just for news and other local content.
29
+
30
+ jobs:
31
+ push_multi_platform_to_registries:
32
+ name: Push Docker multi-platform image to multiple registries
33
+ runs-on: ubuntu-latest
34
+ permissions:
35
+ packages: write
36
+ contents: read
37
+ steps:
38
+ - name: Check out the repo
39
+ uses: actions/checkout@v4
40
+
41
+ - name: Check if DockerHub build needed
42
+ shell: bash
43
+ run: |
44
+ # Check if the secret for USERNAME is set (don't even check for the password)
45
+ if [[ -z "${{ secrets.DOCKER_USERNAME }}" ]]; then
46
+ echo "DockerHub build not needed"
47
+ echo "enabled=false" >> $GITHUB_OUTPUT
48
+ else
49
+ echo "DockerHub build needed"
50
+ echo "enabled=true" >> $GITHUB_OUTPUT
51
+ fi
52
+ id: dockerhub
53
+
54
+ - name: Set up QEMU
55
+ uses: docker/setup-qemu-action@v3
56
+
57
+ - name: Set up Docker Buildx
58
+ uses: docker/setup-buildx-action@v3
59
+ with:
60
+ version: v0.22.0
61
+
62
+ - name: Log in to Docker Hub
63
+ uses: docker/login-action@f4ef78c080cd8ba55a85445d5b36e214a81df20a
64
+ # Only login to the Docker Hub if the repo is mintplex/anythingllm, to allow for forks to build on GHCR
65
+ if: steps.dockerhub.outputs.enabled == 'true'
66
+ with:
67
+ username: ${{ secrets.DOCKER_USERNAME }}
68
+ password: ${{ secrets.DOCKER_PASSWORD }}
69
+
70
+ - name: Log in to the Container registry
71
+ uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
72
+ with:
73
+ registry: ghcr.io
74
+ username: ${{ github.actor }}
75
+ password: ${{ secrets.GITHUB_TOKEN }}
76
+
77
+ - name: Extract metadata (tags, labels) for Docker
78
+ id: meta
79
+ uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
80
+ with:
81
+ images: |
82
+ ${{ steps.dockerhub.outputs.enabled == 'true' && 'mintplexlabs/anythingllm' || '' }}
83
+ ghcr.io/${{ github.repository }}
84
+ tags: |
85
+ type=raw,value=latest,enable={{is_default_branch}}
86
+ type=ref,event=branch
87
+ type=ref,event=tag
88
+ type=ref,event=pr
89
+
90
+ - name: Build and push multi-platform Docker image
91
+ uses: docker/build-push-action@v6
92
+ with:
93
+ context: .
94
+ file: ./docker/Dockerfile
95
+ push: true
96
+ sbom: true
97
+ provenance: mode=max
98
+ platforms: linux/amd64,linux/arm64
99
+ tags: ${{ steps.meta.outputs.tags }}
100
+ labels: ${{ steps.meta.outputs.labels }}
101
+ cache-from: type=gha
102
+ cache-to: type=gha,mode=max
103
+
104
+ # For Docker scout there are some intermediary reported CVEs which exists outside
105
+ # of execution content or are unreachable by an attacker but exist in image.
106
+ # We create VEX files for these so they don't show in scout summary.
107
+ - name: Collect known and verified CVE exceptions
108
+ id: cve-list
109
+ run: |
110
+ # Collect CVEs from filenames in vex folder
111
+ CVE_NAMES=""
112
+ for file in ./docker/vex/*.vex.json; do
113
+ [ -e "$file" ] || continue
114
+ filename=$(basename "$file")
115
+ stripped_filename=${filename%.vex.json}
116
+ CVE_NAMES+=" $stripped_filename"
117
+ done
118
+ echo "CVE_EXCEPTIONS=$CVE_NAMES" >> $GITHUB_OUTPUT
119
+ shell: bash
120
+
121
+ # About VEX attestations https://docs.docker.com/scout/explore/exceptions/
122
+ # Justifications https://github.com/openvex/spec/blob/main/OPENVEX-SPEC.md#status-justifications
123
+ - name: Add VEX attestations
124
+ env:
125
+ CVE_EXCEPTIONS: ${{ steps.cve-list.outputs.CVE_EXCEPTIONS }}
126
+ run: |
127
+ echo $CVE_EXCEPTIONS
128
+ curl -sSfL https://raw.githubusercontent.com/docker/scout-cli/main/install.sh | sh -s --
129
+ for cve in $CVE_EXCEPTIONS; do
130
+ for tag in "${{ join(fromJSON(steps.meta.outputs.json).tags, ' ') }}"; do
131
+ echo "Attaching VEX exception $cve to $tag"
132
+ docker scout attestation add \
133
+ --file "./docker/vex/$cve.vex.json" \
134
+ --predicate-type https://openvex.dev/ns/v0.2.0 \
135
+ $tag
136
+ done
137
+ done
138
+ shell: bash
.github/workflows/check-package-versions.yaml ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # This GitHub action is for checking the versions of the packages in the project.
2
+ # Any package that is present in both the `server` and `collector` package.json file
3
+ # is checked to ensure that they are the same version.
4
+ name: Check package versions
5
+
6
+ concurrency:
7
+ group: build-${{ github.ref }}
8
+ cancel-in-progress: true
9
+
10
+ on:
11
+ pull_request:
12
+ types: [opened, synchronize, reopened]
13
+ paths:
14
+ - "server/package.json"
15
+ - "collector/package.json"
16
+
17
+ jobs:
18
+ run-script:
19
+ runs-on: ubuntu-latest
20
+
21
+ steps:
22
+ - name: Checkout repository
23
+ uses: actions/checkout@v2
24
+
25
+ - name: Set up Node.js
26
+ uses: actions/setup-node@v3
27
+ with:
28
+ node-version: '18'
29
+
30
+ - name: Run verifyPackageVersions.mjs script
31
+ run: |
32
+ cd extras/scripts
33
+ node verifyPackageVersions.mjs
34
+
35
+ - name: Fail job on error
36
+ if: failure()
37
+ run: exit 1
.github/workflows/check-translations.yaml ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # This GitHub action is for validation of all languages which translations are offered for
2
+ # in the locales folder in `frontend/src`. All languages are compared to the EN translation
3
+ # schema since that is the fallback language setting. This workflow will run on all PRs that
4
+ # modify any files in the translation directory
5
+ name: Verify translations files
6
+
7
+ concurrency:
8
+ group: build-${{ github.ref }}
9
+ cancel-in-progress: true
10
+
11
+ on:
12
+ pull_request:
13
+ types: [opened, synchronize, reopened]
14
+ paths:
15
+ - "frontend/src/locales/**.js"
16
+
17
+ jobs:
18
+ run-script:
19
+ runs-on: ubuntu-latest
20
+
21
+ steps:
22
+ - name: Checkout repository
23
+ uses: actions/checkout@v2
24
+
25
+ - name: Set up Node.js
26
+ uses: actions/setup-node@v3
27
+ with:
28
+ node-version: '18'
29
+
30
+ - name: Run verifyTranslations.mjs script
31
+ run: |
32
+ cd frontend/src/locales
33
+ node verifyTranslations.mjs
34
+
35
+ - name: Fail job on error
36
+ if: failure()
37
+ run: exit 1
.github/workflows/dev-build.yaml ADDED
@@ -0,0 +1,124 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: AnythingLLM Development Docker image (amd64)
2
+
3
+ concurrency:
4
+ group: build-${{ github.ref }}
5
+ cancel-in-progress: true
6
+
7
+ on:
8
+ push:
9
+ branches: ['3999-chromium-flags'] # put your current branch to create a build. Core team only.
10
+ paths-ignore:
11
+ - '**.md'
12
+ - 'cloud-deployments/*'
13
+ - 'images/**/*'
14
+ - '.vscode/**/*'
15
+ - '**/.env.example'
16
+ - '.github/ISSUE_TEMPLATE/**/*'
17
+ - '.devcontainer/**/*'
18
+ - 'embed/**/*' # Embed should be published to frontend (yarn build:publish) if any changes are introduced
19
+ - 'browser-extension/**/*' # Chrome extension is submodule
20
+ - 'server/utils/agents/aibitat/example/**/*' # Do not push new image for local dev testing of new aibitat images.
21
+ - 'extras/**/*' # Extra is just for news and other local content.
22
+
23
+ jobs:
24
+ push_multi_platform_to_registries:
25
+ name: Push Docker multi-platform image to multiple registries
26
+ runs-on: ubuntu-latest
27
+ permissions:
28
+ packages: write
29
+ contents: read
30
+ steps:
31
+ - name: Check out the repo
32
+ uses: actions/checkout@v4
33
+
34
+ - name: Check if DockerHub build needed
35
+ shell: bash
36
+ run: |
37
+ # Check if the secret for USERNAME is set (don't even check for the password)
38
+ if [[ -z "${{ secrets.DOCKER_USERNAME }}" ]]; then
39
+ echo "DockerHub build not needed"
40
+ echo "enabled=false" >> $GITHUB_OUTPUT
41
+ else
42
+ echo "DockerHub build needed"
43
+ echo "enabled=true" >> $GITHUB_OUTPUT
44
+ fi
45
+ id: dockerhub
46
+
47
+ # Uncomment this + add linux/arm64 to platforms if you want to build for arm64 as well
48
+ # - name: Set up QEMU
49
+ # uses: docker/setup-qemu-action@v3
50
+
51
+ - name: Set up Docker Buildx
52
+ uses: docker/setup-buildx-action@v3
53
+ with:
54
+ version: v0.22.0
55
+
56
+ - name: Log in to Docker Hub
57
+ uses: docker/login-action@f4ef78c080cd8ba55a85445d5b36e214a81df20a
58
+ # Only login to the Docker Hub if the repo is mintplex/anythingllm, to allow for forks to build on GHCR
59
+ if: steps.dockerhub.outputs.enabled == 'true'
60
+ with:
61
+ username: ${{ secrets.DOCKER_USERNAME }}
62
+ password: ${{ secrets.DOCKER_PASSWORD }}
63
+
64
+ - name: Extract metadata (tags, labels) for Docker
65
+ id: meta
66
+ uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
67
+ with:
68
+ images: |
69
+ ${{ steps.dockerhub.outputs.enabled == 'true' && 'mintplexlabs/anythingllm' || '' }}
70
+ tags: |
71
+ type=raw,value=dev
72
+
73
+ - name: Build and push multi-platform Docker image
74
+ uses: docker/build-push-action@v6
75
+ with:
76
+ context: .
77
+ file: ./docker/Dockerfile
78
+ push: true
79
+ sbom: true
80
+ provenance: mode=max
81
+ platforms: linux/amd64
82
+ # platforms: linux/amd64,linux/arm64
83
+ tags: ${{ steps.meta.outputs.tags }}
84
+ labels: ${{ steps.meta.outputs.labels }}
85
+ cache-from: type=gha
86
+ cache-to: type=gha,mode=max
87
+
88
+ # For Docker scout there are some intermediary reported CVEs which exists outside
89
+ # of execution content or are unreachable by an attacker but exist in image.
90
+ # We create VEX files for these so they don't show in scout summary.
91
+ - name: Collect known and verified CVE exceptions
92
+ id: cve-list
93
+ run: |
94
+ # Collect CVEs from filenames in vex folder
95
+ CVE_NAMES=""
96
+ for file in ./docker/vex/*.vex.json; do
97
+ [ -e "$file" ] || continue
98
+ filename=$(basename "$file")
99
+ stripped_filename=${filename%.vex.json}
100
+ CVE_NAMES+=" $stripped_filename"
101
+ done
102
+ echo "CVE_EXCEPTIONS=$CVE_NAMES" >> $GITHUB_OUTPUT
103
+ shell: bash
104
+
105
+ # About VEX attestations https://docs.docker.com/scout/explore/exceptions/
106
+ # Justifications https://github.com/openvex/spec/blob/main/OPENVEX-SPEC.md#status-justifications
107
+ # Fixed to use v1.15.1 of scout-cli as v1.16.0 install script is broken
108
+ # https://github.com/docker/scout-cli
109
+ - name: Add VEX attestations
110
+ env:
111
+ CVE_EXCEPTIONS: ${{ steps.cve-list.outputs.CVE_EXCEPTIONS }}
112
+ run: |
113
+ echo $CVE_EXCEPTIONS
114
+ curl -sSfL https://raw.githubusercontent.com/docker/scout-cli/main/install.sh | sh -s --
115
+ for cve in $CVE_EXCEPTIONS; do
116
+ for tag in "${{ join(fromJSON(steps.meta.outputs.json).tags, ' ') }}"; do
117
+ echo "Attaching VEX exception $cve to $tag"
118
+ docker scout attestation add \
119
+ --file "./docker/vex/$cve.vex.json" \
120
+ --predicate-type https://openvex.dev/ns/v0.2.0 \
121
+ $tag
122
+ done
123
+ done
124
+ shell: bash
.github/workflows/run-tests.yaml ADDED
@@ -0,0 +1,77 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: Run backend tests
2
+
3
+ concurrency:
4
+ group: build-${{ github.ref }}
5
+ cancel-in-progress: true
6
+
7
+ on:
8
+ pull_request:
9
+ types: [opened, synchronize, reopened]
10
+ paths:
11
+ - "server/**.js"
12
+ - "collector/**.js"
13
+
14
+ jobs:
15
+ run-script:
16
+ runs-on: ubuntu-latest
17
+
18
+ steps:
19
+ - name: Checkout repository
20
+ uses: actions/checkout@v2
21
+
22
+ - name: Set up Node.js
23
+ uses: actions/setup-node@v3
24
+ with:
25
+ node-version: '18'
26
+
27
+ - name: Cache root dependencies
28
+ uses: actions/cache@v3
29
+ with:
30
+ path: |
31
+ node_modules
32
+ ~/.cache/yarn
33
+ key: ${{ runner.os }}-yarn-root-${{ hashFiles('**/yarn.lock') }}
34
+ restore-keys: |
35
+ ${{ runner.os }}-yarn-root-
36
+
37
+ - name: Cache server dependencies
38
+ uses: actions/cache@v3
39
+ with:
40
+ path: |
41
+ server/node_modules
42
+ ~/.cache/yarn
43
+ key: ${{ runner.os }}-yarn-server-${{ hashFiles('server/yarn.lock') }}
44
+ restore-keys: |
45
+ ${{ runner.os }}-yarn-server-
46
+
47
+ - name: Cache collector dependencies
48
+ uses: actions/cache@v3
49
+ with:
50
+ path: |
51
+ collector/node_modules
52
+ ~/.cache/yarn
53
+ key: ${{ runner.os }}-yarn-collector-${{ hashFiles('collector/yarn.lock') }}
54
+ restore-keys: |
55
+ ${{ runner.os }}-yarn-collector-
56
+
57
+ - name: Install root dependencies
58
+ if: steps.cache-root.outputs.cache-hit != 'true'
59
+ run: yarn install --frozen-lockfile
60
+
61
+ - name: Install server dependencies
62
+ if: steps.cache-server.outputs.cache-hit != 'true'
63
+ run: cd server && yarn install --frozen-lockfile
64
+
65
+ - name: Install collector dependencies
66
+ if: steps.cache-collector.outputs.cache-hit != 'true'
67
+ run: cd collector && yarn install --frozen-lockfile
68
+
69
+ - name: Setup environment and Prisma
70
+ run: yarn setup:envs && yarn prisma:setup
71
+
72
+ - name: Run test suites
73
+ run: yarn test
74
+
75
+ - name: Fail job on error
76
+ if: failure()
77
+ run: exit 1
.github/workflows/sponsors.yaml ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: Generate Sponsors README
2
+
3
+ on:
4
+ schedule:
5
+ - cron: "0 12 * * 3" # Run every Wednesday at 12:00 PM UTC
6
+
7
+ permissions:
8
+ contents: write
9
+ jobs:
10
+ deploy:
11
+ runs-on: ubuntu-latest
12
+ steps:
13
+ - name: Checkout 🛎️
14
+ uses: actions/checkout@v2
15
+
16
+ - name: Generate All Sponsors README
17
+ id: generate-all-sponsors
18
+ uses: JamesIves/github-sponsors-readme-action@v1
19
+ with:
20
+ token: ${{ secrets.SPONSOR_PAT }}
21
+ file: 'README.md'
22
+ organization: true
23
+ active-only: false
24
+ marker: 'all-sponsors'
25
+
26
+ - name: Commit and Push 🚀
27
+ uses: stefanzweifel/git-auto-commit-action@v5
28
+ id: auto-commit-action
29
+ with:
30
+ commit_message: 'Update Sponsors README'
31
+ file_pattern: 'README.md'
32
+
33
+ - name: Generate PR if changes detected
34
+ uses: peter-evans/create-pull-request@v7
35
+ if: steps.auto-commit-action.outputs.files_changed == 'true'
36
+ with:
37
+ token: ${{ secrets.GITHUB_TOKEN }}
38
+ title: 'Update Sponsors README'
39
+ branch: 'chore/update-sponsors'
40
+ base: 'master'
41
+ draft: false
42
+ reviewers: 'timothycarambat'
43
+ assignees: 'timothycarambat'
44
+ maintainer-can-modify: true
.gitignore ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ v-env
2
+ .env
3
+ !.env.example
4
+
5
+ node_modules
6
+ __pycache__
7
+ v-env
8
+ .DS_Store
9
+ aws_cf_deploy_anything_llm.json
10
+ yarn.lock
11
+ *.bak
12
+ .idea
.gitmodules ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ [submodule "browser-extension"]
2
+ path = browser-extension
3
+ url = https://github.com/Mintplex-Labs/anythingllm-extension.git
4
+ [submodule "embed"]
5
+ path = embed
6
+ url = https://github.com/Mintplex-Labs/anythingllm-embed.git
7
+ branch = main
.hadolint.yaml ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ failure-threshold: warning
2
+ ignored:
3
+ - DL3008
4
+ - DL3013
5
+ format: tty
6
+ trustedRegistries:
7
+ - docker.io
8
+ - gcr.io
.nvmrc ADDED
@@ -0,0 +1 @@
 
 
1
+ v18.18.0
.prettierignore ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # defaults
2
+ **/.git
3
+ **/.svn
4
+ **/.hg
5
+ **/node_modules
6
+
7
+ #frontend
8
+ frontend/bundleinspector.html
9
+ **/dist
10
+
11
+ #server
12
+ server/swagger/openapi.json
13
+ server/**/*.mjs
14
+
15
+ #embed
16
+ **/static/**
17
+ embed/src/utils/chat/hljs.js
.prettierrc ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "tabWidth": 2,
3
+ "useTabs": false,
4
+ "endOfLine": "lf",
5
+ "semi": true,
6
+ "singleQuote": false,
7
+ "printWidth": 80,
8
+ "trailingComma": "es5",
9
+ "bracketSpacing": true,
10
+ "bracketSameLine": false,
11
+ "overrides": [
12
+ {
13
+ "files": ["*.js", "*.mjs", "*.jsx"],
14
+ "options": {
15
+ "parser": "flow",
16
+ "arrowParens": "always"
17
+ }
18
+ },
19
+ {
20
+ "files": ["*.config.js"],
21
+ "options": {
22
+ "semi": false,
23
+ "parser": "flow",
24
+ "trailingComma": "none"
25
+ }
26
+ },
27
+ {
28
+ "files": "*.html",
29
+ "options": {
30
+ "bracketSameLine": true
31
+ }
32
+ },
33
+ {
34
+ "files": ".prettierrc",
35
+ "options": { "parser": "json" }
36
+ }
37
+ ]
38
+ }
.vscode/launch.json ADDED
@@ -0,0 +1,74 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ // Use IntelliSense to learn about possible attributes.
3
+ // Hover to view descriptions of existing attributes.
4
+ // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
5
+ "version": "0.2.0",
6
+ "configurations": [
7
+ {
8
+ "name": "Collector debug",
9
+ "request": "launch",
10
+ "cwd": "${workspaceFolder}/collector",
11
+ "env": {
12
+ "NODE_ENV": "development"
13
+ },
14
+ "runtimeArgs": [
15
+ "index.js"
16
+ ],
17
+ // not using yarn/nodemon because it doesn't work with breakpoints
18
+ // "runtimeExecutable": "yarn",
19
+ "skipFiles": [
20
+ "<node_internals>/**"
21
+ ],
22
+ "type": "node"
23
+ },
24
+ {
25
+ "name": "Server debug",
26
+ "request": "launch",
27
+ "cwd": "${workspaceFolder}/server",
28
+ "env": {
29
+ "NODE_ENV": "development"
30
+ },
31
+ "runtimeArgs": [
32
+ "index.js"
33
+ ],
34
+ // not using yarn/nodemon because it doesn't work with breakpoints
35
+ // "runtimeExecutable": "yarn",
36
+ "skipFiles": [
37
+ "<node_internals>/**"
38
+ ],
39
+ "type": "node"
40
+ },
41
+ {
42
+ "name": "Frontend debug",
43
+ "request": "launch",
44
+ "cwd": "${workspaceFolder}/frontend",
45
+ "env": {
46
+ "NODE_ENV": "development",
47
+ },
48
+ "runtimeExecutable": "${workspaceFolder}/frontend/node_modules/.bin/vite",
49
+ "runtimeArgs": [
50
+ "--debug",
51
+ "--host=0.0.0.0"
52
+ ],
53
+ // "runtimeExecutable": "yarn",
54
+ "skipFiles": [
55
+ "<node_internals>/**"
56
+ ],
57
+ "type": "node"
58
+ },
59
+ {
60
+ "name": "Launch Edge",
61
+ "request": "launch",
62
+ "type": "msedge",
63
+ "url": "http://localhost:3000",
64
+ "webRoot": "${workspaceFolder}"
65
+ },
66
+ {
67
+ "type": "chrome",
68
+ "request": "launch",
69
+ "name": "Launch Chrome against localhost",
70
+ "url": "http://localhost:3000",
71
+ "webRoot": "${workspaceFolder}"
72
+ }
73
+ ]
74
+ }
.vscode/settings.json ADDED
@@ -0,0 +1,63 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cSpell.words": [
3
+ "adoc",
4
+ "aibitat",
5
+ "AIbitat",
6
+ "allm",
7
+ "anythingllm",
8
+ "Apipie",
9
+ "Astra",
10
+ "Chartable",
11
+ "cleancss",
12
+ "comkey",
13
+ "cooldown",
14
+ "cooldowns",
15
+ "datafile",
16
+ "Deduplicator",
17
+ "Dockerized",
18
+ "docpath",
19
+ "elevenlabs",
20
+ "Embeddable",
21
+ "epub",
22
+ "fireworksai",
23
+ "GROQ",
24
+ "hljs",
25
+ "huggingface",
26
+ "inferencing",
27
+ "koboldcpp",
28
+ "Langchain",
29
+ "lmstudio",
30
+ "localai",
31
+ "mbox",
32
+ "Milvus",
33
+ "Mintplex",
34
+ "mixtral",
35
+ "moderations",
36
+ "novita",
37
+ "numpages",
38
+ "Ollama",
39
+ "Oobabooga",
40
+ "openai",
41
+ "opendocument",
42
+ "openrouter",
43
+ "pagerender",
44
+ "ppio",
45
+ "Qdrant",
46
+ "royalblue",
47
+ "SearchApi",
48
+ "searxng",
49
+ "Serper",
50
+ "Serply",
51
+ "streamable",
52
+ "textgenwebui",
53
+ "togetherai",
54
+ "Unembed",
55
+ "uuidv",
56
+ "vectordbs",
57
+ "Weaviate",
58
+ "XAILLM",
59
+ "Zilliz"
60
+ ],
61
+ "eslint.experimental.useFlatConfig": true,
62
+ "docker.languageserver.formatter.ignoreMultilineInstructions": true
63
+ }
.vscode/tasks.json ADDED
@@ -0,0 +1,94 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ // See https://go.microsoft.com/fwlink/?LinkId=733558
3
+ // for the documentation about the tasks.json format
4
+ "version": "2.0.0",
5
+ "tasks": [
6
+ {
7
+ "type": "shell",
8
+ "options": {
9
+ "cwd": "${workspaceFolder}/collector",
10
+ "statusbar": {
11
+ "color": "#ffea00",
12
+ "detail": "Runs the collector",
13
+ "label": "Collector: $(play) run",
14
+ "running": {
15
+ "color": "#ffea00",
16
+ "label": "Collector: $(gear~spin) running"
17
+ }
18
+ }
19
+ },
20
+ "command": "cd ${workspaceFolder}/collector/ && yarn dev",
21
+ "runOptions": {
22
+ "instanceLimit": 1,
23
+ "reevaluateOnRerun": true
24
+ },
25
+ "presentation": {
26
+ "echo": true,
27
+ "reveal": "always",
28
+ "focus": false,
29
+ "panel": "shared",
30
+ "showReuseMessage": true,
31
+ "clear": false
32
+ },
33
+ "label": "Collector: run"
34
+ },
35
+ {
36
+ "type": "shell",
37
+ "options": {
38
+ "cwd": "${workspaceFolder}/server",
39
+ "statusbar": {
40
+ "color": "#ffea00",
41
+ "detail": "Runs the server",
42
+ "label": "Server: $(play) run",
43
+ "running": {
44
+ "color": "#ffea00",
45
+ "label": "Server: $(gear~spin) running"
46
+ }
47
+ }
48
+ },
49
+ "command": "if [ \"${CODESPACES}\" = \"true\" ]; then while ! gh codespace ports -c $CODESPACE_NAME | grep 3001; do sleep 1; done; gh codespace ports visibility 3001:public -c $CODESPACE_NAME; fi & cd ${workspaceFolder}/server/ && yarn dev",
50
+ "runOptions": {
51
+ "instanceLimit": 1,
52
+ "reevaluateOnRerun": true
53
+ },
54
+ "presentation": {
55
+ "echo": true,
56
+ "reveal": "always",
57
+ "focus": false,
58
+ "panel": "shared",
59
+ "showReuseMessage": true,
60
+ "clear": false
61
+ },
62
+ "label": "Server: run"
63
+ },
64
+ {
65
+ "type": "shell",
66
+ "options": {
67
+ "cwd": "${workspaceFolder}/frontend",
68
+ "statusbar": {
69
+ "color": "#ffea00",
70
+ "detail": "Runs the frontend",
71
+ "label": "Frontend: $(play) run",
72
+ "running": {
73
+ "color": "#ffea00",
74
+ "label": "Frontend: $(gear~spin) running"
75
+ }
76
+ }
77
+ },
78
+ "command": "cd ${workspaceFolder}/frontend/ && yarn dev",
79
+ "runOptions": {
80
+ "instanceLimit": 1,
81
+ "reevaluateOnRerun": true
82
+ },
83
+ "presentation": {
84
+ "echo": true,
85
+ "reveal": "always",
86
+ "focus": false,
87
+ "panel": "shared",
88
+ "showReuseMessage": true,
89
+ "clear": false
90
+ },
91
+ "label": "Frontend: run"
92
+ }
93
+ ]
94
+ }
BARE_METAL.md ADDED
@@ -0,0 +1,147 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Run AnythingLLM in production without Docker
2
+
3
+ > [!WARNING]
4
+ > This method of deployment is **not supported** by the core-team and is to be used as a reference for your deployment.
5
+ > You are fully responsible for securing your deployment and data in this mode.
6
+ > **Any issues** experienced from bare-metal or non-containerized deployments will **not** be answered or supported.
7
+
8
+ Here you can find the scripts and known working process to run AnythingLLM outside of a Docker container.
9
+
10
+ ### Minimum Requirements
11
+ > [!TIP]
12
+ > You should aim for at least 2GB of RAM. Disk storage is proportional to however much data
13
+ > you will be storing (documents, vectors, models, etc). Minimum 10GB recommended.
14
+
15
+ - NodeJS v18
16
+ - Yarn
17
+
18
+
19
+ ## Getting started
20
+
21
+ 1. Clone the repo into your server as the user who the application will run as.
22
+ `git clone git@github.com:Mintplex-Labs/anything-llm.git`
23
+
24
+ 2. `cd anything-llm` and run `yarn setup`. This will install all dependencies to run in production as well as debug the application.
25
+
26
+ 3. `cp server/.env.example server/.env` to create the basic ENV file for where instance settings will be read from on service start.
27
+
28
+ 4. Ensure that the `server/.env` file has _at least_ these keys to start. These values will persist and this file will be automatically written and managed after your first successful boot.
29
+ ```
30
+ STORAGE_DIR="/your/absolute/path/to/server/storage"
31
+ ```
32
+
33
+ 5. Edit the `frontend/.env` file for the `VITE_BASE_API` to now be set to `/api`. This is documented in the .env for which one you should use.
34
+ ```
35
+ # VITE_API_BASE='http://localhost:3001/api' # Use this URL when developing locally
36
+ # VITE_API_BASE="https://$CODESPACE_NAME-3001.$GITHUB_CODESPACES_PORT_FORWARDING_DOMAIN/api" # for GitHub Codespaces
37
+ VITE_API_BASE='/api' # Use this URL deploying on non-localhost address OR in docker.
38
+ ```
39
+
40
+ ## To start the application
41
+
42
+ AnythingLLM is comprised of three main sections. The `frontend`, `server`, and `collector`. When running in production you will be running `server` and `collector` on two different processes, with a build step for compilation of the frontend.
43
+
44
+ 1. Build the frontend application.
45
+ `cd frontend && yarn build` - this will produce a `frontend/dist` folder that will be used later.
46
+
47
+ 2. Copy `frontend/dist` to `server/public` - `cp -R frontend/dist server/public`.
48
+ This should create a folder in `server` named `public` which contains a top level `index.html` file and various other files/folders.
49
+
50
+ 3. Migrate and prepare your database file.
51
+ ```
52
+ cd server && npx prisma generate --schema=./prisma/schema.prisma
53
+ cd server && npx prisma migrate deploy --schema=./prisma/schema.prisma
54
+ ```
55
+
56
+ 4. Boot the server in production
57
+ `cd server && NODE_ENV=production node index.js &`
58
+
59
+ 5. Boot the collection in another process
60
+ `cd collector && NODE_ENV=production node index.js &`
61
+
62
+ AnythingLLM should now be running on `http://localhost:3001`!
63
+
64
+ ## Updating AnythingLLM
65
+
66
+ To update AnythingLLM with future updates you can `git pull origin master` to pull in the latest code and then repeat steps 2 - 5 to deploy with all changes fully.
67
+
68
+ _note_ You should ensure that each folder runs `yarn` again to ensure packages are up to date in case any dependencies were added, changed, or removed.
69
+
70
+ _note_ You should `pkill node` before running an update so that you are not running multiple AnythingLLM processes on the same instance as this can cause conflicts.
71
+
72
+
73
+ ### Example update script
74
+
75
+ ```shell
76
+ #!/bin/bash
77
+
78
+ cd $HOME/anything-llm &&\
79
+ git checkout . &&\
80
+ git pull origin master &&\
81
+ echo "HEAD pulled to commit $(git log -1 --pretty=format:"%h" | tail -n 1)"
82
+
83
+ echo "Freezing current ENVs"
84
+ curl -I "http://localhost:3001/api/env-dump" | head -n 1|cut -d$' ' -f2
85
+
86
+ echo "Rebuilding Frontend"
87
+ cd $HOME/anything-llm/frontend && yarn && yarn build && cd $HOME/anything-llm
88
+
89
+ echo "Copying to Server Public"
90
+ rm -rf server/public
91
+ cp -r frontend/dist server/public
92
+
93
+ echo "Killing node processes"
94
+ pkill node
95
+
96
+ echo "Installing collector dependencies"
97
+ cd $HOME/anything-llm/collector && yarn
98
+
99
+ echo "Installing server dependencies & running migrations"
100
+ cd $HOME/anything-llm/server && yarn
101
+ cd $HOME/anything-llm/server && npx prisma migrate deploy --schema=./prisma/schema.prisma
102
+ cd $HOME/anything-llm/server && npx prisma generate
103
+
104
+ echo "Booting up services."
105
+ truncate -s 0 /logs/server.log # Or any other log file location.
106
+ truncate -s 0 /logs/collector.log
107
+
108
+ cd $HOME/anything-llm/server
109
+ (NODE_ENV=production node index.js) &> /logs/server.log &
110
+
111
+ cd $HOME/anything-llm/collector
112
+ (NODE_ENV=production node index.js) &> /logs/collector.log &
113
+ ```
114
+
115
+ ## Using Nginx?
116
+
117
+ If you are using Nginx, you can use the following example configuration to proxy the requests to the server. Chats for streaming require **websocket** connections, so you need to ensure that the Nginx configuration is set up to support websockets. You can do this with a simple reverse proxy configuration.
118
+
119
+ ```nginx
120
+ server {
121
+ # Enable websocket connections for agent protocol.
122
+ location ~* ^/api/agent-invocation/(.*) {
123
+ proxy_pass http://0.0.0.0:3001;
124
+ proxy_http_version 1.1;
125
+ proxy_set_header Upgrade $http_upgrade;
126
+ proxy_set_header Connection "Upgrade";
127
+ }
128
+
129
+ listen 80;
130
+ server_name [insert FQDN here];
131
+ location / {
132
+ # Prevent timeouts on long-running requests.
133
+ proxy_connect_timeout 605;
134
+ proxy_send_timeout 605;
135
+ proxy_read_timeout 605;
136
+ send_timeout 605;
137
+ keepalive_timeout 605;
138
+
139
+ # Enable readable HTTP Streaming for LLM streamed responses
140
+ proxy_buffering off;
141
+ proxy_cache off;
142
+
143
+ # Proxy your locally running service
144
+ proxy_pass http://0.0.0.0:3001;
145
+ }
146
+ }
147
+ ```
CONTRIBUTING.md ADDED
@@ -0,0 +1,105 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Contributing to AnythingLLM
2
+
3
+ AnythingLLM is an open-source project and we welcome contributions from the community.
4
+
5
+ ## Reporting Issues
6
+
7
+ If you encounter a bug or have a feature request, please open an issue on the
8
+ [GitHub issue tracker](https://github.com/mintplex-labs/anything-llm).
9
+
10
+ ## Picking an issue
11
+
12
+ We track issues on the GitHub issue tracker. If you are looking for something to
13
+ work on, check the [good first issue](https://github.com/mintplex-labs/anything-llm/contribute) label. These issues are typically the best described and have the smallest scope. There may be issues that are not labeled as good first issue, but are still a good starting point.
14
+
15
+ If there's an issue you are interested in working on, please leave a comment on the issue. This will help us avoid duplicate work. Additionally, if you have questions about the issue, please ask them in the issue comments. We are happy to provide guidance on how to approach the issue.
16
+
17
+ ## Before you start
18
+
19
+ Keep in mind that we are a small team and have limited resources. We will do our best to review and merge your PRs, but please be patient. Ultimately, **we become the maintainer** of your changes. It is our responsibility to make sure that the changes are working as expected and are of high quality as well as being compatible with the rest of the project both for existing users and for future users & features.
20
+
21
+ Before you start working on an issue, please read the following so that you don't waste time on something that is not a good fit for the project or is more suitable for a personal fork. We would rather answer a comment on an issue than close a PR after you've spent time on it. Your time is valuable and we appreciate your time and effort to make AnythingLLM better.
22
+
23
+ 0. (most important) If you are making a PR that does not have a corresponding issue, **it will not be merged.** _The only exception to this is language translations._
24
+
25
+ 1. If you are modifying the permission system for a new role or something custom, you are likely better off forking the project and building your own version since this is a core part of the project and is only to be maintained by the AnythingLLM team.
26
+
27
+ 2. Integrations (LLM, Vector DB, etc.) are reviewed at our discretion. We will eventually get to them. Do not expect us to merge your integration PR instantly since there are often many moving parts and we want to make sure we get it right. We will get to it!
28
+
29
+ 3. It is our discretion to merge or not merge a PR. We value every contribution, but we also value the quality of the code and the user experience we envision for the project. It is a fine line to walk when running a project like this and please understand that merging or not merging a PR is not a reflection of the quality of the contribution and is not personal. We will do our best to provide feedback on the PR and help you make the changes necessary to get it merged.
30
+
31
+ 4. **Security** is always important. If you have a security concern, please do not open an issue. Instead, please open a CVE on our designated reporting platform [Huntr](https://huntr.com) or contact us at [team@mintplexlabs.com](mailto:team@mintplexlabs.com).
32
+
33
+ ## Configuring Git
34
+
35
+ First, fork the repository on GitHub, then clone your fork:
36
+
37
+ ```bash
38
+ git clone https://github.com/<username>/anything-llm.git
39
+ cd anything-llm
40
+ ```
41
+
42
+ Then add the main repository as a remote:
43
+
44
+ ```bash
45
+ git remote add upstream https://github.com/mintplex-labs/anything-llm.git
46
+ git fetch upstream
47
+ ```
48
+
49
+ ## Setting up your development environment
50
+
51
+ In the root of the repository, run:
52
+
53
+ ```bash
54
+ yarn setup
55
+ ```
56
+
57
+ This will install the dependencies, set up the proper and expected ENV files for the project, and run the prisma setup script.
58
+ Next, run:
59
+
60
+ ```bash
61
+ yarn dev:all
62
+ ```
63
+ This will start the server, frontend, and collector in development mode. Changes to the code will be hot reloaded.
64
+
65
+ ## Best practices for pull requests
66
+
67
+ For the best chance of having your pull request accepted, please follow these guidelines:
68
+
69
+ 1. Unit test all bug fixes and new features. Your code will not be merged if it
70
+ doesn't have tests.
71
+ 1. If you change the public API, update the documentation in the `anythingllm-docs` repository.
72
+ 1. Aim to minimize the number of changes in each pull request. Keep to solving
73
+ one problem at a time, when possible.
74
+ 1. Before marking a pull request ready-for-review, do a self review of your code.
75
+ Is it clear why you are making the changes? Are the changes easy to understand?
76
+ 1. Use [conventional commit messages](https://www.conventionalcommits.org/en/) as pull request titles. Examples:
77
+ * New feature: `feat: adding foo API`
78
+ * Bug fix: `fix: issue with foo API`
79
+ * Documentation change: `docs: adding foo API documentation`
80
+ 1. If your pull request is a work in progress, leave the pull request as a draft.
81
+ We will assume the pull request is ready for review when it is opened.
82
+ 1. When writing tests, test the error cases. Make sure they have understandable
83
+ error messages.
84
+
85
+ ## Project structure
86
+
87
+ The core library is written in Node.js. There are additional sub-repositories for the embed widget and browser extension. These are not part of the core AnythingLLM project, but are maintained by the AnythingLLM team.
88
+
89
+ * `server`: Node.js server source code
90
+ * `frontend`: React frontend source code
91
+ * `collector`: Node.js collector source code
92
+
93
+ ## Release process
94
+
95
+ Changes to the core AnythingLLM project are released through the `master` branch. When a PR is merged into `master`, a new version of the package is published to Docker and GitHub Container Registry under the `latest` tag.
96
+
97
+ When a new version is released, a new image is built and pushed to Docker Hub and GitHub Container Registry under the associated version tag. Version tags are of the format `v<major>.<minor>.<patch>` and are pinned code, while `latest` is the latest version of the code at any point in time.
98
+
99
+ ### Desktop propagation
100
+
101
+ Changes to the desktop app are downstream of the core AnythingLLM project. Releases of the desktop app are published at the same time as the core AnythingLLM project. Code from the core AnythingLLM project is copied into the desktop app into an Electron wrapper. The Electron wrapper that wraps around the core AnythingLLM project is **not** part of the core AnythingLLM project, but is maintained by the AnythingLLM team.
102
+
103
+ ## License
104
+
105
+ By contributing to AnythingLLM (this repository), you agree to license your contributions under the MIT license.
LICENSE ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ The MIT License
2
+
3
+ Copyright (c) Mintplex Labs Inc.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
SECURITY.md ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Security Policy
2
+
3
+ ## Supported Versions
4
+
5
+ The following versions of AnythingLLM are
6
+ currently supported with security updates.
7
+
8
+ | Version | Supported |
9
+ | ------- | ------------------ |
10
+ | 0.1.x | :white_check_mark: |
11
+
12
+
13
+ ## Reporting a Vulnerability
14
+
15
+ If a security concern is found that you would like to disclose you can create a PR for it or if you would like to clear this issue before posting you can email [Core Mintplex Labs Team](mailto:team@mintplexlabs.com).
cloud-deployments/aws/cloudformation/DEPLOY.md ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # How to deploy a private AnythingLLM instance on AWS
2
+
3
+ With an AWS account you can easily deploy a private AnythingLLM instance on AWS. This will create a url that you can access from any browser over HTTP (HTTPS not supported). This single instance will run on your own keys and they will not be exposed - however if you want your instance to be protected it is highly recommended that you set a password once setup is complete.
4
+
5
+ **Quick Launch (EASY)**
6
+ 1. Log in to your AWS account
7
+ 2. Open [CloudFormation](https://us-west-1.console.aws.amazon.com/cloudformation/home)
8
+ 3. Ensure you are deploying in a geographic zone that is nearest to your physical location to reduce latency.
9
+ 4. Click `Create Stack`
10
+
11
+ ![Create Stack](../../../images/screenshots/create_stack.png)
12
+
13
+ 5. Use the file `cloudformation_create_anythingllm.json` as your JSON template.
14
+
15
+ ![Upload Stack](../../../images/screenshots/upload.png)
16
+
17
+ 6. Click Deploy.
18
+ 7. Wait for stack events to finish and be marked as `Completed`
19
+ 8. View `Outputs` tab.
20
+
21
+ ![Stack Output](../../../images/screenshots/cf_outputs.png)
22
+
23
+ 9. Wait for all resources to be built. Now wait until instance is available on `[InstanceIP]:3001`.
24
+ This process may take up to 10 minutes. See **Note** below on how to visualize this process.
25
+
26
+ The output of this cloudformation stack will be:
27
+ - 1 EC2 Instance
28
+ - 1 Security Group with 0.0.0.0/0 access on port 3001
29
+ - 1 EC2 Instance Volume `gp2` of 10GiB minimum - customizable pre-deploy.
30
+
31
+ **Requirements**
32
+ - An AWS account with billing information.
33
+
34
+ ## Please read this notice before submitting issues about your deployment
35
+
36
+ **Note:**
37
+ Your instance will not be available instantly. Depending on the instance size you launched with it can take 5-10 minutes to fully boot up.
38
+
39
+ If you want to check the instance's progress, navigate to [your deployed EC2 instances](https://us-west-1.console.aws.amazon.com/ec2/home) and connect to your instance via SSH in browser.
40
+
41
+ Once connected run `sudo tail -f /var/log/cloud-init-output.log` and wait for the file to conclude deployment of the docker image.
42
+ You should see an output like this
43
+ ```
44
+ [+] Running 2/2
45
+ ⠿ Network docker_anything-llm Created
46
+ ⠿ Container anything-llm Started
47
+ ```
48
+
49
+ Additionally, your use of this deployment process means you are responsible for any costs of these AWS resources fully.
cloud-deployments/aws/cloudformation/aws_https_instructions.md ADDED
@@ -0,0 +1,118 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # How to Configure HTTPS for Anything LLM AWS private deployment
2
+ Instructions for manual https configuration after generating and running the aws cloudformation template (cloudformation_create_anythingllm.json). Tested on following browsers: Firefox version 119, Chrome version 118, Edge 118.
3
+
4
+ **Requirements**
5
+ - Successful deployment of Amazon Linux 2023 EC2 instance with Docker container running Anything LLM
6
+ - Admin priv to configure Elastic IP for EC2 instance via AWS Management Console UI
7
+ - Admin priv to configure DNS services (i.e. AWS Route 53) via AWS Management Console UI
8
+ - Admin priv to configure EC2 Security Group rules via AWS Management Console UI
9
+
10
+ ## Step 1: Allocate and assign Elastic IP Address to your deployed EC2 instance
11
+ 1. Follow AWS instructions on allocating EIP here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html#using-instance-addressing-eips-allocating
12
+ 2. Follow AWS instructions on assigning EIP to EC2 instance here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html#using-instance-addressing-eips-associating
13
+
14
+ ## Step 2: Configure DNS A record to resolve to the previously assigned EC2 instance via EIP
15
+ These instructions assume that you already have a top-level domain configured and are using a subdomain
16
+ to access AnythingLLM.
17
+ 1. Follow AWS instructions on routing traffic to EC2 instance here: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-to-ec2-instance.html
18
+
19
+ ## Step 3: Install and enable nginx
20
+ These instructions are for CLI configuration and assume you are logged in to EC2 instance as the ec2-user.
21
+ 1. $sudo yum install nginx -y
22
+ 2. $sudo systemctl enable nginx && sudo systemctl start nginx
23
+
24
+ ## Step 4: Install certbot
25
+ These instructions are for CLI configuration and assume you are logged in to EC2 instance as the ec2-user.
26
+ 1. $sudo yum install -y augeas-libs
27
+ 2. $sudo python3 -m venv /opt/certbot/
28
+ 3. $sudo /opt/certbot/bin/pip install --upgrade pip
29
+ 4. $sudo /opt/certbot/bin/pip install certbot certbot-nginx
30
+ 5. $sudo ln -s /opt/certbot/bin/certbot /usr/bin/certbot
31
+
32
+ ## Step 5: Configure temporary Inbound Traffic Rule for Security Group to certbot DNS verification
33
+ 1. Follow AWS instructions on creating inbound rule (http port 80 0.0.0.0/0) for EC2 security group here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/working-with-security-groups.html#adding-security-group-rule
34
+
35
+ ## Step 6: Comment out default http NGINX proxy configuration
36
+ These instructions are for CLI configuration and assume you are logged in to EC2 instance as the ec2-user.
37
+ 1. $sudo vi /etc/nginx/nginx.conf
38
+ 2. In the nginx.conf file, comment out the default server block configuration for http/port 80. It should look something like the following:
39
+ ```
40
+ # server {
41
+ # listen 80;
42
+ # listen [::]:80;
43
+ # server_name _;
44
+ # root /usr/share/nginx/html;
45
+ #
46
+ # # Load configuration files for the default server block.
47
+ # include /etc/nginx/default.d/*.conf;
48
+ #
49
+ # error_page 404 /404.html;
50
+ # location = /404.html {
51
+ # }
52
+ #
53
+ # error_page 500 502 503 504 /50x.html;
54
+ # location = /50x.html {
55
+ # }
56
+ # }
57
+ ```
58
+ 3. Enter ':wq' to save the changes to the nginx default config
59
+
60
+ ## Step 7: Create simple http proxy configuration for AnythingLLM
61
+ These instructions are for CLI configuration and assume you are logged in to EC2 instance as the ec2-user.
62
+ 1. $sudo vi /etc/nginx/conf.d/anything.conf
63
+ 2. Add the following configuration ensuring that you add your FQDN:.
64
+
65
+ ```
66
+ server {
67
+ # Enable websocket connections for agent protocol.
68
+ location ~* ^/api/agent-invocation/(.*) {
69
+ proxy_pass http://0.0.0.0:3001;
70
+ proxy_http_version 1.1;
71
+ proxy_set_header Upgrade $http_upgrade;
72
+ proxy_set_header Connection "Upgrade";
73
+ }
74
+
75
+ listen 80;
76
+ server_name [insert FQDN here];
77
+ location / {
78
+ # Prevent timeouts on long-running requests.
79
+ proxy_connect_timeout 605;
80
+ proxy_send_timeout 605;
81
+ proxy_read_timeout 605;
82
+ send_timeout 605;
83
+ keepalive_timeout 605;
84
+
85
+ # Enable readable HTTP Streaming for LLM streamed responses
86
+ proxy_buffering off;
87
+ proxy_cache off;
88
+
89
+ # Proxy your locally running service
90
+ proxy_pass http://0.0.0.0:3001;
91
+ }
92
+ }
93
+ ```
94
+ 3. Enter ':wq' to save the changes to the anything config file
95
+
96
+ ## Step 8: Test nginx http proxy config and restart nginx service
97
+ These instructions are for CLI configuration and assume you are logged in to EC2 instance as the ec2-user.
98
+ 1. $sudo nginx -t
99
+ 2. $sudo systemctl restart nginx
100
+ 3. Navigate to http://FQDN in a browser and you should be proxied to the AnythingLLM web UI.
101
+
102
+ ## Step 9: Generate/install cert
103
+ These instructions are for CLI configuration and assume you are logged in to EC2 instance as the ec2-user.
104
+ 1. $sudo certbot --nginx -d [Insert FQDN here]
105
+ Example command: $sudo certbot --nginx -d anythingllm.exampleorganization.org
106
+ This command will generate the appropriate certificate files, write the files to /etc/letsencrypt/live/yourFQDN, and make updates to the nginx
107
+ configuration file for anythingllm located at /etc/nginx/conf.d/anything.conf
108
+ 2. Enter the email address you would like to use for updates.
109
+ 3. Accept the terms of service.
110
+ 4. Accept or decline to receive communication from LetsEncrypt.
111
+
112
+ ## Step 10: Test Cert installation
113
+ 1. $sudo cat /etc/nginx/conf.d/anything.conf
114
+ Your should see a completely updated configuration that includes https/443 and a redirect configuration for http/80.
115
+ 2. Navigate to https://FQDN in a browser and you should be proxied to the AnythingLLM web UI.
116
+
117
+ ## Step 11: (Optional) Remove temporary Inbound Traffic Rule for Security Group to certbot DNS verification
118
+ 1. Follow AWS instructions on deleting inbound rule (http port 80 0.0.0.0/0) for EC2 security group here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/working-with-security-groups.html#deleting-security-group-rule
cloud-deployments/aws/cloudformation/cloudformation_create_anythingllm.json ADDED
@@ -0,0 +1,234 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "AWSTemplateFormatVersion": "2010-09-09",
3
+ "Description": "Create a stack that runs AnythingLLM on a single instance",
4
+ "Parameters": {
5
+ "InstanceType": {
6
+ "Description": "EC2 instance type",
7
+ "Type": "String",
8
+ "Default": "t3.small"
9
+ },
10
+ "InstanceVolume": {
11
+ "Description": "Storage size of disk on Instance in GB",
12
+ "Type": "Number",
13
+ "Default": 10,
14
+ "MinValue": 4
15
+ }
16
+ },
17
+ "Resources": {
18
+ "AnythingLLMInstance": {
19
+ "Type": "AWS::EC2::Instance",
20
+ "Properties": {
21
+ "ImageId": {
22
+ "Fn::FindInMap": [
23
+ "Region2AMI",
24
+ {
25
+ "Ref": "AWS::Region"
26
+ },
27
+ "AMI"
28
+ ]
29
+ },
30
+ "InstanceType": {
31
+ "Ref": "InstanceType"
32
+ },
33
+ "SecurityGroupIds": [
34
+ {
35
+ "Ref": "AnythingLLMInstanceSecurityGroup"
36
+ }
37
+ ],
38
+ "BlockDeviceMappings": [
39
+ {
40
+ "DeviceName": {
41
+ "Fn::FindInMap": [
42
+ "Region2AMI",
43
+ {
44
+ "Ref": "AWS::Region"
45
+ },
46
+ "RootDeviceName"
47
+ ]
48
+ },
49
+ "Ebs": {
50
+ "VolumeSize": {
51
+ "Ref": "InstanceVolume"
52
+ }
53
+ }
54
+ }
55
+ ],
56
+ "UserData": {
57
+ "Fn::Base64": {
58
+ "Fn::Join": [
59
+ "",
60
+ [
61
+ "Content-Type: multipart/mixed; boundary=\"//\"\n",
62
+ "MIME-Version: 1.0\n",
63
+ "\n",
64
+ "--//\n",
65
+ "Content-Type: text/cloud-config; charset=\"us-ascii\"\n",
66
+ "MIME-Version: 1.0\n",
67
+ "Content-Transfer-Encoding: 7bit\n",
68
+ "Content-Disposition: attachment; filename=\"cloud-config.txt\"\n",
69
+ "\n",
70
+ "\n",
71
+ "#cloud-config\n",
72
+ "cloud_final_modules:\n",
73
+ "- [scripts-user, once-per-instance]\n",
74
+ "\n",
75
+ "\n",
76
+ "--//\n",
77
+ "Content-Type: text/x-shellscript; charset=\"us-ascii\"\n",
78
+ "MIME-Version: 1.0\n",
79
+ "Content-Transfer-Encoding: 7bit\n",
80
+ "Content-Disposition: attachment; filename=\"userdata.txt\"\n",
81
+ "\n",
82
+ "\n",
83
+ "#!/bin/bash\n",
84
+ "# check output of userdata script with sudo tail -f /var/log/cloud-init-output.log\n",
85
+ "sudo yum install docker iptables -y\n",
86
+ "sudo iptables -A OUTPUT -m owner ! --uid-owner root -d 169.254.169.254 -j DROP\n",
87
+ "sudo systemctl enable docker\n",
88
+ "sudo systemctl start docker\n",
89
+ "mkdir -p /home/ec2-user/anythingllm\n",
90
+ "touch /home/ec2-user/anythingllm/.env\n",
91
+ "sudo chown ec2-user:ec2-user -R /home/ec2-user/anythingllm\n",
92
+ "docker pull mintplexlabs/anythingllm\n",
93
+ "docker run -d -p 3001:3001 --cap-add SYS_ADMIN -v /home/ec2-user/anythingllm:/app/server/storage -v /home/ec2-user/anythingllm/.env:/app/server/.env -e STORAGE_DIR=\"/app/server/storage\" mintplexlabs/anythingllm\n",
94
+ "echo \"Container ID: $(sudo docker ps --latest --quiet)\"\n",
95
+ "export ONLINE=$(curl -Is http://localhost:3001/api/ping | head -n 1|cut -d$' ' -f2)\n",
96
+ "echo \"Health check: $ONLINE\"\n",
97
+ "echo \"Setup complete! AnythingLLM instance is now online!\"\n",
98
+ "\n",
99
+ "--//--\n"
100
+ ]
101
+ ]
102
+ }
103
+ }
104
+ }
105
+ },
106
+ "AnythingLLMInstanceSecurityGroup": {
107
+ "Type": "AWS::EC2::SecurityGroup",
108
+ "Properties": {
109
+ "GroupDescription": "AnythingLLM Instance Security Group",
110
+ "SecurityGroupIngress": [
111
+ {
112
+ "IpProtocol": "tcp",
113
+ "FromPort": "22",
114
+ "ToPort": "22",
115
+ "CidrIp": "0.0.0.0/0"
116
+ },
117
+ {
118
+ "IpProtocol": "tcp",
119
+ "FromPort": "3001",
120
+ "ToPort": "3001",
121
+ "CidrIp": "0.0.0.0/0"
122
+ },
123
+ {
124
+ "IpProtocol": "tcp",
125
+ "FromPort": "3001",
126
+ "ToPort": "3001",
127
+ "CidrIpv6": "::/0"
128
+ }
129
+ ]
130
+ }
131
+ }
132
+ },
133
+ "Outputs": {
134
+ "ServerIp": {
135
+ "Description": "IP address of the AnythingLLM instance",
136
+ "Value": {
137
+ "Fn::GetAtt": [
138
+ "AnythingLLMInstance",
139
+ "PublicIp"
140
+ ]
141
+ }
142
+ },
143
+ "ServerURL": {
144
+ "Description": "URL of the AnythingLLM server",
145
+ "Value": {
146
+ "Fn::Join": [
147
+ "",
148
+ [
149
+ "http://",
150
+ {
151
+ "Fn::GetAtt": [
152
+ "AnythingLLMInstance",
153
+ "PublicIp"
154
+ ]
155
+ },
156
+ ":3001"
157
+ ]
158
+ ]
159
+ }
160
+ }
161
+ },
162
+ "Mappings": {
163
+ "Region2AMI": {
164
+ "ap-south-1": {
165
+ "AMI": "ami-0e6329e222e662a52",
166
+ "RootDeviceName": "/dev/xvda"
167
+ },
168
+ "eu-north-1": {
169
+ "AMI": "ami-08c308b1bb265e927",
170
+ "RootDeviceName": "/dev/xvda"
171
+ },
172
+ "eu-west-3": {
173
+ "AMI": "ami-069d1ea6bc64443f0",
174
+ "RootDeviceName": "/dev/xvda"
175
+ },
176
+ "eu-west-2": {
177
+ "AMI": "ami-06a566ca43e14780d",
178
+ "RootDeviceName": "/dev/xvda"
179
+ },
180
+ "eu-west-1": {
181
+ "AMI": "ami-0a8dc52684ee2fee2",
182
+ "RootDeviceName": "/dev/xvda"
183
+ },
184
+ "ap-northeast-3": {
185
+ "AMI": "ami-0c8a89b455fae8513",
186
+ "RootDeviceName": "/dev/xvda"
187
+ },
188
+ "ap-northeast-2": {
189
+ "AMI": "ami-0ff56409a6e8ea2a0",
190
+ "RootDeviceName": "/dev/xvda"
191
+ },
192
+ "ap-northeast-1": {
193
+ "AMI": "ami-0ab0bbbd329f565e6",
194
+ "RootDeviceName": "/dev/xvda"
195
+ },
196
+ "ca-central-1": {
197
+ "AMI": "ami-033c256a10931f206",
198
+ "RootDeviceName": "/dev/xvda"
199
+ },
200
+ "sa-east-1": {
201
+ "AMI": "ami-0dabf4dab6b183eef",
202
+ "RootDeviceName": "/dev/xvda"
203
+ },
204
+ "ap-southeast-1": {
205
+ "AMI": "ami-0dc5785603ad4ff54",
206
+ "RootDeviceName": "/dev/xvda"
207
+ },
208
+ "ap-southeast-2": {
209
+ "AMI": "ami-0c5d61202c3b9c33e",
210
+ "RootDeviceName": "/dev/xvda"
211
+ },
212
+ "eu-central-1": {
213
+ "AMI": "ami-004359656ecac6a95",
214
+ "RootDeviceName": "/dev/xvda"
215
+ },
216
+ "us-east-1": {
217
+ "AMI": "ami-0cff7528ff583bf9a",
218
+ "RootDeviceName": "/dev/xvda"
219
+ },
220
+ "us-east-2": {
221
+ "AMI": "ami-02238ac43d6385ab3",
222
+ "RootDeviceName": "/dev/xvda"
223
+ },
224
+ "us-west-1": {
225
+ "AMI": "ami-01163e76c844a2129",
226
+ "RootDeviceName": "/dev/xvda"
227
+ },
228
+ "us-west-2": {
229
+ "AMI": "ami-0ceecbb0f30a902a6",
230
+ "RootDeviceName": "/dev/xvda"
231
+ }
232
+ }
233
+ }
234
+ }
cloud-deployments/digitalocean/terraform/DEPLOY.md ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # How to deploy a private AnythingLLM instance on DigitalOcean using Terraform
2
+
3
+ With a DigitalOcean account, you can easily deploy a private AnythingLLM instance using Terraform. This will create a URL that you can access from any browser over HTTP (HTTPS not supported). This single instance will run on your own keys, and they will not be exposed. However, if you want your instance to be protected, it is highly recommended that you set a password once setup is complete.
4
+
5
+ The output of this Terraform configuration will be:
6
+ - 1 DigitalOcean Droplet
7
+ - An IP address to access your application
8
+
9
+ **Requirements**
10
+ - A DigitalOcean account with billing information
11
+ - Terraform installed on your local machine
12
+ - Follow the instructions in the [official Terraform documentation](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli) for your operating system.
13
+
14
+ ## How to deploy on DigitalOcean
15
+ Open your terminal and navigate to the `docker` folder
16
+ 1. Create a `.env` file by cloning the `.env.example`.
17
+ 2. Navigate to `digitalocean/terraform` folder.
18
+ 3. Replace the token value in the provider "digitalocean" block in main.tf with your DigitalOcean API token.
19
+ 4. Run the following commands to initialize Terraform, review the infrastructure changes, and apply them:
20
+ ```
21
+ terraform init
22
+ terraform plan
23
+ terraform apply
24
+ ```
25
+ Confirm the changes by typing yes when prompted.
26
+ 5. Once the deployment is complete, Terraform will output the public IP address of your droplet. You can access your application using this IP address.
27
+
28
+ ## How to delete your DigitalOcean deployment
29
+ To delete the resources created by Terraform, run the following command in the terminal:
30
+ `
31
+ terraform destroy
32
+ `
33
+
34
+ ## Please read this notice before submitting issues about your deployment
35
+
36
+ **Note:**
37
+ Your instance will not be available instantly. Depending on the instance size you launched with it can take anywhere from 5-10 minutes to fully boot up.
38
+
39
+ If you want to check the instance's progress, navigate to [your deployed instances](https://cloud.digitalocean.com/droplets) and connect to your instance via SSH in browser.
40
+
41
+ Once connected run `sudo tail -f /var/log/cloud-init-output.log` and wait for the file to conclude deployment of the docker image.
42
+
43
+
44
+ Additionally, your use of this deployment process means you are responsible for any costs of these Digital Ocean resources fully.
cloud-deployments/digitalocean/terraform/main.tf ADDED
@@ -0,0 +1,52 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ terraform {
2
+ required_version = ">= 1.0.0"
3
+
4
+ required_providers {
5
+ digitalocean = {
6
+ source = "digitalocean/digitalocean"
7
+ version = "~> 2.0"
8
+ }
9
+ }
10
+ }
11
+
12
+ provider "digitalocean" {
13
+ # Add your DigitalOcean API token here
14
+ token = "DigitalOcean API token"
15
+ }
16
+
17
+
18
+ resource "digitalocean_droplet" "anything_llm_instance" {
19
+ image = "ubuntu-24-04-x64"
20
+ name = "anything-llm-instance"
21
+ region = "nyc3"
22
+ size = "s-2vcpu-2gb"
23
+
24
+ user_data = templatefile("user_data.tp1", {
25
+ env_content = local.formatted_env_content
26
+ })
27
+ }
28
+
29
+ locals {
30
+ env_content = file("../../../docker/.env")
31
+ formatted_env_content = join("\n", [
32
+ for line in split("\n", local.env_content) :
33
+ line
34
+ if !(
35
+ (
36
+ substr(line, 0, 1) == "#"
37
+ ) ||
38
+ (
39
+ substr(line, 0, 3) == "UID"
40
+ ) ||
41
+ (
42
+ substr(line, 0, 3) == "GID"
43
+ ) ||
44
+ (
45
+ substr(line, 0, 11) == "CLOUD_BUILD"
46
+ ) ||
47
+ (
48
+ line == ""
49
+ )
50
+ )
51
+ ])
52
+ }
cloud-deployments/digitalocean/terraform/outputs.tf ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ output "ip_address" {
2
+ value = digitalocean_droplet.anything_llm_instance.ipv4_address
3
+ description = "The public IP address of your droplet application."
4
+ }
cloud-deployments/digitalocean/terraform/user_data.tp1 ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # check output of userdata script with sudo tail -f /var/log/cloud-init-output.log
3
+
4
+ sudo apt-get update
5
+ sudo apt-get install -y docker.io
6
+ sudo usermod -a -G docker ubuntu
7
+
8
+ sudo systemctl enable docker
9
+ sudo systemctl start docker
10
+
11
+ mkdir -p /home/anythingllm
12
+ cat <<EOF >/home/anythingllm/.env
13
+ ${env_content}
14
+ EOF
15
+
16
+ sudo docker pull mintplexlabs/anythingllm
17
+ sudo docker run -d -p 3001:3001 --cap-add SYS_ADMIN -v /home/anythingllm:/app/server/storage -v /home/anythingllm/.env:/app/server/.env -e STORAGE_DIR="/app/server/storage" mintplexlabs/anythingllm
18
+ echo "Container ID: $(sudo docker ps --latest --quiet)"
19
+
20
+ export ONLINE=$(curl -Is http://localhost:3001/api/ping | head -n 1|cut -d$' ' -f2)
21
+ echo "Health check: $ONLINE"
22
+ echo "Setup complete! AnythingLLM instance is now online!"
cloud-deployments/gcp/deployment/DEPLOY.md ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # How to deploy a private AnythingLLM instance on GCP
2
+
3
+ With a GCP account you can easily deploy a private AnythingLLM instance on GCP. This will create a url that you can access from any browser over HTTP (HTTPS not supported). This single instance will run on your own keys and they will not be exposed - however if you want your instance to be protected it is highly recommended that you set a password once setup is complete.
4
+
5
+ The output of this deployment will be:
6
+ - 1 GCP VM
7
+ - 1 Security Group with 0.0.0.0/0 access on Ports 22 & 3001
8
+ - 1 GCP VM boot disk of 10GiB minimum
9
+
10
+ **Requirements**
11
+ - A GCP account with billing information.
12
+
13
+ ## How to deploy on GCP
14
+ Open your terminal
15
+ 1. Log in to your GCP account using the following command:
16
+ ```
17
+ gcloud auth login
18
+ ```
19
+
20
+ 2. After successful login, Run the following command to create a deployment using the Deployment Manager CLI:
21
+
22
+ ```
23
+
24
+ gcloud deployment-manager deployments create anything-llm-deployment --config gcp/deployment/gcp_deploy_anything_llm.yaml
25
+
26
+ ```
27
+
28
+ Once you execute these steps, the CLI will initiate the deployment process on GCP based on your configuration file. You can monitor the deployment status and view the outputs using the Google Cloud Console or the Deployment Manager CLI commands.
29
+
30
+ ```
31
+ gcloud compute instances get-serial-port-output anything-llm-instance
32
+ ```
33
+
34
+ ssh into the instance
35
+
36
+ ```
37
+ gcloud compute ssh anything-llm-instance
38
+ ```
39
+
40
+ Delete the deployment
41
+ ```
42
+ gcloud deployment-manager deployments delete anything-llm-deployment
43
+ ```
44
+
45
+ ## Please read this notice before submitting issues about your deployment
46
+
47
+ **Note:**
48
+ Your instance will not be available instantly. Depending on the instance size you launched with it can take anywhere from 5-10 minutes to fully boot up.
49
+
50
+ If you want to check the instance's progress, navigate to [your deployed instances](https://console.cloud.google.com/compute/instances) and connect to your instance via SSH in browser.
51
+
52
+ Once connected run `sudo tail -f /var/log/cloud-init-output.log` and wait for the file to conclude deployment of the docker image.
53
+
54
+ Additionally, your use of this deployment process means you are responsible for any costs of these GCP resources fully.
cloud-deployments/gcp/deployment/gcp_deploy_anything_llm.yaml ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ resources:
2
+ - name: anything-llm-instance
3
+ type: compute.v1.instance
4
+ properties:
5
+ zone: us-central1-a
6
+ machineType: zones/us-central1-a/machineTypes/n1-standard-1
7
+ disks:
8
+ - deviceName: boot
9
+ type: PERSISTENT
10
+ boot: true
11
+ autoDelete: true
12
+ initializeParams:
13
+ sourceImage: projects/ubuntu-os-cloud/global/images/family/ubuntu-2004-lts
14
+ diskSizeGb: 10
15
+ networkInterfaces:
16
+ - network: global/networks/default
17
+ accessConfigs:
18
+ - name: External NAT
19
+ type: ONE_TO_ONE_NAT
20
+ metadata:
21
+ items:
22
+ - key: startup-script
23
+ value: |
24
+ #!/bin/bash
25
+ # check output of userdata script with sudo tail -f /var/log/cloud-init-output.log
26
+
27
+ sudo apt-get update
28
+ sudo apt-get install -y docker.io
29
+ sudo usermod -a -G docker ubuntu
30
+ sudo systemctl enable docker
31
+ sudo systemctl start docker
32
+
33
+ mkdir -p /home/anythingllm
34
+ touch /home/anythingllm/.env
35
+ sudo chown -R ubuntu:ubuntu /home/anythingllm
36
+
37
+ sudo docker pull mintplexlabs/anythingllm
38
+ sudo docker run -d -p 3001:3001 --cap-add SYS_ADMIN -v /home/anythingllm:/app/server/storage -v /home/anythingllm/.env:/app/server/.env -e STORAGE_DIR="/app/server/storage" mintplexlabs/anythingllm
39
+ echo "Container ID: $(sudo docker ps --latest --quiet)"
40
+
41
+ export ONLINE=$(curl -Is http://localhost:3001/api/ping | head -n 1|cut -d$' ' -f2)
42
+ echo "Health check: $ONLINE"
43
+
44
+ echo "Setup complete! AnythingLLM instance is now online!"
45
+
cloud-deployments/huggingface-spaces/Dockerfile ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # With this dockerfile in a Huggingface space you will get an entire AnythingLLM instance running
2
+ # in your space with all features you would normally get from the docker based version of AnythingLLM.
3
+ #
4
+ # How to use
5
+ # - Login to https://huggingface.co/spaces
6
+ # - Click on "Create new Space"
7
+ # - Name the space and select "Docker" as the SDK w/ a blank template
8
+ # - The default 2vCPU/16GB machine is OK. The more the merrier.
9
+ # - Decide if you want your AnythingLLM Space public or private.
10
+ # **You might want to stay private until you at least set a password or enable multi-user mode**
11
+ # - Click "Create Space"
12
+ # - Click on "Settings" on top of page (https://huggingface.co/spaces/<username>/<space-name>/settings)
13
+ # - Scroll to "Persistent Storage" and select the lowest tier of now - you can upgrade if you run out.
14
+ # - Confirm and continue storage upgrade
15
+ # - Go to "Files" Tab (https://huggingface.co/spaces/<username>/<space-name>/tree/main)
16
+ # - Click "Add Files"
17
+ # - Upload this file or create a file named `Dockerfile` and copy-paste this content into it. "Commit to main" and save.
18
+ # - Your container will build and boot. You now have AnythingLLM on HuggingFace. Your data is stored in the persistent storage attached.
19
+ # Have Fun 🤗
20
+ # Have issues? Check the logs on HuggingFace for clues.
21
+ FROM mintplexlabs/anythingllm:render
22
+
23
+ USER root
24
+ RUN mkdir -p /data/storage
25
+ RUN ln -s /data/storage /storage
26
+ USER anythingllm
27
+
28
+ ENV STORAGE_DIR="/data/storage"
29
+ ENV SERVER_PORT=7860
30
+
31
+ ENTRYPOINT ["/bin/bash", "/usr/local/bin/render-entrypoint.sh"]
cloud-deployments/k8/manifest.yaml ADDED
@@ -0,0 +1,214 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ apiVersion: v1
3
+ kind: PersistentVolume
4
+ metadata:
5
+ name: anything-llm-volume
6
+ annotations:
7
+ pv.beta.kubernetes.io/uid: "1000"
8
+ pv.beta.kubernetes.io/gid: "1000"
9
+ spec:
10
+ storageClassName: gp2
11
+ capacity:
12
+ storage: 5Gi
13
+ accessModes:
14
+ - ReadWriteOnce
15
+ awsElasticBlockStore:
16
+ # This is the volume UUID from AWS EC2 EBS Volumes list.
17
+ volumeID: "{{ anythingllm_awsElasticBlockStore_volumeID }}"
18
+ fsType: ext4
19
+ nodeAffinity:
20
+ required:
21
+ nodeSelectorTerms:
22
+ - matchExpressions:
23
+ - key: topology.kubernetes.io/zone
24
+ operator: In
25
+ values:
26
+ - us-east-1c
27
+ ---
28
+ apiVersion: v1
29
+ kind: PersistentVolumeClaim
30
+ metadata:
31
+ name: anything-llm-volume-claim
32
+ namespace: "{{ namespace }}"
33
+ spec:
34
+ accessModes:
35
+ - ReadWriteOnce
36
+ resources:
37
+ requests:
38
+ storage: 5Gi
39
+ ---
40
+ apiVersion: apps/v1
41
+ kind: Deployment
42
+ metadata:
43
+ name: anything-llm
44
+ namespace: "{{ namespace }}"
45
+ labels:
46
+ anything-llm: "true"
47
+ spec:
48
+ selector:
49
+ matchLabels:
50
+ k8s-app: anything-llm
51
+ replicas: 1
52
+ strategy:
53
+ type: RollingUpdate
54
+ rollingUpdate:
55
+ maxSurge: 0%
56
+ maxUnavailable: 100%
57
+ template:
58
+ metadata:
59
+ labels:
60
+ anything-llm: "true"
61
+ k8s-app: anything-llm
62
+ app.kubernetes.io/name: anything-llm
63
+ app.kubernetes.io/part-of: anything-llm
64
+ annotations:
65
+ prometheus.io/scrape: "true"
66
+ prometheus.io/path: /metrics
67
+ prometheus.io/port: "9090"
68
+ spec:
69
+ serviceAccountName: "default"
70
+ terminationGracePeriodSeconds: 10
71
+ securityContext:
72
+ fsGroup: 1000
73
+ runAsNonRoot: true
74
+ runAsGroup: 1000
75
+ runAsUser: 1000
76
+ affinity:
77
+ nodeAffinity:
78
+ requiredDuringSchedulingIgnoredDuringExecution:
79
+ nodeSelectorTerms:
80
+ - matchExpressions:
81
+ - key: topology.kubernetes.io/zone
82
+ operator: In
83
+ values:
84
+ - us-east-1c
85
+ containers:
86
+ - name: anything-llm
87
+ resources:
88
+ limits:
89
+ memory: "1Gi"
90
+ cpu: "500m"
91
+ requests:
92
+ memory: "512Mi"
93
+ cpu: "250m"
94
+ imagePullPolicy: IfNotPresent
95
+ image: "mintplexlabs/anythingllm:render"
96
+ securityContext:
97
+ allowPrivilegeEscalation: true
98
+ capabilities:
99
+ add:
100
+ - SYS_ADMIN
101
+ runAsNonRoot: true
102
+ runAsGroup: 1000
103
+ runAsUser: 1000
104
+ command:
105
+ # Specify a command to override the Dockerfile's ENTRYPOINT.
106
+ - /bin/bash
107
+ - -c
108
+ - |
109
+ set -x -e
110
+ sleep 3
111
+ echo "AWS_REGION: $AWS_REGION"
112
+ echo "SERVER_PORT: $SERVER_PORT"
113
+ echo "NODE_ENV: $NODE_ENV"
114
+ echo "STORAGE_DIR: $STORAGE_DIR"
115
+ {
116
+ cd /app/server/ &&
117
+ npx prisma generate --schema=./prisma/schema.prisma &&
118
+ npx prisma migrate deploy --schema=./prisma/schema.prisma &&
119
+ node /app/server/index.js
120
+ echo "Server process exited with status $?"
121
+ } &
122
+ {
123
+ node /app/collector/index.js
124
+ echo "Collector process exited with status $?"
125
+ } &
126
+ wait -n
127
+ exit $?
128
+ readinessProbe:
129
+ httpGet:
130
+ path: /v1/api/health
131
+ port: 8888
132
+ initialDelaySeconds: 15
133
+ periodSeconds: 5
134
+ successThreshold: 2
135
+ livenessProbe:
136
+ httpGet:
137
+ path: /v1/api/health
138
+ port: 8888
139
+ initialDelaySeconds: 15
140
+ periodSeconds: 5
141
+ failureThreshold: 3
142
+ env:
143
+ - name: AWS_REGION
144
+ value: "{{ aws_region }}"
145
+ - name: AWS_ACCESS_KEY_ID
146
+ value: "{{ aws_access_id }}"
147
+ - name: AWS_SECRET_ACCESS_KEY
148
+ value: "{{ aws_access_secret }}"
149
+ - name: SERVER_PORT
150
+ value: "3001"
151
+ - name: JWT_SECRET
152
+ value: "my-random-string-for-seeding" # Please generate random string at least 12 chars long.
153
+ - name: STORAGE_DIR
154
+ value: "/storage"
155
+ - name: NODE_ENV
156
+ value: "production"
157
+ - name: UID
158
+ value: "1000"
159
+ - name: GID
160
+ value: "1000"
161
+ volumeMounts:
162
+ - name: anything-llm-server-storage-volume-mount
163
+ mountPath: /storage
164
+ volumes:
165
+ - name: anything-llm-server-storage-volume-mount
166
+ persistentVolumeClaim:
167
+ claimName: anything-llm-volume-claim
168
+ ---
169
+ # This serves the UI and the backend.
170
+ apiVersion: networking.k8s.io/v1
171
+ kind: Ingress
172
+ metadata:
173
+ name: anything-llm-ingress
174
+ namespace: "{{ namespace }}"
175
+ annotations:
176
+ external-dns.alpha.kubernetes.io/hostname: "{{ namespace }}-chat.{{ base_domain }}"
177
+ kubernetes.io/ingress.class: "internal-ingress"
178
+ nginx.ingress.kubernetes.io/rewrite-target: /
179
+ ingress.kubernetes.io/ssl-redirect: "false"
180
+ spec:
181
+ rules:
182
+ - host: "{{ namespace }}-chat.{{ base_domain }}"
183
+ http:
184
+ paths:
185
+ - path: /
186
+ pathType: Prefix
187
+ backend:
188
+ service:
189
+ name: anything-llm-svc
190
+ port:
191
+ number: 3001
192
+ tls: # < placing a host in the TLS config will indicate a cert should be created
193
+ - hosts:
194
+ - "{{ namespace }}-chat.{{ base_domain }}"
195
+ secretName: letsencrypt-prod
196
+ ---
197
+ apiVersion: v1
198
+ kind: Service
199
+ metadata:
200
+ labels:
201
+ kubernetes.io/name: anything-llm
202
+ name: anything-llm-svc
203
+ namespace: "{{ namespace }}"
204
+ spec:
205
+ ports:
206
+ # "port" is external port, and "targetPort" is internal.
207
+ - port: 3301
208
+ targetPort: 3001
209
+ name: traffic
210
+ - port: 9090
211
+ targetPort: 9090
212
+ name: metrics
213
+ selector:
214
+ k8s-app: anything-llm
collector/.env.example ADDED
@@ -0,0 +1 @@
 
 
1
+ # Placeholder .env file for collector runtime
collector/.gitignore ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ hotdir/*
2
+ !hotdir/__HOTDIR__.md
3
+ yarn-error.log
4
+ !yarn.lock
5
+ outputs
6
+ scripts
collector/.nvmrc ADDED
@@ -0,0 +1 @@
 
 
1
+ v18.13.0
collector/__tests__/utils/extensions/YoutubeTranscript/YoutubeLoader/youtube-transcript.test.js ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
const { YoutubeTranscript } = require("../../../../../utils/extensions/YoutubeTranscript/YoutubeLoader/youtube-transcript.js");

// Integration test for the YoutubeTranscript loader.
// NOTE(review): this fetches a live transcript for video "BJjsfNO5JTo", so it
// requires network access and will fail if the video or its English captions
// are removed — confirm it is intended to run in CI.
describe("YoutubeTranscript", () => {
  it("should fetch transcript from YouTube video", async () => {
    const videoId = "BJjsfNO5JTo";
    const transcript = await YoutubeTranscript.fetchTranscript(videoId, {
      lang: "en",
    });

    // Expect a single non-empty string back.
    expect(transcript).toBeDefined();
    expect(typeof transcript).toBe("string");
    expect(transcript.length).toBeGreaterThan(0);
  }, 30000); // generous timeout: remote network fetch
});
collector/extensions/index.js ADDED
@@ -0,0 +1,207 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
const { setDataSigner } = require("../middleware/setDataSigner");
const { verifyPayloadIntegrity } = require("../middleware/verifyIntegrity");
const { resolveRepoLoader, resolveRepoLoaderFunction } = require("../utils/extensions/RepoLoader");
const { reqBody } = require("../utils/http");
const { validURL } = require("../utils/url");
const RESYNC_METHODS = require("./resync");
const { loadObsidianVault } = require("../utils/extensions/ObsidianVault");

/**
 * Registers all data-connector ("extension") endpoints on the collector's
 * Express app. Every route runs verifyPayloadIntegrity; routes whose payloads
 * carry credentials that must be encrypted/decrypted also attach setDataSigner
 * so handlers can use `response.locals.encryptionWorker`.
 * @param {import("express").Application} app - the collector Express app.
 */
function extensions(app) {
  if (!app) return;

  // Re-syncs a previously embedded document from its original data source.
  // NOTE: replies 200 even on failure — callers inspect `success` in the body.
  app.post(
    "/ext/resync-source-document",
    [verifyPayloadIntegrity, setDataSigner],
    async function (request, response) {
      try {
        const { type, options } = reqBody(request);
        if (!RESYNC_METHODS.hasOwnProperty(type)) throw new Error(`Type "${type}" is not a valid type to sync.`);
        // The matched resync handler writes the HTTP response itself.
        return await RESYNC_METHODS[type](options, response);
      } catch (e) {
        console.error(e);
        response.status(200).json({
          success: false,
          content: null,
          reason: e.message || "A processing error occurred.",
        });
      }
      return;
    }
  ); // Fix: statement was previously unterminated and relied on ASI.

  // Clones and processes an entire repository; the platform (eg: github,
  // gitlab) comes from the `:repo_platform` URL parameter.
  app.post(
    "/ext/:repo_platform-repo",
    [verifyPayloadIntegrity, setDataSigner],
    async function (request, response) {
      try {
        const loadRepo = resolveRepoLoaderFunction(request.params.repo_platform);
        const { success, reason, data } = await loadRepo(
          reqBody(request),
          response,
        );
        response.status(200).json({
          success,
          reason,
          data,
        });
      } catch (e) {
        console.error(e);
        response.status(200).json({
          success: false,
          reason: e.message || "A processing error occurred.",
          data: {},
        });
      }
      return;
    }
  );

  // gets all branches for a specific repo
  app.post(
    "/ext/:repo_platform-repo/branches",
    [verifyPayloadIntegrity],
    async function (request, response) {
      try {
        const RepoLoader = resolveRepoLoader(request.params.repo_platform);
        const allBranches = await new RepoLoader(
          reqBody(request)
        ).getRepoBranches();
        response.status(200).json({
          success: true,
          reason: null,
          data: {
            branches: allBranches,
          },
        });
      } catch (e) {
        console.error(e);
        response.status(400).json({
          success: false,
          reason: e.message,
          data: {
            branches: [],
          },
        });
      }
      return;
    }
  );

  // Fetches the transcript of a single YouTube video.
  app.post(
    "/ext/youtube-transcript",
    [verifyPayloadIntegrity],
    async function (request, response) {
      try {
        const { loadYouTubeTranscript } = require("../utils/extensions/YoutubeTranscript");
        const { success, reason, data } = await loadYouTubeTranscript(
          reqBody(request)
        );
        response.status(200).json({ success, reason, data });
      } catch (e) {
        console.error(e);
        response.status(400).json({
          success: false,
          reason: e.message,
          data: {
            title: null,
            author: null,
          },
        });
      }
      return;
    }
  );

  // Crawls a website to a bounded depth/link count and scrapes each page.
  app.post(
    "/ext/website-depth",
    [verifyPayloadIntegrity],
    async function (request, response) {
      try {
        const websiteDepth = require("../utils/extensions/WebsiteDepth");
        const { url, depth = 1, maxLinks = 20 } = reqBody(request);
        if (!validURL(url)) throw new Error("Not a valid URL.");
        const scrapedData = await websiteDepth(url, depth, maxLinks);
        response.status(200).json({ success: true, data: scrapedData });
      } catch (e) {
        console.error(e);
        response.status(400).json({ success: false, reason: e.message });
      }
      return;
    }
  );

  // Imports pages from a Confluence space.
  app.post(
    "/ext/confluence",
    [verifyPayloadIntegrity, setDataSigner],
    async function (request, response) {
      try {
        const { loadConfluence } = require("../utils/extensions/Confluence");
        const { success, reason, data } = await loadConfluence(
          reqBody(request),
          response
        );
        response.status(200).json({ success, reason, data });
      } catch (e) {
        console.error(e);
        response.status(400).json({
          success: false,
          reason: e.message,
          data: {
            title: null,
            author: null,
          },
        });
      }
      return;
    }
  );

  // Imports spaces/pages from a Drupal Wiki instance.
  app.post(
    "/ext/drupalwiki",
    [verifyPayloadIntegrity, setDataSigner],
    async function (request, response) {
      try {
        const { loadAndStoreSpaces } = require("../utils/extensions/DrupalWiki");
        const { success, reason, data } = await loadAndStoreSpaces(
          reqBody(request),
          response
        );
        response.status(200).json({ success, reason, data });
      } catch (e) {
        console.error(e);
        response.status(400).json({
          success: false,
          reason: e.message,
          data: {
            title: null,
            author: null,
          },
        });
      }
      return;
    }
  );

  // Imports the files of an uploaded Obsidian vault.
  app.post(
    "/ext/obsidian/vault",
    [verifyPayloadIntegrity, setDataSigner],
    async function (request, response) {
      try {
        const { files } = reqBody(request);
        const result = await loadObsidianVault({ files });
        response.status(200).json(result);
      } catch (e) {
        console.error(e);
        response.status(400).json({
          success: false,
          reason: e.message,
          data: null,
        });
      }
      return;
    }
  );
}

module.exports = extensions;
collector/extensions/resync/index.js ADDED
@@ -0,0 +1,153 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ const { getLinkText } = require("../../processLink");
2
+
3
/**
 * Fetches the content of a raw link. Returns the content as a text string of the link in question.
 * @param {object} data - metadata from document (eg: link)
 * @param {import("../../middleware/setDataSigner").ResponseWithSigner} response
 */
async function resyncLink({ link }, response) {
  if (!link) throw new Error('Invalid link provided');
  try {
    // Fix: `reason` was referenced in the error message below without ever
    // being destructured, so a failed sync threw a ReferenceError instead of
    // surfacing the actual failure reason in the logs.
    const { success, reason, content = null } = await getLinkText(link);
    if (!success) throw new Error(`Failed to sync link content. ${reason}`);
    response.status(200).json({ success, content });
  } catch (e) {
    console.error(e);
    // Always reply 200; the server inspects `success` in the body.
    response.status(200).json({
      success: false,
      content: null,
    });
  }
}
22
+
23
/**
 * Re-fetches the transcript for a YouTube video link and returns it as text.
 * Offered because a transcript could be manually edited after the initial
 * scrape, although in practice transcriptions rarely change.
 * @param {object} data - metadata from document (eg: link)
 * @param {import("../../middleware/setDataSigner").ResponseWithSigner} response
 */
async function resyncYouTube({ link }, response) {
  if (!link) throw new Error('Invalid link provided');
  try {
    const { fetchVideoTranscriptContent } = require("../../utils/extensions/YoutubeTranscript");
    const result = await fetchVideoTranscriptContent({ url: link });
    if (!result.success)
      throw new Error(`Failed to sync YouTube video transcript. ${result.reason}`);
    response.status(200).json({ success: result.success, content: result.content });
  } catch (e) {
    console.error(e);
    // Reply 200 with success:false — the caller reads the body, not the status.
    response.status(200).json({ success: false, content: null });
  }
}
45
+
46
/**
 * Fetches the content of a specific confluence page via its chunkSource.
 * Returns the content as a text string of the page in question and only that page.
 * @param {object} data - metadata from document (eg: chunkSource)
 * @param {import("../../middleware/setDataSigner").ResponseWithSigner} response
 */
async function resyncConfluence({ chunkSource }, response) {
  if (!chunkSource) throw new Error('Invalid source property provided');
  try {
    const { fetchConfluencePage } = require("../../utils/extensions/Confluence");
    // The chunkSource is stored `payload`-encrypted; expand it back into a
    // URL-like object so the page can be re-fetched with the same
    // access token and query params it was originally collected with.
    const params = response.locals.encryptionWorker.expandPayload(chunkSource);
    const result = await fetchConfluencePage({
      pageUrl: `https:${params.pathname}`, // need to add back the real protocol
      baseUrl: params.searchParams.get('baseUrl'),
      spaceKey: params.searchParams.get('spaceKey'),
      accessToken: params.searchParams.get('token'),
      username: params.searchParams.get('username'),
    });

    if (!result.success)
      throw new Error(`Failed to sync Confluence page content. ${result.reason}`);
    response.status(200).json({ success: result.success, content: result.content });
  } catch (e) {
    console.error(e);
    response.status(200).json({ success: false, content: null });
  }
}
77
+
78
/**
 * Fetches the content of a specific GitHub file via its chunkSource.
 * Returns the content as a text string of the file in question and only that file.
 * (Docstring previously copy-pasted from the Confluence handler.)
 * @param {object} data - metadata from document (eg: chunkSource)
 * @param {import("../../middleware/setDataSigner").ResponseWithSigner} response
 */
async function resyncGithub({ chunkSource }, response) {
  if (!chunkSource) throw new Error('Invalid source property provided');
  try {
    // Github file data is `payload` encrypted (might contain PAT). So we need to expand its
    // encrypted payload back into query params so we can reFetch the page with same access token/params.
    const source = response.locals.encryptionWorker.expandPayload(chunkSource);
    const { fetchGithubFile } = require("../../utils/extensions/RepoLoader/GithubRepo");
    const { success, reason, content } = await fetchGithubFile({
      repoUrl: `https:${source.pathname}`, // need to add back the real protocol
      branch: source.searchParams.get('branch'),
      accessToken: source.searchParams.get('pat'),
      sourceFilePath: source.searchParams.get('path'),
    });

    if (!success) throw new Error(`Failed to sync GitHub file content. ${reason}`);
    response.status(200).json({ success, content });
  } catch (e) {
    console.error(e);
    // Reply 200 with success:false — the caller reads the body, not the status.
    response.status(200).json({
      success: false,
      content: null,
    });
  }
}
108
+
109
+
110
/**
 * Fetches the content of a specific DrupalWiki page via its chunkSource.
 * Returns the content as a text string of the page in question and only that page.
 * @param {object} data - metadata from document (eg: chunkSource)
 * @param {import("../../middleware/setDataSigner").ResponseWithSigner} response
 */
async function resyncDrupalWiki({ chunkSource }, response) {
  if (!chunkSource) throw new Error('Invalid source property provided');
  try {
    const { loadPage } = require("../../utils/extensions/DrupalWiki");
    // chunkSource is stored `payload`-encrypted; expand it back into query
    // params so the page can be re-fetched with the same access token.
    const params = response.locals.encryptionWorker.expandPayload(chunkSource);
    const { success, reason, content } = await loadPage({
      baseUrl: params.searchParams.get('baseUrl'),
      pageId: params.searchParams.get('pageId'),
      accessToken: params.searchParams.get('accessToken'),
    });

    if (success) {
      response.status(200).json({ success, content });
    } else {
      console.error(`Failed to sync DrupalWiki page content. ${reason}`);
      response.status(200).json({ success: false, content: null });
    }
  } catch (e) {
    console.error(e);
    response.status(200).json({ success: false, content: null });
  }
}
146
+
147
// Map of document `type` keys (dispatched by /ext/resync-source-document)
// to their resync handler. Each handler writes the HTTP response itself.
module.exports = {
  link: resyncLink,
  youtube: resyncYouTube,
  confluence: resyncConfluence,
  github: resyncGithub,
  drupalwiki: resyncDrupalWiki,
}
collector/hotdir/__HOTDIR__.md ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ ### What is the "Hot directory"
2
+
3
+ This is a pre-set file location that documents will be written to when uploaded by AnythingLLM. There is really no need to touch it.
collector/index.js ADDED
@@ -0,0 +1,188 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
// Collector service entrypoint: a small Express app (port 8888) that turns
// uploaded files, links, and raw text into AnythingLLM document objects.
// In development, loads `.env.development`; otherwise the default `.env`.
process.env.NODE_ENV === "development"
  ? require("dotenv").config({ path: `.env.${process.env.NODE_ENV}` })
  : require("dotenv").config();

require("./utils/logger")();
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");
const path = require("path");
const { ACCEPTED_MIMES } = require("./utils/constants");
const { reqBody } = require("./utils/http");
const { processSingleFile } = require("./processSingleFile");
const { processLink, getLinkText } = require("./processLink");
const { wipeCollectorStorage } = require("./utils/files");
const extensions = require("./extensions");
const { processRawText } = require("./processRawText");
const { verifyPayloadIntegrity } = require("./middleware/verifyIntegrity");
const app = express();
// Upper bound for request bodies across all parsers (documents arrive inline).
const FILE_LIMIT = "3GB";

app.use(cors({ origin: true }));
app.use(
  bodyParser.text({ limit: FILE_LIMIT }),
  bodyParser.json({ limit: FILE_LIMIT }),
  bodyParser.urlencoded({
    limit: FILE_LIMIT,
    extended: true,
  })
);

// Processes a file already present in the hot directory into document objects.
// NOTE: errors still respond 200 — the server inspects `success` in the body.
app.post(
  "/process",
  [verifyPayloadIntegrity],
  async function (request, response) {
    const { filename, options = {}, metadata = {} } = reqBody(request);
    try {
      // Strip leading "../" sequences so the filename cannot escape the hotdir.
      const targetFilename = path
        .normalize(filename)
        .replace(/^(\.\.(\/|\\|$))+/, "");
      const {
        success,
        reason,
        documents = [],
      } = await processSingleFile(targetFilename, options, metadata);
      response
        .status(200)
        .json({ filename: targetFilename, success, reason, documents });
    } catch (e) {
      console.error(e);
      response.status(200).json({
        filename: filename,
        success: false,
        reason: "A processing error occurred.",
        documents: [],
      });
    }
    return;
  }
);

// Same pipeline as /process but with `parseOnly: true` (no metadata passed).
app.post(
  "/parse",
  [verifyPayloadIntegrity],
  async function (request, response) {
    const { filename, options = {} } = reqBody(request);
    try {
      // Strip leading "../" sequences so the filename cannot escape the hotdir.
      const targetFilename = path
        .normalize(filename)
        .replace(/^(\.\.(\/|\\|$))+/, "");
      const {
        success,
        reason,
        documents = [],
      } = await processSingleFile(targetFilename, {
        ...options,
        parseOnly: true,
      });
      response
        .status(200)
        .json({ filename: targetFilename, success, reason, documents });
    } catch (e) {
      console.error(e);
      response.status(200).json({
        filename: filename,
        success: false,
        reason: "A processing error occurred.",
        documents: [],
      });
    }
    return;
  }
);

// Scrapes a single URL into document objects.
app.post(
  "/process-link",
  [verifyPayloadIntegrity],
  async function (request, response) {
    const { link, scraperHeaders = {}, metadata = {} } = reqBody(request);
    try {
      const {
        success,
        reason,
        documents = [],
      } = await processLink(link, scraperHeaders, metadata);
      response.status(200).json({ url: link, success, reason, documents });
    } catch (e) {
      console.error(e);
      response.status(200).json({
        url: link,
        success: false,
        reason: "A processing error occurred.",
        documents: [],
      });
    }
    return;
  }
);

// Fetches a URL's content without creating documents ("text" by default).
app.post(
  "/util/get-link",
  [verifyPayloadIntegrity],
  async function (request, response) {
    const { link, captureAs = "text" } = reqBody(request);
    try {
      const { success, content = null } = await getLinkText(link, captureAs);
      response.status(200).json({ url: link, success, content });
    } catch (e) {
      console.error(e);
      response.status(200).json({
        url: link,
        success: false,
        content: null,
      });
    }
    return;
  }
);

// Converts raw pasted text plus metadata into document objects.
app.post(
  "/process-raw-text",
  [verifyPayloadIntegrity],
  async function (request, response) {
    const { textContent, metadata } = reqBody(request);
    try {
      const {
        success,
        reason,
        documents = [],
      } = await processRawText(textContent, metadata);
      response
        .status(200)
        .json({ filename: metadata.title, success, reason, documents });
    } catch (e) {
      console.error(e);
      response.status(200).json({
        filename: metadata?.title || "Unknown-doc.txt",
        success: false,
        reason: "A processing error occurred.",
        documents: [],
      });
    }
    return;
  }
);

// Mounts all data-connector endpoints (/ext/*).
extensions(app);

// Advertises the MIME types this collector can ingest.
app.get("/accepts", function (_, response) {
  response.status(200).json(ACCEPTED_MIMES);
});

// Any other route responds 200 with no body.
app.all("*", function (_, response) {
  response.sendStatus(200);
});

app
  .listen(8888, async () => {
    // Clear out any leftover files from a previous run before accepting work.
    await wipeCollectorStorage();
    console.log(`Document processor app listening on port 8888`);
  })
  .on("error", function (_) {
    // NOTE(review): on listen error, re-emit restart/interrupt signals —
    // presumably so a process manager (eg: nodemon uses SIGUSR2) can cycle
    // the process cleanly; confirm intended behavior.
    process.once("SIGUSR2", function () {
      process.kill(process.pid, "SIGUSR2");
    });
    process.on("SIGINT", function () {
      process.kill(process.pid, "SIGINT");
    });
  });
collector/middleware/setDataSigner.js ADDED
@@ -0,0 +1,41 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ const { EncryptionWorker } = require("../utils/EncryptionWorker");
2
+ const { CommunicationKey } = require("../utils/comKey");
3
+
4
+ /**
5
+ * Express Response Object interface with defined encryptionWorker attached to locals property.
6
+ * @typedef {import("express").Response & import("express").Response['locals'] & {encryptionWorker: EncryptionWorker} } ResponseWithSigner
7
+ */
8
+
9
+ // You can use this middleware to assign the EncryptionWorker to the response locals
10
+ // property so that if can be used to encrypt/decrypt arbitrary data via response object.
11
+ // eg: Encrypting API keys in chunk sources.
12
+
13
+ // The way this functions is that the rolling RSA Communication Key is used server-side to private-key encrypt the raw
14
+ // key of the persistent EncryptionManager credentials. Since EncryptionManager credentials do _not_ roll, we should not send them
15
+ // even between server<>collector in plaintext because if the user configured the server/collector to be public they could technically
16
+ // be exposing the key in transit via the X-Payload-Signer header. Even if this risk is minimal we should not do this.
17
+
18
+ // This middleware uses the CommunicationKey public key to first decrypt the base64 representation of the EncryptionManager credentials
19
+ // and then loads that in to the EncryptionWorker as a buffer so we can use the same credentials across the system. Should we ever break the
20
+ // collector out into its own service this would still work without SSL/TLS.
21
+
22
+ /**
23
+ *
24
+ * @param {import("express").Request} request
25
+ * @param {import("express").Response} response
26
+ * @param {import("express").NextFunction} next
27
+ */
28
+ function setDataSigner(request, response, next) {
29
+ const comKey = new CommunicationKey();
30
+ const encryptedPayloadSigner = request.header("X-Payload-Signer");
31
+ if (!encryptedPayloadSigner) console.log('Failed to find signed-payload to set encryption worker! Encryption calls will fail.');
32
+
33
+ const decryptedPayloadSignerKey = comKey.decrypt(encryptedPayloadSigner);
34
+ const encryptionWorker = new EncryptionWorker(decryptedPayloadSignerKey);
35
+ response.locals.encryptionWorker = encryptionWorker;
36
+ next();
37
+ }
38
+
39
+ module.exports = {
40
+ setDataSigner
41
+ }
collector/middleware/verifyIntegrity.js ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ const { CommunicationKey } = require("../utils/comKey");
2
+ const RuntimeSettings = require("../utils/runtimeSettings");
3
+ const runtimeSettings = new RuntimeSettings();
4
+
5
+ function verifyPayloadIntegrity(request, response, next) {
6
+ const comKey = new CommunicationKey();
7
+ if (process.env.NODE_ENV === "development") {
8
+ comKey.log('verifyPayloadIntegrity is skipped in development.');
9
+ runtimeSettings.parseOptionsFromRequest(request);
10
+ next();
11
+ return;
12
+ }
13
+
14
+ const signature = request.header("X-Integrity");
15
+ if (!signature) return response.status(400).json({ msg: 'Failed integrity signature check.' })
16
+
17
+ const validSignedPayload = comKey.verify(signature, request.body);
18
+ if (!validSignedPayload) return response.status(400).json({ msg: 'Failed integrity signature check.' });
19
+
20
+ runtimeSettings.parseOptionsFromRequest(request);
21
+ next();
22
+ }
23
+
24
+ module.exports = {
25
+ verifyPayloadIntegrity
26
+ }