Upload README.md
README.md
CHANGED
@@ -1,13 +1,120 @@
 ---
-title:
+title: Ask GC Library Guides
 emoji: 📚
-colorFrom:
-colorTo:
+colorFrom: blue
+colorTo: green
 sdk: streamlit
 sdk_version: 1.44.0
 app_file: app.py
 pinned: false
-
+hf_oauth: false
+hf_username: ''
+hf_token: ''
+hf_private: false
+hf_space_id: ''
+hf_disable_embedding: false
+hf_disable_inference: false
+hf_disable_sharing: false
+hf_disable_suggestion: false
+hf_suggested_questions: []
+hf_suggested_themes: []
+hf_suggested_examples: []
+hf_suggested_datasets: []
+hf_suggested_models: []
+hf_suggested_tasks: []
+hf_suggested_libraries: []
+hf_suggested_metrics: []
+hf_suggested_visualizers: []
+hf_suggested_widgets: []
+hf_suggested_co2: []
+hf_suggested_pipeline_tags: []
+hf_suggested_tags: []
+hf_suggested_configs: []
+hf_suggested_args: []
+hf_suggested_kwargs: []
+hf_suggested_env: {}
+hf_suggested_requirements: []
+hf_suggested_setup: ''
+hf_suggested_dockerfile: ''
+hf_suggested_app_file: ''
+hf_suggested_sdk: ''
+hf_suggested_sdk_version: ''
+hf_suggested_python_version: ''
+hf_suggested_base_image: ''
+hf_suggested_entrypoint: ''
+hf_suggested_cmd: ''
+hf_suggested_workdir: ''
+hf_suggested_expose: []
+hf_suggested_volumes: []
+hf_suggested_ports: []
+hf_suggested_networks: []
+hf_suggested_depends_on: []
+hf_suggested_links: []
+hf_suggested_extra_hosts: []
+hf_suggested_dns: []
+hf_suggested_dns_search: []
+hf_suggested_cap_add: []
+hf_suggested_cap_drop: []
+hf_suggested_cgroup_parent: ''
+hf_suggested_devices: []
+hf_suggested_device_requests: []
+hf_suggested_device_cgroup_rules: []
+hf_suggested_dns_opt: []
+hf_suggested_domainname: ''
+hf_suggested_entrypoint_args: []
+hf_suggested_env_file: []
+hf_suggested_expose_ports: []
+hf_suggested_external_links: []
+hf_suggested_extra_hosts_list: []
+hf_suggested_healthcheck: {}
+hf_suggested_hostname: ''
+hf_suggested_init: false
+hf_suggested_ipc: ''
+hf_suggested_labels: {}
+hf_suggested_links_list: []
+hf_suggested_logging: {}
+hf_suggested_mac_address: ''
+hf_suggested_network_mode: ''
+hf_suggested_networks_list: []
+hf_suggested_pid: ''
+hf_suggested_ports_list: []
+hf_suggested_privileged: false
+hf_suggested_read_only: false
+hf_suggested_restart: ''
+hf_suggested_security_opt: []
+hf_suggested_shm_size: ''
+hf_suggested_stdin_open: false
+hf_suggested_stop_grace_period: ''
+hf_suggested_stop_signal: ''
+hf_suggested_sysctls: {}
+hf_suggested_tmpfs: []
+hf_suggested_tty: false
+hf_suggested_ulimits: {}
+hf_suggested_user: ''
+hf_suggested_userns_mode: ''
+hf_suggested_volumes_from: []
+hf_suggested_volumes_list: []
+hf_suggested_working_dir: ''
 ---
 
-
+# Ask GC Library Guides (RAG Demo)
+
+This Space demonstrates a Retrieval-Augmented Generation (RAG) application built with Streamlit. It allows users to ask questions about the CUNY Graduate Center library guides.
+
+**How it works:**
+
+1. **Data Source:** Content extracted from LibGuides (`extracted_content.jsonl`).
+2. **Embedding:** On first startup, the application uses the `BAAI/bge-m3` sentence transformer model (run locally within the Space) to embed the LibGuides content and stores it in a ChromaDB vector database (`./chroma_db`). This database persists if the Space uses persistent storage.
+3. **Query Processing:**
+    * User queries are optionally expanded using the generation model.
+    * Queries are embedded using the same local `BAAI/bge-m3` model (handled internally by ChromaDB).
+    * ChromaDB performs a similarity search to find relevant text chunks.
+4. **Generation:** The relevant chunks and the original query are passed to the `google/gemma-3-27b-it` model via the Hugging Face Inference API to generate a final answer.
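The retrieval and prompt-assembly steps described in the README can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the Space's actual code: a toy word-count vector stands in for the `BAAI/bge-m3` embeddings, a plain Python list stands in for the ChromaDB collection, and the function names (`embed`, `retrieve`, `build_prompt`), the prompt wording, and the sample chunks are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (similarity search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved chunks and the original query into one prompt."""
    joined = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


# Hypothetical chunks, invented for illustration only.
CHUNKS = [
    "Interlibrary loan requests are submitted through the library website.",
    "The citation guide covers Zotero setup and reference styles.",
    "Study rooms can be reserved online up to two weeks ahead.",
]

QUERY = "How do I set up Zotero for citation management?"
top = retrieve(QUERY, CHUNKS, k=1)
prompt = build_prompt(QUERY, top)
```

In the Space itself, the resulting prompt would presumably be sent to the generation model rather than built from toy vectors; the sketch only shows how ranking by similarity and stuffing the top chunks into the prompt fit together.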
+
+**Configuration:**
+
+* **Embedding Model:** `BAAI/bge-m3` (local via `sentence-transformers` & ChromaDB)
+* **Generation Model:** `google/gemma-3-27b-it` (via HF Inference API)
+* **Requires Secret:** A Hugging Face User Access Token must be added as a Space Secret named `HF_TOKEN`.
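One way the app might pick up the `HF_TOKEN` secret is to read it from the process environment and fail early with a readable message when it is absent. The sketch below assumes the secret is exposed to the running app as an environment variable; the helper name `get_hf_token` and the error text are hypothetical and do not come from the Space's code.

```python
import os


def get_hf_token(env=None) -> str:
    """Return the HF_TOKEN secret, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add a Hugging Face User Access Token "
            "as a Space Secret named HF_TOKEN."
        )
    return token
```

Checking for the token at startup, rather than on the first user query, would surface a misconfigured Space before anyone tries to ask a question.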
+
+**Note:** The initial embedding process when the Space first starts (or restarts without persistent storage) can take some time as the model needs to process all the documents.
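The "embed only on first startup" behaviour in the note could be approximated with a check like the one below. It is a heuristic sketch under stated assumptions: it treats a missing or empty `./chroma_db` directory as "not yet embedded" and reads `extracted_content.jsonl` as one JSON object per line. The real application may decide differently (for example, by counting items in the collection), and the function names here are hypothetical.

```python
import json
from pathlib import Path


def load_records(jsonl_path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line of a JSONL file."""
    with jsonl_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def needs_embedding(db_dir: Path) -> bool:
    """True when the vector store directory is missing or empty (heuristic)."""
    return not db_dir.is_dir() or not any(db_dir.iterdir())
```

A startup routine could call `needs_embedding(Path("./chroma_db"))` and run the slow embedding pass over `load_records(...)` only when it returns `True`, which matches the note's point that restarts with persistent storage skip the wait.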