---
title: Crypto_RAG_ChatBot
emoji: 💡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.44.1"
app_file: app.py
pinned: false
---
# Crypto_RAG_ChatBot
This is a cryptocurrency-focused Retrieval-Augmented Generation (RAG) app. It retrieves from your uploaded documents and added URLs, reranks results, and generates answers with a chat LLM. It can also route "price" queries to a live price tool for major coins.
1) **OPENAI API KEY (REQUIRED; PASTE-ONLY, NEVER SAVED)**
-----------------------------------------------------
- The chat model requires an OpenAI API key.
- Paste your key into the "OpenAI API Key" field in the UI.
- The key is kept in memory for the current session only:
  - It is not written to disk, not bundled in the repository, and not logged.
  - Restarting or refreshing the Space clears it.
- The key is used only to generate chat responses for this app.
2) **QUICK COST ESTIMATE (GPT-4o-mini PRICING)**
--------------------------------------------
Pricing used: (https://platform.openai.com/docs/pricing)
- Input: $0.15 per 1,000,000 tokens
- Output: $0.60 per 1,000,000 tokens
Assumptions:
- 200 input tokens per query
- 500 output tokens per query
- 20 queries total
Per-query cost:
- Input: 200 / 1,000,000 * $0.15 = $0.00003
- Output: 500 / 1,000,000 * $0.60 = $0.00030
- Total per query = $0.00003 + $0.00030 = $0.00033
20-query session:
- $0.00033 * 20 = $0.0066
Result: Approximately two-thirds of a cent ($0.0066) for the full 20-query session, well under the $0.50 budget.
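
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `estimate_cost` and the constants are illustrative and not part of `app.py`, and the prices are the ones quoted in this section, which may change over time.

```python
# Prices in USD per 1,000,000 tokens, as quoted above for GPT-4o-mini.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(input_tokens: int, output_tokens: int, queries: int = 1) -> float:
    """Return the estimated USD cost of `queries` identical requests."""
    per_query = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )
    return per_query * queries


# Same assumptions as above: 200 input tokens, 500 output tokens, 20 queries.
session_cost = estimate_cost(input_tokens=200, output_tokens=500, queries=20)
print(f"${session_cost:.4f}")  # prints $0.0066
```

Changing the token counts or the number of queries gives a quick estimate for longer sessions.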
3) **FREE OPEN-SOURCE RETRIEVAL MODELS**
------------------------------------
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
These run locally in the Space and are free (no API costs).
4) **UPLOADING .PDF / .TXT / .MD FILES (FOR RAG)**
----------------------------------------------
- Click "Add files" and select any combination of .pdf, .txt, or .md.
- Sample .pdf and .txt files are available in /data/samples: download them to
your drive, then multi-select them to upload.
- The app extracts text (PDFs via pypdf), splits into chunks, and stores them
with metadata for retrieval.
- After adding files, click "Build Index" so they are included in search.
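
To illustrate the "splits into chunks, and stores them with metadata" step, here is a minimal sketch of character-based chunking with overlap. The chunk size, the overlap, the metadata keys, and the function name `chunk_text` are all assumptions made for illustration; the app's actual splitter may work differently (for example, on sentences or tokens).

```python
def chunk_text(text: str, source: str, chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Split `text` into overlapping chunks, each tagged with simple metadata."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(
            {
                "text": text[start:start + chunk_size],
                "source": source,          # hypothetical metadata: originating file
                "chunk_id": len(chunks),   # position of the chunk within the file
                "start": start,            # character offset in the original text
            }
        )
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
        start += chunk_size - overlap
    return chunks


chunks = chunk_text("a" * 1000, source="whitepaper.txt")
print(len(chunks), [c["start"] for c in chunks])
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.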
5) **ADDING MULTIPLE URLS (FOR RAG)**
---------------------------------
- Paste one URL per line in the "URLs" box and click "Add URLs".
- See /data/samples/Links%20sample.txt for example links.
- The app fetches and parses the pages (static articles and PDFs work best).
- After adding URLs, click "Build Index" to include them in retrieval.
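
The "one URL per line" input can be turned into a clean list before fetching. The sketch below is an assumption about how such parsing could look, not the app's actual code; `parse_url_box` is a hypothetical helper that skips blank lines, drops anything that is not an http(s) URL, and removes duplicates.

```python
from urllib.parse import urlparse


def parse_url_box(raw: str) -> list[str]:
    """Return the unique, well-formed http(s) URLs from a one-per-line text box."""
    urls: list[str] = []
    seen: set[str] = set()
    for line in raw.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parsed = urlparse(candidate)
        is_web_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        if is_web_url and candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls


box = "https://example.com/article\n\nnot a url\nhttps://example.com/article\nhttp://example.org/paper.pdf"
print(parse_url_box(box))
```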
6) **BUILD INDEX AND RETRIEVAL OPTIONS**
------------------------------------
After you add files and/or URLs, click "Build Index". You can tune:
- Top-K retrieve (k): how many candidates to pull initially (e.g., 6 to 10).
- Hybrid alpha (BM25 <-> Dense): blend between keyword BM25 and dense
similarity. 0.0 = all dense, 1.0 = all BM25. A balanced default is 0.5.
- Rerank Top-K: how many of the retrieved candidates the cross-encoder reranks
(e.g., 3 to 8). The final answer uses the top reranked passages.
If results seem off:
- Increase Top-K for better recall.
- Adjust alpha (higher favors keywords; lower favors semantic similarity).
- Increase Rerank Top-K for stronger final ordering (slightly more CPU).
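
The effect of the alpha slider can be sketched as a weighted blend of the two score lists. This is an illustration under stated assumptions: the min-max normalization and the function names (`min_max`, `hybrid_scores`, `top_k`) are not taken from `app.py`; only the convention that 0.0 means all dense and 1.0 means all BM25 comes from the description above.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so BM25 and dense scores are comparable (assumed step)."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(bm25: list[float], dense: list[float], alpha: float = 0.5) -> list[float]:
    """Blend per-passage scores: alpha = 1.0 is all BM25, alpha = 0.0 is all dense."""
    if len(bm25) != len(dense):
        raise ValueError("bm25 and dense must score the same passages")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return [
        alpha * b + (1.0 - alpha) * d
        for b, d in zip(min_max(bm25), min_max(dense))
    ]


def top_k(scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring passages, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


bm25 = [2.0, 0.0, 1.0]    # made-up keyword scores for three passages
dense = [0.1, 0.9, 0.5]   # made-up embedding similarities for the same passages
print(top_k(hybrid_scores(bm25, dense, alpha=1.0), k=2))  # keyword-only ranking
print(top_k(hybrid_scores(bm25, dense, alpha=0.0), k=2))  # dense-only ranking
```

With these made-up scores, passage 0 wins on keywords and passage 1 wins on semantic similarity, so moving alpha changes which passage is handed to the reranker first.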
7) **STREAMING RESPONSES (SELECTABLE)**
---------------------------------
- Streaming ON: words appear live as the model generates.
- Streaming OFF: you receive a single final answer after generation finishes.
Toggle this with the "Streaming" checkbox.
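
The difference between the two modes can be sketched with a Python generator. This is a simplified illustration, not the app's code: `respond` is a hypothetical function, and the token list stands in for the pieces of text the chat model would emit.

```python
from typing import Iterable, Iterator


def respond(tokens: Iterable[str], streaming: bool) -> Iterator[str]:
    """Yield the growing answer after each token if streaming, else only the final answer."""
    answer = ""
    for token in tokens:
        answer += token
        if streaming:
            yield answer  # partial answer, shown live
    if not streaming:
        yield answer      # single final answer


tokens = ["Bitcoin ", "is ", "a ", "cryptocurrency."]
print(list(respond(tokens, streaming=True)))
print(list(respond(tokens, streaming=False)))
```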
8) **CHAT MODEL CHOICE (FIXED FOR NOW)**
------------------------------------
- The chat model is fixed in this version of the app for reliability and cost
control. If you need a different model, it can be changed in code and
redeployed; the UI currently does not expose a model selector.
9) **LIVE PRICE SEARCH FOR MAJOR COINS + ROUTING**
----------------------------------------------
- If your question looks like a price query (for example: "BTC price",
"price of ETH", "SOL price in USD"), the app routes to a tool instead of RAG:
  - It calls a public price API (for example, CoinGecko) to get the latest
    price for major coins such as BTC, ETH, SOL, and XRP.
  - It can also show the Fear and Greed Index for market sentiment.
- Routing logic: the pipeline checks your query for price-intent keywords
("price", "quote", "market cap", "ATH", and similar). If matched, it uses
the tools route; otherwise it uses the RAG route (retrieve -> rerank -> answer).
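
The routing logic described above can be sketched as a simple keyword check. The keyword list below covers only the examples named in this section, and `route_query` is a hypothetical name; the real pipeline may use more keywords or a different matching strategy.

```python
import re

# Price-intent keywords named above; the app's full list ("and similar") is not shown here.
PRICE_KEYWORDS = ("price", "quote", "market cap", "ath")


def route_query(query: str) -> str:
    """Return "tools" for price-like queries, otherwise "rag"."""
    text = query.lower()
    for keyword in PRICE_KEYWORDS:
        # Word boundaries keep "ath" from matching inside words such as "path".
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return "tools"
    return "rag"


for q in ("BTC price", "SOL quote", "What is Ethereum vs Solana?"):
    print(q, "->", route_query(q))
```

Matching on whole words is a deliberate choice in this sketch: a plain substring test would send a question like "What is the path to adoption?" to the price tool.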
10) **QUICK START**
---------------
1. Initialize pipeline (if manual init is enabled).
2. Paste your OpenAI API key (not saved).
3. Add files and/or add URLs.
   - Sample .pdf and .txt files are available in /data/samples: download them to
     your drive, then multi-select them to upload.
   - See /data/samples/Links%20sample.txt for example links.
4. Click Build Index.
5. Ask questions, for example: "How does Ethereum compare with Solana?" or "What are Bitcoin's strengths and weaknesses?"
6. For prices, try queries like "ETH price", "SOL quote", "XRP price in USD".
NOTES
-----
- This tool is for research and education only. It is not financial advice.
- For best results, use focused, well-structured documents and reputable URLs.
### Installation and Execution (for Gradio UI)
---------------
1. **Create a new Python environment:**
```bash
python -m venv .venv
```
2. **Activate the environment:**
For macOS and Linux:
```bash
source .venv/bin/activate
```
For Windows:
```bash
.venv\Scripts\activate
```
3. **Install the dependencies:**
```bash
pip install -r requirements.txt
```
4. **Run the application:**
```bash
python app.py
```