Julian Vanecek committed on
Commit
3151bfa
·
1 Parent(s): 6edaf19
backend/FAQ_MANAGEMENT.md ADDED
@@ -0,0 +1,77 @@
# FAQ Management Guide

This guide explains how to manage FAQ documents in the OpenAI Chatbot MCP system.

## Initial Setup (Without FAQ Documents)

1. **Upgrade the OpenAI library**:
```bash
pip install --upgrade "openai>=1.50.0"
```

2. **Create vector stores (skipping the empty FAQ store)**:
```bash
python backend/upload_versioned_pdfs.py
```
This will:
- Create vector stores for all versions that have PDFs
- Skip the general_faq store, since no FAQ documents exist yet
- Save the configuration with the actual vector store IDs

## Adding FAQ Documents Later

### Option 1: Add to an Existing FAQ Store

If you created an empty FAQ store:
```bash
# Add a single FAQ document
python backend/add_to_vector_store.py add general_faq /path/to/faq.pdf

# Add multiple FAQ documents
python backend/add_to_vector_store.py add general_faq /path/to/faq1.pdf /path/to/faq2.pdf
```

### Option 2: Create the FAQ Store First

If you skipped the FAQ store initially:
```bash
# Create the FAQ store
python backend/add_to_vector_store.py create general_faq \
    --name "General FAQ and Overview" \
    --description "General information, FAQs, and cross-version content"

# Then add documents
python backend/add_to_vector_store.py add general_faq /path/to/faq.pdf
```

## Listing Available Stores

To see all configured vector stores:
```bash
python backend/add_to_vector_store.py list
```

## FAQ Document Naming

For automatic detection in future runs, name FAQ documents with one of these keywords:
- `faq` - e.g., `product_faq.pdf`
- `general` - e.g., `general_overview.pdf`
- `overview` - e.g., `platform_overview.pdf`
- `comparison` - e.g., `version_comparison.pdf`

## Full Re-upload with FAQ

Once you have FAQ documents in the `/pdfs` directory:
```bash
# This will detect and upload FAQ documents automatically
python backend/upload_versioned_pdfs.py
```

## Forcing Empty Store Creation

To create all stores, including empty ones:
```bash
python backend/upload_versioned_pdfs.py --create-empty
```

This is useful if you want all stores ready even before any documents exist.
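The commands above read and write `config/vector_stores.json`. Judging from how the scripts use it (a `vector_stores` name-to-ID map, plus a `descriptions` map added when stores are created), the file plausibly looks like the following; the store IDs are placeholders, not real values:

```json
{
  "vector_stores": {
    "harmony_1_8": "vs_xxxxxxxxxxxxxxxx",
    "general_faq": "vs_yyyyyyyyyyyyyyyy"
  },
  "descriptions": {
    "general_faq": "General information, FAQs, and cross-version content"
  }
}
```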
backend/IMPORTANT_API_CHANGES.md ADDED
@@ -0,0 +1,43 @@
# Important: OpenAI API Changes

## Vector Stores API Location

As of OpenAI Python SDK 1.93.x, the vector stores API has moved out of the beta namespace:

- **OLD**: `client.beta.vector_stores`
- **NEW**: `client.vector_stores`

## How Vector Stores Work

Vector stores are designed to work with the Assistants API:

1. **Create vector stores**: `client.vector_stores.create()`
2. **Upload files to stores**: `client.vector_stores.files.create()`
3. **Use with assistants**: vector stores are queried through assistants via the file_search tool

## The Architecture

```
Vector Stores (storage) -> Assistants (query interface) -> Threads (conversations)
```

## Current Implementation Status

1. **upload_versioned_pdfs.py**: ✅ Fixed to use `client.vector_stores`
2. **add_to_vector_store.py**: ✅ Fixed to use `client.vector_stores`
3. **vector_store_manager.py**: ❌ Still needs an assistant created for querying

## Next Steps

To query vector stores properly, you need to:

1. Create an assistant with the file_search capability
2. Attach the vector stores to the assistant
3. Use threads to query the assistant

Alternative approach:
- Use the OpenAI embeddings API directly
- Store embeddings in a local database
- Implement your own similarity search

This avoids the complexity of the Assistants API but requires more implementation work.
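The do-it-yourself alternative can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `embed` callable stands in for a real embedding call (e.g. the OpenAI embeddings API) and is left abstract so the search logic itself is clear.

```python
import math
from typing import Callable, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, documents: List[str],
           embed: Callable[[str], List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by embedding similarity to the query."""
    query_vec = embed(query)
    scored = [(doc, cosine_similarity(query_vec, embed(doc))) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In practice the document embeddings would be computed once and cached in the local database, with only the query embedded per request.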
backend/__init__.py ADDED
@@ -0,0 +1 @@
# Backend package
backend/add_to_vector_store.py ADDED
@@ -0,0 +1,195 @@
#!/usr/bin/env python3
"""
Add documents to existing OpenAI vector stores.
Useful for adding FAQ documents or updating existing stores.
"""

import os
import json
import sys
import time
import argparse
from pathlib import Path
from typing import List, Optional

from openai import OpenAI, __version__ as openai_version
from packaging import version


class VectorStoreUpdater:
    def __init__(self, api_key: Optional[str] = None):
        """Initialize the updater with an OpenAI client."""
        # Check the installed OpenAI SDK version
        if version.parse(openai_version) < version.parse("1.50.0"):
            print(f"Error: OpenAI library version {openai_version} is too old.")
            print("Vector stores require version 1.50.0 or higher.")
            print('Please run: pip install --upgrade "openai>=1.50.0"')
            sys.exit(1)

        self.client = OpenAI(api_key=api_key or os.getenv("OPENAI_API_KEY"))
        self.config_path = Path(__file__).parent.parent / "config" / "vector_stores.json"
        self.load_config()

    def load_config(self):
        """Load the vector store configuration."""
        if not self.config_path.exists():
            print(f"Error: Configuration file not found at {self.config_path}")
            print("Please run upload_versioned_pdfs.py first to create vector stores.")
            sys.exit(1)

        with open(self.config_path, 'r') as f:
            self.config = json.load(f)
        self.vector_stores = self.config.get('vector_stores', {})

    def list_stores(self):
        """List all available vector stores."""
        print("\nAvailable vector stores:")
        for store_name, store_id in self.vector_stores.items():
            print(f"  - {store_name}: {store_id}")

    def add_file_to_store(self, store_name: str, file_path: Path) -> bool:
        """Add a file to an existing vector store."""
        if store_name not in self.vector_stores:
            print(f"Error: Vector store '{store_name}' not found.")
            self.list_stores()
            return False

        store_id = self.vector_stores[store_name]
        print(f"Adding {file_path.name} to {store_name} ({store_id})...")

        try:
            # Upload the file
            with open(file_path, "rb") as file:
                file_upload = self.client.files.create(
                    file=file,
                    purpose="assistants"
                )

            # Attach the file to the vector store
            self.client.vector_stores.files.create(
                vector_store_id=store_id,
                file_id=file_upload.id
            )

            # Wait for processing
            while True:
                file_status = self.client.vector_stores.files.retrieve(
                    vector_store_id=store_id,
                    file_id=file_upload.id
                )
                if file_status.status == "completed":
                    print(f"✓ Successfully added {file_path.name}")
                    return True
                elif file_status.status == "failed":
                    print(f"✗ Failed to process {file_path.name}")
                    return False
                time.sleep(2)

        except Exception as e:
            print(f"✗ Error adding file: {str(e)}")
            return False

    def add_multiple_files(self, store_name: str, file_paths: List[Path]):
        """Add multiple files to a vector store."""
        if not file_paths:
            print("No files to add.")
            return

        print(f"\nAdding {len(file_paths)} files to {store_name}...")
        success_count = 0

        for file_path in file_paths:
            if self.add_file_to_store(store_name, file_path):
                success_count += 1

        print(f"\n✓ Successfully added {success_count}/{len(file_paths)} files")

    def create_empty_store(self, store_name: str, name: str, description: str) -> Optional[str]:
        """Create a new empty vector store."""
        if store_name in self.vector_stores:
            print(f"Error: Vector store '{store_name}' already exists.")
            return None

        print(f"Creating new vector store: {name}")
        # Note: the description parameter is no longer supported by the API,
        # so the description is stored in the local config instead.
        try:
            vector_store = self.client.vector_stores.create(
                name=name
            )

            # Update config
            self.vector_stores[store_name] = vector_store.id
            self.config['vector_stores'] = self.vector_stores

            # Store the description in config since the API no longer supports it
            if 'descriptions' not in self.config:
                self.config['descriptions'] = {}
            self.config['descriptions'][store_name] = description

            with open(self.config_path, 'w') as f:
                json.dump(self.config, f, indent=2)

            print(f"✓ Created vector store: {store_name} ({vector_store.id})")
            return vector_store.id

        except Exception as e:
            print(f"✗ Error creating vector store: {str(e)}")
            return None


def main():
    """Main function."""
    parser = argparse.ArgumentParser(description="Add documents to OpenAI vector stores")

    subparsers = parser.add_subparsers(dest='command', help='Commands')

    # List command
    subparsers.add_parser('list', help='List available vector stores')

    # Add command
    add_parser = subparsers.add_parser('add', help='Add files to a vector store')
    add_parser.add_argument('store_name', help='Name of the vector store (e.g., general_faq)')
    add_parser.add_argument('files', nargs='+', help='Files to add')

    # Create command
    create_parser = subparsers.add_parser('create', help='Create a new empty vector store')
    create_parser.add_argument('store_name', help='Internal name (e.g., general_faq)')
    create_parser.add_argument('--name', required=True, help='Display name')
    create_parser.add_argument('--description', required=True, help='Description')

    args = parser.parse_args()

    if not args.command:
        parser.print_help()
        return

    # Check for API key
    if not os.getenv("OPENAI_API_KEY"):
        print("Error: OPENAI_API_KEY environment variable not set")
        return

    updater = VectorStoreUpdater()

    if args.command == 'list':
        updater.list_stores()

    elif args.command == 'add':
        # Resolve and validate file paths
        file_paths = []
        for file_arg in args.files:
            file_path = Path(file_arg)
            if not file_path.exists():
                print(f"Warning: File not found: {file_path}")
            else:
                file_paths.append(file_path)

        if file_paths:
            updater.add_multiple_files(args.store_name, file_paths)
        else:
            print("No valid files to add.")

    elif args.command == 'create':
        updater.create_empty_store(args.store_name, args.name, args.description)


if __name__ == "__main__":
    main()
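One caveat in `add_file_to_store`: the processing loop polls forever if a file never reaches a terminal state. A timeout guard is a small, safe improvement; the sketch below isolates just that pattern, with `retrieve_status` standing in for the status-retrieval call so the logic can be exercised without an API key.

```python
import time
from typing import Callable


def wait_for_processing(retrieve_status: Callable[[], str],
                        timeout: float = 300.0,
                        poll_interval: float = 2.0) -> str:
    """Poll until the file reaches a terminal state or the timeout expires.

    Returns "completed", "failed", or "timeout".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = retrieve_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    return "timeout"
```

A "timeout" result would then be reported like a failure instead of hanging the upload script.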
backend/chatbot_backend.py ADDED
@@ -0,0 +1,274 @@
"""
OpenAI Chatbot Backend with Multi-Vector Store Support and MCP-style Tools
"""

import os
import json
import time
import logging
from typing import Dict, List, Optional, Generator
from pathlib import Path

from openai import OpenAI
import tiktoken

from .vector_store_manager import VectorStoreManager
from .document_reader import DocumentReader
from ..tools.vector_search_tool import (
    get_vector_search_tool_definition,
    execute_vector_search,
    format_search_results_for_context
)
from ..tools.document_reader_tool import (
    get_document_reader_tool_definition,
    execute_document_read,
    format_document_content_for_context
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ChatbotBackend:
    def __init__(self, api_key: Optional[str] = None):
        """Initialize the chatbot backend."""
        self.client = OpenAI(api_key=api_key or os.getenv("OPENAI_API_KEY"))
        self.vector_store_manager = VectorStoreManager(self.client)
        self.document_reader = DocumentReader()

        # Load configuration
        config_path = Path(__file__).parent.parent / "config" / "openai_config.json"
        with open(config_path, 'r') as f:
            self.config = json.load(f)

        # Initialize tokenizer for token counting
        self.encoding = tiktoken.encoding_for_model("gpt-4o")

        # Define available tools
        self.tools = [
            get_vector_search_tool_definition(),
            get_document_reader_tool_definition()
        ]

    def count_tokens(self, text: str) -> int:
        """Count tokens in text."""
        return len(self.encoding.encode(text))

    def query_with_version(self, query: str, product: str, version: str,
                           custom_prompt: Optional[str] = None,
                           model: str = "gpt-4o",
                           temperature: float = 0.7,
                           max_tokens: int = 4000) -> Generator[Dict, None, None]:
        """
        Query the chatbot with automatic version-specific and general context.
        Yields streaming responses.
        """
        start_time = time.time()

        # Query both version-specific and general vector stores
        version_results, general_results = self.vector_store_manager.query_version_and_general(
            product, version, query, max_results=self.config.get("max_chunks", 10)
        )

        # Format context from vector store results
        context = self.vector_store_manager.format_search_results(
            version_results, general_results, product, version
        )

        # Build the enhanced query
        enhanced_query = f"{context}\n\nUser Question: {query}"

        # Prepend the custom prompt if provided
        if custom_prompt:
            enhanced_query = f"{custom_prompt}\n\n{enhanced_query}"

        # Create messages
        messages = [
            {
                "role": "system",
                "content": (
                    f"You are an expert assistant for {product.capitalize()} version {version}. "
                    f"You have access to version-specific documentation and general information. "
                    f"You can use the provided tools to search for more information or read specific document pages. "
                    f"Always provide accurate, version-specific answers based on the documentation."
                )
            },
            {"role": "user", "content": enhanced_query}
        ]

        # Count input tokens
        input_tokens = sum(self.count_tokens(msg["content"]) for msg in messages)

        # Stream the response with function calling enabled
        try:
            stream = self.client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                stream=True,
                tools=self.tools,
                tool_choice="auto"
            )

            # Track usage
            output_tokens = 0
            full_response = ""
            tool_calls = []
            current_tool_call = None

            for chunk in stream:
                delta = chunk.choices[0].delta

                # Accumulate streamed tool-call fragments
                if delta.tool_calls:
                    for tool_call_delta in delta.tool_calls:
                        if tool_call_delta.id:
                            # A new tool call begins; store the previous one
                            if current_tool_call:
                                tool_calls.append(current_tool_call)
                            current_tool_call = {
                                "id": tool_call_delta.id,
                                "type": "function",
                                "function": {
                                    "name": tool_call_delta.function.name if tool_call_delta.function else "",
                                    "arguments": ""
                                }
                            }

                        if tool_call_delta.function and tool_call_delta.function.arguments:
                            current_tool_call["function"]["arguments"] += tool_call_delta.function.arguments

                # Handle regular content
                if delta.content:
                    output_tokens += self.count_tokens(delta.content)
                    full_response += delta.content

                    yield {
                        "type": "content",
                        "content": delta.content,
                        "done": False
                    }

                # Check whether the model stopped to call tools
                if chunk.choices[0].finish_reason == "tool_calls":
                    # Add the last tool call
                    if current_tool_call:
                        tool_calls.append(current_tool_call)

                    # Execute tool calls
                    tool_results = self._execute_tool_calls(tool_calls)

                    # Continue the conversation with the tool results
                    messages.append({
                        "role": "assistant",
                        "content": full_response,
                        "tool_calls": tool_calls
                    })

                    for tool_result in tool_results:
                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_result["tool_call_id"],
                            "content": tool_result["content"]
                        })

                    # Get the follow-up response
                    follow_up_stream = self.client.chat.completions.create(
                        model=model,
                        messages=messages,
                        temperature=temperature,
                        max_tokens=max_tokens,
                        stream=True
                    )

                    for follow_up_chunk in follow_up_stream:
                        if follow_up_chunk.choices[0].delta.content:
                            content = follow_up_chunk.choices[0].delta.content
                            output_tokens += self.count_tokens(content)
                            full_response += content

                            yield {
                                "type": "content",
                                "content": content,
                                "done": False
                            }

            # Calculate final metrics
            end_time = time.time()
            response_time = end_time - start_time

            # Calculate costs (per-million-token rates from the config)
            model_info = self.config["models"].get(model, {})
            input_cost = (input_tokens / 1_000_000) * model_info.get("input_cost", 0)
            output_cost = (output_tokens / 1_000_000) * model_info.get("output_cost", 0)
            total_cost = input_cost + output_cost

            # Yield final metadata
            yield {
                "type": "metadata",
                "done": True,
                "usage": {
                    "input_tokens": input_tokens,
                    "output_tokens": output_tokens,
                    "total_tokens": input_tokens + output_tokens
                },
                "cost": {
                    "input": round(input_cost, 4),
                    "output": round(output_cost, 4),
                    "total": round(total_cost, 4)
                },
                "response_time": round(response_time, 2),
                "model": model,
                "version_context": f"{product.capitalize()} {version}"
            }

        except Exception as e:
            logger.error(f"Error in chat completion: {str(e)}")
            yield {
                "type": "error",
                "error": str(e),
                "done": True
            }

    def _execute_tool_calls(self, tool_calls: List[Dict]) -> List[Dict]:
        """Execute tool calls and return results."""
        results = []

        for tool_call in tool_calls:
            function_name = tool_call["function"]["name"]
            arguments = json.loads(tool_call["function"]["arguments"])

            if function_name == "search_vector_store":
                result = execute_vector_search(
                    self.vector_store_manager,
                    arguments["query"],
                    arguments["vector_store_name"],
                    arguments.get("max_results", 5)
                )
                content = format_search_results_for_context(result)

            elif function_name == "read_document_pages":
                result = execute_document_read(
                    self.document_reader,
                    arguments["document_name"],
                    arguments.get("page_numbers")
                )
                content = format_document_content_for_context(result)

            else:
                content = f"Unknown function: {function_name}"

            results.append({
                "tool_call_id": tool_call["id"],
                "content": content
            })

        return results

    def get_available_versions(self) -> Dict[str, List[str]]:
        """Get all available product versions."""
        return self.vector_store_manager.list_available_versions()

    def get_available_models(self) -> Dict[str, Dict]:
        """Get available models and their information."""
        return self.config["models"]
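The trickiest part of the backend above is reassembling tool calls from streamed deltas: a fragment carrying an `id` opens a new call, and subsequent fragments append argument text. Isolated from the SDK, the pattern looks like this (deltas are simplified dicts rather than SDK objects, so this is an illustrative sketch of the accumulation logic, not the class's actual code):

```python
from typing import Dict, List, Optional


def accumulate_tool_calls(deltas: List[Dict]) -> List[Dict]:
    """Merge streamed tool-call fragments into complete call records.

    A fragment with an "id" starts a new call; fragments without one
    append argument text to the call currently being built.
    """
    calls: List[Dict] = []
    current: Optional[Dict] = None

    for delta in deltas:
        if delta.get("id"):
            # A new call begins; flush the one in progress
            if current:
                calls.append(current)
            current = {
                "id": delta["id"],
                "type": "function",
                "function": {"name": delta.get("name", ""), "arguments": ""},
            }
        if current and delta.get("arguments"):
            current["function"]["arguments"] += delta["arguments"]

    if current:
        calls.append(current)
    return calls
```

Note the final flush after the loop, which corresponds to the backend appending `current_tool_call` when `finish_reason == "tool_calls"` arrives.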
backend/document_reader.py ADDED
@@ -0,0 +1,195 @@
"""
Document Reader for page-level document access.
"""

import json
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional, Union

logger = logging.getLogger(__name__)


class DocumentReader:
    def __init__(self, pages_dir: Optional[Path] = None):
        """Initialize the document reader."""
        self.pages_dir = pages_dir or Path(__file__).parent.parent / "pages"
        self.document_index = self._load_document_index()

    def _load_document_index(self) -> Dict:
        """Load the document index if available."""
        index_path = self.pages_dir / "document_index.json"
        if index_path.exists():
            try:
                with open(index_path, 'r') as f:
                    return json.load(f)
            except Exception as e:
                logger.error(f"Error loading document index: {e}")
        return {}

    def _normalize_document_name(self, document_name: str) -> str:
        """Normalize a document name for consistent file matching."""
        # Replace separators that vary between sources
        name = document_name.strip()
        name = name.replace(" ", "_")
        name = name.replace(".", "_")

        # Collapse known guide names into their canonical camel-case form
        if not name.endswith(("UserGuide", "InstallationGuide", "QuickStartGuide")):
            if "user" in name.lower() and "guide" in name.lower():
                name = name.replace("User_Guide", "UserGuide")
            elif "installation" in name.lower() and "guide" in name.lower():
                name = name.replace("Installation_Guide", "InstallationGuide")
            elif "quick" in name.lower() and "start" in name.lower():
                name = name.replace("Quick_Start_Guide", "QuickStartGuide")

        return name

    def get_table_of_contents(self, document_name: str) -> Optional[str]:
        """Get the table of contents for a document."""
        normalized_name = self._normalize_document_name(document_name)
        toc_filename = f"{normalized_name}_TOC.txt"
        toc_path = self.pages_dir / toc_filename

        if not toc_path.exists():
            # Try alternative naming conventions
            alternatives = [
                f"{document_name}_TOC.txt",
                f"{document_name.replace(' ', '_')}_TOC.txt",
                f"{document_name.replace('.', '_')}_TOC.txt"
            ]

            for alt in alternatives:
                alt_path = self.pages_dir / alt
                if alt_path.exists():
                    toc_path = alt_path
                    break

        if toc_path.exists():
            try:
                with open(toc_path, 'r', encoding='utf-8') as f:
                    return f.read()
            except Exception as e:
                logger.error(f"Error reading TOC file {toc_path}: {e}")
                return None

        logger.warning(f"TOC file not found for document: {document_name}")
        return None

    def read_pages(self, document_name: str, page_numbers: Optional[List[int]] = None) -> Union[str, Dict[int, str]]:
        """
        Read specific pages from a document.
        If page_numbers is None, returns the table of contents.
        """
        if page_numbers is None:
            # Return the table of contents
            toc = self.get_table_of_contents(document_name)
            if toc:
                return f"Table of Contents for {document_name}:\n\n{toc}"
            else:
                return f"Table of contents not found for document: {document_name}"

        # Read the requested pages
        normalized_name = self._normalize_document_name(document_name)
        pages_content = {}

        for page_num in page_numbers:
            page_filename = f"{normalized_name}_page_{page_num:03d}.txt"
            page_path = self.pages_dir / page_filename

            if not page_path.exists():
                # Try alternative formats
                alternatives = [
                    f"{document_name}_page_{page_num:03d}.txt",
                    f"{document_name.replace(' ', '_')}_page_{page_num:03d}.txt",
                    f"{document_name.replace('.', '_')}_page_{page_num:03d}.txt"
                ]

                for alt in alternatives:
                    alt_path = self.pages_dir / alt
                    if alt_path.exists():
                        page_path = alt_path
                        break

            if page_path.exists():
                try:
                    with open(page_path, 'r', encoding='utf-8') as f:
                        pages_content[page_num] = f.read()
                except Exception as e:
                    logger.error(f"Error reading page {page_num} from {document_name}: {e}")
                    pages_content[page_num] = f"Error reading page {page_num}"
            else:
                pages_content[page_num] = f"Page {page_num} not found"

        # Format the output
        if len(pages_content) == 1:
            page_num = list(pages_content.keys())[0]
            return f"Page {page_num} of {document_name}:\n\n{pages_content[page_num]}"
        else:
            formatted_pages = []
            for page_num in sorted(pages_content.keys()):
                formatted_pages.append(f"=== Page {page_num} ===\n{pages_content[page_num]}")
            return f"Pages from {document_name}:\n\n" + "\n\n".join(formatted_pages)

    def list_available_documents(self) -> List[str]:
        """List all available documents."""
        documents = set()

        # Scan for TOC files
        for toc_file in self.pages_dir.glob("*_TOC.txt"):
            doc_name = toc_file.stem.replace("_TOC", "")
            documents.add(doc_name)

        # Also check the document index
        if self.document_index:
            documents.update(self.document_index.keys())

        return sorted(documents)

    def get_document_info(self, document_name: str) -> Dict[str, Any]:
        """Get information about a document (number of pages, etc.)."""
        normalized_name = self._normalize_document_name(document_name)
        info = {
            "name": document_name,
            "normalized_name": normalized_name,
            "has_toc": False,
            "page_count": 0,
            "available_pages": []
        }

        # Check for a TOC
        toc_path = self.pages_dir / f"{normalized_name}_TOC.txt"
        info["has_toc"] = toc_path.exists()

        # Count pages
        page_pattern = f"{normalized_name}_page_*.txt"
        page_files = list(self.pages_dir.glob(page_pattern))

        if not page_files:
            # Try alternative patterns
            for alt_pattern in [f"{document_name}_page_*.txt",
                                f"{document_name.replace(' ', '_')}_page_*.txt"]:
                page_files = list(self.pages_dir.glob(alt_pattern))
                if page_files:
                    break

        if page_files:
            page_numbers = []
            for page_file in page_files:
                try:
                    # Extract the page number from the filename
                    page_num_str = page_file.stem.split("_page_")[-1]
                    page_numbers.append(int(page_num_str))
                except ValueError:
                    pass

            info["page_count"] = len(page_numbers)
            info["available_pages"] = sorted(page_numbers)

        return info
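The page files follow a zero-padded naming convention, which matters when pages are requested by number. A standalone sketch of that convention (mirroring the normalization in `_normalize_document_name`, minus its guide-name special cases):

```python
def page_filename(document: str, page: int) -> str:
    """Build the on-disk filename for one extracted page.

    Spaces and dots in the document name become underscores, and the
    page number is zero-padded to three digits.
    """
    normalized = document.strip().replace(" ", "_").replace(".", "_")
    return f"{normalized}_page_{page:03d}.txt"
```

The zero padding keeps lexicographic and numeric page order in agreement for documents up to 999 pages.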
backend/test_pdf_mapping.py ADDED
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
"""Test script to verify PDF mapping before uploading."""

from upload_versioned_pdfs import VectorStoreUploader


def main():
    """Test PDF file detection and mapping."""
    uploader = VectorStoreUploader()

    print("PDF Directory:", uploader.pdf_directory)
    print("Directory exists:", uploader.pdf_directory.exists())
    print()

    if uploader.pdf_directory.exists():
        all_pdfs = list(uploader.pdf_directory.glob("*.pdf"))
        print(f"Total PDFs found: {len(all_pdfs)}")
        print("\nAll PDF files:")
        for pdf in sorted(all_pdfs):
            print(f"  - {pdf.name}")
        print()

        pdf_mapping = uploader.get_pdf_files()

        print("\nPDF Mapping by Version:")
        for store_name, pdf_files in pdf_mapping.items():
            print(f"\n{store_name}: ({len(pdf_files)} files)")
            for pdf in pdf_files:
                print(f"  - {pdf.name}")


if __name__ == "__main__":
    main()
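The mapping this script prints keys entirely off substrings in the lowercase filename. A standalone version of that classification rule, using the same product names, version strings, and FAQ keywords as `get_pdf_files` (extracted here purely for illustration):

```python
from typing import Optional

VERSION_KEYS = {
    "harmony": ["1.2", "1.5", "1.6", "1.8"],
    "chorus": ["1.1"],
}
FAQ_KEYWORDS = ("faq", "general", "overview", "comparison")


def classify_pdf(filename: str) -> Optional[str]:
    """Return the store name a PDF belongs to, or None if unrecognized."""
    name = filename.lower()
    for product, versions in VERSION_KEYS.items():
        if product in name:
            for ver in versions:
                # "r1.8" also matches, since it contains "1.8"
                if ver in name:
                    return f"{product}_{ver.replace('.', '_')}"
            return None  # known product, unknown version
    if any(keyword in name for keyword in FAQ_KEYWORDS):
        return "general_faq"
    return None
```

One consequence worth testing for: a file like `Harmony_overview.pdf` is claimed by the `harmony` branch and never reaches the FAQ keywords, so it maps to no store at all.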
backend/upload_versioned_pdfs.py ADDED
@@ -0,0 +1,239 @@
#!/usr/bin/env python3
"""
Upload versioned PDFs to separate OpenAI vector stores.
Creates one vector store per version and one general/FAQ store.
"""

import os
import json
import sys
import time
from pathlib import Path
from typing import Dict, List, Optional
from datetime import datetime

from openai import OpenAI, __version__ as openai_version
from packaging import version


class VectorStoreUploader:
    def __init__(self, api_key: Optional[str] = None, skip_empty: bool = True):
        """Initialize the uploader with an OpenAI client.

        Args:
            api_key: OpenAI API key
            skip_empty: Skip creation of empty vector stores
        """
        # Check the installed OpenAI SDK version
        if version.parse(openai_version) < version.parse("1.50.0"):
            print(f"Error: OpenAI library version {openai_version} is too old.")
            print("Vector stores require version 1.50.0 or higher.")
            print('Please run: pip install --upgrade "openai>=1.50.0"')
            sys.exit(1)

        self.client = OpenAI(api_key=api_key or os.getenv("OPENAI_API_KEY"))
        self.config_path = Path(__file__).parent.parent / "config" / "vector_stores.json"
        self.pdf_directory = Path("/Users/jsv/Work/ataya/concert-master/pdfs")
        self.skip_empty = skip_empty

    def create_vector_store(self, name: str, description: str) -> str:
        """Create a new vector store and return its ID."""
        print(f"Creating vector store: {name}")
        # Note: the description parameter is no longer supported by the API
        vector_store = self.client.vector_stores.create(
            name=name
        )
        return vector_store.id

    def upload_file_to_store(self, vector_store_id: str, file_path: Path) -> str:
        """Upload a file to a vector store."""
        print(f"  Uploading {file_path.name}...")

        # Upload the file
        with open(file_path, "rb") as file:
            file_upload = self.client.files.create(
                file=file,
                purpose="assistants"
            )

        # Attach the file to the vector store
        self.client.vector_stores.files.create(
            vector_store_id=vector_store_id,
            file_id=file_upload.id
        )

        # Wait for processing
        while True:
            file_status = self.client.vector_stores.files.retrieve(
                vector_store_id=vector_store_id,
                file_id=file_upload.id
            )
            if file_status.status == "completed":
                print(f"  ✓ {file_path.name} processed successfully")
                break
            elif file_status.status == "failed":
                print(f"  ✗ {file_path.name} failed to process")
                break
            time.sleep(2)

        return file_upload.id

    def get_pdf_files(self) -> Dict[str, List[Path]]:
        """Organize PDF files by version."""
        pdf_mapping = {
            "harmony_1_2": [],
            "harmony_1_5": [],
            "harmony_1_6": [],
            "harmony_1_8": [],
            "chorus_1_1": [],
            "general_faq": []
        }

        if not self.pdf_directory.exists():
            print(f"PDF directory not found: {self.pdf_directory}")
            return pdf_mapping

        # Map file patterns to versions
        for pdf_file in self.pdf_directory.glob("*.pdf"):
            filename = pdf_file.name.lower()

            # Check for Harmony versions
            if "harmony" in filename:
                if "1.2" in filename or "r1.2" in filename:
                    pdf_mapping["harmony_1_2"].append(pdf_file)
                elif "1.5" in filename or "r1.5" in filename:
                    pdf_mapping["harmony_1_5"].append(pdf_file)
                elif "1.6" in filename or "r1.6" in filename:
                    pdf_mapping["harmony_1_6"].append(pdf_file)
                elif "1.8" in filename or "r1.8" in filename:
                    pdf_mapping["harmony_1_8"].append(pdf_file)

            # Check for Chorus versions
            elif "chorus" in filename:
                if "1.1" in filename or "r1.1" in filename:
                    pdf_mapping["chorus_1_1"].append(pdf_file)

            # General/FAQ documents
            elif any(keyword in filename for keyword in ["faq", "general", "overview", "comparison"]):
                pdf_mapping["general_faq"].append(pdf_file)

        return pdf_mapping

    def upload_all_pdfs(self):
        """Create vector stores and upload all PDFs."""
        pdf_mapping = self.get_pdf_files()
        vector_stores = {}
        descriptions = {}

        # Create vector stores and upload files
        for store_name, pdf_files in pdf_mapping.items():
            if not pdf_files:
                if self.skip_empty:
                    print(f"\nNo PDFs found for {store_name}, skipping...")
                    continue
                else:
                    print(f"\nNo PDFs found for {store_name}, but creating empty store...")

            # Create a descriptive name and description
            if store_name == "general_faq":
                name = "General FAQ and Overview"
                description = "General information, FAQs, and cross-version content"
140
+ else:
141
+ product, version = store_name.split("_", 1)
142
+ version_display = version.replace("_", ".")
143
+ name = f"{product.capitalize()} {version_display}"
144
+ description = f"Documentation for {product.capitalize()} version {version_display}"
145
+
146
+ # Create vector store
147
+ vector_store_id = self.create_vector_store(name, description)
148
+ vector_stores[store_name] = vector_store_id
149
+ descriptions[store_name] = description
150
+
151
+ # Upload files
152
+ print(f"\nUploading {len(pdf_files)} files to {name}:")
153
+ for pdf_file in pdf_files:
154
+ self.upload_file_to_store(vector_store_id, pdf_file)
155
+
156
+ # Save configuration
157
+ self.save_config(vector_stores, descriptions)
158
+
159
+ return vector_stores
160
+
161
+ def save_config(self, vector_stores: Dict[str, str], descriptions: Dict[str, str]):
162
+ """Save vector store configuration."""
163
+ config = {
164
+ "vector_stores": vector_stores,
165
+ "descriptions": descriptions,
166
+ "latest_versions": {
167
+ "harmony": "1.8",
168
+ "chorus": "1.1"
169
+ },
170
+ "created_at": datetime.now().isoformat(),
171
+ "chunk_size": 1000,
172
+ "max_chunks": 10
173
+ }
174
+
175
+ # Ensure config directory exists
176
+ self.config_path.parent.mkdir(parents=True, exist_ok=True)
177
+
178
+ # Save configuration
179
+ with open(self.config_path, "w") as f:
180
+ json.dump(config, f, indent=2)
181
+
182
+ print(f"\nConfiguration saved to: {self.config_path}")
183
+ print(json.dumps(config, indent=2))
184
+
185
+
186
+ def main():
187
+ """Main function to run the upload process."""
188
+ import argparse
189
+
190
+ parser = argparse.ArgumentParser(description="Upload PDFs to OpenAI vector stores")
191
+ parser.add_argument(
192
+ "--create-empty",
193
+ action="store_true",
194
+ help="Create empty vector stores even if no PDFs are found"
195
+ )
196
+ parser.add_argument(
197
+ "--no-confirm",
198
+ action="store_true",
199
+ help="Skip confirmation prompt"
200
+ )
201
+ args = parser.parse_args()
202
+
203
+ print("OpenAI Chatbot MCP - Vector Store Setup")
204
+ print("=" * 50)
205
+
206
+ # Check for API key
207
+ if not os.getenv("OPENAI_API_KEY"):
208
+ print("Error: OPENAI_API_KEY environment variable not set")
209
+ return
210
+
211
+ # Create uploader and run
212
+ uploader = VectorStoreUploader(skip_empty=not args.create_empty)
213
+
214
+ # First, let's check what PDFs we have
215
+ print("\nScanning for PDF files...")
216
+ pdf_mapping = uploader.get_pdf_files()
217
+
218
+ print("\nFound PDFs:")
219
+ for store_name, pdf_files in pdf_mapping.items():
220
+ print(f"\n{store_name}:")
221
+ for pdf in pdf_files:
222
+ print(f" - {pdf.name}")
223
+
224
+ # Confirm before proceeding
225
+ if not args.no_confirm:
226
+ response = input("\nProceed with vector store creation? (yes/no): ")
227
+ if response.lower() != "yes":
228
+ print("Aborted.")
229
+ return
230
+
231
+ # Upload all PDFs
232
+ vector_stores = uploader.upload_all_pdfs()
233
+
234
+ print("\n✅ Vector store setup complete!")
235
+ print(f"Created {len(vector_stores)} vector stores")
236
+
237
+
238
+ if __name__ == "__main__":
239
+ main()
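
The version gate at the top of the script depends on `packaging.version` comparing releases numerically rather than lexically. A minimal stdlib-only sketch of the same check (the helper names here are illustrative, not taken from the script):

```python
def version_tuple(ver: str) -> tuple:
    """Split a dotted release string into comparable integer parts."""
    return tuple(int(part) for part in ver.split("."))

def meets_minimum(installed: str, minimum: str = "1.50.0") -> bool:
    """True when the installed version is at least the required minimum."""
    return version_tuple(installed) >= version_tuple(minimum)
```

The numeric comparison matters: as a plain string, "1.9.0" sorts after "1.50.0", but as a release it is older (9 < 50), which is exactly the case the `>=1.50.0` requirement must catch.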
backend/vector_store_manager.py ADDED
@@ -0,0 +1,178 @@
+ """
+ Vector Store Manager for handling multiple version-specific vector stores.
+ """
+
+ import os
+ import json
+ from typing import Dict, List, Optional, Tuple
+ from pathlib import Path
+ from openai import OpenAI
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+
+ class VectorStoreManager:
+     def __init__(self, client: OpenAI, config_path: Optional[Path] = None):
+         """Initialize the vector store manager."""
+         self.client = client
+         self.config_path = config_path or Path(__file__).parent.parent / "config" / "vector_stores.json"
+         self.vector_stores = {}
+         self.latest_versions = {}
+         self.load_config()
+
+     def load_config(self):
+         """Load vector store configuration from file."""
+         if not self.config_path.exists():
+             logger.warning(f"Vector store config not found at {self.config_path}")
+             return
+
+         try:
+             with open(self.config_path, 'r') as f:
+                 config = json.load(f)
+             self.vector_stores = config.get('vector_stores', {})
+             self.latest_versions = config.get('latest_versions', {})
+             logger.info(f"Loaded {len(self.vector_stores)} vector stores from config")
+         except Exception as e:
+             logger.error(f"Error loading vector store config: {e}")
+
+     def get_store_name_from_version(self, product: str, version: str) -> str:
+         """Convert product and version to store name."""
+         # Normalize version (e.g., "1.8" -> "1_8")
+         version_normalized = version.replace(".", "_")
+         return f"{product.lower()}_{version_normalized}"
+
+     def get_vector_store_id(self, store_name: str) -> Optional[str]:
+         """Get vector store ID by name."""
+         return self.vector_stores.get(store_name)
+
+     def query_vector_store(self, store_name: str, query: str, max_results: int = 5) -> List[Dict]:
+         """Query a specific vector store."""
+         store_id = self.get_vector_store_id(store_name)
+         if not store_id:
+             logger.warning(f"Vector store '{store_name}' not found")
+             return []
+
+         # Check for placeholder IDs
+         if store_id.startswith("vs_PLACEHOLDER"):
+             logger.warning(f"Vector store '{store_name}' has placeholder ID: {store_id}")
+             logger.warning("Please run upload_versioned_pdfs.py to create actual vector stores")
+             return []
+
+         try:
+             # Create a thread for the query
+             thread = self.client.beta.threads.create()
+
+             # Add the query as a message
+             self.client.beta.threads.messages.create(
+                 thread_id=thread.id,
+                 role="user",
+                 content=query
+             )
+
+             # Run the assistant with the specific vector store
+             run = self.client.beta.threads.runs.create_and_poll(
+                 thread_id=thread.id,
+                 assistant_id="asst_temp",  # This will be replaced with actual assistant ID
+                 tools=[{"type": "file_search"}],
+                 tool_resources={
+                     "file_search": {
+                         "vector_store_ids": [store_id]
+                     }
+                 }
+             )
+
+             # Get the messages
+             messages = self.client.beta.threads.messages.list(
+                 thread_id=thread.id,
+                 order="asc"
+             )
+
+             # Extract search results
+             results = []
+             for message in messages:
+                 if message.role == "assistant":
+                     for content in message.content:
+                         if content.type == "text":
+                             # Parse file search annotations
+                             annotations = content.text.annotations
+                             for annotation in annotations:
+                                 if annotation.type == "file_citation":
+                                     results.append({
+                                         "text": annotation.text,
+                                         "file_id": annotation.file_citation.file_id,
+                                         "quote": annotation.file_citation.quote
+                                     })
+
+             return results[:max_results]
+
+         except Exception as e:
+             logger.error(f"Error querying vector store '{store_name}': {e}")
+             return []
+
+     def query_version_and_general(self, product: str, version: str, query: str, max_results: int = 5) -> Tuple[List[Dict], List[Dict]]:
+         """Query both version-specific and general vector stores."""
+         # Query version-specific store
+         store_name = self.get_store_name_from_version(product, version)
+         version_results = self.query_vector_store(store_name, query, max_results)
+
+         # Query general/FAQ store
+         general_results = self.query_vector_store("general_faq", query, max_results)
+
+         return version_results, general_results
+
+     def search_across_stores(self, query: str, store_names: Optional[List[str]] = None, max_results_per_store: int = 3) -> Dict[str, List[Dict]]:
+         """Search across multiple vector stores."""
+         if store_names is None:
+             store_names = list(self.vector_stores.keys())
+
+         results = {}
+         for store_name in store_names:
+             if store_name in self.vector_stores:
+                 store_results = self.query_vector_store(store_name, query, max_results_per_store)
+                 if store_results:
+                     results[store_name] = store_results
+
+         return results
+
+     def get_latest_version(self, product: str) -> Optional[str]:
+         """Get the latest version for a product."""
+         return self.latest_versions.get(product.lower())
+
+     def list_available_versions(self) -> Dict[str, List[str]]:
+         """List all available product versions."""
+         versions = {"harmony": [], "chorus": []}
+
+         for store_name in self.vector_stores.keys():
+             if store_name == "general_faq":
+                 continue
+
+             parts = store_name.split("_", 1)
+             if len(parts) == 2:
+                 product, version = parts
+                 version_display = version.replace("_", ".")
+                 if product in versions:
+                     versions[product].append(version_display)
+
+         # Sort versions
+         for product in versions:
+             versions[product].sort(key=lambda x: [int(p) for p in x.split(".")])
+
+         return versions
+
+     def format_search_results(self, version_results: List[Dict], general_results: List[Dict], product: str, version: str) -> str:
+         """Format search results for appending to user query."""
+         formatted = []
+
+         if version_results:
+             formatted.append(f"Based on {product.capitalize()} {version} documentation:")
+             for i, result in enumerate(version_results, 1):
+                 formatted.append(f"{i}. {result.get('quote', result.get('text', ''))}")
+             formatted.append("")
+
+         if general_results:
+             formatted.append("Additional general information:")
+             for i, result in enumerate(general_results, 1):
+                 formatted.append(f"{i}. {result.get('quote', result.get('text', ''))}")
+
+         return "\n".join(formatted)
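
The manager's behavior hinges on the store-name convention (`harmony_1_8` ↔ "harmony 1.8") and on numeric version sorting in `list_available_versions`. A small standalone sketch of that round trip, using illustrative helper names rather than the class methods themselves:

```python
def to_store_name(product: str, version: str) -> str:
    """Mirror get_store_name_from_version: ('Harmony', '1.8') -> 'harmony_1_8'."""
    return f"{product.lower()}_{version.replace('.', '_')}"

def from_store_name(store_name: str) -> tuple:
    """Split a store name back into (product, dotted version)."""
    product, version = store_name.split("_", 1)
    return product, version.replace("_", ".")

def sort_versions(versions: list) -> list:
    """Numeric sort, as in list_available_versions, so '1.10' lands after '1.8'."""
    return sorted(versions, key=lambda v: [int(p) for p in v.split(".")])
```

Splitting on the first underscore only (`split("_", 1)`) is what keeps multi-part versions intact, and sorting on integer components rather than raw strings is what would order a hypothetical "1.10" after "1.8".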