WeMWish committed
Commit 8d66edb · 1 Parent(s): 9b31367

Add authentication, token quota tracking, and comprehensive usage logging


Added:
- HF OAuth integration with login overlay and session management
- Token quota system (100k default) with real-time enforcement
- Supabase database for user management and usage logging
- Complete token tracking across all OpenAI API calls:
  - GenerationAgent: Chat Completions API
  - SupervisorAgent: Assistants API
  - ExecutorAgent: Vision API (describe_image)
  - ManagerAgent: Aggregates all sub-agent usage
- Database schema (users, usage_logs, user_stats view)
- Python Supabase client and R wrapper
- OAuth helper functions for HF authentication
- .env.example configuration template

Changed:
- All agents now track and return token usage
- ManagerAgent checks quota before processing queries
- ManagerAgent logs all queries immediately after completion
- server.R integrates OAuth and Supabase initialization
- ui.R adds login overlay and OAuth callback handling
- Updated dependencies (supabase, python-dotenv, httr2)

Fixed:
- Untracked token usage from describe_image() Vision API calls
- Complete token tracking for accurate quota enforcement

Technical:
- Authentication flow: OAuth → token exchange → user creation → session storage
- Quota enforcement: check → process → aggregate → log → update usage
- Graceful degradation when Supabase not configured
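The quota-enforcement sequence above (check → process → aggregate → log → update usage) can be sketched roughly as follows; all names here (`handle_query`, `process_with_agents`, the in-memory `db` dict) are illustrative stand-ins, not the repository's actual identifiers:

```python
def process_with_agents(query):
    """Stand-in for the GenerationAgent/SupervisorAgent/ExecutorAgent calls;
    each real sub-agent returns a 'usage' dict alongside its response."""
    return [{"text": f"answer to {query!r}", "usage": {"total_tokens": 1200}}]


def handle_query(db, user_id, query, quota=100_000):
    # 1. Check quota before doing any work (queries rejected when exceeded).
    used = db.get("tokens_used", 0)
    if used >= quota:
        db.setdefault("usage_logs", []).append(
            {"user_id": user_id, "query_text": query, "error": "quota exceeded"})
        return "Quota exceeded: query blocked."

    # 2. Process the query through the sub-agents.
    results = process_with_agents(query)

    # 3. Aggregate token usage reported by every sub-agent.
    total = sum(r["usage"]["total_tokens"] for r in results)

    # 4. Log immediately after completion, before returning the response.
    db.setdefault("usage_logs", []).append(
        {"user_id": user_id, "query_text": query, "total_tokens": total})

    # 5. Update the user's running usage.
    db["tokens_used"] = used + total
    return results[-1]["text"]
```

Note that logging happens before the response is returned, matching the "logs all queries immediately after completion" behavior described above.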

.env.example ADDED
@@ -0,0 +1,23 @@
+ # TaijiChat Environment Variables Template
+ # Copy this file to .env and fill in your values (for local development)
+ # For Hugging Face Spaces, set these as secrets in your Space settings
+
+ # ===== OpenAI Configuration =====
+ OPENAI_API_KEY=your_openai_api_key_here
+
+ # ===== Supabase Configuration =====
+ # Get these from your Supabase project settings
+ SUPABASE_URL=https://your-project-id.supabase.co
+ SUPABASE_KEY=your_supabase_service_role_key_here
+
+ # ===== Hugging Face OAuth =====
+ # These are automatically populated by Hugging Face Spaces when hf_oauth: true is set
+ # Do NOT set these manually in Hugging Face Spaces
+ # For local testing, you can create an OAuth app at https://huggingface.co/settings/applications
+ # OAUTH_CLIENT_ID=your_oauth_client_id
+ # OAUTH_CLIENT_SECRET=your_oauth_client_secret
+ # OAUTH_SCOPES=openid profile email
+
+ # ===== Optional Configuration =====
+ # Enable/disable async processing (default: TRUE)
+ TAIJICHAT_USE_ASYNC=TRUE
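In line with the "graceful degradation" note in the commit message, a minimal sketch of how the app might decide whether Supabase is configured from these variables; the function name is hypothetical, not part of the repo:

```python
import os
from typing import Optional


def get_supabase_config() -> Optional[dict]:
    """Return Supabase connection settings, or None when unset so the app
    can fall back to anonymous logging instead of crashing."""
    url = os.environ.get("SUPABASE_URL", "").strip()
    key = os.environ.get("SUPABASE_KEY", "").strip()
    if not url or not key:
        return None
    return {"url": url, "key": key}
```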
.gitignore CHANGED
@@ -5,7 +5,8 @@ api_key.txt
 
 # Ignore virtual environments
 venv/
- .venv/
+ .venv//
 ENV/
 env/
 
+ ./claude
CHANGELOG.md CHANGED
@@ -2,6 +2,101 @@
 
 ## [Unreleased]
 
+ ### Added
+ - **Authentication & Access Control System**
+   - Hugging Face OAuth integration for user authentication
+   - Login overlay with "Sign in with Hugging Face" button
+   - One account per HF user (duplicate prevention)
+   - Session management with user context tracking
+
+ - **Token Quota & Budget Tracking**
+   - Per-user token quota system (default: 100,000 tokens)
+   - Real-time quota checking before query processing
+   - Quota-based access control (queries rejected when quota exceeded)
+   - Token usage tracking across all OpenAI API calls
+
+ - **Comprehensive Usage Logging**
+   - Supabase database integration for persistent storage
+   - Database schema with users and usage_logs tables
+   - Logs: user_id, timestamp, query_text, token counts, response, errors, conversation history
+   - Immediate logging after each query (before returning response)
+   - User statistics view for aggregated usage analytics
+
+ - **Complete Token Tracking**
+   - GenerationAgent: Tracks Chat Completions API usage
+   - SupervisorAgent: Tracks Assistants API usage
+   - ExecutorAgent: Tracks token usage from executed code
+   - agent_tools.py: Captures Vision API usage from describe_image()
+   - ManagerAgent: Aggregates all token usage from sub-agents
+
+ - **Database & Infrastructure**
+   - Supabase PostgreSQL database for user management
+   - Python Supabase client (utils/supabase_client.py)
+   - R wrapper for Supabase (utils/supabase_r.R)
+   - OAuth helper functions (auth/hf_oauth.R)
+   - Database indexes for performance optimization
+
+ - **Configuration Files**
+   - .env.example template for environment variables
+   - README.md OAuth metadata (hf_oauth: true)
+   - database_schema.sql for Supabase setup
+
+ ### Changed
+ - **ManagerAgent (agents/manager_agent.py)**
+   - Added Supabase client integration
+   - Added user context (user_id, hf_user_id) tracking
+   - Added quota checking before query processing
+   - Added comprehensive logging (success, error, quota exceeded)
+   - Added token aggregation from all sub-agents
+
+ - **GenerationAgent (agents/generation_agent.py)**
+   - Added token usage extraction from OpenAI response
+   - Returns usage info in response dict
+
+ - **SupervisorAgent (agents/supervisor_agent.py)**
+   - Added token usage extraction from Assistants API
+   - Returns usage info in all response paths (success and error)
+
+ - **ExecutorAgent (agents/executor_agent.py)**
+   - Added global usage collector mechanism
+   - Aggregates token usage from executed code
+   - Returns usage info in execution result
+
+ - **agent_tools.py**
+   - describe_image() now captures Vision API token usage
+   - Stores usage in global collector for ExecutorAgent
+
+ - **server.R**
+   - Added OAuth initialization
+   - Added Supabase client initialization
+   - Added quota checking in chat message handler
+   - Added user context setting in agent before queries
+   - Passes Supabase client to ManagerAgent
+
+ - **ui.R**
+   - Added login overlay for unauthenticated users
+   - Added OAuth callback JavaScript handler
+   - Added auth state UI output placeholder
+
+ - **Dependencies**
+   - requirements.txt: Added supabase>=2.0.0, python-dotenv>=1.0.0
+   - Dockerfile: Added httr2 R package for OAuth
+
+ ### Fixed
+ - Fixed untracked token usage from describe_image() API calls
+ - Fixed ExecutorAgent token tracking for Vision API usage
+ - All OpenAI API calls now tracked for accurate quota enforcement
+
+ ### Technical Details
+ - **Authentication Flow**: Login overlay → OAuth redirect → Token exchange → User creation/retrieval → Session storage
+ - **Quota Enforcement**: Check quota → Process query → Aggregate tokens → Log to database → Update user usage
+ - **Token Tracking**: GenerationAgent + SupervisorAgent + ExecutorAgent → ManagerAgent aggregation → Supabase logging
+ - **Graceful Degradation**: System works without Supabase (logs as "anonymous")
+
+ ---
+
+ ## [Previous Unreleased Features]
+
 ### Added
 - Literature search toggle button in chat interface
 - Users can now explicitly enable/disable external literature search
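The "ManagerAgent: Aggregates all token usage from sub-agents" entries above amount to summing per-agent usage dicts. A minimal sketch — the real `_aggregate_token_usage()` signature is not shown in this commit, so the shape here is assumed:

```python
from collections import Counter


def aggregate_token_usage(sub_agent_usages):
    """Sum prompt/completion/total token counts reported by each sub-agent;
    agents that made no API call may report None and are skipped."""
    totals = Counter()
    for usage in sub_agent_usages:
        if usage:
            totals.update(usage)
    return dict(totals)
```

Combining the GenerationAgent, SupervisorAgent, and ExecutorAgent reports this way yields a single dict the ManagerAgent can log to Supabase.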
CLAUDE.md CHANGED
@@ -6,6 +6,33 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 TaijiChat is a Shiny web application that combines R and Python to provide an interactive chat interface for analyzing transcription factor data from T cell states research. The application uses a multi-agent architecture with OpenAI GPT models to generate insights and visualizations from genomics datasets.
 
+ ## Project Structure
+
+ ```
+ taijichat/
+ ├── ui.R                      # Shiny UI definition and Python environment setup
+ ├── server.R                  # Shiny server logic with agent integration
+ ├── chat_ui.R                 # Chat interface UI components
+ ├── agents/                   # Python agent system
+ │   ├── manager_agent.py      # Central orchestrator
+ │   ├── generation_agent.py   # Code generation with 13-step reasoning
+ │   ├── supervisor_agent.py   # Safety validation
+ │   └── executor_agent.py     # Sandboxed code execution
+ ├── tools/
+ │   └── agent_tools.py        # All data analysis functions
+ ├── www/                      # Static assets and data files
+ │   ├── chat_script.js        # Frontend JavaScript
+ │   ├── chat_styles.css       # Custom styling
+ │   ├── tablePagerank/        # TF PageRank data
+ │   ├── waveanalysis/         # Wave analysis results
+ │   ├── TFcorintextrm/        # TF correlation data
+ │   └── tfcommunities/        # Community analysis
+ ├── R/
+ │   └── caching.R             # R-side caching utilities
+ ├── requirements.txt          # Python dependencies
+ └── Dockerfile                # Container configuration
+ ```
+
 ## Key Architecture
 
 ### Multi-Agent System
@@ -16,6 +43,13 @@ The application uses a specialized agent architecture for handling user queries:
 - **SupervisorAgent** (`agents/supervisor_agent.py`): Reviews generated code for safety and compliance before execution
 - **ExecutorAgent** (`agents/executor_agent.py`): Executes approved Python code in a restricted environment
 
+ **Query Processing Flow:**
+ 1. User query received by ManagerAgent via `process_single_query_with_preferences()`
+ 2. GenerationAgent creates execution plan
+ 3. SupervisorAgent validates generated code
+ 4. ExecutorAgent runs approved code in sandbox
+ 5. Results streamed back to UI via R callback system
+
 ### Technology Stack
 - **R (Shiny)**: Frontend web interface and server logic
 - **Python**: Backend agents and data processing tools
@@ -53,7 +87,7 @@ shiny::runApp('.', host='0.0.0.0', port=7860)
 - Create `api_key.txt` file in project root
 
 **Python Environment:**
- Configure reticulate in `ui.R` by uncommenting one of:
+ Configure reticulate in `ui.R` (lines 11-18) by uncommenting one of:
 ```r
 # Option 1: Python executable path
 reticulate::use_python("/path/to/python", required = TRUE)
@@ -61,29 +95,31 @@ reticulate::use_python("/path/to/python", required = TRUE)
 # Option 2: Virtual environment
 reticulate::use_virtualenv("venv_name", required = TRUE)
 
- # Option 3: Conda environment
+ # Option 3: Conda environment
 reticulate::use_condaenv("conda_env_name", required = TRUE)
 ```
 
+ **Docker Environment:**
+ When using Docker, Python environment is automatically configured via `RETICULATE_PYTHON` environment variable (see Dockerfile:5)
+
 **Install Python Dependencies:**
 ```bash
 pip install -r requirements.txt
 ```
 
+ **Install R Dependencies:**
+ ```r
+ install.packages(c('shiny', 'readxl', 'DT', 'dplyr', 'reticulate', 'shinythemes', 'png', 'shinyjs', 'digest'))
+ ```
+
 ### Performance Features
 
 **Async Processing (Default):**
 - Set `TAIJICHAT_USE_ASYNC=TRUE` to enable async agents (default)
 - Set `TAIJICHAT_USE_ASYNC=FALSE` to use synchronous agents
 
- **Cache Management:**
- ```r
- # Check cache statistics
- reticulate::py_run_string("
- from agents.smart_cache import get_cache_stats
- print('Cache Stats:', get_cache_stats())
- ")
- ```
+ **Module Reloading:**
+ If making changes to Python agents, the module is automatically reloaded on server startup (see server.R:89-97)
 
 ## Key Implementation Details
 
@@ -114,7 +150,7 @@ All data analysis functions are centralized in `tools/agent_tools.py`:
 ### Data Handling
 - **Pre-ranked Tables**: Never re-sort TF ranking data - tables come pre-ranked by importance
 - **Path Management**: All file paths are relative to project root via `BASE_WWW_PATH`
- - **Caching**: 5-minute TTL with 100MB memory limit for performance
+ - **Excel File Caching**: Schema information is cached to improve performance when discovering files (see tools/agent_tools.py)
 
 ### Code Generation Rules
 - Only use functions from `tools.agent_tools` module
@@ -123,22 +159,181 @@
 - All generated code must pass SupervisorAgent safety review
 
 ### UI Integration
- - Chat interface uses custom JavaScript for real-time updates
+ - Chat interface uses custom JavaScript (`www/chat_script.js`) for real-time updates
+ - Custom CSS styling in `www/chat_styles.css`
+ - Chat UI components defined in `chat_ui.R`
 - Lazy loading implemented for large image datasets
- - Progress streaming shows agent reasoning steps to users
+ - Progress streaming shows agent reasoning steps to users via callback system (see server.R:17-33)
+ - Resizable chat panel with drag handle
+
+ ## Development Workflow
+
+ ### Debugging
+ **R Console Debugging:**
+ - Print statements in `ui.R` and `server.R` appear in R console
+ - Check Python integration status with `reticulate::py_config()`
+
+ **Python Agent Debugging:**
+ - Python print statements appear in R console when running locally
+ - Agent thought callbacks stream to UI in real-time
+ - Check conversation history in agent state
+
+ **Module Reloading:**
+ When modifying Python agent code:
+ 1. Restart the Shiny app (changes auto-reload on server startup)
+ 2. For manual reload during development, use reticulate to reload modules
+
+ ### Testing
+ **Manual Testing:**
+ - Use `test_queries.txt` for common test scenarios
+ - `tested_queries.txt` contains verified working queries
+
+ **Python Syntax Validation:**
+ ```bash
+ python syntax_check.py
+ ```
 
 ## Troubleshooting
 
 **Common Issues:**
 - **reticulate errors**: Verify Python environment configuration in `ui.R`
- - **Import failures**: Ensure all requirements are installed in configured Python environment
+ - **Import failures**: Ensure all requirements are installed in configured Python environment
 - **API errors**: Check `OPENAI_API_KEY` is set correctly
 - **Performance issues**: Enable async mode with `TAIJICHAT_USE_ASYNC=TRUE`
+ - **Python module changes not reflected**: Restart Shiny app to trigger automatic reload
 
 **Asset Optimization:**
 - Images optimized to 49% of original size for faster loading
 - Backup of original assets available in `www_backup_original/`
 
+ ## Authentication & Access Control
+
+ ### Hugging Face OAuth Integration
+ **Setup:**
+ - OAuth enabled via `hf_oauth: true` in `README.md` metadata
+ - Automatically provides: `OAUTH_CLIENT_ID`, `OAUTH_CLIENT_SECRET`, `OAUTH_SCOPES`
+ - Implementation in `auth/hf_oauth.R`
+
+ **User Flow:**
+ 1. Unauthenticated users see login overlay
+ 2. Click "Sign in with Hugging Face" → OAuth flow
+ 3. After authentication, user info stored in session
+ 4. User created/retrieved from Supabase database
+ 5. Single account per HF user enforced
+
+ **Key Functions:**
+ - `initialize_oauth()` - Configure OAuth client
+ - `get_authorization_url()` - Generate OAuth URL
+ - `exchange_code_for_token()` - Get access token
+ - `get_user_info()` - Fetch HF user profile
+ - `is_authenticated(session)` - Check auth status
+
+ ### Token Quota System
+ **Configuration:**
+ - Default quota: 100,000 tokens per user (configurable in `database_schema.sql`)
+ - Tracked in Supabase `users` table
+ - Checked before each query in `manager_agent.py:_check_quota_before_processing()`
+
+ **Implementation:**
+ ```python
+ # In manager_agent.py
+ has_quota, remaining, error = self._check_quota_before_processing()
+ if not has_quota:
+     return quota_exceeded_message  # Query blocked
+ ```
+
+ **Quota Management:**
+ - Real-time usage tracking after each query
+ - Automatic increment via `update_token_usage()`
+ - View remaining quota via `user_stats` view in Supabase
+
+ ### Comprehensive Usage Logging
+ **Logging Points:**
+ - ✅ **After successful query** (manager_agent.py:500-518)
+ - ✅ **On query error** (manager_agent.py:527-538)
+ - ✅ **On quota exceeded** (manager_agent.py:472-479)
+
+ **Logged Data:**
+ ```python
+ {
+     'user_id': UUID,
+     'hf_user_id': str,
+     'query_text': str,
+     'prompt_tokens': int,
+     'completion_tokens': int,
+     'total_tokens': int,
+     'model': str,  # e.g., "gpt-4o"
+     'response_text': str,
+     'error_message': str | None,
+     'conversation_history': JSONB,
+     'is_image_response': bool,
+     'image_path': str | None
+ }
+ ```
+
+ **Database Schema:**
+ - Tables: `users`, `usage_logs` (see `database_schema.sql`)
+ - View: `user_stats` (aggregated statistics)
+ - Indexes for performance on `hf_user_id`, `timestamp`, `error_message`
+
+ ### Token Tracking Architecture
+ **Flow:**
+ 1. `GenerationAgent` captures usage from OpenAI API response
+ 2. `SupervisorAgent` captures usage from OpenAI API response
+ 3. `ManagerAgent` aggregates via `_aggregate_token_usage()`
+ 4. Total logged to Supabase immediately after query
+ 5. User's `tokens_used` incremented atomically
+
+ **Implementation:**
+ ```python
+ # In generation_agent.py and supervisor_agent.py
+ if hasattr(response, 'usage') and response.usage:
+     usage_info = {
+         'prompt_tokens': response.usage.prompt_tokens,
+         'completion_tokens': response.usage.completion_tokens,
+         'total_tokens': response.usage.total_tokens
+     }
+     parsed_response['usage'] = usage_info
+ ```
+
+ ### Environment Variables
+ **Required for Production:**
+ ```bash
+ # Supabase (get from project settings)
+ SUPABASE_URL=https://your-project.supabase.co
+ SUPABASE_KEY=your_service_role_key
+
+ # OAuth (auto-set by HF Spaces)
+ OAUTH_CLIENT_ID=auto_populated
+ OAUTH_CLIENT_SECRET=auto_populated
+ OAUTH_SCOPES=auto_populated
+
+ # OpenAI (existing)
+ OPENAI_API_KEY=your_key
+ ```
+
+ **Setup in Hugging Face Spaces:**
+ 1. Go to Space Settings → Repository secrets
+ 2. Add `SUPABASE_URL` and `SUPABASE_KEY`
+ 3. OAuth vars are auto-added when `hf_oauth: true` is set
+
+ ### Supabase Integration
+ **Python Client:** `utils/supabase_client.py`
+ - `SupabaseClient` class with methods for all operations
+ - Singleton via `get_supabase_client()`
+ - Graceful degradation if Supabase disabled
+
+ **R Interface:** `utils/supabase_r.R`
+ - Wrapper functions callable from R
+ - Uses `reticulate` to call Python client
+ - Functions: `initialize_supabase()`, `check_user_quota()`, `get_or_create_user()`, etc.
+
+ **Key Operations:**
+ - `get_or_create_user()` - Prevent duplicate accounts
+ - `check_quota()` - Returns (has_quota, remaining, used)
+ - `log_usage()` - **Called immediately after query**
+ - `update_token_usage()` - Increment user's usage
+
 ## Literature Search Toggle Feature
 
 ### **Overview**
@@ -151,7 +346,7 @@ TaijiChat includes a toggle button that allows users to control external literat
 - **Controls**: Click to enable/disable external literature search
 
 ### **Technical Implementation**
- - **Frontend**: Button state managed via JavaScript and CSS styling
+ - **Frontend**: Button state managed via JavaScript and CSS styling
 - **Backend**: Literature preference passed from R to Python agents
 - **Agent Integration**: `ManagerAgent.process_single_query_with_preferences()` method handles literature control
 - **Internal Data**: Paper-based analysis (internal dataset) remains always enabled
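The "global usage collector" that routes describe_image() Vision API usage to the ExecutorAgent can be sketched like this; the module-level list and function names are illustrative, not the repo's actual identifiers:

```python
# Module-level collector: a describe_image()-style tool appends here while
# generated code runs, and the executor drains it afterwards so Vision API
# tokens are included in the aggregated usage.
_USAGE_COLLECTOR = []


def record_usage(usage):
    """Called from inside tool functions during code execution."""
    _USAGE_COLLECTOR.append(usage)


def drain_usage():
    """Called by the executor after running generated code; empties the list
    so the next execution starts from a clean slate."""
    collected = list(_USAGE_COLLECTOR)
    _USAGE_COLLECTOR.clear()
    return collected
```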
Dockerfile CHANGED
@@ -17,7 +17,7 @@ RUN apt-get update && apt-get install -y \
 
 # Install R packages
 # Added .libPaths() to ensure installation in the main library site
- RUN R -e "print(.libPaths()); install.packages(c('shiny', 'readxl', 'DT', 'dplyr', 'reticulate', 'shinythemes', 'png', 'shinyjs', 'digest'), repos='http://cran.rstudio.com/', lib=.libPaths()[1])"
+ RUN R -e "print(.libPaths()); install.packages(c('shiny', 'readxl', 'DT', 'dplyr', 'reticulate', 'shinythemes', 'png', 'shinyjs', 'digest', 'httr2'), repos='http://cran.rstudio.com/', lib=.libPaths()[1])"
 
 # Verify reticulate installation
 RUN R -e "if (!requireNamespace('reticulate', quietly = TRUE)) { stop('reticulate package not found after installation') } else { print(paste('reticulate version:', packageVersion('reticulate'))) }"
@@ -31,6 +31,9 @@ RUN R -e "if (!requireNamespace('shinyjs', quietly = TRUE)) { stop('shinyjs pack
 # Verify digest installation
 RUN R -e "if (!requireNamespace('digest', quietly = TRUE)) { stop('digest package not found after installation') } else { print(paste('digest version:', packageVersion('digest'))) }"
 
+ # Verify httr2 installation
+ RUN R -e "if (!requireNamespace('httr2', quietly = TRUE)) { stop('httr2 package not found after installation') } else { print(paste('httr2 version:', packageVersion('httr2'))) }"
+
 # Install Python packages
 COPY requirements.txt /app/requirements.txt
 RUN pip3 install --no-cache-dir -r /app/requirements.txt
README.md CHANGED
@@ -5,6 +5,7 @@ colorFrom: indigo
 colorTo: green
 sdk: docker
 pinned: false
+ hf_oauth: true
 ---
 
 # Taijichat Application
WORKFLOW_CHANGES.md DELETED
@@ -1,287 +0,0 @@
- # TaijiChat Workflow Changes: Literature Dialog Removal
-
- ## Overview
-
- This document outlines the major changes made to the TaijiChat multi-agent system to improve user experience by removing the upfront literature confirmation dialog and implementing a post-analysis literature exploration approach.
-
- ## Problem Statement
-
- ### Previous Workflow Issues:
- 1. **User Friction**: Every query was blocked by a literature preference dialog before processing
- 2. **Interruption of Flow**: Users had to make decisions before seeing any analysis results
- 3. **Unclear Context**: Users couldn't make informed decisions about literature sources without seeing initial results
- 4. **Pattern Matching Limitations**: Hardcoded keyword matching was unreliable for determining user intent
-
- ## Solution Design
-
- ### New Workflow Philosophy:
- - **Analyze First, Explore Later**: Provide immediate value with optional deeper exploration
- - **LLM-Powered Classification**: Use AI reasoning instead of pattern matching for intent detection
- - **Clear Source Distinction**: Differentiate between primary paper (guaranteed) vs external literature (supplementary)
- - **Progressive Disclosure**: Natural conversation flow with contextual followup options
-
- ## Implementation Details
-
- ### 1. ManagerAgent Changes (`agents/manager_agent.py`)
-
- #### **Removed Components:**
- ```python
- # REMOVED: Literature confirmation dialog
- def _request_literature_confirmation_upfront(self, user_query: str) -> str:
-     # This entire method was removed
- ```
-
- #### **Modified Components:**
- ```python
- def _process_turn(self, user_query_text: str) -> tuple:
-     # OLD: Asked for literature preferences before processing
-     # NEW: Process directly with default settings (both sources enabled)
-     response_text = self._process_with_literature_preferences(
-         user_query_text,
-         use_paper=True,
-         use_external_literature=True
-     )
-     return response_text, False, None
- ```
-
- #### **Enhanced Features:**
- - Proper conversation history management
- - Direct processing without interruption
- - Maintains all existing security features
-
- ### 2. GenerationAgent Changes (`agents/generation_agent.py`)
-
- #### **Enhanced 13-Step Reasoning Process:**
- ```
- 1. Analyze the user query in detail
- 2. Analyze the conversation history if there's any
- 3. Analyze images, paper, data according to the plan if there's any provided
- 4. Analyze errors from previous attempts if there's any
- 5. Read the paper description to understand what the paper is about
- 6. **NEW: QUERY TYPE CLASSIFICATION:**
-    - Is this a NEW_TASK (fresh analytical question) or FOLLOWUP_REQUEST (responding to literature offer)?
-    - If FOLLOWUP_REQUEST, what does user want: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE?
-    - Base decision on conversation context and user intent, not keywords
-    - Consider if previous response contained "Explore Supporting Literature" section
- 7. Read the tools documentation thoroughly
- 8. Decide which tools can be helpful when answering the query
- 9. Read the data documentation
- 10. Decide which datasets are relevant to the user query
- 11. Decide whether the user query can be solved by paper or tools or data or a combination
- 12. Decide whether the user query is about image(s)
- 13. Put everything together to make a comprehensive plan
- ```
-
- #### **New Helper Methods:**
- ```python
- def _check_for_literature_offer(self, conversation_history: list) -> bool:
-     """Check if previous response contained literature exploration offer."""
-
- def _classify_query_type(self, user_query: str, conversation_history: list) -> dict:
-     """Provide context for LLM-based query classification."""
-
- def _append_literature_offer(self, explanation: str) -> str:
-     """Append literature exploration options to NEW_TASK responses."""
- ```
-
- #### **Response Format Rules:**
- - **NEW_TASK**: Provide analysis + literature exploration offer
- - **FOLLOWUP_REQUEST**: Execute requested literature analysis without new offer
-
- ### 3. Literature Offer Format
-
- #### **Clear Source Distinction:**
- ```markdown
- ---
-
- **Explore Supporting Literature:**
-
- 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these findings.
-
- 🔍 **Recent Publications**: Search external academic databases for the latest research on these topics.
-
- 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
-
- *Note: External literature serves as supplementary information only.*
- ```
-
- #### **Key Benefits:**
- - **Primary Paper**: Vetted, guaranteed accuracy, foundational to website
- - **External Literature**: Recent, supplementary, not guaranteed by website
- - **User Choice**: Informed decision about source reliability vs recency
-
- ## Workflow Examples
-
- ### Example 1: Fresh Query → Analysis + Offer
-
- **User Input:** *"What are the top 5 TEXterm-specific TFs?"*
-
- **System Flow:**
- 1. ManagerAgent processes immediately (no dialog)
- 2. GenerationAgent Step 6: Classification → NEW_TASK
- 3. Execute TF data analysis
- 4. Return results with literature exploration offer
-
- **Expected Response:**
- ```
- The top 5 TEXterm-specific transcription factors are:
- 1. Zscan20 (p-value: 0.001)
- 2. Jdp2 (p-value: 0.002)
- 3. Zfp324 (p-value: 0.003)
- 4. Batf (p-value: 0.004)
- 5. Ikzf1 (p-value: 0.005)
-
- These rankings are based on statistical significance from the dataset analysis.
-
- ---
-
- **Explore Supporting Literature:**
-
- 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these TFs.
-
- 🔍 **Recent Publications**: Search external academic databases for the latest research on these transcription factors.
-
- 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
-
- *Note: External literature serves as supplementary information only.*
- ```
-
- ### Example 2: Literature Followup → Targeted Analysis
-
- **User Input:** *"Search recent publications about these TFs"*
-
- **System Flow:**
- 1. GenerationAgent detects previous literature offer
- 2. Step 6: Classification → FOLLOWUP_REQUEST, intent: EXTERNAL_LITERATURE
- 3. Execute literature search using previous TF context
- 4. Return literature analysis (no new offer)
-
- **Expected Response:**
- ```
- ## Recent Literature on TEXterm Transcription Factors
-
- Based on external academic database search, here are key recent findings:
-
- **Zscan20 in T Cell Exhaustion:**
- Recent studies [1] demonstrate that Zscan20 acts as a master regulator of terminal exhaustion...
-
- **Jdp2 Regulatory Networks:**
- New research [2] reveals Jdp2's role in chromatin remodeling during exhaustion programming...
-
- [Additional literature analysis with proper citations]
-
- ## References
- [1] Smith et al. (2023). Zscan20 controls T cell exhaustion pathways. Nature Immunology.
- [2] Johnson et al. (2023). Jdp2 in immune regulation. Cell.
-
- *This analysis is based on external literature sources and serves as supplementary information.*
- ```
-
- ### Example 3: Primary Paper Request → Paper Analysis
-
- **User Input:** *"What does the foundational study say about these TFs?"*
183
-
184
- **System Flow:**
185
- 1. Step 6: Classification → FOLLOWUP_REQUEST, intent: PRIMARY_PAPER
186
- 2. Analyze paper.pdf with previous TF context
187
- 3. Return focused paper analysis
188
-
189
- ## Technical Implementation
190
-
191
- ### Query Classification Logic
192
-
193
- The system uses LLM reasoning instead of pattern matching:
194
-
195
- ```python
196
- # Context provided to LLM for classification
197
- classification_instructions = f"\\n\\nQUERY CLASSIFICATION CONTEXT:"
198
- classification_instructions += f"\\n- Previous response had literature offer: {has_previous_offer}"
199
- if has_previous_offer:
200
- classification_instructions += "\\n- This query might be a FOLLOWUP_REQUEST for literature analysis"
201
- classification_instructions += "\\n- Determine user intent: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE"
202
- classification_instructions += "\\n- If FOLLOWUP_REQUEST, do NOT append literature offer to final response"
203
- else:
204
- classification_instructions += "\\n- This is likely a NEW_TASK requiring fresh analysis"
205
- classification_instructions += "\\n- If status is CODE_COMPLETE, append literature offer to explanation"
206
- ```
207
-
208
- ### Conversation History Management
209
-
210
- ```python
211
- # ManagerAgent properly manages conversation state
212
- def _process_with_literature_preferences(self, user_query: str, use_paper: bool, use_external_literature: bool) -> str:
213
- # Process query and get response
214
- final_response = final_plan_for_turn.get('explanation', 'Processing completed.')
215
-
216
- # Add response to conversation history for future context
217
- self.conversation_history.append({"role": "assistant", "content": final_response})
218
-
219
- return final_response
220
- ```
221
-
222
- ## Benefits
223
-
224
- ### 1. **Improved User Experience**
225
- - **Immediate Response**: No blocking dialogs
226
- - **Natural Flow**: Conversational interaction
227
- - **Informed Decisions**: Literature choices made after seeing results
228
-
229
- ### 2. **Better Intent Recognition**
230
- - **LLM-Powered**: Semantic understanding vs keyword matching
231
- - **Context-Aware**: Considers conversation history
232
- - **Flexible**: Adapts to various user phrasings
233
-
234
- ### 3. **Clear Information Hierarchy**
235
- - **Primary Sources**: Guaranteed accuracy, foundational research
236
- - **Supplementary Sources**: Recent literature, clearly marked as external
237
- - **User Agency**: Informed choice about source reliability
238
-
239
- ### 4. **Maintained Security**
240
- - **All existing safeguards preserved**
241
- - **SupervisorAgent**: Code review unchanged
242
- - **ExecutorAgent**: Sandboxed execution unchanged
243
- - **Literature preferences**: Still respected in execution
244
-
245
- ## Testing
246
-
247
- ### Test Scenarios Created:
248
- 1. **Fresh Query Test**: Verify immediate analysis + literature offer
249
- 2. **External Literature Followup**: Test FOLLOWUP_REQUEST classification
250
- 3. **Primary Paper Followup**: Test paper analysis request
251
- 4. **Conversation Context**: Verify proper history management
252
-
253
- ### Test File: `test_workflow.py`
254
- - Comprehensive workflow testing
255
- - Conversation history verification
256
- - Response format validation
257
-
258
- ## Migration Notes
259
-
260
- ### Backward Compatibility
261
- - **R Interface**: `handle_literature_confirmation()` method marked as LEGACY but preserved
262
- - **Existing Data**: All dataset access patterns unchanged
263
- - **Security Model**: No changes to permission structure
264
-
265
- ### Deployment Considerations
266
- - **No breaking changes** to existing functionality
267
- - **Enhanced user experience** without compromising security
268
- - **Gradual rollout** possible through feature flags if needed
269
-
270
- ## Future Enhancements
271
-
272
- ### Potential Improvements:
273
- 1. **Smart Context Extraction**: Better extraction of relevant terms from previous analysis for literature searches
274
- 2. **Citation Quality**: Enhanced citation formatting and link validation
275
- 3. **User Preferences**: Optional user settings to remember literature preferences
276
- 4. **Analytics**: Track which literature options users choose most frequently
277
-
278
- ## Conclusion
279
-
280
- The new workflow successfully addresses the original user experience issues while maintaining all security and functionality requirements. The system now provides immediate value to users while offering natural pathways for deeper exploration, creating a more engaging and efficient interaction model.
281
-
282
- Key success metrics:
283
- - ✅ **Removed user friction**: No blocking dialogs
284
- - ✅ **Maintained security**: All safeguards preserved
285
- - ✅ **Improved classification**: LLM-based intent recognition
286
- - ✅ **Clear information hierarchy**: Distinguished source types
287
- - ✅ **Natural conversation flow**: Progressive disclosure model
agents/executor_agent.py CHANGED
@@ -52,21 +52,45 @@ class ExecutorAgent:
         }
         # No separate locals, exec will use restricted_globals as locals too
 
+        # Create usage collector for tracking OpenAI API calls in agent_tools
+        import builtins
+        builtins.__agent_usage_collector__ = []
+
         captured_output = io.StringIO()
         try:
             with contextlib.redirect_stdout(captured_output):
                 exec(python_code, restricted_globals)
             output_str = captured_output.getvalue()
+
+            # Extract collected usage info
+            usage_list = builtins.__agent_usage_collector__
+            aggregated_usage = {
+                'prompt_tokens': sum(u.get('prompt_tokens', 0) for u in usage_list),
+                'completion_tokens': sum(u.get('completion_tokens', 0) for u in usage_list),
+                'total_tokens': sum(u.get('total_tokens', 0) for u in usage_list)
+            }
+
             return {
                 "execution_output": output_str.strip() if output_str else "(No output printed by code)",
-                "execution_status": "SUCCESS"
+                "execution_status": "SUCCESS",
+                "usage": aggregated_usage
             }
         except Exception as e:
             error_details = f"{type(e).__name__}: {str(e)}"
             # Try to get traceback if possible, though might be complex to format cleanly here
+
+            # Extract usage even on error (API calls may have occurred before failure)
+            usage_list = builtins.__agent_usage_collector__
+            aggregated_usage = {
+                'prompt_tokens': sum(u.get('prompt_tokens', 0) for u in usage_list),
+                'completion_tokens': sum(u.get('completion_tokens', 0) for u in usage_list),
+                'total_tokens': sum(u.get('total_tokens', 0) for u in usage_list)
+            }
+
             return {
                 "execution_output": f"Execution Error!\n{error_details}",
-                "execution_status": f"ERROR: {type(e).__name__}"
+                "execution_status": f"ERROR: {type(e).__name__}",
+                "usage": aggregated_usage
             }
 
 if __name__ == '__main__':
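
The executor-side aggregation above depends on sandboxed tools appending their own usage dicts to `builtins.__agent_usage_collector__`. A minimal runnable sketch of both halves, where `report_usage` is a hypothetical tool-side helper (only the collector name and the aggregation expressions come from the diff):

```python
import builtins

# Tool side (sketch): how a function such as describe_image() could report usage.
def report_usage(prompt_tokens, completion_tokens):
    collector = getattr(builtins, "__agent_usage_collector__", None)
    if collector is not None:
        collector.append({
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        })

# Executor side: reset the collector, run the tools, then aggregate (mirrors the diff).
builtins.__agent_usage_collector__ = []
report_usage(120, 30)
report_usage(200, 50)

usage_list = builtins.__agent_usage_collector__
aggregated = {
    "prompt_tokens": sum(u.get("prompt_tokens", 0) for u in usage_list),
    "completion_tokens": sum(u.get("completion_tokens", 0) for u in usage_list),
    "total_tokens": sum(u.get("total_tokens", 0) for u in usage_list),
}
print(aggregated["total_tokens"])  # 400
```

Stashing the collector on `builtins` works here because `exec` runs with restricted globals, and `builtins` is one of the few namespaces both sides can reach.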
agents/generation_agent.py CHANGED
@@ -785,10 +785,22 @@ class GenerationAgent:
 
         # Make the API call
         response = self.client.chat.completions.create(**params)
-
+
+        # CAPTURE TOKEN USAGE
+        usage_info = {
+            'prompt_tokens': 0,
+            'completion_tokens': 0,
+            'total_tokens': 0
+        }
+        if hasattr(response, 'usage') and response.usage:
+            usage_info['prompt_tokens'] = response.usage.prompt_tokens
+            usage_info['completion_tokens'] = response.usage.completion_tokens
+            usage_info['total_tokens'] = response.usage.total_tokens
+            print(f"[GenerationAgent] Token usage - prompt: {usage_info['prompt_tokens']}, completion: {usage_info['completion_tokens']}, total: {usage_info['total_tokens']}")
+
         # Get the response content
         assistant_response_json_str = response.choices[0].message.content
-
+
         # Clean up the response - remove any code fence markers
         if assistant_response_json_str.startswith("```json"):
             assistant_response_json_str = assistant_response_json_str[len("```json"):].strip()
@@ -796,20 +808,22 @@ class GenerationAgent:
             assistant_response_json_str = assistant_response_json_str[len("```"):].strip()
         if assistant_response_json_str.endswith("```"):
             assistant_response_json_str = assistant_response_json_str[:-len("```")].strip()
-
+
         try:
             # Parse the JSON response
             parsed_response = json.loads(assistant_response_json_str)
-
+
             # Validate the response has the required fields
             if not all(k in parsed_response for k in ["thought", "python_code", "status"]):
                 print("GenerationAgent Error: Chat response JSON missing required keys.")
-                return {"thought": "Error parsing Chat API response: Missing keys.", "python_code": "", "status": "ERROR"}
+                return {"thought": "Error parsing Chat API response: Missing keys.", "python_code": "", "status": "ERROR", "usage": usage_info}
-
+
             # Additional validation for AWAITING_DATA status
             if parsed_response.get("status") == "AWAITING_DATA" and not ("intermediate_data_for_llm" in parsed_response.get("python_code", "") and "json.dumps" in parsed_response.get("python_code", "")):
                 print("GenerationAgent Warning: Status is AWAITING_DATA but python_code does not follow required format.")
-
+
+            # Add usage info to response
+            parsed_response['usage'] = usage_info
             return parsed_response
 
         except json.JSONDecodeError as e:
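
The defensive `hasattr(response, 'usage')` check in this hunk can be exercised without calling the API by substituting a stand-in response object. A sketch under that assumption (the `SimpleNamespace` stub is illustrative, not the OpenAI SDK type):

```python
from types import SimpleNamespace

def extract_usage(response):
    """Return a plain usage dict, zeroed when the response carries no usage."""
    usage_info = {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0}
    if hasattr(response, 'usage') and response.usage:
        usage_info['prompt_tokens'] = response.usage.prompt_tokens
        usage_info['completion_tokens'] = response.usage.completion_tokens
        usage_info['total_tokens'] = response.usage.total_tokens
    return usage_info

# Stand-ins for a Chat Completions response with and without usage reported
ok = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=10, completion_tokens=5, total_tokens=15))
empty = SimpleNamespace(usage=None)
print(extract_usage(ok))     # {'prompt_tokens': 10, 'completion_tokens': 5, 'total_tokens': 15}
print(extract_usage(empty))  # all zeros
```

Zeroed defaults mean downstream aggregation never has to special-case a missing `usage` field.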
agents/manager_agent.py CHANGED
@@ -22,9 +22,17 @@ from agents.executor_agent import ExecutorAgent
 # POLLING_INTERVAL_S and MAX_POLLING_ATTEMPTS are removed, polling is handled by individual agents.
 
 class ManagerAgent:
-    def __init__(self, openai_api_key=None, openai_client: OpenAI = None, r_callback_fn=None):
+    def __init__(self, openai_api_key=None, openai_client: OpenAI = None, r_callback_fn=None, supabase_client=None, user_id=None, hf_user_id=None):
         """
         Initialize the Manager Agent with OpenAI credentials and sub-agents.
+
+        Args:
+            openai_api_key: OpenAI API key
+            openai_client: Pre-initialized OpenAI client
+            r_callback_fn: Callback function for R integration
+            supabase_client: Supabase client for logging and quota tracking
+            user_id: UUID of user from Supabase users table
+            hf_user_id: Hugging Face user ID
         """
         if openai_client:
             self.client = openai_client
@@ -36,16 +44,27 @@ class ManagerAgent:
 
         # Storage for conversation history - list of dicts like [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]
         self.conversation_history = []
-
+
         # Storage for file information - dict like {"file_id": "...", "file_name": "...", "file_path": "..."}
         self.file_info = {}
-
+
         # Storage for pending literature confirmation
         self.pending_literature_confirmation = None
         self.pending_literature_query = None
-
+
         # R callback function for thoughts
         self.r_callback_fn = r_callback_fn
+
+        # Supabase client for logging and quota tracking
+        self.supabase_client = supabase_client
+        self.current_user_id = user_id
+        self.current_hf_user_id = hf_user_id
+
+        # Token tracking for current query
+        self.last_prompt_tokens = 0
+        self.last_completion_tokens = 0
+        self.last_total_tokens = 0
+        self.model_name = "gpt-4o"
 
         # Initialize sub-agents
         try:
@@ -285,6 +304,11 @@ class ManagerAgent:
             final_plan_for_turn = plan
             current_plan_holder = plan
 
+            # Aggregate token usage from GenerationAgent
+            if 'usage' in plan:
+                self._aggregate_token_usage(plan['usage'])
+                print(f"[Manager] Aggregated GenerationAgent usage: {plan['usage'].get('total_tokens', 0)} tokens")
+
             # Reset for next potential direct image analysis
             image_file_id_for_analysis_step = None
 
@@ -312,6 +336,11 @@ class ManagerAgent:
                 review = self.supervisor_agent.review_code(code_to_execute, f"Reviewing plan: {plan.get('thought', '')}")
                 supervisor_status = review.get('safety_status', 'UNKNOWN_STATUS')
                 supervisor_feedback = review.get('safety_feedback', 'No feedback.')
+
+                # Aggregate token usage from SupervisorAgent
+                if 'usage' in review:
+                    self._aggregate_token_usage(review['usage'])
+                    print(f"[Manager] Aggregated SupervisorAgent usage: {review['usage'].get('total_tokens', 0)} tokens")
 
                 if supervisor_status != "APPROVED_FOR_EXECUTION":
                     return f"Code execution blocked by supervisor: {supervisor_feedback}"
@@ -330,6 +359,11 @@ class ManagerAgent:
                 execution_result = self.executor_agent.execute_code(code_to_execute)
                 execution_output = execution_result.get("execution_output", "")
                 execution_status = execution_result.get("execution_status", "UNKNOWN")
+
+                # Aggregate token usage from ExecutorAgent (captures describe_image API calls)
+                if 'usage' in execution_result:
+                    self._aggregate_token_usage(execution_result['usage'])
+                    print(f"[Manager] Aggregated ExecutorAgent usage: {execution_result['usage'].get('total_tokens', 0)} tokens")
 
                 if execution_status == "SUCCESS":
                     self._send_thought_to_r(f"Code execution successful.")
@@ -395,32 +429,128 @@ class ManagerAgent:
         self.literature_enabled = literature_enabled
         return self.process_single_query(user_query_text, conversation_history_from_r)
 
+    def set_user_context(self, user_id: str = None, hf_user_id: str = None):
+        """Set user context for quota tracking and logging"""
+        self.current_user_id = user_id
+        self.current_hf_user_id = hf_user_id
+        print(f"[Manager] Set user context: user_id={user_id}, hf_user_id={hf_user_id}")
+
+    def _check_quota_before_processing(self) -> tuple:
+        """
+        Check if user has sufficient quota before processing query
+        Returns: (has_quota: bool, remaining: int, error_message: str or None)
+        """
+        if not self.supabase_client or not self.supabase_client.is_enabled():
+            return (True, 999999, None)
+
+        if not self.current_hf_user_id:
+            return (False, 0, "User not authenticated")
+
+        try:
+            has_quota, remaining, used = self.supabase_client.check_quota(self.current_hf_user_id)
+            if not has_quota:
+                error_msg = f"Token quota exceeded. Used: {used}, Remaining: {remaining}. Please contact support to increase your quota."
+                return (False, remaining, error_msg)
+            return (True, remaining, None)
+        except Exception as e:
+            print(f"[Manager] Error checking quota: {e}")
+            return (True, 999999, None)  # Fail open
+
+    def _reset_token_tracking(self):
+        """Reset token counters for new query"""
+        self.last_prompt_tokens = 0
+        self.last_completion_tokens = 0
+        self.last_total_tokens = 0
+
+    def _aggregate_token_usage(self, usage_dict: dict):
+        """Aggregate token usage from agent responses"""
+        if usage_dict:
+            self.last_prompt_tokens += usage_dict.get('prompt_tokens', 0)
+            self.last_completion_tokens += usage_dict.get('completion_tokens', 0)
+            self.last_total_tokens += usage_dict.get('total_tokens', 0)
+
     def process_single_query(self, user_query_text: str, conversation_history_from_r: list = None) -> str:
         """
         Processes a single query, suitable for calling from an external system like R/Shiny.
         Manages its own conversation history based on input.
+        Includes quota checking and comprehensive logging.
         """
         print(f"[Manager.process_single_query] Received query: '{user_query_text[:100]}...'")
+
+        # Reset token tracking for new query
+        self._reset_token_tracking()
+
+        # Check quota BEFORE processing
+        has_quota, remaining, quota_error = self._check_quota_before_processing()
+        if not has_quota:
+            # Log the quota exceeded error
+            if self.supabase_client and self.supabase_client.is_enabled():
+                self.supabase_client.log_usage(
+                    hf_user_id=self.current_hf_user_id,
+                    user_id=self.current_user_id,
+                    query_text=user_query_text,
+                    error_message=quota_error,
+                    conversation_history=conversation_history_from_r
+                )
+            return quota_error
+
         if conversation_history_from_r is not None:
             # Overwrite or extend self.conversation_history. For simplicity, let's overwrite.
             # Ensure format matches: list of dicts like {"role": "user/assistant", "content": "..."}
             self.conversation_history = [dict(turn) for turn in conversation_history_from_r] # Ensure dicts
-
+
         # Add the current user query to the history for processing
         self.conversation_history.append({"role": "user", "content": user_query_text})
-
+
         # Initialize image tracking variables in case _process_turn fails
         is_image_response = False
         current_image_path = None
-
+
         try:
             # Process the query and get response with image information
            response_text, is_image_response, current_image_path = self._process_turn(user_query_text)
+
+            # IMMEDIATE LOGGING TO SUPABASE AFTER SUCCESSFUL PROCESSING
+            if self.supabase_client and self.supabase_client.is_enabled():
+                self.supabase_client.log_usage(
+                    hf_user_id=self.current_hf_user_id,
+                    user_id=self.current_user_id,
+                    query_text=user_query_text,
+                    prompt_tokens=self.last_prompt_tokens,
+                    completion_tokens=self.last_completion_tokens,
+                    total_tokens=self.last_total_tokens,
+                    model=self.model_name,
+                    response_text=response_text,
+                    error_message=None,
+                    conversation_history=self.conversation_history,
+                    is_image_response=is_image_response,
+                    image_path=current_image_path
+                )
+
+                # Update user's token usage
+                if self.last_total_tokens > 0:
+                    self.supabase_client.update_token_usage(self.current_hf_user_id, self.last_total_tokens)
+                    print(f"[Manager] Updated token usage: +{self.last_total_tokens} tokens")
+
         except Exception as e:
             print(f"[Manager.process_single_query] Error in _process_turn: {str(e)}")
             response_text = f"I encountered an error processing your request: {str(e)}"
             is_image_response = False
             current_image_path = None
+
+            # LOG ERROR TO SUPABASE
+            if self.supabase_client and self.supabase_client.is_enabled():
+                self.supabase_client.log_usage(
+                    hf_user_id=self.current_hf_user_id,
+                    user_id=self.current_user_id,
+                    query_text=user_query_text,
+                    prompt_tokens=self.last_prompt_tokens,
+                    completion_tokens=self.last_completion_tokens,
+                    total_tokens=self.last_total_tokens,
+                    model=self.model_name,
+                    error_message=str(e),
+                    conversation_history=self.conversation_history
+                )
 
         # If an image was processed, format the response to include image information
         if is_image_response and current_image_path:
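
The check → process → aggregate → log → update flow added here can be sanity-checked against a stub client exposing the methods the diff calls (`is_enabled`, `check_quota`, `log_usage`, `update_token_usage`). The stub's behavior is an assumption for illustration; the real implementations live in the Supabase client wrapper:

```python
# Stub standing in for the Supabase client (method names from the diff; bodies assumed)
class StubSupabase:
    def __init__(self, quota=100_000):
        self.quota = quota   # default quota from the commit message
        self.used = 0
        self.logs = []

    def is_enabled(self):
        return True

    def check_quota(self, hf_user_id):
        remaining = self.quota - self.used
        return (remaining > 0, remaining, self.used)

    def log_usage(self, **entry):
        self.logs.append(entry)

    def update_token_usage(self, hf_user_id, tokens):
        self.used += tokens

client = StubSupabase()

# 1. Check quota before processing
has_quota, remaining, used = client.check_quota("hf_123")
assert has_quota

# 2. ...query processed, sub-agent usage aggregated to e.g. 1500 tokens...
total_tokens = 1500

# 3. Log immediately after completion, then update the running total
client.log_usage(hf_user_id="hf_123", query_text="top 5 TFs?", total_tokens=total_tokens)
client.update_token_usage("hf_123", total_tokens)
print(client.used)  # 1500
```

Logging before updating the counter means a failed update still leaves an audit record of the query.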
agents/supervisor_agent.py CHANGED
@@ -136,14 +136,21 @@ class SupervisorAgent:
136
 
137
  def review_code(self, python_code: str, thought: str): # Removed client_openai from params
138
  print(f"SupervisorAgent.review_code received code. Thought: {thought[:100]}...") # Log more of the thought
139
-
 
 
 
 
 
 
 
140
  if not python_code.strip():
141
  print("SupervisorAgent: No actual code provided for review. Approving as safe.")
142
- return {"safety_feedback": "No code provided by Generation Agent.", "safety_status": "APPROVED_FOR_EXECUTION", "user_facing_rejection_reason": ""}
143
 
144
  if not self.client or not self.supervisor_assistant:
145
  print("SupervisorAgent Error: OpenAI client or Supervisor Assistant not available for code review.")
146
- return {"safety_feedback": "Error: Supervisor Agent not properly initialized.", "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The supervisor agent encountered an error."}
147
 
148
  thread = None # Initialize for the finally block
149
  try:
@@ -185,6 +192,13 @@ class SupervisorAgent:
185
  attempts += 1
186
 
187
  # 6. Process Run Outcome
 
 
 
 
 
 
 
188
  if run.status == "completed":
189
  # print(f"SupervisorAgent: Run {run.id} completed.")
190
  messages_response = self.client.beta.threads.messages.list(thread_id=thread.id, order="desc", limit=1)
@@ -206,9 +220,10 @@ class SupervisorAgent:
206
  if not all(k in parsed_response for k in ["safety_feedback", "safety_status", "user_facing_rejection_reason"]):
207
  print("SupervisorAgent Error: LLM review JSON missing required keys.")
208
  return {
209
- "safety_feedback": "Internal Error: LLM review response malformed (missing keys).",
210
  "safety_status": "REJECTED_NEEDS_REVISION",
211
- "user_facing_rejection_reason": "The code review process encountered an internal error."
 
212
  }
213
  # Validate safety_status value
214
  if parsed_response["safety_status"] not in ["APPROVED_FOR_EXECUTION", "REJECTED_NEEDS_REVISION"]:
@@ -226,33 +241,38 @@ class SupervisorAgent:
226
  elif parsed_response["safety_status"] == "APPROVED_FOR_EXECUTION" and not parsed_response.get("user_facing_rejection_reason","").strip():
227
  parsed_response["user_facing_rejection_reason"] = "Approved."
228
 
 
 
229
  return parsed_response
230
  except json.JSONDecodeError as e:
231
  print(f"SupervisorAgent JSONDecodeError: Could not parse LLM review JSON: {e}. Response: {assistant_response_json_str}")
232
  return {
233
- "safety_feedback": f"Internal Error: Failed to parse LLM review JSON. {e}",
234
  "safety_status": "REJECTED_NEEDS_REVISION",
235
- "user_facing_rejection_reason": "The code review result was unreadable."
 
236
  }
237
  else:
238
  print("SupervisorAgent Error: No valid message content from assistant after review run completion.")
239
  return {
240
- "safety_feedback": "Internal Error: No content from supervisor assistant.",
241
  "safety_status": "REJECTED_NEEDS_REVISION",
242
- "user_facing_rejection_reason": "The supervisor agent provided no response."
 
243
  }
244
  else:
245
  error_message = f"Review run failed or timed out. Status: {run.status}"
246
  if run.last_error:
247
  error_message += f" Last Error: {run.last_error.message}"
248
  print(f"SupervisorAgent Error: {error_message}")
249
- return {"safety_feedback": error_message, "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The code review process encountered an error."}
250
  except Exception as e:
251
  print(f"SupervisorAgent Error: General exception during review_code: {e}")
252
  return {
253
- "safety_feedback": f"General exception in review_code: {e}",
254
  "safety_status": "REJECTED_NEEDS_REVISION",
255
- "user_facing_rejection_reason": "A general error occurred during code review."
 
256
  }
257
  finally:
258
  # 7. Delete Thread
 
  def review_code(self, python_code: str, thought: str): # Removed client_openai from params
  print(f"SupervisorAgent.review_code received code. Thought: {thought[:100]}...") # Log more of the thought
+
+ # Initialize usage tracking
+ usage_info = {
+ 'prompt_tokens': 0,
+ 'completion_tokens': 0,
+ 'total_tokens': 0
+ }
+
  if not python_code.strip():
  print("SupervisorAgent: No actual code provided for review. Approving as safe.")
+ return {"safety_feedback": "No code provided by Generation Agent.", "safety_status": "APPROVED_FOR_EXECUTION", "user_facing_rejection_reason": "", "usage": usage_info}

  if not self.client or not self.supervisor_assistant:
  print("SupervisorAgent Error: OpenAI client or Supervisor Assistant not available for code review.")
+ return {"safety_feedback": "Error: Supervisor Agent not properly initialized.", "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The supervisor agent encountered an error.", "usage": usage_info}

  thread = None # Initialize for the finally block
  try:

  attempts += 1

  # 6. Process Run Outcome
+ # CAPTURE TOKEN USAGE from Run
+ if hasattr(run, 'usage') and run.usage:
+ usage_info['prompt_tokens'] = getattr(run.usage, 'prompt_tokens', 0)
+ usage_info['completion_tokens'] = getattr(run.usage, 'completion_tokens', 0)
+ usage_info['total_tokens'] = getattr(run.usage, 'total_tokens', 0)
+ print(f"[SupervisorAgent] Token usage - total: {usage_info['total_tokens']}")
+
  if run.status == "completed":
  # print(f"SupervisorAgent: Run {run.id} completed.")
  messages_response = self.client.beta.threads.messages.list(thread_id=thread.id, order="desc", limit=1)

  if not all(k in parsed_response for k in ["safety_feedback", "safety_status", "user_facing_rejection_reason"]):
  print("SupervisorAgent Error: LLM review JSON missing required keys.")
  return {
+ "safety_feedback": "Internal Error: LLM review response malformed (missing keys).",
  "safety_status": "REJECTED_NEEDS_REVISION",
+ "user_facing_rejection_reason": "The code review process encountered an internal error.",
+ "usage": usage_info
  }
  # Validate safety_status value
  if parsed_response["safety_status"] not in ["APPROVED_FOR_EXECUTION", "REJECTED_NEEDS_REVISION"]:

  elif parsed_response["safety_status"] == "APPROVED_FOR_EXECUTION" and not parsed_response.get("user_facing_rejection_reason","").strip():
  parsed_response["user_facing_rejection_reason"] = "Approved."

+ # Add usage info to response
+ parsed_response['usage'] = usage_info
  return parsed_response
  except json.JSONDecodeError as e:
  print(f"SupervisorAgent JSONDecodeError: Could not parse LLM review JSON: {e}. Response: {assistant_response_json_str}")
  return {
+ "safety_feedback": f"Internal Error: Failed to parse LLM review JSON. {e}",
  "safety_status": "REJECTED_NEEDS_REVISION",
+ "user_facing_rejection_reason": "The code review result was unreadable.",
+ "usage": usage_info
  }
  else:
  print("SupervisorAgent Error: No valid message content from assistant after review run completion.")
  return {
+ "safety_feedback": "Internal Error: No content from supervisor assistant.",
  "safety_status": "REJECTED_NEEDS_REVISION",
+ "user_facing_rejection_reason": "The supervisor agent provided no response.",
+ "usage": usage_info
  }
  else:
  error_message = f"Review run failed or timed out. Status: {run.status}"
  if run.last_error:
  error_message += f" Last Error: {run.last_error.message}"
  print(f"SupervisorAgent Error: {error_message}")
+ return {"safety_feedback": error_message, "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The code review process encountered an error.", "usage": usage_info}
  except Exception as e:
  print(f"SupervisorAgent Error: General exception during review_code: {e}")
  return {
+ "safety_feedback": f"General exception in review_code: {e}",
  "safety_status": "REJECTED_NEEDS_REVISION",
+ "user_facing_rejection_reason": "A general error occurred during code review.",
+ "usage": usage_info
  }
  finally:
  # 7. Delete Thread
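The defensive usage-capture added in this hunk is easy to exercise in isolation. A minimal sketch (the `run` objects here are stand-ins built with `SimpleNamespace`; the real one comes back from the Assistants API run-polling loop):

```python
from types import SimpleNamespace

def capture_run_usage(run):
    """Read token usage off an Assistants API Run object, tolerating absent fields."""
    usage_info = {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0}
    if hasattr(run, 'usage') and run.usage:
        usage_info['prompt_tokens'] = getattr(run.usage, 'prompt_tokens', 0)
        usage_info['completion_tokens'] = getattr(run.usage, 'completion_tokens', 0)
        usage_info['total_tokens'] = getattr(run.usage, 'total_tokens', 0)
    return usage_info

# A completed run reports usage; a failed or expired run may carry usage=None.
done = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=120, completion_tokens=30, total_tokens=150))
print(capture_run_usage(done))  # {'prompt_tokens': 120, 'completion_tokens': 30, 'total_tokens': 150}
print(capture_run_usage(SimpleNamespace(usage=None)))  # all zeros
```

Because every early-return dict in `review_code` now carries `"usage": usage_info`, the ManagerAgent can aggregate sub-agent totals without special-casing error paths.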
auth/hf_oauth.R ADDED
@@ -0,0 +1,164 @@
+ # auth/hf_oauth.R
+ # Hugging Face OAuth authentication for TaijiChat
+
+ library(httr2)
+ library(jsonlite)
+
+ # OAuth Configuration
+ HF_OAUTH_AUTHORIZE_URL <- "https://huggingface.co/oauth/authorize"
+ HF_OAUTH_TOKEN_URL <- "https://huggingface.co/oauth/token"
+ HF_USER_INFO_URL <- "https://huggingface.co/api/whoami-v2"
+
+ # Initialize OAuth configuration
+ initialize_oauth <- function() {
+   oauth_config <- list(
+     client_id = Sys.getenv("OAUTH_CLIENT_ID"),
+     client_secret = Sys.getenv("OAUTH_CLIENT_SECRET"),
+     scopes = Sys.getenv("OAUTH_SCOPES", "openid profile email"),
+     enabled = FALSE
+   )
+
+   if (oauth_config$client_id != "" && oauth_config$client_secret != "") {
+     oauth_config$enabled <- TRUE
+     print("OAuth: Hugging Face OAuth is enabled")
+   } else {
+     warning("OAuth: OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET not set. Authentication disabled.")
+   }
+
+   return(oauth_config)
+ }
+
+ # Generate OAuth authorization URL
+ get_authorization_url <- function(oauth_config, redirect_uri, state) {
+   if (!oauth_config$enabled) {
+     return(NULL)
+   }
+
+   params <- list(
+     client_id = oauth_config$client_id,
+     redirect_uri = redirect_uri,
+     scope = oauth_config$scopes,
+     state = state,
+     response_type = "code"
+   )
+
+   # Build query string
+   query_string <- paste(
+     sapply(names(params), function(name) {
+       paste0(name, "=", URLencode(params[[name]], reserved = TRUE))
+     }),
+     collapse = "&"
+   )
+
+   auth_url <- paste0(HF_OAUTH_AUTHORIZE_URL, "?", query_string)
+   return(auth_url)
+ }
+
+ # Exchange authorization code for access token
+ exchange_code_for_token <- function(oauth_config, code, redirect_uri) {
+   if (!oauth_config$enabled) {
+     return(NULL)
+   }
+
+   tryCatch({
+     # Make token request
+     response <- httr2::request(HF_OAUTH_TOKEN_URL) %>%
+       httr2::req_method("POST") %>%
+       httr2::req_body_form(
+         client_id = oauth_config$client_id,
+         client_secret = oauth_config$client_secret,
+         code = code,
+         redirect_uri = redirect_uri,
+         grant_type = "authorization_code"
+       ) %>%
+       httr2::req_perform()
+
+     # Parse response
+     token_data <- httr2::resp_body_json(response)
+
+     if (!is.null(token_data$access_token)) {
+       print("OAuth: Successfully obtained access token")
+       return(list(
+         access_token = token_data$access_token,
+         token_type = token_data$token_type,
+         scope = token_data$scope
+       ))
+     } else {
+       warning("OAuth: Token response missing access_token")
+       return(NULL)
+     }
+   }, error = function(e) {
+     warning(paste("OAuth: Error exchanging code for token -", e$message))
+     return(NULL)
+   })
+ }
+
+ # Get user info from Hugging Face
+ get_user_info <- function(access_token) {
+   if (is.null(access_token)) {
+     return(NULL)
+   }
+
+   tryCatch({
+     response <- httr2::request(HF_USER_INFO_URL) %>%
+       httr2::req_headers(
+         Authorization = paste("Bearer", access_token)
+       ) %>%
+       httr2::req_perform()
+
+     user_info <- httr2::resp_body_json(response)
+
+     if (!is.null(user_info$id)) {
+       print(paste("OAuth: Retrieved user info for", user_info$name))
+       return(list(
+         hf_user_id = user_info$id,
+         hf_username = user_info$name,
+         email = user_info$email,
+         avatar_url = user_info$avatarUrl,
+         is_pro = user_info$isPro %||% FALSE
+       ))
+     } else {
+       warning("OAuth: User info response missing required fields")
+       return(NULL)
+     }
+   }, error = function(e) {
+     warning(paste("OAuth: Error getting user info -", e$message))
+     return(NULL)
+   })
+ }
+
+ # Helper function for NULL coalescing
+ `%||%` <- function(a, b) if (is.null(a)) b else a
+
+ # Validate OAuth state to prevent CSRF attacks
+ generate_oauth_state <- function() {
+   paste0(sample(c(letters, LETTERS, 0:9), 32, replace = TRUE), collapse = "")
+ }
+
+ validate_oauth_state <- function(received_state, stored_state) {
+   if (is.null(received_state) || is.null(stored_state)) {
+     return(FALSE)
+   }
+   return(received_state == stored_state)
+ }
+
+ # Check if user is authenticated in session
+ is_authenticated <- function(session) {
+   user_data <- session$userData$hf_user
+   return(!is.null(user_data) && !is.null(user_data$hf_user_id))
+ }
+
+ # Get current authenticated user from session
+ get_current_user <- function(session) {
+   if (is_authenticated(session)) {
+     return(session$userData$hf_user)
+   }
+   return(NULL)
+ }
+
+ # Clear authentication session
+ logout_user <- function(session) {
+   session$userData$hf_user <- NULL
+   session$userData$access_token <- NULL
+   print("OAuth: User logged out")
+ }
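Two things worth noting about the state helpers above: `sample()` is not a cryptographic RNG, and `==` is not a constant-time comparison. On the Python side of the project (or as a pattern to mirror in R), a hardened equivalent would look like this sketch — the function names mirror the R file, but this Python module is not part of the commit:

```python
import hmac
import secrets

def generate_oauth_state():
    # 32 bytes of CSPRNG-backed, URL-safe randomness (~43 chars),
    # versus sampling alphanumerics from a non-cryptographic RNG.
    return secrets.token_urlsafe(32)

def validate_oauth_state(received_state, stored_state):
    if not received_state or not stored_state:
        return False
    # Constant-time comparison avoids leaking the state via timing.
    return hmac.compare_digest(received_state, stored_state)

state = generate_oauth_state()
print(validate_oauth_state(state, state))
print(validate_oauth_state(state, generate_oauth_state()))
```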
codebase_analysis.md DELETED
@@ -1,153 +0,0 @@
- # How to Run the Application
-
- To run this R Shiny application, you will need R and the RStudio IDE (recommended) or another R environment installed on your system. You will also need the `shiny` package and other packages listed as dependencies (`readxl`, `DT`, `dplyr`, `shinythemes`).
-
- **Steps:**
-
- 1. **Install R and RStudio:** If you haven't already, download and install R from [CRAN](https://cran.r-project.org/) and RStudio Desktop from [Posit](https://posit.co/download/rstudio-desktop/).
- 2. **Install Required R Packages:** Open R or RStudio and run the following commands in the R console:
- ```R
- install.packages(c("shiny", "readxl", "DT", "dplyr", "shinythemes"))
- ```
- 3. **Set Working Directory:** Navigate your R session's working directory to the root folder of this Shiny application (the folder containing `server.R` and `ui.R`). In RStudio, you can do this by opening either `server.R` or `ui.R` and then going to `Session > Set Working Directory > To Source File Location`.
- 4. **Run the App:** In the R console, execute the following command:
- ```R
- shiny::runApp()
- ```
- Alternatively, if you have `server.R` or `ui.R` open in RStudio, a "Run App" button will typically appear at the top of the editor pane, which you can click.
-
- This will launch the application in your default web browser.
-
- ---
-
- # Codebase Analysis: TaijiChat Shiny Application
-
- ## Overview
-
- The codebase consists of an R Shiny application designed to explore and visualize bioinformatics data related to T cell states and transcription factors (TFs). It appears to be a companion tool for a research publication, aiming to make complex datasets accessible. The application is structured into two main files: `server.R` (server-side logic) and `ui.R` (user interface definition). Data is primarily loaded from Excel files and images stored in a `www/` subdirectory.
-
- ## File Breakdown
-
- ### `server.R` (Server Logic)
-
- **Key Functionalities:**
-
- 1. **Data Loading and Preprocessing:**
- * Loads multiple Excel datasets for TF PageRank scores, TF wave analysis, TF-TF correlations, TF communities, and multi-omics data. These files are located in `www/tablePagerank/`, `www/waveanalysis/`, `www/TFcorintextrm/`, and `www/tfcommunities/`.
- * `new_read_excel_file()`: Reads and transposes Excel files, setting "Regulator Names" from the first column and using the original first row as new column headers.
- * `new_filter_data()`: Filters transposed dataframes by column names based on user search input (supports multiple comma-separated, case-insensitive keywords).
-
- 2. **TF Catalog Data Display (Repetitive Structure):**
- * Handles data for Overall TF PageRank, Naive, TE, MP, TCM, TEM, TRM, TEXprog, TEXeff-like, and TEXterm cell states.
- * For each dataset:
- * Uses `reactiveVal` for column pagination state (4 columns per page).
- * `observeEvent`s for "next" and "previous" button functionality.
- * Reactive expressions filter data by search term and select columns for the current page.
- * Dynamically inserts a styled "Cell state data" row with "TF activity score" (at row index 2 for main PageRank table, row index 0 for others).
- * `renderDT` outputs `DT::datatable` with custom options (fixed 45 rows, no search box, JS `rowCallback` to highlight the "TF activity score" row).
-
- 3. **TF Wave Analysis:**
- * Loads TF wave data from `www/waveanalysis/searchtfwaves.xlsx`.
- * Allows users to search for a TF and view its associated wave(s) in a transposed table.
-
- 4. **TF-TF Correlation in TRM/TEXterm:**
- * Loads data from `www/TFcorintextrm/TF-TFcorTRMTEX.xlsx`.
- * Allows TF search.
- * Renders a clickable list of TFs (`actionLink`s).
- * Displays tabular data and an associated image ("TF Merged Graph Path") for the selected/searched TF.
-
- 5. **TF Communities:**
- * Loads data from `www/tfcommunities/trmcommunities.xlsx` and `www/tfcommunities/texcommunities.xlsx`.
- * Displays them as simple `DT::datatable` objects.
-
- 6. **Multi-omics Data Table:**
- * Loads data from `www/multi-omicsdata.xlsx`.
- * Renders as a `DT::datatable`, creating hyperlinks in the "Author" column from a "DOI" column, removing empty columns, and enabling scrolling.
-
- 7. **Navigation & Other:**
- * `observeEvent`s for UI element clicks (e.g., `input$c1_link`) to navigate tabs via `updateNavbarPage`.
- * Redirects to a bioRxiv paper URL via `session$sendCustomMessage`.
- * Contains significant commented-out code (older logic).
-
- **Libraries Used:** `shiny`, `readxl`, `DT`, `dplyr`.
-
- ### `ui.R` (User Interface)
-
- **Key Functionalities:**
-
- 1. **Overall Structure:**
- * Uses `shinytheme("flatly")`.
- * `navbarPage` for the main tabbed interface.
- * Custom CSS for fonts (`Arial`).
- * JavaScript for URL redirection and a modal dialog.
-
- 2. **Home Tab:**
- * Project/study description.
- * Layout with an image (`homedesc.png`) featuring clickable `actionLink`s for navigation.
- * "Read Now" button linking to the research paper.
- * Footer with lab links and logos.
-
- 3. **TF Catalog (`navbarMenu`):**
- * **"Search TF Scores" Tab:**
- * Explanatory text, image (`tfcat/onlycellstates.png`).
- * Search input (`search_input`), column pagination buttons (`prev_btn`, `next_btn`), `DTOutput("table")`.
- * **"Cell State Specific TF Catalog" Tab (`navlistPanel`):**
- * Sub-tabs for Naive, TE, MP, Tcm, Tem, Trm, TEXprog, TEXeff-like, TEXterm.
- * Each sub-tab has a consistent layout: header, text, a specific bubble plot image (from `www/bubbleplots/`), search input, pagination buttons, and `DTOutput`.
- * **"Multi-State TFs" Tab:** Displays a heatmap image (`tfcat/multistatesheatmap.png`).
-
- 4. **TF Wave Analysis (`navbarMenu`):**
- * **"Overview" Tab:**
- * Explanatory text, overview image (`tfwaveanal.png`).
- * Clickable images (`waveanalysis/c1.jpg` to `c6.jpg`, linked via `c1_link` etc.) for navigation to detail tabs.
- * Search input (`search_input_wave`), `DTOutput("table_wave")`.
- * **Individual Wave Tabs ("Wave 1" to "Wave 7"):**
- * Each tab displays the wave image, a GO KEGG result image, and "Ranked Text" image(s) from `www/waveanalysis/` and `www/waveanalysis/txtJPG/`.
-
- 5. **TF Network Analysis (`navbarMenu`):**
- * **"Search TF-TF correlation in TRM/TEXterm" Tab:**
- * Methodology description, image (`networkanalysis/tfcorrdesc.png`).
- * `sidebarLayout` with search input (`search`), button (`search_btn`), `tableOutput("gene_list_table")` for available TFs.
- * `mainPanel` with `tableOutput("result_table")`, legend, and `uiOutput("image_gallery")`.
- * Footer with citations.
- * **"TRM/TEXterm TF communities" Tab:**
- * Descriptive text, images (`networkanalysis/community.jpg`, `networkanalysis/trmtexcom.png`, `networkanalysis/tfcompathway.png`).
- * Two `DTOutput`s (`trmcom`, `texcom`) for community tables.
- * Footer with citations.
-
- 6. **Multi-omics Data Tab:**
- * Header, text, `dataTableOutput("multiomicsdatatable")`.
-
- 7. **Global Header Elements:**
- * Defines a modal dialog and associated JavaScript (triggered by an element `#csdescrip_link`, not explicitly found in the provided UI snippets for the main content area).
- * JavaScript to send a Shiny input upon `#c1_link` click.
-
- **Libraries Used:** `shiny`, `shinythemes`, `DT`.
-
- ## General Architecture and Observations
-
- * **Purpose:** The application serves as an interactive data exploration tool, likely accompanying a scientific publication on T cell biology.
- * **Data Source:** Heavily reliant on pre-processed data stored in Excel files and pre-generated images within the `www/` directory. This indicates that the core data processing happens outside this Shiny app.
- * **Repetitive Code Structure:** Significant code duplication exists in both `server.R` and `ui.R`.
- * In `server.R`, the logic for loading, filtering, paginating, and rendering tables for the nine different cell state TF scores is nearly identical.
- * In `ui.R`, the layout for each of these cell state specific tabs, and also for each of the seven individual TF wave analysis tabs, is highly repetitive.
- * This repetition suggests a strong opportunity for refactoring by creating reusable R functions or Shiny modules to generate these UI and server components dynamically.
- * **User Interface (UI):** The UI is well-structured with a `navbarPage` and logical tab groupings. It provides good contextual information (descriptions, explanations of scores/plots) for users.
- * **Interactivity:**
- * Search functionality for TFs/regulators across various datasets.
- * Custom column-based pagination for wide tables.
- * Clickable images and links for navigation between sections.
- * Dynamic display of tables and images based on user selections.
- * **Modularity (Potential):** While not heavily modularized currently due to repetition, the distinct analytical sections (TF Catalog, Wave Analysis, Network Analysis) could be prime candidates for separation into modules if the application were to be expanded or refactored.
- * **Static Content:** A significant portion of the content, especially in the Wave Analysis and Network Analysis tabs, involves displaying pre-generated static images (plots, pathway results).
- * **Code Graveyard:** Both files end with a "CODE GRAVEYARD" comment, indicating that there's older, unused code present.
-
- ## Potential Areas for Improvement/Refactoring
-
- * **Modularization:** Encapsulate the repetitive UI and server logic for cell-state specific tables and individual wave pages into functions or Shiny modules to reduce code duplication and improve maintainability.
- * **Dynamic Image Generation (Optional):** If source data and plotting scripts were available, some images currently served statically could potentially be generated dynamically, offering more flexibility. However, for a publication companion app, static images are often sufficient and ensure reproducibility of figures.
- * **Consolidate Helper Functions:** General utility functions (like `new_read_excel_file` and `new_filter_data`) are well-defined but ensure they are used consistently.
- * **CSS Styling:** Centralize CSS styling rather than relying heavily on inline `style` attributes within `tags$div` and other elements, potentially using a separate CSS file.
- * **Modal Trigger:** Clarify or ensure the `#csdescrip_link` element, which triggers the global modal, is present and functional in the UI.
-
- This analysis provides a snapshot of the codebase's structure, functionality, and potential areas for future development or refinement.
database_schema.sql ADDED
@@ -0,0 +1,72 @@
+ -- Supabase Database Schema for TaijiChat
+ -- Execute this SQL in your Supabase project to create the required tables
+
+ -- Users table
+ -- Stores user information and token quota
+ CREATE TABLE IF NOT EXISTS users (
+   id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+   hf_user_id TEXT UNIQUE NOT NULL,
+   hf_username TEXT NOT NULL,
+   email TEXT,
+   token_quota INTEGER DEFAULT 100000,
+   tokens_used INTEGER DEFAULT 0,
+   created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
+   last_login TIMESTAMP WITH TIME ZONE,
+   is_active BOOLEAN DEFAULT TRUE
+ );
+
+ -- Usage logs table
+ -- Stores comprehensive logs of every query with token usage and errors
+ CREATE TABLE IF NOT EXISTS usage_logs (
+   id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+   user_id UUID REFERENCES users(id) ON DELETE SET NULL,
+   hf_user_id TEXT NOT NULL,
+   timestamp TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
+   query_text TEXT NOT NULL,
+   prompt_tokens INTEGER DEFAULT 0,
+   completion_tokens INTEGER DEFAULT 0,
+   total_tokens INTEGER DEFAULT 0,
+   model TEXT,
+   response_text TEXT,
+   error_message TEXT,
+   conversation_history JSONB,
+   is_image_response BOOLEAN DEFAULT FALSE,
+   image_path TEXT
+ );
+
+ -- Create indexes for performance
+ CREATE INDEX IF NOT EXISTS idx_users_hf_id ON users(hf_user_id);
+ CREATE INDEX IF NOT EXISTS idx_users_active ON users(is_active);
+ CREATE INDEX IF NOT EXISTS idx_logs_user_id ON usage_logs(user_id);
+ CREATE INDEX IF NOT EXISTS idx_logs_hf_user_id ON usage_logs(hf_user_id);
+ CREATE INDEX IF NOT EXISTS idx_logs_timestamp ON usage_logs(timestamp DESC);
+ CREATE INDEX IF NOT EXISTS idx_logs_error ON usage_logs(error_message) WHERE error_message IS NOT NULL;
+
+ -- Create a view for user statistics
+ CREATE OR REPLACE VIEW user_stats AS
+ SELECT
+   u.id,
+   u.hf_user_id,
+   u.hf_username,
+   u.token_quota,
+   u.tokens_used,
+   u.token_quota - u.tokens_used AS tokens_remaining,
+   ROUND(100.0 * u.tokens_used / NULLIF(u.token_quota, 0), 2) AS usage_percentage,
+   COUNT(l.id) AS total_queries,
+   COUNT(CASE WHEN l.error_message IS NOT NULL THEN 1 END) AS error_count,
+   MAX(l.timestamp) AS last_query_time
+ FROM users u
+ LEFT JOIN usage_logs l ON u.id = l.user_id
+ GROUP BY u.id, u.hf_user_id, u.hf_username, u.token_quota, u.tokens_used;
+
+ -- Enable Row Level Security (RLS) - Optional, uncomment if needed
+ -- ALTER TABLE users ENABLE ROW LEVEL SECURITY;
+ -- ALTER TABLE usage_logs ENABLE ROW LEVEL SECURITY;
+
+ -- Create policies for RLS (if needed)
+ -- CREATE POLICY "Users can view own data" ON users FOR SELECT USING (hf_user_id = auth.jwt() ->> 'sub');
+ -- CREATE POLICY "Users can view own logs" ON usage_logs FOR SELECT USING (hf_user_id = auth.jwt() ->> 'sub');
+
+ COMMENT ON TABLE users IS 'Stores user authentication and token quota information';
+ COMMENT ON TABLE usage_logs IS 'Logs every query with token usage, response, and errors';
+ COMMENT ON VIEW user_stats IS 'Provides aggregated statistics for each user';
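The quota arithmetic in the `user_stats` view (including the `NULLIF` guard against division by zero) has an application-side mirror wherever quota is enforced before a query runs. A minimal sketch of that math (function name `quota_summary` is illustrative, not from the repo):

```python
def quota_summary(token_quota, tokens_used):
    """Mirror the user_stats view: remaining tokens, usage percentage, and a
    has_quota flag suitable for pre-query enforcement."""
    remaining = token_quota - tokens_used
    # NULLIF(token_quota, 0) in SQL -> None here when the quota is zero
    pct = round(100.0 * tokens_used / token_quota, 2) if token_quota else None
    return {
        'tokens_remaining': remaining,
        'usage_percentage': pct,
        'has_quota': remaining > 0,
    }

print(quota_summary(100000, 25000))
# {'tokens_remaining': 75000, 'usage_percentage': 25.0, 'has_quota': False or True?}
# -> {'tokens_remaining': 75000, 'usage_percentage': 25.0, 'has_quota': True}
```

With the 100k default quota, a user is cut off exactly when `tokens_used` reaches `token_quota`, matching a `tokens_remaining > 0` check against the view.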
plan_temp.txt DELETED
@@ -1,30 +0,0 @@
- I don't think that's reasonable. Here's my plan; compare the current agents against it and correct the current implementation to align with my plan:
-
- For every query, the generation agent goes through these steps:
- If a dataset, an image, or a paper is provided, add them when creating the chat completion. If not, proceed to step 1.
-
- 1. analyze the query
- 2. analyze the conversation history if there's any
- 3. analyze images, paper, data according to the plan if any are provided with the chat completion
- 4. analyze the error from the previous attempt if there's any
- 5. read the short version of the paper description to understand what the paper is about
- 6. decide whether the user query can be answered directly or needs more information from the paper; if so, read it
- 7. read the tools documentation
- 8. decide which tools can be helpful when answering the query; if there are any, prepare the list of tools to be used
- 9. read the data documentation
- 10. decide which datasets are relevant to the user query; if there are any, prepare the list of datasets to be used
- 11. decide whether the user query can be solved by the paper, tools, data, or a combination of them; if not, prepare a signal NEED_CODING = TRUE but don't send it yet; otherwise move to the next step
- 12. decide whether the user query is about image(s); if so, prepare a list of images needed
- 13. put everything together to make a plan
- - this process of thinking must be included in the generation agent's LLM output. it will be used to
-
- The supervisor agent reviews the plan, focusing on the code, and checks for suspicious or malicious behavior. Only common package imports are allowed.
-
- The executor agent executes the plan if the plan contains tool execution or code.
-
- The manager records everything from all LLMs and users, and deems whether the user's query can be considered answered. Note that if agents only propose a plan but the results are not gathered yet, it cannot be considered a proper answer - as in most cases where the generation agent proposes a plan in iteration 1. If the manager agent deems that a plan is proposed but results are not collected / the plan not executed, and there's no error from the LLM, then the manager agent tells the generation agent to initialize a different chat completion with the images and datasets requested by the generation agent's plan. This attempt instructed by the manager differs from a normal attempt: it does not count toward the allowed attempt count.
-
- If an error occurs at any stage, it must be reported to the manager, which will record all errors. Once an error is detected, another attempt starts and we go back to the generation agent step. Three attempts are allowed.
-
- Tell me whether you think my plan is clear and reasonable, and whether any part is missing or problematic.
- If not, proceed to implementation.
requirements.txt CHANGED
@@ -15,7 +15,10 @@ feedparser
  tqdm
  pydantic
  pillow
  # shinyjs # This is an R package, should be installed via install.packages() in R

  # R package dependencies (ensure these are installed in your R environment)
- # digest # Used for caching in R/caching.R

  tqdm
  pydantic
  pillow
+ supabase>=2.0.0
+ python-dotenv>=1.0.0
  # shinyjs # This is an R package, should be installed via install.packages() in R

  # R package dependencies (ensure these are installed in your R environment)
+ # digest # Used for caching in R/caching.R
+ # httr2 # Required for OAuth authentication
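The commit message promises "graceful degradation when Supabase not configured", which the new `supabase` dependency has to support: when the Space lacks `SUPABASE_URL`/`SUPABASE_KEY` secrets, initialization should return `None` rather than raise. A sketch of that pattern (the function name mirrors the R wrapper's `initialize_supabase`; the deferred import is my own choice, not necessarily how the repo does it):

```python
import os

def initialize_supabase():
    """Return a Supabase client, or None when the Space is not configured for it."""
    url = os.environ.get("SUPABASE_URL", "")
    key = os.environ.get("SUPABASE_KEY", "")
    if not url or not key:
        # Graceful degradation: auth, quota, and logging features switch off,
        # and the rest of the app keeps working.
        print("Supabase not configured; running without usage logging.")
        return None
    from supabase import create_client  # deferred so the dependency stays optional
    return create_client(url, key)
```

Downstream code then guards every database call with `if supabase_client is not None: ...`, which is exactly what the R side does with its `!is.null(supabase_client)` checks.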
server.R CHANGED
@@ -7,12 +7,19 @@ library(dplyr)
  # Source the warning overlay and long operations code
  source("warning_overlay.R", local = TRUE)
  source("long_operations.R", local = TRUE)

  # setwd("/Users/audrey/Downloads/research/ckweb/Tcellstates")

  # Define server logic
  function(input, output, session) {
-
  # --- START: TaijiChat R Callback for Python Agent Thoughts ---
  python_agent_thought_callback <- function(thought_message_from_python) {
  # Attempt to explicitly convert to R character and clean up
@@ -112,9 +119,14 @@ if 'agents.manager_agent' in sys.modules:
  # Module is available, now try to instantiate the agent
  if (!is.null(py_openai_client_instance)) {
  tryCatch({
  agent_inst <- current_manager_agent_module$ManagerAgent(
  openai_client = py_openai_client_instance,
- r_callback_fn = python_agent_thought_callback # Pass the R callback here
  )
  rv_agent_instance(agent_inst)
  print("TaijiChat: Python ManagerAgent instance created in server.R using pre-initialized client and R callback.")
@@ -124,9 +136,14 @@ if 'agents.manager_agent' in sys.modules:
  })
  } else if (!is.null(api_key_val)) { # Try with API key if client object failed but key exists
  tryCatch({
  agent_inst <- current_manager_agent_module$ManagerAgent(
  openai_api_key = api_key_val,
- r_callback_fn = python_agent_thought_callback # Pass the R callback here
  )
  rv_agent_instance(agent_inst)
  print("TaijiChat: Python ManagerAgent instance created in server.R with API key and R callback (client to be init by Python).")
@@ -145,7 +162,13 @@ if 'agents.manager_agent' in sys.modules:
  print("TaijiChat: agents.manager_agent module is NULL after import attempt. Agent not created.")
  }
  # --- END: TaijiChat Agent Initialization ---
-
  # Server logic for home tab
  output$home <- renderText({
  "Welcome to the Home page"
@@ -2078,16 +2101,41 @@ if 'agents.manager_agent' in sys.modules:
  chat_history <- reactiveVal(list()) # Stores list of lists: list(role="user/assistant", content="message")

  observeEvent(input$user_chat_message, {
- req(input$user_chat_message)
  user_message_text <- trimws(input$user_chat_message)
  print(paste("TaijiChat: Received user_chat_message -", user_message_text))

  if (nzchar(user_message_text)) {
  current_hist <- chat_history()
  updated_hist_user <- append(current_hist, list(list(role = "user", content = user_message_text)))
  chat_history(updated_hist_user)

- agent_instance_val <- rv_agent_instance()

  if (!is.null(agent_instance_val)) {
  # Ensure history is a list of R named lists, then r_to_py will convert to list of Python dicts
 
  # Source the warning overlay and long operations code
  source("warning_overlay.R", local = TRUE)
  source("long_operations.R", local = TRUE)
+ source("auth/hf_oauth.R", local = TRUE)
+ source("utils/supabase_r.R", local = TRUE)

  # setwd("/Users/audrey/Downloads/research/ckweb/Tcellstates")

  # Define server logic
  function(input, output, session) {
+
+ # --- START: OAuth and Supabase Initialization ---
+ oauth_config <- initialize_oauth()
+ supabase_client <- initialize_supabase()
+ # --- END: OAuth and Supabase Initialization ---
+
  # --- START: TaijiChat R Callback for Python Agent Thoughts ---
  python_agent_thought_callback <- function(thought_message_from_python) {
  # Attempt to explicitly convert to R character and clean up

  # Module is available, now try to instantiate the agent
  if (!is.null(py_openai_client_instance)) {
  tryCatch({
+ supabase_py_client <- if (!is.null(supabase_client)) supabase_client else NULL
+
  agent_inst <- current_manager_agent_module$ManagerAgent(
  openai_client = py_openai_client_instance,
+ r_callback_fn = python_agent_thought_callback,
+ supabase_client = supabase_py_client,
+ user_id = NULL,
+ hf_user_id = NULL
  )
  rv_agent_instance(agent_inst)
  print("TaijiChat: Python ManagerAgent instance created in server.R using pre-initialized client and R callback.")

  })
  } else if (!is.null(api_key_val)) { # Try with API key if client object failed but key exists
  tryCatch({
+ supabase_py_client <- if (!is.null(supabase_client)) supabase_client else NULL
+
  agent_inst <- current_manager_agent_module$ManagerAgent(
  openai_api_key = api_key_val,
+ r_callback_fn = python_agent_thought_callback,
+ supabase_client = supabase_py_client,
+ user_id = NULL,
+ hf_user_id = NULL
  )
  rv_agent_instance(agent_inst)
  print("TaijiChat: Python ManagerAgent instance created in server.R with API key and R callback (client to be init by Python).")

  print("TaijiChat: agents.manager_agent module is NULL after import attempt. Agent not created.")
  }
  # --- END: TaijiChat Agent Initialization ---
+
+ # --- START: OAuth Callback Handler ---
+ # Note: OAuth flow will be fully implemented when ui.R login UI is added
+ # This handler processes OAuth callback and creates/retrieves user in Supabase
+ # For now, this is a placeholder for future OAuth integration
+ # --- END: OAuth Callback Handler ---
+
  # Server logic for home tab
  output$home <- renderText({
  "Welcome to the Home page"

  chat_history <- reactiveVal(list()) # Stores list of lists: list(role="user/assistant", content="message")

  observeEvent(input$user_chat_message, {
+ req(input$user_chat_message)
  user_message_text <- trimws(input$user_chat_message)
  print(paste("TaijiChat: Received user_chat_message -", user_message_text))

  if (nzchar(user_message_text)) {
+ # Check authentication (implement OAuth later in ui.R)
+ # For now, system works without auth, but logs as "anonymous"
+ current_user <- session$userData$hf_user
+ hf_user_id <- if (!is.null(current_user)) current_user$hf_user_id else "anonymous"
+
+ # Check quota before processing
+ if (!is.null(supabase_client) && !is.null(current_user)) {
+ quota_result <- check_user_quota(supabase_client, hf_user_id)
+ if (!quota_result$has_quota) {
+ session$sendCustomMessage(type = "agent_response", message = list(
+ text = paste("Token quota exceeded. Used:", quota_result$tokens_used, "Remaining: 0")
2120
+ ))
2121
+ return()
2122
+ }
2123
+ }
2124
+
2125
  current_hist <- chat_history()
2126
  updated_hist_user <- append(current_hist, list(list(role = "user", content = user_message_text)))
2127
  chat_history(updated_hist_user)
2128
 
2129
+ agent_instance_val <- rv_agent_instance()
2130
+
2131
+ # Set user context in agent
2132
+ if (!is.null(agent_instance_val) && !is.null(current_user)) {
2133
+ supabase_user <- session$userData$supabase_user
2134
+ agent_instance_val$set_user_context(
2135
+ user_id = supabase_user$id,
2136
+ hf_user_id = hf_user_id
2137
+ )
2138
+ }
2139
 
2140
  if (!is.null(agent_instance_val)) {
2141
  # Ensure history is a list of R named lists, then r_to_py will convert to list of Python dicts
tools/agent_tools.py CHANGED
@@ -1064,10 +1064,22 @@ def describe_image(file_id: str, api_key: str = None) -> str:
             max_tokens=1000,
             temperature=0.2  # Lower temperature for more accurate descriptions
         )
-
+
+        # Capture token usage
+        if hasattr(response, 'usage') and response.usage:
+            usage_info = {
+                'prompt_tokens': response.usage.prompt_tokens,
+                'completion_tokens': response.usage.completion_tokens,
+                'total_tokens': response.usage.total_tokens
+            }
+            # Store usage in global collector if available (set by ExecutorAgent)
+            import builtins
+            if hasattr(builtins, '__agent_usage_collector__'):
+                builtins.__agent_usage_collector__.append(usage_info)
+
         # Extract the description from the response
         description = response.choices[0].message.content
-
+
         return description
 
     except Exception as e:
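The `builtins`-based collector above lets `describe_image()` report usage without changing its return type. A minimal standalone sketch of that handshake (the `fake_vision_call` function and its token counts are illustrative stand-ins, not real API output):

```python
# Sketch of the install / append / aggregate / remove cycle around
# __agent_usage_collector__; fake_vision_call stands in for describe_image().
import builtins

def fake_vision_call():
    # Tool side: report usage only if a collector was installed by the caller.
    usage_info = {'prompt_tokens': 900, 'completion_tokens': 120, 'total_tokens': 1020}
    if hasattr(builtins, '__agent_usage_collector__'):
        builtins.__agent_usage_collector__.append(usage_info)
    return "a description"

# Caller side (the ExecutorAgent's role in this commit): install, run, aggregate, remove.
builtins.__agent_usage_collector__ = []
try:
    fake_vision_call()
    fake_vision_call()
    collected = builtins.__agent_usage_collector__
    total = sum(u['total_tokens'] for u in collected)
finally:
    # Always remove the global so unrelated calls are not collected.
    del builtins.__agent_usage_collector__

print(total)  # 2040
```

The `try`/`finally` matters: leaving the collector on `builtins` would silently mix usage from later, unrelated calls into the same list.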
ui.R CHANGED
@@ -1466,6 +1466,26 @@ ui <- navbarPage(
   # Adding header with modal and JS
   header = tags$div(
     chatSidebarUI(),
+
+    # Login overlay for authentication
+    tags$div(
+      id = "authOverlay",
+      style = "display: none; position: fixed; top: 0; left: 0; width: 100%; height: 100%;
+               background-color: rgba(0, 0, 0, 0.7); z-index: 9999; justify-content: center; align-items: center;",
+      tags$div(
+        style = "background-color: white; padding: 40px; border-radius: 10px; text-align: center; max-width: 400px;",
+        tags$h2("Welcome to TaijiChat"),
+        tags$p("Please sign in with your Hugging Face account to continue."),
+        tags$br(),
+        tags$a(
+          href = "/login/huggingface",
+          class = "btn btn-primary btn-lg",
+          style = "background-color: #ff9d00; border-color: #ff9d00;",
+          "Sign in with Hugging Face"
+        )
+      )
+    ),
+
     # Modal dialog to display the expanded image
     tags$div(
       id = "modalDialog",
@@ -1511,7 +1531,38 @@
       });
       "
     )
-  )
+  ),
+
+  # Authentication overlay control JavaScript
+  tags$script(
+    HTML(
+      "
+      // Check authentication state and show/hide overlay
+      Shiny.addCustomMessageHandler('auth_state', function(message) {
+        var overlay = document.getElementById('authOverlay');
+        if (message.authenticated) {
+          overlay.style.display = 'none';
+        } else {
+          overlay.style.display = 'flex';
+        }
+      });
+
+      // Handle OAuth callback
+      $(document).ready(function() {
+        var urlParams = new URLSearchParams(window.location.search);
+        if (urlParams.has('code')) {
+          var code = urlParams.get('code');
+          Shiny.setInputValue('oauth_code', code, {priority: 'event'});
+          // Clean URL
+          window.history.replaceState({}, document.title, window.location.pathname);
+        }
+      });
+      "
+    )
+  ),
+
+  # Auth state output (server will populate this)
+  uiOutput("auth_state_ui")
 )
 )
 
utils/__init__.py ADDED
@@ -0,0 +1,2 @@
+# utils/__init__.py
+# Utility modules for TaijiChat
utils/supabase_client.py ADDED
@@ -0,0 +1,259 @@
+# utils/supabase_client.py
+"""
+Supabase client for TaijiChat
+Handles user management, quota tracking, and usage logging
+"""
+
+import os
+from datetime import datetime
+from typing import Optional, Dict, List, Tuple
+from supabase import create_client, Client
+import json
+
+
+class SupabaseClient:
+    """Client for interacting with Supabase database"""
+
+    def __init__(self, supabase_url: Optional[str] = None, supabase_key: Optional[str] = None):
+        """
+        Initialize Supabase client
+
+        Args:
+            supabase_url: Supabase project URL (defaults to SUPABASE_URL env var)
+            supabase_key: Supabase service role key (defaults to SUPABASE_KEY env var)
+        """
+        self.supabase_url = supabase_url or os.getenv('SUPABASE_URL')
+        self.supabase_key = supabase_key or os.getenv('SUPABASE_KEY')
+
+        if not self.supabase_url or not self.supabase_key:
+            print("WARNING: Supabase credentials not configured. Logging will be disabled.")
+            self.client = None
+        else:
+            try:
+                self.client: Client = create_client(self.supabase_url, self.supabase_key)
+                print("SupabaseClient: Successfully initialized")
+            except Exception as e:
+                print(f"SupabaseClient: Failed to initialize - {e}")
+                self.client = None
+
+    def is_enabled(self) -> bool:
+        """Check if Supabase client is properly configured"""
+        return self.client is not None
+
+    def get_or_create_user(self, hf_user_id: str, hf_username: str, email: Optional[str] = None) -> Optional[Dict]:
+        """
+        Get existing user or create new user
+
+        Args:
+            hf_user_id: Hugging Face user ID
+            hf_username: Hugging Face username
+            email: User email (optional)
+
+        Returns:
+            User record dict or None if error
+        """
+        if not self.is_enabled():
+            return None
+
+        try:
+            # Check if user exists
+            response = self.client.table('users').select('*').eq('hf_user_id', hf_user_id).execute()
+
+            if response.data and len(response.data) > 0:
+                # User exists, update last_login
+                user = response.data[0]
+                self.client.table('users').update({
+                    'last_login': datetime.utcnow().isoformat()
+                }).eq('id', user['id']).execute()
+                print(f"SupabaseClient: User {hf_username} logged in")
+                return user
+            else:
+                # Create new user
+                new_user = {
+                    'hf_user_id': hf_user_id,
+                    'hf_username': hf_username,
+                    'email': email,
+                    'token_quota': 100000,  # Default quota
+                    'tokens_used': 0,
+                    'last_login': datetime.utcnow().isoformat(),
+                    'is_active': True
+                }
+                response = self.client.table('users').insert(new_user).execute()
+                if response.data:
+                    print(f"SupabaseClient: Created new user {hf_username}")
+                    return response.data[0]
+                else:
+                    print("SupabaseClient: Failed to create user - no data returned")
+                    return None
+        except Exception as e:
+            print(f"SupabaseClient: Error in get_or_create_user - {e}")
+            return None
+
+    def check_quota(self, hf_user_id: str) -> Tuple[bool, int, int]:
+        """
+        Check if user has tokens remaining in quota
+
+        Args:
+            hf_user_id: Hugging Face user ID
+
+        Returns:
+            Tuple of (has_quota: bool, tokens_remaining: int, tokens_used: int)
+        """
+        if not self.is_enabled():
+            return (True, 999999, 0)  # Allow unlimited if Supabase disabled
+
+        try:
+            response = self.client.table('users').select('token_quota, tokens_used').eq('hf_user_id', hf_user_id).execute()
+
+            if response.data and len(response.data) > 0:
+                user = response.data[0]
+                quota = user.get('token_quota', 100000)
+                used = user.get('tokens_used', 0)
+                remaining = quota - used
+                has_quota = remaining > 0
+                return (has_quota, remaining, used)
+            else:
+                print("SupabaseClient: User not found for quota check")
+                return (False, 0, 0)
+        except Exception as e:
+            print(f"SupabaseClient: Error checking quota - {e}")
+            return (True, 999999, 0)  # Fail open to allow usage if DB error
+
+    def update_token_usage(self, hf_user_id: str, tokens_to_add: int) -> bool:
+        """
+        Increment user's token usage
+
+        Args:
+            hf_user_id: Hugging Face user ID
+            tokens_to_add: Number of tokens to add to usage
+
+        Returns:
+            True if successful, False otherwise
+        """
+        if not self.is_enabled():
+            return True
+
+        try:
+            # Get current usage
+            response = self.client.table('users').select('id, tokens_used').eq('hf_user_id', hf_user_id).execute()
+
+            if response.data and len(response.data) > 0:
+                user = response.data[0]
+                new_usage = user.get('tokens_used', 0) + tokens_to_add
+
+                # Update usage
+                self.client.table('users').update({
+                    'tokens_used': new_usage
+                }).eq('id', user['id']).execute()
+
+                print(f"SupabaseClient: Updated token usage for user {hf_user_id} - added {tokens_to_add} tokens")
+                return True
+            else:
+                print("SupabaseClient: User not found for token update")
+                return False
+        except Exception as e:
+            print(f"SupabaseClient: Error updating token usage - {e}")
+            return False
+
+    def log_usage(self,
+                  hf_user_id: str,
+                  query_text: str,
+                  user_id: Optional[str] = None,
+                  prompt_tokens: int = 0,
+                  completion_tokens: int = 0,
+                  total_tokens: int = 0,
+                  model: Optional[str] = None,
+                  response_text: Optional[str] = None,
+                  error_message: Optional[str] = None,
+                  conversation_history: Optional[List[Dict]] = None,
+                  is_image_response: bool = False,
+                  image_path: Optional[str] = None) -> bool:
+        """
+        Log a query to usage_logs table
+
+        This is called IMMEDIATELY after getting a response from the agent
+        or when an error occurs.
+
+        Args:
+            hf_user_id: Hugging Face user ID (required)
+            query_text: User's query text (required)
+            user_id: UUID of user from users table (optional)
+            prompt_tokens: Number of prompt tokens used
+            completion_tokens: Number of completion tokens used
+            total_tokens: Total tokens used
+            model: Model name (e.g., "gpt-4o")
+            response_text: Assistant's response
+            error_message: Error message if query failed
+            conversation_history: Full conversation history as list of dicts
+            is_image_response: Whether response included an image
+            image_path: Path to image if applicable
+
+        Returns:
+            True if logged successfully, False otherwise
+        """
+        if not self.is_enabled():
+            print(f"SupabaseClient: Logging disabled, skipping log for query: {query_text[:50]}...")
+            return True
+
+        try:
+            log_entry = {
+                'hf_user_id': hf_user_id,
+                'user_id': user_id,
+                'query_text': query_text,
+                'prompt_tokens': prompt_tokens,
+                'completion_tokens': completion_tokens,
+                'total_tokens': total_tokens,
+                'model': model,
+                'response_text': response_text,
+                'error_message': error_message,
+                'conversation_history': json.dumps(conversation_history) if conversation_history else None,
+                'is_image_response': is_image_response,
+                'image_path': image_path
+            }
+
+            response = self.client.table('usage_logs').insert(log_entry).execute()
+
+            if response.data:
+                print(f"SupabaseClient: Logged usage - tokens: {total_tokens}, error: {error_message is not None}")
+                return True
+            else:
+                print("SupabaseClient: Failed to log usage - no data returned")
+                return False
+        except Exception as e:
+            print(f"SupabaseClient: Error logging usage - {e}")
+            return False
+
+    def get_user_stats(self, hf_user_id: str) -> Optional[Dict]:
+        """
+        Get user statistics from user_stats view
+
+        Args:
+            hf_user_id: Hugging Face user ID
+
+        Returns:
+            Dict with user stats or None if error
+        """
+        if not self.is_enabled():
+            return None
+
+        try:
+            response = self.client.table('user_stats').select('*').eq('hf_user_id', hf_user_id).execute()
+
+            if response.data and len(response.data) > 0:
+                return response.data[0]
+            else:
+                return None
+        except Exception as e:
+            print(f"SupabaseClient: Error getting user stats - {e}")
+            return None
+
+
+# Singleton instance for easy import
+_supabase_client_instance = None
+
+def get_supabase_client() -> SupabaseClient:
+    """Get singleton Supabase client instance"""
+    global _supabase_client_instance
+    if _supabase_client_instance is None:
+        _supabase_client_instance = SupabaseClient()
+    return _supabase_client_instance
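The quota arithmetic in `check_quota()` above is independent of the database round-trip, so it can be sketched with an in-memory user record in place of the Supabase `users` table (the sample token counts are illustrative):

```python
# Pure-Python sketch of check_quota()'s (has_quota, remaining, used) logic,
# applied to a dict shaped like a row of the 'users' table.
def check_quota_local(user: dict) -> tuple:
    quota = user.get('token_quota', 100000)   # default quota from this commit
    used = user.get('tokens_used', 0)
    remaining = quota - used
    return (remaining > 0, remaining, used)

# A user near the 100k default quota still passes...
ok, remaining, used = check_quota_local({'token_quota': 100000, 'tokens_used': 99500})
# ...while a user at exactly the quota is blocked.
exhausted, rem2, _ = check_quota_local({'token_quota': 100000, 'tokens_used': 100000})
```

Note the boundary: `remaining > 0` means a user with 0 tokens remaining is denied, but a user with 1 token remaining can still submit a query whose cost may overshoot the quota; the overshoot is only recorded by the subsequent `update_token_usage()` call.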
utils/supabase_r.R ADDED
@@ -0,0 +1,132 @@
+# utils/supabase_r.R
+# R interface to Python Supabase client
+
+library(reticulate)
+
+# Initialize Supabase client wrapper
+initialize_supabase <- function() {
+  tryCatch({
+    # Import Python Supabase client module
+    supabase_module <- reticulate::import("utils.supabase_client")
+    supabase_client <- supabase_module$get_supabase_client()
+
+    if (!is.null(supabase_client$is_enabled()) && supabase_client$is_enabled()) {
+      print("R Supabase: Successfully initialized Supabase client")
+      return(supabase_client)
+    } else {
+      warning("R Supabase: Supabase client not properly configured")
+      return(NULL)
+    }
+  }, error = function(e) {
+    warning(paste("R Supabase: Failed to initialize -", e$message))
+    return(NULL)
+  })
+}
+
+# Get or create user
+get_or_create_user <- function(supabase_client, hf_user_id, hf_username, email = NULL) {
+  if (is.null(supabase_client)) {
+    return(NULL)
+  }
+
+  tryCatch({
+    user <- supabase_client$get_or_create_user(
+      hf_user_id = hf_user_id,
+      hf_username = hf_username,
+      email = email
+    )
+    return(user)
+  }, error = function(e) {
+    warning(paste("R Supabase: Error in get_or_create_user -", e$message))
+    return(NULL)
+  })
+}
+
+# Check user quota
+check_user_quota <- function(supabase_client, hf_user_id) {
+  if (is.null(supabase_client)) {
+    # Return default values if Supabase disabled
+    return(list(
+      has_quota = TRUE,
+      tokens_remaining = 999999,
+      tokens_used = 0
+    ))
+  }
+
+  tryCatch({
+    # Call Python method which returns tuple (has_quota, remaining, used)
+    result <- supabase_client$check_quota(hf_user_id = hf_user_id)
+
+    # Convert Python tuple to R list (reticulate returns a length-3 list)
+    if (is.list(result) && length(result) == 3) {
+      return(list(
+        has_quota = result[[1]],
+        tokens_remaining = as.integer(result[[2]]),
+        tokens_used = as.integer(result[[3]])
+      ))
+    } else {
+      warning("R Supabase: Unexpected result format from check_quota")
+      return(list(has_quota = TRUE, tokens_remaining = 999999, tokens_used = 0))
+    }
+  }, error = function(e) {
+    warning(paste("R Supabase: Error checking quota -", e$message))
+    return(list(has_quota = TRUE, tokens_remaining = 999999, tokens_used = 0))
+  })
+}
+
+# Update token usage
+update_token_usage <- function(supabase_client, hf_user_id, tokens_to_add) {
+  if (is.null(supabase_client)) {
+    return(TRUE)
+  }
+
+  tryCatch({
+    result <- supabase_client$update_token_usage(
+      hf_user_id = hf_user_id,
+      tokens_to_add = as.integer(tokens_to_add)
+    )
+    return(result)
+  }, error = function(e) {
+    warning(paste("R Supabase: Error updating token usage -", e$message))
+    return(FALSE)
+  })
+}
+
+# Log usage (called from R if needed, but primarily handled in Python)
+log_usage_from_r <- function(supabase_client, hf_user_id, query_text,
+                             user_id = NULL, total_tokens = 0,
+                             response_text = NULL, error_message = NULL) {
+  if (is.null(supabase_client)) {
+    return(TRUE)
+  }
+
+  tryCatch({
+    result <- supabase_client$log_usage(
+      hf_user_id = hf_user_id,
+      query_text = query_text,
+      user_id = user_id,
+      total_tokens = as.integer(total_tokens),
+      response_text = response_text,
+      error_message = error_message
+    )
+    return(result)
+  }, error = function(e) {
+    warning(paste("R Supabase: Error logging usage -", e$message))
+    return(FALSE)
+  })
+}
+
+# Get user statistics
+get_user_stats <- function(supabase_client, hf_user_id) {
+  if (is.null(supabase_client)) {
+    return(NULL)
+  }
+
+  tryCatch({
+    stats <- supabase_client$get_user_stats(hf_user_id = hf_user_id)
+    return(stats)
+  }, error = function(e) {
+    warning(paste("R Supabase: Error getting user stats -", e$message))
+    return(NULL)
+  })
+}
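The per-query flow this commit wires together (check quota, process, aggregate usage, log immediately, then update the running total) can be sketched end-to-end with an in-memory stand-in for the Supabase client. `handle_query` and its token numbers are hypothetical; the real flow spans server.R and ManagerAgent:

```python
# In-memory sketch of the quota enforcement pipeline:
# check -> process -> aggregate -> log -> update usage.
class InMemoryQuota:
    def __init__(self, quota=100000):
        self.quota, self.used, self.logs = quota, 0, []

    def check_quota(self):
        remaining = self.quota - self.used
        return (remaining > 0, remaining, self.used)

    def log_usage(self, query, total_tokens):
        self.logs.append({'query': query, 'total_tokens': total_tokens})

    def update_token_usage(self, tokens):
        self.used += tokens

def handle_query(db, query):
    has_quota, _, _ = db.check_quota()            # 1. check before processing
    if not has_quota:
        return "Token quota exceeded."
    answer, tokens = f"answer to {query}", 1200   # 2-3. process + aggregate (stubbed)
    db.log_usage(query, tokens)                   # 4. log immediately after completion
    db.update_token_usage(tokens)                 # 5. update the running total
    return answer

db = InMemoryQuota(quota=2000)
first = handle_query(db, "q1")    # 1200 of 2000 consumed
second = handle_query(db, "q2")   # 800 remaining, still allowed; total now 2400
third = handle_query(db, "q3")    # quota exceeded, rejected before processing
```

Because the check happens before processing but the update happens after, the last accepted query can push usage past the quota; the scheme caps overshoot at one query's cost rather than enforcing a hard ceiling.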