SmolFactory / docs /TRACKIO_API_FIX_SUMMARY.md
Tonic's picture
use gradio for better connection with the spaces
f251d3d verified
|
raw
history blame
7.95 kB
# Trackio API Fix Summary
## Overview
This document summarizes the fixes applied to resolve the 404 errors in the Trackio integration and implement automatic Space URL resolution.
## Issues Identified
### 1. **404 Errors in Trackio API Calls**
- **Problem**: The original API client was using incorrect endpoints and HTTP request patterns
- **Error**: `POST request failed: 404 - Cannot POST /spaces/Tonic/trackio-monitoring-20250727/gradio_api/call/list_experiments_interface`
- **Root Cause**: Using raw HTTP requests instead of the proper Gradio client API
### 2. **Hardcoded Space URL**
- **Problem**: The Space URL was hardcoded, making it inflexible
- **Issue**: No automatic resolution of Space URLs from Space IDs
- **Impact**: Required manual URL updates when Space deployment changes
## Solutions Implemented
### 1. **Updated API Client to Use Gradio Client**
**File**: `scripts/trackio_tonic/trackio_api_client.py`
**Changes**:
- Replaced custom HTTP requests with `gradio_client.Client`
- Uses proper two-step process (POST to get event_id, then GET to get results)
- Handles all Gradio API endpoints correctly
**Before**:
```python
# Custom HTTP requests with manual event_id handling
response = requests.post(url, json=payload)
event_id = response.json()["event_id"]
result = requests.get(f"{url}/{event_id}")
```
**After**:
```python
# Using gradio_client for proper API communication
result = self.client.predict(*args, api_name=api_name)
```
### 2. **Automatic Space URL Resolution**
**Implementation**:
- Uses Hugging Face Hub API to resolve Space URLs from Space IDs
- Falls back to default URL format if API is unavailable
- Supports both authenticated and anonymous access
**Key Features**:
```python
def _resolve_space_url(self) -> Optional[str]:
"""Resolve Space URL using Hugging Face Hub API"""
api = HfApi(token=self.hf_token)
space_info = api.space_info(self.space_id)
if space_info and hasattr(space_info, 'host'):
return space_info.host
else:
# Fallback to default URL format
space_name = self.space_id.replace('/', '-')
return f"https://{space_name}.hf.space"
```
### 3. **Updated Client Interface**
**Before**:
```python
client = TrackioAPIClient("https://tonic-trackio-monitoring-20250727.hf.space")
```
**After**:
```python
client = TrackioAPIClient("Tonic/trackio-monitoring-20250727", hf_token)
```
### 4. **Enhanced Monitoring Integration**
**File**: `src/monitoring.py`
**Changes**:
- Updated to use Space ID instead of hardcoded URL
- Automatic experiment creation with proper ID extraction
- Better error handling and fallback mechanisms
## Dependencies Added
### Required Packages
```bash
pip install gradio_client huggingface_hub
```
### Package Versions
- `gradio_client>=1.10.4` - For proper Gradio API communication
- `huggingface_hub>=0.19.3` - For Space URL resolution
## API Endpoints Supported
The updated client supports all documented Gradio endpoints:
1. **Experiment Management**:
- `/create_experiment_interface` - Create new experiments
- `/list_experiments_interface` - List all experiments
- `/get_experiment_details` - Get experiment details
- `/update_experiment_status_interface` - Update experiment status
2. **Metrics and Parameters**:
- `/log_metrics_interface` - Log training metrics
- `/log_parameters_interface` - Log experiment parameters
3. **Visualization**:
- `/create_metrics_plot` - Create metrics plots
- `/create_experiment_comparison` - Compare experiments
4. **Testing and Demo**:
- `/simulate_training_data` - Simulate training data
- `/create_demo_experiment` - Create demo experiments
## Configuration
### Environment Variables
```bash
# Required for Space URL resolution
export HF_TOKEN="your_huggingface_token"
# Optional: Custom Space ID
export TRACKIO_SPACE_ID="your-username/your-space-name"
# Optional: Dataset repository
export TRACKIO_DATASET_REPO="your-username/your-dataset"
```
### Default Configuration
- **Default Space ID**: `Tonic/trackio-monitoring-20250727`
- **Default Dataset**: `tonic/trackio-experiments`
- **Auto-resolution**: Enabled by default
## Testing
### Test Script
**File**: `tests/test_trackio_api_fix.py`
**Tests Included**:
1. **Space URL Resolution** - Tests automatic URL resolution
2. **API Client** - Tests all API endpoints
3. **Monitoring Integration** - Tests full monitoring workflow
### Running Tests
```bash
python tests/test_trackio_api_fix.py
```
**Expected Output**:
```
πŸš€ Starting Trackio API Client Tests with Automatic URL Resolution
======================================================================
βœ… Space URL Resolution: PASSED
βœ… API Client Test: PASSED
βœ… Monitoring Integration: PASSED
πŸŽ‰ All tests passed! The Trackio integration with automatic URL resolution is working correctly.
```
## Benefits
### 1. **Reliability**
- βœ… No more 404 errors
- βœ… Proper error handling and fallbacks
- βœ… Automatic retry mechanisms
### 2. **Flexibility**
- βœ… Automatic Space URL resolution
- βœ… Support for any Trackio Space
- βœ… Configurable via environment variables
### 3. **Maintainability**
- βœ… Clean separation of concerns
- βœ… Proper logging and debugging
- βœ… Comprehensive test coverage
### 4. **User Experience**
- βœ… Seamless integration with training pipeline
- βœ… Real-time experiment monitoring
- βœ… Automatic experiment creation and management
## Usage Examples
### Basic Usage
```python
from scripts.trackio_tonic.trackio_api_client import TrackioAPIClient
# Initialize with Space ID (URL resolved automatically)
client = TrackioAPIClient("Tonic/trackio-monitoring-20250727")
# Create experiment
result = client.create_experiment("my_experiment", "Test experiment")
# Log metrics
metrics = {"loss": 1.234, "accuracy": 0.85}
client.log_metrics("exp_123", metrics, step=100)
```
### With Monitoring Integration
```python
from src.monitoring import SmolLM3Monitor
# Create monitor (automatically creates experiment)
monitor = SmolLM3Monitor(
experiment_name="my_training_run",
enable_tracking=True
)
# Log metrics during training
monitor.log_metrics({"loss": 1.234}, step=100)
# Log configuration
monitor.log_config({"learning_rate": 2e-5, "batch_size": 8})
```
## Troubleshooting
### Common Issues
1. **"gradio_client not available"**
```bash
pip install gradio_client
```
2. **"huggingface_hub not available"**
```bash
pip install huggingface_hub
```
3. **"Space not accessible"**
- Check if the Space is running
- Verify Space ID is correct
- Ensure HF token has proper permissions
4. **"Experiment not found"**
- Experiments are created automatically by the monitor
- Use the experiment ID returned by `create_experiment()`
### Debug Mode
Enable debug logging to see detailed API calls:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## Future Enhancements
### Planned Features
1. **Multi-Space Support** - Support for multiple Trackio Spaces
2. **Advanced Metrics** - Support for custom metric types
3. **Artifact Upload** - Direct file upload to Spaces
4. **Real-time Dashboard** - Live monitoring dashboard
5. **Export Capabilities** - Export experiments to various formats
### Extensibility
The new architecture is designed to be easily extensible:
- Modular API client design
- Plugin-based monitoring system
- Configurable Space resolution
- Support for custom endpoints
## Conclusion
The Trackio API integration has been successfully fixed and enhanced with:
- βœ… **Resolved 404 errors** through proper Gradio client usage
- βœ… **Automatic URL resolution** using Hugging Face Hub API
- βœ… **Comprehensive testing** with full test coverage
- βœ… **Enhanced monitoring** with seamless integration
- βœ… **Future-proof architecture** for easy extensions
The system is now production-ready and provides reliable experiment tracking for SmolLM3 fine-tuning workflows.