# CMW Platform Agent API Endpoints Documentation
**Date:** 2025-10-10
**Version:** 1.0
**Status:** Production Ready
## Overview
The CMW Platform Agent now exposes two REST API endpoints that allow external applications to interact with the agent programmatically. These endpoints support both single-turn and multi-turn conversations with session persistence.
## Base URL
```
http://localhost:7860
```
## Authentication
No authentication is required for these endpoints. All requests are processed with session isolation.
## Endpoints
### 1. `/ask` - Final Answer Endpoint
Returns the complete assistant response after processing is finished.
**Method:** `POST`
**Path:** `/gradio_api/call/ask`
**Content-Type:** `application/json`
#### Request Format
```json
{
  "data": ["Your question here", "username", "password", "base_url"],
  "session_hash": "optional-session-id"
}
```
#### Parameters
- `data[0]` (string, required): The user's question
- `data[1]` (string, optional): Username for Comindware Platform authentication
- `data[2]` (string, optional): Password for Comindware Platform authentication
- `data[3]` (string, optional): Base URL of the Comindware Platform (e.g., "https://your-platform.com")
- `session_hash` (string, optional): Session identifier for multi-turn conversations
#### Response Format
**Success Response:**
```json
{
  "event_id": "unique-event-id"
}
```
**Final Result (via GET):**
```json
{
  "data": ["Complete assistant response"]
}
```
#### Example Usage
**cURL:**
```bash
# Submit question with authentication
curl -X POST http://localhost:7860/gradio_api/call/ask \
  -H "Content-Type: application/json" \
  -d '{"data": ["Hello, who are you?", "myuser", "mypass", "https://my-platform.com"]}'

# Get result (replace EVENT_ID with actual ID)
curl -N http://localhost:7860/gradio_api/call/ask/EVENT_ID
```
**Python Client:**
```python
from gradio_client import Client

client = Client("http://localhost:7860/")
result = client.predict(
    question="Hello, who are you?",
    username="myuser",
    password="mypass",
    base_url="https://my-platform.com",
    api_name="/ask"
)
print(result)
```
**Using Environment Variables:**
```python
import os
from dotenv import load_dotenv
from gradio_client import Client

# Load credentials from the root .env file
load_dotenv()

client = Client("http://localhost:7860/")
result = client.predict(
    question="Hello, who are you?",
    username=os.getenv("CMW_LOGIN"),
    password=os.getenv("CMW_PASSWORD"),
    base_url=os.getenv("CMW_BASE_URL"),
    api_name="/ask"
)
print(result)
```
### 2. `/ask_stream` - Streaming Endpoint
Returns incremental chunks of the assistant response as it's being generated.
**Method:** `POST`
**Path:** `/gradio_api/call/ask_stream`
**Content-Type:** `application/json`
#### Request Format
```json
{
  "data": ["Your question here", "username", "password", "base_url"],
  "session_hash": "optional-session-id"
}
```
#### Parameters
- `data[0]` (string, required): The user's question
- `data[1]` (string, optional): Username for Comindware Platform authentication
- `data[2]` (string, optional): Password for Comindware Platform authentication
- `data[3]` (string, optional): Base URL of the Comindware Platform (e.g., "https://your-platform.com")
- `session_hash` (string, optional): Session identifier for multi-turn conversations
#### Response Format
**Success Response:**
```json
{
  "event_id": "unique-event-id"
}
```
**Streaming Results (via GET):**
```
event: generating
data: ["Hello"]
event: generating
data: ["Hello, w"]
event: generating
data: ["Hello, wo"]
event: generating
data: ["Hello, wor"]
event: generating
data: ["Hello, worl"]
event: generating
data: ["Hello, world!"]
event: complete
data: ["Hello, world!"]
```
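Client code has to decode these `data:` lines itself. A minimal sketch of such a parser, assuming the event format shown above (`parse_sse_events` is an illustrative helper, not part of the API):

```python
import json

def parse_sse_events(raw: str):
    """Turn raw SSE text into a list of (event_name, payload) pairs."""
    events = []
    current = None
    for line in raw.splitlines():
        if line.startswith("event:"):
            current = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Each data line carries a JSON array, e.g. ["Hello"]
            events.append((current, json.loads(line[len("data:"):].strip())))
    return events

sample = 'event: generating\ndata: ["Hello"]\nevent: complete\ndata: ["Hello, world!"]'
print(parse_sse_events(sample))
```

The last `complete` payload holds the full response, so a client that only wants the final answer can keep just that pair.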
#### Example Usage
**cURL:**
```bash
# Submit question
curl -X POST http://localhost:7860/gradio_api/call/ask_stream \
  -H "Content-Type: application/json" \
  -d '{"data": ["Stream this please"]}'

# Get streaming result (replace EVENT_ID with actual ID)
curl -N http://localhost:7860/gradio_api/call/ask_stream/EVENT_ID
```
**Python Client:**
```python
from gradio_client import Client

client = Client("http://localhost:7860/")
job = client.submit(
    question="Stream this please",
    api_name="/ask_stream"
)

# Iterate through streaming chunks
for chunk in job:
    print(f"Chunk: {chunk}")
```
## Session Management
### Multi-turn Conversations
Both endpoints support session persistence using the `session_hash` parameter:
```python
# First message in a session
client = Client("http://localhost:7860/")
result1 = client.predict(
    question="What is 2+2?",
    api_name="/ask",
    session_hash="my-session-123"
)

# Follow-up message in the same session
result2 = client.predict(
    question="What about 3+3?",
    api_name="/ask",
    session_hash="my-session-123"
)
```
### Session Behavior
- **With session_hash:** Messages are part of the same conversation context
- **Without session_hash:** Each request is treated as a new conversation
- **Session isolation:** Different session hashes maintain separate conversation histories
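To guarantee that different users never share a conversation, one option is to derive each session hash from a UUID. A small sketch (`new_session_hash` is an illustrative helper, not part of the API):

```python
import uuid

def new_session_hash(prefix: str = "user") -> str:
    """Generate a unique session identifier for one conversation thread."""
    return f"{prefix}-{uuid.uuid4().hex}"

print(new_session_hash())
```

Generate one hash when a conversation starts and reuse it for every follow-up request in that thread.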
## Error Handling
### Common Error Responses
**Connection Error:**
```json
{
  "error": "Connection refused"
}
```
**Timeout Error:**
```json
{
  "error": "Request timeout"
}
```
**Invalid Request:**
```json
{
  "error": "Invalid request format"
}
```
### Error Event (Streaming)
For streaming endpoints, errors are returned as events:
```
event: error
data: ["Error message here"]
```
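A client consuming the stream can translate an `error` event into an exception instead of silently printing it. A hedged sketch (`raise_on_error_event` is a hypothetical helper):

```python
def raise_on_error_event(event: str, payload):
    """Pass payloads through, but raise if the stream reports an error event."""
    if event == "error":
        raise RuntimeError(payload[0] if payload else "unknown streaming error")
    return payload

print(raise_on_error_event("generating", ["partial text"]))
```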
## Rate Limiting
- **Concurrent requests:** Limited by Gradio's queue system (default: 1)
- **Rate limits:** No explicit rate limiting implemented
- **Queue timeout:** 30 seconds per request
## Response Times
- **Final endpoint (`/ask`):** 2-10 seconds depending on complexity
- **Streaming endpoint (`/ask_stream`):** First chunk within 1-2 seconds, then incremental updates
## Best Practices
### 1. Use Appropriate Endpoint
- **Use `/ask`** for simple queries where you need the complete response
- **Use `/ask_stream`** for better user experience with real-time feedback
### 2. Session Management
- **Always use session_hash** for multi-turn conversations
- **Generate unique session IDs** for different users/conversations
- **Reuse session_hash** within the same conversation thread
### 3. Error Handling
```python
try:
    result = client.predict(question="Hello", api_name="/ask")
    print(result)
except Exception as e:
    print(f"Error: {e}")
    # Handle the error appropriately
```
### 4. Streaming Best Practices
```python
# For streaming, always iterate through chunks
job = client.submit(question="Stream this", api_name="/ask_stream")
for chunk in job:
    # Process each chunk as it arrives
    print(f"Received: {chunk}")
```
## Integration Examples
### JavaScript/Node.js
```javascript
const fetch = require('node-fetch');

// Final endpoint
async function askQuestion(question) {
  const response = await fetch('http://localhost:7860/gradio_api/call/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ data: [question] })
  });
  const { event_id } = await response.json();

  // Get result
  const resultResponse = await fetch(`http://localhost:7860/gradio_api/call/ask/${event_id}`);
  const result = await resultResponse.json();
  return result.data[0];
}
```
### Python with requests
```python
import json
import requests

def ask_question(question, session_hash=None):
    # Submit the question
    payload = {"data": [question]}
    if session_hash:
        payload["session_hash"] = session_hash
    response = requests.post(
        "http://localhost:7860/gradio_api/call/ask",
        headers={"Content-Type": "application/json"},
        json=payload
    )
    event_id = response.json()["event_id"]

    # The GET endpoint returns SSE lines; keep the last data payload
    result = None
    result_response = requests.get(f"http://localhost:7860/gradio_api/call/ask/{event_id}")
    for line in result_response.text.splitlines():
        if line.startswith("data:"):
            result = json.loads(line[len("data:"):].strip())
    return result[0] if result else None
```
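A streaming counterpart with `requests` might look like the sketch below. `ask_question_stream` and `build_payload` are illustrative names, and the line handling assumes the SSE format shown in the streaming endpoint section:

```python
import json
import requests

API_ROOT = "http://localhost:7860/gradio_api/call"

def build_payload(question, username="", password="", base_url="", session_hash=None):
    """Assemble the positional data array the endpoints expect."""
    payload = {"data": [question, username, password, base_url]}
    if session_hash:
        payload["session_hash"] = session_hash
    return payload

def ask_question_stream(question, session_hash=None):
    """Submit a question, then yield each text chunk from the SSE stream."""
    resp = requests.post(
        f"{API_ROOT}/ask_stream",
        json=build_payload(question, session_hash=session_hash),
    )
    event_id = resp.json()["event_id"]
    with requests.get(f"{API_ROOT}/ask_stream/{event_id}", stream=True) as r:
        for line in r.iter_lines(decode_unicode=True):
            if line and line.startswith("data:"):
                yield json.loads(line[len("data:"):].strip())[0]
```

Because the generator yields cumulative chunks, a caller that only wants the final answer can keep the last value it receives.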
## Testing
### Test Script
A test script is available at `agent_ng/_tests/api_test.py`:
```bash
# Run tests
python agent_ng/_tests/api_test.py
# With custom URL
BASE_URL=http://your-server:7860/ python agent_ng/_tests/api_test.py
# With session hash
SESSION_HASH=test-session-123 python agent_ng/_tests/api_test.py
```
### Manual Testing
1. **Start the agent:**
```bash
python -m agent_ng.app_ng_modular
```
2. **Test final endpoint:**
```bash
curl -X POST http://localhost:7860/gradio_api/call/ask \
  -H "Content-Type: application/json" \
  -d '{"data": ["Hello"]}'
```
3. **Test streaming endpoint:**
```bash
curl -X POST http://localhost:7860/gradio_api/call/ask_stream \
  -H "Content-Type: application/json" \
  -d '{"data": ["Stream this"]}'
```
## Troubleshooting
### Common Issues
1. **"Application is initializing..."**
- Wait for the agent to fully initialize
- Check logs for initialization errors
2. **Connection refused**
- Ensure the agent is running on the correct port
- Check firewall settings
3. **Timeout errors**
- Increase timeout values
- Check server performance
4. **Empty responses**
- Verify the question is not empty
- Check agent configuration
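For the "Application is initializing..." case, a client can poll the server before sending its first request. A minimal sketch (`wait_until_ready` is a hypothetical helper, and checking for HTTP 200 on the root page is an assumption about how the Gradio app responds once ready):

```python
import time
import requests

def wait_until_ready(url="http://localhost:7860", timeout=60, interval=2):
    """Poll the server root until it responds with HTTP 200, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass
        time.sleep(interval)
    return False
```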
### Debug Mode
Enable debug logging by setting environment variables:
```bash
export GRADIO_DEBUG=1
export LOG_LEVEL=DEBUG
python -m agent_ng.app_ng_modular
```
## Changelog
### Version 1.0 (2025-10-10)
- Initial release of API endpoints
- Added `/ask` final answer endpoint
- Added `/ask_stream` streaming endpoint
- Implemented session management
- Added comprehensive documentation
## Support
For issues or questions regarding the API endpoints:
1. Check this documentation
2. Review the test script examples
3. Check the agent logs for error details
4. Verify the agent is running and accessible
---
**Note:** This documentation covers the API endpoints as implemented in the CMW Platform Agent. For Gradio-specific API details, refer to the [official Gradio documentation](https://www.gradio.app/guides/querying-gradio-apps-with-curl).