
CMW Platform Agent API Endpoints Documentation

Date: 2025-10-10
Version: 1.0
Status: Production Ready

Overview

The CMW Platform Agent now exposes two REST API endpoints that allow external applications to interact with the agent programmatically. These endpoints support both single-turn and multi-turn conversations with session persistence.

Base URL

http://localhost:7860

Authentication

No authentication is required for these endpoints. All requests are processed with session isolation.

Endpoints

1. /ask - Final Answer Endpoint

Returns the complete assistant response after processing is finished.

Method: POST
Path: /gradio_api/call/ask
Content-Type: application/json

Request Format

{
  "data": ["Your question here", "username", "password", "base_url"],
  "session_hash": "optional-session-id"
}

Parameters

  • data[0] (string, required): The user's question
  • data[1] (string, optional): Username for Comindware Platform authentication
  • data[2] (string, optional): Password for Comindware Platform authentication
  • data[3] (string, optional): Base URL of the Comindware Platform (e.g., "https://your-platform.com")
  • session_hash (string, optional): Session identifier for multi-turn conversations

Response Format

Success Response:

{
  "event_id": "unique-event-id"
}

Final Result (via GET, delivered as a server-sent event):

event: complete
data: ["Complete assistant response"]

Example Usage

cURL:

# Submit question with authentication
curl -X POST http://localhost:7860/gradio_api/call/ask \
  -H "Content-Type: application/json" \
  -d '{"data": ["Hello, who are you?", "myuser", "mypass", "https://my-platform.com"]}'

# Get result (replace EVENT_ID with actual ID)
curl -N http://localhost:7860/gradio_api/call/ask/EVENT_ID

Python Client:

from gradio_client import Client

client = Client("http://localhost:7860/")
result = client.predict(
    question="Hello, who are you?",
    username="myuser",
    password="mypass", 
    base_url="https://my-platform.com",
    api_name="/ask"
)
print(result)

Using Environment Variables:

import os
from dotenv import load_dotenv
from gradio_client import Client

# Load from root .env file
load_dotenv()

client = Client("http://localhost:7860/")
result = client.predict(
    question="Hello, who are you?",
    username=os.getenv("CMW_LOGIN"),
    password=os.getenv("CMW_PASSWORD"), 
    base_url=os.getenv("CMW_BASE_URL"),
    api_name="/ask"
)
print(result)

2. /ask_stream - Streaming Endpoint

Returns incremental chunks of the assistant response as it's being generated.

Method: POST
Path: /gradio_api/call/ask_stream
Content-Type: application/json

Request Format

{
  "data": ["Your question here", "username", "password", "base_url"],
  "session_hash": "optional-session-id"
}

Parameters

  • data[0] (string, required): The user's question
  • data[1] (string, optional): Username for Comindware Platform authentication
  • data[2] (string, optional): Password for Comindware Platform authentication
  • data[3] (string, optional): Base URL of the Comindware Platform (e.g., "https://your-platform.com")
  • session_hash (string, optional): Session identifier for multi-turn conversations

Response Format

Success Response:

{
  "event_id": "unique-event-id"
}

Streaming Results (via GET):

event: generating
data: ["Hello"]

event: generating
data: ["Hello, w"]

event: generating
data: ["Hello, wo"]

event: generating
data: ["Hello, wor"]

event: generating
data: ["Hello, worl"]

event: generating
data: ["Hello, world!"]

event: complete
data: ["Hello, world!"]

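As the stream above shows, each generating event carries the cumulative response so far, not just the newly added characters. A client that wants only the incremental text can diff consecutive chunks; a minimal sketch (chunk values taken from the example stream above):

```python
def chunk_deltas(chunks):
    """Yield only the text appended by each cumulative chunk."""
    previous = ""
    for chunk in chunks:
        yield chunk[len(previous):]
        previous = chunk

stream = ["Hello", "Hello, w", "Hello, wo", "Hello, world!"]
print("".join(chunk_deltas(stream)))  # → Hello, world!
```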
Example Usage

cURL:

# Submit question
curl -X POST http://localhost:7860/gradio_api/call/ask_stream \
  -H "Content-Type: application/json" \
  -d '{"data": ["Stream this please"]}'

# Get streaming result (replace EVENT_ID with actual ID)
curl -N http://localhost:7860/gradio_api/call/ask_stream/EVENT_ID

Python Client:

from gradio_client import Client

client = Client("http://localhost:7860/")
job = client.submit(
    question="Stream this please",
    api_name="/ask_stream"
)

# Iterate through streaming chunks
for chunk in job:
    print(f"Chunk: {chunk}")

Session Management

Multi-turn Conversations

Both endpoints support session persistence using the session_hash parameter:

# First message in a session
client = Client("http://localhost:7860/")
result1 = client.predict(
    question="What is 2+2?",
    api_name="/ask",
    session_hash="my-session-123"
)

# Follow-up message in the same session
result2 = client.predict(
    question="What about 3+3?",
    api_name="/ask", 
    session_hash="my-session-123"
)

Session Behavior

  • With session_hash: Messages are part of the same conversation context
  • Without session_hash: Each request is treated as a new conversation
  • Session isolation: Different session hashes maintain separate conversation histories
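For raw HTTP clients, the rules above come down to including or omitting the top-level session_hash field in the request body. A small helper sketch (the function name is illustrative, not part of the API):

```python
def build_payload(question, session_hash=None):
    """Build the request body for the ask endpoints.

    Omitting session_hash starts a fresh conversation; reusing the same
    value continues the same conversation context.
    """
    payload = {"data": [question]}
    if session_hash is not None:
        payload["session_hash"] = session_hash
    return payload

print(build_payload("What about 3+3?", "my-session-123"))
```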

Error Handling

Common Error Responses

Connection Error:

{
  "error": "Connection refused"
}

Timeout Error:

{
  "error": "Request timeout"
}

Invalid Request:

{
  "error": "Invalid request format"
}
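Connection and timeout errors like these are often transient and worth retrying with backoff. A minimal retry wrapper, as a sketch (the attempt count and delays are illustrative, not part of the API):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on exceptions with simple exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage (assuming a gradio_client Client as in the examples above):
# result = with_retries(lambda: client.predict(question="Hello", api_name="/ask"))
```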

Error Event (Streaming)

For streaming endpoints, errors are returned as events:

event: error
data: ["Error message here"]
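A client consuming the raw stream can pair each event: line with its following data: line and treat error events specially. A minimal parser sketch (the function name is illustrative):

```python
import json

def parse_sse(text):
    """Parse an SSE body into (event, data) pairs; data lines are JSON arrays."""
    events = []
    current_event = "message"  # SSE default when no event: line is given
    for line in text.splitlines():
        if line.startswith("event:"):
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            events.append((current_event, json.loads(line[len("data:"):])))
    return events

body = 'event: generating\ndata: ["Hel"]\n\nevent: error\ndata: ["Error message here"]\n'
for event, data in parse_sse(body):
    if event == "error":
        print(f"agent error: {data[0]}")
```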

Rate Limiting

  • Concurrent requests: Limited by Gradio's queue system (default: 1)
  • Rate limits: No explicit rate limiting implemented
  • Queue timeout: 30 seconds per request

Response Times

  • Final endpoint (/ask): 2-10 seconds depending on complexity
  • Streaming endpoint (/ask_stream): First chunk within 1-2 seconds, then incremental updates

Best Practices

1. Use Appropriate Endpoint

  • Use /ask for simple queries where you need the complete response
  • Use /ask_stream for better user experience with real-time feedback

2. Session Management

  • Always use session_hash for multi-turn conversations
  • Generate unique session IDs for different users/conversations
  • Reuse session_hash within the same conversation thread
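One simple way to generate the unique session IDs recommended above is a UUID, optionally namespaced per user; a sketch using Python's standard library:

```python
import uuid

def new_session_hash(user_id):
    """Create a unique session hash, namespaced by user for readability."""
    return f"{user_id}-{uuid.uuid4()}"

session_hash = new_session_hash("myuser")
# Reuse this value for every request in the same conversation thread.
```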

3. Error Handling

try:
    result = client.predict(question="Hello", api_name="/ask")
    print(result)
except Exception as e:
    print(f"Error: {e}")
    # Handle error appropriately

4. Streaming Best Practices

# For streaming, always iterate through chunks
job = client.submit(question="Stream this", api_name="/ask_stream")
for chunk in job:
    # Process each chunk
    print(f"Received: {chunk}")

Integration Examples

JavaScript/Node.js

const fetch = require('node-fetch');

// Final endpoint
async function askQuestion(question) {
    const response = await fetch('http://localhost:7860/gradio_api/call/ask', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ data: [question] })
    });

    const { event_id } = await response.json();

    // The result endpoint streams server-sent events; parse the last data line
    const resultResponse = await fetch(`http://localhost:7860/gradio_api/call/ask/${event_id}`);
    const text = await resultResponse.text();
    const dataLines = text.split('\n').filter(line => line.startsWith('data:'));
    const result = JSON.parse(dataLines[dataLines.length - 1].slice('data:'.length));

    return result[0];
}

Python with requests

import json
import requests

def ask_question(question, session_hash=None):
    # Submit question
    payload = {"data": [question]}
    if session_hash:
        payload["session_hash"] = session_hash

    response = requests.post(
        "http://localhost:7860/gradio_api/call/ask",
        headers={"Content-Type": "application/json"},
        json=payload,
        timeout=30,
    )
    event_id = response.json()["event_id"]

    # The result endpoint streams server-sent events; keep the last data line
    result_response = requests.get(
        f"http://localhost:7860/gradio_api/call/ask/{event_id}", timeout=60
    )
    data_lines = [
        line for line in result_response.text.splitlines()
        if line.startswith("data:")
    ]
    return json.loads(data_lines[-1][len("data:"):])[0]

Testing

Test Script

A test script is available at agent_ng/_tests/api_test.py:

# Run tests
python agent_ng/_tests/api_test.py

# With custom URL
BASE_URL=http://your-server:7860/ python agent_ng/_tests/api_test.py

# With session hash
SESSION_HASH=test-session-123 python agent_ng/_tests/api_test.py

Manual Testing

  1. Start the agent:

    python -m agent_ng.app_ng_modular
    
  2. Test final endpoint:

    curl -X POST http://localhost:7860/gradio_api/call/ask \
      -H "Content-Type: application/json" \
      -d '{"data": ["Hello"]}'
    
  3. Test streaming endpoint:

    curl -X POST http://localhost:7860/gradio_api/call/ask_stream \
      -H "Content-Type: application/json" \
      -d '{"data": ["Stream this"]}'
    

Troubleshooting

Common Issues

  1. "Application is initializing..."

    • Wait for the agent to fully initialize
    • Check logs for initialization errors
  2. Connection refused

    • Ensure the agent is running on the correct port
    • Check firewall settings
  3. Timeout errors

    • Increase timeout values
    • Check server performance
  4. Empty responses

    • Verify the question is not empty
    • Check agent configuration

Debug Mode

Enable debug logging by setting environment variables:

export GRADIO_DEBUG=1
export LOG_LEVEL=DEBUG
python -m agent_ng.app_ng_modular

Changelog

Version 1.0 (2025-10-10)

  • Initial release of API endpoints
  • Added /ask final answer endpoint
  • Added /ask_stream streaming endpoint
  • Implemented session management
  • Added comprehensive documentation

Support

For issues or questions regarding the API endpoints:

  1. Check this documentation
  2. Review the test script examples
  3. Check the agent logs for error details
  4. Verify the agent is running and accessible

Note: This documentation covers the API endpoints as implemented in the CMW Platform Agent. For Gradio-specific API details, refer to the official Gradio documentation.