API Documentation - HuggingFace Edition

Overview

PolicyWise provides a RESTful API for corporate policy question-answering using HuggingFace free-tier services. All endpoints return JSON responses and support CORS for web integration.

Base URL

  • Local Development: http://localhost:5000
  • HuggingFace Spaces: https://your-username-policywise-rag.hf.space

Authentication

No authentication is required for the public deployment. For production use, consider adding API key authentication.
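
As a rough illustration of the API key suggestion above, the check below is framework-independent and could be called from whatever request hook the server uses. The header name `X-API-Key`, the environment variable `POLICYWISE_API_KEYS`, and both function names are assumptions made for this sketch; PolicyWise does not define them.

```python
import hmac
import os


def load_keys(env_var="POLICYWISE_API_KEYS"):
    """Read a comma-separated key list from the environment (assumed variable name)."""
    raw = os.environ.get(env_var, "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorized(headers, valid_keys):
    """Return True if the request headers carry one of the valid API keys.

    `headers` is any dict-like mapping of request headers. `X-API-Key` is a
    hypothetical header name, not part of the documented API.
    """
    supplied = headers.get("X-API-Key", "")
    if not supplied:
        return False
    # compare_digest compares in constant time, so response timing does not
    # reveal how many leading characters of a key were guessed correctly.
    return any(
        hmac.compare_digest(supplied.encode(), key.encode())
        for key in valid_keys
    )
```

A request failing this check would typically be rejected with a 401 response before it reaches the endpoint handler.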

Core Endpoints

Chat Endpoint (Primary Interface)

POST /chat

Ask questions about company policies and receive intelligent responses with automatic source citations.

Request

POST /chat
Content-Type: application/json

{
  "message": "What is the remote work policy for new employees?",
  "max_tokens": 500,
  "include_sources": true,
  "guardrails_level": "standard"
}

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| message | string | Yes | - | User question about company policies |
| max_tokens | integer | No | 500 | Maximum response length (100-1000) |
| include_sources | boolean | No | true | Include source document details |
| guardrails_level | string | No | "standard" | Safety level: "strict", "standard", "relaxed" |
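
A client can catch out-of-range values before sending a request. The helper below is a minimal sketch that applies the ranges from the parameter table, plus the 5000-character message limit quoted in the error response example; the function name `build_chat_payload` is hypothetical.

```python
ALLOWED_GUARDRAILS = {"strict", "standard", "relaxed"}
MAX_MESSAGE_CHARS = 5000  # limit quoted in the MESSAGE_TOO_LONG error example


def build_chat_payload(message, max_tokens=500, include_sources=True,
                       guardrails_level="standard"):
    """Validate arguments against the documented ranges and return the request body."""
    if not isinstance(message, str) or not message.strip():
        raise ValueError("message must be a non-empty string")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    if not 100 <= max_tokens <= 1000:
        raise ValueError("max_tokens must be between 100 and 1000")
    if guardrails_level not in ALLOWED_GUARDRAILS:
        raise ValueError(
            f"guardrails_level must be one of {sorted(ALLOWED_GUARDRAILS)}"
        )
    return {
        "message": message,
        "max_tokens": max_tokens,
        "include_sources": include_sources,
        "guardrails_level": guardrails_level,
    }
```

The returned dictionary can be passed directly as the JSON body of `POST /chat`.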

Response

{
  "status": "success",
  "message": "What is the remote work policy for new employees?",
  "response": "New employees are eligible for remote work after completing their initial 90-day onboarding period. During this period, they must work from the office to facilitate mentoring and team integration. After the probationary period, employees can work remotely up to 3 days per week, subject to manager approval and role requirements. [Source: remote_work_policy.md] [Source: employee_handbook.md]",
  "confidence": 0.91,
  "sources": [
    {
      "filename": "remote_work_policy.md",
      "chunk_id": "remote_work_policy_chunk_3",
      "relevance_score": 0.89,
      "content_preview": "New employees must complete a 90-day onboarding period..."
    },
    {
      "filename": "employee_handbook.md",
      "chunk_id": "employee_handbook_chunk_7",
      "relevance_score": 0.76,
      "content_preview": "Remote work eligibility requirements include..."
    }
  ],
  "response_time_ms": 2340,
  "guardrails": {
    "safety_score": 0.98,
    "quality_score": 0.91,
    "citation_count": 2
  },
  "services_used": {
    "embedding_model": "intfloat/multilingual-e5-large",
    "llm_model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "vector_store": "huggingface_dataset"
  }
}
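
The example response embeds citations inline as `[Source: filename]` markers. Assuming that marker format is stable, a client could pull the cited filenames out of the `response` text like this; the function name is hypothetical.

```python
import re

# Matches markers such as "[Source: remote_work_policy.md]".
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+?)\s*\]")


def extract_citations(response_text):
    """Return cited filenames in order of first appearance, without duplicates."""
    cited = []
    for name in CITATION_RE.findall(response_text):
        if name not in cited:
            cited.append(name)
    return cited
```

Comparing this list with the `filename` values in `sources` is one way to spot a response that cites a document it did not retrieve.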

Error Response

{
  "status": "error",
  "error": "Request too long",
  "message": "Message exceeds maximum character limit of 5000",
  "error_code": "MESSAGE_TOO_LONG"
}

Search Endpoint

POST /search

Perform semantic search across policy documents using HuggingFace embeddings.

Request

POST /search
Content-Type: application/json

{
  "query": "What is the remote work policy?",
  "top_k": 5,
  "threshold": 0.3,
  "include_metadata": true
}

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Search query text |
| top_k | integer | No | 5 | Number of results to return (1-20) |
| threshold | float | No | 0.3 | Minimum similarity threshold (0.0-1.0) |
| include_metadata | boolean | No | true | Include document metadata |

Response

{
  "status": "success",
  "query": "What is the remote work policy?",
  "results_count": 3,
  "embedding_model": "intfloat/multilingual-e5-large",
  "embedding_dimensions": 1024,
  "results": [
    {
      "chunk_id": "remote_work_policy_chunk_2",
      "content": "Employees may work remotely up to 3 days per week with manager approval. Remote work arrangements must be documented and reviewed quarterly.",
      "similarity_score": 0.87,
      "metadata": {
        "source_file": "remote_work_policy.md",
        "chunk_index": 2,
        "category": "HR",
        "word_count": 95,
        "created_at": "2025-10-25T10:30:00Z"
      }
    },
    {
      "chunk_id": "remote_work_policy_chunk_1",
      "content": "Remote work eligibility requires completion of probationary period and manager approval. New employees must work on-site for first 90 days.",
      "similarity_score": 0.82,
      "metadata": {
        "source_file": "remote_work_policy.md",
        "chunk_index": 1,
        "category": "HR",
        "word_count": 88,
        "created_at": "2025-10-25T10:30:00Z"
      }
    }
  ],
  "search_time_ms": 234,
  "vector_store_size": 98
}
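
Search results often contain several chunks from the same file. The sketch below groups the `results` array by `metadata.source_file`, using only field names from the example response; the function name is hypothetical, and results returned without metadata fall under an assumed `"unknown"` key.

```python
from collections import defaultdict


def group_by_source(search_response):
    """Group search hits by source file, best match first within each file."""
    grouped = defaultdict(list)
    for hit in search_response.get("results", []):
        # metadata is only present when include_metadata was true.
        source = hit.get("metadata", {}).get("source_file", "unknown")
        grouped[source].append(hit)
    for hits in grouped.values():
        hits.sort(key=lambda h: h["similarity_score"], reverse=True)
    return dict(grouped)
```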

Document Processing

POST /process-documents

Process and embed policy documents using HuggingFace services. Processing also runs automatically on startup.

Request

POST /process-documents
Content-Type: application/json

{
  "force_reprocess": false,
  "batch_size": 10
}

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| force_reprocess | boolean | No | false | Force reprocessing even if documents exist |
| batch_size | integer | No | 10 | Number of documents to process per batch |

Response

{
  "status": "success",
  "processing_details": {
    "files_processed": 22,
    "chunks_generated": 98,
    "embeddings_created": 98,
    "processing_time_seconds": 18.7
  },
  "embedding_service": {
    "model": "intfloat/multilingual-e5-large",
    "dimensions": 1024,
    "api_status": "operational"
  },
  "vector_store": {
    "type": "huggingface_dataset",
    "dataset_name": "policy-vectors",
    "total_embeddings": 98,
    "storage_size_mb": 2.4
  },
  "corpus_statistics": {
    "total_words": 10637,
    "average_chunk_size": 95,
    "documents_by_category": {
      "HR": 8,
      "Finance": 4,
      "Security": 3,
      "Operations": 4,
      "EHS": 3
    }
  },
  "quality_metrics": {
    "embedding_generation_success_rate": 1.0,
    "average_embedding_time_ms": 450,
    "metadata_completeness": 1.0
  }
}
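
After triggering processing, a caller may want to confirm that every chunk was actually embedded. The sketch below inspects the fields from the example response. Treating a mismatch between `chunks_generated` and `embeddings_created` as a problem is an assumption of this sketch, not documented behavior, and the function name is hypothetical.

```python
def processing_issues(result):
    """Return a list of human-readable problems found in a /process-documents response."""
    issues = []
    if result.get("status") != "success":
        issues.append(f"status is {result.get('status')!r}")

    details = result.get("processing_details", {})
    chunks = details.get("chunks_generated")
    embedded = details.get("embeddings_created")
    if chunks != embedded:
        issues.append(f"{chunks} chunks generated but {embedded} embeddings created")

    rate = result.get("quality_metrics", {}).get("embedding_generation_success_rate")
    if rate is not None and rate < 1.0:
        issues.append(f"embedding success rate is {rate}")
    return issues
```

An empty list means none of these checks failed.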

Health Check

GET /health

Comprehensive system health check including all HuggingFace services.

Request

GET /health

Response

{
  "status": "healthy",
  "timestamp": "2025-10-25T10:30:00Z",
  "services": {
    "hf_embedding_api": "operational",
    "hf_inference_api": "operational",
    "hf_dataset_store": "operational"
  },
  "service_details": {
    "embedding_api": {
      "model": "intfloat/multilingual-e5-large",
      "last_request_ms": 450,
      "requests_today": 247,
      "error_rate": 0.02
    },
    "inference_api": {
      "model": "meta-llama/Meta-Llama-3-8B-Instruct",
      "last_request_ms": 2340,
      "requests_today": 89,
      "error_rate": 0.01
    },
    "dataset_store": {
      "dataset_name": "policy-vectors",
      "total_embeddings": 98,
      "last_updated": "2025-10-25T09:15:00Z",
      "access_status": "operational"
    }
  },
  "configuration": {
    "use_openai_embedding": false,
    "hf_token_configured": true,
    "embedding_model": "intfloat/multilingual-e5-large",
    "embedding_dimensions": 1024,
    "deployment_platform": "huggingface_spaces"
  },
  "statistics": {
    "total_documents": 98,
    "total_queries_processed": 1247,
    "average_response_time_ms": 2140,
    "vector_store_size": 98,
    "uptime_hours": 72.5
  },
  "performance": {
    "memory_usage_mb": 156,
    "cpu_usage_percent": 12,
    "disk_usage_mb": 45,
    "cache_hit_rate": 0.78
  }
}
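
For automated monitoring, the health payload can be reduced to a short list of problems. The sketch below reads the `services` and `service_details` sections shown above; the function name and the 5% error-rate threshold are assumptions chosen for illustration.

```python
def health_problems(health, max_error_rate=0.05):
    """Summarize non-operational services and high error rates from GET /health."""
    problems = []
    for name, state in health.get("services", {}).items():
        if state != "operational":
            problems.append(f"{name} is {state}")
    for name, detail in health.get("service_details", {}).items():
        rate = detail.get("error_rate")
        # Entries such as dataset_store carry no error_rate and are skipped.
        if rate is not None and rate > max_error_rate:
            problems.append(f"{name} error rate is {rate:.0%}")
    return problems
```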

System Information

GET /

Welcome page with system information and capabilities.

Response

{
  "message": "Welcome to PolicyWise - HuggingFace Edition",
  "version": "2.0.0-hf",
  "description": "Corporate policy RAG system powered by HuggingFace free-tier services",
  "capabilities": [
    "Policy question answering with citations",
    "Semantic document search",
    "Automatic document processing",
    "Multilingual embedding support",
    "Real-time health monitoring"
  ],
  "services": {
    "embedding": "HuggingFace Inference API (intfloat/multilingual-e5-large)",
    "llm": "HuggingFace Inference API (meta-llama/Meta-Llama-3-8B-Instruct)",
    "vector_store": "HuggingFace Dataset",
    "deployment": "HuggingFace Spaces"
  },
  "api_endpoints": {
    "chat": "POST /chat",
    "search": "POST /search",
    "process": "POST /process-documents",
    "health": "GET /health"
  },
  "documentation": {
    "api_docs": "/docs/api",
    "technical_architecture": "/docs/architecture",
    "deployment_guide": "/docs/deployment"
  },
  "policy_corpus": {
    "total_documents": 22,
    "total_chunks": 98,
    "categories": ["HR", "Finance", "Security", "Operations", "EHS"],
    "last_updated": "2025-10-25T09:15:00Z"
  }
}

Error Handling

HTTP Status Codes

| Code | Status | Description |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid request parameters |
| 413 | Payload Too Large | Request body too large |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | HuggingFace API unavailable |

Error Response Format

{
  "status": "error",
  "error": "Error type",
  "message": "Human-readable error description",
  "error_code": "MACHINE_READABLE_CODE",
  "timestamp": "2025-10-25T10:30:00Z",
  "request_id": "req_abc123",
  "suggestions": [
    "Check your request parameters",
    "Retry with smaller payload"
  ]
}

Common Error Codes

| Error Code | Description | Solution |
|---|---|---|
| MESSAGE_TOO_LONG | Message exceeds character limit | Reduce message length |
| INVALID_PARAMETERS | Invalid request parameters | Check parameter types and ranges |
| HF_API_UNAVAILABLE | HuggingFace API temporarily unavailable | Retry after delay |
| RATE_LIMIT_EXCEEDED | Too many requests | Wait before retrying |
| EMBEDDING_FAILED | Embedding generation failed | Check input text format |
| SEARCH_FAILED | Vector search failed | Verify query parameters |
| DATASET_UNAVAILABLE | HuggingFace Dataset inaccessible | Check dataset permissions |
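
On the client side, the error format and codes above can be turned into an exception that records whether a retry makes sense. In this sketch only `HF_API_UNAVAILABLE` and `RATE_LIMIT_EXCEEDED` count as retryable, since those are the two codes whose listed solution is to wait and retry. The class and function names are hypothetical.

```python
RETRYABLE_CODES = {"HF_API_UNAVAILABLE", "RATE_LIMIT_EXCEEDED"}


class PolicyWiseError(Exception):
    """Raised when a response body has status "error"."""

    def __init__(self, error_code, message, request_id=None):
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.message = message
        self.request_id = request_id

    @property
    def retryable(self):
        return self.error_code in RETRYABLE_CODES


def raise_for_error(body):
    """Return the parsed body unchanged, or raise PolicyWiseError if it reports an error."""
    if body.get("status") == "error":
        raise PolicyWiseError(
            body.get("error_code", "UNKNOWN"),
            body.get("message", ""),
            body.get("request_id"),
        )
    return body
```

Calling `raise_for_error(response.json())` after each request keeps error handling in one place and leaves the retry decision to the caller.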

Rate Limiting

HuggingFace Free Tier Limits

  • Inference API: 1000 requests/hour per model
  • Dataset API: 100 requests/hour
  • Embedding API: 1000 requests/hour

Application Rate Limiting

  • Chat API: 60 requests/minute per IP
  • Search API: 120 requests/minute per IP
  • Processing API: 10 requests/hour per IP

Rate Limit Headers

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200
X-RateLimit-Window: 60
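
A client can use these headers to decide how long to pause before the next request. The sketch assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but the documentation does not state; the function name is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, or 0.0 if quota remains."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # Assumption: the reset header is a Unix epoch timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

With the `requests` library, `response.headers` can be passed in directly; the `now` argument exists only to make the function easy to test.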

SDK and Integration Examples

Python SDK Example

import requests

class PolicyWiseClient:
    def __init__(self, base_url="http://localhost:5000", timeout=30):
        self.base_url = base_url.rstrip("/")
        # requests has no default timeout, so set one explicitly (seconds)
        self.timeout = timeout

    def ask_question(self, question, max_tokens=500):
        """Ask a policy question"""
        response = requests.post(
            f"{self.base_url}/chat",
            json={
                "message": question,
                "max_tokens": max_tokens,
                "include_sources": True
            },
            timeout=self.timeout
        )
        return response.json()

    def search_policies(self, query, top_k=5):
        """Search policy documents"""
        response = requests.post(
            f"{self.base_url}/search",
            json={
                "query": query,
                "top_k": top_k,
                "threshold": 0.3
            },
            timeout=self.timeout
        )
        return response.json()

    def check_health(self):
        """Check system health"""
        response = requests.get(f"{self.base_url}/health", timeout=self.timeout)
        return response.json()

# Usage
client = PolicyWiseClient("https://your-space.hf.space")

# Ask a question
result = client.ask_question("What is the PTO policy?")
print(f"Response: {result['response']}")
print(f"Sources: {[s['filename'] for s in result['sources']]}")

# Search documents
search_results = client.search_policies("remote work")
for hit in search_results['results']:
    print(f"Found: {hit['content'][:100]}...")

JavaScript/Node.js Example

class PolicyWiseClient {
    constructor(baseUrl = 'http://localhost:5000') {
        this.baseUrl = baseUrl;
    }

    async askQuestion(question, maxTokens = 500) {
        const response = await fetch(`${this.baseUrl}/chat`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                message: question,
                max_tokens: maxTokens,
                include_sources: true
            })
        });
        return await response.json();
    }

    async searchPolicies(query, topK = 5) {
        const response = await fetch(`${this.baseUrl}/search`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                query: query,
                top_k: topK,
                threshold: 0.3
            })
        });
        return await response.json();
    }

    async checkHealth() {
        const response = await fetch(`${this.baseUrl}/health`);
        return await response.json();
    }
}

// Usage
const client = new PolicyWiseClient('https://your-space.hf.space');

// Ask a question
client.askQuestion('What are the expense policies?')
    .then(result => {
        console.log('Response:', result.response);
        console.log('Sources:', result.sources.map(s => s.filename));
    })
    .catch(err => console.error('Request failed:', err));

cURL Examples

# Ask a policy question
curl -X POST https://your-space.hf.space/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the remote work policy?",
    "max_tokens": 500,
    "include_sources": true
  }'

# Search policy documents
curl -X POST https://your-space.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "expense reimbursement",
    "top_k": 3,
    "threshold": 0.4
  }'

# Check system health
curl https://your-space.hf.space/health

# Process documents (admin operation)
curl -X POST https://your-space.hf.space/process-documents \
  -H "Content-Type: application/json" \
  -d '{
    "force_reprocess": false,
    "batch_size": 10
  }'

Performance Guidelines

Optimization Tips

  1. Batch Requests: Group multiple questions for better throughput
  2. Cache Results: Cache frequently asked questions
  3. Optimize Queries: Use specific, focused questions for better results
  4. Monitor Usage: Track API usage to stay within rate limits
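
The caching tip can be sketched as a small in-memory cache with a time-to-live, keyed by the normalized question text. The class name, the 300-second default, and the normalization rule are assumptions for illustration. `fetch` stands for any callable that actually queries the API, such as the `ask_question` method of a client.

```python
import time


class QuestionCache:
    """In-memory cache keyed by normalized question text, with a time-to-live."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # normalized question -> (stored_at, value)

    @staticmethod
    def _normalize(question):
        # Ignore case and collapse whitespace so trivially different
        # spellings of the same question share one cache entry.
        return " ".join(question.lower().split())

    def get_or_fetch(self, question, fetch):
        """Return a cached answer if it is still fresh, otherwise call fetch(question)."""
        key = self._normalize(question)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch(question)
        self._entries[key] = (now, value)
        return value
```

A short time-to-live keeps cached answers from outliving a policy update. Each cache hit also saves one request against the rate limits listed earlier.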

Expected Performance

| Operation | Average Time | Throughput |
|---|---|---|
| Chat (with sources) | 2-3 seconds | 20-30 req/min |
| Search only | 200-500ms | 60-80 req/min |
| Health check | <100ms | 200+ req/min |
| Document processing | 15-20 seconds | 1 req/hour |

Monitoring

Monitor these metrics for optimal performance:

  • Response time percentiles (p50, p95, p99)
  • Error rates by endpoint
  • HuggingFace API response times
  • Vector store query performance
  • Memory and CPU usage
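
Response time percentiles can be computed from collected `response_time_ms` or `search_time_ms` values with the standard library alone. This is a minimal sketch; the function name is hypothetical, and the "inclusive" interpolation method is one reasonable choice among several.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95 and p99 for a list of response times in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Tracking p95 and p99 alongside the median brings out slow outliers, such as upstream HuggingFace API delays, that an average would hide.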

This API documentation provides everything needed to integrate with the PolicyWise HuggingFace-powered RAG system!