API Documentation - HuggingFace Edition

Overview

PolicyWise provides a RESTful API for corporate policy question-answering using HuggingFace free-tier services. All endpoints return JSON responses and support CORS for web integration.

Base URL

Local Development: http://localhost:5000
HuggingFace Spaces: https://your-username-policywise-rag.hf.space

Authentication

No authentication required for public deployment. For production use, consider implementing API key authentication.
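
If API key authentication is added later, the key check itself can be small. The sketch below is one possible approach and is not part of the current PolicyWise code: the `X-API-Key` header name and the `is_authorized` helper are hypothetical, and the expected key is assumed to be loaded from configuration elsewhere.

```python
import hmac
from typing import Mapping


def is_authorized(
    headers: Mapping[str, str],
    expected_key: str,
    header_name: str = "X-API-Key",  # hypothetical header name
) -> bool:
    """Return True only if the request carries the expected API key."""
    if not expected_key:
        # Refuse everything when no key is configured, rather than
        # accepting requests that simply omit the header.
        return False
    supplied = headers.get(header_name, "")
    # compare_digest takes time independent of where the strings differ,
    # which avoids leaking key contents through timing.
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_key.encode("utf-8")
    )
```

A request handler would call `is_authorized(request_headers, configured_key)` before doing any work and reject the request when it returns `False`.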

Core Endpoints

Chat Endpoint (Primary Interface)

POST /chat

Ask questions about company policies and receive intelligent responses with automatic source citations.

Request

POST /chat
Content-Type: application/json

{
  "message": "What is the remote work policy for new employees?",
  "max_tokens": 500,
  "include_sources": true,
  "guardrails_level": "standard"
}

Parameters

Parameter	Type	Required	Default	Description
`message`	string	Yes	-	User question about company policies
`max_tokens`	integer	No	500	Maximum response length (100-1000)
`include_sources`	boolean	No	true	Include source document details
`guardrails_level`	string	No	"standard"	Safety level: "strict", "standard", "relaxed"
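
The documented ranges can be checked on the client before a request is sent, which saves a round trip on obviously invalid input. This is a minimal sketch, not the server's own validation: `build_chat_payload` is a hypothetical helper, and the 5000-character limit is taken from the `MESSAGE_TOO_LONG` error example in this document.

```python
ALLOWED_GUARDRAILS = {"strict", "standard", "relaxed"}
MAX_MESSAGE_CHARS = 5000  # from the MESSAGE_TOO_LONG error example


def build_chat_payload(
    message: str,
    max_tokens: int = 500,
    include_sources: bool = True,
    guardrails_level: str = "standard",
) -> dict:
    """Return a /chat request body, raising ValueError on invalid values."""
    if not message or not message.strip():
        raise ValueError("message is required")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    if not 100 <= max_tokens <= 1000:
        raise ValueError("max_tokens must be between 100 and 1000")
    if guardrails_level not in ALLOWED_GUARDRAILS:
        raise ValueError(
            f"guardrails_level must be one of {sorted(ALLOWED_GUARDRAILS)}"
        )
    return {
        "message": message,
        "max_tokens": max_tokens,
        "include_sources": include_sources,
        "guardrails_level": guardrails_level,
    }
```

The returned dictionary can be passed directly as the JSON body of `POST /chat`.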

Response

{
  "status": "success",
  "message": "What is the remote work policy for new employees?",
  "response": "New employees are eligible for remote work after completing their initial 90-day onboarding period. During this period, they must work from the office to facilitate mentoring and team integration. After the probationary period, employees can work remotely up to 3 days per week, subject to manager approval and role requirements. [Source: remote_work_policy.md] [Source: employee_handbook.md]",
  "confidence": 0.91,
  "sources": [
    {
      "filename": "remote_work_policy.md",
      "chunk_id": "remote_work_policy_chunk_3",
      "relevance_score": 0.89,
      "content_preview": "New employees must complete a 90-day onboarding period..."
    },
    {
      "filename": "employee_handbook.md",
      "chunk_id": "employee_handbook_chunk_7",
      "relevance_score": 0.76,
      "content_preview": "Remote work eligibility requirements include..."
    }
  ],
  "response_time_ms": 2340,
  "guardrails": {
    "safety_score": 0.98,
    "quality_score": 0.91,
    "citation_count": 2
  },
  "services_used": {
    "embedding_model": "intfloat/multilingual-e5-large",
    "llm_model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "vector_store": "huggingface_dataset"
  }
}
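
The example response embeds citations inline as `[Source: filename]` markers and also lists them under `sources`. The sketch below shows one way a client might cross-check the two. It assumes the marker format is exactly as shown in the example above; `extract_citations` and `uncited_sources` are hypothetical helper names.

```python
import re

# Matches markers such as "[Source: remote_work_policy.md]".
SOURCE_PATTERN = re.compile(r"\[Source:\s*([^\]]+)\]")


def extract_citations(response_text: str) -> list:
    """Return cited filenames in order of first appearance, no duplicates."""
    cited = []
    for name in SOURCE_PATTERN.findall(response_text):
        name = name.strip()
        if name not in cited:
            cited.append(name)
    return cited


def uncited_sources(chat_response: dict) -> list:
    """Return filenames listed under "sources" but never cited inline."""
    cited = set(extract_citations(chat_response.get("response", "")))
    return [
        source["filename"]
        for source in chat_response.get("sources", [])
        if source["filename"] not in cited
    ]
```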

Error Response

{
  "status": "error",
  "error": "Request too long",
  "message": "Message exceeds maximum character limit of 5000",
  "error_code": "MESSAGE_TOO_LONG"
}

Search Endpoint

POST /search

Perform semantic search across policy documents using HuggingFace embeddings.

Request

POST /search
Content-Type: application/json

{
  "query": "What is the remote work policy?",
  "top_k": 5,
  "threshold": 0.3,
  "include_metadata": true
}

Parameters

Parameter	Type	Required	Default	Description
`query`	string	Yes	-	Search query text
`top_k`	integer	No	5	Number of results to return (1-20)
`threshold`	float	No	0.3	Minimum similarity threshold (0.0-1.0)
`include_metadata`	boolean	No	true	Include document metadata

Response

{
  "status": "success",
  "query": "What is the remote work policy?",
  "results_count": 3,
  "embedding_model": "intfloat/multilingual-e5-large",
  "embedding_dimensions": 1024,
  "results": [
    {
      "chunk_id": "remote_work_policy_chunk_2",
      "content": "Employees may work remotely up to 3 days per week with manager approval. Remote work arrangements must be documented and reviewed quarterly.",
      "similarity_score": 0.87,
      "metadata": {
        "source_file": "remote_work_policy.md",
        "chunk_index": 2,
        "category": "HR",
        "word_count": 95,
        "created_at": "2025-10-25T10:30:00Z"
      }
    },
    {
      "chunk_id": "remote_work_policy_chunk_1",
      "content": "Remote work eligibility requires completion of probationary period and manager approval. New employees must work on-site for first 90 days.",
      "similarity_score": 0.82,
      "metadata": {
        "source_file": "remote_work_policy.md",
        "chunk_index": 1,
        "category": "HR",
        "word_count": 88,
        "created_at": "2025-10-25T10:30:00Z"
      }
    }
  ],
  "search_time_ms": 234,
  "vector_store_size": 98
}
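
Several results often come from the same file, as in the example above, so clients may want to group hits by source document. The following is a small sketch that assumes the response shape shown above and that `include_metadata` was left enabled; `group_results_by_source` is a hypothetical helper.

```python
from collections import defaultdict


def group_results_by_source(search_response: dict, min_score: float = 0.0) -> dict:
    """Map each source file to its (similarity_score, chunk_id) pairs."""
    grouped = defaultdict(list)
    for item in search_response.get("results", []):
        if item["similarity_score"] < min_score:
            continue  # drop hits below the caller's own cutoff
        # Fall back to "unknown" when metadata was not requested.
        source = item.get("metadata", {}).get("source_file", "unknown")
        grouped[source].append((item["similarity_score"], item["chunk_id"]))
    # Sort each file's chunks from most to least similar.
    return {src: sorted(hits, reverse=True) for src, hits in grouped.items()}
```

Filtering on `min_score` here is applied after the server-side `threshold`, so it can only narrow the result set further.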

Document Processing

POST /process-documents

Process and embed policy documents using HuggingFace services. This step also runs automatically on startup.

Request

POST /process-documents
Content-Type: application/json

{
  "force_reprocess": false,
  "batch_size": 10
}

Parameters

Parameter	Type	Required	Default	Description
`force_reprocess`	boolean	No	false	Force reprocessing even if documents exist
`batch_size`	integer	No	10	Number of documents to process per batch

Response

{
  "status": "success",
  "processing_details": {
    "files_processed": 22,
    "chunks_generated": 98,
    "embeddings_created": 98,
    "processing_time_seconds": 18.7
  },
  "embedding_service": {
    "model": "intfloat/multilingual-e5-large",
    "dimensions": 1024,
    "api_status": "operational"
  },
  "vector_store": {
    "type": "huggingface_dataset",
    "dataset_name": "policy-vectors",
    "total_embeddings": 98,
    "storage_size_mb": 2.4
  },
  "corpus_statistics": {
    "total_words": 10637,
    "average_chunk_size": 95,
    "documents_by_category": {
      "HR": 8,
      "Finance": 4,
      "Security": 3,
      "Operations": 4,
      "EHS": 3
    }
  },
  "quality_metrics": {
    "embedding_generation_success_rate": 1.0,
    "average_embedding_time_ms": 450,
    "metadata_completeness": 1.0
  }
}

Health Check

GET /health

Comprehensive system health check including all HuggingFace services.

Request

GET /health

Response

{
  "status": "healthy",
  "timestamp": "2025-10-25T10:30:00Z",
  "services": {
    "hf_embedding_api": "operational",
    "hf_inference_api": "operational",
    "hf_dataset_store": "operational"
  },
  "service_details": {
    "embedding_api": {
      "model": "intfloat/multilingual-e5-large",
      "last_request_ms": 450,
      "requests_today": 247,
      "error_rate": 0.02
    },
    "inference_api": {
      "model": "meta-llama/Meta-Llama-3-8B-Instruct",
      "last_request_ms": 2340,
      "requests_today": 89,
      "error_rate": 0.01
    },
    "dataset_store": {
      "dataset_name": "policy-vectors",
      "total_embeddings": 98,
      "last_updated": "2025-10-25T09:15:00Z",
      "access_status": "operational"
    }
  },
  "configuration": {
    "use_openai_embedding": false,
    "hf_token_configured": true,
    "embedding_model": "intfloat/multilingual-e5-large",
    "embedding_dimensions": 1024,
    "deployment_platform": "huggingface_spaces"
  },
  "statistics": {
    "total_documents": 98,
    "total_queries_processed": 1247,
    "average_response_time_ms": 2140,
    "vector_store_size": 98,
    "uptime_hours": 72.5
  },
  "performance": {
    "memory_usage_mb": 156,
    "cpu_usage_percent": 12,
    "disk_usage_mb": 45,
    "cache_hit_rate": 0.78
  }
}
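
A monitoring script can reduce this payload to a short list of problems. The sketch below assumes the response shape shown above and treats any service state other than `"operational"` as a problem; the 5% error-rate cutoff and the `find_unhealthy_services` name are illustrative assumptions, not documented values.

```python
def find_unhealthy_services(health: dict, max_error_rate: float = 0.05) -> list:
    """Return human-readable problems found in a /health response."""
    problems = []
    for name, state in health.get("services", {}).items():
        if state != "operational":
            problems.append(f"{name}: {state}")
    for name, details in health.get("service_details", {}).items():
        rate = details.get("error_rate")  # dataset_store has no error_rate
        if rate is not None and rate > max_error_rate:
            problems.append(
                f"{name}: error_rate {rate:.2f} exceeds {max_error_rate:.2f}"
            )
    return problems
```

An empty list means every listed service reported `"operational"` and stayed under the chosen error-rate cutoff.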

System Information

GET /

Returns a welcome payload with system information and capabilities.

Response

{
  "message": "Welcome to PolicyWise - HuggingFace Edition",
  "version": "2.0.0-hf",
  "description": "Corporate policy RAG system powered by HuggingFace free-tier services",
  "capabilities": [
    "Policy question answering with citations",
    "Semantic document search",
    "Automatic document processing",
    "Multilingual embedding support",
    "Real-time health monitoring"
  ],
  "services": {
    "embedding": "HuggingFace Inference API (intfloat/multilingual-e5-large)",
    "llm": "HuggingFace Inference API (meta-llama/Meta-Llama-3-8B-Instruct)",
    "vector_store": "HuggingFace Dataset",
    "deployment": "HuggingFace Spaces"
  },
  "api_endpoints": {
    "chat": "POST /chat",
    "search": "POST /search",
    "process": "POST /process-documents",
    "health": "GET /health"
  },
  "documentation": {
    "api_docs": "/docs/api",
    "technical_architecture": "/docs/architecture",
    "deployment_guide": "/docs/deployment"
  },
  "policy_corpus": {
    "total_documents": 22,
    "total_chunks": 98,
    "categories": ["HR", "Finance", "Security", "Operations", "EHS"],
    "last_updated": "2025-10-25T09:15:00Z"
  }
}

Error Handling

HTTP Status Codes

Code	Status	Description
200	OK	Request successful
400	Bad Request	Invalid request parameters
413	Payload Too Large	Request body too large
429	Too Many Requests	Rate limit exceeded
500	Internal Server Error	Server error
503	Service Unavailable	HuggingFace API unavailable
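
Of these codes, 429 and 503 describe temporary conditions, so a client can reasonably retry them after a pause. The sketch below shows exponential backoff under that assumption. `call_with_backoff` is a hypothetical helper, the delays are illustrative, and `send` stands in for whatever function performs the HTTP call and returns `(status_code, parsed_body)`.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUS = {429, 503}  # rate limited, HuggingFace API unavailable


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUS or is_last_attempt:
            return status, body
        # Double the wait after each retryable failure: 1s, 2s, 4s, ...
        sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable: the loop always returns")
```

Passing `sleep` as a parameter keeps the function testable without real delays.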

Error Response Format

{
  "status": "error",
  "error": "Error type",
  "message": "Human-readable error description",
  "error_code": "MACHINE_READABLE_CODE",
  "timestamp": "2025-10-25T10:30:00Z",
  "request_id": "req_abc123",
  "suggestions": [
    "Check your request parameters",
    "Retry with smaller payload"
  ]
}

Common Error Codes

Error Code	Description	Solution
`MESSAGE_TOO_LONG`	Message exceeds character limit	Reduce message length
`INVALID_PARAMETERS`	Invalid request parameters	Check parameter types and ranges
`HF_API_UNAVAILABLE`	HuggingFace API temporarily unavailable	Retry after delay
`RATE_LIMIT_EXCEEDED`	Too many requests	Wait before retrying
`EMBEDDING_FAILED`	Embedding generation failed	Check input text format
`SEARCH_FAILED`	Vector search failed	Verify query parameters
`DATASET_UNAVAILABLE`	HuggingFace Dataset inaccessible	Check dataset permissions
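
Client code can turn the error body into an exception that records whether a retry makes sense. In this sketch the retryable set contains only the two codes whose listed solution is to wait and retry; `PolicyWiseError`, `error_from_body`, and the `"UNKNOWN"` fallback code are hypothetical names introduced for illustration.

```python
from typing import Optional

# Codes whose documented solution is to wait and try again.
RETRYABLE_CODES = {"HF_API_UNAVAILABLE", "RATE_LIMIT_EXCEEDED"}


class PolicyWiseError(Exception):
    """Represents a response whose "status" field is "error"."""

    def __init__(self, error_code: str, message: str):
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.retryable = error_code in RETRYABLE_CODES


def error_from_body(body: dict) -> Optional[PolicyWiseError]:
    """Return a PolicyWiseError for an error body, or None on success."""
    if body.get("status") != "error":
        return None
    return PolicyWiseError(
        body.get("error_code", "UNKNOWN"),  # placeholder when code is absent
        body.get("message", ""),
    )
```

A caller might write `err = error_from_body(response.json())` and raise `err` when it is not `None`, retrying only if `err.retryable` is true.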

Rate Limiting

HuggingFace Free Tier Limits

Inference API: 1000 requests/hour per model
Dataset API: 100 requests/hour
Embedding API: 1000 requests/hour

Application Rate Limiting

Chat API: 60 requests/minute per IP
Search API: 120 requests/minute per IP
Processing API: 10 requests/hour per IP

Rate Limit Headers

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200
X-RateLimit-Window: 60
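
These headers let a client decide how long to pause before the next request. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but this document does not state explicitly; `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str], now: Optional[float] = None
) -> float:
    """Return how long to wait before sending the next request.

    Returns 0.0 while requests remain in the current window. Header
    lookups are case-sensitive for a plain dict; HTTP libraries usually
    supply a case-insensitive mapping.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is seconds since the Unix epoch.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

When the headers are missing entirely, the function returns `0.0` and leaves throttling to the server's 429 responses.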

SDK and Integration Examples

Python SDK Example

import requests


class PolicyWiseClient:
    """Minimal client for the PolicyWise API."""

    def __init__(self, base_url="http://localhost:5000", timeout=30):
        self.base_url = base_url.rstrip("/")
        # requests applies no timeout by default, so set one explicitly (seconds)
        self.timeout = timeout

    def ask_question(self, question, max_tokens=500):
        """Ask a policy question and return the parsed JSON response"""
        response = requests.post(
            f"{self.base_url}/chat",
            json={
                "message": question,
                "max_tokens": max_tokens,
                "include_sources": True
            },
            timeout=self.timeout
        )
        return response.json()

    def search_policies(self, query, top_k=5):
        """Search policy documents and return the parsed JSON response"""
        response = requests.post(
            f"{self.base_url}/search",
            json={
                "query": query,
                "top_k": top_k,
                "threshold": 0.3
            },
            timeout=self.timeout
        )
        return response.json()

    def check_health(self):
        """Check system health"""
        response = requests.get(f"{self.base_url}/health", timeout=self.timeout)
        return response.json()

# Usage
client = PolicyWiseClient("https://your-space.hf.space")

# Ask a question
result = client.ask_question("What is the PTO policy?")
print(f"Response: {result['response']}")
print(f"Sources: {[s['filename'] for s in result['sources']]}")

# Search documents
search_results = client.search_policies("remote work")
for hit in search_results['results']:
    print(f"Found: {hit['content'][:100]}...")

JavaScript/Node.js Example

class PolicyWiseClient {
    constructor(baseUrl = 'http://localhost:5000') {
        this.baseUrl = baseUrl;
    }

    async askQuestion(question, maxTokens = 500) {
        const response = await fetch(`${this.baseUrl}/chat`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                message: question,
                max_tokens: maxTokens,
                include_sources: true
            })
        });
        return await response.json();
    }

    async searchPolicies(query, topK = 5) {
        const response = await fetch(`${this.baseUrl}/search`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                query: query,
                top_k: topK,
                threshold: 0.3
            })
        });
        return await response.json();
    }

    async checkHealth() {
        const response = await fetch(`${this.baseUrl}/health`);
        return await response.json();
    }
}

// Usage
const client = new PolicyWiseClient('https://your-space.hf.space');

// Ask a question
client.askQuestion('What are the expense policies?')
    .then(result => {
        console.log('Response:', result.response);
        console.log('Sources:', result.sources.map(s => s.filename));
    });

cURL Examples

# Ask a policy question
curl -X POST https://your-space.hf.space/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the remote work policy?",
    "max_tokens": 500,
    "include_sources": true
  }'

# Search policy documents
curl -X POST https://your-space.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "expense reimbursement",
    "top_k": 3,
    "threshold": 0.4
  }'

# Check system health
curl https://your-space.hf.space/health

# Process documents (admin operation)
curl -X POST https://your-space.hf.space/process-documents \
  -H "Content-Type: application/json" \
  -d '{
    "force_reprocess": false,
    "batch_size": 10
  }'

Performance Guidelines

Optimization Tips

Batch Requests: Group multiple questions for better throughput
Cache Results: Cache frequently asked questions
Optimize Queries: Use specific, focused questions for better results
Monitor Usage: Track API usage to stay within rate limits
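
As an illustration of the caching tip, the sketch below keeps successful answers in memory for a fixed time. It is one possible client-side approach, not a feature of the API: `AnswerCache` is a hypothetical class, the 300-second lifetime is an arbitrary choice, and `fetch` stands in for any function that sends a question to `/chat` and returns the parsed response.

```python
import time
from typing import Callable, Dict, Tuple


class AnswerCache:
    """In-memory cache for chat answers with a time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Ignore case and repeated whitespace so trivially different
        # phrasings of the same question share one entry.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> dict:
        key = self._key(question)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the API call
        answer = self._fetch(question)
        if answer.get("status") == "success":
            self._entries[key] = (now, answer)  # never cache errors
        return answer
```

Each cache hit avoids one request, which also helps stay within the rate limits. Entries are never evicted before they expire, so a long-running process would need a size bound as well.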

Expected Performance

Operation	Average Time	Throughput
Chat (with sources)	2-3 seconds	20-30 req/min
Search only	200-500ms	60-80 req/min
Health check	<100ms	200+ req/min
Document processing	15-20 seconds	1 req/hour

Monitoring

Monitor these metrics for optimal performance:

Response time percentiles (p50, p95, p99)
Error rates by endpoint
HuggingFace API response times
Vector store query performance
Memory and CPU usage
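
Response time percentiles can be computed from recorded latencies with a few lines of code. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; `percentile` and `latency_summary` are hypothetical helpers, and the samples are assumed to be latencies in milliseconds such as the `response_time_ms` values returned by `/chat`.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Return the nearest-rank percentile (pct from 0 to 100) of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples
    # at or below it. Ranks are 1-based, so subtract 1 when indexing.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> dict:
    """Return the p50, p95, and p99 latencies for a set of samples."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

With few samples the high percentiles collapse toward the maximum, so p99 is only meaningful once at least a few hundred requests have been recorded.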

This document covers the endpoints, error handling, rate limits, and client examples needed to integrate with the PolicyWise HuggingFace-powered RAG system.