API Documentation - HuggingFace Edition
Overview
PolicyWise provides a RESTful API for corporate policy question-answering using HuggingFace free-tier services. All endpoints return JSON responses and support CORS for web integration.
Base URL
- Local Development: http://localhost:5000
- HuggingFace Spaces: https://your-username-policywise-rag.hf.space
Authentication
No authentication is required for the public deployment. For production use, consider implementing API key authentication.
Core Endpoints
Chat Endpoint (Primary Interface)
POST /chat
Ask questions about company policies and receive generated answers with automatic source citations.
Request
```http
POST /chat
Content-Type: application/json

{
  "message": "What is the remote work policy for new employees?",
  "max_tokens": 500,
  "include_sources": true,
  "guardrails_level": "standard"
}
```
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| message | string | Yes | - | User question about company policies |
| max_tokens | integer | No | 500 | Maximum response length in tokens (100-1000) |
| include_sources | boolean | No | true | Include source document details |
| guardrails_level | string | No | "standard" | Safety level: "strict", "standard", "relaxed" |
Response
```json
{
  "status": "success",
  "message": "What is the remote work policy for new employees?",
  "response": "New employees are eligible for remote work after completing their initial 90-day onboarding period. During this period, they must work from the office to facilitate mentoring and team integration. After the probationary period, employees can work remotely up to 3 days per week, subject to manager approval and role requirements. [Source: remote_work_policy.md] [Source: employee_handbook.md]",
  "confidence": 0.91,
  "sources": [
    {
      "filename": "remote_work_policy.md",
      "chunk_id": "remote_work_policy_chunk_3",
      "relevance_score": 0.89,
      "content_preview": "New employees must complete a 90-day onboarding period..."
    },
    {
      "filename": "employee_handbook.md",
      "chunk_id": "employee_handbook_chunk_7",
      "relevance_score": 0.76,
      "content_preview": "Remote work eligibility requirements include..."
    }
  ],
  "response_time_ms": 2340,
  "guardrails": {
    "safety_score": 0.98,
    "quality_score": 0.91,
    "citation_count": 2
  },
  "services_used": {
    "embedding_model": "intfloat/multilingual-e5-large",
    "llm_model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "vector_store": "huggingface_dataset"
  }
}
```
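The answer text carries inline citations of the form `[Source: filename]`, alongside the structured `sources` array. A minimal client-side sketch, assuming that exact bracket format (taken from the example above), for extracting the inline citations and cross-checking them against `sources`; `extract_citations` and `citation_mismatches` are hypothetical helpers, not part of the API.

```python
import re

# Assumed citation format, based on the example response: "[Source: <filename>]"
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+?)\s*\]")


def extract_citations(response_text: str) -> list[str]:
    """Return cited filenames in order of first appearance, without duplicates."""
    seen: list[str] = []
    for name in CITATION_RE.findall(response_text):
        if name not in seen:
            seen.append(name)
    return seen


def citation_mismatches(payload: dict) -> dict[str, list[str]]:
    """Compare inline citations with the filenames listed in payload["sources"]."""
    cited = extract_citations(payload.get("response", ""))
    listed = [s["filename"] for s in payload.get("sources", [])]
    return {
        "cited_but_not_listed": [f for f in cited if f not in listed],
        "listed_but_not_cited": [f for f in listed if f not in cited],
    }


example = {
    "response": "... role requirements. [Source: remote_work_policy.md] [Source: employee_handbook.md]",
    "sources": [{"filename": "remote_work_policy.md"}, {"filename": "employee_handbook.md"}],
}
print(extract_citations(example["response"]))
# ['remote_work_policy.md', 'employee_handbook.md']
print(citation_mismatches(example))
# {'cited_but_not_listed': [], 'listed_but_not_cited': []}
```

A non-empty list in either key is a cheap signal that the answer cites a document the retriever did not return, or vice versa.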
Error Response
```json
{
  "status": "error",
  "error": "Request too long",
  "message": "Message exceeds maximum character limit of 5000",
  "error_code": "MESSAGE_TOO_LONG"
}
```
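Most validation errors can be caught before a request is sent. The sketch below checks a `/chat` body against the documented constraints: the 100-1000 `max_tokens` range, the three `guardrails_level` values, and the 5000-character message limit quoted in the error example. `build_chat_payload` is a hypothetical helper; the server remains the authority on what it accepts.

```python
MAX_MESSAGE_CHARS = 5000  # limit quoted in the MESSAGE_TOO_LONG example above
GUARDRAIL_LEVELS = {"strict", "standard", "relaxed"}


def build_chat_payload(message: str, max_tokens: int = 500,
                       include_sources: bool = True,
                       guardrails_level: str = "standard") -> dict:
    """Build a /chat request body, raising ValueError for values the docs mark invalid."""
    if not isinstance(message, str) or not message.strip():
        raise ValueError("message is required and must be a non-empty string")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
        raise ValueError("max_tokens must be an integer")
    if not 100 <= max_tokens <= 1000:
        raise ValueError("max_tokens must be between 100 and 1000")
    if guardrails_level not in GUARDRAIL_LEVELS:
        raise ValueError(f"guardrails_level must be one of {sorted(GUARDRAIL_LEVELS)}")
    return {
        "message": message,
        "max_tokens": max_tokens,
        "include_sources": bool(include_sources),
        "guardrails_level": guardrails_level,
    }


payload = build_chat_payload("What is the remote work policy for new employees?")
print(payload["guardrails_level"])  # standard
```

The returned dict can be passed directly as the `json=` argument of `requests.post`.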
Search Endpoint
POST /search
Perform semantic search across policy documents using HuggingFace embeddings.
Request
```http
POST /search
Content-Type: application/json

{
  "query": "What is the remote work policy?",
  "top_k": 5,
  "threshold": 0.3,
  "include_metadata": true
}
```
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Search query text |
| top_k | integer | No | 5 | Number of results to return (1-20) |
| threshold | float | No | 0.3 | Minimum similarity threshold (0.0-1.0) |
| include_metadata | boolean | No | true | Include document metadata |
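The table does not spell out how `top_k` and `threshold` interact. One plausible reading, assumed here rather than confirmed by the source, is that every chunk is scored by cosine similarity to the query embedding, chunks below `threshold` are dropped, and at most `top_k` of the remainder are returned best-first. A dependency-free sketch of that logic, with toy 3-dimensional vectors standing in for the 1024-dimensional `multilingual-e5-large` embeddings:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], chunks: dict[str, list[float]],
           top_k: int = 5, threshold: float = 0.3) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs with score >= threshold, best first."""
    if not 1 <= top_k <= 20:
        raise ValueError("top_k must be between 1 and 20")
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunks.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy embeddings (hypothetical values, for illustration only)
chunks = {
    "remote_work_policy_chunk_1": [0.9, 0.1, 0.0],
    "remote_work_policy_chunk_2": [1.0, 0.0, 0.0],
    "expense_policy_chunk_4": [0.0, 1.0, 0.0],
}
# chunk_2 ranks first, then chunk_1; the expense chunk scores 0.0 and is filtered out
print(search([1.0, 0.0, 0.0], chunks, top_k=2, threshold=0.3))
```

Under this reading, raising `threshold` can return fewer than `top_k` results, which is consistent with `results_count` being reported separately in the response.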
Response
```json
{
  "status": "success",
  "query": "What is the remote work policy?",
  "results_count": 3,
  "embedding_model": "intfloat/multilingual-e5-large",
  "embedding_dimensions": 1024,
  "results": [
    {
      "chunk_id": "remote_work_policy_chunk_2",
      "content": "Employees may work remotely up to 3 days per week with manager approval. Remote work arrangements must be documented and reviewed quarterly.",
      "similarity_score": 0.87,
      "metadata": {
        "source_file": "remote_work_policy.md",
        "chunk_index": 2,
        "category": "HR",
        "word_count": 95,
        "created_at": "2025-10-25T10:30:00Z"
      }
    },
    {
      "chunk_id": "remote_work_policy_chunk_1",
      "content": "Remote work eligibility requires completion of probationary period and manager approval. New employees must work on-site for first 90 days.",
      "similarity_score": 0.82,
      "metadata": {
        "source_file": "remote_work_policy.md",
        "chunk_index": 1,
        "category": "HR",
        "word_count": 88,
        "created_at": "2025-10-25T10:30:00Z"
      }
    }
  ],
  "search_time_ms": 234,
  "vector_store_size": 98
}
```
Document Processing
POST /process-documents
Processes and embeds the policy documents using HuggingFace services. This operation also runs automatically on startup.
Request
```http
POST /process-documents
Content-Type: application/json

{
  "force_reprocess": false,
  "batch_size": 10
}
```
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| force_reprocess | boolean | No | false | Force reprocessing even if documents exist |
| batch_size | integer | No | 10 | Number of documents to process per batch |
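The two parameters suggest a simple plan: skip documents that already have embeddings unless `force_reprocess` is set, then embed the rest `batch_size` files at a time. A minimal sketch of that planning step, assuming "already processed" is tracked as a set of filenames; `plan_batches` is hypothetical and does not call any HuggingFace API.

```python
def plan_batches(files: list[str], processed: set[str],
                 batch_size: int = 10, force_reprocess: bool = False) -> list[list[str]]:
    """Group the files that need (re)processing into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Without force_reprocess, files that already have embeddings are skipped
    todo = files if force_reprocess else [f for f in files if f not in processed]
    return [todo[i:i + batch_size] for i in range(0, len(todo), batch_size)]


files = [f"policy_{n:02d}.md" for n in range(1, 23)]  # 22 hypothetical policy files
print([len(b) for b in plan_batches(files, processed=set())])  # [10, 10, 2]
print(plan_batches(files, processed=set(files)))  # [] (nothing left to embed)
```

With the 22-file corpus and the default `batch_size` of 10, this yields three batches.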
Response
```json
{
  "status": "success",
  "processing_details": {
    "files_processed": 22,
    "chunks_generated": 98,
    "embeddings_created": 98,
    "processing_time_seconds": 18.7
  },
  "embedding_service": {
    "model": "intfloat/multilingual-e5-large",
    "dimensions": 1024,
    "api_status": "operational"
  },
  "vector_store": {
    "type": "huggingface_dataset",
    "dataset_name": "policy-vectors",
    "total_embeddings": 98,
    "storage_size_mb": 2.4
  },
  "corpus_statistics": {
    "total_words": 10637,
    "average_chunk_size": 95,
    "documents_by_category": {
      "HR": 8,
      "Finance": 4,
      "Security": 3,
      "Operations": 4,
      "EHS": 3
    }
  },
  "quality_metrics": {
    "embedding_generation_success_rate": 1.0,
    "average_embedding_time_ms": 450,
    "metadata_completeness": 1.0
  }
}
```
Health Check
GET /health
Comprehensive system health check including all HuggingFace services.
Request
```http
GET /health
```
Response
```json
{
  "status": "healthy",
  "timestamp": "2025-10-25T10:30:00Z",
  "services": {
    "hf_embedding_api": "operational",
    "hf_inference_api": "operational",
    "hf_dataset_store": "operational"
  },
  "service_details": {
    "embedding_api": {
      "model": "intfloat/multilingual-e5-large",
      "last_request_ms": 450,
      "requests_today": 247,
      "error_rate": 0.02
    },
    "inference_api": {
      "model": "meta-llama/Meta-Llama-3-8B-Instruct",
      "last_request_ms": 2340,
      "requests_today": 89,
      "error_rate": 0.01
    },
    "dataset_store": {
      "dataset_name": "policy-vectors",
      "total_embeddings": 98,
      "last_updated": "2025-10-25T09:15:00Z",
      "access_status": "operational"
    }
  },
  "configuration": {
    "use_openai_embedding": false,
    "hf_token_configured": true,
    "embedding_model": "intfloat/multilingual-e5-large",
    "embedding_dimensions": 1024,
    "deployment_platform": "huggingface_spaces"
  },
  "statistics": {
    "total_documents": 98,
    "total_queries_processed": 1247,
    "average_response_time_ms": 2140,
    "vector_store_size": 98,
    "uptime_hours": 72.5
  },
  "performance": {
    "memory_usage_mb": 156,
    "cpu_usage_percent": 12,
    "disk_usage_mb": 45,
    "cache_hit_rate": 0.78
  }
}
```
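A monitoring probe usually needs a single yes/no answer from this payload. The sketch below treats the system as ready only when the top-level `status` is `"healthy"` and every entry under `services` is `"operational"`; that readiness rule is our assumption, not something the endpoint defines.

```python
def unhealthy_services(health: dict) -> list[str]:
    """Return the names of services whose status is not "operational"."""
    return sorted(name for name, state in health.get("services", {}).items()
                  if state != "operational")


def is_ready(health: dict) -> bool:
    """Assumed readiness rule: overall status is healthy and no service is degraded."""
    return health.get("status") == "healthy" and not unhealthy_services(health)


sample = {
    "status": "healthy",
    "services": {
        "hf_embedding_api": "operational",
        "hf_inference_api": "operational",
        "hf_dataset_store": "operational",
    },
}
print(is_ready(sample))  # True
```

Against a live deployment, the input would be the parsed body of `GET /health`, for example `requests.get(f"{base_url}/health", timeout=10).json()`.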
System Information
GET /
Welcome page with system information and capabilities.
Response
```json
{
  "message": "Welcome to PolicyWise - HuggingFace Edition",
  "version": "2.0.0-hf",
  "description": "Corporate policy RAG system powered by HuggingFace free-tier services",
  "capabilities": [
    "Policy question answering with citations",
    "Semantic document search",
    "Automatic document processing",
    "Multilingual embedding support",
    "Real-time health monitoring"
  ],
  "services": {
    "embedding": "HuggingFace Inference API (intfloat/multilingual-e5-large)",
    "llm": "HuggingFace Inference API (meta-llama/Meta-Llama-3-8B-Instruct)",
    "vector_store": "HuggingFace Dataset",
    "deployment": "HuggingFace Spaces"
  },
  "api_endpoints": {
    "chat": "POST /chat",
    "search": "POST /search",
    "process": "POST /process-documents",
    "health": "GET /health"
  },
  "documentation": {
    "api_docs": "/docs/api",
    "technical_architecture": "/docs/architecture",
    "deployment_guide": "/docs/deployment"
  },
  "policy_corpus": {
    "total_documents": 22,
    "total_chunks": 98,
    "categories": ["HR", "Finance", "Security", "Operations", "EHS"],
    "last_updated": "2025-10-25T09:15:00Z"
  }
}
```
Error Handling
HTTP Status Codes
| Code | Status | Description |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid request parameters |
| 413 | Payload Too Large | Request body too large |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | HuggingFace API unavailable |
Error Response Format
```json
{
  "status": "error",
  "error": "Error type",
  "message": "Human-readable error description",
  "error_code": "MACHINE_READABLE_CODE",
  "timestamp": "2025-10-25T10:30:00Z",
  "request_id": "req_abc123",
  "suggestions": [
    "Check your request parameters",
    "Retry with smaller payload"
  ]
}
```
Common Error Codes
| Error Code | Description | Solution |
|---|---|---|
| MESSAGE_TOO_LONG | Message exceeds character limit | Reduce message length |
| INVALID_PARAMETERS | Invalid request parameters | Check parameter types and ranges |
| HF_API_UNAVAILABLE | HuggingFace API temporarily unavailable | Retry after delay |
| RATE_LIMIT_EXCEEDED | Too many requests | Wait before retrying |
| EMBEDDING_FAILED | Embedding generation failed | Check input text format |
| SEARCH_FAILED | Vector search failed | Verify query parameters |
| DATASET_UNAVAILABLE | HuggingFace Dataset inaccessible | Check dataset permissions |
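The Solution column separates errors worth retrying (`HF_API_UNAVAILABLE`, `RATE_LIMIT_EXCEEDED`) from errors the caller must fix. A sketch of a client-side exception that carries the documented error fields and exposes that distinction; `PolicyWiseError` is hypothetical, the retryable set is our reading of the table, and the 429/503 checks come from the status-code table above.

```python
from typing import Optional

# Codes whose documented remedy is to wait and retry
RETRYABLE_CODES = {"HF_API_UNAVAILABLE", "RATE_LIMIT_EXCEEDED"}
RETRYABLE_STATUS = {429, 503}


class PolicyWiseError(Exception):
    """Hypothetical client exception carrying the documented error-response fields."""

    def __init__(self, http_status: int, error_code: str, message: str,
                 request_id: Optional[str] = None):
        super().__init__(f"{error_code}: {message}")
        self.http_status = http_status
        self.error_code = error_code
        self.message = message
        self.request_id = request_id

    @property
    def retryable(self) -> bool:
        """True if the documented remedy is to wait and retry, or the status is 429/503."""
        return self.error_code in RETRYABLE_CODES or self.http_status in RETRYABLE_STATUS

    @classmethod
    def from_response(cls, http_status: int, body: dict) -> "PolicyWiseError":
        """Build from an error JSON body, tolerating missing optional fields."""
        return cls(http_status, body.get("error_code", "UNKNOWN"),
                   body.get("message", ""), body.get("request_id"))


# HTTP 400 is assumed here for illustration; the docs do not pin a status to this code
err = PolicyWiseError.from_response(400, {
    "status": "error",
    "error_code": "MESSAGE_TOO_LONG",
    "message": "Message exceeds maximum character limit of 5000",
})
print(err, err.retryable)  # MESSAGE_TOO_LONG: Message exceeds maximum character limit of 5000 False
```

Keeping `request_id` on the exception makes it easy to quote in a support ticket or log line.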
Rate Limiting
HuggingFace Free Tier Limits
- Inference API: 1000 requests/hour per model
- Dataset API: 100 requests/hour
- Embedding API: 1000 requests/hour
Application Rate Limiting
- Chat API: 60 requests/minute per IP
- Search API: 120 requests/minute per IP
- Processing API: 10 requests/hour per IP
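A client that sends many questions can pace itself rather than run into 429 responses. A minimal sliding-window limiter sketch, assuming the per-minute limits above are counted over a rolling 60-second window (the server's exact windowing is not documented). `SlidingWindowLimiter` is hypothetical, and its clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds (client-side only)."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the rolling window
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            return 0.0
        return self.window - (now - self._calls[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit; otherwise return False."""
        if self.wait_time() > 0:
            return False
        self._calls.append(self.clock())
        return True

    def acquire(self) -> None:
        """Block (with a real sleep) until a call is allowed, then record it."""
        while not self.try_acquire():
            time.sleep(self.wait_time())
```

In practice you would keep one limiter per endpoint, such as `SlidingWindowLimiter(60)` for `/chat` and `SlidingWindowLimiter(120)` for `/search`, and call `acquire()` before each request.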
Rate Limit Headers
```http
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200
X-RateLimit-Window: 60
```
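These headers let a client decide how long to pause. A sketch, assuming `X-RateLimit-Reset` is a Unix timestamp in seconds (the example value 1640995200 has that form) and that a pause is needed only once `X-RateLimit-Remaining` reaches zero; `seconds_to_wait` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request, based on rate-limit headers."""
    now = time.time() if now is None else now
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0  # headers absent: no information, so do not delay
    if int(remaining) > 0:
        return 0.0  # quota left in the current window
    return max(0.0, float(reset) - now)  # assumed: reset is epoch seconds


hdrs = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1640995200"}
print(seconds_to_wait(hdrs, now=1640995170))  # 30.0
```

Because `requests` exposes `response.headers` as a case-insensitive mapping, `time.sleep(seconds_to_wait(response.headers))` can follow each call.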
SDK and Integration Examples
Python SDK Example
```python
import requests


class PolicyWiseClient:
    """Minimal client for the PolicyWise REST API."""

    def __init__(self, base_url="http://localhost:5000", timeout=30):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout  # seconds; chat responses typically take 2-3 s

    def ask_question(self, question, max_tokens=500):
        """Ask a policy question via POST /chat."""
        response = requests.post(
            f"{self.base_url}/chat",
            json={
                "message": question,
                "max_tokens": max_tokens,
                "include_sources": True,
            },
            timeout=self.timeout,
        )
        response.raise_for_status()  # surface 4xx/5xx errors instead of parsing them as answers
        return response.json()

    def search_policies(self, query, top_k=5):
        """Search policy documents via POST /search."""
        response = requests.post(
            f"{self.base_url}/search",
            json={"query": query, "top_k": top_k, "threshold": 0.3},
            timeout=self.timeout,
        )
        response.raise_for_status()
        return response.json()

    def check_health(self):
        """Check system health via GET /health."""
        response = requests.get(f"{self.base_url}/health", timeout=self.timeout)
        return response.json()


# Usage
client = PolicyWiseClient("https://your-space.hf.space")

# Ask a question
result = client.ask_question("What is the PTO policy?")
print(f"Response: {result['response']}")
print(f"Sources: {[s['filename'] for s in result.get('sources', [])]}")

# Search documents
search_results = client.search_policies("remote work")
for hit in search_results["results"]:
    print(f"Found: {hit['content'][:100]}...")
```
JavaScript/Node.js Example
```javascript
// Requires a global fetch (modern browsers or Node.js 18+)
class PolicyWiseClient {
  constructor(baseUrl = 'http://localhost:5000') {
    this.baseUrl = baseUrl;
  }

  async askQuestion(question, maxTokens = 500) {
    const response = await fetch(`${this.baseUrl}/chat`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        message: question,
        max_tokens: maxTokens,
        include_sources: true
      })
    });
    // fetch() only rejects on network errors, so check the HTTP status explicitly
    if (!response.ok) {
      throw new Error(`POST /chat failed with HTTP ${response.status}`);
    }
    return response.json();
  }

  async searchPolicies(query, topK = 5) {
    const response = await fetch(`${this.baseUrl}/search`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        query: query,
        top_k: topK,
        threshold: 0.3
      })
    });
    if (!response.ok) {
      throw new Error(`POST /search failed with HTTP ${response.status}`);
    }
    return response.json();
  }

  async checkHealth() {
    const response = await fetch(`${this.baseUrl}/health`);
    return response.json();
  }
}

// Usage
const client = new PolicyWiseClient('https://your-space.hf.space');

// Ask a question
client.askQuestion('What are the expense policies?')
  .then(result => {
    console.log('Response:', result.response);
    console.log('Sources:', (result.sources || []).map(s => s.filename));
  })
  .catch(err => console.error('Request failed:', err));
```
cURL Examples
```bash
# Ask a policy question
curl -X POST https://your-space.hf.space/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the remote work policy?",
    "max_tokens": 500,
    "include_sources": true
  }'

# Search policy documents
curl -X POST https://your-space.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "expense reimbursement",
    "top_k": 3,
    "threshold": 0.4
  }'

# Check system health
curl https://your-space.hf.space/health

# Process documents (admin operation)
curl -X POST https://your-space.hf.space/process-documents \
  -H "Content-Type: application/json" \
  -d '{
    "force_reprocess": false,
    "batch_size": 10
  }'
```
Performance Guidelines
Optimization Tips
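- Batch Requests: /chat accepts one message per request, so send independent questions concurrently rather than one after another, staying within the per-IP rate limits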
- Cache Results: Cache frequently asked questions
- Optimize Queries: Use specific, focused questions for better results
- Monitor Usage: Track API usage to stay within rate limits
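For the caching tip, a small in-memory cache keyed on a normalized question avoids repeat round trips for common queries. A sketch, assuming answers can be reused verbatim for identical questions (a policy update would require clearing the cache); `CachedAsker` is hypothetical and wraps any function shaped like `PolicyWiseClient.ask_question` from the Python example.

```python
from typing import Callable


class CachedAsker:
    """Memoize answers by normalized question text, with a simple size cap."""

    def __init__(self, ask: Callable[[str], dict], max_entries: int = 256):
        self.ask = ask
        self.max_entries = max_entries
        self._cache: dict = {}  # normalized question -> response dict, insertion-ordered

    @staticmethod
    def _key(question: str) -> str:
        # Case- and whitespace-insensitive key: "What is PTO?" matches "  what is  pto? "
        return " ".join(question.lower().split())

    def __call__(self, question: str) -> dict:
        key = self._key(question)
        if key not in self._cache:
            if len(self._cache) >= self.max_entries:
                self._cache.pop(next(iter(self._cache)))  # evict the oldest entry (FIFO)
            self._cache[key] = self.ask(question)
        return self._cache[key]
```

Usage: `ask = CachedAsker(client.ask_question)`, then call `ask("What is the PTO policy?")` wherever the client method was called before. FIFO eviction keeps the sketch short; `functools.lru_cache` is an alternative when the normalization step is applied before the call.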
Expected Performance
| Operation | Average Time | Throughput |
|---|---|---|
| Chat (with sources) | 2-3 seconds | 20-30 req/min |
| Search only | 200-500ms | 60-80 req/min |
| Health check | <100ms | 200+ req/min |
| Document processing | 15-20 seconds | 1 req/hour |
Monitoring
Monitor these metrics for optimal performance:
- Response time percentiles (p50, p95, p99)
- Error rates by endpoint
- HuggingFace API response times
- Vector store query performance
- Memory and CPU usage
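When latencies are collected client-side (from `response_time_ms` or your own timers), the standard library is enough for the percentile metrics. A sketch using `statistics.quantiles` with its default exclusive method; other percentile definitions give slightly different values on small samples. `latency_percentiles` is a hypothetical helper.

```python
import statistics


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50/p95/p99 of latency samples in milliseconds (use two or more samples)."""
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


samples = [float(ms) for ms in range(1, 101)]  # 1..100 ms, illustrative data only
print(latency_percentiles(samples))
```

Tracking p95 and p99 alongside the median shows tail latency from slow HuggingFace API calls that an average would hide.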