	# API Documentation - HuggingFace Edition

	## Overview

	PolicyWise provides a RESTful API for corporate policy question-answering using HuggingFace free-tier services. All endpoints return JSON responses and support CORS for web integration.

	## Base URL

	- Local Development: `http://localhost:5000`
	- HuggingFace Spaces: `https://your-username-policywise-rag.hf.space`

	## Authentication

	No authentication is required for the public deployment. For production use, consider adding API key authentication.
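
If API key authentication is added later, the server-side check could look roughly like the sketch below. This is illustrative only: the `X-API-Key` header name and the `is_authorized` helper are assumptions, not part of the current PolicyWise API.

```python
import hmac


def is_authorized(headers, expected_key):
    """Compare the supplied key to the expected one in constant time.

    `headers` is a plain dict of request headers. The X-API-Key header
    name is hypothetical; PolicyWise does not define one today.
    """
    supplied = headers.get("X-API-Key", "")
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of the key matched.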

	## Core Endpoints

	### Chat Endpoint (Primary Interface)

	`POST /chat`

	Ask a question about company policies and receive a generated answer with automatic source citations.

	#### Request

	```http
	POST /chat
	Content-Type: application/json

	{
	"message": "What is the remote work policy for new employees?",
	"max_tokens": 500,
	"include_sources": true,
	"guardrails_level": "standard"
	}
	```

	#### Parameters

	| Parameter | Type | Required | Default | Description |
	|-----------|------|----------|---------|-------------|
	| `message` | string | Yes | - | User question about company policies |
	| `max_tokens` | integer | No | 500 | Maximum response length in tokens (100-1000) |
	| `include_sources` | boolean | No | true | Include source document details |
	| `guardrails_level` | string | No | "standard" | Safety level: "strict", "standard", "relaxed" |
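
The constraints in this table can be checked on the client before a request is sent. The sketch below is one way to do that; `build_chat_payload` is a hypothetical helper, and the 5000-character limit is taken from the `MESSAGE_TOO_LONG` error example in this document rather than from a documented parameter.

```python
ALLOWED_GUARDRAILS = {"strict", "standard", "relaxed"}
MAX_MESSAGE_CHARS = 5000  # assumed from the MESSAGE_TOO_LONG error example


def build_chat_payload(message, max_tokens=500, include_sources=True,
                       guardrails_level="standard"):
    """Return a /chat request body, raising ValueError on out-of-range input."""
    if not isinstance(message, str) or not message.strip():
        raise ValueError("message must be a non-empty string")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
        raise ValueError("max_tokens must be an integer")
    if not 100 <= max_tokens <= 1000:
        raise ValueError("max_tokens must be between 100 and 1000")
    if guardrails_level not in ALLOWED_GUARDRAILS:
        raise ValueError("guardrails_level must be strict, standard, or relaxed")
    return {
        "message": message,
        "max_tokens": max_tokens,
        "include_sources": bool(include_sources),
        "guardrails_level": guardrails_level,
    }
```

Validating locally avoids spending a rate-limited request on input the server would reject anyway.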

	#### Response

	```json
	{
	"status": "success",
	"message": "What is the remote work policy for new employees?",
	"response": "New employees are eligible for remote work after completing their initial 90-day onboarding period. During this period, they must work from the office to facilitate mentoring and team integration. After the probationary period, employees can work remotely up to 3 days per week, subject to manager approval and role requirements. [Source: remote_work_policy.md] [Source: employee_handbook.md]",
	"confidence": 0.91,
	"sources": [
	{
	"filename": "remote_work_policy.md",
	"chunk_id": "remote_work_policy_chunk_3",
	"relevance_score": 0.89,
	"content_preview": "New employees must complete a 90-day onboarding period..."
	},
	{
	"filename": "employee_handbook.md",
	"chunk_id": "employee_handbook_chunk_7",
	"relevance_score": 0.76,
	"content_preview": "Remote work eligibility requirements include..."
	}
	],
	"response_time_ms": 2340,
	"guardrails": {
	"safety_score": 0.98,
	"quality_score": 0.91,
	"citation_count": 2
	},
	"services_used": {
	"embedding_model": "intfloat/multilingual-e5-large",
	"llm_model": "meta-llama/Meta-Llama-3-8B-Instruct",
	"vector_store": "huggingface_dataset"
	}
	}
	```
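
In the sample response, citations appear inline as `[Source: filename]` markers and again as entries in `sources`. The sketch below extracts the inline markers and reports any listed source that the answer text never cites. It assumes the marker format shown in the sample is stable; the helper names are hypothetical.

```python
import re

# Matches markers such as "[Source: remote_work_policy.md]"
CITATION_PATTERN = re.compile(r"\[Source:\s*([^\]]+?)\s*\]")


def extract_citations(response_text):
    """Return cited filenames in order of first appearance, without duplicates."""
    seen = []
    for name in CITATION_PATTERN.findall(response_text):
        if name not in seen:
            seen.append(name)
    return seen


def uncited_sources(chat_response):
    """List filenames in `sources` that are not cited inline in `response`."""
    cited = set(extract_citations(chat_response.get("response", "")))
    return [s["filename"] for s in chat_response.get("sources", [])
            if s["filename"] not in cited]
```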

	#### Error Response

	```json
	{
	"status": "error",
	"error": "Request too long",
	"message": "Message exceeds maximum character limit of 5000",
	"error_code": "MESSAGE_TOO_LONG"
	}
	```

	### Search Endpoint

	`POST /search`

	Perform semantic search across policy documents using HuggingFace embeddings.

	#### Request

	```http
	POST /search
	Content-Type: application/json

	{
	"query": "What is the remote work policy?",
	"top_k": 5,
	"threshold": 0.3,
	"include_metadata": true
	}
	```

	#### Parameters

	| Parameter | Type | Required | Default | Description |
	|-----------|------|----------|---------|-------------|
	| `query` | string | Yes | - | Search query text |
	| `top_k` | integer | No | 5 | Number of results to return (1-20) |
	| `threshold` | float | No | 0.3 | Minimum similarity threshold (0.0-1.0) |
	| `include_metadata` | boolean | No | true | Include document metadata |

	#### Response

	```json
	{
	"status": "success",
	"query": "What is the remote work policy?",
	"results_count": 3,
	"embedding_model": "intfloat/multilingual-e5-large",
	"embedding_dimensions": 1024,
	"results": [
	{
	"chunk_id": "remote_work_policy_chunk_2",
	"content": "Employees may work remotely up to 3 days per week with manager approval. Remote work arrangements must be documented and reviewed quarterly.",
	"similarity_score": 0.87,
	"metadata": {
	"source_file": "remote_work_policy.md",
	"chunk_index": 2,
	"category": "HR",
	"word_count": 95,
	"created_at": "2025-10-25T10:30:00Z"
	}
	},
	{
	"chunk_id": "remote_work_policy_chunk_1",
	"content": "Remote work eligibility requires completion of probationary period and manager approval. New employees must work on-site for first 90 days.",
	"similarity_score": 0.82,
	"metadata": {
	"source_file": "remote_work_policy.md",
	"chunk_index": 1,
	"category": "HR",
	"word_count": 88,
	"created_at": "2025-10-25T10:30:00Z"
	}
	}
	],
	"search_time_ms": 234,
	"vector_store_size": 98
	}
	```
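
Several chunks from the same file can appear in one result set, as in the sample above. A client that only needs the best match per document could reduce the results as sketched below. The helper name is hypothetical, and grouping under `"unknown"` when `metadata` is absent (for example with `include_metadata` set to false) is an assumption about the response shape.

```python
def best_hit_per_file(search_response, min_score=0.0):
    """Map each source file to its highest-scoring chunk at or above min_score."""
    best = {}
    for hit in search_response.get("results", []):
        score = hit["similarity_score"]
        if score < min_score:
            continue
        # Fall back to "unknown" when metadata was not requested
        source = hit.get("metadata", {}).get("source_file", "unknown")
        if source not in best or score > best[source]["similarity_score"]:
            best[source] = hit
    return best
```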

	### Document Processing

	`POST /process-documents`

	Process and embed policy documents using HuggingFace services. Processing also runs automatically on startup.

	#### Request

	```http
	POST /process-documents
	Content-Type: application/json

	{
	"force_reprocess": false,
	"batch_size": 10
	}
	```

	#### Parameters

	| Parameter | Type | Required | Default | Description |
	|-----------|------|----------|---------|-------------|
	| `force_reprocess` | boolean | No | false | Force reprocessing even if documents exist |
	| `batch_size` | integer | No | 10 | Number of documents to process per batch |

	#### Response

	```json
	{
	"status": "success",
	"processing_details": {
	"files_processed": 22,
	"chunks_generated": 98,
	"embeddings_created": 98,
	"processing_time_seconds": 18.7
	},
	"embedding_service": {
	"model": "intfloat/multilingual-e5-large",
	"dimensions": 1024,
	"api_status": "operational"
	},
	"vector_store": {
	"type": "huggingface_dataset",
	"dataset_name": "policy-vectors",
	"total_embeddings": 98,
	"storage_size_mb": 2.4
	},
	"corpus_statistics": {
	"total_words": 10637,
	"average_chunk_size": 95,
	"documents_by_category": {
	"HR": 8,
	"Finance": 4,
	"Security": 3,
	"Operations": 4,
	"EHS": 3
	}
	},
	"quality_metrics": {
	"embedding_generation_success_rate": 1.0,
	"average_embedding_time_ms": 450,
	"metadata_completeness": 1.0
	}
	}
	```
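
The counts in the sample response agree with each other: 98 chunks and 98 embeddings, and category counts that sum to 22 files. A caller can use the same relationships as a sanity check after a processing run. The sketch below is a hypothetical helper; whether the server guarantees these invariants in every case (for example when nothing is reprocessed) is an assumption.

```python
def check_processing_report(report):
    """Return a list of human-readable inconsistencies found in the report."""
    problems = []
    details = report["processing_details"]
    if details["chunks_generated"] != details["embeddings_created"]:
        problems.append("not every chunk received an embedding")
    if report["vector_store"]["total_embeddings"] < details["embeddings_created"]:
        problems.append("vector store holds fewer embeddings than were created")
    by_category = report["corpus_statistics"]["documents_by_category"]
    if sum(by_category.values()) != details["files_processed"]:
        problems.append("category counts do not add up to files_processed")
    return problems
```

An empty list means the report is internally consistent; it does not by itself prove that the embeddings are correct.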

	### Health Check

	`GET /health`

	Return overall system health, including the status of each HuggingFace service.

	#### Request

	```http
	GET /health
	```

	#### Response

	```json
	{
	"status": "healthy",
	"timestamp": "2025-10-25T10:30:00Z",
	"services": {
	"hf_embedding_api": "operational",
	"hf_inference_api": "operational",
	"hf_dataset_store": "operational"
	},
	"service_details": {
	"embedding_api": {
	"model": "intfloat/multilingual-e5-large",
	"last_request_ms": 450,
	"requests_today": 247,
	"error_rate": 0.02
	},
	"inference_api": {
	"model": "meta-llama/Meta-Llama-3-8B-Instruct",
	"last_request_ms": 2340,
	"requests_today": 89,
	"error_rate": 0.01
	},
	"dataset_store": {
	"dataset_name": "policy-vectors",
	"total_embeddings": 98,
	"last_updated": "2025-10-25T09:15:00Z",
	"access_status": "operational"
	}
	},
	"configuration": {
	"use_openai_embedding": false,
	"hf_token_configured": true,
	"embedding_model": "intfloat/multilingual-e5-large",
	"embedding_dimensions": 1024,
	"deployment_platform": "huggingface_spaces"
	},
	"statistics": {
	"total_documents": 98,
	"total_queries_processed": 1247,
	"average_response_time_ms": 2140,
	"vector_store_size": 98,
	"uptime_hours": 72.5
	},
	"performance": {
	"memory_usage_mb": 156,
	"cpu_usage_percent": 12,
	"disk_usage_mb": 45,
	"cache_hit_rate": 0.78
	}
	}
	```
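
For monitoring or readiness probes, the `status` and `services` fields are usually enough. The sketch below treats any service value other than `"operational"` as degraded. The other possible values are not listed in this document, so that rule is an assumption, as are the helper names.

```python
def degraded_services(health):
    """Return names of services whose status is not "operational"."""
    services = health.get("services", {})
    return sorted(name for name, status in services.items()
                  if status != "operational")


def is_ready(health):
    """True when overall status is "healthy" and every service is operational."""
    return health.get("status") == "healthy" and not degraded_services(health)
```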

	### System Information

	`GET /`

	Return a welcome message with system information and capabilities.

	#### Response

	```json
	{
	"message": "Welcome to PolicyWise - HuggingFace Edition",
	"version": "2.0.0-hf",
	"description": "Corporate policy RAG system powered by HuggingFace free-tier services",
	"capabilities": [
	"Policy question answering with citations",
	"Semantic document search",
	"Automatic document processing",
	"Multilingual embedding support",
	"Real-time health monitoring"
	],
	"services": {
	"embedding": "HuggingFace Inference API (intfloat/multilingual-e5-large)",
	"llm": "HuggingFace Inference API (meta-llama/Meta-Llama-3-8B-Instruct)",
	"vector_store": "HuggingFace Dataset",
	"deployment": "HuggingFace Spaces"
	},
	"api_endpoints": {
	"chat": "POST /chat",
	"search": "POST /search",
	"process": "POST /process-documents",
	"health": "GET /health"
	},
	"documentation": {
	"api_docs": "/docs/api",
	"technical_architecture": "/docs/architecture",
	"deployment_guide": "/docs/deployment"
	},
	"policy_corpus": {
	"total_documents": 22,
	"total_chunks": 98,
	"categories": ["HR", "Finance", "Security", "Operations", "EHS"],
	"last_updated": "2025-10-25T09:15:00Z"
	}
	}
	```

	## Error Handling

	### HTTP Status Codes

	| Code | Status | Description |
	|------|--------|-------------|
	| 200 | OK | Request successful |
	| 400 | Bad Request | Invalid request parameters |
	| 413 | Payload Too Large | Request body too large |
	| 429 | Too Many Requests | Rate limit exceeded |
	| 500 | Internal Server Error | Server error |
	| 503 | Service Unavailable | HuggingFace API unavailable |

	### Error Response Format

	```json
	{
	"status": "error",
	"error": "Error type",
	"message": "Human-readable error description",
	"error_code": "MACHINE_READABLE_CODE",
	"timestamp": "2025-10-25T10:30:00Z",
	"request_id": "req_abc123",
	"suggestions": [
	"Check your request parameters",
	"Retry with smaller payload"
	]
	}
	```

	### Common Error Codes

	| Error Code | Description | Solution |
	|------------|-------------|----------|
	| `MESSAGE_TOO_LONG` | Message exceeds character limit | Reduce message length |
	| `INVALID_PARAMETERS` | Invalid request parameters | Check parameter types and ranges |
	| `HF_API_UNAVAILABLE` | HuggingFace API temporarily unavailable | Retry after delay |
	| `RATE_LIMIT_EXCEEDED` | Too many requests | Wait before retrying |
	| `EMBEDDING_FAILED` | Embedding generation failed | Check input text format |
	| `SEARCH_FAILED` | Vector search failed | Verify query parameters |
	| `DATASET_UNAVAILABLE` | HuggingFace Dataset inaccessible | Check dataset permissions |
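
Only two of these codes call for a retry: `HF_API_UNAVAILABLE` and `RATE_LIMIT_EXCEEDED`, which correspond to HTTP 503 and 429. The sketch below separates retryable failures from the rest and produces an exponential backoff schedule. The helper names, the base delay, and the cap are illustrative choices, not values defined by the API.

```python
RETRYABLE_CODES = {"HF_API_UNAVAILABLE", "RATE_LIMIT_EXCEEDED"}
RETRYABLE_STATUS = {429, 503}


def should_retry(http_status, body):
    """Decide whether a failed request is worth retrying.

    `body` is the parsed JSON error response (a dict), or None if the
    response body could not be parsed.
    """
    error_code = (body or {}).get("error_code")
    return http_status in RETRYABLE_STATUS or error_code in RETRYABLE_CODES


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A caller would sleep for each delay in turn and stop as soon as a request succeeds or `should_retry` returns false.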

	## Rate Limiting

	### HuggingFace Free Tier Limits

	- Inference API: 1000 requests/hour per model
	- Dataset API: 100 requests/hour
	- Embedding API: 1000 requests/hour

	### Application Rate Limiting

	- Chat API: 60 requests/minute per IP
	- Search API: 120 requests/minute per IP
	- Processing API: 10 requests/hour per IP
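
A client can stay under these per-IP limits by throttling itself. The sketch below is a simple sliding-window limiter. Whether the server counts requests in a sliding or a fixed window is not stated here, so this is a conservative approximation, and the class name is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._calls = deque()

    def try_acquire(self):
        """Record a call and return True, or return False if over the limit."""
        now = self.clock()
        # Drop timestamps that have left the window
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) >= self.limit:
            return False
        self._calls.append(now)
        return True
```

For the chat endpoint this would be created as `SlidingWindowLimiter(limit=60)`, and for search as `SlidingWindowLimiter(limit=120)`.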

	### Rate Limit Headers

	```http
	X-RateLimit-Limit: 60
	X-RateLimit-Remaining: 45
	X-RateLimit-Reset: 1640995200
	X-RateLimit-Window: 60
	```
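
These headers let a client decide how long to pause before the next request. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this document does not state. The helper name is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait before sending again; 0.0 if requests remain.

    `headers` is a dict of response headers with string values.
    `now` can be passed explicitly (Unix seconds) for testing.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers["X-RateLimit-Reset"])
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```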

	## SDK and Integration Examples

	### Python SDK Example

	```python
	import requests


	class PolicyWiseClient:
	    """Minimal client for the PolicyWise API."""

	    def __init__(self, base_url="http://localhost:5000", timeout=30):
	        self.base_url = base_url.rstrip("/")
	        self.timeout = timeout  # seconds, so a stalled request cannot hang forever

	    def ask_question(self, question, max_tokens=500):
	        """Ask a policy question"""
	        response = requests.post(
	            f"{self.base_url}/chat",
	            json={
	                "message": question,
	                "max_tokens": max_tokens,
	                "include_sources": True,
	            },
	            timeout=self.timeout,
	        )
	        return response.json()

	    def search_policies(self, query, top_k=5):
	        """Search policy documents"""
	        response = requests.post(
	            f"{self.base_url}/search",
	            json={
	                "query": query,
	                "top_k": top_k,
	                "threshold": 0.3,
	            },
	            timeout=self.timeout,
	        )
	        return response.json()

	    def check_health(self):
	        """Check system health"""
	        response = requests.get(f"{self.base_url}/health", timeout=self.timeout)
	        return response.json()


	# Usage
	client = PolicyWiseClient("https://your-space.hf.space")

	# Ask a question
	result = client.ask_question("What is the PTO policy?")
	print(f"Response: {result['response']}")
	print(f"Sources: {[s['filename'] for s in result['sources']]}")

	# Search documents
	search_results = client.search_policies("remote work")
	for hit in search_results['results']:
	    print(f"Found: {hit['content'][:100]}...")
	```

	### JavaScript/Node.js Example

	```javascript
	class PolicyWiseClient {
	  constructor(baseUrl = 'http://localhost:5000') {
	    this.baseUrl = baseUrl;
	  }

	  async askQuestion(question, maxTokens = 500) {
	    const response = await fetch(`${this.baseUrl}/chat`, {
	      method: 'POST',
	      headers: {
	        'Content-Type': 'application/json',
	      },
	      body: JSON.stringify({
	        message: question,
	        max_tokens: maxTokens,
	        include_sources: true
	      })
	    });
	    return await response.json();
	  }

	  async searchPolicies(query, topK = 5) {
	    const response = await fetch(`${this.baseUrl}/search`, {
	      method: 'POST',
	      headers: {
	        'Content-Type': 'application/json',
	      },
	      body: JSON.stringify({
	        query: query,
	        top_k: topK,
	        threshold: 0.3
	      })
	    });
	    return await response.json();
	  }

	  async checkHealth() {
	    const response = await fetch(`${this.baseUrl}/health`);
	    return await response.json();
	  }
	}

	// Usage
	const client = new PolicyWiseClient('https://your-space.hf.space');

	// Ask a question
	client.askQuestion('What are the expense policies?')
	  .then(result => {
	    console.log('Response:', result.response);
	    console.log('Sources:', result.sources.map(s => s.filename));
	  });
	```

	### cURL Examples

	```bash
	# Ask a policy question
	curl -X POST https://your-space.hf.space/chat \
	-H "Content-Type: application/json" \
	-d '{
	"message": "What is the remote work policy?",
	"max_tokens": 500,
	"include_sources": true
	}'

	# Search policy documents
	curl -X POST https://your-space.hf.space/search \
	-H "Content-Type: application/json" \
	-d '{
	"query": "expense reimbursement",
	"top_k": 3,
	"threshold": 0.4
	}'

	# Check system health
	curl https://your-space.hf.space/health

	# Process documents (admin operation)
	curl -X POST https://your-space.hf.space/process-documents \
	-H "Content-Type: application/json" \
	-d '{
	"force_reprocess": false,
	"batch_size": 10
	}'
	```

	## Performance Guidelines

	### Optimization Tips

	1. Batch Requests: Group multiple questions for better throughput
	2. Cache Results: Cache frequently asked questions
	3. Optimize Queries: Use specific, focused questions for better results
	4. Monitor Usage: Track API usage to stay within rate limits
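
The caching tip can be sketched as a small in-memory LRU cache keyed by a normalized form of the question. The class name, the normalization rule (lowercase, collapsed whitespace), and the default size are illustrative assumptions. `fetch` stands for any callable that performs the real `/chat` request.

```python
from collections import OrderedDict


class AnswerCache:
    """Small LRU cache keyed by a normalized question string."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(question):
        # Treat questions that differ only in case or spacing as the same
        return " ".join(question.lower().split())

    def get_or_fetch(self, question, fetch):
        """Return a cached answer, calling `fetch(question)` only on a miss."""
        key = self._key(question)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        answer = fetch(question)
        self._entries[key] = answer
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return answer
```

With the Python client from this document, usage would be `cache.get_or_fetch(question, client.ask_question)`. Cached answers go stale when policy documents are reprocessed, so the cache should be cleared at that point.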

	### Expected Performance

	| Operation | Average Time | Throughput |
	|-----------|--------------|------------|
	| Chat (with sources) | 2-3 seconds | 20-30 req/min |
	| Search only | 200-500ms | 60-80 req/min |
	| Health check | <100ms | 200+ req/min |
	| Document processing | 15-20 seconds | 1 req/hour |

	### Monitoring

	Monitor these metrics for optimal performance:

	- Response time percentiles (p50, p95, p99)
	- Error rates by endpoint
	- HuggingFace API response times
	- Vector store query performance
	- Memory and CPU usage
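
Response time percentiles can be computed from recorded latencies, for example the `response_time_ms` values returned by `/chat`. The sketch below uses the standard library only. The helper name is hypothetical, and it needs at least two samples.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95 and p99 of a list of latencies in milliseconds."""
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Tracking p95 and p99 alongside the median matters here because slow HuggingFace API calls affect the tail far more than the average.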

	This document covers the endpoints, error handling, rate limits, and client examples needed to integrate with the PolicyWise HuggingFace-powered RAG system.