| # TEXT-AUTH API Documentation | |
| ## Overview | |
| The TEXT-AUTH API provides evidence-based text forensics and statistical consistency assessment through a RESTful interface. This document covers all endpoints, request/response formats, authentication, rate limiting, and integration examples. | |
| **API Version:** 1.0.0 | |
| --- | |
| ## Table of Contents | |
| 1. [Authentication & Security](#authentication--security) | |
| 2. [Rate Limiting](#rate-limiting) | |
| 3. [Common Response Format](#common-response-format) | |
| 4. [Error Handling](#error-handling) | |
| 5. [Core Endpoints](#core-endpoints) | |
| - [Text Analysis](#text-analysis) | |
| - [File Analysis](#file-analysis) | |
| - [Batch Analysis](#batch-analysis) | |
| 6. [Report Endpoints](#report-endpoints) | |
| 7. [Utility Endpoints](#utility-endpoints) | |
| 8. [Best Practices](#best-practices) | |
| --- | |
| ## Authentication & Security | |
| ### API Key Authentication | |
| *Authentication is not enforced in the current deployment. API key authentication may be added in future versions.* | |
| --- | |
| ## Rate Limiting | |
| *Rate limiting is not enforced at the application level. Deployments should use an external gateway (NGINX, API Gateway, Cloudflare) to enforce rate limits if required.* | |
| --- | |
| ## Common Response Format | |
| All successful responses follow this structure: | |
| ```json | |
| { | |
| "status": "success", | |
| "analysis_id": "...", | |
| "detection_result": {...}, | |
| "highlighted_html": "...", | |
| "reasoning": {...}, | |
| "processing_time": 2.34, | |
| "timestamp": "..." | |
| } | |
| ``` | |
| ### HTTP Status Codes | |
| | Code | Meaning | Description | | |
| |------|---------|-------------| | |
| | 200 | OK | Request succeeded | | |
| | 201 | Created | Resource created successfully | | |
| | 400 | Bad Request | Invalid request parameters | | |
| | 404 | Not Found | Resource not found | | |
| | 500 | Internal Server Error | Server error | | |
| | 503 | Service Unavailable | Service temporarily unavailable | | |
| --- | |
| ## Error Handling | |
| ### Error Response Format | |
| ```json | |
| { | |
| "status": "error", | |
| "error": "Invalid domain...", | |
| "timestamp": "..." | |
| } | |
| ``` | |
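Because both envelopes carry a `status` field, a client can branch on it before touching any other key. The following is a minimal sketch of that check; `TextAuthError` and `unwrap` are hypothetical helper names, not part of the API.

```python
import json


class TextAuthError(Exception):
    """Raised when a response envelope has status == "error" (hypothetical helper)."""


def unwrap(body: str) -> dict:
    """Parse a JSON response body; return it, or raise if it is an error envelope."""
    payload = json.loads(body)
    if payload.get("status") == "error":
        # In the documented error format, "error" holds the human-readable message.
        raise TextAuthError(payload.get("error", "unknown error"))
    return payload
```

Calling `unwrap(response_text)` right after each request keeps error handling in one place.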
| ### Common Error Codes | |
| | Code | Description | Resolution | | |
| |------|-------------|------------| | |
| | `TEXT_TOO_LONG` | Text exceeds maximum length (50,000 chars) | Split into multiple requests | | |
| | `FILE_TOO_LARGE` | File exceeds size limit (25MB) | Compress or split file | |
| | `UNSUPPORTED_FORMAT` | File format not supported | Use .txt, .pdf, .docx, .doc, or .md | | |
| | `EXTRACTION_FAILED` | Document text extraction failed | Ensure file is not corrupted or password-protected | | |
| | `MODEL_UNAVAILABLE` | Required model temporarily unavailable | Retry after a few minutes | | |
| --- | |
| ## Core Endpoints | |
| ### Text Analysis | |
| **Endpoint:** `POST /api/analyze` | |
| Analyze raw text for statistical consistency patterns and forensic signals. | |
| #### Request | |
| **Headers:** | |
| ```http | |
| Content-Type: application/json | |
| ``` | |
| **Body:** | |
| ```json | |
| { | |
| "text": "Your text content here...", | |
| "domain": "academic", | |
| "enable_highlighting": true, | |
| "skip_expensive_metrics": false, | |
| "use_sentence_level": true, | |
| "include_metrics_summary": true, | |
| "generate_report": false | |
| } | |
| ``` | |
| **Parameters:** | |
| | Parameter | Type | Required | Default | Description | | |
| |-----------|------|----------|---------|-------------| | |
| | `text` | string | **Yes** | - | Text to analyze (50-50,000 chars) | | |
| | `domain` | string | No | `null` (auto-detect) | Content domain (see [Domains](#supported-domains)) | | |
| | `enable_highlighting` | boolean | No | `true` | Generate sentence-level highlights | | |
| | `skip_expensive_metrics` | boolean | No | `false` | Skip computationally expensive metrics for faster results | | |
| | `use_sentence_level` | boolean | No | `true` | Use sentence-level granularity for highlighting | | |
| | `include_metrics_summary` | boolean | No | `true` | Include metric summaries in highlights | | |
| | `generate_report` | boolean | No | `false` | Generate downloadable PDF/JSON report | | |
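As an illustration of the parameters above, the sketch below builds (but does not send) a `POST /api/analyze` request using only the Python standard library. `BASE_URL` reuses the placeholder host from the cURL example, and `build_analyze_request` is a hypothetical helper; the client-side length check simply mirrors the documented 50-50,000 character range.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://your-domain.com"  # placeholder host; replace with your deployment


def build_analyze_request(text: str, domain: Optional[str] = None, **options) -> urllib.request.Request:
    """Build a POST /api/analyze request without sending it."""
    if not 50 <= len(text) <= 50_000:
        raise ValueError("text must be 50-50,000 characters")
    payload = {"text": text, **options}
    if domain is not None:
        payload["domain"] = domain  # omit the key to let the service auto-detect
    return urllib.request.Request(
        f"{BASE_URL}/api/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)` and the body decoded as JSON.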
| #### Response | |
| ```json | |
| { | |
| "status": "success", | |
| "analysis_id": "analysis_1735555800000", | |
| "detection_result": { | |
| "ensemble_result": { | |
| "final_verdict": "Synthetic", | |
| "overall_confidence": 0.89, | |
| "synthetic_probability": 0.92, | |
| "authentic_probability": 0.08, | |
| "uncertainty_score": 0.23, | |
| "decision_boundary_distance": 0.42 | |
| }, | |
| "metric_results": { | |
| "perplexity": { | |
| "synthetic_probability": 0.94, | |
| "confidence": 0.91, | |
| "raw_score": 15.23, | |
| "evidence_strength": "strong" | |
| }, | |
| "entropy": { | |
| "synthetic_probability": 0.88, | |
| "confidence": 0.85, | |
| "raw_score": 4.67, | |
| "evidence_strength": "moderate" | |
| }, | |
| "structural": { | |
| "synthetic_probability": 0.91, | |
| "confidence": 0.87, | |
| "burstiness": -0.12, | |
| "uniformity": 0.85, | |
| "evidence_strength": "strong" | |
| }, | |
| "linguistic": { | |
| "synthetic_probability": 0.86, | |
| "confidence": 0.82, | |
| "pos_diversity": 0.42, | |
| "mean_tree_depth": 4.2, | |
| "evidence_strength": "moderate" | |
| }, | |
| "semantic": { | |
| "synthetic_probability": 0.93, | |
| "confidence": 0.88, | |
| "coherence_mean": 0.91, | |
| "coherence_variance": 0.03, | |
| "evidence_strength": "strong" | |
| }, | |
| "multi_perturbation_stability": { | |
| "synthetic_probability": 0.89, | |
| "confidence": 0.84, | |
| "stability_score": 0.12, | |
| "evidence_strength": "moderate" | |
| } | |
| }, | |
| "domain_prediction": { | |
| "primary_domain": "academic", | |
| "confidence": 0.94, | |
| "alternative_domains": [ | |
| {"domain": "technical_doc", "probability": 0.23}, | |
| {"domain": "science", "probability": 0.18} | |
| ] | |
| }, | |
| "processed_text": { | |
| "word_count": 487, | |
| "sentence_count": 23, | |
| "paragraph_count": 5, | |
| "avg_sentence_length": 21.2, | |
| "language": "en" | |
| } | |
| }, | |
| "highlighted_html": "<div class=\"text-forensics-highlight\">...</div>", | |
| "reasoning": { | |
| "summary": "The text exhibits strong statistical consistency patterns typical of language model generation...", | |
| "key_indicators": [ | |
| "Unusually uniform sentence structure (burstiness: -0.12)", | |
| "High semantic coherence across all sentences (mean: 0.91)", | |
| "Low perplexity variance indicating predictable token sequences" | |
| ], | |
| "confidence_factors": { | |
| "supporting_evidence": [ | |
| "6/6 metrics indicate synthetic patterns", | |
| "Strong cross-metric agreement (correlation: 0.87)" | |
| ], | |
| "uncertainty_sources": [ | |
| "Domain-specific terminology may affect baseline expectations" | |
| ] | |
| }, | |
| "metric_contributions": { | |
| "perplexity": 0.28, | |
| "entropy": 0.19, | |
| "structural": 0.16, | |
| "semantic": 0.17, | |
| "linguistic": 0.12, | |
| "multi_perturbation_stability": 0.08 | |
| } | |
| }, | |
| "report_files": null, | |
| "processing_time": 2.34, | |
| "timestamp": "2025-12-30T10:30:00Z" | |
| } | |
| ``` | |
| #### Verdict Interpretation | |
| | Verdict | Probability Range | Interpretation | | |
| |---------|-------------------|----------------| | |
| | **Synthetic** | > 0.70 | High consistency with language model generation patterns | | |
| | **Likely Synthetic** | 0.55 - 0.70 | Moderate consistency with synthetic patterns | | |
| | **Inconclusive** | 0.45 - 0.55 | Insufficient evidence for confident assessment | | |
| | **Likely Authentic** | 0.30 - 0.45 | Moderate consistency with human authorship patterns | | |
| | **Authentic** | < 0.30 | High consistency with human authorship patterns | | |
| **Important:** These verdicts represent statistical consistency assessments, not definitive authorship claims. | |
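For clients that want to reproduce the labels locally, the table can be sketched as a threshold function. The table does not say which band owns the shared boundaries (0.30, 0.45, 0.55), so assigning each boundary to the higher band is an assumption of this sketch, as is the function name `verdict_for`.

```python
def verdict_for(synthetic_probability: float) -> str:
    """Map a synthetic probability to a verdict label from the table above."""
    p = synthetic_probability
    if p > 0.70:
        return "Synthetic"
    if p >= 0.55:  # assumption: shared boundaries belong to the higher band
        return "Likely Synthetic"
    if p >= 0.45:
        return "Inconclusive"
    if p >= 0.30:
        return "Likely Authentic"
    return "Authentic"
```

In practice, prefer the `final_verdict` field returned by the service; this mapping is only useful for display logic or sanity checks.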
| #### Highlighting Color Key | |
| | Color | Meaning | Probability Range | | |
| |-------|---------|-------------------| | |
| | 🔴 Red | Strong synthetic signals | > 0.80 | | |
| | 🟠 Orange | Moderate synthetic signals | 0.60 - 0.80 | | |
| | 🟡 Yellow | Weak signals | 0.40 - 0.60 | | |
| | 🟢 Green | Authentic signals | < 0.40 | | |
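The color key can be sketched the same way, for example to build a legend or summary next to `highlighted_html`. The function names are hypothetical, and treating 0.80, 0.60, and 0.40 as belonging to the upper adjacent band, as far as the table allows, is an assumption.

```python
from collections import Counter


def highlight_color(synthetic_probability: float) -> str:
    """Map a sentence-level probability to a color name from the key above."""
    p = synthetic_probability
    if p > 0.80:
        return "red"
    if p >= 0.60:
        return "orange"
    if p >= 0.40:  # assumption: 0.40 and 0.60 fall in the higher band
        return "yellow"
    return "green"


def color_histogram(sentence_probabilities) -> Counter:
    """Count how many sentences fall into each color band."""
    return Counter(highlight_color(p) for p in sentence_probabilities)
```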
| --- | |
| ### File Analysis | |
| **Endpoint:** `POST /api/analyze/file` | |
| Analyze uploaded documents (PDF, DOCX, DOC, TXT, MD). | |
| #### Request | |
| **Headers:** | |
| ```http | |
| Content-Type: multipart/form-data | |
| ``` | |
| **Body (form-data):** | |
| ``` | |
| file: [binary file data] | |
| domain: "academic" | |
| skip_expensive_metrics: false | |
| use_sentence_level: true | |
| include_metrics_summary: true | |
| generate_report: false | |
| ``` | |
| **Parameters:** | |
| | Parameter | Type | Required | Default | Description | | |
| |-----------|------|----------|---------|-------------| | |
| | `file` | file | **Yes** | - | Document file (max 25MB) | | |
| | `domain` | string | No | `null` | Content domain override | | |
| | `skip_expensive_metrics` | boolean | No | `false` | Skip expensive metrics | | |
| | `use_sentence_level` | boolean | No | `true` | Sentence-level highlighting | | |
| | `include_metrics_summary` | boolean | No | `true` | Include metric summaries | | |
| | `generate_report` | boolean | No | `false` | Generate report | | |
| #### Supported File Formats | |
| | Format | Extensions | Max Size | Notes | | |
| |--------|-----------|----------|-------| | |
| | Plain Text | .txt, .md | 25MB | UTF-8 encoding recommended | | |
| | PDF | .pdf | 25MB | Text-based PDFs; OCR not supported | | |
| | Word | .docx, .doc | 25MB | Modern and legacy formats | | |
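Checking the extension and size before uploading avoids a round trip that would end in `UNSUPPORTED_FORMAT` or `FILE_TOO_LARGE`. This is a minimal client-side sketch; `check_upload` is a hypothetical helper, and reading "25MB" as 25 MiB is an assumption.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".txt", ".md", ".pdf", ".docx", ".doc"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25MB" means 25 MiB


def check_upload(filename: str, size_bytes: int) -> list:
    """Return the documented error codes this upload would likely trigger."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("UNSUPPORTED_FORMAT")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("FILE_TOO_LARGE")
    return problems
```

An empty list means the file passes these two checks; the service may still reject it, for example with `EXTRACTION_FAILED` for a scanned PDF.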
| #### Response | |
| Same structure as [Text Analysis](#text-analysis), with an additional `file_info` object: | |
| ```json | |
| { | |
| "status": "success", | |
| "analysis_id": "file_1735555800000", | |
| "file_info": { | |
| "filename": "research_paper.pdf", | |
| "file_type": ".pdf", | |
| "pages": 12, | |
| "extraction_method": "pdfplumber", | |
| "highlighted_html": true | |
| }, | |
| "detection_result": { /* same as text analysis */ }, | |
| "highlighted_html": "...", | |
| "reasoning": { /* same as text analysis */ }, | |
| "processing_time": 4.12, | |
| "timestamp": "2025-12-30T10:30:00Z" | |
| } | |
| ``` | |
| #### cURL Example | |
| ```bash | |
| curl -X POST https://your-domain.com/api/analyze/file \ | |
| -F "file=@/path/to/document.pdf" \ | |
| -F "domain=academic" \ | |
| -F "generate_report=true" | |
| ``` | |
| --- | |
| ### Batch Analysis | |
| **Endpoint:** `POST /api/analyze/batch` | |
| Analyze multiple texts in a single request for efficiency. | |
| #### Request | |
| ```json | |
| { | |
| "texts": [ | |
| "First text to analyze...", | |
| "Second text to analyze...", | |
| "Third text to analyze..." | |
| ], | |
| "domain": "academic", | |
| "skip_expensive_metrics": true, | |
| "generate_reports": false | |
| } | |
| ``` | |
| **Parameters:** | |
| | Parameter | Type | Required | Default | Description | | |
| |-----------|------|----------|---------|-------------| | |
| | `texts` | array[string] | **Yes** | - | 1-100 texts to analyze | | |
| | `domain` | string | No | `null` | Apply same domain to all texts | | |
| | `skip_expensive_metrics` | boolean | No | `true` | Skip expensive metrics (recommended for batch) | | |
| | `generate_reports` | boolean | No | `false` | Generate reports for each text | | |
| #### Response | |
| ```json | |
| { | |
| "status": "success", | |
| "batch_id": "batch_1735555800000", | |
| "total": 3, | |
| "successful": 2, | |
| "failed": 1, | |
| "results": [ | |
| { | |
| "index": 0, | |
| "status": "success", | |
| "detection": { | |
| "ensemble_result": { /* ... */ }, | |
| "metric_results": { /* ... */ } | |
| }, | |
| "reasoning": { /* ... */ }, | |
| "report_files": null | |
| }, | |
| { | |
| "index": 1, | |
| "status": "success", | |
| "detection": { /* ... */ } | |
| }, | |
| { | |
| "index": 2, | |
| "status": "error", | |
| "error": "Text too short (minimum 50 characters)" | |
| } | |
| ], | |
| "processing_time": 8.92, | |
| "timestamp": "2025-12-30T10:30:00Z" | |
| } | |
| ``` | |
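A batch can succeed overall while individual items fail, so each entry in `results` needs its own `status` check. The sketch below separates the two groups; `split_batch_results` is a hypothetical helper name.

```python
def split_batch_results(response: dict):
    """Split a batch response into (successful_items, failed_items)."""
    ok, failed = [], []
    for item in response.get("results", []):
        # Each item carries its own status, independent of the top-level one.
        (ok if item.get("status") == "success" else failed).append(item)
    return ok, failed
```

The `index` field of each failed item identifies which input text to fix and resubmit.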
| #### Performance Tips | |
| - Set `skip_expensive_metrics: true` for faster batch processing | |
| - Keep batch size under 50 texts for optimal performance | |
| - Split workloads of more than 100 texts (the per-request limit) into multiple batch requests, which can be sent in parallel | |
| - Monitor `processing_time` to adjust batch sizes | |
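The batch-size tips above can be applied by chunking a large workload before submission. This is a minimal sketch; `chunked` is a hypothetical helper, and its default of 50 follows the recommended batch size.

```python
def chunked(texts, size: int = 50):
    """Split a list of texts into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [texts[i:i + size] for i in range(0, len(texts), size)]
```

Each chunk becomes the `texts` array of one `POST /api/analyze/batch` request.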
| --- | |
| ## Report Endpoints | |
| ### Generate Report | |
| **Endpoint:** `POST /api/report/generate` | |
| Generate detailed PDF/JSON reports for cached analyses. | |
| #### Request | |
| **Headers:** | |
| ```http | |
| Content-Type: application/x-www-form-urlencoded | |
| ``` | |
| **Body:** | |
| ``` | |
| analysis_id=analysis_1735555800000 | |
| formats=json,pdf | |
| include_highlights=true | |
| ``` | |
| **Parameters:** | |
| | Parameter | Type | Required | Default | Description | | |
| |-----------|------|----------|---------|-------------| | |
| | `analysis_id` | string | **Yes** | - | Analysis ID from previous request | | |
| | `formats` | string | No | `"json,pdf"` | Comma-separated formats | | |
| | `include_highlights` | boolean | No | `true` | Include sentence highlights in report | | |
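Since this endpoint expects a form-encoded body rather than JSON, the fields need URL encoding. The sketch below uses the standard library's `urlencode`; `report_form_body` is a hypothetical helper, and sending the boolean as the literal string `true`/`false` is an assumption based on the example body.

```python
from urllib.parse import urlencode


def report_form_body(analysis_id: str, formats=("json", "pdf"), include_highlights: bool = True) -> bytes:
    """Encode the report parameters as application/x-www-form-urlencoded bytes."""
    fields = {
        "analysis_id": analysis_id,
        "formats": ",".join(formats),  # comma-separated, as in the table above
        "include_highlights": "true" if include_highlights else "false",
    }
    return urlencode(fields).encode("ascii")
```

Note that `urlencode` percent-encodes the comma in `formats` as `%2C`, which a form decoder reads back as `json,pdf`.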
| #### Response | |
| ```json | |
| { | |
| "status": "success", | |
| "analysis_id": "analysis_1735555800000", | |
| "reports": { | |
| "json": "analysis_1735555800000.json", | |
| "pdf": "analysis_1735555800000.pdf" | |
| }, | |
| "timestamp": "2025-12-30T10:30:00Z" | |
| } | |
| ``` | |
| ### Download Report | |
| **Endpoint:** `GET /api/report/download/{filename}` | |
| Download a generated report file. | |
| #### Request | |
| ```http | |
| GET /api/report/download/analysis_1735555800000.pdf | |
| ``` | |
| #### Response | |
| Binary file download with appropriate `Content-Type` header. | |
| **Headers:** | |
| ```http | |
| Content-Type: application/pdf | |
| Content-Disposition: attachment; filename="analysis_1735555800000.pdf" | |
| Content-Length: 524288 | |
| ``` | |
| --- | |
| ## Utility Endpoints | |
| ### Health Check | |
| **Endpoint:** `GET /health` | |
| Check API health and model availability. | |
| #### Response | |
| ```json | |
| { | |
| "status": "healthy", | |
| "version": "1.0.0", | |
| "uptime": 86400.5, | |
| "models_loaded": { | |
| "orchestrator": true, | |
| "highlighter": true, | |
| "reporter": true, | |
| "reasoning_generator": true, | |
| "document_extractor": true, | |
| "analysis_cache": true, | |
| "parallel_executor": true | |
| } | |
| } | |
| ``` | |
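A readiness probe can go beyond the top-level `status` and confirm that every entry in `models_loaded` is `true`. This is a minimal sketch with hypothetical helper names; it assumes a deployment should be treated as ready only when all components are loaded.

```python
def unloaded_components(health: dict) -> list:
    """Return the names of components reported as not loaded, sorted alphabetically."""
    models = health.get("models_loaded", {})
    return sorted(name for name, loaded in models.items() if not loaded)


def is_ready(health: dict) -> bool:
    """True when status is "healthy" and no component is missing."""
    return health.get("status") == "healthy" and not unloaded_components(health)
```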
| ### List Domains | |
| **Endpoint:** `GET /api/domains` | |
| Get all supported content domains with descriptions. | |
| #### Response | |
| ```json | |
| { | |
| "domains": [ | |
| { | |
| "value": "general", | |
| "name": "General", | |
| "description": "General-purpose text without domain-specific structure" | |
| }, | |
| { | |
| "value": "academic", | |
| "name": "Academic", | |
| "description": "Academic papers, essays, research" | |
| }, | |
| { | |
| "value": "creative", | |
| "name": "Creative", | |
| "description": "Creative writing, fiction, poetry" | |
| }, | |
| { | |
| "value": "technical_doc", | |
| "name": "Technical Doc", | |
| "description": "Technical documentation, manuals, specs" | |
| } | |
| // ... 12 more domains | |
| ] | |
| } | |
| ``` | |
| ### Supported Domains | |
| | Domain | Use Cases | Threshold Adjustments | | |
| |--------|-----------|----------------------| | |
| | `general` | Default fallback | Balanced weights | | |
| | `academic` | Research papers, essays | Higher linguistic weight | | |
| | `creative` | Fiction, poetry | Higher entropy/structural | | |
| | `ai_ml` | ML papers, technical AI content | Semantic prioritized | | |
| | `software_dev` | Code docs, READMEs | Structural relaxed | | |
| | `technical_doc` | Manuals, specs | Higher semantic weight | | |
| | `engineering` | Technical reports | Balanced technical focus | | |
| | `science` | Scientific papers | Academic-like calibration | | |
| | `business` | Reports, proposals | Formal structure emphasis | | |
| | `legal` | Contracts, court filings | Strict structural patterns | | |
| | `medical` | Clinical notes, research | Domain-specific terminology | | |
| | `journalism` | News articles | Balanced, lower burstiness | | |
| | `marketing` | Ad copy, campaigns | Creative elements | | |
| | `social_media` | Posts, casual writing | Relaxed metrics, high perplexity weight | | |
| | `blog_personal` | Personal blogs, diaries | Creative + casual mix | | |
| | `tutorial` | How-to guides | Instructional patterns | | |
| ### Cache Statistics | |
| **Endpoint:** `GET /api/cache/stats` | |
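| Get analysis cache statistics. This endpoint is intended for administrators; because authentication is not enforced in the current deployment, restrict access to it at the gateway. | |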
| #### Response | |
| ```json | |
| { | |
| "cache_size": 42, | |
| "max_size": 100, | |
| "ttl_seconds": 3600 | |
| } | |
| ``` | |
| ### Clear Cache | |
| **Endpoint:** `POST /api/cache/clear` | |
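| Clear the analysis cache. This endpoint is intended for administrators; because authentication is not enforced in the current deployment, restrict access to it at the gateway. | |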
| #### Response | |
| ```json | |
| { | |
| "status": "success", | |
| "message": "Cache cleared" | |
| } | |
| ``` | |
| --- | |
| ## Best Practices | |
| ### Optimization Tips | |
| 1. **Domain Selection** | |
| - Always specify domain when known for better accuracy | |
| - Use `/api/domains` to explore available options | |
| - Let system auto-detect only when domain is truly unknown | |
| 2. **Performance** | |
| - Set `skip_expensive_metrics: true` for faster results when speed matters | |
| - Use batch API for multiple texts instead of sequential single requests | |
| - Cache `analysis_id` to regenerate reports without reanalysis | |
| 3. **Accuracy** | |
| - Provide clean, well-formatted text (remove excessive whitespace) | |
| - Minimum 100 words recommended for reliable results | |
| - Avoid mixing languages in single analysis | |
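| 4. **Rate Limiting** | |
| - Implement exponential backoff on 429 responses (these come from the external gateway, since the application itself does not enforce rate limits) | |
| - Monitor the `X-RateLimit-Remaining` header if your gateway emits it | |
| - Raise the gateway's limits if you consistently hit them | |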
| 5. **Error Handling** | |
| - Always check `status` field in response | |
| - Log `analysis_id` (or `batch_id`) for support requests | |
| - Implement retry logic with jitter for transient errors | |
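The backoff and jitter advice above can be sketched as follows. This uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing cap. All names and default values here are assumptions of the sketch, and which exceptions count as transient is left to the caller.

```python
import random
import time


def backoff_delays(retries: int = 4, base: float = 0.5, cap: float = 30.0, rng=random.random):
    """Yield one full-jitter delay per retry: rng() * min(cap, base * 2**attempt)."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(func, is_transient, retries: int = 4, sleep=time.sleep):
    """Call func(); on transient errors, wait and retry up to `retries` times."""
    delays = backoff_delays(retries)
    while True:
        try:
            return func()
        except Exception as exc:
            if not is_transient(exc):
                raise  # permanent errors (e.g. invalid input) are not retried
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted; surface the last error
            sleep(delay)
```

For example, `call_with_retry(send_request, lambda e: isinstance(e, ConnectionError))` retries only on connection failures, where `send_request` is whatever function issues the HTTP call.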
| ### Security Recommendations | |
| 1. **API Key Management** (applies once API key authentication is enabled) | |
| - Rotate keys every 90 days | |
| - Use separate keys for dev/staging/production | |
| - Revoke compromised keys immediately | |
| 2. **Data Privacy** | |
| - Never send PII unless absolutely necessary | |
| - Use client-side redaction before API calls | |
| - Apply data retention policies to cached analyses and generated reports in your deployment | |
| 3. **Input Validation** | |
| - Sanitize user input before sending to API | |
| - Validate file types client-side | |
| - Implement size limits before upload | |
| --- | |
| ## Version History | |
| - **1.0.0** (2025-12-30): Initial release | |
| - 6 forensic metrics | |
| - Support for 16 domains | |
| - PDF/JSON reporting | |
| - Batch processing | |
| --- | |
| ## Appendix | |
| ### Complete Domain List with Aliases | |
| ```python | |
| DOMAIN_ALIASES = { | |
| 'general': ['default', 'generic'], | |
| 'academic': ['education', 'research', 'scholarly', 'university'], | |
| 'creative': ['fiction', 'literature', 'story', 'narrative'], | |
| 'ai_ml': ['ai', 'ml', 'machinelearning', 'neural'], | |
| 'software_dev': ['software', 'code', 'programming', 'dev'], | |
| 'technical_doc': ['technical', 'tech', 'documentation', 'manual'], | |
| 'engineering': ['engineer'], | |
| 'science': ['scientific'], | |
| 'business': ['corporate', 'commercial', 'enterprise'], | |
| 'legal': ['law', 'contract', 'court'], | |
| 'medical': ['healthcare', 'clinical', 'medicine', 'health'], | |
| 'journalism': ['news', 'reporting', 'media', 'press'], | |
| 'marketing': ['advertising', 'promotional', 'brand', 'sales'], | |
| 'social_media': ['social', 'casual', 'informal', 'posts'], | |
| 'blog_personal': ['blog', 'personal', 'diary', 'lifestyle'], | |
| 'tutorial': ['guide', 'howto', 'instructional', 'walkthrough'] | |
| } | |
| ``` | |
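One way a client might use this table is to normalize user-supplied labels to canonical domain values before sending a request. The sketch below inverts an excerpt of the mapping; the helper names are hypothetical, and the lowercasing and whitespace stripping are assumptions about how input should be normalized, not documented service behavior.

```python
# Excerpt of the alias table above; extend with the remaining domains as needed.
ALIASES_EXCERPT = {
    "academic": ["education", "research", "scholarly", "university"],
    "legal": ["law", "contract", "court"],
    "technical_doc": ["technical", "tech", "documentation", "manual"],
}


def build_alias_index(aliases: dict) -> dict:
    """Invert {canonical: [aliases]} into {name: canonical}, including canonical names."""
    index = {}
    for canonical, names in aliases.items():
        index[canonical] = canonical
        for name in names:
            index[name] = canonical
    return index


def resolve_domain(value: str, index: dict):
    """Return the canonical domain for a label, or None if it is unknown."""
    return index.get(value.strip().lower())
```

When `resolve_domain` returns `None`, omitting `domain` from the request lets the service auto-detect it.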
| ### Metric Weight Defaults | |
| ```python | |
| DEFAULT_WEIGHTS = { | |
| 'perplexity': 0.25, | |
| 'entropy': 0.20, | |
| 'structural': 0.15, | |
| 'semantic': 0.15, | |
| 'linguistic': 0.15, | |
| 'multi_perturbation_stability': 0.10 | |
| } | |
| ``` | |
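To see how such weights could combine per-metric scores, the sketch below computes a plain weighted mean of the `synthetic_probability` values. This is an illustrative assumption, not the service's documented ensemble: applied to the example response in [Text Analysis](#text-analysis) it gives roughly 0.905, whereas that response reports 0.92, so the actual ensemble evidently involves more than this (for instance domain-specific adjustments).

```python
DEFAULT_WEIGHTS = {
    "perplexity": 0.25,
    "entropy": 0.20,
    "structural": 0.15,
    "semantic": 0.15,
    "linguistic": 0.15,
    "multi_perturbation_stability": 0.10,
}


def weighted_probability(metric_probs: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted mean over the metrics present, renormalizing if some are missing."""
    used = {m: w for m, w in weights.items() if m in metric_probs}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted metrics available")
    return sum(metric_probs[m] * w for m, w in used.items()) / total
```

Renormalizing by the weights actually used keeps the result in the 0-1 range when a metric is skipped, for example under `skip_expensive_metrics`.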
| ### Response Time Estimates | |
| | Operation | Min | Avg | Max | P95 | | |
| |-----------|-----|-----|-----|-----| | |
| | Text Analysis (500 words) | 1.2s | 2.3s | 4.5s | 3.8s | | |
| | File Analysis (PDF, 10 pages) | 2.5s | 4.1s | 8.2s | 6.9s | | |
| | Batch (10 texts) | 5.8s | 9.2s | 15.3s | 13.1s | | |
| | Report Generation | 0.3s | 0.8s | 2.1s | 1.5s | | |
| --- | |
| *Last Updated: December 30, 2025* | |
| *API Version: 1.0.0* | |
| *Documentation Version: 1.0.0* |