# TEXT-AUTH API Documentation
## Overview
The TEXT-AUTH API provides evidence-based text forensics and statistical consistency assessment through a RESTful interface. This document covers all endpoints, request/response formats, authentication, rate limiting, and integration examples.
**API Version:** 1.0.0
---
## Table of Contents
1. [Authentication & Security](#authentication--security)
2. [Rate Limiting](#rate-limiting)
3. [Common Response Format](#common-response-format)
4. [Error Handling](#error-handling)
5. [Core Endpoints](#core-endpoints)
   - [Text Analysis](#text-analysis)
   - [File Analysis](#file-analysis)
   - [Batch Analysis](#batch-analysis)
6. [Report Endpoints](#report-endpoints)
7. [Utility Endpoints](#utility-endpoints)
8. [Best Practices](#best-practices)
---
## Authentication & Security
### API Key Authentication
*Authentication is not enforced in the current deployment. API key authentication may be added in future versions.*
---
## Rate Limiting
*Rate limiting is not enforced at the application level. Deployments should use an external gateway (NGINX, API Gateway, Cloudflare) to enforce rate limits if required.*
---
## Common Response Format
All successful responses follow this structure:
```json
{
"status": "success",
"analysis_id": "...",
"detection_result": {...},
"highlighted_html": "...",
"reasoning": {...},
"processing_time": 2.34,
"timestamp": "..."
}
```
### HTTP Status Codes
| Code | Meaning | Description |
|------|---------|-------------|
| 200 | OK | Request succeeded |
| 201 | Created | Resource created successfully |
| 400 | Bad Request | Invalid request parameters |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Service temporarily unavailable |
---
## Error Handling
### Error Response Format
```json
{
"status": "error",
"error": "Invalid domain...",
"timestamp": "..."
}
```
### Common Error Codes
| Code | Description | Resolution |
|------|-------------|------------|
| `TEXT_TOO_LONG` | Text exceeds maximum length (50,000 chars) | Split into multiple requests |
| `FILE_TOO_LARGE` | File exceeds size limit (25MB) | Compress or split file |
| `UNSUPPORTED_FORMAT` | File format not supported | Use .txt, .pdf, .docx, .doc, or .md |
| `EXTRACTION_FAILED` | Document text extraction failed | Ensure file is not corrupted or password-protected |
| `MODEL_UNAVAILABLE` | Required model temporarily unavailable | Retry after a few minutes |
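
A client can centralize the `status` check instead of repeating it at every call site. The following is a minimal sketch, assuming error payloads look exactly like the format shown above; `TextAuthError` and `unwrap` are hypothetical names, not part of the API.

```python
from typing import Any, Optional


class TextAuthError(Exception):
    """Raised when a response body carries status != "success" (hypothetical helper)."""

    def __init__(self, message: str, timestamp: Optional[str] = None) -> None:
        super().__init__(message)
        self.timestamp = timestamp


def unwrap(payload: dict[str, Any]) -> dict[str, Any]:
    """Return the payload unchanged on success; raise TextAuthError otherwise."""
    if payload.get("status") != "success":
        raise TextAuthError(
            payload.get("error", "unknown error"),
            payload.get("timestamp"),
        )
    return payload
```

Calling `unwrap(response_json)` right after decoding keeps the rest of the client free of status checks.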
---
## Core Endpoints
### Text Analysis
**Endpoint:** `POST /api/analyze`
Analyze raw text for statistical consistency patterns and forensic signals.
#### Request
**Headers:**
```http
Content-Type: application/json
```
**Body:**
```json
{
"text": "Your text content here...",
"domain": "academic",
"enable_highlighting": true,
"skip_expensive_metrics": false,
"use_sentence_level": true,
"include_metrics_summary": true,
"generate_report": false
}
```
**Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `text` | string | **Yes** | - | Text to analyze (50-50,000 chars) |
| `domain` | string | No | `null` (auto-detect) | Content domain (see [Domains](#supported-domains)) |
| `enable_highlighting` | boolean | No | `true` | Generate sentence-level highlights |
| `skip_expensive_metrics` | boolean | No | `false` | Skip computationally expensive metrics for faster results |
| `use_sentence_level` | boolean | No | `true` | Use sentence-level granularity for highlighting |
| `include_metrics_summary` | boolean | No | `true` | Include metric summaries in highlights |
| `generate_report` | boolean | No | `false` | Generate downloadable PDF/JSON report |
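
The request above can be assembled with the Python standard library alone. This sketch checks the documented 50-50,000 character limit client-side before building the request; `build_analyze_request` is a hypothetical helper and the base URL is a placeholder.

```python
import json
import urllib.request

MIN_CHARS, MAX_CHARS = 50, 50_000  # limits from the parameter table


def build_analyze_request(base_url: str, text: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /api/analyze request."""
    if not MIN_CHARS <= len(text) <= MAX_CHARS:
        raise ValueError(
            f"text must be {MIN_CHARS}-{MAX_CHARS} characters, got {len(text)}"
        )
    body = {"text": text, **options}  # options: domain, generate_report, ...
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the body with `json.load`.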
#### Response
```json
{
"status": "success",
"analysis_id": "analysis_1735555800000",
"detection_result": {
"ensemble_result": {
"final_verdict": "Synthetic",
"overall_confidence": 0.89,
"synthetic_probability": 0.92,
"authentic_probability": 0.08,
"uncertainty_score": 0.23,
"decision_boundary_distance": 0.42
},
"metric_results": {
"perplexity": {
"synthetic_probability": 0.94,
"confidence": 0.91,
"raw_score": 15.23,
"evidence_strength": "strong"
},
"entropy": {
"synthetic_probability": 0.88,
"confidence": 0.85,
"raw_score": 4.67,
"evidence_strength": "moderate"
},
"structural": {
"synthetic_probability": 0.91,
"confidence": 0.87,
"burstiness": -0.12,
"uniformity": 0.85,
"evidence_strength": "strong"
},
"linguistic": {
"synthetic_probability": 0.86,
"confidence": 0.82,
"pos_diversity": 0.42,
"mean_tree_depth": 4.2,
"evidence_strength": "moderate"
},
"semantic": {
"synthetic_probability": 0.93,
"confidence": 0.88,
"coherence_mean": 0.91,
"coherence_variance": 0.03,
"evidence_strength": "strong"
},
"multi_perturbation_stability": {
"synthetic_probability": 0.89,
"confidence": 0.84,
"stability_score": 0.12,
"evidence_strength": "moderate"
}
},
"domain_prediction": {
"primary_domain": "academic",
"confidence": 0.94,
"alternative_domains": [
{"domain": "technical_doc", "probability": 0.23},
{"domain": "science", "probability": 0.18}
]
},
"processed_text": {
"word_count": 487,
"sentence_count": 23,
"paragraph_count": 5,
"avg_sentence_length": 21.2,
"language": "en"
}
},
"highlighted_html": "...",
"reasoning": {
"summary": "The text exhibits strong statistical consistency patterns typical of language model generation...",
"key_indicators": [
"Unusually uniform sentence structure (burstiness: -0.12)",
"High semantic coherence across all sentences (mean: 0.91)",
"Low perplexity variance indicating predictable token sequences"
],
"confidence_factors": {
"supporting_evidence": [
"6/6 metrics indicate synthetic patterns",
"Strong cross-metric agreement (correlation: 0.87)"
],
"uncertainty_sources": [
"Domain-specific terminology may affect baseline expectations"
]
},
"metric_contributions": {
"perplexity": 0.28,
"entropy": 0.19,
"structural": 0.16,
"semantic": 0.17,
"linguistic": 0.12,
"multi_perturbation_stability": 0.08
}
},
"report_files": null,
"processing_time": 2.34,
"timestamp": "2025-12-30T10:30:00Z"
}
```
#### Verdict Interpretation
| Verdict | Probability Range | Interpretation |
|---------|-------------------|----------------|
| **Synthetic** | > 0.70 | High consistency with language model generation patterns |
| **Likely Synthetic** | 0.55 - 0.70 | Moderate consistency with synthetic patterns |
| **Inconclusive** | 0.45 - 0.55 | Insufficient evidence for confident assessment |
| **Likely Authentic** | 0.30 - 0.45 | Moderate consistency with human authorship patterns |
| **Authentic** | < 0.30 | High consistency with human authorship patterns |
**Important:** These verdicts represent statistical consistency assessments, not definitive authorship claims.
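
If you need to reproduce the verdict label from `synthetic_probability` on the client, the table can be expressed as a threshold chain. This is a sketch only: the table does not say which band owns the shared boundaries 0.45 and 0.55, so assigning them to the upper band is an assumption, and `verdict_for` is a hypothetical name.

```python
def verdict_for(synthetic_probability: float) -> str:
    """Map a synthetic probability to the verdict labels in the table above."""
    p = synthetic_probability
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p > 0.70:
        return "Synthetic"
    if p >= 0.55:  # assumption: 0.55 belongs to "Likely Synthetic"
        return "Likely Synthetic"
    if p >= 0.45:  # assumption: 0.45 belongs to "Inconclusive"
        return "Inconclusive"
    if p >= 0.30:  # the table gives "Authentic" as strictly below 0.30
        return "Likely Authentic"
    return "Authentic"
```

Prefer the `final_verdict` field returned by the API when it is available; the server may resolve boundary cases differently.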
#### Highlighting Color Key
| Color | Meaning | Probability Range |
|-------|---------|-------------------|
| 🔴 Red | Strong synthetic signals | > 0.80 |
| 🟠 Orange | Moderate synthetic signals | 0.60 - 0.80 |
| 🟡 Yellow | Weak signals | 0.40 - 0.60 |
| 🟢 Green | Authentic signals | < 0.40 |
---
### File Analysis
**Endpoint:** `POST /api/analyze/file`
Analyze uploaded documents (PDF, DOCX, DOC, TXT, MD).
#### Request
**Headers:**
```http
Content-Type: multipart/form-data
```
**Body (form-data):**
```
file: [binary file data]
domain: "academic"
skip_expensive_metrics: false
use_sentence_level: true
include_metrics_summary: true
generate_report: false
```
**Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `file` | file | **Yes** | - | Document file (max 25MB) |
| `domain` | string | No | `null` | Content domain override |
| `skip_expensive_metrics` | boolean | No | `false` | Skip expensive metrics |
| `use_sentence_level` | boolean | No | `true` | Sentence-level highlighting |
| `include_metrics_summary` | boolean | No | `true` | Include metric summaries |
| `generate_report` | boolean | No | `false` | Generate report |
#### Supported File Formats
| Format | Extensions | Max Size | Notes |
|--------|-----------|----------|-------|
| Plain Text | .txt, .md | 25MB | UTF-8 encoding recommended |
| PDF | .pdf | 25MB | Text-based PDFs; OCR not supported |
| Word | .docx, .doc | 25MB | Modern and legacy formats |
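
Checking extension and size before uploading avoids a round trip that would end in `UNSUPPORTED_FORMAT` or `FILE_TOO_LARGE`. The sketch below assumes "25MB" means 25 × 1024 × 1024 bytes, which the documentation does not specify; `check_upload` is a hypothetical helper.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".docx", ".doc"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: 25MB interpreted as 25 MiB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems
```

This only inspects the name and size. Scanned or password-protected PDFs will pass the check and can still fail server-side with `EXTRACTION_FAILED`.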
#### Response
Same structure as [Text Analysis](#text-analysis) with additional `file_info`:
```json
{
"status": "success",
"analysis_id": "file_1735555800000",
"file_info": {
"filename": "research_paper.pdf",
"file_type": ".pdf",
"pages": 12,
"extraction_method": "pdfplumber",
"highlighted_html": true
},
"detection_result": { /* same as text analysis */ },
"highlighted_html": "...",
"reasoning": { /* same as text analysis */ },
"processing_time": 4.12,
"timestamp": "2025-12-30T10:30:00Z"
}
```
#### cURL Example
```bash
curl -X POST https://your-domain.com/api/analyze/file \
-F "file=@/path/to/document.pdf" \
-F "domain=academic" \
-F "generate_report=true"
```
---
### Batch Analysis
**Endpoint:** `POST /api/analyze/batch`
Analyze multiple texts in a single request for efficiency.
#### Request
```json
{
"texts": [
"First text to analyze...",
"Second text to analyze...",
"Third text to analyze..."
],
"domain": "academic",
"skip_expensive_metrics": true,
"generate_reports": false
}
```
**Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `texts` | array[string] | **Yes** | - | 1-100 texts to analyze |
| `domain` | string | No | `null` | Apply same domain to all texts |
| `skip_expensive_metrics` | boolean | No | `true` | Skip expensive metrics (recommended for batch) |
| `generate_reports` | boolean | No | `false` | Generate reports for each text |
#### Response
```json
{
"status": "success",
"batch_id": "batch_1735555800000",
"total": 3,
"successful": 2,
"failed": 1,
"results": [
{
"index": 0,
"status": "success",
"detection": {
"ensemble_result": { /* ... */ },
"metric_results": { /* ... */ }
},
"reasoning": { /* ... */ },
"report_files": null
},
{
"index": 1,
"status": "success",
"detection": { /* ... */ }
},
{
"index": 2,
"status": "error",
"error": "Text too short (minimum 50 characters)"
}
],
"processing_time": 8.92,
"timestamp": "2025-12-30T10:30:00Z"
}
```
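
Because a batch can succeed overall while individual items fail, clients should inspect each entry in `results`. A minimal sketch, assuming the per-item fields shown above (`index`, `status`, `detection`, `error`); `split_batch_results` is a hypothetical helper.

```python
from typing import Any


def split_batch_results(batch: dict[str, Any]) -> tuple[dict[int, dict], dict[int, str]]:
    """Separate per-item results into successes and failures, keyed by input index."""
    successes: dict[int, dict] = {}
    failures: dict[int, str] = {}
    for item in batch.get("results", []):
        if item.get("status") == "success":
            successes[item["index"]] = item.get("detection", {})
        else:
            failures[item["index"]] = item.get("error", "unknown error")
    return successes, failures
```

The `index` values refer to positions in the submitted `texts` array, so failed items can be corrected and resubmitted individually.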
#### Performance Tips
- Set `skip_expensive_metrics: true` for faster batch processing
- Keep batch size under 50 texts for optimal performance
- Split workloads of more than 100 texts (the per-request maximum) across multiple requests, issued in parallel if needed
- Monitor `processing_time` to adjust batch sizes
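
The batch-size guidance can be applied by chunking the input before building request bodies. This sketch uses the suggested ceiling of 50 as its default and rejects sizes above the documented limit of 100; `batch_payloads` is a hypothetical helper.

```python
from typing import Iterator, Optional, Sequence

SUGGESTED_BATCH = 50  # from the performance tips above
HARD_LIMIT = 100      # documented maximum for `texts`


def batch_payloads(
    texts: Sequence[str],
    domain: Optional[str] = None,
    size: int = SUGGESTED_BATCH,
) -> Iterator[dict]:
    """Yield one POST /api/analyze/batch body per chunk of at most `size` texts."""
    if not 1 <= size <= HARD_LIMIT:
        raise ValueError(f"size must be between 1 and {HARD_LIMIT}")
    for start in range(0, len(texts), size):
        payload = {
            "texts": list(texts[start:start + size]),
            "skip_expensive_metrics": True,
        }
        if domain is not None:
            payload["domain"] = domain
        yield payload
```

Each yielded dictionary can be serialized to JSON and posted as-is; item `index` values in the response are relative to the chunk, not to the full input.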
---
## Report Endpoints
### Generate Report
**Endpoint:** `POST /api/report/generate`
Generate detailed PDF/JSON reports for cached analyses.
#### Request
**Headers:**
```http
Content-Type: application/x-www-form-urlencoded
```
**Body:**
```
analysis_id=analysis_1735555800000
formats=json,pdf
include_highlights=true
```
**Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `analysis_id` | string | **Yes** | - | Analysis ID from previous request |
| `formats` | string | No | `"json,pdf"` | Comma-separated formats |
| `include_highlights` | boolean | No | `true` | Include sentence highlights in report |
#### Response
```json
{
"status": "success",
"analysis_id": "analysis_1735555800000",
"reports": {
"json": "analysis_1735555800000.json",
"pdf": "analysis_1735555800000.pdf"
},
"timestamp": "2025-12-30T10:30:00Z"
}
```
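
The form body for this endpoint, and the download paths derived from its response, can be built as follows. This is a sketch: serializing the boolean as the string `true`/`false` is an assumption based on the example body, and both function names are hypothetical.

```python
from typing import Iterable
from urllib.parse import quote, urlencode


def report_form_body(
    analysis_id: str,
    formats: Iterable[str] = ("json", "pdf"),
    include_highlights: bool = True,
) -> bytes:
    """Encode the application/x-www-form-urlencoded body for /api/report/generate."""
    return urlencode({
        "analysis_id": analysis_id,
        "formats": ",".join(formats),
        # Assumption: the server accepts lowercase "true"/"false" for booleans.
        "include_highlights": "true" if include_highlights else "false",
    }).encode("ascii")


def download_paths(response: dict) -> dict[str, str]:
    """Map each report format to its GET /api/report/download/{filename} path."""
    return {
        fmt: f"/api/report/download/{quote(name)}"
        for fmt, name in response.get("reports", {}).items()
    }
```

Note that `urlencode` percent-encodes the comma in `formats` as `%2C`; standard form parsers decode it back to `json,pdf`.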
### Download Report
**Endpoint:** `GET /api/report/download/{filename}`
Download a generated report file.
#### Request
```http
GET /api/report/download/analysis_1735555800000.pdf
```
#### Response
Binary file download with appropriate `Content-Type` header.
**Headers:**
```http
Content-Type: application/pdf
Content-Disposition: attachment; filename="analysis_1735555800000.pdf"
Content-Length: 524288
```
---
## Utility Endpoints
### Health Check
**Endpoint:** `GET /health`
Check API health and model availability.
#### Response
```json
{
"status": "healthy",
"version": "1.0.0",
"uptime": 86400.5,
"models_loaded": {
"orchestrator": true,
"highlighter": true,
"reporter": true,
"reasoning_generator": true,
"document_extractor": true,
"analysis_cache": true,
"parallel_executor": true
}
}
```
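
A readiness probe can combine the top-level `status` with the per-component flags. The sketch below treats the service as ready only when `status` is `"healthy"` and every entry in `models_loaded` is true; that policy is an assumption, and both function names are hypothetical.

```python
def unavailable_components(health: dict) -> list[str]:
    """Return the names of components whose models_loaded flag is not true."""
    return sorted(
        name
        for name, loaded in health.get("models_loaded", {}).items()
        if not loaded
    )


def is_ready(health: dict) -> bool:
    """True when status is "healthy" and no component is reported as unloaded."""
    return health.get("status") == "healthy" and not unavailable_components(health)
```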
### List Domains
**Endpoint:** `GET /api/domains`
Get all supported content domains with descriptions.
#### Response
```json
{
"domains": [
{
"value": "general",
"name": "General",
"description": "General-purpose text without domain-specific structure"
},
{
"value": "academic",
"name": "Academic",
"description": "Academic papers, essays, research"
},
{
"value": "creative",
"name": "Creative",
"description": "Creative writing, fiction, poetry"
},
{
"value": "technical_doc",
"name": "Technical Doc",
"description": "Technical documentation, manuals, specs"
}
// ... 12 more domains
]
}
```
### Supported Domains
| Domain | Use Cases | Threshold Adjustments |
|--------|-----------|----------------------|
| `general` | Default fallback | Balanced weights |
| `academic` | Research papers, essays | Higher linguistic weight |
| `creative` | Fiction, poetry | Higher entropy/structural |
| `ai_ml` | ML papers, technical AI content | Semantic prioritized |
| `software_dev` | Code docs, READMEs | Structural relaxed |
| `technical_doc` | Manuals, specs | Higher semantic weight |
| `engineering` | Technical reports | Balanced technical focus |
| `science` | Scientific papers | Academic-like calibration |
| `business` | Reports, proposals | Formal structure emphasis |
| `legal` | Contracts, court filings | Strict structural patterns |
| `medical` | Clinical notes, research | Domain-specific terminology |
| `journalism` | News articles | Balanced, lower burstiness |
| `marketing` | Ad copy, campaigns | Creative elements |
| `social_media` | Posts, casual writing | Relaxed metrics, high perplexity weight |
| `blog_personal` | Personal blogs, diaries | Creative + casual mix |
| `tutorial` | How-to guides | Instructional patterns |
### Cache Statistics
**Endpoint:** `GET /api/cache/stats`
Get analysis cache statistics (admin only).
#### Response
```json
{
"cache_size": 42,
"max_size": 100,
"ttl_seconds": 3600
}
```
### Clear Cache
**Endpoint:** `POST /api/cache/clear`
Clear analysis cache (admin only).
#### Response
```json
{
"status": "success",
"message": "Cache cleared"
}
```
---
## Best Practices
### Optimization Tips
1. **Domain Selection**
   - Specify the domain whenever it is known; this improves accuracy
   - Use `/api/domains` to explore available options
   - Rely on auto-detection only when the domain is truly unknown
2. **Performance**
   - Set `skip_expensive_metrics: true` for faster results when speed matters
   - Use the batch API for multiple texts instead of sequential single requests
   - Store the `analysis_id` to regenerate reports without reanalysis while the analysis is still cached
3. **Accuracy**
   - Provide clean, well-formatted text (remove excessive whitespace)
   - Submit at least 100 words for reliable results
   - Avoid mixing languages in a single analysis
4. **Rate Limiting**
   - The application does not rate limit requests itself; the points below apply only when an external gateway does
   - Implement exponential backoff on 429 responses
   - Monitor the `X-RateLimit-Remaining` header if the gateway sends it
   - Ask the gateway operator for a higher limit if you consistently hit it
5. **Error Handling**
   - Always check the `status` field in the response
   - Log the `analysis_id` or `batch_id` for support requests
   - Implement retry logic with jitter for transient errors
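
The backoff and retry advice above can be sketched as a delay generator using exponential backoff with full jitter. The retryable status set is an assumption: 503 appears in the status code table, while 429 would only come from an external gateway. `backoff_delays` and `should_retry` are hypothetical names.

```python
import random
from typing import Iterator, Optional

RETRYABLE_STATUS = {429, 503}  # assumption: treat these as transient


def should_retry(status_code: int) -> bool:
    """True for HTTP status codes this sketch considers transient."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(
    retries: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield one sleep duration per retry: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    for attempt in range(retries):
        yield rng.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would loop over `backoff_delays()`, sleeping for each value with `time.sleep` and stopping as soon as a request returns a non-retryable status.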
### Security Recommendations
1. **API Key Management**
   - Applies only if your deployment adds API keys (for example at a gateway); the current deployment does not enforce authentication
   - Rotate keys every 90 days
   - Use separate keys for dev/staging/production
   - Revoke compromised keys immediately
2. **Data Privacy**
   - Never send PII unless absolutely necessary
   - Use client-side redaction before API calls
   - Enable data retention policies where your deployment provides them
3. **Input Validation**
   - Sanitize user input before sending it to the API
   - Validate file types client-side
   - Enforce size limits before upload
---
## Version History
- **1.0.0** (2025-12-30): Initial release
  - 6 forensic metrics
  - 16 domain support
  - PDF/JSON reporting
  - Batch processing
---
## Appendix
### Complete Domain List with Aliases
```python
DOMAIN_ALIASES = {
'general': ['default', 'generic'],
'academic': ['education', 'research', 'scholarly', 'university'],
'creative': ['fiction', 'literature', 'story', 'narrative'],
'ai_ml': ['ai', 'ml', 'machinelearning', 'neural'],
'software_dev': ['software', 'code', 'programming', 'dev'],
'technical_doc': ['technical', 'tech', 'documentation', 'manual'],
'engineering': ['engineer'],
'science': ['scientific'],
'business': ['corporate', 'commercial', 'enterprise'],
'legal': ['law', 'contract', 'court'],
'medical': ['healthcare', 'clinical', 'medicine', 'health'],
'journalism': ['news', 'reporting', 'media', 'press'],
'marketing': ['advertising', 'promotional', 'brand', 'sales'],
'social_media': ['social', 'casual', 'informal', 'posts'],
'blog_personal': ['blog', 'personal', 'diary', 'lifestyle'],
'tutorial': ['guide', 'howto', 'instructional', 'walkthrough']
}
```
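
An alias table like this is usually consumed through a reverse index. The sketch below builds one from an excerpt of the mapping and normalizes user input to a canonical domain value. Whether the API itself accepts aliases in the `domain` parameter is not stated here, so resolving them client-side is an assumption; `build_alias_index` and `resolve_domain` are hypothetical names.

```python
from typing import Optional

# Excerpt of the alias table above; extend with the remaining domains as needed.
DOMAIN_ALIASES = {
    "academic": ["education", "research", "scholarly", "university"],
    "technical_doc": ["technical", "tech", "documentation", "manual"],
    "journalism": ["news", "reporting", "media", "press"],
}


def build_alias_index(aliases: dict[str, list[str]]) -> dict[str, str]:
    """Map every canonical name and every alias to its canonical domain."""
    index: dict[str, str] = {}
    for canonical, names in aliases.items():
        index[canonical] = canonical
        for name in names:
            index[name] = canonical
    return index


ALIAS_INDEX = build_alias_index(DOMAIN_ALIASES)


def resolve_domain(name: str) -> Optional[str]:
    """Return the canonical domain, or None (auto-detect) for unknown input."""
    return ALIAS_INDEX.get(name.strip().lower())
```

Returning `None` for unknown input lines up with the `domain` parameter's default of `null`, which triggers auto-detection.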
### Metric Weight Defaults
```python
DEFAULT_WEIGHTS = {
'perplexity': 0.25,
'entropy': 0.20,
'structural': 0.15,
'semantic': 0.15,
'linguistic': 0.15,
'multi_perturbation_stability': 0.10
}
```
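
To get a feel for how these weights interact, the sketch below combines per-metric probabilities as a plain weighted average, renormalizing when some metrics are absent. This is an illustration only: the documentation does not state how the ensemble actually combines metrics, and `weighted_probability` is a hypothetical helper.

```python
DEFAULT_WEIGHTS = {
    "perplexity": 0.25,
    "entropy": 0.20,
    "structural": 0.15,
    "semantic": 0.15,
    "linguistic": 0.15,
    "multi_perturbation_stability": 0.10,
}


def weighted_probability(metric_probs: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted mean of per-metric synthetic probabilities.

    Metrics without a weight are ignored; the remaining weights are
    renormalized so they sum to 1 (an assumption, e.g. for skipped metrics).
    """
    used = {name: weights[name] for name in metric_probs if name in weights}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted metrics supplied")
    return sum(metric_probs[name] * w for name, w in used.items()) / total
```

Applied to the per-metric values in the Text Analysis example response (0.94, 0.88, 0.91, 0.93, 0.86, 0.89), this yields about 0.905 rather than the reported 0.92, so the service's ensemble should not be assumed to be a simple weighted average.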
### Response Time Estimates
| Operation | Min | Avg | Max | P95 |
|-----------|-----|-----|-----|-----|
| Text Analysis (500 words) | 1.2s | 2.3s | 4.5s | 3.8s |
| File Analysis (PDF, 10 pages) | 2.5s | 4.1s | 8.2s | 6.9s |
| Batch (10 texts) | 5.8s | 9.2s | 15.3s | 13.1s |
| Report Generation | 0.3s | 0.8s | 2.1s | 1.5s |
---
*Last Updated: December 30, 2025*
*API Version: 1.0.0*
*Documentation Version: 1.0.0*