# Quick Start Guide 🚀

## Local Development (5 minutes)

### 1. Install System Dependencies

**Ubuntu/Debian:**
```bash
sudo apt-get update
sudo apt-get install -y tesseract-ocr poppler-utils
```

**macOS:**
```bash
brew install tesseract poppler
```

**Windows:**
- Download Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
- Download Poppler: https://github.com/oschwartz10612/poppler-windows/releases
- Add the Tesseract install directory and Poppler's `bin` directory to your `PATH`

### 2. Install Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Server

```bash
python main.py
```

The API will be available at: `http://localhost:7860`

### 4. Test with cURL

```bash
# Health check
curl http://localhost:7860/health

# Redact a PDF
curl -X POST "http://localhost:7860/redact" \
  -F "file=@your_document.pdf" \
  -F "dpi=300"
```

### 5. Access API Documentation

Open in browser: `http://localhost:7860/docs`

## Using Docker (3 minutes)

### 1. Build Image

```bash
docker build -t pdf-redaction-api .
```

### 2. Run Container

```bash
docker run -p 7860:7860 pdf-redaction-api
```

### 3. Test

```bash
curl http://localhost:7860/health
```

## Deploy to HuggingFace Spaces (10 minutes)

### 1. Create Space

1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Name: `pdf-redaction-api`
4. SDK: **Docker**
5. Click "Create Space"

### 2. Push Code

```bash
# Clone your space
git clone https://huggingface.co/spaces/YOUR_USERNAME/pdf-redaction-api
cd pdf-redaction-api

# Copy all project files (the * glob skips dotfiles such as .dockerignore; copy those separately)
cp -r /path/to/project/* .

# Commit and push
git add .
git commit -m "Initial deployment"
git push
```

### 3. Wait for Build

Monitor at: `https://huggingface.co/spaces/YOUR_USERNAME/pdf-redaction-api`

### 4. Test Your Deployed API

```bash
curl https://YOUR_USERNAME-pdf-redaction-api.hf.space/health
```
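
A freshly pushed Space can take several minutes to build, and `/health` will not answer until the container is up. The sketch below polls that endpoint using only the standard library. `is_healthy` and `wait_until_healthy` are hypothetical helper names, and treating any HTTP 200 as "ready" is an assumption, since this guide does not specify what `/health` returns.

```python
import time
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed to mean ready)."""
    try:
        url = f"{base_url.rstrip('/')}/health"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections all derive from OSError.
        return False


def wait_until_healthy(base_url, attempts=30, delay=10.0, probe=is_healthy, sleep=time.sleep):
    """Poll until the probe succeeds; return the attempt number, or None if it never does."""
    for attempt in range(1, attempts + 1):
        if probe(base_url):
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try
    return None
```

For example, `wait_until_healthy("https://YOUR_USERNAME-pdf-redaction-api.hf.space")` returns the number of the first successful attempt, or `None` if the Space did not respond within the default 30 attempts. The same helper works against `http://localhost:7860` while a local container is starting.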

## Example Usage

### Python Client

```python
import requests

BASE_URL = "http://localhost:7860"

# Upload and redact (the context manager closes the file afterwards)
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        f"{BASE_URL}/redact",
        files={"file": pdf},
        params={"dpi": 300},
    )
response.raise_for_status()  # stop here on a 4xx/5xx response

result = response.json()
job_id = result["job_id"]

# Download redacted PDF
redacted = requests.get(f"{BASE_URL}/download/{job_id}")
redacted.raise_for_status()
with open("redacted.pdf", "wb") as f:
    f.write(redacted.content)

print(f"Redacted {len(result['entities'])} entities")
```

### JavaScript/Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

async function redactPDF() {
  const form = new FormData();
  form.append('file', fs.createReadStream('document.pdf'));
  
  // Upload and redact
  const response = await axios.post(
    'http://localhost:7860/redact',
    form,
    {
      headers: form.getHeaders(),
      params: { dpi: 300 }
    }
  );
  
  const { job_id } = response.data;
  
  // Download redacted PDF
  const redacted = await axios.get(
    `http://localhost:7860/download/${job_id}`,
    { responseType: 'arraybuffer' }
  );
  
  fs.writeFileSync('redacted.pdf', redacted.data);
  console.log('Redaction complete!');
}

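// Report failures instead of leaving the promise rejection unhandled
redactPDF().catch((err) => console.error('Redaction failed:', err.message));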
```

### cURL Advanced

```bash
# Redact only specific entity types
curl -X POST "http://localhost:7860/redact" \
  -F "file=@document.pdf" \
  -F "dpi=300" \
  -F "entity_types=PER,ORG"

# Get statistics
curl http://localhost:7860/stats

# Download specific file
curl -O -J http://localhost:7860/download/JOB_ID_HERE
```

## Common Use Cases

### 1. Redact All Personal Information

```python
import requests

with open("resume.pdf", "rb") as pdf:
    response = requests.post(
        "http://localhost:7860/redact",
        files={"file": pdf},
        params={"dpi": 300},
    )
```

### 2. Redact Only Names and Organizations

```python
import requests

with open("contract.pdf", "rb") as pdf:
    response = requests.post(
        "http://localhost:7860/redact",
        files={"file": pdf},
        params={
            "dpi": 300,
            "entity_types": "PER,ORG",
        },
    )
```

### 3. Fast Processing (Lower Quality)

```python
import requests

with open("large_doc.pdf", "rb") as pdf:
    response = requests.post(
        "http://localhost:7860/redact",
        files={"file": pdf},
        params={"dpi": 150},  # Faster but less accurate
    )
```

### 4. High Quality (Slower)

```python
import requests

with open("important.pdf", "rb") as pdf:
    response = requests.post(
        "http://localhost:7860/redact",
        files={"file": pdf},
        params={"dpi": 600},  # Best quality, slowest
    )
```
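
The four requests above differ only in their query parameters, so a small helper can build them in one place. This is a sketch: `build_redact_params` is a hypothetical name, and rejecting non-positive DPI values is an assumption about what the server accepts. `PER` and `ORG` are the only entity labels taken from this guide.

```python
from typing import Iterable, Optional


def build_redact_params(dpi: int = 300, entity_types: Optional[Iterable[str]] = None) -> dict:
    """Build the query parameters for POST /redact.

    dpi          -- rendering resolution; lower is faster, higher is more accurate
    entity_types -- labels to redact, e.g. ["PER", "ORG"]; omit to send no filter,
                    as in the "redact all" example
    """
    if dpi <= 0:
        raise ValueError("dpi must be a positive integer")  # assumed constraint
    params = {"dpi": dpi}
    if entity_types:
        # The examples in this guide pass labels as one comma-separated string.
        params["entity_types"] = ",".join(entity_types)
    return params
```

The result can be passed straight to `requests.post(..., params=build_redact_params(150))`, which keeps the speed and quality trade-off in a single argument.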

## Troubleshooting

### "Model not loaded"
**Problem**: NER model failed to load  
**Solution**: Check your internet connection and wait for the model download to finish before retrying

### "Tesseract not found"
**Problem**: OCR engine not installed  
**Solution**: Install the `tesseract-ocr` system package (`tesseract` on Homebrew), as described in step 1 of Local Development

### "Poppler not found"
**Problem**: PDF converter not installed  
**Solution**: Install the `poppler-utils` system package (`poppler` on Homebrew), as described in step 1 of Local Development
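
Both "not found" errors usually mean that an executable is missing from `PATH`. The sketch below checks for it before starting the server. It assumes the relevant executables are `tesseract` and `pdftoppm` (one of the tools shipped with Poppler); `missing_tools` is a hypothetical helper name, and the API may call a different Poppler binary.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables assumed to be required: `tesseract` (OCR) and `pdftoppm` (part of Poppler).
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS,
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print(f"Missing from PATH: {', '.join(missing)}")
else:
    print("All OCR/PDF tools found")
```

On Windows, a missing entry here often means the install directory was never added to `PATH`, even though the download itself succeeded.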

### Slow processing
**Problem**: Redaction takes too long  
**Solution**: Lower `dpi` to 150-200 (faster, but OCR is less accurate)

### Out of memory
**Problem**: Large PDF crashes the API  
**Solution**:
- Split the PDF and submit it a few pages at a time
- Increase the container's memory limit
- Lower `dpi`
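
To see why lowering the DPI helps, it is worth estimating how large one rendered page is. The sketch below assumes each page is rasterized to an uncompressed 24-bit RGB image at US Letter size (8.5 x 11 inches); the actual pipeline may use a different page size or pixel format, and `page_raster_bytes` is a hypothetical helper name.

```python
def page_raster_bytes(dpi: int, width_in: float = 8.5, height_in: float = 11.0,
                      bytes_per_pixel: int = 3) -> int:
    """Estimate the uncompressed size in bytes of one page rendered at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


for dpi in (150, 300, 600):
    mib = page_raster_bytes(dpi) / (1024 ** 2)
    print(f"{dpi} DPI: about {mib:.1f} MiB per page")
```

Under these assumptions a page costs about 6.0 MiB at 150 DPI, 24.1 MiB at 300 DPI, and 96.3 MiB at 600 DPI. Memory grows with the square of the DPI, so halving the DPI cuts the per-page footprint to a quarter.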

## Next Steps

- ✅ Read full [README.md](README.md) for API details
- ✅ Check [DEPLOYMENT.md](DEPLOYMENT.md) for production setup
- ✅ Review [STRUCTURE.md](STRUCTURE.md) for code organization
- ✅ Run tests: `pytest tests/`
- ✅ Add authentication for production use
- ✅ Set up monitoring and logging

## Support

- 📖 API Docs: `http://localhost:7860/docs`
- 🐛 Issues: Open one in your repository
- 💬 HuggingFace: Community forums

Happy redacting! 🔒