
title: Anthropic Topic Segmentation Microservice
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
app_file: gradio_app.py
pinned: false
license: mit

🎯 Anthropic Topic Segmentation Microservice

A production-ready microservice that uses Anthropic's Claude models to segment interview transcripts into topics and extract business insights from them. It is designed for processing Czech e-commerce conversations and for integration with n8n workflows.

🚀 Live Demo on HuggingFace Spaces

Try the API directly: https://huggingface.co/spaces/Yeetek/anthropic-topic-segmentation

✨ Key Features

🤖 Anthropic Integration: Uses Claude-3.5-Sonnet for superior language understanding
🌍 Multi-Language: Czech, Slovak, and English processing
🏢 Business Intelligence: 11 specialized business categories
📊 Large Transcripts: Handles up to 1,500 sentences with sliding window processing
🔄 n8n Compatible: RESTful API with dynamic prompt injection
🎯 High Accuracy: 90%+ confidence scores with actionable insights
🐳 Docker Ready: Optimized for HuggingFace Spaces deployment
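
The sliding window processing mentioned above can be pictured roughly as follows. This is a minimal sketch, not the service's actual implementation: the function name `sliding_windows`, the window size of 100 sentences, and the overlap of 10 sentences are all assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple


def sliding_windows(
    sentences: List[str],
    window_size: int = 100,  # assumed value, not taken from the service
    overlap: int = 10,       # assumed value, not taken from the service
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, window) pairs that cover `sentences` with overlap."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")

    step = window_size - overlap
    start = 0
    while start < len(sentences):
        yield start, sentences[start:start + window_size]
        # Stop once the current window already reaches the end of the transcript.
        if start + window_size >= len(sentences):
            break
        start += step


if __name__ == "__main__":
    transcript = [f"sentence {i}" for i in range(250)]
    for start, window in sliding_windows(transcript):
        print(start, len(window))
```

Overlapping windows let a topic that straddles a window boundary appear whole in at least one window, at the cost of processing some sentences twice.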

🎯 Perfect for E-commerce Analysis

Successfully processes Czech Shoptet integration discussions, extracting:

Client Requirements (B2B/B2C differentiated)
Technical Barriers and implementation challenges
Solution Benefits and "aha moments"
Performance Issues and optimization needs

🚀 Quick Start

Option 1: HuggingFace Spaces (Recommended)

Fork this Space or create a new one

Add your Anthropic API key to Spaces secrets:

ANTHROPIC_API_KEY = sk-ant-api03-your-key-here

The Space will automatically build and deploy!

Option 2: Local Docker

# Clone the repository
git clone https://huggingface.co/spaces/Yeetek/anthropic-topic-segmentation
cd anthropic-topic-segmentation

# Create .env file
cp env.example .env
# Edit .env with your Anthropic API key

# Build and run
docker build -t anthropic-topic-segmentation .
docker run -p 7860:7860 --env-file .env anthropic-topic-segmentation

Option 3: Python Development

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

# Run the server
uvicorn app:app --host 0.0.0.0 --port 7860

📡 API Usage

Health Check

curl -X POST https://yeetek-anthropic-topic-segmentation.hf.space/api/health \
  -H "Content-Type: application/json" \
  -d '{"data": []}'

Topic Extraction

curl -X POST https://yeetek-anthropic-topic-segmentation.hf.space/api/segment \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "[{\"text\": \"Zákazníci požadují nestandardní úpravy košíku v Shoptetu.\", \"speaker\": \"Client\", \"start_time\": 2.01, \"end_time\": 8.45, \"sentence_index\": 1}]",
      "customer_call",
      "cs", 
      "E-commerce"
    ]
  }'
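
The same request can be built from Python. Note that the first element of `data` is a JSON string, not a nested array, so the sentence list has to be serialized twice. The sketch below uses only the standard library. The helper names and the parameter names `template`, `language`, and `domain` are assumptions inferred from the example values, and the response is assumed to be JSON.

```python
import json
import urllib.request
from typing import Any, Dict, List

SEGMENT_URL = "https://yeetek-anthropic-topic-segmentation.hf.space/api/segment"


def build_segment_payload(
    sentences: List[Dict[str, Any]],
    template: str = "customer_call",  # assumed meaning of the 2nd positional value
    language: str = "cs",             # assumed meaning of the 3rd positional value
    domain: str = "E-commerce",       # assumed meaning of the 4th positional value
) -> Dict[str, List[str]]:
    """Build the {"data": [...]} body used in the curl example."""
    # The sentence list travels as a JSON-encoded string inside the array.
    encoded = json.dumps(sentences, ensure_ascii=False)
    return {"data": [encoded, template, language, domain]}


def post_segment(payload: Dict[str, List[str]], url: str = SEGMENT_URL,
                 timeout: float = 300.0) -> Any:
    """POST the payload and decode the response (assumed to be JSON)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    sentence = {
        "text": "Zákazníci požadují nestandardní úpravy košíku v Shoptetu.",
        "speaker": "Client",
        "start_time": 2.01,
        "end_time": 8.45,
        "sentence_index": 1,
    }
    print(json.dumps(build_segment_payload([sentence]), ensure_ascii=False, indent=2))
```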

Interactive Documentation

Gradio Interface: https://yeetek-anthropic-topic-segmentation.hf.space
API Documentation: Use the "📖 API Reference" tab in the interface

🔧 n8n Integration

Perfect for workflow automation:

{
  "workflow_name": "Czech E-commerce Analysis",
  "http_request": {
    "method": "POST",
    "url": "https://yeetek-anthropic-topic-segmentation.hf.space/api/segment",
    "body": {
      "sentences": "{{ $json.transcript }}",
      "prompt_config": {
        "template": "customer_call",
        "language": "cs"
      }
    }
  }
}

📊 Business Categories

The system extracts topics into 11 specialized categories:

🎯 client_needs_b2b - B2B client requirements
🛒 client_needs_b2c - B2C customer needs
🚧 solution_barriers - Implementation obstacles
⚙️ technical_requirements - Technical specifications
💬 customer_feedback - Customer opinions
👥 employee_feedback - Internal insights
✅ solution_benefits - Positive outcomes
💡 aha_moments - Key breakthroughs
🏢 company_info - Business context
📝 additional_comments - Miscellaneous insights
🔄 general - Fallback category
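
On the client side, the category identifiers listed above can be captured in an enum so that unexpected labels are mapped to the fallback category. This is a sketch for consumers of the API. The `BusinessCategory` class and `normalize_category` helper are hypothetical names and are not part of the service.

```python
from enum import Enum


class BusinessCategory(str, Enum):
    """The 11 category identifiers listed in this README."""
    CLIENT_NEEDS_B2B = "client_needs_b2b"
    CLIENT_NEEDS_B2C = "client_needs_b2c"
    SOLUTION_BARRIERS = "solution_barriers"
    TECHNICAL_REQUIREMENTS = "technical_requirements"
    CUSTOMER_FEEDBACK = "customer_feedback"
    EMPLOYEE_FEEDBACK = "employee_feedback"
    SOLUTION_BENEFITS = "solution_benefits"
    AHA_MOMENTS = "aha_moments"
    COMPANY_INFO = "company_info"
    ADDITIONAL_COMMENTS = "additional_comments"
    GENERAL = "general"


def normalize_category(label: str) -> BusinessCategory:
    """Map a raw label to a known category, using `general` as the fallback."""
    try:
        return BusinessCategory(label.strip().lower())
    except ValueError:
        return BusinessCategory.GENERAL


if __name__ == "__main__":
    print(normalize_category("aha_moments").value)
    print(normalize_category("pricing").value)
```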

🌍 Language Support

Czech: Perfect diacritics and business terminology ✅
Slovak: Similar processing to Czech ✅
English: Full business context understanding ✅

📈 Performance Benchmarks

Small Transcripts (10 sentences): ~5-8 seconds
Medium Transcripts (100 sentences): ~15-30 seconds
Large Transcripts (640+ sentences): ~30-60 seconds
Confidence: 90%+ average confidence score
Memory: Efficient processing with minimal footprint

🔒 Security & Validation

✅ Input validation (max 1,500 sentences)
✅ Prompt injection protection
✅ Safety scoring and content filtering
✅ Rate limiting and error handling
✅ CORS configuration for web integration
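
A client can run a similar check before sending a transcript, which avoids a round trip for requests the service would reject. The sketch below is an assumption about what such validation might look like. The 1,500-sentence limit comes from this README and the field names come from the request example, but whether the service requires every field is not confirmed here.

```python
from typing import Any, Dict, List

MAX_SENTENCES = 1500  # limit stated in this README

# Field names taken from the request example; treating all as required is an assumption.
REQUIRED_FIELDS = ("text", "speaker", "start_time", "end_time", "sentence_index")


def validate_sentences(
    sentences: List[Dict[str, Any]],
    max_sentences: int = MAX_SENTENCES,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if not sentences:
        return ["transcript is empty"]

    errors: List[str] = []
    if len(sentences) > max_sentences:
        errors.append(
            f"too many sentences: {len(sentences)} (limit is {max_sentences})"
        )

    for position, sentence in enumerate(sentences):
        missing = [name for name in REQUIRED_FIELDS if name not in sentence]
        if missing:
            errors.append(f"sentence {position}: missing {', '.join(missing)}")
        elif sentence["end_time"] < sentence["start_time"]:
            errors.append(f"sentence {position}: end_time precedes start_time")
    return errors


if __name__ == "__main__":
    print(validate_sentences([{"text": "Dobrý den."}]))
```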

🛠 Configuration

Environment Variables

# Essential Configuration
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
ENVIRONMENT=production
LOG_LEVEL=INFO
PORT=7860

# Optional Configuration
CORS_ORIGINS=*
MAX_SENTENCES=1500
REQUEST_TIMEOUT=300
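
For reference, these variables could be read into a typed settings object along the following lines. This is a sketch only. The `Settings` class and `load_settings` function are hypothetical names, and using the values shown above as defaults is an assumption, since the service's own defaults are not documented here.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    anthropic_model: str
    environment: str
    log_level: str
    port: int
    cors_origins: List[str]
    max_sentences: int
    request_timeout: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from `env`; only the API key has no default."""
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")

    # Defaults below mirror the example values in this README (an assumption).
    origins = env.get("CORS_ORIGINS", "*")
    return Settings(
        anthropic_api_key=api_key,
        anthropic_model=env.get("ANTHROPIC_MODEL", "claude-3-5-sonnet-20241022"),
        environment=env.get("ENVIRONMENT", "production"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        port=int(env.get("PORT", "7860")),
        cors_origins=[item.strip() for item in origins.split(",") if item.strip()],
        max_sentences=int(env.get("MAX_SENTENCES", "1500")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "300")),
    )


if __name__ == "__main__":
    demo = load_settings({"ANTHROPIC_API_KEY": "sk-ant-api03-your-key-here"})
    print(demo.anthropic_model, demo.port, demo.max_sentences)
```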

Model Options

claude-3-5-sonnet-20241022 (Recommended - Best quality)
claude-3-5-haiku-20241022 (Faster, cost-effective)

📚 Examples

Check the examples/ directory for:

sample_request.json - Czech e-commerce transcript
sample_response.json - Expected API response format

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Issues: Report bugs or request features
Documentation: Check /docs endpoint for interactive API docs
Health Check: Monitor service status at /health

🏆 Success Stories

✅ 640-sentence Czech transcripts processed successfully
✅ Shoptet e-commerce integration insights extracted
✅ 90% confidence business intelligence
✅ n8n workflow automation ready
✅ Production deployment on HuggingFace Spaces

Built with ❤️ using Anthropic Claude 3.5, FastAPI, and optimized for Czech e-commerce use cases.