HuggingFace Spaces Deployment Guide

Overview

This application is configured to run on HuggingFace Spaces using local model inference (no external API calls required).
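
In practice, "local model inference" means the Space downloads the model weights once and then runs generation inside its own container with the transformers library. A minimal sketch of that pattern (illustrative only; the project's actual llm.py is authoritative):

```python
# Minimal local-inference sketch (illustrative, not the app's actual code).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float16,   # half precision so 3.8B params fit in ~8 GB
    device_map="auto",           # place the model on GPU when one is available
    trust_remote_code=True,      # Phi-3 ships custom modeling code
)

out = generator("Summarize this transcript: ...", max_new_tokens=128)
print(out[0]["generated_text"])
```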


Quick Setup

1. Create a New Space

  1. Go to https://huggingface.co/new-space
  2. Choose Gradio as the SDK
  3. Select GPU hardware (T4 or better recommended)
  4. Name your Space (e.g., transcriptor-ai)
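
If you prefer scripting over the web UI, the same Space can be created with the huggingface_hub library (the repo name below is a placeholder):

```python
# Create the Space programmatically; requires `huggingface-cli login` first.
from huggingface_hub import create_repo

create_repo(
    repo_id="your-username/transcriptor-ai",  # substitute your own namespace
    repo_type="space",
    space_sdk="gradio",
)
```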

2. Upload Your Code

Upload all files from this directory to your Space, or connect a Git repository.
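
Programmatic upload works too, e.g. with huggingface_hub (the paths and repo name below are placeholders):

```python
# Push the whole project directory to the Space in one call.
from huggingface_hub import upload_folder

upload_folder(
    folder_path=".",                          # directory containing app.py etc.
    repo_id="your-username/transcriptor-ai",  # your Space
    repo_type="space",
)
```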

3. Configure Space Settings (Optional)

Go to Settings → Variables in your Space and add:

| Variable | Value | Description |
|----------|-------|-------------|
| DEBUG_MODE | True or False | Enable detailed logging |
| LLM_TEMPERATURE | 0.7 | Model creativity (0.0-1.0) |
| LLM_TIMEOUT | 120 | Timeout in seconds |
| LOCAL_MODEL | microsoft/Phi-3-mini-4k-instruct | Model to use |

Note: All settings have sensible defaults - you don't need to set these unless you want to customize.
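
These Variables reach the app as ordinary environment variables. A sketch of how they might be parsed at startup (the project's actual config code may differ):

```python
# Illustrative parsing of Spaces Variables; defaults match the table above.
import os

DEBUG_MODE = os.getenv("DEBUG_MODE", "False").lower() == "true"
LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0.7"))
LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "120"))
LOCAL_MODEL = os.getenv("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct")
```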


Hardware Requirements

Recommended: GPU (T4 or better)

  • Phi-3-mini-4k-instruct: 3.8B params, ~8GB GPU RAM
  • Processing speed: ~30-60 seconds per transcript chunk
  • Best for: Production use with multiple users

Alternative: CPU (not recommended)

  • Works, but is very slow (5-10 minutes per chunk)
  • Only suitable for testing
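
A quick way to confirm which hardware the Space actually gave you is to check from Python:

```python
# Prints the visible device; "Tesla T4" (or better) means you got a GPU tier.
import torch

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("CPU only - expect 5-10 minutes per chunk")
```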

Supported Models

You can change the model by setting the LOCAL_MODEL variable:

Small & Fast (Recommended for Free Tier)

LOCAL_MODEL=microsoft/Phi-3-mini-4k-instruct  (Default - 3.8B params)

Medium (Better quality, needs more GPU)

LOCAL_MODEL=mistralai/Mistral-7B-Instruct-v0.3  (7B params)

Alternatives

LOCAL_MODEL=HuggingFaceH4/zephyr-7b-beta       (7B params, good instruction following)
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1B params, very fast but lower quality)
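
A rough way to judge whether a model fits your GPU: at float16, weights take about 2 bytes per parameter, plus overhead for activations and the KV cache. A back-of-the-envelope check (the 15% overhead figure is an assumption, not a measurement):

```python
# Rough float16 VRAM estimate: 2 bytes/param + ~15% overhead (assumed).
def vram_gb(params_billion: float) -> float:
    return params_billion * 2 * 1.15

for name, size in [("Phi-3-mini", 3.8), ("Mistral-7B", 7.0), ("TinyLlama", 1.1)]:
    print(f"{name}: ~{vram_gb(size):.1f} GB")  # ~8.7, ~16.1, ~2.5 GB
```

The Mistral-7B figure (~16 GB) explains why it is a tight fit on a 16 GB T4, while Phi-3-mini leaves headroom.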

Configuration Files

✅ Required Files

  • app.py - Main application
  • requirements.txt - Python dependencies
  • llm.py, extractors.py, etc. - Core modules
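
If you are assembling requirements.txt from scratch, a plausible minimal set for this local-inference setup looks like the following; the repo's actual file is authoritative, and these unpinned entries are an assumption:

```
gradio
transformers
torch
accelerate   # needed for device_map="auto" model placement
```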

⚠️ NOT Needed for Spaces

  • .env file - Use Spaces Variables instead
  • Local database files
  • API keys (unless using external APIs)

Environment Configuration

The app automatically detects if it's running on HuggingFace Spaces and uses local model inference by default.

Default Configuration (no .env needed):

USE_HF_API=False       # Don't use the HF Inference API
USE_LMSTUDIO=False     # Don't use LM Studio
LLM_BACKEND=local      # Use local transformers
DEBUG_MODE=False       # Disable debug logs

To override: Set Spaces Variables (Settings → Variables)
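
One common detection pattern (illustrative; the app's actual check may differ) relies on the SPACE_ID environment variable that Spaces injects into every container:

```python
# If SPACE_ID is set, we are running on HuggingFace Spaces.
import os

RUNNING_ON_SPACES = os.getenv("SPACE_ID") is not None
LLM_BACKEND = "local" if RUNNING_ON_SPACES else os.getenv("LLM_BACKEND", "local")
```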


Troubleshooting

Issue: "Out of Memory" Error

Solution: Switch to a smaller model

LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0

Issue: Very Slow Processing

Solution:

  1. Make sure you selected GPU hardware (not CPU)
  2. Check Space logs for "Model loaded on cuda" confirmation
  3. If on CPU, upgrade to GPU tier

Issue: Quality Score 0.00

Causes:

  1. Model not loaded properly (check logs for "[Local Model] Loading...")
  2. GPU out of memory (model falls back to CPU)
  3. Timeout too short (increase LLM_TIMEOUT)

Debug Steps:

  1. Set DEBUG_MODE=True in Spaces Variables
  2. Check logs for detailed error messages
  3. Look for "[Local Model] ✅ Generated X characters"

Issue: Model Downloads Every Time

Solution: No action needed. HuggingFace Spaces caches model weights automatically; only the first load downloads them and takes 2-5 minutes.

  • Subsequent starts are faster (~30 seconds)
  • Don't restart Space unnecessarily

Performance Optimization

1. Reduce Context Window

Edit llm.py line 399:

max_length=2000  # Reduce from 3500 for faster processing
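
What that cap amounts to is truncating the prompt to a fixed number of tokens before generation. A standalone sketch of the same idea (the real llm.py logic may differ):

```python
# Truncate input to at most 2000 tokens before it reaches the model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
ids = tokenizer("very long transcript ...", truncation=True, max_length=2000)["input_ids"]
print(len(ids))  # never exceeds 2000
```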

2. Lower Token Limit

Set Spaces Variable:

MAX_TOKENS_PER_REQUEST=800  # Default is 1500

3. Use Smaller Model

LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0

4. Disable Debug Mode

DEBUG_MODE=False

Monitoring

View Logs

  1. Go to your Space
  2. Click Logs tab at the top
  3. Look for startup messages:
✅ Configuration loaded for HuggingFace Spaces
🚀 TranscriptorAI Enterprise - LLM Backend: local
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
[Local Model] ✅ Model loaded on cuda:0

Check Processing

During analysis, you should see:

[Local Model] Generating (1500 max tokens, temp=0.7)...
[Local Model] ✅ Generated 1247 characters
[LLM Debug] ✅ Successfully extracted JSON with 7 fields

Cost Estimation

Free Tier (CPU)

  • ⚠️ Very slow but free
  • ~5-10 minutes per transcript

GPU (T4) - ~$0.60/hour

  • ⚑ Fast processing
  • ~30-60 seconds per transcript
  • Space sleeps after inactivity (saves money)
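
At those numbers, per-transcript cost is roughly $0.60 × 30/3600 ≈ $0.005 to $0.60 × 60/3600 = $0.01, ignoring idle time before the Space sleeps.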

Persistent GPU (Upgraded)

  • Always-on for instant access
  • Higher cost but best user experience

Security Notes

  1. No API Keys Needed: Everything runs locally
  2. Private Processing: Data never leaves your Space
  3. Secrets Management: Use Spaces Secrets (not Variables) for sensitive data
  4. Model Access: Phi-3 and most models don't require gated access
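
For point 3: Secrets reach the running app exactly like Variables do, as environment variables, but their values are hidden in the UI. A minimal read (MY_API_KEY is a hypothetical name):

```python
# Read a Spaces Secret; returns None if it isn't configured.
import os

api_key = os.getenv("MY_API_KEY")
```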

Next Steps

  1. ✅ Upload code to your Space
  2. ✅ Select GPU hardware
  3. ✅ Wait for first model download (~2-5 min)
  4. ✅ Test with a sample transcript
  5. 🎉 Share your Space URL!

Support

For questions or issues, open a discussion in your Space's Community tab.

Last Updated: October 2025