vimalk78 committed on Aug 18

Commit

5a66ce1

1 Parent(s): befd225

chore(config): add config.md to backend-py

Signed-off-by: Vimal Kumar <vimal78@gmail.com>

Files changed (1)

crossword-app/backend-py/CONFIG.md +124 -0

	@@ -0,0 +1,124 @@

+# Environment Configuration for Hugging Face Spaces
+This document lists all environment variables needed for the crossword generator backend when deployed on Hugging Face Spaces.
+## Required Variables
+### Core Application Settings
+```env
+NODE_ENV=production
+PORT=7860
+PYTHONPATH=/app/backend-py
+PYTHONUNBUFFERED=1
+```
+### AI/ML Model Configuration
+```env
+EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
+WORD_SIMILARITY_THRESHOLD=0.55
+USE_AI_WORDS=true
+FALLBACK_TO_STATIC=true
+USE_HIERARCHICAL_SEARCH=true
+```
+## Optional Variables (with defaults)
+### Performance & Caching
+```env
+MAX_CACHED_WORDS=150
+SEARCH_RANDOMNESS=0.02
+FAISS_CACHE_DIR=/tmp/faiss_cache
+```
+### Word Variety & Quality Control
+```env
+MAX_USED_WORDS_MEMORY=50
+EXCLUDED_WORDS=WORD,THING,STUFF,GENERIC
+```
+### Advanced Configuration
+```env
+MAX_RESULTS=40
+MIN_SIMILARITY_THRESHOLD=0.45
+WORD_CACHE_DIR=/tmp/word_cache
+```
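A minimal sketch, not taken from the backend code, of how the optional variables above could be read with their documented defaults. The `OptionalSettings` class and `load_optional_settings` function are hypothetical names used only for illustration; the variable names and default values are the ones listed in this file.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionalSettings:
    """Hypothetical container for the optional variables documented above."""
    max_cached_words: int
    search_randomness: float
    faiss_cache_dir: str
    max_used_words_memory: int
    max_results: int
    min_similarity_threshold: float
    word_cache_dir: str


def load_optional_settings(env=None):
    """Read each optional variable, falling back to its documented default."""
    env = os.environ if env is None else env
    return OptionalSettings(
        max_cached_words=int(env.get("MAX_CACHED_WORDS", "150")),
        search_randomness=float(env.get("SEARCH_RANDOMNESS", "0.02")),
        faiss_cache_dir=env.get("FAISS_CACHE_DIR", "/tmp/faiss_cache"),
        max_used_words_memory=int(env.get("MAX_USED_WORDS_MEMORY", "50")),
        max_results=int(env.get("MAX_RESULTS", "40")),
        min_similarity_threshold=float(env.get("MIN_SIMILARITY_THRESHOLD", "0.45")),
        word_cache_dir=env.get("WORD_CACHE_DIR", "/tmp/word_cache"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment, for example `load_optional_settings({"MAX_RESULTS": "10"})`.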
+## Variable Explanations
+### **WORD_SIMILARITY_THRESHOLD** (Default: 0.55)
+- Controls semantic similarity requirement for AI-generated words
+- Range: 0.3-0.7 (higher = stricter quality, fewer words)
+- The system falls back to adaptive thresholds if too few words are found
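One way an adaptive threshold could work is sketched below. This is an assumption about the behaviour, not the backend's actual logic: the `select_words` function, the 0.05 step, and the use of `MIN_SIMILARITY_THRESHOLD` (0.45) as the floor are all hypothetical.

```python
def select_words(scored, needed, threshold=0.55, floor=0.45, step=0.05):
    """Pick words scoring at or above the threshold.

    If fewer than `needed` words qualify, relax the threshold by `step`
    and retry, never going below `floor`. Returns the chosen words and
    the threshold that was finally applied.
    """
    current = threshold
    while True:
        picked = [word for word, score in scored if score >= current]
        if len(picked) >= needed or current <= floor:
            return picked, current
        # round() keeps repeated subtraction from drifting (e.g. 0.4999999)
        current = max(floor, round(current - step, 10))


candidates = [("OCEAN", 0.62), ("WAVE", 0.57), ("TIDE", 0.51),
              ("SAND", 0.46), ("ROCK", 0.30)]
words, used_threshold = select_words(candidates, needed=3)
```

With these sample scores, two words pass at 0.55, so the threshold is relaxed once to 0.5 and `TIDE` is admitted as the third word.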
+### **USE_HIERARCHICAL_SEARCH** (Default: true)
+- Enables advanced semantic search with topic variations and subcategories
+- Significantly improves word diversity and topic coverage
+- Set to `false` to use simpler single-search approach
+### **MAX_USED_WORDS_MEMORY** (Default: 50)
+- Number of previously used words to remember per topic
+- Prevents repetition across multiple puzzle generations
+- Higher values = better variety but more memory usage
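The per-topic memory described above can be sketched with a bounded deque, which drops the oldest word once the limit is reached. The `UsedWordMemory` class is a hypothetical illustration, and upper-casing words before comparison is an assumption; the backend may store words differently.

```python
from collections import defaultdict, deque


class UsedWordMemory:
    """Remember the most recent words used for each topic (hypothetical helper)."""

    def __init__(self, max_per_topic=50):
        # deque(maxlen=...) silently evicts the oldest entry when full
        self._by_topic = defaultdict(lambda: deque(maxlen=max_per_topic))

    def remember(self, topic, words):
        for word in words:
            self._by_topic[topic].append(word.upper())

    def filter_unused(self, topic, candidates):
        """Return only the candidates not recently used for this topic."""
        used = set(self._by_topic.get(topic, ()))
        return [word for word in candidates if word.upper() not in used]
```

A larger `max_per_topic` keeps more history per topic, which is the variety versus memory trade-off noted in the last bullet.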
+### **EXCLUDED_WORDS** (Optional)
+- Comma-separated list of words to never include in puzzles
+- Blocks overly generic or inappropriate terms
+- Example: `WORD,THING,STUFF,DATA,INFO`
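A small sketch of how a comma-separated `EXCLUDED_WORDS` value might be parsed and applied. The function names are hypothetical, and trimming whitespace and matching case-insensitively are assumptions rather than documented behaviour.

```python
def parse_excluded_words(raw):
    """Turn 'WORD,THING, stuff' into {'WORD', 'THING', 'STUFF'}; tolerate None."""
    return {part.strip().upper() for part in (raw or "").split(",") if part.strip()}


def remove_excluded(candidates, excluded):
    """Drop any candidate whose upper-cased form is in the excluded set."""
    return [word for word in candidates if word.upper() not in excluded]


excluded = parse_excluded_words("WORD,THING,STUFF,DATA,INFO")
kept = remove_excluded(["OCEAN", "thing", "TIDE", "DATA"], excluded)
```

Here `kept` contains only `OCEAN` and `TIDE`, because `thing` and `DATA` match entries in the excluded set.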
+### **FALLBACK_TO_STATIC** (Default: true)
+- Falls back to static word lists if AI generation fails
+- Ensures puzzle generation always succeeds
+- Recommended to keep as `true` for production reliability
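The fallback behaviour could look roughly like the sketch below. `env_flag`, `get_words`, `ai_source`, and `static_words` are hypothetical names; the set of strings treated as true and the default of `true` for `USE_AI_WORDS` are assumptions, since this file does not specify them.

```python
def env_flag(env, name, default):
    """Interpret an environment value such as 'true' or 'false' as a bool."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def get_words(topic, env, ai_source, static_words):
    """Try the AI word source first, then the static list if allowed."""
    use_ai = env_flag(env, "USE_AI_WORDS", True)
    fallback = env_flag(env, "FALLBACK_TO_STATIC", True)
    if use_ai:
        try:
            words = ai_source(topic)
        except Exception:
            if not fallback:
                raise  # no safety net configured: surface the failure
            words = []
        if words or not fallback:
            return words
    # AI disabled, or AI produced nothing and fallback is enabled
    return static_words.get(topic, [])
```

With `FALLBACK_TO_STATIC=false`, a failing AI source raises instead of returning static words, which is why keeping it `true` is the safer production choice.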
+## Recommended HF Spaces Configuration
+**Minimal Setup (Core functionality):**
+```env
+NODE_ENV=production
+PORT=7860
+PYTHONPATH=/app/backend-py
+PYTHONUNBUFFERED=1
+EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
+WORD_SIMILARITY_THRESHOLD=0.55
+USE_AI_WORDS=true
+FALLBACK_TO_STATIC=true
+```
+**Optimized Setup (Better performance & variety):**
+```env
+NODE_ENV=production
+PORT=7860
+PYTHONPATH=/app/backend-py
+PYTHONUNBUFFERED=1
+EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
+WORD_SIMILARITY_THRESHOLD=0.55
+USE_AI_WORDS=true
+FALLBACK_TO_STATIC=true
+USE_HIERARCHICAL_SEARCH=true
+MAX_USED_WORDS_MEMORY=50
+MAX_CACHED_WORDS=150
+SEARCH_RANDOMNESS=0.02
+```
+## Performance Notes
+- **Startup Time**: ~30-60 seconds with AI models, ~2 seconds without
+- **Memory Usage**: ~500MB-1GB with AI, ~100MB without
+- **First Request**: May take longer due to model initialization
+- **FAISS Cache**: Speeds up subsequent startups significantly
+## Troubleshooting
+**If puzzle generation fails:**
+1. Check `WORD_SIMILARITY_THRESHOLD` (try lowering to 0.5 or 0.45)
+2. Ensure `FALLBACK_TO_STATIC=true`
+3. Monitor logs for "Not enough words" errors
+**If words seem too generic:**
+1. Raise `WORD_SIMILARITY_THRESHOLD` to 0.6 or 0.65
+2. Add problematic words to `EXCLUDED_WORDS`
+3. Enable `USE_HIERARCHICAL_SEARCH=true`
+**If startup is too slow:**
+1. Allow one full startup to complete; FAISS index caching should speed up later runs
+2. Consider a smaller embedding model for faster startup (at some cost in word quality)