vimalk78 committed on
Commit
5a66ce1
·
1 Parent(s): befd225

chore(config): add config.md to backend-py

Signed-off-by: Vimal Kumar <vimal78@gmail.com>

Files changed (1)
  1. crossword-app/backend-py/CONFIG.md +124 -0
crossword-app/backend-py/CONFIG.md ADDED
@@ -0,0 +1,124 @@
+ # Environment Configuration for Hugging Face Spaces
+
+ This document lists the environment variables used by the crossword generator backend when deployed on Hugging Face Spaces.
+
+ ## Required Variables
+
+ ### Core Application Settings
+ ```env
+ NODE_ENV=production
+ PORT=7860
+ PYTHONPATH=/app/backend-py
+ PYTHONUNBUFFERED=1
+ ```
+
+ ### AI/ML Model Configuration
+ ```env
+ EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
+ WORD_SIMILARITY_THRESHOLD=0.55
+ USE_AI_WORDS=true
+ FALLBACK_TO_STATIC=true
+ USE_HIERARCHICAL_SEARCH=true
+ ```
+
+ ## Optional Variables (with defaults)
+
+ ### Performance & Caching
+ ```env
+ MAX_CACHED_WORDS=150
+ SEARCH_RANDOMNESS=0.02
+ FAISS_CACHE_DIR=/tmp/faiss_cache
+ ```
+
+ ### Word Variety & Quality Control
+ ```env
+ MAX_USED_WORDS_MEMORY=50
+ EXCLUDED_WORDS=WORD,THING,STUFF,GENERIC
+ ```
+
+ ### Advanced Configuration
+ ```env
+ MAX_RESULTS=40
+ MIN_SIMILARITY_THRESHOLD=0.45
+ WORD_CACHE_DIR=/tmp/word_cache
+ ```
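The variables listed above could be read at startup along the following lines. This is a minimal sketch, not the backend's actual loader: the `CrosswordConfig` dataclass, the `load_config` and `env_bool` helpers, and the boolean parsing rule are assumptions made for illustration. Only the variable names and the example values used as defaults come from this document.

```python
import os
from dataclasses import dataclass

# Assumed rule: these strings (case-insensitive) count as true, anything else as false.
TRUTHY = {"1", "true", "yes", "on"}


def env_bool(env, name, default):
    """Read a boolean flag from a mapping of environment variables."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


@dataclass(frozen=True)
class CrosswordConfig:  # hypothetical container, not taken from the codebase
    embedding_model: str
    word_similarity_threshold: float
    use_ai_words: bool
    fallback_to_static: bool
    use_hierarchical_search: bool
    max_cached_words: int
    search_randomness: float
    faiss_cache_dir: str
    max_results: int
    min_similarity_threshold: float


def load_config(env=None):
    """Build a config from `env` (defaults to os.environ), using the documented values as defaults."""
    env = os.environ if env is None else env
    return CrosswordConfig(
        embedding_model=env.get("EMBEDDING_MODEL", "sentence-transformers/all-mpnet-base-v2"),
        word_similarity_threshold=float(env.get("WORD_SIMILARITY_THRESHOLD", "0.55")),
        use_ai_words=env_bool(env, "USE_AI_WORDS", True),
        fallback_to_static=env_bool(env, "FALLBACK_TO_STATIC", True),
        use_hierarchical_search=env_bool(env, "USE_HIERARCHICAL_SEARCH", True),
        max_cached_words=int(env.get("MAX_CACHED_WORDS", "150")),
        search_randomness=float(env.get("SEARCH_RANDOMNESS", "0.02")),
        faiss_cache_dir=env.get("FAISS_CACHE_DIR", "/tmp/faiss_cache"),
        max_results=int(env.get("MAX_RESULTS", "40")),
        min_similarity_threshold=float(env.get("MIN_SIMILARITY_THRESHOLD", "0.45")),
    )
```

Calling `load_config()` with no argument reads from the process environment; passing a plain dict makes the loader easy to exercise in tests.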
+
+ ## Variable Explanations
+
+ ### **WORD_SIMILARITY_THRESHOLD** (Default: 0.55)
+ - Sets the minimum semantic similarity required of AI-generated words
+ - Range: 0.3-0.7 (higher = stricter quality, fewer words)
+ - The system falls back to adaptive thresholds if too few words are found
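One way an adaptive threshold could behave is sketched below. The document does not describe the actual mechanism, so everything here is an assumption: the `select_words` function, the fixed step of 0.05, and the use of 0.45 (the `MIN_SIMILARITY_THRESHOLD` example value) as the floor are illustrative choices only.

```python
def select_words(scored_words, needed, threshold=0.55, floor=0.45, step=0.05):
    """Pick words scoring at or above a threshold, relaxing it until enough are found.

    scored_words: list of (word, similarity) pairs.
    Returns (picked_words, threshold_actually_used).
    """
    current = threshold
    while True:
        picked = [word for word, score in scored_words if score >= current]
        # Stop once there are enough words or the threshold cannot go lower.
        if len(picked) >= needed or current <= floor:
            return picked, current
        # Rounding keeps repeated float subtraction from drifting (hypothetical step size).
        current = max(floor, round(current - step, 2))
```

With this shape, a strict threshold is tried first and only relaxed when the candidate pool is too small, which matches the "higher = stricter quality, fewer words" trade-off described above.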
+
+ ### **USE_HIERARCHICAL_SEARCH** (Default: true)
+ - Enables advanced semantic search with topic variations and subcategories
+ - Improves word diversity and topic coverage
+ - Set to `false` to use the simpler single-search approach
+
+ ### **MAX_USED_WORDS_MEMORY** (Default: 50)
+ - Number of previously used words to remember per topic
+ - Prevents repetition across multiple puzzle generations
+ - Higher values = better variety but more memory usage
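A bounded per-topic memory of this kind can be sketched with a fixed-length deque. The `UsedWordMemory` class and its method names are hypothetical, and uppercase normalization is an assumption; the backend may track used words differently.

```python
from collections import defaultdict, deque


class UsedWordMemory:  # hypothetical helper, not taken from the codebase
    def __init__(self, max_words=50):
        # Each topic keeps only its most recent `max_words` entries;
        # older entries are dropped automatically by the deque.
        self._recent = defaultdict(lambda: deque(maxlen=max_words))

    def remember(self, topic, words):
        """Record words used in a generated puzzle for this topic."""
        self._recent[topic].extend(word.upper() for word in words)

    def filter_new(self, topic, candidates):
        """Return the candidates that were not recently used for this topic."""
        seen = set(self._recent.get(topic, ()))
        return [word for word in candidates if word.upper() not in seen]
```

Because the deque is bounded, memory use grows with the number of topics times `max_words`, which is the trade-off the last bullet describes.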
+
+ ### **EXCLUDED_WORDS** (Optional)
+ - Comma-separated list of words to never include in puzzles
+ - Blocks overly generic or inappropriate terms
+ - Example: `WORD,THING,STUFF,DATA,INFO`
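Parsing such a comma-separated value might look like the sketch below. The function names are hypothetical, and trimming whitespace and matching case-insensitively are assumptions rather than documented behavior.

```python
def parse_excluded_words(raw):
    """Turn 'WORD, thing,,STUFF' into {'WORD', 'THING', 'STUFF'}."""
    return {part.strip().upper() for part in raw.split(",") if part.strip()}


def drop_excluded(words, excluded):
    """Keep only the words that are not in the excluded set (case-insensitive)."""
    return [word for word in words if word.upper() not in excluded]
```

For example, `drop_excluded(candidates, parse_excluded_words(os.environ.get("EXCLUDED_WORDS", "")))` would leave the candidate list unchanged when the variable is unset.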
+
+ ### **FALLBACK_TO_STATIC** (Default: true)
+ - Falls back to static word lists if AI generation fails
+ - Keeps puzzle generation working when the AI path is unavailable
+ - Recommended to keep as `true` for production reliability
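The fallback behavior could be structured roughly as follows. This is a sketch under assumptions: `get_words`, the `ai_generator` callable, and the `static_words` dictionary are hypothetical stand-ins, not the backend's real interfaces.

```python
def get_words(topic, ai_generator, static_words, use_ai=True, fallback_to_static=True):
    """Try AI generation first; use the static list when it fails or returns nothing."""
    if use_ai:
        try:
            words = ai_generator(topic)
            if words:
                return words
        except Exception:
            # Without a fallback there is nothing else to try, so surface the error.
            if not fallback_to_static:
                raise
    if fallback_to_static or not use_ai:
        return static_words.get(topic, [])
    return []
```

With `fallback_to_static=False`, a failing generator raises instead of silently degrading, which is why keeping the flag `true` is the safer production setting.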
+
+ ## Recommended HF Spaces Configuration
+
+ **Minimal Setup (Core functionality):**
+ ```env
+ NODE_ENV=production
+ PORT=7860
+ PYTHONPATH=/app/backend-py
+ PYTHONUNBUFFERED=1
+ EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
+ WORD_SIMILARITY_THRESHOLD=0.55
+ USE_AI_WORDS=true
+ FALLBACK_TO_STATIC=true
+ ```
+
+ **Optimized Setup (Better performance & variety):**
+ ```env
+ NODE_ENV=production
+ PORT=7860
+ PYTHONPATH=/app/backend-py
+ PYTHONUNBUFFERED=1
+ EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
+ WORD_SIMILARITY_THRESHOLD=0.55
+ USE_AI_WORDS=true
+ FALLBACK_TO_STATIC=true
+ USE_HIERARCHICAL_SEARCH=true
+ MAX_USED_WORDS_MEMORY=50
+ MAX_CACHED_WORDS=150
+ SEARCH_RANDOMNESS=0.02
+ ```
+
+ ## Performance Notes
+
+ - **Startup Time**: ~30-60 seconds with AI models, ~2 seconds without
+ - **Memory Usage**: ~500MB-1GB with AI, ~100MB without
+ - **First Request**: May take longer due to model initialization
+ - **FAISS Cache**: Speeds up subsequent startups significantly
+
+ ## Troubleshooting
+
+ **If puzzle generation fails:**
+ 1. Check `WORD_SIMILARITY_THRESHOLD` (try lowering to 0.5 or 0.45)
+ 2. Ensure `FALLBACK_TO_STATIC=true`
+ 3. Monitor logs for "Not enough words" errors
+
+ **If words seem too generic:**
+ 1. Raise `WORD_SIMILARITY_THRESHOLD` to 0.6 or 0.65
+ 2. Add problematic words to `EXCLUDED_WORDS`
+ 3. Enable `USE_HIERARCHICAL_SEARCH=true`
+
+ **If startup is too slow:**
+ 1. FAISS index caching should help after the first run
+ 2. Consider a smaller embedding model for faster startup (at some cost in quality)