IsmatS committed 0203ccf (1 parent: 6ac41d1)
Files changed (1):
  1. scripts/README.md +0 -143
scripts/README.md DELETED
@@ -1,143 +0,0 @@
# Scripts Directory

One-time utility scripts for the SOCAR Hackathon project.

## Available Scripts

### 📊 Data Management

#### `check_pinecone.py`
Checks the status and statistics of the Pinecone vector database.

```bash
python scripts/check_pinecone.py
```

**Output:**
- Total vector count
- Index dimensions
- Namespaces (if any)
- Connection status
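The script's source is not part of this README. As a rough sketch only, a check like this could be written against the v3+ `pinecone` Python SDK (`Pinecone(api_key=...)`, `Pinecone.Index(name)`, `describe_index_stats()` and the response's `to_dict()`), reading the variables listed under Setup. The `format_stats` helper and its output labels are assumptions modeled on the sample output in Examples:

```python
"""Sketch of a Pinecone status check (hypothetical, not the original script)."""
import os
import sys


def format_stats(stats: dict) -> str:
    """Render a describe_index_stats() result, converted to a dict, as text."""
    lines = [
        f"Total Vectors: {stats.get('total_vector_count', 0):,}",
        f"Dimensions: {stats.get('dimension')}",
    ]
    for name, summary in (stats.get("namespaces") or {}).items():
        label = name or "(default)"  # an empty name means the default namespace
        lines.append(f"Namespace {label}: {summary.get('vector_count', 0):,} vectors")
    return "\n".join(lines)


def main() -> int:
    from pinecone import Pinecone  # assumption: pinecone v3+ SDK

    try:
        pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
        index = pc.Index(os.environ["PINECONE_INDEX_NAME"])
        stats = index.describe_index_stats().to_dict()
    except KeyError as exc:
        print(f"❌ Missing environment variable {exc}; check .env", file=sys.stderr)
        return 1
    except Exception as exc:
        print(f"❌ Pinecone request failed: {exc}", file=sys.stderr)
        return 1
    print("✅ Connection OK")
    print(format_stats(stats))
    return 0


# Entry point in the real file:
# if __name__ == "__main__":
#     raise SystemExit(main())
```

Keeping the formatting in a pure function means it can be checked without network access; only `main()` talks to Pinecone.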

#### `clear_pinecone.py`
Clears all vectors from the Pinecone index before re-ingestion.

```bash
python scripts/clear_pinecone.py
```

**⚠️ WARNING**: This deletes ALL vectors in the index! The script asks you to type `DELETE` to confirm.

**Use cases:**
- Before re-ingesting documents with a new chunking strategy
- Testing with fresh data
- Cleaning up after experiments
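A minimal sketch of the confirm-then-delete flow, assuming the same v3+ `pinecone` SDK. `Index.delete(delete_all=True, namespace=...)` empties one namespace per call, so the sketch loops over the namespaces reported by `describe_index_stats()`. The `confirm_deletion` helper is hypothetical; its prompt text copies the sample session in Examples:

```python
"""Sketch of a guarded Pinecone wipe (hypothetical, not the original script)."""
import os
from collections.abc import Callable


def confirm_deletion(total: int, ask: Callable[[str], str] = input) -> bool:
    """Warn, then return True only if the user types DELETE."""
    print(f"⚠️ WARNING: This will delete ALL {total:,} vectors!")
    return ask("Type 'DELETE' to confirm: ").strip() == "DELETE"


def main() -> int:
    from pinecone import Pinecone  # assumption: pinecone v3+ SDK

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index = pc.Index(os.environ["PINECONE_INDEX_NAME"])
    stats = index.describe_index_stats().to_dict()
    total = stats.get("total_vector_count", 0)
    if total == 0:
        print("Index is already empty; nothing to delete.")
        return 0
    if not confirm_deletion(total):
        print("Aborted; no vectors were deleted.")
        return 1
    # delete_all empties a single namespace, so visit every reported namespace.
    for namespace in stats.get("namespaces") or {}:
        index.delete(delete_all=True, namespace=namespace)
    print("✅ Deletion completed!")
    return 0


# Entry point in the real file:
# if __name__ == "__main__":
#     raise SystemExit(main())
```

Injecting `ask` instead of calling `input()` directly keeps the safety check testable without a terminal.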

### 🤖 Azure OpenAI

#### `list_azure_models.py`
Lists all deployed Azure OpenAI models.

```bash
python scripts/list_azure_models.py
```

**Output:**
- Vision models (GPT-4.1, GPT-5, Claude, etc.)
- Text models (Llama, DeepSeek, etc.)
- Total count and categorization

**Use cases:**
- Verify which models are deployed
- Check model availability before updating notebooks
- Debug 404 errors when calling a model
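How `list_azure_models.py` obtains the deployment names is not documented here, so this sketch covers only the categorization step. It assumes vision capability is tracked in a hand-maintained set (the hypothetical `VISION_MODELS`, seeded from the sample output in Examples) and treats every other deployment as a text model:

```python
"""Sketch of the vision/text split (hypothetical, not the original script)."""

# Assumption: vision support is tracked by deployment name, not queried from Azure.
VISION_MODELS = {
    "gpt-4.1",
    "gpt-5",
    "gpt-5-mini",
    "claude-sonnet-4-5",
    "claude-opus-4-1",
    "Phi-4-multimodal-instruct",
}


def categorize(deployments: list[str]) -> dict[str, list[str]]:
    """Split deployment names into sorted vision and text groups."""
    vision = sorted(name for name in deployments if name in VISION_MODELS)
    text = sorted(name for name in deployments if name not in VISION_MODELS)
    return {"vision": vision, "text": text}


def report(deployments: list[str]) -> str:
    """Format the groups plus a total count, one model per line."""
    groups = categorize(deployments)
    lines = [f"🖼️ Vision Models ({len(groups['vision'])}):"]
    lines += [f"  ✅ {name}" for name in groups["vision"]]
    lines.append(f"📝 Text Models ({len(groups['text'])}):")
    lines += [f"  ✅ {name}" for name in groups["text"]]
    lines.append(f"Total: {len(deployments)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical input; the real script gets these names from Azure.
    print(report(["gpt-4.1", "DeepSeek-R1", "claude-opus-4-1"]))
```

A name-based set is simple, but it has to be updated by hand whenever a deployment is added.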

## Setup

All scripts read their configuration from environment variables in a `.env` file:

```bash
# Required in .env
PINECONE_API_KEY=your_key
PINECONE_INDEX_NAME=hackathon
AZURE_OPENAI_API_KEY=your_key
AZURE_OPENAI_ENDPOINT=your_endpoint
```
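A minimal sketch of how a script could load and validate these variables, assuming `python-dotenv`'s `load_dotenv()`, which by default does not override variables already set in the process environment. `missing_vars` and `load_settings` are hypothetical helper names:

```python
import os
from collections.abc import Mapping

# The four variables listed in the .env block above.
REQUIRED_VARS = (
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
)


def missing_vars(env: Mapping[str, str], required: tuple[str, ...] = REQUIRED_VARS) -> list[str]:
    """Return the required names that are unset or empty."""
    return [name for name in required if not env.get(name)]


def load_settings() -> dict[str, str]:
    """Load .env into os.environ, then fail fast if anything is missing."""
    from dotenv import load_dotenv  # provided by python-dotenv

    load_dotenv()  # existing environment variables take precedence
    missing = missing_vars(os.environ)
    if missing:
        raise SystemExit(f"❌ Missing in .env: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Checking all variables up front gives one clear error instead of a `KeyError` halfway through a run.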

## Dependencies

The scripts use the same dependencies as the main project:
- `python-dotenv` - Environment variables
- `pinecone-client` - Vector database
- `openai` - Azure OpenAI

Install them from the project root:
```bash
pip install -r notebooks/requirements.txt
```

## Common Workflows

### Re-ingesting Documents

```bash
# 1. Check current data
python scripts/check_pinecone.py

# 2. Clear existing data
python scripts/clear_pinecone.py

# 3. Run the ingestion script (not included; create it as needed)
# python scripts/ingest_documents.py

# 4. Verify the new data
python scripts/check_pinecone.py
```
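Step 3 has no script in this directory. A minimal sketch of what a hypothetical `ingest_documents.py` might do: split text into overlapping fixed-size character chunks and build records in the `id`/`values`/`metadata` dict shape that the `pinecone` SDK's `Index.upsert(vectors=...)` accepts. The chunk size, overlap, ID scheme, batch size of 100 and the `embed` callable are all assumptions; `embed` must return vectors matching the index dimension (1024 in the sample output under Examples):

```python
"""Sketch of chunking and upsert batching for a hypothetical ingest_documents.py."""
from collections.abc import Callable, Iterator


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character chunks; neighboring chunks share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # A new chunk starts only while the previous one has not reached the end.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_records(
    doc_id: str, chunks: list[str], embed: Callable[[str], list[float]]
) -> list[dict]:
    """Build dicts in the id/values/metadata shape that Index.upsert accepts."""
    return [
        {"id": f"{doc_id}-{n}", "values": embed(chunk), "metadata": {"doc_id": doc_id, "text": chunk}}
        for n, chunk in enumerate(chunks)
    ]


def batched(records: list[dict], size: int = 100) -> Iterator[list[dict]]:
    """Yield records in fixed-size batches so each upsert request stays small."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# With a live index and a real embedding function (both assumptions):
# for batch in batched(build_records("doc-1", chunk_text(text), embed)):
#     index.upsert(vectors=batch)
```

Changing `chunk_size` or `overlap` is exactly the kind of "new chunking strategy" that calls for clearing the index first, since old and new chunk IDs would otherwise coexist.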

### Verifying Model Availability

```bash
# List all deployed models
python scripts/list_azure_models.py

# Check whether a specific model appears in the output
python scripts/list_azure_models.py | grep "Llama-3.2-Vision"
```

## Adding New Scripts

When creating a new script:
1. Add a descriptive docstring at the top
2. Read configuration from environment variables in `.env`
3. Include error handling with helpful messages
4. Update this README with usage instructions
5. Follow the existing naming convention: `verb_noun.py`
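A minimal skeleton that follows these five conventions, assuming `python-dotenv` is installed. The file name `show_index.py` and the body of `run` are placeholders, not an existing script:

```python
"""show_index.py - print which Pinecone index the scripts will use.

Hypothetical example. Usage:
    python scripts/show_index.py
"""
import os
import sys
from collections.abc import Mapping
from typing import Optional

REQUIRED_VARS = ("PINECONE_INDEX_NAME",)


def run(env: Mapping[str, str]) -> None:
    """The script's real work goes here."""
    print(f"Using index: {env['PINECONE_INDEX_NAME']}")


def main(env: Optional[Mapping[str, str]] = None) -> int:
    if env is None:
        from dotenv import load_dotenv  # python-dotenv

        load_dotenv()  # pull .env into os.environ
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        print(f"❌ Missing {', '.join(missing)}; add to .env", file=sys.stderr)
        return 1
    try:
        run(env)
    except Exception as exc:  # turn any failure into one readable line
        print(f"❌ {type(exc).__name__}: {exc}", file=sys.stderr)
        return 1
    return 0


# Entry point in the real file:
# if __name__ == "__main__":
#     raise SystemExit(main())
```

Passing `env` explicitly, as in `main({"PINECONE_INDEX_NAME": "hackathon"})`, lets the script be exercised without a `.env` file.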

## Examples

### Safe Pinecone Cleanup
```bash
# First check what's there
$ python scripts/check_pinecone.py
Total Vectors: 1,300
Dimensions: 1024

# Then clear if needed
$ python scripts/clear_pinecone.py
⚠️ WARNING: This will delete ALL 1,300 vectors!
Type 'DELETE' to confirm: DELETE
✅ Deletion completed!
```

### Check Vision Models
```bash
$ python scripts/list_azure_models.py

🖼️ Vision Models (6):
✅ gpt-4.1
✅ gpt-5
✅ gpt-5-mini
✅ claude-sonnet-4-5
✅ claude-opus-4-1
✅ Phi-4-multimodal-instruct
```