vkoottu committed (verified)
Commit 7a5665b · 1 Parent(s): 7be4a5c

Upload 7 files

Files changed (7)
  1. .gitignore +44 -0
  2. Dockerfile +12 -0
  3. README.md +229 -10
  4. config.py +99 -0
  5. handler.py +354 -0
  6. main.py +176 -0
  7. requirements.txt +15 -0
.gitignore ADDED
@@ -0,0 +1,44 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Environment
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Application specific
+ user_preferences.json
+ *.log
Dockerfile ADDED
@@ -0,0 +1,12 @@
+ FROM python:3.10-slim
+
+ WORKDIR /code
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,229 @@
- ---
- title: FaceMatch Azure Dev
- emoji: 🐨
- colorFrom: red
- colorTo: green
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # FaceMatch FastAPI
+
+ A face matching and recommendation system built with FastAPI, InsightFace, and Azure Blob Storage. The application provides personalized face recommendations based on user preferences and similarity matching.
+
+ ## Features
+
+ - **Face Detection & Embedding**: Uses InsightFace for robust face detection and embedding extraction
+ - **Similarity Matching**: Finds similar faces using cosine similarity on face embeddings
+ - **Personalized Recommendations**: Learns from user likes/dislikes to provide personalized matches
+ - **Gender Filtering**: Filter recommendations by gender (male, female, or all)
+ - **Azure Integration**: Stores images and embeddings in Azure Blob Storage
+ - **FastAPI**: Modern, fast web framework with automatic API documentation
+
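The similarity matching listed above reduces to cosine similarity between InsightFace's 512-dimensional embeddings (the code uses `scipy.spatial.distance.cosine`, which is `1 - similarity`). A toy 2-d sketch of the metric, with the helper name chosen here for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel embeddings score 1.0; orthogonal embeddings score 0.0
assert abs(cosine_similarity([1.0, 0.0], [2.0, 0.0]) - 1.0) < 1e-9
assert abs(cosine_similarity([1.0, 0.0], [0.0, 1.0])) < 1e-9
```

Because the metric ignores vector magnitude, two faces match on the direction of their embeddings, not on their norm.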
+ ## API Endpoints
+
+ ### Core Endpoints
+
+ - `GET /` - Health check and welcome message
+ - `POST /api/init_user` - Initialize a new user session
+ - `GET /api/get_training_images` - Get training images for user preference learning
+ - `POST /api/record_preference` - Record user like/dislike preferences
+ - `POST /api/get_matches` - Get personalized matches based on user preferences
+ - `POST /api/get_recommendations` - Get recommendations based on query images
+ - `POST /api/extract_embeddings` - Extract embeddings from all images (admin)
+
+ ### API Documentation
+
+ Visit `/docs` for interactive Swagger UI documentation when running locally.
+
+ ## Local Setup
+
+ ### Prerequisites
+
+ - Python 3.8+
+ - Azure Blob Storage account
+ - Azure credentials
+
+ ### Installation
+
+ 1. **Clone the repository**
+    ```bash
+    git clone <your-repo-url>
+    cd Facematch_Dev
+    ```
+
+ 2. **Install dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 3. **Configure Azure credentials**
+
+    Set your Azure credentials as environment variables:
+    ```bash
+    export AZURE_STORAGE_CONNECTION_STRING="your_connection_string"
+    export AZURE_CONTAINER_NAME="your_container_name"
+    ```
+
+    Or create a `config.py` file with your credentials.
+
+ 4. **Run the application**
+    ```bash
+    python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
+    ```
+
+ 5. **Access the API**
+    - API: http://localhost:8000
+    - Documentation: http://localhost:8000/docs
+
+ ## Usage Examples
+
+ ### Get Recommendations
+
+ **Direct Format:**
+ ```bash
+ curl -X POST "http://localhost:8000/api/get_recommendations" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "query_images": [
+       "https://your-azure-url/image1.jpg",
+       "https://your-azure-url/image2.jpg"
+     ],
+     "gender": "female",
+     "top_n": 5
+   }'
+ ```
+
+ **Hugging Face Format:**
+ ```bash
+ curl -X POST "http://localhost:8000/api/get_recommendations" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "inputs": {
+       "query_images": [
+         "https://your-azure-url/image1.jpg",
+         "https://your-azure-url/image2.jpg"
+       ],
+       "gender": "female",
+       "top_n": 5
+     }
+   }'
+ ```
+
+ ### Initialize User Session
+ ```bash
+ curl -X POST "http://localhost:8000/api/init_user"
+ ```
+
+ ### Record Preferences
+ ```bash
+ curl -X POST "http://localhost:8000/api/record_preference" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "user_id": "your_user_id",
+     "image_url": "https://your-azure-url/image.jpg",
+     "preference": "like"
+   }'
+ ```
+
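The same calls can be made from Python. A minimal sketch mirroring the curl examples above (the helper name `build_recommendation_payload` is illustrative, not part of the codebase; sending the request requires a running server):

```python
import json

def build_recommendation_payload(query_images, gender="all", top_n=5, hf_format=False):
    """Build the JSON body for /api/get_recommendations in either accepted shape."""
    body = {"query_images": query_images, "gender": gender, "top_n": top_n}
    # The Hugging Face shape simply wraps the direct shape in {"inputs": ...}
    return {"inputs": body} if hf_format else body

payload = build_recommendation_payload(["https://example.com/img.jpg"], gender="female")
print(json.dumps(payload))

# To actually send it:
# import requests
# requests.post("http://localhost:8000/api/get_recommendations", json=payload)
```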
+ ## Hugging Face Spaces Deployment
+
+ ### 1. Create a Hugging Face Space
+
+ 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Click "Create new Space"
+ 3. Choose "Docker" as the SDK (this repository ships its own Dockerfile)
+ 4. Set visibility (public or private)
+ 5. Create the Space
+
+ ### 2. Configure Secrets
+
+ In your Hugging Face Space settings, add these secrets:
+
+ - `AZURE_STORAGE_CONNECTION_STRING`: Your Azure connection string
+ - `AZURE_CONTAINER_NAME`: Your Azure container name
+
+ ### 3. Upload Files
+
+ Upload these files to your Hugging Face Space:
+
+ - `main.py` - FastAPI application
+ - `handler.py` - Face matching logic
+ - `Dockerfile` - Container definition (required for the Docker SDK)
+ - `requirements.txt` - Dependencies
+ - `config.py` - Configuration (if using file-based config)
+
+ ### 4. Deploy
+
+ The Space builds the Docker image and deploys the FastAPI application automatically.
+
+ ### 5. Access Your API
+
+ Your API will be available at:
+ ```
+ https://your-username-your-space-name.hf.space
+ ```
+
+ ## Azure Setup
+
+ ### Required Azure Resources
+
+ 1. **Storage Account**: For storing images and embeddings
+ 2. **Blob Container**: Organized with folders:
+    - `ai-images/men/` - Training images for men
+    - `ai-images/women/` - Training images for women
+    - `profile-media/` - Images to search for matches
+
+ ### Configuration
+
+ The application expects these Azure settings:
+
+ ```python
+ # In config.py or environment variables
+ AZURE_STORAGE_CONNECTION_STRING = "your_connection_string"
+ AZURE_CONTAINER_NAME = "your_container_name"
+ ```
+
+ ## File Structure
+
+ ```
+ Facematch_Dev/
+ ├── main.py               # FastAPI application
+ ├── handler.py            # Face matching logic
+ ├── config.py             # Configuration
+ ├── Dockerfile            # Container definition
+ ├── requirements.txt      # Dependencies
+ ├── README.md             # This file
+ ├── templates/            # HTML templates (if needed)
+ └── user_preferences.json # User preferences storage
+ ```
+
+ ## Performance Notes
+
+ - **Local Development**: Runs on CPU; suitable for testing
+ - **Hugging Face Spaces**: Can run on GPU hardware (paid tiers); the default Space hardware is CPU
+ - **Embedding Extraction**: Run `/api/extract_embeddings` after uploading new images
+ - **Caching**: Embeddings are cached in Azure Blob Storage, so subsequent queries skip re-extraction
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ 1. **Face Detection Fails**: Some images may not contain detectable faces; these are recorded with a `no_face_detected` marker and skipped
+ 2. **Azure Connection**: Ensure credentials are correctly set in the environment
+ 3. **Memory Issues**: Large image collections may require more memory than the default Space hardware provides
+
+ ### Debug Mode
+
+ Enable debug logging by setting the environment variable read by `config.py`:
+ ```bash
+ export FLASK_DEBUG=true
+ ```
+
+ ## Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch
+ 3. Make your changes
+ 4. Test thoroughly
+ 5. Submit a pull request
+
+ ## License
+
+ [Add your license information here]
+
+ ## Support
+
+ For issues and questions:
+ - Create an issue on GitHub
+ - Check the API documentation at `/docs`
+ - Review the debug logs for detailed error information
config.py ADDED
@@ -0,0 +1,99 @@
+ import os
+ from typing import Dict, Any
+
+ class Config:
+     """Configuration class for the FaceMatch application"""
+
+     # Azure Storage Configuration.
+     # Credentials must come from the environment; never commit connection
+     # strings or account keys to source control.
+     AZURE_STORAGE_CONNECTION_STRING = os.getenv('AZURE_STORAGE_CONNECTION_STRING', '')
+     AZURE_STORAGE_ACCOUNT_NAME = os.getenv('AZURE_STORAGE_ACCOUNT_NAME', 'koottumedia')
+     AZURE_STORAGE_ACCOUNT_KEY = os.getenv('AZURE_STORAGE_ACCOUNT_KEY', '')
+     AZURE_CONTAINER_NAME = os.getenv('AZURE_CONTAINER_NAME', 'koottu-media')
+     AZURE_PREFIX = os.getenv('AZURE_PREFIX', 'koottu-media/profile-media/')
+     AZURE_EMBEDDINGS_FOLDER = os.getenv('AZURE_EMBEDDINGS_FOLDER', 'koottu-media/embeddings/')
+     AZURE_TRAINING_IMAGES_FOLDER = os.getenv('AZURE_TRAINING_IMAGES_FOLDER', 'koottu-media/ai-images/')
+
+     # Face Recognition Configuration
+     INSIGHTFACE_CTX_ID = int(os.getenv('INSIGHTFACE_CTX_ID', '0'))  # 0 for GPU, -1 for CPU
+     FACE_EMBEDDING_DIMENSION = 512
+     SIMILARITY_THRESHOLD = float(os.getenv('SIMILARITY_THRESHOLD', '0.5'))
+
+     # Application Configuration
+     FLASK_SECRET_KEY = os.getenv('FLASK_SECRET_KEY', 'your-secret-key-here')
+     FLASK_HOST = os.getenv('FLASK_HOST', '0.0.0.0')
+     FLASK_PORT = int(os.getenv('FLASK_PORT', '5000'))
+     FLASK_DEBUG = os.getenv('FLASK_DEBUG', 'True').lower() == 'true'
+
+     # User Preferences Configuration
+     USER_PREFERENCES_FILE = os.getenv('USER_PREFERENCES_FILE', 'user_preferences.json')
+     MAX_TRAINING_IMAGES = int(os.getenv('MAX_TRAINING_IMAGES', '10'))
+     DEFAULT_MATCH_COUNT = int(os.getenv('DEFAULT_MATCH_COUNT', '10'))
+     MAX_MATCH_COUNT = int(os.getenv('MAX_MATCH_COUNT', '50'))
+
+     # Embedding Database Configuration
+     EMBEDDING_UPDATE_DAYS = int(os.getenv('EMBEDDING_UPDATE_DAYS', '30'))
+     MIN_FACE_CONFIDENCE = float(os.getenv('MIN_FACE_CONFIDENCE', '0.5'))
+
+     # Performance Configuration
+     BATCH_SIZE = int(os.getenv('BATCH_SIZE', '10'))
+     CACHE_TTL = int(os.getenv('CACHE_TTL', '3600'))  # 1 hour
+
+     @classmethod
+     def get_azure_config(cls) -> Dict[str, Any]:
+         """Get the Azure Storage configuration dictionary"""
+         return {
+             'connection_string': cls.AZURE_STORAGE_CONNECTION_STRING,
+             'account_name': cls.AZURE_STORAGE_ACCOUNT_NAME,
+             'account_key': cls.AZURE_STORAGE_ACCOUNT_KEY,
+             'container_name': cls.AZURE_CONTAINER_NAME
+         }
+
+     @classmethod
+     def get_storage_config(cls) -> Dict[str, str]:
+         """Get the storage configuration dictionary"""
+         return {
+             'container_name': cls.AZURE_CONTAINER_NAME,
+             'prefix': cls.AZURE_PREFIX,
+             'embeddings_folder': cls.AZURE_EMBEDDINGS_FOLDER
+         }
+
+     @classmethod
+     def get_flask_config(cls) -> Dict[str, Any]:
+         """Get the server configuration dictionary"""
+         return {
+             'host': cls.FLASK_HOST,
+             'port': cls.FLASK_PORT,
+             'debug': cls.FLASK_DEBUG
+         }
+
+ class DevelopmentConfig(Config):
+     """Development configuration"""
+     FLASK_DEBUG = True
+     INSIGHTFACE_CTX_ID = -1  # Use CPU for development
+
+ class ProductionConfig(Config):
+     """Production configuration"""
+     FLASK_DEBUG = False
+     INSIGHTFACE_CTX_ID = 0  # Use GPU for production
+     FLASK_SECRET_KEY = os.getenv('FLASK_SECRET_KEY', 'change-this-in-production')
+
+ class TestingConfig(Config):
+     """Testing configuration"""
+     FLASK_DEBUG = True
+     INSIGHTFACE_CTX_ID = -1
+     AZURE_CONTAINER_NAME = 'test-facematch-images'
+     USER_PREFERENCES_FILE = 'test_user_preferences.json'
+
+ # Configuration mapping
+ config_map = {
+     'development': DevelopmentConfig,
+     'production': ProductionConfig,
+     'testing': TestingConfig
+ }
+
+ def get_config(config_name: str = None) -> type:
+     """Get the configuration class for the current environment"""
+     if config_name is None:
+         config_name = os.getenv('FLASK_ENV') or 'development'
+     return config_map.get(config_name, DevelopmentConfig)
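The `config_map` lookup above is a common environment-keyed selection pattern. A standalone sketch of the same logic (the classes are re-declared here in miniature for illustration):

```python
import os

class Config: debug = False
class DevelopmentConfig(Config): debug = True
class ProductionConfig(Config): debug = False

config_map = {"development": DevelopmentConfig, "production": ProductionConfig}

def get_config(name=None):
    """Fall back to FLASK_ENV, then to the development config for unknown names."""
    if name is None:
        name = os.getenv("FLASK_ENV") or "development"
    return config_map.get(name, DevelopmentConfig)

assert get_config("production") is ProductionConfig
assert get_config("no-such-env") is DevelopmentConfig  # safe default
```

Returning the class (not an instance) keeps the settings read-only class attributes, which is how `handler.py` consumes them.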
handler.py ADDED
@@ -0,0 +1,354 @@
+ import os
+ import json
+ import tempfile
+ import time
+ import base64
+ from io import BytesIO
+ from typing import List, Dict, Any
+ from datetime import datetime, timedelta
+
+ import numpy as np
+ import cv2  # OpenCV for image decoding and colour conversion
+ import requests
+ from PIL import Image
+ from scipy.spatial.distance import cosine
+ from insightface.app import FaceAnalysis
+ from azure.storage.blob import BlobServiceClient
+
+ from config import get_config
+
+ class EndpointHandler:
+     def __init__(self, model_dir=None):
+         # Load configuration first so the InsightFace context follows it
+         config = get_config()
+         azure_config = config.get_azure_config()
+         storage_config = config.get_storage_config()
+
+         self.app = FaceAnalysis()
+         self.app.prepare(ctx_id=config.INSIGHTFACE_CTX_ID)  # 0 for GPU, -1 for CPU
+
+         # Initialize the Azure Blob Storage client
+         if azure_config['connection_string']:
+             self.blob_service_client = BlobServiceClient.from_connection_string(
+                 azure_config['connection_string']
+             )
+         else:
+             # Fall back to account name and key when no connection string is set
+             account_url = f"https://{azure_config['account_name']}.blob.core.windows.net"
+             self.blob_service_client = BlobServiceClient(
+                 account_url=account_url,
+                 credential=azure_config['account_key']
+             )
+
+         self.container_name = storage_config['container_name']
+         self.prefix = storage_config['prefix']
+         self.embeddings_folder = storage_config['embeddings_folder']
+
+         # Container client used for listing and downloading blobs
+         self.container_client = self.blob_service_client.get_container_client(self.container_name)
+
+     def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         try:
+             if "inputs" in data:
+                 return self.process_hf_input(data)
+             return self.process_json_input(data)
+         except Exception as e:
+             # ValueError (bad payload) and unexpected errors both surface as JSON
+             return {"error": str(e)}
+
+     def process_hf_input(self, hf_data):
+         """Process Hugging Face format input: {"inputs": {...}}."""
+         if "inputs" in hf_data:
+             return self.process_json_input(hf_data["inputs"])
+         return {"error": "Invalid Hugging Face JSON structure."}
+
+     def process_json_input(self, json_data):
+         if "query_images" in json_data and "gender" in json_data:
+             query_images = json_data["query_images"]
+             gender = json_data["gender"]
+             top_n = json_data.get("top_n", 5)
+             similar_images = self.find_similar_images_aggregate(query_images, gender, top_n)
+             return {"similar_images": similar_images}
+         elif json_data.get("extract_embeddings"):
+             self.extract_and_save_embeddings()
+             return {"status": "Embeddings extraction completed."}
+         else:
+             raise ValueError("Invalid JSON structure.")
+
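The two accepted payload shapes in the dispatch above differ only by the `{"inputs": ...}` envelope. A minimal sketch of that normalization (the function name is illustrative, not from the codebase):

```python
def normalize_payload(data):
    """Unwrap the Hugging Face {"inputs": {...}} envelope if present."""
    if isinstance(data, dict) and "inputs" in data:
        return data["inputs"]
    return data

direct = {"query_images": ["a.jpg"], "gender": "female", "top_n": 5}
wrapped = {"inputs": direct}

# Both shapes resolve to the same inner request
assert normalize_payload(direct) == normalize_payload(wrapped) == direct
```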
+     def load_embeddings_from_azure(self):
+         """Load existing embeddings from Azure Blob Storage, or return an empty list."""
+         try:
+             # The embeddings file lives under profile-media/embeddings/
+             blob_name = 'profile-media/embeddings/embeddings_db.json'
+             blob_client = self.container_client.get_blob_client(blob_name)
+             # Download straight into memory; no temporary file is needed
+             blob_bytes = blob_client.download_blob().readall()
+             return json.loads(blob_bytes)
+         except Exception as e:
+             print(f'Embeddings file not found in Azure, initializing a new one: {e}')
+             return []
+
+     def extract_and_save_embeddings(self):
+         """Extract embeddings from images and save them to Azure Blob Storage."""
+         embeddings_db = self.load_embeddings_from_azure()
+         now = datetime.utcnow()
+         thirty_days_ago = now - timedelta(days=30)
+
+         # Process images from both the profile-media and ai-images folders
+         folders_to_process = [
+             'profile-media/',
+             'ai-images/men/',
+             'ai-images/women/'
+         ]
+
+         for folder_prefix in folders_to_process:
+             try:
+                 print(f"Processing folder: {folder_prefix}")
+                 # List all blobs in the container with the current prefix
+                 blob_list = self.container_client.list_blobs(name_starts_with=folder_prefix)
+
+                 for blob in blob_list:
+                     blob_name = blob.name
+                     if not blob_name.endswith(('.jpg', '.jpeg', '.png')):
+                         continue
+
+                     image_url = f'https://{self.blob_service_client.account_name}.blob.core.windows.net/{self.container_name}/{blob_name}'
+                     existing_entry = next((item for item in embeddings_db if item['image_url'] == image_url), None)
+
+                     # Skip blobs whose embedding (or no-face marker) is still fresh
+                     # and whose content has not changed in the last thirty days
+                     if existing_entry:
+                         embedding_timestamp = datetime.fromisoformat(existing_entry['timestamp'])
+                         if (existing_entry.get('no_face_detected') or embedding_timestamp > thirty_days_ago) and blob.last_modified.replace(tzinfo=None) <= thirty_days_ago:
+                             continue
+
+                     print(f"Processing image: {blob_name}")
+                     try:
+                         # load_image_from_blob downloads the blob into memory,
+                         # so no temporary file is needed here
+                         blob_client = self.container_client.get_blob_client(blob_name)
+                         img = self.load_image_from_blob(blob_client)
+                         if img is None:
+                             print(f"Failed to read image: {blob_name}")
+                             continue
+
+                         faces = self.app.get(img)
+                         if len(faces) == 0:
+                             print(f"No face detected in: {blob_name}")
+                             no_face_entry = {
+                                 'image_url': image_url,
+                                 'no_face_detected': True,
+                                 'timestamp': now.isoformat()
+                             }
+                             if existing_entry:
+                                 existing_entry.update(no_face_entry)
+                             else:
+                                 embeddings_db.append(no_face_entry)
+                             continue
+
+                         face = faces[0]
+                         new_entry = {
+                             'embedding': face.embedding.tolist(),
+                             'gender': 'male' if face.gender == 1 else 'female',
+                             'image_url': image_url,
+                             'timestamp': now.isoformat()
+                         }
+                         if existing_entry:
+                             existing_entry.update(new_entry)
+                         else:
+                             embeddings_db.append(new_entry)
+
+                         print(f"Successfully processed: {blob_name} (gender: {new_entry['gender']})")
+                     except Exception as e:
+                         print(f"Error processing image {blob_name}: {e}")
+                         continue
+             except Exception as e:
+                 print(f"Error processing folder {folder_prefix}: {e}")
+                 continue
+
+         print(f"Total embeddings in database: {len(embeddings_db)}")
+
+         # Save the updated embeddings database back to Azure
+         try:
+             temp_json_path = os.path.join(tempfile.gettempdir(), f'embeddings_db_{int(time.time())}.json')
+             with open(temp_json_path, 'w') as temp_json_file:
+                 json.dump(embeddings_db, temp_json_file)
+
+             # Upload to Azure Blob Storage under profile-media/embeddings/
+             blob_name = 'profile-media/embeddings/embeddings_db.json'
+             blob_client = self.container_client.get_blob_client(blob_name)
+             with open(temp_json_path, 'rb') as data:
+                 blob_client.upload_blob(data, overwrite=True)
+             print(f"Embeddings saved to Azure: {blob_name}")
+
+             try:
+                 os.unlink(temp_json_path)
+             except OSError:
+                 pass  # Ignore cleanup errors
+         except Exception as e:
+             print(f"Error saving embeddings: {e}")
+
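The skip condition in the method above decides whether a cached entry can be reused. A simplified sketch of that staleness check (the real code additionally honours the `no_face_detected` marker; the function name is illustrative):

```python
from datetime import datetime, timedelta

def needs_refresh(entry_timestamp: str, blob_modified: datetime, now: datetime) -> bool:
    """Reuse an entry only if it is less than 30 days old AND the blob is unchanged."""
    thirty_days_ago = now - timedelta(days=30)
    fresh = datetime.fromisoformat(entry_timestamp) > thirty_days_ago
    unchanged = blob_modified <= thirty_days_ago
    return not (fresh and unchanged)

now = datetime(2024, 6, 30)
assert needs_refresh('2024-01-01T00:00:00', datetime(2024, 6, 29), now)      # stale entry
assert not needs_refresh('2024-06-20T00:00:00', datetime(2024, 1, 1), now)   # fresh entry, old blob
```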
+     def find_similar_images_aggregate(self, query_images: List[str], gender: str, top_n: int = 5) -> List[str]:
+         print(f"Debug: Starting similarity search with {len(query_images)} query images")
+         print(f"Debug: Looking for gender: {gender}, top_n: {top_n}")
+
+         # Load the embeddings database once, not once per query image
+         embeddings_db = self.load_embeddings_from_azure()
+         print(f"Debug: Total embeddings in database: {len(embeddings_db)}")
+
+         # Only match against images in the profile-media folder
+         profile_media_db = [item for item in embeddings_db if 'image_url' in item and 'profile-media' in item['image_url']]
+         print(f"Debug: Profile-media embeddings: {len(profile_media_db)}")
+
+         filtered_db = [item for item in profile_media_db if item.get('gender') == gender]
+         print(f"Debug: Filtered by gender '{gender}': {len(filtered_db)}")
+
+         similarities = {}
+         for i, image_input in enumerate(query_images):
+             print(f"Debug: Processing query image {i + 1}/{len(query_images)}: {image_input}")
+             try:
+                 # Accept a URL, a base64 data URI, or a local file path
+                 if image_input.startswith('http'):
+                     img = self.load_image_from_url(image_input)
+                 elif image_input.startswith('data:image/'):
+                     img = self.load_image_from_base64(image_input)
+                 else:
+                     img = cv2.imread(image_input)
+
+                 if img is None:
+                     print(f"Failed to load image: {image_input}")
+                     continue
+
+                 faces = self.app.get(img)
+                 if len(faces) == 0:
+                     print(f"Debug: No faces detected in query image {i + 1}")
+                     continue
+
+                 query_embedding = faces[0].embedding
+                 print(f"Debug: Successfully extracted face embedding from query image {i + 1}")
+
+                 if len(filtered_db) == 0:
+                     print(f"Debug: No embeddings found for gender '{gender}' in profile-media folder")
+                     print(f"Debug: Available genders in profile-media: {list({item['gender'] for item in profile_media_db if 'gender' in item})}")
+                     continue
+
+                 for item in filtered_db:
+                     similarity = 1 - cosine(query_embedding, np.array(item['embedding']))
+                     similarities.setdefault(item['image_url'], []).append(similarity)
+             except Exception as e:
+                 print(f"Error processing image input: {e}")
+                 # Return an empty list instead of an error dict
+                 return []
+
+         # Aggregate per-candidate scores across all query images and rank
+         print(f"Debug: Total similarities found: {len(similarities)}")
+         aggregated_similarities = [(np.mean(scores), url) for url, scores in similarities.items()]
+         aggregated_similarities.sort(reverse=True, key=lambda x: x[0])
+         result = [url for _, url in aggregated_similarities[:top_n]]
+         print(f"Debug: Returning {len(result)} recommendations")
+         return result
+
+     def find_similar_images_by_embedding(self, query_embedding: np.ndarray, gender: str = 'all', top_n: int = 10, excluded_images: List[str] = None) -> List[str]:
+         """Find similar images based on a given embedding vector."""
+         try:
+             # Load the embeddings database from Azure
+             embeddings_db = self.load_embeddings_from_azure()
+
+             # Only match against images in the profile-media folder
+             profile_media_db = [item for item in embeddings_db if 'image_url' in item and 'profile-media' in item['image_url']]
+
+             # Filter by gender if specified
+             if gender != 'all':
+                 filtered_db = [item for item in profile_media_db if item.get('gender') == gender]
+             else:
+                 filtered_db = [item for item in profile_media_db if 'embedding' in item]
+
+             # Filter out excluded images (e.g. the user's dislikes)
+             if excluded_images is not None:
+                 filtered_db = [item for item in filtered_db if item['image_url'] not in excluded_images]
+
+             similarities = []
+             for item in filtered_db:
+                 if 'embedding' in item and not item.get('no_face_detected', False):
+                     similarity = 1 - cosine(query_embedding, np.array(item['embedding']))
+                     similarities.append((similarity, item['image_url']))
+
+             # Sort by similarity and return the top matches
+             similarities.sort(reverse=True, key=lambda x: x[0])
+             return [url for _, url in similarities[:top_n]]
+         except Exception as e:
+             print(f"Error in find_similar_images_by_embedding: {e}")
+             return []
+
+     def load_image_from_url(self, url):
+         try:
+             response = requests.get(url, timeout=30)
+             response.raise_for_status()
+             image = np.array(Image.open(BytesIO(response.content)).convert('RGB'))
+             return cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
+         except Exception as e:
+             print(f"Error loading image from URL {url}: {e}")
+             return None
+
+     def load_image_from_blob(self, blob_client):
+         try:
+             blob_bytes = blob_client.download_blob().readall()
+             image = np.array(Image.open(BytesIO(blob_bytes)).convert('RGB'))
+             return cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
+         except Exception as e:
+             print(f"Error loading image from blob: {e}")
+             return None
+
+     def load_image_from_base64(self, base64_string):
+         try:
+             header, encoded = base64_string.split(',', 1)
+             data = base64.b64decode(encoded)
+             np_arr = np.frombuffer(data, np.uint8)
+             # Returns a BGR image, as expected by OpenCV and InsightFace
+             return cv2.imdecode(np_arr, cv2.IMREAD_COLOR)
+         except Exception as e:
+             print(f"Error decoding base64 image: {e}")
+             return None
+
+
+ # Module-level instance, required by Hugging Face Inference Endpoints
+ handler = EndpointHandler()
main.py ADDED
@@ -0,0 +1,176 @@
+ from fastapi import FastAPI, Request, HTTPException, Body
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.responses import HTMLResponse, JSONResponse
+ from pydantic import BaseModel, Field
+ from typing import List, Optional, Union
+ import uuid
+ import json
+ import os
+ from datetime import datetime
+ import numpy as np
+
+ # Reuse the module-level handler so the InsightFace model is loaded only once
+ from handler import handler as face_handler
+
+ app = FastAPI()
+
+ # Enable CORS
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # In-memory user sessions (stateless; reset on restart)
+ user_sessions = {}
+ USER_PREFERENCES_FILE = 'user_preferences.json'
+
+ # Pydantic model for recommendation requests
+ class RecommendationRequest(BaseModel):
+     query_images: List[str] = Field(..., description="List of Azure URLs for query images")
+     gender: Optional[str] = Field('all', description="Gender filter: 'male', 'female', or 'all'")
+     top_n: Optional[int] = Field(5, description="Number of recommendations to return")
+
+ # Pydantic model for the Hugging Face {"inputs": {...}} format
+ class HuggingFaceRequest(BaseModel):
+     inputs: RecommendationRequest
+
+ # Helper functions
+
+ def load_user_preferences():
+     if os.path.exists(USER_PREFERENCES_FILE):
+         with open(USER_PREFERENCES_FILE, 'r') as f:
+             return json.load(f)
+     return {}
+
+ def save_user_preferences(preferences):
+     with open(USER_PREFERENCES_FILE, 'w') as f:
+         json.dump(preferences, f, indent=2)
+
+ @app.get("/", response_class=HTMLResponse)
+ def index():
+     # Health check / welcome message
+     return "<h2>FaceMatch FastAPI is running!</h2>"
+
+ @app.post("/api/init_user")
+ def init_user():
+     user_id = str(uuid.uuid4())
+     user_sessions[user_id] = True
+     preferences = load_user_preferences()
+     if user_id not in preferences:
+         preferences[user_id] = {
+             'liked_images': [],
+             'disliked_images': [],
+             'preference_embedding': None,
+             'created_at': datetime.now().isoformat()
+         }
+     save_user_preferences(preferences)
+     return {"user_id": user_id, "status": "initialized"}
+
+ @app.get("/api/get_training_images")
+ def get_training_images():
+     try:
+         training_images = []
+         for gender_folder in ['men', 'women']:
+             gender_prefix = f'ai-images/{gender_folder}/'
+             blob_list = face_handler.container_client.list_blobs(name_starts_with=gender_prefix)
+             for blob in blob_list:
+                 if blob.name.endswith(('.jpg', '.jpeg', '.png')):
+                     image_url = f'https://{face_handler.blob_service_client.account_name}.blob.core.windows.net/{face_handler.container_name}/{blob.name}'
+                     training_images.append(image_url)
+         return {"training_images": training_images[:10], "status": "success"}
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e)})
+
+ @app.post("/api/record_preference")
+ async def record_preference(request: Request):
+     try:
+         data = await request.json()
+         user_id = data.get('user_id')
+         image_url = data.get('image_url')
+         preference = data.get('preference')
+         if not user_id or not image_url or not preference:
+             raise HTTPException(status_code=400, detail="Missing required parameters")
+         preferences = load_user_preferences()
+         if user_id not in preferences:
+             raise HTTPException(status_code=404, detail="User not found")
+         if preference == 'like':
+             if image_url not in preferences[user_id]['liked_images']:
+                 preferences[user_id]['liked_images'].append(image_url)
+         elif preference == 'dislike':
+             if image_url not in preferences[user_id]['disliked_images']:
+                 preferences[user_id]['disliked_images'].append(image_url)
+         save_user_preferences(preferences)
+         return {"status": "preference_recorded"}
+     except HTTPException:
+         # Re-raise so FastAPI returns the intended 4xx status, not a blanket 500
+         raise
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e)})
+
+ @app.post("/api/get_matches")
+ async def get_matches(request: Request):
+     try:
+         data = await request.json()
+         user_id = data.get('user_id')
+         gender = data.get('gender', 'all')
+         top_n = data.get('top_n', 10)
+         if not user_id:
+             raise HTTPException(status_code=400, detail="Missing user_id")
+         preferences = load_user_preferences()
+         if user_id not in preferences:
+             raise HTTPException(status_code=404, detail="User preferences not found")
+         user_prefs = preferences[user_id]
+         if user_prefs['liked_images']:
+             liked_embeddings = []
+             for image_url in user_prefs['liked_images']:
+                 try:
+                     img = face_handler.load_image_from_url(image_url)
+                     faces = face_handler.app.get(img)
+                     if len(faces) > 0:
+                         liked_embeddings.append(faces[0].embedding)
+                 except Exception:
+                     continue
+             if liked_embeddings:
+                 # The preference profile is the mean of all liked-face embeddings
+                 preference_embedding = np.mean(liked_embeddings, axis=0)
+                 user_prefs['preference_embedding'] = preference_embedding.tolist()
+                 save_user_preferences(preferences)
+                 similar_images = face_handler.find_similar_images_by_embedding(
+                     preference_embedding, gender, top_n, user_prefs['disliked_images']
+                 )
+                 return {"similar_images": similar_images}
+         return {"similar_images": []}
+     except HTTPException:
+         raise
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e)})
+
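The preference profile built in `get_matches` is simply the mean of the liked-face embeddings, ranked against candidates by cosine similarity. A minimal pure-Python sketch with toy 3-d vectors in place of the 512-d InsightFace embeddings (function names are illustrative):

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_by_preference(liked, candidates, top_n=2):
    """Average the liked embeddings into a profile, then rank candidates against it."""
    profile = [sum(dim) / len(liked) for dim in zip(*liked)]
    scored = sorted(candidates, key=lambda c: cosine_sim(profile, c[1]), reverse=True)
    return [url for url, _ in scored[:top_n]]

liked = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
candidates = [
    ("a.jpg", [0.9, 0.1, 0.0]),  # close to the averaged profile
    ("b.jpg", [0.0, 0.0, 1.0]),  # orthogonal to the profile
]
assert rank_by_preference(liked, candidates, top_n=1) == ["a.jpg"]
```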
+ @app.post("/api/get_recommendations")
+ async def get_recommendations(
+     body: Union[RecommendationRequest, HuggingFaceRequest] = Body(...)
+ ):
+     try:
+         # Accept both the direct format and the Hugging Face {"inputs": {...}} format
+         req = body.inputs if isinstance(body, HuggingFaceRequest) else body
+         query_images = req.query_images
+         gender = req.gender or 'all'
+         top_n = req.top_n or 5
+
+         if not query_images:
+             raise HTTPException(status_code=400, detail="No query images provided")
+
+         similar_images = face_handler.find_similar_images_aggregate(query_images, gender, top_n)
+         return {"similar_images": similar_images}
+     except HTTPException:
+         raise
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e)})
+
+ @app.post("/api/extract_embeddings")
+ def extract_embeddings():
+     # Admin endpoint; runs synchronously and may take a long time on large containers
+     try:
+         face_handler.extract_and_save_embeddings()
+         return {"status": "Embeddings extraction completed"}
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e)})
requirements.txt ADDED
@@ -0,0 +1,15 @@
+ azure-storage-blob
+ onnxruntime
+ insightface
+ opencv-python
+ flask
+ flask-cors
+ numpy
+ scipy
+ pillow
+ requests
+ scikit-learn
+ pandas
+ fastapi
+ uvicorn