Spaces:
Runtime error
Runtime error
Gradio app
Browse files- .gitignore +4 -0
- README.md +14 -3
- actors_matching/__init__.py +0 -0
- actors_matching/api.py +38 -0
- app.py +58 -0
- images/example_hannibal_barca.jpg +0 -0
- images/example_joan_of_arc.jpg +0 -0
- images/example_marie_curie.jpg +0 -0
- images/example_scipio_africanus.jpg +0 -0
- models/actors_annoy_index.ann +0 -0
- models/actors_annoy_metadata.json +1 -0
- models/actors_mapping.json +0 -0
- requirements.txt +3 -1
.gitignore
CHANGED
|
@@ -1,9 +1,13 @@
|
|
|
|
|
|
|
|
|
|
|
| 1 |
# data files from imdb
|
| 2 |
data/title.*.tsv*
|
| 3 |
data/name.*.tsv*
|
| 4 |
|
| 5 |
# Byte-compiled / optimized / DLL files
|
| 6 |
__pycache__/
|
|
|
|
| 7 |
*.py[cod]
|
| 8 |
*$py.class
|
| 9 |
|
|
|
|
| 1 |
+
# IDE
|
| 2 |
+
.vscode
|
| 3 |
+
|
| 4 |
# data files from imdb
|
| 5 |
data/title.*.tsv*
|
| 6 |
data/name.*.tsv*
|
| 7 |
|
| 8 |
# Byte-compiled / optimized / DLL files
|
| 9 |
__pycache__/
|
| 10 |
+
*/__pycache__/
|
| 11 |
*.py[cod]
|
| 12 |
*$py.class
|
| 13 |
|
README.md
CHANGED
|
@@ -19,7 +19,18 @@ Note that due to API limits, I only took images from 1,000 actors.
|
|
| 19 |
|
| 20 |
The application is built with Gradio and deployed on HuggingFace Space. In the background, it uses:
|
| 21 |
|
| 22 |
-
1. The [`face_recognition` library](https://github.com/ageitgey/face_recognition) to compute an embedding of
|
| 23 |
-
2. Spotify's `annoy` library to efficiently search the closest actors based on the
|
| 24 |
-
3. Show you
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 25 |
|
|
|
|
| 19 |
|
| 20 |
The application is built with Gradio and deployed on HuggingFace Space. In the background, it uses:
|
| 21 |
|
| 22 |
+
1. The [`face_recognition` library](https://github.com/ageitgey/face_recognition) to extract the location of faces in the image and compute an embedding of these faces
|
| 23 |
+
2. Spotify's `annoy` library to efficiently search the closest actors based on the face embedding and a small database of actors' faces embeddings.
|
| 24 |
+
3. Show you the best matches!
|
| 25 |
+
|
| 26 |
+
This is meant to be a fun and tiny application. There are known issues and biases.
|
| 27 |
+
|
| 28 |
+
## Known biases and limitations
|
| 29 |
+
|
| 30 |
+
There are a few issues with the dataset and models used:
|
| 31 |
+
|
| 32 |
+
- The dataset of actors is limited to a couple thousand actors and actresses and is therefore not representative of the richness of professionals out there
|
| 33 |
+
- The subset of actors and actresses selected is based on an aggregated metric that considers all movies and shows in which the person was listed as an actor/actress. It is the weighted sum of the number of IMDb votes for each movie/show, weighted by the average IMDb score. This is obviously only a rough indicator of popularity, but it provided me with a quick way of getting a dataset with actors that people may know.
|
| 34 |
+
- Given the above, the database sampling will have several biases that are intrinsic to (a) the IMDb database and user base itself which is biased towards western/American movies, (b) the movie industry itself with a dominance of white male actors
|
| 35 |
+
- The pictures of actors and actresses were collected through a simple Bing Search and not manually verified, so there are several mistakes. For example, Graham Greene has a mix of pictures of Graham Greene, the Canadian actor, and Graham Greene, the writer. You may get surprising results from time to time! Let me know if you find mistakes.
|
| 36 |
|
actors_matching/__init__.py
ADDED
|
File without changes
|
actors_matching/api.py
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import face_recognition
|
| 2 |
+
import json
|
| 3 |
+
import annoy
|
| 4 |
+
from typing import Tuple
|
| 5 |
+
|
| 6 |
+
EMBEDDING_DIMENSION=128
|
| 7 |
+
ANNOY_INDEX_FILE = "models/actors_annoy_index.ann"
|
| 8 |
+
ANNOY_METADATA_FILE = "models/actors_annoy_metadata.json"
|
| 9 |
+
ANNOY_MAPPING_FILE = "models/actors_mapping.json"
|
| 10 |
+
|
| 11 |
+
def load_annoy_index(
    index_file=ANNOY_INDEX_FILE,
    metadata_file=ANNOY_METADATA_FILE,
    mapping_file=ANNOY_MAPPING_FILE,
) -> Tuple[annoy.AnnoyIndex, dict]:
    """Load the annoy index and the item-id -> actor mapping.

    The metadata JSON provides the keyword arguments (e.g. the distance
    metric) the index was built with; the mapping JSON maps annoy item
    ids (stored as strings) to actor records, so keys are cast to int.

    Returns:
        A tuple (annoy_index, mapping) where mapping is keyed by int ids.
    """
    with open(metadata_file) as meta_fh:
        metadata = json.load(meta_fh)

    # The index must be constructed with the same settings it was built with.
    index = annoy.AnnoyIndex(f=EMBEDDING_DIMENSION, **metadata)
    index.load(index_file)

    with open(mapping_file) as map_fh:
        raw_mapping = json.load(map_fh)
    # JSON object keys are strings; annoy returns int ids, so convert.
    mapping = {int(item_id): actor for item_id, actor in raw_mapping.items()}
    return index, mapping
|
| 27 |
+
|
| 28 |
+
def analyze_image(image, annoy_index, n_matches: int = 1, num_jitters: int = 1, model: str = "large"):
    """Locate faces in the image, embed them, and find the closest actors.

    Args:
        image: image array accepted by face_recognition.
        annoy_index: a loaded annoy index of actor face embeddings.
        n_matches: number of nearest neighbours to return per face.
        num_jitters: passed through to face_recognition.face_encodings.
        model: embedding model name passed to face_recognition.

    Returns:
        One dict per detected face with keys "embeddings", "matches",
        "distances", and "face_locations".
    """
    locations = face_recognition.face_locations(image)
    encodings = face_recognition.face_encodings(
        image,
        num_jitters=num_jitters,
        model=model,
        known_face_locations=locations,
    )

    # One result record per face, pairing each embedding with its location.
    results = []
    for encoding, location in zip(encodings, locations):
        neighbor_ids, neighbor_distances = annoy_index.get_nns_by_vector(
            encoding, n_matches, include_distances=True
        )
        results.append(
            dict(
                embeddings=encoding,
                matches=neighbor_ids,
                distances=neighbor_distances,
                face_locations=location,
            )
        )
    return results
|
app.py
ADDED
|
@@ -0,0 +1,58 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import gradio as gr
|
| 2 |
+
import numpy as np
|
| 3 |
+
from actors_matching.api import analyze_image, load_annoy_index
|
| 4 |
+
|
| 5 |
+
annoy_index, actors_mapping = load_annoy_index()
|
| 6 |
+
|
| 7 |
+
def get_image_html(actor: dict):
    """Build an HTML card showing an actor's picture, name, and IMDb link.

    Args:
        actor: mapping with keys "url" (image URL), "name" (display name),
            and "nconst" (IMDb name identifier, e.g. "nm0000123").

    Returns:
        An HTML snippet (str) suitable for one carousel entry.
    """
    url = actor["url"]
    name = actor["name"]
    imdb_url = f"https://www.imdb.com/name/{actor['nconst']}/"
    # Fixed: the href attribute value is now quoted and the anchor is
    # closed with a valid </a> tag (was an invalid `</>`).
    return f'''
    <div style="position: relative; text-align: center; color: white;">
        <img src="{url}" alt="{name} matches the input image" style="height: 500px">
        <div style="padding: 0.2em; position: absolute; bottom: 16px; left: 16px; background-color: #aacccccc; font-size: 2em;">
            <p>{name}</p>
            <p style="font-size:0.5em"><a href="{imdb_url}" target="_blank">Click to see on IMDb</a></p>
        </div>
    </div>
    '''
|
| 20 |
+
|
| 21 |
+
def get_best_matches(image, n_matches: int):
    """Run the face-matching pipeline against the module-level annoy index."""
    results = analyze_image(image, annoy_index=annoy_index, n_matches=n_matches)
    return results
|
| 23 |
+
|
| 24 |
+
def find_matching_actors(input_img, title, n_matches: int = 10):
    """Gradio handler: return HTML cards for the actors matching the image.

    Args:
        input_img: image array from the Gradio image input.
        title: optional caption typed by the user (currently unused; kept
            because the UI declares a Textbox input).
        n_matches: number of matching actors to retrieve.

    Returns:
        A list of HTML snippets, one per matching actor, for the carousel.
    """
    best_matches_list = get_best_matches(input_img, n_matches=n_matches)

    # Guard against images with no detectable face: indexing [0] on an
    # empty result previously raised IndexError. Show a message instead.
    if not best_matches_list:
        return ["<p>No face could be detected in this image, please try another one.</p>"]

    best_matches = best_matches_list[0]  # TODO: allow looping through characters

    # Build one HTML card per matching actor.
    output_htmls = []
    for match in best_matches["matches"]:
        actor = actors_mapping[match]
        output_htmls.append(get_image_html(actor))

    return output_htmls
|
| 37 |
+
|
| 38 |
+
# Gradio UI wiring: image + optional caption in, carousel of actor cards out.
_DESCRIPTION = """Who is the best person to play a movie about you? Upload a picture and find out!
Or maybe you'd like to know who would best interpret your favorite historical character?
Give it a shot or try one of the sample images below."""

# Sample images of historical figures shipped with the app.
_EXAMPLES = [
    ["images/example_marie_curie.jpg", "Marie Curie"],
    ["images/example_hannibal_barca.jpg", "Hannibal (the one with the elephants...)"],
    ["images/example_scipio_africanus.jpg", "Scipio Africanus"],
    ["images/example_joan_of_arc.jpg", "Jeanne d'Arc"],
]

iface = gr.Interface(
    find_matching_actors,
    title="Which actor or actress looks like you?",
    description=_DESCRIPTION,
    inputs=[
        gr.inputs.Image(shape=(256, 256), label="Your image"),
        gr.inputs.Textbox(label="Who's that?", placeholder="Optional, you can leave this blank"),
        # gr.inputs.Slider(minimum=1, maximum=10, step=1, default=5, label="Number of matches"),
    ],
    outputs=gr.outputs.Carousel(gr.outputs.HTML(), label="Matching actors & actresses"),
    examples=_EXAMPLES,
)

iface.launch()
|
images/example_hannibal_barca.jpg
ADDED
|
images/example_joan_of_arc.jpg
ADDED
|
images/example_marie_curie.jpg
ADDED
|
images/example_scipio_africanus.jpg
ADDED
|
models/actors_annoy_index.ann
ADDED
|
Binary file (1.52 MB). View file
|
|
|
models/actors_annoy_metadata.json
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
{"metric": "angular"}
|
models/actors_mapping.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
requirements.txt
CHANGED
|
@@ -1,7 +1,9 @@
|
|
| 1 |
-
#
|
| 2 |
cmake # required for dlib (used by face_recognition)
|
| 3 |
face_recognition
|
| 4 |
annoy
|
|
|
|
|
|
|
| 5 |
|
| 6 |
# Preprocessing
|
| 7 |
microsoft-bing-imagesearch
|
|
|
|
| 1 |
+
# App
|
| 2 |
cmake # required for dlib (used by face_recognition)
|
| 3 |
face_recognition
|
| 4 |
annoy
|
| 5 |
+
matplotlib
|
| 6 |
+
gradio
|
| 7 |
|
| 8 |
# Preprocessing
|
| 9 |
microsoft-bing-imagesearch
|