Laura Wagner

Profession Categories for Deepfake Adapter Classification

Overview

The LLM annotation uses 9 specific profession categories to classify people found in deepfake adapter datasets. These categories cover the most common types of public figures targeted by deepfake technology.

The 9 Categories

1. actor

Film, TV, and theater performers who primarily work in scripted dramatic content.

Examples:

  • Emma Watson (film actor)
  • Scarlett Johansson (film actor)
  • Bryan Cranston (TV actor)

Keywords: actor, actress, film, movie, cinema, theatrical, performer, drama


2. adult performer

People working in the adult entertainment industry.

Examples:

  • Adult film actors
  • OnlyFans creators
  • Cam models

Keywords: adult entertainer, onlyfans, cam model, camgirl, webcam, pornographic actor, porn, pornstar

Note: This category is important for research on unauthorized deepfake usage, which disproportionately affects adult performers.


3. singer/musician

Vocalists, instrumentalists, and music performers across all genres.

Examples:

  • IU (K-pop singer)
  • Taylor Swift (singer/songwriter)
  • DJ Khaled (DJ/producer)

Keywords: singer, musician, rapper, rap artist, band, vocalist, songwriter, composer, DJ, producer, kpop, jpop


4. model

Fashion, runway, and photoshoot models.

Examples:

  • Gigi Hadid (fashion model)
  • Tyra Banks (supermodel)
  • Ashley Graham (plus-size model)

Keywords: model, fashion, runway, photoshoot, supermodel


5. online personality

Digital content creators, streamers, and influencers.

Includes:

  • Streamers (Twitch, YouTube)
  • Cosplayers
  • YouTubers
  • Instagram influencers
  • Content creators
  • E-girls/E-boys
  • Gaming personalities

Examples:

  • Belle Delphine (online personality, cosplayer)
  • Pokimane (Twitch streamer)
  • MrBeast (YouTuber)

Keywords: influencer, streamer, twitch, youtuber, youtube, content creator, instagrammer, instagram, e-girl, egirl, e-boy, eboy, cosplayer, gamer


6. public figure

Politicians, activists, journalists, authors, and other public-facing professionals not in entertainment.

Examples:

  • Barack Obama (politician)
  • Greta Thunberg (activist)
  • Gordon Ramsay (chef)
  • J.K. Rowling (author)

Keywords: politician, activist, public speaker, journalist, author, writer, chef, famous, celebrity (when not in entertainment)


7. voice actor/ASMR

Voice performers for animation, games, and ASMR content creators.

Examples:

  • Tara Strong (voice actress)
  • Troy Baker (voice actor)
  • ASMR artists

Keywords: voice actor, voice actress, asmr creator, asmr


8. sports professional

Professional athletes and sports competitors.

Examples:

  • Cristiano Ronaldo (soccer player)
  • Serena Williams (tennis player)
  • LeBron James (basketball player)

Keywords: athlete, sports, player, professional, competitor, olympian


9. tv personality

TV hosts, presenters, reality TV stars, and broadcast personalities.

Examples:

  • Oprah Winfrey (talk show host)
  • Kim Kardashian (reality TV)
  • Jimmy Fallon (late night host)

Keywords: tv host, tv moderator, talk show host, talkshow, radio host, media personality, reality tv star, reality, comedian, presenter, broadcaster, anchor
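The keyword lists above could also serve as a deterministic fallback or sanity check alongside the LLM. This sketch is an assumption and not part of the described pipeline. It copies the keyword lists into a hypothetical `CATEGORY_KEYWORDS` dict and matches whole words case-insensitively. It tries longer keywords first, so "voice actor" maps to voice actor/ASMR instead of actor, and "cam model" maps to adult performer instead of model. "celebrity" is left out because the source restricts it to people outside entertainment, and a keyword match cannot check that.

```python
from __future__ import annotations

import re

# ASSUMPTION: illustrative keyword fallback; lists copied from the category
# descriptions above ("celebrity" omitted: it only applies outside entertainment).
CATEGORY_KEYWORDS = {
    "actor": ["actor", "actress", "film", "movie", "cinema", "theatrical", "performer", "drama"],
    "adult performer": ["adult entertainer", "onlyfans", "cam model", "camgirl", "webcam",
                        "pornographic actor", "porn", "pornstar"],
    "singer/musician": ["singer", "musician", "rapper", "rap artist", "band", "vocalist",
                        "songwriter", "composer", "dj", "producer", "kpop", "jpop"],
    "model": ["model", "fashion", "runway", "photoshoot", "supermodel"],
    "online personality": ["influencer", "streamer", "twitch", "youtuber", "youtube",
                           "content creator", "instagrammer", "instagram", "e-girl", "egirl",
                           "e-boy", "eboy", "cosplayer", "gamer"],
    "public figure": ["politician", "activist", "public speaker", "journalist", "author",
                      "writer", "chef", "famous"],
    "voice actor/ASMR": ["voice actor", "voice actress", "asmr creator", "asmr"],
    "sports professional": ["athlete", "sports", "player", "professional", "competitor",
                            "olympian"],
    "tv personality": ["tv host", "tv moderator", "talk show host", "talkshow", "radio host",
                       "media personality", "reality tv star", "reality", "comedian",
                       "presenter", "broadcaster", "anchor"],
}

# Reverse index: keyword -> category (no keyword appears in two lists).
KEYWORD_TO_CATEGORY = {kw: cat for cat, kws in CATEGORY_KEYWORDS.items() for kw in kws}

# Longest keywords first, so "voice actor" beats "actor" and "cam model" beats "model";
# \b restricts matches to whole words ("band" does not match inside "husband").
_ALTERNATIVES = sorted(KEYWORD_TO_CATEGORY, key=len, reverse=True)
KEYWORD_PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, _ALTERNATIVES)) + r")\b")


def categories_from_text(description: str, limit: int = 3) -> list[str]:
    """Return up to `limit` categories, in order of first keyword occurrence."""
    found: list[str] = []
    for match in KEYWORD_PATTERN.finditer(description.lower()):
        category = KEYWORD_TO_CATEGORY[match.group(0)]
        if category not in found:
            found.append(category)
    return found[:limit]
```

For example, `categories_from_text("K-pop singer and TV drama actress")` returns `["singer/musician", "actor"]`. The result follows text order rather than relevance, which is one reason the LLM remains the primary classifier.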


Multi-Category Classification

Many public figures work across multiple categories. The LLM can assign up to 3 professions per person, ordered by relevance.

Examples:

| Person | Professions | Explanation |
|---|---|---|
| Emma Watson | actor, public figure | Primarily an actor, but also known for activism |
| IU (Lee Ji-eun) | singer/musician, actor | K-pop singer who also acts in TV dramas |
| Belle Delphine | online personality, adult performer | Internet personality with adult content |
| Jamie Foxx | actor, singer/musician | Actor who also has a music career |
| Dwayne Johnson | actor, sports professional | Former wrestler, now primarily an actor |

Category Selection Guidelines

For the LLM:

  1. Choose the most specific category first

    • "actor" for film performers
    • "actor", not "public figure", for someone primarily known as an actor
  2. Order by relevance

    • Most important role first
    • Secondary roles after
    • Maximum 3 categories
  3. Be inclusive for online personalities

    • Streamers → "online personality"
    • Cosplayers → "online personality"
    • Influencers → "online personality"
  4. Distinguish TV personalities from actors

    • "tv personality" for talk show hosts
    • "actor" for scripted TV drama performers
    • Some can be both!

Why These 9 Categories?

These categories were chosen based on:

  1. Common targets of deepfake technology

    • Celebrities and public figures are most frequently deepfaked
    • Adult performers are disproportionately affected
  2. Clear distinctions

    • Each category represents a distinct professional domain
    • Minimal overlap between categories
  3. Research relevance

    • Important for analyzing demographic patterns in deepfake usage
    • Helps understand which professions are most at risk
  4. Comprehensive coverage

    • Covers the vast majority of people in the deepfake adapter dataset
    • Includes both traditional and digital-native celebrities

Output Format

The LLM returns professions as a comma-separated list:

"singer/musician, actor"
"online personality"
"actor, public figure, online personality"
"sports professional"
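Downstream code has to turn this string back into a list and enforce the rules above. This is a minimal sketch, assuming the LLM answer arrives as a plain string. It keeps the LLM's relevance order, drops duplicates and labels outside the 9 categories, caps the list at 3, and treats "Unknown" (see Edge Cases) as no category. `parse_professions` and `MAX_CATEGORIES` are illustrative names, not part of the existing pipeline.

```python
from __future__ import annotations

# The 9 category labels exactly as defined in this document.
CATEGORIES = [
    "actor", "adult performer", "singer/musician", "model", "online personality",
    "public figure", "voice actor/ASMR", "sports professional", "tv personality",
]
MAX_CATEGORIES = 3  # "up to 3 professions per person"

# Case-insensitive lookup that maps back to the canonical spelling.
_CANONICAL = {label.lower(): label for label in CATEGORIES}


def parse_professions(raw: str) -> list[str]:
    """Turn an LLM answer such as "singer/musician, actor" into a validated list.

    Keeps the LLM's relevance order, drops duplicates and labels outside the
    9 categories, caps the result at MAX_CATEGORIES, and maps "Unknown" to [].
    """
    cleaned = raw.strip().strip('"')
    if cleaned.lower() == "unknown":
        return []
    result: list[str] = []
    for part in cleaned.split(","):
        label = _CANONICAL.get(part.strip().lower())
        if label is not None and label not in result:
            result.append(label)
    return result[:MAX_CATEGORIES]
```

Silently dropping off-list labels keeps the analysis clean. A stricter pipeline might log them or re-prompt instead, because a stray label such as "chef" usually means the LLM ignored the category list.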

Validation

After annotation, we can analyze:

  • Distribution across categories
  • Multi-category patterns
  • Correlation with countries/regions
  • Demographic patterns

Example analysis:

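```python
import pandas as pd

# df: annotated DataFrame, one row per person, 'profession_llm' like "singer/musician, actor"
# df = pd.read_csv("annotations.csv")  # hypothetical path
cats = df['profession_llm'].str.split(r',\s*', regex=True)  # list of categories per row

# Count by individual category (a multi-category person counts once per category)
cats.explode().value_counts()

# Most common combinations (order matters: "actor, public figure" != "public figure, actor")
df['profession_llm'].value_counts().head(20)

# Filter by exact category; str.contains('actor') would also match "voice actor/ASMR"
actors = df[cats.apply(lambda c: isinstance(c, list) and 'actor' in c)]
```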
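The pandas snippet above does not cover multi-category patterns or correlation with countries. The following plain-Python sketch assumes the profession strings follow the Output Format. It also assumes that a country annotation exists in a column called `country_llm` here, which is a hypothetical name. `pair_counts` counts which categories co-occur on the same person, and `category_by_country` tallies (country, category) pairs. Both accept any iterable, including a pandas Series.

```python
from collections import Counter
from itertools import combinations


def split_categories(value) -> list:
    """'actor, public figure' -> ['actor', 'public figure']; missing or "Unknown" -> []."""
    if not isinstance(value, str) or value.strip().lower() == "unknown":
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def pair_counts(values) -> Counter:
    """Count unordered category pairs that occur together on the same person."""
    counts: Counter = Counter()
    for value in values:
        cats = sorted(set(split_categories(value)))  # sort so (a, b) and (b, a) merge
        counts.update(combinations(cats, 2))
    return counts


def category_by_country(values, countries) -> Counter:
    """Tally (country, category); a multi-category person counts once per category."""
    table: Counter = Counter()
    for value, country in zip(values, countries):
        for cat in set(split_categories(value)):
            table[(country, cat)] += 1
    return table
```

Typical calls would be `pair_counts(df['profession_llm']).most_common(10)` and `category_by_country(df['profession_llm'], df['country_llm'])`.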

Edge Cases

Fictional Characters

Handling: Return "Unknown" for all fields

Example:

  • Input: "Elsa from Frozen"
  • Output: profession_llm: "Unknown"

Unclear/Ambiguous Cases

Handling: Use best judgment based on primary public recognition

Example:

  • Input: "Elon Musk"
  • Primary recognition: Business/technology
  • Category: "public figure"

Multiple Equally Important Roles

Handling: List all relevant categories (up to 3)

Example:

  • Input: "Donald Glover / Childish Gambino"
  • Categories: "actor, singer/musician, tv personality"

Notes for Researchers

When analyzing the annotated data:

  1. Category frequency indicates which professions are most targeted
  2. Multi-category entries show crossover between industries
  3. Online personality prevalence indicates the rise of digital-native celebrities in deepfakes
  4. Adult performer numbers highlight ongoing issues with non-consensual deepfakes
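Points 1, 3 and 4 compare categories, and point 2 concerns multi-category entries. Because one person can hold up to 3 categories, per-person shares are easier to compare than raw counts. This is a minimal sketch that assumes the Output Format above and excludes missing and "Unknown" rows from the denominator. The hypothetical `research_indicators` returns the share of annotated people in each category and the share of people with more than one category.

```python
from collections import Counter


def split_categories(value) -> list:
    """'actor, public figure' -> ['actor', 'public figure']; missing or "Unknown" -> []."""
    if not isinstance(value, str) or value.strip().lower() == "unknown":
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def research_indicators(values):
    """Return (share of people per category, share of people with >1 category)."""
    people = [split_categories(v) for v in values]
    people = [cats for cats in people if cats]  # exclude missing and "Unknown" rows
    if not people:
        return {}, 0.0
    n = len(people)
    # set() so a duplicated label on one person is counted once
    per_category = Counter(cat for cats in people for cat in set(cats))
    shares = {cat: count / n for cat, count in per_category.items()}
    multi_share = sum(len(set(cats)) > 1 for cats in people) / n
    return shares, multi_share
```

With the annotated data this would be called as `shares, multi = research_indicators(df['profession_llm'])`.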

Version History

  • v1.0 (Current): Initial 9-category system
    • Refined from original detailed profession list
    • Focused on clear, distinct categories
    • Added comprehensive documentation