---
language: en
license: mit
tags:
- classification
- lobbying
- linkedin
datasets: custom
---
# Lobbyist classifier (English (US))
Binary sequence classifier fine-tuned to predict whether a LinkedIn-style job position (title + employer + description) corresponds to a **lobbyist** (1) or not (0). Trained for the project "Who Becomes a Lobbyist?" (MINISTERIALLOBBY) on Revelio/LinkedIn position text, with labels from the German Bundestag lobby register (DE) or LobbyView (US).
- **Base model:** `distilbert-base-uncased`
- **Task:** Sequence classification (2 labels: non-lobbyist, lobbyist)
- **Max length:** 256 tokens
## Evaluation (5-fold CV)
- Mean F1: 0.8942 (± 0.0025)
- Fold F1 scores: [0.8954220915581689, 0.891170431211499, 0.8943089430894309, 0.8919135308246597, 0.8982282653481665]
- Training samples: 12,834 (6,417 positive, i.e. balanced classes)
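
The summary line can be reproduced from the fold scores listed above. This is a minimal sketch, assuming the ± value is the population standard deviation across the five folds. The card does not say which estimator was used; this choice matches the reported 0.0025, whereas the sample standard deviation would round to 0.0028.

```python
from statistics import mean, pstdev

# Fold F1 scores as reported in the Evaluation section.
fold_f1 = [
    0.8954220915581689,
    0.891170431211499,
    0.8943089430894309,
    0.8919135308246597,
    0.8982282653481665,
]

mean_f1 = mean(fold_f1)
# Population standard deviation (divides by n); assumed, not confirmed by the card.
std_f1 = pstdev(fold_f1)

print(f"Mean F1: {mean_f1:.4f} (± {std_f1:.4f})")
```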
## Intended use
- Research: classify past or current job positions as lobby vs non-lobby for career-path and panel analyses.
- Not for commercial use without checking compliance with LinkedIn/Revelio terms.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
repo_id = "cornelius/lobbyist-classifier-us"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
def predict(texts, threshold=0.95):
    # Note: `threshold` is not applied here; the function returns raw probabilities.
    inp = tokenizer(texts, truncation=True, max_length=256, padding="max_length", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inp).logits
    probs = torch.softmax(logits, dim=1)
    return probs[:, 1].numpy()  # prob lobbyist
# Single position: title + " " + company + " " + description
text = "Senior Public Affairs Manager Acme Corp Government relations and advocacy."
prob = predict([text])[0]
print(f"P(lobbyist) = {prob:.2f}")
```
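
The `predict` function above accepts a `threshold` argument but returns probabilities without applying it. The following sketch shows one way to build the input text and turn probabilities into 0/1 labels. Both helpers (`build_position_text`, `to_labels`) are hypothetical names that are not part of the model repository. Skipping empty fields and using `>=` for the comparison are assumptions, not documented behavior.

```python
def build_position_text(title, company, description):
    """Join title, company and description with single spaces.

    Empty or missing fields are skipped (assumed handling).
    """
    parts = [title, company, description]
    return " ".join(p.strip() for p in parts if p and p.strip())


def to_labels(probs, threshold=0.95):
    """Map P(lobbyist) values to 1 (lobbyist) or 0 (non-lobbyist)."""
    return [int(p >= threshold) for p in probs]


text = build_position_text(
    "Senior Public Affairs Manager",
    "Acme Corp",
    "Government relations and advocacy.",
)
# Illustrative probabilities; in practice pass the output of predict([...]).
labels = to_labels([0.99, 0.40, 0.95])
```

A high threshold such as 0.95 trades recall for precision; the value suited to a given analysis is best checked against held-out labels.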
## Citation
If you use this model, please cite the paper "Who Becomes a Lobbyist? Comparative Evidence from the US and Germany" (MINISTERIALLOBBY project, DFG).