---
language: en
license: mit
tags:
- classification
- lobbying
- linkedin
datasets: custom
---

# Lobbyist classifier (English, US)

Binary sequence classifier fine-tuned to predict whether a LinkedIn-style job position (title + employer + description) belongs to a **lobbyist** (1) or not (0). It was trained for the project "Who Becomes a Lobbyist?" (MINISTERIALLOBBY) on Revelio/LinkedIn position text, with labels from the German Bundestag lobby register (DE) or LobbyView (US).

- **Base model:** `distilbert-base-uncased`
- **Task:** Sequence classification (2 labels: non-lobbyist, lobbyist)
- **Max length:** 256 tokens

## Evaluation (5-fold CV)

- Mean F1: 0.8942 (± 0.0025)
- Fold F1 scores: [0.8954220915581689, 0.891170431211499, 0.8943089430894309, 0.8919135308246597, 0.8982282653481665]
- Training samples: 12834 (positive: 6417)
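
As a quick consistency check, the summary line can be recomputed from the fold scores listed above. This is a minimal sketch that assumes the ± value is a standard deviation across the five folds; the variable names are illustrative and not part of the training code.

```python
import statistics

# Fold F1 scores as reported in the evaluation list.
fold_f1 = [
    0.8954220915581689,
    0.891170431211499,
    0.8943089430894309,
    0.8919135308246597,
    0.8982282653481665,
]

mean_f1 = statistics.fmean(fold_f1)
# Population standard deviation (divides by n). Assumption: this is how the
# reported spread was computed; the model card does not say so explicitly.
std_f1 = statistics.pstdev(fold_f1)

# Share of positive (lobbyist) examples in the training data.
n_train, n_pos = 12834, 6417
pos_share = n_pos / n_train

print(f"Mean F1: {mean_f1:.4f} (± {std_f1:.4f})")
print(f"Positive share: {pos_share:.2%}")
```

With these inputs the population standard deviation rounds to the reported 0.0025 (the sample version, `statistics.stdev`, would give about 0.0028), and the positive share shows that the training set is evenly balanced.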

## Intended use

- Research: classify past or current job positions as lobbyist vs. non-lobbyist for career-path and panel analyses.
- Not for commercial use without checking compliance with LinkedIn/Revelio terms.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "cornelius/lobbyist-classifier-us"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # inference mode: disables dropout

THRESHOLD = 0.95  # cutoff on P(lobbyist) for a positive call

def predict(texts):
    """Return P(lobbyist) for each text as a NumPy array."""
    # Pad to the longest text in the batch; truncate at the 256-token limit.
    inp = tokenizer(texts, truncation=True, max_length=256, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inp).logits
    probs = torch.softmax(logits, dim=1)
    return probs[:, 1].numpy()  # column 1 = lobbyist

# Single position: title + " " + company + " " + description
text = "Senior Public Affairs Manager Acme Corp Government relations and advocacy."
prob = predict([text])[0]
print(f"P(lobbyist) = {prob:.2f}, above threshold: {prob >= THRESHOLD}")
```
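
For more than a handful of positions, it may help to build the input text consistently and score in batches. The sketch below is one possible way to do that, not part of the released code: `build_position_text`, `batched`, and `classify_positions` are hypothetical helpers, skipping empty fields is an assumption, and the 0.95 cutoff is the value that appears in the usage snippet. A function such as `predict` above can be passed as `predict_fn`; a stand-in scorer is used here so the example runs without downloading the model.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

def build_position_text(title: str, company: str, description: str) -> str:
    """Join title, company and description with single spaces.

    Assumption: empty or whitespace-only fields are skipped.
    """
    parts = [p.strip() for p in (title, company, description) if p and p.strip()]
    return " ".join(parts)

def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def classify_positions(
    texts: Sequence[str],
    predict_fn: Callable[[List[str]], Sequence[float]],
    threshold: float = 0.95,
    batch_size: int = 32,
) -> List[Tuple[float, int]]:
    """Return (probability, label) per text; label is 1 if probability >= threshold."""
    results = []
    for batch in batched(texts, batch_size):
        for p in predict_fn(list(batch)):
            p = float(p)
            results.append((p, int(p >= threshold)))
    return results

# Stand-in scorer for demonstration only; replace with the model-backed predict.
def fake_predict(texts: List[str]) -> List[float]:
    return [0.99 if "affairs" in t.lower() else 0.10 for t in texts]

texts = [
    build_position_text("Senior Public Affairs Manager", "Acme Corp",
                        "Government relations and advocacy."),
    build_position_text("Software Engineer", "Acme Corp", ""),
]
results = classify_positions(texts, fake_predict, batch_size=1)
print(results)
```

Keeping the threshold outside the scoring function makes it easy to rerun an analysis with a different cutoff without recomputing probabilities.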

## Citation

If you use this model, please cite the paper "Who Becomes a Lobbyist? Comparative Evidence from the US and Germany" (MINISTERIALLOBBY project, DFG).