---
language: de
license: mit
tags:
- classification
- lobbying
- linkedin
datasets: custom
---

# Lobbyist classifier (German)

Binary sequence classifier fine-tuned to predict whether a LinkedIn-style job position (title + employer + description) corresponds to a **lobbyist** (1) or not (0).

Trained for the project "Who Becomes a Lobbyist?" (MINISTERIALLOBBY) on Revelio/LinkedIn position text, with labels from the German Bundestag lobby register (DE) or LobbyView (US).

- **Base model:** `distilbert-base-german-cased`
- **Task:** Sequence classification (2 labels: non-lobbyist, lobbyist)
- **Max length:** 256 tokens

## Evaluation (5-fold CV)

- Mean F1: 0.8455 (± 0.0035)
- Fold F1 scores: [0.8467809952206916, 0.8434272955623779, 0.8514680483592401, 0.8410428931875525, 0.8445796460176991]
- Training samples: 17824 (positive: 8912)

## Intended use

- Research: classify past or current job positions as lobby vs non-lobby for career-path and panel analyses.
- Not for commercial use without checking compliance with LinkedIn/Revelio terms.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "cornelius/lobbyist-classifier-de"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)


def predict(texts, threshold=0.95):
    """Return P(lobbyist) for each text in `texts`.

    `threshold` is not applied inside this function; compare the returned
    probabilities against it to obtain hard 0/1 labels.
    """
    # Tokenize to the model's maximum length of 256 tokens.
    inp = tokenizer(
        texts,
        truncation=True,
        max_length=256,
        padding="max_length",
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inp).logits
    probs = torch.softmax(logits, dim=1)
    return probs[:, 1].numpy()  # column 1 = lobbyist


# Single position: title + " " + company + " " + description
text = "Senior Public Affairs Manager Acme Corp Government relations and advocacy."
prob = predict([text])[0]
print(f"P(lobbyist) = {prob:.2f}")
```

## Citation

If you use this model, please cite the paper "Who Becomes a Lobbyist? Comparative Evidence from the US and Germany" (MINISTERIALLOBBY project, DFG).
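
## Reproducing the evaluation summary

The mean and spread reported under Evaluation can be recomputed from the five fold scores. The sketch below uses only the Python standard library. The variable names are illustrative, and the choice of the population standard deviation (dividing by n) is an assumption: it is the variant that reproduces the reported ± 0.0035, whereas the sample standard deviation would give roughly 0.0039.

```python
from statistics import fmean, pstdev

# Fold F1 scores copied from the Evaluation section.
fold_f1 = [
    0.8467809952206916,
    0.8434272955623779,
    0.8514680483592401,
    0.8410428931875525,
    0.8445796460176991,
]

mean_f1 = fmean(fold_f1)
# Population standard deviation (divides by n); assumed, see the note above.
std_f1 = pstdev(fold_f1)

print(f"Mean F1: {mean_f1:.4f} (± {std_f1:.4f})")  # Mean F1: 0.8455 (± 0.0035)
```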
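
## Building inputs and applying a threshold

The usage snippet builds its input as title, company, and description joined by spaces, and its `predict` signature carries a default `threshold` of 0.95 without applying it. A minimal sketch of both steps follows. The helper names `build_text` and `to_labels` are hypothetical and not part of this repository, and it is an assumption that 0.95 is meant as the decision cutoff, that the comparison is inclusive (`>=`), and that empty fields should be skipped.

```python
def build_text(title, company, description):
    """Join the three position fields with single spaces.

    Skipping empty or missing fields is an assumption of this sketch.
    """
    parts = [title, company, description]
    return " ".join(p.strip() for p in parts if p and p.strip())


def to_labels(probs, threshold=0.95):
    """Map P(lobbyist) values to 1 (lobbyist) or 0 (non-lobbyist).

    The inclusive comparison (>=) is an assumption of this sketch.
    """
    return [int(p >= threshold) for p in probs]


text = build_text(
    "Senior Public Affairs Manager",
    "Acme Corp",
    "Government relations and advocacy.",
)
print(text)

# Illustrative probabilities, not real model outputs.
print(to_labels([0.99, 0.95, 0.60]))  # [1, 1, 0]
```

In practice the list passed to `to_labels` would be the array returned by `predict`. A cutoff as high as 0.95 favors precision over recall, so a lower value may suit analyses where missing lobbyists is costlier than false positives.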