Automated Profile Inference with Language Model Agents
Paper • 2505.12402 • Published
Fine-tune Qwen2.5-7B-Instruct on SynthPAI for personal attribute inference from online text.
pip install huggingface_hub
huggingface-cli login
# Download training script
wget https://huggingface.co/AhmedSohair/synthpai-training/resolve/main/train_synthpai.py
# Launch via HF Jobs API (uses YOUR credits, not org credits)
huggingface-cli jobs run train_synthpai.py \
--namespace AhmedSohair \
--hardware a100-large \
--timeout 6h \
--dependencies transformers trl torch datasets trackio accelerate peft bitsandbytes huggingface_hub
pip install transformers trl torch datasets trackio accelerate peft bitsandbytes huggingface_hub
huggingface-cli login
wget https://huggingface.co/AhmedSohair/synthpai-training/resolve/main/train_synthpai.py
python train_synthpai.py
Requirements: a GPU with 24 GB or more of VRAM (e.g. L4, A10G, A100).
wget https://huggingface.co/AhmedSohair/synthpai-training/resolve/main/evaluate_synthpai.py
python evaluate_synthpai.py --model AhmedSohair/synthpai-attribute-inference-7b --split test --mode per_comment
| Component | Details |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | SFT + LoRA (r=64, all-linear, RSLoRA) |
| Dataset | RobinSta/SynthPAI — 7,823 comments, 300 authors |
| Training examples | |
| Attributes | age, sex, city/country, birth city/country, education, income level, occupation, relationship status |
| Split | Author-level: 240 train / 30 val / 30 test (no author leakage) |
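
The author-level split in the table above can be illustrated with a minimal sketch. It assumes each comment carries an author ID; the record layout, the helper name `split_by_author`, and the seed are hypothetical and are not taken from `train_synthpai.py`. The point is that authors, not comments, are partitioned, so no author's comments appear in more than one split.

```python
import random

def split_by_author(author_ids, n_val=30, n_test=30, seed=0):
    """Partition unique author IDs into disjoint train/val/test sets."""
    authors = sorted(set(author_ids))      # deterministic order before shuffling
    rng = random.Random(seed)              # hypothetical seed, for reproducibility
    rng.shuffle(authors)
    test = set(authors[:n_test])
    val = set(authors[n_test:n_test + n_val])
    train = set(authors[n_test + n_val:])  # the remaining authors
    return train, val, test

# Hypothetical stand-in data: 7,823 comments spread over 300 authors.
comments = [(f"author_{i % 300}", f"comment {i}") for i in range(7823)]

train_authors, val_authors, test_authors = split_by_author(a for a, _ in comments)

# Each comment follows its author into exactly one split.
train_rows = [c for c in comments if c[0] in train_authors]
val_rows = [c for c in comments if c[0] in val_authors]
test_rows = [c for c in comments if c[0] in test_authors]
```

Splitting at the comment level instead would let the model see other comments from a test author during training, which would likely inflate the evaluation scores.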
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Effective batch size | 16 (2 × 8 grad accum) |
| Epochs | 3 |
| Max seq length | 1024 |
| Packing | True |
| Precision | bf16 |
| LoRA rank | 64 |
| LoRA alpha | 16 |
| Target modules | all-linear |
| RSLoRA | True |
| Scheduler | cosine |
| Warmup | 5% |
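
Two values in the table above are derived quantities, and a short calculation shows how they follow from the others. This is a sketch under two assumptions: training runs on a single device, and RSLoRA uses the rank-stabilized scaling `alpha / sqrt(r)` in place of the standard `alpha / r`. The variable names are illustrative only.

```python
import math

# Effective batch size = per-device batch x gradient accumulation x devices.
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single-GPU run
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)

# LoRA scales the low-rank update by a factor that depends on alpha and r.
lora_rank = 64
lora_alpha = 16
standard_scaling = lora_alpha / lora_rank           # standard LoRA: alpha / r
rslora_scaling = lora_alpha / math.sqrt(lora_rank)  # rank-stabilized: alpha / sqrt(r)

print(effective_batch_size, standard_scaling, rslora_scaling)
```

With rank 64 and alpha 16, the rank-stabilized factor is 2.0 rather than 0.25, so the adapter update is not shrunk as the rank grows.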
Source: AutoProfiler Table 5, FTI column (Staab et al. 2024)
| Attribute | FTI (GPT-4) |
|---|---|
| Age | 69.4% |
| Sex | 92.8% |
| Location | 80.0% |
| Birth place | 88.0% |
| Education | 73.0% |
| Income | 66.7% |
| Occupation | 73.9% |
| Relationship | 79.2% |
| Average | 77.9% |
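
As a quick consistency check, the Average row can be recomputed from the eight per-attribute values in the table. The sketch below assumes the average is an unweighted mean over attributes; the reported figure matches that mean to one decimal place.

```python
# Per-attribute FTI (GPT-4) values copied from the table above, in percent.
fti_gpt4 = {
    "Age": 69.4,
    "Sex": 92.8,
    "Location": 80.0,
    "Birth place": 88.0,
    "Education": 73.0,
    "Income": 66.7,
    "Occupation": 73.9,
    "Relationship": 79.2,
}

# Unweighted mean over the eight attributes.
macro_average = sum(fti_gpt4.values()) / len(fti_gpt4)
print(f"{macro_average:.1f}%")
```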
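This model was trained on SynthPAI, which is licensed under CC-BY-NC-SA-4.0. Use the model in accordance with that license's non-commercial and share-alike terms.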