AhmedSohair/synthpai-training
arxiv: 2505.12402
arxiv: 2310.07298
arxiv: 2406.07217
Branch: main · 472 kB
1 contributor · History: 62 commits
Latest commit: AhmedSohair, "Add paper draft: Fine-Tuning Small LLMs for Personal Attribute Inference" (336c57d, verified, 28 days ago)
| File | Scan | Size | Last commit | Updated |
|---|---|---|---|---|
| .gitattributes | Safe | 1.52 kB | initial commit | about 1 month ago |
| HOW_TO_RUN_LOCAL.md | Safe | 11.4 kB | Add local setup & run guide | about 1 month ago |
| PAPER_DRAFT.md | | 16.5 kB | Add paper draft: Fine-Tuning Small LLMs for Personal Attribute Inference | 28 days ago |
| README.md | Safe | 3.13 kB | Add README with instructions and baselines | about 1 month ago |
| TRAINING_LOG.md | Safe | 86.4 kB | Add v6 results (46.7%), multi-task SFT literature review (6 papers), v7 research-backed design | about 1 month ago |
| compute_v7_filtered.py | Safe | 3.24 kB | Add v7 filtered evaluation script (cert>=3 comparison with AutoProfiler) | 28 days ago |
| evaluate_pan15.py | Safe | 18.5 kB | Add PAN15 evaluation script: age + gender on 294 real Twitter users | about 1 month ago |
| evaluate_pan15_v6.py | Safe | 13.9 kB | Add PAN15 v6 eval: single-inference multi-attribute for real Twitter | about 1 month ago |
| evaluate_pan15_v7.py | Safe | 13.8 kB | Fix PAN15 v7 eval: add checkpointing, timeout per inference, reduce max tokens to prevent hangs | 28 days ago |
| evaluate_synthpai.py | Safe | 16.6 kB | Fix eval script: proper LoRA loading (base + PeftModel), greedy decoding, truncation for long comments | about 1 month ago |
| evaluate_synthpai_v3.py | Safe | 23.4 kB | Fix eval prompt mismatch: replace reasoning instruction with explicit JSON output format (matches training target) | about 1 month ago |
| evaluate_synthpai_v4.py | Safe | 20.1 kB | Add v4 evaluation script (matching JSON prompt + age buckets + v2/v3/v4/GPT-4 comparison) | about 1 month ago |
| evaluate_synthpai_v5.py | Safe | 28.9 kB | Fix reasoning extraction: capture reasoning/evidence from JSON fields, not just `<think>` blocks | about 1 month ago |
| evaluate_synthpai_v6.py | Safe | 29.5 kB | Fix v6 eval: combine separate city+country JSON keys, capture plain text analysis | about 1 month ago |
| generate_holistic_traces.py | Safe | 17.8 kB | Fix null check on failed trace generation in preview mode | about 1 month ago |
| generate_reasoning_traces.py | Safe | 20.7 kB | Add --preview mode: generate N sample traces for quality inspection before full run | about 1 month ago |
| train_synthpai.py | Safe | 14.1 kB | Speed optimizations: packing=True, SDPA attention, Liger kernel, larger batch size | about 1 month ago |
| train_synthpai_v2.py | Safe | 19.5 kB | fix: disable trackio (crashes on empty LoRA rank_pattern during hub push); use plain logging instead | about 1 month ago |
| train_synthpai_v3.py | Safe | 22.8 kB | Add v3 training script: DFT loss + NEFTune + minority oversampling | about 1 month ago |
| train_synthpai_v4.py | Safe | 20.7 kB | Add v4 training script: JSON-only prompt + age buckets (fixes v3 prompt mismatch + age mode collapse) | about 1 month ago |
| train_synthpai_v5.py | | 26.3 kB | Add v5 script: train_synthpai_v5.py | about 1 month ago |
| train_synthpai_v6.py | Safe | 19.3 kB | Add v6 training script: holistic multi-attribute profiling with cross-attribute reasoning | about 1 month ago |
| train_synthpai_v7.py | Safe | 23.5 kB | V7 training script: research-backed combined single+multi attribute with unified format, upsampled multi-attr, oversampling fixes | about 1 month ago |