472 kB

1 contributor

History: 62 commits

AhmedSohair

Add paper draft: Fine-Tuning Small LLMs for Personal Attribute Inference

336c57d verified 28 days ago

File | Size | Last commit | Updated
.gitattributes | 1.52 kB | initial commit | about 1 month ago
HOW_TO_RUN_LOCAL.md | 11.4 kB | Add local setup & run guide | about 1 month ago
PAPER_DRAFT.md | 16.5 kB | Add paper draft: Fine-Tuning Small LLMs for Personal Attribute Inference | 28 days ago
README.md | 3.13 kB | Add README with instructions and baselines | about 1 month ago
TRAINING_LOG.md | 86.4 kB | Add v6 results (46.7%), multi-task SFT literature review (6 papers), v7 research-backed design | about 1 month ago
compute_v7_filtered.py | 3.24 kB | Add v7 filtered evaluation script (cert>=3 comparison with AutoProfiler) | 28 days ago
evaluate_pan15.py | 18.5 kB | Add PAN15 evaluation script: age + gender on 294 real Twitter users | about 1 month ago
evaluate_pan15_v6.py | 13.9 kB | Add PAN15 v6 eval: single-inference multi-attribute for real Twitter | about 1 month ago
evaluate_pan15_v7.py | 13.8 kB | Fix PAN15 v7 eval: add checkpointing, timeout per inference, reduce max tokens to prevent hangs | 28 days ago
evaluate_synthpai.py | 16.6 kB | Fix eval script: proper LoRA loading (base + PeftModel), greedy decoding, truncation for long comments | about 1 month ago
evaluate_synthpai_v3.py | 23.4 kB | Fix eval prompt mismatch: replace reasoning instruction with explicit JSON output format (matches training target) | about 1 month ago
evaluate_synthpai_v4.py | 20.1 kB | Add v4 evaluation script (matching JSON prompt + age buckets + v2/v3/v4/GPT-4 comparison) | about 1 month ago
evaluate_synthpai_v5.py | 28.9 kB | Fix reasoning extraction: capture reasoning/evidence from JSON fields, not just <think> blocks | about 1 month ago
evaluate_synthpai_v6.py | 29.5 kB | Fix v6 eval: combine separate city+country JSON keys, capture plain text analysis | about 1 month ago
generate_holistic_traces.py | 17.8 kB | Fix null check on failed trace generation in preview mode | about 1 month ago
generate_reasoning_traces.py | 20.7 kB | Add --preview mode: generate N sample traces for quality inspection before full run | about 1 month ago
train_synthpai.py | 14.1 kB | Speed optimizations: packing=True, SDPA attention, Liger kernel, larger batch size | about 1 month ago
train_synthpai_v2.py | 19.5 kB | fix: disable trackio (crashes on empty LoRA rank_pattern during hub push); use plain logging instead | about 1 month ago
train_synthpai_v3.py | 22.8 kB | Add v3 training script: DFT loss + NEFTune + minority oversampling | about 1 month ago
train_synthpai_v4.py | 20.7 kB | Add v4 training script: JSON-only prompt + age buckets (fixes v3 prompt mismatch + age mode collapse) | about 1 month ago
train_synthpai_v5.py | 26.3 kB | Add v5 script: train_synthpai_v5.py | about 1 month ago
train_synthpai_v6.py | 19.3 kB | Add v6 training script: holistic multi-attribute profiling with cross-attribute reasoning | about 1 month ago
train_synthpai_v7.py | 23.5 kB | V7 training script: research-backed combined single+multi attribute with unified format, upsampled multi-attr, oversampling fixes | about 1 month ago