"""Sample ~8,000 AMR records from SynGenome with full ORFs (start AND stop codons present) and emit JSONL files in the same schema as the VFDB pipeline so they slot into the existing Modal embed flow. Sampling strategy: take ALL records from rare drug classes (anything with <1k records) so probes/SAE see all 13 drug-class secondary_labels. Sample the remainder budget from macrolide (which dominates with 14,591 records). Output: data/targeted_jsonl/syngenome/.jsonl Each record's mag_id = drug_class slug so embed_*_lean's //