# External Tool Benchmark Status

This file tracks the apples-to-apples benchmark setup for external tools
on the same held-out BacDive/MediaDive strains used by the dry-lab media
recommender benchmark.

## Held-Out Manifest

- Manifest: `artifacts/external_benchmark_manifest.parquet`
- Rows: 25,728
- Unique genome accessions: 16,154
- Media labels retained: 40
- Fold counts: {"0": 5146, "1": 5146, "2": 5146, "3": 5145, "4": 5145}
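
The fold counts above can be sanity-checked against the row total with a few lines of Python. This is only a sketch: the numbers are copied from this report, and `check_folds` is a hypothetical helper, not part of the repository.

```python
# Values copied from the manifest summary in this report.
fold_counts = {"0": 5146, "1": 5146, "2": 5146, "3": 5145, "4": 5145}
total_rows = 25_728


def check_folds(counts, expected_total):
    """Return (sums_to_total, size_gap) for a mapping of fold -> row count."""
    sizes = list(counts.values())
    # size_gap is the difference between the largest and smallest fold.
    return sum(sizes) == expected_total, max(sizes) - min(sizes)


sums_ok, gap = check_folds(fold_counts, total_rows)
print(f"folds sum to row total: {sums_ok}; largest size gap: {gap}")
```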

Label coverage:

| Target | Labeled rows |
|---|---:|
| Temperature | 25,727 |
| pH | 2,984 |
| Salt | 2,486 |
| Oxygen | 9,283 |
| Medium | 21,050 |

## Local Requirements

- FASTA directory: `data/external_benchmark_fastas`
- FASTAs present: 8 / 16,154 (0.05%)
- FASTA download smoke run: {"attempted": 0, "downloaded": 0, "failed": 0}
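
One way the FASTA coverage figure could be recomputed is sketched below. It assumes each file is named `<genome_accession><suffix>` with one of the listed suffixes; that naming convention, the suffix list, and both function names are assumptions for illustration, not something this report confirms.

```python
from pathlib import Path

# Assumed suffixes; adjust to whatever the download step actually writes.
FASTA_SUFFIXES = (".fna", ".fa", ".fasta", ".fna.gz", ".fa.gz", ".fasta.gz")


def accession_from_name(name):
    """Strip a known FASTA suffix from a file name, or return None."""
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return None


def coverage_from_names(file_names, accessions):
    """Return (present, total, percent) of accessions that have a FASTA file."""
    wanted = set(accessions)
    found = {accession_from_name(n) for n in file_names}
    present = len(wanted & found)
    percent = 100.0 * present / len(wanted) if wanted else 0.0
    return present, len(wanted), percent


def coverage_from_dir(fasta_dir, accessions):
    """Same as coverage_from_names, but lists files from a directory."""
    directory = Path(fasta_dir)
    names = [p.name for p in directory.iterdir()] if directory.is_dir() else []
    return coverage_from_names(names, accessions)
```

For example, `coverage_from_dir("data/external_benchmark_fastas", accessions)` would take the unique `genome_accession` values from the manifest as `accessions`.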

| Tool | Local command | Status |
|---|---|---|
| GenomeSPOT | `uv run python -m genome_spot.genome_spot` | available |
| CarveMe | `uv run --with carveme carve` | available |
| gapseq | (none found) | missing |
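
A minimal availability check along the lines of the table above might look like the following. It only tests whether an executable is on `PATH` using the standard-library `shutil.which`; how this report actually detects tools is not stated here, and the executable names passed in are assumptions.

```python
import shutil


def tool_status(executables):
    """Map each tool name to 'available' or 'missing' via a PATH lookup."""
    return {
        tool: "available" if shutil.which(exe) else "missing"
        for tool, exe in executables.items()
    }


# "gapseq" is assumed to be the executable name; the second entry is a
# deliberately nonexistent placeholder to show the 'missing' branch.
status = tool_status({
    "gapseq": "gapseq",
    "placeholder": "no-such-binary-for-this-sketch",
})
for tool, state in status.items():
    print(f"{tool}: {state}")
```

A PATH lookup says nothing about whether a tool's reference databases are installed, so a passing check here would still need a short real run to confirm.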

## Verdict

External baseline execution is not ready on this machine yet: only 8 of the 16,154 held-out FASTAs are present locally, and at least one external tool (gapseq) is missing. Tool databases still need to be installed alongside the binaries.

## Next Commands

Use the manifest to run each external tool against the same rows and folds.
Score the medium-feasibility tools per strain: a prediction counts as a hit
if at least one known MediaDive medium for the strain is predicted feasible,
or is the closest match among the tool's predicted feasible media/metabolite
environments.

```bash
PYTHONPATH=src uv run --python 3.11 python scripts/42_prepare_external_benchmarks.py \
  --download-fastas 10
```
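
The scoring rule for the medium-feasibility tools described above could be sketched as follows. This is one reading of that rule, not the benchmark's actual scorer: `medium_hit`, `hit_rate`, the argument names, and the medium IDs in the example are all hypothetical, and "closest" is taken to mean the first entry of a tool-provided ranking.

```python
def medium_hit(known_media, feasible_media, ranked_media=()):
    """Return True if the tool's output matches any known medium for a strain.

    known_media: MediaDive medium IDs recorded for the strain.
    feasible_media: media the tool predicts as feasible.
    ranked_media: optional sequence ordered from closest to farthest,
        consulted only when no feasible medium matches.
    """
    known = set(known_media)
    if known & set(feasible_media):
        return True
    return bool(ranked_media) and ranked_media[0] in known


def hit_rate(strains):
    """Fraction of strains scored as hits; 0.0 for an empty input."""
    hits = [medium_hit(**strain) for strain in strains]
    return sum(hits) / len(hits) if hits else 0.0


# Made-up medium IDs, for illustration only.
example = [
    {"known_media": ["M1"], "feasible_media": ["M1", "M7"]},
    {"known_media": ["M2"], "feasible_media": [], "ranked_media": ["M2", "M9"]},
    {"known_media": ["M3"], "feasible_media": ["M4"], "ranked_media": ["M4", "M3"]},
]
rate = hit_rate(example)
```

Computing the rate per fold rather than over all rows keeps the result comparable with the fold-wise numbers from the media recommender benchmark.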

For the full benchmark, download the complete FASTA set into the FASTA
directory above, install the external tools plus their databases, then run
tool-specific inference using the `bacdive_id`, `fold`, and
`genome_accession` columns from the manifest.
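
Because the manifest has more rows (25,728) than unique genome accessions (16,154), it is probably worth running each tool once per accession and joining the result back to every matching row. The sketch below groups rows that way. It assumes rows are available as dictionaries with the three column names above (for instance from `pandas.read_parquet(...).to_dict("records")`); `rows_by_accession` and the example values are hypothetical.

```python
from collections import defaultdict


def rows_by_accession(rows):
    """Group (bacdive_id, fold) pairs under each genome accession."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["genome_accession"]].append((row["bacdive_id"], row["fold"]))
    return dict(grouped)


# Placeholder rows; real values come from the manifest.
example_rows = [
    {"bacdive_id": 101, "fold": 0, "genome_accession": "ACC_A"},
    {"bacdive_id": 102, "fold": 3, "genome_accession": "ACC_A"},
    {"bacdive_id": 103, "fold": 1, "genome_accession": "ACC_B"},
]
grouped = rows_by_accession(example_rows)
# One inference run per key in `grouped`; each result is then attached to
# every (bacdive_id, fold) pair listed under that accession.
```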