# Paper Statistics
**Results Directory:** `results_temp_1.1`

---
## 1. QC Pass Rates
- **Base:** 5/100 (5.0%)
- **SFT:** 10/100 (10.0%)
- **RL:** 77/100 (77.0%)
## 2. Held-Out Continuation Task (Log-Probability)
| Model | Mean Log-Prob | Std | N |
|-------|---------------|-----|---|
| Base | -12.4492 | 6.1440 | 85 |
| SFT | -12.4492 | 6.1440 | 85 |
| RL | -10.9660 | 2.7415 | 85 |
### Paired T-Tests (Completion)
- Base vs SFT: t=3.945, p=1.65e-04 *** (n=85 pairs)
- Base vs RL: t=-2.474, p=1.54e-02 * (n=85 pairs)
- SFT vs RL: t=-2.474, p=1.54e-02 * (n=85 pairs)
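
As an illustration of how a paired t-statistic of this kind is computed, the following is a minimal sketch using only the Python standard library. It assumes the comparison is made on per-sequence differences taken as first model minus second model, which would be consistent with the negative t for "Base vs RL" when RL has the higher mean. The function name and the sample values are hypothetical and are not taken from the paper's code or data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on the differences a - b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-sequence log-probabilities, NOT values from the paper.
base_lp = [-12.0, -15.5, -9.8, -14.1, -11.3]
rl_lp = [-10.5, -12.0, -9.9, -11.7, -10.2]

t, dof = paired_t_statistic(base_lp, rl_lp)
print(f"t = {t:.3f}, dof = {dof}")
```

The two-sided p-value follows from the t distribution with n - 1 degrees of freedom. SciPy's `scipy.stats.ttest_rel(a, b)` computes the same statistic together with that p-value.
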
### Alignment Tax Analysis
- **Base vs RL (paired t-test)**
  - N pairs: 85
  - Base mean: -12.4492
  - RL mean: -10.9660
  - Difference (RL - Base): +1.4831
  - t-statistic: -2.4741
  - p-value: 1.54e-02 *
  - Cohen's d: 0.268 (small effect)
  - **Interpretation: No alignment tax** (RL performs better)
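
The effect size above can be cross-checked from the t-statistic and the number of pairs. The sketch below assumes the reported Cohen's d is the paired-samples variant, d_z = |t| / sqrt(n). The report does not say which variant was used, but the listed numbers are consistent with this one. The function name is hypothetical.

```python
import math


def paired_cohens_d_from_t(t_stat, n_pairs):
    """Paired-samples effect size d_z recovered from a paired t-statistic."""
    return abs(t_stat) / math.sqrt(n_pairs)


# Values copied from the alignment tax analysis above.
n_pairs = 85
base_mean = -12.4492
rl_mean = -10.9660
t_stat = -2.4741

diff = rl_mean - base_mean  # RL - Base, computed from the rounded means
d_z = paired_cohens_d_from_t(t_stat, n_pairs)

print(f"difference = {diff:+.4f}")
print(f"d_z = {d_z:.3f}")
```

The difference computed from the rounded means can differ from the reported +1.4831 in the last digit, since the report presumably used unrounded values.
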
## 3. Surprisal Benchmark
| Model | Mean Log-Prob | Std | N |
|-------|---------------|-----|---|
| Base | -13.8384 | 2.5110 | 28 |
| SFT | -13.8384 | 2.5110 | 28 |
| RL | -11.6637 | 1.8210 | 28 |
### Paired T-Tests (Surprisal)
- Base vs SFT: t=7.408, p=5.72e-08 *** (n=28 pairs)
- Base vs RL: t=-5.128, p=2.16e-05 *** (n=28 pairs)
- SFT vs RL: t=-5.128, p=2.16e-05 *** (n=28 pairs)
## 4. Distribution Metrics
### 3-mer KL Divergence from Real Plasmids
| Model | KL(Model \|\| Real) | Mean JS Divergence |
|-------|-----------------|--------------------|
| Base | 0.0142 | 0.1037 |
| SFT | 0.0084 | 0.1074 |
| RL | 0.0106 | 0.0866 |
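
For reference, k-mer divergences like those in the table can be computed along the following lines. This is a sketch under stated assumptions, not the paper's implementation: it pools 3-mer counts over all sequences in a set, applies an additive pseudocount of 1, and uses natural logarithms. The report does not specify any of these choices, nor how the "Mean JS Divergence" column is averaged. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_distribution(seqs, k=3, pseudocount=1.0):
    """Pooled k-mer frequencies over all sequences, with additive smoothing."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set(ALPHABET):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be non-zero wherever p is non-zero."""
    return sum(p[m] * math.log(p[m] / q[m]) for m in p if p[m] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats (symmetric, bounded by ln 2)."""
    mix = {m: 0.5 * (p[m] + q[m]) for m in p}
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


# Toy sequences for illustration only, NOT data from the paper.
generated = ["ATGCGTACGTTAGCCGATAG", "GGCATCGATCGGCTAAGCTA"]
real = ["ATGGCGTACCGTTAGGCCAT", "CGGATCCGATGGCTTAGCAA"]

p_model = kmer_distribution(generated)
p_real = kmer_distribution(real)
print(f"KL(model || real) = {kl_divergence(p_model, p_real):.4f}")
print(f"JS(model, real)   = {js_divergence(p_model, p_real):.4f}")
```

The pseudocount keeps every 3-mer probability strictly positive, so the KL term is always finite even when a 3-mer is absent from one of the sets.
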
### GC Content
| Model | Mean GC | Std |
|-------|---------|-----|
| Real | 0.5172 | 0.0291 |
| Base | 0.4779 | 0.0336 |
| SFT | 0.4953 | 0.0501 |
| RL | 0.5185 | 0.0293 |
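
The GC statistics above are per-set summaries of a per-sequence fraction. A minimal sketch of that computation follows. It assumes GC content is measured over unambiguous A/C/G/T bases only and that the reported Std is the sample standard deviation; the report confirms neither choice. The function names and example sequences are hypothetical.

```python
from statistics import mean, stdev


def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_summary(seqs):
    """Return (mean, sample standard deviation) of per-sequence GC content."""
    values = [gc_content(s) for s in seqs]
    return mean(values), stdev(values)  # stdev requires at least two sequences


# Toy sequences for illustration only, NOT data from the paper.
example = ["ATGCGC", "GGCCAT", "ATATGC"]
gc_mean, gc_std = gc_summary(example)
print(f"mean GC = {gc_mean:.4f}, std = {gc_std:.4f}")
```
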
### Sequence Length
| Model | Mean Length | Std | Median |
|-------|-------------|-----|--------|
| Real | 7273 | 2924 | 6690 |
| Base | 7482 | 3829 | 7168 |
| SFT | 5812 | 4202 | 4409 |
| RL | 7107 | 1489 | 6668 |
### MFE Density (Thermodynamic Stability)
| Model | Mean MFE/nt | Std |
|-------|-------------|-----|
| Real | -0.3643 | 0.0267 |
| Base | -0.3597 | 0.0652 |
| SFT | -0.3577 | 0.0728 |
| RL | -0.3622 | 0.0338 |
## 5. Diversity Metrics
| Model | Pass Rate (%) | Diversity Score |
|-------|---------------|----------------|
| Base | 5.0 | 0.9150 |
| SFT | 10.0 | 0.8964 |
| RL (GRPO) | 77.0 | 0.5878 |
## 6. Sample Counts
| Category | Count |
|----------|-------|
| Real | 254 |
| Base | 53 |
| SFT | 100 |
| RL | 100 |
| **Total** | 507 |
