Buckets:
323 GB
4 files
Updated 8 days ago
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| .gitattributes | 2.5 kB | | 738f1125 |
| README.md | 1.89 kB | | ee3519b4 |
| binaries.tar.zst | 216 GB | | 532dc612 |
| deephistory.duckdb.tar.zst | 107 GB | | bde893de |
Assemblage-Deep-History
Cross-build binary dataset with temporal coverage
Scope
- 73,610 binaries spanning 248 open-source projects
- Multiple compilers and optimization levels, on Linux and Windows
- Multi-year version histories
- 329 CVEs
Changelog
May 2026: A bug in function-name matching caused mismatches when affected function IDs were written to the database, which led to missing or incorrect function IDs in the cve table. We are uploading a fixed version that uses DuckDB; please use the latest version.
Contents
| File | Purpose |
|---|---|
| binaries.tar.zst | All binary files. The layout binaries/<XX>/<YY>[/<ZZ>]/<filename> matches the binaries.path column. ELF binaries are stripped of symbols; for PE binaries, the matching PDBs are held separately and referenced by the pdbs table. |
| deephistory.duckdb.tar.zst | DuckDB database with all metadata. Tables: binaries, functions, pdbs, rvas, lines, cve_binary_function. |
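Since the archive layout mirrors the binaries.path column, a path read from the database can be mapped to a file on disk. The following is a minimal sketch, not part of the dataset tooling: `resolve_binary` is a hypothetical helper, and it assumes the stored path is relative and may or may not include the leading binaries/ component, so it tries both.

```python
from pathlib import Path
from typing import Optional


def resolve_binary(extract_root: str, db_path: str) -> Optional[Path]:
    """Map a binaries.path value to a file under the extraction directory.

    Assumption: db_path is a relative path that either already starts with
    "binaries/" or is relative to that directory; both are tried in turn.
    """
    root = Path(extract_root)
    candidates = [root / db_path, root / "binaries" / db_path]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Not extracted yet, or the layout assumption above does not hold.
    return None
```

Here `extract_root` is the directory in which binaries.tar.zst was unpacked. A `None` result is a cue to check the actual path format in the database before relying on the helper.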
Loading
# stream-extract directly (no intermediate .tar on disk)
tar -I zstd -xf binaries.tar.zst
tar -I zstd -xf deephistory.duckdb.tar.zst
import duckdb
con = duckdb.connect("deephistory.duckdb", read_only=True)
# every build that contains the vulnerable functions for a given CVE
con.execute("""
SELECT b.path, b.platform, b.optimization, b.toolset_version,
b.package_name, b.version, c.function_name
FROM cve_binary_function c
JOIN binaries b ON b.id = c.binary_id
WHERE c.cve_id = 'CVE-2013-0340'
""").fetchall()
License
CC0-1.0. Underlying source projects retain their original upstream
licenses; per-binary license information is in binaries.license.
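Because license terms vary per binary, it can help to see which licenses are present before redistributing a subset. A minimal sketch, assuming only that binaries.license is a plain column as stated above; `license_counts` is a hypothetical helper and `con` is a DuckDB connection to deephistory.duckdb.

```python
def license_counts(con):
    """Count binaries per value of binaries.license, most common first."""
    sql = """
        SELECT license, COUNT(*) AS n
        FROM binaries
        GROUP BY license
        ORDER BY n DESC, license
    """
    # Each row is a (license, count) tuple; ties are ordered by license.
    return con.execute(sql).fetchall()
```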
- Last updated: Jun 22