323 GB
4 files
Updated 8 days ago
README.md

Assemblage-Deep-History

A cross-build binary dataset with temporal coverage

Scope

  • 73,610 binaries spanning 248 open-source projects
  • Multiple compilers and optimization levels, on Linux and Windows
  • Multi-year version histories
  • 329 CVEs

Changelog

2026 May: A bug in function name matching caused mismatches when affected function IDs were written to the database, which led to missing or incorrect function IDs in the cve table. We are uploading a fixed version using DuckDB; please use the latest version.

Contents

| File | Purpose |
| --- | --- |
| binaries.tar.zst | All binary files. Layout binaries/<XX>/<YY>[/<ZZ>]/<filename> matches the binaries.path column. ELF binaries are stripped of symbols; for PE binaries the matching PDBs are held separately and referenced by the pdbs table. |
| deephistory.duckdb.tar.zst | DuckDB database with all metadata. Tables: binaries, functions, pdbs, rvas, lines, cve_binary_function. |
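
Since the on-disk layout mirrors the binaries.path column, a value from that column can be turned into a local file path once the archive is extracted. The helper below is a minimal sketch, not part of the dataset tooling: `resolve_binary` is a hypothetical name, and it assumes the column stores a relative path that either already starts with `binaries/` or is relative to that directory, so it tries both.

```python
from pathlib import Path


def resolve_binary(root, db_path):
    """Map a binaries.path value to a file under the extraction directory.

    Assumption (not confirmed by the README): the column holds a relative
    path, either including the leading "binaries/" component or relative
    to that directory. Both candidates are tried in that order.
    """
    root = Path(root)
    candidates = [root / db_path, root / "binaries" / db_path]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{db_path!r} not found under {root}")
```

Call it with the directory where binaries.tar.zst was extracted and a path taken from the database, for example `resolve_binary(".", row_path)`.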

Loading

# stream-extract directly (no intermediate .tar on disk)
tar -I zstd -xf binaries.tar.zst
tar -I zstd -xf deephistory.duckdb.tar.zst

import duckdb
con = duckdb.connect("deephistory.duckdb", read_only=True)

# every build that contains the vulnerable functions for a given CVE
rows = con.execute("""
    SELECT b.path, b.platform, b.optimization, b.toolset_version,
           b.package_name, b.version, c.function_name
    FROM cve_binary_function c
    JOIN binaries b ON b.id = c.binary_id
    WHERE c.cve_id = 'CVE-2013-0340'
""").fetchall()

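The query above returns one row per (binary, function) pair, so the same build can appear several times. A possible follow-up step, sketched here with the hypothetical helper `group_by_build`, is to group the fetched rows by build configuration. It assumes the rows are tuples in the column order of the SELECT above and does not touch the database itself.

```python
from collections import defaultdict


def group_by_build(rows):
    """Group query rows by (platform, optimization, toolset_version).

    `rows` is expected to hold tuples in the column order of the SELECT:
    (path, platform, optimization, toolset_version,
     package_name, version, function_name).
    Returns a dict mapping each configuration to a sorted list of the
    distinct binary paths built with it.
    """
    groups = defaultdict(set)
    for path, platform, optimization, toolset, _pkg, _ver, _func in rows:
        groups[(platform, optimization, toolset)].add(path)
    return {config: sorted(paths) for config, paths in groups.items()}
```

Passing the list returned by `fetchall()` yields, for each configuration, the binaries that contain at least one affected function.
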
License

CC0-1.0. Underlying source projects retain their original upstream licenses; per-binary license information is in binaries.license.
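
To see which upstream licenses are present before redistributing a subset, the binaries.license column can be summarized. This is a minimal sketch: `license_counts` is a hypothetical helper, and it only assumes a connection object whose `execute(sql)` result supports `fetchall()`, which holds for the DuckDB connection opened under Loading.

```python
def license_counts(con):
    """Return (license, n_binaries) pairs, most common license first.

    `con` is an open database connection, e.g. the read-only DuckDB
    connection from the Loading section. Only the binaries.license
    column named in this README is used.
    """
    sql = """
        SELECT license, COUNT(*) AS n
        FROM binaries
        GROUP BY license
        ORDER BY n DESC, license
    """
    return con.execute(sql).fetchall()
```

For example, `for name, n in license_counts(con): print(name, n)` prints one line per license.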

Last updated
Jun 22