# transformers CI telemetry
Public, daily-partitioned snapshot of the transformers CI test telemetry
collected by the pytest observability stack (OpenTelemetry → Tempo). Refreshed
**hourly**. Schema version **v1**.

The data is derived from raw test-execution traces so that you can build apps
and analyses on top of CI data without access to the internal stack.
## Layout
```
current_view.json            manifest: schema_version, updated_at, partitions, totals
README.md                    this data card
daily/
  <YYYY-MM-DD>/              partition = UTC day of the run's start
    test_rows.parquet        one row per (trace_id, test_nodeid)
    run_rollups.parquet      one row per (run_id, test_job)
    traces/
      <trace_id>.json        raw Jaeger-shaped trace (full fidelity)
```
The bucket is the long-term archive; it keeps full history independent of the
stack's operational retention.
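
Because partitions are keyed by UTC day, the paths for a date range can be generated rather than typed out. The following is a minimal sketch: the bucket root is taken from the load examples in this card, while the `partition_paths` helper and its `table` parameter are hypothetical names introduced for illustration. It does not check that a partition exists for each day; the `partitions` entry in `current_view.json` is the place to confirm that.

```python
from datetime import date, timedelta

# Bucket root as it appears in the load examples of this card.
BUCKET = "hf://buckets/huggingface/transformers-ci-telemetry"


def partition_paths(start: date, end: date, table: str = "test_rows") -> list[str]:
    """Return one parquet path per UTC day in [start, end], inclusive.

    Hypothetical helper: it only formats paths and does not verify
    that the partition was actually written.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        f"{BUCKET}/daily/{(start + timedelta(days=i)).isoformat()}/{table}.parquet"
        for i in range(days)
    ]


paths = partition_paths(date(2026, 6, 7), date(2026, 6, 8))
```

The resulting list can be passed to a reader one path at a time, or used to build a DuckDB query over a bounded set of days instead of the `*` glob.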
## `test_rows` columns
- `ts`
- `date`
- `service_name`
- `provider`
- `pr`
- `run_id`
- `trace_id`
- `test_job`
- `test_nodeid`
- `test_module`
- `test_class`
- `test_function`
- `test_line`
- `model`
- `gpu`
- `status_code`
- `duration_seconds`
- `exception_type`
- `exception_message`
- `exception_stacktrace`

`model` is derived from `tests/models/<model>/...` nodeids (empty otherwise).
`gpu` is derived from the job name (`single` / `multi` / empty).
`status_code` is the OTEL span status: `OK` / `ERROR` / `UNSET`.
`exception_message` and `exception_stacktrace` are the **full, untruncated**
failure text.
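
The two derived columns can be approximated as follows. This is a sketch under stated assumptions, not the pipeline's actual code: the `tests/models/<model>/` prefix rule comes from this card, but the substring match used for `gpu` is a guess at how job names are classified, and both function names are hypothetical.

```python
import re

# Matches nodeids of the form tests/models/<model>/... and captures <model>.
_MODEL_RE = re.compile(r"^tests/models/([^/]+)/")


def model_from_nodeid(nodeid: str) -> str:
    """Return the model directory name, or "" when the nodeid is not under tests/models/."""
    match = _MODEL_RE.match(nodeid)
    return match.group(1) if match else ""


def gpu_from_job(job_name: str) -> str:
    """Return "single", "multi", or "" based on the job name.

    ASSUMPTION: a case-insensitive substring match; the real rule is not
    documented in this card.
    """
    name = job_name.lower()
    for kind in ("single", "multi"):
        if kind in name:
            return kind
    return ""
```

For example, a nodeid such as `tests/models/bert/test_modeling_bert.py::BertModelTest::test_forward` yields `bert`, while a nodeid outside `tests/models/` yields an empty string.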
## `run_rollups` columns
- `date`
- `service_name`
- `provider`
- `pr`
- `run_id`
- `test_job`
- `total_tests`
- `passed_tests`
- `failed_tests`
- `duration_seconds`
- `start_time`
- `end_time`
- `job_count`
- `commit_sha`
- `commit_message`

`job_count` is the number of distinct jobs that contributed tests to the run.
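
To illustrate how the count columns relate to `test_rows`, the sketch below aggregates rows into one record per `(run_id, test_job)`. It rests on assumptions that this card does not confirm: that `passed_tests` counts rows with `status_code == "OK"`, that `failed_tests` counts `ERROR` rows, and that `UNSET` rows count only toward `total_tests`. The `rollup` function is a hypothetical name, and plain dicts are used so the example runs without dependencies; in pandas the same grouping would be a `groupby`.

```python
from collections import defaultdict


def rollup(test_rows):
    """Aggregate test rows into per-(run_id, test_job) counts.

    ASSUMPTION: OK means passed and ERROR means failed; UNSET rows are
    counted in total_tests only.
    """
    out = defaultdict(lambda: {"total_tests": 0, "passed_tests": 0, "failed_tests": 0})
    for row in test_rows:
        agg = out[(row["run_id"], row["test_job"])]
        agg["total_tests"] += 1
        if row["status_code"] == "OK":
            agg["passed_tests"] += 1
        elif row["status_code"] == "ERROR":
            agg["failed_tests"] += 1
    return dict(out)


# Synthetic rows for illustration only; these are not real CI results.
sample = [
    {"run_id": "r1", "test_job": "job_a", "status_code": "OK"},
    {"run_id": "r1", "test_job": "job_a", "status_code": "ERROR"},
    {"run_id": "r1", "test_job": "job_b", "status_code": "UNSET"},
]
counts = rollup(sample)
```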
## Load examples
```python
import pandas as pd

# hf:// paths are resolved through fsspec; this needs `huggingface_hub` installed.
df = pd.read_parquet(
    "hf://buckets/huggingface/transformers-ci-telemetry/daily/2026-06-08/test_rows.parquet"
)
```
```sql
-- DuckDB, straight from the bucket
SELECT model, count(*) FILTER (WHERE status_code = 'ERROR') AS fails
FROM 'hf://buckets/huggingface/transformers-ci-telemetry/daily/*/test_rows.parquet'
GROUP BY model ORDER BY fails DESC;
```
## Notes
- Coverage reflects only what the stack actually traced; uninstrumented
workflows do not appear in this data.
- Columns are additive across schema versions; the `schema_version` field in
`current_view.json` is bumped on any change.
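
Since columns are only ever added, a consumer that selects columns by name should keep working across versions and can use the manifest to notice a bump. The following is a minimal sketch, assuming `schema_version` is stored as the string `"v1"` (its exact JSON representation is not specified in this card); `schema_changed` and `pick_columns` are hypothetical helpers.

```python
import json

PINNED_SCHEMA = "v1"  # the schema version this card documents


def schema_changed(manifest_text: str, pinned: str = PINNED_SCHEMA) -> bool:
    """Return True when the manifest reports a version other than the pinned one.

    ASSUMPTION: schema_version is a top-level string such as "v1".
    """
    manifest = json.loads(manifest_text)
    return manifest.get("schema_version") != pinned


def pick_columns(row: dict, wanted: list[str]) -> dict:
    """Select columns by name, ignoring columns added by later schema versions."""
    missing = [c for c in wanted if c not in row]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return {c: row[c] for c in wanted}
```

Selecting by name rather than by position is what makes additive changes safe: extra columns are ignored, and only a missing column raises an error.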
