huggingface/transformers-ci-telemetry / README.md

tarekziade

27 minutes ago


	# transformers CI telemetry

	Public, daily-partitioned snapshot of the transformers CI test telemetry
	collected by the pytest observability stack (OpenTelemetry → Tempo). Refreshed
	hourly. Schema version v1.

	This is derived from raw test-execution traces so you can build apps and
	analyses on top of CI data without access to the internal stack.

	## Layout

	```
	current_view.json          manifest: schema_version, updated_at, partitions, totals
	README.md                  this data card
	daily/
	  <YYYY-MM-DD>/            partition = UTC day of the run's start
	    test_rows.parquet      one row per (trace_id, test_nodeid)
	    run_rollups.parquet    one row per (run_id, test_job)
	traces/
	  <trace_id>.json          raw Jaeger-shaped trace (full fidelity)
	```

	The bucket is the long-term archive; it keeps full history independent of the
	stack's operational retention.
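
Because partitions are keyed by the UTC day of the run's start, a consumer can compute the path of a daily file without listing the bucket. The following is a minimal sketch of that mapping. The names `BUCKET`, `partition_url` and `partition_day` are illustrative and not part of any published client; the URL prefix is copied from the load examples in this card.

```python
from datetime import date, datetime, timedelta, timezone

# Prefix copied from the load examples in this card.
BUCKET = "hf://buckets/huggingface/transformers-ci-telemetry"


def partition_url(day: date, table: str = "test_rows") -> str:
    """Hypothetical helper: URL of one daily Parquet file."""
    if table not in ("test_rows", "run_rollups"):
        raise ValueError(f"unknown table: {table!r}")
    return f"{BUCKET}/daily/{day.isoformat()}/{table}.parquet"


def partition_day(run_start: datetime) -> date:
    """Hypothetical helper: UTC day that a run's start time falls into."""
    if run_start.tzinfo is None:
        raise ValueError("run_start must be timezone-aware")
    return run_start.astimezone(timezone.utc).date()


# A run that starts at 23:30 in UTC-5 belongs to the next UTC day.
late = datetime(2026, 6, 8, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(partition_url(partition_day(late)))
# hf://buckets/huggingface/transformers-ci-telemetry/daily/2026-06-09/test_rows.parquet
```

Rejecting naive timestamps is a deliberate choice in this sketch: a local-time date can silently point at the wrong partition near midnight.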

	## `test_rows` columns

	- `ts`
	- `date`
	- `service_name`
	- `provider`
	- `pr`
	- `run_id`
	- `trace_id`
	- `test_job`
	- `test_nodeid`
	- `test_module`
	- `test_class`
	- `test_function`
	- `test_line`
	- `model`
	- `gpu`
	- `status_code`
	- `duration_seconds`
	- `exception_type`
	- `exception_message`
	- `exception_stacktrace`

	`model` is derived from `tests/models/<model>/...` nodeids (empty otherwise).
	`gpu` is derived from the job name (`single` / `multi` / empty).
	`status_code` is the OTEL span status: `OK` / `ERROR` / `UNSET`.
	`exception_message` and `exception_stacktrace` are the full, untruncated
	failure text.
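
The two derived columns can be approximated on the consumer side, for example to label rows that come from the raw traces. This is a sketch under assumptions, not the pipeline's actual code: the regular expression and the substring match on the job name are guesses at rules the card only describes in words, and the function names are illustrative.

```python
import re

# Assumed pattern: the path segment right after tests/models/ is the model name.
_MODEL_RE = re.compile(r"(?:^|/)tests/models/([^/]+)/")


def model_from_nodeid(nodeid: str) -> str:
    """Return the model directory from a pytest nodeid, or "" if there is none."""
    match = _MODEL_RE.search(nodeid)
    return match.group(1) if match else ""


def gpu_from_job(job_name: str) -> str:
    """Return "single", "multi" or "" based on the job name.

    Assumption: the job name contains the literal word "single" or "multi".
    """
    lowered = job_name.lower()
    for kind in ("single", "multi"):
        if kind in lowered:
            return kind
    return ""
```

For instance, `model_from_nodeid("tests/models/bert/test_modeling_bert.py::BertModelTest::test_forward")` returns `"bert"`, while a nodeid under `tests/utils/` returns an empty string.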

	## `run_rollups` columns

	- `date`
	- `service_name`
	- `provider`
	- `pr`
	- `run_id`
	- `test_job`
	- `total_tests`
	- `passed_tests`
	- `failed_tests`
	- `duration_seconds`
	- `start_time`
	- `end_time`
	- `job_count`
	- `commit_sha`
	- `commit_message`

	`job_count` is the number of distinct jobs that contributed tests to the run.
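
The count columns can be cross-checked against `test_rows` by grouping on the rollup key `(run_id, test_job)`. The sketch below works on plain dicts, such as those returned by `df.to_dict("records")`. It assumes that `OK` counts as passed and `ERROR` as failed, with `UNSET` counted only in the total; the card does not state this mapping, so the results may differ from the published rollups.

```python
from collections import defaultdict


def rollup(rows):
    """Aggregate test rows (dicts) into one summary per (run_id, test_job)."""
    out = defaultdict(
        lambda: {"total_tests": 0, "passed_tests": 0, "failed_tests": 0}
    )
    for row in rows:
        agg = out[(row["run_id"], row["test_job"])]
        agg["total_tests"] += 1
        # Assumption: OK means passed, ERROR means failed, UNSET is neither.
        if row["status_code"] == "OK":
            agg["passed_tests"] += 1
        elif row["status_code"] == "ERROR":
            agg["failed_tests"] += 1
    return dict(out)
```

Since `test_rows` holds one row per `(trace_id, test_nodeid)`, a test that appears in several traces of the same run is counted once per trace here.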

	## Load examples

	```python
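	# hf:// URLs are resolved through fsspec, which needs huggingface_hub installed.
	import pandas as pd

	# Load one daily partition of test rows.
	df = pd.read_parquet(
	    "hf://buckets/huggingface/transformers-ci-telemetry/daily/2026-06-08/test_rows.parquet"
	)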
	```

	```sql
	-- DuckDB, straight from the bucket
	SELECT model, count(*) FILTER (WHERE status_code = 'ERROR') AS fails
	FROM 'hf://buckets/huggingface/transformers-ci-telemetry/daily/*/test_rows.parquet'
	GROUP BY model ORDER BY fails DESC;
	```

	## Notes

	- Coverage reflects only what the stack actually traced; workflows that are
	not instrumented are missing from this bucket as well.
	- Columns are additive across schema versions; `current_view.json.schema_version`
	is bumped on any change.
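
Since columns are only ever added, a consumer that selects the columns it needs by name should keep working after a schema bump. One way to make that explicit is sketched below. It assumes the manifest stores the version as the string `"v1"`, which the card does not confirm, and the names `KNOWN_SCHEMA`, `NEEDED`, `check_manifest` and `project` are illustrative.

```python
import json

KNOWN_SCHEMA = "v1"  # assumption: the manifest value is the string "v1"
NEEDED = ("run_id", "test_nodeid", "status_code")


def check_manifest(manifest_text: str) -> bool:
    """Return True if the manifest reports the schema this code was written for."""
    manifest = json.loads(manifest_text)
    return str(manifest.get("schema_version")) == KNOWN_SCHEMA


def project(row: dict, needed=NEEDED) -> dict:
    """Keep only the columns this consumer relies on; ignore later additions."""
    return {name: row[name] for name in needed}
```

A `False` result from `check_manifest` need not be fatal under the additive rule; it is a signal to review what changed in the new version.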
