PoC: fastavro unbounded allocation via file-declared array/map block count (CWE-770/789/400)
Minimal proof-of-concept model file for a denial-of-service bug in the Avro
format reader shipped by the fastavro PyPI package. A 178-byte .avro
file makes fastavro.reader(...) allocate memory without bound until
MemoryError / OOM: the work is declared in the file, not contained in it.
- Format: Avro (.avro)
- Parser: fastavro (PyPI)
- Affected version: fastavro==1.12.2 (latest on PyPI, 2026-06-20); bug also present at master HEAD ecea658 (2026-06-09)
- Class: Uncontrolled resource consumption / unbounded allocation, CWE-770, CWE-789, CWE-400 (memory-exhaustion DoS; no memory corruption)
Reproduce
pip install fastavro==1.12.2
python -c "from fastavro import reader; list(reader(open('mal.avro','rb')))"
# RSS climbs without bound -> MemoryError (under a vmem cap) or OOM-kill
generate_poc.py rebuilds mal.avro (and a ctrl.avro control) from scratch.
On Linux, bound the run so it raises instead of OOM-killing the host:
( ulimit -v 1500000; python -c "from fastavro import reader; list(reader(open('mal.avro','rb')))" )
# -> MemoryError at fastavro/_read.pyx:405 (read_array)
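The same cap can be applied from Python instead of the shell. The following is a minimal sketch, assuming Linux (where RLIMIT_AS is enforced); run_capped is a hypothetical helper written for this note, not part of fastavro or of generate_poc.py. It runs a snippet in a child interpreter whose address space is limited, so a runaway allocation surfaces as MemoryError in the child rather than an OOM-kill on the host.

```python
import resource
import subprocess
import sys


def run_capped(code, vmem_bytes, timeout=60.0):
    """Run `python -c code` in a child whose address space is capped (Linux).

    Hypothetical helper: the Python analogue of `( ulimit -v N; python -c ... )`.
    """

    def _apply_limit():
        # Executed in the child between fork and exec.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = vmem_bytes
        if hard != resource.RLIM_INFINITY:
            # An unprivileged process cannot raise its hard limit.
            cap = min(cap, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=_apply_limit,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Note that ulimit -v takes KiB while RLIMIT_AS takes bytes, so the shell cap above corresponds to run_capped("from fastavro import reader; list(reader(open('mal.avro','rb')))", 1_500_000 * 1024). The child's traceback is then available in the returned object's stderr attribute.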
Measured
- Control (ctrl.avro, 173 bytes, well-formed): parses in <1 ms, RSS flat.
- Malicious (mal.avro, 178 bytes, declared array block_count = 2**40): RSS climbed to 818 MB in 10 s and was still climbing linearly (macOS, fastavro 1.12.2, CPython 3.14 Cython wheel; watchdog-killed for safety). Under a Linux ulimit -v cap it raises MemoryError at the sink (fastavro/_read.pyx:405, read_array).
Root cause
fastavro decodes an Avro array (or map) as a series of blocks. It reads a
block count straight from the file body and loops that many times, appending
each decoded item to an in-memory list, with no bound on the count and no
check that the file contains enough bytes for it. When the item type reads
zero bytes (null), every iteration consumes no input, so EOF is never
reached and the loop runs the full attacker-declared count (up to 2^63-1).
Cython hot path (the path the installed wheel runs), fastavro/_read.pyx,
read_array:
:386 block_count = read_long(fo) # array block count, read straight from the file body
:388 while block_count != 0:
:395 for i in range(block_count): # no bound vs bytes remaining
:405 read_items.append(_read_data(...)) # eager list growth; a `null` item consumes 0 bytes
read_map is structurally identical (_read.pyx:467+). The pure-Python
implementation has the same shape: _read_py.py:330 (for item in decoder.iter_array()) drives the append, and the count comes from
io/binary_decoder.py:101 (self._block_count = self.read_long()) consumed at
:122 (for i in range(self._block_count): yield). The negative-count form
(binary_decoder.py:117-120) reads a block byte-size but leaves it unused;
it neither skips nor bounds, so it is equally unbounded. The LONG_MAX_VALUE
constant exists only in the writer-side validation path
(fastavro/_validation*.py), never in the read path.
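The loop shape described in this section can be reproduced in a few lines of standalone Python. This is a simplified model written for illustration, not fastavro's source: read_long, read_null and read_array_unbounded are stand-in names, and the count is kept small (1000) so the demo is harmless. It shows that a three-byte body yields 1000 list entries, because a null item reads nothing from the stream.

```python
import io


def read_long(fo):
    """Decode an Avro long: base-128 varint, then undo the zig-zag mapping."""
    shift = 0
    acc = 0
    while True:
        raw = fo.read(1)
        if not raw:
            raise EOFError("unexpected end of input")
        byte = raw[0]
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def read_null(fo):
    """A null item is encoded as zero bytes, so nothing is read."""
    return None


def read_array_unbounded(fo, read_item):
    """Model of the unbounded pattern: trust the declared count, append eagerly."""
    items = []
    block_count = read_long(fo)
    while block_count != 0:
        if block_count < 0:
            block_count = -block_count
            read_long(fo)  # block byte size: read, then ignored
        for _ in range(block_count):  # no check against bytes remaining
            items.append(read_item(fo))
        block_count = read_long(fo)
    return items


# Body: block count 1000 (two bytes), then the 0 terminator (one byte).
body = io.BytesIO(b"\xd0\x0f\x00")
items = read_array_unbounded(body, read_null)
consumed = body.tell()
```

Here len(items) is 1000 while consumed is 3. Replacing the two count bytes with the encoding of a much larger number changes only the loop bound, not the amount of input required.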
Impact
Any service that parses an untrusted .avro with fastavro (a feature-store
ingest path, a dataset-upload validator, a streaming consumer) is DoS-able by a
tiny file: no large upload required; the work is attacker-declared.
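To see why the file can stay tiny, it helps to look at how an Avro long is encoded: zig-zag mapping followed by a base-128 varint. The sketch below uses a hypothetical helper name (encode_long) and is not taken from fastavro or generate_poc.py. It shows that a count of 2**40 costs six bytes and that even the maximum long costs ten.

```python
def encode_long(n):
    """Encode a signed 64-bit integer as an Avro long (zig-zag, then varint)."""
    if not -(2**63) <= n < 2**63:
        raise ValueError("out of range for an Avro long")
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes map to small codes
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)  # final group, continuation bit clear
    return bytes(out)


declared = encode_long(2**40)      # the block count used in mal.avro
largest = encode_long(2**63 - 1)   # the largest count a long can declare
```

declared is the six-byte sequence 80 80 80 80 80 40 (hex). If the control file's block count fits in a single byte (an assumption, since the files are not reproduced here), that would be consistent with the 5-byte size difference between ctrl.avro and mal.avro.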
Fix
Bound the declared block count against the bytes actually remaining before
looping (and/or cap the per-block element count), raising on violation. This is
the same remediation that peer Avro implementations adopted for this class:
CVE-2023-39410 (Apache Avro Java), CVE-2022-35724 (Rust), and
CVE-2021-43045 (.NET), all fixed by adding bounds. The class was never fixed
in fastavro.
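One possible shape for such a check is sketched below. Everything in it is hypothetical and not fastavro's API: the names check_block_count, BlockCountError and MAX_BLOCK_ITEMS, the cap value, and the assumption that the reader knows how many bytes remain (for example from a seekable or length-prefixed input). Because a null item occupies zero bytes, a bytes-remaining test alone cannot bound it, so the sketch combines that test with an absolute per-block cap.

```python
MAX_BLOCK_ITEMS = 1_000_000  # hypothetical cap; a real value is a policy decision


class BlockCountError(ValueError):
    """Raised when a declared block count cannot be honoured safely."""


def check_block_count(block_count, bytes_remaining, min_item_size,
                      max_items=MAX_BLOCK_ITEMS):
    """Validate a declared block count before looping over it.

    block_count     -- count read from the file (already made non-negative)
    bytes_remaining -- bytes left in the input (assumed to be known)
    min_item_size   -- smallest possible encoding of one item (0 for null)
    """
    if block_count < 0:
        raise BlockCountError("negate negative counts before checking")
    if block_count > max_items:
        raise BlockCountError(
            f"block count {block_count} exceeds cap {max_items}")
    if min_item_size > 0 and block_count * min_item_size > bytes_remaining:
        raise BlockCountError(
            f"block count {block_count} needs more than "
            f"{bytes_remaining} remaining bytes")
    return block_count
```

A reader would call this once per block, immediately after decoding the count and before entering the item loop. The cap trades compatibility for safety: legitimate files with very large blocks of zero-size items would be rejected, so a real fix might make it configurable.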