
Apache ORC String Length Integer Overflow (CWE-190)

Vulnerability

Root cause: StringDirectColumnReader::computeSize() in ColumnReader.cc:697 casts int64_t string lengths to size_t without checking that they are non-negative.

String lengths are decoded by an unsigned RLE decoder (isSigned=false) but stored in int64_t* arrays. When a crafted .orc file encodes a length >= 2^63, the stored int64_t value is negative. static_cast<size_t>(negative) then yields a huge positive value of at least 2^63 on a 64-bit platform (for example, -1 becomes SIZE_MAX).
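The sign flip can be reproduced in isolation. The sketch below is illustrative only: storeAsInt64 and castToSize are hypothetical helpers, not ORC functions, and it assumes a two's-complement target with a 64-bit size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of ORC): models an unsigned decoded length
// being stored in an int64_t slot. The uint64_t -> int64_t conversion is
// modular in C++20 and implementation-defined (two's complement in
// practice) in earlier standards.
inline int64_t storeAsInt64(uint64_t decodedLength) {
  return static_cast<int64_t>(decodedLength);
}

// Hypothetical helper: the cast computeSize() applies to each length.
// A negative int64_t becomes a size_t of at least 2^63 (64-bit size_t).
inline size_t castToSize(int64_t storedLength) {
  return static_cast<size_t>(storedLength);
}
```

A decoded length of exactly 2^63 becomes INT64_MIN and casts back to 2^63; the largest encodable value, 2^64 - 1, becomes -1 and casts to SIZE_MAX.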

Vulnerable Code

```cpp
// ColumnReader.cc:691-706 (excerpt)
size_t totalLength = 0;
for (size_t i = 0; i < numValues; ++i) {
    totalLength += static_cast<size_t>(lengths[i]);  // no check that lengths[i] >= 0
}
// ...
byteBatch.blob.resize(totalLength);  // huge total -> OOM; wrapped total -> undersized buffer
```
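The summation can be modelled on its own to see both failure modes. This is a hypothetical reproduction of the loop quoted above, not the actual ORC source: computeSizeUnchecked takes a std::vector for convenience and assumes a 64-bit size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reproduction of the unchecked loop (not ORC code): sums the
// lengths using size_t arithmetic, which silently wraps modulo 2^64.
inline size_t computeSizeUnchecked(const std::vector<int64_t>& lengths) {
  size_t totalLength = 0;
  for (int64_t length : lengths) {
    // A negative length turns into a value >= 2^63 at this cast.
    totalLength += static_cast<size_t>(length);
  }
  return totalLength;
}
```

A single length of -1 produces SIZE_MAX (the OOM case), while lengths {16, -16} wrap the total to 0, so the blob is resized to nothing even though a later per-value step still advances by 16 bytes and then by -16.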

Impact

  • DoS via OOM: a single huge length → blob.resize(~9.2 exabytes) → crash
  • Wild pointers: two lengths whose sum wraps totalLength to 0 → empty blob → ptr += negative_length → out-of-bounds read
  • Stripe offset overflow: Reader.cc:591 performs uint64 addition that can overflow, with no checked arithmetic
  • The ORC C++ library uses no overflow-checked arithmetic anywhere in its 3600+ lines of core parsing code
  • ORC is used by Apache Hive, Spark, and Presto, which may read ORC files from external, untrusted sources
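The expression at Reader.cc:591 is not quoted here, so the sketch below only shows the general pattern for adding two uint64 file offsets or sizes without silent wrap-around. checkedAdd is a hypothetical helper, not an ORC API, and throwing std::overflow_error is an illustrative choice.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not an ORC API): adds two uint64_t offsets and
// throws instead of wrapping modulo 2^64.
inline uint64_t checkedAdd(uint64_t a, uint64_t b) {
  // a + b overflows exactly when a exceeds the headroom left above b.
  if (a > std::numeric_limits<uint64_t>::max() - b) {
    throw std::overflow_error("uint64_t offset addition overflows");
  }
  return a + b;
}
```

GCC and Clang offer __builtin_add_overflow for the same check; the portable comparison avoids depending on a compiler-specific builtin.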

Fix

Reject negative lengths before the cast: add if (lengths[i] < 0) throw ParseError(...) inside the loop in computeSize().
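A hardened length-summing loop might look like the following minimal sketch. It assumes a 64-bit size_t; the ParseError struct is a local stand-in so the example compiles on its own, computeSizeChecked is a hypothetical name, and treating a nullptr notNull as "no nulls" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Local stand-in for the library's ParseError so the sketch is self-contained.
struct ParseError : std::runtime_error {
  explicit ParseError(const std::string& msg) : std::runtime_error(msg) {}
};

// Hypothetical hardened variant of computeSize(): rejects negative lengths
// before the cast and refuses totals that would wrap size_t.
// notNull == nullptr means every value is present (assumption).
inline size_t computeSizeChecked(const int64_t* lengths, const char* notNull,
                                 uint64_t numValues) {
  size_t totalLength = 0;
  for (uint64_t i = 0; i < numValues; ++i) {
    if (notNull != nullptr && !notNull[i]) {
      continue;  // null slots contribute no string bytes
    }
    if (lengths[i] < 0) {
      throw ParseError("negative string length " + std::to_string(lengths[i]));
    }
    const size_t len = static_cast<size_t>(lengths[i]);
    if (len > std::numeric_limits<size_t>::max() - totalLength) {
      throw ParseError("total string length overflows size_t");
    }
    totalLength += len;
  }
  return totalLength;
}
```

The second check matters because rejecting negatives alone does not prevent wrap-around: three lengths of INT64_MAX still exceed a 64-bit size_t.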
