400 MB
3 files
Updated 6 days ago

CAG-Lab AWS Docs Vectors

89,221 document chunk embeddings from AWS public documentation.

  • Embedding model: OpenAI text-embedding-3-small (512 dimensions)
  • Distance metric: Cosine
  • Source: AWS public documentation chunks
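
Since the vectors are meant to be compared with cosine distance, a minimal sketch of that metric may help. It uses only the standard library; the function names `cosine_similarity` and `cosine_distance` are illustrative and not part of this dataset or of CAG-Lab.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance: 0 for the same direction, 2 for opposite directions."""
    return 1.0 - cosine_similarity(a, b)
```

A query vector should come from the same embedding model and use the same 512 dimensions as the stored vectors; otherwise the scores are not comparable.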

Schema

| Column | Type | Description |
|---|---|---|
| id | string | Deterministic UUID |
| embedding | list[float32] | 512-dim vector |
| content | string | Document chunk text |
| filePath | string | Original file path |
| chunkIndex | string | Chunk position |
| _pinecone_id | string | Original Pinecone vector ID |
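
The schema above can be turned into a quick sanity check on a single row. This is a sketch under assumptions: it treats a row as a plain Python dict, as rows typically appear when read from a loaded dataset, and `validate_row` is a hypothetical helper, not something shipped with the dataset.

```python
EXPECTED_DIM = 512  # embedding size stated in the dataset card
STRING_COLUMNS = ("id", "content", "filePath", "chunkIndex", "_pinecone_id")


def validate_row(row):
    """Return a list of schema problems for one row; an empty list means OK."""
    problems = []
    for name in STRING_COLUMNS:
        if not isinstance(row.get(name), str):
            problems.append(f"{name}: expected a string")
    embedding = row.get("embedding")
    if not isinstance(embedding, (list, tuple)):
        problems.append("embedding: expected a list of floats")
    elif len(embedding) != EXPECTED_DIM:
        problems.append(
            f"embedding: expected {EXPECTED_DIM} values, got {len(embedding)}"
        )
    elif not all(isinstance(v, float) for v in embedding):
        problems.append("embedding: expected float values")
    return problems
```

Note that `chunkIndex` is checked as a string because the schema lists it that way, so it would need an explicit `int(...)` conversion before being used for numeric sorting.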

Usage

from datasets import load_dataset

ds = load_dataset("mouadja/aws-docs")

Or use with CAG-Lab:

python scripts/setup_vectordb.py
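
Once the rows are loaded, a brute-force nearest-neighbour search is one simple way to try the vectors without a vector database. The sketch below assumes each row is a dict with `embedding` and `content` keys, as in the schema; `top_k` is a hypothetical helper, and the query vector is assumed to have been produced separately with the same embedding model.

```python
import heapq
import math


def _normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(rows, query, k=5):
    """Return the k (score, content) pairs most similar to the query vector."""
    q = _normalize(query)

    def score(row):
        e = _normalize(row["embedding"])
        return sum(a * b for a, b in zip(q, e))

    # Stream over the rows and keep only the k best scores in memory.
    scored = ((score(row), row) for row in rows)
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [(s, row["content"]) for s, row in best]
```

With the dataset loaded as above, a call might look like `top_k(ds["train"], query_vector)`, where the `"train"` split name is an assumption. A pure-Python scan over 89,221 rows may be slow, so a vectorized or indexed approach is likely preferable beyond quick experiments.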
Last updated: Jun 20