Buckets:
400 MB
3 files
Updated 6 days ago
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| data | 1 item | | |
| .gitattributes | 2.5 kB | | 738f1125 |
| README.md | 857 Bytes | | 5dcd2f93 |
CAG-Lab AWS Docs Vectors
89,221 document chunk embeddings from AWS public documentation.
- Embedding model: OpenAI `text-embedding-3-small` (512 dimensions)
- Distance metric: Cosine
- Source: AWS public documentation chunks
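Because the vectors are meant to be compared with cosine distance, a minimal sketch of that metric may help. The function names `cosine_similarity` and `cosine_distance` are illustrative and not part of this dataset or of CAG-Lab; the code uses only the Python standard library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

Under this definition, identical directions give a distance of 0 and opposite directions give 2. A query vector only compares meaningfully with the stored embeddings if it has the same 512 dimensions.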
Schema
| Column | Type | Description |
|---|---|---|
| id | string | Deterministic UUID |
| embedding | list[float32] | 512-dim vector |
| content | string | Document chunk text |
| filePath | string | Original file path |
| chunkIndex | string | Chunk position |
| _pinecone_id | string | Original Pinecone vector ID |
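As a rough illustration of the schema above, the sketch below checks one row against the listed columns. `validate_row` is a hypothetical helper, not something shipped with the dataset, and it assumes each row arrives as a plain Python dict with the embedding as a list of floats.

```python
import uuid

EXPECTED_DIM = 512
STRING_COLUMNS = ("id", "content", "filePath", "chunkIndex", "_pinecone_id")


def validate_row(row):
    """Return a list of problems found in one row (an empty list means OK)."""
    problems = []
    # Every column except `embedding` is documented as a string.
    for name in STRING_COLUMNS:
        if not isinstance(row.get(name), str):
            problems.append(f"{name}: expected string")
    embedding = row.get("embedding")
    if not isinstance(embedding, (list, tuple)):
        problems.append("embedding: expected a list of floats")
    elif len(embedding) != EXPECTED_DIM:
        problems.append(
            f"embedding: expected {EXPECTED_DIM} values, got {len(embedding)}"
        )
    # The schema describes `id` as a UUID, so check that it parses as one.
    if isinstance(row.get("id"), str):
        try:
            uuid.UUID(row["id"])
        except ValueError:
            problems.append("id: not a valid UUID string")
    return problems
```

Running a check like this over a sample of rows after loading is a cheap way to catch a truncated download or a mismatched embedding dimension.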
Usage
```python
from datasets import load_dataset

ds = load_dataset("mouadja/aws-docs")
```
Or use with CAG-Lab:
```bash
python scripts/setup_vectordb.py
```
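Once rows are loaded, one simple way to query them without a vector database is a brute-force scan. The sketch below is an assumption about how one might do that, not how CAG-Lab does it: `top_k` is a hypothetical helper that takes any iterable of row dicts with an `embedding` field and a query vector, which would need to come from the same embedding model at 512 dimensions.

```python
import heapq
import math


def top_k(rows, query, k=5):
    """Return the k (score, row) pairs most similar to `query` by cosine similarity."""
    q_norm = math.sqrt(sum(x * x for x in query))
    if q_norm == 0.0:
        raise ValueError("query vector must be non-zero")

    def score(row):
        emb = row["embedding"]
        norm = math.sqrt(sum(x * x for x in emb))
        if norm == 0.0:
            # A zero vector has no direction; rank it last.
            return float("-inf")
        dot = sum(x * y for x, y in zip(query, emb))
        return dot / (q_norm * norm)

    scored = ((score(row), row) for row in rows)
    # nlargest keeps only k candidates in memory while scanning.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

A linear scan in pure Python over 89,221 vectors is slow but easy to reason about. For repeated queries, a vectorized or indexed search would be the more practical choice.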
- Total size: 400 MB
- Files: 3
- Last updated: Jun 20
- Pre-warmed CDN: US, EU