# WCU Dataset Metadata

Dataset containing metadata for 13,404 files (127.4 GB) from WCU teaching materials.
## Files Created

- `wcu_metadata.parquet` - Main dataset file (0.66 MB)
- `dataset_summary.json` - Summary statistics
## Dataset Statistics
- Total Files: 13,404
- Total Size: 127.4 GB
- ZIP Files: 3,591
- Extracted Files: 9,813
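The breakdown is internally consistent: the extracted-file count is the total minus the ZIP archives. A quick arithmetic check:

```python
# Sanity check: extracted files = total files - ZIP archives
total_files = 13_404
zip_files = 3_591
extracted_files = total_files - zip_files
print(extracted_files)  # 9813
```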
## Files by Extension

```json
{
  ".zip": 9903,
  ".doc": 86,
  ".pdf": 1514,
  ".docx": 1229,
  ".rar": 9,
  ".ppt": 55,
  ".pptx": 478,
  ".crdownload": 22,
  ".html": 3,
  ".jpeg": 5,
  ".pdf_": 4,
  ".avif": 2,
  ".rtf": 21,
  ".mp3": 7,
  ".webp": 3,
  ".jpg": 28,
  ".rar_": 3,
  "": 8,
  ".docm": 2,
  ".png": 15,
  ".odt": 1,
  ".ris": 2,
  ".djvu": 3,
  ".htm": 1
}
```
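These counts can be regenerated from the parquet file itself. A minimal sketch using pandas, with a toy frame standing in for `wcu_metadata.parquet` (the `extension` and `size_mb` column names come from the Usage section; the rows here are illustrative, not real dataset entries):

```python
import pandas as pd

# Toy frame standing in for the real metadata file (illustrative rows only);
# in practice: df = pd.read_parquet("wcu_metadata.parquet")
df = pd.DataFrame({
    "extension": [".pdf", ".pdf", ".docx", ".zip"],
    "size_mb": [1.2, 0.8, 0.3, 45.0],
})

# Count files per extension, mirroring the table above
counts = df["extension"].value_counts().to_dict()
print(counts[".pdf"])  # 2
```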
## Usage

```python
from datasets import load_dataset
import pandas as pd

# Load from the local parquet file
dataset = load_dataset('parquet', data_files='wcu_metadata.parquet')

# Or after uploading to HuggingFace:
# dataset = load_dataset('GravityLeet/wcu_dataset')

# View the first few entries
print(dataset['train'][:5])

# Filter by extension
docx_files = dataset['train'].filter(lambda x: x['extension'] == '.docx')
print(f"Total DOCX files: {len(docx_files)}")

# Get total size by category
df = pd.DataFrame(dataset['train'])
print(df.groupby('category')['size_mb'].sum())
```
## Manual Upload Instructions

To upload this dataset to HuggingFace:

1. Go to https://huggingface.co/new-dataset
2. Create a new dataset named "wcu_dataset" (private)
3. Upload the following files:
   - `wcu_metadata.parquet` (to the `data/` folder)
   - `dataset_summary.json`
   - This README.md
Or use the HuggingFace CLI (note `--repo-type dataset`, since the target is a dataset repo rather than the default model repo):

```shell
huggingface-cli upload --repo-type dataset GravityLeet/wcu_dataset wcu_metadata.parquet data/wcu_metadata.parquet
huggingface-cli upload --repo-type dataset GravityLeet/wcu_dataset dataset_summary.json
huggingface-cli upload --repo-type dataset GravityLeet/wcu_dataset README.md
```