
WCU Dataset Metadata

Dataset containing metadata for 13,404 files (127.4 GB) from WCU teaching materials.

Files Created

  • wcu_metadata.parquet - Main dataset file (0.66 MB)
  • dataset_summary.json - Summary statistics
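The summary file can be regenerated from the per-file metadata records. A minimal sketch, assuming the field names used in the Usage section below (`extension`, `size_mb`, `category`); the sample records here are hypothetical:

```python
import json

# Hypothetical metadata records; field names mirror the Usage example below.
records = [
    {"path": "archive.zip", "extension": ".zip", "size_mb": 2048.0, "category": "zip"},
    {"path": "week1/lecture.pdf", "extension": ".pdf", "size_mb": 1.2, "category": "extracted"},
]

summary = {
    "total_files": len(records),
    "total_size_gb": round(sum(r["size_mb"] for r in records) / 1024, 2),
    "zip_files": sum(r["category"] == "zip" for r in records),
    "extracted_files": sum(r["category"] == "extracted" for r in records),
}

with open("dataset_summary.json", "w") as f:
    json.dump(summary, f, indent=2)
print(summary)
```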

Dataset Statistics

  • Total Files: 13,404
  • Total Size: 127.4 GB
  • ZIP Files: 3,591
  • Extracted Files: 9,813

Files by Extension

{
  ".zip": 9903,
  ".doc": 86,
  ".pdf": 1514,
  ".docx": 1229,
  ".rar": 9,
  ".ppt": 55,
  ".pptx": 478,
  ".crdownload": 22,
  ".html": 3,
  ".jpeg": 5,
  ".pdf_": 4,
  ".avif": 2,
  ".rtf": 21,
  ".mp3": 7,
  ".webp": 3,
  ".jpg": 28,
  ".rar_": 3,
  "": 8,
  ".docm": 2,
  ".png": 15,
  ".odt": 1,
  ".ris": 2,
  ".djvu": 3,
  ".htm": 1
}
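The tally above maps each extension (with its leading dot) to a file count; the `""` key collects the 8 extensionless files. A minimal sketch of how such a tally can be produced from a list of paths, using hypothetical sample paths:

```python
import os
from collections import Counter

def count_extensions(paths):
    # os.path.splitext keeps the dot (".pdf"); files with no dot yield "".
    return Counter(os.path.splitext(p)[1].lower() for p in paths)

# Hypothetical sample paths for illustration.
sample = ["week1/lecture.pdf", "week1/slides.pptx", "archive.zip", "LICENSE"]
print(dict(count_extensions(sample)))
# {'.pdf': 1, '.pptx': 1, '.zip': 1, '': 1}
```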

Usage

from datasets import load_dataset

# Load from local parquet file
dataset = load_dataset('parquet', data_files='wcu_metadata.parquet')

# Or after uploading to HuggingFace:
# dataset = load_dataset('GravityLeet/wcu_dataset')

# View first few entries
print(dataset['train'][:5])

# Filter by extension
docx_files = dataset['train'].filter(lambda x: x['extension'] == '.docx')
print(f"Total DOCX files: {len(docx_files)}")

# Get total size by category (to_pandas() converts the split to a DataFrame)
df = dataset['train'].to_pandas()
print(df.groupby('category')['size_mb'].sum())

Manual Upload Instructions

To upload this dataset to HuggingFace:

  1. Go to https://huggingface.co/new-dataset
  2. Create a new dataset named "wcu_dataset" (private)
  3. Upload the following files:
    • wcu_metadata.parquet (to data/ folder)
    • dataset_summary.json
    • This README.md

Or use the HuggingFace CLI:

huggingface-cli upload GravityLeet/wcu_dataset wcu_metadata.parquet data/wcu_metadata.parquet --repo-type dataset
huggingface-cli upload GravityLeet/wcu_dataset dataset_summary.json --repo-type dataset
huggingface-cli upload GravityLeet/wcu_dataset README.md --repo-type dataset

Note: `--repo-type dataset` is required; without it, `huggingface-cli upload` targets a model repository by default.