# Safety Training Dataset
A dataset of 2.4M occupational safety documents used to train the SafetyBERT and SafetyALBERT models.
## Dataset Overview
- **Size:** 120MB compressed (all-data-combined.7z)
- **Documents:** 2.4M safety reports and narratives
- **Sources:** MSHA, OSHA, NTSB, FRA, IOGP, iChem, Safety Abstracts
- **Format:** CSV files with narrative/abstract columns
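
Once the archive has been extracted (see Usage), the CSV layout described above could be read along these lines. This is a minimal sketch, not the authors' loading code: the helper `iter_texts` is hypothetical, the header names `narrative` and `abstract` are assumed from the format note (the actual column names in each file may differ), and UTF-8 encoding is also an assumption.

```python
import csv
from pathlib import Path

# Assumed header names; the exact column names in each CSV may differ.
TEXT_COLUMNS = ("narrative", "abstract")


def iter_texts(data_dir):
    """Yield (file name, text) pairs from every CSV found under data_dir."""
    for csv_path in sorted(Path(data_dir).rglob("*.csv")):
        # UTF-8 is an assumption; undecodable bytes are replaced, not fatal.
        with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
            reader = csv.DictReader(handle)
            # Match headers case-insensitively against the assumed names.
            column = next(
                (name for name in (reader.fieldnames or [])
                 if name.strip().lower() in TEXT_COLUMNS),
                None,
            )
            if column is None:
                continue  # no recognizable text column in this file
            for row in reader:
                text = (row.get(column) or "").strip()
                if text:  # skip rows with an empty text field
                    yield csv_path.name, text
```

Iterating with `for name, text in iter_texts("./data/")` streams one document at a time, so the full 2.4M rows never need to be held in memory at once.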
## Usage
```python
from huggingface_hub import hf_hub_download
import py7zr
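# Download the archive from the Hub, then extract it into ./data/
# (if the repo is hosted as a dataset, hf_hub_download also needs repo_type="dataset")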
data_file = hf_hub_download("adanish91/safety-training-data", "all-data-combined.7z")
with py7zr.SevenZipFile(data_file, 'r') as archive:
    archive.extractall("./data/")