Upload README.md
README.md
ADDED
@@ -0,0 +1,21 @@
# Safety Training Dataset

Comprehensive dataset of 2.4M occupational safety documents used to train SafetyBERT and SafetyALBERT models.

## Dataset Overview

- **Size:** 120MB compressed (all-data-combined.7z)
- **Documents:** 2.4M safety reports and narratives
- **Sources:** MSHA, OSHA, NTSB, FRA, IOGP, iChem, Safety Abstracts
- **Format:** CSV files with narrative/abstract columns
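
Since the data ships as CSV files with narrative/abstract columns, the text can be gathered with the standard library once the archive has been extracted (see Usage). The following is a minimal sketch, not the authors' loading code: the helper `iter_texts`, the column names `narrative` and `abstract`, the `./data/` location, and UTF-8 encoding are all assumptions, so check the actual headers of each file before relying on it.

```python
import csv
from pathlib import Path

# Assumed column names; the real headers may differ between source files.
TEXT_COLUMNS = ("narrative", "abstract")


def iter_texts(data_dir, text_columns=TEXT_COLUMNS):
    """Yield (file name, text) pairs from every CSV found under data_dir."""
    wanted = {name.lower() for name in text_columns}
    for csv_path in sorted(Path(data_dir).rglob("*.csv")):
        # UTF-8 is an assumption; adjust if a file uses another encoding.
        with open(csv_path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Match headers case-insensitively, since their casing is not documented.
            columns = [
                column
                for column in (reader.fieldnames or [])
                if column.strip().lower() in wanted
            ]
            for row in reader:
                for column in columns:
                    text = (row.get(column) or "").strip()
                    if text:  # skip empty cells
                        yield csv_path.name, text
```

Iterating lazily, for example `for name, text in iter_texts("./data/"): ...`, avoids holding all 2.4M documents in memory at once.
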

## Usage

```python
from huggingface_hub import hf_hub_download
import py7zr

# Download the archive from the Hugging Face Hub.
# If the repository is hosted as a dataset repo, hf_hub_download also needs repo_type="dataset".
data_file = hf_hub_download("adanish91/safety-training-data", "all-data-combined.7z")

# Extract the CSV files into ./data/
with py7zr.SevenZipFile(data_file, 'r') as archive:
    archive.extractall("./data/")
```