---
title: README
emoji: π
colorFrom: yellow
colorTo: red
sdk: static
pinned: false
license: apache-2.0
---
# CSI-4CAST Organization
Welcome to the CSI-4CAST organization on Hugging Face! This organization hosts datasets for CSI prediction research.
These datasets were originally created for our research paper: [CSI-4CAST: A Hybrid Deep Learning Model for CSI Prediction with Comprehensive Robustness and Generalization Testing](https://arxiv.org/abs/2510.12996). The corresponding code and implementation are available in our [GitHub repo](https://github.com/AI4OPT/CSI-4CAST).
## TL;DR
**Quick Start Options:**
- **For specific datasets**: Use the `snapshot_download` command to download individual datasets you need
- **For all datasets with original structure**: Run [`download.py`](https://huggingface.co/spaces/CSI-4CAST/README/blob/main/download.py) followed by [`reconstruction.py`](https://huggingface.co/spaces/CSI-4CAST/README/blob/main/reconstruction.py) to get the complete, well-structured dataset
See the **Usage** section below for detailed instructions.
## Dataset Structure
The datasets are organized in the following structure:
```
data/
├── stats/
│   ├── fdd/
│   │   └── normalization_stats.pkl
│   └── tdd/
│       └── normalization_stats.pkl
├── test/
│   ├── generalization/
│   │   ├── cm_A_ds_030_ms_001/
│   │   │   ├── H_D_pred.pt
│   │   │   ├── H_U_hist.pt
│   │   │   └── H_U_pred.pt
│   │   ├── cm_B_ds_030_ms_001/
│   │   ├── cm_C_ds_030_ms_001/
│   │   ├── cm_D_ds_030_ms_001/
│   │   ├── cm_E_ds_030_ms_001/
│   │   └── ...
│   └── regular/
│       ├── cm_A_ds_030_ms_001/
│       │   ├── H_D_pred.pt
│       │   ├── H_U_hist.pt
│       │   └── H_U_pred.pt
│       ├── cm_C_ds_030_ms_001/
│       ├── cm_D_ds_030_ms_001/
│       └── ...
└── train/
    └── regular/
        ├── cm_A_ds_030_ms_001/
        │   ├── H_D_pred.pt
        │   ├── H_U_hist.pt
        │   └── H_U_pred.pt
        ├── cm_C_ds_030_ms_001/
        ├── cm_D_ds_030_ms_001/
        └── ...
```
## Dataset Organization Strategy
Our datasets are organized using a **convenience-first naming strategy** on Hugging Face. Instead of uploading the entire data folder as one large dataset, we've split it into individual datasets with descriptive names. This approach allows users to:
- **Download only the specific data they need** (e.g., just one configuration or test type)
- **Easily identify datasets** by their purpose and configuration
- **Reduce download time and storage** by avoiding unnecessary data
- **Enable selective loading** for different research scenarios
### Available Datasets
#### Statistics Dataset
- **stats**: Contains normalization statistics for FDD and TDD configurations
#### Test Datasets
- **test_regular_***: Regular test data for various configurations
- **test_generalization_***: Generalization test data with extended parameter ranges
#### Training Datasets
- **train_regular_***: Training data for various configurations
### Dataset Naming Convention
The datasets follow this naming pattern:
- `[train/test]_[regular/generalization]`: Dataset split (train or test) and test type (regular or generalization)
- `cm_[A/B/C/D/E]`: Channel model, i.e., CDL-A, CDL-B, CDL-C, CDL-D, or CDL-E
- `ds_[030/050/100/200/300/400]`: Delay spread in ns
- `ms_[001/003/006/009/010/012/015/018/021/024/027/030/033/036/039/042/045]`: User speed in m/s
**Examples:**
- `test_regular_cm_A_ds_030_ms_001`: Regular test data for CDL-A model, 30ns delay spread, 1 m/s speed
- `train_regular_cm_C_ds_100_ms_030`: Training data for CDL-C model, 100ns delay spread, 30 m/s speed
- `test_generalization_cm_B_ds_200_ms_015`: Generalization test data for CDL-B model, 200ns delay spread, 15 m/s speed
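The naming pattern above can be composed programmatically. A minimal sketch; the helper `dataset_repo_id` is our own illustration, not part of any published API:

```python
def dataset_repo_id(split: str, test_type: str, cm: str, ds_ns: int, ms: int) -> str:
    """Build a dataset name following the convention above,
    e.g. dataset_repo_id("test", "regular", "A", 30, 1)
    -> "test_regular_cm_A_ds_030_ms_001".
    Delay spread (ns) and speed (m/s) are zero-padded to three digits."""
    return f"{split}_{test_type}_cm_{cm}_ds_{ds_ns:03d}_ms_{ms:03d}"

print(dataset_repo_id("test", "regular", "A", 30, 1))
```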
## Usage
### Downloading Datasets
You can download individual datasets using the Hugging Face Hub:
```python
from huggingface_hub import snapshot_download
# Download the stats dataset
snapshot_download(repo_id="CSI-4CAST/stats", repo_type="dataset")
# Download a specific CSI prediction dataset
snapshot_download(repo_id="CSI-4CAST/test_regular_cm_A_ds_030_ms_001", repo_type="dataset")
```
### Downloading All Datasets
To download all available datasets at once, use the provided [`download.py`](https://huggingface.co/spaces/CSI-4CAST/README/blob/main/download.py) script:
```bash
# Download all datasets to a 'datasets' folder
python3 download.py
# Download to a custom directory
python3 download.py --output-dir my_datasets
# Dry run to test without downloading (creates empty placeholder files)
python3 download.py --dry-run
```
The script will automatically:
- Check for all possible dataset combinations
- Download only the datasets that exist on Hugging Face
- Create organized folder structure with descriptive names
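The "check all possible dataset combinations" step amounts to enumerating the Cartesian product of the naming-convention parameters. A hedged sketch (the function name `candidate_repo_ids` is ours; the actual script may differ):

```python
from itertools import product

prefixes = ["train_regular", "test_regular", "test_generalization"]
channel_models = ["A", "B", "C", "D", "E"]
delay_spreads = ["030", "050", "100", "200", "300", "400"]
speeds = ["001", "003", "006", "009", "010", "012", "015", "018",
          "021", "024", "027", "030", "033", "036", "039", "042", "045"]

def candidate_repo_ids():
    """Yield every possible dataset repo ID; not all of them
    necessarily exist on Hugging Face, which is why the download
    script probes each one before downloading."""
    for prefix, cm, ds, ms in product(prefixes, channel_models, delay_spreads, speeds):
        yield f"CSI-4CAST/{prefix}_cm_{cm}_ds_{ds}_ms_{ms}"

names = list(candidate_repo_ids())
print(len(names))  # 3 prefixes x 5 models x 6 spreads x 17 speeds = 1530 candidates
```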
### Reconstructing Original Folder Structure
While our naming strategy makes it easy to download specific datasets, you might want to work with the complete dataset in its original folder structure. For this purpose, we provide the [`reconstruction.py`](https://huggingface.co/spaces/CSI-4CAST/README/blob/main/reconstruction.py) script that restores the original organization:
```bash
python3 reconstruction.py --input-dir datasets --output-dir data
```
This script will:
1. Remove the prefixes (`test_regular_`, `test_generalization_`, `train_regular_`)
2. Organize the folders back into the original data structure
3. Create the proper hierarchy: `data/stats/`, `data/test/regular/`, `data/test/generalization/`, `data/train/regular/`
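The mapping that the steps above describe is pure string logic. A minimal sketch, assuming the prefixes listed above are exhaustive (the helper name `original_path` is ours, not the script's):

```python
def original_path(folder_name: str) -> str:
    """Map a prefixed dataset folder name back to its original
    location under data/ (e.g. "test_regular_cm_A_ds_030_ms_001"
    -> "data/test/regular/cm_A_ds_030_ms_001")."""
    prefixes = {
        "test_regular_": "data/test/regular/",
        "test_generalization_": "data/test/generalization/",
        "train_regular_": "data/train/regular/",
    }
    for prefix, target in prefixes.items():
        if folder_name.startswith(prefix):
            return target + folder_name[len(prefix):]
    if folder_name == "stats":
        return "data/stats"
    raise ValueError(f"Unrecognized dataset folder: {folder_name}")

print(original_path("test_regular_cm_A_ds_030_ms_001"))
```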
**When to use reconstruction:**
- You want to replicate the exact structure used in the original CSI-4CAST paper
- Your existing code expects the original folder organization
- You need the complete dataset in the original research structure
**Note:** Reconstruction is only necessary if you need to replicate the CSI-4CAST paper's results exactly. If you're working with individual datasets or don't need the specific folder structure, you can skip reconstruction and work directly with the downloaded datasets.
## File Types
Each dataset folder contains:
- `H_D_pred.pt`: Predicted H_D values (PyTorch tensor)
- `H_U_hist.pt`: Historical H_U values (PyTorch tensor)
- `H_U_pred.pt`: Predicted H_U values (PyTorch tensor)
## Questions & Contributions
For questions or contribution suggestions, you can open a pull request here or on this organization's [GitHub repository](https://github.com/AI4OPT/CSI-4CAST).
## Citation
```bibtex
@misc{cheng2025csi4casthybriddeeplearning,
title={CSI-4CAST: A Hybrid Deep Learning Model for CSI Prediction with Comprehensive Robustness and Generalization Testing},
author={Sikai Cheng and Reza Zandehshahvar and Haoruo Zhao and Daniel A. Garcia-Ulloa and Alejandro Villena-Rodriguez and Carles Navarro ManchΓ³n and Pascal Van Hentenryck},
year={2025},
eprint={2510.12996},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.12996},
}
``` |