SIKAI-C committed c509f9f (verified, parent: 4ee650e) · Update README.md

pinned: false
---
# CSI-4CAST Organization

Welcome to the CSI-4CAST organization on Hugging Face! This organization hosts datasets for CSI prediction research.

## Dataset Structure

The datasets are organized in the following structure:

```
data/
├── stats/
│   ├── fdd/
│   │   └── normalization_stats.pkl
│   └── tdd/
│       └── normalization_stats.pkl
├── test/
│   ├── generalization/
│   │   ├── cm_A_ds_030_ms_001/
│   │   │   ├── H_D_pred.pt
│   │   │   ├── H_U_hist.pt
│   │   │   └── H_U_pred.pt
│   │   ├── cm_A_ds_030_ms_003/
│   │   ├── cm_A_ds_030_ms_006/
│   │   ├── ...
│   │   ├── cm_B_ds_030_ms_001/
│   │   ├── cm_B_ds_030_ms_003/
│   │   ├── ...
│   │   ├── cm_C_ds_030_ms_001/
│   │   ├── cm_C_ds_030_ms_003/
│   │   ├── ...
│   │   ├── cm_D_ds_030_ms_001/
│   │   ├── cm_D_ds_030_ms_003/
│   │   └── ...
│   └── regular/
│       ├── cm_A_ds_030_ms_001/
│       │   ├── H_D_pred.pt
│       │   ├── H_U_hist.pt
│       │   └── H_U_pred.pt
│       ├── cm_A_ds_030_ms_010/
│       ├── cm_A_ds_030_ms_030/
│       ├── ...
│       ├── cm_C_ds_030_ms_001/
│       ├── cm_C_ds_030_ms_010/
│       ├── cm_C_ds_030_ms_030/
│       ├── ...
│       ├── cm_D_ds_030_ms_001/
│       ├── cm_D_ds_030_ms_010/
│       ├── cm_D_ds_030_ms_030/
│       └── ...
└── train/
    └── regular/
        ├── cm_A_ds_030_ms_001/
        │   ├── H_D_pred.pt
        │   ├── H_U_hist.pt
        │   └── H_U_pred.pt
        ├── cm_A_ds_030_ms_010/
        ├── cm_A_ds_030_ms_030/
        ├── ...
        ├── cm_C_ds_030_ms_001/
        ├── cm_C_ds_030_ms_010/
        ├── cm_C_ds_030_ms_030/
        ├── ...
        ├── cm_D_ds_030_ms_001/
        ├── cm_D_ds_030_ms_010/
        ├── cm_D_ds_030_ms_030/
        └── ...
```

## Dataset Organization Strategy

Our datasets are organized using a **convenience-first naming strategy** on Hugging Face. Instead of uploading the entire data folder as one large dataset, we have split it into individual datasets with descriptive names. This approach allows users to:

- **Download only the specific data they need** (e.g., just one configuration or test type)
- **Easily identify datasets** by their purpose and configuration
- **Reduce download time and storage** by avoiding unnecessary data
- **Enable selective loading** for different research scenarios

### Available Datasets

#### Statistics Dataset
- **stats**: Contains normalization statistics for the FDD and TDD configurations

#### Test Datasets
- **test_regular_***: Regular test data for various configurations
- **test_generalization_***: Generalization test data with extended parameter ranges

#### Training Datasets
- **train_regular_***: Training data for various configurations

### Dataset Naming Convention

The datasets follow this naming pattern:
- `[train/test]_[regular/generalization]_cm_[A/B/C/D/E]`: Dataset type and channel model
- `cm_[A/B/C/D/E]`: Channel model (CDL-A, CDL-B, CDL-C, CDL-D, or CDL-E)
- `ds_[030/050/100/200/300/400]`: Delay spread in ns
- `ms_[001/003/006/009/010/012/015/018/021/024/027/030/033/036/039/042/045]`: User speed in m/s

**Examples:**
- `test_regular_cm_A_ds_030_ms_001`: Regular test data for the CDL-A model, 30 ns delay spread, 1 m/s speed
- `train_regular_cm_C_ds_100_ms_030`: Training data for the CDL-C model, 100 ns delay spread, 30 m/s speed
- `test_generalization_cm_B_ds_200_ms_015`: Generalization test data for the CDL-B model, 200 ns delay spread, 15 m/s speed
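Because the naming pattern is fixed, a dataset name can be parsed mechanically. A minimal sketch follows; the `parse_dataset_name` helper is hypothetical and not part of the released tooling:

```python
import re

# Hypothetical helper (not part of the released scripts): split a dataset
# name that follows the convention above into its components.
_NAME_RE = re.compile(
    r"^(?P<split>train|test)_(?P<kind>regular|generalization)"
    r"_cm_(?P<cm>[A-E])_ds_(?P<ds>\d{3})_ms_(?P<ms>\d{3})$"
)

def parse_dataset_name(name: str) -> dict:
    """Return split, kind, channel model, delay spread (ns), and speed (m/s)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized dataset name: {name!r}")
    parts = match.groupdict()
    parts["ds"] = int(parts["ds"])  # delay spread in ns
    parts["ms"] = int(parts["ms"])  # user speed in m/s
    return parts
```

For example, `parse_dataset_name("test_regular_cm_A_ds_030_ms_001")` yields the split, test type, channel model, delay spread, and speed as separate fields.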

## Usage

### Downloading Datasets

You can download individual datasets using the Hugging Face Hub:

```python
from huggingface_hub import snapshot_download

# Download the stats dataset
snapshot_download(repo_id="CSI-4CAST/stats", repo_type="dataset")

# Download a specific CSI prediction dataset
snapshot_download(repo_id="CSI-4CAST/test_regular_cm_A_ds_030_ms_001", repo_type="dataset")
```

### Downloading All Datasets

To download all available datasets at once, use the provided `download.py` script:

```bash
# Download all datasets to a 'datasets' folder
python3 download.py

# Download to a custom directory
python3 download.py --output-dir my_datasets

# Dry run to test without downloading (creates empty placeholder files)
python3 download.py --dry-run
```

The script will automatically:
- Check for all possible dataset combinations
- Download only the datasets that exist on Hugging Face
- Create an organized folder structure with descriptive names
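The "probe every combination, download what exists" idea can be sketched as below. This is an illustration built from the naming convention, not the actual `download.py` logic; the candidate grid and helper names are assumptions:

```python
from itertools import product

# Candidate grid taken from the naming convention above; the real
# download.py may enumerate combinations differently.
PREFIXES = ["train_regular", "test_regular", "test_generalization"]
CHANNEL_MODELS = "ABCDE"
DELAY_SPREADS = ["030", "050", "100", "200", "300", "400"]
SPEEDS = ["001", "003", "006", "009", "010", "012", "015", "018",
          "021", "024", "027", "030", "033", "036", "039", "042", "045"]

def candidate_repo_ids() -> list:
    """Every dataset name the script could probe, plus the stats dataset."""
    ids = ["CSI-4CAST/stats"]
    for prefix, cm, ds, ms in product(PREFIXES, CHANNEL_MODELS, DELAY_SPREADS, SPEEDS):
        ids.append(f"CSI-4CAST/{prefix}_cm_{cm}_ds_{ds}_ms_{ms}")
    return ids

def download_existing(repo_ids):
    # Imported lazily so the candidate grid is usable without huggingface_hub.
    from huggingface_hub import snapshot_download
    from huggingface_hub.utils import RepositoryNotFoundError
    for repo_id in repo_ids:
        try:
            snapshot_download(repo_id=repo_id, repo_type="dataset")
        except RepositoryNotFoundError:
            pass  # this combination was never uploaded; skip it

# download_existing(candidate_repo_ids())  # requires network access
```

Not every grid point exists on the Hub, which is why the skip-on-missing behavior matters.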

### Reconstructing Original Folder Structure

While our naming strategy makes it easy to download specific datasets, you might want to work with the complete dataset in its original folder structure. For this purpose, we provide the `reconstruction.py` script, which restores the original organization:

```bash
python3 reconstruction.py --input-dir datasets --output-dir data
```

This script will:
1. Remove the prefixes (`test_regular_`, `test_generalization_`, `train_regular_`)
2. Organize the folders back into the original data structure
3. Create the proper hierarchy: `data/stats/`, `data/test/regular/`, `data/test/generalization/`, `data/train/regular/`
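The prefix-to-path mapping behind these steps amounts to the following sketch; it illustrates the idea and is not the actual `reconstruction.py` code:

```python
# Sketch of the mapping reconstruction.py performs; not the script itself.
_PREFIX_TO_SUBDIR = {
    "test_regular_": "test/regular",
    "test_generalization_": "test/generalization",
    "train_regular_": "train/regular",
}

def original_path(dataset_name: str) -> str:
    """Map a flat dataset name back to its place in the original data/ tree."""
    if dataset_name == "stats":
        return "data/stats"
    for prefix, subdir in _PREFIX_TO_SUBDIR.items():
        if dataset_name.startswith(prefix):
            return f"data/{subdir}/{dataset_name[len(prefix):]}"
    raise ValueError(f"unrecognized dataset name: {dataset_name!r}")
```

For instance, `test_regular_cm_A_ds_030_ms_001` maps back to `data/test/regular/cm_A_ds_030_ms_001`.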

**When to use reconstruction:**
- You want to replicate the exact structure used in the original CSI-4CAST paper
- Your existing code expects the original folder organization
- You need the complete dataset in the original research structure

**Note:** Reconstruction is only necessary if you need to replicate the CSI-4CAST paper's results exactly. If you are working with individual datasets or do not need the specific folder structure, you can skip reconstruction and work directly with the downloaded datasets.

## File Types

Each dataset folder contains:
- `H_D_pred.pt`: Predicted H_D values (PyTorch tensor)
- `H_U_hist.pt`: Historical H_U values (PyTorch tensor)
- `H_U_pred.pt`: Predicted H_U values (PyTorch tensor)

## Questions & Contributions

For further questions or contribution suggestions, please open a pull request on this organization's [GitHub repository](https://github.com/AI4OPT/CSI-4CAST).