---
language:
- en
tags:
- data
- compression
- training
- decompression
---
# Model Card for Model ID
Development of Data Compression Tools for Maintenance and Utilization of Large-scale Research Facilities
## Model Details
- **Learning mechanism**
- **Compression mechanism**
- **Decompression mechanism**
### Model Description
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/667233cdcbf550c42aeb6bb5/ZL07jSBTov-37luTjP0xT.png" alt="Image 1" width="45%" style="margin-right: 10px;"/>
<img src="https://cdn-uploads.huggingface.co/production/uploads/667233cdcbf550c42aeb6bb5/ewWnkHT-xYsM_p2axq4kv.png" alt="Image 2" width="45%"/>
</div>
Learning mechanism
PredNet or ConvLSTM is used to learn how an object's movement changes over time. Following the PredNet training procedure, the training data is first converted to the hkl format and then used for training. The trained model is written to a file, which both the compression mechanism and the decompression mechanism load. A separate program downloads the training data and converts it to hkl. The details are explained in the “Usage” section below.
Compression mechanism
Using the model output by the learning mechanism, the compression mechanism runs inference on the time-series images and compresses the difference between each original image and its inference result. After the difference is computed, error-bounded quantization, Density-based Spatial Encoding, and Partitioned Entropy Encoding are applied in that order. These steps increase the compression ratio of the final compression stage, which uses the zstd library and writes the result to a binary file (.dat).
The differences and the keyframe images are also written to binary files (.dat) using the zstd library.
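The quantization step can be illustrated with a small sketch. It assumes an absolute error bound and a uniform bin width of twice the bound; the function name `quantize_residual` and the flat `std::vector<float>` frame layout are hypothetical and not taken from the repository.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the TEZip sources): snap each residual
// (original - predicted) to the nearest multiple of 2 * bound, so the
// quantized residual differs from the true residual by at most `bound`.
std::vector<float> quantize_residual(const std::vector<float>& original,
                                     const std::vector<float>& predicted,
                                     float bound) {
    assert(original.size() == predicted.size());
    assert(bound > 0.0f);
    const float step = 2.0f * bound;  // bin width; rounding error <= step / 2
    std::vector<float> quantized(original.size());
    for (std::size_t i = 0; i < original.size(); ++i) {
        const float residual = original[i] - predicted[i];
        quantized[i] = std::round(residual / step) * step;
    }
    return quantized;
}
```

When the prediction is good, most residuals fall into the zero bin, which produces the long runs of identical values that the later encoding stages and zstd can exploit.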
Decompression mechanism
Using the model output by the learning mechanism and the binary files (.dat) output by the compression mechanism, the decompression mechanism restores the group of images that was given to the compression mechanism. Running inference on the keyframes reproduces the inference results of the compression mechanism. Density-based Spatial Decoding and Partitioned Entropy Decoding are applied in the reverse order of the compression mechanism to restore the quantized difference. Because error-bounded quantization is lossy, it has no counterpart in the decompression mechanism. The inference result and the difference are then added to restore the image, which is written to the output.
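The final addition step can be sketched as follows. This is a minimal illustration, assuming frames are flat `std::vector<float>` buffers and that the decoded difference already holds the quantized values; `restore_frame` and `max_abs_error` are hypothetical names, not functions from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: restored = prediction + decoded (quantized) residual.
std::vector<float> restore_frame(const std::vector<float>& predicted,
                                 const std::vector<float>& residual) {
    assert(predicted.size() == residual.size());
    std::vector<float> restored(predicted.size());
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        restored[i] = predicted[i] + residual[i];
    }
    return restored;
}

// Largest absolute difference between two frames, useful for checking that
// the restored frame stays within the error bound chosen at compression time.
float max_abs_error(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

Because the quantization is not undone, the restored frame generally differs from the original, but under an absolute bound the difference reported by `max_abs_error` should not exceed that bound.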
- **Developed by:** Mina
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** Amarjit Singh
- **Model type:** .pt model files
- **Language(s) (NLP):** C++ (LibTorch)
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Model Sources [optional]
- **Repository:** https://github.com/mina98/TEZip-Libtorch-Main.git
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
The original source code is available at https://github.com/mina98/TEZip-Libtorch-Main.git
# Neural Compression Model
## Model Description
This model implements neural compression using ConvLSTM and PredNet architectures for efficient video/image data compression and decompression.
## Environment Setup
### Prerequisites
- CUDA 12.1
- LibTorch
- C++ build environment
### Environment Variables
Set the following environment variables before running, adjusting the CUDA path to match your installation:
```bash
export PATH=/home/mwahba/cuda-12.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/home/mwahba/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
## Usage
### 1. Creating Training Data
**Command:**
```bash
./build/main train_data_create train_data_dir val_data_dir(optional) save_dir
```
**Parameters:**
- `train_data_dir`: Path to your training data
- `val_data_dir`: (Optional) Path to your validation data
- `save_dir`: Directory where processed data should be saved
**Example:**
```bash
./build/main train_data_create 2011_09_26/2011_09_26_drive_0027_sync/ data
```
### 2. Training the Model
**Command:**
```bash
./build/main tezip --learn model_dir data_dir --verbose --model model_name
```
**Parameters:**
- `model_dir`: Directory containing the model; the trained model is saved here
- `data_dir`: Path to the training data created in step 1
- `model_name`: Model architecture (e.g., `convlstm`, `prednet`)
- `--verbose`: Enables detailed logging
**Example:**
```bash
./build/main tezip --learn model/ data/ --verbose --model convlstm
```
### 3. Data Compression
**Command:**
```bash
./build/main tezip --compress model_dir data_dir save_dir --preprocess preprocess_level --window window_size --verbose --mode mode_type --model model_name --bound bound_value
```
**Parameters:**
- `model_dir`: Path to the trained model
- `data_dir`: Directory containing data to be compressed
- `save_dir`: Directory where compressed data will be saved
- `preprocess_level`: Preprocessing level (e.g., `3`)
- `window_size`: Window size for processing (e.g., `5`)
- `mode_type`: Compression mode (e.g., `pwrel`)
- `model_name`: Model type (`convlstm`, `prednet`)
- `bound_value`: Compression bound (e.g., `0.000001`)
**Example:**
```bash
./build/main tezip --compress model_convlstm 2011_09_26/2011_09_26_drive_0027_sync/image_02 comp_0.1/ --preprocess 3 --window 5 --verbose --mode pwrel --model convlstm --bound 0.1
```
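To make the role of `--window` more concrete, the sketch below splits a frame sequence into windows. It assumes that each window starts with a keyframe stored as-is and that `window_size` counts the predicted frames that follow it; both the interpretation and the name `keyframe_indices` are assumptions, not confirmed by the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the indices of the frames kept as keyframes,
// assuming every keyframe is followed by `window_size` predicted frames.
std::vector<std::size_t> keyframe_indices(std::size_t frame_count,
                                          std::size_t window_size) {
    assert(window_size > 0);
    std::vector<std::size_t> keys;
    for (std::size_t i = 0; i < frame_count; i += window_size + 1) {
        keys.push_back(i);
    }
    return keys;
}
```

Under this reading, 12 frames with a window size of 5 give keyframes at indices 0 and 6. A larger window stores fewer keyframes but asks the model to predict further ahead, which tends to enlarge the residuals.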
### 4. Data Decompression
**Command:**
```bash
./build/main tezip --uncompress model_dir compressed_data_dir save_dir --verbose --model model_name
```
**Parameters:**
- `model_dir`: Path to the trained model
- `compressed_data_dir`: Directory containing compressed data
- `save_dir`: Directory where decompressed data will be saved
- `model_name`: Model type (e.g., `convlstm`)
- `--verbose`: Enables detailed logging
**Example:**
```bash
./build/main tezip --uncompress model_convlstm comp_0.1/ decomp_0.1 --verbose --model convlstm
```
## Model Architecture
### Supported Models
- **ConvLSTM**: Convolutional LSTM for spatiotemporal sequence modeling
- **PredNet**: Predictive coding network for video prediction
### Key Features
- Neural compression and decompression
- Support for video/image sequences
- Configurable preprocessing levels
- Adjustable compression bounds
- Window-based processing
## File Structure
### Core Components
- **`main.cpp`**: Master file for running train_data_create and tezip modules
- **`tezip.cpp`**: Master module integrating training, compression, and decompression
- **`train_data_create.cpp`**: Generates hickle data for training
### Model Implementation
- **`convlstm.cpp`**: ConvLSTM layer implementation
- **`conv_lstm_cell.cpp`**: ConvLSTM cell definition
- **`seq2seq.cpp`**: ConvLSTM model instance creation
- **`train_convlstm.cpp`**: ConvLSTM training function
- **`prednet.cpp`**: PredNet model implementation
- **`convlstmcell.cpp`**: ConvLSTM cell for PredNet
- **`train.cpp`**: PredNet training function
### Data Processing
- **`the_data.cpp`**: Sequence generation for LibTorch models
- **`manual_data_loader.cpp`**: Data loader functionality simulation
- **`compress.cpp`**: Compression functionality implementation
- **`decompress.cpp`**: Decompression functionality implementation
## Technical Requirements
### Dependencies
- CUDA 12.1
- LibTorch
- C++ compiler with C++14 support or higher
### Hardware Requirements
- NVIDIA GPU with CUDA support
- Sufficient GPU memory for model training and inference
## Performance
### Compression Efficiency
- Configurable compression bounds (e.g., 0.1, 0.000001)
- Adaptive preprocessing levels
- Window-based processing for memory efficiency
### Processing Modes
- **pwrel**: Point-wise (per-pixel) relative error-bound mode
- Additional modes may be available depending on implementation
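One common reading of a per-pixel relative bound is that each restored value may deviate from the original by at most `bound` times the magnitude of the original value. The check below is a minimal sketch of that definition; `within_pointwise_relative_bound` is a hypothetical name, and the way the tool enforces the bound internally may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: true if |original - restored| <= bound * |original|
// holds for every element. A zero-valued original must be restored exactly.
bool within_pointwise_relative_bound(const std::vector<float>& original,
                                     const std::vector<float>& restored,
                                     float bound) {
    assert(original.size() == restored.size());
    for (std::size_t i = 0; i < original.size(); ++i) {
        const float allowed = bound * std::fabs(original[i]);
        if (std::fabs(original[i] - restored[i]) > allowed) {
            return false;
        }
    }
    return true;
}
```

With `--bound 0.1`, a pixel value of 100 could be restored anywhere in the range 90 to 110, while a value of 10 would have to stay within 9 to 11.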
## Evaluation

### Testing Data, Factors & Metrics
- Compression ratio
- Compression/decompression time
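Both metrics can be computed with a few lines of standard C++. The sketch below assumes the compression ratio is defined as original size divided by compressed size and that wall-clock time is the quantity of interest; `compression_ratio` and `elapsed_seconds` are hypothetical helpers, not part of the repository.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: ratio > 1 means the compressed data is smaller.
double compression_ratio(std::uintmax_t original_bytes,
                         std::uintmax_t compressed_bytes) {
    assert(compressed_bytes > 0);
    return static_cast<double>(original_bytes) /
           static_cast<double>(compressed_bytes);
}

// Hypothetical helper: run `fn` once and return the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
template <typename Fn>
double elapsed_seconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, 1000 bytes of input compressed to 250 bytes gives a ratio of 4.0, and wrapping the compression or decompression call in a lambda passed to `elapsed_seconds` yields its run time.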
#### Summary
## Citation
Mina Yousef, Amarjit Singh, and Kento Sato. 2025. Refactoring TEZip: Integrating Python-Based Predictive Compression into an HPC C++/LibTorch Environment. In Proceedings of the 34th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '25), July 20-23, 2025, Notre Dame, IN, USA. Poster presentation.