---
language:
- en
tags:
- data
- compression
- training
- decompression
---
# Model Card for TEZip-Libtorch

Development of Data Compression Tools for Maintenance and Utilization of Large-scale Research Facilities

## Model Details

- **Learning mechanism**
- **Compression mechanism**
- **Decompression mechanism**


### Model Description


<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/667233cdcbf550c42aeb6bb5/ZL07jSBTov-37luTjP0xT.png" alt="Image 1" width="45%" style="margin-right: 10px;"/>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/667233cdcbf550c42aeb6bb5/ewWnkHT-xYsM_p2axq4kv.png" alt="Image 2" width="45%"/>
</div>

**Learning mechanism**

PredNet or ConvLSTM is used to learn how objects move over time. Following PredNet's training procedure, the training data is first converted to the hkl format and then used for training. The trained model is written to a file, which both the compression mechanism and the decompression mechanism load. Downloading the training data and converting it to hkl are handled by a separate program. Details are given under Usage (steps 1 and 2) below.

**Compression mechanism**

Using the model produced by the learning mechanism, the compressor runs inference on a time series of images and compresses the difference between each original image and its inference result. After the difference is computed, error-bounded quantization, Density-based Spatial Encoding, and Partitioned Entropy Encoding are applied to it. These steps increase the compression ratio. The result is then compressed with the zstd library and written to a binary file (.dat).

The differences and the keyframe images are likewise written to binary files (.dat) using the zstd library.
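
To make the quantization step concrete, here is a minimal Python sketch of a generic error-bounded quantizer applied to the difference between an original frame and its prediction. It is an illustration under stated assumptions, not the project's C++ code: the function names are hypothetical, the bound is treated as an absolute bound, Density-based Spatial Encoding and Partitioned Entropy Encoding are omitted, and `zlib` from the Python standard library stands in for zstd.

```python
import struct
import zlib


def quantize_residual(original, predicted, error_bound):
    """Map each difference (original - predicted) to an integer code.

    With a bin width of 2 * error_bound, rounding to the nearest bin
    keeps the dequantized difference within error_bound of the true one.
    """
    step = 2.0 * error_bound
    return [round((o - p) / step) for o, p in zip(original, predicted)]


def pack_codes(codes):
    """Serialize the codes and compress them.

    Assumption: zlib is used here only as a stand-in for the zstd stage.
    """
    raw = struct.pack(f"<{len(codes)}i", *codes)
    return zlib.compress(raw)


# Toy pixel values: the prediction is close to the original except at index 3.
original = [0.50, 0.52, 0.55, 0.69, 0.60, 0.58]
predicted = [0.50, 0.51, 0.56, 0.60, 0.60, 0.58]

codes = quantize_residual(original, predicted, error_bound=0.02)
payload = pack_codes(codes)
print(codes, len(payload))
```

When the prediction is close to the original, most codes are zero, which is what allows the later encoding stages to shrink the data.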

**Decompression mechanism**

Using the model produced by the learning mechanism and the binary files (.dat) produced by the compression mechanism, the decompressor restores the image sequence that was given to the compressor. Running inference on the keyframes reproduces the inference result obtained during compression. Density-based Spatial Decoding and Partitioned Entropy Decoding are then performed in the reverse order of the compression mechanism to restore the difference. Error-bounded quantization is lossy and has no inverse, so the decompression mechanism does not include a step for it. Finally, the inference result and the difference are added to restore the image, which is written out.
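
The round trip can be sketched in a few lines of Python. This is a toy illustration, not the project's implementation: `predict` is a hypothetical stand-in for the trained PredNet/ConvLSTM model (it simply repeats the keyframe), the bound is treated as an absolute bound, and the spatial and entropy coding stages are left out. The point is that the decoder reruns the same deterministic prediction from the keyframe and adds the dequantized difference, so every restored value stays within the error bound.

```python
def predict(keyframe, steps):
    """Hypothetical stand-in for the trained model: repeat the keyframe."""
    return [list(keyframe) for _ in range(steps)]


def compress(frames, error_bound):
    """Keep the first frame as keyframe and quantize the prediction residuals."""
    keyframe, rest = frames[0], frames[1:]
    step = 2.0 * error_bound
    predictions = predict(keyframe, len(rest))
    codes = [
        [round((x - p) / step) for x, p in zip(frame, pred)]
        for frame, pred in zip(rest, predictions)
    ]
    return keyframe, codes


def decompress(keyframe, codes, error_bound):
    """Rerun the prediction from the keyframe and add the dequantized residuals."""
    step = 2.0 * error_bound
    predictions = predict(keyframe, len(codes))
    restored = [
        [p + c * step for p, c in zip(pred, row)]
        for pred, row in zip(predictions, codes)
    ]
    return [list(keyframe)] + restored


frames = [[0.10, 0.20, 0.30], [0.12, 0.21, 0.33], [0.15, 0.19, 0.36]]
bound = 0.01
keyframe, codes = compress(frames, bound)
restored = decompress(keyframe, codes, bound)

# Largest deviation between any original value and its restored value.
max_error = max(
    abs(a - b) for fa, fb in zip(frames, restored) for a, b in zip(fa, fb)
)
print(max_error <= bound + 1e-9)
```

The small tolerance in the final check only absorbs floating-point rounding.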


- **Developed by:** Mina 
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** Amarjit Singh
- **Model type:** .pt model files 
- **Language(s) (NLP):** Not applicable (implemented in C++ with LibTorch)
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

- **Repository:** https://github.com/mina98/TEZip-Libtorch-Main.git
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

The original source code is available at https://github.com/mina98/TEZip-Libtorch-Main.git

# Neural Compression Model

## Model Description

This model implements neural compression using ConvLSTM and PredNet architectures for efficient video/image data compression and decompression.

## Environment Setup

### Prerequisites
- CUDA 12.1
- LibTorch
- C++ build environment

### Environment Variables
Set the following environment variables before running, replacing `/home/mwahba/cuda-12.1` with the path to your own CUDA 12.1 installation:

```bash
export PATH=/home/mwahba/cuda-12.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/home/mwahba/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

## Usage

### 1. Creating Training Data

**Command:**
```bash
./build/main train_data_create train_data_dir [val_data_dir] save_dir
```

**Parameters:**
- `train_data_dir`: Path to your training data
- `val_data_dir`: (Optional) Path to your validation data  
- `save_dir`: Directory where processed data should be saved

**Example:**
```bash
./build/main train_data_create 2011_09_26/2011_09_26_drive_0027_sync/ data
```

### 2. Training the Model

**Command:**
```bash
./build/main tezip --learn model_dir data_dir --verbose --model model_name
```

**Parameters:**
- `model_dir`: Directory that holds the model files
- `data_dir`: Directory containing the training data created in step 1
- `model_name`: Model architecture (e.g., `convlstm`, `prednet`)
- `--verbose`: Enables detailed logging

**Example:**
```bash
./build/main tezip --learn model/ data/ --verbose --model convlstm
```

### 3. Data Compression

**Command:**
```bash
./build/main tezip --compress model_dir data_dir save_dir --preprocess preprocess_level --window window_size --verbose --mode mode_type --model model_name --bound bound_value
```

**Parameters:**
- `model_dir`: Path to the trained model
- `data_dir`: Directory containing data to be compressed
- `save_dir`: Directory where compressed data will be saved
- `preprocess_level`: Preprocessing level (e.g., `3`)
- `window_size`: Window size for processing (e.g., `5`)
- `mode_type`: Error-bound mode (e.g., `pwrel`)
- `model_name`: Model type (`convlstm`, `prednet`)
- `bound_value`: Error bound used for quantization (e.g., `0.000001`)

**Example:**
```bash
./build/main tezip --compress model_convlstm 2011_09_26/2011_09_26_drive_0027_sync/image_02 comp_0.1/ --preprocess 3 --window 5 --verbose --mode pwrel --model convlstm --bound 0.1
```
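
The effect of `--mode pwrel --bound 0.1` can be illustrated with a short Python sketch. It assumes the usual meaning of a point-wise relative error bound in error-bounded lossy compressors, where each value may deviate by at most `bound` times its own magnitude. The exact definition used by this project is not confirmed here, and both function names are hypothetical.

```python
def pointwise_relative_bounds(values, bound):
    """Largest absolute error allowed for each value under a point-wise relative bound."""
    return [bound * abs(v) for v in values]


def within_pwrel(original, restored, bound):
    """Check that every restored value respects the point-wise relative bound."""
    return all(abs(o - r) <= bound * abs(o) for o, r in zip(original, restored))


pixels = [10.0, 100.0, 200.0]
allowed = pointwise_relative_bounds(pixels, bound=0.1)

# Each restored pixel deviates by less than 10 % of its original value.
ok = within_pwrel(pixels, [10.5, 95.0, 215.0], bound=0.1)

# The first pixel deviates by 2.0, more than the 1.0 it is allowed.
too_far = within_pwrel(pixels, [12.0, 100.0, 200.0], bound=0.1)
print(allowed, ok, too_far)
```

Under this reading, bright pixels tolerate larger absolute errors than dark ones, so a single bound value adapts to the local intensity.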

### 4. Data Decompression

**Command:**
```bash
./build/main tezip --uncompress model_dir compressed_data_dir save_dir --verbose --model model_name
```

**Parameters:**
- `model_dir`: Path to the trained model
- `compressed_data_dir`: Directory containing compressed data
- `save_dir`: Directory where decompressed data will be saved
- `model_name`: Model type (e.g., `convlstm`)
- `--verbose`: Enables detailed logging

**Example:**
```bash
./build/main tezip --uncompress model_convlstm comp_0.1/ decomp_0.1 --verbose --model convlstm
```
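
For repeated experiments with different bounds, the compression and decompression calls can be wrapped in a small driver. The Python sketch below only assembles the same arguments as the example commands in steps 3 and 4. The helper names and the `comp_<bound>/` and `decomp_<bound>` naming scheme are assumptions made for illustration, and `run_round_trip` requires the binary to be built at `./build/main`.

```python
import subprocess

BINARY = "./build/main"


def compress_cmd(model_dir, image_dir, out_dir, bound,
                 model="convlstm", window=5, preprocess=3, mode="pwrel"):
    """Build the argument list for the compression example in step 3."""
    return [
        BINARY, "tezip", "--compress", model_dir, image_dir, out_dir,
        "--preprocess", str(preprocess), "--window", str(window),
        "--verbose", "--mode", mode, "--model", model, "--bound", str(bound),
    ]


def uncompress_cmd(model_dir, comp_dir, out_dir, model="convlstm"):
    """Build the argument list for the decompression example in step 4."""
    return [
        BINARY, "tezip", "--uncompress", model_dir, comp_dir, out_dir,
        "--verbose", "--model", model,
    ]


def run_round_trip(model_dir, image_dir, bound):
    """Compress and then decompress one image directory (needs the built binary)."""
    comp_dir = f"comp_{bound}/"
    decomp_dir = f"decomp_{bound}"
    subprocess.run(compress_cmd(model_dir, image_dir, comp_dir, bound), check=True)
    subprocess.run(uncompress_cmd(model_dir, comp_dir, decomp_dir), check=True)


# Print the commands without running them.
print(" ".join(compress_cmd(
    "model_convlstm", "2011_09_26/2011_09_26_drive_0027_sync/image_02",
    "comp_0.1/", 0.1)))
print(" ".join(uncompress_cmd("model_convlstm", "comp_0.1/", "decomp_0.1")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.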

## Model Architecture

### Supported Models
- **ConvLSTM**: Convolutional LSTM for spatiotemporal sequence modeling
- **PredNet**: Predictive coding network for video prediction

### Key Features
- Neural compression and decompression
- Support for video/image sequences
- Configurable preprocessing levels
- Adjustable compression bounds
- Window-based processing

## File Structure

### Core Components
- **`main.cpp`**: Master file for running train_data_create and tezip modules
- **`tezip.cpp`**: Master module integrating training, compression, and decompression
- **`train_data_create.cpp`**: Generates hickle data for training

### Model Implementation
- **`convlstm.cpp`**: ConvLSTM layer implementation
- **`conv_lstm_cell.cpp`**: ConvLSTM cell definition
- **`seq2seq.cpp`**: ConvLSTM model instance creation
- **`train_convlstm.cpp`**: ConvLSTM training function
- **`prednet.cpp`**: PredNet model implementation
- **`convlstmcell.cpp`**: ConvLSTM cell for PredNet
- **`train.cpp`**: PredNet training function

### Data Processing
- **`the_data.cpp`**: Sequence generation for LibTorch models
- **`manual_data_loader.cpp`**: Data loader functionality simulation
- **`compress.cpp`**: Compression functionality implementation
- **`decompress.cpp`**: Decompression functionality implementation

## Technical Requirements

### Dependencies
- CUDA 12.1
- LibTorch
- C++ compiler with C++14 support or higher

### Hardware Requirements
- NVIDIA GPU with CUDA support
- Sufficient GPU memory for model training and inference

## Performance

### Compression Efficiency
- Configurable compression bounds (e.g., 0.1, 0.000001)
- Adaptive preprocessing levels
- Window-based processing for memory efficiency

### Processing Modes
- **pwrel**: Point-wise relative error bound mode
- Additional modes may be available depending on implementation

## Evaluation


![image/png](https://cdn-uploads.huggingface.co/production/uploads/667233cdcbf550c42aeb6bb5/pA3-gGzzvQhtTDa2SRcwX.png)


### Testing Data, Factors & Metrics

- Compression ratio
- Compression/decompression time
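
Both metrics can be measured with a few lines of Python. The sketch below assumes the conventional definition of compression ratio (original size divided by compressed size) and compares the total file sizes of two directories. How the figures reported on this card were measured is not stated, and the helper names are hypothetical.

```python
import os
import time


def directory_size(path):
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def compression_ratio(original_dir, compressed_dir):
    """Original size divided by compressed size (higher is better)."""
    compressed = directory_size(compressed_dir)
    if compressed == 0:
        raise ValueError("compressed directory is empty")
    return directory_size(original_dir) / compressed


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `compression_ratio("2011_09_26/2011_09_26_drive_0027_sync/image_02", "comp_0.1/")` would compare the input and output directories from the compression example, and `timed` can wrap whatever call launches the compression or decompression command.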

#### Summary


## Citation

Mina Yousef, Amarjit Singh, and Kento Sato. 2025. Refactoring TEZip: Integrating Python-Based Predictive Compression into an HPC C++/LibTorch Environment. In Proceedings of the 34th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '25), July 20-23, 2025, Notre Dame, IN, USA. Poster presentation