---
license: mit
datasets:
- SIA-IDE/MBHM
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---
<div align="center">
<a href="https://github.com/SIA-IDE/BearLLM">
<img src="https://raw.githubusercontent.com/SIA-IDE/BearLLM/refs/heads/main/docs/images/logo.svg" width="200" alt="logo"/>
</a>
<h1>BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation</h1>

<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/Python-3.12-blue"></a>
<a href="https://pytorch.org/"><img alt="PyTorch" src="https://img.shields.io/badge/Pytorch-latest-orange"></a>
<a href="https://arxiv.org/abs/2408.11281"><img alt="arXiv" src="https://img.shields.io/badge/Paper-arXiv-B31B1B"></a>
<a href="https://huggingface.co/datasets/SIA-IDE/MBHM"><img alt="Dataset" src="https://img.shields.io/badge/Dataset-🤗-FFFDF5"></a>
<a href="https://github.com/SIA-IDE/BearLLM"><img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/SIA-IDE/BearLLM"></a>
</div>

<h4 align="center">
    <p>
        <b>English</b> |
        <a href="https://github.com/SIA-IDE/BearLLM/blob/main/docs/README_zh.md">简体中文 (Simplified Chinese)</a>
    </p>
</h4>

## 🔥 NEWS
- **[2025-04-11]** 🎉 The [AAAI-25 Proceedings](https://aaai.org/proceeding/aaai-39-2025/) are now officially published! Our [conference paper](https://ojs.aaai.org/index.php/AAAI/article/view/34188) is included. We welcome you to read and cite it!
- **[2025-03-06]** 🌟 The complete dataset and code are now officially open source!
- **[2024-12-11]** ⏫ We are now working on making the code of BearLLM public. Stay tuned!
- **[2024-12-10]** 🎉 The BearLLM paper is accepted by the Thirty-Ninth AAAI Conference on Artificial Intelligence ([AAAI-25](https://aaai.org/conference/aaai/aaai-25/)).
- **[2024-08-21]** 📝 The preprint of the BearLLM paper is available on arXiv. Check the [paper page](https://arxiv.org/abs/2408.11281) for more details.

## 📅 TODO
- [ ] Improve related comments and documentation.
- [x] Upload the complete BearLLM demo code.
- [x] Upload the health management corpus of the MBHM dataset.
- [x] Collect the code for pre-training and fine-tuning BearLLM.
- [x] Collect the code of BearLLM's classification network and other comparison models.
- [x] Upload the vibration signal portion of the MBHM dataset.
- [x] Upload the vibration signal portion of the MBHM dataset.

## 📚 Introduction
The [MBHM](https://huggingface.co/datasets/SIA-IDE/MBHM) dataset is the first multimodal dataset designed for the study of bearing health management. It consists of two parts: vibration signals and a health management corpus. The vibration signals and their condition information are drawn from 9 publicly available datasets, and the collection is still being continuously updated and improved. Because it covers thousands of working conditions, it poses harder challenges for identification models and better reflects real-world usage scenarios.

[BearLLM](https://github.com/SIA-IDE/BearLLM) is a prior knowledge-enhanced bearing health management framework with a unified vibration signal representation. The framework transforms the signal under test into the frequency domain so that its spectral differences from a reference vibration signal recorded under fault-free conditions can be identified effectively. By aligning the vibration signal with the fault semantic embedding, a language model fine-tuned with low computational overhead produces unified natural-language responses for various health management tasks. Experiments show that the framework achieves leading performance across thousands of working conditions.
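
To make the frequency-domain comparison concrete, here is a minimal, self-contained sketch of the general idea. It is not BearLLM's actual implementation: it computes the single-sided magnitude spectrum of the test signal and of the fault-free reference with a naive DFT and returns their bin-wise difference. The function names are hypothetical, and a real pipeline would use an FFT (for example from NumPy or PyTorch) together with whatever normalization the framework actually applies.

```python
import cmath
import math
from typing import List


def magnitude_spectrum(signal: List[float]) -> List[float]:
    """Single-sided DFT magnitude (bins 0..N//2), normalized by N.

    Naive O(N^2) DFT for illustration only; use an FFT for real signal lengths.
    """
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(abs(acc) / n)
    return spectrum


def spectral_difference(vib: List[float], ref: List[float]) -> List[float]:
    """Bin-wise difference between the test spectrum and the fault-free reference spectrum."""
    if len(vib) != len(ref):
        raise ValueError("vib and ref must have the same number of samples")
    return [v - r for v, r in zip(magnitude_spectrum(vib), magnitude_spectrum(ref))]


if __name__ == "__main__":
    n = 64
    # Reference: a pure tone at bin 3. Test signal: the same tone plus a weaker one at bin 10.
    ref = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
    vib = [r + 0.5 * math.sin(2 * math.pi * 10 * t / n) for t, r in enumerate(ref)]
    diff = spectral_difference(vib, ref)
    print(max(range(len(diff)), key=diff.__getitem__))  # -> 10
```

Subtracting the reference spectrum cancels the component shared with the healthy signal, so only the extra component (here, bin 10) stands out. This is the intuition behind comparing against fault-free data rather than inspecting the test spectrum alone.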

## 💻 Requirements

The code is implemented in Python 3.12. The required packages are listed in `requirements.txt` and can be installed with the following commands:

```bash
conda create --name bearllm python=3.12
conda activate bearllm
pip install -r requirements.txt
```


## 🚀 Quick Start

### 1. Download Demo Data / Use Your Own Data

First, download `demo_data.json` from the [MBHM](https://huggingface.co/datasets/SIA-IDE/MBHM/tree/main) dataset. Users in mainland China can use the [mirror link](https://hf-mirror.com/datasets/SIA-IDE/MBHM/tree/main) to speed up the download.

Alternatively, you can build your own test data in the same format:
- `instruction`: Text instruction for the health management task.
- `vib_data`: Vibration signal to be identified, with a required duration of 1 second.
- `ref_data`: Fault-free reference vibration signal, with a required duration of 1 second.

```json
{
    "instruction": "xxx.",
    "vib_data": [1.0, 0.0, 1.0, ...],
    "ref_data": [1.0, 0.0, 1.0, ...]
}
```
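
If you assemble your own test data, a small helper like the one below can catch format mistakes early. It is a sketch under stated assumptions: `make_record` and `write_record` are hypothetical names, and `sample_rate` (in Hz) is a parameter you must supply yourself, because the required 1-second duration implies each signal has exactly `sample_rate` samples. It writes a single object in the layout shown above; check the real `demo_data.json` to see whether records are stored individually or in a list.

```python
import json
from pathlib import Path
from typing import Dict, List, Union


def make_record(instruction: str, vib_data: List[float], ref_data: List[float],
                sample_rate: int) -> Dict[str, Union[str, List[float]]]:
    """Build one test record with the fields `instruction`, `vib_data`, `ref_data`.

    Assumption (hypothetical parameter): a 1-second signal sampled at
    `sample_rate` Hz contains exactly `sample_rate` samples.
    """
    if not instruction.strip():
        raise ValueError("instruction must be a non-empty string")
    for name, data in (("vib_data", vib_data), ("ref_data", ref_data)):
        if len(data) != sample_rate:
            raise ValueError(f"{name} has {len(data)} samples, expected {sample_rate} (1 s)")
    return {
        "instruction": instruction,
        "vib_data": [float(x) for x in vib_data],
        "ref_data": [float(x) for x in ref_data],
    }


def write_record(record: Dict, path: Union[str, Path]) -> None:
    """Serialize the record as UTF-8 JSON (json.dumps never emits trailing commas)."""
    Path(path).write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
```

Validating the sample count up front is cheaper than discovering a length mismatch later, when the vibration and reference spectra are compared.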

### 2. Download Weights

You can download the pre-trained weights of [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/tree/main) from Hugging Face.

Additionally, you need to download the weights of [BearLLM](https://huggingface.co/SIA-IDE/BearLLM/tree/main).
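
Both sets of weights, and the demo data, can also be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed (this README does not list it) and uses its `snapshot_download` and `hf_hub_download` functions. The target folders follow the layout recommended in the next step, and it assumes the BearLLM files sit at the root of the `SIA-IDE/BearLLM` repository. Setting the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before `huggingface_hub` is imported is one way to route downloads through the mirror. The names `DOWNLOADS`, `DEMO_DATA`, and `download_all` are hypothetical.

```python
import os
from pathlib import Path

# Hypothetical download plan; folder names follow the recommended BearLLM/ layout.
DOWNLOADS = {
    "qwen_weights": {"repo_id": "Qwen/Qwen2.5-1.5B-Instruct", "repo_type": "model"},
    "bearllm_weights": {"repo_id": "SIA-IDE/BearLLM", "repo_type": "model"},
}
DEMO_DATA = {"repo_id": "SIA-IDE/MBHM", "repo_type": "dataset", "filename": "demo_data.json"}


def download_all(root: str = ".", use_mirror: bool = False) -> None:
    """Download the weights and the demo data under `root` (requires huggingface_hub)."""
    if use_mirror:
        # Must be set before huggingface_hub is first imported in this process.
        os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import hf_hub_download, snapshot_download

    for folder, spec in DOWNLOADS.items():
        snapshot_download(repo_id=spec["repo_id"], repo_type=spec["repo_type"],
                          local_dir=Path(root) / folder)
    hf_hub_download(repo_id=DEMO_DATA["repo_id"], filename=DEMO_DATA["filename"],
                    repo_type=DEMO_DATA["repo_type"], local_dir=Path(root) / "mbhm_dataset")
```

Calling `download_all("BearLLM")` would populate the three folders. `snapshot_download` fetches every file in a repository, so expect the Qwen download to be the largest.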

### 3. Organize Files

It is recommended to organize the weights and test data as follows:

```
BearLLM/
├── qwen_weights/
│   ├── model.safetensors
│   ├── tokenizer.json
│   ├── config.json
│   └── other files...
├── bearllm_weights/
│   ├── vibration_adapter.pth
│   ├── adapter_config.json
│   └── adapter_model.safetensors
└── mbhm_dataset/
    └── demo_data.json
```
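
Before running the demo, you can confirm that the files are where the tree above expects them. This standard-library sketch checks only the files named explicitly in the tree (the Qwen folder's "other files..." are not checked). The `missing_files` function and its `root` argument are illustrative, not part of the repository.

```python
from pathlib import Path
from typing import List

# Files listed explicitly in the recommended layout, relative to the BearLLM/ root.
EXPECTED_FILES = [
    "qwen_weights/model.safetensors",
    "qwen_weights/tokenizer.json",
    "qwen_weights/config.json",
    "bearllm_weights/vibration_adapter.pth",
    "bearllm_weights/adapter_config.json",
    "bearllm_weights/adapter_model.safetensors",
    "mbhm_dataset/demo_data.json",
]


def missing_files(root: str = ".") -> List[str]:
    """Return the expected files that do not exist under `root`, in layout order."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

Run `missing_files()` from the `BearLLM/` directory. An empty list means every explicitly listed file is in place.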

### 4. Run Code
First, copy the `.env.example` file to `.env` and modify the data paths inside.
Then, you can run the code using the following command:

```bash
python run_demo.py
```
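
The variable names inside `.env.example` are not listed in this README, so the sketch below makes no assumptions about them. It parses simple `KEY=VALUE` lines from `.env` with the standard library and reports entries whose values look like filesystem paths (a heuristic: they contain a slash or backslash and are not URLs) but do not exist. It is a hypothetical pre-flight check, not part of the repository. The project may load `.env` differently (for example with `python-dotenv`), and this parser handles only unquoted or simply quoted values, not the full dotenv syntax.

```python
from pathlib import Path
from typing import Dict, List


def read_env(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    env: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding whitespace and simple single/double quotes.
        env[key.strip()] = value.strip().strip("'\"")
    return env


def missing_paths(env: Dict[str, str]) -> List[str]:
    """Keys whose value looks like a filesystem path (heuristic) but does not exist."""
    return [
        key for key, value in env.items()
        if ("/" in value or "\\" in value) and "://" not in value and not Path(value).exists()
    ]
```

Running `missing_paths(read_env())` before `python run_demo.py` turns a mistyped path into an immediate, named error instead of a failure partway through model loading.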

## โš™๏ธ Development

### 1. Download Dataset

First, download the following files from the [MBHM](https://huggingface.co/datasets/SIA-IDE/MBHM/tree/main) dataset (users in mainland China can use the [mirror link](https://hf-mirror.com/datasets/SIA-IDE/MBHM/tree/main) to speed up the download):

- `data.hdf5`: Contains the vibration signal data.
- `corpus.json`: Contains the health management corpus.
- `metadata.sqlite`: Contains metadata information of the dataset.
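
This README does not document the schema of `metadata.sqlite`, so a reasonable first step after downloading is to list its tables and columns. The sketch uses only Python's built-in `sqlite3` module and assumes nothing about table names. `describe_sqlite` is a hypothetical helper. `data.hdf5` can be inspected in a similar spirit with a third-party library such as `h5py`, which is not shown here.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List


def describe_sqlite(path: str = "metadata.sqlite") -> Dict[str, List[str]]:
    """Map each table in an SQLite file to its column names (schema-agnostic)."""
    if not Path(path).is_file():
        # sqlite3.connect would otherwise silently create a new, empty database.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```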

### 2. Download Weights

You can download the pre-trained weights of [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/tree/main) from Hugging Face.

### 3. Modify Environment Variables

Copy the `.env.example` file to `.env` and modify the data paths inside.

### 4. Pre-train and Fine-tune Model

Pre-train the model following `src/pre_training.py`, then fine-tune it following `src/fine_tuning.py`.

## 📖 Citation
Please cite the following paper if you use this work in your research:

```bibtex
@article{pengBearLLMPriorKnowledgeEnhanced2025,
  title = {{{BearLLM}}: {{A Prior Knowledge-Enhanced Bearing Health Management Framework}} with {{Unified Vibration Signal Representation}}},
  author = {Peng, Haotian and Liu, Jiawei and Du, Jinsong and Gao, Jie and Wang, Wei},
  year = {2025},
  month = apr,
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {39},
  number = {19},
  pages = {19866--19874},
  issn = {2374-3468},
  doi = {10.1609/aaai.v39i19.34188},
  urldate = {2025-04-11},
}
```