Update README.md
README.md
CHANGED
|
@@ -3,15 +3,38 @@ license: mit
|
|
| 3 |
---
|
| 4 |
# malwi - AI Python Malware Scanner
|
| 5 |
|
| 6 |
Detect Python malware _fast_ - no internet, no expensive hardware, no fees.
|
| 7 |
|
| 8 |
-
malwi is specialized in detecting **zero-day vulnerabilities**,
|
| 9 |
|
| 10 |
Open-source software made in Europe.
|
| 11 |
Based on open research, open code, open data.
|
| 12 |
🇪🇺🤘🕊️
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
[The number of _malicious open-source packages_ is growing](https://arxiv.org/pdf/2404.04991). This is not just a threat to your business but also to the open-source community.
|
| 17 |
|
|
@@ -22,16 +45,7 @@ Typical malware behaviors include:
|
|
| 22 |
- _Destructive_ actions: Deleting files, corrupting databases, or sabotaging applications.
|
| 23 |
|
| 24 |
> **Attention**: Malicious packages might execute code during installation (e.g. through `setup.py`).
|
| 25 |
-
Make sure to *NOT* download
|
| 26 |
-
>```
|
| 27 |
-
># Do NOT RUN THE FOLLOWING COMMANDS on malicious packages!!!
|
| 28 |
-
>
|
| 29 |
-
>uv add <MALICIOUS_PACKAGE>
|
| 30 |
-
>pip install <MALICIOUS_PACKAGE>
|
| 31 |
-
>pipenv install <MALICIOUS_PACKAGE>
|
| 32 |
-
>poetry add <MALICIOUS_PACKAGE>
|
| 33 |
-
>conda install <MALICIOUS_PACKAGE>
|
| 34 |
-
>```
|
| 35 |
|
| 36 |
## What's next?
|
| 37 |
|
|
@@ -41,15 +55,55 @@ Future iterations will cover malware scanning for more languages (JavaScript, Ru
|
|
| 41 |
|
| 42 |
## How does it work?
|
| 43 |
|
| 44 |
-
malwi applies [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert) and Support Vector Machines (SVM) based on the design of [_Zero Day Malware Detection with Alpha: Fast DBI with Transformer Models for Real World Application_ (2025)](https://arxiv.org/pdf/2504.14886v1).
|
| 45 |
-
|
| 46 |
|
| 47 |
-
|
| 48 |
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
|
| 53 |
## Support
|
| 54 |
|
| 55 |
Do you have access to malicious packages for Rust, Go, or other languages? **Contact me.**
|
| 3 |
---
|
| 4 |
# malwi - AI Python Malware Scanner
|
| 5 |
|
| 6 |
+
<img src="malwi-logo.png" alt="Logo">
|
| 7 |
+
|
| 8 |
Detect Python malware _fast_ - no internet, no expensive hardware, no fees.
|
| 9 |
|
| 10 |
+
malwi specializes in detecting **zero-day vulnerabilities** by classifying code as safe or harmful.
|
| 11 |
|
| 12 |
Open-source software made in Europe.
|
| 13 |
Based on open research, open code, open data.
|
| 14 |
🇪🇺🤘🕊️
|
| 15 |
|
| 16 |
+
1) **Install**
|
| 17 |
+
```
|
| 18 |
+
pip install --user malwi
|
| 19 |
+
```
|
| 20 |
+
|
| 21 |
+
2) **Run**
|
| 22 |
+
```
|
| 23 |
+
malwi ./examples
|
| 24 |
+
```
|
| 25 |
+
|
| 26 |
+
3) **Evaluate**: malwi flags a [recent zero-day](https://socket.dev/blog/malicious-pypi-package-targets-discord-developers-with-RAT) with high confidence
|
| 27 |
+
```
|
| 28 |
+
def runcommand(value):
|
| 29 |
+
    output = subprocess.run(value, shell=True, capture_output=True)
|
| 30 |
+
    return [output.stdout, output.stderr]
|
| 31 |
+
|
| 32 |
+
## examples/__init__.py
|
| 33 |
+
- Object: runcommand
|
| 34 |
+
- Maliciousness: 👹 0.9620079398155212
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
## Why malwi?
|
| 38 |
|
| 39 |
[The number of _malicious open-source packages_ is growing](https://arxiv.org/pdf/2404.04991). This is not just a threat to your business but also to the open-source community.
|
| 40 |
|
|
|
|
| 45 |
- _Destructive_ actions: Deleting files, corrupting databases, or sabotaging applications.
|
| 46 |
|
| 47 |
> **Attention**: Malicious packages might execute code during installation (e.g. through `setup.py`).
|
| 48 |
+
Do *NOT* download or install malicious packages from the dataset with commands such as `uv add`, `pip install`, or `poetry add`.
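
One way to look at a suspicious package without installing it is to read files straight out of the archive. The following is a minimal sketch, assuming the sample is a `.tar.gz` source distribution that is already available as bytes. `read_setup_py` and `build_demo_sdist` are hypothetical helper names for illustration and are not part of malwi.

```python
import io
import tarfile


def read_setup_py(sdist_bytes):
    """Return the text of the top-level setup.py in a .tar.gz sdist, or None."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "setup.py":
                # extractfile() only reads bytes from the archive;
                # nothing is written to disk and nothing is executed.
                data = archive.extractfile(member).read()
                return data.decode("utf-8", errors="replace")
    return None


def build_demo_sdist():
    """Build a tiny harmless archive in memory so the sketch runs on its own."""
    payload = b"from setuptools import setup\nsetup(name='demo')\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo("demo-0.1/setup.py")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


print(read_setup_py(build_demo_sdist()))
```

The archive is treated as plain data here, so no installer hook such as `setup.py` ever runs.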
|
| 49 |
|
| 50 |
## What's next?
|
| 51 |
|
|
|
|
| 55 |
|
| 56 |
## How does it work?
|
| 57 |
|
| 58 |
+
malwi applies [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert) and Support Vector Machines (SVM) based on the design of [_Zero Day Malware Detection with Alpha: Fast DBI with Transformer Models for Real World Application_ (2025)](https://arxiv.org/pdf/2504.14886v1). [pypi_malregistry](https://github.com/lxyeternal/pypi_malregistry) is used as a source for malicious samples.
|
| 59 |
+
|
| 60 |
+
1. malwi compiles Python files to bytecode:
|
| 61 |
+
|
| 62 |
+
```
|
| 63 |
+
def runcommand(value):
|
| 64 |
+
    output = subprocess.run(value, shell=True, capture_output=True)
|
| 65 |
+
    return [output.stdout, output.stderr]
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
```
|
| 69 |
+
0 RESUME 0
|
| 70 |
|
| 71 |
+
1 LOAD_CONST 0 (<code object runcommand at 0x5b4f60ae7540, file "example.py", line 1>)
|
| 72 |
+
MAKE_FUNCTION
|
| 73 |
+
STORE_NAME 0 (runcommand)
|
| 74 |
+
RETURN_CONST 1 (None)
|
| 75 |
+
...
|
| 76 |
+
```
|
| 77 |
|
| 78 |
+
2. Bytecode instructions and their arguments are mapped to tokens:
|
| 79 |
+
|
| 80 |
+
```
|
| 81 |
+
TARGETED_FILE resume load_global subprocess load_attr run load_fast value load_const INTEGER load_const INTEGER kw_names capture_output shell call store_fast output load_fast output load_attr stdout load_fast output load_attr stderr build_list return_value
|
| 82 |
+
```
|
| 83 |
+
|
| 84 |
+
3. The tokens are fed to a pre-trained DistilBert model, which outputs a maliciousness score:
|
| 85 |
+
|
| 86 |
+
```
|
| 87 |
+
Maliciousness: 0.9620079398155212
|
| 88 |
+
```
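
Steps 1 and 2 can be approximated with the standard-library `dis` module. This is a minimal sketch, not malwi's actual tokenizer: the helper names (`iter_code_objects`, `arg_tokens`, `tokenize`) and the `STRING` placeholder are assumptions made for illustration, and only the `INTEGER` placeholder is taken from the token stream shown above. Opcodes differ between Python versions, so the output will not match that stream token for token, and the `TARGETED_FILE` marker is omitted.

```python
import dis

SOURCE = '''
def runcommand(value):
    output = subprocess.run(value, shell=True, capture_output=True)
    return [output.stdout, output.stderr]
'''


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            yield from iter_code_objects(const)


def arg_tokens(instr):
    """Map an instruction argument to zero or more tokens (assumed scheme)."""
    value = instr.argval
    if instr.opname in ("LOAD_CONST", "LOAD_SMALL_INT"):
        if isinstance(value, int):  # bool is a subclass of int
            return ["INTEGER"]
        if isinstance(value, str):
            return ["STRING"]
    if isinstance(value, str):  # names such as subprocess, run, stdout
        return [value]
    if isinstance(value, tuple) and all(isinstance(v, str) for v in value):
        return list(value)  # e.g. keyword-argument names
    return []


def tokenize(source, filename="example.py"):
    """Compile source without executing it and flatten its bytecode into tokens."""
    tokens = []
    module_code = compile(source, filename, "exec")
    for code in iter_code_objects(module_code):
        for instr in dis.get_instructions(code):
            tokens.append(instr.opname.lower())
            tokens.extend(arg_tokens(instr))
    return tokens


print(" ".join(tokenize(SOURCE)))
```

`compile()` only builds code objects, so the scanned source is never run. The resulting token string is the kind of input that step 3 passes to the classifier.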
|
| 89 |
|
| 90 |
## Support
|
| 91 |
|
| 92 |
Do you have access to malicious packages for Rust, Go, or other languages? **Contact me.**
|
| 93 |
+
|
| 94 |
+
### Develop
|
| 95 |
+
|
| 96 |
+
Prerequisites: [uv](https://docs.astral.sh/uv/)
|
| 97 |
+
```
|
| 98 |
+
# Download and process data
|
| 99 |
+
cmds/download_and_preprocess.sh
|
| 100 |
+
|
| 101 |
+
# Only process data
|
| 102 |
+
cmds/preprocess.sh
|
| 103 |
+
|
| 104 |
+
# Preprocess then start training
|
| 105 |
+
cmds/preprocess_and_train.sh
|
| 106 |
+
|
| 107 |
+
# Only start training
|
| 108 |
+
cmds/train.sh
|
| 109 |
+
```
|