---
title: CRISPR Array Detection
emoji: 🧬
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: Detect CRISPR arrays in DNA sequences
---

# crispr-detect

BERT-based CRISPR array detection in prokaryotic genomes.

## Model

| Property | Value |
|---|---|
| architecture | BERT, 24 layers, 768 hidden, 430M params |
| input | DNA sequence (min 1000 bp) |
| output | per-position probability (0-1) |
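
Per-position probabilities usually need to be turned into intervals before they are useful. The following is a minimal sketch of one way to do that, not the post-processing used by this Space: the function name `call_regions`, the 0.5 threshold, and the `min_len` filter are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def call_regions(
    probs: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the app
    min_len: int = 1,        # assumed minimum region length in bp
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where probs >= threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a region opens here
        elif p < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # the region is closed
    # Close a region that runs to the end of the sequence.
    if start is not None and len(probs) - start >= min_len:
        regions.append((start, len(probs)))
    return regions


if __name__ == "__main__":
    print(call_regions([0.1, 0.9, 0.8, 0.2, 0.7]))
```

Raising `min_len` drops short, isolated peaks, which may or may not be desirable depending on how noisy the predictions are.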

## Deployment

### Push changes

```bash
cd /vol/hpcprojects/pmuench/crispr_tool/crispr-hf-space
git add -A
git commit -m "description"
git push
```

### Git credentials (first time)

```bash
git config --global credential.helper store
huggingface-cli login
# paste token from https://huggingface.co/settings/tokens
```

### Clone fresh

```bash
git clone https://huggingface.co/spaces/genomenet/crispr-array-detection
```

### Space settings (HuggingFace web UI)

- SDK: Docker
- Hardware: CPU Basic works for the default demo; T4 GPU is recommended for long sequences or low stride values
- Visibility: Public
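
To see why long sequences and low stride values push toward a GPU, it helps to count how many windows the model would have to score. The sketch below assumes a plain sliding-window scheme; the window length of 1000 bp and the helper name `count_windows` are hypothetical and may not match what `inference/inference.py` does.

```python
def count_windows(seq_len: int, window: int = 1000, stride: int = 500) -> int:
    """Number of full windows of length `window` placed every `stride` bp.

    Both defaults are illustrative assumptions, not values from the app.
    Windows that would run past the end of the sequence are not counted.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if seq_len < window:
        return 0
    return (seq_len - window) // stride + 1


if __name__ == "__main__":
    for stride in (1000, 500, 100):
        print(stride, count_windows(10_000, window=1000, stride=stride))
```

Under these assumptions the number of forward passes grows roughly in proportion to sequence length divided by stride, so halving the stride about doubles the work.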

### Model weights

Hosted at: https://huggingface.co/genomenet/crispr-bert-model

Downloaded automatically via `huggingface_hub` at startup.
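
A startup download of this kind can be sketched with `huggingface_hub.snapshot_download`, which fetches a repository into the local cache and returns the folder path. This is an illustration only: the helper name `fetch_weights` and the injectable `downloader` argument are assumptions, and the actual logic in `inference/model_loader.py` may differ.

```python
from typing import Callable, Optional

MODEL_REPO = "genomenet/crispr-bert-model"


def fetch_weights(
    repo_id: str = MODEL_REPO,
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Return the local directory holding the model files.

    `downloader` is a hypothetical hook that makes the function testable
    without network access; by default it is `snapshot_download`.
    """
    if downloader is None:
        # Imported lazily so the module loads even without the package.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    return downloader(repo_id=repo_id)
```

Calling `fetch_weights()` with no arguments downloads the repository on first use and reuses the cached copy afterwards.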

## Local dev

```bash
pip install -r requirements.txt
python app.py
# http://localhost:7860
```

## Files

```
├── app.py              # gradio app
├── inference/
│   ├── model_loader.py # model download
│   ├── tokenizer.py    # sequence validation
│   └── inference.py    # prediction
├── Dockerfile
└── requirements.txt
```
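
As a rough idea of what the sequence validation step might involve, here is a minimal sketch that strips FASTA headers, upper-cases the input, and enforces the 1000 bp minimum stated in the Model section. The function name `clean_sequence` and the allowed alphabet (`ACGTN`) are assumptions; the checks in `inference/tokenizer.py` may be different.

```python
MIN_LENGTH = 1000        # minimum input length from the Model section
ALLOWED = set("ACGTN")   # assumed alphabet, not confirmed by the app


def clean_sequence(raw: str) -> str:
    """Normalize raw text or FASTA input into one upper-case DNA string."""
    lines = raw.strip().splitlines()
    # Drop FASTA header lines and join the remaining sequence lines.
    seq = "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()
    if len(seq) < MIN_LENGTH:
        raise ValueError(
            f"sequence is {len(seq)} bp, need at least {MIN_LENGTH}"
        )
    bad = set(seq) - ALLOWED
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return seq
```

Rejecting unexpected characters early keeps malformed input from reaching the model; whether ambiguity codes beyond `N` should be accepted is left open here.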

## Acknowledgements

- Ziyu Mu (HZI BIFO)
- DFG SPP 2141 (MC 172)
- BMBF GenomeNet