Niv Sardi committed 1732876 (parent: bbf5506)

README: update TODO

README.org (changed, +13 -5)
@@ -4,14 +4,19 @@ Detect spoofed websites by detecting logos of banks and financial entities in
 pages with =SSL certificates= that do not match.
 
 The process is pretty simple:
-- [
 - [x] get logos, names and URLs
 - [x] navigate to the URL, extract the SSL certificate and look for =img= tags
   and for tags whose =id= or =class= is logo (needs more heuristics) to build a logo database
 - [x] screenshot the page and slice it into tiles, generating YOLO annotations for
   the detected logos
 - [x] augment the data using the logo database, with the logoless tiles as background images
-- [
 - [ ] feed everything to a web extension that detects the logos on any page
   and shows a warning if the =SSL certificate= does not match the collected one.
 
@@ -20,13 +25,13 @@ The process is pretty simple:
 # build the training dataset
 docker-compose up --build --remove-orphans -d
 docker-compose exec python ./run
-
 # run the training on your machine or on Colab
 # https://colab.research.google.com/drive/10R7uwVJJ1R1k6oTjbkkhxPDka7COK-WE
 git clone https://github.com/ultralytics/yolov5 # clone repo
 pip install -U -r yolov5/requirements.txt # install dependencies
 python3 yolov5/train.py --img 416 --batch 80 --epochs 100 --data ./ia/data.yaml --cfg ./ia/yolov5s.yaml --weights ''
-
 #+end_src
 
 * research
@@ -38,7 +43,7 @@ https://github.com/Hyuto/yolov5-tfjs
 there are many augmentation libraries out there; because it has better
 pipelines and multicore support, we went with:
 - https://github.com/aleju/imgaug
-
 the others are kept here for reference:
 - https://github.com/srp-31/Data-Augmentation-for-Object-Detection-YOLO-
 - https://github.com/mdbloice/Augmentor
@@ -53,3 +58,6 @@ http://www.bcra.gob.ar/SistemasFinancierosYdePagos/Entidades_financieras.asp
 https://stackoverflow.com/questions/6566545/is-there-any-way-to-access-certificate-information-from-a-chrome-extension
 https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest#accessing_security_information
 https://chromium-review.googlesource.com/c/chromium/src/+/644858
@@ -4,14 +4,19 @@ Detect spoofed websites by detecting logos of banks and financial entities in
 pages with =SSL certificates= that do not match.
 
 The process is pretty simple:
+- [1/2] scrape government websites to get a list of entities
+  - [x] 🇦🇷 BCRA ok
+  - [ ] other countries
 - [x] get logos, names and URLs
 - [x] navigate to the URL, extract the SSL certificate and look for =img= tags
   and for tags whose =id= or =class= is logo (needs more heuristics) to build a logo database
 - [x] screenshot the page and slice it into tiles, generating YOLO annotations for
   the detected logos
 - [x] augment the data using the logo database, with the logoless tiles as background images
+- [2/3] train YOLO
+  - [x] v5
+  - [x] v6
+  - [ ] v7 (actually slower than v6)
 - [ ] feed everything to a web extension that detects the logos on any page
   and shows a warning if the =SSL certificate= does not match the collected one.
 
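The logo-collection step in the TODO list above (look for =img= tags and for tags whose =id= or =class= is logo) could be prototyped with Python's standard-library =html.parser=. This is a minimal sketch, not the project's scraper: the names =find_logo_candidates= and =LogoCandidateParser= are hypothetical, and the matching rule (the substring "logo" in =id= or =class=, plus =src= or =alt= on =img= tags) is an assumption.

```python
from html.parser import HTMLParser

# Attributes inspected on every tag, plus extra ones inspected on <img>.
# Both tuples are assumptions, not the project's actual heuristics.
LOGO_ATTRS = ("id", "class")
IMG_EXTRA_ATTRS = ("src", "alt")


def mentions_logo(value):
    """True if an attribute value contains "logo", ignoring case."""
    # valueless attributes (e.g. <img hidden>) are reported as None
    return value is not None and "logo" in value.lower()


class LogoCandidateParser(HTMLParser):
    """Collect start tags that look like logos, in document order."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # list of (tag name, attribute dict)

    def handle_starttag(self, tag, attrs):
        # also called for self-closing tags such as <img ... />
        attrs = dict(attrs)
        keys = LOGO_ATTRS + IMG_EXTRA_ATTRS if tag == "img" else LOGO_ATTRS
        if any(mentions_logo(attrs.get(key)) for key in keys):
            self.candidates.append((tag, attrs))


def find_logo_candidates(html):
    """Return (tag, attrs) pairs for every logo-looking element in html."""
    parser = LogoCandidateParser()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Substring matching is deliberately loose and produces false positives, for example =class="catalogo"= on a Spanish-language page, which fits the README's note that more heuristics are needed.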
@@ -20,13 +25,13 @@ The process is pretty simple:
 # build the training dataset
 docker-compose up --build --remove-orphans -d
 docker-compose exec python ./run
+
 # run the training on your machine or on Colab
 # https://colab.research.google.com/drive/10R7uwVJJ1R1k6oTjbkkhxPDka7COK-WE
 git clone https://github.com/ultralytics/yolov5 # clone repo
 pip install -U -r yolov5/requirements.txt # install dependencies
 python3 yolov5/train.py --img 416 --batch 80 --epochs 100 --data ./ia/data.yaml --cfg ./ia/yolov5s.yaml --weights ''
+
 #+end_src
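The =train.py= call above expects YOLO-format labels: one text line per object, =class x_center y_center width height=, with coordinates normalized to the image size. Below is a hypothetical sketch of the "slice the screenshot into tiles and generate YOLO annotations" step. The function name, the =(class, x0, y0, x1, y1)= box format and the choice to drop logos that straddle a tile border are assumptions, not the behaviour of the project's =./run=.

```python
def tile_yolo_labels(img_w, img_h, boxes, tile=416):
    """Slice an img_w x img_h screenshot into tile x tile squares.

    boxes: iterable of (class_id, x0, y0, x1, y1) in integer screenshot
    pixels, with x1/y1 exclusive.
    Returns {(col, row): [YOLO label lines]}. Tiles left with an empty
    list contain no logo and can be reused as background images.
    """
    cols = -(-img_w // tile)  # ceiling division
    rows = -(-img_h // tile)
    labels = {(c, r): [] for r in range(rows) for c in range(cols)}
    for cls, x0, y0, x1, y1 in boxes:
        col, row = x0 // tile, y0 // tile
        if (col, row) not in labels:
            continue  # box starts outside the screenshot
        # Assumption: boxes crossing a tile border are dropped, not clipped.
        if (x1 - 1) // tile != col or (y1 - 1) // tile != row:
            continue
        left, top = col * tile, row * tile
        # YOLO format: class x_center y_center width height, relative to
        # the tile size (edge tiles are assumed padded to a full tile).
        xc = ((x0 + x1) / 2 - left) / tile
        yc = ((y0 + y1) / 2 - top) / tile
        w, h = (x1 - x0) / tile, (y1 - y0) / tile
        labels[(col, row)].append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return labels
```

Defaulting to 416-pixel tiles mirrors the =--img 416= training size, so tiles need no rescaling; that pairing is an assumption. Clipping border-crossing boxes instead of dropping them would keep more samples at the cost of training on partial logos.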
 
 * research
@@ -38,7 +43,7 @@ https://github.com/Hyuto/yolov5-tfjs
 there are many augmentation libraries out there; because it has better
 pipelines and multicore support, we went with:
 - https://github.com/aleju/imgaug
+
 the others are kept here for reference:
 - https://github.com/srp-31/Data-Augmentation-for-Object-Detection-YOLO-
 - https://github.com/mdbloice/Augmentor
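imgaug supplies the actual image transforms; the project-specific part of the augmentation step is pasting logos from the logo database onto logoless background tiles and deriving the matching YOLO label. A dependency-free sketch of that compositing idea follows, assuming images are plain lists of pixel rows (a real pipeline would more likely use NumPy arrays and imgaug augmenters). =composite_logo=, =resize_nearest= and the 0.5 to 1.5 scale range are hypothetical.

```python
import random


def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize of an image stored as a list of pixel rows."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def composite_logo(background, logo, cls, rng, min_scale=0.5, max_scale=1.5):
    """Paste a randomly scaled logo at a random position on a logoless tile.

    Images are lists of pixel rows (hypothetical representation).
    Returns (new_tile, yolo_label); the input background is left untouched.
    """
    bg_h, bg_w = len(background), len(background[0])
    lg_h, lg_w = len(logo), len(logo[0])
    # random scale, capped so the scaled logo always fits inside the tile
    scale = min(rng.uniform(min_scale, max_scale), bg_w / lg_w, bg_h / lg_h)
    w = max(1, min(bg_w, round(lg_w * scale)))
    h = max(1, min(bg_h, round(lg_h * scale)))
    scaled = resize_nearest(logo, w, h)
    x0 = rng.randint(0, bg_w - w)  # randint is inclusive on both ends
    y0 = rng.randint(0, bg_h - h)
    tile = [row[:] for row in background]  # copy, do not mutate the input
    for dy, row in enumerate(scaled):
        tile[y0 + dy][x0:x0 + w] = row
    # YOLO label: class x_center y_center width height, relative to the tile
    xc, yc = (x0 + w / 2) / bg_w, (y0 + h / 2) / bg_h
    label = f"{cls} {xc:.6f} {yc:.6f} {w / bg_w:.6f} {h / bg_h:.6f}"
    return tile, label
```

Passing a seeded generator, e.g. =composite_logo(tile, logo, cls, random.Random(42))=, makes a generated dataset reproducible.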
@@ -53,3 +58,6 @@ http://www.bcra.gob.ar/SistemasFinancierosYdePagos/Entidades_financieras.asp
 https://stackoverflow.com/questions/6566545/is-there-any-way-to-access-certificate-information-from-a-chrome-extension
 https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest#accessing_security_information
 https://chromium-review.googlesource.com/c/chromium/src/+/644858
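The links above are about reading certificate details from inside a browser extension (the MDN page describes Firefox's =webRequest.getSecurityInfo()=). The collection side of the "does the certificate match the collected one" check can be sketched in Python. This assumes the collected certificate is stored as the SHA-256 fingerprint of the leaf certificate, which the README does not specify; the helper names are hypothetical.

```python
import hashlib
import ssl


def fingerprint_from_pem(pem):
    """Colon-separated, upper-case SHA-256 fingerprint of a PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)  # strip the PEM armour, base64-decode
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_fingerprint(host, port=443):
    """Fingerprint the leaf certificate that host:port presents.

    get_server_certificate does not validate the chain unless ca_certs is
    passed; that is acceptable here because we only record what is served.
    """
    return fingerprint_from_pem(ssl.get_server_certificate((host, port)))


def _normalise(fp):
    """Drop colons and whitespace, upper-case, so formats compare equal."""
    return fp.replace(":", "").strip().upper()


def certificate_mismatch(collected_fp, observed_fp):
    """True when the observed fingerprint differs from the collected one."""
    return _normalise(collected_fp) != _normalise(observed_fp)
```

=fetch_fingerprint= needs network access, while =fingerprint_from_pem= works on any PEM string. Pinning an exact leaf fingerprint raises a warning every time a bank renews its certificate; comparing the subject or issuer instead would be more forgiving but also weaker.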
+
+** papers
+https://logomotive.sidnlabs.nl/downloads/LogoMotive_paper.pdf