Spaces: leuschnm/CrowdCounting-with-Scale-Adaptive-Selection-SASNet

leuschnm committed on Jan 23, 2023

Commit d8cab4b

1 Parent(s): 1dab846

fix model

Files changed (2)

app.py +16 -5
environment.yaml +0 -161

app.py CHANGED

@@ -22,6 +22,7 @@ import gradio as gr
 import torchvision.transforms as standard_transforms
 from torch.utils.data import DataLoader
 from torch.utils.data import Dataset
+from model import SASNet
 warnings.filterwarnings('ignore')
@@ -96,14 +97,23 @@ with gr.Blocks() as demo:
     We implemented a image crowd counting model with VGG16 following the paper of Song et. al (2021).
     ## Abstract
-In this paper, we address the large scale variation problem in crowd counting by taking full advantage of the multi-scale feature representations in a multi-level network. We implement such an idea by keeping the counting error of a patch as small as possible with a proper feature level selection strategy, since a specific feature level tends to perform better for a certain range of scales. However, without scale annotations, it is sub-optimal and error-prone to manually assign the predictions for heads of different scales to specific feature levels. Therefore, we propose a Scale-Adaptive Selection Network (SASNet), which automatically learns the internal correspondence between the scales and the feature levels. Instead of directly using the predictions from the most appropriate feature level as the final estimation, our SASNet also considers the predictions from other feature levels via weighted average, which helps to mitigate the gap between discrete feature levels and continuous scale variation. Since the heads in a local patch share roughly a same scale, we conduct the adaptive selection strategy in a patch-wise style. However, pixels within a patch contribute different counting errors due to the various difficulty degrees of learning. Thus, we further propose a Pyramid Region Awareness Loss (PRA Loss) to recursively select the most hard sub-regions within a patch until reaching the pixel level. With awareness of whether the parent patch is over-estimated or under-estimated, the fine-grained optimization with the PRA Loss for these region-aware hard pixels helps to alleviate the inconsistency problem between training target and evaluation metric. The state-of-the-art results on four datasets demonstrate the superiority of our approach.
-The code will be available at: https://github.com/TencentYoutuResearch/CrowdCounting-SASNet.
+    In this paper, we address the large scale variation problem in crowd counting by taking full advantage of the multi-scale feature representations in a multi-level network. We
+    implement such an idea by keeping the counting error of a patch as small as possible with a proper feature level selection strategy, since a specific feature level tends to perform
+    better for a certain range of scales. However, without scale annotations, it is sub-optimal and error-prone to manually assign the predictions for heads of different scales to
+    specific feature levels. Therefore, we propose a Scale-Adaptive Selection Network (SASNet), which automatically learns the internal correspondence between the scales and the feature
+    levels. Instead of directly using the predictions from the most appropriate feature level as the final estimation, our SASNet also considers the predictions from other feature
+    levels via weighted average, which helps to mitigate the gap between discrete feature levels and continuous scale variation. Since the heads in a local patch share roughly a same
+    scale, we conduct the adaptive selection strategy in a patch-wise style. However, pixels within a patch contribute different counting errors due to the various difficulty degrees of
+    learning. Thus, we further propose a Pyramid Region Awareness Loss (PRA Loss) to recursively select the most hard sub-regions within a patch until reaching the pixel level. With
+    awareness of whether the parent patch is over-estimated or under-estimated, the fine-grained optimization with the PRA Loss for these region-aware hard pixels helps to alleviate the
+    inconsistency problem between training target and evaluation metric. The state-of-the-art results on four datasets demonstrate the superiority of our approach.
+    The code will be available at: https://github.com/TencentYoutuResearch/CrowdCounting-SASNet.
     ## References
-    Song, Q., Wang, C., Wang, Y., Tai, Y., Wang, C., Li, J., … Ma, J. (2021). To Choose or to Fuse? Scale Selection for Crowd Counting. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).
+    Song, Q., Wang, C., Wang, Y., Tai, Y., Wang, C., Li, J., … Ma, J. (2021). To Choose or to Fuse? Scale Selection for Crowd Counting.
+    The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).
     """)
-    image_button = gr.Button("Count the Crowd!")
     with gr.Row():
         with gr.Column():
             image_input = gr.Image(type="pil")
@@ -112,6 +122,7 @@ The code will be available at: https://github.com/TencentYoutuResearch/CrowdCoun
             image_output = gr.Plot()
         with gr.Column():
             text_output = gr.Label()
+            image_button = gr.Button("Count the Crowd!")
     image_button.click(predict, inputs=image_input, outputs=[text_output, image_output])
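
Note on the abstract quoted in app.py: it contrasts picking the prediction of the single most appropriate feature level with fusing the predictions of all levels through a weighted average, applied per patch. The sketch below illustrates only that contrast on plain Python lists. It is not the SASNet implementation: the function names `select_patch_count` and `fuse_patch_counts`, and the choice of a softmax to turn per-level scores into weights, are assumptions made for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_patch_count(level_counts: List[float], level_scores: List[float]) -> float:
    """Hard selection: keep only the count from the highest-scoring level."""
    if len(level_counts) != len(level_scores):
        raise ValueError("one score is required per feature level")
    best = max(range(len(level_scores)), key=lambda i: level_scores[i])
    return level_counts[best]


def fuse_patch_counts(level_counts: List[float], level_scores: List[float]) -> float:
    """Soft fusion: weighted average of the counts from all levels.

    Assumption: weights come from a softmax over the per-level scores.
    """
    if len(level_counts) != len(level_scores):
        raise ValueError("one score is required per feature level")
    weights = softmax(level_scores)
    return sum(w * c for w, c in zip(weights, level_counts))


if __name__ == "__main__":
    # Hypothetical per-level counts and scores for one image patch.
    counts = [2.0, 4.0, 6.0]
    scores = [0.1, 0.9, 0.3]
    print("selected:", select_patch_count(counts, scores))
    print("fused:   ", fuse_patch_counts(counts, scores))
```

The fused value always lies between the smallest and largest per-level count, which is one way to picture how averaging can bridge discrete feature levels and continuously varying head scales.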

environment.yaml DELETED

@@ -1,161 +0,0 @@
-name: SASNet
-channels:
-  - pytorch
-  - nvidia
-  - anaconda
-  - defaults
-dependencies:
-  - _libgcc_mutex=0.1=main
-  - _openmp_mutex=5.1=1_gnu
-  - _pytorch_select=0.1=cpu_0
-  - backcall=0.2.0=pyhd3eb1b0_0
-  - blas=1.0=mkl
-  - ca-certificates=2022.07.19=h06a4308_0
-  - certifi=2022.6.15=py37h06a4308_0
-  - cffi=1.15.0=py37h7f8727e_0
-  - cuda=12.0.0=0
-  - cuda-cccl=12.0.90=0
-  - cuda-command-line-tools=12.0.0=0
-  - cuda-compiler=12.0.0=0
-  - cuda-cudart=12.0.107=0
-  - cuda-cudart-dev=12.0.107=0
-  - cuda-cudart-static=12.0.107=0
-  - cuda-cuobjdump=12.0.76=0
-  - cuda-cupti=12.0.90=0
-  - cuda-cupti-static=12.0.90=0
-  - cuda-cuxxfilt=12.0.76=0
-  - cuda-demo-suite=12.0.76=0
-  - cuda-documentation=12.0.76=0
-  - cuda-driver-dev=12.0.107=0
-  - cuda-gdb=12.0.90=0
-  - cuda-libraries=12.0.0=0
-  - cuda-libraries-dev=12.0.0=0
-  - cuda-libraries-static=12.0.0=0
-  - cuda-nsight=12.0.78=0
-  - cuda-nsight-compute=12.0.0=0
-  - cuda-nvcc=12.0.76=0
-  - cuda-nvdisasm=12.0.76=0
-  - cuda-nvml-dev=12.0.76=0
-  - cuda-nvprof=12.0.90=0
-  - cuda-nvprune=12.0.76=0
-  - cuda-nvrtc=12.0.76=0
-  - cuda-nvrtc-dev=12.0.76=0
-  - cuda-nvrtc-static=12.0.76=0
-  - cuda-nvtx=12.0.76=0
-  - cuda-nvvp=12.0.90=0
-  - cuda-opencl=12.0.76=0
-  - cuda-opencl-dev=12.0.76=0
-  - cuda-profiler-api=12.0.76=0
-  - cuda-runtime=12.0.0=0
-  - cuda-sanitizer-api=12.0.90=0
-  - cuda-toolkit=12.0.0=0
-  - cuda-tools=12.0.0=0
-  - cuda-visual-tools=12.0.0=0
-  - cudatoolkit=10.2.89=hfd86e86_1
-  - debugpy=1.5.1=py37h295c915_0
-  - decorator=5.1.1=pyhd3eb1b0_0
-  - entrypoints=0.4=py37h06a4308_0
-  - freetype=2.12.1=h4a9f257_0
-  - gds-tools=1.5.0.59=0
-  - giflib=5.2.1=h7b6447c_0
-  - intel-openmp=2022.1.0=h9e868ea_3769
-  - ipykernel=6.9.1=py37h06a4308_0
-  - ipython=7.31.1=py37h06a4308_1
-  - jedi=0.18.1=py37h06a4308_1
-  - jpeg=9e=h7f8727e_0
-  - jupyter_client=7.2.2=py37h06a4308_0
-  - jupyter_core=4.10.0=py37h06a4308_0
-  - lcms2=2.12=h3be6417_0
-  - lerc=3.0=h295c915_0
-  - libcublas=12.0.1.189=0
-  - libcublas-dev=12.0.1.189=0
-  - libcublas-static=12.0.1.189=0
-  - libcufft=11.0.0.21=0
-  - libcufft-dev=11.0.0.21=0
-  - libcufft-static=11.0.0.21=0
-  - libcufile=1.5.0.59=0
-  - libcufile-dev=1.5.0.59=0
-  - libcufile-static=1.5.0.59=0
-  - libcurand=10.3.1.50=0
-  - libcurand-dev=10.3.1.50=0
-  - libcurand-static=10.3.1.50=0
-  - libcusolver=11.4.2.57=0
-  - libcusolver-dev=11.4.2.57=0
-  - libcusolver-static=11.4.2.57=0
-  - libcusparse=12.0.0.76=0
-  - libcusparse-dev=12.0.0.76=0
-  - libcusparse-static=12.0.0.76=0
-  - libdeflate=1.8=h7f8727e_5
-  - libedit=3.1.20221030=h5eee18b_0
-  - libffi=3.2.1=hf484d3e_1007
-  - libgcc-ng=11.2.0=h1234567_1
-  - libgfortran-ng=7.5.0=ha8ba4b0_17
-  - libgfortran4=7.5.0=ha8ba4b0_17
-  - libgomp=11.2.0=h1234567_1
-  - libnpp=12.0.0.30=0
-  - libnpp-dev=12.0.0.30=0
-  - libnpp-static=12.0.0.30=0
-  - libnvjitlink=12.0.76=0
-  - libnvjitlink-dev=12.0.76=0
-  - libnvjpeg=12.0.0.28=0
-  - libnvjpeg-dev=12.0.0.28=0
-  - libnvjpeg-static=12.0.0.28=0
-  - libnvvm-samples=12.0.94=0
-  - libpng=1.6.37=hbc83047_0
-  - libsodium=1.0.18=h7b6447c_0
-  - libstdcxx-ng=11.2.0=h1234567_1
-  - libtiff=4.5.0=hecacb30_0
-  - libwebp=1.2.4=h11a3e52_0
-  - libwebp-base=1.2.4=h5eee18b_0
-  - lz4-c=1.9.4=h6a678d5_0
-  - matplotlib-inline=0.1.2=pyhd3eb1b0_2
-  - mkl=2019.4=243
-  - mkl-service=2.3.0=py37he8ac12f_0
-  - mkl_fft=1.3.0=py37h54f3939_0
-  - mkl_random=1.1.0=py37hd6b4f25_0
-  - ncurses=6.3=h5eee18b_3
-  - nest-asyncio=1.5.5=py37h06a4308_0
-  - ninja=1.10.2=h06a4308_5
-  - ninja-base=1.10.2=hd09550d_5
-  - nsight-compute=2022.4.0.15=0
-  - numpy-base=1.17.0=py37hde5b4d6_0
-  - openssl=1.0.2u=h7b6447c_0
-  - parso=0.8.3=pyhd3eb1b0_0
-  - pexpect=4.8.0=pyhd3eb1b0_3
-  - pickleshare=0.7.5=pyhd3eb1b0_1003
-  - pip=22.3.1=py37h06a4308_0
-  - prompt-toolkit=3.0.20=pyhd3eb1b0_0
-  - ptyprocess=0.7.0=pyhd3eb1b0_2
-  - pycparser=2.21=pyhd3eb1b0_0
-  - pygments=2.11.2=pyhd3eb1b0_0
-  - python=3.7.0=h6e4f718_3
-  - python-dateutil=2.8.2=pyhd3eb1b0_0
-  - pytorch=1.5.0=py3.7_cuda10.2.89_cudnn7.6.5_0
-  - pyzmq=23.2.0=py37h6a678d5_0
-  - readline=7.0=h7b6447c_5
-  - setuptools=65.6.3=py37h06a4308_0
-  - six=1.16.0=pyhd3eb1b0_1
-  - sqlite=3.33.0=h62c20be_0
-  - tk=8.6.12=h1ccaba5_0
-  - torchvision=0.6.0=py37_cu102
-  - tornado=6.1=py37h27cfd23_0
-  - traitlets=5.1.1=pyhd3eb1b0_0
-  - wcwidth=0.2.5=pyhd3eb1b0_0
-  - wheel=0.37.1=pyhd3eb1b0_0
-  - xz=5.2.8=h5eee18b_0
-  - zeromq=4.3.4=h2531618_0
-  - zlib=1.2.13=h5eee18b_0
-  - zstd=1.5.2=ha4553b6_0
-  - pip:
-    - cached-property==1.5.2
-    - cycler==0.11.0
-    - h5py==3.1.0
-    - kiwisolver==1.4.4
-    - matplotlib==3.3.3
-    - numpy==1.19.0
-    - opencv-python==4.4.0.46
-    - pillow==8.0.1
-    - pyparsing==3.0.9
-    - scipy==1.5.4
-    - typing-extensions==4.4.0
-prefix: /home/leuschnm/miniconda3/envs/SASNet
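
Note on the deleted environment.yaml: its conda entries use the `name=version=build` pin form, while the `pip:` entries use `name==version`. The sketch below shows one way such conda pins could be split into their parts when auditing the list, for example to compare `cudatoolkit=10.2.89` with the CUDA version in the `pytorch` build string. The helper names `parse_conda_pin` and `to_pip_style` are hypothetical and not part of this repository, and conda package names do not always match PyPI names (for example `pytorch` versus `torch`), so the pip-style output is a starting point rather than a usable requirements file.

```python
from typing import NamedTuple, Optional


class Pin(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_conda_pin(spec: str) -> Pin:
    """Split a conda pin such as 'pytorch=1.5.0=py3.7_cuda10.2.89_cudnn7.6.5_0'."""
    parts = spec.strip().split("=")
    # Pad so that 'name' and 'name=version' are accepted as well.
    padded = parts + [None] * (3 - len(parts))
    name, version, build = padded[:3]
    return Pin(name, version, build)


def to_pip_style(pin: Pin) -> str:
    """Render 'name==version', dropping the conda build string."""
    return pin.name if pin.version is None else f"{pin.name}=={pin.version}"


if __name__ == "__main__":
    # Three pins copied from the deleted file.
    for spec in (
        "pytorch=1.5.0=py3.7_cuda10.2.89_cudnn7.6.5_0",
        "torchvision=0.6.0=py37_cu102",
        "cudatoolkit=10.2.89=hfd86e86_1",
    ):
        pin = parse_conda_pin(spec)
        print(pin.name, pin.version, pin.build, "->", to_pip_style(pin))
```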