# How to join the technical discussion QQ groups?
- Open QQ → tap the group search → search for group number 637093648 → enter the join question answer: 卷卷卷卷卷 → enter the group → get ready for a Turing test (just kidding)
- Search QQ for the Pocky group: 677104663 (plenty of experts there); join question answer: multi level intermediate representation
# How to watch the author's bilibili live stream?
- nihui's bilibili live room: [水竹院落](https://live.bilibili.com/1264617)
# Building
- ## How to download the complete source code?
git clone --recursive https://github.com/Tencent/ncnn/
or
download [ncnn-xxxxx-full-source.zip](https://github.com/Tencent/ncnn/releases)
- ## How to cross-compile? How to set up the cmake toolchain?
See https://github.com/Tencent/ncnn/wiki/how-to-build
- ## The submodules were not downloaded! Please update submodules with "git submodule update --init" and try again
Download the complete source code as above, or run as the message suggests: git submodule update --init
- ## Could NOT find Protobuf (missing: Protobuf_INCLUDE_DIR)
sudo apt-get install libprotobuf-dev protobuf-compiler
- ## Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
https://github.com/Tencent/ncnn/issues/1873
- ## Could not find a package configuration file provided by "OpenCV" with any of the following names: OpenCVConfig.cmake opencv-config.cmake
sudo apt-get install libopencv-dev
or build and install OpenCV yourself, then set(OpenCV_DIR {directory containing OpenCVConfig.cmake})
- ## Could not find a package configuration file provided by "ncnn" with any of the following names: ncnnConfig.cmake ncnn-config.cmake
set(ncnn_DIR {directory containing ncnnConfig.cmake})
- ## Vulkan not found
cmake >= 3.10 is required; older versions do not ship FindVulkan.cmake
android-api >= 24
on macOS, run the Vulkan SDK setup script first
- ## How to install the Vulkan SDK
Download it from the LunarG website: https://vulkan.lunarg.com/sdk/home
- ## Cannot find a library (which one to link depends on your system/compiler)
undefined reference to __kmpc_for_static_init_4 __kmpc_for_static_fini __kmpc_fork_call ...
→ link the OpenMP library
undefined reference to vkEnumerateInstanceExtensionProperties vkGetInstanceProcAddr vkQueueSubmit ...
→ link vulkan-1.lib
undefined reference to glslang::InitializeProcess() glslang::TShader::TShader(EShLanguage) ...
→ link glslang.lib OGLCompiler.lib SPIRV.lib OSDependent.lib
undefined reference to AAssetManager_fromJava AAssetManager_open AAsset_seek ...
→ add android in find_library and target_link_libraries
Using find_package(ncnn) pulls in these link dependencies automatically.
- ## undefined reference to typeinfo for ncnn::Layer
rtti mismatch with OpenCV: either rebuild ncnn with rtti enabled, or use opencv-mobile, which is built without rtti
- ## undefined reference to __cpu_model
upgrade your compiler, or link libgcc_s / libgcc
- ## unrecognized command line option "-mavx2"
upgrade gcc
- ## Why is my self-built ncnn android library so large?
See https://github.com/Tencent/ncnn/wiki/build-for-android.zh and the section "How to build a smaller ncnn library" below
- ## ncnnoptimize and custom layers
Run ncnnoptimize first and add custom layers afterwards; ncnnoptimize cannot process and save models that contain custom layers.
- ## rtti/exceptions mismatch
This happens when the libraries in your project are built with conflicting rtti/exception settings; decide whether you need them enabled or disabled for your case. ncnn defaults these options to ON; rebuild ncnn with the following two parameters:
- enable: -DNCNN_DISABLE_RTTI=OFF -DNCNN_DISABLE_EXCEPTION=OFF
- disable: -DNCNN_DISABLE_RTTI=ON -DNCNN_DISABLE_EXCEPTION=ON
- ## error: undefined symbol: ncnn::Extractor::extract(char const*, ncnn::Mat&)
Possible fix:
- try upgrading the NDK version in Android Studio
- ## CMake 3.14.0 or higher is required. You are running version 2.8.12.2
```shell
wget https://github.com/Kitware/CMake/releases/download/v3.18.2/cmake-3.18.2-Linux-x86_64.tar.gz
tar zxvf cmake-3.18.2-Linux-x86_64.tar.gz
mv cmake-3.18.2-Linux-x86_64 /opt/cmake-3.18.2
ln -sf /opt/cmake-3.18.2/bin/* /usr/bin/
```
# How to add the ncnn library to a project? How to use it with cmake?
Build ncnn and make install. On linux/windows, set/export ncnn_DIR to the directory under the install prefix that contains ncnnConfig.cmake, as in the sketch below.
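A minimal CMakeLists.txt sketch; the install path /opt/ncnn and the project name myapp are examples, adjust them to your own setup:
```cmake
cmake_minimum_required(VERSION 3.14)
project(myapp)

# point ncnn_DIR at the directory containing ncnnConfig.cmake
set(ncnn_DIR "/opt/ncnn/lib/cmake/ncnn" CACHE PATH "ncnn config directory")
find_package(ncnn REQUIRED)

add_executable(myapp main.cpp)
target_link_libraries(myapp ncnn)
```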
- ## android
- ## ios
- ## linux
- ## windows
- ## macos
- ## arm linux
# Model conversion
- ## caffe
`./caffe2ncnn caffe.prototxt caffe.caffemodel ncnn.param ncnn.bin`
- ## mxnet
`./mxnet2ncnn mxnet-symbol.json mxnet.params ncnn.param ncnn.bin`
- ## darknet
[https://github.com/xiangweizeng/darknet2ncnn](https://github.com/xiangweizeng/darknet2ncnn)
- ## pytorch - onnx
[use ncnn with pytorch or onnx](https://github.com/Tencent/ncnn/wiki/use-ncnn-with-pytorch-or-onnx)
- ## tensorflow 1.x/2.x - keras
[https://github.com/MarsTechHAN/keras2ncnn](https://github.com/MarsTechHAN/keras2ncnn) **[@MarsTechHAN](https://github.com/MarsTechHAN)**
- ## tensorflow 2.x - mlir
[Converting tensorflow2 models to ncnn via MLIR](https://zhuanlan.zhihu.com/p/152535430) **@[nihui](https://www.zhihu.com/people/nihui-2)**
- ## Shape not supported yet! Gather not supported yet! Cast not supported yet!
Run onnx-simplifier to fix a static shape; these ops then get folded away.
- ## convertmodel
[https://convertmodel.com/](https://convertmodel.com/) **[@大老师](https://github.com/daquexian)**
- ## netron
[https://github.com/lutzroeder/netron](https://github.com/lutzroeder/netron)
- ## How to generate a model with fixed shape information?
Set the Input layer parameters: 0=w 1=h 2=c, as in the example below.
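An example Input line in the .param file for a 224x224 3-channel input; the layer/blob name data is just an example:
```
Input            data             0 1 data 0=224 1=224 2=3
```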
- ## Why can the GPU be faster?
Massive parallelism and high memory bandwidth suit convolution workloads; for small models the CPU may still win because of upload/download overhead.
- ## How to convert a model to fp16 with ncnnoptimize
`./ncnnoptimize model.param model.bin model-opt.param model-opt.bin 65536`
- ## How to check a model's FLOPs / memory footprint with ncnnoptimize
- ## How to make a model support dynamic shape?
Edit the Interp and Reshape layers so they no longer hard-code output sizes; see the sketch below.
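A hedged .param sketch, assuming the documented Interp parameter ids (0=resize_type, 1=height_scale, 2=width_scale, 3=output_height, 4=output_width); the layer/blob names are examples. Before, with a fixed 128x128 output that breaks on dynamic input:
```
Interp           up1              1 1 in out 0=2 3=128 4=128
```
After, with 2x scale factors that follow the input size:
```
Interp           up1              1 1 in out 0=2 1=2.000000 2=2.000000
```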
- ## How to embed a model into the program as code?
Use ncnn2mem, as sketched below.
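A sketch of the typical workflow following the use-ncnn-with-alexnet wiki; the file and array names derive from the model name, alexnet here is an example:
```shell
./ncnn2mem alexnet.param alexnet.bin alexnet.id.h alexnet.mem.h
```
```cpp
#include "alexnet.mem.h"

ncnn::Net net;
net.load_param(alexnet_param_bin); // arrays generated in alexnet.mem.h
net.load_model(alexnet_bin);
```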
- ## How to encrypt a model?
https://zhuanlan.zhihu.com/p/268327784
- ## Can an ncnn model converted on Linux be used directly on Windows/MacOS/Android/..?
Yes, model files work on all platforms.
- ## How to remove the post-processing before exporting to onnx?
For detection models:
see the author's article <https://zhuanlan.zhihu.com/p/128974102>; step 3 removes the post-processing and then exports to onnx, where removing the post-processing means dropping the steps after the raw network outputs, as done when testing within the project.
- ## What if pytorch has layers that cannot be exported to onnx?
Option 1:
use ONNX_ATEN_FALLBACK
for fully custom ops, first rewrite them with exportable ops (such as concat and slice), then edit the param file after converting to ncnn
Option 2:
try PNNX; these articles give an overview:
1. [Building PNNX on Windows/Linux/macOS](https://zhuanlan.zhihu.com/p/431833958)
2. [Learn in 5 minutes: convert a TorchScript model to ncnn with PNNX](https://zhuanlan.zhihu.com/p/427512763)
# Usage
- ## vkEnumeratePhysicalDevices failed -3
- ## vkCreateInstance failed -9
Please upgrade your GPU driver if you encounter this crash or error.
Driver download pages for some common vendors:
[Intel](https://downloadcenter.intel.com/product/80939/Graphics-Drivers), [AMD](https://www.amd.com/en/support), [Nvidia](https://www.nvidia.com/Download/index.aspx)
- ## ModuleNotFoundError: No module named 'ncnn.ncnn'
python setup.py develop
- ## fopen nanodet-m.param failed
Check the file path and the working directory.
File not found or not readable. Make sure that XYZ.param/XYZ.bin is accessible.
- ## find_blob_index_by_name data / output / ... failed
input()/extract() take blob names, not layer names; don't mix them up.
With param.bin you must use the enums from the generated xxx.id.h, as sketched below.
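A sketch following the use-ncnn-with-alexnet wiki; the model is assumed to be alexnet, so the generated header and enum names derive from it:
```cpp
#include "alexnet.id.h"

ncnn::Extractor ex = net.create_extractor();
ex.input(alexnet_param_id::BLOB_data, in);
ncnn::Mat out;
ex.extract(alexnet_param_id::BLOB_prob, out);
```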
- ## parse magic failed
- ## param is too old, please regenerate
The model file itself is broken or outdated.
Your model file is in the old format converted by an old caffe2ncnn tool.
Check out the latest ncnn code, build it and regenerate the param and model binary files, and that should work.
Make sure that your param file starts with the magic number 7767517.
You may find more info on use-ncnn-with-alexnet.
When adding the softmax layer yourself, you need to add 1=1
- ## set_vulkan_compute failed, network use_vulkan_compute disabled
You must set net.opt.use_vulkan_compute = true; before load_param / load_model, as below.
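A minimal ordering sketch (model file names are examples):
```cpp
ncnn::Net net;
net.opt.use_vulkan_compute = true; // must come before loading
net.load_param("model.param");
net.load_model("model.bin");
```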
- ## Multiple input blobs and multiple output blobs, how?
Call `ex.input()` and `ex.extract()` multiple times:
```
ex.input("data1", in_1);
ex.input("data2", in_2);
ex.extract("output1", out_1);
ex.extract("output2", out_2);
```
- ## Does calling extract several times on an Extractor recompute layers?
No, intermediate results are cached inside the Extractor.
- ## How to see per-layer timing?
cmake -DNCNN_BENCHMARK=ON ..
- ## How to convert a cv::Mat CV_8UC3 BGR image
Use from_pixels / to_pixels, as in the sketch below.
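A sketch converting an OpenCV BGR image to ncnn::Mat and back; PIXEL_BGR2RGB swaps the channel order during conversion:
```cpp
cv::Mat bgr = cv::imread("image.jpg", 1);

// u8 BGR pixels -> float RGB ncnn::Mat
ncnn::Mat in = ncnn::Mat::from_pixels(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, bgr.cols, bgr.rows);

// float ncnn::Mat -> u8 BGR pixels
cv::Mat out(in.h, in.w, CV_8UC3);
in.to_pixels(out.data, ncnn::Mat::PIXEL_RGB2BGR);
```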
- ## How to convert float data to ncnn::Mat
First of all: memory you allocate yourself stays your responsibility; ncnn::Mat will not free float data that you pass in.
```cpp
std::vector<float> testData(60, 1.0f); // let std::vector manage allocation and release
// pass the float pointer as void*; reshape fixes the dimensions (the author recommends reshape to resolve the channel gap)
ncnn::Mat in1 = ncnn::Mat(60, (void*)testData.data()).reshape(4, 5, 3);
float* a = new float[60]; // manually allocated, you must delete[] it yourself later
// same usage; clone() copies the data so the Mat becomes the owner
ncnn::Mat in2 = ncnn::Mat(60, (void*)a).reshape(4, 5, 3).clone();
```
- ## How to initialize an ncnn::Mat to all zeros
`mat.fill(0.f);`
- ## How to check / get the version number
It is printed during cmake configuration.
At runtime, use ncnn_version() from c_api.h.
Or compose it yourself as 1.0+yyyymmdd.
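For example, via the C API:
```c
#include "c_api.h"

const char* v = ncnn_version(); // version string following the 1.0+yyyymmdd scheme
```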
- ## How to convert yuv data
Use yuv420sp2rgb and yuv420sp2rgb_nv12 (see the sketch below), or
**[@metarutaiga](https://github.com/metarutaiga/xxYUV)**
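A hedged sketch for NV21 data (use yuv420sp2rgb_nv12 for NV12); nv21 is assumed to point to w*h luma bytes followed by interleaved VU, with even w and h:
```cpp
unsigned char* rgb = new unsigned char[w * h * 3];
ncnn::yuv420sp2rgb(nv21, w, h, rgb);
ncnn::Mat in = ncnn::Mat::from_pixels(rgb, ncnn::Mat::PIXEL_RGB, w, h);
delete[] rgb;
```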
- ## How to resize / crop / rotate an image
[efficient roi resize rotate](https://github.com/Tencent/ncnn/wiki/efficient-roi-resize-rotate)
- ## How to do 5-point face alignment
get_affine_transform
warpaffine_bilinear_c3
```cpp
// compute the affine transform and its inverse
int type = 0; // 0 -> fill outside the region with v; -233 -> leave the outside untouched
unsigned int v = 0;
float tm[6];
float tm_inv[6];
// face rectangle position and size on the source image
float src_x = target->det.rect.x / target->det.w * pIveImageU8C3->u32Width;
float src_y = target->det.rect.y / target->det.h * pIveImageU8C3->u32Height;
float src_w = target->det.rect.w / target->det.w * pIveImageU8C3->u32Width;
float src_h = target->det.rect.h / target->det.h * pIveImageU8C3->u32Height;
float point_src[10] = {
    src_x + src_w * target->attr.land[0][0], src_y + src_h * target->attr.land[0][1],
    src_x + src_w * target->attr.land[1][0], src_y + src_h * target->attr.land[1][1],
    src_x + src_w * target->attr.land[2][0], src_y + src_h * target->attr.land[2][1],
    src_x + src_w * target->attr.land[3][0], src_y + src_h * target->attr.land[3][1],
    src_x + src_w * target->attr.land[4][0], src_y + src_h * target->attr.land[4][1],
};
float point_dst[10] = { // +8 because we work on a 112x112 crop
    30.2946f + 8.0f, 51.6963f,
    65.5318f + 8.0f, 51.5014f,
    48.0252f + 8.0f, 71.7366f,
    33.5493f + 8.0f, 92.3655f,
    62.7299f + 8.0f, 92.2041f,
};
// option 1: compute the forward transform, then invert it
AffineTrans::get_affine_transform(point_src, point_dst, 5, tm);
AffineTrans::invert_affine_transform(tm, tm_inv);
// option 2: get the inverse directly by swapping src and dst
// AffineTrans::get_affine_transform(point_dst, point_src, 5, tm_inv);
// planar rgb: each channel must be processed separately
for (int c = 0; c < 3; c++)
{
    unsigned char* pSrc = (unsigned char*)malloc(xxx);
    unsigned char* pDst = (unsigned char*)malloc(xxx);
    ncnn::warpaffine_bilinear_c1(pSrc, SrcWidth, SrcHeight, SrcStride[c], pDst, DstWidth, DstHeight, DstStride[c], tm_inv, type, v);
}
// packed rgb: one call handles all three channels
ncnn::warpaffine_bilinear_c3(pSrc, SrcWidth, SrcHeight, SrcStride, pDst, DstWidth, DstHeight, DstStride, tm_inv, type, v);
```
- ## How to get the output of an intermediate blob
```
ncnn::Mat output;
ex.extract("your_blob_name", output);
```
- ## Why is GPU usage 0 even though I use the GPU?
On Windows 10: Task Manager → Performance tab → GPU → use the drop-down arrow at the top left of one of the views to switch to Compute_0 / Compute_1 / Cuda.
You can also install GPU-Z.
- ## layer XYZ not exists or registered
Your network contains some operations that are not implemented in ncnn.
You may implement them as custom layers, following how-to-implement-custom-layer-step-by-step.
Or you could simply register them as no-op if you are sure those operations make no sense.
```
class Noop : public ncnn::Layer {};
DEFINE_LAYER_CREATOR(Noop)

net.register_custom_layer("LinearRegressionOutput", Noop_layer_creator);
net.register_custom_layer("MAERegressionOutput", Noop_layer_creator);
```
- ## network graph not ready
You shall call Net::load_param() first, then Net::load_model().
This error may also happen when Net::load_param() failed but was not properly handled.
For more information about the ncnn model load api, see ncnn-load-model.
- ## memory not 32-bit aligned at XYZ
The pointer passed to Net::load_param() or Net::load_model() is not 32-bit aligned.
In practice, the data pointer of a std::vector is not guaranteed to be 32-bit aligned.
You can store your binary buffer in an ncnn::Mat instead; its internal memory is aligned, as sketched below.
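A hedged sketch that reads a binary param file into an ncnn::Mat used as an aligned byte buffer; the file name is an example, and error handling is omitted:
```cpp
FILE* fp = fopen("model.param.bin", "rb");
fseek(fp, 0, SEEK_END);
long size = ftell(fp);
rewind(fp);

ncnn::Mat buf((int)size, (size_t)1u); // 1-byte elements, internally aligned allocation
fread(buf.data, 1, size, fp);
fclose(fp);

net.load_param((const unsigned char*)buf.data);
```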
- ## crash on android with '__kmp_abort_process'
This usually happens if you bundle multiple shared libraries that each link openmp.
It is actually an issue of the android ndk: https://github.com/android/ndk/issues/1028
On old android ndk, modify the link flags to
-Wl,-Bstatic -lomp -Wl,-Bdynamic
For recent ndk >= 21:
-fstatic-openmp
- ## dlopen failed: library "libomp.so" not found
Newer android ndk defaults to the dynamic openmp runtime.
Modify the link flags to
-fstatic-openmp -fopenmp
- ## crash when freeing an ncnn dynamic library (.dll/.so) built with openMP
For optimal performance, the openmp threadpool spin-waits for about a second before shutting down, in case more work becomes available.
If you unload a dynamic library that is in the middle of this spin-wait, it will crash in the manner you see (most of the time).
Set OMP_WAIT_POLICY=passive in your environment before calling LoadLibrary, or just wait a few seconds before calling FreeLibrary.
You can also set the environment variable in code:
for msvc++:
SetEnvironmentVariable(_T("OMP_WAIT_POLICY"), _T("passive"));
for g++:
setenv("OMP_WAIT_POLICY", "passive", 1);
reference: https://stackoverflow.com/questions/34439956/vc-crash-when-freeing-a-dll-built-with-openmp
# The results don't match
[ncnn-produce-wrong-result](https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result)
- ## How to print the values of an ncnn::Mat?
```C++
void pretty_print(const ncnn::Mat& m)
{
    for (int q=0; q<m.c; q++)
    {
        const float* ptr = m.channel(q);
        for (int y=0; y<m.h; y++)
        {
            for (int x=0; x<m.w; x++)
            {
                printf("%f ", ptr[x]);
            }
            ptr += m.w;
            printf("\n");
        }
        printf("------------------------\n");
    }
}
```
In Android Studio, `printf` will not work, but you can use `__android_log_print` instead. Example:
```C++
#include <android/log.h> // Don't forget this

void pretty_print(const ncnn::Mat& m)
{
    for (int q=0; q<m.c; q++)
    {
        for (int y=0; y<m.h; y++)
        {
            for (int x=0; x<m.w; x++)
            {
                __android_log_print(ANDROID_LOG_DEBUG, "LOG_TAG", "ncnn Mat is : %f", m.channel(q).row(y)[x]);
            }
        }
    }
}
```
- ## How to visualize the values of an ncnn::Mat?
```
void visualize(const char* title, const ncnn::Mat& m)
{
    std::vector<cv::Mat> normed_feats(m.c);
    for (int i=0; i<m.c; i++)
    {
        cv::Mat tmp(m.h, m.w, CV_32FC1, (void*)(const float*)m.channel(i));
        cv::normalize(tmp, normed_feats[i], 0, 255, cv::NORM_MINMAX, CV_8U);
        cv::cvtColor(normed_feats[i], normed_feats[i], cv::COLOR_GRAY2BGR);
        // check NaN
        for (int y=0; y<m.h; y++)
        {
            const float* tp = tmp.ptr<float>(y);
            uchar* sp = normed_feats[i].ptr<uchar>(y);
            for (int x=0; x<m.w; x++)
            {
                float v = tp[x];
                if (v != v) // NaN compares unequal to itself
                {
                    sp[0] = 0;
                    sp[1] = 0;
                    sp[2] = 255;
                }
                sp += 3;
            }
        }
    }
    int tw = m.w < 10 ? 32 : m.w < 20 ? 16 : m.w < 40 ? 8 : m.w < 80 ? 4 : m.w < 160 ? 2 : 1;
    int th = (m.c - 1) / tw + 1;
    cv::Mat show_map(m.h * th, m.w * tw, CV_8UC3);
    show_map = cv::Scalar(127);
    // tile
    for (int i=0; i<m.c; i++)
    {
        int ty = i / tw;
        int tx = i % tw;
        normed_feats[i].copyTo(show_map(cv::Rect(tx * m.w, ty * m.h, m.w, m.h)));
    }
    cv::resize(show_map, show_map, cv::Size(0,0), 2, 2, cv::INTER_NEAREST);
    cv::imshow(title, show_map);
}
```
- ## It always outputs the result of the first image
Are you reusing an Extractor? Create a new Extractor via net.create_extractor() for every inference.
- ## Accuracy differs when fp16 is enabled
Disable the fp16 paths:
```
net.opt.use_fp16_packed = false;
net.opt.use_fp16_storage = false;
net.opt.use_fp16_arithmetic = false;
```
[ncnn-produce-wrong-result](https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result)
# How to run faster, use less memory, get a smaller library?
- ## fp32 fp16
- ## Big/little core binding
ncnn::set_cpu_powersave(int) binds to big or little cores.
Note that windows does not support core binding.
ncnn also allows different models to run on different cores. Suppose the hardware has 2 big cores and 4 little cores, and you want netA on the big cores and netB on the little cores:
create two threads with std::thread or pthread and run the code below.
0: all cores
1: little cores
2: big cores
```
void thread_1()
{
    ncnn::set_cpu_powersave(2); // bind to big cores
    netA.opt.num_threads = 2;
}
void thread_2()
{
    ncnn::set_cpu_powersave(1); // bind to little cores
    netB.opt.num_threads = 4;
}
```
[openmp-best-practice.zh.md](https://github.com/Tencent/ncnn/blob/master/docs/how-to-use-and-FAQ/openmp-best-practice.zh.md)
- ## How to query the number of CPUs or GPUs
get_cpu_count
get_gpu_count
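For example:
```cpp
#include "cpu.h"
#include "gpu.h"

int num_cpu = ncnn::get_cpu_count();
int num_gpu = ncnn::get_gpu_count(); // requires a build with NCNN_VULKAN enabled
```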
- ## ncnnoptimize
Usage 1:
- ./ncnnoptimize ncnn.param ncnn.bin new.param new.bin flag
<br/>flag selects the weight storage type: 0 means fp32, 1 means fp16
Usage 2:
- ./ncnnoptimize ncnn.param ncnn.bin new.param new.bin flag cutstartname cutendname
<br/>cutstartname: first layer of the extracted sub-model
<br/>cutendname: last layer of the extracted sub-model
- ## How to use the quantization tools?
[Post Training Quantization Tools](https://github.com/Tencent/ncnn/tree/master/tools/quantize)
- ## How to set the number of threads?
opt.num_threads
- ## How to reduce CPU usage?
net.opt.openmp_blocktime = 0;
OMP_WAIT_POLICY=passive
- ## How to do batch inference?
```
int max_batch_size = vkdev->info.compute_queue_count;

ncnn::Mat inputs[1000];
ncnn::Mat outputs[1000];

#pragma omp parallel for num_threads(max_batch_size)
for (int i=0; i<1000; i++)
{
    ncnn::Extractor ex = net1.create_extractor();
    ex.input("data", inputs[i]);
    ex.extract("prob", outputs[i]);
}
```
- ## partial graph inference
Extract the classification output first; depending on the result, extract the bbox output afterwards, as sketched below.
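A sketch with hypothetical blob names; layers computed for the first extract are cached inside the Extractor, so the second extract only runs the remaining branch:
```cpp
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);

ncnn::Mat cls;
ex.extract("cls_score", cls); // runs the classification branch only

if (cls[0] > 0.5f)
{
    ncnn::Mat bbox;
    ex.extract("bbox_pred", bbox); // reuses the layers already computed
}
```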
- ## How to enable bf16s acceleration?
```
net.opt.use_packing_layout = true;
net.opt.use_bf16_storage = true;
```
[Accelerating ncnn with bf16](https://zhuanlan.zhihu.com/p/112564372) **@[nihui](https://www.zhihu.com/people/nihui-2)**
Most beneficial on low-end ARM cores such as the Cortex-A53.
- ## How to build a smaller ncnn library?
[build-minimal-library](https://github.com/Tencent/ncnn/wiki/build-minimal-library)
- ## What do the net.opt sgemm / winograd / fp16_storage options do?
They trade memory for speed: sgemm and winograd make convolution faster at the cost of extra memory, while fp16_storage halves weight/blob storage.
- ## How to deal with problems caused by the GPU entering power-saving mode?
nVidia GPUs (and presumably Intel and AMD too) automatically enter power-saving mode when they consider themselves idle, lowering both memory and core clocks.
In short, if your workload is non-continuous, the measured latency can look very uneven: whenever there is an idle gap, the GPU clocks down, and the next cold start can take several times longer than normal, as in this log:
```
// playback starts
Total: 162ms, Diff: 0ms, GLTex2Mat: 7ms, calc: 152ms, Mat2GLTex: 3ms
Total: 43ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 35ms, Mat2GLTex: 2ms
Total: 45ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 37ms, Mat2GLTex: 3ms
Total: 40ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 32ms, Mat2GLTex: 4ms
// pause for 3 seconds
// resume playback
Total: 190ms, Diff: 0ms, GLTex2Mat: 9ms, calc: 177ms, Mat2GLTex: 3ms
Total: 134ms, Diff: 0ms, GLTex2Mat: 5ms, calc: 110ms, Mat2GLTex: 18ms
Total: 40ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 34ms, Mat2GLTex: 2ms
Total: 42ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 36ms, Mat2GLTex: 2ms
Total: 47ms, Diff: 0ms, GLTex2Mat: 5ms, calc: 38ms, Mat2GLTex: 3ms
...
```
For projects that are not latency-sensitive this is nothing to worry about, but in scenarios where you must accurately predict the upload/compute/render time of the next few frames, it becomes quite a nuisance.
### 3 workarounds
* Contact the GPU vendor and ask them to whitelist your application so the driver exempts it from power saving.
  * Pros: no changes on your side. Cons: hard to arrange; the vendor may well ignore you.
* [GPU control panel] → [Manage 3D settings] → [Power management mode], set to [Prefer maximum performance].
  * Pros: no code changes. Cons: if the deployment target is a non-technical user, you need to write a step-by-step manual for them.
* Periodically feed small heartbeat workloads (e.g. a 1x1 image) while idle/paused to keep the GPU in its high-performance state, as sketched below.
  * Pros: works regardless of driver and user settings. Cons: requires code changes, and it is not exactly energy-friendly.
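A minimal heartbeat sketch; entirely hypothetical — the blob names, the net reference, the dummy input shape, and the 500 ms period are assumptions to adapt to your pipeline:
```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> paused(false); // set to true while playback is paused

void heartbeat_thread(ncnn::Net& net)
{
    ncnn::Mat dummy(1, 1, 3); // tiny 1x1 3-channel input
    dummy.fill(0.f);

    while (true)
    {
        if (paused)
        {
            ncnn::Extractor ex = net.create_extractor();
            ex.input("data", dummy); // hypothetical input blob name
            ncnn::Mat out;
            ex.extract("prob", out); // hypothetical output blob name; keeps the GPU awake
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
}
```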
# Ready-to-use projects
- ## nanodet
[https://github.com/RangiLyu/nanodet](https://github.com/RangiLyu/nanodet)
# Miscellaneous
- ## What OS / editor / development environment does the author use?

| Category | Software |
| ------------ | ----------- |
| OS | Fedora |
| Desktop environment | KDE |
| Editor | Kate |
| Sketching | kolourpaint |
| Function plotting | kmplot |
| bilibili streaming | OBS |