<img align="left" width="100" height="100" src="doc/mimalloc-logo.png"/>

<div align="right">
<sup> v3:</sup><a href="https://github.com/microsoft/mimalloc/actions/workflows/test.yaml"><img src="https://github.com/microsoft/mimalloc/actions/workflows/test.yaml/badge.svg?branch=dev3"/></a>
<sup> v2:</sup><a href="https://github.com/microsoft/mimalloc/actions/workflows/test.yaml"><img src="https://github.com/microsoft/mimalloc/actions/workflows/test.yaml/badge.svg?branch=dev2"/></a>
<sup> v1:</sup><a href="https://github.com/microsoft/mimalloc/actions/workflows/test.yaml"><img src="https://github.com/microsoft/mimalloc/actions/workflows/test.yaml/badge.svg?branch=dev"/></a>
<sup>v3:</sup><a href="https://dev.azure.com/Daan0324/mimalloc/_build?definitionId=1&_a=summary"><img src="https://dev.azure.com/Daan0324/mimalloc/_apis/build/status/microsoft.mimalloc?branchName=dev3"/></a>
</div>

# mimalloc

mimalloc (pronounced "me-malloc")
is a general purpose allocator with excellent [performance](#performance) characteristics.
Initially developed by Daan Leijen for the runtime systems of the
[Koka](https://koka-lang.github.io) and [Lean](https://github.com/leanprover/lean) languages.

- Latest release: `v3.3.1` (2026-04-20) recommended.
- Latest v2 release: `v2.3.1` (2026-04-20) stable.
- Latest v1 release: `v1.9.9` (2026-04-20) legacy.

mimalloc is a drop-in replacement for `malloc` and can be used in other programs
without code changes. For example, on dynamically linked ELF-based systems (Linux, BSD, etc.) you can use it as:
| ``` | |
| > LD_PRELOAD=/usr/lib/libmimalloc.so myprogram | |
| ``` | |
It also includes a way to dynamically override the default allocator on [Windows](#override_on_windows).
| Notable aspects of the design include: | |
| - __small and consistent__: the library is about 10k LOC using simple and | |
| consistent data structures. This makes it very suitable | |
| to integrate and adapt in other projects. For runtime systems it | |
| provides hooks for a monotonic _heartbeat_ and deferred freeing (for | |
| bounded worst-case times with reference counting). | |
Partly due to its simplicity, mimalloc has been ported to many systems (Windows, macOS,
Linux, WASM, various BSDs, Haiku, MUSL, etc.) and has excellent support for dynamic overriding.
| At the same time, it is an industrial strength allocator that runs (very) large scale | |
| distributed services on thousands of machines with excellent worst case latencies. | |
| - __free list sharding__: instead of one big free list (per size class) we have | |
| many smaller lists per "mimalloc page" which reduces fragmentation and | |
| increases locality -- | |
| things that are allocated close in time get allocated close in memory. | |
| (A mimalloc page contains blocks of one size class and is usually 64KiB on a 64-bit system). | |
| - __free list multi-sharding__: the big idea! Not only do we shard the free list | |
| per mimalloc page, but for each page we have multiple free lists. In particular, there | |
| is one list for thread-local `free` operations, and another one for concurrent `free` | |
| operations. Free-ing from another thread can now be a single CAS without needing | |
| sophisticated coordination between threads. Since there will be | |
| thousands of separate free lists, contention is naturally distributed over the heap, | |
| and the chance of contending on a single location will be low -- this is quite | |
| similar to randomized algorithms like skip lists where adding | |
| a random oracle removes the need for a more complex algorithm. | |
| - __eager page purging__: when a "page" becomes empty (with increased chance | |
| due to free list sharding) the memory is marked to the OS as unused (reset or decommitted) | |
| reducing (real) memory pressure and fragmentation, especially in long running | |
| programs. | |
| - __secure__: _mimalloc_ can be built in secure mode, adding guard pages, | |
| randomized allocation, encrypted free lists, etc. to protect against various | |
| heap vulnerabilities. The performance penalty is usually around 10% on average | |
| over our benchmarks. | |
| - __first-class heaps__: efficiently create and use multiple heaps to allocate across different regions. | |
| A heap can be destroyed at once instead of deallocating each object separately. | |
| New: v3 has true first-class heaps where one can allocate in a heap from any thread. | |
| - __bounded__: it does not suffer from _blowup_ \[1\], has bounded worst-case allocation | |
times (_wcat_) (up to OS primitives), bounded space overhead (~0.2% meta-data, with low
| internal fragmentation), and has no internal points of contention using only atomic operations. | |
| - __fast__: In our benchmarks (see [below](#performance)), | |
_mimalloc_ outperforms other leading allocators (_jemalloc_, _tcmalloc_, _Hoard_, etc.),
| and often uses less memory. A nice property is that it does consistently well over a wide range | |
| of benchmarks. There is also good huge OS page support for larger server programs. | |
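
The multi-sharding idea from the list above can be illustrated with a minimal C11 sketch. The `page_t` layout and all function names here are hypothetical and heavily simplified; they are not mimalloc's actual data structures. The sketch only shows the two lists per page: a plain list for frees by the owning thread, and an atomic list that other threads push to with a single CAS loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A free block stores the link to the next free block in its own memory. */
typedef struct block_s {
  struct block_s* next;
} block_t;

/* Hypothetical, simplified "page" (not mimalloc's actual layout). */
typedef struct page_s {
  block_t*          local_free;   /* only touched by the owning thread */
  _Atomic(block_t*) thread_free;  /* pushed to by any other thread */
} page_t;

static void page_init(page_t* page) {
  page->local_free = NULL;
  atomic_init(&page->thread_free, NULL);
}

/* Free from the owning thread: a plain, unsynchronized push. */
static void page_free_local(page_t* page, block_t* block) {
  assert(block != NULL);
  block->next = page->local_free;
  page->local_free = block;
}

/* Free from another thread: push with a compare-and-swap retry loop. */
static void page_free_remote(page_t* page, block_t* block) {
  block_t* head = atomic_load_explicit(&page->thread_free, memory_order_relaxed);
  do {
    block->next = head;  /* `head` is refreshed whenever the CAS fails */
  } while (!atomic_compare_exchange_weak_explicit(
               &page->thread_free, &head, block,
               memory_order_release, memory_order_relaxed));
}

/* Owning thread: take the whole remote list with one atomic exchange and
   splice it onto the local list. Returns the number of blocks moved. */
static size_t page_collect(page_t* page) {
  block_t* list = atomic_exchange_explicit(&page->thread_free, NULL,
                                           memory_order_acquire);
  size_t count = 0;
  while (list != NULL) {
    block_t* next = list->next;
    page_free_local(page, list);
    list = next;
    count++;
  }
  return count;
}
```

Because the owner always takes the entire remote list at once instead of popping single blocks, this sketch needs no extra protection against ABA problems on the remote list.
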
| The [documentation](https://microsoft.github.io/mimalloc) gives a full overview of the API. | |
| You can read more on the design of mimalloc in the [technical report](https://www.microsoft.com/en-us/research/publication/mimalloc-free-list-sharding-in-action) which also has detailed benchmark results. | |
| Enjoy! | |
| ### Versions | |
| There are three maintained versions of mimalloc. These are mostly equal except for how the OS memory is handled. | |
| New development is mostly on v3, while v1 and v2 are maintained with security and bug fixes. | |
| - __v3__: recommended: simplifies the lock-free design of previous versions and improves sharing of | |
| memory between threads. On certain large workloads this version may use | |
| (much) less memory. Also supports true first-class heaps (that can allocate from any thread) | |
| and has more efficient heap-walking (for the CPython GC for example). | |
| (release tags: `v3.x`, development branch `dev3`). | |
| - __v2__: stable mimalloc version. Uses thread-local segments to reduce fragmentation. (release tags: `v2.x`, development branch `dev2` and `main`) | |
- __v1__: legacy version: initial design of mimalloc (release tags: `v1.9.x`, development branch `dev`). Send PRs against this version if possible.
| ### Releases | |
| * 2026-04-20, `v1.9.9`, `v2.3.1`, `v3.3.1`: various bug and security fixes. Special thanks to | |
| @jinpzhanAMD, @res2k, and @GoldJohnKing for their help in improving Windows finalization, and | |
| @Zoxc for his help in finding various issues. | |
| * 2026-04-15, `v1.9.8`, `v2.3.0`, `v3.3.0`: initial support for github (binary) releases, | |
| fix visiting of full pages during collection (performance), | |
| fix THP alignment (performance), fix arm64 cross-compilation on Windows, enable guard pages in debug mode, | |
| always use uncommitted areas between arenas (security), enable static overloading of `malloc` etc. on Windows with the | |
| static CRT (by @Noxybot), fix TLS slot leak on Windows (v3), enable clean DLL load/unload with statically linked | |
| mimalloc (v3), fix race in `mi_heap_destroy` (v3), by default put page meta info separate from allocated | |
| objects (v3,security), fix C++ overrides for emscripten. Various bugs found by DeepTest include: | |
| fix offset for `mi_heap_realloc_aligned`, fix `mi_(w)dupenv_s` buffer size, fix potential overflow in size options, | |
| and error codes for `mi_reallocarr(ay)`. | |
| * 2026-02-03, `v3.2.8` (rc3): Fix thread reinitialize issue on macOS. Fix SIMD codegen bug on older | |
GCC versions. Extend Windows TLS slot limit from 64 to 1088. Report commit statistics more precisely.
| Fixes issue in free-page search in arenas. | |
| * 2026-01-15, `v1.9.7`, `v2.2.7`, `v3.2.7` (rc2): Fix zero initializing blocks that were OS allocated. | |
| For v3 various bug and performance fixes. Fix Debian 32-bit compilation. | |
| * 2026-01-08, `v1.9.6`, `v2.2.6`, `v3.2.6` (rc1): Important bug fixes. Many improvements to v3 including | |
true first-class heaps where one can allocate in a heap from any thread, and track statistics per heap as well.
| Added `MIMALLOC_ALLOW_THP` option. This is by default enabled except on Android. When THP is detected on v3, | |
| mimalloc will set the `MIMALLOC_MINIMAL_PURGE_SIZE` to 2MiB to avoid breaking up potential THP huge pages. | |
| v3 uses faster TLS access on Windows, and has improved performance for `mi_calloc` and aligned allocations. | |
| Fixed rare race condition on older v3, fixed potential buffer overflow in debug statistics, add API for returning | |
| allocated sizes on allocation and free. | |
| * 2025-06-09, `v1.9.4`, `v2.2.4`, `v3.1.4` (beta) : Some important bug fixes, including a case where OS memory | |
| was not always fully released. Improved v3 performance, build on XBox, fix build on Android, support interpose | |
| for older macOS versions, use MADV_FREE_REUSABLE on macOS, always check commit success, better support for Windows | |
| fixed TLS offset, etc. | |
| * 2025-03-28, `v1.9.3`, `v2.2.3`, `v3.0.3` (beta) : Various small bug and build fixes, including: | |
| fix arm32 pre v7 builds, fix mingw build, get runtime statistics, improve statistic commit counts, | |
| fix execution on non BMI1 x64 systems. | |
| * 2025-03-06, `v1.9.2`, `v2.2.2`, `v3.0.2-beta`: Various small bug and build fixes. | |
| Add `mi_options_print`, `mi_arenas_print`, and the experimental `mi_stat_get` and `mi_stat_get_json`. | |
| Add `mi_thread_set_in_threadpool` and `mi_heap_set_numa_affinity` (v3 only). Add vcpkg portfile. | |
| Upgrade mimalloc-redirect to v1.3.2. `MI_OPT_ARCH` is off by default now but still assumes armv8.1-a on arm64 | |
| for fast atomic operations. Add QNX support. | |
| * 2025-01-03, `v1.8.9`, `v2.1.9`, `v3.0.1-alpha`: Interim release. Support Windows arm64. New [guarded](#guarded) build that can place OS | |
| guard pages behind objects to catch buffer overflows as they occur. | |
| Many small fixes: build on Windows arm64, cygwin, riscV, and dragonfly; fix Windows static library initialization to account for | |
| thread local destructors (in Rust/C++); macOS tag change; macOS TLS slot fix; improve stats; | |
| consistent `mimalloc.dll` on Windows (instead of `mimalloc-override.dll`); fix mimalloc-redirect on Win11 H2; | |
| add 0-byte to canary; upstream CPython fixes; reduce .bss size; allow fixed TLS slot on Windows for improved performance. | |
| * [Older release notes](#older-release-notes) | |
| Special thanks to: | |
| * Sergiy Kuryata for his contributions on reducing memory commit -- especially on Windows with the Windows thread pool (now implemented in v3). | |
| * [David Carlier](https://devnexen.blogspot.com/) (@devnexen) for his many contributions, and making | |
| mimalloc work better on many less common operating systems, like Haiku, Dragonfly, etc. | |
| * Mary Feofanova (@mary3000), Evgeniy Moiseenko, and Manuel Pöter (@mpoeter) for making mimalloc TSAN checkable, and finding | |
| memory model bugs using the [genMC] model checker. | |
| * Weipeng Liu (@pongba), Zhuowei Li, Junhua Wang, and Jakub Szymanski, for their early support of mimalloc and deployment | |
| at large scale services, leading to many improvements in the mimalloc algorithms for large workloads. | |
| * Jason Gibson (@jasongibson) for exhaustive testing on large scale workloads and server environments, and finding complex bugs | |
| in (early versions of) `mimalloc`. | |
* Manuel Pöter (@mpoeter) and Sam Gross (@colesbury) for finding an ABA concurrency issue in abandoned segment reclamation. Sam also created the [no GIL](https://github.com/colesbury/nogil) Python fork which
| uses mimalloc internally. | |
| [genMC]: https://plv.mpi-sws.org/genmc/ | |
| ### Usage | |
| mimalloc is used in various large scale low-latency services and programs, for example: | |
| <a href="https://www.bing.com"><img height="50" align="left" src="https://upload.wikimedia.org/wikipedia/commons/e/e9/Bing_logo.svg"></a> | |
| <a href="https://azure.microsoft.com/"><img height="50" align="left" src="https://upload.wikimedia.org/wikipedia/commons/a/a8/Microsoft_Azure_Logo.svg"></a> | |
| <a href="https://deathstrandingpc.505games.com"><img height="100" src="doc/ds-logo.png"></a> | |
| <a href="https://docs.unrealengine.com/4.26/en-US/WhatsNew/Builds/ReleaseNotes/4_25/"><img height="100" src="doc/unreal-logo.svg"></a> | |
| <a href="https://ablab.github.io/software/spades/"><img height="100" src="doc/spades-logo.png"></a> | |
| # Building | |
| ## Windows | |
| Open `ide/vs2022/mimalloc.sln` in Visual Studio 2022 and build. | |
| The `mimalloc-lib` project builds a static library (in `out/msvc-x64`), while the | |
| `mimalloc-override-dll` project builds a DLL for overriding malloc | |
| in the entire program. | |
| ## Linux, macOS, BSD, etc. | |
| We use [`cmake`](https://cmake.org) as the build system: | |
| ``` | |
| > mkdir -p out/release | |
| > cd out/release | |
| > cmake ../.. | |
| > make | |
| ``` | |
| This builds the library as a shared (dynamic) | |
| library (`.so` or `.dylib`), a static library (`.a`), and | |
| as a single object file (`.o`). | |
To install the library and header files in `/usr/local/lib` and `/usr/local/include`, run `sudo make install`.
| You can build the debug version which does many internal checks and | |
| maintains detailed statistics as: | |
| ``` | |
| > mkdir -p out/debug | |
| > cd out/debug | |
| > cmake -DCMAKE_BUILD_TYPE=Debug ../.. | |
| > make | |
| ``` | |
| This will name the shared library as `libmimalloc-debug.so`. | |
| Finally, you can build a _secure_ version that uses guard pages, encrypted free lists, etc., as: | |
| ``` | |
| > mkdir -p out/secure | |
| > cd out/secure | |
| > cmake -DMI_SECURE=ON ../.. | |
| > make | |
| ``` | |
| This will name the shared library as `libmimalloc-secure.so`. | |
| Use `cmake ../.. -LH` to see all the available build options. | |
The examples use the default compiler. If you would like to use another compiler, use:
| ``` | |
| > CC=clang CXX=clang++ cmake ../.. | |
| ``` | |
| ## Cmake with Visual Studio | |
| You can also use cmake on Windows. Open a Visual Studio 2022 development prompt | |
| and invoke `cmake` with the right [generator](https://cmake.org/cmake/help/latest/generator/Visual%20Studio%2017%202022.html) | |
| and architecture, like: | |
| ``` | |
| > cmake ..\.. -G "Visual Studio 17 2022" -A x64 -DMI_OVERRIDE=ON | |
| ``` | |
| The cmake build type is specified when actually building, for example: | |
| ``` | |
| > cmake --build . --config=Release | |
| ``` | |
| You can also install the [LLVM toolset](https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170#install-1) | |
| on Windows to build with the `clang-cl` compiler directly: | |
| ``` | |
| > cmake ../.. -G "Visual Studio 17 2022" -T ClangCl | |
| ``` | |
| ## Single Source | |
| You can also directly build the single `src/static.c` file as part of your project without | |
| needing `cmake` at all. Make sure to also add the mimalloc `include` directory to the include path. | |
| # Using the Library | |
| The preferred usage is including `<mimalloc.h>`, linking with | |
the shared or static library, and using the `mi_malloc` API exclusively for allocation. For example,
| ``` | |
> gcc -o myprogram myfile.c -lmimalloc
| ``` | |
| mimalloc uses only safe OS calls (`mmap` and `VirtualAlloc`) and can co-exist | |
| with other allocators linked to the same program. | |
| If you use `cmake`, you can simply use: | |
| ``` | |
| find_package(mimalloc 1.8 REQUIRED) | |
| ``` | |
| in your `CMakeLists.txt` to find a locally installed mimalloc. Then use either: | |
| ``` | |
| target_link_libraries(myapp PUBLIC mimalloc) | |
| ``` | |
| to link with the shared (dynamic) library, or: | |
| ``` | |
| target_link_libraries(myapp PUBLIC mimalloc-static) | |
| ``` | |
to link with the static library. See `test/CMakeLists.txt` for an example.
| For best performance in C++ programs, it is also recommended to override the | |
| global `new` and `delete` operators. For convenience, mimalloc provides | |
| [`mimalloc-new-delete.h`](include/mimalloc-new-delete.h) which does this for you -- just include it in a single(!) source file in your project. | |
| In C++, mimalloc also provides the `mi_stl_allocator` struct which implements the `std::allocator` | |
| interface. | |
| You can pass environment variables to print verbose messages (`MIMALLOC_VERBOSE=1`) | |
| and statistics (`MIMALLOC_SHOW_STATS=1`) (in the debug version): | |
| ``` | |
| > env MIMALLOC_SHOW_STATS=1 ./cfrac 175451865205073170563711388363 | |
| 175451865205073170563711388363 = 374456281610909315237213 * 468551 | |
| subproc 0 | |
| blocks peak total current block total# | |
| bin S 4: 75.3 KiB 55.2 MiB 0 32 B 1.8 M ok | |
| bin S 6: 31.0 KiB 180.4 KiB 0 48 B 3.8 K ok | |
| bin S 8: 64 B 64 B 0 64 B 1 ok | |
| bin S 9: 160 B 160 B 0 80 B 2 ok | |
| bin S 17: 1.2 KiB 1.2 KiB 0 320 B 4 ok | |
| bin S 21: 640 B 3.1 KiB 0 640 B 5 ok | |
| bin S 33: 5.0 KiB 5.0 KiB 0 5.0 KiB 1 ok | |
| binned : 84.2 Ki 41.5 Mi 0 ok | |
| huge : 0 0 0 ok | |
| total : 84.2 KiB 41.5 MiB 0 | |
| malloc req: 29.7 MiB | |
| pages peak total current block total# | |
| touched : 152.8 KiB 152.8 KiB 152.8 KiB | |
| pages : 8 14 0 ok | |
| abandoned : 1 249 0 ok | |
| reclaima : 0 | |
| reclaimf : 249 | |
| reabandon : 0 | |
| waits : 0 | |
| extended : 38 | |
| retire : 35 | |
| searches : 0.7 avg | |
| arenas peak total current block total# | |
| reserved : 1.0 GiB 1.0 GiB 1.0 GiB | |
| committed : 4.8 MiB 4.8 MiB 4.4 MiB | |
| reset : 0 | |
| purged : 385.5 Ki | |
| arenas : 1 | |
| rollback : 0 | |
| mmaps : 3 | |
| commits : 0 | |
| resets : 1 | |
| purges : 2 | |
| guarded : 0 | |
| heaps : 1 1 1 | |
| process peak total current block total# | |
| threads : 1 1 1 | |
| numa nodes: 1 | |
| elapsed : 0.553 s | |
| process : user: 0.557 s, system: 0.013 s, faults: 29, peak rss: 2.1 MiB, peak commit: 4.8 MiB | |
| ``` | |
Using the `mi_` prefixed API is not always possible in existing programs
that already use the standard malloc interface. In that case, another option is
to override the standard malloc interface completely and redirect all calls
to the _mimalloc_ library instead.
| ## Environment Options | |
| You can set further options either programmatically (using [`mi_option_set`](https://microsoft.github.io/mimalloc/group__options.html)), or via environment variables: | |
| - `MIMALLOC_SHOW_STATS=1`: show statistics when the program terminates. | |
| - `MIMALLOC_VERBOSE=1`: show verbose messages (including statistics). | |
| - `MIMALLOC_SHOW_ERRORS=1`: show error and warning messages. | |
| Advanced options: | |
| - `MIMALLOC_ARENA_EAGER_COMMIT=2`: turns on eager commit for the large arenas (usually 1GiB) from which mimalloc | |
| allocates segments and pages. Set this to 2 (default) to | |
| only enable this on overcommit systems (e.g. Linux). Set this to 1 to enable explicitly on other systems | |
| as well (like Windows or macOS) which may improve performance (as the whole arena is committed at once). | |
Note that eager commit only increases the commit but not the actual peak resident set
| (rss) so it is generally ok to enable this. | |
- `MIMALLOC_PURGE_DELAY=N`: the delay in `N` milliseconds (by default `1000` in v3) after which mimalloc will purge
| OS pages that are not in use. This signals to the OS that the underlying physical memory can be reused which | |
| can reduce memory fragmentation especially in long running (server) programs. Setting `N` to `0` purges immediately when | |
| a page becomes unused which can improve memory usage but also decreases performance. | |
| Setting it to `-1` disables purging completely. | |
| - `MIMALLOC_PURGE_DECOMMITS=1`: By default "purging" memory means unused memory is decommitted (`MEM_DECOMMIT` on Windows, | |
`MADV_DONTNEED` (which decreases rss immediately) on `mmap` systems). Set this to 0 to instead "reset" unused
| memory on a purge (`MEM_RESET` on Windows, generally `MADV_FREE` (which does not decrease rss immediately) on `mmap` systems). | |
| Mimalloc generally does not "free" OS memory but only "purges" OS memory, in other words, it tries to keep virtual | |
| address ranges and decommits within those ranges (to make the underlying physical memory available to other processes). | |
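
The three cases of `MIMALLOC_PURGE_DELAY` described above (`-1`, `0`, and a positive `N`) can be summarized in a small decision function. This is only a sketch of the documented semantics: `purge_is_due` and its parameters are hypothetical names, and mimalloc's internal scheduling of purges is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether an unused memory range may be purged now.
   `delay_ms` follows the MIMALLOC_PURGE_DELAY convention described above:
   -1 disables purging, 0 purges immediately, N > 0 waits N milliseconds. */
static bool purge_is_due(int64_t delay_ms, int64_t unused_since_ms, int64_t now_ms) {
  assert(now_ms >= unused_since_ms);
  if (delay_ms < 0) return false;   /* purging disabled */
  if (delay_ms == 0) return true;   /* purge as soon as the range is unused */
  return (now_ms - unused_since_ms) >= delay_ms;
}
```

With the v3 default of `1000`, a range that became unused at time `0` would be purged once the clock reaches `1000` milliseconds, provided it was not reused in the meantime.
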
| Further options for large workloads and services: | |
- `MIMALLOC_ALLOW_THP=1`: Allow transparent huge pages (THP) on Linux systems. This is enabled by default, except on Android
where it is off by default. When set to `0`, THP is disabled for the process that mimalloc runs in. If enabled, mimalloc also sets
the `MIMALLOC_MINIMAL_PURGE_SIZE` in v3 to 2MiB to avoid potentially breaking up transparent huge pages when purging memory.
| - `MIMALLOC_USE_NUMA_NODES=N`: pretend there are at most `N` NUMA nodes. If not set, the actual NUMA nodes are detected | |
| at runtime. Setting `N` to 1 may avoid problems in some virtual environments. Also, setting it to a lower number than | |
| the actual NUMA nodes is fine and will only cause threads to potentially allocate more memory across actual NUMA | |
| nodes (but this can happen in any case as NUMA local allocation is always a best effort but not guaranteed). | |
| - `MIMALLOC_ALLOW_LARGE_OS_PAGES=0`: Set to 1 to use large OS pages (2 or 4MiB) when available; for some workloads this can | |
| significantly improve performance. However, large OS pages cannot be purged or shared with other processes so may lead | |
| to increased memory usage in some cases. | |
| Use `MIMALLOC_VERBOSE` to check if the large OS pages are enabled -- usually one needs | |
| to explicitly give permissions for large OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]). However, sometimes | |
| the OS is very slow to reserve contiguous physical memory for large OS pages so use with care on systems that | |
can have fragmented memory (for that reason, we generally recommend using `MIMALLOC_RESERVE_HUGE_OS_PAGES` instead whenever possible).
| - `MIMALLOC_RESERVE_HUGE_OS_PAGES=N`: where `N` is the number of 1GiB _huge_ OS pages. This reserves the huge pages at | |
| startup and sometimes this can give a large (latency) performance improvement on big workloads. | |
| Usually it is better to not use `MIMALLOC_ALLOW_LARGE_OS_PAGES=1` in combination with this setting. Just like large | |
| OS pages, use with care as reserving | |
| contiguous physical memory can take a long time when memory is fragmented (but reserving the huge pages is done at | |
| startup only once). | |
Note that we usually need to explicitly give permission for huge OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]).
| The huge pages are usually allocated evenly among NUMA nodes. | |
| We can use `MIMALLOC_RESERVE_HUGE_OS_PAGES_AT=N` where `N` is the numa node (starting at 0) to allocate all | |
| the huge pages at a specific numa node instead. | |
| Use caution when using `fork` in combination with either large or huge OS pages: on a fork, the OS uses copy-on-write | |
| for all pages in the original process including the huge OS pages. When any memory is now written in that area, the | |
| OS will copy the entire 1GiB huge page (or 2MiB large page) which can cause the memory usage to grow in large increments. | |
| [linux-huge]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/tuning_and_optimizing_red_hat_enterprise_linux_for_oracle_9i_and_10g_databases/sect-oracle_9i_and_10g_tuning_guide-large_memory_optimization_big_pages_and_huge_pages-configuring_huge_pages_in_red_hat_enterprise_linux_4_or_5 | |
| [windows-huge]: https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/enable-the-lock-pages-in-memory-option-windows?view=sql-server-2017 | |
| ## Secure Mode | |
_mimalloc_ can be built in secure mode by using the `-DMI_SECURE=ON` flag in `cmake`. This build enables various mitigations
| to make mimalloc more robust against exploits. In particular: | |
| - All internal mimalloc page meta-data is surrounded by guard pages (so a buffer overflow exploit cannot reach into the metadata). | |
| - All free list pointers are | |
| [encoded](https://github.com/microsoft/mimalloc/blob/783e3377f79ee82af43a0793910a9f2d01ac7863/include/mimalloc-internal.h#L396) | |
with per-page keys, which are used both to prevent overwrites with a known pointer and to detect heap corruption.
- Double frees are detected (and ignored).
- The free lists are initialized in a random order and allocation randomly chooses between extension and reuse within a page to
mitigate against attacks that rely on a predictable allocation order. Similarly, the larger heap blocks allocated by mimalloc
| from the OS are also address randomized. | |
| - If enabling `-DMI_SECURE_FULL=ON` there will also be guard pages at the end of each (64KiB) mimalloc page (thus interleaving | |
| valid block data with inaccessible gaps). This setting is not recommended in general as it is more expensive and can lead to | |
| reaching the maximum VMA limit on Linux systems if the heap gets too large. | |
| As always, evaluate with care as part of an overall security strategy as all of the above are mitigations but not guarantees. | |
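
To illustrate why encoding free list pointers with a per-page key helps, here is a deliberately simplified sketch. It uses a plain XOR with the key, which is an assumption made for brevity and not necessarily the encoding mimalloc uses (see the linked source for the real one). The `page_t` fields and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page descriptor for this sketch (not mimalloc's layout). */
typedef struct page_s {
  uintptr_t start;       /* address of the first block */
  uintptr_t block_size;  /* size of each block in bytes */
  uintptr_t capacity;    /* number of blocks in the page */
  uintptr_t key;         /* secret per-page key, chosen randomly at page creation */
} page_t;

/* Encode a `next` link before storing it inside a free block. */
static uintptr_t link_encode(const page_t* page, uintptr_t next) {
  return next ^ page->key;
}

/* Decode a link that was read back from a free block. */
static uintptr_t link_decode(const page_t* page, uintptr_t stored) {
  return stored ^ page->key;
}

/* A decoded link is plausible if it is null or lies on a block boundary
   inside the page; anything else indicates corruption. */
static bool link_is_valid(const page_t* page, uintptr_t next) {
  assert(page->block_size > 0);
  if (next == 0) return true;
  if (next < page->start) return false;
  uintptr_t offset = next - page->start;
  return offset < page->block_size * page->capacity
      && offset % page->block_size == 0;
}
```

If an attacker overwrites a stored link with a raw pointer without knowing the key, the decoded value is effectively random and will almost always fail the validity check, which serves both as a mitigation and as a corruption detector.
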
| ## Debug Mode | |
When _mimalloc_ is built in debug mode (`-DCMAKE_BUILD_TYPE=Debug`),
| various checks are done at runtime to catch development errors. | |
| - Statistics are maintained in detail for each object size. They can be shown using `MIMALLOC_SHOW_STATS=1` at runtime. | |
| - All objects have padding at the end to detect (byte precise) heap block overflows. | |
- Double frees and freeing invalid heap pointers are detected.
| - Corrupted free-lists and some forms of use-after-free are detected. | |
| ## Guarded Mode | |
<span id="guarded">_mimalloc_ can be built in guarded mode using the `-DMI_GUARDED=ON` flag in `cmake`.</span>
| This is `ON` by default when building a debug version of mimalloc. | |
| Guarded mode enables placing OS guard pages behind certain object allocations to catch buffer overflows as they occur. | |
| This can be invaluable to catch buffer-overflow bugs in large programs. However, it also means that any object | |
| allocated with a guard page takes at least 8 KiB memory for the guard page and its alignment. As such, allocating | |
| a guard page for every allocation may be too expensive both in terms of memory, and in terms of performance with | |
| many system calls. Therefore, there are various environment variables (and options) to tune this: | |
| - `MIMALLOC_GUARDED_SAMPLE_RATE=N`: Set the sample rate to `N` (by default 0). This mode places a guard page | |
| behind every `N` suitable object allocations (per thread). Since the performance in guarded mode without placing | |
| guard pages is close to release mode, this can be used to enable guard pages even in production to catch latent | |
| buffer overflow bugs. Set the sample rate to `1` to guard every object, and to `0` to place no guard pages at all. | |
| - `MIMALLOC_GUARDED_SAMPLE_SEED=N`: Start sampling at `N` (by default random). Can be used to reproduce a buffer | |
| overflow if needed. | |
| - `MIMALLOC_GUARDED_MIN=N`, `MIMALLOC_GUARDED_MAX=N`: Minimal and maximal _rounded_ object sizes for which a guard | |
| page is considered (`0` and `1GiB` respectively). If you suspect a buffer overflow occurs with an object of size | |
| 141, set the minimum and maximum to `148` and the sample rate to `1` to have all of those guarded. | |
- `MIMALLOC_GUARDED_PRECISE=1`: If we have an object of size 13, we would usually place it at an aligned address 16 bytes in
front of the guard page. Using `MIMALLOC_GUARDED_PRECISE` places it exactly 13 bytes before the guard page so that even
| a 1 byte overflow is detected. This violates the C/C++ minimal alignment guarantees though so use with care. | |
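
The effect of `MIMALLOC_GUARDED_PRECISE` on object placement can be worked through with a little arithmetic. The sketch below assumes a minimal alignment of 16 bytes, as in the size-13 example above; the function names and the `SKETCH_ALIGNMENT` constant are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ALIGNMENT 16  /* assumed minimal alignment, as in the size-13 example */

/* Distance in bytes between the start of an object and the guard page behind it. */
static size_t guarded_distance(size_t size, bool precise) {
  assert(size > 0);
  if (precise) return size;  /* the object ends exactly at the guard page */
  /* otherwise round the size up to the next multiple of the alignment */
  return (size + SKETCH_ALIGNMENT - 1) / SKETCH_ALIGNMENT * SKETCH_ALIGNMENT;
}

/* Number of bytes an overflow can write past the object before it
   reaches the guard page (and thus goes undetected by it). */
static size_t guarded_slack(size_t size, bool precise) {
  return guarded_distance(size, precise) - size;
}
```

For an object of size 13, the aligned placement leaves 3 bytes of slack in which a small overflow stays unnoticed by the guard page, while the precise placement leaves none, at the cost of an unaligned object address.
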
| # Overriding Standard Malloc | |
| Overriding the standard `malloc` (and `new`) can be done either _dynamically_ or _statically_. | |
| ## Dynamic override | |
| This is the recommended way to override the standard malloc interface. | |
| ### Dynamic Override on Linux, BSD | |
| On these ELF-based systems we preload the mimalloc shared | |
| library so all calls to the standard `malloc` interface are | |
| resolved to the _mimalloc_ library. | |
| ``` | |
| > env LD_PRELOAD=/usr/lib/libmimalloc.so myprogram | |
| ``` | |
| You can set extra environment variables to check that mimalloc is running, | |
| like: | |
| ``` | |
| > env MIMALLOC_VERBOSE=1 LD_PRELOAD=/usr/lib/libmimalloc.so myprogram | |
| ``` | |
| or run with the debug version to get detailed statistics: | |
| ``` | |
| > env MIMALLOC_SHOW_STATS=1 LD_PRELOAD=/usr/lib/libmimalloc-debug.so myprogram | |
| ``` | |
### Dynamic Override on macOS
| On macOS we can also preload the mimalloc shared | |
| library so all calls to the standard `malloc` interface are | |
| resolved to the _mimalloc_ library. | |
| ``` | |
| > env DYLD_INSERT_LIBRARIES=/usr/lib/libmimalloc.dylib myprogram | |
| ``` | |
| Note that certain security restrictions may apply when doing this from | |
| the [shell](https://stackoverflow.com/questions/43941322/dyld-insert-libraries-ignored-when-calling-application-through-bash). | |
| ### Dynamic Override on Windows | |
<span id="override_on_windows">We use a separate redirection DLL to override the standard allocator with mimalloc on Windows</span>
| such that we redirect all malloc/free calls that go through the (dynamic) C runtime allocator, | |
including those from other DLLs or libraries. As it intercepts all allocation calls on a low level,
| it can be used on large programs that include other 3rd party components. | |
| There are four requirements to make the overriding work well: | |
| 1. Use the C-runtime library as a DLL (using the `/MD` or `/MDd` switch). | |
2. Link your program explicitly with the `mimalloc.dll.lib` export library for the `mimalloc.dll`
(which must be compiled with `-DMI_OVERRIDE=ON`; this is the default).
| To ensure the `mimalloc.dll` is actually loaded at run-time it is easiest | |
| to insert some call to the mimalloc API in the `main` function, like `mi_version()` | |
| (or use the `/include:mi_version` switch on the linker command, or | |
| similarly, `#pragma comment(linker, "/include:mi_version")` in some source file). | |
| See the `mimalloc-test-override` project for an example on how to use this. | |
| 3. The `mimalloc-redirect.dll` must be put in the same directory as the main | |
| `mimalloc.dll` at runtime (as it is a dependency of that DLL). | |
| The redirection DLL ensures that all calls to the C runtime malloc API get | |
| redirected to mimalloc functions (which reside in `mimalloc.dll`). | |
| 4. Ensure the `mimalloc.dll` comes as early as possible in the import | |
| list of the final executable (so it can intercept all potential allocations). | |
| You can use `minject -l <exe>` to check this if needed. | |
For best performance on Windows with C++, it is also recommended to override
the `new`/`delete` operations (by including [`mimalloc-new-delete.h`](include/mimalloc-new-delete.h)
in a single(!) source file in your project).
| The environment variable `MIMALLOC_DISABLE_REDIRECT=1` can be used to disable dynamic | |
| overriding at run-time. Use `MIMALLOC_VERBOSE=1` to check if mimalloc was successfully | |
| redirected. | |
For platforms other than x64, you may need a specific [redirection dll](bin).
| Furthermore, we cannot always re-link an executable or ensure `mimalloc.dll` comes | |
| first in the import table. In such cases the [`minject`](bin) tool can be used | |
| to patch the executable's import tables. | |
| ## Static override | |
| On Unix-like systems, you can also statically link with _mimalloc_ to override the standard | |
| malloc interface. The recommended way is to link the final program with the | |
| _mimalloc_ single object file (`mimalloc.o`). We use | |
| an object file instead of a library file as linkers give preference to | |
| that over archives to resolve symbols. To ensure that the standard | |
| malloc interface resolves to the _mimalloc_ library, link it as the first | |
| object file. For example: | |
| ``` | |
| > gcc -o myprogram mimalloc.o myfile1.c ... | |
| ``` | |
| Another way to override statically, which works on all platforms, is to | |
| link statically to mimalloc (as shown in the introduction) and include a | |
| header file in each source file that re-defines `malloc` etc. to `mi_malloc`. | |
| This is provided by [`mimalloc-override.h`](include/mimalloc-override.h). This only works | |
| reliably if all sources are under your control; otherwise pointers from | |
| different heaps may get mixed! | |
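The header-based approach can be sketched in plain C. The block below is only an illustration of the macro technique, not the contents of `mimalloc-override.h`: `demo_malloc`, `demo_free`, `demo_live_blocks` and `demo_roundtrip` are hypothetical stand-ins (assumed here so the sketch builds without mimalloc), where a real build would map the names to `mi_malloc` and `mi_free`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mi_malloc/mi_free; they count live blocks
   so that the effect of the redefinition below is observable. */
static size_t demo_live_blocks = 0;

static void* demo_malloc(size_t size) {
  void* p = malloc(size);   /* still the C library malloc: the macro is defined later */
  if (p != NULL) demo_live_blocks++;
  return p;
}

static void demo_free(void* p) {
  if (p != NULL) demo_live_blocks--;
  free(p);                  /* still the C library free */
}

/* What an override header does: from this point on, the standard
   names in this source file are rewritten by the preprocessor. */
#define malloc(n) demo_malloc(n)
#define free(p)   demo_free(p)

/* Unchanged "application" code, now routed through the stand-ins.
   Returns the number of live blocks observed while one block is held. */
size_t demo_roundtrip(void) {
  void* p = malloc(64);
  size_t live_while_held = demo_live_blocks;
  free(p);
  return live_while_held;
}
```

Calling `demo_roundtrip()` from `main` shows that the allocation went through the stand-in. A source file compiled without the redefinition would still call the C library functions directly, which is how pointers from different heaps can end up mixed.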
| Note: recently we also enabled static overriding on Windows. In that case you need | |
| to link with the static CRT _release_ runtime (`/MT`) and link with the static | |
| `mimalloc(-debug).obj` (to take precedence over the definitions in the CRT library). | |
| # Tools | |
| Generally, we recommend using the standard allocator with memory tracking tools, but mimalloc | |
| can also be built to support the [address sanitizer][asan] or the excellent [Valgrind] tool. | |
| Moreover, it can be built to support Windows event tracing ([ETW]). | |
| This has a small performance overhead but does allow detecting memory leaks and byte-precise | |
| buffer overflows directly on final executables. See also the `test/test-wrong.c` file to test with various tools. | |
| ## Valgrind | |
| To build with [valgrind] support, use the `MI_TRACK_VALGRIND=ON` cmake option: | |
| ``` | |
| > cmake ../.. -DMI_TRACK_VALGRIND=ON | |
| ``` | |
| This can also be combined with secure mode or debug mode. | |
| You can then run your programs directly under valgrind: | |
| ``` | |
| > valgrind <myprogram> | |
| ``` | |
| If you rely on overriding `malloc`/`free` by mimalloc (instead of using the `mi_malloc`/`mi_free` API directly), | |
| you also need to tell `valgrind` to not intercept those calls itself, and use: | |
| ``` | |
| > MIMALLOC_SHOW_STATS=1 valgrind --soname-synonyms=somalloc=*mimalloc* -- <myprogram> | |
| ``` | |
| By setting the `MIMALLOC_SHOW_STATS` environment variable you can check that mimalloc is indeed | |
| used and not the standard allocator. Even though the [Valgrind option][valgrind-soname] | |
| is called `--soname-synonyms`, this also works when overriding with a static library or object file. | |
| To dynamically override with mimalloc using `LD_PRELOAD` together with `valgrind`, use: | |
| ``` | |
| > valgrind --trace-children=yes --soname-synonyms=somalloc=*mimalloc* /usr/bin/env LD_PRELOAD=/usr/lib/libmimalloc.so -- <myprogram> | |
| ``` | |
| See also the `test/test-wrong.c` file to test with `valgrind`. | |
| Valgrind support is in its initial development -- please report any issues. | |
| [Valgrind]: https://valgrind.org/ | |
| [valgrind-soname]: https://valgrind.org/docs/manual/manual-core.html#opt.soname-synonyms | |
| ## ASAN | |
| To build with the address sanitizer, use the `-DMI_TRACK_ASAN=ON` cmake option: | |
| ``` | |
| > cmake ../.. -DMI_TRACK_ASAN=ON | |
| ``` | |
| This can also be combined with secure mode or debug mode. | |
| You can then run your programs as: | |
| ``` | |
| > ASAN_OPTIONS=verbosity=1 <myprogram> | |
| ``` | |
| When you link a program with an address sanitizer build of mimalloc, you should | |
| generally compile that program too with the address sanitizer enabled. | |
| For example, assuming you build mimalloc in `out/debug`: | |
| ``` | |
| clang -g -o test-wrong -Iinclude test/test-wrong.c out/debug/libmimalloc-asan-debug.a -lpthread -fsanitize=address -fsanitize-recover=address | |
| ``` | |
| Since the address sanitizer redirects the standard allocation functions, on some platforms (macOSX for example) | |
| it is required to compile mimalloc with `-DMI_OVERRIDE=OFF`. | |
| Address sanitizer support is in its initial development -- please report any issues. | |
| [asan]: https://github.com/google/sanitizers/wiki/AddressSanitizer | |
| ## ETW | |
| Event tracing for Windows ([ETW]) provides a high performance way to capture all allocations through | |
| mimalloc and analyze them later. To build with ETW support, use the `-DMI_TRACK_ETW=ON` cmake option. | |
| You can then capture an allocation trace using the Windows performance recorder (WPR), using the | |
| `src/prim/windows/etw-mimalloc.wprp` profile. In an admin prompt, you can use: | |
| ``` | |
| > wpr -start src\prim\windows\etw-mimalloc.wprp -filemode | |
| > <my_mimalloc_program> | |
| > wpr -stop <my_mimalloc_program>.etl | |
| ``` | |
| and then open `<my_mimalloc_program>.etl` in the Windows Performance Analyzer (WPA), or | |
| use a tool like [TraceControl] that is specialized for analyzing mimalloc traces. | |
| [ETW]: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/event-tracing-for-windows | |
| [TraceControl]: https://github.com/xinglonghe/TraceControl | |
| # Performance | |
| Last update: 2021-01-30 | |
| We tested _mimalloc_ against many other top allocators over a wide | |
| range of benchmarks, ranging from various real world programs to | |
| synthetic benchmarks that see how the allocator behaves under more | |
| extreme circumstances. In our benchmark suite, _mimalloc_ outperforms other leading | |
| allocators (_jemalloc_, _tcmalloc_, _Hoard_, etc), and has a similar memory footprint. A nice property is that it | |
| does consistently well over the wide range of benchmarks. | |
| General memory allocators are interesting as there exists no algorithm that is | |
| optimal -- for a given allocator one can usually construct a workload | |
| where it does not do so well. The goal is thus to find an allocation | |
| strategy that performs well over a wide range of benchmarks without | |
| suffering from (too much) underperformance in less common situations. | |
| As always, interpret these results with care since some benchmarks test synthetic | |
| or uncommon situations that may never apply to your workloads. For example, most | |
| allocators do not do well on `xmalloc-testN` but that includes even the best | |
| industrial allocators like _jemalloc_ and _tcmalloc_ that are used in some of | |
| the world's largest systems (like Chrome or FreeBSD). | |
| Also, the benchmarks here do not measure the behaviour on very large and long-running server workloads, | |
| or worst-case latencies of allocation. Much work has gone into `mimalloc` to work well on such | |
| workloads (for example, to reduce virtual memory fragmentation on long-running services) | |
| but such optimizations are not always reflected in the current benchmark suite. | |
| We show here only an overview -- for | |
| more specific details and further benchmarks we refer to the | |
| [technical report](https://www.microsoft.com/en-us/research/publication/mimalloc-free-list-sharding-in-action). | |
| The benchmark suite is automated and available separately | |
| as [mimalloc-bench](https://github.com/daanx/mimalloc-bench). | |
| ## Benchmark Results on a 16-core AMD 5950x (Zen3) | |
| Testing on the 16-core AMD 5950x processor at 3.4GHz (4.9GHz boost), with | |
| 32GiB memory at 3600MHz, running Ubuntu 20.04 with glibc 2.31 and GCC 9.3.0. | |
| We measure three versions of _mimalloc_: the main version `mi` (tag:v1.7.0), | |
| the new v2.0 beta version as `xmi` (tag:v2.0.0), and the main version in secure mode as `smi` (tag:v1.7.0). | |
| The other allocators are | |
| Google's [_tcmalloc_](https://github.com/gperftools/gperftools) (`tc`, tag:gperftools-2.8.1) used in Chrome, | |
| Facebook's [_jemalloc_](https://github.com/jemalloc/jemalloc) (`je`, tag:5.2.1) by Jason Evans used in Firefox and FreeBSD, | |
| the Intel thread building blocks [allocator](https://github.com/intel/tbb) (`tbb`, tag:v2020.3), | |
| [rpmalloc](https://github.com/mjansson/rpmalloc) (`rp`,tag:1.4.1) by Mattias Jansson, | |
| the original scalable [_Hoard_](https://github.com/emeryberger/Hoard) (git:d880f72) allocator by Emery Berger \[1], | |
| the memory compacting [_Mesh_](https://github.com/plasma-umass/Mesh) (git:67ff31a) allocator by | |
| Bobby Powers _et al_ \[8], | |
| and finally the default system allocator (`glibc`, 2.31) (based on _PtMalloc2_). | |
| <img width="90%" src="doc/bench-2021/bench-amd5950x-2021-01-30-a.svg"/> | |
| <img width="90%" src="doc/bench-2021/bench-amd5950x-2021-01-30-b.svg"/> | |
| Any benchmarks ending in `N` run on all 32 logical cores in parallel. | |
| Results are averaged over 10 runs and reported relative | |
| to mimalloc (where 1.2 means it took 1.2× longer to run). | |
| The legend also contains the _overall relative score_ between the | |
| allocators where 100 points is the maximum if an allocator is fastest on | |
| all benchmarks. | |
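The relative numbers can be read as follows. This is a minimal sketch that assumes the per-allocator run times are averaged first and then divided by the mimalloc average; the source does not spell out the exact procedure, and the function names `mean_time` and `relative_time` are hypothetical.

```c
#include <stddef.h>

/* Mean run time over n runs (any consistent time unit). Returns 0.0 when n is 0. */
double mean_time(const double* runs, size_t n) {
  if (n == 0) return 0.0;
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    sum += runs[i];
  }
  return sum / (double)n;
}

/* Run time of an allocator relative to the mimalloc baseline:
   a result of 1.2 means it took 1.2x longer than mimalloc.
   Returns 0.0 when the baseline mean is not positive. */
double relative_time(const double* runs, const double* baseline_runs, size_t n) {
  double base = mean_time(baseline_runs, n);
  if (base <= 0.0) return 0.0;
  return mean_time(runs, n) / base;
}
```

With this reading, mimalloc itself always scores 1.0 and values above 1.0 indicate a slower allocator on that benchmark.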
| The single threaded _cfrac_ benchmark by Dave Barrett is an implementation of | |
| continued fraction factorization which uses many small short-lived allocations. | |
| All allocators do well on such common usage, where _mimalloc_ is just a tad | |
| faster than _tcmalloc_ and | |
| _jemalloc_. | |
| The _leanN_ program is interesting as a large realistic and | |
| concurrent workload of the [Lean](https://github.com/leanprover/lean) | |
| theorem prover compiling its own standard library, and there is a 13% | |
| speedup over _tcmalloc_. This is | |
| quite significant: if Lean spends 20% of its time in the | |
| allocator that means that _mimalloc_ is 1.6× faster than _tcmalloc_ | |
| here. (This is surprising as that is not measured in a pure | |
| allocation benchmark like _alloc-test_. We conjecture that we see this | |
| outsized improvement here because _mimalloc_ has better locality in | |
| the allocation which improves performance for the *other* computations | |
| in a program as well). | |
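One way to arrive at the factor quoted above is the following back-of-the-envelope calculation. It rests on three assumptions that the text does not state explicitly: the 13% speedup means the _tcmalloc_ run takes 1.13 times as long as the _mimalloc_ run, the 20% allocator share refers to the _mimalloc_ run, and the time spent outside the allocator is the same in both runs. Here $T$ is total run time, $A$ is time spent in the allocator, and $R$ is the remaining time.

```latex
\begin{aligned}
T_{tc} &= 1.13\,T_{mi}, \qquad A_{mi} = 0.20\,T_{mi}, \qquad R = T_{mi} - A_{mi} = 0.80\,T_{mi} \\
A_{tc} &= T_{tc} - R = 1.13\,T_{mi} - 0.80\,T_{mi} = 0.33\,T_{mi} \\
\frac{A_{tc}}{A_{mi}} &= \frac{0.33}{0.20} = 1.65
\end{aligned}
```

This is in line with the roughly 1.6× stated above. The locality conjecture in the text suggests the third assumption does not fully hold, since better locality would also shrink $R$.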
| The single threaded _redis_ benchmark again shows that most allocators do well on such workloads. | |
| The _larsonN_ server benchmark by Larson and Krishnan \[2] allocates and frees between threads. They observed this | |
| behavior (which they call _bleeding_) in actual server applications, and the benchmark simulates this. | |
| Here, _mimalloc_ is quite a bit faster than _tcmalloc_ and _jemalloc_ probably due to the object migration between different threads. | |
| The _mstressN_ workload performs many allocations and re-allocations, | |
| and migrates objects between threads (as in _larsonN_). However, it also | |
| creates and destroys the _N_ worker threads a few times, keeping some objects | |
| alive beyond the lifetime of the allocating thread. We observed this | |
| behavior in many larger server applications. | |
| The [_rptestN_](https://github.com/mjansson/rpmalloc-benchmark) benchmark | |
| by Mattias Jansson is an allocator test originally designed | |
| for _rpmalloc_, and tries to simulate realistic allocation patterns over | |
| multiple threads. Here the differences between allocators become more apparent. | |
| The second benchmark set tests specific aspects of the allocators and | |
| shows even more extreme differences between them. | |
| The _alloc-test_, by | |
| [OLogN Technologies AG](http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/), is a very allocation intensive benchmark doing millions of | |
| allocations in various size classes. The test is scaled such that an | |
| allocator that performs almost identically on _alloc-test1_ and _alloc-testN_ | |
| scales linearly. | |
| The _sh6bench_ and _sh8bench_ benchmarks are | |
| developed by [MicroQuill](http://www.microquill.com/) as part of SmartHeap. | |
| In _sh6bench_ _mimalloc_ does much | |
| better than the others (more than 2.5× faster than _jemalloc_). | |
| We cannot explain this well but believe it is | |
| caused in part by the "reverse" free-ing pattern in _sh6bench_. | |
| The _sh8bench_ is a variation with object migration | |
| between threads; whereas _tcmalloc_ did well on _sh6bench_, the addition of object migration causes it to be 10× slower than before. | |
| The _xmalloc-testN_ benchmark by Lever and Boreham \[5] and Christian Eder, simulates an asymmetric workload where | |
| some threads only allocate, and others only free -- they observed this pattern in | |
| larger server applications. Here we see that | |
| the _mimalloc_ technique of having non-contended sharded thread free | |
| lists pays off as it outperforms others by a very large margin. Only _rpmalloc_, _tbb_, and _glibc_ also scale well on this benchmark. | |
| The _cache-scratch_ benchmark by Emery Berger \[1] was introduced with | |
| the Hoard allocator to test for _passive-false_ sharing of cache lines. | |
| With a single thread they all | |
| perform the same, but when running with multiple threads the potential allocator | |
| induced false sharing of the cache lines can cause large run-time differences. | |
| Crundal \[6] describes in detail why the false cache line sharing occurs in the _tcmalloc_ design, and also discusses how this | |
| can be avoided with some small implementation changes. | |
| Only the _tbb_, _rpmalloc_ and _mesh_ allocators also avoid the | |
| cache line sharing completely, while _Hoard_ and _glibc_ seem to mitigate | |
| the effects. Kukanov and Voss \[7] describe in detail | |
| how the design of _tbb_ avoids the false cache line sharing. | |
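False sharing arises when blocks used by different threads land on the same cache line, so that writes by one thread invalidate the line for the other. As a rough illustration (not taken from the benchmark itself), the helper below checks whether two block addresses fall on the same line. The 64-byte line size and the `demo_` names are assumptions made for this sketch.

```c
#include <stdint.h>

/* Assumption: 64-byte cache lines, common on current x86-64 parts but not universal. */
#define DEMO_CACHE_LINE_SIZE 64u

/* Index of the cache line that holds address p. */
uintptr_t demo_cache_line_of(const void* p) {
  return (uintptr_t)p / DEMO_CACHE_LINE_SIZE;
}

/* Nonzero when the two addresses lie on the same cache line. */
int demo_share_cache_line(const void* a, const void* b) {
  return demo_cache_line_of(a) == demo_cache_line_of(b);
}
```

Two small blocks handed out 8 bytes apart would share a line under this check, while blocks 64 bytes apart would not. An allocator that avoids false sharing keeps blocks owned by different threads on different lines.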
| ## On a 36-core Intel Xeon | |
| For completeness, here are the results on a big Amazon | |
| [c5.18xlarge](https://aws.amazon.com/ec2/instance-types/#Compute_Optimized) instance | |
| consisting of a 2×18-core Intel Xeon (Cascade Lake) at 3.4GHz (boost 3.5GHz) | |
| with 144GiB ECC memory, running Ubuntu 20.04 with glibc 2.31, GCC 9.3.0, and | |
| Clang 10.0.0. This time, the mimalloc allocators (mi, xmi, and smi) were | |
| compiled with the Clang compiler instead of GCC. | |
| The results are similar to the AMD results but it is interesting to | |
| see the differences in the _larsonN_, _mstressN_, and _xmalloc-testN_ benchmarks. | |
| <img width="90%" src="doc/bench-2021/bench-c5-18xlarge-2021-01-30-a.svg"/> | |
| <img width="90%" src="doc/bench-2021/bench-c5-18xlarge-2021-01-30-b.svg"/> | |
| ## Peak Working Set | |
| The following figure shows the peak working set (rss) of the allocators | |
| on the benchmarks (on the c5.18xlarge instance). | |
| <img width="90%" src="doc/bench-2021/bench-c5-18xlarge-2021-01-30-rss-a.svg"/> | |
| <img width="90%" src="doc/bench-2021/bench-c5-18xlarge-2021-01-30-rss-b.svg"/> | |
| Note that the _xmalloc-testN_ memory usage should be disregarded as it | |
| allocates more the faster the program runs. Similarly, memory usage of | |
| _larsonN_, _mstressN_, _rptestN_ and _sh8bench_ can vary depending on scheduling and | |
| speed. Nevertheless, we hope to improve the memory usage on _mstressN_ | |
| and _rptestN_ (just as _cfrac_, _larsonN_ and _sh8bench_ have a small working set which skews the results). | |
| <!-- | |
| # Previous Benchmarks | |
| Todo: should we create a separate page for this? | |
| ## Benchmark Results on 36-core Intel: 2020-01-20 | |
| Testing on a big Amazon EC2 compute instance | |
| ([c5.18xlarge](https://aws.amazon.com/ec2/instance-types/#Compute_Optimized)) | |
| consisting of a 72 processor Intel Xeon at 3GHz | |
| with 144GiB ECC memory, running Ubuntu 18.04.1 with glibc 2.27 and GCC 7.4.0. | |
| The measured allocators are _mimalloc_ (xmi, tag:v1.4.0, page reset enabled) | |
| and its secure build as _smi_, | |
| Google's [_tcmalloc_](https://github.com/gperftools/gperftools) (tc, tag:gperftools-2.7) used in Chrome, | |
| Facebook's [_jemalloc_](https://github.com/jemalloc/jemalloc) (je, tag:5.2.1) by Jason Evans used in Firefox and FreeBSD, | |
| the Intel thread building blocks [allocator](https://github.com/intel/tbb) (tbb, tag:2020), | |
| [rpmalloc](https://github.com/mjansson/rpmalloc) (rp,tag:1.4.0) by Mattias Jansson, | |
| the original scalable [_Hoard_](https://github.com/emeryberger/Hoard) (tag:3.13) allocator by Emery Berger \[1], | |
| the memory compacting [_Mesh_](https://github.com/plasma-umass/Mesh) (git:51222e7) allocator by | |
| Bobby Powers _et al_ \[8], | |
| and finally the default system allocator (glibc, 2.27) (based on _PtMalloc2_). | |
| <img width="90%" src="doc/bench-2020/bench-c5-18xlarge-2020-01-20-a.svg"/> | |
| <img width="90%" src="doc/bench-2020/bench-c5-18xlarge-2020-01-20-b.svg"/> | |
| The following figure shows the peak working set (rss) of the allocators | |
| on the benchmarks (on the c5.18xlarge instance). | |
| <img width="90%" src="doc/bench-2020/bench-c5-18xlarge-2020-01-20-rss-a.svg"/> | |
| <img width="90%" src="doc/bench-2020/bench-c5-18xlarge-2020-01-20-rss-b.svg"/> | |
| ## On 24-core AMD Epyc, 2020-01-16 | |
| For completeness, here are the results on a | |
| [r5a.12xlarge](https://aws.amazon.com/ec2/instance-types/#Memory_Optimized) instance | |
| having a 48 processor AMD Epyc 7000 at 2.5GHz with 384GiB of memory. | |
| The results are similar to the Intel results but it is interesting to | |
| see the differences in the _larsonN_, _mstressN_, and _xmalloc-testN_ benchmarks. | |
| <img width="90%" src="doc/bench-2020/bench-r5a-12xlarge-2020-01-16-a.svg"/> | |
| <img width="90%" src="doc/bench-2020/bench-r5a-12xlarge-2020-01-16-b.svg"/> | |
| --> | |
| # References | |
| - \[1] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. | |
| _Hoard: A Scalable Memory Allocator for Multithreaded Applications_. | |
| In the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX). Cambridge, MA, November 2000. | |
| [pdf](http://www.cs.utexas.edu/users/mckinley/papers/asplos-2000.pdf) | |
| - \[2] P. Larson and M. Krishnan. _Memory allocation for long-running server applications_. | |
| In ISMM, Vancouver, B.C., Canada, 1998. [pdf](http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.45.1947&rep=rep1&type=pdf) | |
| - \[3] D. Grunwald, B. Zorn, and R. Henderson. | |
| _Improving the cache locality of memory allocation_. In R. Cartwright, editor, | |
| Proceedings of the Conference on Programming Language Design and Implementation, pages 177–186, New York, NY, USA, June 1993. [pdf](http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.43.6621&rep=rep1&type=pdf) | |
| - \[4] J. Barnes and P. Hut. _A hierarchical O(n*log(n)) force-calculation algorithm_. Nature, 324:446-449, 1986. | |
| - \[5] C. Lever, and D. Boreham. _Malloc() Performance in a Multithreaded Linux Environment._ | |
| In USENIX Annual Technical Conference, Freenix Session. San Diego, CA. Jun. 2000. | |
| Available at <https://github.com/kuszmaul/SuperMalloc/tree/master/tests> | |
| - \[6] Timothy Crundal. _Reducing Active-False Sharing in TCMalloc_. 2016. CS16S1 project at the Australian National University. [pdf](http://courses.cecs.anu.edu.au/courses/CSPROJECTS/16S1/Reports/Timothy_Crundal_Report.pdf) | |
| - \[7] Alexey Kukanov, and Michael J Voss. | |
| _The Foundations for Scalable Multi-Core Software in Intel Threading Building Blocks._ | |
| Intel Technology Journal 11 (4). 2007 | |
| - \[8] Bobby Powers, David Tench, Emery D. Berger, and Andrew McGregor. | |
| _Mesh: Compacting Memory Management for C/C++_ | |
| In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'19), June 2019, pages 333–346. | |
| <!-- | |
| - \[9] Paul Liétar, Theodore Butler, Sylvan Clebsch, Sophia Drossopoulou, Juliana Franco, Matthew J Parkinson, | |
| Alex Shamis, Christoph M Wintersteiger, and David Chisnall. | |
| _Snmalloc: A Message Passing Allocator._ | |
| In Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, 122–135. ACM. 2019. | |
| --> | |
| # Contributing | |
| This project welcomes contributions and suggestions. Most contributions require you to agree to a | |
| Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us | |
| the rights to use your contribution. For details, visit https://cla.microsoft.com. | |
| When you submit a pull request, a CLA-bot will automatically determine whether you need to provide | |
| a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions | |
| provided by the bot. You will only need to do this once across all repos using our CLA. | |
| # Older Release Notes | |
| * 2024-05-21, `v1.8.7`, `v2.1.7`: Fix build issues on less common platforms. Started upstreaming patches | |
| from the CPython [integration](https://github.com/python/cpython/issues/113141#issuecomment-2119255217). Upstream `vcpkg` patches. | |
| * 2024-05-13, `v1.8.6`, `v2.1.6`: Fix build errors on various (older) platforms. Refactored aligned allocation. | |
| * 2024-04-22, `v1.8.4`, `v2.1.4`: Fixes various bugs and build issues. Add `MI_LIBC_MUSL` cmake flag for musl builds. | |
| Free-ing code is refactored into a separate module (`free.c`). Mimalloc page info is simplified with the block size | |
| directly available (and new `block_size_shift` to improve aligned block free-ing). | |
| New approach to collection of abandoned segments: When | |
| a thread terminates the segments it owns are abandoned (containing still live objects) and these can be | |
| reclaimed by other threads. We no longer use a list of abandoned segments but this is now done using bitmaps in arenas | |
| which is more concurrent (and more aggressive). Abandoned memory can now also be reclaimed if a thread frees an object in | |
| an abandoned page (which can be disabled using `mi_option_abandoned_reclaim_on_free`). The option `mi_option_max_segment_reclaim` | |
| gives a maximum percentage of abandoned segments that can be reclaimed per try (=10%). | |
| * 2023-04-24, `v1.8.2`, `v2.1.2`: Fixes build issues on FreeBSD, musl, and C17 (UE 5.1.1). Reduce code size/complexity | |
| by removing regions and segment caches and only use arenas with improved memory purging -- this may improve memory | |
| usage as well for larger services. Renamed options for consistency. Improved Valgrind and ASAN checking. | |
| * 2023-04-03, `v1.8.1`, `v2.1.1`: Fixes build issues on some platforms. | |
| * 2023-03-29, `v1.8.0`, `v2.1.0`: Improved support for dynamic overriding on Windows 11. Improved tracing precision | |
| with [asan](#asan) and [Valgrind](#valgrind), and added Windows event tracing [ETW](#ETW) (contributed by Xinglong He). Created an OS | |
| abstraction layer to make it easier to port and separate platform dependent code (in `src/prim`). Fixed C++ STL compilation on older Microsoft C++ compilers, and various small bug fixes. | |
| * 2022-12-23, `v1.7.9`, `v2.0.9`: Supports building with [asan](#asan) and improved [Valgrind](#valgrind) support. | |
| Support arbitrary large alignments (in particular for `std::pmr` pools). | |
| Added C++ STL allocators attached to a specific heap (thanks @vmarkovtsev). | |
| Heap walks now visit all objects (including huge objects). Support Windows nano server containers (by Johannes Schindelin,@dscho). | |
| Various small bug fixes. | |
| * 2022-11-03, `v1.7.7`, `v2.0.7`: Initial support for [Valgrind](#valgrind) for leak testing and heap block overflow | |
| detection. Initial | |
| support for attaching heaps to a specific memory area (only in v2). Fix `realloc` behavior for zero size blocks, remove restriction to integral multiple of the alignment in `alloc_align`, improved aligned allocation performance, reduced contention with many threads on few processors (thank you @dposluns!), vs2022 support, support `pkg-config`. | |
| * 2022-04-14, `v1.7.6`, `v2.0.6`: fix fallback path for aligned OS allocation on Windows, improve Windows aligned allocation | |
| even when compiling with older SDK's, fix dynamic overriding on macOS Monterey, fix MSVC C++ dynamic overriding, fix | |
| warnings under Clang 14, improve performance if many OS threads are created and destroyed, fix statistics for large object | |
| allocations, using MIMALLOC_VERBOSE=1 has no maximum on the number of error messages, various small fixes. | |
| * 2022-02-14, `v1.7.5`, `v2.0.5` (alpha): fix malloc override on | |
| Windows 11, fix compilation with musl, potentially reduced | |
| committed memory, add `bin/minject` for Windows, | |
| improved wasm support, faster aligned allocation, | |
| various small fixes. | |
| * 2021-11-14, `v1.7.3`, `v2.0.3` (beta): improved WASM support, improved macOS support and performance (including | |
| M1), improved performance for v2 for large objects, Python integration improvements, more standard | |
| installation directories, various small fixes. | |
| * 2021-06-17, `v1.7.2`, `v2.0.2` (beta): support M1, better installation layout on Linux, fix | |
| thread_id on Android, prefer 2-6TiB area for aligned allocation to work better on pre-windows 8, various small fixes. | |
| * 2021-04-06, `v1.7.1`, `v2.0.1` (beta): fix bug in arena allocation for huge pages, improved aslr on large allocations, initial M1 support (still experimental). | |
| * 2021-01-31, `v2.0.0`: beta release 2.0: new slice algorithm for managing internal mimalloc pages. | |
| * 2021-01-31, `v1.7.0`: stable release 1.7: support explicit user provided memory regions, more precise statistics, | |
| improve macOS overriding, initial support for Apple M1, improved DragonFly support, faster memcpy on Windows, various small fixes. | |
| * 2020-09-24, `v1.6.7`: stable release 1.6: using standard C atomics, passing tsan testing, improved | |
| handling of failing to commit on Windows, add [`mi_process_info`](https://github.com/microsoft/mimalloc/blob/master/include/mimalloc.h#L156) api call. | |
| * 2020-08-06, `v1.6.4`: stable release 1.6: improved error recovery in low-memory situations, | |
| support for IllumOS and Haiku, NUMA support for Vista/XP, improved NUMA detection for AMD Ryzen, ubsan support. | |
| * 2020-05-05, `v1.6.3`: stable release 1.6: improved behavior in out-of-memory situations, improved malloc zones on macOS, | |
| build PIC static libraries by default, add option to abort on out-of-memory, line buffered statistics. | |
| * 2020-04-20, `v1.6.2`: stable release 1.6: fix compilation on Android, MingW, Raspberry, and Conda, | |
| stability fix for Windows 7, fix multiple mimalloc instances in one executable, fix `strnlen` overload, | |
| fix aligned debug padding. | |
| * 2020-02-17, `v1.6.1`: stable release 1.6: minor updates (build with clang-cl, fix alignment issue for small objects). | |
| * 2020-02-09, `v1.6.0`: stable release 1.6: fixed potential memory leak, improved overriding | |
| and thread local support on FreeBSD, NetBSD, DragonFly, and macOSX. New byte-precise | |
| heap block overflow detection in debug mode (besides the double-free detection and free-list | |
| corruption detection). Add `nodiscard` attribute to most allocation functions. | |
| Enable `MIMALLOC_PAGE_RESET` by default. New reclamation strategy for abandoned heap pages | |
| for better memory footprint. | |
| * 2020-02-09, `v1.5.0`: stable release 1.5: improved free performance, small bug fixes. | |
| * 2020-01-22, `v1.4.0`: stable release 1.4: improved performance for delayed OS page reset, | |
| more eager concurrent free, addition of STL allocator, fixed potential memory leak. | |
| * 2020-01-15, `v1.3.0`: stable release 1.3: bug fixes, improved randomness and [stronger | |
| free list encoding](https://github.com/microsoft/mimalloc/blob/783e3377f79ee82af43a0793910a9f2d01ac7863/include/mimalloc-internal.h#L396) in secure mode. | |
| * 2019-12-22, `v1.2.2`: stable release 1.2: minor updates. | |
| * 2019-11-22, `v1.2.0`: stable release 1.2: bug fixes, improved secure mode (free list corruption checks, double free mitigation). Improved dynamic overriding on Windows. | |
| * 2019-10-07, `v1.1.0`: stable release 1.1. | |
| * 2019-09-01, `v1.0.8`: pre-release 8: more robust windows dynamic overriding, initial huge page support. | |
| * 2019-08-10, `v1.0.6`: pre-release 6: various performance improvements. | |