Add option to memory map .ORT model loads#28164
Draft: Kevin-Taha wants to merge 4 commits into main
…ap benchmarking:
- Clean up benchmark_mmap_ort.py: remove unused code; simplify the multi-process approach to use native perf_test processes instead of Python wrappers
- Add a --hold_ms_after_session_creation flag to onnxruntime_perf_test to keep sessions alive for multi-process memory measurement
- Print a SESSION_READY marker while holding so the benchmark script knows when to sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
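The hold-and-sample handshake above can be sketched as follows. This is a hypothetical illustration, not the PR's actual script: the child process here is a stdlib stand-in for onnxruntime_perf_test launched with --hold_ms_after_session_creation, and it simply prints the marker and blocks.

```python
import subprocess
import sys

# Hypothetical sketch of how a benchmark script could wait for the
# SESSION_READY marker before sampling memory. The child command is a
# stand-in for onnxruntime_perf_test run with
# --hold_ms_after_session_creation; it prints the marker then blocks on
# stdin to simulate holding the session alive.
child = subprocess.Popen(
    [sys.executable, "-c", "print('SESSION_READY', flush=True); input()"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

ready = False
for line in child.stdout:
    if "SESSION_READY" in line:
        ready = True  # the session is alive and holding; safe to sample now
        break

# ... sample per-process memory here while the session is held ...

child.stdin.write("\n")  # release the hold
child.stdin.close()
child.wait()
```

Reading line-by-line until the marker appears avoids racing the child: memory is only sampled once the session is known to exist.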
Addressing issue #25524 (MS internal: 60577894)
Today, the closest callers can get to loading models from a shared resource is to map the model themselves and pass it via use_ort_model_bytes_directly, which also leaves the caller responsible for keeping the mapping valid. These changes introduce use_memory_mapped_ort_model, a session option that uses memory-mapped I/O to load ORT format models directly inside OnnxRuntime; in this case the mapping is owned by the InferenceSession. The implementation is small and minimal, reusing ORT's existing platform-agnostic memory mapping helpers, and if we later choose to make this the default behavior it could mean automatic memory savings for multi-process usage.
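For intuition, here is a minimal stdlib sketch of the kind of read-only mapping the option enables, using Python's mmap in place of ORT's platform-agnostic helpers; the file is a throwaway stand-in for an .ort model, not a real one.

```python
import mmap
import os
import tempfile

# Create a fake "model" file to map (stand-in for an .ort model).
fd, path = tempfile.mkstemp(suffix=".ort")
os.write(fd, b"ORTM" + b"\x00" * 4092)  # fake model bytes
os.close(fd)

with open(path, "rb") as f:
    # Read-only mapping: the pages are backed by the file, so processes
    # that map the same model share physical pages instead of each
    # holding its own heap copy of the bytes.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:4]  # slicing reads straight from the mapped pages
    mm.close()

os.remove(path)
```

The multi-process savings come from this page sharing, which is why owning the mapping inside the InferenceSession (rather than asking callers to do it) is attractive.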
Note about memory implications & sharing model bytes:
In practice, use_memory_mapped_ort_model alone does not reduce long-running memory usage, because ORT ultimately copies the model bytes from the mapped pages into Tensors. Using it together with session.use_ort_model_bytes_for_initializers ensures that initializers point directly at the flatbuffer bytes and avoids that extra copy; this would be the expected usage for multi-process sharing of a single model. That raises questions about what the default behavior should be; the changes in this PR are conservative and retain all existing defaults for now.
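The combined usage described above might look like the following sketch. The initializer key matches ORT's existing "session.*" config-entry convention; the exact mmap key name comes from this PR and is an assumption here, and enable_shared_model is a hypothetical helper.

```python
# Assumed config-entry key names (the mmap key is introduced by this PR).
MMAP_KEY = "session.use_memory_mapped_ort_model"
INITIALIZER_KEY = "session.use_ort_model_bytes_for_initializers"


def enable_shared_model(session_options):
    """Enable mmap loading and point initializers at the mapped bytes,
    skipping the per-process copy into Tensors."""
    session_options.add_session_config_entry(MMAP_KEY, "1")
    session_options.add_session_config_entry(INITIALIZER_KEY, "1")
    return session_options
```

With the onnxruntime Python bindings this would be applied as enable_shared_model(ort.SessionOptions()) before constructing the InferenceSession for the .ort model.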
Changes
- …on initializer gating to note the mmap case
Benchmark Examples
Note that the benchmark was largely written by GHCP and may not be perfect, but I've validated some of its results.
Single-Proc
Here is a sample result from a single-process benchmark using resnet50 (converted to ORT format). Note that these measure peaks during construction and not end-states, and the measurements may be imperfect.
python tools/python/benchmark_mmap_ort.py --perf-test build\Windows\Release\Release\onnxruntime_perf_test.exe --model resnet50.ort --iterations 15

Multi-Proc
The multi-proc benchmark shows that total memory savings for shared models can only be obtained alongside use_ort_model_bytes_for_initializers.