
Add option to memory map .ORT model loads#28164

Draft
Kevin-Taha wants to merge 4 commits into `main` from `user/kevintaha/memMapOrt`

Conversation


Kevin-Taha (Contributor) commented Apr 21, 2026

Addressing issue #25524 (MS internal: 60577894)

Today, the closest callers can get to loading a model from a shared resource is to map the model file themselves and pass the bytes with use_ort_model_bytes_directly, which also makes the caller responsible for keeping the mapping valid. These changes introduce use_memory_mapped_ort_model, a session option that uses memory-mapped I/O to load ORT format models directly inside OnnxRuntime, with the mapping owned by the InferenceSession. The implementation is simple and minimal, reusing ORT's existing platform-agnostic memory mapping helpers, and if we choose to make this the default behavior it could mean automatic memory savings for multi-process usage.
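For context, the caller-managed pattern this option replaces can be sketched with Python's stdlib mmap. The `.ort` file here is a placeholder (fake header bytes, not a real ORT flatbuffer), and the hand-off to the session via use_ort_model_bytes_directly is described in prose rather than called, since this is only an illustration of the caller's responsibilities:

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a .ort file: any on-disk bytes work for this sketch.
fd, path = tempfile.mkstemp(suffix=".ort")
with os.fdopen(fd, "wb") as f:
    f.write(b"ORTM" + bytes(1024))  # fake header + padding, not a real ORT flatbuffer

# The caller-managed pattern: map the file yourself and hand the bytes to the
# session (e.g. via use_ort_model_bytes_directly). The caller is then responsible
# for keeping the mapping alive for the session's whole lifetime.
model_file = open(path, "rb")
mapping = mmap.mmap(model_file.fileno(), 0, access=mmap.ACCESS_READ)
model_bytes = memoryview(mapping)  # zero-copy view over the mapped pages

header = bytes(model_bytes[:4])
assert header == b"ORTM"

# Tear down only once the session no longer needs the bytes.
model_bytes.release()
mapping.close()
model_file.close()
os.remove(path)
```

With use_memory_mapped_ort_model, this mapping and its lifetime management move inside the InferenceSession instead.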

Note about memory implications & sharing model bytes:

The reality of this change is that use_memory_mapped_ort_model alone doesn't provide a long-running memory usage advantage, because ORT will ultimately copy the model bytes from the mapped pages into Tensors. Using it together with session.use_ort_model_bytes_for_initializers ensures that initializers point directly at the flatbuffer bytes and avoids the extra copy; this would be the expected usage for multi-process sharing of a single model. That raises questions about what the default behavior should be - the changes in this PR are conservative and retain all existing defaults at this time.
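The copy-versus-alias distinction can be illustrated with stdlib mmap. This is purely illustrative, not the onnxruntime API: the `bytes()` copy behaves like the default path (bytes copied out of the mapped pages into Tensors), while the memoryview aliases the mapped pages directly, analogous to what use_ort_model_bytes_for_initializers enables:

```python
import mmap

# Anonymous writable mapping standing in for the model's mapped pages (illustrative).
pages = mmap.mmap(-1, 16)
pages[:4] = b"abcd"

copied = bytes(pages[:4])     # a copy, like the default path that copies into Tensors
view = memoryview(pages)[:4]  # aliases the mapped pages directly, no copy

pages[:4] = b"wxyz"           # mutate the underlying pages

seen = bytes(view)
assert copied == b"abcd"      # the copy did not change
assert seen == b"wxyz"        # the view reflects the pages it aliases

view.release()
pages.close()
```

Only the aliasing form lets multiple processes mapping the same read-only file share physical pages; a per-process copy forfeits that sharing.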

Changes

  • onnxruntime_session_options_config_keys.h — New session.use_memory_mapped_ort_model config key
  • inference_session.h — Added Env::MappedMemoryPtr member to hold the file mapping; updated existing comments to document the mmap path
  • inference_session.cc — New LoadOrtModelBytesMapped() static function; updated LoadOrtModel(PathString) to check config and use mmap; updated Initialize() cleanup to release the mapping; updated comment on initializer gating to note mmap case
  • ort_model_only_test.cc — Two new tests: LoadOrtFormatModelMemoryMapped and LoadOrtFormatModelMemoryMappedWithInitializersFromMap
  • Also checking in a benchmarking tool, benchmark_mmap_ort.py, just for preservation, but this is optional and can be omitted.

Benchmark Examples

Note that the benchmark is largely written by GHCP and may not be perfect, but I've validated some of its results.
Single-Proc
Here is a sample result from a single-process benchmark using resnet50 (converted to ORT format). Note that these measure peaks during construction and not end-states, and the measurements may be imperfect.
```
python tools/python/benchmark_mmap_ort.py --perf-test build\Windows\Release\Release\onnxruntime_perf_test.exe --model resnet50.ort --iterations 15
```

| Configuration | Session Creation (ms) | Peak Private Commit (MB) | Peak Working Set (MB) | Session vs baseline | Private vs baseline |
|---|---|---|---|---|---|
| .ort standard load (baseline) | 193.13 | 222.9 | 235.9 | | |
| .ort memory-mapped load | 120.95 | 125.7 | 236.1 | -37.4% | -43.6% |
| .ort mmap + direct initializers | 14.87 | 109.6 | 120.6 | -92.3% | -50.8% |
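As a sanity check, the single-process percentage columns follow directly from the measured values (percent change relative to the baseline row, rounded to one decimal):

```python
# Baseline measurements from the single-process table above.
baseline_ms, baseline_mb = 193.13, 222.9

def delta_pct(value, baseline):
    """Percent change relative to the baseline, rounded to one decimal place."""
    return round((value - baseline) / baseline * 100, 1)

assert delta_pct(120.95, baseline_ms) == -37.4  # mmap load, session creation
assert delta_pct(14.87, baseline_ms) == -92.3   # mmap + direct initializers
assert delta_pct(125.7, baseline_mb) == -43.6   # mmap load, peak private commit
assert delta_pct(109.6, baseline_mb) == -50.8   # mmap + direct initializers
```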

Multi-Proc

The multi-proc benchmark shows that total memory savings for shared models can only be obtained alongside use_ort_model_bytes_for_initializers.

| Configuration (4 processes) | Total Private (MB) | Total Working Set (MB) | Private vs baseline |
|---|---|---|---|
| .ort standard load (baseline) | 462.6 | 519.0 | |
| .ort memory-mapped load | 462.1 | 518.5 | -0.1% |
| .ort mmap + direct initializers | 98.2 | 187.8 | -78.8% |
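The same arithmetic check applies to the 4-process table, and it makes the point numerically: mmap alone leaves per-process private memory essentially unchanged, while aliasing the initializers shares the pages:

```python
# 4-process totals from the multi-proc table above; baseline is the standard load.
def delta_pct(value, baseline):
    """Percent change relative to the baseline, rounded to one decimal place."""
    return round((value - baseline) / baseline * 100, 1)

assert delta_pct(462.1, 462.6) == -0.1   # mmap alone: bytes still copied per process
assert delta_pct(98.2, 462.6) == -78.8   # mmap + direct initializers: pages shared
```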

github-actions bot commented: You can commit the suggested changes from lintrunner.

Kevin Taha and others added 2 commits April 22, 2026 12:23
…ap benchmarking

- Clean up benchmark_mmap_ort.py: remove unused code, simplify multi-process
  approach to use native perf_test processes instead of Python wrappers
- Add --hold_ms_after_session_creation flag to onnxruntime_perf_test to keep
  sessions alive for multi-process memory measurement
- Print SESSION_READY marker when holding so benchmark script knows when to sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
