Commit f980b7e

add introduction and glossary to html docs

1 parent 89e431b commit f980b7e

File tree

5 files changed: +201 −1 lines changed

(binary file: 22.4 KB)

scripts/docs_config/api.rst

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 ==========================================
-Unified Memory Framework API Documentation
+API Documentation
 ==========================================

 Globals

scripts/docs_config/glossary.rst

Lines changed: 97 additions & 0 deletions
Glossary
==========================================================

Homogeneous Memory
   A collection of memory composed of a single memory type, managed by a
   singular driver using a uniform approach.

Heterogeneous Memory
   A set of memory composed of multiple types of memory technologies, each
   requiring distinct handling approaches, often managed by separate drivers.

Memory Tiering
   An organization and hierarchy of different types of memory storage within
   a system, with each type of memory having distinct characteristics,
   performance, and cost attributes. These memory tiers are typically
   organized in a hierarchy, with faster, more expensive memory located
   closer to the processor and slower, less expensive memory located
   further away.

Memory Access Initiator
   A component in a computer system that initiates or requests access to the
   computer's memory subsystem. This could be a CPU, GPU, or other I/O and
   cache devices.

Memory Target
   Any part of the memory subsystem that can handle memory access requests.
   This could be the OS memory (RAM), video memory that resides on graphics
   cards, memory caches, storage, external memory devices connected using
   the CXL.mem protocol, etc.

Memory Page
   A fixed-length contiguous block of virtual memory, described by a single
   entry in the page table. It is the smallest unit of data for memory
   management in a virtual memory operating system.

Enlightened Application
   An application that explicitly manages data allocation distribution among
   memory tiers and further data migration.

Unenlightened Application
   An application that coexists with the underlying infrastructure (OS,
   frameworks, libraries) that offers various memory tiering and migration
   solutions without any code modifications.

Memory Pool
   A memory management technique used in computer programming and software
   development, where fixed-size blocks of memory are preallocated using one
   or more memory providers and then divided into smaller, fixed-size blocks
   or chunks. These smaller blocks are then allocated and deallocated by a
   pool allocator depending on the needs of the program or application.
   Thanks to low fragmentation and constant allocation time, memory pools
   are used to optimize memory allocation and deallocation in scenarios
   where efficiency and performance are critical.
Pool Allocator
   A memory allocator type used to efficiently manage memory pools.

Memory Provider
   A software component responsible for supplying memory or managing memory
   targets. A single memory provider kind can efficiently manage the memory
   operations for one or multiple devices within the system, or for other
   memory sources such as file-backed or user-provided memory.

High Bandwidth Memory (HBM)
   A high-speed computer memory. It is used in conjunction with
   high-performance graphics accelerators, network devices, and
   high-performance data centers, and as on-package cache or on-package RAM
   in CPUs, FPGAs, supercomputers, etc.

Compute Express Link (CXL_)
   An open standard for high-speed, high-capacity central processing unit
   (CPU)-to-device and CPU-to-memory connections, designed for
   high-performance data center computers. CXL is built on the serial PCI
   Express (PCIe) physical and electrical interface and includes a
   PCIe-based block input/output protocol (CXL.io) and cache-coherent
   protocols for accessing system memory (CXL.cache) and device memory
   (CXL.mem).

oneAPI Threading Building Blocks (oneTBB_)
   A C++ template library developed by Intel for parallel programming on
   multi-core processors. TBB breaks a computation down into tasks that can
   run in parallel, and the library manages and schedules threads to execute
   these tasks.

jemalloc
   A general-purpose malloc implementation that emphasizes fragmentation
   avoidance and scalable concurrency support. It provides introspection,
   memory management, and tuning functionalities. Jemalloc_ uses separate
   pools (“arenas”) for each CPU, which avoids lock contention problems in
   multithreaded applications and makes them scale linearly with the number
   of threads.

Unified Shared Memory (USM)
   A programming model which provides a single memory address space shared
   between CPUs, GPUs, and possibly other accelerators. It simplifies memory
   management by transparently handling data migration between the CPU and
   the accelerator device as needed.

.. _CXL: https://www.computeexpresslink.org/
.. _oneTBB: https://oneapi-src.github.io/oneTBB/
.. _Jemalloc: https://jemalloc.net/

scripts/docs_config/index.rst

Lines changed: 2 additions & 0 deletions

@@ -7,4 +7,6 @@ Intel Unified Memory Framework documentation
 .. toctree::
    :maxdepth: 3

+   introduction.rst
    api.rst
+   glossary.rst

scripts/docs_config/introduction.rst

Lines changed: 101 additions & 0 deletions

==============
Introduction
==============

Motivation
============

The amount of data that modern workloads need to process is continuously
growing. To address the increasing demand, the memory subsystem of modern
server platforms is becoming heterogeneous. For example, High-Bandwidth
Memory (HBM), introduced in Sapphire Rapids, addresses throughput needs,
while the emerging CXL protocol closes the capacity gap and enables better
memory utilization through its memory pooling capabilities. Beyond CPU use
cases, there are GPU accelerators with their own on-board memory.

The opportunities provided by modern heterogeneous memory platforms come
together with additional challenges: additional software changes might be
required to fully leverage new HW capabilities. There are two main problems
that modern applications need to deal with. The first one is appropriate
data placement and data migration between different types of memory. The
second one is how SW should deal with different memory topologies.

All applications can be divided into two big groups: enlightened and
unenlightened. Enlightened applications explicitly manage data allocation
distribution among memory tiers and further data migration. Unenlightened
applications do not require any code modifications and rely on the
underlying infrastructure, which is in turn enlightened. The underlying
infrastructure is not only the OS with its various memory tiering solutions
for migrating memory pages between tiers, but also middleware: frameworks
and libraries.

==============
Architecture
==============

The Unified Memory Framework (UMF) is a library for constructing allocators
and memory pools. It also contains broadly useful abstractions and utilities
for memory management. UMF allows users to manage multiple memory pools
characterized by different attributes, allowing certain allocation types to
be isolated from others and allocated using different hardware resources as
required.

A memory pool is a combination of a pool allocator and one or more memory
targets accessed by memory providers, along with their properties and
allocation policies. Specifically, a memory provider is responsible for
coarse-grained memory allocations, while the pool allocator controls the
pool and handles fine-grained memory allocations. UMF provides distinct
interfaces for both pool allocators and memory providers, allowing
integration into various applications.

.. figure:: ../assets/images/intro_architecture.png

The UMF library contains various pool allocators and memory providers, but
it also allows for the integration of external ones, giving users the
flexibility to either use existing solutions or provide their own
implementations.

Memory Providers
==================

A memory provider is an abstraction for coarse (memory page) allocations and
deallocations of target memory types, such as host CPU, GPU, or CXL memory.
A single memory provider kind can efficiently manage the memory operations
for one or multiple devices within the system, or for other memory sources
such as file-backed or user-provided memory.

UMF comes with several bundled memory providers. Please refer to the
README.md for a full list of them. It is also possible to use externally
defined memory providers if they implement the UMF interface.

To instantiate a memory provider, the user must pass an additional context
which contains the details about the specific memory target that should be
used. This would be a NUMA node mask for the OS memory provider, a file
path for the file-backed memory provider, etc. After creation, the memory
provider context can't be changed.
Pool Allocators
=================

A pool allocator is an abstraction over object-level memory management based
on coarse chunks acquired from the memory provider. It manages the memory
pool and services fine-grained malloc/free requests.

Pool allocators can be implemented to be general purpose or to fulfill
specific use cases. Implementations of the pool allocator interface can
leverage existing allocators (e.g., jemalloc or oneTBB) or be fully
customizable. The pool allocator abstraction could contain basic memory
management interfaces, as well as more complex ones that can be used, for
example, by the implementation for page monitoring or control (e.g.,
`madvise`).

UMF comes with several bundled pool allocators. Please refer to the
README.md for a full list of them. It is also possible to use externally
defined pool allocators if they implement the UMF interface.
Memory Pools
==============

A memory pool is a combination of a pool allocator and one or more memory
targets accessed by memory providers. In UMF, the user can either use
predefined memory pools or construct user-defined ones using the Pool
Creation API.

After construction, memory pools are passed to the Allocation API as the
first argument. It is also possible to retrieve a memory pool from an
existing memory pointer that points to memory previously allocated by UMF.
