

Based on kernel version 6.16. Page generated on 2025-08-06 08:57 EST.

.. SPDX-License-Identifier: GPL-2.0

======================
BIOS/EFI Configuration
======================

BIOS and EFI are largely responsible for configuring static information about
devices (or potential future devices) such that Linux can build the appropriate
logical representations of these devices.

At a high level, this is what occurs during this phase of configuration.

* The bootloader starts the BIOS/EFI.

* BIOS/EFI performs an early device probe to determine static configuration.

* BIOS/EFI creates ACPI Tables that describe static config for the OS.

* BIOS/EFI creates the system memory map (EFI Memory Map, E820, etc.).

* BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.

Much of what this section is concerned with is ACPI Table production and
static memory map configuration. More detail on these tables can be found
at :doc:`ACPI Tables <acpi>`.

.. note::
   Platform Vendors should read carefully, as this section has recommendations
   on physical memory region size and alignment, memory holes, HDM interleave,
   and what Linux expects of HDM decoders trying to work with these features.

UEFI Settings
=============
If your platform supports it, the :code:`uefisettings` command can be used to
read/write EFI settings. Changes will be reflected on the next reboot. Kexec
is not a sufficient reboot.

One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
When enabled, this bit tells Linux to defer management of a memory region to
a driver (in this case, the CXL driver). Otherwise, the memory is treated as
"normal memory", and is exposed to the page allocator during :code:`__init`.
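
One way to see the effect of this bit is to look for ``Soft Reserved`` ranges
in :code:`/proc/iomem`, which is how Linux labels memory carrying the
EFI_MEMORY_SP attribute.  The following is a minimal sketch rather than a
supported tool: the sample text and the helper name are hypothetical, and
reading real addresses from :code:`/proc/iomem` generally requires root.

.. code-block:: python

    import re

    # Hypothetical excerpt in the format of /proc/iomem.
    SAMPLE_IOMEM = """\
    00000000-00000fff : Reserved
    100000000-87fffffff : System RAM
    880000000-107fffffff : Soft Reserved
    """

    def soft_reserved_ranges(iomem_text):
        """Return (start, last) pairs for every 'Soft Reserved' line."""
        ranges = []
        for line in iomem_text.splitlines():
            match = re.match(r"\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$", line)
            if match and match.group(3).strip() == "Soft Reserved":
                ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
        return ranges

    for start, last in soft_reserved_ranges(SAMPLE_IOMEM):
        print(f"soft reserved: {start:#x}-{last:#x}")

If the CXL capacity instead appears as ``System RAM`` at boot, the bit was
most likely not set for that range.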

uefisettings examples
---------------------

:code:`uefisettings identify` ::

        uefisettings identify

        bios_vendor: xxx
        bios_version: xxx
        bios_release: xxx
        bios_date: xxx
        product_name: xxx
        product_family: xxx
        product_version: xxx

On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the :code:`CXL
Memory Attribute` field.  This may be called something else on your platform.

:code:`uefisettings get "CXL Memory Attribute"` ::

        selector: xxx
        ...
        question: Question {
            name: "CXL Memory Attribute",
            answer: "Enabled",
            ...
        }

Physical Memory Map
===================

Physical Address Region Alignment
---------------------------------

As of Linux v6.14, the hotplug memory system requires memory regions to be
uniform in size and alignment.  While the CXL specification allows for memory
regions as small as 256MB, the supported memory block size and alignment for
hotplugged memory is architecture-defined.

Linux memory blocks may be as small as 128MB and increase in powers of two.

* On ARM, the default block size and alignment is either 128MB or 256MB.

* On x86, the default block size is 256MB, and increases to 2GB as the
  capacity of the system increases up to 64GB.

For best support across versions, platform vendors should place CXL memory at
a 2GB aligned base address, and region sizes should also be 2GB aligned.  This
also helps prevent the creation of thousands of memory devices (one per block).
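
The alignment and block-count concerns above are simple arithmetic.  The
following is a minimal sketch, not kernel code: the helper names and the 512GB
example window are hypothetical, and the block sizes are the ones quoted in
this section.

.. code-block:: python

    MIB = 1 << 20
    GIB = 1 << 30

    def is_aligned(value, alignment):
        """True if value is a whole multiple of alignment."""
        return value % alignment == 0

    def block_count(size, block_size):
        """Number of memory blocks needed to cover size bytes."""
        return size // block_size

    # Hypothetical CXL window: 512GB placed at a 64GB base address.
    base, size = 0x1000000000, 512 * GIB

    print(is_aligned(base, 2 * GIB) and is_aligned(size, 2 * GIB))
    print(block_count(size, 128 * MIB))  # one memory device per 128MB block
    print(block_count(size, 2 * GIB))    # one memory device per 2GB block

For this window, 128MB blocks produce 4096 memory devices while 2GB blocks
produce 256.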

Memory Holes
------------

Holes in the memory map are tricky.  Consider a 4GB device located at base
address 0x100000000, but with the following memory map ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

There are two issues to consider:

* decoder programming, and
* memory block alignment.

If your architecture requires uniformly sized and aligned 2GB memory blocks,
the only capacity Linux is capable of mapping (as of v6.14) would be the
capacity from `0x100000000-0x180000000`.  The remaining capacity will be
stranded, as it does not fill whole 2GB aligned blocks.

Assuming your architecture and memory configuration allows 1GB memory blocks,
this memory map is supported.  It should be presented as multiple CFMWS entries
in the CEDT, one describing each side of the memory hole, along with matching
decoders.
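
The stranding described above can be reproduced by rounding each contiguous
chunk inward to whole blocks.  This is a minimal sketch of that arithmetic
under the block sizes discussed here, not the kernel's hotplug logic; the
function name is hypothetical and chunk ends are exclusive.

.. code-block:: python

    GIB = 1 << 30

    def usable_span(start, end, block_size):
        """Return the aligned (start, end) of whole blocks in [start, end), or None."""
        first = -(-start // block_size) * block_size  # round start up
        last = (end // block_size) * block_size       # round end down
        return (first, last) if first < last else None

    # The two CXL chunks from the memory map above (exclusive end addresses).
    CHUNKS = [(0x100000000, 0x1C0000000), (0x200000000, 0x240000000)]

    for block_size in (2 * GIB, 1 * GIB):
        for start, end in CHUNKS:
            span = usable_span(start, end, block_size)
            text = "stranded" if span is None else f"{span[0]:#x}-{span[1]:#x}"
            print(f"{block_size // GIB}GB blocks, chunk {start:#x}: {text}")

With 2GB blocks only `0x100000000-0x180000000` survives and the second chunk
is stranded entirely; with 1GB blocks both chunks are fully usable.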

Multiple decoders can (and should) be used to manage such a memory hole (see
below), but each chunk of a memory hole should be aligned to a reasonable block
size (larger alignment is always better).  If you intend to have memory holes
in the memory map, expect to use one decoder per contiguous chunk of host
physical memory.

As of v6.14, Linux does not provide support for memory hotplug of multiple
physical memory regions separated by a memory hole described by a single
HDM decoder.


Decoder Programming
===================
If BIOS/EFI intends to program the decoders to be statically configured,
there are a few things to consider to avoid major pitfalls that will
prevent Linux compatibility.  Some of these recommendations are not
required "per the specification", but Linux makes no guarantees of support
otherwise.


Translation Point
-----------------
Per the specification, the only decoders which **TRANSLATE** Host Physical
Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
All other decoders in the fabric are intended to route accesses without
translating the addresses.

This is heavily implied by the specification, see: ::

  CXL Specification 3.1
  8.2.4.20: CXL HDM Decoder Capability Structure
  - Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
  - Implementation Note: Device Decoder Logic

Given this, Linux makes a strong assumption that decoders between CPU and
endpoint will all be programmed with address ranges that are subsets of
their parent decoder.
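
The subset assumption can be expressed as a check over a chain of decoder
ranges.  This is an illustrative sketch only; the function names and the
:code:`(base, size)` tuple layout are hypothetical and do not mirror the
driver's data structures.

.. code-block:: python

    def is_subset(child, parent):
        """True if the child (base, size) range lies entirely within the parent."""
        c_base, c_size = child
        p_base, p_size = parent
        return p_base <= c_base and c_base + c_size <= p_base + p_size

    def chain_is_nested(chain):
        """chain lists (base, size) ranges from the root decoder toward the endpoint."""
        return all(is_subset(child, parent)
                   for parent, child in zip(chain, chain[1:]))

    # Root, host bridge, and endpoint all decode the same HPA range: accepted.
    good = [(0x100000000, 0xC0000000)] * 3

    # The host bridge decodes a range outside its root decoder: rejected.
    bad = [(0x100000000, 0xC0000000), (0x200000000, 0x40000000)]

    print(chain_is_nested(good), chain_is_nested(bad))

A platform that translates addresses above the endpoint would fail a check of
this kind, which is why it needs platform specific driver support.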

Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
"hand off" responsibility between domains, some early adopting platforms
attempted to do translation at the originating memory controller or host
bridge.  This configuration requires a platform specific extension to the
driver and is not officially endorsed - despite being supported.

It is *highly recommended* **NOT** to do this; otherwise, you are on your own
to implement driver support for your platform.

Interleave and Configuration Flexibility
----------------------------------------
If providing cross-host-bridge interleave, a CFMWS entry in the :doc:`CEDT
<acpi/cedt>` must be presented with target host-bridges for the interleaved
device sets (there may be multiple behind each host bridge).

If providing intra-host-bridge interleaving, only 1 CFMWS entry in the CEDT is
required for that host bridge - if it covers the entire capacity of the devices
behind the host bridge.

If intending to provide users flexibility in programming decoders beyond the
root, you may want to provide multiple CFMWS entries in the CEDT intended for
different purposes.  For example, you may want to consider adding:

1) A CFMWS entry to cover all interleavable host bridges.
2) A CFMWS entry to cover all devices on a single host bridge.
3) A CFMWS entry to cover each device.

A platform may choose to add all of these, or change the mode based on a BIOS
setting.  For each CFMWS entry, Linux expects descriptions of the described
memory regions in the :doc:`SRAT <acpi/srat>` to determine the number of
NUMA nodes it should reserve during early boot / init.

As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
a matching SRAT entry does not exist; however, this is not guaranteed in the
future and such a configuration should be avoided.
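
A firmware validation step could flag CFMWS windows that lack a covering SRAT
memory entry.  The sketch below assumes both tables have already been parsed
into :code:`(base, size)` tuples; the function names and example values are
hypothetical, and the exact matching rules Linux applies may differ.

.. code-block:: python

    def covered(window, srat_ranges):
        """True if a single SRAT range fully contains the CFMWS window."""
        base, size = window
        return any(s_base <= base and base + size <= s_base + s_size
                   for s_base, s_size in srat_ranges)

    def windows_missing_srat(cfmws_windows, srat_ranges):
        """Return the CFMWS windows that no SRAT range covers."""
        return [w for w in cfmws_windows if not covered(w, srat_ranges)]

    # Hypothetical tables: two CFMWS windows, but SRAT describes only the first.
    cfmws = [(0x100000000, 0xC0000000), (0x200000000, 0x40000000)]
    srat = [(0x100000000, 0xC0000000)]

    for base, size in windows_missing_srat(cfmws, srat):
        print(f"CFMWS {base:#x} (+{size:#x}) has no SRAT entry")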

Memory Holes
------------
If your platform includes memory holes interspersed between your CXL memory, it
is recommended to utilize multiple decoders to cover these regions of memory,
rather than try to program the decoders to accept the entire range and expect
Linux to manage the overlap.

For example, consider the Memory Hole described above ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

Assuming this is provided by a single device attached directly to a host bridge,
Linux would expect the following decoder programming ::

     -----------------------   -----------------------
     | root-decoder-0      |   | root-decoder-1      |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------
                |                         |
     -----------------------   -----------------------
     | HB-decoder-0        |   | HB-decoder-1        |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------
                |                         |
     -----------------------   -----------------------
     | ep-decoder-0        |   | ep-decoder-1        |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------

This should be paired with a CEDT configuration containing two CFMWS entries,
one describing each of the above root decoders.
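
The base and size values in the diagram follow directly from the memory map:
one decoder set per contiguous chunk of CXL memory.  A minimal sketch of that
derivation is shown here, assuming a single non-interleaved device and chunks
that are always separated by a hole; the names are hypothetical.

.. code-block:: python

    # (first address, last address, kind) entries from the memory map above.
    MEMORY_MAP = [
        (0x100000000, 0x1BFFFFFFF, "cxl"),
        (0x1C0000000, 0x1FFFFFFFF, "hole"),
        (0x200000000, 0x23FFFFFFF, "cxl"),
    ]

    def decoder_plan(memory_map):
        """Return one {decoder level: (base, size)} dict per CXL chunk."""
        chunks = [(first, last - first + 1)
                  for first, last, kind in memory_map if kind == "cxl"]
        # Without translation above the endpoint, every level of a chain
        # decodes the same host physical address range.
        return [{level: chunk for level in ("root", "host-bridge", "endpoint")}
                for chunk in chunks]

    for index, chain in enumerate(decoder_plan(MEMORY_MAP)):
        base, size = chain["root"]
        print(f"decoder set {index}: base={base:#x} size={size:#x}")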

Linux makes no guarantee of support for strange memory hole situations.

Multi-Media Devices
-------------------
Each CFMWS entry in the CEDT has special restriction bits which describe whether
the described memory region allows volatile or persistent memory (or both). If
the platform intends to support either:

1) A device with multiple media types, or
2) Using a persistent memory device as normal memory

then the platform may wish to create multiple CEDT CFMWS entries to describe the
same memory, with the intent of allowing the end user flexibility in how that
memory is configured. Linux does not presently have strong requirements in this
area.
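
As an illustration, the restriction bits can be decoded into the media types a
window permits.  This is a sketch under stated assumptions: the constant names
are hypothetical, and the bit positions (bit 2 for volatile, bit 3 for
persistent) reflect one reading of the CFMWS window restrictions field, so
verify them against the CXL specification before relying on them.

.. code-block:: python

    # Assumed bit positions in the CFMWS window restrictions field.
    RESTRICT_VOLATILE = 1 << 2
    RESTRICT_PMEM = 1 << 3

    def allowed_media(restrictions):
        """Return the media types a window permits, per the assumed bits."""
        media = []
        if restrictions & RESTRICT_VOLATILE:
            media.append("volatile")
        if restrictions & RESTRICT_PMEM:
            media.append("persistent")
        return media

    # Hypothetical pair of windows offering the same capacity in two modes.
    windows = {
        "window-ram": RESTRICT_VOLATILE,
        "window-pmem": RESTRICT_PMEM,
    }

    for name, bits in windows.items():
        print(name, allowed_media(bits))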