How to choose the correct memmap kernel parameter for PMEM on your system

When selecting a memmap kernel parameter for PMEM, you have to make sure that the physical addresses you are trying to reserve represent usable RAM. This information is in the e820 table, which the kernel prints at boot and which you can view via dmesg.

Here is an example setup using a virtual machine with 20 GiB of memory:

# dmesg | grep BIOS-e820
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000053fffffff] usable

In this output the regions marked as “usable” are fair game to be reserved for the PMEM driver, while the “reserved” regions are not. The last usable region represents the bulk of our available space, so we'll use that.
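Picking out the usable ranges by eye is easy for a short table like this one. As a minimal sketch (the helper name `e820_usable` is hypothetical, and it assumes the `[mem START-END] type` layout shown above), the same filtering can be scripted in POSIX shell, printing each usable region with its size:

```shell
#!/bin/sh
# e820_usable  (hypothetical helper)
# Lists the "usable" ranges from dmesg-style e820 lines read on stdin.
# Assumes the "[mem 0xSTART-0xEND] type" layout printed by the kernel.
e820_usable() {
    # Keep only "usable" entries and reduce each to "START END".
    sed -n 's/.*\[mem \(0x[0-9a-fA-F]*\)-\(0x[0-9a-fA-F]*\)] usable$/\1 \2/p' |
    while read -r start end; do
        # e820 end addresses are inclusive, hence the +1.
        size=$(( $end - $start + 1 ))
        # The MiB figure is rounded down, so sub-MiB regions show as 0 MiB.
        printf '%s-%s %d bytes (%d MiB)\n' "$start" "$end" "$size" $(( $size >> 20 ))
    done
}
```

On the example VM, `dmesg | grep BIOS-e820 | e820_usable` would list the three usable regions; the last one works out to 18253611008 bytes (17408 MiB, i.e. 17 GiB). Reading dmesg may require root if `kernel.dmesg_restrict` is set.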

[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000053fffffff] usable

Converting these physical addresses with a hex calculator, we see that the region starts at 0x0000000100000000 (4 GiB) and ends at 0x000000053fffffff (the last byte below 21 GiB). Say we want to reserve 16 GiB to be used by PMEM. We can start this reservation at 4 GiB; with a size of 16 GiB it will end at 20 GiB, which is still within this usable range. The syntax for this reservation (memmap=size!start) is then as follows:

memmap=16G!4G
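The range check above can also be scripted. This is a sketch under the assumption that size and start are whole GiB values (the kernel's memmap= parsing accepts other suffixes too); `memmap_param` is a hypothetical helper name:

```shell
#!/bin/sh
# memmap_param SIZE_GIB START_GIB REGION_START REGION_END  (hypothetical helper)
# Builds a memmap=SIZE!START parameter after checking that the reservation
# lies entirely inside one usable e820 region. REGION_START/REGION_END are
# the inclusive byte bounds printed by e820; SIZE/START are whole GiB.
memmap_param() {
    local size_gib=$1 start_gib=$2 region_start=$3 region_end=$4
    local start end
    start=$(( $start_gib << 30 ))
    end=$(( $start + ($size_gib << 30) - 1 ))   # inclusive end of the reservation
    if [ $(( $start < $region_start || $end > $region_end )) -ne 0 ]; then
        echo "memmap_param: ${size_gib}G at ${start_gib}G does not fit in the region" >&2
        return 1
    fi
    printf 'memmap=%dG!%dG\n' "$size_gib" "$start_gib"
}
```

For the example, `memmap_param 16 4 0x0000000100000000 0x000000053fffffff` prints memmap=16G!4G, while asking for 18 GiB at the same start fails because the reservation would run past 21 GiB.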

After rebooting with our new kernel parameter, we can see our new user-defined e820 table via dmesg as well (the original BIOS-e820 table is still present, in case you want to compare):

# dmesg | grep user:
[    0.000000] user: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] user: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] user: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] user: [mem 0x0000000000100000-0x00000000bffdffff] usable
[    0.000000] user: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[    0.000000] user: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] user: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] user: [mem 0x0000000100000000-0x00000004ffffffff] persistent (type 12)
[    0.000000] user: [mem 0x0000000500000000-0x000000053fffffff] usable

We can see that our new persistent memory range does indeed start at 4 GiB and end at 20 GiB, lying entirely within the usable memory range reported by the original BIOS-e820 table. The remaining 1 GiB, from 20 GiB to 21 GiB, stays usable.
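To read the new range back without doing the hex arithmetic by hand, a small filter along these lines (hypothetical name `pmem_ranges`, assuming the `user:` line format shown above) converts each persistent (type 12) entry to GiB offsets:

```shell
#!/bin/sh
# pmem_ranges  (hypothetical helper)
# Reports each "persistent (type 12)" range in the user-defined e820 table
# as GiB offsets. Reads dmesg-style "user:" lines on stdin.
pmem_ranges() {
    sed -n 's/.*user: \[mem \(0x[0-9a-fA-F]*\)-\(0x[0-9a-fA-F]*\)] persistent (type 12)$/\1 \2/p' |
    while read -r start end; do
        # The e820 end address is inclusive, so add 1 before converting.
        # Shifting right by 30 converts bytes to GiB, rounding down.
        printf 'start=%dG end=%dG size=%dG\n' \
            $(( $start >> 30 )) $(( ($end + 1) >> 30 )) $(( ($end - $start + 1) >> 30 ))
    done
}
```

With the table above, `dmesg | pmem_ranges` should print start=4G end=20G size=16G.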

If we have the pmem driver loaded, we will see this reserved memory range as /dev/pmem0:

# fdisk -l /dev/pmem0 
Disk /dev/pmem0: 16 GiB, 17179869184 bytes, 33554432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
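The fdisk figures can be cross-checked against the memmap= size. As a sketch (the helper `size_to_bytes` is hypothetical and only understands K, M, G and T suffixes), converting the size to bytes and to 512-byte sectors reproduces the numbers above:

```shell
#!/bin/sh
# size_to_bytes SIZE  (hypothetical helper)
# Converts a memmap-style size such as 16G into bytes. Only the K, M, G and
# T suffixes (either case) are handled in this sketch.
size_to_bytes() {
    local num suffix
    num=${1%[KkMmGgTt]}
    suffix=${1#"$num"}
    # Reject anything that is not a plain decimal number plus optional suffix.
    case $num in
        ''|*[!0-9]*) echo "size_to_bytes: cannot parse '$1'" >&2; return 1 ;;
    esac
    case $suffix in
        K|k) echo $(( $num << 10 )) ;;
        M|m) echo $(( $num << 20 )) ;;
        G|g) echo $(( $num << 30 )) ;;
        T|t) echo $(( $num << 40 )) ;;
        *)   echo $(( $num )) ;;   # no suffix: the value is already in bytes
    esac
}
```

`size_to_bytes 16G` prints 17179869184, and dividing by 512 gives the 33554432 sectors reported by fdisk. On a live system the byte count could also be compared with `blockdev --getsize64 /dev/pmem0` (usually run as root).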

Another thing that you may need to be aware of is the CONFIG_RANDOMIZE_BASE kernel config option (KASLR). When enabled, it randomizes the physical address at which the kernel image is decompressed and the virtual address at which the kernel image is mapped. Without the fix mentioned at the end of this page, this random address is chosen without regard to the memmap kernel command line parameter.

This means that the kernel can choose to put itself in the middle of your reserved memmap area. You can observe this behavior via /proc/iomem.
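Before digging into /proc/iomem, it may be worth checking whether KASLR is in play at all. The sketch below assumes the kernel config is available as a plain file (many distributions install it as /boot/config-$(uname -r), but that is an assumption about your system) and treats the x86 `nokaslr` boot parameter as turning randomization off; `kaslr_active` is a hypothetical name:

```shell
#!/bin/sh
# kaslr_active CONFIG_FILE CMDLINE_FILE  (hypothetical helper)
# Prints whether kernel base randomization is expected to be active, based on
# two checks only: the build option and the x86 "nokaslr" boot parameter.
kaslr_active() {
    local config=$1 cmdline=$2
    [ -r "$config" ] || { echo "kaslr_active: cannot read $config" >&2; return 1; }
    if ! grep -qx 'CONFIG_RANDOMIZE_BASE=y' "$config"; then
        echo "off (not built in)"
    elif grep -qw nokaslr "$cmdline"; then
        echo "off (nokaslr on command line)"
    else
        echo "on"
    fi
}
```

A typical call would be `kaslr_active /boot/config-$(uname -r) /proc/cmdline`. If the kernel was built with CONFIG_IKCONFIG_PROC, the config can instead be taken from /proc/config.gz after decompressing it to a file.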

Here is /proc/iomem from a system with CONFIG_RANDOMIZE_BASE turned off:

# cat /proc/iomem
00000000-00000fff : reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c97ff : Video ROM
000c9800-000ca5ff : Adapter ROM
000ca800-000ccbff : Adapter ROM
000f0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-bffd8fff : System RAM
  01000000-01b18598 : Kernel code
  01b18599-023f53ff : Kernel data
  0276d000-0365efff : Kernel bss
bffd9000-bfffffff : reserved
c0000000-febfffff : PCI Bus 0000:00
  f4000000-f7ffffff : 0000:00:02.0
  f8000000-fbffffff : 0000:00:02.0
  fc000000-fc03ffff : 0000:00:03.0
  fc050000-fc051fff : 0000:00:02.0
  fc052000-fc052fff : 0000:00:03.0
  fc053000-fc053fff : 0000:00:04.0
  fc054000-fc054fff : 0000:00:05.7
    fc054000-fc054fff : ehci_hcd
  fc055000-fc055fff : 0000:00:06.0
fec00000-fec003ff : IOAPIC 0
fee00000-fee00fff : Local APIC
feffc000-feffffff : reserved
fffc0000-ffffffff : reserved
100000000-4ffffffff : Persistent Memory (legacy)
  100000000-4ffffffff : namespace0.0
500000000-53fffffff : System RAM

The interesting bits for us are the “System RAM” region from 00100000-bffd8fff, which contains the kernel code, data and bss, and the “Persistent Memory (legacy)” region from 100000000-4ffffffff, which matches our memmap=16G!4G reservation.

If I turn on CONFIG_RANDOMIZE_BASE on this same system, I get the following:

# cat /proc/iomem
00000000-00000fff : reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c97ff : Video ROM
000c9800-000ca5ff : Adapter ROM
000ca800-000ccbff : Adapter ROM
000f0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-bffd8fff : System RAM
bffd9000-bfffffff : reserved
c0000000-febfffff : PCI Bus 0000:00
  f4000000-f7ffffff : 0000:00:02.0
  f8000000-fbffffff : 0000:00:02.0
  fc000000-fc03ffff : 0000:00:03.0
  fc050000-fc051fff : 0000:00:02.0
  fc052000-fc052fff : 0000:00:03.0
  fc053000-fc053fff : 0000:00:04.0
  fc054000-fc054fff : 0000:00:05.7
    fc054000-fc054fff : ehci_hcd
  fc055000-fc055fff : 0000:00:06.0
fec00000-fec003ff : IOAPIC 0
fee00000-fee00fff : Local APIC
feffc000-feffffff : reserved
fffc0000-ffffffff : reserved
100000000-4e6ffffff : Persistent Memory (legacy)
4e7000000-4e968bfff : System RAM
  4e7000000-4e7b185d8 : Kernel code
  4e7b185d9-4e83f54bf : Kernel data
  4e876d000-4e965efff : Kernel bss
4e968c000-4ffffffff : Persistent Memory (legacy)
500000000-53fffffff : System RAM

The “System RAM” region holding the kernel image now sits in the middle of my “Persistent Memory (legacy)” region, splitting it in two. This results in the following kernel WARNING:

[    6.356180] WARNING: CPU: 4 PID: 689 at kernel/memremap.c:300 devm_memremap_pages+0x3b2/0x4c0
[    6.357757] devm_memremap_pages attempted on mixed region [mem 0x4e968c000-0x4ffffffff flags 0x200]

and no /dev/pmem* devices are created.
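Spotting the split by eye works, but the same check can be automated. The following is a sketch, not a kernel tool: the helper `kernel_in_pmem` is hypothetical, and it assumes the /proc/iomem layout shown above, where the kernel image appears as `Kernel code`, `Kernel data` and `Kernel bss` entries. It prints every kernel section that overlaps the memmap= reservation and returns non-zero if any does:

```shell
#!/bin/sh
# kernel_in_pmem PMEM_START PMEM_END  (hypothetical helper)
# Flags kernel image sections listed in /proc/iomem that overlap a memmap=
# reservation. Reads /proc/iomem-formatted text on stdin. PMEM_START and
# PMEM_END are the inclusive byte bounds of the reservation.
# Prints each overlap and returns 1 if any section collides, 0 otherwise.
kernel_in_pmem() {
    local pmem_start=$1 pmem_end=$2 overlaps
    overlaps=$(sed -n 's/^ *\([0-9a-f]*\)-\([0-9a-f]*\) : \(Kernel [a-z]*\)$/\1 \2 \3/p' |
        while read -r start end name; do
            # /proc/iomem prints bare hex; both bounds are inclusive.
            if [ $(( 0x$start <= $pmem_end && 0x$end >= $pmem_start )) -eq 1 ]; then
                echo "overlap: $name $start-$end"
            fi
        done)
    [ -z "$overlaps" ] && return 0
    printf '%s\n' "$overlaps"
    return 1
}
```

For the memmap=16G!4G example, it would be run as `kernel_in_pmem 0x100000000 0x4ffffffff < /proc/iomem`, as root, since newer kernels may show zeroed addresses in /proc/iomem to unprivileged users. Against the KASLR output above it reports all three kernel sections; against the output with CONFIG_RANDOMIZE_BASE turned off it reports nothing and returns 0.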

The CONFIG_RANDOMIZE_BASE (KASLR) issue should now be fixed by commit f28442497b5caf (“x86/boot: Fix KASLR and memmap= collision”).

However, there seems to be an issue with CONFIG_KASAN at the moment.

how_to_choose_the_correct_memmap_kernel_parameter_for_pmem_on_your_system.txt · Last modified: 2018/08/10 16:17 by Dave Jiang