This shows you the differences between two versions of the page.
2mib_fs_dax [2017/12/20 20:37] Ross Zwisler created |
2mib_fs_dax [2020/09/24 00:43] Darrick Wong mount xfs with lazytime to avoid timestamp update overhead on page faults |
||
---|---|---|---|
Line 16: | Line 16: | ||
Here are the steps that I've used to successfully get filesystem DAX PMDs: | Here are the steps that I've used to successfully get filesystem DAX PMDs: | ||
- | 1. First, make sure that our persistent memory block device starts at a 2 MiB aligned physical address. | + | 1. First, make sure that your namespace is in 'fsdax' mode. |
+ | |||
+ | # ndctl list --human | ||
+ | { | ||
+ | "dev":"namespace0.0", | ||
+ | "mode":"fsdax", | ||
+ | "size":"16.73 GiB (17.96 GB)", | ||
+ | "uuid":"179e5b98-96ee-4988-ba9f-ed9383d11598", | ||
+ | "sector_size":512, | ||
+ | "blockdev":"pmem0", | ||
+ | "numa_node":0 | ||
+ | } | ||
+ | |||
+ | 2. Next, make sure that our persistent memory block device starts at a 2 MiB aligned physical address. | ||
This is important because when we ask the filesystem for 2 MiB aligned and sized block allocations it will provide those block allocations relative to the beginning of its block device. If the filesystem is built on top of a namespace whose data starts at a 1 MiB aligned offset, for example, a block allocation that is 2 MiB aligned from the point of view of the filesystem will still be only 1 MiB aligned from DAX's point of view. This will cause DAX to fall back to 4 KiB page faults. | This is important because when we ask the filesystem for 2 MiB aligned and sized block allocations it will provide those block allocations relative to the beginning of its block device. If the filesystem is built on top of a namespace whose data starts at a 1 MiB aligned offset, for example, a block allocation that is 2 MiB aligned from the point of view of the filesystem will still be only 1 MiB aligned from DAX's point of view. This will cause DAX to fall back to 4 KiB page faults. | ||
Line 24: | Line 37: | ||
# cat /proc/iomem | # cat /proc/iomem | ||
... | ... | ||
- | 140000000-57fffffff : Persistent Memory | + | 140000000-57fdfffff : Persistent Memory |
- | 140000000-57fffffff : namespace0.0 | + | 140000000-57fdfffff : namespace0.0 |
Our namespace in this case begins at 5 GiB (0x1 4000 0000), which is 2 MiB (0x20 0000) aligned. | Our namespace in this case begins at 5 GiB (0x1 4000 0000), which is 2 MiB (0x20 0000) aligned. | ||
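The alignment check above can also be done with shell arithmetic rather than by eye. This is only a minimal sketch: `is_2mib_aligned` is a hypothetical helper name, not part of ndctl or any other tool, and it assumes the address is passed as hex without the `0x` prefix, the way /proc/iomem prints it.

```shell
#!/bin/sh
# Sketch only: is_2mib_aligned is a hypothetical helper, not an ndctl command.
# Argument: a physical address in hex without the 0x prefix, as shown in
# /proc/iomem. Succeeds (exit status 0) if the address is a multiple of
# 2 MiB (0x200000).
is_2mib_aligned() {
    [ $(( 0x$1 % 0x200000 )) -eq 0 ]
}

# Start address of namespace0.0 from the /proc/iomem output above.
if is_2mib_aligned 140000000; then
    echo "2 MiB aligned"
else
    echo "not 2 MiB aligned"
fi
```

For the namespace shown here, 0x140000000 is an exact multiple of 0x200000, so the check succeeds.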
- | If we create any partitions on top of our PMEM namespace, we must ensure that those partitions are likewise 2 MiB aligned. By default fdisk will create partitions that are 1 MiB (2048 sector) aligned from the start of the parent block device: | + | It is recommended to use raw devices and create multiple namespaces if the system configuration calls for persistent memory to be provisioned into smaller volumes. This is because namespace alignment is enforced at namespace creation time, whereas partitions need to be created by tooling that is careful to align both the start of the namespace and the start of partitions. Long term, the pmem device partition is scheduled for deprecation in favor of requiring namespaces for all provisioning. |
+ | |||
If we do create partitions on top of our PMEM namespace, we must ensure that those partitions are likewise 2 MiB aligned. By default fdisk will create partitions that are 1 MiB (2048 sector) aligned from the start of the parent block device: | ||
# fdisk -l /dev/pmem0 | # fdisk -l /dev/pmem0 | ||
- | Disk /dev/pmem0: 16.8 GiB, 17966301184 bytes, 35090432 sectors | + | Disk /dev/pmem0: 16.7 GiB, 17964204032 bytes, 35086336 sectors |
Units: sectors of 1 * 512 = 512 bytes | Units: sectors of 1 * 512 = 512 bytes | ||
Sector size (logical/physical): 512 bytes / 4096 bytes | Sector size (logical/physical): 512 bytes / 4096 bytes | ||
I/O size (minimum/optimal): 4096 bytes / 4096 bytes | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | ||
Disklabel type: dos | Disklabel type: dos | ||
- | Disk identifier: 0x5af75158 | + | Disk identifier: 0xfd17c8f9 |
Device Boot Start End Sectors Size Id Type | Device Boot Start End Sectors Size Id Type | ||
- | /dev/pmem0p1 2048 35090431 35088384 16.7G 83 Linux | + | /dev/pmem0p1 2048 35086335 35084288 16.7G 83 Linux |
A filesystem built on top of this partition won't be able to provide DAX with 2 MiB aligned block allocations. We instead need to have our partition begin at a 2 MiB aligned boundary: | A filesystem built on top of this partition won't be able to provide DAX with 2 MiB aligned block allocations. We instead need to have our partition begin at a 2 MiB aligned boundary: | ||
# fdisk -l /dev/pmem0 | # fdisk -l /dev/pmem0 | ||
- | Disk /dev/pmem0: 16.8 GiB, 17966301184 bytes, 35090432 sectors | + | Disk /dev/pmem0: 16.7 GiB, 17964204032 bytes, 35086336 sectors |
Units: sectors of 1 * 512 = 512 bytes | Units: sectors of 1 * 512 = 512 bytes | ||
Sector size (logical/physical): 512 bytes / 4096 bytes | Sector size (logical/physical): 512 bytes / 4096 bytes | ||
I/O size (minimum/optimal): 4096 bytes / 4096 bytes | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | ||
Disklabel type: dos | Disklabel type: dos | ||
- | Disk identifier: 0x276da416 | + | Disk identifier: 0xfd17c8f9 |
Device Boot Start End Sectors Size Id Type | Device Boot Start End Sectors Size Id Type | ||
- | /dev/pmem0p1 4096 35090431 35086336 16.7G 83 Linux | + | /dev/pmem0p1 4096 35086335 35082240 16.7G 83 Linux |
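The difference between the two partition layouts above comes down to simple arithmetic: the start sector multiplied by the logical sector size gives the byte offset of the partition from the start of the parent block device. The sketch below works through that for both start sectors. `partition_offset_aligned` is a hypothetical helper name, and the check is only meaningful when the namespace itself already starts at a 2 MiB aligned physical address, as verified earlier.

```shell
#!/bin/sh
# Sketch only: partition_offset_aligned is a hypothetical helper.
# Arguments: partition start sector, logical sector size in bytes.
# Succeeds if the partition's byte offset from the start of the parent
# block device is a multiple of 2 MiB.
partition_offset_aligned() {
    offset=$(( $1 * $2 ))
    [ $(( offset % (2 * 1024 * 1024) )) -eq 0 ]
}

# Start sectors from the two fdisk listings above, with 512-byte sectors.
for start in 2048 4096; do
    if partition_offset_aligned "$start" 512; then
        echo "start sector $start: 2 MiB aligned"
    else
        echo "start sector $start: not 2 MiB aligned"
    fi
done
```

Sector 2048 works out to a 1 MiB offset and fails the check, while sector 4096 works out to exactly 2 MiB and passes.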
- | 2. Once we have a block device that starts at a 2 MiB aligned persistent memory address, we then need to create a filesystem on top of it that will give us 2 MiB aligned and sized block allocations. Here are the commands to do that with either ext4 or XFS: | + | 3. Once we have a block device that starts at a 2 MiB aligned persistent memory address, we then need to create a filesystem on top of it that will give us 2 MiB aligned and sized block allocations. Here are the commands to do that with either ext4 or XFS: |
ext4: | ext4: | ||
Line 61: | Line 76: | ||
xfs: | xfs: | ||
- | # mkfs.xfs -f -d su=2m,sw=1 /dev/pmem0 | + | # mkfs.xfs -f -d su=2m,sw=1 -m reflink=0 /dev/pmem0 |
- | # mount /dev/pmem0 /mnt/dax | + | # mount -o dax,lazytime /dev/pmem0 /mnt/dax |
# xfs_io -c "extsize 2m" /mnt/dax | # xfs_io -c "extsize 2m" /mnt/dax | ||
Line 70: | Line 85: | ||
[[https://linux.die.net/man/8/xfs_io|xfs_io(8)]] for more details. | [[https://linux.die.net/man/8/xfs_io|xfs_io(8)]] for more details. | ||
- | 3. Now that we have a filesystem that can give us 2 MiB sized and aligned | + | 4. Now that we have a filesystem that can give us 2 MiB sized and aligned |
block allocations we just need to create a file that will receive those | block allocations we just need to create a file that will receive those | ||
allocations. To do this we need to begin with a file that is at least 2 MiB | allocations. To do this we need to begin with a file that is at least 2 MiB |