Differences

This shows you the differences between two versions of the page.

start [2018/04/09 20:13]
Ross Zwisler
start [2020/06/12 15:58]
Ira Weiny [Filesystems]
Line 1: Line 1:
-==== Persistent Memory ==== +===== Persistent Memory ===== 
----- +These pages contain instructions,​ links and other information related to persistent memory in Linux.
-These pages contain instructions,​ links and other information related to persistent memory ​enabling ​in Linux.+
  
-=== Links === +==== Links ==== 
-----+=== Miscellaneous ===
  
   * NVDIMM Namespace: http://​pmem.io/​documents/​NVDIMM_Namespace_Spec.pdf   * NVDIMM Namespace: http://​pmem.io/​documents/​NVDIMM_Namespace_Spec.pdf
-  * DSM Interface(1.6):​ http://​pmem.io/​documents/​NVDIMM_DSM_Interface-V1.6.pdf 
   * Driver Writer’s Guide: http://​pmem.io/​documents/​NVDIMM_Driver_Writers_Guide.pdf   * Driver Writer’s Guide: http://​pmem.io/​documents/​NVDIMM_Driver_Writers_Guide.pdf
   * NVDIMM Kernel Tree: https://​git.kernel.org/​cgit/​linux/​kernel/​git/​nvdimm/​nvdimm.git   * NVDIMM Kernel Tree: https://​git.kernel.org/​cgit/​linux/​kernel/​git/​nvdimm/​nvdimm.git
   * NDCTL: https://​github.com/​pmem/​ndctl.git   * NDCTL: https://​github.com/​pmem/​ndctl.git
-  * linux-nvdimm Mailing List: https://​lists.01.org/​mailman/listinfo/​linux-nvdimm+  ​* NDCTL man pages online: http://​pmem.io/​ndctl/​ 
 +  ​* linux-nvdimm Mailing List: https://​lists.01.org/​postorius/lists/​linux-nvdimm.lists.01.org
   * linux-nvdimm Patchwork: https://​patchwork.kernel.org/​project/​linux-nvdimm/​list/​   * linux-nvdimm Patchwork: https://​patchwork.kernel.org/​project/​linux-nvdimm/​list/​
-  * IXPDIMM management software: https://​github.com/​01org/​IXPDIMMSW 
-  * NDCTL man pages online: http://​pmem.io/​ndctl/​ 
  
 === Blogs === === Blogs ===
Line 20: Line 17:
   * [[https://​www.suse.com/​communities/​blog/​nvdimm-enabling-part-2-intel/​|NVDIMM Enabling in SUSE Linux Enterprise 12, Service Pack 2 - Part 2]]   * [[https://​www.suse.com/​communities/​blog/​nvdimm-enabling-part-2-intel/​|NVDIMM Enabling in SUSE Linux Enterprise 12, Service Pack 2 - Part 2]]
  
-=== Industry specifications === +=== Industry ​standards and specifications ===
   * Advanced Configuration and Power Interface (ACPI) 6.2a: http://​www.uefi.org/​sites/​default/​files/​resources/​ACPI%206_2_A_Sept29.pdf   * Advanced Configuration and Power Interface (ACPI) 6.2a: http://​www.uefi.org/​sites/​default/​files/​resources/​ACPI%206_2_A_Sept29.pdf
   * Unified Extensible Firmware Interface (UEFI) Specification 2.7: http://​www.uefi.org/​sites/​default/​files/​resources/​UEFI_Spec_2_7.pdf   * Unified Extensible Firmware Interface (UEFI) Specification 2.7: http://​www.uefi.org/​sites/​default/​files/​resources/​UEFI_Spec_2_7.pdf
-  * Byte Addressable Energy Backed Interface (JESD245A): https://​www.jedec.org/​system/​files/​docs/​JESD245A.pdf+  * DMTF System Management BIOS (SMBIOS 3.2.0): https://​www.dmtf.org/​sites/​default/​files/​standards/​documents/​DSP0134_3.2.0.pdf 
 +  * JEDEC Byte Addressable Energy Backed Interface (JESD245B.01):​ https://​www.jedec.org/​system/​files/​docs/​JESD245B-01.pdf 
 +  * JEDEC DDR4 NVDIMM-N Design Specification (JESD248A): https://​www.jedec.org/​system/​files/​docs/​JESD248A.pdf
  
-=== Subtopics ===+=== Vendor-specific tools and specifications === 
 +  * Intel Optane DC persistent memory management software: https://​github.com/​intel/​ipmctl 
 +  * Intel DSM Interface: http://​pmem.io/​documents/​NVDIMM_DSM_Interface-V1.7.pdf 
 +  * Microsoft DSM Interface: https://​docs.microsoft.com/​en-us/​windows-hardware/​drivers/​storage/​-dsm-interface-for-byte-addressable-energy-backed-function-class--function-interface-1- 
 +  * HPE DSM Interface: https://​github.com/​HewlettPackard/​hpe-nvm 
 + 
 +==== Subtopics ====
 +----
   * [[How to choose the correct memmap kernel parameter for PMEM on your system|How to choose the correct memmap kernel parameter for PMEM on your system]]   * [[How to choose the correct memmap kernel parameter for PMEM on your system|How to choose the correct memmap kernel parameter for PMEM on your system]]
   * [[2MiB_FS_DAX|How to get 2 MiB filesystem DAX faults]]   * [[2MiB_FS_DAX|How to get 2 MiB filesystem DAX faults]]
 +  * [[pmem_in_qemu|Simulating persistent memory configurations using QEMU]]
  
-=== Quick Setup Guide ===+==== Quick Setup Guide ====
 ---- ----
 One interesting use of the PMEM driver is to allow users to begin developing One interesting use of the PMEM driver is to allow users to begin developing
Line 51: Line 57:
 Also see: [[How to choose the correct memmap kernel parameter for PMEM on your system|How to choose the correct memmap kernel parameter for PMEM on your system]]. Also see: [[How to choose the correct memmap kernel parameter for PMEM on your system|How to choose the correct memmap kernel parameter for PMEM on your system]].
  
-2) Set up the correct kernel configuration options for PMEM and DAX in .config.+2) Set up the correct kernel configuration options for PMEM and DAX in .config.  To use huge pages for mmapped files, you'll need CONFIG_FS_DAX_PMD selected, which is done automatically if you have the prerequisites marked below.
  
 Options in make menuconfig: Options in make menuconfig:
Line 64: Line 70:
   * Processor type and features   * Processor type and features
     * Support non-standard NVDIMMs and ADR protected memory <if using the memmap kernel parameter>​     * Support non-standard NVDIMMs and ADR protected memory <if using the memmap kernel parameter>​
 +    * Transparent Hugepage Support <needed for huge pages>
 +    * Allow for memory hot-add <needed for huge pages>
 +      * Allow for memory hot remove <needed for huge pages>
 +    * Device memory (pmem, HMM, etc...) hotplug support <needed for huge pages> ​
  
 <​code>​ <​code>​
-CONFIG_BLK_DEV_RAM_DAX=y +CONFIG_ZONE_DEVICE=y 
-CONFIG_FS_DAX=y +CONFIG_MEMORY_HOTPLUG=y 
-CONFIG_X86_PMEM_LEGACY=y +CONFIG_MEMORY_HOTREMOVE=y 
-CONFIG_LIBNVDIMM=y+CONFIG_TRANSPARENT_HUGEPAGE=y 
 +CONFIG_ACPI_NFIT=m 
 +CONFIG_X86_PMEM_LEGACY=m 
 +CONFIG_OF_PMEM=m 
 +CONFIG_LIBNVDIMM=m
 CONFIG_BLK_DEV_PMEM=m CONFIG_BLK_DEV_PMEM=m
-CONFIG_ARCH_HAS_PMEM_API=y+CONFIG_BTT=y 
 +CONFIG_NVDIMM_PFN=y 
 +CONFIG_NVDIMM_DAX=y 
 +CONFIG_FS_DAX=y 
 +CONFIG_DAX=y 
 +CONFIG_DEV_DAX=m 
 +CONFIG_DEV_DAX_PMEM=m 
 +CONFIG_DEV_DAX_KMEM=m
 </​code>​ </​code>​
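To confirm that a given kernel was built with these options, the config file can be checked mechanically. The following is only a sketch: the function name ''missing_pmem_options'' is made up for this page, the option list is a subset of the one above (platform-specific entries such as CONFIG_ACPI_NFIT, CONFIG_X86_PMEM_LEGACY and CONFIG_OF_PMEM are left out), and it assumes the kernel config is available as a plain-text file, which varies by distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ndctl or the kernel tree): print every
# option from the list below that is neither =y nor =m in a config file.
missing_pmem_options() {
    local config_file=$1 opt
    for opt in ZONE_DEVICE MEMORY_HOTPLUG MEMORY_HOTREMOVE TRANSPARENT_HUGEPAGE \
               LIBNVDIMM BLK_DEV_PMEM BTT NVDIMM_PFN NVDIMM_DAX \
               FS_DAX DAX DEV_DAX DEV_DAX_PMEM; do
        # An option counts as present only if it is built in (y) or a module (m).
        grep -Eq "^CONFIG_${opt}=[ym]\$" "$config_file" || echo "CONFIG_${opt}"
    done
}

# Example (the path is an assumption; adjust it for your distribution):
#   missing_pmem_options "/boot/config-$(uname -r)"
```

An empty result means every option in the list is enabled; any printed name is a candidate to revisit in make menuconfig.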
  
Line 107: Line 128:
 </​code>​ </​code>​
  
-=== Namespaces ===+==== Namespaces ​====
 ---- ----
 You can divide persistent memory address ranges into namespaces with ndctl. This stores namespace label metadata at the beginning of the persistent memory address range. You can divide persistent memory address ranges into namespaces with ndctl. This stores namespace label metadata at the beginning of the persistent memory address range.
  
-ndctl supports four modes:
-  * raw present as /dev/pmemN block device
-    * supports filesystems with or without DAX
-    * no label metadata on the device
-  * sector - /dev/pmemNs block device with sector atomicity
-    * Block Translation Table (BTT) layer on top of a /dev/pmemN block device
-    * supports filesystems without DAX
-  * memory - /dev/pmemN block device supporting PCIe device DMA
-    * requires storing extra "struct page" entries somewhere
-      * this requires 64 bytes per 4 KiB of persistent memory
-      * struct page storage locations
-        * --map=mem = regular system memory
-          * adequate for small persistent memory capacities (e.g., 128 MiB for 8 GiB persistent memory)
-        * --map=dev = persistent memory
-          * intended for large persistent memory capacities (e.g., 7.45 GiB for TB persistent memory)
-    * Supports filesystems with or without DAX
-  * dax - /dev/daxN.M character device supporting DAX
-    * does not support filesystems
-    * no interactions with the kernel page cache
-    * requires storing extra "struct page" entries in persistent memory
 +ndctl create-namespace ties a namespace to a block device or character device:
 +
 +^ mode   ^ description    ^ device path ^ device type ^ label metadata ^ atomicity ^ filesystems ^ DAX ^ PFN metadata ^ former name ^
 +| raw    | raw            | /dev/pmemN  | block       | no             | no        | yes         | no  | no           | |
 +| sector | sector atomic  | /dev/pmemNs | block       | yes            | yes       | yes         | no  | no           | |
 +| fsdax  | filesystem DAX | /dev/pmemN  | block       | yes            | no        | yes         | yes | yes          | memory |
 +| devdax | device DAX     | /dev/daxN.M | character   | yes            | no        | no          | yes | yes          | dax |
 +
 +There are two places to store PFN metadata ("struct page" metadata):
 +  * --map=mem = regular system memory
 +    * adequate for small persistent memory capacities
 +  * --map=dev = persistent memory
 +    * intended for large persistent memory capacities (there might not be enough regular memory in the system!)
 +    * persistence of the PFN metadata is not important; this is just convenient because it scales with the persistent memory capacity
 +
 +The PFN metadata size is 64 bytes per 4 KiB of persistent memory (1.5625%). For some common persistent memory capacities:
 +^  persistent memory capacity  ^  PFN metadata size  ^ example ^
 +|  8 GiB    |  128 MiB  | |
 +|  16 GiB   |  256 MiB  | |
 +|  96 GiB   |  1.5 GiB  | Six 16 GiB NVDIMMs |
 +|  128 GiB  |  2 GiB    | |
 +|  192 GiB  |  3 GiB    | Twelve 16 GiB NVDIMMs |
 +|  256 GiB  |  4 GiB    | |
 +|  512 GiB  |  8 GiB    | |
 +|  768 GiB  |  12 GiB   | Six 128 GiB NVDIMMs |
 +|  1 TiB    |  16 GiB   | |
 +|  1.5 TiB  |  24 GiB   | Six 256 GiB NVDIMMs |
 +|  3 TiB    |  48 GiB   | Six 512 GiB NVDIMMs |
 +|  6 TiB    |  96 GiB   | Six 1 TiB NVDIMMs, or twelve 512 GiB NVDIMMs |
 + 
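The PFN metadata figures follow directly from the rule of 64 bytes per 4 KiB of persistent memory. As a small worked example (the function name ''pfn_metadata_bytes'' is made up for this page, and alignment or other overhead that ndctl may add on top is not modeled), the arithmetic can be sketched as:

```shell
#!/usr/bin/env bash
# PFN ("struct page") metadata cost: 64 bytes for every 4 KiB page
# of persistent memory. Hypothetical helper, not an ndctl command.
pfn_metadata_bytes() {
    local capacity_bytes=$1
    echo $(( capacity_bytes / 4096 * 64 ))
}

gib=$(( 1024 * 1024 * 1024 ))
mib=$(( 1024 * 1024 ))

# Print the metadata size for a few capacities.
for capacity_gib in 8 96 1024; do
    meta=$(pfn_metadata_bytes $(( capacity_gib * gib )))
    echo "${capacity_gib} GiB -> $(( meta / mib )) MiB"
done
```

For 8 GiB this gives 128 MiB, for 96 GiB 1536 MiB (1.5 GiB), and for 1024 GiB (1 TiB) 16384 MiB (16 GiB).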
 + 
 +Sector Atomic mode uses a Block Translation Table (BTT) to provide sector write atomicity. Without it, software that assumes sector-atomic writes might end up with a mix of old and new data in a sector if power is lost while writes are underway.
 + 
 +Filesystem ​DAX mode lets the filesystem provide direct access to persistent memory to applications by using mmap() (e.g., ext4 and xfs filesystems). 
 + 
 +Device DAX mode creates a character device instead of a block device, and is intended for applications that mmap() the entire capacity. It does not support filesystems or interact with the kernel page cache.
  
 Example commands on an 8 GiB NVDIMM with output showing the resulting sizes and /dev/ device names: Example commands on an 8 GiB NVDIMM with output showing the resulting sizes and /dev/ device names:
Line 138: Line 175:
   "​dev":"​namespace0.0",​   "​dev":"​namespace0.0",​
   "​mode":"​raw",​   "​mode":"​raw",​
-  "​size":​8589934592, ​            # this is exactly ​8 GiB +  "​size":​"8.00 GiB (8.59 GB)",​ 
-  "​blockdev":"​pmem0"​+  "​sector_size":​512,​ 
 +  "​blockdev":"​pmem0"​
 +  "​numa_node":​0
 } }
  
Line 146: Line 185:
   "​dev":"​namespace0.0",​   "​dev":"​namespace0.0",​
   "​mode":"​sector",​   "​mode":"​sector",​
-  "​size":​8580472832, ​           # this is 9240 KiB less than 8 GiB +  "​size":​"​7.99 ​GiB (8.58 GB)", 
-  "​uuid":"​52b53e55-eccd-40bf-a2aa-9f03ebf30e6b",+  "​uuid":"​30868a48-9763-4d4d-a6b7-e43dbb165b16",
   "​sector_size":​4096,​   "​sector_size":​4096,​
-  "​blockdev":"​pmem0s"​+  "​blockdev":"​pmem0s"​
 +  "​numa_node":​0
 } }
  
-$ ndctl create-namespace --mode ​memory ​--map mem -e namespace0.0 -f+$ ndctl create-namespace --mode ​fsdax --map mem -e namespace0.0 -f
 { {
   "​dev":"​namespace0.0",​   "​dev":"​namespace0.0",​
-  "​mode":"​memory", +  "​mode":"​fsdax", 
-  "size":8587837440           # this is 2 MiB less than 8 GiB +  "map":"​mem"​, 
-  "​uuid":"​349b7e53-dfbb-4b90-89ed-db80cfdaab0f", +  "​size":"​8.00 GiB (8.59 GB)", 
-  "​blockdev":"​pmem0"​+  "​uuid":"​f0ab3a91-c5bc-42b2-805f-4fa6c6075a50"
 +  "​sector_size":​512
 +  "​blockdev":"​pmem0"​
 +  "​numa_node":​0
 } }
  
-$ ndctl create-namespace --mode ​memory ​--map dev -e namespace0.0 -f+$ ndctl create-namespace --mode ​fsdax --map dev -e namespace0.0 -f
 { {
   "​dev":"​namespace0.0",​   "​dev":"​namespace0.0",​
-  "​mode":"​memory", +  "​mode":"​fsdax", 
-  "size":8453619712           # this is 130 MiB less than 8 GiB +  "map":"​dev"​, 
-  "​uuid":"​03faeca5-226c-48d9-bb47-f71cbc6d322e", +  "​size":"​7.87 ​GiB (8.45 GB)", 
-  "​blockdev":"​pmem0"​+  "​uuid":"​64f617f3-b79a-4c92-8ca7-c02d05572d3c"
 +  "​sector_size":​512
 +  "​blockdev":"​pmem0"​
 +  "​numa_node":​0
 } }
  
-sudo ndctl create-namespace --mode ​dax -e namespace0.0 -f+$ ndctl create-namespace --mode ​devdax --map mem -e namespace0.0 -f
 { {
   "​dev":"​namespace0.0",​   "​dev":"​namespace0.0",​
-  "​mode":"​dax", +  "​mode":"​devdax", 
-  "size":8453619712           # this is 130 MiB less than 8 GiB +  "map":"​mem"​, 
-  "​uuid":"​252d7895-91f3-42b7-9eeb-27ffc03e354c", +  "​size":"​8.00 GiB (8.59 GB)", 
-  "daxdevs":[ +  "​uuid":"​7fc2ecfb-edb2-4370-b9e1-09ecbdf7df16", 
-    ​{ +  "daxregion":{ 
-      "​chardev":"​dax0.0", ​      # this is 130 MiB less than 8 GiB +    ​"​id":​0,​ 
-      "​size":​8453619712 +    "​size":"​8.00 GiB (8.59 GB)",​ 
-    } +    "​align":​2097152,​ 
-  ​]+    "​devices":​[ 
 +      ​
 +        ​"​chardev":"​dax0.0",​ 
 +        "​size":​"8.00 GiB (8.59 GB)" 
 +      } 
 +    ​
 +  ​}, 
 +  ​"​numa_node":​0
 } }
  
 +$ ndctl create-namespace --mode devdax --map dev -e namespace0.0 -f
 +{
 +  "​dev":"​namespace0.0",​
 +  "​mode":"​devdax",​
 +  "​map":"​dev",​
 +  "​size":"​7.87 GiB (8.45 GB)",
 +  "​uuid":"​47343804-46f5-49d8-a76e-76cc240d8fc7",​
 +  "​daxregion":​{
 +    "​id":​0,​
 +    "​size":"​7.87 GiB (8.45 GB)",
 +    "​align":​2097152,​
 +    "​devices":​[
 +      {
 +        "​chardev":"​dax0.0",​
 +        "​size":"​7.87 GiB (8.45 GB)"
 +      }
 +    ]
 +  },
 +  "​numa_node":​0
 +}
 +</​code>​
 +
 +When using QEMU (see the [[pmem_in_qemu|Simulating persistent memory configurations using QEMU]] page) your namespaces will by default be in raw mode.  You can use the following bash script to convert all your raw mode namespaces to fsdax mode:
 +
 +<​code>​
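 +#!/usr/bin/bash -ex
 +
 +# List every raw-mode namespace, whether ndctl prints one object or an array.
 +namespaces=$(ndctl list | jq -r '((. | arrays | .[]), . | objects) | select(.mode == "raw") | .dev')
 +for n in $namespaces; do
 +    # "fsdax" is the current name of the mode formerly called "memory".
 +    ndctl create-namespace -f -e "$n" --mode=fsdax
 +done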
 +</​code>​
 +
 +This script highlights a tricky thing about ndctl and JSON. If you have a single namespace, ''ndctl list'' returns it as a single JSON object:
 +
 +<​code>​
 +# ndctl list
 +{
 +  "​dev":"​namespace0.0",​
 +  "​mode":"​fsdax",​
 +  "​size":​17834180608,​
 +  "​uuid":"​830d3440-df00-4e5a-9f89-a951dfb962cd",​
 +  "​raw_uuid":"​2dbddec6-44cc-41a4-bafd-a4cc3e345e50",​
 +  "​sector_size":​512,​
 +  "​blockdev":"​pmem0",​
 +  "​numa_node":​0
 +}
 +</​code>​
 +
 +If you have two or more namespaces, though, they are returned as an array of json objects:
 +
 +<​code>​
 +# ndctl list
 +[
 +  {
 +    "​dev":"​namespace1.0",​
 +    "​mode":"​fsdax",​
 +    "​size":​17834180608,​
 +    "​uuid":"​ce92c90c-1707-4a39-abd8-1dd12788d137",​
 +    "​raw_uuid":"​f8130943-5867-4e84-b2e5-6c685434ef81",​
 +    "​sector_size":​512,​
 +    "​blockdev":"​pmem1",​
 +    "​numa_node":​0
 +  },
 +  {
 +    "​dev":"​namespace0.0",​
 +    "​mode":"​fsdax",​
 +    "​size":​17834180608,​
 +    "​uuid":"​33d46163-095a-4bf8-acf0-6dbc5dc8a738",​
 +    "​raw_uuid":"​8f44ccd3-50f3-4dec-9817-554e9d1a5c5f",​
 +    "​sector_size":​512,​
 +    "​blockdev":"​pmem0",​
 +    "​numa_node":​0
 +  }
 +]
 +</​code>​
 +
 +Note the outer ''​[''​ and ''​]''​ brackets surrounding the objects which turn it into an array. ​ The difficulty is that a given ''​jq''​ command expects to either operate on objects or on an array, but not both.  So, the command you need to run will vary based on how many namespaces you have.
 +
 +The command above works around this by first converting the multiple namespace output from an array of objects to multiple objects in a series:
 +
 +<​code>​
 +# ndctl list | jq -r '((. | arrays | .[]), . | objects)'​
 +{
 +  "​dev":​ "​namespace1.0",​
 +  "​mode":​ "​fsdax",​
 +  "​size":​ 17834180608,​
 +  "​uuid":​ "​ce92c90c-1707-4a39-abd8-1dd12788d137",​
 +  "​raw_uuid":​ "​f8130943-5867-4e84-b2e5-6c685434ef81",​
 +  "​sector_size":​ 512,
 +  "​blockdev":​ "​pmem1",​
 +  "​numa_node":​ 0
 +}
 +{
 +  "​dev":​ "​namespace0.0",​
 +  "​mode":​ "​fsdax",​
 +  "​size":​ 17834180608,​
 +  "​uuid":​ "​33d46163-095a-4bf8-acf0-6dbc5dc8a738",​
 +  "​raw_uuid":​ "​8f44ccd3-50f3-4dec-9817-554e9d1a5c5f",​
 +  "​sector_size":​ 512,
 +  "​blockdev":​ "​pmem0",​
 +  "​numa_node":​ 0
 +}
 </​code>​ </​code>​
  
 +We then structure the rest of the ''​jq''​ command to operate on normal objects, and it works whether we have one namespace or many.
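Another way to get the same normalization, offered here only as a sketch and not as what the script above does, is to wrap the input in an array and flatten it: ''[.] | flatten'' yields an array of objects whether ''ndctl list'' printed one object or an array. The function name ''raw_namespaces'' is made up for this page, and jq must be installed.

```shell
#!/usr/bin/env bash
# Read `ndctl list`-style JSON on stdin (a single object or an array of
# objects) and print the "dev" name of every raw-mode namespace.
# Hypothetical helper; requires jq.
raw_namespaces() {
    # [.]      wraps the input: object -> [object], array -> [[...]]
    # flatten  removes the extra nesting, leaving one flat array of objects
    jq -r '[.] | flatten | .[] | select(.mode == "raw") | .dev'
}

# Example with hand-written input (not real ndctl output):
#   echo '{"dev":"namespace0.0","mode":"raw"}' | raw_namespaces
```

On a real system the input would come from ''ndctl list | raw_namespaces''.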
  
-=== Partitions ​===+==== Persistent Naming ====
 ---- ----
-You can divide raw, sector, and memory ​devices (/dev/pmemN and /​dev/​pmemNs) into partitions.+The device names chosen by the kernel are subject to creation order and discovery order. Environments can not rely the kernel name being consistent from one boot to the next. For the most part they do not change if the configuration stays static, but if a permanent name is needed use /​dev/​disk/​by-id. Recent versions of udev deploy the following udev rule in 60-persistent-storage.rules:​ 
 +<​code>​ 
 +# PMEM devices 
 +KERNEL=="​pmem*",​ ENV{DEVTYPE}=="​disk",​ ATTRS{uuid}=="?​*",​ SYMLINK+="​disk/​by-id/​pmem-$attr{uuid}"​ 
 +</​code>​ 
 + 
 +This rule causes symlinks like the following to be created for namespaces defined by labels:
 + 
 +<​code>​ 
 +ls -l /​dev/​disk/​by-id/​* 
 + ​lrwxrwxrwx 1 root root 13 Jul  9 15:24 pmem-206dcdfe-69b7-4e86-a01b-f540621ce62e -> ../​../​pmem1.2 
 + ​lrwxrwxrwx 1 root root 13 Jul  9 15:24 pmem-73840bf1-4e74-4ba4-a9c8-8248934c07c8 -> ../​../​pmem1.1 
 + ​lrwxrwxrwx 1 root root 13 Jul  9 15:24 pmem-8137bdfd-3c4d-4b26-b326-21da3d4cd4e5 -> ../​../​pmem1.4 
 + ​lrwxrwxrwx 1 root root 13 Jul  9 15:24 pmem-f43d1b6e-3300-46cb-8afc-06d66a7c16f6 -> ../​../​pmem1.3 
 +</​code>​ 
 + 
 +The persistent name for a pmem namespace is then listed in /etc/fstab like so: 
 +<​code>​ 
 +/​dev/​disk/​by-id/​pmem-206dcdfe-69b7-4e86-a01b-f540621ce62e /mnt/pmem xfs       ​defaults,​dax ​       1 2 
 +</​code>​ 
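The naming scheme can be sketched in a few lines of shell. The helper names below are made up for this page; the path layout comes from the udev rule above, and the mount options and trailing fields are simply copied from the fstab example, so adjust them to your kernel and filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the by-id path for a namespace UUID, and an
# /etc/fstab line in the same format as the example above.
pmem_by_id_path() {
    echo "/dev/disk/by-id/pmem-$1"
}

pmem_fstab_line() {
    local uuid=$1 mountpoint=$2 fstype=$3
    printf '%s %s %s defaults,dax 1 2\n' \
        "$(pmem_by_id_path "$uuid")" "$mountpoint" "$fstype"
}

# Print the fstab entry for the first UUID in the listing above.
pmem_fstab_line 206dcdfe-69b7-4e86-a01b-f540621ce62e /mnt/pmem xfs
```

On a running system, ''readlink -f'' on the by-id path shows which kernel device name the symlink currently resolves to.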
 + 
 + 
 +==== Partitions ==== 
 +---- 
 +You can divide raw, sector, and fsdax devices (/dev/pmemN and /​dev/​pmemNs) into partitions.
 In parted, the mkpart subcommand has this syntax In parted, the mkpart subcommand has this syntax
   mkpart [part-type fs-type name] start end   mkpart [part-type fs-type name] start end
Line 238: Line 419:
 </​code>​ </​code>​
  
-=== Filesystems ===+==== Filesystems ​====
 ---- ----
-You may place any filesystem (e.g., ext4, xfs, btrfs) on a raw or memory ​device (e.g., /​dev/​pmem0),​ a partition on a raw or memory ​device (e.g. /​dev/​pmem0p1),​ a sector device (e.g., /​dev/​pmem0s),​ or a partition on a sector device (e.g., /​dev/​pmem0sp1).+You may place any filesystem (e.g., ext4, xfs, btrfs) on a raw or fsdax device (e.g., /​dev/​pmem0),​ a partition on a raw or fsdax device (e.g. /​dev/​pmem0p1),​ a sector device (e.g., /​dev/​pmem0s),​ or a partition on a sector device (e.g., /​dev/​pmem0sp1).
  
-ext4 and xfs support DAX, which allows applications to perform direct access to persistent memory with mmap(). You may use DAX on raw devices and memory devices, but not on sector devices. +ext4 and xfs support DAX, which allows applications to perform direct access to persistent memory with mmap(). You may use DAX on raw devices and fsdax devices, but not on sector devices.
  
 Example creating ext4, xfs, and btrfs filesystems on three partitions and mounting ext4 and xfs with DAX (note: df -h displays sizes in IEC binary units; df -H uses SI decimal units): Example creating ext4, xfs, and btrfs filesystems on three partitions and mounting ext4 and xfs with DAX (note: df -h displays sizes in IEC binary units; df -H uses SI decimal units):
Line 248: Line 429:
 <​code>​ <​code>​
 $ mkfs.ext4 -F /​dev/​pmem0p1 $ mkfs.ext4 -F /​dev/​pmem0p1
-$ mkfs.xfs -f /​dev/​pmem0p2+$ mkfs.xfs -f -m reflink=0 ​/​dev/​pmem0p2
 $ mkfs.btrfs -f /​dev/​pmem0p3 $ mkfs.btrfs -f /​dev/​pmem0p3
-$ mount -o dax /​dev/​pmem0p1 /​mnt/​ext4-pmem0 +$ mount [dax_mount_options] ​/​dev/​pmem0p1 /​mnt/​ext4-pmem0 
-$ mount -o dax /​dev/​pmem0p2 /​mnt/​xfs-pmem0+$ mount [dax_mount_options] ​/​dev/​pmem0p2 /​mnt/​xfs-pmem0
 $ mount /​dev/​pmem0p3 /​mnt/​btrfs-pmem0 $ mount /​dev/​pmem0p3 /​mnt/​btrfs-pmem0
  
Line 274: Line 455:
 </​code>​ </​code>​
  
-=== iostats ===+Where **[dax_mount_options]** depends on the kernel support you have and the desired behavior. ​ See [[fs_mount_options|fs_mount_options]] for details. 
 + 
 +Check the kernel log to ensure the DAX mount option was honored; mount does not print this information. Example failures: 
 +<​code>​ 
 +$ dmesg | tail 
 +[1811131.922331] XFS (pmem0): DAX enabled. Warning: EXPERIMENTAL,​ use at your own risk 
 +[1811131.962630] XFS (pmem0): DAX unsupported by block device. Turning off DAX. 
 +[1811131.999039] XFS (pmem0): Mounting V5 Filesystem 
 +[1811132.025458] XFS (pmem0): Ending clean mount 
 + 
 +[1811261.329868] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL,​ use at your own risk 
 +[1811261.371653] EXT4-fs (pmem0): DAX unsupported by block device. Turning off DAX. 
 +[1811261.410944] EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: dax 
 +</​code>​ 
 + 
 +Example successes:​ 
 +<​code>​ 
 +$ dmesg | tail 
 +[1811420.919434] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL,​ use at your own risk 
 +[1811420.961539] EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: dax 
 + 
 +[1811472.505650] XFS (pmem0): DAX enabled. Warning: EXPERIMENTAL,​ use at your own risk 
 +[1811472.545702] XFS (pmem0): Mounting V5 Filesystem 
 +[1811472.571268] XFS (pmem0): Ending clean mount 
 +</​code>​ 
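Because mount itself gives no indication, the check can be scripted against the kernel log. This is a rough sketch only: the function name ''dax_mount_status'' is made up for this page, it matches the message strings shown in the examples above (which may differ on other kernel versions), and it does not separate messages from different mounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and report whether a
# DAX mount appears to have been honored, based on the messages shown above.
dax_mount_status() {
    local log
    log=$(cat)
    if grep -q 'DAX unsupported by block device' <<<"$log"; then
        echo "dax-disabled"      # the kernel turned DAX off
    elif grep -q 'DAX enabled' <<<"$log"; then
        echo "dax-enabled"       # DAX was requested and not turned off
    else
        echo "unknown"           # no DAX-related message found
    fi
}

# Example: dmesg | tail | dax_mount_status
```

The failure message is tested first because a failed mount logs both "DAX enabled" and "DAX unsupported by block device".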
 + 
 +==== iostats ​====
 ---- ----
-iostats are disabled by default due to performance overhead (e.g., 12M IOPS dropping 25% to 9M IOPS). However, they can be enabled in sysfs if desired. +iostats are disabled by default due to performance overhead (e.g., 12M IOPS dropping 25% to 9M IOPS). However, they can be enabled in sysfs if desired.
  
 As of kernel 4.5, iostats are only collected for the base pmem device, not per-partition. Also, I/Os that go through DAX paths (rw_page, rw_bytes, and direct_access functions) are not counted, so nothing is collected for: As of kernel 4.5, iostats are only collected for the base pmem device, not per-partition. Also, I/Os that go through DAX paths (rw_page, rw_bytes, and direct_access functions) are not counted, so nothing is collected for:
Line 299: Line 507:
 </​code> ​ </​code> ​
  
-=== fio ===+==== fio ====
 ---- ----
 Example fio script to perform 4 KiB random reads to four pmem devices: Example fio script to perform 4 KiB random reads to four pmem devices:
start.txt · Last modified: 2020/06/12 15:58 by Ira Weiny