Non-Volatile Memory (NVM or NVRAM) provides random-access memory that retains its data when powered off. At first glance, swapping DRAM for NVRAM on the host server would seem to make applications immediately persistent. Unfortunately, most applications cannot reap the full benefits of NVRAM without modification.

NVRAM as a DRAM Replacement

The most obvious approach is to use NVRAM as a direct, drop-in replacement for DRAM. Every byte written to memory would then become immediately persistent without any application or OS modifications. Estimates of NVRAM performance put it anywhere between two and ten times slower than DRAM, however, so memory-intensive applications would take a performance hit when moving to NVRAM.

At first glance, this performance hit may be a price worth paying to make in-memory systems like memcached, VoltDB, Redis, or Spark immediately persistent; however, there is a more subtle problem with the move to NVRAM on current Intel and AMD architectures: the cache coherency hardware is designed to provide visibility guarantees to executing threads, not durability guarantees at main memory. Behind the scenes, the cache may write data back down the memory hierarchy without preserving causality at main memory. Put another way, current coherency implementations have a deeply ingrained assumption that the contents of DRAM and the cache need not be consistent, because the cache coherency hardware will provide the illusion of consistency.

The move to NVRAM breaks this assumption, because memory alone can now represent an inconsistent application state. For example, a pointer that has been written back to main memory may refer to a 64B structure that was only ever written to the cache. When the application restarts after a crash, that pointer dangles, and the application's state is inconsistent.
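
As a concrete sketch of the hazard (the structure and names here are illustrative, not taken from any particular system), consider a thread that fills in a cache-line-sized node and then publishes a pointer to it:

    #include <stdint.h>

    /* One cache line worth of application data. */
    typedef struct node {
        uint64_t payload[8];   /* 64 bytes */
    } node_t;

    node_t *head;   /* assume head and the nodes live in NVRAM-backed memory */

    void publish(node_t *n, uint64_t value) {
        n->payload[0] = value;  /* may linger in the cache indefinitely */
        head = n;               /* the line holding head may be evicted to
                                   NVRAM before the line holding *n is */
        /* A crash here can leave head durable while *n is not: on restart,
           head is a dangling pointer into garbage. */
    }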

A simple way to fix this problem is to modify applications to issue a cache flush after writing the structure but before writing the pointer. This technique would restore the integrity of the application, but at a massive performance penalty: a full cache flush affects more than just the requisite structure. Data written by other threads would be flushed as well, as would temporary structures that do not need persistence at all.
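
Continuing the sketch above, and assuming a hypothetical full_cache_flush() primitive (on x86 the closest real analogue, wbinvd, is privileged and flushes even more), the fix looks like this:

    /* Hypothetical primitive: write back every dirty cache line
       to main memory. */
    void full_cache_flush(void);

    void publish_durably(node_t *n, uint64_t value) {
        n->payload[0] = value;
        full_cache_flush();  /* *n reaches NVRAM, along with every other
                                dirty line in the cache */
        head = n;            /* if head later reaches NVRAM, the structure
                                it points to is already there */
    }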

To address this performance problem, one could imagine having the equivalent of the mfence, lfence, and sfence visibility instructions, but for enforcing the semantics of cache-to-main-memory writeback. Such instructions would let applications finely control the granularity at which data gets written back to main memory, without forcibly flushing the entire cache at each step. Further, they would let applications make full use of the cache while knowing that a consistent prefix of the execution is resident in main memory; a failure of the process or server would lose only the most recent computation.
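
Here is one way such instructions might be exposed, again continuing the earlier sketch. Note that nv_writeback and nv_fence are names I am inventing for illustration, not real intrinsics:

    #include <stddef.h>

    /* Hypothetical: request writeback of the cache lines covering
       [addr, addr + len) to main memory. */
    void nv_writeback(const void *addr, size_t len);

    /* Hypothetical: wait until prior nv_writeback requests have reached
       main memory, analogous to what sfence does for store visibility. */
    void nv_fence(void);

    void publish_fine_grained(node_t *n, uint64_t value) {
        n->payload[0] = value;
        nv_writeback(n, sizeof *n);
        nv_fence();                       /* structure durable before pointer */
        head = n;
        nv_writeback(&head, sizeof head);
        nv_fence();                       /* pointer durable; a consistent
                                             prefix is now in main memory */
    }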

The complexity of current-generation Intel and AMD chips means that such cache fences are likely a long way off, if they ever materialize at all.

NVRAM as a Block Device

Given the difficulty of storing a consistent view of a process in NVRAM, it is worth stepping back and considering other ways one could interface with NVRAM. If one instead treats NVRAM as a block device, applications that already handle the barriers necessary for correct block device storage can move to NVRAM-backed storage transparently. Such applications would likely run faster and remain correct immediately, without any changes.
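
For example, the familiar write-then-sync pattern that such applications already use would carry over unchanged; only the cost of the sync would change:

    #include <unistd.h>

    /* Append a record and force it to stable storage. The same code is
       correct whether the backing block device is a disk, an SSD, or
       NVRAM; only the latency of fsync changes. */
    int append_durably(int fd, const void *buf, size_t len) {
        if (write(fd, buf, len) != (ssize_t)len)
            return -1;
        return fsync(fd);
    }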

If one is willing to modify applications to use NVRAM as a block device, it becomes possible to exploit the random-access nature of NVRAM to shrink the block sizes and batching that applications often hard-code. Data no longer needs to be written in 4KB chunks to align with pages; it could instead be written in 64B chunks that align with the cache lines of a modern processor. Algorithms could then be tuned much more finely to NVRAM storage.
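
As a sketch, a log might write cache-line-sized records in place rather than buffering them into 4KB pages (the record layout here is my own invention):

    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    /* A 64-byte record sized to one cache line rather than a 4KB page. */
    typedef struct {
        uint64_t seq;       /* sequence number */
        char     data[56];  /* payload, padding the record to 64 bytes */
    } record_t;

    int log_record(int fd, off_t slot, uint64_t seq, const char *msg) {
        record_t r = { .seq = seq };         /* remaining bytes zeroed */
        strncpy(r.data, msg, sizeof r.data - 1);
        /* Write exactly one 64B record at its slot; no 4KB batching. */
        ssize_t n = pwrite(fd, &r, sizeof r, slot * (off_t)sizeof r);
        return n == (ssize_t)sizeof r ? 0 : -1;
    }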

A New CPU

While Intel and AMD are carrying a lot of legacy cache coherency hardware, newer architectures like ARM and RISC-V are under active development and have the freedom to move more quickly. As NVRAM comes to market, there is a serious hole to fill: a processor that lets applications run efficiently with NVRAM as main memory. I imagine these processors could implicitly extend the visibility guarantees made in the cache so that they hold at main memory as well. All the work done by application and compiler writers to maintain correctness on the new architecture would then immediately enable applications to run persistently on NVRAM. Of course, it is also possible to go a different route and allow the application programmer to explicitly manage writeback where it matters and ignore it where it doesn't.

Further Thoughts

In this post, I have jotted down some of the ideas about NVRAM that have been floating around in my head since I first discussed it with researchers at VMware nearly four years ago. Ultimately, there are many competing ideas for how best to use NVRAM, and I don't believe this post covers the full breadth of the revolution NVRAM may enable. I hope we will see research into each of these points as NVRAM becomes widely available and commonplace in commodity servers, and that some of my predictions will come true. I would also love to be proven wrong by something better.