What's in the future for graph databases?
Graph databases have matured into mainstream information technology and delivered value to organizations in a wide range of applications. They are no longer esoteric software that is just emerging from the lab of a startup no one has ever heard of.
Many CIOs are wondering where the development of graph databases is headed. In general, the direction is to add functionality to the graph database and thereby reduce what custom application software must address. Shifting functionality from application software to operating systems and databases is an information technology trend that’s been underway for decades.
At the recent NODES Developer Expo and Summit sponsored by Neo4j, a leading vendor of graph database software, I spoke with Dr. Jim Webber, the Chief Data Scientist at Neo4j, about future hardware developments that Neo4j and its peers expect to enhance the performance, and therefore the value, of their products. Jim said, “While Neo4j is proud of its continuing record of software development that enhances the capabilities of our graph database, we recognize that hardware advances also improve the performance of our products and our customers’ graph applications.”
Here are some ideas about the future hardware direction of graph databases that we discussed.
Non-volatile memory express
Graph databases are increasingly using the Non-Volatile Memory Express (NVMe) standard for Solid State Drives (SSDs) to increase application performance and significantly reduce recovery elapsed time after a power failure or a system crash.
NVMe is an open standard that allows SSDs to operate at the read/write speeds their flash memory is capable of rather than be constrained by the disk controller speed.
This table shows a quick summary of the relative performance of various memory and storage components:
In summary, implementing NVMe improves DBMS throughput by about 1.3 times, reduces the elapsed recovery time after a crash by over 100 times, and shrinks the storage footprint of the DBMS by about 1.5 times. To explore the technical details associated with the advantages of NVMe for database logging and recovery, please read the article “Write-Behind Logging.”
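To make those factors concrete, here is a minimal Python sketch that applies them to a baseline. The three factors come from the summary above; the `DbmsProfile` type, the `apply_nvme_factors` helper, and the baseline numbers are hypothetical and chosen only to keep the arithmetic easy to follow. They are not measurements from any product.

```python
from dataclasses import dataclass

# Approximate improvement factors quoted in the article.
THROUGHPUT_GAIN = 1.3       # about 1.3 times higher throughput
RECOVERY_SPEEDUP = 100.0    # "over 100 times" faster recovery; 100 is the lower bound
FOOTPRINT_REDUCTION = 1.5   # storage footprint about 1.5 times smaller


@dataclass
class DbmsProfile:
    """Hypothetical summary of a DBMS deployment (illustration only)."""
    throughput_tps: float     # transactions per second
    recovery_seconds: float   # elapsed time to recover after a crash
    footprint_gb: float       # storage footprint in gigabytes


def apply_nvme_factors(baseline: DbmsProfile) -> DbmsProfile:
    """Scale a baseline profile by the article's approximate factors."""
    return DbmsProfile(
        throughput_tps=baseline.throughput_tps * THROUGHPUT_GAIN,
        recovery_seconds=baseline.recovery_seconds / RECOVERY_SPEEDUP,
        footprint_gb=baseline.footprint_gb / FOOTPRINT_REDUCTION,
    )


# Hypothetical baseline: 10,000 tps, a 10-minute recovery, 300 GB on disk.
baseline = DbmsProfile(throughput_tps=10_000, recovery_seconds=600, footprint_gb=300)
projected = apply_nvme_factors(baseline)
print(projected)
```

Under these assumed inputs, the recovery figure is the one that changes character: a ten-minute outage becomes a matter of seconds, while throughput and footprint improve more modestly.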
We tend to view SSDs as little black boxes that offer fast storage performance. It is easy to forget that SSDs include a processor. Recently some manufacturers have begun to provide an Application Programming Interface (API) to that processor. This API can lead to thinking about SSDs as software-defined storage that can accelerate input-output performance.
Implementing user-programmable SSDs with suitable software can improve the performance of the DBMS by about 1.5 times. To explore the technical details associated with user-programmable SSDs, please read the article “Programmable Solid-State Storage in Future Cloud Datacenters.”
Pairing graph databases with Field Programmable Gate Arrays (FPGAs) can increase application performance by multiple orders of magnitude, which helps meet rising performance expectations for ever more demanding systems.
Adding FPGAs to a multi-CPU server cluster or single server can dramatically raise capacity, throughput, and performance for graph database applications.
FPGAs improve performance by offering a massively parallel processing architecture, pipelining parallel computations, and much higher memory-to-CPU bandwidth with low latency.
To explore the technical details associated with FPGAs, please read the article “Capitalizing on Synergies Across 5G, Edge and Cloud Platforms.”
As Ethernet speeds have increased to 100 Gbit/second and are likely to double or more in the foreseeable future, packet processing on Network Interface Cards (NICs) has emerged as a bottleneck in overall application performance.
To avoid taxing server CPUs with packet processing, the emerging solution to this bottleneck problem is to add more processing capacity and programmable logic on NICs. To explore the technical details associated with programmable NICs, please read the article “Programmable NICs.”
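A rough calculation shows why packet processing becomes a bottleneck at these speeds. The Python sketch below computes the worst-case packet rate and the time available per packet, using the standard Ethernet minimum frame size plus preamble and interframe gap. The function names are illustrative, and the 3 GHz CPU clock is an assumption made only for the cycle estimate.

```python
# Standard Ethernet framing overheads, in bytes.
MIN_FRAME_BYTES = 64        # smallest Ethernet frame, including the frame check sequence
PREAMBLE_SFD_BYTES = 8      # 7-byte preamble plus 1-byte start frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle gap between frames


def packets_per_second(link_bits_per_second: float,
                       frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Maximum packet rate on the wire for frames of a given size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bits_per_second / wire_bits


def nanoseconds_per_packet(link_bits_per_second: float,
                           frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Time budget for handling one packet at line rate."""
    return 1e9 / packets_per_second(link_bits_per_second, frame_bytes)


ASSUMED_CPU_HZ = 3e9  # hypothetical 3 GHz core, used only for the cycle estimate

for gbits in (100, 200):
    link = gbits * 1e9
    budget_ns = nanoseconds_per_packet(link)
    cycles = budget_ns * ASSUMED_CPU_HZ / 1e9
    print(f"{gbits} Gbit/s: {packets_per_second(link) / 1e6:.1f} Mpps, "
          f"{budget_ns:.2f} ns per packet, about {cycles:.0f} CPU cycles")
```

At 100 Gbit/second with minimum-size frames, a single core would have roughly 6.7 nanoseconds per packet, which is why moving that work onto the NIC is attractive. Real traffic mixes larger frames, so this is a worst-case bound and not a typical load.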
Data storage architecture
Some graph database vendors are enhancing their data storage architecture to preserve data locality. Data locality means that data values for a given record or a given column are stored close together. The value of this arrangement is that, in most cases, a given read request can be satisfied with a single database page.
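The effect of data locality on page reads can be illustrated with a small Python sketch. It is a simplified model and does not describe any vendor's storage engine: the 8 KB page size, the byte offsets, and the `pages_touched` helper are all assumptions made for illustration.

```python
PAGE_SIZE = 8192  # bytes per database page; hypothetical value for illustration


def pages_touched(extents, page_size=PAGE_SIZE):
    """Return the set of page numbers covered by (offset, length) byte extents."""
    pages = set()
    for offset, length in extents:
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return pages


# Locality-preserving layout: five 64-byte values of one record stored back to back.
local_layout = [(16384 + i * 64, 64) for i in range(5)]

# Scattered layout: the same five values, each stored on a different page.
scattered_layout = [(i * PAGE_SIZE + 128, 64) for i in range(5)]

# A value that straddles a page boundary still needs two pages.
straddling_value = [(PAGE_SIZE - 32, 64)]

print(len(pages_touched(local_layout)))      # pages read with locality
print(len(pages_touched(scattered_layout)))  # pages read without locality
print(len(pages_touched(straddling_value)))  # the boundary case
```

In this model the contiguous layout is served from one page while the scattered layout needs five. The straddling case is one reason the single-page benefit holds in most cases and not in all of them.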
What graph database capabilities are you looking for to advance your organization’s data management goals? Let us know in the comments below.