Machine Learning & Big Data Blog

Introduction to the Neo4j Graph Database

5 minute read
Walker Rowe

Neo4j is a graph database. A graph database, instead of having rows and columns has nodes edges and properties. It is more suitable for certain big data and analytics applications than row and column databases or free-form JSON document databases for many use cases.

A graph database is used to represent relationships. The most common example of that is the Facebook Friend relationship as well as the Like relationship. You can see some of that in the graphic below from Neo4j.

The circles are nodes. The lines, called edges, indicate relationships. And the any comments inside the circles are properties of that node.

graphic source: Neo4j

We write about Neo4j here because it has the largest market share. There are other players in this market. And according to Neo4J, Apache Spark 3.0 will add the Neo4j Cypher Query Language to allow and make easier “property graphs based on DataFrames to Spark.” Spark already supports GraphX, which is an extension of the RDD to support Graphs. We will discuss that in another blog post.

In another post we will also discuss graph algorithms. The most famous of those is the Google Page Rank Index. Algorithms are the way to navigate the nodes and edges.

(This article is part of our Data Visualization Guide. Use the right-hand menu to navigate.)

Costs?

Is Neo4J free? That’s rather complicated. The Community Edition is. So is the desktop version, suitable for learning. The Enterprise edition is not. That is consistent with other opensource products. When I asked Neo4J for a license to work with their product for an extended period of time they recommended that I use the desktop version. The Enterprise version has a 30-day trial period.

There are other alternatives in the market. The key would be to pick one that has enough users so that they do not go out of business. Which one should you use? You will have to do research to figure out that.

Install Neo4J

You can use the Desktop or tar version. Here I am using the tar version, on Mac. Just download it and then start up the shell as shown below. You will need a Java JDK, then.

export JAVA_HOME='/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home'

Start the server and set the initial password then open cypher-shell. The default URL is a rather strange looking bolt://localhost:7687.

cd neo4j bin folder
neo4j-admin set-initial-password xxxxxx
./cypher-shell   -a bolt://localhost:7687  -u neo4j  -p xxxxx

Create a Single Node

Thing of a node as a circle or an object, in the picture above. It can optionally have properties or a label or both.

Below is the simplest way to create a node. (Note that you have to end each command cypher command with a semicolon.) This node has nothing attached to it, meaning no properties or labels. It only has a variable, n, which you can use in other operations. What all of this means will become clear in subsequent steps.

create (n);  
0 rows available after 13 ms, consumed after another 0 ms
Added 1 nodes

Cyper SQL is what Neo4j calls their command language. When you run a command it does not return any value unless you add the return keyword. In this example, as before, we create a node with nothing in it. Neo4j creates a variable, which we have called x, to represent that object which we can then return using the return statement, so that we can see it or use it in subsequent operations.

create(x) return x ;
+----+
| x  |
+----+
| () |
+----+

Create Multiple Nodes

You can create more than one node at a time:

create (o),(p);
0 rows available after 13 ms, consumed after another 0 ms
Added 2 nodes

Create a Node with a Label

A label is like the node name, or think of it as a type. Below is node with label Student. Labels have colons (:) in front of them. As before, s is just a variable. It’s scope is the life of the shell. Close the shell and the variable goes away.

create (s:Student);
0 rows available after 16 ms, consumed after another 0 ms
Added 1 nodes, Added 1 labels

You can create nodes with more than one label:

create (s:Student:Biology) return s;

In addition to a label, a node can have properties, given as JSON:

CREATE (x:Employees { name: 'Walker', title: 'Tech Writer' }) 
CREATE (y:Employees { name: 'Stephen', title: 'Manager' }) 
return x,y;

Create a Relationship

The syntax for creating a relation is strange looking. Below, this example relationship is called BOSS. The r in front of it is just a variable. And the arrow shows the direction. You can use ->, <-, or – to indicate direction. Since it is complicated to remember direction you might want to use the – (dash), meaning both directions, in really complex models if direction does not matter. That’s a design decision.

Below is how to create a relationship. We say in this example that Stephen is the BOSS of Walker. The MATCH statement is like the SQL select. It gives the selection criteria which shows which two elements should be connected to each other.

And here we use x and y as temporary variables. As you can see they let you use shorthand notation in subsequent statements, using the variable name instead of the :label.

MATCH (x:Employees),(y:Employees)
WHERE y.name = "Stephen" AND x.name = "Walker"
CREATE (x)-[r:BOSS]->(y)
RETURN type(r);

Now, we carry the logic in the opposite direction. If Stephen is Walker’s boss then Walker works for Stephen. So we can make Walker EMPLOYEEOF Stephen. We can use any descriptive name, like EMPLOYEEOF.

MATCH (x:Employees),(y:Employees)
WHERE y.name = "Stephen" AND x.name = "Walker"
CREATE (y)-[r:EMPLOYEEOF]->(x)
RETURN type(r);

Create Multiple Relationships

Here we give another example Here we have a model of drugs, what they are used for, and who makes them.

Create some data and then create the relationships. Pay attention to the output as if you make a mistake and the MATCH statement finds no data then the relationship will have no data too. In other words it should say Create n nodes or Created n relationships, where n > 0.

CREATE (x:Cancers { name: 'skin'});
CREATE (x:Drugs { name: 'interferon'});
CREATE (x:Pharmacy { name:  'Acme Drugs'}); 
MATCH (a:Cancers),(b:Drugs)
WHERE a.name = 'skin' AND b.name = 'interferon'
CREATE (a)-[rp:PROTOCOL]->(b), (b)-[rt:TREATS]->(a);
MATCH (a:Pharmacy),(b:Drugs)
WHERE a.name = 'Acme Drugs' AND b.name = 'interferon'
CREATE (a)-[r:MAKES]->(b);

Here we make relationships in both directions, making two relationships at once. Notice that the b and a swap positions to show which belongs to which. We could also have pointed the arrow in the opposite direction.

MATCH (a:Cancers),(b:Drugs)
WHERE a.name = 'skin' AND b.name = 'interferon'
CREATE (a)-[rp:PROTOCOL]->(b), (b)-[rt:TREATS]->(a);

Querying

Here we illustrate further how MATCH works. First we list all nodes with label Pharmacy.

MATCH (p:Pharmacy) return p;
+----------------------------------+
| p                                |
+----------------------------------+
| (:Pharmacy {name: "Acme Drugs"}) |
+----------------------------------+

Now we show all relationships of Pharmacies that make Drugs.

MATCH (p:Pharmacy)-[r:MAKES]->(b:Drugs) return p,b;
+------------------------------------------------------------------+
| p                                | b                             |
+------------------------------------------------------------------+
| (:Pharmacy {name: "Acme Drugs"}) | (:Drugs {name: "interferon"}) |
+------------------------------------------------------------------+

Learn ML with our free downloadable guide

This e-book teaches machine learning in the simplest way possible. This book is for managers, programmers, directors – and anyone else who wants to learn machine learning. We start with very basic stats and algebra and build upon that.


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

See an error or have a suggestion? Please let us know by emailing blogs@bmc.com.

Business, Faster than Humanly Possible

BMC empowers 86% of the Forbes Global 50 to accelerate business value faster than humanly possible. Our industry-leading portfolio unlocks human and machine potential to drive business growth, innovation, and sustainable success. BMC does this in a simple and optimized way by connecting people, systems, and data that power the world’s largest organizations so they can seize a competitive advantage.
Learn more about BMC ›

About the author

Walker Rowe

Walker Rowe is an American freelancer tech writer and programmer living in Cyprus. He writes tutorials on analytics and big data and specializes in documenting SDKs and APIs. He is the founder of the Hypatia Academy Cyprus, an online school to teach secondary school children programming. You can find Walker here and here.