Graph Databases are good at recording nodes and edges, and even better at performing queries that traverse the edges. Some challenges can be dealt with in Graph Databases far more elegantly and efficiently than, for example, in relational databases using traditional SQL.
As a simple example, this article will create a Neo4J Graph Database instance, populated with the countries of the world (the nodes) and the (land) borders between the countries (the edges). The country data is retrieved from a GitHub document by a Node application that subsequently creates the nodes and edges in the Neo4J instance using Cypher statements that are executed through the Bolt protocol server. The Node application leverages the NPM module neo4j-driver.
The code associated with this article is available in GitHub at: https://github.com/AMIS-Services/code-cafe/tree/master/neo4j-graphdatabase .
Prerequisite: a Linux host with a Docker engine available to run Docker containers, and an internet connection. That is all.
The steps for learning the shortest route (in terms of the number of land borders to cross) from France to India are:
- Run Neo4J Container Image (as described in this article by my colleague Rosanna Denis)
- Access Neo4J through browser UI
- Run a Node JS Container Image
- Git Clone a GitHub Repo
- Run Node application (that will fetch the data, load it into Neo4J and execute the query to find the shortest path)
1. Run Neo4J Container Image
Neo4J will be run in a Docker container. In this simple case we will not map data to volumes outside the container, which means that the data in the Neo4J database is not persisted once the container is removed.
To start the Neo4J database, simply run:
docker run --publish=7474:7474 --publish=7687:7687 neo4j:3.0
If the image is not yet available locally, it will be pulled (around 200 MB).
In my case, I work on Windows and use Vagrant to spin up an Ubuntu VM through VirtualBox. This VM is assigned IP address 192.168.188.142 (as configured in the Vagrantfile). See https://github.com/AMIS-Services/code-cafe/tree/master/linux-and-docker-host-on-windows-machine for more details on my setup and on how to mimic it in your environment.
2. Access Neo4J through browser UI
Now access Neo4J in a browser on the Docker host (or the Windows host) at port 7474, for example: http://192.168.188.142:7474
Connect with neo4j/neo4j.
You will now be prompted to define a new password. Do so.
Subsequently type the command :server connect and provide the credentials (neo4j and the new password) to establish the connection to Neo4J.
You can now execute a first Cypher query: match (n) return n (equivalent to select * from <all tables in the schema> in a relational database). It will return no results – as there is no data loaded into the database yet.
3. Run a Node JS Container Image
To run a clean Node environment, execute the following command:
docker run -it --rm -p 8080:8080 node:10 bash
This runs a container with the Node 10 runtime environment, with port 8080 in the container exposed at port 8080 on the Docker host. This allows us to run a Node application that handles HTTP requests at port 8080 and to access it through port 8080 on the Docker host. The container is run in interactive mode: a command prompt is shown for a shell session inside the container with the Node JS runtime environment.
4. Git Clone a GitHub Repo
To get the sources for the Node JS application, clone a repository from GitHub:
git clone https://github.com/AMIS-Services/code-cafe
Then change the directory:
cd code-cafe/neo4j-graphdatabase/
5. Run Node application
Before we can run the Node application, we first need to install all dependencies (as defined in package.json). Execute:
npm install
Check the contents of file `neo4j-node.js` to learn how the interaction with Neo4J takes place from JavaScript. Make sure to edit the file with the relevant values for user, password and uri for your environment.
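As a rough idea of what that interaction can look like, here is a minimal sketch; it is not the contents of `neo4j-node.js`. The helper names `createDriver` and `runStatement` are hypothetical, the password is a placeholder, and the uri assumes the VM address and published Bolt port used earlier in this article. The driver module is passed in as an argument because the way it is loaded depends on the driver version (`require('neo4j-driver')`, or its `.v1` property in 1.x releases).

```javascript
// Connection settings: edit these for your environment.
const config = {
  uri: 'bolt://192.168.188.142:7687', // Bolt port published by the Neo4J container
  user: 'neo4j',
  password: 'your-new-password',      // placeholder: the password set in the browser UI
};

// `neo4j` is the neo4j-driver module; passing it in keeps this sketch
// independent of how a particular driver version is required.
function createDriver(neo4j, cfg) {
  return neo4j.driver(cfg.uri, neo4j.auth.basic(cfg.user, cfg.password));
}

// Runs a single Cypher statement in its own session and returns the records.
async function runStatement(driver, statement) {
  const session = driver.session();
  try {
    const result = await session.run(statement);
    return result.records;
  } finally {
    // Always release the session, also when the statement fails.
    await session.close();
  }
}
```

With the real module, a call such as `runStatement(createDriver(neo4j, config), 'MATCH (n) RETURN n')` would resolve to the records returned by Neo4J.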
Now the application – file neo4j-node.js – can be executed. You can run `neo4j-node.js` using `npm start`.
The Node program will retrieve a JSON document with countries from GitHub (https://raw.githubusercontent.com/mledoze/countries/master/countries.json). It creates nodes for regions, subregions and languages and of course for all countries. It creates relationships from countries to the regions and subregions they are part of, the languages that are spoken and all other countries they share a border with.
The Node code retrieves the JSON document with country details and builds convenient Sets with the unique values of regions, subregions and languages.
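The original listing is not reproduced here. A minimal sketch of the Set-building step could look as follows, assuming each country record carries `region` and `subregion` strings and a `languages` object that maps language codes to names (my reading of the countries.json layout; verify it against the actual document). The function name `collectUniqueValues` and the two sample records are illustrative only, and fetching the document is left out.

```javascript
// Collect the unique regions, subregions and languages from an array of
// country records. A Set silently ignores values it already contains.
function collectUniqueValues(countries) {
  const regions = new Set();
  const subregions = new Set();
  const languages = new Set();
  for (const country of countries) {
    if (country.region) regions.add(country.region);
    if (country.subregion) subregions.add(country.subregion);
    // languages is assumed to look like { fra: 'French', nld: 'Dutch' }
    Object.values(country.languages || {}).forEach((name) => languages.add(name));
  }
  return { regions, subregions, languages };
}

// Two hand-written sample records in the assumed layout.
const sampleCountries = [
  { name: { common: 'France' }, region: 'Europe', subregion: 'Western Europe',
    languages: { fra: 'French' } },
  { name: { common: 'Belgium' }, region: 'Europe', subregion: 'Western Europe',
    languages: { deu: 'German', fra: 'French', nld: 'Dutch' } },
];

const unique = collectUniqueValues(sampleCountries);
console.log([...unique.languages]); // [ 'French', 'German', 'Dutch' ]
```

Each value in these Sets would then become one node in Neo4J.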
The program will show output about what it is doing – including the literal Cypher statements it is executing. Note: creating all nodes and edges in Neo4J can take a few minutes.
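To give an impression of such literal statements, here is a minimal sketch of how they could be assembled as strings. The `Country` label, the `name` property and the `BORDERS_WITH` relationship are taken from the queries in this article; the helper names and the use of MERGE are assumptions, and the statements in the repository may well differ.

```javascript
// Quote a JavaScript value as a Cypher string literal,
// escaping backslashes and single quotes.
function cypherLiteral(value) {
  return "'" + String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
}

// Statement that creates a Country node unless it already exists.
function countryNode(name) {
  return `MERGE (c:Country {name: ${cypherLiteral(name)}})`;
}

// Statement that links two existing Country nodes with a border edge.
function borderEdge(from, to) {
  return (
    `MATCH (a:Country {name: ${cypherLiteral(from)}}), ` +
    `(b:Country {name: ${cypherLiteral(to)}}) ` +
    'MERGE (a)-[:BORDERS_WITH]->(b)'
  );
}

console.log(countryNode('France'));
// MERGE (c:Country {name: 'France'})
```

Escaping matters as soon as a name contains a quote; building statements with query parameters instead of literals would avoid the issue altogether.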
At the end of the program execution it will show output like this:
This output answers the question: what is the shortest path from France to India, in terms of the smallest number of land borders that need to be crossed? The question is put to Neo4J in the form of a Cypher query:
MATCH path = shortestPath((f:Country{name:'France'})-[:BORDERS_WITH*1..15]-(p:Country{name:'India'})) RETURN path
If this query is executed in the browser user interface – we get a graphical representation of the query result:
This kind of query is not super easy (nor super quick) to express in regular SQL in most relational databases. However, regard this article as nothing more than a quick introduction to how Node programs can interact with Neo4J and how they can use Cypher for querying data with a focus on relations.
Obviously, similar questions can be asked by substituting different countries for France and India.
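To make concrete what such a shortest path question asks for, the sketch below runs a breadth-first search over a small in-memory border map in plain JavaScript. It is an illustration of the problem only, not a description of how Neo4J evaluates shortestPath; the function name `shortestBorderPath` and the five-country sample map are made up for this example.

```javascript
// Breadth-first search: returns the list of countries on a path with the
// fewest border crossings from `from` to `to`, or null if none exists.
function shortestBorderPath(borders, from, to) {
  const previous = new Map([[from, null]]); // country -> country we came from
  const queue = [from];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === to) {
      // Walk back through `previous` to reconstruct the path.
      const path = [];
      for (let c = to; c !== null; c = previous.get(c)) path.unshift(c);
      return path;
    }
    for (const neighbour of borders[current] || []) {
      if (!previous.has(neighbour)) {
        previous.set(neighbour, current);
        queue.push(neighbour);
      }
    }
  }
  return null;
}

// A tiny subset of land borders, listed in both directions.
const sampleBorders = {
  Portugal: ['Spain'],
  Spain: ['Portugal', 'France'],
  France: ['Spain', 'Germany'],
  Germany: ['France', 'Poland'],
  Poland: ['Germany'],
};

console.log(shortestBorderPath(sampleBorders, 'Portugal', 'Poland'));
// [ 'Portugal', 'Spain', 'France', 'Germany', 'Poland' ]
```

In Cypher the same question is a single pattern; in application code or SQL the traversal has to be spelled out step by step.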
Other data included in the Neo4J database: languages (nodes) and the languages spoken in each country (edges). A simple query to retrieve all countries that speak French:
match (f:Language{name:'French'})<-[]-(l) return f,l
And a slightly less trivial query to find all countries that do not speak French – and return the languages they do speak:
MATCH (french:Language{name:'French'}),(c:Country)-[spk:SPEAKS]-> (l) WHERE NOT (c)-->(french) RETURN c, l;
And as a final example, all of France's bordering countries that do not speak French, and the languages they do speak:
match (french:Language{name:'French'}), (f:Country{name:'France'})-[:BORDERS_WITH]->(bc)-[:SPEAKS]->(language) WHERE NOT (bc)-->(french) return f, bc, language