November 6th, 2017

Creating a Simple Geographical Map with Neo4j and Cypher


Graph Databases

Cypher Query Language


Lately I've read about graph databases and their place in the NoSQL data storage universe. The graph database I've worked with is Neo4j, which is fun and easy to get started with. I found the user interface very enjoyable for viewing graphs and executing queries. I highly recommend it if you need a graph database solution.

Graph databases' largest draw is storing related data and the speed at which you can query related data points (or in graph terms, nodes/vertices). Relationships are first class citizens, which allows related data queries to be executed by traversing the relationships themselves. This is contrasted with a typical relational database, where you have to find relationships through foreign keys or combine two tables with a SQL JOIN operation, which can be very slow [1]. A query that is slow in an RDBMS (Relational DataBase Management System) can be extremely quick in a graph database.

One of the first graphs I made in Neo4j represented the county I grew up in - Fairfield County, CT. My first task was to create a vertex representing a state - in this case Connecticut. In Cypher (the query language used by Neo4j) that was easy!

CREATE (ct:State {name: 'Connecticut'}) RETURN ct

The CREATE statement builds a vertex and gives it a label, :State, and a property, name. The label is used for grouping - in this case all states will have the label :State. Properties give vertices names and can supply additional key-value information.
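As a minimal sketch of how a property can carry extra key-value information, the query below adds a second property to the Connecticut vertex. The abbreviation property is a hypothetical name chosen for illustration; it is not part of the original graph.

```cypher
// Find the state vertex created earlier and attach an extra property.
// `abbreviation` is a made-up property name used only for this example.
MATCH (ct:State {name: 'Connecticut'})
SET ct.abbreviation = 'CT'
RETURN ct.name, ct.abbreviation
```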

Building multiple vertices at once can be done from a single CREATE statement. I utilize this to populate the county's towns and cities:

CREATE (:City {name: 'Bridgeport'}), (:City {name: 'Danbury'}), ...
CREATE (:Town {name: 'Bethel'}), (:Town {name: 'Brookfield'}), ...

Before I made any relationships, I wanted to simplify things and group cities and towns together under one common label. After all, towns and cities are both considered settlements.

MATCH (s) WHERE s:City OR s:Town SET s:Settlement

I introduced some new keywords here, most importantly MATCH, which queries the database based on an ASCII-art pattern I pass it. The (s) token represents a node in the database that I assign to the variable s. This query can be read as "for each vertex in the database that is a city or town, give the vertex a new label called Settlement". In Neo4j vertices can have multiple labels, so the SET operation adds Settlement without overwriting the old labels.
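One way to confirm that the old labels survived is to list each settlement with all of its labels. This is only a sketch, assuming the City, Town and Settlement labels from the queries above:

```cypher
// List every settlement together with all labels attached to it.
// labels(s) returns the full list of labels on the vertex.
MATCH (s:Settlement)
RETURN s.name AS name, labels(s) AS labels
ORDER BY name
```

Each row should show Settlement alongside either City or Town.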

Now it's time for the fun part: relationships. Let's create a relationship between all the settlements and the state of Connecticut:

MATCH (ct:State), (s:Settlement) MERGE (ct)<-[:IN]-(s)

As you likely guessed, the ASCII art for relationships is <-[:IN]- where the arrow shows the direction of the relationship. Relationships are given a type (the relationship equivalent of a label), in this case :IN. Relationships can also have properties just like a vertex. This is what I meant by 'relationships are first class citizens' - they are treated and queried just like a vertex! First class relationships are extremely powerful.
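To illustrate a property on a relationship, here is a small sketch that matches one :IN relationship and sets a value on it. The source property and its value are hypothetical and exist only for this example; the query also assumes Danbury already carries the Settlement label and an :IN relationship to Connecticut.

```cypher
// Bind the relationship to the variable r so it can be updated like a vertex.
// `source` is a made-up property name used only for illustration.
MATCH (s:Settlement {name: 'Danbury'})-[r:IN]->(ct:State {name: 'Connecticut'})
SET r.source = 'entered by hand'
RETURN s.name, type(r), r.source
```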

You may have noticed the query matches multiple vertices. I looked for all vertices where the label is State or Settlement. Then I created the relationship "all settlements are in all the states". Since the only state in the database is Connecticut, all settlements are given an :IN relationship to Connecticut.
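If the database ever held more than one state, that pattern would link every settlement to every state. A more defensive variant, sketched here as one possible approach, pins the state down by name:

```cypher
// Match only the Connecticut vertex instead of every :State vertex.
MATCH (ct:State {name: 'Connecticut'}), (s:Settlement)
// MERGE creates the relationship only where it does not already exist.
MERGE (s)-[:IN]->(ct)
```

Because MERGE reuses an existing relationship between two bound vertices, running this after the original query should not create duplicates.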

For the final step I created relationships between all the neighboring towns. The query is long so I'll just show a snippet (the full code for all queries is on GitHub):

MATCH (greenwich:Settlement {name: 'Greenwich'}),
      (stamford:Settlement {name: 'Stamford'}),
      (newcanaan:Settlement {name: 'New Canaan'}),
      (darien:Settlement {name: 'Darien'}), ...
CREATE (greenwich)-[:NEIGHBORS_OF]->(stamford),
       (stamford)-[:NEIGHBORS_OF]->(newcanaan),
       (stamford)-[:NEIGHBORS_OF]->(darien), ...

This code creates neighbor relationships between towns that share borders. One thing I questioned when writing it was 'why can't there be bi-directional relationships?' It turns out Neo4j does not support bi-directional relationships at this time; every relationship is stored with exactly one direction. It does not need to, because traversing a relationship takes the same amount of time (O(1)) regardless of the direction it is pointing [2]. In a case like this where the relationships should be bi-directional, you can just ignore the arrow in MATCH queries.

[Screenshot: the settlements as displayed in the Neo4j user interface]
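As a sketch of what ignoring the arrow looks like, the query below finds the neighbors of Stamford no matter which way each NEIGHBORS_OF relationship was created. It assumes the relationships from the snippet above exist:

```cypher
// No arrowhead on the pattern, so the relationship is matched in either direction.
MATCH (stamford:Settlement {name: 'Stamford'})-[:NEIGHBORS_OF]-(neighbor:Settlement)
RETURN neighbor.name AS neighbor
ORDER BY neighbor
```

With the relationships shown in the snippet, Greenwich should appear in the results even though its relationship points toward Stamford rather than away from it.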

I will look further at Neo4j and build off this graph in future discoveries. I hope this post helps show how simple it is to build a graph database!

[1] Ian Robinson, Jim Webber & Emil Eifrem, Graph Databases (Beijing: O'Reilly, 2015), 6

[2] Ibid., 152