Getting started
The design philosophy of the Python client is to mimic the GDS Cypher API in Python code. The Python client will translate the Python code written by the user to a corresponding Cypher query which it will then run on the Neo4j server using a Neo4j Python driver connection.
The Python client attempts to be as pythonic as possible to maximize convenience for users accustomed to and experienced with Python environments.
As such, standard Python and pandas types are used as much as possible.
However, to be consistent with the Cypher surface, the general return value of calling a method corresponding to a Cypher procedure will be in the form of a table (a pandas DataFrame in Python).
Read more about this in Mapping between Cypher and Python.
The root component of the Python client is the GraphDataScience object.
Once instantiated, it forms the entry point for interacting with the GDS library.
That includes projecting graphs, running algorithms, and defining and using machine learning pipelines in GDS.
As a convention, we recommend always naming the instantiated GraphDataScience object gds, since using it will then most closely resemble using the Cypher API directly.
1. Import and setup
The simplest way to instantiate the GraphDataScience object is from a Neo4j server URI and corresponding credentials:
from graphdatascience import GraphDataScience
# Use Neo4j URI and credentials according to your setup
# NEO4J_URI could look similar to "bolt://my-server.neo4j.io:7687"
gds = GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
# Check the installed GDS version on the server
print(gds.version())
assert gds.version()
# Example output:
# "2.1.9"
1.1. AuraDS
If you are connecting the client to an AuraDS instance, you can get recommended non-default configuration settings of the Python Driver applied automatically.
To achieve this, set the constructor argument aura_ds=True:
from graphdatascience import GraphDataScience
# Configures the driver with AuraDS-recommended settings
gds = GraphDataScience(
    "neo4j+s://my-aura-ds.databases.neo4j.io:7687",
    auth=("neo4j", "my-password"),
    aura_ds=True,
)
1.2. Instantiating from a Neo4j driver
For some use cases, direct access to and control of the Neo4j driver is required, for example when the driver must be configured in a specific way.
In this case, one can use the method GraphDataScience.from_neo4j_driver to instantiate a GraphDataScience object.
It takes the same arguments as the regular GraphDataScience constructor, except for the aura_ds keyword parameter, which is only relevant when the underlying Neo4j driver is instantiated internally.
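As a rough sketch of this approach, the function below creates a driver with one custom setting and hands it to the client. It assumes that the driver object is passed as the first argument to from_neo4j_driver. The function name connect_with_custom_driver and the choice of max_connection_lifetime are illustrative only and do not come from the text above.

```python
def connect_with_custom_driver(uri: str, user: str, password: str):
    """Build a GraphDataScience object on top of a pre-configured Neo4j driver."""
    # Imported inside the function so the sketch can be loaded
    # even where the two packages are not installed.
    from graphdatascience import GraphDataScience
    from neo4j import GraphDatabase

    # max_connection_lifetime (in seconds) is just one example of a
    # driver-level setting one might want to control directly.
    driver = GraphDatabase.driver(
        uri,
        auth=(user, password),
        max_connection_lifetime=3600,
    )

    # Assumption: the driver takes the place of the URI argument.
    return GraphDataScience.from_neo4j_driver(driver)
```

A call such as connect_with_custom_driver(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD) would then return an object that can be used like the gds object above.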
1.3. Checking license status
To check whether the GDS server library we are running against has an enterprise license, we can make the following call:
using_enterprise = gds.is_licensed()
1.4. Specifying targeted database
If we don’t want to use the default database of our DBMS, we can provide the GraphDataScience constructor with the keyword parameter database:
gds = GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database="my-db")
Or we could change the database we are targeting later:
gds.set_database("my-db")
1.5. Configure Apache Arrow parameters
If Apache Arrow is available on the server, we can provide the GraphDataScience constructor with several keyword parameters to configure the connection:
- arrow_disable_server_verification: A flag indicating that, if the Flight client connects with TLS, it skips server verification. If this is enabled, all other TLS settings are overridden.
- arrow_tls_root_certs: PEM-encoded certificates that are used for connecting to the Apache Arrow Flight server.
gds = GraphDataScience(
    NEO4J_URI,
    auth=(NEO4J_USER, NEO4J_PASSWORD),
    arrow=True,
    arrow_disable_server_verification=False,
    arrow_tls_root_certs=CERT,
)
2. Minimal example
In the following example, we illustrate how to use the Python client to run a Cypher query, project a graph into GDS, run an algorithm, and inspect the result via the client-side graph object.
We assume that we have already created a GraphDataScience object stored in the variable gds.
# Create a minimal example graph
gds.run_cypher(
    """
    CREATE
    (m: City {name: "Malmö"}),
    (l: City {name: "London"}),
    (s: City {name: "San Mateo"}),
    (m)-[:FLY_TO]->(l),
    (l)-[:FLY_TO]->(m),
    (l)-[:FLY_TO]->(s),
    (s)-[:FLY_TO]->(l)
    """
)
# Project the graph into the GDS Graph Catalog
# We call the object representing the projected graph `G_office`
G_office, project_result = gds.graph.project("neo4j-offices", "City", "FLY_TO")
# Run the mutate mode of the PageRank algorithm
mutate_result = gds.pageRank.mutate(G_office, tolerance=0.5, mutateProperty="rank")
# We can inspect the node properties of our projected graph directly
# via the graph object and see that indeed the new property exists
assert G_office.node_properties("City") == ["rank"]
You can also use one of the datasets that come with the library to get started. See the Datasets chapter for more on this.
The client library is designed so that most methods are inferred under the hood as you type them via a string building scheme and overloading the magic
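To make the idea of a string building scheme more concrete, here is a toy sketch of how chained attribute access can be turned into the name of a Cypher procedure. It assumes Python's `__getattr__` hook as the mechanism. The class CallBuilder and its parameter names are hypothetical and are not the client's actual implementation.

```python
from typing import Any, Dict, Tuple


class CallBuilder:
    """Hypothetical toy: collects attribute names into a dotted procedure name."""

    def __init__(self, namespace: str) -> None:
        self._namespace = namespace

    def __getattr__(self, name: str) -> "CallBuilder":
        # Called only when normal attribute lookup fails, so every
        # unknown public name extends the dotted path by one segment.
        if name.startswith("_"):
            raise AttributeError(name)
        return CallBuilder(f"{self._namespace}.{name}")

    def __call__(self, graph_name: str, **config: Any) -> Tuple[str, Dict[str, Any]]:
        # Render the collected path as a parameterized Cypher CALL.
        query = f"CALL {self._namespace}($graph_name, $config)"
        params = {"graph_name": graph_name, "config": config}
        return query, params


toy = CallBuilder("gds")
query, params = toy.pageRank.mutate("neo4j-offices", mutateProperty="rank")
print(query)
print(params)
```

In this toy, names such as pageRank are never defined on the class, so they do not show up in dir(toy). A real client built along these lines would likely show the same behavior.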
3. Running Cypher
As we saw in the example above, the GraphDataScience object has a method run_cypher for conveniently running Cypher queries.
This method takes as parameters a query string query: str, an optional Cypher parameters dictionary params: Optional[Dict[str, Any]], as well as an optional string database: Optional[str] to override which database to target.
It returns the result of the query in the format of a pandas DataFrame.
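The following sketch shows how the three parameters might be used together. It reuses the City and FLY_TO schema from the minimal example. The helper name destinations_from is hypothetical, and gds is assumed to be an already instantiated GraphDataScience object passed in by the caller.

```python
from typing import Any, Optional


def destinations_from(gds: Any, city: str, database: Optional[str] = None) -> Any:
    """Return the cities reachable from `city` via a single FLY_TO relationship."""
    query = """
    MATCH (:City {name: $city})-[:FLY_TO]->(dest:City)
    RETURN dest.name AS destination
    ORDER BY destination
    """
    # Values are passed through `params` instead of being formatted
    # into the query string; `database` overrides the target database.
    return gds.run_cypher(query, params={"city": city}, database=database)
```

Calling destinations_from(gds, "London") on the example graph would be expected to give a table with one destination per row.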
4. Close open connections
Similarly to how the Neo4j Python driver supports closing all open connections to the DBMS, you can call close on the GraphDataScience object to the same effect:
# Close any open connections in the underlying Neo4j driver's connection pool
gds.close()
close is also called automatically when the GraphDataScience object is deleted.
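If you would rather not rely on object deletion, one option is to make the call to close explicit with the standard library's contextlib.closing. This is a minimal sketch. The helper run_and_close is hypothetical, and gds is assumed to be any object with a close method, such as the GraphDataScience object above.

```python
from contextlib import closing
from typing import Any, Callable


def run_and_close(gds: Any, work: Callable[[Any], Any]) -> Any:
    """Run `work(gds)` and call `gds.close()` afterwards, even if `work` raises."""
    with closing(gds) as session:
        return work(session)
```

For instance, run_and_close(gds, lambda g: g.version()) would return the version and then close the connections.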
5. Mapping between Cypher and Python
There are some general principles for how the Cypher API maps to the Python client API:
- Method calls corresponding to Cypher procedures (preceded by CALL in the docs) return:
  - A table as a pandas DataFrame, if the procedure returns several rows (e.g. stream mode algorithm calls).
  - A row as a pandas Series, if the procedure returns exactly one row (e.g. stats mode algorithm calls).
  Some notable exceptions to this are:
  - Procedures instantiating graph objects and model objects have two return values: a graph or model object, and a row of metadata (typically a pandas Series) from the underlying procedure call.
  - Any methods on pipeline, graph or model objects (native to the Python client) mapping to Cypher procedures.
  - gds.version(), which returns a string.
- Method calls corresponding to Cypher functions (preceded by RETURN in the docs) will simply return the value the function returns.
- The Python client also contains specific functionality for inspecting graphs from the GDS Graph Catalog, using a client-side graph object. Similarly, models from the GDS Model Catalog can be inspected using a client-side model object.
- Cypher functions and procedures of GDS that take references to graphs and/or models as strings for input typically instead take graph objects and/or model objects as input in the Python client API.
- To configure and use machine learning pipelines in GDS, specific pipeline objects are used in the Python client.
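The first principle can be sketched with the PageRank algorithm from the minimal example. The helper pagerank_overview is hypothetical. It assumes that gds is an instantiated GraphDataScience object, that G is a graph object returned by gds.graph.project, and that the stats row contains a ranIterations field as in the GDS Cypher API.

```python
from typing import Any, Dict


def pagerank_overview(gds: Any, G: Any) -> Dict[str, Any]:
    """Collect the results of the stream and stats modes of PageRank."""
    # Stream mode returns one row per node, so the client
    # is expected to hand back a table (pandas DataFrame).
    scores = gds.pageRank.stream(G)

    # Stats mode returns exactly one row, so the client
    # is expected to hand back a single row (pandas Series).
    stats = gds.pageRank.stats(G)

    return {
        "scores": scores,
        "row_count": len(scores),  # number of rows in the table
        "iterations": stats["ranIterations"],  # field looked up by name
    }
```

Because the graph is passed as the object G rather than as a name string, the sketch also follows the principle that graph references become graph objects in the Python client API.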