2 changes: 1 addition & 1 deletion doc/antora.yml
@@ -6,4 +6,4 @@ nav:
- modules/ROOT/content-nav.adoc
asciidoc:
attributes:
docs-version: '1.20'
docs-version: '1.20'
22 changes: 15 additions & 7 deletions doc/modules/ROOT/content-nav.adoc
@@ -1,21 +1,29 @@
* xref:index.adoc[Introduction]
* xref:index.adoc[]

* *Getting started*
* xref:installation.adoc[]
* xref:getting-started.adoc[]
* xref:graph-analytics-serverless.adoc[]
* xref:tutorials/index.adoc[]
// https://docs.antora.org/antora/latest/navigation/include-lists/
+
--
include::partial$tutorial-list.adoc[]
--

* *Graph algorithms*
* xref:graph-object.adoc[]
* xref:algorithms.adoc[]

* *Machine learning*
* xref:pipelines.adoc[]
* xref:model-object.adoc[]
* xref:common-datasets.adoc[]
* xref:rel-embedding-models.adoc[]
* xref:bookmarks.adoc[]

* *Reference*
* xref:visualization.adoc[]
* xref:tutorials/tutorials.adoc[]
// https://docs.antora.org/antora/latest/navigation/include-lists/
+
--
include::partial$tutorial-list.adoc[]
--

* xref:known-limitations.adoc[]
* xref:v2_endpoints.adoc[]
21 changes: 21 additions & 0 deletions doc/modules/ROOT/pages/cypher-mapping.adoc
@@ -0,0 +1,21 @@
= Mapping to Cypher API

There are some general principles for how the Cypher API maps to the Python client API:

* Method calls corresponding to Cypher procedures (preceded by `CALL` in the docs) return:
+
--
* A table as a pandas `DataFrame`, if the procedure returns several rows (e.g. stream mode algorithm calls).
* A row as a pandas `Series`, if the procedure returns exactly one row (e.g. stats mode algorithm calls).
--
+
Some notable exceptions to this are:

** Procedures instantiating xref:graph-object.adoc[graph objects] and xref:model-object.adoc[model objects] have two return values: a graph or model object, and a row of metadata (typically a pandas `Series`) from the underlying procedure call.
** Any methods on xref:pipelines.adoc[pipeline], xref:graph-object.adoc[graph] or xref:model-object.adoc[model] objects (native to the Python client) mapping to Cypher procedures.
** `gds.version()` which returns a string.
* Method calls corresponding to Cypher functions (preceded by `RETURN` in the docs) will simply return the value the function returns.
* The Python client also contains specific functionality for inspecting graphs from the https://neo4j.com/docs/graph-data-science/current/management-ops/graph-catalog-ops/[GDS Graph Catalog], using a client-side xref:graph-object.adoc[graph object].
Similarly, models from the https://neo4j.com/docs/graph-data-science/current/model-catalog/[GDS Model Catalog] can be inspected using a client-side xref:model-object.adoc[model object].
* Cypher functions and procedures of GDS that take references to graphs and/or models as strings for input typically instead take xref:graph-object.adoc[graph objects] and/or xref:model-object.adoc[model objects] as input in the Python client API.
* To configure and use https://neo4j.com/docs/graph-data-science/current/machine-learning/machine-learning/[machine learning pipelines] in GDS, specific xref:pipelines.adoc[pipeline objects] are used in the Python client.
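
The return-type rules above can be observed with a short sketch like the following. It is only an illustration: the function name `show_return_types` and the graph name `mapping-demo` are made up here, and the function assumes that `gds` is a connected `GraphDataScience` object and that the database contains `City` nodes linked by `FLY_TO` relationships.

```python
def show_return_types(gds):
    """Report the Python type returned by a few client calls.

    Hypothetical helper: assumes `gds` is a connected `GraphDataScience`
    object and that `City` nodes with `FLY_TO` relationships exist.
    """
    # A procedure instantiating a graph object has two return values:
    # the graph object and a row of metadata.
    G, project_result = gds.graph.project("mapping-demo", "City", "FLY_TO")

    # Stream mode returns several rows, so a table is expected.
    stream_result = gds.pageRank.stream(G)

    # Stats mode returns exactly one row.
    stats_result = gds.pageRank.stats(G)

    types = {
        "project": type(project_result).__name__,
        "stream": type(stream_result).__name__,
        "stats": type(stats_result).__name__,
    }

    # Remove the projection from the graph catalog again.
    G.drop()
    return types
```

Against a real server, the expectation based on the rules above is `Series` for `project` and `stats`, and `DataFrame` for `stream`.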
185 changes: 64 additions & 121 deletions doc/modules/ROOT/pages/getting-started.adoc
@@ -1,4 +1,4 @@
= Getting started
= Quickstart

The design philosophy of the Python client is to mimic the GDS Cypher API in Python code.
The Python client will translate the Python code written by the user to a corresponding Cypher query which it will then run on the Neo4j server using a Neo4j Python driver connection.
@@ -16,134 +16,108 @@ As a convention we recommend always calling the instantiated `GraphDataScience`

== Import and setup

The simplest way to instantiate the `GraphDataScience` object is from a Neo4j server URI and corresponding credentials:
Use the Neo4j URI and credentials according to your setup.

[source,python]
[.tabbed-example]
====
[.include-with-Neo4j-server]
=====
[source, python, role=no-test]
----
from graphdatascience import GraphDataScience

# Use Neo4j URI and credentials according to your setup
# NEO4J_URI could look similar to "bolt://my-server.neo4j.io:7687"
# For example, in a local setup `NEO4J_URI` would be "neo4j://127.0.0.1:7687".
gds = GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

# Check the installed GDS version on the server
print(gds.server_version())
assert gds.server_version()
----

[source,python,role=no-test]
.Results:
[source]
The `GraphDataScience` object needs the Neo4j database to be available upon construction, and targets the `neo4j` database by default.
If the `neo4j` database does not exist or you want to use a different database, use the `database` keyword parameter:

[source, python, role=no-test]
----
"2.1.9"
gds = GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database="my-db")
----

Please note that the `GraphDataScience` object needs to communicate with a Neo4j database upon construction, and uses the default "neo4j" database by default.
If there is no such database, you will need to <<specifying-targeted-database, provide a valid database using the `database` keyword parameter>>.

You can also change the database after creating the `GraphDataScience` object:

=== Aura Graph Analytics
[source, python, role=no-test]
----
gds.set_database("my-db")
----
=====

The GDS Python Client has dedicated support for the xref:graph-analytics-serverless.adoc[Aura Graph Analytics] offering.
[.include-with-Aura-Graph-Analytics]
=====
The Python client has dedicated support for link:{aura-docs-base-uri}/graph-analytics[Aura Graph Analytics].

This example shows how to instantiate the `GraphDataScience` object using an Aura API key pair and AuraDB connection information.

[source,python,role=no-test]
[source, python, role=no-test]
----
from graphdatascience.session import DbmsConnectionInfo, GdsSessions, AuraAPICredentials, SessionMemory

sessions = GdsSessions(api_credentials=AuraAPICredentials("<clientId>", "<clientSecret>"))
sessions = GdsSessions(api_credentials=AuraAPICredentials(AURA_API_CLIENT_ID, AURA_API_CLIENT_SECRET))

# `NEO4J_URI` has the format "neo4j+s://xxxxxxxx.databases.neo4j.io".
# The credentials are for the AuraDB instance.
gds = sessions.get_or_create(
session_name="my-session",
memory=SessionMemory.m_4GB,
db_connection=DbmsConnectionInfo("neo4j+s://mydbid.databases.neo4j.io", "neo4j", "<password>"),
db_connection=DbmsConnectionInfo(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD),
)
----
=====

[.include-with-AuraDS]
=====
If you are connecting the client to an link:https://neo4j.com/cloud/graph-data-science/[AuraDS instance], you can get recommended non-default configuration settings of the Python driver applied automatically with `aura_ds=True`:

=== AuraDS

If you are connecting the client to an https://neo4j.com/cloud/graph-data-science/[AuraDS instance], you can get recommended non-default configuration settings of the Python Driver applied automatically.
To achieve this, set the constructor argument `aura_ds=True`:

[source,python,role=no-test]
[source, python, role=no-test]
----
from graphdatascience import GraphDataScience

# Configures the driver with AuraDS-recommended settings
# Configures the driver with AuraDS-recommended settings.
# `NEO4J_URI` has the format "neo4j+s://xxxxxxxx.databases.neo4j.io:7687".
gds = GraphDataScience(
"neo4j+s://my-aura-ds.databases.neo4j.io:7687",
auth=("neo4j", "my-password"),
NEO4J_URI,
auth=(NEO4J_USER, NEO4J_PASSWORD),
aura_ds=True
)
----
=====
====
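
One way to keep credentials out of the source code is to read the connection settings from environment variables. The following is a minimal sketch under that assumption. The helper name `neo4j_settings` and the variable names `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD` and `NEO4J_DATABASE` are conventions chosen for this illustration, not something the client reads by itself.

```python
import os


def neo4j_settings(env=None):
    """Collect connection settings from environment variables.

    Hypothetical helper: the variable names are a convention chosen
    for this sketch, not something the Python client reads itself.
    """
    env = os.environ if env is None else env
    return {
        # Fall back to the local-setup URI.
        "uri": env.get("NEO4J_URI", "neo4j://127.0.0.1:7687"),
        "auth": (env.get("NEO4J_USER", "neo4j"), env.get("NEO4J_PASSWORD", "")),
        # `None` means that no specific database is requested.
        "database": env.get("NEO4J_DATABASE"),
    }


settings = neo4j_settings({"NEO4J_PASSWORD": "secret", "NEO4J_DATABASE": "my-db"})
print(settings["uri"])       # neo4j://127.0.0.1:7687
print(settings["database"])  # my-db
```

The values could then be passed on as `GraphDataScience(settings["uri"], auth=settings["auth"], database=settings["database"])`.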

=== Additional checks

=== Instantiating from a Neo4j driver

For some use cases, direct access to and control of the Neo4j driver is required, for example to configure the driver in a specific way.
In this case, you can use the method `GraphDataScience.from_neo4j_driver` to instantiate a `GraphDataScience` object.
It takes the same arguments as the regular `GraphDataScience` constructor, except for the `aura_ds` keyword parameter, which is only relevant when the client instantiates the Neo4j driver internally.
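
As a rough sketch of this approach, the function below creates a driver with one custom setting and hands it to `from_neo4j_driver`. The function name `connect_with_custom_driver` is made up for this illustration, and `max_connection_lifetime` is only an example of a driver setting one might want to control. Passing `auth` and `database` relies on the statement above that the method takes the same arguments as the constructor.

```python
def connect_with_custom_driver(uri, user, password, database=None):
    """Create a `GraphDataScience` object from a pre-configured driver.

    Hypothetical helper, shown as a sketch rather than a recommended setup.
    """
    # Imported here so that the sketch can be loaded without the
    # `neo4j` and `graphdatascience` packages being installed.
    from graphdatascience import GraphDataScience
    from neo4j import GraphDatabase

    # `max_connection_lifetime` (in seconds) is just one example of a
    # driver setting that can be controlled directly this way.
    driver = GraphDatabase.driver(
        uri, auth=(user, password), max_connection_lifetime=600
    )
    return GraphDataScience.from_neo4j_driver(
        driver, auth=(user, password), database=database
    )
```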


=== Checking license status

To check if the GDS server library we're running against has an enterprise license, we can make the following call:

[source,python]
----
using_enterprise = gds.is_licensed()
----


[[specifying-targeted-database]]
=== Specifying targeted database

If we don't want to use the default database of our DBMS we can provide the `GraphDataScience` constructor with the keyword parameter `database`:
Check the version of the GDS library running on the server:

[source,python,role=no-test]
[source, python]
----
gds = GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database="my-db")
print(gds.server_version())
----

Or we could change the database we are targeting later:
Check if the GDS library running on the server has an enterprise license:

[source,python,role=no-test]
[source, python]
----
gds.set_database("my-db")
print(gds.is_licensed())
----

=== Configure Apache Arrow parameters

If Apache Arrow is available on the https://neo4j.com/docs/graph-data-science/current/installation/configure-apache-arrow-server/[server], we can provide the `GraphDataScience` constructor with several keyword parameters to configure the connection:

* `arrow_disable_server_verification`: A flag that, when the Flight client connects over TLS, makes it skip server verification. If this is enabled, all other TLS settings are overridden.
* `arrow_tls_root_certs`: PEM-encoded certificates that are used for connecting to the Apache Arrow Flight server.

[source,python,role=no-test]
----
gds = GraphDataScience(
NEO4J_URI,
auth=(NEO4J_USER, NEO4J_PASSWORD),
arrow=True,
arrow_disable_server_verification=False,
arrow_tls_root_certs=CERT
)
----
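
The `CERT` value above has to come from somewhere. A minimal sketch, assuming that `arrow_tls_root_certs` accepts the PEM-encoded certificate content as bytes, is to read it from a file. The helper name `arrow_tls_kwargs` is hypothetical.

```python
from pathlib import Path


def arrow_tls_kwargs(cert_path):
    """Build the Arrow-related keyword arguments from a PEM file.

    Hypothetical helper; assumes `arrow_tls_root_certs` accepts the
    PEM-encoded certificate content as bytes.
    """
    return {
        "arrow": True,
        # Keep server verification on so that the certificates are used.
        "arrow_disable_server_verification": False,
        "arrow_tls_root_certs": Path(cert_path).read_bytes(),
    }
```

The result could be unpacked into the constructor call, for example `GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), **arrow_tls_kwargs("ca.pem"))`, where `ca.pem` is a placeholder path.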



[[getting-started-minimal-example]]
== Minimal example
== Usage example

In the following example we use the Python client to run a Cypher query, project a graph into GDS, run an algorithm and inspect the result via the client-side graph object.
We suppose that we have already created a `GraphDataScience` object stored in the variable `gds`.
The following example shows how to use the `GraphDataScience` object to:

[source,python]
. Run a Cypher query to populate the Neo4j database.
. Create a graph projection.
. Run an algorithm on the graph.
. Inspect the updated graph.

[source, python]
----
# Create a minimal example graph
# Create a minimal example graph.
# The method returns a Pandas `DataFrame`.
gds.run_cypher(
"""
CREATE
@@ -157,15 +131,15 @@ gds.run_cypher(
"""
)

# Project the graph into the GDS Graph Catalog
# We call the object representing the projected graph `G_office`
# Create an in-memory graph called `neo4j-offices` and
# a `G_office` object representing the projected graph.
G_office, project_result = gds.graph.project("neo4j-offices", "City", "FLY_TO")

# Run the mutate mode of the PageRank algorithm
# Run the `mutate` mode of the PageRank algorithm.
mutate_result = gds.pageRank.mutate(G_office, tolerance=0.5, mutateProperty="rank")

# We can inspect the node properties of our projected graph directly
# via the graph object and see that indeed the new property exists
# Inspect the node properties of the projected graph
# via the graph object to confirm that a new property has been created.
assert G_office.node_properties("City") == ["rank"]
----

@@ -176,19 +150,12 @@ See the xref:common-datasets.adoc[] chapter for more on this.
[NOTE]
====
The client library is designed so that most methods are inferred under the hood as you type them via a string building scheme and overloading the magic `\\__getattr__` method.
Therefore most methods, such as `pageRank`, will not appear when calling `dir(gds)`.
Therefore, most methods such as `pageRank` will not appear when calling `dir(gds)`.
Similarly, IDEs and language servers will not be able to detect these automatically inferred methods, meaning that the auto-completion support they provide will be limited.
Rest assured however that despite the lack of this type of discoverability the inferred methods, such as `gds.pageRank.stream`, will still be called correctly.
Despite the lack of this type of discoverability, the inferred methods such as `gds.pageRank.stream` will still be called correctly.
====
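
To make the mechanism in the note more concrete, here is a toy illustration of the general technique. It is not the client's actual code: the `CallBuilder` class is invented for this sketch, and it only shows how overloading `__getattr__` can turn chained attribute names into a procedure name without those names ever showing up in `dir()`.

```python
class CallBuilder:
    """Toy illustration of attribute chaining; not the client's code."""

    def __init__(self, namespace="gds"):
        self._namespace = namespace

    def __getattr__(self, name):
        # Only invoked for attributes that are not found normally,
        # so each unknown name extends the procedure namespace.
        return CallBuilder(f"{self._namespace}.{name}")

    def __call__(self, **config):
        # A real client would send a query to the server here;
        # this sketch only returns the string it has built.
        return f"CALL {self._namespace}($config)", config


builder = CallBuilder()
query, params = builder.pageRank.stream(tolerance=0.5)
print(query)                       # CALL gds.pageRank.stream($config)
print("pageRank" in dir(builder))  # False
```

Because `pageRank` is never defined as a real attribute, tools that rely on static inspection have nothing to discover, which is the limitation the note describes.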


== Running Cypher

As we saw in the <<getting-started-minimal-example, example above>>, the `GraphDataScience` object has a method `run_cypher` for conveniently running Cypher queries.
This method takes as parameters a query string `query: str`, an optional Cypher parameters dictionary `params: Optional[Dict[str, Any]]` as well as an optional string `database: Optional[str]` to override which database to target.
It returns the result of the query in the format of a pandas `DataFrame`.
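
A short sketch of how the `params` and `database` parameters might be used together follows. The function name `cities_with_min_population` and the `population` node property are assumptions made for this illustration, and `gds` is assumed to be a connected `GraphDataScience` object.

```python
def cities_with_min_population(gds, min_population, database=None):
    """Return cities at or above a population threshold.

    Hypothetical helper: the `population` property is an assumption
    about the data, not something defined elsewhere on this page.
    """
    query = """
    MATCH (c:City)
    WHERE c.population >= $min_population
    RETURN c.name AS name, c.population AS population
    """
    # Values are passed as Cypher parameters instead of being
    # formatted into the query string.
    return gds.run_cypher(
        query,
        params={"min_population": min_population},
        database=database,
    )
```

Passing values through `params` keeps the query text constant and avoids building Cypher strings by hand.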


== Close open connections

Similarly to how the Neo4j Python driver supports closing all open connections to the DBMS, you can call `close` on the `GraphDataScience` object to the same effect:
@@ -199,28 +166,4 @@ Similarly to how the Neo4j Python driver supports closing all open connections t
gds.close()
----

`close` is also called automatically when the `GraphDataScience` object is deleted.


[[getting-started-mapping]]
== Mapping between Cypher and Python

There are some general principles for how the Cypher API maps to the Python client API:

* Method calls corresponding to Cypher procedures (preceded by `CALL` in the docs) return:
+
--
* A table as a pandas `DataFrame`, if the procedure returns several rows (eg. stream mode algorithm calls).
* A row as a pandas `Series`, if the procedure returns exactly one row (eg. stats mode algorithm calls).
--
+
Some notable exceptions to this are:

** Procedures instantiating xref:graph-object.adoc[graph objects] and xref:model-object.adoc[model objects] have two return values: a graph or model object, and a row of metadata (typically a pandas `Series`) from the underlying procedure call.
** Any methods on xref:pipelines.adoc[pipeline], xref:graph-object.adoc[graph] or xref:model-object.adoc[model] objects (native to the Python client) mapping to Cypher procedures.
** `gds.version()` which returns a string.
* Method calls corresponding to Cypher functions (preceded by `RETURN` in the docs) will simply return the value the function returns.
* The Python client also contains specific functionality for inspecting graphs from the https://neo4j.com/docs/graph-data-science/current/management-ops/graph-catalog-ops/[GDS Graph Catalog], using a client-side xref:graph-object.adoc[graph object].
Similarly, models from the https://neo4j.com/docs/graph-data-science/current/model-catalog/[GDS Model Catalog] can be inspected using a client-side xref:model-object.adoc[model object].
* Cypher functions and procedures of GDS that take references to graphs and/or models as strings for input typically instead take xref:graph-object.adoc[graph objects] and/or xref:model-object.adoc[model objects] as input in the Python client API.
* To configure and use https://neo4j.com/docs/graph-data-science/current/machine-learning/machine-learning/[machine learning pipelines] in GDS, specific xref:pipelines.adoc[pipeline objects] are used in the Python client.
The `close` method is also called automatically when the `GraphDataScience` object is deleted.
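
If you prefer not to rely on object deletion, one option is to close the connections explicitly once the work is done. The sketch below uses `contextlib.closing` from the Python standard library, which calls `close` on exit even if an exception is raised. The function name `run_and_close` is made up for this illustration.

```python
from contextlib import closing


def run_and_close(gds, query):
    """Run one query, then close the connections even if it fails.

    Hypothetical helper; `gds` is assumed to be a `GraphDataScience` object.
    """
    with closing(gds):
        return gds.run_cypher(query)
```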
28 changes: 12 additions & 16 deletions doc/modules/ROOT/pages/index.adoc
@@ -1,26 +1,22 @@
= Neo4j Graph Data Science Python Client
= Overview
:description: This manual documents how to use the dedicated Python Client v{docs-version} for the Neo4j Graph Data Science library.

:toc: left
:experimental:
:sectid:
:sectlinks:
:toclevels: 2
:env-docs: true
The Neo4j Graph Data Science Python client is the official Neo4j Python library to run algorithms and machine learning pipelines on graphs.
The library wraps the https://neo4j.com/docs/python-manual/current/[Neo4j Python driver] to offer a Python interface to the link:{gds-docs-base-uri}[Neo4j Graph Data Science] (GDS) Cypher API.

You can use the client-specific graph, model, and pipeline objects from a Python program to:

To help users of https://neo4j.com/docs/graph-data-science/current/[Neo4j Graph Data Science] who work with Python as their primary language and environment, we offer the official Graph Data Science (GDS) Python Client package called `graphdatascience`.
It enables users to write pure Python code to project graphs, run algorithms, use machine learning pipelines, and train machine learning models with GDS.
To avoid naming confusion with the server-side GDS library, we will here refer to the Neo4j Graph Data Science client as the _Python client_.
* Create and manage graph projections.
* Run graph algorithms.
* Train and use machine learning pipelines.

The Python client API is designed to mimic the GDS Cypher procedure API in Python code.
It wraps and abstracts the necessary operations of the https://neo4j.com/docs/python-manual/current/[Neo4j Python driver] to offer a simpler surface.
For a high level explanation of how the Cypher API maps to the Python client API please see xref:getting-started.adoc#getting-started-mapping[Mapping between Cypher and Python].
For more details on how the Python client objects and functions map to the Cypher API, see the xref:cypher-mapping.adoc[] page.

Additionally, the client-specific graph, model, and pipeline objects offer convenient functions that heavily reduce the need to use Cypher to access and operate these GDS resources.
== Development

The source code of the GDS Python client is available at https://github.com/neo4j/graph-data-science-client[GitHub].
If you have a suggestion on how we can improve the library or want to report a problem, you can create a https://github.com/neo4j/graph-data-science-client/issues/new[new issue].
The source code of the Python client is available on https://github.com/neo4j/graph-data-science-client[GitHub].

If you have any suggestions or want to report a problem, you can create a https://github.com/neo4j/graph-data-science-client/issues/new[new issue].

// Make this depending on the backend if PDF needs to be generated
(C) {copyright}