OntoPop

JanusGraph

Install and configure a one-node self-managed JanusGraph instance for development and testing purposes.
Last Updated: 08 February 2022 • Page Author: Jillur Quddus

Overview

JanusGraph is an open-source distributed graph database written in Java that is optimized for storing and querying graphs containing up to hundreds of billions of vertices and edges.
This page provides instructions on how to install and configure a one-node self-managed JanusGraph instance with Gremlin Server for development and testing purposes only (i.e. non-production). To configure a secure, distributed and production-ready instance of JanusGraph and Gremlin Server, please refer to the JanusGraph documentation.
The instructions below are for Ubuntu 20.04. Installation instructions for other Linux distributions are almost identical (assuming that Java is installed) as JanusGraph is a Java-based graph database.

Installation

To install JanusGraph (which comes bundled with a fully-integrated version of Gremlin Server and Gremlin Client), download the JanusGraph binary distribution (janusgraph-*.zip) from https://github.com/JanusGraph/janusgraph/releases and extract the downloaded archive into a directory of your choice. In our case, we shall extract the JanusGraph binary distribution into /opt/janusgraph/janusgraph-0.6.0.

Configuration

Backends

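To scale graph workloads across a distributed cluster, JanusGraph supports multiple storage and index backends (the latter supporting geo, numeric and full-text search), including Apache Cassandra, Apache HBase and Oracle Berkeley DB for storage, and Elasticsearch, Apache Solr and Apache Lucene for indexing.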
JanusGraph is bundled with configuration templates for various combinations of the above backends. These configuration templates may be found in the conf folder inside the JanusGraph installation directory. In our example, we shall configure a one-node instance of JanusGraph that uses the Oracle Berkeley DB storage backend and the Elasticsearch index backend respectively.
To configure JanusGraph to use Oracle Berkeley DB and an existing Elasticsearch cluster, please edit conf/janusgraph-berkeleyje-es.properties as follows:
# Gremlin graph implementation
gremlin.graph=org.janusgraph.core.JanusGraphFactory
# Storage backend
storage.backend=berkeleyje
# Storage directory
storage.directory=/opt/janusgraph/janusgraph-0.6.0/db/berkeley
# Index backend
index.search.backend=elasticsearch
# Index hostname
index.search.hostname=127.0.0.1
In the example configuration above, we configure JanusGraph to use the Oracle Berkeley DB storage backend via the storage.backend property. We then define the absolute path to the directory in which the Oracle Berkeley DB data should be persisted via the storage.directory property. Next, we configure JanusGraph to use an existing Elasticsearch cluster for the mixed index backend via the index.search.backend property. Finally, we provide the Elasticsearch hostname via the index.search.hostname property.
For the full list of available JanusGraph configuration properties, please visit https://docs.janusgraph.org/configs/configuration-reference.
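Before starting the server, it can be useful to check that the properties file actually contains the keys discussed above. The following is a minimal sketch only: the parser handles plain key=value lines and # or ! comments (not the full Java properties syntax), and the REQUIRED_KEYS set is an assumption limited to the five properties used on this page.

```python
# Hypothetical helper, not part of JanusGraph: a simplified .properties check.
REQUIRED_KEYS = {
    "gremlin.graph",
    "storage.backend",
    "storage.directory",
    "index.search.backend",
    "index.search.hostname",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in sorted order."""
    return sorted(key for key in required if not props.get(key))


# An intentionally incomplete example to show what the check reports.
example = """
# Storage backend
storage.backend=berkeleyje
storage.directory=/opt/janusgraph/janusgraph-0.6.0/db/berkeley
index.search.backend=elasticsearch
"""
print(missing_keys(parse_properties(example)))
# prints ['gremlin.graph', 'index.search.hostname']
```

To check a real file, pass its contents to parse_properties, for example the text read from conf/janusgraph-berkeleyje-es.properties.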

Gremlin Server

JanusGraph implements the Apache TinkerPop framework, meaning that there are multiple methods by which to connect to the JanusGraph graph database. These include embedding JanusGraph as a library inside a Java application, or starting a JanusGraph server (which is a customized version of Gremlin server) instance to proxy requests (via HTTP or websockets) to JanusGraph.
In our example, we shall configure and start a Gremlin server instance that will accept Gremlin-based queries via HTTP or websockets. To configure Gremlin server given an Oracle Berkeley DB storage backend and an existing Elasticsearch mixed index backend, please edit conf/gremlin-server/gremlin-server-berkeleyje-es.yaml as follows:
host: localhost
port: 8182
evaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
graphs: {
  graph: conf/janusgraph-berkeleyje-es.properties
}
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator,
  authenticationHandler: org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler,
  config: {
    defaultUsername: "guest",
    defaultPassword: "password123",
    hmacSecret: "secret",
    credentialsDb: conf/janusgraph-gremlin-server-credentials.properties
  }
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
The following subsections describe key properties in the example Gremlin server configuration found above.

Networking

The host property should be set to the network interface that Gremlin server will listen on for requests (for example localhost), at the port number specified in the port property (the default port number is 8182).

Channelizer

Setting the channelizer property to org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer configures Gremlin server to accept both websocket and HTTP connections over the same port.

Graph

The graph property should be set as the path to the configuration file defining the JanusGraph index and storage backends (amongst other properties). In our case this is conf/janusgraph-berkeleyje-es.properties that we configured earlier. This may be set as an absolute path, or a path relative to the JanusGraph installation directory.

Authentication

The authentication namespace enables us to configure a username and password for authenticating requests using SASL (for websockets) and Basic Authentication (for HTTP).
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator,
  authenticationHandler: org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler,
  config: {
    defaultUsername: "guest",
    defaultPassword: "password123",
    hmacSecret: "secret",
    credentialsDb: conf/janusgraph-gremlin-server-credentials.properties
  }
}
Given that we have configured the WsAndHttpChannelizer channelizer to accept both websocket and HTTP connections over the same port, we should use the org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator authenticator and the org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler authentication handler to enable authentication through both websockets (using SASL) and HTTP (using Basic Authentication).
To define the credentials to be used for authenticating requests, we must first create a separate and dedicated graph for storing and managing these credentials. In our example, we have created a new configuration file at conf/janusgraph-gremlin-server-credentials.properties with the following contents:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=/opt/janusgraph/janusgraph-0.6.0/db/credentials/berkeley
In the configuration above, we set the storage backend for our separate and dedicated credentials graph as Oracle Berkeley DB and provide a path to the directory which will hold our credentials graph data. Using this properties file we can use Gremlin console (which is bundled with JanusGraph) to open the credentials graph and create a new user vertex (with username and password properties), as follows:
# Navigate to the JanusGraph installation folder and open Gremlin console
$ ./bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
# Open the credentials graph and create a new user vertex
gremlin> :plugin use tinkerpop.credentials
gremlin> graph = JanusGraphFactory.open("conf/janusgraph-gremlin-server-credentials.properties")
gremlin> credentials = graph.traversal(CredentialTraversalSource.class)
gremlin> credentials.user("guest","password123")
# Commit the transaction so that the new user vertex is persisted
gremlin> graph.tx().commit()
# Check that the new user vertex has been successfully created
gremlin> credentials.users("guest").elementMap()
gremlin> credentials.users().count()
gremlin> :exit
In the example above, we have created a new user vertex in our credentials graph with the username guest and the password password123. Finally we reference this new user in the authentication.config namespace in our Gremlin server configuration as follows:
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator,
  authenticationHandler: org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler,
  config: {
    defaultUsername: "guest",
    defaultPassword: "password123",
    hmacSecret: "secret",
    credentialsDb: conf/janusgraph-gremlin-server-credentials.properties
  }
}
With this configuration, all client requests to Gremlin server (whether via websockets or HTTP) must supply these credentials in order to be successfully authenticated.

Starting Gremlin Server

To start the JanusGraph customized version of Gremlin server using our Gremlin server configuration file (see above), we can run the following command via the command line:
# Start Gremlin server and provide our Gremlin server configuration file
$ ./bin/janusgraph-server.sh console ./conf/gremlin-server/gremlin-server-berkeleyje-es.yaml
After starting Gremlin server with this configuration, the server will listen for requests at http://localhost:8182.
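When the server is started from a script, it can help to wait until the port is accepting connections before sending queries. The following is a minimal sketch under stated assumptions: wait_for_port is a hypothetical helper (not part of JanusGraph or TinkerPop), and a successful TCP connection only shows that something is listening on the port, not that authentication is configured correctly.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0,
                  connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds within `timeout` seconds.
    `connect` and `sleep` are injectable so the loop can be exercised offline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            connection = connect((host, port), timeout=interval)
            connection.close()
            return True
        except OSError:
            # Covers refused connections and timeouts while the server starts.
            if time.monotonic() >= deadline:
                return False
            sleep(interval)
```

With the configuration used on this page, the call would be wait_for_port("localhost", 8182).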

Testing Gremlin Server

We can use curl to send an HTTP POST request (with a JSON body containing a Gremlin query) to Gremlin server to test that it is online and that authentication has been properly configured, as follows:
# Test that Gremlin server is accepting authorised requests
# Make a request with no credentials
# This should respond with an HTTP 401 Unauthorized response status
$ curl -v -X POST http://localhost:8182 -d '{"gremlin": "g.V().count()"}'
# Make a request with the relevant credentials using Basic Auth
# This should respond with an HTTP 200 OK response status
$ curl -v -X POST http://localhost:8182 -d '{"gremlin": "g.V().count()"}' -u guest:password123
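The same two requests can be made from application code. The following is a minimal sketch using only the Python standard library; build_gremlin_request and send are hypothetical helper names, and the URL and credentials are the example values from this page.

```python
import base64
import json
import urllib.error
import urllib.request


def build_gremlin_request(url, query, username=None, password=None):
    """Build a POST request carrying a Gremlin query as a JSON body."""
    body = json.dumps({"gremlin": query}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if username is not None:
        # Basic Authentication: base64("username:password")
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def send(request):
    """Send the request and return (status, body).

    HTTP error statuses such as 401 are returned rather than raised.
    """
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        return error.code, error.read().decode("utf-8")


# Building a request does not contact the server.
request = build_gremlin_request(
    "http://localhost:8182", "g.V().count()", "guest", "password123"
)
print(request.get_method(), request.full_url)
```

Against a running server configured as above, send(request) should return status 200, while the same request built without a username and password should return 401.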

OntoPop Context

To configure OntoPop to use JanusGraph, please configure the storage.graph namespace in application.yml as follows:
| Property | Description | Example Value |
| --- | --- | --- |
| service | The name of the specific graph database implementation or graph database general communication protocol used by OntoPop. For JanusGraph, please use one of the following values for this property: janusgraph-ws (JanusGraph via websockets), janusgraph-driver (JanusGraph Gremlin Server via remote connection driver) or janusgraph-http (JanusGraph Gremlin Server via HTTP). | janusgraph-http |
| gremlin-server.url | If using gremlin-server-http or janusgraph-http, the URL to the Gremlin server. Note that the URL should be set as an externalized variable and NOT stored as plaintext in application.yml. | http://localhost:8182 |
| gremlin-server.host | If using gremlin-server-ws, gremlin-server-driver, janusgraph-ws, janusgraph-driver or azure-cosmosdb, the hostname of the Gremlin server. Note that the hostname should be set as an externalized variable and NOT stored as plaintext in application.yml. | localhost |
| gremlin-server.port | If using gremlin-server-ws, gremlin-server-driver, janusgraph-ws, janusgraph-driver or azure-cosmosdb, the Gremlin server port number. Note that the port number should be set as an externalized variable and NOT stored as plaintext in application.yml. | 8182 |
| gremlin-server.username | If the Gremlin server requires authentication, then the username to use as part of basic authentication requests to the Gremlin server. Note that the username should be set as an externalized variable and NOT stored as plaintext in application.yml. | guest |
| gremlin-server.password | If the Gremlin server requires authentication, then the password to use as part of basic authentication requests to the Gremlin server. Note that the password should be set as an externalized variable and NOT stored as plaintext in application.yml. | password123 |
| gremlin-server.enableSsl | If the Gremlin server is configured with SSL enabled, then Gremlin clients must also be SSL enabled via this property. | false |
| gremlin-server.bulkExecutor.rateLimiter.enabled | If using gremlin-server-http, gremlin-server-driver, janusgraph-http, janusgraph-driver or azure-cosmosdb, whether to enable a rate limiter when performing bulk actions such as bulk deletion or bulk insertion of vertices and/or edges. This property is useful particularly when using Azure Cosmos DB so as to not exceed the RU/s for manually provisioned throughput instances. | true |
| gremlin-server.bulkExecutor.rateLimiter.actionsPerSecond | If the rate limiter is enabled, the maximum number of bulk actions to perform per second. | 100 |
| gremlin-server.bulkExecutor.rateLimiter.maximumAttempts | If the rate limiter is enabled and the manually provisioned throughput is still exceeded, the maximum number of times to retry the bulk actions before throwing an exception. | 10 |
Note that the remaining properties in the storage.graph namespace (for example engine.supportsSchema, engine.supportsTraversals.by, or gremlin-server.serializer.className) do not need to be set. When using specific graph database implementations (such as JanusGraph), these properties are already set automatically by OntoPop and any value entered in application.yml for these properties will be overridden.
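Which connection properties apply depends on the value of the service property. The following is a minimal sketch that encodes only the conditions stated in the property descriptions above as a lookup; connection_properties is a hypothetical helper for illustration and is not part of OntoPop.

```python
# Service values taken from the property descriptions on this page.
URL_SERVICES = {"gremlin-server-http", "janusgraph-http"}
HOST_PORT_SERVICES = {
    "gremlin-server-ws",
    "gremlin-server-driver",
    "janusgraph-ws",
    "janusgraph-driver",
    "azure-cosmosdb",
}


def connection_properties(service):
    """Return the gremlin-server.* connection properties that apply to a service value."""
    if service in URL_SERVICES:
        return ["gremlin-server.url"]
    if service in HOST_PORT_SERVICES:
        return ["gremlin-server.host", "gremlin-server.port"]
    raise ValueError(f"service value not covered on this page: {service!r}")


for service in ("janusgraph-http", "janusgraph-ws", "janusgraph-driver"):
    print(service, "->", connection_properties(service))
```

For the janusgraph-http example value used on this page, only gremlin-server.url is needed to locate the server; the username and password properties apply whenever the server requires authentication.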
For further information regarding JanusGraph, please visit https://docs.janusgraph.org. For further information regarding configuring JanusGraph server (i.e. the JanusGraph customized version of Gremlin server), please visit https://docs.janusgraph.org/operations/server.