OntoPop

Logical System Architecture

OntoPop is an open-source collection of event-driven microservices and APIs that enable consumer applications to visualise, search, explore and manage version-controlled ontologies.
Last Updated: 15 March 2022 • Page Author: Jillur Quddus

Single Logical Service

The following diagram describes the high-level logical system architecture of OntoPop when deployed as a single logical service.
OntoPop as a single logical service

Data Pipeline

The following ordered list describes the flow and transformation of data across OntoPop's integrated event-driven services, from left to right.

1. Ontology Ingestion Service

(1.1) The original W3C Web Ontology Language (OWL) ontology must exist as an OWL file managed in either a private or public Git-based version control repository. The ontology (and hence OWL file) may be completely empty, describe an upper ontology, or describe a mature well-developed ontology.
(1.2) A Git push action, where commits include changes to the specified OWL file resource, will invoke a Git webhook that is consumed by the ontology ingestion service (via a subscriber application). The ontology ingestion service will examine the commits contained within the JSON payload of the HTTP POST request and check whether the specified OWL file resource has been updated. If so, it will download the OWL file from the relevant Git repository and branch (if it is a private Git repository then the access token will be retrieved from the secrets management service) and save it to persistent object storage. It will then publish a message to the shared messaging system notifying subscribers that updates to the ontology have been made. Finally, it will insert a record into a physical SQL database containing details of the webhook event.
The end-to-end ontology ingestion service is illustrated in the following diagram.
Ontology Ingestion Service
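
The commit check in step (1.2) can be sketched as follows. This is a minimal illustration that assumes a GitHub-style push payload, where `ref` has the form `refs/heads/<branch>` and each entry in `commits` lists its `added`, `modified` and `removed` paths. Other Git hosts structure their payloads differently. The function name `owl_file_updated` and the example path are hypothetical and are not taken from the OntoPop codebase.

```python
from typing import Any


def owl_file_updated(payload: dict[str, Any], branch: str, owl_path: str) -> bool:
    """Return True if any commit in the push adds or modifies owl_path on branch."""
    # GitHub-style push payloads carry the branch as "refs/heads/<branch>".
    if payload.get("ref") != f"refs/heads/{branch}":
        return False
    for commit in payload.get("commits", []):
        # Removed paths are ignored here: there would be nothing to download.
        changed = commit.get("added", []) + commit.get("modified", [])
        if owl_path in changed:
            return True
    return False


# Hypothetical payload, trimmed to the fields the check reads.
example_payload = {
    "ref": "refs/heads/main",
    "commits": [
        {"id": "a1", "added": [], "modified": ["README.md"], "removed": []},
        {"id": "b2", "added": [], "modified": ["ontology/example.owl"], "removed": []},
    ],
}

print(owl_file_updated(example_payload, "main", "ontology/example.owl"))
```

Only when a check like this succeeds would the service go on to download the file, publish its message and record the webhook event.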

2. Ontology Validation Service

The ontology validation service, which is subscribed to the shared messaging system, will consume the message indicating that updates to the ontology have been made. It will then ingest the W3C Web Ontology Language (OWL) ontology from persistent object storage and validate it for semantic consistency, outputting true or false. It will then publish a message to the shared messaging system notifying subscribers that validated updates to the ontology have been made. The end-to-end ontology validation service is illustrated in the following diagram.
Ontology Validation Service

3. Ontology Triplestore Loading Service

The ontology triplestore loading service, which is subscribed to the shared messaging system, will consume the message indicating that validated updates to the ontology have been made. It will then ingest the W3C Web Ontology Language (OWL) ontology from persistent object storage and directly load it into a physical RDF triplestore instance. It will then publish a message to the shared messaging system notifying subscribers that validated updates to the ontology have been loaded into an RDF triplestore. The end-to-end ontology triplestore loading service is illustrated in the following diagram.
Ontology Triplestore Loading Service

4. Ontology Parsing Service

The ontology parsing service, which is subscribed to the shared messaging system, will consume the message indicating that validated updates to the ontology have been made. It will then ingest the W3C Web Ontology Language (OWL) ontology from persistent object storage and parse it into its constituent objects, including annotation properties, object properties, classes and class relationships. It will then persist these parsed objects to persistent object storage where they will be serialized as JSON objects. Finally, it will publish a message to the shared messaging system notifying subscribers that validated updates to the ontology have been parsed. The end-to-end ontology parsing service is illustrated in the following diagram.
Ontology Parsing Service
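
To make the parsing step concrete, the sketch below extracts classes and their subclass relationships from a small RDF/XML document and serializes them as JSON. It is a deliberately narrow illustration using only the Python standard library: it assumes the ontology is serialized as RDF/XML with top-level `owl:Class` elements, and it ignores annotation properties and object properties. The JSON key names (`iri`, `label`, `parentIris`) and the example ontology are hypothetical; OntoPop's actual parser and output schema may differ.

```python
import json
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical two-class ontology used only for this example.
OWL_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="http://example.org/onto#Animal">
    <rdfs:label>Animal</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/onto#Dog">
    <rdfs:label>Dog</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/onto#Animal"/>
  </owl:Class>
</rdf:RDF>
"""


def parse_classes(owl_xml: str) -> list[dict]:
    """Return one dict per top-level owl:Class element."""
    root = ET.fromstring(owl_xml)
    classes = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        label = cls.find(f"{{{RDFS}}}label")
        # Keep only named parents, i.e. subClassOf elements with rdf:resource.
        parents = [
            sub.get(f"{{{RDF}}}resource")
            for sub in cls.findall(f"{{{RDFS}}}subClassOf")
            if sub.get(f"{{{RDF}}}resource") is not None
        ]
        classes.append({
            "iri": cls.get(f"{{{RDF}}}about"),
            "label": label.text if label is not None else None,
            "parentIris": parents,
        })
    return classes


parsed = parse_classes(OWL_XML)
print(json.dumps(parsed, indent=2))
```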

5. Property Graph Modelling Service

The property graph modelling service, which is subscribed to the shared messaging system, will consume the message indicating that validated updates to the ontology have been parsed. It will then ingest the parsed JSON objects from persistent object storage and model them as directed property graph objects, specifically vertices, edges, vertex properties and edge properties. It will then persist these modelled objects to persistent object storage where they will be serialized as JSON objects. Finally, it will publish a message to the shared messaging system notifying subscribers that validated updates to the ontology have been parsed and modelled. The end-to-end property graph modelling service is illustrated in the following diagram.
Property Graph Modelling Service
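
One way to picture the modelling step is as a mapping from parsed classes to vertices and from class relationships to directed edges. The sketch below assumes a hypothetical parsed-class shape with the keys `iri`, `label` and `parentIris`, and a hypothetical vertex and edge layout. None of these names come from OntoPop itself.

```python
import json

# Hypothetical parsed classes, as they might be read from object storage.
parsed_classes = [
    {"iri": "http://example.org/onto#Animal", "label": "Animal", "parentIris": []},
    {"iri": "http://example.org/onto#Dog", "label": "Dog",
     "parentIris": ["http://example.org/onto#Animal"]},
]


def model_property_graph(classes: list[dict]) -> dict:
    """Model classes as vertices and subclass relationships as directed edges."""
    # Assign each class IRI an integer vertex ID in input order.
    vertex_ids = {cls["iri"]: i for i, cls in enumerate(classes, start=1)}
    vertices = [
        {
            "id": vertex_ids[cls["iri"]],
            "label": "class",
            # The IRI is kept as a vertex property so the vertex can be
            # linked back to the triplestore and the original OWL file.
            "properties": {"iri": cls["iri"], "label": cls["label"]},
        }
        for cls in classes
    ]
    edges = []
    for cls in classes:
        for parent_iri in cls["parentIris"]:
            if parent_iri not in vertex_ids:
                continue  # parent is not defined in this set of classes
            edges.append({
                "source": vertex_ids[cls["iri"]],
                "target": vertex_ids[parent_iri],
                "label": "subClassOf",
                "properties": {},
            })
    return {"vertices": vertices, "edges": edges}


graph = model_property_graph(parsed_classes)
print(json.dumps(graph, indent=2))
```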

6. Property Graph Loading Service

The property graph loading service, which is subscribed to the shared messaging system, will consume the message indicating that validated updates to the ontology have been parsed and modelled. It will then ingest the modelled JSON objects from persistent object storage and directly load them into a physical graph database. Finally, it will publish a message to the shared messaging system notifying subscribers that validated updates to the ontology have been parsed, modelled and loaded (into a graph database). The end-to-end property graph loading service is illustrated in the following diagram.
Property Graph Loading Service

7. Property Graph Indexing Service

The property graph indexing service, which is subscribed to the shared messaging system, will also consume the message indicating that validated updates to the ontology have been parsed and modelled. It will then ingest the modelled JSON objects from persistent object storage and index them into a physical search index. Finally, it will publish a message to the shared messaging system notifying subscribers that validated updates to the ontology have been parsed, modelled and indexed (into a search index). The end-to-end property graph indexing service is illustrated in the following diagram.
Property Graph Indexing Service

Multi-Model Serving Storage

After the event-driven data pipeline described by stages 1 to 7 above has completed, the serving storage layer will contain updated ontology data across the following data models, enabling a rich and diverse range of query patterns based on different use cases.
  • RDF Triplestore - users and consumer applications can make SPARQL queries to the RDF triplestore via the OntoPop Triplestore API. This enables semantic queries and returns linked data in the form of JSON or XML triples.
  • Graph Database - users and consumer applications can make Gremlin queries to the graph database via the OntoPop Graph API. This enables graph traversal queries and returns linked data in the form of JSON directed property graph objects (i.e. vertices, edges and paths).
  • Search Index - users and consumer applications can make search queries to the search index via the OntoPop Search API. This enables structured and unstructured free-text queries and returns data in the form of JSON indexed documents representing property graph vertices.
  • SQL Database - OntoPop supports the management of multiple ontologies concurrently. This means that OntoPop can monitor changes to W3C Web Ontology Language (OWL) ontologies across multiple Git repositories and multiple branches simultaneously. The SQL database is used to persist the details of the Git repositories and branches, and to keep a log of all webhook events that are consumed. The OntoPop Management API enables the definition of new ontologies that OntoPop should monitor for changes. Users and consumers can also request Git repository and webhook details from the SQL database via the OntoPop Management API.

API Collections

Users and consumer applications can query the different ontology data models stored in the serving storage layer through OntoPop's extensive API. OntoPop provides the following API collections for this purpose:
  • Triplestore API - enables the querying (via SPARQL) and updating of the RDF triplestore.
  • Graph API - enables the querying (via Gremlin) of the graph database.
  • Search API - enables the querying (via structured and unstructured free-text search queries) of the search index.
  • Management API - enables the creation, updating and deletion of W3C Web Ontology Language (OWL) ontologies (i.e. Git repositories, branches and resource names) that OntoPop should simultaneously monitor. The Management API also exposes an endpoint where consumer applications may upload an exported OWL file that will be programmatically pushed to the original Git repository (and relevant branch) and hence instigate the event-driven data pipeline as described above.
Version 3.x of OntoPop will support atomic real-time CRUD actions to the ontology via the OntoPop API. However, the current release of OntoPop (version 2.x) only supports updates to the ontology if (1) updates are made to the OWL file in the relevant Git repository externally, or (2) an OWL file is uploaded via the Management API, which in turn will be programmatically pushed to the relevant Git repository.
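
As an illustration of the kind of request a consumer application might send to the Triplestore API, the sketch below builds (but does not send) an HTTP POST that carries a SPARQL query. The URL is a placeholder and not a documented OntoPop route. The content types follow the W3C SPARQL 1.1 Protocol, and it is an assumption that the Triplestore API accepts requests in that form. Consult the API reference for the actual routes, authentication and request bodies.

```python
import urllib.request

# Placeholder endpoint: NOT a documented OntoPop route.
TRIPLESTORE_API_URL = "https://ontopop.example.org/triplestore/sparql"

# List up to ten classes and, where present, their labels.
SPARQL_QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label WHERE {
  ?cls a owl:Class .
  OPTIONAL { ?cls rdfs:label ?label }
}
LIMIT 10
""".strip()


def build_sparql_request(url: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol style POST request without sending it."""
    return urllib.request.Request(
        url,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(TRIPLESTORE_API_URL, SPARQL_QUERY)
print(request.get_method(), request.full_url)
```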

Applications

Downstream applications can consume OntoPop services via the OntoPop API. OntoPop provides an out-of-the-box native UI web application (under development), but any system or application that supports HTTP requests can seamlessly integrate with OntoPop.

Software Services

The following subsections describe the purpose of each of the software services required by OntoPop when deployed as a single logical service, as illustrated in the diagram above. Please refer to the Compatibility Matrix for a list of vendor-specific software services and managed services that are currently supported by OntoPop.

Git Version Control

A Git-based version control repository is used to manage the underlying ontology in the form of a W3C Web Ontology Language (OWL) file. The OWL file may be completely empty, describe an upper ontology, or describe a mature well-developed ontology. In any event, the first stage in using OntoPop is to define a new ontology for OntoPop to monitor via the Management API. The details of the Git repository managing the OWL file must be provided in the body of the HTTP POST request in order to create the new ontology to monitor, including the repository URL, a repository access token (if it is a private repository) and the specific branch and path to the OWL file. Git push actions applied to this repository will invoke a webhook whose payload will include details of all commits included in the push. The ontology ingestion service will consume and parse this payload and if one or more of the commits relate to the branch and path of the specified OWL file, then the OWL file will be downloaded from the Git repository and the remaining event-driven microservices invoked.

Graph Database

A graph database is used to persist and manage the ontology when modelled as a directed property graph. OntoPop utilizes the Apache TinkerPop framework and, as such, any graph database engine that implements the Apache TinkerPop framework is theoretically supported by OntoPop (including AWS Neptune, Azure Cosmos DB, JanusGraph and TinkerGraph). Please refer to the Compatibility Matrix for a list of specific graph database engines that have been officially tested. Gremlin queries can be executed against the graph database via the OntoPop Graph API which enables OntoPop to avoid exposing the physical graph database or Gremlin server directly.

Messaging

A messaging system is used to orchestrate the event-driven microservices, as well as to notify any external subscribers outside of OntoPop. Each OntoPop microservice (deployed as a serverless function) is subscribed to a relevant topic and will be triggered if an event is published to that topic. The payload of the event message will include the IDs of both the ontology and the webhook, so any external subscriber may consume the raw or transformed ontology at any stage of the data pipeline (e.g. the parsed ontology, or the ontology modelled as a directed property graph before loading and indexing).
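
The orchestration pattern can be sketched with a small in-memory publish/subscribe broker. This is a toy stand-in for a real messaging system: the topic names, the payload keys (`ontologyId`, `webhookEventId`) and the synchronous dispatch are all assumptions made for illustration. The subscriptions mirror the data pipeline described earlier, where two services consume the "validated" message and two consume the "modelled" message.

```python
from collections import defaultdict
from typing import Callable, Optional

Message = dict
Handler = Callable[[Message], None]


class InMemoryBroker:
    """Toy topic-based broker that calls subscribers synchronously."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        for handler in list(self._subscribers[topic]):
            handler(message)


broker = InMemoryBroker()
trace: list[str] = []  # records the order in which stages are triggered


def stage(name: str, publishes_to: Optional[str]) -> Handler:
    """Build a handler that records its name, then publishes downstream."""
    def handle(message: Message) -> None:
        trace.append(name)
        if publishes_to is not None:
            broker.publish(publishes_to, message)
    return handle


# Hypothetical topic names, wired to mirror pipeline stages 2 to 7.
broker.subscribe("ingested", stage("validator", "validated"))
broker.subscribe("validated", stage("triplestore_loader", "triplestore-loaded"))
broker.subscribe("validated", stage("parser", "parsed"))
broker.subscribe("parsed", stage("modeller", "modelled"))
broker.subscribe("modelled", stage("graph_loader", "graph-loaded"))
broker.subscribe("modelled", stage("indexer", "indexed"))

# The ingestion service would publish this after storing the OWL file.
broker.publish("ingested", {"ontologyId": 1, "webhookEventId": 42})
print(trace)
```

An external subscriber would attach to any of these topics in the same way, using the IDs in the payload to locate the corresponding data.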

Object Storage

Object storage is used to persist the raw OWL file downloaded from the Git repository, as well as to persist transformed ontology data after each stage of the data pipeline, serialized to object storage as JSON files. Each stage of the data pipeline will download, ingest and deserialize the JSON file serialized by the previous stage.

Relational Database

A relational database management system (RDBMS), or "SQL" database, is used to persist and manage the properties of the ontologies created via the Management API (with the exception of the repository access token and webhook secret, if applicable, which are stored in a secrets engine such as HashiCorp Vault). It is also used to keep an audit log of all the webhooks and associated payloads consumed by the ontology ingestion service.
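
The two roles described above, storing ontology definitions and keeping a webhook audit log, could be modelled with two tables along the following lines. The table and column names are hypothetical and SQLite is used only so that the sketch runs without a database server. Note that the access token and webhook secret are deliberately absent, since they belong in the secrets engine.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontologies (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    repo_url    TEXT NOT NULL,
    repo_branch TEXT NOT NULL,
    owl_path    TEXT NOT NULL,
    is_private  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE webhook_events (
    id          INTEGER PRIMARY KEY,
    ontology_id INTEGER NOT NULL REFERENCES ontologies(id),
    received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    payload     TEXT NOT NULL
);
""")

# Define an ontology to monitor (all values are illustrative).
ontology_id = conn.execute(
    "INSERT INTO ontologies (name, repo_url, repo_branch, owl_path, is_private) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Example", "https://git.example.org/acme/ontology.git", "main",
     "ontology/example.owl", 1),
).lastrowid

# Log a consumed webhook event together with its payload.
conn.execute(
    "INSERT INTO webhook_events (ontology_id, payload) VALUES (?, ?)",
    (ontology_id, json.dumps({"ref": "refs/heads/main", "commits": []})),
)

events = conn.execute(
    "SELECT o.name, w.payload FROM webhook_events w "
    "JOIN ontologies o ON o.id = w.ontology_id"
).fetchall()
print(events)
```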
Version 3.x of OntoPop will support seamless retrieval of previous versions of an ontology via the UI.

Search Engine

A search engine is used to persist and manage the indexed ontology when modelled as a directed property graph. It can be queried, via the Search API, to quickly and efficiently perform free-text based structured and unstructured searches for classes and properties found in the ontology (modelled as vertices and vertex properties in the search index, with IRIs preserved to enable linking with the triplestore and original OWL file). The Search API enables OntoPop to avoid exposing the physical search engine directly.

Secrets Engine

A secrets engine is used to persist the repository access token and webhook secret, if applicable, when creating a new ontology via the Management API. These secrets are then retrieved from the secrets engine and used by the ontology ingestion service to download the version-controlled OWL file from the respective private Git repository. The secrets engine is also used to persist relevant secrets referenced in the OntoPop configuration, thus avoiding secrets being stored in plaintext. These configuration secrets are retrieved from the secrets management service and injected into Spring application properties during the bootstrap context of the OntoPop Spring Boot applications.
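
This page does not describe how the webhook secret is used once it has been retrieved. A common use, following GitHub's `X-Hub-Signature-256` convention, is to verify that a webhook request really came from the Git host by recomputing an HMAC-SHA256 signature over the raw request body. The sketch below shows that check under the assumption that the Git host signs payloads in this way. The function name and example values are hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a 'sha256=<hex digest>' signature header against the raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_header.encode("utf-8"))


# Illustrative values; a real secret would come from the secrets engine.
webhook_secret = "example-secret"
raw_body = b'{"ref": "refs/heads/main", "commits": []}'
header = "sha256=" + hmac.new(
    webhook_secret.encode("utf-8"), raw_body, hashlib.sha256
).hexdigest()

print(signature_is_valid(webhook_secret, raw_body, header))
```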

Serverless Functions

The event-driven microservices are deployed as serverless functions subscribed to relevant topics managed by the shared messaging system and triggered only when an event is published to the relevant topic. Using serverless functions significantly reduces the cost and complexity of running the OntoPop data pipeline, especially given that each stage of the data pipeline has a typical total runtime of between 10 and 30 seconds.

Triplestore

An RDF triplestore is used to persist the ontology, which is loaded directly from the validated OWL file. The RDF triplestore persists the ontology as semantic triples, and provides a SPARQL endpoint where the triples may be queried and managed via HTTP. OntoPop exposes the SPARQL endpoint via its Triplestore API rather than exposing the physical RDF triplestore directly.