AWS Lambda

Provision AWS Lambda instances via the AWS Management Console and deploy the OntoPop event-driven data pipeline applications to them.

Please note that the OntoPop backend open-source software project, which includes the event-driven data pipelines and APIs, is undergoing extensive redesign and refactoring as part of OntoPop Community 3.x in order to improve performance, security, extensibility and maintainability. As a result, the documentation on this page will be significantly updated. Please refer to the OntoPop Roadmap for further information.

Overview

AWS Lambda is the native serverless, event-driven compute service offered by the AWS cloud computing platform, enabling applications and backend services to be run without provisioning or managing any servers. This page provides instructions on how to provision AWS Lambda instances and then deploy the OntoPop event-driven data pipeline Spring Boot applications to them.

For further information regarding AWS Lambda, please visit https://aws.amazon.com/lambda.

It is recommended that you configure and integrate the steps described in this page into a CI/CD pipeline in order to automate the build, testing and deployment stages.

Data Pipeline

OntoPop provides AWS Lambda Spring Boot application deployments that wrap around each of the event-driven microservices described in the logical system architecture. These AWS Lambda applications are provided out-of-the-box to enable quick and easy deployment to AWS Lambda instances. Assuming that you have followed the instructions detailed in Build from Source, the AWS Lambda Spring Boot applications for each of the event-driven microservices that make up the OntoPop data pipeline may be found in the $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws Maven module, which itself contains the following child modules pertinent to the data pipeline:

  1. ontopop-aws-lambda-app-subscriber-github-webhook - Node.js application that subscribes to GitHub webhooks and invokes the ontology ingestion service lambda directly via the AWS SDK. Note that this is a Node.js application (i.e. not a Spring Boot application) because GitHub webhook requests time out after 10 seconds, after which the HTTP connection is closed and the webhook payload is lost. Deploying this lightweight Node.js application, which returns a promise (i.e. an immediate response back to GitHub), avoids the longer cold start-up times incurred by Java-based applications.
  2. ontopop-aws-lambda-app-data-ontology-ingestor - AWS Lambda Spring Boot application deployment wrapper around the ontology ingestion service, invoked by the ontopop-aws-lambda-app-subscriber-github-webhook application directly via the AWS SDK.
  3. ontopop-aws-lambda-app-data-ontology-validator - AWS Lambda Spring Boot application deployment wrapper around the ontology validation service, invoked via its subscription to the shared messaging system.
  4. ontopop-aws-lambda-app-data-ontology-loader-triplestore - AWS Lambda Spring Boot application deployment wrapper around the ontology triplestore loading service, invoked via its subscription to the shared messaging system.
  5. ontopop-aws-lambda-app-data-ontology-parser - AWS Lambda Spring Boot application deployment wrapper around the ontology parsing service, invoked via its subscription to the shared messaging system.
  6. ontopop-aws-lambda-app-data-ontology-modeller-graph - AWS Lambda Spring Boot application deployment wrapper around the property graph modelling service, invoked via its subscription to the shared messaging system.
  7. ontopop-aws-lambda-app-data-ontology-loader-graph - AWS Lambda Spring Boot application deployment wrapper around the property graph loading service, invoked via its subscription to the shared messaging system.
  8. ontopop-aws-lambda-app-data-ontology-indexer-graph - AWS Lambda Spring Boot application deployment wrapper around the property graph indexing service, invoked via its subscription to the shared messaging system.
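
Assuming that you have already built the project from source (see the Setup section below), the deployable artifacts produced by these modules can be listed as follows. This is a minimal sketch that assumes the default Maven target directories and the *-aws.jar artifact naming convention used later on this page:

# List the packaged AWS Lambda deployment artifacts (assumed naming convention)
$ ls $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/*/target/*-aws.jar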

Setup

Build from Source

In order to compile and build the OntoPop event-driven data pipeline AWS Lambda Spring Boot applications in preparation for deployment to AWS Lambda instances, please follow the instructions detailed in Build from Source.
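
If you have not already done so, a typical Maven build from the repository root should produce the packaged artifacts. The following is a hedged sketch assuming a standard Maven build; the exact profiles and flags are described in Build from Source:

# Build and package the OntoPop applications (assumed standard Maven build)
$ cd $ONTOPOP_BASE
$ mvn clean package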

AWS CLI

We shall use the AWS Command Line Interface (CLI) to deploy the OntoPop Java artifacts (i.e. OntoPop's data pipeline AWS Lambda Spring Boot applications packaged as JAR files) that were created in the Build from Source stage above to AWS Lambda instances. To install the AWS CLI, please follow the instructions below:

The instructions below are for Ubuntu 20.04. Installation instructions for other Linux distributions and other operating systems such as Windows may be found at https://aws.amazon.com/cli.

# Install the required dependencies
$ sudo apt-get update
$ sudo apt-get install -y unzip groff less

# Install the AWS CLI from a ZIP file
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

Assuming that the AWS CLI has installed successfully, we can configure it with the Access Key ID and Secret Access Key of an IAM user with privileges to programmatically manage AWS Lambda instances (such as an IAM user provisioned with the AWSLambda_FullAccess AWS managed policy, or similar) as follows:

# Configure the AWS CLI
$ aws configure

    AWS Access Key ID [None]: AKIA123456789
    AWS Secret Access Key [None]: abcdefg987654321hijklmnop
    Default region name [None]: eu-west-2
    Default output format [None]: json
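
Assuming the configuration above succeeded, the following AWS CLI commands can be used to verify the credentials and connectivity before proceeding:

# Confirm the configured IAM identity
$ aws sts get-caller-identity

# Confirm that the AWS Lambda service is reachable with these credentials
$ aws lambda list-functions --max-items 5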

AWS Lambda

We shall use the AWS Management Console to provision AWS Lambda instances. To do so, navigate to the AWS Lambda service via the AWS Management Console, select "Create function" and follow the instructions below:

  1. Function Name - enter a custom function name that describes the purpose of the function, for example ontology-ingestor-service.
  2. Runtime - with the exception of the GitHub Webhook Subscriber (which is a Node.js application), all other OntoPop event-driven data pipeline functions are Java Spring Boot applications. Thus if you are deploying ontopop-aws-lambda-app-subscriber-github-webhook, then please select "Node.js 14.x". Otherwise please select "Java 11 (Corretto)" for all other deployments.

Once configured, select "Create function" to create the new AWS Lambda instance.

Amazon MQ Trigger

With the exception of ontopop-aws-lambda-app-data-ontology-ingestor (which is invoked directly by ontopop-aws-lambda-app-subscriber-github-webhook via the AWS SDK), all other OntoPop event-driven data pipeline AWS Lambda Spring Boot applications are invoked via their subscription to the shared messaging system. Assuming that you are integrating OntoPop with the Amazon MQ (RabbitMQ) managed broker, we need to add an Amazon MQ trigger to our AWS Lambda instances. To do this, navigate to the AWS Lambda service via the AWS Management Console, select the relevant AWS Lambda instance, select "Add trigger", choose "MQ" as the trigger type and enter the following properties:

  • Amazon MQ Broker - select an Amazon MQ (RabbitMQ) broker. For details on provisioning an Amazon MQ (RabbitMQ) message broker for integration with OntoPop, please see Amazon MQ.
  • Batch Size - enter the maximum number of messages to retrieve in a single batch (for example 1).
  • Batch Window - enter the maximum amount of time (in seconds) to gather records before invoking the function (for example 5 seconds).
  • Queue Name - enter the name of the Amazon MQ (RabbitMQ) broker destination queue that this AWS Lambda should subscribe to and consume. Note that the name you enter here must equal the name of the destination (topic) and group (queue) binding defined in the OntoPop application context. For example, if we are deploying the ontopop-aws-lambda-app-data-ontology-validator AWS Lambda Spring Boot application, which binds to the ingestedConsumptionChannel (see the spring.cloud.stream.bindings namespace in the OntoPop application context), then we would enter ontopop.data.ingested.ontopop as the fully qualified queue name (i.e. the topic name of ontopop.data.ingested combined with the queue name of ontopop).
  • Source Access Secret - as described in Amazon MQ, a secret must be defined in AWS Secrets Manager containing the RabbitMQ broker credentials (i.e. username and password as key-value pairs) in order for the AWS Lambda instance to connect and subscribe to messages. For this property, select the name of the secret containing the RabbitMQ broker credentials (for example MQaccess).

Once configured, select "Add" to add the Amazon MQ trigger to the AWS Lambda instance. Finally we need to provision the AWS Lambda instance permission to subscribe to and read the Amazon MQ (RabbitMQ) message broker queue. To do this, navigate to the AWS Lambda service via the AWS Management Console, select Configuration > Permissions and select the execution role name (for example ontology-validator-service-role-abc123). This will take you to the IAM Management Console for this role. Select Add permissions > Attach policies and attach the AmazonMQReadOnlyAccess AWS managed policy (or equivalent custom policy) to this role. Everything is now setup so that every time a message is published to the relevant queue, the subscribing AWS Lambda instance will be invoked.

Deployment

GitHub Subscriber

In the following instructions we detail how the GitHub webhook subscriber Node.js application can be deployed to an AWS Lambda instance.

  1. Create a new (empty) AWS Lambda instance configured with the Node.js 14.x runtime via the AWS Management Console as detailed above. We shall call this AWS Lambda instance github-webhook-subscriber for the purposes of these instructions. Once created, open this new AWS Lambda instance via the AWS Management Console, navigate to Configuration > General configuration, set its memory to 128 MB and set its timeout to 1 min 0 sec.
  2. To invoke the ontology ingestion service directly from the GitHub webhook subscriber AWS Lambda using the AWS SDK, set the name of the ontology ingestion service AWS Lambda, for example ontology-ingestor-service, as an environment variable named ONTOPOP_ONTOLOGY_INGESTOR_FUNCTION_NAME. This can be done by navigating to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console, then selecting Configuration > Environment variables, as illustrated in the following screenshot:
[Screenshot: GitHub webhook subscriber environment variables]
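
The general configuration and environment variable described in steps 1 and 2 can also be applied via the AWS CLI. This is a hedged sketch assuming the function and ingestion service names used in these instructions:

# Set the memory, timeout and environment variables for the GitHub webhook subscriber
$ aws lambda update-function-configuration \
    --function-name github-webhook-subscriber \
    --memory-size 128 \
    --timeout 60 \
    --environment "Variables={ONTOPOP_ONTOLOGY_INGESTOR_FUNCTION_NAME=ontology-ingestor-service}"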
  3. Next we need to grant the relevant permission enabling the GitHub webhook subscriber AWS Lambda to directly invoke the ontology ingestion service AWS Lambda. To do this, navigate to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console, select Configuration > Permissions, then select the execution role name (for example github-webhook-subscriber-role-abc123). This will take you to the IAM Management Console for this role. Select Add permissions > Attach policies and attach the AWSLambdaRole AWS managed policy (or equivalent custom policy), as illustrated in the following screenshot. The GitHub webhook subscriber AWS Lambda now has permission to programmatically and directly invoke the ontology ingestion service AWS Lambda.
[Screenshot: GitHub webhook subscriber role permissions]
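
The same policy attachment can be performed from the command line. This is a hedged sketch; the execution role name is the example from step 3, and you should verify the exact managed policy ARN in the IAM console:

# Attach the AWSLambdaRole managed policy to the function's execution role (example role name)
$ aws iam attach-role-policy \
    --role-name github-webhook-subscriber-role-abc123 \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaRole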
  4. We are now ready to deploy the GitHub webhook subscriber Node.js application code contained in the $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-subscriber-github-webhook project to this AWS Lambda instance. Assuming that you have followed the instructions detailed in the Setup section above, navigate to $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-subscriber-github-webhook and execute the following commands via your command line:
# Navigate to the relevant project folder
$ cd $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-subscriber-github-webhook

# Package the index.js file into a ZIP archive file
$ zip function.zip index.js

# Use the AWS CLI to deploy the ZIP file to the relevant AWS Lambda instance
$ aws lambda update-function-code --function-name github-webhook-subscriber --zip-file fileb://function.zip
  5. Now that we have uploaded the application code to the GitHub webhook subscriber AWS Lambda instance, we need to make it publicly accessible via HTTP. To do this, navigate to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console, select "Add trigger" and then select "API Gateway". Configure API Gateway accordingly by creating a new HTTP API and a new custom HTTP POST route (for example /subscribers/github), and then integrate this new HTTP POST route with the GitHub webhook subscriber AWS Lambda instance. The GitHub webhook subscriber AWS Lambda is now publicly accessible via HTTP. To identify its HTTPS endpoint, navigate to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console and select Configuration > Triggers (press the refresh button if required). The HTTPS endpoint will look similar to https://abcde12345.execute-api.eu-west-2.amazonaws.com/subscribers/github (if you have configured a custom domain name in API Gateway, then the custom domain name can be used instead of the AWS hostname).

  6. Next we need to configure a webhook in the relevant GitHub repository and set its payload URL to the API Gateway HTTPS endpoint noted above. To do this, navigate to the relevant GitHub repository in a web browser, select Settings > Webhooks > Add webhook and enter the following properties:

  • Payload URL - the public URL of the GitHub webhook subscriber AWS Lambda instance (as noted above in step 5), for example https://abcde12345.execute-api.eu-west-2.amazonaws.com/subscribers/github.
  • Content type - the webhook media type. This should be set to application/json.
  • Secret - a custom string that will be used by OntoPop to validate GitHub webhook payloads (for example mysecret123). Please visit Securing your webhooks for further information. Please make a note of the secret token that you create, as it will be required when creating a new ontology for OntoPop to monitor via the OntoPop Management API.
  • SSL verification - whether to verify SSL certificates when delivering payloads. This should be set to "Enable SSL verification".
  • Which events would you like to trigger this webhook? - the event type that will trigger the GitHub webhook. This should be set to "Just the push event".

The following screenshot provides an example GitHub webhook configuration integrated with a GitHub webhook subscriber AWS Lambda instance:

[Screenshot: GitHub webhook configuration]
  1. Press "Add webhook". Now every time a push event occurs in the relevant GitHub repository, this webhook will be triggered and a HTTP POST request made to the public URL of the GitHub webhook subscriber AWS Lambda instance.
Function Apps

In the following instructions we use the ontopop-aws-lambda-app-data-ontology-validator child Maven module as an example with which to demonstrate how to deploy OntoPop's event-driven data pipeline Spring Boot applications to AWS Lambda instances. However, these instructions apply equally to any of the data pipeline Spring Boot applications listed in the Data Pipeline section above.

  1. Create a new (empty) AWS Lambda instance configured with the Java 11 (Corretto) runtime via the AWS Management Console as detailed in the Setup section above. We shall call this AWS Lambda instance ontology-validator-service for the purposes of these instructions. Once created, open this new AWS Lambda instance via the AWS Management Console, navigate to Configuration > General configuration, set its memory to 1024 MB and set its timeout to 5 min 0 sec.
  2. Since we are deploying a Java Spring Boot application that utilizes the Spring Cloud Function project, we need to configure the AWS Lambda instance with the fully qualified name of the main Java class to invoke, as well as the name of the Java function that will be executed. To do this, open the AWS Lambda instance via the AWS Management Console, navigate to Configuration > Environment variables and set the following environment variables depending on which OntoPop event-driven data pipeline service you are deploying. The main class should be set in an environment variable named MAIN_CLASS, and the function name should be set in an environment variable named spring_cloud_function_definition. A hedged AWS CLI sketch follows the table below.
  • ontopop-aws-lambda-app-data-ontology-ingestor - Main Class (MAIN_CLASS): ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.ingestor.OntologyIngestorAwsLambdaApp; Function Name (spring_cloud_function_definition): ontologyIngestorAwsLambdaApiGatewayProxyRequestEventConsumer
  • ontopop-aws-lambda-app-data-ontology-validator - Main Class (MAIN_CLASS): ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.validator.OntologyValidatorAwsLambdaApp; Function Name (spring_cloud_function_definition): ontologyValidatorAwsLambdaAmazonMqMessageConsumer
  • ontopop-aws-lambda-app-data-ontology-loader-triplestore - Main Class (MAIN_CLASS): ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.loader.triplestore.OntologyTriplestoreLoaderAwsLambdaApp; Function Name (spring_cloud_function_definition): ontologyTriplestoreLoaderAwsLambdaAmazonMqMessageConsumer
  • ontopop-aws-lambda-app-data-ontology-parser - Main Class (MAIN_CLASS): ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.parser.OntologyParserAwsLambdaApp; Function Name (spring_cloud_function_definition): ontologyParserAwsLambdaAmazonMqMessageConsumer
  • ontopop-aws-lambda-app-data-ontology-modeller-graph - Main Class (MAIN_CLASS): ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.modeller.graph.OntologyGraphModellerAwsLambdaApp; Function Name (spring_cloud_function_definition): ontologyGraphModellerAwsLambdaAmazonMqConsumer
  • ontopop-aws-lambda-app-data-ontology-loader-graph - Main Class (MAIN_CLASS): ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.loader.graph.OntologyGraphLoaderAwsLambdaApp; Function Name (spring_cloud_function_definition): ontologyGraphLoaderAwsLambdaAmazonMqMessageConsumer
  • ontopop-aws-lambda-app-data-ontology-indexer-graph - Main Class (MAIN_CLASS): ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.indexer.graph.OntologyGraphIndexerAwsLambdaApp; Function Name (spring_cloud_function_definition): ontologyGraphIndexerAwsLambdaAmazonMqMessageConsumer
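
The general configuration and the two environment variables can also be set via the AWS CLI. The following is a hedged sketch for the ontology validation service example used throughout this section, using the values from the table above:

# Set the memory, timeout and Spring Cloud Function environment variables (validator example)
$ aws lambda update-function-configuration \
    --function-name ontology-validator-service \
    --memory-size 1024 \
    --timeout 300 \
    --environment "Variables={MAIN_CLASS=ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.validator.OntologyValidatorAwsLambdaApp,spring_cloud_function_definition=ontologyValidatorAwsLambdaAmazonMqMessageConsumer}"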
  3. Next we need to configure the AWS Lambda instance with the fully qualified class name and method of the function handler. To do this, open the AWS Lambda instance via the AWS Management Console, navigate to Code and select the "Edit" button belonging to the "Runtime settings" section. In the "Handler" box enter org.springframework.cloud.function.adapter.aws.FunctionInvoker::handleRequest, and then press "Save".
  4. Assuming that you are integrating OntoPop with AWS Secrets Manager, we need to grant the AWS Lambda instance permission to read secrets managed by AWS Secrets Manager. To do this, open the AWS Lambda instance via the AWS Management Console, navigate to Configuration > Permissions and select the execution role name (for example ontology-validator-service-role-abc123). This will take you to the IAM Management Console for this role. Select Add permissions > Attach policies and attach the SecretsManagerReadWrite AWS managed policy (or equivalent custom policy) to this role. Now when the AWS Lambda instance is invoked, externalized sensitive properties defined in the OntoPop application context will be loaded from AWS Secrets Manager. A hedged CLI sketch covering steps 3 and 4 follows the queue name table below.
  5. With the exception of ontopop-aws-lambda-app-data-ontology-ingestor, all the other OntoPop event-driven data pipeline AWS Lambda Spring Boot applications are triggered by the publication of a message to the relevant Amazon MQ (RabbitMQ) queue. Please configure an Amazon MQ trigger for the AWS Lambda instance as described in the Setup section above, where the default queue name to subscribe to depends on which data pipeline service is being deployed, as follows:
  • ontopop-aws-lambda-app-data-ontology-validator - Default Queue Name: ontopop.data.ingested.ontopop; Application Context Binding: ingestedConsumptionChannel
  • ontopop-aws-lambda-app-data-ontology-loader-triplestore - Default Queue Name: ontopop.data.validated.ontopop.loaders.triplestore; Application Context Binding: validatedTriplestoreLoaderConsumptionChannel
  • ontopop-aws-lambda-app-data-ontology-parser - Default Queue Name: ontopop.data.validated.ontopop.parsers; Application Context Binding: validatedParserConsumptionChannel
  • ontopop-aws-lambda-app-data-ontology-modeller-graph - Default Queue Name: ontopop.data.parsed.ontopop; Application Context Binding: parsedConsumptionChannel
  • ontopop-aws-lambda-app-data-ontology-loader-graph - Default Queue Name: ontopop.data.modelled.ontopop.loaders.graph; Application Context Binding: modelledGraphLoaderConsumptionChannel
  • ontopop-aws-lambda-app-data-ontology-indexer-graph - Default Queue Name: ontopop.data.modelled.ontopop.indexers.graph; Application Context Binding: modelledGraphIndexerConsumptionChannel
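
Steps 3 and 4 can likewise be applied from the command line. This is a hedged sketch; the role name is the example from step 4, and you should verify the exact managed policy ARN in the IAM console:

# Set the Spring Cloud Function AWS adapter as the runtime handler (step 3)
$ aws lambda update-function-configuration \
    --function-name ontology-validator-service \
    --handler org.springframework.cloud.function.adapter.aws.FunctionInvoker::handleRequest

# Attach the SecretsManagerReadWrite managed policy to the execution role (step 4)
$ aws iam attach-role-policy \
    --role-name ontology-validator-service-role-abc123 \
    --policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite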
  6. We are now ready to deploy the packaged Java Spring Boot application artifact to the AWS Lambda instance. Assuming that you have followed the instructions detailed in the Setup section above, navigate to $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-data-ontology-validator (in our example) and execute the following commands via your command line:
# Navigate to the relevant project folder
$ cd $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-data-ontology-validator/target

# Upload the packaged JAR file to an Amazon S3 bucket
$ aws s3 cp ontopop-aws-lambda-app-data-ontology-validator-2.0.0-aws.jar s3://ontopop-apps

# Deploy the function code from Amazon S3 to the relevant AWS Lambda instance
$ aws lambda update-function-code --function-name ontology-validator-service --s3-bucket ontopop-apps --s3-key ontopop-aws-lambda-app-data-ontology-validator-2.0.0-aws.jar

Now every time a message is published to the relevant Amazon MQ (RabbitMQ) queue, the respective AWS Lambda instance that is subscribed to it will be invoked and will proceed to execute its respective stage of the OntoPop data pipeline (with the exception of ontopop-aws-lambda-app-data-ontology-ingestor which is invoked directly by the GitHub webhook subscriber AWS Lambda).
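
To verify an end-to-end run, the function's CloudWatch logs can be followed while a message flows through the pipeline. This is a hedged sketch assuming the default /aws/lambda/<function name> log group naming:

# Follow the CloudWatch logs for the ontology validation service (example function name)
$ aws logs tail /aws/lambda/ontology-validator-service --follow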