AWS Lambda
Provision an AWS Lambda instance via the AWS Management Console and deploy an OntoPop event-driven data pipeline app to it.
Please note that the OntoPop backend open-source software project, which includes the event-driven data pipelines and APIs, is undergoing extensive redesign and refactoring as part of OntoPop Community 3.x in order to improve performance, security, extensibility and maintainability. As a result, the documentation on this page will be significantly updated. Please refer to the OntoPop Roadmap for further information.
Overview
AWS Lambda is the native serverless, event-driven compute service offered by the AWS cloud computing platform, enabling applications and backend services to be run without provisioning or managing any servers. This page provides instructions on how to provision AWS Lambda instances and then deploy the OntoPop event-driven data pipeline Spring Boot applications to them.
For further information regarding AWS Lambda, please visit https://aws.amazon.com/lambda.
It is recommended that you configure and integrate the steps described in this page into a CI/CD pipeline in order to automate the build, testing and deployment stages.
Data Pipeline
OntoPop provides AWS Lambda Spring Boot application deployments that wrap around each of the event-driven microservices described in the logical system architecture. These AWS Lambda applications are provided out-of-the-box to enable quick and easy deployment to AWS Lambda instances. Assuming that you have followed the instructions detailed in Build from Source, the AWS Lambda Spring Boot applications for each of the event-driven microservices that make up the OntoPop data pipeline may be found in the $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws
Maven module, which itself contains the following child modules pertinent to the data pipeline:
- ontopop-aws-lambda-app-subscriber-github-webhook - Node.js application that subscribes to GitHub webhooks and invokes the ontology ingestion service AWS Lambda directly via the AWS SDK. Note that this is a Node.js application (i.e. not a Spring Boot application) because GitHub webhook requests time out after 10 seconds, after which the HTTP connection is destroyed and the webhook payload is lost. We therefore deploy this lightweight Node.js application, which returns a promise (i.e. an immediate response back to GitHub), to avoid the longer cold start-up times incurred by Java-based applications.
- ontopop-aws-lambda-app-data-ontology-ingestor - AWS Lambda Spring Boot application deployment wrapper around the ontology ingestion service, invoked by the ontopop-aws-lambda-app-subscriber-github-webhook application directly via the AWS SDK.
- ontopop-aws-lambda-app-data-ontology-validator - AWS Lambda Spring Boot application deployment wrapper around the ontology validation service, invoked via its subscription to the shared messaging system.
- ontopop-aws-lambda-app-data-ontology-loader-triplestore - AWS Lambda Spring Boot application deployment wrapper around the ontology triplestore loading service, invoked via its subscription to the shared messaging system.
- ontopop-aws-lambda-app-data-ontology-parser - AWS Lambda Spring Boot application deployment wrapper around the ontology parsing service, invoked via its subscription to the shared messaging system.
- ontopop-aws-lambda-app-data-ontology-modeller-graph - AWS Lambda Spring Boot application deployment wrapper around the property graph modelling service, invoked via its subscription to the shared messaging system.
- ontopop-aws-lambda-app-data-ontology-loader-graph - AWS Lambda Spring Boot application deployment wrapper around the property graph loading service, invoked via its subscription to the shared messaging system.
- ontopop-aws-lambda-app-data-ontology-indexer-graph - AWS Lambda Spring Boot application deployment wrapper around the property graph indexing service, invoked via its subscription to the shared messaging system.
Setup
Build from Source
In order to compile and build the OntoPop event-driven data pipeline AWS Lambda Spring Boot applications in preparation for deployment to AWS Lambda instances, please follow the instructions detailed in Build from Source.
AWS CLI
We shall use the AWS Command Line Interface (CLI) to deploy the OntoPop Java artifacts (i.e. OntoPop's data pipeline AWS Lambda Spring Boot applications packaged as JAR files) that were created in the Build from Source stage above to AWS Lambda instances. To install the AWS CLI, please follow the instructions below:
The instructions below are for Ubuntu 20.04. Installation instructions for other Linux distributions and other operating systems such as Windows may be found at https://aws.amazon.com/cli.
# Install the required dependencies (glibc is provided by the libc6
# package, which is pre-installed on Ubuntu 20.04)
$ sudo apt-get update
$ sudo apt-get install -y unzip groff less
# Install the AWS CLI from a ZIP file
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
Assuming that the AWS CLI has installed successfully, we can configure it with the Access Key ID and Secret Access Key of an IAM user with privileges to programmatically manage AWS Lambda instances (such as an IAM user provisioned with the AWSLambda_FullAccess AWS managed policy, or similar) as follows:
# Configure the AWS CLI
$ aws configure
AWS Access Key ID [None]: AKIA123456789
AWS Secret Access Key [None]: abcdefg987654321hijklmnop
Default region name [None]: eu-west-2
Default output format [None]: json
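To confirm that the AWS CLI is installed and that the configured credentials are valid before proceeding, you can run the following sanity checks (no OntoPop-specific resources are assumed):

```shell
# Print the installed AWS CLI version
$ aws --version

# Verify that the configured credentials are valid by returning the
# account ID and ARN of the configured IAM user
$ aws sts get-caller-identity

# Confirm that the IAM user can manage AWS Lambda instances by listing
# the existing functions in the configured default region
$ aws lambda list-functions --max-items 10
```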
AWS Lambda
We shall use the AWS Management Console to provision AWS Lambda instances. To do so, navigate to the AWS Lambda service via the AWS Management Console, select "Create function" and follow the instructions below:
- Function name - enter a custom function name that describes the purpose of the function, for example ontology-ingestor-service.
- Runtime - with the exception of the GitHub webhook subscriber (which is a Node.js application), all other OntoPop event-driven data pipeline functions are Java Spring Boot applications. If you are deploying ontopop-aws-lambda-app-subscriber-github-webhook, please select "Node.js 14.x"; otherwise please select "Java 11 (Corretto)" for all other deployments.
Once configured, select "Create function" to create the new AWS Lambda instance.
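Alternatively, this step can be scripted with the AWS CLI rather than performed via the console. The following is a minimal sketch, assuming a pre-existing IAM execution role; the role ARN and placeholder deployment package below are illustrative only, since the real application code is deployed later via aws lambda update-function-code:

```shell
# Create a minimal placeholder deployment package (the real OntoPop
# application artifact is deployed in a later step)
$ echo 'exports.handler = async () => {};' > index.js
$ zip placeholder.zip index.js

# Provision the AWS Lambda instance with the Java 11 runtime (use
# --runtime nodejs14.x instead for the GitHub webhook subscriber)
$ aws lambda create-function \
    --function-name ontology-ingestor-service \
    --runtime java11 \
    --handler org.springframework.cloud.function.adapter.aws.FunctionInvoker::handleRequest \
    --role arn:aws:iam::123456789012:role/ontopop-lambda-execution-role \
    --zip-file fileb://placeholder.zip \
    --memory-size 1024 \
    --timeout 300
```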
Amazon MQ Trigger
With the exception of ontopop-aws-lambda-app-data-ontology-ingestor (which is invoked directly by ontopop-aws-lambda-app-subscriber-github-webhook via the AWS SDK), all other OntoPop event-driven data pipeline AWS Lambda Spring Boot applications are invoked via their subscription to the shared messaging system. Assuming that you are integrating OntoPop with the Amazon MQ (RabbitMQ) managed broker, we need to add an Amazon MQ trigger to our AWS Lambda instances. To do this, navigate to the AWS Lambda service via the AWS Management Console, select the relevant AWS Lambda instance, select "Add trigger", choose "MQ" as the trigger type and enter the following properties:
- Amazon MQ Broker - select an Amazon MQ (RabbitMQ) broker. For details on provisioning an Amazon MQ (RabbitMQ) message broker for integration with OntoPop, please see Amazon MQ.
- Batch Size - enter the maximum number of messages to retrieve in a single batch (for example 1).
- Batch Window - enter the maximum amount of time (in seconds) to gather records before invoking the function (for example 5 seconds).
- Queue Name - enter the name of the Amazon MQ (RabbitMQ) broker destination queue that this AWS Lambda instance should subscribe to and consume. Note that the name you enter here must equal the name of the destination (topic) and group (queue) binding defined in the OntoPop application context. For example, if we are deploying the ontopop-aws-lambda-app-data-ontology-validator AWS Lambda Spring Boot application, which binds to the ingestedConsumptionChannel (see the spring.cloud.stream.bindings namespace in the OntoPop application context), then we would enter ontopop.data.ingested.ontopop as the fully qualified queue name (i.e. the topic name of ontopop.data.ingested combined with the queue name of ontopop).
- Source Access Secret - as described in Amazon MQ, a secret must be defined in AWS Secrets Manager containing the RabbitMQ broker credentials (i.e. username and password as key-value pairs) in order for the AWS Lambda instance to connect and subscribe to messages. For this property, select the name of the secret containing the RabbitMQ broker credentials (for example MQaccess).
Once configured, select "Add" to add the Amazon MQ trigger to the AWS Lambda instance. Finally, we need to grant the AWS Lambda instance permission to subscribe to and read from the Amazon MQ (RabbitMQ) message broker queue. To do this, navigate to the AWS Lambda service via the AWS Management Console, select Configuration > Permissions and select the execution role name (for example ontology-validator-service-role-abc123). This will take you to the IAM Management Console for this role. Select Add permissions > Attach policies and attach the AmazonMQReadOnlyAccess AWS managed policy (or equivalent custom policy) to this role. Everything is now set up so that every time a message is published to the relevant queue, the subscribing AWS Lambda instance will be invoked.
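The Amazon MQ trigger can also be created with the AWS CLI via an event source mapping. The following is a sketch for the ontology validation service; the broker ARN and secret ARN shown are illustrative placeholders that must be replaced with the values from your own Amazon MQ and AWS Secrets Manager deployments:

```shell
# Add an Amazon MQ (RabbitMQ) trigger to the ontology validator AWS
# Lambda instance by creating an event source mapping (the broker and
# secret ARNs below are illustrative)
$ aws lambda create-event-source-mapping \
    --function-name ontology-validator-service \
    --event-source-arn arn:aws:mq:eu-west-2:123456789012:broker:ontopop-mq:b-1234abcd \
    --queues ontopop.data.ingested.ontopop \
    --batch-size 1 \
    --maximum-batching-window-in-seconds 5 \
    --source-access-configurations Type=BASIC_AUTH,URI=arn:aws:secretsmanager:eu-west-2:123456789012:secret:MQaccess
```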
Deployment
GitHub Subscriber
In the following instructions we detail how the GitHub webhook subscriber Node.js application can be deployed to an AWS Lambda instance.
- Create a new (empty) AWS Lambda instance configured with the Node.js 14.x runtime via the AWS Management Console as detailed above. We shall call this AWS Lambda instance github-webhook-subscriber for the purposes of these instructions. Once created, open this new AWS Lambda instance via the AWS Management Console, navigate to Configuration > General configuration, set its memory to 128 MB and set its timeout to 1 min 0 sec.
- To invoke the ontology ingestion service directly from the GitHub webhook subscriber AWS Lambda using the AWS SDK, set the name of the ontology ingestion service AWS Lambda, for example ontology-ingestor-service, as an environment variable named ONTOPOP_ONTOLOGY_INGESTOR_FUNCTION_NAME. This can be done by navigating to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console, then selecting Configuration > Environment variables, as illustrated in the following screenshot:

- Next we need to grant the GitHub webhook subscriber AWS Lambda permission to directly invoke the ontology ingestion service AWS Lambda. To do this, navigate to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console, select Configuration > Permissions, then select the execution role name (for example github-webhook-subscriber-role-abc123). This will take you to the IAM Management Console for this role. Select Add permissions > Attach policies and attach the AWSLambdaRole AWS managed policy (or equivalent custom policy), as illustrated in the following screenshot. The GitHub webhook subscriber AWS Lambda now has permission to programmatically and directly invoke the ontology ingestion service AWS Lambda.

- We are now ready to deploy the GitHub webhook subscriber Node.js application code contained in the $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-subscriber-github-webhook project to this AWS Lambda instance. Assuming that you have followed the instructions detailed in the Setup section above, navigate to $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-subscriber-github-webhook and execute the following commands via your command line:
# Navigate to the relevant project folder
$ cd $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-subscriber-github-webhook
# Package the index.js file into a ZIP archive file
$ zip function.zip index.js
# Use the AWS CLI to deploy the ZIP file to the relevant AWS Lambda instance
$ aws lambda update-function-code --function-name github-webhook-subscriber --zip-file fileb://function.zip
Now that we have uploaded the application code to the GitHub webhook subscriber AWS Lambda instance, we need to make it publicly accessible via HTTP. To do this, navigate to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console, select "Add trigger" and then select "API Gateway". Configure API Gateway accordingly by creating a new HTTP API and a new custom HTTP POST route (for example /subscribers/github), and then integrate this new HTTP POST route with the GitHub webhook subscriber AWS Lambda instance. The GitHub webhook subscriber AWS Lambda is now publicly accessible via HTTP. To identify its HTTPS endpoint, navigate to the GitHub webhook subscriber AWS Lambda instance via the AWS Management Console and select Configuration > Triggers (press the refresh button if required). The HTTPS endpoint will look similar to https://abcde12345.execute-api.eu-west-2.amazonaws.com/subscribers/github (if you have configured a custom domain name in API Gateway, then the custom domain name can be used instead of the AWS hostname).
Next we need to configure a webhook in the relevant GitHub repository and set its payload URL to the API Gateway HTTPS endpoint noted above. To do this, navigate to the relevant GitHub repository in a web browser, select Settings > Webhooks > Add webhook and enter the following properties:
| Property | Description | Example |
|---|---|---|
| Payload URL | The public URL of the GitHub webhook subscriber AWS Lambda instance (as noted above in step 5). | https://abcde12345.execute-api.eu-west-2.amazonaws.com/subscribers/github |
| Content type | The webhook media type. This should be set to application/json. | application/json |
| Secret | A custom string that will be used by OntoPop to validate GitHub webhook payloads. Please visit Securing your webhooks for further information. Please make a note of the secret token that you create, as it will be required when creating a new ontology for OntoPop to monitor via the OntoPop Management API. | mysecret123 |
| SSL verification | Whether to verify SSL certificates when delivering payloads. This should be set to enabled. | Enable SSL verification |
| Which events would you like to trigger this webhook? | The event type that will trigger the GitHub webhook. This should be set to the push event. | Just the push event. |
The following screenshot provides an example GitHub webhook configuration integrated with a GitHub webhook subscriber AWS Lambda instance:

- Press "Add webhook". Now every time a push event occurs in the relevant GitHub repository, this webhook will be triggered and an HTTP POST request made to the public URL of the GitHub webhook subscriber AWS Lambda instance.
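The Secret configured above is what OntoPop uses to validate that incoming payloads genuinely originate from GitHub: GitHub computes an HMAC SHA-256 of each delivery's raw body, keyed with the secret token, and sends the result in the X-Hub-Signature-256 request header. The expected header value can be reproduced from the command line as follows (a minimal sketch using the example secret above; the payload body shown is illustrative):

```shell
# The webhook secret token configured in GitHub (example value)
SECRET="mysecret123"

# An illustrative webhook payload body (in practice this is the raw
# JSON body of the HTTP POST request delivered by GitHub)
PAYLOAD='{"ref":"refs/heads/main"}'

# HMAC SHA-256 of the raw payload body, keyed with the secret token,
# prefixed with "sha256=" to match the X-Hub-Signature-256 header
SIGNATURE="sha256=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | sed 's/^.* //')"
echo "$SIGNATURE"
```

A subscriber should compare this computed value against the received header using a constant-time comparison before processing the payload.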
Function Apps
In the following instructions we use the ontopop-aws-lambda-app-data-ontology-validator child Maven module as an example with which to demonstrate how to deploy OntoPop's event-driven data pipeline Spring Boot applications to AWS Lambda instances. However, these instructions can be applied equally to deploy any of the data pipeline Spring Boot applications listed in the Data Pipeline section above.
- Create a new (empty) AWS Lambda instance configured with the Java 11 (Corretto) runtime via the AWS Management Console as detailed in the Setup section above. We shall call this AWS Lambda instance ontology-validator-service for the purposes of these instructions. Once created, open this new AWS Lambda instance via the AWS Management Console, navigate to Configuration > General configuration, set its memory to 1024 MB and set its timeout to 5 min 0 sec.
- Since we are deploying a Java Spring Boot application that utilizes the Spring Cloud Function project, we need to configure the AWS Lambda instance with details of the main Java class to invoke as well as the name of the Java function that will be executed. To do this, open the AWS Lambda instance via the AWS Management Console, navigate to Configuration > Environment variables and set the following environment variables depending on which OntoPop event-driven data pipeline service you are deploying. Note that the main class should be set in an environment variable named MAIN_CLASS, and the function name should be set in an environment variable named spring_cloud_function_definition.
| Maven Module | Main Class | Function Name |
|---|---|---|
| ontopop-aws-lambda-app-data-ontology-ingestor | ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.ingestor.OntologyIngestorAwsLambdaApp | ontologyIngestorAwsLambdaApiGatewayProxyRequestEventConsumer |
| ontopop-aws-lambda-app-data-ontology-validator | ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.validator.OntologyValidatorAwsLambdaApp | ontologyValidatorAwsLambdaAmazonMqMessageConsumer |
| ontopop-aws-lambda-app-data-ontology-loader-triplestore | ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.loader.triplestore.OntologyTriplestoreLoaderAwsLambdaApp | ontologyTriplestoreLoaderAwsLambdaAmazonMqMessageConsumer |
| ontopop-aws-lambda-app-data-ontology-parser | ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.parser.OntologyParserAwsLambdaApp | ontologyParserAwsLambdaAmazonMqMessageConsumer |
| ontopop-aws-lambda-app-data-ontology-modeller-graph | ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.modeller.graph.OntologyGraphModellerAwsLambdaApp | ontologyGraphModellerAwsLambdaAmazonMqConsumer |
| ontopop-aws-lambda-app-data-ontology-loader-graph | ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.loader.graph.OntologyGraphLoaderAwsLambdaApp | ontologyGraphLoaderAwsLambdaAmazonMqMessageConsumer |
| ontopop-aws-lambda-app-data-ontology-indexer-graph | ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.indexer.graph.OntologyGraphIndexerAwsLambdaApp | ontologyGraphIndexerAwsLambdaAmazonMqMessageConsumer |
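The environment variables in the table above can also be set with the AWS CLI rather than through the console. The following is a sketch for the ontology validation service; adjust the function name and values to match the service you are deploying:

```shell
# Set the Spring Cloud Function environment variables for the
# ontology validation service AWS Lambda instance
$ aws lambda update-function-configuration \
    --function-name ontology-validator-service \
    --environment "Variables={MAIN_CLASS=ai.hyperlearning.ontopop.apps.aws.lambda.data.ontology.validator.OntologyValidatorAwsLambdaApp,spring_cloud_function_definition=ontologyValidatorAwsLambdaAmazonMqMessageConsumer}"
```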
- Next we need to configure the AWS Lambda instance with the fully qualified class name and method of the function handler. To do this, open the AWS Lambda instance via the AWS Management Console, navigate to Code and select the "Edit" button belonging to the "Runtime settings" section. In the "Handler" box enter org.springframework.cloud.function.adapter.aws.FunctionInvoker::handleRequest, and then press "Save".
- Assuming that you are integrating OntoPop with AWS Secrets Manager, we need to grant the AWS Lambda instance permission to read secrets managed by AWS Secrets Manager. To do this, open the AWS Lambda instance via the AWS Management Console, navigate to Configuration > Permissions and select the execution role name (for example ontology-validator-service-role-abc123). This will take you to the IAM Management Console for this role. Select Add permissions > Attach policies and attach the SecretsManagerReadWrite AWS managed policy (or equivalent custom policy) to this role. Now when the AWS Lambda instance is invoked, externalized sensitive properties defined in the OntoPop application context will be loaded from AWS Secrets Manager.
- With the exception of ontopop-aws-lambda-app-data-ontology-ingestor, all the other OntoPop event-driven data pipeline AWS Lambda Spring Boot applications are triggered by the publication of a message to the relevant Amazon MQ (RabbitMQ) queue. Please configure an Amazon MQ (RabbitMQ) trigger for the AWS Lambda instance as described in the Setup section above, where the default queue name to subscribe to depends on which data pipeline service is being deployed, as follows:
| Maven Module | Default Queue Name | Application Context Binding |
|---|---|---|
| ontopop-aws-lambda-app-data-ontology-validator | ontopop.data.ingested.ontopop | ingestedConsumptionChannel |
| ontopop-aws-lambda-app-data-ontology-loader-triplestore | ontopop.data.validated.ontopop.loaders.triplestore | validatedTriplestoreLoaderConsumptionChannel |
| ontopop-aws-lambda-app-data-ontology-parser | ontopop.data.validated.ontopop.parsers | validatedParserConsumptionChannel |
| ontopop-aws-lambda-app-data-ontology-modeller-graph | ontopop.data.parsed.ontopop | parsedConsumptionChannel |
| ontopop-aws-lambda-app-data-ontology-loader-graph | ontopop.data.modelled.ontopop.loaders.graph | modelledGraphLoaderConsumptionChannel |
| ontopop-aws-lambda-app-data-ontology-indexer-graph | ontopop.data.modelled.ontopop.indexers.graph | modelledGraphIndexerConsumptionChannel |
- We are now ready to deploy the packaged Java Spring Boot application artifact to the AWS Lambda instance. Assuming that you have followed the instructions detailed in the Setup section above, navigate to $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-data-ontology-validator (in our example) and execute the following commands via your command line:
# Navigate to the target folder containing the packaged JAR artifact
$ cd $ONTOPOP_BASE/ontopop-apps/ontopop-apps-aws/ontopop-aws-lambda-app-data-ontology-validator/target
# Upload the packaged JAR file to an Amazon S3 bucket
$ aws s3 cp ontopop-aws-lambda-app-data-ontology-validator-2.0.0-aws.jar s3://ontopop-apps
# Deploy the function code from Amazon S3 to the relevant AWS Lambda instance
$ aws lambda update-function-code --function-name ontology-validator-service --s3-bucket ontopop-apps --s3-key ontopop-aws-lambda-app-data-ontology-validator-2.0.0-aws.jar
Now every time a message is published to the relevant Amazon MQ (RabbitMQ) queue, the subscribing AWS Lambda instance will be invoked and will proceed to execute its respective stage of the OntoPop data pipeline (with the exception of ontopop-aws-lambda-app-data-ontology-ingestor, which is invoked directly by the GitHub webhook subscriber AWS Lambda).
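After deployment, the configuration of each AWS Lambda instance can be verified from the command line. The following sketch checks the ontology validation service deployed above (adjust the function name for other services):

```shell
# Inspect the runtime, handler, memory, timeout and environment
# variable configuration of the deployed function
$ aws lambda get-function-configuration --function-name ontology-validator-service

# List the event source mappings (i.e. the Amazon MQ trigger)
# associated with the deployed function
$ aws lambda list-event-source-mappings --function-name ontology-validator-service
```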