Welcome to the Kong Workshop on Enhancing API Resilience of the Kong Gateway.
- Learn how to set up a Kong Gateway Data Plane on your local machine
- Deploy Kong Enterprise Edition, the Control Plane
- Deploy and interact with an upstream microservice
- Simulate failures and delays using Kong plugins
- Enhance the gateway to effectively handle failures and improve performance
By the end of this workshop, you will have hands-on experience in configuring Kong Gateway for improved API resiliency.
Ensure your local machine meets the following hardware and software requirements:

Hardware Requirements:
- Minimum 8 GB RAM
- Minimum 4 CPU cores
- Minimum 20 GB free disk space

Software Requirements:
- Docker & Docker Compose
  - Virtualisation environment; recommended allocated resources:
    - Memory set to at least 4 GB RAM
    - CPU set to at least 2 cores
    - Disk space set to at least 20 GB
  - Docker Compose Installation Guide
- Insomnia (Kong's client app for designing, testing, and debugging API requests)
  - Desktop application used for testing and interacting with APIs
- Git-Bash (Windows users)
  - Windows application used for running shell scripts
- Kong Enterprise License
  - Ensure that you have a valid license key, and export it for use in the terminal:

    ```shell
    export KONG_LICENSE_DATA="{ ... your-kong-license-data ... }"
    ```
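The software checks above can be scripted. The sketch below is illustrative only (it is not part of the workshop repository); it reports which tools are on your `PATH` and whether the license variable is exported:

```shell
# Illustrative pre-flight check; not one of the workshop scripts.
check_prereqs() {
  for tool in docker docker-compose git curl; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool found"
    else
      echo "missing: $tool"
    fi
  done
  if [ -n "$KONG_LICENSE_DATA" ]; then
    echo "ok: KONG_LICENSE_DATA is set"
  else
    echo "missing: KONG_LICENSE_DATA (export it before deploying)"
  fi
}

check_prereqs
```

If anything is reported missing, install it (or export the license) before running the deployment script.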
To install everything needed for this workshop, we will use the deploy-gateway.sh script. The contents of this script as well as the configuration used are outlined in the subsections below.
💡 Note
As previously mentioned, for the purpose of this workshop, the docker-compose.yaml has been set up to configure the base environment on your local machine, with an instance of `decK` installed inside a running container.
As part of the installation process of Kong Gateway's Enterprise Edition, the following Kong-built components run locally on your machine inside of your Docker environment.
Within this list of components an additional service called KongAir Routes is included, and will be used as part of this workshop:
| Name | Overview |
|---|---|
| Kong Control Plane | Manages and propagates configuration across data planes; handles all Kong entities centrally |
| Kong Data Plane | Executes traffic processing based on configurations received from the control plane |
| KongAir Routes Service | A routes microservice emulating a customer-facing airline system |
The Database, Control Plane, and Data Plane configuration are defined in the docker-compose.yaml file.
The environment variables below are used to configure both the Kong Control Plane and Data Planes.
| ENV Variable | Control Plane Configuration | Data Plane Configuration |
|---|---|---|
| KONG_LICENSE_DATA | Contains the Kong license data | Contains the Kong license data |
| KONG_DATABASE | postgres – Uses a PostgreSQL database | off – Data plane is decoupled from a database |
| KONG_PG_HOST | db – Host of the PostgreSQL database | — |
| KONG_PG_DATABASE | kong – Name of the PostgreSQL database | — |
| KONG_PG_USER | kong – PostgreSQL user | — |
| KONG_PG_PASSWORD | kong – PostgreSQL password | — |
| KONG_PASSWORD | password – Administrative password for Kong | — |
| KONG_PREFIX | /tmp/kong – Working directory for Kong | /tmp/kong – Working directory for Kong |
| KONG_VITALS | off – Kong Vitals monitoring is disabled | off – Kong Vitals monitoring is disabled |
| KONG_LOG_LEVEL | notice – Logging level is set to notice | notice – Logging level is set to notice |
| KONG_PLUGINS | bundled,chaos-experiments – Enables bundled plugins and chaos experiments | bundled,chaos-experiments – Enables bundled plugins and chaos experiments |
| KONG_PROXY_LISTEN | off – Proxy listener is disabled | 0.0.0.0:8000 – Listens for client traffic |
| KONG_ADMIN_LISTEN | 0.0.0.0:8001 – Exposes the Admin API | off – Admin API is disabled |
| KONG_ADMIN_GUI_LISTEN | 0.0.0.0:8002 – Exposes the Admin GUI | off – Admin GUI is disabled |
| KONG_ROLE | control_plane – Designates this instance as the Control Plane | data_plane – Designates this instance as the Data Plane |
| KONG_CLUSTER_MTLS | shared – Uses a shared certificate for mutual TLS within the cluster | — |
| KONG_CLUSTER_CERT | /etc/secrets/kong-cluster/tls.crt – Path to the TLS certificate used for cluster mTLS | /etc/secrets/kong-cluster/tls.crt – Path to the TLS certificate used for cluster mTLS |
| KONG_CLUSTER_CERT_KEY | /etc/secrets/kong-cluster/tls.key – Path to the TLS key used for cluster mTLS | /etc/secrets/kong-cluster/tls.key – Path to the TLS key used for cluster mTLS |
| KONG_CLUSTER_CONTROL_PLANE | — | kong-control-plane:8005 – Address of the Control Plane from which to fetch configuration |
| KONG_CLUSTER_TELEMETRY_ENDPOINT | — | kong-control-plane:8006 – Endpoint for sending telemetry data to the Control Plane |
| KONG_TRACING_INSTRUMENTATIONS | — | all – Enables tracing for all supported instrumentations |
| KONG_TRACING_SAMPLING_RATE | — | 1.0 – 100% of requests are sampled for tracing |
| KONG_STATUS_LISTEN | — | 0.0.0.0:8100 – Exposes the Status API for health and operational metrics |
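As a concrete illustration of how the control-plane column of the table maps onto Compose syntax, a service entry might look like the excerpt below. This is illustrative only; the authoritative values and the full variable set live in docker-compose.yaml, and the image tag shown is a placeholder:

```yaml
# Illustrative excerpt; see docker-compose.yaml for the real definition.
kong-control-plane:
  image: kong/kong-gateway:<TAG>   # placeholder: the actual tag is pinned in the file
  environment:
    KONG_LICENSE_DATA: ${KONG_LICENSE_DATA}
    KONG_ROLE: control_plane
    KONG_DATABASE: postgres
    KONG_PG_HOST: db
    KONG_CLUSTER_MTLS: shared
    KONG_ADMIN_LISTEN: 0.0.0.0:8001
    KONG_ADMIN_GUI_LISTEN: 0.0.0.0:8002
    KONG_PROXY_LISTEN: "off"
```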
Our workshop uses a microservice from KongAir which retrieves route information when the /routes and /routes/{id} endpoints are called.
💡 Note
The `<REGISTRY>` value in the YAML should be replaced with the internal registry Fully-Qualified Domain Name (FQDN), and `<DATE>` should correspond with the date of this workshop. Please let one of the team know if this image is inaccessible.
```yaml
kongair-routes:
  container_name: kongair-routes
  image: <REGISTRY>/kongair-routes:ws-<DATE>
  hostname: routes.kongair
  restart: on-failure
  networks:
    - kong-net
  ports:
    - "5053:8080"
```

To monitor and visualise the performance of the Kong Gateway and microservice(s), we include the observability and traceability tools below:
| Name | Overview |
|---|---|
| DB (Postgres) | Centralised datastore used by the control plane to persist Kong configurations and entities |
| Prometheus | A robust, time-series database and monitoring system used to scrape metrics from services |
| Grafana | Visualisation and analytics platform for metrics, logs, and traces |
| FluentBit | A lightweight, fast, and scalable log processor and forwarder |
| OpenTelemetry Collector | A vendor-agnostic collector for traces, metrics, and logs |
| Loki | A horizontally scalable, highly available log aggregation system inspired by Prometheus |
| Jaeger | A distributed tracing system developed by Uber |
These services have been outlined in the docker-compose.yaml file.
💡 Note
Per the note above, for the purpose of this workshop the docker-compose.yaml has been set up to configure the base environment on your local machine. If you would like to apply the configuration to the Gateway manually, please comment out the `services.deck` YAML configuration. The commands that need to be run can be found within the deck-init-apply.sh script. The script will generate the `kong.yaml` file, which can be applied using `deck` commands.
Using the command line, from the root of the repository, run the deploy-gateway.sh script to initialise the environment and start all services:

```shell
./deploy-gateway.sh
```

This deployment script:
- Generates the TLS certificate & key using the certificate-key-generation.sh script
- Starts the services using the `docker-compose up -d` command
- Generates the kong.yaml file using the deck-init-apply.sh script
The deck-init-apply.sh script generates the kong.yaml file which contains the configuration
for the Kong Gateway.
The contents of this script are primarily deck commands.
These can be run manually after the workshop has finished so you may better understand the commands and their output in more detail.
💡 Note
The first time that this is run, Docker will need to pull all the images which may take a few minutes.
Verify that all services are running using the following command:

```shell
docker-compose ps
```

All services listed should be shown with the status Up; the deck container should show an Exited status.

Check the deck container logs using the command below; they should include a line stating that the setup is complete:

```shell
docker-compose logs deck
```

This is a clear indicator that the kong.yaml file was generated and applied to the Kong Gateway successfully. You should also see this file in the root of the repository under the deck directory.
Once the services are up and running, please move on to the next section.
💡 Note
In the case that these services are not showing this status, please check the logs and/or reach out to one of the workshop facilitators for further assistance.
To confirm that your Control Plane and Data Plane are running correctly, run the below command:

```shell
docker-compose ps
```

This will display the status of the containers listed in the docker-compose.yaml file. Each service should be in a running (healthy) state.

If any service is in an exited or unhealthy state, investigate further. Be sure to refer to the Troubleshooting documentation.

Open a browser window and navigate to Kong Manager using the following address:

http://localhost:8002

Alternatively, use the Admin API to check the status of the control plane:

```shell
curl localhost:8001/status
```

Monitoring is essential to understand how a system's services are performing and to detect issues.
Before the routes and services are adjusted in the gateway with additional plugins, it is important to ensure that the Observability functionality is working correctly.
Prometheus, Grafana, Jaeger, Loki and Otel-collector components are already included in the overall docker-compose setup,
per the instructions outlined in the observability tools section. Here we will just need to
configure the plugins to enable the observability functionality.
1. Add the below configuration to the platform/plugins.yaml file, which will enable the `prometheus` plugin:

   ```yaml
   - name: prometheus
     enabled: true
     config:
       bandwidth_metrics: true
       latency_metrics: true
       per_consumer: true
       status_code_metrics: true
       upstream_health_metrics: true
     protocols:
       - grpc
       - grpcs
       - http
       - https
   ```

   💡 Note
   Make sure to remove the empty array (`[]`) from the top `plugins:` line, and place the configuration directly under it.
2. Add the below configuration to the platform/plugins.yaml file, which will enable the `http-log` plugin:

   ```yaml
   - name: http-log
     enabled: true
     config:
       custom_fields_by_lua:
         spanid: |
           local h = kong.request.get_header('traceparent') or ''
           if h then
             return h:match("%-[a-f0-9]+%-([a-f0-9]+)%-")
           end
         traceid: |
           local h = kong.request.get_header('traceparent') or ''
           if h then
             return h:match("%-([a-f0-9]+)%-[a-f0-9]+%-")
           end
       http_endpoint: http://fluentbit:8080
   ```
3. Add the below configuration to the platform/plugins.yaml file, which will enable the `opentelemetry` plugin:

   ```yaml
   - name: opentelemetry
     config:
       traces_endpoint: http://otel-collector:4318/v1/traces
       resource_attributes:
         service.name: kong-otel-plugin
   ```
Now that these have been added, run the deploy-gateway.sh script to apply them.
1. Access Grafana
   - Switch to a browser and open a new tab
   - Enter http://localhost:3000 in the URL bar
   - Hit Enter and you will be presented with an authentication screen
   - To log in, use the default credentials:
     - Username: admin
     - Password: admin
2. Configure the Prometheus Data Source
   - In Grafana, navigate to Dashboards
   - Once there, you will see the Kong Dashboard present

   💡 Note
   This Dashboard was already imported from the JSON configuration
To enable monitoring of Kong Gateway metrics, we need to activate the prometheus plugin in Kong, which records and exposes metrics at the node level. Once enabled, the Prometheus server will discover all Kong nodes via a service discovery mechanism and consume data from each node's individually configured /metrics endpoint.
To enable the prometheus plugin in Kong:

1. Using decK, ensure that the `prometheus` plugin is included in the platform/plugins.yaml configuration file under `plugins`
2. Apply using decK
   - We will apply the configuration using decK with the script below:

     ```shell
     ./deploy-gateway.sh
     ```

   Please see the Troubleshooting documentation for more information on common issues and how to resolve them.
3. Verify that the Plugin is Enabled:
   - In Kong Manager, navigate to the relevant Workspace and check Services or Plugins to confirm that the plugin is active
   - Alternatively, use the Admin API to list plugins:

     ```shell
     curl http://localhost:8001/plugins
     ```
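Once the stack is up, the plugin's output can also be inspected directly. The helper below is an illustrative sketch: it assumes the data plane's Status API (KONG_STATUS_LISTEN on port 8100) serves the Prometheus metrics, and that exported metric names are prefixed `kong_` (metric names vary by Kong version):

```shell
# Illustrative helper; requires the workshop stack to be running.
scrape_kong_metrics() {
  url=${1:-http://localhost:8100/metrics}
  # Count exported kong_* metric lines; a non-zero count means the
  # prometheus plugin is emitting data.
  curl -s "$url" | grep -c '^kong_'
}

# Example (run only with the stack up):
# scrape_kong_metrics
```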
To enable logging of Kong Gateway requests, we need to activate the http-log plugin in Kong. This plugin enables you to send request and response logs to a specified HTTP server, and supports stream data (TCP, TLS, UDP) as well.

To enable the http-log plugin in Kong:

1. Using decK, ensure that the `http-log` plugin is included in the platform/plugins.yaml configuration file
2. Apply using decK
   - We will apply the configuration using decK with the script below:

     ```shell
     ./deploy-gateway.sh
     ```

   Please see the Troubleshooting documentation for more information on common issues and how to resolve them.
3. Verify the Plugin is Enabled:
   - In Kong Manager, navigate to the relevant Workspace and check Services or Plugins to confirm that the plugin is active
   - Alternatively, use the Admin API to list plugins:

     ```shell
     curl http://localhost:8001/plugins
     ```
To enable tracing of Kong Gateway requests, the opentelemetry plugin needs to be activated. This plugin propagates distributed tracing spans and reports low-level spans to a specified OTLP-compatible server.

To enable the opentelemetry plugin in Kong:

1. Using decK, ensure that the `opentelemetry` plugin is included in the platform/plugins.yaml configuration file
2. Apply using decK
   - We will apply the configuration using decK with the script below:

     ```shell
     ./deploy-gateway.sh
     ```

   Please see the Troubleshooting documentation for more information on common issues and how to resolve them.
3. Verify the Plugin is Enabled:
   - In Kong Manager, navigate to the relevant Workspace and check Services or Plugins to confirm that the plugin is active
   - Alternatively, use the Admin API to list plugins:

     ```shell
     curl http://localhost:8001/plugins
     ```
In this section, the Data Plane will be configured using Kong's decK functionality.

Within this repository, there is an OpenAPI Specification for the Routes Service in the open-api.yaml file. We will be using this document to generate the Kong Gateway configuration.

1. Convert the OpenAPI Specification to Kong Configuration:
   - The decK CLI is used to convert the OpenAPI Specification to Kong configuration
   - The command is specified inside the deck-init-apply.sh file
   - This converts the specification while spinning up the instance using decK
2. Merge Configurations:
   - Combine the generated kong-routes.yaml with the existing configuration

   💡 Note
   This is handled automatically by the command defined in deck-init-apply.sh, which runs during container startup via the deploy-gateway.sh script.
3. Verify Services and Routes in Kong Manager:
   - In Kong Manager, go to Services and confirm that routes-service and its routes are correctly listed
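The conversion step can be sketched with decK's `openapi2kong` subcommand. This is illustrative only; the authoritative invocation (flags, input, and output paths) lives in deck-init-apply.sh, and the output filename used here is an assumption:

```shell
# Illustrative sketch; the real command is defined in deck-init-apply.sh.
if command -v deck >/dev/null 2>&1 && [ -f open-api.yaml ]; then
  # Convert the OpenAPI Specification into Kong declarative configuration.
  deck file openapi2kong \
    --spec open-api.yaml \
    --output-file kong-routes.yaml
else
  echo "decK CLI or open-api.yaml not found; this step normally runs inside the deck container"
fi
```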
To ensure that Kong Gateway is correctly routing requests to the Routes Service, we will send test HTTP requests to it.

Steps:

1. Test the `Health check` endpoint:

   Using Insomnia
   - Open Insomnia and import the collection present at routes-oas.yaml
   - In Insomnia, navigate to where the collection has been imported, then navigate to the /health endpoint (name in collection: health/Health check). Choose the environment "OpenAPI env localhost:8000" to apply pre-configured values in URL templates.
   - Send the request; expected output:

     ```json
     { "status": "OK" }
     ```

   Using cURL
   - Run the command below in the terminal:

     ```shell
     curl http://localhost:8000/health
     ```

   - Expected output:

     ```json
     { "status": "OK" }
     ```
2. Test the `Get all KongAir routes` endpoint:

   Using Insomnia
   - In Insomnia, navigate to where the collection has been imported, then navigate to the /routes endpoint.
   - Send the request to receive the following output:

     ```json
     [
       { "Id": "LHR-JFK", "Origin": "LHR", "Destination": "JFK", "AvgDuration": 470 },
       { "Id": "LHR-SFO", "Origin": "LHR", "Destination": "SFO", "AvgDuration": 660 },
       { "...omitted...": "..." },
       { "Id": "LHR-LAX", "Origin": "LHR", "Destination": "LAX", "AvgDuration": 675 }
     ]
     ```

   Using cURL
   - Run the command below in the terminal:

     ```shell
     curl http://localhost:8000/routes
     ```

   - Expected output:

     ```json
     [
       { "Id": "LHR-JFK", "Origin": "LHR", "Destination": "JFK", "AvgDuration": 470 },
       { "Id": "LHR-SFO", "Origin": "LHR", "Destination": "SFO", "AvgDuration": 660 },
       { "...omitted...": "..." },
       { "Id": "LHR-LAX", "Origin": "LHR", "Destination": "LAX", "AvgDuration": 675 }
     ]
     ```
3. Test the `Get a specific KongAir route by ID` endpoint:

   Using Insomnia
   - In Insomnia, navigate to where the collection has been imported, then navigate to the /routes/{id} endpoint; provide LHR-SIN in the id field and send the request
   - Expected output:

     ```json
     { "avg_duration": 660, "destination": "SIN", "id": "LHR-SIN", "origin": "LHR" }
     ```

   Using cURL
   - Run the command below in the terminal:

     ```shell
     curl http://localhost:8000/routes/LHR-SIN
     ```

   - Expected output:

     ```json
     { "avg_duration": 660, "destination": "SIN", "id": "LHR-SIN", "origin": "LHR" }
     ```
In this section, we will generate traffic and visualise it in the Grafana dashboard.

Steps:

1. Generate Traffic:

   Using Insomnia
   In Insomnia, navigate to where the collection has been imported, then submit requests to these endpoints multiple times:
   - http://localhost:8000/routes
   - http://localhost:8000/routes/LHR-SIN

   Using cURL
   Run the following shell command in your terminal to generate traffic:

   ```shell
   for i in {1..10}; do
     curl http://localhost:8000/routes
     curl http://localhost:8000/routes/LHR-SIN
     sleep 0.5
   done
   ```
2. View Metrics in Grafana
   - Open Grafana and navigate to the Kong dashboard
   - You should see metrics such as:
     - Requests per service
     - Response statuses
     - Latencies
View Logs in Grafana
-
Click
Explore > Loki -
Choose
label filters: service_name = kong-http-logs -
Click
Run queryto display the log entries -
Click on any individual log to expand and view its details
-
-
View Traces in Grafana
-
In the log entry, click to expand it
-
Scroll to the bottom and click on the Jaeger link
-
This will open the corresponding trace, showing how much time was spent in each part of Kong during the request flow
-
Once traffic is visible in Grafana, move on to the next stage. In the following section, we will simulate system failures by injecting faults into the environment to observe how it behaves under failure conditions.
In this section, simulated failures and delays will be injected into the services in order to test the resilience of the Kong Gateway.

Simulate errors using the Request Termination Plugin

With this plugin enabled on the Routes Service, requests will receive a 503 Service Unavailable response, imitating a down service:
1. Add the Request Termination Plugin to the Routes Service. Update the platform/plugins.yaml file to enable the plugin:

   ```yaml
   - name: request-termination
     enabled: true
     service: routes-service
     config:
       message: Service Unavailable
       status_code: 503
   ```
2. Apply the Configuration:

   Deploy the `request-termination` plugin configuration to the Gateway using the deploy-gateway.sh script.
Test the
Get Routesendpoint:
Using Insomnia
-
In Insomnia, navigate to where the collection has been imported, then navigate to the
/routesendpoint. Run the request which will return an error. -
Send the request to receive the following output:
Service Unavailable
Using cURL
-
Run the command below in the terminal:
curl http://localhost:8000/routes
-
Send the request with expected output:
Service UnavailableThe response will be a
503 Service Unavailable
-
4. Remove the Plugin After Testing:
   - Disable the plugin in platform/plugins.yaml by setting `enabled: false`
   - Reapply the configuration using the approach in Step 2
In this section, the Kong Gateway's resilience will be enhanced through the application of plugins and configurations, which in turn will enable the Gateway to better handle failures and improve performance.
Whilst there is no specific Kong plugin to enable this common software pattern, known as circuit-breaking, we can simulate this behaviour using health checks and load balancing.
- Prevent cascading failures by stopping requests to an unhealthy upstream service
- None; this is native Kong Gateway functionality
1. Configure Health Checks for the Routes Service:

   Update platform/upstream.yaml to include the below configuration:

   ```yaml
   _format_version: "3.0"
   upstreams:
     - name: routes.kongair
       algorithm: round-robin
       targets:
         - target: routes.kongair:8080
           weight: 100
           tags:
             - resiliency
       hash_fallback: none
       hash_on: none
       hash_on_cookie_path: "/"
       healthchecks:
         active:
           type: http
           concurrency: 10
           http_path: "/health"
           https_verify_certificate: true
           timeout: 1
           healthy:
             http_statuses: [200, 302]
             interval: 5
             successes: 5
           unhealthy:
             http_failures: 5
             http_statuses: [404, 429, 500, 501, 502, 503, 504, 505]
             interval: 5
             tcp_failures: 5
             timeouts: 0
         passive:
           type: http
           healthy:
             http_statuses: [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 300, 301, 302, 303, 304, 305, 306, 307, 308]
             successes: 80
           unhealthy:
             http_failures: 5
             http_statuses: [429, 500, 503]
             tcp_failures: 5
             timeouts: 5
         threshold: 0
       slots: 10000
       tags:
         - resiliency
       use_srv_name: false
   ```
2. Apply the Configuration:

   Deploy the `upstream` configuration to the Gateway using the deploy-gateway.sh script.
Simulate Upstream Failure:
-
Stop the
kongair-routescontainer:docker stop kongair-routes
-
-
Test the Circuit Breaker Behaviour with the
Get RoutesEndpoint:Using Insomnia
In Insomnia, navigate to where the collection has been imported, then navigate to the
/routesendpoint. Run the request which will return an error.Using cURL
Run the command below in a terminal:
curl http://localhost:8000/routes
Expected Behaviour:
- Kong will return an error without attempting to connect to the unhealthy upstream
5. Restart the Microservice:

   ```shell
   docker start kongair-routes
   ```

Once the microservice is back up, move on to the next section; it may take a few seconds for Kong Gateway to detect the change.
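The passive-check thresholds in upstream.yaml (`http_failures: 5` on statuses 429/500/503) can be pictured as a small state machine: consecutive failing responses push a target towards unhealthy, and once tripped, only the active /health probes bring it back. The sketch below is a simplified illustration of that logic, not Kong's actual implementation:

```shell
# Simplified sketch of a passive health check (not Kong's implementation).
http_failures_threshold=5
failures=0
state=healthy

observe() {
  case $1 in
    429|500|503) failures=$((failures + 1)) ;;  # statuses counted as failures
    *) failures=0 ;;                            # any other status resets the count
  esac
  if [ "$failures" -ge "$http_failures_threshold" ]; then
    state=unhealthy   # circuit is open; traffic to this target stops
  fi
  echo "$1 -> $state"
}

# Five consecutive 503s trip the circuit; a later 200 does not close it,
# because passive checks alone cannot mark a target healthy again.
for s in 200 503 503 503 503 503 200; do observe "$s"; done
```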
In this section, the Kong Gateway configuration will be updated to enable the Retry and Timeout functionality.
- Configure Kong to retry failed requests and set appropriate timeouts
- None; this is native Kong Gateway functionality
1. Configure `retries` and the different `timeout` durations; each number is in milliseconds.

   Update defaults.yaml to include the below configuration, which sets the `connect`, `write`, and `read` timeouts to 1000, 2000, and 3000 respectively, and sets `retries` equal to 2:

   ```yaml
   _info:
     defaults:
       service:
         connect_timeout: 1000
         read_timeout: 3000
         retries: 2
         write_timeout: 2000
   ```
2. Apply the Configuration:

   Deploy the updated `defaults` configuration to the Gateway using the deploy-gateway.sh script.
Simulate artificial delays in the upstream service to test Kong’s retry and timeout handling:
- In the docker-compose.yaml file find the
kongair-routesservice:
kongair-routes: container_name: ${KONG_MS_CONTAINER_NAME} hostname: routes.kongair restart: on-failure image: ${KONG_MS_IMAGE_REGISTRY}/${KONG_MS_IMAGE_NAME}:${KONG_MS_IMAGE_TAG}
-
Use the image with the incremented tag, e.g.
image: <REGISTRY>/kongair-routes:ws-delays-<DATE>, and remove/comment out the existing -
Run the deploy-gateway.sh script to redeploy the
kongair-routesservice that now includes a 10-second delay.
- In the docker-compose.yaml file find the
4. Test the Configuration with the `Get Routes` Endpoint:

   Using Insomnia
   In Insomnia, navigate to where the collection has been imported, then navigate to the /routes endpoint. Run the request, which will return an error.

   Using cURL
   Run the command below in a terminal:

   ```shell
   curl http://localhost:8000/routes
   ```

   Expected Behaviour:
   - Kong will stop waiting for the upstream after the 3-second read timeout and return an error response
   - If the response time exceeds the read timeout, Kong will retry the request up to the configured retry count
   - Expected error response: 504 Gateway Time-out

     ```json
     { "message": "The upstream server is timing out", "request_id": "c21ff94651edee784ae55853cc8b3e71" }
     ```
5. Roll back the Microservice's image to the previous tag, i.e. kongair-routes:ws-<DATE>:
   - In the docker-compose.yaml file, navigate to the kongair-routes service again:

     ```yaml
     kongair-routes:
       image: <REGISTRY>/kongair-routes:ws-delays-<DATE>
       container_name: kongair-routes
       hostname: routes.kongair
       restart: on-failure
       networks:
         - kong-net
       ports:
         - "5053:80"
     ```

   - Use the previous image, e.g. `image: <REGISTRY>/kongair-routes:ws-<DATE>`, and remove/comment out the incremented image version
   - Run the deploy-gateway.sh script to redeploy the kongair-routes service, which no longer includes the 10-second delay
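With `retries: 2` and the timeouts above, a rough upper bound on how long a client might wait can be computed directly. This is a simplified back-of-the-envelope model, not Kong's exact behaviour (which depends on which phase of the request fails):

```shell
# Simplified worst-case wait: the initial attempt plus the retries, each
# bounded by the connect and read timeouts (values from defaults.yaml, in ms).
connect_timeout_ms=1000
read_timeout_ms=3000
retries=2

attempts=$((retries + 1))
worst_case_ms=$((attempts * (connect_timeout_ms + read_timeout_ms)))
echo "worst-case upstream wait: ${worst_case_ms} ms"
```

This is why timeout and retry values should be chosen together: generous timeouts multiplied by several retries can keep clients waiting far longer than intended.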
Here the Rate-Limiting Advanced plugin will be applied to the gateway, where we will set up additional configuration to stop too many inbound requests from overwhelming the system.
- Protect upstream services by limiting the number of requests from clients
1. Enable the Rate Limiting Plugin:

   Update platform/plugins.yaml:

   ```yaml
   - name: rate-limiting-advanced
     enabled: true
     config:
       hide_client_headers: false
       identifier: consumer
       limit:
         - 5
       namespace: example_namespace
       strategy: local
       sync_rate: -1
       window_size:
         - 30
   ```
2. Apply the Configuration:

   Deploy the `rate-limiting-advanced` plugin configuration to the Gateway using the deploy-gateway.sh script.
Test the Rate Limiting functionality applied:
Using Insomnia
In Insomnia, navigate to where the collection has been imported, then navigate to the
/routesendpoint. Sending the request the first 5 times in the space of a minute will result in a200responses, but the 6th request will return a 429 error.Using cURL
Run the command below in a terminal:
for i in {1..6}; do curl -i http://localhost:8000/routes done
Expected Behaviour:
- The first 5 requests should succeed
- The 6th request should return
429Too Many Requests
4. Remove the Plugin after testing:
   - Disable the plugin in platform/plugins.yaml by setting `enabled: false`
   - Reapply the configuration using the approach in Step 2
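The `limit: 5` per `window_size: 30` accounting can be pictured with a small model. The sketch below is a conceptual fixed-window counter in plain shell; the plugin's own windowing (including sliding windows and cluster-wide sync) is more sophisticated:

```shell
# Conceptual fixed-window model of "limit: 5 per window_size: 30 seconds";
# not the rate-limiting-advanced plugin's actual algorithm.
limit=5
window_size=30
count=0
window_start=0

allow_request() {
  now=$1
  if [ $((now - window_start)) -ge "$window_size" ]; then
    window_start=$now   # a new window begins; reset the counter
    count=0
  fi
  count=$((count + 1))
  if [ "$count" -le "$limit" ]; then echo "200"; else echo "429"; fi
}

# Six requests in the same window, then one after the window rolls over:
for t in 0 1 2 3 4 5 35; do allow_request "$t"; done
```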
In this section, the Proxy Caching Advanced plugin will be applied to help reduce the work done by the Gateway and the upstream when consumers frequently request the same endpoints.
- Reduce load on upstream services by caching responses
1. Enable the Proxy Caching Plugin:

   Update platform/plugins.yaml with the below configuration:

   ```yaml
   - name: proxy-cache-advanced
     enabled: true
     config:
       cache_ttl: 30
       content_type:
         - application/json; charset=UTF-8
       request_method:
         - GET
       response_code:
         - 200
       strategy: memory
   ```
2. Apply the Configuration:

   Deploy the `proxy-cache-advanced` plugin configuration to the Gateway using the deploy-gateway.sh script.
Test the Caching functionality applied:
Using Insomnia
In Insomnia, navigate to where the collection has been imported, then navigate to the
/routesendpoint. Running the request the first time will take longer than subsequent requests. The first request will take longer as it is fetching data from the upstream service, while subsequent requests will be served from the cache.Using cURL
Run the below in a terminal:
curl -i http://localhost:8000/routes curl -i http://localhost:8000/routes
Expected Behaviour:
- The first request fetches data from the upstream service - this is the only request that will take time
- Subsequent requests within 30 seconds are served from the cache and thus return faster
Verify via Response Headers:
- Check the
X-Cache-Statusheader; it should showMissfor the first request andHitfor all subsequent requests
4. Disable the plugin and apply the configuration using the approach outlined in Step 2
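The `cache_ttl: 30` behaviour can be pictured with a tiny model: a request is a Hit while the cached copy is younger than the TTL, and a Miss once it expires. This is illustrative only; Kong's proxy-cache-advanced does this per cache key inside the gateway:

```shell
# Conceptual model of a 30-second TTL cache (not Kong's implementation).
cache_ttl=30
cached_at=-1000   # sentinel: nothing cached yet

lookup() {
  now=$1
  if [ $((now - cached_at)) -lt "$cache_ttl" ]; then
    echo "X-Cache-Status: Hit"
  else
    cached_at=$now   # refresh the cached copy from the upstream
    echo "X-Cache-Status: Miss"
  fi
}

# Requests at t=0s, 5s, 29s, then after the TTL has expired at t=40s:
for t in 0 5 29 40; do lookup "$t"; done
```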
This custom plugin enables chaos engineering experiments to be configured on the Kong gateway. The experiments supported are:

- Request latency: introduce latency into the request using a Gaussian distribution to simulate real-world lag spikes
- Connection aborts: forcibly close the connection to the Kong gateway without a response
- Custom responses: allow custom responses to be generated
1. Enable the Chaos Engineering Plugin:

   ```yaml
   - name: chaos-experiments
     enabled: true
     config:
       abort_request_probability: 0.25
       custom_response_probability: 0.5
       custom_response_status_codes: [400, 418, 502, 504]
       request_latency_correlation: 0.5
       request_latency_debug_header: X-Kong-Latency-Debug-Header
       request_latency_jitter_ms: 150
       request_latency_mean_ms: 250
       request_latency_probability: 0.75
   ```
2. Apply the Configuration:

   Deploy the `chaos-experiments` plugin configuration to the Gateway using the deploy-gateway.sh script.
Test the Chaos functionality:
for i in {1..10}; do curl http://localhost:8000/routes sleep 2 done
Expected Behaviour:
- Every request will return different response message and status code
4. Disable the plugin and apply the configuration using the deploy-gateway.sh script referenced in Step 2
The following plugins can further enhance API resilience but are not covered in this workshop due to time constraints. Each of these plugins provides unique functionalities that can complement the configurations discussed above.
- Ensures that incoming requests do not exceed a predefined size
- Prevents payloads that are too large from overwhelming services
- Example configuration:

  ```yaml
  - name: request-size-limiting
    config:
      allowed_payload_size: 128
      require_content_length: false
  ```

Documentation link, where you can find out more about this plugin.
- Validates incoming requests against an OpenAPI 3.1 specification
- Ensures compliance with the defined API schema, reducing the risk of invalid or malicious data
- Example configuration:

  ```yaml
  - name: oas-validation
    service: routes-service
    config:
      api_spec: <OAS FILE LOCATION>
  ```

Documentation link, where you can find out more about this plugin.
- Protects against malicious JSON payloads
- Limits the depth, size, and structure of incoming JSON to prevent vulnerabilities
- Example configuration:

  ```yaml
  - name: json-threat-protection
    config:
      max_body_size: 10
      max_container_depth: 1
      max_object_entry_count: 2
      max_object_entry_name_length: 3
      max_array_element_count: 4
      max_string_value_length: 5
      enforcement_mode: block
      error_status_code: 400
      error_message: BadRequest
  ```

Documentation link, where you can find out more about this plugin.
- Limits the number of requests a service can handle in a defined time window
- Helps to prevent overloading and safeguarding of backend services
- Example configuration:

  ```yaml
  - name: service-protection
    service: routes-service
    config:
      window_size:
        - 30
      window_type: sliding
      limit:
        - 5
      namespace: example_namespace
  ```

Documentation link, where you can find out more about this plugin.
- Defends against SQL injection, XSS, and other common web vulnerabilities by sanitising input
- Ideal for APIs exposed to untrusted or public traffic
- Example configuration:

  ```yaml
  - name: injection-protection
    config:
      injection_types:
        - sql
      locations:
        - path_and_query
      enforcement_mode: block
      error_status_code: 400
      error_message: Bad Request
  ```

Documentation link, where you can find out more about this plugin.
- Identifies and blocks automated traffic from bots
- Prevents abuse and preserves resources for legitimate users
- Example configuration:

  ```yaml
  - name: bot-detection
    service: routes-service
    config:
      deny:
        - "hello-world"
  ```

Documentation link, where you can find out more about this plugin.
- Validates incoming request data against user-defined schemas
- Protects APIs from malformed or malicious input
- Example configuration:

  ```yaml
  - name: request-validator
    config:
      body_schema: '[{"name":{"type": "string", "required": true}}]'
  ```

Documentation link, where you can find out more about this plugin.
This case study highlights how Kong Gateway was instrumental in mitigating a Distributed Denial of Service (DDoS) attack targeting an account-creation API. The attack exploited the API in order to overload a shared database, leading to degraded service across an entire region. Despite not having a dedicated Web Application Firewall (WAF), Kong Gateway’s flexibility, extensibility, and rapid response capabilities demonstrated its effectiveness in addressing evolving threats.
The first response involved implementing the rate-limiting-advanced plugin in order to restrict the number of requests
per IP. This measure provided temporary relief but was quickly circumvented as the attacker distributed the attack across
multiple IPs. This demonstrated the importance of layered defenses when handling threats that can adapt over time.
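As a sketch of what that first per-IP countermeasure might look like in decK form (the limit, window, and namespace values below are illustrative assumptions, not figures from the incident):

```yaml
plugins:
  - name: rate-limiting-advanced
    config:
      identifier: ip                # rate limit per client IP address
      limit:
        - 20                        # illustrative: 20 requests...
      window_size:
        - 60                        # ...per 60-second window
      window_type: sliding
      namespace: ddos_mitigation    # assumed namespace for shared counters
```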
The file-log plugin was enabled on the Gateway in order to analyse real-time inbound traffic and help identify
patterns; the team detected a suspicious user-agent string in these requests. Using the bot-detection plugin, traffic
with the identified user-agent was blocked, which significantly reduced system load. This showcased the value of real-time
monitoring and adaptive defense mechanisms.
Further analysis revealed a consistent malicious cookie signature. A custom pre-function script was created to block
requests containing this cookie pattern. This demonstrated how Kong Gateway’s extensibility allows for rapid, tailored
responses to specific attack vectors.
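As an illustrative sketch of such a cookie-based block, the following decK snippet uses Kong's `pre-function` plugin (the cookie signature `bad-cookie-sig` is a hypothetical placeholder, not the actual pattern from the incident):

```yaml
plugins:
  - name: pre-function
    config:
      access:
        # Reject any request whose Cookie header contains the
        # (hypothetical) malicious signature observed during the attack
        - |
          local cookie = kong.request.get_header("Cookie")
          if cookie and cookie:find("bad-cookie-sig", 1, true) then
            return kong.response.exit(403, { message = "Forbidden" })
          end
```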
A custom pre-function (integrated with a third-party geolocation service) restricted traffic to approved regions. This
approach reduced unnecessary load and in the end limited the scope of the attack. Geo-restrictions proved to be a
valuable layer in the defense strategy.
To counter automated malicious requests, another pre-function was introduced to validate reCAPTCHA tokens. This one
ensured that only legitimate users could access the API whilst at the same time blocking bots, adding yet another
automated protection layer.
- Adaptive defense: The attackers' evolving tactics required continuous analysis and rapid adaptation of countermeasures
- Custom solutions: Kong Gateway’s extensibility (through plugins and pre-functions) enabled custom defenses even without a WAF in place
- Layered security is essential: While Kong Gateway provided a robust response, making use of a dedicated WAF alongside it would further enhance defences against threats like these
Overall, this case study demonstrates how Kong Gateway strengthens API resiliency through a combination of features and strategies:
- Granular Traffic Control: Plugins such as `rate-limiting-advanced` allow for precise control over inbound request thresholds and limits
- Dynamic Threat Mitigation: Logging tools such as `file-log` provide visibility, enabling adaptive responses via plugins and pre-functions
- Geographical Access Restrictions: Geo-IP filtering can limit traffic to specific regions, reducing exposure
- Automated Protections: Plugins like `bot-detection` and custom reCAPTCHA validation defend against automated attacks
- Scalable and Extensible Defenses: Kong Gateway’s flexibility supports the ongoing development of new defenses to meet evolving threats
This case study underscores the importance of leveraging Kong Gateway as a central component in API security strategies.
Its combination of plugins, logging tools and custom pre-functions makes it a versatile and powerful tool for mitigating
cyber threats. When complemented by a WAF, these capabilities form a robust, layered defense against sophisticated API
attacks. Organisations can rely on Kong Gateway to build a scalable, flexible and adaptive API resiliency framework,
which will ensure service continuity even in the face of ever-evolving cyberattacks.
In distributed systems, failures are inevitable. Designing for failure means anticipating potential points of failure and implementing mechanisms that handle them gracefully.
- Implement circuit breakers (upstream redirection) within Kong to prevent cascading failures when an upstream service becomes unresponsive
- Use Kong’s health check and load balancing features to route traffic away from unhealthy services
- Configure active health checks to monitor the availability of upstream services
- Set appropriate thresholds for marking services as healthy (or unhealthy)
- Idempotency: particularly important within financial services, ensuring a retried request is not processed multiple times (Note: this is not within the scope of this workshop)
- Fallback mechanisms: default responses when services are unavailable
- Helps to prevent overloading services which have failed
- Improves overall system stability
- Enhances availability through infrastructure redundancy
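To illustrate, a decK-style sketch of active and passive health checks on an upstream might look like the following (the upstream name, target host, probe path, and threshold values are assumptions for this sketch, not values from the workshop):

```yaml
upstreams:
  - name: routes-upstream
    healthchecks:
      active:
        http_path: /health          # probe endpoint (assumed)
        healthy:
          interval: 5               # probe every 5 seconds
          successes: 2              # 2 passes mark a target healthy
        unhealthy:
          interval: 5
          http_failures: 3          # 3 HTTP failures mark a target unhealthy
          timeouts: 2
      passive:
        unhealthy:
          http_failures: 5          # circuit-breaker style: eject after 5 observed failures
    targets:
      - target: routes-service:8080 # illustrative target address
```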
Properly configured timeout and retry thresholds prevent requests from hanging indefinitely and help to balance the load on upstream services.
- Set connection, read, and write timeouts based on the expected response times of upstream services
- Avoid overly long timeouts, which can tie up resources, unless required by an architectural design decision
- Configure a reasonable retry threshold to handle transient failures effectively
- Be cautious with high retry counts, as these can amplify load on already failing services
- Improves system responsiveness
- Reduces resource consumption during times of failure
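These settings map directly onto Kong service fields in a decK configuration. A minimal sketch (the service name, URL, and timeout values below are illustrative assumptions):

```yaml
services:
  - name: routes-service
    url: http://routes-service:8080 # illustrative upstream address
    connect_timeout: 2000           # milliseconds
    read_timeout: 5000
    write_timeout: 5000
    retries: 3                      # keep low to avoid amplifying load on failing upstreams
```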
Rate limiting protects upstream services from excessive load, whether caused by spikes in customer demand or by malicious activity from an unknown party.
- Implement the Rate Limiting Advanced plugin to protect backend services from traffic spikes
- Include rate limit headers in responses to let clients know their usage and limits
- Use the Rate Limiting Advanced plugin to enforce quotas for each consumer (services and routes can also have these limits applied)
- Ensures fair usage of the backend services
- Protects upstream services from demand overload, e.g. customer influxes or Distributed Denial of Service (DDoS) attacks
- Enforces usage policies and Service-Level Agreements (SLAs)
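For example, a decK sketch of the Rate Limiting Advanced plugin scoped to a single consumer might look like this (the consumer name, limits, and namespace are illustrative assumptions):

```yaml
plugins:
  - name: rate-limiting-advanced
    consumer: example-consumer      # illustrative consumer
    config:
      limit:
        - 100                       # 100 requests...
      window_size:
        - 60                        # ...per 60-second window
      window_type: sliding
      namespace: consumer_quota     # assumed namespace for shared counters
      hide_client_headers: false    # expose rate limit headers to clients
```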
Caching reduces load on upstream services by serving repeated requests directly from the gateway, without re-querying the upstream.
- Implement the Proxy Cache Advanced plugin to cache responses
- Configure cache keys, TTLs, and invalidation rules
- Use memory caching for quick retrieval of information
- Consider persistent caching with a system like Redis for larger datasets
- Decreases latency of responses
- Reduces load on upstream services for frequently accessed information
- Improves the overall user experience with faster response times
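A minimal decK sketch of the Proxy Cache Advanced plugin using the in-memory strategy (the service name and TTL are illustrative assumptions):

```yaml
plugins:
  - name: proxy-cache-advanced
    service: routes-service         # illustrative service name
    config:
      strategy: memory              # in-memory cache; use 'redis' for persistent caching
      cache_ttl: 300                # seconds before cached entries expire
      response_code:
        - 200                       # only cache successful responses
      content_type:
        - application/json
      request_method:
        - GET                       # cache safe, idempotent requests only
```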
Observability is critical for understanding system behaviour, diagnosing issues, and ensuring that a system’s performance metrics meet Service-Level Agreements (SLAs).
- Enable the Prometheus plugin to collect detailed metrics
- Collect system metrics on attributes such as:
  - Request rates
  - Latencies
  - Error rates
  - Upstream health
- Configure structured logging for easier analysis
- Use log aggregation tooling to centralise logs
- Use observability tools, like Grafana, to visualise metrics
- Set up alerting at the infrastructure and API levels to proactively identify and resolve issues:
  - Infrastructure Alerting: Monitor system resources such as CPU and memory usage
  - API-Level Alerting: Track API performance indicators such as error rates and response times
- Improves issue detection and resolution
- Facilitates capacity planning
- Ensures high observability standards across both infrastructure and application systems
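As a sketch, enabling the Prometheus plugin globally in decK form might look like the following (these field names follow recent Kong Gateway releases and may vary by version):

```yaml
plugins:
  - name: prometheus
    config:
      status_code_metrics: true     # request counts per status code
      latency_metrics: true         # request and upstream latency histograms
      bandwidth_metrics: true       # bytes in/out per service
      upstream_health_metrics: true # health status of upstream targets
```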
Security is integral to reliability; a secure deployment reduces the risk of breaches that can lead to downtime or data loss.
- Use mTLS to secure communication between the Data Plane and the Control Plane
- Rotate certificates regularly
- Implement proper authentication mechanisms such as:
  - API Keys
  - JSON Web Tokens (JWTs)
  - OAuth 2.0
- Input Validation:
  - Sanitise inputs in order to prevent injection attacks
  - Use plugins such as Request Transformer to enforce input policies
- Protects sensitive data
- Ensures compliance with security standards
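As one illustrative example of such an authentication mechanism, a decK sketch of key-based auth might look like this (the service name, consumer, and key are placeholders, not workshop values):

```yaml
plugins:
  - name: key-auth
    service: routes-service         # illustrative service
    config:
      key_names:
        - apikey                    # header or query parameter carrying the key
consumers:
  - username: example-consumer      # placeholder consumer
    keyauth_credentials:
      - key: example-secret-key     # placeholder; never commit real keys to source control
```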
API resilience is essential for maintaining robust and reliable services in the face of failures and unpredictable traffic patterns. By leveraging Kong Gateway's native capabilities and advanced plugins, you can build systems that withstand partial outages, fail gracefully, and recover quickly. With proper configurations, such as circuit breakers, retries, rate limiting, and caching, coupled with enhanced observability and security, your APIs can deliver consistent performance and exceptional user experiences.
Start implementing these strategies today to future-proof your API infrastructure.
For connectivity problems, please see the Troubleshooting document for common issues.
Alternatively, access our Support Page for FAQs as well as the option
to raise your own queries with our Support Team.
This workflow provides a safer and more controlled approach to cleaning up your environment, ensuring essential resources are not removed unintentionally.
Stop all running services and remove associated containers, networks, and volumes for the current project:
```shell
docker-compose down --volumes --remove-orphans
```

Perform a safe clean-up of unused containers and networks without affecting other resources:
- Remove all stopped containers: `docker container prune -f`
- Remove unused networks: `docker network prune -f`
To clean up volumes, list and remove only unused ones:
- List unused volumes: `docker volume ls -qf dangling=true`
- Remove unused volumes selectively: `docker volume rm $(docker volume ls -qf dangling=true)`
Remove files generated by the project safely using git clean:
- Preview untracked files and directories to be removed: `git clean -ndx`
- Clean untracked files and directories: `git clean -fdx`
- Exclude specific files or directories from deletion: `git clean -fdx --exclude=<file_or_dir_to_keep>`
Alternatively, run the below to delete all untracked and unstaged files in one fell swoop:

```shell
for f in $( { git diff --name-only ; git ls-files --other --exclude-standard ; } | grep -v .idea); do
  if [[ -f $f ]]; then
    rm -rf "./$f"
  fi
done
```

After performing the clean-up, verify the environment to ensure no unwanted artifacts remain:
- Check for running containers: `docker ps -a`
- List remaining images, volumes, and networks: `docker images`, `docker volume ls`, `docker network ls`
- Check your working directory: `git status`
Congratulations! You have successfully completed the API Resilience Workshop. If you have any questions or need further assistance, please reach out to the workshop facilitators.
Copyright - Professional Services @ Kong Inc.