This directory contains configurations and tools for monitoring Next.js applications in production environments.
monitoring/
├── prometheus/                      - Prometheus configurations
│   ├── prometheus.yml               - Main Prometheus configuration
│   └── rules/                       - Alerting rules
│       └── nextjs-alerts.yml        - Next.js specific alerting rules
│
├── grafana/                         - Grafana configurations
│   ├── dashboards/                  - Grafana dashboards
│   │   └── nextjs-dashboard.json    - Next.js application dashboard
│   └── datasources/                 - Grafana datasource configurations
│       └── prometheus.yml           - Prometheus datasource
│
├── datadog/                         - Datadog integration
│   └── datadog-values.yaml          - Helm values for Datadog agent
│
└── newrelic/                        - New Relic integration
    └── newrelic-values.yaml         - Helm values for New Relic agent
The Prometheus configuration is set up to scrape metrics from:
- Kubernetes API server
- Kubernetes nodes
- Kubernetes pods
- Next.js application metrics
- cAdvisor for container metrics
- Node Exporter for host metrics
To expose metrics from your Next.js application, you need to:
- Add a metrics endpoint to your Next.js application
- Annotate your Kubernetes pods for Prometheus discovery
Create an API route in your Next.js application at pages/api/metrics.js:
import { Counter, Histogram, collectDefaultMetrics, register } from 'prom-client';

// Collect default Node.js process metrics
collectDefaultMetrics({ prefix: 'nextjs_' });

// Create custom metrics. Counter and Histogram are exported by prom-client
// itself; the default registry (`register`) has no metric constructors.
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const httpRequestDurationSeconds = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.03, 0.1, 0.3, 1, 3, 10]
});

// Example of recording metrics in your application.
// `time` is the request duration in seconds.
export function recordMetrics(req, res, time) {
  // Note: req.url includes the query string, which can produce many distinct label values
  const route = req.url;
  const statusCode = res.statusCode;
  const method = req.method;

  httpRequestsTotal.inc({ method, route, status_code: statusCode });
  httpRequestDurationSeconds.observe(
    { method, route, status_code: statusCode },
    time
  );
}

// Metrics endpoint
export default async function handler(req, res) {
  res.setHeader('Content-Type', register.contentType);
  res.send(await register.metrics());
}

Add the following annotations to the pod template of your Kubernetes deployment (spec.template.metadata), so that they are applied to the pods:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/api/metrics"
    prometheus.io/port: "3000"

The included Grafana dashboard provides:
- Application overview (running pods, CPU/memory usage, network traffic)
- HTTP metrics (request rate by status code, request duration by route)
- Resource usage (CPU and memory usage by pod)
- Next.js specific metrics (render time by page, API route duration)
- Annotations: Pod restarts and deployment unavailability
- Metrics: Both standard Kubernetes metrics and Next.js specific metrics
- Visualization: Time-series graphs and stat panels
- Filtering: By namespace and application
To integrate with Datadog:
- Install the Datadog agent using Helm:
  helm repo add datadog https://helm.datadoghq.com
  helm install datadog -f monitoring/datadog/datadog-values.yaml datadog/datadog
- Configure your Next.js application to send metrics to Datadog:
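import { datadogRum } from '@datadog/browser-rum';

// Runs in the browser. Replace the placeholders with your own RUM application values.
datadogRum.init({
  applicationId: '<DATADOG_APPLICATION_ID>',
  clientToken: '<DATADOG_CLIENT_TOKEN>',
  site: 'datadoghq.com',
  service: 'nextjs-app',
  env: process.env.NODE_ENV,
  version: '1.0.0',
  // Current SDK versions use sessionSampleRate and trackUserInteractions
  // in place of the older sampleRate and trackInteractions options.
  sessionSampleRate: 100,
  trackUserInteractions: true
});

To integrate with New Relic: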
- Install the New Relic agent using Helm:
  helm repo add newrelic https://helm-charts.newrelic.com
  helm install newrelic-bundle -f monitoring/newrelic/newrelic-values.yaml newrelic/nri-bundle
- Configure your Next.js application to send metrics to New Relic:
import newrelic from 'newrelic';

// Instrument your Next.js application.
// This is Express-style middleware (req, res, next), so it applies when Next.js
// runs behind a custom Node.js server rather than as a Next.js middleware file.
export function middleware(req, res, next) {
  const startTime = process.hrtime();

  res.on('finish', () => {
    const [seconds, nanoseconds] = process.hrtime(startTime);
    // Duration in milliseconds
    const duration = seconds * 1000 + nanoseconds / 1000000;

    newrelic.recordMetric(`Custom/Route/${req.url}`, duration);
    newrelic.recordCustomEvent('NextjsRequest', {
      route: req.url,
      method: req.method,
      statusCode: res.statusCode,
      duration
    });
  });

  next();
}

Track the time it takes to render each page:
import { Histogram } from 'prom-client';

const renderDuration = new Histogram({
  name: 'nextjs_render_duration_seconds',
  help: 'Duration of page renders in seconds',
  labelNames: ['page'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 3]
});

export async function getServerSideProps(context) {
  const start = process.hrtime();

  // Your normal getServerSideProps code here.
  // fetchData() stands in for your own data loading; await it so the
  // measured duration covers the asynchronous work.
  const data = await fetchData();

  const [seconds, nanoseconds] = process.hrtime(start);
  const duration = seconds + nanoseconds / 1e9;
  renderDuration.observe({ page: context.resolvedUrl }, duration);

  return { props: { data } };
}

Track the performance of your API routes:
import { Histogram } from 'prom-client';

const apiDuration = new Histogram({
  name: 'nextjs_api_duration_seconds',
  help: 'Duration of API requests in seconds',
  labelNames: ['route', 'method', 'status'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 3]
});

export default async function handler(req, res) {
  const start = process.hrtime();

  // Your API route code here.
  // processRequest() stands in for your own request handling.
  const data = await processRequest(req);
  res.status(200).json(data);

  const [seconds, nanoseconds] = process.hrtime(start);
  const duration = seconds + nanoseconds / 1e9;
  apiDuration.observe(
    { route: req.url, method: req.method, status: res.statusCode },
    duration
  );
}

- Use TLS for all monitoring endpoints
- Implement authentication for Prometheus and Grafana
- Restrict access to monitoring dashboards
- Be careful about exposing sensitive information in metrics
- Set appropriate scrape intervals to balance data granularity and performance
- Use recording rules for complex queries
- Implement rate limiting for metrics endpoints
- Consider the performance impact of instrumentation
- Set up alerts for critical metrics
- Configure proper notification channels
- Implement runbooks for common alerts
- Avoid alert fatigue with proper thresholds
- Cardinality Explosion: Avoid high-cardinality labels in metrics
- Resource Consumption: Monitor the resource usage of your monitoring stack
- Alert Fatigue: Too many alerts can lead to ignoring important ones
- Incomplete Coverage: Ensure all critical components are monitored
- Metric Naming: Use consistent naming conventions for metrics
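
One way to act on the cardinality advice above is to normalize request URLs before using them as a label value. The following is a minimal sketch, not part of the included configuration: `normalizeRoute` is a hypothetical helper, and the `:id` and `:uuid` placeholders are assumptions chosen for illustration. It drops the query string and replaces numeric and UUID path segments, so that many concrete URLs collapse into a small set of route labels.

```javascript
// Hypothetical helper (not provided by prom-client or Next.js):
// collapse concrete URLs into a bounded set of route label values.
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_SEGMENT = /^\d+$/;

function normalizeRoute(url) {
  // Drop the query string and fragment; they rarely belong in a label.
  const path = url.split(/[?#]/, 1)[0];

  // Replace identifier-like segments with fixed placeholders.
  const segments = path.split('/').map((segment) => {
    if (NUMERIC_SEGMENT.test(segment)) return ':id';
    if (UUID_SEGMENT.test(segment)) return ':uuid';
    return segment;
  });

  // Remove trailing slashes so '/users/' and '/users' share one label.
  const normalized = segments.join('/').replace(/\/+$/, '');
  return normalized === '' ? '/' : normalized;
}

console.log(normalizeRoute('/api/users/42?page=2')); // prints /api/users/:id
```

With a helper like this, a label such as `route` could be set from `normalizeRoute(req.url)` instead of `req.url`. Applications with other kinds of dynamic segments (slugs, hashes) would need additional rules.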