Monitoring and Observability for Next.js

This directory contains configurations and tools for monitoring Next.js applications in production environments.

Directory Structure

monitoring/
├── prometheus/ - Prometheus configurations
│   ├── prometheus.yml - Main Prometheus configuration
│   └── rules/ - Alerting rules
│       └── nextjs-alerts.yml - Next.js specific alerting rules
│
├── grafana/ - Grafana configurations
│   ├── dashboards/ - Grafana dashboards
│   │   └── nextjs-dashboard.json - Next.js application dashboard
│   └── datasources/ - Grafana datasource configurations
│       └── prometheus.yml - Prometheus datasource
│
├── datadog/ - Datadog integration
│   └── datadog-values.yaml - Helm values for Datadog agent
│
└── newrelic/ - New Relic integration
    └── newrelic-values.yaml - Helm values for New Relic agent

Prometheus Configuration

The Prometheus configuration is set up to scrape metrics from:

  • Kubernetes API server
  • Kubernetes nodes
  • Kubernetes pods
  • Next.js application metrics
  • cAdvisor for container metrics
  • Node Exporter for host metrics

Scraping Next.js Metrics

To expose metrics from your Next.js application, you need to:

  1. Add a metrics endpoint to your Next.js application
  2. Annotate your Kubernetes pods for Prometheus discovery

Adding a Metrics Endpoint

Create an API route in your Next.js application at pages/api/metrics.js:

import { register, Counter, Histogram, collectDefaultMetrics } from 'prom-client';

// Collect default Node.js process metrics (CPU, memory, event loop, GC)
collectDefaultMetrics({ prefix: 'nextjs_' });

// Create custom metrics. Counter and Histogram are exported by prom-client
// directly; they are not properties of `register`.
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const httpRequestDurationSeconds = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.03, 0.1, 0.3, 1, 3, 10]
});

// Example of recording metrics in your application
export function recordMetrics(req, res, time) {
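  // req.url includes the query string and raw IDs; map it to a route
  // pattern before using it as a label to keep cardinality bounded
  const route = req.url;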
  const statusCode = res.statusCode;
  const method = req.method;
  
  httpRequestsTotal.inc({ method, route, status_code: statusCode });
  httpRequestDurationSeconds.observe(
    { method, route, status_code: statusCode },
    time
  );
}

// Metrics endpoint
export default async function handler(req, res) {
  res.setHeader('Content-Type', register.contentType);
  res.send(await register.metrics());
}
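
One way to call recordMetrics for every API route is to wrap the route handler. The sketch below is illustrative only: withMetrics is a hypothetical helper name, not part of Next.js or prom-client, and it takes the recorder as a parameter (for example the recordMetrics function above) so that the sketch itself has no dependencies.

```javascript
// Hypothetical helper: wraps an API route handler and reports its
// duration, in seconds, to `record(req, res, seconds)`.
function withMetrics(handler, record) {
  return async function instrumentedHandler(req, res) {
    const start = process.hrtime.bigint();
    try {
      return await handler(req, res);
    } finally {
      // Runs whether the handler returned or threw. If it threw,
      // res.statusCode may not yet reflect the final error status.
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      record(req, res, seconds);
    }
  };
}
```

In an API route this could be used as `export default withMetrics(handler, recordMetrics);`, assuming recordMetrics is imported from the metrics module.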

Annotating Kubernetes Pods

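Add the following annotations to the pod template of your Kubernetes Deployment (spec.template.metadata.annotations), so that they end up on the pods themselves. Annotations on the Deployment's own metadata are not copied to its pods. These annotations are a convention, not a built-in Prometheus feature: they only take effect when the Prometheus scrape configuration has relabeling rules that read them.

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/api/metrics"
        prometheus.io/port: "3000"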

Grafana Dashboard

The included Grafana dashboard provides:

  • Application overview (running pods, CPU/memory usage, network traffic)
  • HTTP metrics (request rate by status code, request duration by route)
  • Resource usage (CPU and memory usage by pod)
  • Next.js specific metrics (render time by page, API route duration)

Dashboard Features

  • Annotations: Pod restarts and deployment unavailability
  • Metrics: Both standard Kubernetes metrics and Next.js specific metrics
  • Visualization: Time-series graphs and stat panels
  • Filtering: By namespace and application

Integration with Third-Party Services

Datadog Integration

To integrate with Datadog:

  1. Install the Datadog agent using Helm:
helm repo add datadog https://helm.datadoghq.com
helm install datadog -f monitoring/datadog/datadog-values.yaml datadog/datadog
  2. Configure your Next.js application to send Real User Monitoring (RUM) data from the browser to Datadog:
import { datadogRum } from '@datadog/browser-rum';

datadogRum.init({
  applicationId: '<DATADOG_APPLICATION_ID>',
  clientToken: '<DATADOG_CLIENT_TOKEN>',
  site: 'datadoghq.com',
  service: 'nextjs-app',
  env: process.env.NODE_ENV,
  version: '1.0.0',
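  // Option names for @datadog/browser-rum v5+; older releases used
  // sampleRate and trackInteractions
  sessionSampleRate: 100,
  trackUserInteractions: true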
});

New Relic Integration

To integrate with New Relic:

  1. Install the New Relic agent using Helm:
helm repo add newrelic https://helm-charts.newrelic.com
helm install newrelic-bundle -f monitoring/newrelic/newrelic-values.yaml newrelic/nri-bundle
  2. Configure your Next.js application to send metrics to New Relic:
import newrelic from 'newrelic';

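// Instrument your Next.js application.
// This is Express-style middleware for a custom Next.js server; it is not
// the signature used by Next.js's built-in middleware file.
export function middleware(req, res, next) {
  const startTime = process.hrtime();
  
  res.on('finish', () => {
    const [seconds, nanoseconds] = process.hrtime(startTime);
    // Duration in milliseconds
    const duration = seconds * 1000 + nanoseconds / 1000000;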
    
    newrelic.recordMetric(`Custom/Route/${req.url}`, duration);
    newrelic.recordCustomEvent('NextjsRequest', {
      route: req.url,
      method: req.method,
      statusCode: res.statusCode,
      duration
    });
  });
  
  next();
}

Custom Next.js Metrics

Page Render Time

Track the time each page spends in getServerSideProps, a server-side proxy for render time:

import { Histogram } from 'prom-client';

const renderDuration = new Histogram({
  name: 'nextjs_render_duration_seconds',
  help: 'Duration of page renders in seconds',
  labelNames: ['page'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 3]
});

export async function getServerSideProps(context) {
  const start = process.hrtime();
  
  // Your normal getServerSideProps code here
  // (fetchData is a placeholder for your own data loading)
  const data = await fetchData();
  
  const [seconds, nanoseconds] = process.hrtime(start);
  const duration = seconds + nanoseconds / 1e9;
  
  // resolvedUrl includes the query string; use a fixed page name instead
  // if label cardinality becomes a problem
  renderDuration.observe({ page: context.resolvedUrl }, duration);
  
  return { props: { data } };
}

API Route Performance

Track the performance of your API routes:

import { Histogram } from 'prom-client';

const apiDuration = new Histogram({
  name: 'nextjs_api_duration_seconds',
  help: 'Duration of API requests in seconds',
  labelNames: ['route', 'method', 'status'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 3]
});

export default async function handler(req, res) {
  const start = process.hrtime();
  
  try {
    // Your API route code here
    // (processRequest is a placeholder for your own logic)
    const data = await processRequest(req);
    res.status(200).json(data);
  } finally {
    // Record the duration even if the route code throws
    const [seconds, nanoseconds] = process.hrtime(start);
    const duration = seconds + nanoseconds / 1e9;
    
    apiDuration.observe(
      { route: req.url, method: req.method, status: res.statusCode },
      duration
    );
  }
}

Best Practices

Security

  • Use TLS for all monitoring endpoints
  • Implement authentication for Prometheus and Grafana
  • Restrict access to monitoring dashboards
  • Be careful about exposing sensitive information in metrics

Performance

  • Set appropriate scrape intervals to balance data granularity and performance
  • Use recording rules for complex queries
  • Implement rate limiting for metrics endpoints
  • Consider the performance impact of instrumentation
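
As a rough illustration of rate limiting a metrics endpoint, the sketch below uses a fixed-window counter. The names createRateLimiter and guardMetrics and the limit of 10 requests per minute are assumptions made for this example. The counter is per process and shared by all callers, which may be enough when only Prometheus scrapes each pod; a per-client limiter or a limit enforced at the ingress may suit other setups better.

```javascript
// Hypothetical fixed-window limiter: allows at most `limit` calls per
// `windowMs` milliseconds. `now` is injectable so the logic can be tested.
function createRateLimiter(limit, windowMs, now = Date.now) {
  let windowStart = now();
  let count = 0;
  return function allow() {
    const t = now();
    if (t - windowStart >= windowMs) {
      windowStart = t; // start a new window
      count = 0;
    }
    count += 1;
    return count <= limit;
  };
}

// Example guard for a metrics handler: returns true if the request may
// proceed, otherwise answers 429 so the caller knows to back off.
const allowScrape = createRateLimiter(10, 60000);
function guardMetrics(req, res) {
  if (allowScrape()) return true;
  res.statusCode = 429;
  res.end('Too Many Requests');
  return false;
}
```

Inside the metrics handler, this could be called first, returning early when guardMetrics(req, res) is false.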

Alerting

  • Set up alerts for critical metrics
  • Configure proper notification channels
  • Implement runbooks for common alerts
  • Avoid alert fatigue with proper thresholds

Common Pitfalls

  1. Cardinality Explosion: Avoid high-cardinality labels such as raw URLs, user IDs, or request IDs; every distinct label value creates a new time series
  2. Resource Consumption: Monitor the resource usage of your monitoring stack
  3. Alert Fatigue: Too many alerts can lead to ignoring important ones
  4. Incomplete Coverage: Ensure all critical components are monitored
  5. Metric Naming: Use consistent naming conventions for metrics
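
To make the first pitfall concrete: the examples in this document use req.url as a label value, which includes query strings and raw IDs. A minimal sketch of one mitigation follows. normalizeRoute is a hypothetical helper, and the two patterns it collapses (numeric IDs and UUIDs) are assumptions; adapt them to the URL scheme of your application.

```javascript
// Hypothetical helper: reduces a request URL to a low-cardinality label.
function normalizeRoute(url) {
  // Drop the query string and fragment; they can be unbounded and may
  // carry sensitive values.
  const path = url.split(/[?#]/)[0];
  const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  return path
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (uuid.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizeRoute('/api/users/42?expand=1')); // prints "/api/users/:id"
```

The result can then be passed as the route label in place of req.url.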