[Feature Bounty] System Temperature Monitoring in the Unraid API #1597

@elibosley

Temperature Monitoring System Feature Request

Is your feature request related to a problem?

Currently, the Unraid API provides limited temperature monitoring capabilities. While disk temperatures are available through the DisksService using smartctl, there's no comprehensive temperature monitoring system that covers:

  • CPU temperatures (package and per-core)
  • Motherboard temperatures
  • GPU temperatures
  • NVMe drive temperatures
  • Chipset temperatures
  • System-wide temperature aggregation and alerts

Users need a unified API to monitor all temperature sensors for system health monitoring, alerting, and integration with third-party monitoring solutions like Grafana, Home Assistant, or custom dashboards.

Describe the solution you'd like

A comprehensive temperature monitoring system integrated into the Unraid API that:

  1. Provides real-time temperature data for all available sensors

  2. Supports multiple temperature sources:

    • CPU (package temp, per-core temps)
    • Motherboard sensors (chipset, VRM, ambient)
    • GPU temperatures
    • Storage devices (HDDs, SSDs, NVMe)
    • Custom/additional sensors via IPMI or USB sensors
  3. Features GraphQL queries and subscriptions for:

    • Current temperature readings
    • Historical temperature data (with configurable retention)
    • Temperature alerts and thresholds
    • Real-time temperature updates via subscriptions
  4. Supports multiple monitoring tools:

    • lm-sensors for motherboard/CPU sensors
    • smartctl for disk temperatures (already partially implemented)
    • nvidia-smi for NVIDIA GPU temperatures
    • ipmitool for IPMI sensors (if available)
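
The "historical temperature data (with configurable retention)" item could be backed by a small in-memory buffer that drops readings older than the retention window. The following is only a sketch: `TemperatureHistory`, `record`, and `since` are hypothetical names, and the retention value is assumed to be in seconds, as the `history_retention: 86400` setting in the configuration section suggests.

```typescript
// Hypothetical in-memory history with age-based retention (not existing Unraid API code).
interface Reading {
  sensorId: string;
  value: number; // degrees Celsius
  timestamp: number; // epoch milliseconds
}

class TemperatureHistory {
  private readings: Reading[] = [];
  private readonly retentionSeconds: number;

  // retentionSeconds is assumed to map to a setting such as `history_retention`.
  constructor(retentionSeconds: number) {
    this.retentionSeconds = retentionSeconds;
  }

  // Append a reading, then drop everything older than the retention window.
  record(reading: Reading): void {
    this.readings.push(reading);
    this.prune(reading.timestamp);
  }

  // Return readings for one sensor taken at or after `fromTimestamp`.
  since(sensorId: string, fromTimestamp: number): Reading[] {
    return this.readings.filter(
      (r) => r.sensorId === sensorId && r.timestamp >= fromTimestamp
    );
  }

  private prune(now: number): void {
    const cutoff = now - this.retentionSeconds * 1000;
    this.readings = this.readings.filter((r) => r.timestamp >= cutoff);
  }
}

// Usage: keep one minute of data; the reading at t=0 falls out once t=90s arrives.
const history = new TemperatureHistory(60);
history.record({ sensorId: 'cpu', value: 40, timestamp: 0 });
history.record({ sensorId: 'cpu', value: 45, timestamp: 30_000 });
history.record({ sensorId: 'cpu', value: 50, timestamp: 90_000 });
const kept = history.since('cpu', 0);
```

A persistent store would be needed if history has to survive API restarts; this sketch only illustrates the retention rule.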

Describe alternatives you've considered

  1. External monitoring stacks (Telegraf/InfluxDB/Grafana) - Requires additional containers and configuration complexity
  2. SNMP monitoring - Limited sensor support and requires additional SNMP configuration
  3. Dynamix System Temp plugin - WebGUI only, no API access
  4. Custom scripts - No standardized API, difficult to maintain

Additional context

Binary Management Strategy

IMPORTANT: Temperature monitoring binaries should be downloaded from safe sources during the plugin build process and included in the plugin's TXZ package:

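// Enhancement to the plugin build process
// Location: plugin/builder/build-txz.ts

import * as crypto from 'node:crypto';
import * as fs from 'node:fs/promises';
import { join } from 'node:path';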

// Add function to download monitoring tools during build
const downloadMonitoringTools = async (targetDir: string) => {
  console.log("Downloading temperature monitoring tools from safe sources...");
  
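  // Illustrative entries: the checksums are placeholders, and each URL should be
  // confirmed against the project's official release page before use.
  const tools = [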
    {
      name: 'sensors',
      url: 'https://github.com/lm-sensors/lm-sensors/releases/download/v3.6.0/sensors-3.6.0-x86_64',
      sha256: 'abc123...', // Verify integrity
    },
    {
      name: 'smartctl', 
      url: 'https://sourceforge.net/projects/smartmontools/files/smartmontools/7.4/smartctl-7.4-x86_64',
      sha256: 'def456...', // Verify integrity
    },
    {
      name: 'nvidia-smi',
      url: 'https://developer.nvidia.com/downloads/nvidia-smi-545.29.06-x86_64',
      sha256: 'ghi789...', // Verify integrity
    }
  ];

  const monitoringDir = join(targetDir, 'usr/local/emhttp/plugins/unraid-api/monitoring');
  await fs.mkdir(monitoringDir, { recursive: true });

  for (const tool of tools) {
    console.log(`Downloading ${tool.name}...`);
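    const response = await fetch(tool.url);
    if (!response.ok) {
      // Fail early so an HTTP error page is never checksummed or written to disk
      throw new Error(`Download failed for ${tool.name}: HTTP ${response.status}`);
    }
    const buffer = await response.arrayBuffer();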
    
    // Verify SHA256 checksum
    const hash = crypto.createHash('sha256');
    hash.update(Buffer.from(buffer));
    if (hash.digest('hex') !== tool.sha256) {
      throw new Error(`Checksum verification failed for ${tool.name}`);
    }
    
    // Save binary
    const toolPath = join(monitoringDir, tool.name);
    await fs.writeFile(toolPath, Buffer.from(buffer));
    await fs.chmod(toolPath, 0o755);
    
    console.log(`✓ ${tool.name} downloaded and verified`);
  }
};

// Call during TXZ build process
await downloadMonitoringTools(sourceDir);

This approach:

  • Downloads binaries from trusted, official sources during build time
  • Verifies integrity using SHA256 checksums
  • Includes binaries in the plugin TXZ package
  • Ensures consistent versions across all installations
  • Avoids runtime downloads or system package dependencies
  • Maintains security by verifying all downloaded binaries

Integration with Existing Metrics Module

Temperature monitoring should be integrated into the existing MetricsResolver rather than creating a separate module:

// Extend existing MetricsResolver
// Location: api/src/unraid-api/graph/resolvers/metrics/

metrics/
├── metrics.module.ts              # Update to include temperature service
├── metrics.resolver.ts            # Extend with temperature fields
├── metrics.model.ts               # Add temperature types
├── temperature/
│   ├── temperature.service.ts     # Core temperature service
│   ├── temperature.model.ts       # Temperature-specific models
│   └── sensors/
│       ├── sensor.interface.ts
│       ├── lm-sensors.service.ts
│       ├── smartctl.service.ts
│       └── gpu.service.ts
└── __tests__/
    └── temperature.service.spec.ts
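
The layout above lists a `sensor.interface.ts` without showing its contents. One possible shape is sketched here, together with an aggregation helper that tolerates missing or failing tools. Every name in it (`TemperatureSensorProvider`, `RawReading`, `readAll`) is an assumption made for illustration, not existing Unraid API code.

```typescript
// Hypothetical contents for sensors/sensor.interface.ts.
interface RawReading {
  id: string;
  name: string;
  celsius: number;
}

interface TemperatureSensorProvider {
  readonly id: string;
  isAvailable(): Promise<boolean>;
  read(): Promise<RawReading[]>;
}

// Collect readings from every available provider. A missing or failing tool
// contributes nothing instead of failing the whole request.
async function readAll(providers: TemperatureSensorProvider[]): Promise<RawReading[]> {
  const results = await Promise.allSettled(
    providers.map(async (p) => ((await p.isAvailable()) ? p.read() : []))
  );
  return results.flatMap((r) => (r.status === 'fulfilled' ? r.value : []));
}

// Stub providers standing in for real tools.
const fakeCpu: TemperatureSensorProvider = {
  id: 'lm-sensors',
  isAvailable: async () => true,
  read: async () => [{ id: 'cpu0', name: 'Package id 0', celsius: 47 }],
};
const missingGpu: TemperatureSensorProvider = {
  id: 'nvidia-smi',
  isAvailable: async () => false,
  read: async () => {
    throw new Error('nvidia-smi not installed');
  },
};
const brokenDisk: TemperatureSensorProvider = {
  id: 'smartctl',
  isAvailable: async () => true,
  read: async () => {
    throw new Error('smartctl exited with an error');
  },
};
```

Calling `readAll([fakeCpu, missingGpu, brokenDisk])` resolves with only the CPU reading. A real implementation would probably also log the rejected results rather than discard them silently.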

NestJS Model Extensions (Integrated with Metrics)

// Location: api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.model.ts

import { Field, Float, Int, ObjectType, registerEnumType } from '@nestjs/graphql';
import { Node } from '@unraid/shared/graphql.model.js';
import { IsEnum, IsNumber, IsOptional, IsString } from 'class-validator';

export enum TemperatureUnit {
  CELSIUS = 'CELSIUS',
  FAHRENHEIT = 'FAHRENHEIT',
}

registerEnumType(TemperatureUnit, {
  name: 'TemperatureUnit',
});

export enum TemperatureStatus {
  NORMAL = 'NORMAL',
  WARNING = 'WARNING',
  CRITICAL = 'CRITICAL',
  UNKNOWN = 'UNKNOWN',
}

registerEnumType(TemperatureStatus, {
  name: 'TemperatureStatus',
});

export enum SensorType {
  CPU_PACKAGE = 'CPU_PACKAGE',
  CPU_CORE = 'CPU_CORE',
  MOTHERBOARD = 'MOTHERBOARD',
  CHIPSET = 'CHIPSET',
  GPU = 'GPU',
  DISK = 'DISK',
  NVME = 'NVME',
  AMBIENT = 'AMBIENT',
  VRM = 'VRM',
  CUSTOM = 'CUSTOM',
}

registerEnumType(SensorType, {
  name: 'SensorType',
  description: 'Type of temperature sensor',
});

@ObjectType()
export class Temperature {
  @Field(() => Float, { description: 'Temperature value' })
  @IsNumber()
  value!: number;

  @Field(() => TemperatureUnit, { description: 'Temperature unit' })
  @IsEnum(TemperatureUnit)
  unit!: TemperatureUnit;

  @Field(() => Date, { description: 'Timestamp of reading' })
  timestamp!: Date;

  @Field(() => TemperatureStatus, { description: 'Temperature status' })
  @IsEnum(TemperatureStatus)
  status!: TemperatureStatus;
}

@ObjectType({ implements: () => Node })
export class TemperatureSensor extends Node {
  @Field(() => String, { description: 'Sensor name' })
  @IsString()
  name!: string;

  @Field(() => SensorType, { description: 'Type of sensor' })
  @IsEnum(SensorType)
  type!: SensorType;

  @Field(() => String, { nullable: true, description: 'Physical location' })
  @IsOptional()
  @IsString()
  location?: string;

  @Field(() => Temperature, { description: 'Current temperature' })
  current!: Temperature;

  @Field(() => Temperature, { nullable: true, description: 'Minimum recorded' })
  @IsOptional()
  min?: Temperature;

  @Field(() => Temperature, { nullable: true, description: 'Maximum recorded' })
  @IsOptional()
  max?: Temperature;

  @Field(() => Float, { nullable: true, description: 'Warning threshold' })
  @IsOptional()
  @IsNumber()
  warning?: number;

  @Field(() => Float, { nullable: true, description: 'Critical threshold' })
  @IsOptional()
  @IsNumber()
  critical?: number;
}

@ObjectType()
export class TemperatureSummary {
  @Field(() => Float, { description: 'Average temperature across all sensors' })
  @IsNumber()
  average!: number;

  @Field(() => TemperatureSensor, { description: 'Hottest sensor' })
  hottest!: TemperatureSensor;

  @Field(() => TemperatureSensor, { description: 'Coolest sensor' })
  coolest!: TemperatureSensor;

  @Field(() => Int, { description: 'Count of sensors at warning level' })
  @IsNumber()
  warningCount!: number;

  @Field(() => Int, { description: 'Count of sensors at critical level' })
  @IsNumber()
  criticalCount!: number;
}

@ObjectType({ implements: () => Node })
export class TemperatureMetrics extends Node {
  @Field(() => [TemperatureSensor], { description: 'All temperature sensors' })
  sensors!: TemperatureSensor[];

  @Field(() => TemperatureSummary, { description: 'Temperature summary' })
  summary!: TemperatureSummary;
}

// Extend existing Metrics model
// Location: api/src/unraid-api/graph/resolvers/metrics/metrics.model.ts

import { TemperatureMetrics } from './temperature/temperature.model.js';

@ObjectType({ implements: () => Node })
export class Metrics extends Node {
  // ... existing fields ...

  @Field(() => TemperatureMetrics, { 
    nullable: true, 
    description: 'Temperature metrics' 
  })
  temperature?: TemperatureMetrics;
}
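
The `TemperatureSummary` fields can be derived from the sensor list in a single pass. The sketch below uses plain interfaces instead of the decorated NestJS classes so that it runs on its own; `summarize` and the simplified types are hypothetical.

```typescript
// Simplified stand-ins for the models above (assumed shapes, no decorators).
type Status = 'NORMAL' | 'WARNING' | 'CRITICAL' | 'UNKNOWN';

interface SensorReading {
  name: string;
  value: number; // all values are assumed to share one unit
  status: Status;
}

interface Summary {
  average: number;
  hottest: SensorReading;
  coolest: SensorReading;
  warningCount: number;
  criticalCount: number;
}

// Returns null when there are no sensors, because hottest/coolest would be undefined.
function summarize(sensors: SensorReading[]): Summary | null {
  if (sensors.length === 0) return null;
  const total = sensors.reduce((sum, s) => sum + s.value, 0);
  return {
    average: total / sensors.length,
    hottest: sensors.reduce((a, b) => (b.value > a.value ? b : a)),
    coolest: sensors.reduce((a, b) => (b.value < a.value ? b : a)),
    warningCount: sensors.filter((s) => s.status === 'WARNING').length,
    criticalCount: sensors.filter((s) => s.status === 'CRITICAL').length,
  };
}

const summary = summarize([
  { name: 'Disk', value: 40, status: 'NORMAL' },
  { name: 'CPU', value: 60, status: 'WARNING' },
  { name: 'GPU', value: 80, status: 'CRITICAL' },
]);
```

Because `hottest` and `coolest` are non-nullable in the model, a system with no sensors has no valid summary. Returning `null` for the whole `temperature` field (which is already nullable) is one way to handle that case.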

Binary/Tool Setup Requirements

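// temperature.service.ts - Use plugin-bundled binaries
import { join } from 'path';
import { Injectable, Logger, OnModuleInit } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { execa } from 'execa';

@Injectable()
export class TemperatureService implements OnModuleInit {
  private readonly logger = new Logger(TemperatureService.name);
  private readonly binPath: string;
  private availableTools: Map<string, string> = new Map();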
  
  constructor(private readonly configService: ConfigService) {
    // Use binaries bundled with the plugin
    this.binPath = this.configService.get(
      'API_MONITORING_BIN_PATH', 
      '/usr/local/emhttp/plugins/unraid-api/monitoring'
    );
  }
  
  async onModuleInit() {
    // Use bundled binaries instead of system tools
    await this.initializeBundledTools();
    
    // Initialize sensor detection for available tools
    if (this.availableTools.has('sensors')) {
      await this.initializeLmSensors();
    }
    
    if (this.availableTools.has('smartctl')) {
      // Already available through DisksService
    }
    
    if (this.availableTools.has('nvidia-smi')) {
      await this.initializeNvidiaMonitoring();
    }
  }
  
  private async initializeBundledTools(): Promise<void> {
    const tools = [
      'sensors',     // lm-sensors
      'smartctl',    // smartmontools
      'nvidia-smi',  // NVIDIA driver
      'ipmitool',    // IPMI tools
    ];
    
    for (const tool of tools) {
      const toolPath = join(this.binPath, tool);
      try {
        await execa(toolPath, ['--version']);
        this.availableTools.set(tool, toolPath);
        this.logger.log(`Temperature tool available: ${tool} at ${toolPath}`);
      } catch {
        this.logger.warn(`Temperature tool not found: ${tool}`);
      }
    }
  }
  
  // Use bundled binary paths for all executions
  private async execTool(toolName: string, args: string[]): Promise<string> {
    const toolPath = this.availableTools.get(toolName);
    if (!toolPath) {
      throw new Error(`Tool ${toolName} not available`);
    }
    const { stdout } = await execa(toolPath, args);
    return stdout;
  }
}
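
Once `execTool('sensors', ['-j'])` returns, its output still has to be turned into readings. A minimal parsing sketch follows, assuming an lm-sensors version that supports the `-j` JSON output and the usual chip / feature / subfeature nesting. `parseSensorsJson` and `LmSensorReading` are hypothetical names, and the sample data is made up.

```typescript
// `sensors -j` nests values as chip -> feature -> subfeature -> number.
// Each chip also carries an "Adapter" string, so values are typed loosely.
type SensorsJson = Record<string, Record<string, unknown>>;

interface LmSensorReading {
  chip: string;
  label: string;
  celsius: number;
}

function parseSensorsJson(stdout: string): LmSensorReading[] {
  const data = JSON.parse(stdout) as SensorsJson;
  const readings: LmSensorReading[] = [];
  for (const [chip, features] of Object.entries(data)) {
    for (const [label, subfeatures] of Object.entries(features)) {
      // Skip non-object entries such as the "Adapter" description.
      if (typeof subfeatures !== 'object' || subfeatures === null) continue;
      for (const [key, value] of Object.entries(subfeatures)) {
        // Keep only temperature inputs; fan and voltage inputs are ignored.
        if (/^temp\d+_input$/.test(key) && typeof value === 'number') {
          readings.push({ chip, label, celsius: value });
        }
      }
    }
  }
  return readings;
}

// Made-up sample resembling the output for an Intel CPU plus one fan.
const sample = JSON.stringify({
  'coretemp-isa-0000': {
    Adapter: 'ISA adapter',
    'Package id 0': { temp1_input: 45.0, temp1_max: 100.0 },
    'Core 0': { temp2_input: 43.0, temp2_max: 100.0 },
  },
  'nct6798-isa-0290': {
    Adapter: 'ISA adapter',
    fan1: { fan1_input: 1200 },
  },
});
const parsed = parseSensorsJson(sample);
```

Mapping a label such as `Package id 0` to `SensorType.CPU_PACKAGE` is left out here; that mapping depends on the chip driver and would need its own rules.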

Integration with Existing MetricsResolver

// Extend existing MetricsResolver (don't create separate resolver)
// Location: api/src/unraid-api/graph/resolvers/metrics/metrics.resolver.ts

@Resolver(() => Metrics)
export class MetricsResolver implements OnModuleInit {
  constructor(
    private readonly cpuService: CpuService,
    private readonly memoryService: MemoryService,
    private readonly temperatureService: TemperatureService, // Add temperature service
    private readonly subscriptionTracker: SubscriptionTrackerService,
    private readonly subscriptionHelper: SubscriptionHelperService
  ) {}

  onModuleInit() {
    // Existing CPU and Memory polling...
    
    // Add temperature polling with 5 second interval
    this.subscriptionTracker.registerTopic(
      PUBSUB_CHANNEL.TEMPERATURE_METRICS,
      async () => {
        const payload = await this.temperatureService.getMetrics();
        pubsub.publish(PUBSUB_CHANNEL.TEMPERATURE_METRICS, { 
          systemMetricsTemperature: payload 
        });
      },
      5000
    );
  }

  // Add temperature field to Metrics type
  @ResolveField(() => TemperatureMetrics, { nullable: true })
  public async temperature(): Promise<TemperatureMetrics> {
    return this.temperatureService.getMetrics();
  }

  // Add temperature subscription following existing pattern
  @Subscription(() => TemperatureMetrics, {
    name: 'systemMetricsTemperature',
    resolve: (value) => value.systemMetricsTemperature,
  })
  @UsePermissions({
    action: AuthActionVerb.READ,
    resource: Resource.INFO,
    possession: AuthPossession.ANY,
  })
  public async systemMetricsTemperatureSubscription() {
    return this.subscriptionHelper.createTrackedSubscription(
      PUBSUB_CHANNEL.TEMPERATURE_METRICS
    );
  }
}

// Update MetricsModule
@Module({
  imports: [ServicesModule],
  providers: [
    MetricsResolver,
    CpuService,
    MemoryService,
    TemperatureService, // Add temperature service
  ],
  exports: [MetricsResolver],
})
export class MetricsModule {}

Configuration Options

// api/dev/configs/api.json additions
{
  "temperature": {
    "enabled": true,
    "polling_interval": 5000,
    "history_retention": 86400,
    "default_unit": "celsius",
    "thresholds": {
      "cpu_warning": 70,
      "cpu_critical": 85,
      "disk_warning": 50,
      "disk_critical": 60,
      "gpu_warning": 80,
      "gpu_critical": 90
    },
    "sensors": {
      "lm_sensors": {
        "enabled": true,
        "config_path": "/etc/sensors3.conf"
      },
      "smartctl": {
        "enabled": true
      },
      "gpu": {
        "enabled": true,
        "nvidia": true,
        "amd": false
      },
      "ipmi": {
        "enabled": false,
        "host": "localhost",
        "username": "",
        "password": ""
      }
    }
  }
}
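
The warning and critical thresholds above map naturally onto the `TemperatureStatus` values. A minimal sketch of that mapping follows; `classify` and `Thresholds` are hypothetical, and treating a reading equal to the threshold as exceeding it is an assumption.

```typescript
type TemperatureStatus = 'NORMAL' | 'WARNING' | 'CRITICAL' | 'UNKNOWN';

interface Thresholds {
  warning?: number;
  critical?: number;
}

// Map a reading to a status; a missing or invalid reading is UNKNOWN.
function classify(value: number | undefined, t: Thresholds): TemperatureStatus {
  if (value === undefined || Number.isNaN(value)) return 'UNKNOWN';
  if (t.critical !== undefined && value >= t.critical) return 'CRITICAL';
  if (t.warning !== undefined && value >= t.warning) return 'WARNING';
  return 'NORMAL';
}

// Thresholds taken from cpu_warning / cpu_critical in the config above.
const cpuThresholds: Thresholds = { warning: 70, critical: 85 };
const statuses = [69.9, 70, 85, undefined].map((v) => classify(v, cpuThresholds));
```

Checking the critical threshold first keeps the function correct even when a reading exceeds both limits.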

Environment (if relevant)

Unraid OS Version: 6.12+ (requires lm-sensors support)

Pre-submission Checklist

  • I have searched existing issues to ensure this feature hasn't already been requested
  • This is not an Unraid Connect related feature
  • I have provided clear examples and implementation details for the feature

Bounty Development Guidelines

For developers interested in implementing this feature:

  1. Download binaries during plugin build - Use build-txz.ts to fetch from safe sources
  2. Integrate with existing MetricsResolver - Don't create a separate temperature module
  3. Start with the core TemperatureService that aggregates data from bundled tools
  4. Implement lm-sensors integration first as it provides the most sensor coverage
  5. Enhance the existing DisksService temperature monitoring with history tracking
  6. Add GPU temperature support (NVIDIA first, AMD if feasible)
  7. Extend MetricsResolver with temperature fields and subscriptions
  8. Add comprehensive unit tests for all services
  9. Document the API endpoints and configuration options
  10. Consider performance - use caching where appropriate to avoid excessive tool invocations
  11. Follow existing patterns in the codebase (especially systemMetricsCpu/systemMetricsMemory)
  12. Make the feature modular - gracefully handle missing tools
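
For guideline 10, one simple option is a time-based cache in front of each tool invocation, so that a query and a subscription tick arriving close together share one process spawn. This is a sketch under assumptions: `cached` is a hypothetical helper, and the injectable clock exists only to make the behavior easy to test.

```typescript
// Wrap an async loader so results are reused for `ttlMs` milliseconds.
function cached<T>(
  ttlMs: number,
  load: () => Promise<T>,
  now: () => number = Date.now
): () => Promise<T> {
  let value: Promise<T> | undefined;
  let expiresAt = 0;

  return () => {
    const current = now();
    if (value !== undefined && current < expiresAt) return value;

    // Cache the promise itself so concurrent callers share one invocation.
    const pending = load();
    value = pending;
    expiresAt = current + ttlMs;
    // Do not keep a failed result around; the next call retries.
    pending.catch(() => {
      if (value === pending) value = undefined;
    });
    return pending;
  };
}

// Usage with a fake clock and a counter standing in for a tool invocation.
let calls = 0;
let clock = 0;
const getTemps = cached(5000, async () => ++calls, () => clock);
```

With the fake clock at 0, two consecutive calls to `getTemps()` trigger a single load. Advancing `clock` to 5000 makes the next call load again.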

Binary Management

The plugin build process (build-txz.ts) should:

  • Download monitoring tools from official, trusted sources
  • Verify SHA256 checksums for security
  • Include binaries in the plugin TXZ at /usr/local/emhttp/plugins/unraid-api/monitoring/
  • Set proper executable permissions
  • Ensure compatibility across different Unraid versions

Testing Requirements

  • Unit tests for all services
  • Integration tests for GraphQL resolvers
  • Mock data for systems without temperature sensors
  • Performance tests to ensure polling doesn't impact system performance
  • Test with bundled binaries on different Unraid versions

Deliverables

  1. Temperature service implementation within metrics module
  2. Enhancement to build-txz.ts for downloading monitoring tools
  3. NestJS models with GraphQL decorators for temperature types
  4. Unit and integration tests
  5. Documentation (API docs and configuration guide)
  6. Example GraphQL queries for temperature data
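
As a starting point for deliverable 6, the documents below select fields defined in the proposed models. The subscription name `systemMetricsTemperature` comes from the resolver above; the root `metrics` query field and the `buildRequestBody` helper are assumptions made for illustration.

```typescript
// Query document; the root `metrics` field is an assumption about the schema.
const TEMPERATURE_QUERY = /* GraphQL */ `
  query Temperatures {
    metrics {
      temperature {
        sensors {
          name
          type
          location
          current { value unit status timestamp }
          warning
          critical
        }
        summary {
          average
          warningCount
          criticalCount
          hottest { name current { value } }
          coolest { name current { value } }
        }
      }
    }
  }
`;

// Subscription document using the name registered in the resolver.
const TEMPERATURE_SUBSCRIPTION = /* GraphQL */ `
  subscription TemperatureUpdates {
    systemMetricsTemperature {
      summary { average warningCount criticalCount }
      sensors { name current { value status } }
    }
  }
`;

// Build a standard GraphQL-over-HTTP JSON request body.
function buildRequestBody(query: string, variables: Record<string, unknown> = {}): string {
  return JSON.stringify({ query, variables });
}

const body = buildRequestBody(TEMPERATURE_QUERY);
```

The resulting `body` can be sent as a POST to the API's GraphQL endpoint with the appropriate authentication; subscriptions need a WebSocket client instead of a plain HTTP request.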

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Projects

    Status

    Unclaimed

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
