Failed SSD leads to UNKNOWN instead of CRITICAL #110

@robert-scheck

First of all, thank you very much for check_smart. Unfortunately, it does not seem to detect an entirely failed SSD.

Example of a working SSD:

$ /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme0n1 -i nvme
OK: Drive  ATP NVMe M.2 2280 SSD S/N 12345678-901234: no SMART errors detected. |Temperature=36 Available_Spare=100 Available_Spare_Threshold=10 Percentage_Used=2 Data_Units_Read=78184185 Data_Units_Written=17963617 Host_Read_Commands=277085817 Host_Write_Commands=1235704305 Controller_Busy_Time=43946 Power_Cycles=33 Power_On_Hours=21848 Unsafe_Shutdowns=12 Media_and_Data_Integrity_Errors=0 Error_Information_Log_Entries=0 Warning__Comp_Temperature_Time=0 Critical_Comp_Temperature_Time=0 Temperature_Sensor_1=39 Temperature_Sensor_2=36 Temperature_Sensor_3=45 Temperature_Sensor_4=36 Temperature_Sensor_5=36 Temperature_Sensor_6=34

Example of a non-working SSD (entirely failed):

$ /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme1n1 -i nvme
UNKNOWN: Drive  S/N :  No health status line found, |

Querying the non-working SSD directly with smartctl gives:

$ smartctl -a /dev/nvme1n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.14.0-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error
$ smartctl -x /dev/nvme1n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.14.0-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error 

However, the block device node still exists:

$ ls -l /dev/nvme1n1
brw-rw---- 1 root disk 259, 1 May 22 14:45 /dev/nvme1n1

Did I overlook an option that reports an entirely failed SSD as CRITICAL instead of UNKNOWN? Or is there a way to catch "NVME_IOCTL_ADMIN_CMD: Input/output error" and treat it as an error?
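
To illustrate the behaviour being asked for, here is a minimal sketch of a pre-check that runs smartctl and maps the I/O error to a CRITICAL result. It is not part of check_smart: the names `classify` and `run_check` are hypothetical, and treating any "Input/output error" line in the smartctl output as fatal is an assumption based only on the output shown above. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from __future__ import annotations

import subprocess

# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Marker taken from the smartctl output in this report.
# Assumption: any line containing it means the drive is unreachable.
IO_ERROR_MARKER = "Input/output error"


def classify(smartctl_output: str) -> tuple[int, str]:
    """Map raw smartctl text to a (Nagios exit code, message) pair."""
    for line in smartctl_output.splitlines():
        if IO_ERROR_MARKER in line:
            return CRITICAL, f"CRITICAL: {line.strip()}"
    return OK, "OK: smartctl could query the device"


def run_check(device: str) -> tuple[int, str]:
    """Run `smartctl -a DEVICE` and classify whatever it prints."""
    try:
        proc = subprocess.run(
            ["smartctl", "-a", device],
            capture_output=True,
            text=True,
            check=False,  # a non-zero smartctl status is expected for a failed drive
        )
    except OSError as exc:
        # smartctl missing or not executable: the state really is unknown.
        return UNKNOWN, f"UNKNOWN: cannot run smartctl: {exc}"
    return classify(proc.stdout + proc.stderr)
```

A wrapper script could call `run_check("/dev/nvme1n1")`, print the message, and exit with the returned code, falling through to check_smart.pl only when the result is OK. A fix inside check_smart itself would of course be preferable to an external wrapper.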

Just let me know if you need further details and/or command outputs from the entirely failed SSD.
