Solid-state drives (SSDs) are increasingly used in ESXi hosts for caching (vFlash Read Cache, PernixData FVP), Virtual SAN, or plain datastores. Unfortunately, each SSD cell survives only a limited number of write cycles, ranging from about 1,000 in consumer TLC SSDs up to 100,000 in enterprise SLC-based SSDs. Lifetime can be estimated from the TBW (terabytes written) rating that the vendor provides in the device specification. It describes how many terabytes can be written to the entire device before the warranty expires.
As VMware does not provide a convenient way to read raw S.M.A.R.T. values on ESXi hosts, smartctl, which is part of smartmontools, has been ported to ESXi.
Below is an example of what an ESXi host reports without smartctl. The analyzed device is a Samsung SSD 850 EVO M.2 250GB used as a local datastore. The warranty for this SSD covers 75 TBW.
ESXCLI can display S.M.A.R.T. stats with the following command:
esxcli storage core device smart get -d [device]
# esxcli storage core device smart get -d t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              N/A    N/A        N/A
Power-on Hours                99     0          99
Power Cycle Count             99     0          99
Reallocated Sector Count      100    10         100
Raw Read Error Rate           N/A    N/A        N/A
Drive Temperature             N/A    N/A        N/A
Driver Rated Max Temperature  49     0          34
Write Sectors TOT Count       100    0          100
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
The next table shows the stats provided by ESXCLI, which are a bit more verbose.
# esxcli storage core device stats get -d t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
Device: t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
Successful Commands: 93483233
Blocks Read: 205579211
Blocks Written: 2123298938
Read Operations: 3240880
Write Operations: 90144369
Reserve Operations: 39107
Reservation Conflicts: 0
Failed Commands: 22
Failed Blocks Read: 0
Failed Blocks Written: 0
Failed Read Operations: 0
Failed Write Operations: 0
Failed Reserve Operations: 0
ESXi keeps track of all read and write operations to the disk, but the counters get reset when ESXi is rebooted.
And here is the report by smartctl:
# smartctl -d sat --all /dev/disks/t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.0.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 850 EVO M.2 250GB
Serial Number: S24BNXAG805065D
LU WWN Device Id: 5 002538 d404b9f9f
Firmware Version: EMT21B6Q
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed May 16 15:25:26 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
[...]
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always   -           5039
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always   -           35
177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always   -           122
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always   -           0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always   -           0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always   -           0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always   -           0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0032   049   034   000    Old_age   Always   -           51
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always   -           0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always   -           0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always   -           26
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always   -           6343034492
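The attribute table above can also be read by a script rather than by eye. The following Python sketch is illustrative only: the function name parse_smart_attributes is hypothetical, and it assumes the whitespace-separated ten-column layout shown in the report (ID#, name, flag, value, worst, threshold, type, updated, when-failed, raw value).

```python
def parse_smart_attributes(report):
    """Parse the 'ID# ATTRIBUTE_NAME ...' table from smartctl output.

    Hypothetical helper: assumes ten whitespace-separated columns per row,
    as in the report shown in this article.
    """
    attributes = {}
    in_table = False
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the table
            continue
        if fields[0] == "ID#":
            in_table = True   # header row found, data rows follow
            continue
        # Data rows start with a numeric attribute ID
        if in_table and len(fields) >= 10 and fields[0].isdigit():
            attributes[int(fields[0])] = {
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],  # kept as text; convert as needed
            }
    return attributes


sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 5039
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 6343034492
"""
attrs = parse_smart_attributes(sample)
print(attrs[241]["name"], attrs[241]["raw"])
```

The raw value is kept as a string because some attributes append extra text to it; for ID #241 it can be passed to int() directly.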
In the SMART attributes section, there is a Total_LBAs_Written value (ID #241). To convert it to terabytes, multiply this value by the sector size (512 bytes) and divide by 1099511627776 (1024^4):
Total_LBAs_Written * Sector Size / 1024^4 = TBW
6343034492 * 512 / 1099511627776 = 2.95 TBW
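The same arithmetic as a small Python sketch. The function name lbas_to_tbw is hypothetical; the 512-byte sector size is taken from the "Sector Size" line of the smartctl report and would need adjusting for other devices.

```python
SECTOR_SIZE_BYTES = 512      # logical sector size reported by smartctl
BYTES_PER_TB = 1024 ** 4     # 1099511627776, the divisor used in this article


def lbas_to_tbw(total_lbas_written, sector_size=SECTOR_SIZE_BYTES):
    """Convert the raw Total_LBAs_Written value (SMART ID #241) to terabytes."""
    return total_lbas_written * sector_size / BYTES_PER_TB


tbw = lbas_to_tbw(6343034492)
print(f"{tbw:.2f} TBW")  # prints "2.95 TBW"
```

Vendors often quote TBW in decimal terabytes (10^12 bytes). Under that assumption the same raw value comes to about 3.25 TB, which does not change the picture for a 75 TBW rating.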
That is about 3 of the 75 TBW covered by the warranty. Power_On_Hours (SMART ID #9) reports 5039 hours, so the device has been in use for roughly 200 days. At this write rate, the SSD should last about another 13 years before it reaches its rated TBW.
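The projection can be sketched in Python as well. This is a rough estimate under two assumptions that are not guaranteed in practice: the write rate stays constant, and the power-on hours cover the whole period in which the data was written. The function name remaining_years is hypothetical.

```python
def remaining_years(tbw_written, tbw_rated, power_on_hours):
    """Estimate the years left until the rated TBW is reached.

    Assumes a constant write rate over the drive's power-on time.
    """
    days_in_use = power_on_hours / 24
    tb_per_day = tbw_written / days_in_use
    remaining_days = (tbw_rated - tbw_written) / tb_per_day
    return remaining_days / 365


# Exact figures from the smartctl report: 2.95 TBW after 5039 hours
exact = remaining_years(2.95, 75, 5039)
# Rounded figures used in the text: 3 TBW after about 200 days
rounded = remaining_years(3, 75, 200 * 24)
print(f"exact: {exact:.1f} years, rounded: {rounded:.1f} years")
```

With the unrounded figures the estimate comes out closer to 14 years; the rounded values used in the text give about 13. Either way, the drive is far from its rated write volume.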
smartctl for ESXi can be downloaded here:
http://www.virten.net/files/smartctl-6.6-4321.x86_64.vib
Note: Use of this VIB is entirely unsupported; proceed at your own risk. Tested with ESXi only.
Source: www.virten.net