Solid-state drives (SSDs) are widely used in ESXi hosts for caching (vFlash Read Cache, PernixData FVP), Virtual SAN, or plain datastores. Unfortunately, each SSD cell has a limited lifetime. Endurance ranges from about 1,000 write cycles in consumer TLC SSDs up to 100,000 in enterprise SLC-based SSDs. Lifetime can be estimated from the TBW (terabytes written) rating that the vendor provides in the device specification. It describes how many terabytes can be written to the entire device before the warranty expires.
As VMware does not provide a convenient way to read raw S.M.A.R.T. values on ESXi hosts, smartctl, which is part of smartmontools, has been ported to ESXi.
Below is an example of what an ESXi host reports without smartctl. The device analyzed is a Samsung SSD 850 EVO M.2 250GB used as a local datastore. The warranty for this SSD covers 75 TBW.
ESXCLI can display S.M.A.R.T stats with
esxcli storage core device smart get -d [device]
# esxcli storage core device smart get -d t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              N/A    N/A        N/A
Power-on Hours                99     0          99
Power Cycle Count             99     0          99
Reallocated Sector Count      100    10         100
Raw Read Error Rate           N/A    N/A        N/A
Drive Temperature             N/A    N/A        N/A
Driver Rated Max Temperature  49     0          34
Write Sectors TOT Count       100    0          100
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
The next table shows the stats provided by ESXCLI, which are a bit more verbose.
# esxcli storage core device stats get -d t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
   Device: t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
   Successful Commands: 93483233
   Blocks Read: 205579211
   Blocks Written: 2123298938
   Read Operations: 3240880
   Write Operations: 90144369
   Reserve Operations: 39107
   Reservation Conflicts: 0
   Failed Commands: 22
   Failed Blocks Read: 0
   Failed Blocks Written: 0
   Failed Read Operations: 0
   Failed Write Operations: 0
   Failed Reserve Operations: 0
ESXi keeps track of all read and write operations to the disk, but the counters get reset when ESXi is rebooted.
And here is the report by smartctl:
# smartctl -d sat --all /dev/disks/t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.0.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO M.2 250GB
Serial Number:    S24BNXAG805065D
LU WWN Device Id: 5 002538 d404b9f9f
Firmware Version: EMT21B6Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed May 16 15:25:26 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[...]

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5039
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       35
177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always       -       122
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   049   034   000    Old_age   Always       -       51
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       26
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6343034492
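If the report is saved to a file or captured from a script, the raw values can be pulled out of the attribute table programmatically. The following is a minimal Python sketch, not part of the original article: the function name parse_smart_attributes and the regular expression are illustrative assumptions, and the pattern expects the ten-column layout shown in the report above.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# (assumed layout, taken from the report above)
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_smart_attributes(report):
    """Return {attribute_name: raw_value} for every attribute row found."""
    attributes = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            # Group 2 is the attribute name, group 6 the leading integer of RAW_VALUE.
            attributes[match.group(2)] = int(match.group(6))
    return attributes

# Three rows copied from the report above.
sample = """\
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5039
177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always       -       122
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6343034492
"""

attrs = parse_smart_attributes(sample)
print(attrs["Total_LBAs_Written"])  # 6343034492
```

Lines that do not look like attribute rows (headers, the information section) are simply skipped, so the whole report can be passed in unchanged.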
In the SMART attributes section there is a Total_LBAs_Written value (ID #241). To get terabytes, multiply this value by the sector size (512 bytes) and divide by 1099511627776 (1024^4).
Total_LBAs_Written * Sector Size / 1024^4 = TBW
6343034492 * 512 / 1099511627776 = 2.95 TBW
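The same arithmetic as a small Python sketch. The function name lbas_to_tb_written is hypothetical, and the calculation assumes that the drive reports attribute 241 in logical sectors of 512 bytes, as in the report above; other drives may use a different unit for this attribute.

```python
TIB = 1024 ** 4  # 1099511627776 bytes, the divisor used in the formula above

def lbas_to_tb_written(total_lbas, sector_size=512):
    """Convert a Total_LBAs_Written raw value to terabytes written."""
    # Assumption: the raw value counts logical sectors of sector_size bytes.
    return total_lbas * sector_size / TIB

written = lbas_to_tb_written(6343034492)
print(f"{written:.2f} TBW")  # 2.95 TBW
```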
That gives us roughly 3 of 75 TBW. The Power_On_Hours attribute (SMART ID #9) shows 5039 hours, so the device has been in use for about 200 days. At the current write rate, we can estimate that this SSD will last for roughly another 13 years.
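The projection can be sketched as a simple linear extrapolation. This is only an illustration under two assumptions: the write rate stays constant, and Power_On_Hours is a fair proxy for the time the drive has been in service. The function name estimate_remaining_years is hypothetical. With the unrounded inputs (2.95 TB, 5039 hours) the result comes out at roughly 14 years, in the same range as the estimate above, which rounds to 3 TB and 200 days.

```python
def estimate_remaining_years(written_tb, rated_tbw, power_on_hours):
    """Linearly extrapolate how long the rated TBW lasts at the observed write rate."""
    days_in_use = power_on_hours / 24
    tb_per_day = written_tb / days_in_use          # average write rate so far
    remaining_days = (rated_tbw - written_tb) / tb_per_day
    return remaining_days / 365

years = estimate_remaining_years(written_tb=2.95, rated_tbw=75, power_on_hours=5039)
print(f"about {years:.0f} years of rated endurance left")
```

The rated TBW is a warranty figure, not a guaranteed failure point, so the result is best read as an order of magnitude.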
The smartctl VIB can be downloaded here:
http://www.virten.net/files/smartctl-6.6-4321.x86_64.vib
Note: The use of this VIB is totally unsupported, proceed at your own risk. Tested with ESXi only.
This is the review of an article.
Source: www.virten.net