sadman wrote:Good day, everyone.
I'm running a two-node failover cluster with StarWind Native SAN for Hyper-V Free Edition. The servers have onboard SAS controllers (HP Smart Array P420) with 1 GB of cache RAM, and no other tasks run on them.
With that configuration, do I need to enable the StarWind cache, or is the hardware cache enough?
EinsteinTaylor wrote:With all due respect to the above reply, we are using the controller cache and disabling the StarWind cache. We ran into a bad situation a while back where we were using the StarWind cache instead of the controller cache, and unfortunately our UPS actually died, causing an entire data center outage. We got lucky in that only one VM was corrupted and lost, and it was a VM that was not yet in production, so no harm, no foul, but it could have been far worse.
With a battery-backed cache and no StarWind caching, we would at least have had all the I/O preserved on the controller, ready to be written out to disk. We have since replaced the UPS, disabled all StarWind caching, and are using controller caching instead.
The other point I would raise is performance. For example, we use the PERC H710p RAID card and reliably get 3.5 GB/s reads and writes on the local disk. When sharing it out as a target, the network quickly becomes the limiting factor unless you are running 10 Gb Ethernet. Seeing as StarWind's sweet spot is SMBs, I would say 10 GbE is somewhat unlikely.
Just my 2 cents anyway.
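The bottleneck claim above is easy to sanity-check with arithmetic. This is a rough, illustrative sketch (the ~90% protocol-efficiency factor is an assumption, not a measured value) comparing theoretical Ethernet ceilings against the quoted 3.5 GB/s local figure:

```python
# Rough, illustrative numbers: compare the local RAID throughput quoted
# above against theoretical Ethernet ceilings to see where the bottleneck sits.

def link_throughput_mb_s(gbit_per_s, efficiency=0.9):
    """Usable MB/s on an Ethernet link, assuming ~90% efficiency
    after TCP/iSCSI overhead (the 0.9 factor is an assumption)."""
    return gbit_per_s * 1000 / 8 * efficiency

LOCAL_DISK_MB_S = 3500  # the PERC H710p figure quoted above

for gbit in (1, 10):
    net = link_throughput_mb_s(gbit)
    print(f"{gbit} GbE ~ {net:.0f} MB/s usable; "
          f"local array is {LOCAL_DISK_MB_S / net:.1f}x faster")
```

Even at 10 GbE (~1.1 GB/s usable), the network still caps out well below the local array, which is why controller cache sizing matters more than StarWind cache sizing once the wire is saturated.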
EinsteinTaylor wrote:The scenario I mentioned was with an HA pair. The UPS had a catastrophic failure (rare, I know, but it happens), so both nodes dropped at the exact same time, along with the rest of the rack.
Luckily we only lost one VM we didn't care about (otherwise we would have restored from tape), but it was enough to warn of the potential for other failures. This is not a knock against StarWind, and we are very happy with the product, but unless you can somehow guarantee you won't lose both nodes at the same time (a generator, maybe), I would recommend the controller cache. Again, just our experience, and YMMV.
rrnworks wrote:We have a two-node Native SAN cluster and have been testing disk performance and high-availability scenarios. We have two concerns, both related to StarWind write-back caching: the disk performance penalty introduced by the SW SAN, and shutdown scenarios that require manual intervention.
1) With the SW write-back cache enabled (on a 4-drive RAID 10 array on a PERC 6/i with a 512 MB BBU), our 24 GB large-file copy write test to the SAN consistently starts out at 130 MB/s but drops to under 50 MB/s about halfway through (presumably when the write-back cache is flushed). If we perform the same large-file copy write on the native disk, it maintains about 100 MB/s all the way through. File-copy read tests with or without the SAN each run at about 100 MB/s. So it seems the StarWind SAN introduces roughly a 50% write-performance penalty versus native disk - is this normal?
2) We also just learned that with the SW write-back cache enabled, even if we gracefully shut down both nodes (one at a time), the SAN requires us to start it again manually after both nodes are powered back on. If we change to write-through caching, the SAN starts automatically in the same scenario. Any chance development could add an option to auto-start the SAN after both nodes are shut down gracefully while still using write-back? Or is there another workaround?
Thanks,
Chris
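For anyone wanting to reproduce the mid-copy slowdown described in point 1 without relying on Explorer's copy dialog, here is a minimal sustained-write probe. It is a sketch, not StarWind tooling; the file path, sizes, and use of `os.fsync` to bypass the OS page cache are all assumptions you should adapt to your environment:

```python
# Minimal sustained-write probe: writes a large file in chunks and reports
# throughput per chunk, so a mid-copy cliff (e.g. when a write-back cache
# fills and starts destaging) shows up clearly in the numbers.
import os
import time

def sustained_write(path, total_mb=1024, chunk_mb=64):
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    rates = []
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            t0 = time.perf_counter()
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the write past the OS page cache
            rates.append(chunk_mb / (time.perf_counter() - t0))
    os.remove(path)
    return rates  # MB/s per chunk; look for a cliff partway through

if __name__ == "__main__":
    for i, rate in enumerate(sustained_write("testfile.bin", total_mb=64, chunk_mb=16)):
        print(f"chunk {i}: {rate:.0f} MB/s")
```

Run it once against the iSCSI-mounted volume and once against the native disk; if the SAN run starts near 130 MB/s and collapses to ~50 MB/s partway through while the native run stays flat, that matches the cache-flush explanation above.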
rrnworks wrote:1) OK - I have worked with an engineer, and he concluded it is normal. That said, what kind of SAN read/write performance should we expect in comparison to native disk performance (with or without write-back enabled)?
2) OK - I was hoping the second StarWind node being shut down could at least somehow flag itself as primary when it loses the heartbeat from the first node that was shut down?
KurianOfBorg wrote:I have a much simpler case: a single server with an Adaptec 6805 RAID controller and a ZMCP-protected 512 MiB hardware cache. I am running the StarWind target on this server to provide iSCSI disks to a single client PC.
Data/filesystem integrity is my primary concern. Is it better to disable the StarWind cache completely in this case? There will never be more than 512 MiB of unflushed writes from a single client over 1 Gbps Ethernet, and the default StarWind cache size is only 128 MiB, so I risk losing 128 MiB of data if the server crashes.
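The exposure reasoning above can be put into numbers. This back-of-envelope sketch assumes a 1 Gbps client link (~125 MB/s raw), the 128 MiB default StarWind cache, and the 512 MiB ZMCP-protected controller cache mentioned in the post:

```python
# Back-of-envelope bound on the data-loss exposure window: dirty data at
# crash time is at most the cache size, and a single 1 GbE client needs at
# least this long to fill the cache at full line rate.
MIB = 1.048576  # MB per MiB

def fill_time_s(cache_mib, link_mb_s=125.0):
    """Minimum seconds a single client needs to fill a cache of
    cache_mib MiB at full 1 GbE line rate (link_mb_s is an assumption)."""
    return cache_mib * MIB / link_mb_s

print(f"128 MiB StarWind cache: at most ~{fill_time_s(128):.1f} s of writes at risk")
print(f"512 MiB controller cache: ~{fill_time_s(512):.1f} s, but ZMCP-protected")
```

The asymmetry is the point: the software cache's contents vanish on a host crash, while the ZMCP cache survives power loss, so with integrity as the priority the unprotected 128 MiB is the only real exposure.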
KurianOfBorg wrote:In my case, does write-through actually mean writing to the platters, or to the RAID controller's cache? In Windows Device Manager the RAID volume doesn't support a device write cache at all, but when copying and pasting a big file it's obvious that the data is still in the hardware cache and not yet written to disk for several seconds after Windows' file-copy dialog has finished.
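The layering behind that question can be illustrated with a plain flush call. This is a generic sketch (the file name is illustrative, and the behavior described in the comments is the typical contract for battery/flash-backed controllers, not something specific to StarWind or the Adaptec 6805):

```python
# What "write-through" guarantees at each layer. os.fsync() drains the
# application and OS caches and issues a device flush, but a RAID
# controller with a protected (BBU/ZMCP) cache typically acknowledges
# that flush once the data is in its own non-volatile-protected RAM -
# which can be seconds before the platters are actually updated.
import os

with open("journal.dat", "wb") as f:
    f.write(b"critical record\n")
    f.flush()             # drain Python's userspace buffer
    os.fsync(f.fileno())  # drain the OS cache; returns on controller ack,
                          # not necessarily on platter write
os.remove("journal.dat")
print("flushed")
```

In other words, what you observed in Explorer is expected: the copy dialog closes once the OS hand-off completes, and the protected controller cache destages to the platters on its own schedule. That is considered safe precisely because the ZMCP/BBU preserves the cache contents across a power loss.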