Hyper-Converged Hyper-V solution using StarWind across two buildings
When I first started working for my current employer back in 2013, one of the first projects was to address business continuity concerns. The brief was very… brief… no details other than “we have another building on the site which is linked by fibre optic, if our current physical servers had a problem we want to be back up and running in under 4 hours”. There was no guidance on what we should use or how to accomplish this, so off I went in search of solutions.
We already had a rather expensive quote for an asynchronous replication solution, but I knew I could get a better solution for that same price; in my view, asynchronous replication should be a secondary layer in how you achieve business continuity. I already knew that we should move from physical to virtual machines, and I knew Windows Server 2012/R2 would allow us to make the VMs highly available. I also knew about the option of Windows Storage Spaces, but I needed at least three nodes to accomplish that, plus JBODs close by. There seemed to be no obvious way to do what I wanted unless I implemented two expensive traditional SANs and configured synchronous replication between them, which would have blown my idea out of the water; it just wouldn’t have been feasible.
I also knew that there were products like Veeam Backup & Replication and Hyper-V Replica, but these would require manual intervention if there was a failure, and ideally I wanted to reuse older hardware for this layer of the business continuity plan (some minor upgrades of RAM and disks would be fine). I had posted a question online about how to get active-active storage across a distance; there must be a way large datacentres do it, so I assumed this kind of thing would be commonplace in 2013/2014. Then someone from StarWind posted a reply, and from then on I knew I needed a virtual SAN – but there are a few to choose from. I downloaded a trial of StarWind and had the demo; it looked nice and easy, with a simple interface – a few clicks and the storage is set up. I tried another product, but there were far too many hoops to jump through just to get a trial, and even then I didn’t know where to start with the product – it was overcomplicated. So I continued my investigation into StarWind, and found the pre-sales technical support extremely useful. We ended up having quite a lengthy trial period whilst waiting for other parts of the environment to move forwards, for finance to become available for the new kit and so on, but the support guys were extremely professional and patient, and gave me everything I needed to continue my evaluation (and another, and another, and several more). The thought entered my mind: “if this is the level of support I get before I even pay them for anything, then surely the support will be even better after I’ve bought the product”. I wasn’t wrong – that excellent level of support continues today.
We had a rather unusual requirement which I don’t think any other StarWind customer had at the time. We wanted to cross a gap of 230 metres between our buildings; a single CAT 6 cable can’t span that without switches or repeaters, and SAS cables have no chance of linking up JBODs at that distance. The only option we had was fibre optic (10Gbps) to link the buildings, run through an existing underground pipe. StarWind gave us the active-active storage solution, which allows writes to occur at both ends at exactly the same time, while reads are local to the server the virtual machine is running on, so we don’t have to cross the wire for reads. We initially had some really bad performance on the synchronisation, but we quickly discovered it was due to LACP on the switches (this isn’t a fault of the storage system, it’s the way the networking works; StarWind published a blog post not long after to explain the technical details: https://slog.starwindsoftware.com/lacp-vs-mpio-on-windows-platform).
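The short version of that linked post is that for iSCSI storage traffic you want MPIO handling the multiple links rather than LACP. As a minimal sketch of what enabling that looks like on Windows Server 2012 R2 – using only the in-box MPIO feature, and not claiming to be StarWind’s exact recommended configuration – it amounts to:

```powershell
# Install the in-box Multipath I/O feature (a reboot may be required)
Install-WindowsFeature -Name Multipath-IO

# Automatically claim iSCSI-attached disks for MPIO
Enable-MSDSMAutomaticClaim -BusType iSCSI

# Round-robin across paths spreads I/O over both links without LACP
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR
```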
The solution has been in place for over a year now and it works perfectly; it does everything it is supposed to do. We did have some issues early on, but they were networking related – it is really important to get the networking right when you’re dealing with active-active storage. Other customers usually configure StarWind in the same rack, or maybe the rack next to it, but we had a distance to cross for our implementation, so we had two switches on each side and two fibre optic links for redundancy. Again, other customers don’t always have this; they often just connect the servers back to back with no switches. Even though our implementation is a little more involved, it works the same way as if the nodes were next to each other.
From a benefits perspective, StarWind allows us to do the following:
- Pay only for the storage capacity we need: if we only have 4TB then we pay less, and if we have 8TB we pay a bit more
- No physical SAN costing a fortune and taking up rack space
- Uses local disks so reads don’t cross the wire – higher performance
- Doesn’t care what your network bandwidth is – you have to make the right decision on whether you need 1Gbps, 10Gbps, 40Gbps or whatever. Writes will only occur as fast as the slowest link in your solution, so if your disks can sustain a combined write speed of 10Gbps then you need at least 10Gbps on the sync connection (more likely 20Gbps for redundancy); see the worked example after this list
- Doesn’t care what your underlying storage is – it can be local SATA or SAS, HDD or SSD, iSCSI-presented storage or USB drives, and in a near-future release it will be able to utilise SMB storage! In fact, during our initial testing we purchased 2 x Dell Vostro machines for about £300 each just to do a proof of concept, with no managed switches, utilising the existing 1Gbps fibre optic between the two buildings. Performance was lacking, but you get what you pay for; the point was to test the principle that this would work, and it did
- Allows us to connect to the StarWind disks using standard iSCSI and industry-standard MPIO; it doesn’t matter whether the cables are connected back to back or through switches, or what the media is (fibre or CAT5e/6) – see the connection sketch after this list
- It allows the storage to become software defined: we can expand it online when we want to, and it isn’t tied to any vendor’s physical storage, unlike a physical SAN – it virtualises that physical storage layer. If one day we decide we no longer like HP then we can simply replace it with another brand at the normal cost of a server
- If we have a disk failure then hardware RAID takes care of it (to be honest we’re a little wasteful and could probably get away with RAID 0; RAID 0 is almost certainly going to be implemented if we expand to three physical servers, we’re just a little hesitant while only two servers are in use, just in case). If we have a RAID failure then StarWind takes care of it: we can expand a RAID array and rebuild without any downtime because the storage is available on another physical server
- The support is brilliant; the technical guys really do know their stuff – this is what they do, and they have deep knowledge of disk performance, IOPS, bandwidth and network synchronisation
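To make the bandwidth sizing point concrete, here is a minimal worked example of the arithmetic. The disk count and per-disk throughput below are entirely hypothetical figures for illustration, not our actual hardware:

```powershell
# Hypothetical array: 6 data disks, each sustaining roughly 210 MB/s of writes
$disks     = 6
$mbPerSec  = 210
$totalMBps = $disks * $mbPerSec      # 1260 MB/s combined write speed

# 1 byte = 8 bits, so convert MB/s to Gbps to size the sync link
$gbps = $totalMBps * 8 / 1000        # ~10.1 Gbps of sync traffic at peak
"The sync connection must sustain at least {0:N1} Gbps" -f $gbps
```

In other words, that hypothetical array would already saturate a single 10Gbps sync link under a full-speed write workload, which is why you size the sync connection to the disks, not the other way round.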
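And as a sketch of what those standard connection methods look like in practice: once MPIO is enabled, connecting a host to a StarWind target is just ordinary Windows iSCSI. The portal addresses and the single-target assumption below are hypothetical examples, not our production values:

```powershell
# Register the StarWind target portals, one per redundant link
New-IscsiTargetPortal -TargetPortalAddress 10.0.1.10
New-IscsiTargetPortal -TargetPortalAddress 10.0.2.10

# Connect to the same target through each portal so MPIO has two paths
$node = (Get-IscsiTarget | Select-Object -First 1).NodeAddress
Connect-IscsiTarget -NodeAddress $node -TargetPortalAddress 10.0.1.10 `
    -IsMultipathEnabled $true -IsPersistent $true
Connect-IscsiTarget -NodeAddress $node -TargetPortalAddress 10.0.2.10 `
    -IsMultipathEnabled $true -IsPersistent $true
```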
We have managed to convert half a dozen physical servers into virtual machines, get rid of our expensive, inflexible SAN, grow our virtual servers thanks to the new physical servers and the storage solution, and provide high availability for both services and storage.
Written by Steve Mills