
High Availability and Fault Tolerance

June 18, 2024
Alex Khorolets
Beginning his journey as a hobbyist in the virtualization and storage sphere, Alex now works as a Pre-Sales Engineer at StarWind. He possesses extensive knowledge in networking, storage technologies, and clustering.

Intro

High availability and fault tolerance are critical concepts in the world of data storage, ensuring that systems remain operational and accessible even in the face of hardware or software failures. Put ’em together, and you’ve got a storage system that’s always on its A-game, ready to handle whatever comes its way, with zero drama. That’s the heart of a solid, reliable storage setup.

Problem

In a virtualized setup, hardware failure is a bigger deal than in a traditional physical-only environment. If a physical server goes down, it takes all its virtual machines (VMs) with it. Since each VM acts like a whole server, this kind of failure can lead to a major service outage. The situation is even worse with databases and thin clients; if a single hypervisor crashes, it could halt a significant chunk of the business. Creating hypervisor clusters that are fault-tolerant and have full redundancy is critical. Shared storage plays a key role in the virtualization infrastructure as it houses the VMs for the environment. Therefore, it’s crucial that shared storage isn’t a single point of failure and is always accessible.

Solution

Depending on how it’s set up, StarWind can run its virtual storage across several hypervisor nodes or on separate, off-the-shelf storage servers. The shared storage unit is essentially “mirrored” among the hosts, ensuring data stays intact and operations keep running smoothly even if a node bites the dust. Each active host acts as a storage controller, and every storage unit has its data duplicated or triplicated for safety, linked to each host through multipath routing. Multipathing means that even if some data paths hit a snag, the system keeps chugging along over the remaining ones. This setup nails 99.99% uptime with a dual replica and an even more impressive 99.9999% with a triple replica. Adding more replicas beyond that usually isn’t necessary, unless we’re talking about mission-critical systems, like the ones controlling nuclear reactors or steering rockets. And here’s a bonus: High Availability doesn’t just mean things stay up and running; it also cranks up storage performance! Virtual machines can read data from all the hypervisor nodes mirroring their virtual disk, making things faster and smoother.
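
To get a feel for where uptime figures like these can come from, here is a back-of-the-envelope sketch in Python. It is an illustration only, not StarWind's own calculation: it assumes each node is independently available 99% of the time (a number picked for the example) and that storage stays reachable as long as at least one replica is up. The function names are hypothetical.

```python
def replicated_availability(node_availability: float, replicas: int) -> float:
    """Chance that at least one of `replicas` independent copies is reachable.

    Assumption: node failures are independent, which real clusters only
    approximate (shared power, switches, or software bugs can correlate them).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Storage is down only when every replica is down at the same time.
    return 1.0 - (1.0 - node_availability) ** replicas


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


# Assumed per-node availability, chosen purely for illustration.
NODE_AVAILABILITY = 0.99

for replicas in (1, 2, 3):
    a = replicated_availability(NODE_AVAILABILITY, replicas)
    print(f"{replicas} replica(s): {a:.4%} available, "
          f"{downtime_minutes_per_year(a):.2f} min/year down")
```

Under these assumptions, two replicas work out to 99.99% (roughly 53 minutes of downtime a year) and three to 99.9999% (about half a minute), which is also why a fourth replica buys so little for most workloads.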

Conclusion

StarWind Virtual SAN kicks the single point of failure to the curb by “mirroring” data in real time, duplicating or triplicating it across different hosts. Multipath routing adds redundant network paths to that storage, so it stays reachable even when a path drops. This makes your virtual shared storage not just fault-tolerant, but also a high-availability champ, delivering top-notch performance without breaking the bank.

 
