Best network setup for a 3-node HA virtualisation cluster: X540 or X520?

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

cloudbuilder33
Posts: 1
Joined: Mon Jun 30, 2014 9:31 am

Mon Jun 30, 2014 10:16 am

Hi Starwind guys,

I'm setting up a 3-node HA cluster that will become the main SAN for my VDI project. I have some questions about the networking side and how it affects performance.

I need to decide between the following:

Is there any performance difference between 10 GbE SFP+ direct-attach (Intel X520-DA2) and 10GBASE-T (RJ45) (Intel X540-T2) for the sync and data channels?

The only real difference between SFP+ and RJ45 should be latency: SFP+ is about 0.3 microseconds per link, while 10GBASE-T is about 2.6 microseconds per link. The HA networking guidelines state that connections between nodes in the same building should stay under 5 ms (MILLIseconds!). Both adapters/standards should easily stay well under a millisecond between nodes, even with multiple switches in each link.
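For a rough sense of scale, here is a minimal back-of-the-envelope check of those figures in Python (the per-link latencies are the ones quoted above; the hop count is an assumption for illustration):

```python
# Quick sanity check of PHY latency vs. the 5 ms HA guideline.
# Per-link figures are the ones quoted above; the hop count is assumed.
sfp_plus_us = 0.3        # microseconds per SFP+ DAC link
base_t_us = 2.6          # microseconds per 10GBASE-T link
budget_us = 5 * 1000     # 5 ms guideline expressed in microseconds

hops = 3                 # e.g. NIC -> switch -> switch -> NIC
for name, per_link in (("SFP+", sfp_plus_us), ("10GBASE-T", base_t_us)):
    total_us = per_link * hops
    print(f"{name}: {total_us:.1f} us over {hops} links, "
          f"{total_us / budget_us:.4%} of the 5 ms budget")
```

Either way the PHY latency is a tiny fraction of the 5 ms budget; any measurable difference would come from per-I/O round trips in synchronous replication rather than from breaking the guideline.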

The question for me is: will there be any difference in performance at all, or would both offer the same real-world performance (for VDI / high-performance virtualization)?

The next question is about using a switch for the sync channels versus direct connections between all three nodes.

Each node has 2 NICs, so with direct connections there will be a 10 GbE link between every pair of nodes (node 1 will have a direct 10 GbE connection to both node 2 and node 3).

This should work. However, with (redundant, so two) switches I could instead give each node one 10 GbE connection to each switch. In theory that looks like the better choice, because 2x 10 GbE to the switches should offer more bandwidth than a single 10 GbE link to each node, right? Also, when one node fails, the other two nodes still have 2x 10 GbE for sync, while with direct connections only a single 10 GbE sync link is left after a node failure. (By the way, with all three nodes up, the only way to get from node A to node B without using the direct link is through node C, but can the adapters even do that kind of forwarding? Using a switch should be much better, right?)
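To make the bandwidth argument concrete, here is a minimal sketch of the sync bandwidth each surviving node has in the two layouts (assuming 10 Gbit links, one dual-port NIC per node for sync, and ignoring switch oversubscription and protocol overhead):

```python
# Sync bandwidth available per surviving node in each topology.
# Assumes 10 Gbit/s links and one dual-port NIC per node for sync.
LINK_GBPS = 10

def mesh_bandwidth(nodes_up):
    # Full mesh: each node has one direct link per partner,
    # so losing a node also removes the links to it.
    return (nodes_up - 1) * LINK_GBPS

def switched_bandwidth(uplinks_per_node=2):
    # Two switches, one uplink per node to each switch:
    # a node keeps both uplinks no matter how many partners remain.
    return uplinks_per_node * LINK_GBPS

for up in (3, 2):
    print(f"{up} nodes up: mesh {mesh_bandwidth(up)} Gbit/s per node, "
          f"switched {switched_bandwidth()} Gbit/s per node")
```

With all three nodes up, both layouts give each node 20 Gbit/s of sync bandwidth; the difference only shows after a node failure, where the mesh drops to a single 10 Gbit link between the survivors.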

Last question: the sync channels should have dedicated NICs, I understand that, but what about the switches used for syncing? Is it fine to use the same two switches that carry the data/heartbeat channels and general network traffic, or would I need extra switches just for syncing?

I understand I will need two switches for redundancy and MPIO. Assuming those two switches have enough ports for the three nodes and the clients that will connect to them, is that all I need in terms of switching for my SAN network?

Any tips on what to look for in a switch? I know jumbo frames are important, but my adapters even offer 11K or 12K frames instead of 9K; should I get a switch that supports this for optimal performance? Does the switch need to be fully managed, or is a smart switch okay? Any special iSCSI features to look for?
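One quick way to verify that jumbo frames actually survive the whole path, whichever switch is chosen, is a don't-fragment ping. A minimal Python sketch for Windows (the address 10.10.10.2 is a hypothetical sync-partner IP; 8972 bytes of ICMP payload plus 28 bytes of headers makes a 9000-byte packet, matching a 9000 MTU):

```python
import subprocess

# Verify end-to-end jumbo frames with a don't-fragment ping (Windows ping).
# -f   : set the Don't Fragment flag (IPv4)
# -l   : payload size in bytes (8972 + 28 header bytes = 9000)
# -n 4 : send four echo requests
result = subprocess.run(
    ["ping", "-f", "-l", "8972", "-n", "4", "10.10.10.2"],
    capture_output=True, text=True,
)
print(result.stdout)
# A reply of "Packet needs to be fragmented but DF set." means some
# NIC or switch port in the path is not configured for jumbo frames.
```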



So, to sum it up:

Is using two dual-port 10GBASE-T Intel X540 adapters per node, connected to two redundant 10GBASE-T switches, recommended in terms of performance?

Would there be any reason to use SFP+ or direct connections between the hosts?

CloudBuilder33
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Jun 30, 2014 4:32 pm

It's recommended to avoid switches, as they add operational latency, cost, and failure points. However, with a 3-node setup it's very difficult to build a fully interconnected mesh, so going with a pair of 10 GbE NICs per host and a pair of 10 GbE switches is a proper way to go.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Jun 30, 2014 4:34 pm

As you've mentioned yourself, SFP+ is recommended, as RJ45 is an extra "wrapper" on the data path that just increases latency. All modern NICs are SFP+; RJ45 is legacy. 9K jumbo frames give the best numbers, but you may experiment with different jumbo sizes; just make sure a) the switches have no issues with them and b) they are set the same everywhere. Switches should have deep I/O queues. Surprisingly, we get the best numbers with not-very-expensive Netgear ones.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Slingshotz
Posts: 26
Joined: Sat Apr 12, 2014 6:52 am

Tue Jul 15, 2014 3:04 pm

anton (staff) wrote: It's recommended to avoid switches, as they add operational latency, cost, and failure points. However, with a 3-node setup it's very difficult to build a fully interconnected mesh, so going with a pair of 10 GbE NICs per host and a pair of 10 GbE switches is a proper way to go.
I am struggling with this exact situation and I don't have access to any 10 GbE switches. You say that it is difficult but not impossible, so could you elaborate with a simple diagram or explanation of how to set up the sync and iSCSI data channels with three nodes and a pair of 10 GbE NICs in each node, without a switch? I do have 1 GbE NIC ports for virtualization and cluster traffic through switches, so that part is not an issue.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Jul 21, 2014 2:37 pm

Hi!
Here is how it looks:
[Attachment: Drawing1.jpg]
Don't let the switch in the drawing confuse you: it is not required by StarWind, it is just for management.

I hope the diagram makes sense to you, but if not, just ask anything you want.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Slingshotz
Posts: 26
Joined: Sat Apr 12, 2014 6:52 am

Mon Jul 21, 2014 2:44 pm

Does that mean that server 2 (the one in the middle) actually needs nine network ports to make it work? With that design, doesn't it also mean that when server 2 is offline, servers 1 and 3 won't be able to communicate with each other over either sync or iSCSI?
lohelle
Posts: 144
Joined: Sun Aug 28, 2011 2:04 pm

Mon Jul 21, 2014 3:19 pm

I guess the green links are between server 1 and 3, not going via server 2.
Slingshotz
Posts: 26
Joined: Sat Apr 12, 2014 6:52 am

Mon Jul 21, 2014 3:52 pm

What I've been experimenting with in my current configuration is actually bridging the pair of 10 GbE ports on each of my servers and assigning two IP addresses to each bridge. This seems to allow communication between any two servers if one of the three is taken offline. I'm not sure how detrimental the network bridges are to performance, though. This is my current design:
[Attachment: Capture.JPG]
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Wed Jul 23, 2014 5:35 pm

That diagram looks pretty great!
I'd just suggest using 10 GbE links instead of 1 GbE ones.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Slingshotz
Posts: 26
Joined: Sat Apr 12, 2014 6:52 am

Wed Jul 23, 2014 10:05 pm

What I've found is that the bridged adapters really slow everything down, and when one node is taken offline, resyncing even a 1 GB CSV takes forever, so I've had to revert to an older design.

I found I had to fool the cluster creation wizard by temporarily adding sync channel IP addresses to servers that are not normally part of the sync group, just to be able to add the server to the cluster group; afterwards I was able to remove them without any issues, it seems. Since I am not using two redundant 10 GbE switches but direct cable connections, I don't have the ability to separate the sync channel and the iSCSI channel with VLANs. Do you think there will be performance issues if I simply use two subnets on each adapter port?

So far I have a 1 GB witness and a 200 GB CSV in a cluster, and I have tested rebooting each server; StarWind seems to resync quickly and cleanly when the server comes back up. This is my new design:
[Attachment: Capture.JPG]
And this is the replication manager shot from the cluster 2 server:
[Attachment: Untitled.jpg]
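On the two-subnets-per-port question: multiple IP subnets on one interface add no measurable overhead by themselves; the real constraint is simply that sync and iSCSI traffic share the same 10 Gbit link. A minimal sketch of one possible addressing plan (all addresses are hypothetical, just to illustrate one sync subnet and one iSCSI subnet per direct cable):

```python
import ipaddress

# Hypothetical addressing plan: each point-to-point cable carries a
# dedicated sync subnet and a dedicated iSCSI subnet side by side.
links = {
    "node1<->node2": ("172.16.10.0/24", "172.16.11.0/24"),
    "node1<->node3": ("172.16.20.0/24", "172.16.21.0/24"),
    "node2<->node3": ("172.16.30.0/24", "172.16.31.0/24"),
}
for link, (sync_cidr, iscsi_cidr) in links.items():
    sync_hosts = list(ipaddress.ip_network(sync_cidr).hosts())[:2]
    iscsi_hosts = list(ipaddress.ip_network(iscsi_cidr).hosts())[:2]
    print(f"{link}: sync {sync_hosts[0]} and {sync_hosts[1]}, "
          f"iSCSI {iscsi_hosts[0]} and {iscsi_hosts[1]}")
```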
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Jul 28, 2014 3:37 pm

I hope not :D
Well, we can try to do the math. Do you know the approximate number of IOPS the SAN should handle?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Slingshotz
Posts: 26
Joined: Sat Apr 12, 2014 6:52 am

Mon Jul 28, 2014 4:55 pm

I will be setting up three CSVs as recommended by your engineer: two 200 GB ones and one 1.6 TB one, each hosted by a different primary server. I estimate each CSV will only need to handle about 1000-1500 IOPS at extreme peak times and about 300-600 IOPS during normal operation. Do you think this is within the capabilities?
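For a rough feel of whether spindles alone can cover that, here is a minimal sketch of the usual estimate. Every figure in it is an assumption for illustration (per-disk IOPS, disk count, read/write mix are not from this thread), and it deliberately ignores caching:

```python
# Rough front-end IOPS estimate from spindle count and RAID write penalty.
# All figures are assumptions for illustration; caching is ignored.
PER_DISK_IOPS = 140          # typical 10K SAS drive, small random I/O
DISKS = 6                    # assumed number of drives in the array
READ_RATIO = 0.7             # assumed 70/30 read/write mix
WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4}

backend = DISKS * PER_DISK_IOPS
for level, penalty in WRITE_PENALTY.items():
    frontend = backend / (READ_RATIO + (1 - READ_RATIO) * penalty)
    print(f"{level}: ~{frontend:.0f} front-end IOPS from {DISKS} disks")
```

On the assumed numbers above, raw 10K spindles fall short of a 1500 IOPS peak, which is where write-back caching and the RAID level under the storage matter more than the NIC choice.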
monetconsulting
Posts: 5
Joined: Wed Jul 30, 2014 1:42 pm

Wed Jul 30, 2014 1:50 pm

How did this configuration work out for you?
I have a similar setup, 3 nodes. I have dual-port 10 GbE adapters in each node and a Cisco 10 GbE switch connecting the three nodes. My initial sync is painfully slow (a two-day sync for 4 TB). Max from StarWind said I may need a new build that is coming out soon, but I feel something else is wrong. I'm still testing before going into production (which I do not feel good about right now). I want to have three CSVs of 4 TB each and a shared witness, and I have similar needs of around 1500 IOPS at peak and 600 on average per CSV.
Could the Cisco SG500XG-8F8T switch be the issue?
I used RJ45 for 10 GbE; could that have caused this large sync slowness?
I had iSCSI on 1 GbE connections; is this wrong?
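A quick bit of arithmetic on the two-day figure helps narrow this down. The wire-speed numbers below assume ideal conditions; the observed rate is derived from the 4 TB / 2 days reported above:

```python
# How long a 4 TB full sync should take at different effective rates.
TB = 10**12
SIZE = 4 * TB

def hours(size_bytes, mb_per_s):
    return size_bytes / (mb_per_s * 10**6) / 3600

observed_mb_s = SIZE / (48 * 3600) / 10**6   # 4 TB moved in 2 days
for label, rate in (("10 GbE wire speed", 1150),
                    ("1 GbE wire speed", 115),
                    ("observed", observed_mb_s)):
    print(f"{label}: ~{rate:.0f} MB/s -> {hours(SIZE, rate):.1f} hours")
```

Roughly 23 MB/s effective is well below even a single 1 GbE link, which points at the source and target arrays (or wherever the sync channel actually runs) rather than at RJ45 versus SFP+.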
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Thu Jul 31, 2014 3:08 pm

I have a current 3-node cluster that I am getting ready to upgrade and reconfigure a little... I thought I would chime in with my experiences.

The attached diagram is how the cluster is currently set up; the dotted lines are the additions for the new cluster, for extra iSCSI connections (this is the only change for the new cluster). I currently have a BIG problem with full syncs taking several days to finish and killing throughput for the VMs while they run. I am running 8x 10K SAS drives in a RAID 5 with two partitions (OS and StarWind storage). For the new cluster it was recommended to put the StarWind storage on RAID 0. I didn't want to have to reconfigure the entire node if (I mean WHEN) I lose a drive, so I am going to mirror two 300 GB 10K SAS drives to protect the OS and prevent a failover event if I lose a drive in the RAID 0. StarWind says the sync will pull data from the partner, transparently to Windows, if a drive failure occurs on the StarWind storage (the RAID 0 array), so I kind of have a RAID 10, but with the speed where I need it and the redundancy where it is important. I plan to take the remaining 6 bays and put in 900 GB 10K SAS drives in a RAID 0; I am told this will fix my issues, speed up full resyncs, and give me more storage.

I am using 10 GbE for sync, and StarWind set up the heartbeat and iSCSI on a single 1 GbE connection; I get over 600 IOPS through the CSV (it is hard to test while it is in production). So going with RAID 0 and adding two more iSCSI initiators should only increase my performance.

Jason
Dayton, Ohio
[Attachment: NetworkConfig.png]
monetconsulting
Posts: 5
Joined: Wed Jul 30, 2014 1:42 pm

Thu Jul 31, 2014 4:05 pm

Thank you for your info.
I found out that my onboard RAID controller is software-based, not hardware-based, so I am getting new RAID controllers for my servers.
I am running RAID 0, but I can't compare speeds yet because of the software-based RAID controllers.
I will have hardware RAID next week, combined with my RAID 0. I am running six 3 TB 7200 rpm SATA drives in the RAID 0 across the three servers.
I didn't expect the initial sync of a 4 TB CSV to take two days, but maybe the software RAID is to blame.