Best network setup for 3-node HA for virtualisation. X540/X520?

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

monetconsulting
Posts: 5
Joined: Wed Jul 30, 2014 1:42 pm

Wed Aug 06, 2014 12:18 pm

I found that disk write caching was disabled on the RAID controller for the drives. I enabled it, and that lowered the sync of a flat image from 9 hours to 2. I then changed to LSFS and the sync happened in minutes.
Any detriment to LSFS?
Last edited by monetconsulting on Wed Aug 06, 2014 12:48 pm, edited 1 time in total.
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Wed Aug 06, 2014 12:39 pm

Two hours is more like it. LSFS will be near instant for new storage as it's thin provisioned, i.e. there are only a few blocks to mirror. With a 1TB flat disk, it has to mirror it all for new storage, although I did post a workaround for that: create a tiny 1MB disk first, let it sync and then expand it - it doesn't sync the expansion.
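To put rough numbers on that, here's a back-of-envelope sketch (Python); the sync link speed and overhead factor are assumptions on my part, not measured values:

# Back-of-envelope full-sync time estimate - illustrative assumptions only.
def full_sync_hours(size_gb, link_gbps=1.0, efficiency=0.8):
    # Hours to mirror size_gb over a sync link of link_gbps,
    # assuming only `efficiency` of the raw line rate is usable.
    bytes_total = size_gb * 1e9
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 3600

print(full_sync_hours(1000))    # flat 1TB over 1GbE: roughly 2.8 hours
print(full_sync_hours(0.001))   # near-empty 1MB LSFS device: effectively instant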

Cons with LSFS?
  • It's a new technology in StarWind and new always brings risks with it
  • Blocks are never re-written in place, so if you delete some large files and Windows then re-writes to those blocks, the storage grows and grows - StarWind doesn't overwrite the old blocks (see the sketch at the end of this post). There is some background defragmentation that recovers this lost space, but checking whether that's working is tricky as there are currently no reports of how many dead blocks are in there
  • LSFS requires more RAM for the metadata compared to flat - hoping to get a sizing calculation RSN
  • Dedupe with LSFS requires huge amounts of RAM for large disks, so it's only suitable for small storage IMO
But it is the future and will most likely be the only disk format that StarWind needs to support.
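To illustrate the never-re-written point above, here's a toy model (not how LSFS actually works internally) of an append-only log: re-writing the same logical blocks keeps adding new copies, so the backing store grows until the stale copies are cleaned up.

# Toy append-only log - re-writing a logical block appends a new copy
# instead of overwriting in place, so the store grows until the
# stale copies are garbage-collected. Illustration only.
log = []        # physical log: list of (logical_block, version)
index = {}      # logical block -> position of the live copy in the log

def write(lba, version):
    index[lba] = len(log)
    log.append((lba, version))

for v in range(3):                # re-write the same 100 logical blocks 3 times
    for lba in range(100):
        write(lba, v)

print(len(log), len(index))       # 300 physical entries, only 100 live -> 200 dead blocks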
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Wed Aug 06, 2014 12:42 pm

The pros, though, are that it attempts to iron out the random write storm you get in a typical virtualisation environment, whereby there are multiple VMs which may each be writing sequentially but, because they are all writing at once, arrive at the SAN in a random order. That said, this is a problem with any SAN, whether it's used for virtualisation or not. Multiple physical machines connected to the same SAN also cause a random write storm.
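A tiny simulation makes the point (purely illustrative):

import random

# Three VMs, each writing 5 sequential blocks within its own disk.
streams = {vm: [(vm, blk) for blk in range(5)] for vm in ("vm1", "vm2", "vm3")}

arrival = []
while any(streams.values()):
    vm = random.choice([v for v, s in streams.items() if s])
    arrival.append(streams[vm].pop(0))   # next sequential block of that VM

print(arrival)   # each VM's order is sequential, the combined arrival order is not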
monetconsulting
Posts: 5
Joined: Wed Jul 30, 2014 1:42 pm

Wed Aug 06, 2014 12:47 pm

Thanks for your response. I am planning on three CSVs of 4TB each. I have installed 64GB in each host, so I may have some to spare depending on the needs of LSFS. My Hyper-V hosts are also SAN nodes, since the CSVs will be housing the virtual machine disks.
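For my own planning I've been using a throwaway calculator along these lines; the GB-per-TB figure is just a placeholder until StarWind publishes the LSFS sizing guidance, not an official number:

# Rough RAM headroom check - the per-TB metadata figure is a PLACEHOLDER,
# to be replaced with StarWind's published LSFS sizing guidance.
def ram_needed_gb(csv_count, csv_tb, gb_per_tb):
    return csv_count * csv_tb * gb_per_tb

host_ram_gb = 64
for guess in (2, 4, 8):     # hypothetical GB of RAM per TB of LSFS storage
    need = ram_needed_gb(csv_count=3, csv_tb=4, gb_per_tb=guess)
    print(guess, need, "GB needed ->", "fits" if need < host_ram_gb else "tight")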
jhamm@logos-data.com
Posts: 78
Joined: Fri Mar 13, 2009 10:11 pm

Thu Aug 07, 2014 5:54 pm

Anton,

We are looking into upgrading our StarWind implementation from 1 Gigabit to 10 Gigabit. You mention the following about 10 Gigabit switches:

"Surprisingly we get the best numbers with a non very expensive Netgear ones."

What models are you using in your labs?

Thanks,
Jeff
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Thu Aug 07, 2014 9:06 pm

712, 716 and 516 (from memory)
jhamm@logos-data.com wrote:Anton,

We are looking into upgrading our StarWind implementation from 1 Gigabit to 10 Gigabit. You mention the following about 10 Gigabit switches:

"Surprisingly we get the best numbers with a non very expensive Netgear ones."

What models are you using in your labs?

Thanks,
Jeff
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

vaio
Posts: 2
Joined: Mon May 06, 2013 3:59 pm

Sat Aug 16, 2014 2:09 pm

Any update on the reference design? I am about to deploy a 3-node setup using Dell R720 servers. Each server has six 10GbE and two 1GbE ports (two X540-T2 NICs and one built-in NIC with two 10GbE and two 1GbE ports). All ports are RJ45.

Each server has two 10K SAS drives in RAID 1 for the host OS and four SSDs in RAID 0 for the StarWind 3-node setup (480GB Intel DC S3500).

I also have two 10GbE switches (Netgear XS712T, L2+, 12-port 10GbE). I wouldn't mind using direct connections instead of switches unless there is a significant performance impact when one node is brought offline.
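As a rough sanity check on the port budget for direct connections, assuming one dedicated sync link and one dedicated iSCSI link per partner (my assumption, not an official StarWind requirement), a 3-node full mesh works out like this:

from itertools import combinations

nodes = ["node1", "node2", "node3"]
channels = ["sync", "iscsi"]       # one dedicated link per channel per pair of nodes

links = [(a, b, ch) for a, b in combinations(nodes, 2) for ch in channels]
ports_per_node = {n: sum(n in (a, b) for a, b, _ in links) for n in nodes}

print(len(links))        # 6 point-to-point links in total
print(ports_per_node)    # 4 x 10GbE ports used per node, leaving 2 of the 6 spare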

Would much appreciate design guides or help.

Many Thanks
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Sat Aug 16, 2014 10:06 pm

I don't know if this helps you, but I have almost exactly the same 3-node server setup, which Anatoly from StarWind helped set up just last week, and I am using direct connect. I have two dual-port 10GbE NICs (4 ports total per server), two 300GB 10K SAS drives in a mirrored RAID for the OS, and six 900GB SAS drives in RAID 0 for StarWind (about 5TB), on an H700 RAID card with 1GB cache. I am also using a 12TB RAID 5 Synology SAN/NAS for backups. My Dell R710s have four 1GbE NICs, two of which are for the Hyper-V external network and the other two are the iSCSI for the Synology. I'm running Windows 2012 R2.

Jason
Attachment: Starwind Network Diagram (1).jpg
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Wed Aug 20, 2014 2:45 pm

@vaio, in my opinion the diagram suggested by upgraders looks great, so take it into consideration.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Tue Aug 26, 2014 2:38 pm

A question Jason - what happens if you yank one of the RAID-0 drives out of one node and replace it with another drive to simulate a drive failure? I'm mainly interested in whether you have to manually re-create the RAID-0 array and re-synchronise StarWind?

Cheers, Rob.

PS. Of course, don't do this please if you're in production!!
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Tue Aug 26, 2014 3:10 pm

This was my biggest concern and the trade-off; however, I did simulate such a failure before going into production. It took some playing to see what happened. Funnily enough, the OS had no idea anything had happened - you just could not physically copy or move anything to the virtual (S: STARWIND) drive. I was also unsure what the PERC card would do on reboot, but it basically cleared the drive. So these are the steps:

Move all the roles away from the node, then reboot and re-create the array in the PERC card with fast init (about 4 minutes so far), reboot back into Windows, then quick format (1-2 minutes).

Then, in StarWind, select a working node and set a partner sync back to the failed node (about 15 minutes total from replacing the drive).

A resync with the current setup of 4TB took 2 hours. Previously, when I was syncing 2TB on a RAID 5, it took 3 days and performance was killed during it.
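Those two figures work out to very different effective sync rates - rough numbers, using decimal TB:

# Effective resync throughput implied by the two figures above (rough, decimal TB).
def mb_per_s(tb, hours):
    return tb * 1e6 / (hours * 3600)

print(round(mb_per_s(4, 2)))     # ~556 MB/s - 4TB in 2 hours, only feasible over the 10GbE direct links
print(round(mb_per_s(2, 72)))    # ~8 MB/s   - 2TB in 3 days on the old RAID 5 setup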

So yes, there is a trade-off with RAID 0, but with 3 nodes, and with the steps identified so that any of my staff could follow them, recovery in the event of a failure was easy and acceptable. Having the OS on a mirrored RAID eliminated the whole server restoration fiasco. Although it is still backed up in case there was a total failure, the odds of that happening are very slim.
JohnTrent
Posts: 6
Joined: Thu Jul 17, 2014 8:53 am

Wed Aug 27, 2014 11:40 am

upgraders wrote: So yes, there is a trade-off with RAID 0, but with 3 nodes, and with the steps identified so that any of my staff could follow them, recovery in the event of a failure was easy and acceptable. Having the OS on a mirrored RAID eliminated the whole server restoration fiasco. Although it is still backed up in case there was a total failure, the odds of that happening are very slim.
What sort of IOPS are you seeing on your array?

I understand that this is just an HDD-based array; however, it would be interesting to see the level of performance you get from the config you have.
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Wed Aug 27, 2014 12:25 pm

The IOPS seem to be an elusive magic number that can vary with current load, file size and even the software used to measure it. I struggled with that; even StarWind's great staff had problems comparing readings. Veeam ONE is great, but it differed from StarWind's performance graphs, which differed from Windows Performance Monitor, which was still different from Iometer. Then you have to consider WHAT you are measuring: the IOPS on the sync links or on the iSCSI side, and whether it is hitting a local iSCSI target or one on a partner node, since you have targets to itself and to every partner node... (How StarWind has figured out how to keep data packets from getting confused by the host OS across multiple servers is simply brilliant.)

My point is I could give you numbers, but they would not be very relevant to you. I think a somewhat "good" comparative MB/s benchmark is just a simple single 10-20GB file transfer, but even then current load will dictate the readings. It would be nice to have a tool that could measure current IOPS and then give you the ability to saturate the "line" for a small downtime window, to give you a maximum sustained read/write. If you want me to run something specific (give me a step-by-step), I'd be happy to run it during off-peak times for you.
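Along those file-transfer lines, this is the sort of minimal timing script I had in mind (the target path and file size are placeholders):

import os, time

# Minimal sequential-write timing - the path and size are placeholders.
target = r"S:\bench\testfile.bin"        # hypothetical file on the StarWind CSV
size_mb, chunk = 10 * 1024, b"\0" * (1024 * 1024)   # 10GB written in 1MB chunks

start = time.time()
with open(target, "wb") as f:
    for _ in range(size_mb):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())                 # make sure the data actually hits the storage
elapsed = time.time() - start
print(round(size_mb / elapsed), "MB/s sequential write")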

Thanks
Jason
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Wed Aug 27, 2014 1:04 pm

Actually, I'm not a big fan of IOPS either: using IOMeter, I was able to get wildly varying values on our existing SAN just by modifying the block size. It may not be scientific, but because I've used it on many devices, I like CrystalDiskMark. I think it's used by several well-regarded hardware review sites. I'm sure it's not perfect, but the raw sequential read and write results give a very good indication of where the SAN sits relative to other devices. Here is a report I ran during the day. Basically, what this says to me on the sequential read/write is that it's in a ballpark I'd expect for the underlying disk system.

If I was getting (say) half those speeds, I'd be concerned.

So how about posting CrystalDiskMark speeds?

Cheers, Rob.

-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 280.706 MB/s
Sequential Write : 221.522 MB/s
Random Read 512KB : 179.140 MB/s
Random Write 512KB : 130.066 MB/s
Random Read 4KB (QD=1) : 18.603 MB/s [ 4541.8 IOPS]
Random Write 4KB (QD=1) : 12.230 MB/s [ 2985.8 IOPS]
Random Read 4KB (QD=32) : 231.564 MB/s [ 56534.1 IOPS]
Random Write 4KB (QD=32) : 105.039 MB/s [ 25644.2 IOPS]

Test : 1000 MB [E: 0.9% (0.1/10.0 GB)] (x5)
Date : 2014/08/22 14:55:02
OS : Windows Server 2012 Datacenter Edition (Full installation) [6.2 Build 9200] (x64)
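For what it's worth, the IOPS column in that output is just the MB/s figure divided by the block size, which is why changing the block size swings the IOPS number so much. A quick check against the 4KB QD=1 read line above:

# IOPS = throughput / block size; checked against the 4KB QD=1 read line above.
def iops(mbps, block_bytes):
    return mbps * 1e6 / block_bytes

print(round(iops(18.603, 4096), 1))      # ~4541.7, matching the reported 4541.8 IOPS
print(round(iops(18.603, 512 * 1024)))   # the same throughput at 512KB blocks is only ~35 IOPS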
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Wed Aug 27, 2014 1:32 pm

Well, I'm still not clear on what you would like me to test. To test the CSV (which would be a true test of the StarWind system) I would need to mount the CSV as a drive and run it there. Is that what you want?

Here are the results running directly on the RAID 0 drive for all three nodes. Node 1 and Node 2 are currently partner nodes. Node 3 is "dormant" at the moment - it is not being used as a partner sync (some issues I am waiting on with StarWind before connecting it) - so basically very low load.

Thanks
Jason



NODE1

-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 964.356 MB/s
Sequential Write : 973.006 MB/s
Random Read 512KB : 497.417 MB/s
Random Write 512KB : 846.189 MB/s
Random Read 4KB (QD=1) : 6.537 MB/s [ 1596.0 IOPS]
Random Write 4KB (QD=1) : 28.223 MB/s [ 6890.3 IOPS]
Random Read 4KB (QD=32) : 24.943 MB/s [ 6089.7 IOPS]
Random Write 4KB (QD=32) : 31.123 MB/s [ 7598.3 IOPS]

Test : 1000 MB [S: 81.6% (4102.3/5026.4 GB)] (x5)
Date : 2014/08/27 9:20:27
OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)



NODE2

-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 971.503 MB/s
Sequential Write : 749.947 MB/s
Random Read 512KB : 483.558 MB/s
Random Write 512KB : 800.413 MB/s
Random Read 4KB (QD=1) : 7.203 MB/s [ 1758.5 IOPS]
Random Write 4KB (QD=1) : 27.987 MB/s [ 6832.8 IOPS]
Random Read 4KB (QD=32) : 19.656 MB/s [ 4798.8 IOPS]
Random Write 4KB (QD=32) : 27.462 MB/s [ 6704.5 IOPS]

Test : 1000 MB [S: 81.5% (4097.2/5026.4 GB)] (x5)
Date : 2014/08/27 9:24:04
OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)



NODE3

-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 1175.157 MB/s
Sequential Write : 1072.007 MB/s
Random Read 512KB : 676.444 MB/s
Random Write 512KB : 918.333 MB/s
Random Read 4KB (QD=1) : 8.877 MB/s [ 2167.2 IOPS]
Random Write 4KB (QD=1) : 31.301 MB/s [ 7641.9 IOPS]
Random Read 4KB (QD=32) : 25.150 MB/s [ 6140.1 IOPS]
Random Write 4KB (QD=32) : 34.350 MB/s [ 8386.1 IOPS]

Test : 1000 MB [S: 0.4% (17.7/5026.4 GB)] (x5)
Date : 2014/08/27 9:26:49
OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)