Endless/loop HA disk Synchronization

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

fillogio
Posts: 3
Joined: Mon May 26, 2014 7:33 am

Mon Jul 21, 2014 2:50 pm

Dear all,
I am experiencing some sync issues with a HA SW V8 Virtual SAN for Hyper-V disk.
Our scenario in brief:
2 nodes, 2 dedicated 1 Gbps NICs (Sync + Sync&Heartbeat) per node, 2 iSCSI targets and 2 HA virtual disks (18 GB and 101 GB), the latter sitting on an SSD.
Sync for the smaller HA virtual disk goes smoothly, while for the bigger one the synchronization reaches 40-60% (estimated total time around 15 minutes) and then restarts from the beginning, endlessly.
When this happens, it affects the availability of the smaller disk too. The assigned subnets (and NICs) are "crossed", i.e. the 1st NIC is the sync one for the first disk and the sync+heartbeat one for the second disk, and vice versa.
Any hints?

I am testing this two-target/two-disk HA scenario because one of the two disks is expected to be modified often. Should I use a single HA target/disk instead?

Thanks for your time

Filippo
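
A rough sanity check on the figures in the post above, as a minimal sketch. It assumes decimal gigabytes, a single 1 Gbps link carrying the sync traffic, and no protocol overhead; none of this is measured, and the function name and the `efficiency` parameter are illustrative only.

```python
def estimated_sync_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Seconds needed to copy size_gb gigabytes over a link of link_gbps gigabits/s.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead; 1.0 means ideal line rate.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    # 1 byte = 8 bits, so size in gigabits is size_gb * 8.
    return size_gb * 8 / (link_gbps * efficiency)


if __name__ == "__main__":
    for size_gb in (18, 101):  # the two HA virtual disks from the post
        secs = estimated_sync_seconds(size_gb, link_gbps=1.0)
        print(f"{size_gb} GB: about {secs / 60:.1f} min at line rate")
```

Under these assumptions the 101 GB disk needs roughly 13 to 14 minutes for a full sync, which is in line with the estimated 15 minutes reported. That suggests the link is running close to line rate until the sync restarts, so raw throughput is probably not what limits it.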
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Jul 21, 2014 9:23 pm

Can you confirm that you are running on the latest build of v8?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
fillogio
Posts: 3
Joined: Mon May 26, 2014 7:33 am

Tue Jul 22, 2014 5:45 am

Dear Anatoly,
Thanks for your answer.
I updated to the latest v8 build last week.

Kind Regards
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Wed Jul 23, 2014 5:04 pm

Could you send the logs to support@ with a reference to this thread?
fillogio
Posts: 3
Joined: Mon May 26, 2014 7:33 am

Fri Jul 25, 2014 8:11 am

I've just sent it,
thanks for your support.

Have a nice weekend

Filippo
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Jul 28, 2014 7:33 am

Received and pushed to R&D. We'll get back to you as soon as we have any results.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Aug 01, 2014 1:52 pm

Could you also send the log from the second node covering the time of the issue?
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Tue Aug 05, 2014 5:04 pm

I've got something similar in the lab right now. Loop is only on the LSFS storage though. The flat ones are fine. I should be able to repeat this if it is repeatable as I know what I did to get to this point.

Cheers, Rob.

NOTE: what I did before I forget:
  1. Shut down TEST90 file server - uses iSCSI initiator to mount drives
  2. Shut down SAN90 (node #1)
  3. Shut down SAN91 (node #2)
  4. Powered up SAN91 alone
  5. Didn't power up SAN90 - simulating a failure to start after clean shutdown
  6. Powered up TEST90 - iSCSI was in "reconnecting" for all targets
  7. Manually flagged the storage (3 x flat + 1 x LSFS) as synchronised on SAN91
  8. TEST90 iSCSI initiator reconnected to SAN91 and storage appeared
  9. Copied some test files on the LSFS storage (note: SAN90 still down)
  10. Powered up SAN90
  11. Auto-sync kicked in and synchronised the 3 x flat storage fine
  12. Stuck in loop synchronising the LSFS storage4 on SAN90
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Thu Aug 07, 2014 2:17 pm

I've had to kill the LSFS storage in my lab because, in the scenario described above, it never completed synchronisation. Worse, it's been creating 5GB files on the node being synced like they are going out of style! The source has about 40GB of 5GB files. The target node had got to 500GB before I spotted it.

Cheers, Rob.
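
For anyone reproducing this, a minimal sketch of a check that would flag this kind of runaway growth early. It only sums file sizes under a directory and compares the total with an expected figure; the path, the 40 GB figure (taken from the source node above) and the `factor` threshold are assumptions for illustration, and nothing here is a StarWind tool or API.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under path (recursive)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished or is unreadable between listing and stat; skip it.
                pass
    return total


def exceeds_budget(path: str, expected_bytes: int, factor: float = 2.0) -> bool:
    """True if the directory is larger than expected_bytes * factor."""
    return dir_size_bytes(path) > expected_bytes * factor


if __name__ == "__main__":
    GB = 10**9
    storage_dir = r"D:\lsfs-storage"  # hypothetical location of the LSFS files
    if exceeds_budget(storage_dir, expected_bytes=40 * GB):
        print("LSFS directory on the sync target is far larger than the source")
```

Run from a scheduled task on the node being synchronised, a check like this would have raised a flag well before the 500GB mark.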
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Aug 08, 2014 9:38 am

Hi! We can see that some errors occur when StarWind tries to write to the disk. Below you can find the corresponding record:

[code]7/21 13:02:18.059 e2c HA: CHADevice::EventProc: Event: HA_NN_STORAGE_IO_OPERATION_FAILED, Target name: 'iqn.2008-08.com.starwindsoftware:hyperv1-vm-ssd'
[/code]
Basically, this is the reason why synchronization keeps getting interrupted.
I'd recommend that you update all the drives on the storage system and double-check that there are no hardware malfunctions.
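
To check how often this event occurs and for which targets, the log can be scanned with a short script. This is a minimal sketch: the line layout is inferred only from the single record quoted above, so other log lines may not match, and the function name is illustrative.

```python
import re
from collections import Counter

# Layout assumed from the one quoted record:
# "<M/D> <H:MM:SS.mmm> <thread> HA: ...Event: <NAME>, Target name: '<iqn>'"
EVENT_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}) (?P<time>\d{1,2}:\d{2}:\d{2}\.\d+) \S+ HA: .*?"
    r"Event: (?P<event>\w+), Target name: '(?P<target>[^']+)'"
)


def find_events(lines, event="HA_NN_STORAGE_IO_OPERATION_FAILED"):
    """Return (date, time, target) tuples for every line reporting the given event."""
    hits = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m and m.group("event") == event:
            hits.append((m.group("date"), m.group("time"), m.group("target")))
    return hits


if __name__ == "__main__":
    sample = [
        "7/21 13:02:18.059 e2c HA: CHADevice::EventProc: Event: "
        "HA_NN_STORAGE_IO_OPERATION_FAILED, Target name: "
        "'iqn.2008-08.com.starwindsoftware:hyperv1-vm-ssd'",
    ]
    hits = find_events(sample)
    # Count failures per target to see which device is affected.
    print(Counter(target for _date, _time, target in hits))
```

Passing an open log file instead of `sample` works the same way, since `find_events` accepts any iterable of lines. If the failures cluster on one target, that points at the disk behind it.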
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Fri Aug 08, 2014 4:38 pm

I know I've kind of wrapped my lab work here in with the OP problem so I'll break mine out into another topic. I'm also spending a bit of time double-checking the lab environment to make sure it's sound, like wiping the disks block by block to check for write-errors.

Cheers, Rob.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Sat Aug 09, 2014 11:59 am

Thanks. I've answered that already.