Mon Jan 05, 2015 5:36 pm
I've had one issue with beta build 7450 in my 2-node HA lab cluster. I have a total of two LSFS devices. Both were working for a few days through reboots and tests. One device has had no problems, but I'll describe what has happened with the device hosting a CSV volume:
- After a clean reboot, nodeA began a sync, but got stuck at 29% and never moved beyond that. I left it over the weekend to be sure it wouldn't progress.
- I tried to restart the StarWind service, but it would not stop cleanly.
- I tried rebooting nodeA, but afterwards it displayed "Current Node is not synchronized" and never began synchronization. When I clicked Synchronization manually, it returned the error "Failed: some partner node is already starting synchronization for current node." Meanwhile, nodeB showed "Connection Status Active. Synchronization Status Not synchronized" for the nodeA partner.
- I tried restarting nodeB, which caused synchronization to begin again, but it was stuck at the same point (29%).
- I then removed the target and device from nodeA, deleted the device files, and restarted the StarWind service. I then tried to add nodeA as a replica server for the device. The wizard reported that the device was created, but synchronization got stuck at 5%. I tried the same thing again and it got stuck at 21%. The third time, device creation failed. The fourth time, the sync completed successfully and the device is working properly again.
I've noticed a couple of things:
- Connectivity status always shows good from both ends.
- While stuck syncing, a file (CSV1_HA.swdsk.tmp) is continually created and deleted in the storage path for the device on nodeA.
- On one of the occasions when it was stuck during synchronization, the server log on nodeB was being spammed with these identical lines every 15 ms:
15 93524.807 760 HA CBlocksBarrierEnterBarrier Intersects exist
15 93524.807 760 HA CBlocksBarrierEnterBarrier CurrentBlockID=0x000000000575B7F0, ullStartSector = 0x21EE8, ulLength = 45, bIsSerialize = 1
15 93524.807 760 HA CBlocksBarrierEnterBarrier OtherBlockID=0x0000000005709C20, ullStartSector = 0x21EE8, ulLength = 45, bIsSerialize = 1
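For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that parses the two block lines above and confirms that the sector ranges overlap. The regular expression, the helper names (`parse_block`, `ranges_intersect`), and the reading of `ulLength` as a decimal sector count are my own assumptions based only on the three lines quoted here, not on any documented StarWind log format.

```python
import re

# Assumed layout of the barrier lines quoted above; field meanings are guesses.
BLOCK_RE = re.compile(
    r"(?P<kind>Current|Other)BlockID=(?P<id>0x[0-9A-Fa-f]+),\s*"
    r"ullStartSector\s*=\s*(?P<start>0x[0-9A-Fa-f]+),\s*"
    r"ulLength\s*=\s*(?P<length>\d+)"
)

def parse_block(line):
    """Return (kind, block_id, start, end) or None if the line has no block info."""
    m = BLOCK_RE.search(line)
    if m is None:
        return None
    start = int(m.group("start"), 16)
    length = int(m.group("length"))  # assumption: decimal number of sectors
    return m.group("kind"), m.group("id"), start, start + length

def ranges_intersect(a, b):
    """True if the half-open sector ranges [start, end) of a and b overlap."""
    return a[2] < b[3] and b[2] < a[3]

LOG_LINES = [
    "15 93524.807 760 HA CBlocksBarrierEnterBarrier Intersects exist",
    "15 93524.807 760 HA CBlocksBarrierEnterBarrier CurrentBlockID=0x000000000575B7F0, "
    "ullStartSector = 0x21EE8, ulLength = 45, bIsSerialize = 1",
    "15 93524.807 760 HA CBlocksBarrierEnterBarrier OtherBlockID=0x0000000005709C20, "
    "ullStartSector = 0x21EE8, ulLength = 45, bIsSerialize = 1",
]

# Keep only the lines that carry block details.
blocks = [b for b in map(parse_block, LOG_LINES) if b is not None]
current, other = blocks
print(ranges_intersect(current, other))
```

In the lines I captured, both blocks have the same start sector and length, so the two requests cover exactly the same range.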
I'd be happy to send logs, but didn't want to post them here.
Is this a known issue?