System was offline

Posted: Fri Jun 07, 2019 7:21 am
by Steve Sokolowski
Earlier this morning, the routing table in the main switch to our servers became corrupted for an unknown reason. This error had not occurred in the 1.5 years that the switch had been running; it had never been restarted during that time and had been working fine. As a result of the corruption, packets coming into the system were being sent to the wrong IP addresses.

As with most computer issues that you can't figure out, we turned the switch off and back on again, and the issue was resolved.

The system functioned as designed, and stored shares in memory until things were working again. Miners that were connected will see balances rapidly increase over the next hour as idle CPU periods are used to insert missing shares from before 5:00am EDT.
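
For anyone curious what "storing shares in memory" and inserting them later can look like, here is a minimal sketch. It is an illustration only, not the pool's actual code: the `ShareBuffer` class, the `insert_fn` callable, and the use of `ConnectionError` to signal unreachable storage are all assumptions made for this example.

```python
from collections import deque


class ShareBuffer:
    """Hold shares in memory while storage is unreachable (hypothetical design)."""

    def __init__(self, insert_fn):
        # insert_fn is a hypothetical callable that writes one share to
        # permanent storage and raises ConnectionError when it cannot.
        self._insert = insert_fn
        self._pending = deque()

    def submit(self, share):
        """Queue a share; nothing is written until flush() is called."""
        self._pending.append(share)

    def flush(self, max_items):
        """Write up to max_items queued shares, oldest first.

        Meant to be called during idle periods. Stops at the first
        failure and leaves the failed share at the front of the queue.
        Returns the number of shares written.
        """
        written = 0
        while self._pending and written < max_items:
            share = self._pending[0]
            try:
                self._insert(share)
            except ConnectionError:
                break  # storage still unreachable; retry on a later flush
            self._pending.popleft()
            written += 1
        return written

    def __len__(self):
        return len(self._pending)
```

Capping each `flush` call with `max_items` is one way to keep the backlog from competing with live work, which would be consistent with balances catching up gradually rather than all at once.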

Re: System was offline

Posted: Fri Jun 07, 2019 8:24 am
by Dr_Soos
Hi Steve,

The problem actually started 6/5/2019 just before midnight. Unless there is a separate issue that wasn't reported, there have been 3 outages in the past 2 days that have lasted between 90 and 150 minutes that all appear related. There were also 4 other shorter outages during this same period. If you like I can open a support ticket with a picture showing these details.

Re: System was offline

Posted: Fri Jun 07, 2019 8:44 am
by Steve Sokolowski
Dr_Soos wrote:
Hi Steve,

The problem actually started 6/5/2019 just before midnight. Unless there is a separate issue that wasn't reported, there have been 3 outages in the past 2 days that have lasted between 90 and 150 minutes that all appear related. There were also 4 other shorter outages during this same period. If you like I can open a support ticket with a picture showing these details.

That might explain why the Stellar daemon disconnected from its network as well - thanks for pointing that out, because it looks like only some IP addresses were affected.

Hopefully, there was just some sort of corruption in the memory of that switch, and rebooting it will have solved the problem. If the problem happens again, then I'll definitely ask you for the logs so we can decide whether a more in-depth investigation is needed to find a bigger issue.