
Downtime this morning

Posted: Tue Jul 18, 2017 1:08 pm
by Chris Sokolowski
The pool went offline this morning due to server overload. I was asleep at the time, and although I have alerts set to wake me when the server goes offline, my phone was accidentally muted, so I was not woken to fix the issue.

The underlying issue is that assigning coins to miners is CPU-limited, and the server load is roughly proportional to the number of coins being assigned. As hashrate grows, we need to distribute miners over more coins to prevent each network from finding blocks too frequently, which increases the mining server's CPU load. This morning, the markets were all around the same profitability, causing lots of coins to be mined at once. At the same time, 700+ GH/s of X11 hashrate came online. The server reached 100% CPU usage and was no longer able to process shares quickly enough. Eventually the server crashed because it lacked the memory to store additional shares.
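As a rough illustration of the scaling described above, here is a minimal sketch. It assumes a single uniform hashrate cap per coin and a constant CPU cost per coin; both are hypothetical simplifications, and `coins_needed` and `relative_load` are illustrative names, not part of the actual mining server.

```python
import math


def coins_needed(total_hashrate_ghs, max_hashrate_per_coin_ghs):
    """Fewest coins needed so that no coin receives more than its cap.

    The uniform per-coin cap is an assumption made for illustration;
    real networks would each have a different limit.
    """
    if max_hashrate_per_coin_ghs <= 0:
        raise ValueError("per-coin cap must be positive")
    return math.ceil(total_hashrate_ghs / max_hashrate_per_coin_ghs)


def relative_load(num_coins, cost_per_coin=1.0):
    """Load modeled as proportional to the number of coins assigned."""
    return num_coins * cost_per_coin
```

Under these assumptions, doubling the hashrate doubles the number of coins that must be assigned, and therefore roughly doubles the modeled load.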

I am willing to compensate miners for part of the time that the server was online and accepting shares, but I do not have exact data on when the server was online and accepting shares without recording them versus when it was completely offline. If someone could send me a private message with hashrate data from an external service, I would appreciate it.

Steve is working on an architectural overhaul of the mining server, which is currently predominantly single-threaded, so that it can be infinitely parallelized. However, that will not be complete for weeks, if not a few months, so I am trying to find ways to reduce server load in the meantime.

I know these issues are frustrating to you, and they are equally frustrating to me. With the growth in mining, managing server load occupies a majority of Steve's time and a lot of mine as well. Please realize we are working to resolve this issue but can't provide an immediate solution. I appreciate your patience while we work to improve stability.

Re: Downtime this morning

Posted: Tue Jul 18, 2017 1:22 pm
by vhmanu
I can link you my MRR profile, where you can see the downtime beginning.
What hurt was that the pool was not really dead; instead it was "just" accepting no shares, so no miner switched to a backup pool. I switched manually to another pool after ~1 hour, when I noticed this at work.

Re: Downtime this morning

Posted: Tue Jul 18, 2017 1:30 pm
by AvPro
Chris, I PM'd you some statistics and times. I agree with vhmanu: it would be nice if the pool disconnected the miners rather than allowing 100% rejects :)
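One way the idea of dropping connections instead of rejecting everything could be sketched is a sliding-window reject-rate check. This is only an illustration: the `RejectWatchdog` class, its window size, and its thresholds are hypothetical and do not reflect how the pool or any miner software actually works.

```python
from collections import deque


class RejectWatchdog:
    """Tracks recent share results and flags a sustained reject streak."""

    def __init__(self, window=100, threshold=0.95, min_shares=20):
        # Only the most recent `window` results are kept.
        self.results = deque(maxlen=window)
        self.threshold = threshold    # reject ratio that triggers a disconnect
        self.min_shares = min_shares  # avoid reacting to a handful of shares

    def record(self, accepted):
        """Record one share result (True = accepted, False = rejected)."""
        self.results.append(bool(accepted))

    def should_disconnect(self):
        """Return True when the recent reject ratio reaches the threshold."""
        if len(self.results) < self.min_shares:
            return False
        rejected = sum(1 for ok in self.results if not ok)
        return rejected / len(self.results) >= self.threshold
```

A check like this could run on either side of the connection. Once `should_disconnect()` returns True, closing the connection would let the miner's normal failover switch to a backup pool.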

Re: Downtime this morning

Posted: Tue Jul 18, 2017 2:13 pm
by coldstone
Downtime sucks. The best thing to do is learn from everything that causes downtime and make tweaks and upgrades so everything runs and keeps running.

Re: Downtime this morning

Posted: Tue Jul 18, 2017 3:45 pm
by biscayne
Murphy's law is becoming daily business at PH; not amused to lose more money.

Re: Downtime this morning

Posted: Wed Jul 19, 2017 2:51 am
by Chris Sokolowski
We increased all users' July 18 earnings by 10% to compensate for any missing shares yesterday. Changes are effective immediately.

Re: Downtime this morning

Posted: Wed Jul 19, 2017 3:31 am
by mine_x
Chris, respect to you for this step. Glad to see true fair play :!:

Re: Downtime this morning

Posted: Wed Jul 19, 2017 5:07 am
by mine_x
Unfortunately, the pool became unstable again: 30 minutes of 100% rejects at the pool :(

Re: Downtime this morning

Posted: Wed Jul 19, 2017 5:23 am
by mickeekung
The problem occurred again at the same time as yesterday.

Re: Downtime this morning

Posted: Wed Jul 19, 2017 6:09 am
by FRISKIE
This provides another attack vector for those who have tried so hard to keep PH offline with DDoS.

If knocking PH offline is as simple as setting up some really big NiceHash orders, pointing them at Prohashing, and overloading the server, I'd say we have a serious vulnerability here.

@ Chris - your thoughts?

Maybe remove some of the lower-value coins to reduce the load during periods when profitabilities even out?
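A minimal sketch of that suggestion, assuming the server has a per-coin profitability figure and a configurable cap on how many coins it assigns at once. The function name `select_coins`, the `max_coins` cap, and the data layout are all hypothetical.

```python
def select_coins(profitability, max_coins):
    """Keep only the `max_coins` most profitable coins.

    `profitability` maps a coin name to a profitability score. When many
    coins sit at similar profitability, the cap bounds how many are mined
    at once, and so bounds the load that scales with the coin count.
    """
    if max_coins < 0:
        raise ValueError("max_coins must be non-negative")
    ranked = sorted(profitability.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:max_coins]]
```

For example, `select_coins({"a": 1.0, "b": 3.0, "c": 2.0}, 2)` keeps `b` and `c` and drops the lowest-value coin `a`.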