Bug discovered in share inserter

Post by **Steve Sokolowski** » Sat Jul 01, 2017 2:31 pm

I discovered a bug in the database share inserter.

The cause of the problem is a race condition. When the inserter is disconnected, it tries to reconnect by creating a new instance of itself every time a new share arrives. However, if a new share arrives between the time a reconnect attempt starts and the time that attempt is confirmed successful, an additional reconnect attempt occurs. Each attempt then sets a timeout to check whether its instance is still connected after 60s. The old instances will not be, since no data has been sent over their connections, so they reconnect again. The correct behavior is to use the auto_reconnect constructor argument, which I wasn't aware of when I initially wrote the code. I fixed the problem and included the fix in the July 2 performance improvements release.
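
To illustrate the race described above, here is a minimal, deterministic sketch. It is not the actual inserter code and not the `auto_reconnect` fix. The class and method names (`NaiveInserter`, `GuardedInserter`, `on_share`, `on_connected`, `simulate`) are hypothetical. It only shows why starting one reconnect attempt per incoming share multiplies attempts during an outage, and how allowing a single in-flight attempt avoids that.

```python
class NaiveInserter:
    """Hypothetical model: every share seen while disconnected starts a new attempt."""

    def __init__(self):
        self.connected = False
        self.attempts_started = 0

    def on_share(self, share):
        if not self.connected:
            # No check for an attempt already in flight, so attempts pile up.
            self.attempts_started += 1


class GuardedInserter:
    """Hypothetical model: at most one reconnect attempt is in flight at a time."""

    def __init__(self):
        self.connected = False
        self.reconnecting = False
        self.attempts_started = 0

    def on_share(self, share):
        if self.connected or self.reconnecting:
            return  # already connected, or an attempt is already pending
        self.reconnecting = True
        self.attempts_started += 1

    def on_connected(self):
        # Called when the pending attempt is confirmed successful.
        self.connected = True
        self.reconnecting = False


def simulate(inserter, shares_during_outage):
    """Feed shares to an inserter while the server is unreachable."""
    for share in range(shares_during_outage):
        inserter.on_share(share)
    return inserter.attempts_started


naive_attempts = simulate(NaiveInserter(), 1000)
guarded_attempts = simulate(GuardedInserter(), 1000)
print(naive_attempts, guarded_attempts)
```

In this model the naive version starts one attempt per share, while the guarded version starts only one for the whole outage. A built-in reconnect option serves the same purpose by keeping retry state in one place.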

Earlier today, I took the WAMP server offline for 1 minute to test what was causing it to take so long to respond to calls from customers. During that outage, the share inserter made hundreds of thousands of reconnect attempts, ran out of memory, and crashed. Chris will therefore correct balances over the next hour.

This is a huge find. When the fix is deployed tomorrow, it will resolve all of these issues that have plagued the system for months:

- The CPU usage of the WAMP server was always at 80%, sometimes causing delays in processing requests. After the fix on the development system, usage declined to 2%.
- Bandwidth utilization will decline even further, from 2Mbps to 40Kbps. That's down from 60Mbps last week, before we turned on compression.
- The share inserter's memory leak has finally been found. The leak was responsible for at least 15 separate occasions on which balance corrections had to be made.
- In testing, the share inserter can now handle 20 times more shares than it could before. It turns out that the complex changes to insert fewer shares into the database were unnecessary. Combined with those changes, the inserter can handle 100 times more shares than it could three weeks ago.
- The mining server's CPU usage declined by 3%, a significant improvement on that highly optimized server.
- The problem of "potentially unhandled rejections" on the forums and website appears to have vanished on the test server, and the "NaN" hashrates have disappeared.

Thanks for your understanding while Chris makes the balance corrections for what will hopefully be the last time today!

Post by **GregoryGHarding** » Sat Jul 01, 2017 2:36 pm

awesome find, finally the memory issue will be no more!

Post by **FRISKIE** » Sat Jul 01, 2017 2:56 pm

Upgrade sounds like big improvements, I'm convinced!

Post by **piet** » Sat Jul 01, 2017 3:39 pm

I really hope this is (was) the problem. You've earned some success!

Post by **FRISKIE** » Sun Jul 02, 2017 8:11 am

Is it possible to focus today's activities on the share inserter, and get that fixed first?

For us that has been a serious issue. Fixing it first, and holding off a bit on the major code changes to the mining server itself, may make more sense.
