Status as of Monday, April 24, 2017

Posted: Mon Apr 24, 2017 8:00 am
by Steve Sokolowski
Here's a morning update. I think that significant progress was made yesterday. I'll separate it into bugs we fixed and bugs we know are still open.

Here are the bugs we fixed:
  • Prior to yesterday, there were a lot of x11 coins that had 100% lost blocks, including DASH. We resolved this issue and now there are more x11 coins available to mine as a result.
  • We upgraded the daemon servers so that they do not reestablish connections to the mining server every time there is a new block, reducing mining server load.
  • This daemon server upgrade also restored throttling of daemon upstream bandwidth; the lack of throttling was causing higher orphan rates.
  • We improved performance of the mining server by about 4%, which reduced the share of time it spends at 100% CPU load from about 15% to about 5%. This is important because when the mining server is at 100% load, it falls behind in sending new blocks to miners, which can cause stale shares in the meantime.
  • There are additional mining server performance improvements that have not yet been released which will cut another 2% off CPU load.
  • We reduced database load by examining long-running queries and eliminating some of them. This should reduce the amount of time the database is behind.
  • We updated the website to eliminate old REST API calls that no longer functioned and which were causing unnecessary database load by producing errors. We also deleted code from 2014 that had been left in the system but was no longer operational.
  • Michael fixed some website charts and a release of those is pending.
  • The block explorer data was restored, and it is currently being indexed (which is why the database got behind).
What's still to go:
  • We know of several more ways to improve the performance of the mining server. Our goal is to get the mining server to never have 100% CPU load before we address any other suspected mining server issues, because many of them are related to the system being unable to service network traffic in real time.
  • The block explorer data is almost finished indexing, and Chris promises that all block explorers will be online today.
  • The forums still crash if the browser is left open for a long time.
  • The hashrates still sometimes display "NaN" for a while after opening the main site or the forums.
  • One downside to yesterday's accomplishments: the attempted fix for the hashrate charts showing gaps when the system is behind did not take, for an unknown reason, and it seems to have caused the charts not to be generated at all. I'll be investigating why today.
There isn't anything causing us to get stuck; the only issue is that we need to devote time to continuing progress on these issues. The major performance issue we're encountering is that every time we improve performance, more customers come and max the system out again. Fortunately, we deployed this new version of the server before this surge in customers, because the old version would never have had the performance necessary to handle it. Hopefully, we'll get ahead of the curve soon.

Re: Status as of Monday, April 24, 2017

Posted: Mon Apr 24, 2017 10:19 am
by aspect
CleverMining is shutting down as of 26th of April, that is the immediate reason. BitMain shipping L9s in May-June is the upcoming reason.

You should re-invest in your infrastructure as well as look at horizontal scaling solutions. If I may, I'd also suggest removing some real-time features of the UI and replacing them with summaries like "amount of hashrate allocated to XYZ coins in the last hour" as opposed to "each of your workers is mining XYZ coin right now"; that should reduce your WAMP bandwidth.

Re: Status as of Monday, April 24, 2017

Posted: Mon Apr 24, 2017 1:26 pm
by tmopar
Steve, I am a long-time developer myself. Have you considered using process priorities so that time-critical work, such as new blocks, runs at near-realtime priority, while things like the database inserts run at the bottom since they aren't time critical? Then you could remove the annoying nag warning about the shares being out of date unless they are REALLY out of date, and just state that shares are normally delayed by up to 10 minutes.

Aside from that: more horsepower, more machines, faster techniques. Possibly use a ramdisk or in-memory table with periodic caching for your frequently used or costly-to-update tables, if you have the RAM for it.

I would also be interested to know your current hardware setup, if you don't mind.
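A minimal sketch of the priority idea, assuming a Unix host and that the database writer runs as its own process. The names `db_writer_loop`, `share_queue`, and `store` are hypothetical and not taken from the pool's code; the list `store` stands in for the real database insert.

```python
import os
import queue


def db_writer_loop(share_queue, store, niceness_increment=10):
    """Drain share batches at reduced CPU priority (Unix only).

    Meant to be the main loop of a separate database-writer process.
    """
    # os.nice adds to the process's niceness; higher niceness = lower priority.
    os.nice(niceness_increment)
    written = 0
    while True:
        batch = share_queue.get()
        if batch is None:  # sentinel: no more batches
            return written
        store.extend(batch)  # stand-in for the real database insert
        written += len(batch)


if __name__ == "__main__":
    q = queue.Queue()
    q.put(["share-1", "share-2"])
    q.put(None)
    print(db_writer_loop(q, [], niceness_increment=1))
```

Lowering priority this way needs no special rights, but raising a process above the default (a negative increment) normally requires root. Niceness also only matters when the CPU is contended, so it would not speed up work that is itself the bottleneck.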

Re: Status as of Monday, April 24, 2017

Posted: Mon Apr 24, 2017 4:55 pm
by Steve Sokolowski
aspect wrote:CleverMining is shutting down as of 26th of April, that is the immediate reason. BitMain shipping L9s in May-June is the upcoming reason.

You should re-invest in your infrastructure as well as look at horizontal scaling solutions. If I may, I'd also suggest removing some real-time features of the UI and replacing them with summaries like "amount of hashrate allocated to XYZ coins in the last hour" as opposed to "each of your workers is mining XYZ coin right now"; that should reduce your WAMP bandwidth.
What happened to Clevermining? Was there a reason given?

The problem we're having now is that every time we improve performance, more customers come and max out the performance again. I hope we can get ahead of the curve soon.

Unfortunately, buying more hardware isn't a solution. The rest of the system is running fine, but coin assignment is an inherently single-threaded operation. The number of miners assigned to a coin depends, in part, on how many other miners are already assigned to that coin. Unless we can make a discovery that eliminates this dependency, it's possible that there is simply a technological limit on the size of multipools. I don't think we're at that limit yet, however, as there are still other functions whose runtime we can reduce.
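To illustrate why that dependency resists parallelizing, here is a toy greedy assignment. It is not the pool's actual algorithm: the payout formula, `assign_miners`, and the coin data are all assumptions made for the example.

```python
def assign_miners(miners, coins):
    """Greedily assign each miner to the coin paying the most per unit hashrate.

    miners: dict of miner name -> hashrate
    coins:  dict of coin name -> (reward_per_day, external_hashrate)
    """
    assigned = {coin: 0.0 for coin in coins}  # pool hashrate per coin so far
    result = {}
    for miner, hashrate in sorted(miners.items(), key=lambda kv: -kv[1]):
        def payout_per_hash(coin):
            reward, external = coins[coin]
            # Payout is diluted by everything already pointed at the coin,
            # including miners assigned earlier in this same loop.
            return reward / (external + assigned[coin] + hashrate)

        best = max(coins, key=payout_per_hash)
        assigned[best] += hashrate
        result[miner] = best
    return result


if __name__ == "__main__":
    miners = {"a": 100, "b": 100}
    coins = {"X": (10, 100), "Y": (8, 100)}
    print(assign_miners(miners, coins))  # {'a': 'X', 'b': 'Y'}
```

Miner `b` ends up on coin `Y` only because `a` was placed on `X` first. Each decision reads state written by the previous one, so the loop cannot simply be split across threads.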

We have already moved the database communication to a separate process, and that is working well. The primary concern there appears to be simply that the code in that process has a bug that causes it to block instead of creating a new thread for each series of 500 shares, as it should. If so, the database problem should be an easy fix.
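A rough sketch of the intended behavior, with one thread started per series of 500 shares so the caller does not block on the insert. `write_batch` and `flush_shares` are hypothetical names, and the list `store` stands in for the database.

```python
import threading

BATCH_SIZE = 500  # from the post: shares are written in series of 500


def write_batch(batch, store, lock):
    """Stand-in for the real database insert (hypothetical)."""
    with lock:
        store.extend(batch)


def flush_shares(shares, store):
    """Start one writer thread per batch of shares and return the threads."""
    lock = threading.Lock()
    threads = []
    for start in range(0, len(shares), BATCH_SIZE):
        batch = shares[start:start + BATCH_SIZE]
        t = threading.Thread(target=write_batch, args=(batch, store, lock))
        t.start()  # returns immediately; the caller does not wait for the insert
        threads.append(t)
    return threads


if __name__ == "__main__":
    store = []
    for t in flush_shares(list(range(1200)), store):
        t.join()
    print(len(store))
```

If batches arrive faster than the database can absorb them, an unbounded number of threads can pile up. A fixed-size pool such as `concurrent.futures.ThreadPoolExecutor` is one common way to cap that.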

Feel free to suggest a solution to the coin assignment problem. I'm not sure if a parallelizable algorithm exists for that task.

Re: Status as of Monday, April 24, 2017

Posted: Mon Apr 24, 2017 5:11 pm
by tmopar
Can you ascertain what within the single-threaded routine is taking the longest? I have used Xdebug and WinCacheGrind to very good effect in tracking down the main time hogs. It's difficult for me to visualize this without looking at the code. Are you guys using one of the main daemons, or do you have something custom? If you could point me in the right direction, I would be happy to study the problem.
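Xdebug is a PHP tool. For a server written in Python, the standard library's `cProfile` and `pstats` modules give a comparable per-function breakdown. A minimal sketch, where `slow_assignment` is a made-up stand-in for the routine being measured:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


def slow_assignment(n):
    """Hypothetical stand-in for the single-threaded assignment routine."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_assignment, 100_000)
    print(report)
```

Profiling adds overhead of its own, so it is usually run against a test instance or for a short sampling window rather than continuously in production.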

Re: Status as of Monday, April 24, 2017

Posted: Tue Apr 25, 2017 12:48 am
by aspect
@steve - what technology is the core that manages miner assignment written in?

Re: Status as of Monday, April 24, 2017

Posted: Tue Apr 25, 2017 7:41 am
by Steve Sokolowski
aspect wrote:@steve - what technology is the core that manages miner assignment written in?
Python. That's a huge part of the problem.

As you can see, however, it took two days to get all the issues with the release fixed. If I were to rewrite this server in C, it would take a huge amount of effort and the bugs upon deployment would probably be significant. I don't think it is economically possible to justify such a rewrite.

However, I've been able to continue to make progress by cutting out a few lines here and there. If I can improve performance by just 1% per day, this problem would go away entirely within a month. So far, I'm averaging 2% per day, and I haven't even come to some big ideas I had with coin assignment.
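As a rough check on how those daily gains add up, here is a small calculation. It assumes each day's improvement applies to whatever CPU cost remains, which is an assumption; the post does not say how the percentages are measured.

```python
def remaining_load(daily_gain, days):
    """Fraction of the original CPU cost left after compounding daily gains."""
    return (1.0 - daily_gain) ** days


if __name__ == "__main__":
    for gain in (0.01, 0.02):
        left = remaining_load(gain, 30)
        print(f"{gain:.0%} per day for 30 days leaves {left:.1%} of the original cost")
```

Under that assumption, 30 days at 1% per day leaves about 74% of the original cost, and 2% per day leaves about 55%.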

Re: Status as of Monday, April 24, 2017

Posted: Tue Apr 25, 2017 10:03 am
by tmopar
Can Python fork? That might be one easy way to get at least some kind of multithreading.

Re: Status as of Monday, April 24, 2017

Posted: Tue Apr 25, 2017 2:19 pm
by tmopar
I was able to turn some of my more complex PHP scripts multithreaded using fork and some syncing, if you are on Linux. I remember reading something on here in the past about you having a Windows machine to do some testing on. If it runs under some Unix variant, I think you can do the fork either within the script itself or via some low-level method.
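For reference, Python does expose fork on Unix as `os.fork` (it is not available on Windows). A minimal sketch of handing one piece of work to a child process and reading the result back over a pipe; `run_in_child` is a hypothetical helper, not part of any library:

```python
import os


def run_in_child(func, arg):
    """Run func(arg) in a forked child and return its result as text (Unix only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        try:
            os.write(write_fd, str(func(arg)).encode())
        finally:
            os._exit(0)  # leave immediately without running the parent's cleanup
    os.close(write_fd)  # parent process
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()  # reads until the child exits and the pipe closes
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return data.decode()


if __name__ == "__main__":
    print(run_in_child(lambda n: n * 2, 21))
```

The standard `multiprocessing` module wraps this pattern with queues and pools. Either way, forking only helps with work that can proceed independently; a step where each decision depends on the previous one would still run sequentially.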

Re: Status as of Monday, April 24, 2017

Posted: Fri May 26, 2017 1:19 am
by dfair98
Steve Sokolowski wrote:Python. That's a huge part of the problem.
Please tell me it's not Slush's Stratum mining Spaghetti factory?!