Status as of Wednesday, February 24

Posted: Wed Feb 24, 2016 5:22 pm
by Steve Sokolowski
Good evening! Here are a few updates on recent development. We thank everyone for putting up with the issues that some have encountered over the past week. While the overflowing "id" column problem on Saturday night was unrelated, many of these issues have occurred as a result of our recent release of significant changes to a new mining server. The changes were released to improve performance so that we can continue the pool's growth. Here, I'll review how the previous server worked and how the new version resolves the issues.

The previous server inserted shares one at a time as they were submitted. Given the difficulty of a share and the coin the miner was mining, the database reviewed what coins the miner wanted to earn and what the current prices were for the mining coin and the earned coin. The database then reviewed who was mining which coins. It inserted one share row, then one row per coin that the system was mining, and then one row for each of those rows for each coin the miner wanted to earn.

It was determined early in the system's life that it would be difficult to query all the shares every time a miner wanted to determine his or her balances, so a running total of earnings is kept in other tables. These rows are also updated after every share insert. With the old system, if miners were mining five coins and a miner wanted to earn five coins, the number of queries to record a share would be 156 - the share itself, the five mined coins, the 25 earned coin rows, and 125 rows that were updated in the five aggregate tables, because each was incremented 25 times.
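As a rough check on that arithmetic, here is a minimal sketch of the per-share query count. The function and parameter names are made up for illustration, and it assumes the number of aggregate tables is fixed at five, as in the example above.

```python
def queries_per_share(mined_coins, earned_coins, aggregate_tables=5):
    """Count the queries the old server needed to record one share.

    Hypothetical helper: the names and the fixed count of five
    aggregate tables are assumptions taken from the example in the post.
    """
    share_rows = 1                                # the share itself
    mined_rows = mined_coins                      # one row per coin being mined
    earned_rows = mined_coins * earned_coins      # one row per mined/earned pair
    aggregate_updates = aggregate_tables * earned_rows  # each table incremented once per earned row
    return share_rows + mined_rows + earned_rows + aggregate_updates


print(queries_per_share(5, 5))  # 1 + 5 + 25 + 125 = 156
```

The count grows with the product of mined coins and earned coins, which is why adding coins on either side loaded the database so quickly.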

When there were too many miners using the system, or too many coins were being mined simultaneously, or the miners requested a lot of payout coins, the system queued shares in memory. But the database couldn't keep up with all the queries, so it sometimes took hours to catch up after load was reduced.

The only way to resolve this problem was to completely rewrite this extremely critical share-insertion code. The goals were to get the numerical calculations off the database server to reduce CPU usage, to reduce the number of rows that needed to be written to disk and later updated again, and to increase parallelism so that many operations could be done at once, instead of each operation depending upon the previous one.

To accomplish these goals, we wrote a database-operator program, which receives shares from the mining server. Shares are now queued in memory until 1,000 shares need to be inserted (we may increase that to 10,000 shares in the future). Then, the shares are iterated through to produce very large queries that affect multiple rows at once. Using the example above with the same 5x5 parameters, one INSERT statement is produced that inserts all 1,000 shares. Then, one INSERT statement is produced to insert 5,000 mined coins, all at once. Then, another INSERT statement is produced to insert 25,000 earned coins, all at once. Finally, five UPDATE statements are produced to make one update to each aggregate table.
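The batching idea can be sketched roughly as follows. This is not our production code: the table names, column names, the even split of difficulty across coins, and the single `earned_totals` aggregate table are all invented for illustration, and SQLite is used only so the example is self-contained. The demo uses a small batch and a 2x2 coin setup because SQLite caps the number of bound parameters per statement.

```python
import sqlite3
from collections import defaultdict


def multi_row_insert(table, columns, rows):
    """Build ONE INSERT statement that covers every row in `rows`."""
    row_marks = "(" + ", ".join("?" * len(columns)) + ")"
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
           + ", ".join([row_marks] * len(rows)))
    return sql, [value for row in rows for value in row]


def batched_update(table, increments):
    """Build ONE UPDATE that adds a different amount to each coin's total."""
    cases = " ".join("WHEN ? THEN ?" for _ in increments)
    marks = ", ".join("?" for _ in increments)
    sql = (f"UPDATE {table} SET total = total + CASE coin {cases} END "
           f"WHERE coin IN ({marks})")
    params = [v for pair in increments.items() for v in pair] + list(increments)
    return sql, params


def flush(conn, shares, mined, earned):
    """Write a queued batch of (share_id, miner, difficulty) tuples."""
    # All arithmetic happens here, in the application, not in the database.
    mined_rows = [(sid, m, diff / len(mined))
                  for sid, _, diff in shares for m in mined]
    earned_rows = [(sid, m, e, diff / (len(mined) * len(earned)))
                   for sid, _, diff in shares for m in mined for e in earned]
    totals = defaultdict(float)
    for _, _, coin, amount in earned_rows:
        totals[coin] += amount
    statements = [
        multi_row_insert("shares", ("id", "miner", "difficulty"), shares),
        multi_row_insert("mined_coins", ("share_id", "coin", "amount"), mined_rows),
        multi_row_insert("earned_coins",
                         ("share_id", "mined", "earned", "amount"), earned_rows),
        batched_update("earned_totals", totals),
    ]
    with conn:  # one transaction for the whole batch
        for sql, params in statements:
            conn.execute(sql, params)
    return len(statements)


MINED, EARNED = ["LTC", "DOGE"], ["BTC", "LTC"]  # hypothetical coin lists
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shares (id INTEGER PRIMARY KEY, miner TEXT, difficulty REAL);
    CREATE TABLE mined_coins (share_id INTEGER, coin TEXT, amount REAL);
    CREATE TABLE earned_coins (share_id INTEGER, mined TEXT, earned TEXT, amount REAL);
    CREATE TABLE earned_totals (coin TEXT PRIMARY KEY, total REAL);
""")
conn.executemany("INSERT INTO earned_totals VALUES (?, 0)", [(c,) for c in EARNED])

queued = [(i, f"miner{i % 3}", 64.0) for i in range(10)]
statement_count = flush(conn, queued, MINED, EARNED)
earned_count = conn.execute("SELECT COUNT(*) FROM earned_coins").fetchone()[0]
btc_total = conn.execute(
    "SELECT total FROM earned_totals WHERE coin = 'BTC'").fetchone()[0]
print(statement_count, earned_count, btc_total)
```

The number of statements stays constant no matter how many shares are in the batch; only the size of each statement grows.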

The effect is that 156,000 queries that affected 156,000 rows were reduced to eight queries that affect 32,000 rows. After the release of this code on Sunday, Chris performed some testing and determined that the database now performs about 120 times faster than it did before. The database performance is now sufficient to handle the combined hashrate of the entire litecoin network.

After those changes permanently resolved the database load issue, we realized that CPU performance on the mining server would become the next limiting factor. During our testing, we determined that the production mining server used to be able to handle about 300 concurrent miners. Our first thought was to produce a complex architecture of mining servers that connected to a central "coin server" that assigned coins to miners. However, we decided that we should first try to profile the existing server to see if any performance improvements could be made to allow the system to remain on one CPU.

We were able to reduce the CPU usage of the mining server by making a few major changes. First, we removed SSL support, since nobody was using it, there are no passwords being sent, and submitting shares doesn't need security. Second, we realized that a significant portion of time was spent in scrypt hashing. Since hashing is a self-contained operation that doesn't require sharing memory, we created a process that receives input data, computes the scrypt hash, and then returns the hash. Third, we realized that a lot of data, like the string representations of numbers, could be pre-computed once and reused instead of being computed every time. Finally, there was some data, like the coins to which a user was assigned, which changed much more rarely than every time a share was submitted, and the calculation of which could therefore be moved out of the share submission functions.
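The second change, moving hashing into its own process, might look something like the sketch below. This is an illustration and not the actual server code: the function names are made up, and the scrypt parameters (N=1024, r=1, p=1, 32-byte output, with the header used as both password and salt) are assumed to be the litecoin-style settings. It relies on Python's standard `hashlib.scrypt`, which requires a Python build linked against a recent OpenSSL.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor


def scrypt_hash(header: bytes) -> bytes:
    """Compute a scrypt hash of a block header.

    Assumed parameters: N=1024, r=1, p=1, 32-byte digest, with the
    header serving as both the password and the salt.
    """
    return hashlib.scrypt(header, salt=header, n=1024, r=1, p=1, dklen=32)


def hash_batch(headers, workers=2):
    """Hash many headers in separate worker processes.

    Each task needs only its input bytes and returns only the digest,
    so nothing has to be shared with the main server process.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrypt_hash, headers))
```

In a script, `hash_batch([b"\x00" * 80, b"\x01" * 80])` would be called from under an `if __name__ == "__main__":` guard so that the worker processes can start cleanly on every platform.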

The result of the enhancements was that the system can now handle about 1500 concurrent miners, or about five times as many as it could before. By comparison, Chris estimated that Clevermining had about 4000 concurrent miners before the company began its decline.

On Sunday, we decided that the improvements were sufficient for the indefinite future and we froze the code for a release. Even though we encountered problems, we decided that, at some point, we had to make a permanent decision to move forward and not revert. Thus, there remain some issues with the system that we continue to resolve every day. One of the reasons for there being so many issues is that it is nearly impossible to test all of the situations that occur on the production mining server, because it would cost too much to buy a duplicate system that can handle as many coins as the production system does.

Some of the issues that we've fixed so far include balances being subtracted instead of added, a race condition that could cause blocks to fail submission and put the system behind until fixed, and inaccurate earnings when certain coins were assigned. Some remaining issues include miners with a static difficulty being assigned to coins that are too easy for their difficulty, and an unknown increase in "miner is using an incorrect algorithm" rejected shares. We are continuing to investigate these issues and plan to release fixes every day until they and others are resolved.

At the same time, there were also trading issues that needed to be resolved to support future growth. Chris discovered an issue that had apparently gone unresolved for some time where coins would be priced against a minor market like dogecoins, but the trader always traded against the bitcoin market because there were no dogecoins in the exchange to purchase. Thus, miners could unintentionally be paid more than the market value for their coins in some circumstances. Chris resolved the issue and released the changes yesterday evening.

We plan to begin an aggressive marketing effort today to take advantage of the system's new capabilities, and we won't be adding new features for a while to ensure stability. Please report any issues you encounter so that we can fix them quickly. Thanks for dealing with the recent issues while we made major changes to make the system able to handle a significant increase in usage.