Status as of Monday, October 2, 2017
Posted: Mon Oct 02, 2017 9:05 am
Good morning!
- Yesterday, we successfully completed a release. The release addressed three main issues. First, the mining server will now automatically recover if it shuts down due to a memory leak or some other issue. Second, the "duplicate worker name" error will now be applied to the existing session, rather than the new session, so that when a miner disconnects and reconnects before the original session has timed out, the new session will be able to continue using that name.
- Third, we significantly reduced CPU usage on the WAMP server by modifying the output to the website for coin status data. I wrote a library to calculate the differences between the previous data and the current data, and another one to reconstruct the data from the differences. That means that only the differences need to be transmitted to subscribers. CPU usage was reduced from 60% to 40%, and bandwidth usage was reduced from 10 Mbps to 5 Mbps. There were no API changes in this release.
- We think that we can optimize even further by doing the same for miner data, but the interface for miner data is a public API. We'll be investigating how many customers are using bots that would be affected by a change in the miner status API.
- You can probably see that we've gotten ahead of the performance issues and that the releases aren't critical anymore. Therefore, we will be reducing the frequency of releases to once every one or two weeks to reduce downtime, unless urgent bugs are found.
- We determined that the spam filtering with Microsoft accounts was caused by an SPF record that pointed to an incorrect IP address. Once we corrected the record, the ticketing system stopped receiving rejections, so we think that issue has been resolved.
- The next task will be to improve performance of the payout operations. The current operation is to start a transaction, execute the INSERT commands, then hold the transaction open while the coin daemon is contacted, and then either commit or roll back depending on the result. Unfortunately, there are so many coins now that the database can fall behind during the time the tables are locked. There are a number of solutions we'll investigate. One is a risky one where we try to do the INSERTs and then roll back, execute the payouts, and then do the INSERTs again and commit. Another is to delay the payouts so that the 480 coins are paid throughout the day, which would still meet the time guarantee while letting the system catch up easily. The best option is to partition the balance tables by coin, which would also allow greater parallelism in the share inserters when payouts are not occurring.
- I'm starting to think that the network connectivity issues are almost entirely unrelated to network connectivity and instead are caused by the system working as designed. For example, some coins have "charity blocks" or special format blocks that would be unprofitable for us to support, so those coins go into error periodically. But when we ask customers what password arguments they are using, they often don't reply. Bear in mind that if you submit a support ticket about network connectivity, you'll have to be willing to answer questions so we can get to the bottom of the issue. Unfortunately, we have to close tickets where the only line is "I can't connect" or "I keep getting disconnected." The stratum protocol does not provide for an error message to be returned in the authorization function.
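
For anyone curious how the coin status change in this release works in principle, here is a minimal sketch of the diff-and-reconstruct idea in Python. It is not the actual library: the function names, the "set"/"del" message layout, and the shape of the status data (a dict of coin name to fields) are all assumptions made for illustration.

```python
# Hypothetical sketch only; names and message format are not from the real library.

def diff_status(previous, current):
    """Return only what changed between two snapshots (coin name -> fields)."""
    changed = {
        coin: fields
        for coin, fields in current.items()
        if coin not in previous or previous[coin] != fields
    }
    removed = [coin for coin in previous if coin not in current]
    return {"set": changed, "del": removed}


def apply_diff(previous, delta):
    """Rebuild the current snapshot from the previous one plus a delta."""
    rebuilt = dict(previous)
    rebuilt.update(delta["set"])
    for coin in delta["del"]:
        rebuilt.pop(coin, None)
    return rebuilt


previous = {
    "LTC": {"hashrate": 10, "height": 100},
    "DOGE": {"hashrate": 5, "height": 7},
}
current = {
    "LTC": {"hashrate": 12, "height": 100},  # changed
    "DOGE": {"hashrate": 5, "height": 7},    # unchanged, so not transmitted
    "XMR": {"hashrate": 3, "height": 1},     # new coin
}

delta = diff_status(previous, current)
rebuilt = apply_diff(previous, delta)
```

In this sketch a changed coin resends its whole record; a finer-grained version could diff individual fields, at the cost of more CPU on the sending side.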
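
The "pay the coins throughout the day" option for payouts can also be sketched quickly. This assumes the payouts are spaced evenly over a 24-hour window and that coins are ordered by name; neither detail has been decided, and the function name is made up for this example.

```python
from datetime import timedelta


def payout_offsets(coins, window=timedelta(hours=24)):
    """Assign each coin an evenly spaced start offset within the window.

    Hypothetical helper: even spacing and name ordering are assumptions.
    """
    if not coins:
        return {}
    step = window / len(coins)
    return {coin: step * i for i, coin in enumerate(sorted(coins))}


# 480 placeholder coin names, matching the coin count mentioned above.
coins = [f"coin{i:03d}" for i in range(480)]
offsets = payout_offsets(coins)
```

With 480 coins and a 24-hour window, that works out to one payout starting every 3 minutes, so each payout only holds its transaction open for a short slice of the day.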