Status as of Tuesday, June 16
Posted: Tue Jun 16, 2015 9:20 am
Here's today's status:
- We spent the entire weekend resolving bugs and issues. I believe we made significant progress, especially in reducing the number of exceptions generated by the mining server.
- There is also a series of changes to the website waiting for Chris to release. They resolve some annoying issues.
- We dropped a number of indexes on many tables in the database, significantly reducing the number of writes. We also disabled some constraints and removed some triggers.
- Although we will spend almost all of our effort over the next month resolving bugs, I'm not going to list every resolved issue here unless someone requests it. Bug fixes usually don't produce new features, and we are resolving so many issues that listing them all would be a huge drain on our time.
- Our major issue continues to be system performance. In our continuing quest to improve performance, and in direct response to rootdude's suggestions, Chris has been moving daemons off the database hypervisor onto two new systems. Yesterday, when he tried to copy the blockchains for some daemons to a new server, Chris found that he could sustain a read rate of only 1 MBps from the old disks. At that rate, transferring 0.5 TB of data would take roughly six days, so he is instead letting the blockchains re-download over the Internet, which will be faster and will put fewer reads on the old server.
- By the end of the day, Chris should have moved 71 daemons from the original server to the secondary daemon servers. Our goal is to move about 130 daemons (two thirds of the total) to the new servers, so that roughly one third of the daemons runs on each of the two new servers and one third remains in the virtual machine on the original server. We will then reevaluate performance.
- The recent mining issues appear to stem from poor performance, and there are two underlying problems. The first is an inefficient program that notifies the mining server when a new block is received. A new instance of this program is started for every block; it opens a new connection to the mining server, sends a few bytes, disconnects, and cleans up. We can make this much more efficient by having a single long-running program hold one persistent connection.
- The second problem is that some daemons do not notify the mining server when new blocks are available, so we have to poll them every few seconds. Because the disks on the daemon server are constantly thrashing, the server is easily overloaded, so these polling calls back up and eventually hold up the block notifications from the daemons that are not polled. The daemons then issue more and more disk reads, which leaves the database unable to perform its own reads; the whole system slows down, and share submissions gradually decline. The solution appears to be, first, to redesign the notification program so that it uses far fewer resources when a new block is found and, second, as rootdude suggested, to move as much as we can to the other daemon servers to reduce disk load on the main server. Hopefully, this will be completed over the next few days.
- In the end, I think one lesson here is that writing software is extremely easy. You can get even difficult things like multiple merge mining working relatively quickly. What is surprising is how expensive the hardware needed to run the software we wrote is, even after significant optimization. We could find ourselves having to reduce profitability just to keep the system running until we have enough money to buy hardware, but reduced profitability would restrict the pool's growth, and with it the earnings needed to buy that hardware. That would be a strange situation.
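The fix described above for the notification program (one long-running process holding one connection, instead of a new process and a new connection per block) might look roughly like the following. This is only a minimal Python sketch under assumptions: the `BlockNotifier` class, the newline-terminated `coin:hash` message format, and the `get_best_hash` callback are hypothetical and are not the pool's actual code or protocol. The `poll` method stands in for the daemons that have to be polled; it only sends a notification when the reported tip actually changes.

```python
import socket
import threading


class BlockNotifier:
    """Hypothetical long-running notifier that reuses ONE TCP connection.

    The class name, the (host, port) of the mining server, and the
    newline-terminated "coin:hash" message format are assumptions.
    """

    def __init__(self, host, port):
        self.addr = (host, port)
        self.sock = None               # opened lazily, then reused
        self.lock = threading.Lock()   # serializes writes from many daemons
        self.last_seen = {}            # coin -> last tip hash reported by poll()

    def notify(self, coin, block_hash):
        """Send one small message over the persistent connection."""
        msg = f"{coin}:{block_hash}\n".encode()
        with self.lock:
            for attempt in range(2):
                try:
                    if self.sock is None:
                        self.sock = socket.create_connection(self.addr)
                    self.sock.sendall(msg)
                    return
                except OSError:
                    # Discard the broken connection; retry once on a new one.
                    if self.sock is not None:
                        self.sock.close()
                        self.sock = None
                    if attempt == 1:
                        raise

    def poll(self, coin, get_best_hash):
        """For daemons that cannot push: notify only when the tip changes.

        get_best_hash is a hypothetical callback, e.g. a wrapper around a
        daemon's getbestblockhash RPC call.
        """
        best = get_best_hash()
        if self.last_seen.get(coin) == best:
            return False
        self.notify(coin, best)
        self.last_seen[coin] = best    # record only after a successful send
        return True

    def close(self):
        with self.lock:
            if self.sock is not None:
                self.sock.close()
                self.sock = None


def demo():
    """Run against a local stand-in 'mining server' that accepts ONE connection."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    received = []

    def serve():
        conn, _ = server.accept()
        with conn:
            buf = b""
            while chunk := conn.recv(4096):
                buf += chunk
        received.extend(buf.decode().splitlines())

    t = threading.Thread(target=serve)
    t.start()
    notifier = BlockNotifier("127.0.0.1", port)
    notifier.notify("BTC", "aa11")
    notifier.notify("LTC", "bb22")
    tips = iter(["cc33", "cc33", "dd44"])  # simulated tips seen by polling
    for _ in range(3):
        notifier.poll("DOGE", lambda: next(tips))
    notifier.close()
    t.join()
    server.close()
    return received


if __name__ == "__main__":
    print(demo())
```

Because every message travels over the same socket, the per-block cost drops to a single `sendall` call instead of a process start, a TCP handshake, and a teardown. A real version would also need to cope with a mining server that drops the connection silently, which this sketch only notices when a write fails.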