DeafEyeJedi wrote: ↑Thu Sep 08, 2022 8:09 pm
Steve Sokolowski wrote: ↑Thu Sep 08, 2022 7:41 pm
DeafEyeJedi wrote: ↑Thu Sep 08, 2022 6:27 pm
Copy that. Thanks, Steve!
We resolved the issue successfully. Because it's so late, I'll report exactly what happened tomorrow. Mining is back online, and sorry for the inconvenience!
Thanks so much, Steve! Looking forward to your initial report tomorrow. Happy Mining!
Thanks for your patience.
We discovered that the geth developers removed a field from their API calls that allowed pools to determine how many transactions are in the next block and how much value they carry. The removal of the field caused our daemon management software to fail to send new block notifications to the mining servers.
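For anyone running similar software, here is a minimal sketch of how a notification builder could tolerate a removed statistics field instead of failing outright. This is not our actual daemon manager code; the function name and the keys `height`, `prev_hash`, `tx_count`, and `tx_value` are hypothetical and do not correspond to real geth field names.

```python
from typing import Any, Dict


def build_block_notification(daemon_result: Dict[str, Any]) -> Dict[str, Any]:
    """Build a new-block notification from a daemon response.

    All key names here are hypothetical placeholders, not real geth fields.
    """
    # Required fields: miners cannot start work without these, so a missing
    # key should still raise KeyError and be treated as a real failure.
    notification = {
        "height": daemon_result["height"],
        "prev_hash": daemon_result["prev_hash"],
    }
    # Optional statistics: if the daemon stops reporting them, record None
    # rather than aborting the notification.
    notification["tx_count"] = daemon_result.get("tx_count")
    notification["tx_value"] = daemon_result.get("tx_value")
    return notification
```

The design choice is simply to separate fields that mining depends on from fields that are only informational, so that losing the latter does not stop block notifications.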
After some time, Vance restarted the mining servers to prepare for Autolykos2's release. At that point, because no blocks were being received from ethash daemons, the coins went into error because their DAGs couldn't be computed. The DAGs couldn't be computed because it wasn't known which block to compute the DAG for.
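To illustrate why a missing block makes the DAG impossible to compute: in standard Ethash, the DAG is tied to an epoch, and the epoch is derived from the block number. The sketch below assumes the standard Ethash epoch length of 30,000 blocks (some ethash-family coins use a different value); the function name is hypothetical and this is not our production code.

```python
from typing import Optional

# Standard Ethash epoch length; assumed here, and it differs on some coins.
ETHASH_EPOCH_LENGTH = 30_000


def dag_epoch(latest_block: Optional[int],
              epoch_length: int = ETHASH_EPOCH_LENGTH) -> int:
    """Return the epoch whose DAG is needed to mine at latest_block."""
    if latest_block is None:
        # No block has been received from the daemon, so there is no way
        # to know which DAG to build.
        raise ValueError("no block received yet; cannot choose a DAG epoch")
    return latest_block // epoch_length
```

With no block number available, the only sensible result is an error, which matches what the coins reported after the restart.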
That issue was resolved quickly, but an additional problem was discovered that required an hour to troubleshoot. There was a bug in the daemon managers where, when a coin's host was changed (as Steven did when he reverted to the previous version of geth), the new server's daemon manager would not retrieve the data from the database and therefore would not be aware that the coin had been moved to that server. Restarting the daemon managers on all servers resolved that issue.
I added a ticket and assigned it to Vance. Before the end of the month, he will deploy a fix to the daemon managers so that, in the rare case that a coin is moved between servers, the daemon managers will refresh the new coin locations from the database and respond accordingly without needing to be restarted.
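As a rough idea of what that kind of fix could look like, here is a minimal sketch of a daemon manager that re-reads coin locations on demand rather than only at startup. The class, the `load_locations` callback, and the coin and host names are all hypothetical; the real daemon managers and database schema are not shown here.

```python
from typing import Callable, Dict, Set


class DaemonManager:
    """Tracks which coins this host is responsible for (hypothetical sketch)."""

    def __init__(self, host: str,
                 load_locations: Callable[[], Dict[str, str]]) -> None:
        self.host = host
        # load_locations stands in for a database query returning
        # a mapping of coin name -> host name.
        self._load_locations = load_locations
        self.local_coins: Set[str] = set()

    def refresh(self) -> None:
        """Re-read coin locations so a moved coin is noticed without a restart."""
        locations = self._load_locations()
        self.local_coins = {
            coin for coin, host in locations.items() if host == self.host
        }
```

Usage, assuming an in-memory dict in place of the database:

```python
db = {"ETH": "server-a"}
manager = DaemonManager("server-b", lambda: dict(db))
manager.refresh()          # server-b owns nothing yet
db["ETH"] = "server-b"     # the coin is moved to server-b
manager.refresh()          # server-b now picks up ETH
```

Calling `refresh()` on a timer, or whenever a coin's host changes, would avoid the manual restart described above.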