Got the memory leak
Posted: Tue Dec 12, 2017 12:34 pm
I figured out the memory leak. It turns out that, when we upgraded to the parallel servers, a tuple of data was being returned to indicate success in inserting a share: (id of the share, id of the mining server). Before, since there was only one mining server, only the id of the share was returned, and that integer was what the cleanup code compared against. In a stunning example of a shortcoming in Python 2, rather than complaining about the comparison between a tuple and an integer, the language happily churned along using a fixed ordering between the two types that has nothing to do with their values (in CPython 2, a tuple always compares as greater than an integer). The check that decides whether a share can be dropped therefore always gave the same answer, and shares were never deleted from memory.
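For anyone curious what this kind of fix looks like, here is a minimal sketch, not the pool's actual code. The names `share_id_of`, `prune_confirmed`, and `pending` are made up for illustration, and the assumption that shares are pruned by comparing ids against the last confirmed insert is mine. The idea is just to pull the share id out of the returned tuple before comparing, so an integer is only ever compared with an integer. (Python 3 would have raised a TypeError on the original tuple-versus-integer comparison instead of silently returning an answer.)

```python
def share_id_of(result):
    """Return the share id from an insert result.

    Accepts either a bare integer id (old single-server format) or a
    (share_id, server_id) tuple (parallel-server format). Both formats
    are assumptions made for this sketch.
    """
    if isinstance(result, tuple):
        return result[0]
    return result


def prune_confirmed(pending, last_confirmed):
    """Keep only pending shares whose id is above the confirmed id."""
    confirmed_id = share_id_of(last_confirmed)
    return {sid: share for sid, share in pending.items() if sid > confirmed_id}


# Hypothetical in-memory shares, keyed by share id.
pending = {101: "share-a", 102: "share-b", 103: "share-c"}

# The database reports share 102 inserted by mining server 7.
remaining = prune_confirmed(pending, (102, 7))
print(remaining)  # {103: 'share-c'}
```

Normalizing the result in one place means the comparison behaves the same whether the old or the new return format shows up.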
This problem had been present since mid-November, but it only became serious now because more shares are being submitted. It also explains why some people were significantly overpaid last week despite the system being offline: when it came back online, duplicate shares that had not been removed from memory were inserted again.
We'll be restarting the entire system in 15 minutes to apply these fixes and to remove the debug code we were previously using.