Status as of Saturday, August 26, 2017
Posted: Sat Aug 26, 2017 7:39 am
Good morning!
- Chris has been focusing on network issues over the past day. We believe that connectivity issues are responsible for almost all problems with the system at present.
- So far, he has fixed three issues. First, he found that Javapipe was blocking connections to DNS servers other than 8.8.8.8, so whenever something tried to connect to the backup nameserver, a long delay occurred. With the backup nameservers removed from the configuration, there is no longer a delay before a retry attempt is made to the primary nameserver.
- Second, he found that since Javapipe doesn't support IPv6, and the nameserver was resolving names to IPv6 addresses, services would try to connect to those addresses and be unable to do so. Chris disabled IPv6 on all servers, and now DNS lookups always return IPv4 addresses. That should reduce the connectivity failures caused by attempts to reach IPv6 addresses.
- Third, there was one IP address that was uploading a huge amount of data to the webserver for an unknown reason. He banned that address and added code to prevent that from happening again.
- He also discovered that Javapipe is having routing issues with its current datacenter, and that they are migrating customers to a new datacenter. They agreed to provide Chris with a second server in the new datacenter before other customers, and this new server has a third-generation processor instead of the first-generation processor the previous servers had. That means that OpenVPN will have a significant increase in performance, and the routing issues should disappear. This new server is not configured yet, however; Chris needs more time this morning to work on it. He expects to finish setup by tomorrow, and then transition to the new server. Chris's troubleshooting last night was responsible for a few server restarts.
- On the WAMP side, there are two fixes on which I am focusing. First, I'm creating additional "realms" so that CPU usage for WAMP can be distributed across more cores. Second, I'm going to resurrect the code for the "difference engine," which will reduce bandwidth usage for WAMP by a factor of 10. I hope to complete both of these fixes by the end of the weekend. TransportLost() exceptions have been affecting profit recording in ways we have not yet determined.
- Once the WAMP and connectivity issues are resolved, we will reevaluate whether share issues are still occurring and take further action there if needed. At present, we can't perform a true investigation of the other issues because we first need to rule out disconnects as their cause.
- Chris completed the spreadsheet evaluation of future plans, but I still have to review it. I plan to do that next week, after the connectivity and WAMP issues are fixed.
- To reduce confusion about the evaluation process, I created a FAQ and plan to post it later today. Tickets asking about future plans for the system will be directed to the FAQ. The FAQ might also be of interest to people here, as it will aggregate information into one location.
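
For anyone curious about the "difference engine" idea mentioned above, here is a minimal sketch, assuming it works by sending only the fields that changed since the previous message instead of the full state each time. The function names (make_delta, apply_delta) and the message shape are hypothetical and for illustration only; this is not the actual system code.

```python
from typing import Any, Dict

State = Dict[str, Any]


def make_delta(previous: State, current: State) -> Dict[str, Any]:
    """Build a small message describing how `current` differs from `previous`.

    Hypothetical message shape: {"set": {...changed or new keys...},
    "removed": [...keys that disappeared...]}.
    """
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return {"set": changed, "removed": removed}


def apply_delta(previous: State, delta: Dict[str, Any]) -> State:
    """Rebuild the full state on the receiving side from the last state plus a delta."""
    state = dict(previous)          # copy so the caller's state is not mutated
    state.update(delta["set"])      # apply changed and newly added keys
    for key in delta["removed"]:    # drop keys that no longer exist
        state.pop(key, None)
    return state


if __name__ == "__main__":
    last_sent = {"shares": 120, "hashrate": 5.0, "status": "ok"}
    latest = {"shares": 121, "hashrate": 5.0, "status": "ok"}
    delta = make_delta(last_sent, latest)
    print(delta)  # only the changed field travels over the wire
    assert apply_delta(last_sent, delta) == latest
```

When most fields stay the same between updates, the delta is much smaller than the full state, which is where a bandwidth saving would come from. The actual saving depends on how often each field changes.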