Status as of Wednesday, June 3
Posted: Wed Jun 03, 2015 9:37 am
by Steve Sokolowski
Here's today's status:
- We figured out the problem with the incorrect profits recorded on Monday that resulted in the 15% bonus paid to customers. It turns out that some markets contain 1-satoshi buy orders that close a wide spread between the buy and sell prices. These orders are probably placed by coin developers, or by someone running a bot, to make certain coins look more valuable than they actually are. The same problem happened last October at Cryptsy, and we resolved it then by ignoring those orders, but the changes were never carried over to the "polling" APIs like Comkort's. We duplicated and deployed the changes for those other exchanges (a sketch of the filtering idea follows this list), so these rogue orders will not be able to influence prices again.
- Comkort suddenly announced yesterday that they are going bankrupt: https://comkort.com/blog/news/comkort_close. This unfortunate situation means that coins whose only exchange listing was Comkort will be discontinued by us. Chris will soon be posting a list of 20 coins scheduled for deletion. If their developers can get the coins listed somewhere else, we'll be happy to re-enable them.
- Yesterday's problem with shares being lost for a few minutes in the morning was caused by the block explorer indexing too quickly. I reduced the number of indexing threads, but now it will take years to index all the daemons because the disks are too slow. It's difficult to understand how an 8TB RAID 10 array with two SSDs in CacheCade and a several-GB battery-backed write-back cache can get overloaded. If these disks can't handle the load, I don't know what can. It may be because our database workload consists mostly of writes, with few reads. My major focus on Saturday will be going through the entire system and reducing disk writes through software optimization.
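Here's a minimal sketch of the dust-order filter from the first item above. The table and column names are hypothetical, not our actual schema; the point is just that orders at 1 satoshi get excluded before a best bid is computed:
[code]
-- Hypothetical order-book snapshot table; names are illustrative only.
-- A 1-satoshi (0.00000001) buy order should never be allowed to set a price.
SELECT market,
       MAX(price) AS best_bid
FROM   order_book
WHERE  side = 'buy'
  AND  price > 0.00000001   -- ignore rogue 1-satoshi orders
GROUP  BY market;
[/code]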
Re: Status as of Wednesday, June 3
Posted: Wed Jun 03, 2015 4:39 pm
by rootdude
Hey Steve -
Speaking as an infrastructure and storage specialist: this is why we use (in our enterprise and others) one array for writes and one array for reads/reports. That way the data layout of each storage device matches its intended use. I know that's cost-prohibitive for many, but if you're leveraging a cloud architecture, it's doable. Have you looked into other storage solutions and cloud database offerings? If you have any questions, drop me a PM and I'll do what I can to help. I have a number of resources here at work who specialize in this besides myself, and I can bring them to bear as well if necessary. We may also be able to reconfigure the storage into two different pools, each with a different data layout, to take some pressure off.
Rootdude
Re: Status as of Wednesday, June 3
Posted: Wed Jun 03, 2015 5:17 pm
by Steve Sokolowski
Hey root,
We'd be glad to talk with you, but I first want to spend the weekend working on optimizations. I've never even tried to reduce the number of writes before now, and I know I wrote some queries in the past just to make sure they worked, intending to fix them later if performance ever became an issue. This morning I found one query whose fix will eliminate a billion writes over the course of this blockchain indexing, so I don't want to take any hardware action until we've spent at least a few days on software.
As to cloud infrastructure, we made a specific decision on day one to own all the physical hardware ourselves. Many of the high-profile hacks in the past have been the result of social engineering against hosting providers, or of rogue employees who stole wallets. The passwords to these servers are not stored on any electronic media, and we hold the only physical keys, so there is no way for an employee at another corporation to get to our servers.
In case you're wondering what the problems are: the number of reads is inconsequential compared to the number of writes, because we have to store all the shares, retain all the pricing data, compute balances for taxes, and track blocks to see if they are orphaned and to figure out what happened if our debts don't match what we should owe. Daemons constantly receive new blocks and write them, but the rest of their blockchains are never read. I'd say the ratio of writes to reads is 1000:1, and with some memory tricks I think I can push it to 10,000:1 this weekend. But even then, I think the daemons perform far more writes than the database does.
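If anyone wants to check my math, postgres's statistics collector gives a rough tuple-level version of this ratio. It approximates logical activity rather than physical disk I/O, but it's a decent sanity check:
[code]
-- Rough write vs. read tuple counts per database.
SELECT datname,
       tup_inserted + tup_updated + tup_deleted AS tuples_written,
       tup_returned + tup_fetched               AS tuples_read
FROM   pg_stat_database
WHERE  datname NOT IN ('template0', 'template1');
[/code]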
There are three problems with these writes. First, compressing all the running transaction logs requires a huge amount of CPU. Second, almost half our bandwidth is being used to upload this huge stream of compressed data to a backup server across the state. Finally, daemons are generally not well programmed and we can't optimize them the way we can our own code, so they thrash the disks constantly. Our SSD caches have already consumed over half their rated write endurance and will fail within a year due to daemon abuse.
Even if we could transfer the database to a cloud server, we would have to shut down the site and ship 8TB of highly sensitive disks by mail, since uploading the data would take even longer. I think the solution is much simpler. First, we can reduce writes through software optimization so that the disks can write data in sequence rather than performing random writes; second, we can move the daemons to the two new low-end servers, so that the database is responsible for most of the disk activity on the most powerful one. If that doesn't work, then we'll probably be calling you for help.
Re: Status as of Wednesday, June 3
Posted: Thu Jun 04, 2015 8:31 am
by rootdude
Given what I read above, it'd make a lot of sense to run the daemons against a differently configured array than the rest of the operation for many, many good reasons.
The daemons write data that doesn't need to be protected by mirroring, and SSDs wouldn't be the right media for it either (too expensive). The data the daemons write can be recovered by re-downloading any blockchains with issues, and SSDs won't give you better performance or reliability there (the I/O restrictions in the daemons themselves would be the limiting factor, not disk or connectivity performance). I'd recommend a very inexpensive, iSCSI-connected solution for this storage: two NAS systems for high availability, sharing the daemon wallets' private keys with one another (the wallets would run on two small VMs connected to the NAS storage). That way, if any disk in either NAS failed or a wallet server went down, you'd merely point to the other server while replacing the drive. Each of these arrays would run RAID 0, which provides the best write performance. Offloading the wallet daemons from the main DB storage would leave a TON of I/O available on the main SSD/RAID 10 solution for share capture and everything else. Two VMs (one per "wallet server"), each backed by its own RAID 0 NAS.
There are a large number of inexpensive NAS servers that are available for such things. QNAP is a great, scalable solution for this.
Regarding the main SSD/RAID 10 array: once you've offloaded the wallet daemon writes, you'll be able to tell clearly whether upgrading the main storage to a different RAID type or communications medium would be of any benefit. Offloading will also markedly increase the longevity of the expensive SSD array you're currently employing, and it'll improve recovery and backup times as well.
Those are my first blush thoughts on getting things running more smoothly.
Re: Status as of Wednesday, June 3
Posted: Thu Jun 04, 2015 9:26 am
by Steve Sokolowski
We're already one step ahead of you, actually; Chris is removing daemons from that server as we speak. But hopefully you can also understand that mining has razor-thin margins. Chris and I would love to use your services, but we can't afford them and still be profitable. Even if the existing disk array isn't ideal for this task, we need to try software optimization before making any purchases, simply because we already own this array.
The new "servers" that Chris purchased follow your plan of being cheap. The only thing that matters from those daemon servers is the wallets, which can be compressed and backed up after every payout that produces change. These two machines are essentially desktop computers that cost $700 each, and have 32GB RAM, a midrange processor, and a 256GB SSD in each. What's great about these is that because they have SSDs like that, and they only have daemons on them, we can shut them down, make a byte-for-byte image, and the daemons on them just go into error for a few minutes. Blockchains are highly compressible and it probably only requires 30GB of bandwidth to get the entire machine to the backup location. If we do that weekly, the worst that happens is that 50 daemons are offline for the time it takes to fix the server and update their blockchains.
Last night, I was able to get rid of a major source of writes. Previously, when we computed payout proportions for each share, we wrote the adjusted proportions (for when coins have errors and cannot be paid out) to a table on disk and then queried that table back. Now we compute them in memory using postgres functions, since we never cared about the data in that table after the share was written. The size of the compressed transaction logs decreased from 750MB to 560MB every 15 minutes.
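For those curious, the change looks roughly like this. It's a simplified sketch with made-up names, not our actual schema, but it shows the pattern: the adjustment becomes a plain function evaluated in memory instead of a staging table that gets written to disk and immediately read back:
[code]
-- Simplified sketch; function and column names are made up.
-- payable_fraction is < 1.0 when some coins errored out and can't be paid.
CREATE OR REPLACE FUNCTION adjusted_proportion(
    raw_proportion   numeric,
    payable_fraction numeric
) RETURNS numeric AS $$
BEGIN
    RETURN raw_proportion / NULLIF(payable_fraction, 0);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Called inline in the payout query; no staging table is ever written:
SELECT share_id,
       adjusted_proportion(proportion, payable_fraction) AS payout_proportion
FROM   shares;
[/code]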
These transaction logs, I believe, are the core of the problem. Compressing them uses CPU power. Writing them causes the disk heads to go all over the place. Transferring them to the backup location requires a lot of bandwidth.
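You can watch the rate directly, by the way. Using the 9.x function names, sample the WAL location once, again 15 minutes later, and diff the two (the location values below are made up for illustration):
[code]
-- Sample now, and again 15 minutes later:
SELECT pg_current_xlog_location();

-- Then diff the two sampled locations (values here are illustrative):
SELECT pg_size_pretty(
           pg_xlog_location_diff('2A/F0000000', '2A/90000000')::bigint
       ) AS wal_generated_in_15_min;
[/code]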
I believe we can get them down even further by changing some INSERT-then-DELETE commands into single UPDATEs, which is what I'm going to do tonight. Once we get these logs down to 100MB per 15 minutes or less, we can start addressing disk fragmentation, which stands at 34%.
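The pattern I'm replacing looks roughly like this (made-up table, but it's the right shape). An UPDATE still creates a new row version under postgres's MVCC, but it's one statement instead of two, and when no indexed column changes, a HOT update skips the index writes entirely, so it generates far less transaction log:
[code]
-- Before: two statements and two sets of WAL records (illustrative schema).
INSERT INTO balances (user_id, coin, amount) VALUES (42, 'LTC', 1.5);
DELETE FROM balances WHERE user_id = 42 AND coin = 'LTC' AND stale = true;

-- After: a single in-place statement.
UPDATE balances SET amount = 1.5 WHERE user_id = 42 AND coin = 'LTC';
[/code]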
Re: Status as of Wednesday, June 3
Posted: Thu Jun 04, 2015 9:55 am
by rootdude
I'm glad you are well ahead of my thinking on this - I don't have any visibility into your environment, so I can only offer general suggestions.
Re: Status as of Wednesday, June 3
Posted: Fri Jun 05, 2015 10:50 am
by silvernonce
Cheap and nice solution. Just stick four SSDs in it, and this baby has enough CPU power and memory to do the hard work.
http://www.ebay.co.uk/itm/DELL-XS23-SC- ... 2ee1886856