Re: System offline

Posted: Wed May 27, 2015 8:53 am
by Steve Sokolowski
prohashing.com will redirect to the server as soon as everything is restored.

From what I hear from Chris, the daemons, trader, website, and mining server are online, but the database is still copying from the backup. My understanding is that the copy will finish momentarily, he will bring the database online, change prohashing.com to point to the normal website, and then get on the 11:40am bus to go home.

If that's right, there should be less than three hours of downtime remaining.

Re: System offline

Posted: Wed May 27, 2015 9:24 am
by Steve Sokolowski
I was able to speak with Chris recently.

He got the database restored and the system is starting up. Unfortunately, he discovered that the coin daemons are taking a very long time to start. We have never restarted the daemon server before, as the system had been up for 6 months without a restart. He thinks the disk is thrashing because the daemons are checking their blockchains at startup. It's interesting what you find when you do things that have never been done before (like restarting the daemon server).

The solution to this problem may simply be to leave the remote site and allow the daemons to start up while he is on the bus. The server will most likely resume normal operation within an hour or two.

Re: System offline

Posted: Wed May 27, 2015 11:32 am
by kires
What kind of beer does he like, and where should I send it?

Re: System offline

Posted: Wed May 27, 2015 12:01 pm
by Steve Sokolowski
Alas, Chris doesn't drink beer, so your kind thoughts would be in vain :(

Re: System offline

Posted: Wed May 27, 2015 12:15 pm
by dexubdg
Great job, brothers!
Thank you for the reliable information. I'm waiting patiently :)

Re: System offline

Posted: Wed May 27, 2015 12:54 pm
by Steve Sokolowski
Everything is set to go at this point. He's just waiting for the daemon server to stop thrashing the disks; as soon as that happens, he'll enable the site. Altcoin developers don't give much consideration to performance or reliability, apparently.

Next time Chris has to shut down the daemon server, he will use "hibernate" mode rather than a reboot, which won't require all the daemons to rescan their blockchains.

Re: System offline

Posted: Wed May 27, 2015 7:07 pm
by Chris Sokolowski
The daemon server is going to require a bit more effort. The disk image that stored the block chains became partially corrupted when the server was shut off. When a Debian hypervisor is issued a command for a system shutdown, it proceeds to shut down the virtual machines one by one. However, it only gives each virtual machine 300 seconds to gracefully shut down before killing it. All but one virtual machine shut down before 300 seconds; the one that took too long was the daemon server, which was killed, causing disk corruption. It looks like about 10 of the block chains got corrupted. It's now just a matter of repairing the disk and then redownloading the affected chains. Once I have the disk repaired, I will start the server with the 140 working daemons; the 10 damaged ones will be disabled as I fix their block chains.
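To illustrate the shutdown behavior described above, here is a minimal Python sketch of a host that stops its virtual machines one by one and kills any that exceed a timeout. It is only a simulation under stated assumptions: the `VM` class, the `shut_down_all` function, and the example shutdown durations are hypothetical and are not taken from the hypervisor's actual code or from our servers. Only the 300-second limit comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    # Hypothetical: how long the guest needs to stop cleanly, in seconds.
    shutdown_seconds: float


def shut_down_all(vms, timeout_seconds=300.0):
    """Simulate a host shutting down its guests one by one.

    Returns two lists of VM names: those that stopped cleanly within
    the timeout, and those that were killed for taking too long.
    """
    clean, killed = [], []
    for vm in vms:
        if vm.shutdown_seconds <= timeout_seconds:
            clean.append(vm.name)
        else:
            # A killed guest may have unflushed writes, which is how
            # a disk image can end up partially corrupted.
            killed.append(vm.name)
    return clean, killed


if __name__ == "__main__":
    # Made-up durations, for illustration only.
    guests = [VM("website", 20), VM("database", 120), VM("daemons", 900)]
    clean, killed = shut_down_all(guests)
    print("clean:", clean)
    print("killed:", killed)
```

In this model, the way to avoid the kill is either to raise the timeout above the slowest guest's shutdown time or to stop that guest by hand before shutting down the host.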

Re: System offline

Posted: Thu May 28, 2015 8:38 am
by Steve Sokolowski
Chris tells me that he was able to get 100 daemons running. They are three days behind and are all downloading blocks to catch up at once, so it will take a few hours. Once that's done, he will bring the system online early this afternoon. Then he will delete the corrupt blockchains for the other daemons and allow them to redownload, which will cause those coins to come online as soon as they are redownloaded.

This is quicker than restoring completely from backup because of the time the copy would take and how far behind all of the restored blockchains would be.

The only other file on the server that was damaged was the Litecoin wallet. Since these wallets are backed up nightly, no money was lost, and restoring it was as simple as decrypting the backup and overwriting the corrupt file.

Tonight or tomorrow, I will write a summary of the steps that led to the disaster and what happened.

Re: System offline

Posted: Thu May 28, 2015 8:43 am
by kires
If I ever have grandkids, I'm going to tell them the tale of "The Great Reboot of 2015" when daemons laid siege to the boot drive, and battle raged across all of Serverland for days (and days?) before the forces of good, led by the intrepid Kernel Proc, did triumph and reopen the Port of 80 to the outer worlds. Might even make a movie of it. Hrmmm, I wonder if Michael Bay's working on anything at the moment...

Re: System offline

Posted: Thu May 28, 2015 10:36 am
by Steve Sokolowski
Such a movie wouldn't be complete unless there were heroes to save the day - someone who is willing to waste huge sums of money on redundant servers in separate locations that sync with each other in real time using gigabits per second of bandwidth.

The movie would also need villains - evil hackers who stole all the bitcoins by exploiting a new zero-day vulnerability from China, and then turned their proceeds into high-grade Ecstasy and lived the life of drug kingpins ever after.

Alas, such heroes or villains don't exist in this story. There are only boring people like you and me.