
Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 8:02 am
by Steve Sokolowski
Good morning!
  • I just got up and Chris told me that, yet again, there was some job that happened during the middle of the night that delayed shares. Apparently, the problem might have been caused by the addition of the swap space that was added to hold the shares pending insertion. Chris doesn't know why swap space would cause the problem and he will investigate once he awakens at 2:00pm. It's also unclear to us why these issues only happen in the middle of the night, so perhaps that's a clue to whatever is causing the problem.
  • Chris already corrected the balances for the period of delayed shares, so some customers might not even know that anything had happened.
  • Chris will be performing a release of many features today that will improve performance, reduce the probability of some types of crashes, and add features for miners. The feature list is posted elsewhere on the site.
  • Three more customers have had money stolen from them because they reused passwords from other sites. Someone is directly targeting our customers. The hackers have not found any security vulnerabilities in our system - these are extremely simplistic attacks with a simplistic solution: use a unique password. We can't reimburse for money lost to hackers who reuse your password from other sites. There are many password databases online for purchase - Sony's network hack from 2015, Target's, and even bitcointalk.org's hashed passwords from a hack they had.
  • Fortunately, here's a tip to create an uncrackable password: Go to brainwallet.io, click the "random" button, and memorize the first four words that are displayed, then use them as your password. Four random dictionary words are effectively impossible to crack with current technology. Two-factor authentication will be coming soon, but I fear it will not eliminate this problem because those who reuse passwords are also unlikely to enable two-factor authentication.
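
For anyone who would rather generate the four words locally, here is a minimal sketch of the same idea in Python. It is not how brainwallet.io works internally (that is unknown here); the function names and the tiny demo word list are made up for illustration, and a real list should contain thousands of words.

```python
import math
import secrets

# Hypothetical demo list, for illustration only. A real word list should have
# thousands of entries; with only 8 words a passphrase would be trivially weak.
DEMO_WORDS = ["apple", "river", "candle", "orbit", "magnet", "copper", "violet", "harbor"]


def make_passphrase(wordlist, num_words=4, sep=" "):
    """Pick num_words words independently and uniformly at random.

    secrets.choice draws from the operating system's cryptographic random
    source, unlike random.choice, which is not meant for passwords.
    """
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("wordlist must not contain duplicates")
    return sep.join(secrets.choice(wordlist) for _ in range(num_words))


def passphrase_entropy_bits(wordlist_size, num_words=4):
    """Entropy in bits, assuming every word is chosen uniformly at random."""
    return num_words * math.log2(wordlist_size)
```

The strength comes from the size of the list and the randomness of the pick, not from the words themselves: under these assumptions, four words from a 2048-word list give 44 bits of entropy, and a larger list gives more. Words you choose yourself do not count as random.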

Re: Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 8:29 am
by tmopar
Yesterday I posted a reply about the swap... Interesting about the swap being the cause. The only explanation I can think of is that swap is so much slower and there's no throttle on the incoming workload, or that the swap is fighting the main system for I/O time.

I recommend going with SSDs, and honestly straight PCIe models for the speed. It's just so much slower to use swap that you have to put it on different disks, or else the swap is competing for device time with the app/data devices.

SSDs are quick, but swapping to them generates a lot of extra wear (repeated writes and overwrites). This is why you need a RAID: when one of them fails (which will happen eventually given enough load and time), the system will continue to limp along and hopefully not crash until you can pop a new one in.

If you need to order the hardware, externalize the swap as best you can in the meantime; it might conceivably be saturating the underlying bus or controller.

Hope it helps,
Tmopar

<<<<< after I posted I thought of something...

If you have another server with a bunch of vacant RAM, there is an idea that might also work. You need dedicated (10/gigabit at least) and preferably ganged NICs between this machine and a second machine with the free RAM. Make a ramdrive on that machine and share it via NFS or Samba or whatever. You can then create the swapfile there and point your swapping to it.

While there will be network overhead, you make up for it in the RAM's speed, plus you might not need to order new hardware and there is no recurring cost for SSDs. It's a little unorthodox, but it might work, and it could be tested just by writing a program that intentionally fills up the memory to cause a swap condition, then watching how it behaves under stress with both methods.
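
A rough sketch of such a memory-filling test program, in Python, might look like the following. The function name and the chunk and page sizes are assumptions for illustration; it should only be run on a test machine, since pushing a production box into swap on purpose could cause exactly the delays being investigated.

```python
import time


def fill_memory(total_mb, chunk_mb=64, page_size=4096):
    """Allocate roughly total_mb of memory in chunk_mb pieces and time each one.

    Returns (chunks, timings). The caller must keep a reference to `chunks`
    so the memory stays allocated. If total_mb is not a multiple of chunk_mb,
    the last chunk overshoots the target slightly.
    """
    chunks = []
    timings = []
    chunk_bytes = chunk_mb * 1024 * 1024
    allocated_mb = 0
    while allocated_mb < total_mb:
        start = time.perf_counter()
        buf = bytearray(chunk_bytes)
        # Write one byte per page so the OS has to back every page with real
        # memory (or swap) instead of leaving it lazily unmapped.
        for offset in range(0, chunk_bytes, page_size):
            buf[offset] = 1
        timings.append(time.perf_counter() - start)
        chunks.append(buf)
        allocated_mb += chunk_mb
    return chunks, timings
```

Calling it with a total somewhat larger than the free RAM and printing the per-chunk timings should show a sharp jump at the point where the system starts swapping. Running it once per swap setup (local disk, SSD, remote ramdrive) would give a crude comparison of how each behaves under stress.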

Hope that helps too,
Tmopar

Re: Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 8:52 am
by lilbob
There is also a phenomenon happening in the exchanges in the last two days due to the quantity of new users and the volume of trades. At the time of this post the Novaexchange website has slowed to a struggling crawl. The slow rise and big jump in recent scrypt profitability is due to this new activity. This is causing problems in blockchains that have been seemingly inactive for years. Old wallets are appearing for download, etc.
It would seem, after discussing it extensively with my brother last night, that the arrival of a new coin using 'tangle', the increasing success of Ethereum, and the dangerous volatility of BTC leading up to the end of June are combining into an avalanche of activity. Add to this the preparation for the new machines that have come onto the market (probably pre-ordered, so there has been industrial mining with new machines for maybe two months already).

From what I understand of block verification, much of this would cause a slowdown everywhere, I think. The recent changes to Prohashing were very well timed. I think much of the slow reactions are due to so many coins experiencing such a huge increase of BTC holders trickling them down to stabilize the risk factor.

Re: Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 9:01 am
by lilbob
From https://novaexchange.com/news/

" Performance issues and lag

We are currently experiencing an huge increase of new users and traffic since the last two days. We are working on stabilizing and scaling the site properly to handle the new increased amount of traffic.

Planned Maintenance Notice

Wednesday 7th June 2017 at 18:00 to 22:00 GMT+2 time we will have planned maintenance window on Novaexchange to improve stability, capacity and implement a smoother user experience. During this time shorter downtime up to 30 minutes is expected.
API Trading is temporarly disabled and will be enabled again after our site Maintenance.

We are sorry for the inconvinience and hopes for your understanding."

Re: Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 11:17 am
by VanessaEzekowitz
Since that glitch last night (looks like about 11:36 EDT), my hash rate in the "hashrate history" graph on the site has been showing 3-4 MH/s lower than expected (usually it's close to actual). My miners are hashing quite normally, and have been for a week now.

Also, for a brief time (a measure of hours, but I don't know exactly how long), my miners were working from a backup pool.

Re: Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 11:39 am
by VanessaEzekowitz
Also, @tmopar, while RAID is a good idea anyway, SSDs haven't really had a "don't use for swap" problem in quite a long time now.

While the math says that a modern SSD should theoretically only last a few months if you're doing continuous writes at maximum SATA3 speeds (600 MB/sec), you won't be doing that. Even at the consumer level, modern SSDs usually have good controllers, good wear-leveling and so much space reserved for re-allocating failed sectors that it'll take years for a drive to die from storage failure due to swap usage, if it dies at all.

You're more likely to have the controller crap out first, same as on spinning rust.
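
The lifetime estimate is just rated write endurance divided by sustained write rate. A small sketch of that arithmetic follows; the function name is made up for illustration, and the endurance figure has to come from the drive's own datasheet because no specific drive is assumed here.

```python
SECONDS_PER_DAY = 86_400


def ssd_lifetime_days(endurance_tb_written, write_rate_mb_per_s):
    """Days until the rated write endurance is used up at a constant write rate.

    endurance_tb_written: rated endurance in terabytes written (decimal TB),
                          taken from the drive's datasheet.
    write_rate_mb_per_s:  sustained average write rate in decimal MB/s.
    """
    total_bytes = endurance_tb_written * 1e12
    bytes_per_day = write_rate_mb_per_s * 1e6 * SECONDS_PER_DAY
    return total_bytes / bytes_per_day
```

Continuous writes at 600 MB/sec come to about 51.84 TB per day, so the result is very sensitive to the rate. Plugging in the average swap write rate actually measured on a server, which is normally a tiny fraction of that, stretches the same endurance figure out by the same factor. Rated endurance is also a warranty number, and drives may outlast it.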

Re: Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 3:23 pm
by AvPro
So I see the new update brings " Last share processed: 2 minutes ago " next to live stats which is pretty cool!
Before we had this timestamp, what was the data before in terms of distance from "live data" - in the order of seconds or minutes?

Re: Status as of Tuesday, June 6, 2017

Posted: Wed Jun 07, 2017 4:49 pm
by Steve Sokolowski
AvPro wrote: So I see the new update brings " Last share processed: 2 minutes ago " next to live stats which is pretty cool!
Before we had this timestamp, what was the data before in terms of distance from "live data" - in the order of seconds or minutes?

If it was more than two minutes, it would display as the system being behind in a warning at the top of the page.

Re: Status as of Tuesday, June 6, 2017

Posted: Thu Jun 08, 2017 1:30 pm
by tmopar
VanessaEzekowitz wrote: Also, @tmopar, while RAID is a good idea anyway, SSDs haven't really had a "don't use for swap" problem in quite a long time now.

While the math says that a modern SSD should theoretically only last a few months if you're doing continuous writes at maximum SATA3 speeds (600 MB/sec), you won't be doing that. Even at the consumer level, modern SSDs usually have good controllers, good wear-leveling and so much space reserved for re-allocating failed sectors that it'll take years for a drive to die from storage failure due to swap usage, if it dies at all.

You're more likely to have the controller crap out first, same as on spinning rust.

@VanessaEzekowitz I wish I had your luck. I have had a lot of issues with SSDs through the years. Since they might have a predictable problem, and since corrupted swap in the wrong place could bring down the server, the only prudent and reasonable thing to do is to use RAID for redundancy.

Also, on the point of the controller: I agree controllers can be a point of failure too. This is why I recommend getting the PCIe variants, to take the swap stress (whatever device is ultimately employed) off of the main mining op's controller.