Covecube Inc.
  • 0

Scanner doesn't appear to work


bissquitt

Question

It scans every so often and always comes back healthy, but in that time I have had 4+ drives just show up as missing in my drive pool. I always try a reboot to see if the drive will come back. It never does. I then connect the "missing" drive to my other computer and run a scan, and every time there has been some sort of error that has made me RMA it. Each time, I can't initialize the disk; I get an error that it's write protected.

 

Even when the disk is missing in DrivePool, Scanner doesn't show any errors; it just doesn't have the disk there. How is DrivePool more informative about drive problems than the software made specifically to detect them? Why is Scanner not picking up any of these errors, or even telling me that it noticed a drive missing? I have to get a new drive rush shipped in and shut the server down completely until I can put it in, because Scanner simply isn't reliable, and if another drive crashes I'm toast. At this point it's pretty much just straining my disks and eating up RAM.

 

It's also really annoying that I can't tell if I've lost anything. Everything in my pool is duplicated so I SHOULDN'T have, but I would hate to need a file 4 months from now and find it not there. If everything is duplicated, the software should be able to see that the file was duplicated and verify that a copy still exists. It notices that files are no longer duplicated when it rebuilds.

 

I'm really tired of losing disks without warning, and then wondering if all my files are ok each time....and it compounds my worry every time it happens.


  • 0

Most likely, the disk isn't showing up in the system. 

 

When this happens, check "Device Manager". Chances are that the disk itself won't show up there, either. And if that is the case, then the system is not aware of the hardware, and StableBit Scanner isn't able to query info about it (we can't query unknown disks that are "not connected" to the system).

 

The reason that StableBit DrivePool catches this is that it expects the members of the pool to be present, and keeps track of them. If one or more of these disks are missing, it can be VERY bad. Duplicated data needs to be reduplicated. Unduplicated data on that disk is gone.

So it's handled very differently. 

 

 

Also, in StableBit DrivePool, once the missing disk is removed, it does run a duplication pass on the pool, and reduplicates files as needed.  So if this is only happening one disk at a time, then you definitely shouldn't be losing anything. 

But if you're especially concerned about this, you may want to check out this thread: 

http://community.covecube.com/index.php?/topic/1865-howto-file-location-catalog/
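For anyone who wants a rough do-it-yourself version of such a catalog, here is a minimal Python sketch. It is not how DrivePool or the linked thread does it. It assumes each pool member disk keeps its share of the pool under one folder (for example the hidden PoolPart folder on that disk) and that the same relative path on two disks means two copies of the same file. The function names and the folder paths you pass in are hypothetical.

```python
import os
from collections import defaultdict


def build_catalog(disk_roots):
    """Map each relative file path to the list of disk roots that hold a copy."""
    catalog = defaultdict(list)
    for root in disk_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Path relative to the disk root, so copies on different disks match up.
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                catalog[rel].append(root)
    return dict(catalog)


def under_duplicated(catalog, expected_copies=2):
    """Return the relative paths that exist on fewer disks than expected."""
    return sorted(p for p, roots in catalog.items() if len(roots) < expected_copies)
```

Run on a schedule and saved to a disk outside the pool, the catalog from the last good run tells you which files lived on a disk that has since gone missing, and `under_duplicated` lists anything that currently has only one copy.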

 

 

 

However, if you're experiencing a lot of disk failures, you may have other issues.  Especially if the new disks are connected to the same ports and cables.   In fact, if you're using the same power hookup and experiencing multiple drive failures on that cable.... you may have a faulty power cable or power supply.  If that is the case, you may want to replace it before it "eats" any more hardware. 

 

And if this is what is happening, then StableBit Scanner won't catch it. It relies on the disk's firmware to report issues, and if that firmware is being shorted out, it isn't going to report anything. It couldn't.


  • 0

I am a systems engineer by trade. The disks are indeed missing from the system. My question wasn't so much "Why can't you see something that's not there?" but "Why didn't you see that it was going bad before it was gone?"

 

In every case, when a test was run on the drives after they went "missing", it reported multiple SMART errors or bad sectors. This seems like the likely cause of the failure, and something that Scanner should have caught.

 

I offered up the fact that ALL drives were unable to initialize due to write protection as additional information for troubleshooting. As an IT professional, I know any and all information is helpful to have, and I'm not familiar enough with the way your software works to know if it's relevant.


  • 0

To be blunt,  StableBit Scanner does check the SMART data very frequently (and for some people, too frequently, as it does wake the disk).   

 

Do you have notifications enabled on StableBit Scanner (eg, email, text, or mobile)?  If so, you *should* get notifications from that. 

But if you do have notifications enabled and are NOT getting notifications, it may be that the disk is dropping from the system before StableBit Scanner is able to get to the information.

 

This could be caused by the controller they're attached to, or other hardware issues. 

In some controllers (mostly actual RAID controllers), they will drop the disk completely, if they take too long to respond.  This could be the issue here.

 

 

 

"I offered up the fact that ALL drives were unable to initialize due to write protection as additional information for troubleshooting since, as an IT professional I know any and all information is helpful to have since I'm not familiar enough with the way your software works to know if its relevant."

 

 

As for the write protection, that's odd.  

 

I know there are disk commands that can be used to do this.  DISKPART has an "ATTRIB" command that can do this for the disk or volume.   And since this is done through the normal DISK subsystem, anything with access to the API could do this.
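As a rough illustration of the DISKPART side, the Python sketch below only builds the text of a DISKPART script that shows a disk's attributes, clears the read-only flag, and shows them again. The full command name in DISKPART is `attributes`. The helper name and the disk number are hypothetical, and this will not help if the read-only state is coming from the drive or controller rather than from the Windows attribute.

```python
def readonly_clear_script(disk_number):
    """Build a DISKPART script that clears the read-only attribute on one disk.

    The disk number is whatever "list disk" reports for the affected drive.
    """
    if not isinstance(disk_number, int) or disk_number < 0:
        raise ValueError("disk_number must be a non-negative integer")
    lines = [
        f"select disk {disk_number}",
        "attributes disk",                 # show current attributes
        "attributes disk clear readonly",  # clear the read-only flag
        "attributes disk",                 # show attributes again to confirm
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be saved to a file and run from an elevated prompt with `diskpart /s <file>`. Double-check the disk number first, since DISKPART acts on whichever disk is selected.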

 

 

 

 

 

As for how our software works... the SMART querying is done primarily through WMI, but also through commands to the controller/drives directly.   

The surface scan works by reading each sector (by LBA) on the disk, and it also uses a custom NTFS parser.

The file system scan is an API call, that leverages the same code as CHKDSK. 
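To make the surface scan idea concrete, here is a minimal Python sketch of a sequential read pass. It is not StableBit Scanner's implementation. It assumes the target can be opened like a file (a disk image, or a block device on systems that allow it), uses a hypothetical function name, and simply records the LBA ranges where a read raised an error.

```python
import os


def scan_surface(path, sectors_per_read=2048, sector_size=512):
    """Read the target start to end and return (total_sectors, bad_ranges).

    bad_ranges is a list of (starting_lba, sector_count) for reads that failed.
    A short read is treated as success in this sketch.
    """
    bad_ranges = []
    chunk = sectors_per_read * sector_size
    with open(path, "rb", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            want = min(chunk, size - offset)
            try:
                dev.seek(offset)
                dev.read(want)
            except OSError:
                # Record the failed range and keep going with the next chunk.
                bad_ranges.append((offset // sector_size, -(-want // sector_size)))
            offset += want
    return size // sector_size, bad_ranges
```

A real scanner would go further: retry failed ranges sector by sector, throttle itself, and map bad sectors back to files, which is where a file system parser comes in.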

 

 

 

 

So again, the only reason that StableBit Scanner wouldn't get the SMART data to warn you about the issues, is if it dropped from the system BEFORE they registered in the SMART data.  Which would *usually* be a controller related issue. 
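If you want to cross-check the SMART data independently of StableBit Scanner, one option is a third-party tool such as smartmontools. The Python sketch below assumes you have already parsed the JSON output of `smartctl -A -j` into a dict, and that it has the `ata_smart_attributes` / `table` / `raw` / `value` layout. Treat that layout, the function name, and the choice of attributes to watch as assumptions to verify against your own output.

```python
# Attributes commonly associated with failing sectors (IDs 5, 197, 198).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def flag_smart(report):
    """Return {attribute_id: raw_value} for watched attributes with a nonzero raw value."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    flagged = {}
    for attr in table:
        if attr.get("id") in WATCHED:
            raw = attr.get("raw", {}).get("value", 0)
            if raw > 0:
                flagged[attr["id"]] = raw
    return flagged
```

An empty result only means these counters are zero right now. A disk that drops off the controller before the counters update will still look clean.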

 

You may be able to check/verify this yourself, by checking the Event Viewer.  In the "Windows Logs" section, the "System" log may have disk, or controller errors that indicate what was going on.  

If you would like, export this log and upload it to us so we can take a look.

http://wiki.covecube.com/StableBit_Scanner_2.X_Submit_Files


  • 0

I'm going to be equally blunt. I want a refund.

The product simply does not work. I can't trust your product to alert me when there is an issue. I have encountered many, and your product has not informed me of a single one.

I just had 3 additional drives go missing at the same time. All are now write-protected and can't be initialized. I can't clear the write protection in diskpart either.

I now have to go through the unimaginably long process of figuring out which of my 40TB of files are missing, because you don't even log which files are on which disk.

 

Notifications: they are on, email and mobile, they have been tested, they work.

 

Drivepool: Great

Scanner: absolute horseshit


  • 0

Hi

 

I have used both Scanner and DrivePool for many years, and whenever there has been a problem, both have sent a corresponding email notification. Are you sure notifications are working and the emails are actually being sent? DrivePool sends a missing-drive email and Scanner sends the SMART error emails. I know this doesn't help you, but the system does work. Personally, I would double-check everything, even the spam folder.

 

Lee


  • 0

I have checked, and once in the 3 years or so that I've had them I have received a text. This happened with my current configuration. I also test the texts. I appreciate the response though. I also manually check Scanner once a day, with no errors reported (I stay RDPed into the server).

 

It just seems HIGHLY suspect to me that 3 drives would all go bad on a reboot when they were functioning fine prior, with no indication of errors. Add to that the fact that ALL are uninitialized and marked as read only, and no diskpart command or anything else will let me access them.

This has happened before with 2 or 3 other drives, always one at a time. I figured they were just bad and replaced them. 3 at the exact same time, though, is unlikely, especially when I'm on a commercial UPS protecting from any outside surges or such.

As far as I can tell, either DrivePool, Scanner, or Windows Server 2012 R2 is killing drives.

FYI this is the most relevant error message that I can find.

"The IO operation at logical block address # for Disk # was retried"
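If the System log contains a lot of these, it can help to count them per disk to see whether one drive or one controller port stands out. Here is a small Python sketch that tallies the retried-IO messages by disk number. It assumes you already have the message texts as strings (for example copied out of an exported log) and that they follow the wording quoted above, with the block address in hex or decimal. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches "... logical block address <addr> for Disk <n> ..." with a hex or decimal address.
RETRY_PATTERN = re.compile(
    r"logical block address (0x[0-9a-fA-F]+|\d+)\s+for Disk (\d+)",
    re.IGNORECASE,
)


def tally_retries(messages):
    """Count retried-IO messages per disk number; other messages are ignored."""
    per_disk = Counter()
    for msg in messages:
        match = RETRY_PATTERN.search(msg)
        if match:
            per_disk[int(match.group(2))] += 1
    return dict(per_disk)
```

If the retries are spread across many disks rather than concentrated on one, that points more toward something shared (controller, backplane, cabling, power) than toward individual drives.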


  • 0

Hi. OK, let's try to help get this sorted, because from what I am seeing this looks more like a hardware issue. The first culprit is the power supply. I know this because it happened to me.

When mine was starting to fail, these are the events I suffered. First, on reboot certain drives would fail, and by that I mean drop from the HBA controller, so I got 1 or 2 missing drives on reboot. However, once the server had rebooted it was possible just to remove and reinstall them because of hot swap, so I didn't think any more of it.

Then, when Scanner ran its surface scan, the same 1 or 2 drives would drop, and I would also get unreadable sector warnings.

After this had gone on for a while, I decided some of the drives were just old and failing, so I bought 2 new hard drives to start the process of changing them. This is when I got that sickly feeling right at the bottom of my stomach. Once I installed the first drive, to be blunt, the shit hit the fan. The array of now 20 drives went berserk: drives started dropping and reconnecting left, right and centre, and DrivePool and Scanner must have sent me 1000 emails. So I shut everything down and did the painful process of checking every single drive on another machine, and guess what, they were all fine. No errors, no disconnecting, nothing. That was the good news. My first thought was that my HBA card was knackered, so out it came. A mass of wires on the kitchen table and some spare components later, I fired it up and, hmm, it was all fine. Not one drive dropped for over 24 hrs. OK, so what did I have, or what had I done? This was my list:

Motherboard
Cpu
Memory
Cables
Hba
Psu

OK, so by pure luck the easiest part to remove from the server array was the PSU, so I swapped that with the one on the kitchen table and wham: drives dropped, with a shitload of errors. So I changed it for a whopping 850 watt PSU, overkill for 24 drives, but it has never had a single blip since.

I know this may be a bit long, but the point is your problem could be anything. Any single part could be failing, and while it's a complete pain in the ass, every part needs to be checked. If you don't have the spares to test, I would try disconnecting everything that is not needed on the server, even half of the drives, and slowly re-adding stuff until the problem starts.

I know it's shit, but it's the only way we learn, through trial and error. I now know that if my drives start to drop, the first thing I will be checking is the PSU.

Keep posting and we will keep trying to help.


  • 0

The PSU is 750W. It could be going bad, but the problem isn't consistent with a bad PSU. (I'm currently at 20 drives, FYI.)

The drives drop seemingly randomly, weeks if not months apart from each other, with the exception of this 3 drive incident.

With a bad PSU, the drive would just not spin up and would go missing. Here it still shows in POST and in Disk Management; it's just an uninitialized disk that you can't initialize. A bad PSU would not do this either.

 

I finished testing all 3 failed drives. 2 are showing bad sectors and the above errors. The 3rd is completely fine in another system; it just won't show up on the server in any bay.

I connected it via a USB enclosure and it is showing up fine. As soon as the system is done checking and balancing from the failure, I'm going to gracefully remove it, reformat and reinsert.

Not sure why it would not be recognized, but I'm hoping Drashna or someone else at Covecube can shed some light on that. (It's not a power problem, because 2 other drives were just removed and I can feel it spinning up.)

 

I still feel like something is damaging the drives. One of them was a month old that was stress tested before being put into the pool.

 

For reference this is the server

http://secure.newegg.com/WishList/PublicWishDetail.aspx?WishListNumber=18563174


  • 0

I don't disagree that the PSU is OK, but fluctuations in power do damage components, and since you rightly have a UPS, we can rule out the mains being the fault.

So we come back to either the PSU/cables or the storage controller. As for a disk suddenly becoming uninitialised, I don't think any software would just randomly do this without user input, and it's not something that either DrivePool or Scanner is capable of doing. So, getting back to the process of elimination: if we are confident all the hardware is functioning as it should and has been tested as best it can be, what software is installed that has the ability to reconfigure hard drives?

I have experienced the random disconnection of hard drives, which led to corrupt data and damaged sectors, and on only 1 occasion did I have a drive become uninitialised. It was down to a faulty PSU, and believe me, I blew £400 on a brand new LSI controller because in my haste I thought it was a no-brainer. I can only advise from my own experience: check and double-check everything, and if you have spares, try them before spending any money.

I checked your link and everything looks really nice. Is that all in the one Norco case? My Chenbro 24-bay case literally only houses the hard drives, PSU and SAS expander, which connects to the actual server that houses all the other components, including the LSI HBA. My setup is in the rackmount section on here (I'll try to add the link), but my 850 watt PSU only powers the hard drives, nothing else. Drashna uses his Norco in a similar way to you, but he runs a 1250 watt redundant PSU.

 

http://community.covecube.com/index.php?/topic/5-my-rackmount-server/&do=findComment&comment=108


  • 0

If you want a refund, that isn't a problem.

 

However, the disks going "write protected" is HIGHLY unusual.  At best, they should be "offline" (default for new disks in Server OS's).  But not write protected. 

 

But then again, you mentioned a Norco. If you're using Norco backplanes for hotswap... that may actually be the source of your issues.  Norco is known for using sub-par parts, which cause all sorts of bizarre behavior. 

 

 

As for the redundant power supply, the Norco may be able to take one, depending on the specific configuration. 

 

 

 

"Drashna uses his Norco in a similar way to you but he runs a 1250watt redundant Psu"

 

http://community.covecube.com/index.php?/topic/5-my-rackmount-server/&do=findComment&comment=108

 

Once I stopped using my Norco case, a lot of the weird disk behavior I had went away. And checking out Reddit, I'm far from alone in this. Unfortunately.

 

 

But again, if you really do wish for a refund, head to https://stablebit.com/Contact


  • 0

What did you move to after the Norco? (Never mind, found it, but I can't delete this comment.)


  • 0

Well, I'll answer here:  A 36 bay Supermicro case I found on eBay.  I ended up spending ~$600 all in all ($270 for the case, $40 for 45 drive bays, $250 for SAS2 backplanes to replace the SAS1 ones, and $60 for mounting rails). 


The difference in build quality is tangible. I mean, it was pretty much night and day. I highly recommend Supermicro hardware because of just how nice it is. And eBay and the like are a great way to get these cases at a good price.
