  • 0

Drive suddenly missing!


Sonicmojo

Question

Hello,

After months and months of solid service - I suddenly received an email from Scanner this morning saying a disk was "missing". I am using both Scanner and DrivePool on a Windows Server 2012 R2 machine. I have 5 drives in the system - one SSD for the system and four 4TB Seagate NAS drives that form the pool. All NAS drives are set up with mount points (no drive letters).

All I know so far is that the disk is not visible in Disk Management or in CrystalDiskMark when each is run within Windows after boot. I have done nothing else so far.

The only change to the system within the last few days has been an update to a beta edition of Scanner (2.5.2.3129).

While I am realistic and drives do fail - this is most odd. The only other possible contributing factor is that Windows Backup was set to run at 3:00am on alternating days, and I do see a series of "disk" type messages in Event Viewer around these times for the last 2 weeks or so - but this is the first instance of an actual problem.

What should be my next move so as not to make things worse - AND to avoid data loss? Should I run the StableBit Troubleshooter? Power down the box?

Appreciate an update ASAP.

Cheers,

Sonic.

 


10 answers to this question


  • 0

I would consider:

1. Identify which HDD is reported missing
2. Shut down server
And now either or:
3. Replace the cable / connect to a different port / attach the drive in another machine

If you can find it in another machine then it should not be the HDD. It may be the port, the cable or simply that a good cable became a bit loose.

I would not power up server until that HDD is attached again (although it should not be a problem I think).
 


  • 0

Thanks! Yes - I shut down the server before leaving for work today. 

As soon as I get back home - I will pull the drive and bring it over to my personal workstation and see what I can see. If I can spin it up on another machine - I will move to data copy immediately.

I have never seen a drive simply die without any warning whatsoever - especially a NAS specific drive that should be a bit more resilient.

The most ironic part about this is that I did not have Scanner set up correctly on this server for months and months and now - after just two days of having it correctly auditing the environment - I suddenly have a drive fail? Karma or what?

Now - assuming I can spin this drive up - and I can get at the data - what is the best way to get the data over to a replacement (which I am going to move forward with anyway) and rebuild the pool?

Sonic


  • 0

I assume you do not have duplication on the Pool?

In that case, *if* you can read from it on another machine, I would really try to reconnect it in the server using a different cable and/or port. If you're particular about it, you'll want to know whether it is the cable, the port, or just that the current cable became a bit loose. I would want to know.

If you get it OK on the server again, I would force Scanner to rescan the drive (mark the readable sectors as unchecked, or something of that sort).

On replacing, and assuming you have a port available and the HDD in question can be read (and is connected so that the Pool is in OK condition), what I would do is:
1. Add the new HDD to the Server
2. Add the new HDD to the Pool
3. Select the suspect HDD and choose Remove.
DP will do everything for you. There are other, perhaps faster, ways but I rely on DP and rarely mess up that way ;)

 

 


  • 0

I do have duplication in the pool - but only on one specific folder in the pool (Videos). Will this make a difference?

But I agree with you on the process. If I can get this drive to behave (new cable, different port etc) - I would most certainly try to put it back in the server - add the new drive and then try the removal.

But let's say the worst happens - like for example - I can only spin this disk up long enough on another machine just to get the data off of it? 

Is there another more manual guideline out there to move the data back into place AFTER a new disk has been added to the pool? Without actually having the old disk in the server? 

My fear is that if this suspect disk is actually getting close to a major failure - I really do not want to chance trying to spin it up in the server - and possibly compromising it completely.

Thoughts?

Sonic.

 

 


  • 0

Well, if everything was duplicated, you could have removed the HDD in DP, even if it was missing already, add another HDD and have DP rearrange and reduplicate everything.

If you can read from it in another machine then making a quick copy of the files is never a bad idea but... are they not also on your Server Backup? If you want to feel secure, sure, make a copy.

Now let's assume you make that copy and the HDD then fails. You have all your files but I am not sure how to efficiently put it all back into place. My guess would be to:
1. Add new HDD to Server and Pool
2. Stop the DP service (not sure if necessary)
3. Copy all the files from the failed HDD, so now present somewhere else, to the hidden poolpart folder on the new HDD.
4. Start the DP service (only if you did 2)
5. Let DP do its magic. You may want to press re-measure.

I would hope that DP manages everything for you but WARNING I am not sure.
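
For what it's worth, step 3 could be sketched roughly like this in Python. This is only a sketch under assumptions: the function name and both folder arguments are made up for illustration, it is not a StableBit tool, and I have not verified how DP reacts to files placed this way. It copies the rescued files into the target folder with the same relative layout and never overwrites a file that is already there.

```python
import shutil
from pathlib import Path


def copy_into_poolpart(rescued_root, poolpart_root):
    """Copy every file under rescued_root into poolpart_root, keeping the
    relative folder layout. Files that already exist at the destination are
    left untouched and reported separately.

    Both arguments are hypothetical paths: rescued_root is wherever the copy
    of the old disk's files lives, poolpart_root is the hidden poolpart
    folder on the new disk.
    """
    rescued_root = Path(rescued_root)
    poolpart_root = Path(poolpart_root)
    copied, skipped = [], []
    for src in sorted(rescued_root.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand below
        rel = src.relative_to(rescued_root)
        dst = poolpart_root / rel
        if dst.exists():
            skipped.append(rel.as_posix())  # never overwrite existing files
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also tries to preserve timestamps
        copied.append(rel.as_posix())
    return copied, skipped
```

The two returned lists give you something to check afterwards: anything in the skipped list already existed on the new disk and should be compared by hand before you trust either copy.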

Christopher would know...


  • 0

Update

Yesterday prior to heading out for the day - I shut the server down to ensure the drive in question would not be subject to any further data loss etc. I spent some of the day researching Scanner, Drivepool and some of the other event log messages I was seeing on the server early yesterday morning when this disk went missing.

When I returned home from work - I read that my first move here should be to get this drive out of the Pool - as DP invoked its "read only" mode on the entire pool due to the missing disk. So I fired up the server expecting to see the missing disk in DP. Much to my surprise - the server came up normally. All disks were online and acting normal. No messages in Event log. No messages in DP. No messages from CrystalDiskMark and no message from Intel Rapid Storage manager. Basically no messages from anything about anything.

I immediately moved into analysis mode and started Scanner to have a good look at this disk. It scanned the entire disk and found nothing. All SMART readings are normal. As far as I can see - there is nothing wrong with this disk given all the angles I probed it from.

A more interesting angle is what I found in other research regarding Windows Server Backup (on 2012R2). I found a few articles and forum posts going into detail about VSS, hot swap, filter manager and other oddball scenarios - where basically the Intel Rapid Storage Manager somehow thought that Disk 2 on this server was a hot swap disk, and Windows (if just for a short minute or two at 3:00 am yesterday, the time server backup runs) decided this disk was pulled or disconnected and marked it as "missing".

Even a reboot did not bring it back online. But after a complete shutdown? Success.

I have had similar situations when using hard disks in hot swap trays on my workstation and the odd time Windows cannot "eject" a disk properly during hot swap and suddenly decides to make it invisible in Explorer etc - even though the disk is clearly in the tray and running. Only a complete shutdown resets the disk subsystem, and the disk does appear normally in Explorer after a restart.

While I remain a tad skeptical on this - and have since stopped the clunky Windows Server Backup (I was only backing up the C: drive anyway) I cannot find anything in the slightest that makes me think this disk is suspect. I have backed up all data on it to be safe - but as of right now - it seems 100% solid.

Good news is that Stablebit stuff was rock solid and I am glad I had it on board to give me peace of mind.

Cheers!

Sonic.

 

 


  • 0

Well I am happy all seems fine now.

But where did you read you should get this drive out of the Pool as a first move? I would, as I said, only consider that if the Pool was fully duplicated. I am doubtful about Server Backup being the cause and I would turn server backup off when hell freezes over.

My guess is a bad cable or a cable that is loose. But who knows...


  • 0

Getting the drive out of the pool was the only way to get DP to reorganize and to remove the "read only" state it placed on the pool as soon as my drive went "missing".

My assumption all day yesterday was that the drive was suspect and I was going to remove it anyway, copy the files somewhere safe and either replace the drive or leave the pool usable but smaller. If you read all about Drivepool here:

https://stablebit.com/Support/DrivePool/2.X/Manual?Section=Removing a Drive from the Pool

It apparently does a bunch of nice things for the user - if the drive is healthy:

  • When a drive is removed from the pool, StableBit DrivePool will move all of the unprotected pooled files stored on it onto a different drive that's part of the pool.
  • StableBit DrivePool will also regenerate every protected file part that is on the disk being removed (unless Duplicate files later was selected).
  • Then, the virtual pool drive shrinks in capacity by the size of the drive that was removed.
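
To put rough numbers on those three points, here is a small Python sketch. Everything in it is an assumption for illustration: the function name, the drive names and the usage figures are made up, and it ignores duplication and whatever balancing rules DP applies. It only checks whether the data on the drive being removed would fit in the free space of the remaining drives, and what the pool size would be afterwards.

```python
def removal_plan(drives, remove_name):
    """drives maps a drive name to (capacity_tb, used_tb).

    Returns the pool capacity after removal, the amount of data that has
    to move, and whether that data fits on the remaining drives.
    """
    _, used_removed = drives[remove_name]
    remaining = {n: v for n, v in drives.items() if n != remove_name}
    free_elsewhere = sum(cap - used for cap, used in remaining.values())
    return {
        "pool_capacity_after_tb": sum(cap for cap, _ in remaining.values()),
        "data_to_move_tb": used_removed,
        "fits": used_removed <= free_elsewhere,
    }


# Hypothetical usage figures for a pool of four 4TB drives.
example_pool = {
    "Disk1": (4, 2.0),
    "Disk2": (4, 2.5),
    "Disk3": (4, 3.0),
    "Disk4": (4, 1.5),
}
plan = removal_plan(example_pool, "Disk2")
print(plan)
```

With those made-up figures the pool would drop from 16TB to 12TB and the 2.5TB on the removed drive would still fit on the other three.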

But if a drive is unhealthy (or supposedly missing like mine was) - DP does a lock down on the pool. So the only way to get the pool usable (Read/write) again was to remove the drive.

Regarding Windows Server Backup - there is so much bad press on this - it's not worth my time. As soon as I turned this thing on a few weeks back - the event log started filling with bizarre errors. I do not trust anything that is that difficult to get working. A clue prior to yesterday's problem was the sudden appearance of Event ID 157 ("Disk 1 has been surprise removed.") one minute before Scanner notified about a missing disk.

If I cross-reference that message ID with messages like these from various forums - it starts to form an interesting picture of where the problem most likely lies...

"On a recently deployed 2012 R2 system it looks like these warnings were generated during (or immediately after) a "Windows Server Backup".

"Ever since my upgrade to Server 2012 R2 I too am logging this error.  It is always in the 04:30AM time frame daily since the upgrade"

"My server throws these messages every day while finishing its backup job (using Windows Server Backup)"

Another thread is here:

https://community.spiceworks.com/topic/463141-windows-server-backup-causing-eventid-157-disk-n-has-been-surprised-removed
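
If you want to check the same pattern on your own box, a rough Python sketch of the cross-reference is below. It assumes you already have the Event ID 157 timestamps as datetime values (however you export them from Event Viewer); the function name, the 3:00am start and the sample timestamps are placeholders for illustration, not values taken from my logs.

```python
from datetime import datetime, time, timedelta


def events_near_backup(event_times, backup_start, window):
    """Return the events that fall between the backup start time on the
    same calendar day and backup_start + window."""
    hits = []
    for ts in event_times:
        start = datetime.combine(ts.date(), backup_start)
        if start <= ts <= start + window:
            hits.append(ts)
    return hits


# Made-up sample timestamps, for illustration only.
sample_events = [
    datetime(2017, 12, 4, 3, 1),
    datetime(2017, 12, 6, 3, 4),
    datetime(2017, 12, 6, 14, 30),
]
near = events_near_backup(sample_events, time(3, 0), timedelta(minutes=30))
print(f"{len(near)} of {len(sample_events)} events fall in the backup window")
```

If most of the 157 events land inside the window, that points at the backup job rather than at the hardware, although it does not rule out a cable problem on its own.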

Could it be a bad cable - remotely possible - but this server is racked and cannot be opened - so it's not loose.

I will continue to monitor and act if necessary.

Cheers!

Sonic.


  • 0

It could be a bad cable..... I've burned out just about every type of PC cable, including SATA cables. Also, it could have been loose to start with, and worked its way looser over time.

On 12/7/2017 at 9:40 AM, Sonicmojo said:

But if a drive is unhealthy (or supposedly missing like mine was) - DP does a lock down on the pool. So the only way to get the pool usable (Read/write) again was to remove the drive.

Yup, this is intentional. It's meant to prevent sync issues, if the drive returns (such as in the case of USB drives, for instance). 

You can remove the drive and things "clean up".  It should run a duplication pass, and start reduplicating data if needed. 
Or, if you hook the drive back up, it will re-add the old drive (since it wasn't marked as "removed"). 

On 12/7/2017 at 9:40 AM, Sonicmojo said:

A clue prior to yesterday's problem was the sudden appearance of Event ID 157 ("Disk 1 has been surprise removed.") one minute before Scanner notified about a missing disk.

Yeah, this happens any time that a disk is not "safely removed". 

On 12/7/2017 at 9:40 AM, Sonicmojo said:

"On a recently deployed 2012 R2 system it looks like these warnings were generated during (or immediately after) a "Windows Server Backup".

Uhhhhgh, yeah.  Windows Backup works (sort of) by mounting a hidden VHDx file, cloning the disk to this VHDx, and then unmounting it. Because it's not "safely removed", it generates this error. 

This is a simplification of the process, but should give you an idea of how it works. 

On 12/7/2017 at 9:40 AM, Sonicmojo said:

Could it be a bad cable - remotely possible - but this server is racked and cannot be opened - so it's not loose.

Shamelessly, this is why I love my server.  36 hot swappable bays, powered by two SAS connections. :)
