
Reindexing Google Internal Server Errors


srcrist

Question

So my service crashed last night. I opened a ticket and sent you the logs to take a look at, so we can set that aside. But while it was reindexing, it got one of those Internal Server Error responses from Google Drive. Just one. Then it started reindexing the entire drive over again; it had made it to chunk 4,300,000 or so. Does it really have to do that? This wouldn't have been a big deal when this drive was small, but this process takes about 8 hours now every time it has to reindex the drive, and the error hit at around the halfway mark. Four hours of lost time is frustrating. Does it HAVE to start over? Can it not just retry at the point where it got the error? Am I missing something?

 

Just wanted to see what the thinking was on this.


Recommended Posts


Waiting, generally. 

 

Unfortunately, "Internal Server Error" is exactly what it sounds like. It's an HTTP 500 error, which means that the issue is occurring entirely on the server side (Google Drive, in this case).

 

So, the only thing our software can do is "wait and retry", which really is the only thing you can do, as well. Unfortunately. 
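To illustrate what "wait and retry" means in practice, here is a minimal sketch of retrying a transient HTTP 500 with exponential backoff. This is purely illustrative Python using the requests library, not CloudDrive's actual code:

import random
import time

import requests

def get_with_retry(url, max_attempts=6):
    """Retry transient server-side failures (HTTP 5xx) with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            response = requests.get(url, timeout=30)
            if response.status_code < 500:
                return response  # success, or a client error that retrying won't fix
        except requests.ConnectionError:
            pass  # dropped connections are treated like transient failures
        # Back off 1s, 2s, 4s, ... plus jitter before the next attempt.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"{url} still failing after {max_attempts} attempts")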


 

I believe I did respond to this already.

 

 

Right. We talked in the support ticket.

 

 

 

As for re-indexing: the only time this should happen is if the "chunk ID" database is lost. If something happened to that file, then it would trigger this to occur.

 

In my case it was triggered when the CloudDrive service crashed. But that's fine. I understand that it has to reindex. The very big problem comes in here:

 

 

So, the only thing our software can do is "wait and retry", which really is the only thing you can do, as well. Unfortunately. 

 

 

 

The problem is that it does not retry during the indexing process. It starts over. That means you have to get through the entire indexing process without a single error, or it simply restarts from the beginning. I imagine that this does not *have* to be handled this way, but maybe I'm wrong. I think you suggested, in the support ticket, that this was something for Alex to maybe take a look at.
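To be concrete about the alternative I'm imagining: the indexer could retry the page of chunk listings that failed and then carry on, instead of throwing away everything it has already enumerated. A rough sketch of that idea in Python (purely illustrative; list_page is a made-up stand-in for one listing request to the provider, and this is obviously not how CloudDrive is actually written):

import time

class TransientServerError(Exception):
    """Stand-in for an HTTP 500 response from the provider."""

def index_chunks(list_page, max_retries=5, wait_seconds=30):
    """Enumerate every chunk, retrying a failed page in place instead of restarting."""
    chunk_ids, token = [], None
    while True:
        for attempt in range(max_retries):
            try:
                page, token = list_page(token)  # one listing request to the provider
                break  # this page succeeded; move on to the next one
            except TransientServerError:
                time.sleep(wait_seconds)  # wait out the hiccup, then re-request the SAME page
        else:
            raise RuntimeError("provider kept failing; giving up instead of silently restarting")
        chunk_ids.extend(page)
        if token is None:  # no more pages to list
            return chunk_ids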

 

With very large drives, or drives that simply have a lot of chunks, this is a very real and very frustrating issue, though. When your reindexing takes 8 or 9 hours, you can neither expect to go that entire time without even a minor server error, nor afford to start the process over when it's halfway done. It took 5 DAYS (no exaggeration at all) to get my largest drive remounted when this happened. Meanwhile, my other, smaller drives mounted just fine because they were able to completely reindex within the time between server errors. Once the drives are mounted, these internal server errors do not cause downtime. CloudDrive simply tries the request again, Google complies (the errors are always temporary), and life moves on.

 

But this problem during the reindexing process has to be fixed. Every additional hour the process has to run without any sort of server error whatsoever makes it exponentially less likely to ever complete. It shouldn't take 5 days of crossing fingers just to get the drive to remount as it restarts over and over again. There needs to be more error tolerance than that.
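To put a rough number on "exponentially less likely" (the 20% figure here is just an assumption for illustration, not a measured error rate):

# If any given hour has a 20% chance of at least one internal server error,
# the odds of an N-hour reindex finishing with zero errors fall off fast:
p_error_per_hour = 0.20
for hours in (1, 4, 8, 9):
    p_clean_run = (1 - p_error_per_hour) ** hours
    print(f"{hours} hours error-free: {p_clean_run:.0%}")
# Prints roughly: 1 -> 80%, 4 -> 41%, 8 -> 17%, 9 -> 13%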

 

 

 

@srcrist - this keeps happening to me today, did you manage to resolve it?

 

 

As Christopher said, I didn't (and couldn't) do anything to resolve it. I did, however, eventually get the drive to remount. It just took a very long and very frustrating amount of time. I just had to wait until I didn't get any errors during the reindexing process.

 

 

 

In any case, this seems like a really critical problem to me right now, and I'm dreading the next time my drive gets uncleanly dismounted. Who knows how long it will take to get it back. There just isn't anything that can be done about the occasional bad server response from Google. So we've got to be able to work through those, rather than completely aborting the process and starting over again.


Just to add to this, from the other thread: I started manually stopping the service instead of dismounting the drive prior to a PC reboot. Last night, the service shut down cleanly after a few minutes without any intervention. Unfortunately, upon restart it began the chunk ID rescan/rebuild, which took several hours.

 

I've also recently received "Access Denied" when attempting to detach my drive on random occasions. I've been able to get around this by taking the drive offline in Windows Disk Management, then detaching. Interestingly, one time when remounting after doing this, Windows (not CloudDrive) mounted the drive as read-only. I was able to use the diskpart utility to make it writable again.


Christopher, I don't know if any of the recent beta changes were part of the more efficient code, but I am stuck (again) in a mounting loop with beta 894. The server went down over 24 hours ago (OVH had some sort of issue) and it's still mounting over and over again due to internal server errors. I'd really rather this not take a week every time it happens. Waiting this long to get access to my data again is honestly rendering CloudDrive an unusable storage option.


My ticket on this issue suggests that they still think the software attempts to retry after failures when indexing. It doesn't. This doesn't bode well for actually getting a fix.

 

 

I agree with you that there seems to be some confusion on the part of the StableBit team about which problem we are actually addressing here. I tried to clarify in the other thread with some log snippets. Hopefully that will help.


I think that I'm going to drop to < 64TB drives pooled with DrivePool to try to mitigate this issue, but I still need to be able to get this drive mounted to copy the data. 

 

Okay. Alex is busy working on code at the moment, trying to get StableBit DrivePool ready to ship a new version, so it may be a bit before he can look into this. But the issue is flagged as important, so it has higher priority and he should get to it sooner rather than later.

 

 

And please do let me know if the smaller size helps.

 

My ticket on this issue suggests that they still think the software attempts to retry after failures when indexing. It doesn't. This doesn't bode well for actually getting a fix.

 

I'm not suggesting, I'm stating:

"CloudFsDisk_MaximumConsecutiveIoFailures".

http://wiki.covecube.com/StableBit_CloudDrive_Advanced_Settings

However, the limit applies to a 120-second window, so it may be erroring out too much for your system.

 

You can increase this value, but doing so can cause serious issues, up to and including the system locking up if too many errors are occurring.
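Conceptually, the behavior that setting governs looks something like the sketch below: failures only count against the limit while they keep piling up back to back inside the rolling window, and any success resets the count. This is a simplified illustration (the default value and the exact abort behavior here are assumptions, not our actual implementation; see the wiki page above for the real setting):

import time

class ConsecutiveFailureLimit:
    """Simplified sketch of a consecutive-failure limit over a rolling time window."""

    def __init__(self, max_failures=3, window_seconds=120):
        self.max_failures = max_failures      # assumed value, for illustration only
        self.window_seconds = window_seconds  # the 120-second window mentioned above
        self.failures = []                    # timestamps of recent consecutive failures

    def record_success(self):
        self.failures.clear()  # any success breaks the "consecutive" streak

    def record_failure(self):
        now = time.monotonic()
        # Only failures that still fall inside the window count toward the limit.
        self.failures = [t for t in self.failures if now - t <= self.window_seconds]
        self.failures.append(now)
        if len(self.failures) > self.max_failures:
            raise RuntimeError("Too many consecutive I/O failures; aborting the operation")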

 

 

However, the issue has been flagged for review, and we will look into it. There does seem to be an issue here, and you're not the only one seeing it, though it does seem to be pretty rare.

 

But we'd rather address the issue directly than rely on stopgap measures like the one above, because that just covers up the issue rather than fixing it.


Okay. Alex is busy working on code at the moment, trying to get StableBit DrivePool ready to ship a new version, so it may be a bit before he can look into this. But the issue is flagged as important, so it has higher priority and he should get to it sooner rather than later.

 

 

And please do let me know if the smaller size helps.

 

 

 

Yeah, no worries. I have the drive mounted again right now. Google has just been exceptionally stable the last few days and I've been able to get it to remount with maybe one or two restarts (so about 24 hours).

 

As far as the lower drive sizes go, now that I've realized chkdsk can't be used on anything larger, this is probably best for NTFS drives anyway, assuming I care about the data longer term. DrivePool is in the process of migrating the data now, but it's going to take weeks.


That's great, Christopher. Honestly, though, the efficiency changes have also done wonders to make sure that it doesn't have to enumerate every time there is an unclean shutdown. So I actually haven't had to work around this issue since those changes several weeks ago. Good news, in any case.


And I know we've made some changes to help make sure that unsafe shutdowns don't happen, unless... they actually happen (e.g., power loss, hard reset, etc.).

 

And I believe that Alex upped the number of files we enumerate at one time (at least on Google Drive), so it should use fewer API calls when indexing. That should help reduce the likelihood of issues, as well.
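For context, listing files from Google Drive is a paged API call, and asking for more entries per page means far fewer round trips to enumerate the same number of chunks. As a rough illustration of the idea (this uses the public Drive v3 API via the Python client, not our code; files.list accepts a pageSize of up to 1,000):

from googleapiclient.discovery import build

def list_all_files(credentials, page_size=1000):
    """Enumerate every file, requesting as many entries per page as the API allows."""
    service = build("drive", "v3", credentials=credentials)
    files, token = [], None
    while True:
        response = service.files().list(
            pageSize=page_size,  # larger pages mean fewer API calls for the same listing
            fields="nextPageToken, files(id, name)",
            pageToken=token,
        ).execute()
        files.extend(response.get("files", []))
        token = response.get("nextPageToken")
        if token is None:
            return files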

 

 

But I'm glad to hear that all the work has definitely created tangible results! And positive ones, too. 


Christopher,

 

I am still having issues with CloudDrive re-indexing each of my 6 CloudDrives every time I restart my server, and this process still takes over a day to complete for all of the drives. Nothing is happening to the chunk database, as I can browse to Google Drive and view the file for each drive in the data-ChunkIdStorage folder. Does the file have to exist somewhere on the computer running CloudDrive for it to not trigger another scan at startup?

 

I am currently running on version .929 but the same thing has happened on every version I have tried since the chunk database was implemented. 


When mounting the drive, it should download the database from the provider. 

This is then stored in "C:\ProgramData\StableBit CloudDrive\Service\Db\ChunkIds", and used there.

 

And if it regenerates the database, it should store it there, as well. 

 

 

Worst case here: stop the service, delete the contents of that folder, and restart the service. This will cause it to reindex the drives again, but hopefully this will be the last time.
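If it's easier, that worst-case reset can be scripted. A rough sketch (run as administrator; the service name below is an assumption on my part, so check services.msc for the exact name on your machine):

import shutil
import subprocess
from pathlib import Path

CHUNK_DB = Path(r"C:\ProgramData\StableBit CloudDrive\Service\Db\ChunkIds")
SERVICE = "CloudDriveService"  # assumption; verify the exact service name in services.msc

subprocess.run(["net", "stop", SERVICE], check=True)   # stop the CloudDrive service
for item in CHUNK_DB.iterdir():                         # clear the local chunk ID database
    if item.is_dir():
        shutil.rmtree(item)
    else:
        item.unlink()
subprocess.run(["net", "start", SERVICE], check=True)  # restart; the drives will reindex once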

 

If not, then run the Troubleshooter:

http://wiki.covecube.com/StableBit_Troubleshooter

 

It may also be a good idea to wait until it's done indexing, enable file system logging, reboot the system, and run the Troubleshooter.

And then wait until it's done indexing again, enable boot logging, reboot, and run the Troubleshooter once more.


@srcrist, any updates on your experience since pooling 64TB drives instead of using one large one? I'm currently using a massive ReFS volume and I'm considering switching over to what you did. Worth the time?

 

Not worth the time. In fact, I was unable to complete that migration. The new upload limitations Google introduced make migrations of mass data impractical. I simply gave up. 


Not worth the time. In fact, I was unable to complete that migration. The new upload limitations Google introduced make migrations of mass data impractical. I simply gave up. 

Other than this reindexing issue (assuming it'll get fixed eventually) and Google Drive's upload caps (I've got plenty of time on my hands), are there any serious downsides to what I'm doing then? Basically, if there's no good reason to switch TO that method, are there any good reasons to get OFF of this method?


Since upgrading to beta 929, I am also seeing this issue again. It enumerates every time the system reboots. This issue had been fixed in the last beta that I was using (902). It looks like something might have reintroduced this problem. 

 

 

If you haven't already, could you open a ticket about the issue at https://stablebit.com/Contact ?

 

And if you have, PM me the ticket number, so I can bump it.

 

 

Not worth the time. In fact, I was unable to complete that migration. The new upload limitations Google introduced make migrations of mass data impractical. I simply gave up. 

 

Yeah, the new limits have been frustrating, to say the least. 
