Jump to content

Alex

Administrators
  • Posts

    252
  • Joined

  • Last visited

  • Days Won

    48

Reputation Activity

  1. Thanks
    Alex got a reaction from TPham in Cloud Providers   
    As I was writing the various providers for StableBit CloudDrive I got a sense for how well each one performs / scales and the various quirks of some cloud services. I'm going to use this thread to describe my observations.
     
    As Stablebit CloudDrive grows, I'm sure that my understanding of the various providers will improve as well. Also remember, this is from a developer's point of view.
     
    Google Cloud Storage
    http://cloud.google.com/storage
     
    Pros:
    Reasonably fast. Simple API. Reliable. Scales very well. Cons:
    Not the fastest provider. Especially slow when deleting existing chunks (e.g. when destroying a drive). Difficult to use and bloated SDK (a development issue really).  
    This was the first cloud service that I wrote a StableBit CloudDrive provider for and initially I started writing the code against their SDK which I later realized was a mistake. I replaced the SDK entirely with my own API, so that improved the reliability of this provider and solved a lot of the issues that the SDK was causing.
     
    Another noteworthy thing about this provider is that it's not as fast as some of the other providers (Amazon S3 / Microsoft Azure Storage).
     
    Amazon S3
    http://aws.amazon.com/s3/
     
    Pros:
    Very fast. Reliable. Scales very well. Beautiful, compact and functional SDK. Cons:
    Configuration is a bit confusing. Here the SDK is uniquely good, it's a single DLL and super simple to use. Most importantly it's reliable. It handles multi-threading correctly and its error handling logic is straightforward. It is one of the few SDKs that StableBit CloudDrive uses out of the box. All of the other providers (except Microsoft Azure Storage) utilize custom written SDKs.
     
    This is a great place to store your mission critical data, I backup all of my code to this provider.
     
    Microsoft Azure Storage
    http://azure.microsoft.com/en-us/services/storage/
     
    Pros:
    Very fast. Reliable. Scales very well. Easy to configure. Cons:
    No reasonably priced support option that makes sense. This is a great cloud service. It's definitely on par with Amazon S3 in terms of speed and seems to be very reliable from my testing.
     
    Having used Microsoft Azure services for almost all of our web sites and the database back-end, I can tell you that there is one major issue with Microsoft Azure. There is no one that you can contact when something goes wrong (and things seem to go wrong quite often), without paying a huge sum of money.
     
    For example, take a look at their support prices: http://azure.microsoft.com/en-us/support/plans/
     
    If you want someone from Microsoft to take a look at an issue that you're having within 2 hours that will cost you $300 / month. Other than that, it's a great service to use.
     
    OneDrive for Business
    https://onedrive.live.com/about/en-US/business/
     
    Pros:
    Reasonable throttling limits in place. Cons:
    Slow. API is lacking leading to reliability issues. Does not scale well, so you are limited in the amount of data that you can store before everything grinds to a halt. Especially slow when deleting existing chunks (e.g. when destroying a drive). This service is actually a rebranded version of Microsoft SharePoint hosted in the cloud for you. It has absolutely nothing to do with the "regular" OneDrive other than the naming similarity.
     
    This service does not scale well at all, and this is really a huge issue. The more data that you upload to this service, the slower it gets. After uploading about 200 GB, it really starts to slow down. It seems to be sensitive to the number of files that you have, and for that reason StableBit CloudDrive sets the chunk size to 1MB by default, in order to minimize the number of files that it creates.
     
    By default, Microsoft SharePoint expects each folder to contain no more than 5000 files, or else certain features simply stop working (including deleting said files). This is by design and here's a page that explain in detail why this limit is there and how to work around it: https://support.office.com/en-us/article/Manage-lists-and-libraries-with-many-items-11ecc804-2284-4978-8273-4842471fafb7
     
    If you're going to use this provider to store large amounts of data, then I recommend following the instructions on the page linked above. Although, for me, it didn't really help much at all.
     
    I've worked hard to try and resolve this by utilizing a nested directory structure in order to limit the number of files in each directory, but nothing seems to make any difference. If there are any SharePoint experts out there that can figure out what we can do to speed this provider up, please let me know.
     
    OneDrive
    https://onedrive.live.com/
    Experimental
     
    Pros:
    Clean API. Cons:
    Heavily and unreasonably throttled. From afar, OneDrive looks like the perfect storage provider. It's very fast, reliable, easy to use and has an inexpensive unlimited storage option. But after you upload / download some data you start hitting the throttling limits. The throttling limits are excessive and unreasonable, so much so, that using this provider with StableBit CloudDrive is dangerous. For this reason, the OneDrive provider is currently disabled in StableBit CloudDrive by default.
     
    What makes the throttling limits unreasonable is the amount of time that OneDrive expects you to wait before making another request. In my experience that can be as high as 20 minutes to 1 hour. Can you imagine when trying to open a document in Microsoft Windows hitting an error that reads "I see that you've opened too many documents today, please come back in 1 hour". Not only is this unreasonable, it's also technically infeasible to implement this kind of a delay on a real-time disk.
     
    Box
    https://www.box.com/
     
    At this point I haven't used this provider for an extended period of time to render an opinion on how it behaves with large amounts of data.
     
    One thing that I can say is that the API is a bit quirky in how it's designed necessitating some extra HTTP traffic that other providers don't require.
     
    Dropbox
    http://www.dropbox.com/
     
    Again, I haven't used this provider much so I can't speak to how well it scales or how well it performs.
     
    The API here is very robust and very easy to use. One very cool feature that they have is an "App Folder". When you authorize StableBit CloudDrive to use your Dropbox account, Dropbox creates an isolated container for StableBit CloudDrive and all of the data is stored there. This is nice because you don't see the StableBit CloudDrive data in your regular Dropbox folder, and Stablebit CloudDrive has no way to access any other data that's in your Dropbox or any data in some other app folder.
     
    Amazon Cloud Drive
    https://www.amazon.com/clouddrive/home
     
    Pros:
    Fast. Scales well. Unlimited storage option. Reasonable throttling limits. Cons:
    Data integrity issues. I know how important it is for StableBit CloudDrive to support this service and so I've spent many hours and days trying to make a reliable provider that works. This single provider delayed the initial public BETA of StableBit CloudDrive by at least 2 weeks.
     
    The initial issue that I had with Amazon Cloud Drive is that it returns various errors as a response to I/O requests. These errors range from 500 Internal Server Error to 400 Bad Request. Reissuing the same request seems to work, so there doesn't appear to be a problem with the actual request, but rather with the server.
     
    I later discovered a more serious issue with this service, apparently after uploading a file, sometimes (very rarely) that file cannot be downloaded. Which means that the file's data gets permanently lost (as far as I can tell). This is very rare and hard to reproduce. My test case scenario needs to run for one whole day before it can reproduce the problem. I finally solved this issue by forcing Upload Verification to be enabled in StableBit CloudDrive. When this issue occurs, upload verification will detect this scenario, delete the corrupt file and retry the upload. That apparently fixed this particular issue.
     
    The next thing that I discovered with this service (after I released the public BETA) is that some 400 Bad Request errors spawn at a later time, long after the initial upload / verification step is complete. After extensively debugging, I was able to confirm this with the Amazon Cloud Drive web interface as well, so this is not a provider code issue, rather the problem actually occurs on the server. If a file gets into this state, a 400 Bad Request error is issued, and if you examine that request, the error message in the response says 404 Not Found. Apparently, the file metadata is there, but the file's contents is gone.
     
    The short story is that this service has data integrity issues that are not limited to StableBit CloudDrive in particular, and I'm trying to identify exactly what they are, how they are triggered and apply possible workarounds.
     
    I've already applied another possible workaround in the latest internal BETA (1.0.0.284), but I'm still testing whether the fix is effective. I am considering disabling this provider in future builds, and moving it into the experimental category.
     
    Local Disk / File Share
     
    These providers don't use the cloud, so there's really nothing to say here.
  2. Haha
    Alex got a reaction from johnnymc in My Rackmount Server   
    @weezywee
     
    Very cool.
     
    That APC box looks familiar (I think).
     
    I have a APC UPS that I've hacked to use 4x deep cycle marine batteries to provide 3 to 4 hours backup for my entire office (instead of the built in Li-Ion which would only last 15 min.).
  3. Like
    Alex got a reaction from Shane in check-pool-fileparts   
    If you're not familiar with dpcmd.exe, it's the command line interface to StableBit DrivePool's low level file system and was originally designed for troubleshooting the pool. It's a standalone EXE that's included with every installation of StableBit DrivePool 2.X and is available from the command line.
     
    If you have StableBit DrivePool 2.X installed, go ahead and open up the Command Prompt with administrative access (hold Ctrl + Shift from the Start menu), and type in dpcmd to get some usage information.
     
    Previously, I didn't recommend that people mess with this command because it wasn't really meant for public consumption. But the latest internal build of StableBit DrivePool, 2.2.0.659, includes a completely rewritten dpcmd.exe which now has some more useful functions for more advanced users of StableBit DrivePool, and I'd like to talk about some of these here.
     
    Let's start with the new check-pool-fileparts command.
     
    This command can be used to:
    Check the duplication consistency of every file on the pool and show you any inconsistencies. Report any inconsistencies found to StableBit DrivePool for corrective actions. Generate detailed audit logs, including the exact locations where each file part is stored of each file on the pool. Now let's see how this all works. The new dpcmd.exe includes detailed usage notes and examples for some of the more complicated commands like this one.
     
    To get help on this command type: dpcmd check-pool-fileparts
     
    Here's what you will get:
     
    dpcmd - StableBit DrivePool command line interface Version 2.2.0.659 The command 'check-pool-fileparts' requires at least 1 parameters. Usage: dpcmd check-pool-fileparts [parameter1 [parameter2 ...]] Command: check-pool-fileparts - Checks the file parts stored on the pool for consistency. Parameters: poolPath - A path to a directory or a file on the pool. detailLevel - Detail level to output (0 to 4). (optional) isRecursive - Is this a recursive listing? (TRUE / false) (optional) Detail levels: 0 - Summary 1 - Also show directory duplication status 2 - Also show inconsistent file duplication details, if any (default) 3 - Also show all file duplication details 4 - Also show all file part details Examples: - Perform a duplication check over the entire pool, show any inconsistencies, and inform StableBit DrivePool >dpcmd check-pool-fileparts P:\ - Perform a full duplication check and output all file details to a log file >dpcmd check-pool-fileparts P:\ 3 > Check-Pool-FileParts.log - Perform a full duplication check and just show a summary >dpcmd check-pool-fileparts P:\ 0 - Perform a check on a specific directory and its sub-directories >dpcmd check-pool-fileparts P:\MyFolder - Perform a check on a specific directory and NOT its sub-directories >dpcmd check-pool-fileparts "P:\MyFolder\Specific Folder To Check" 2 false - Perform a check on one specific file >dpcmd check-pool-fileparts "P:\MyFolder\File To Check.exe" The above help text includes some concrete examples on how to use this commands for various scenarios. To perform a basic check of an entire pool and get a summary back, you would simply type:
    dpcmd check-pool-fileparts P:\
     
    This will scan your entire pool and make sure that the correct number of file parts exist for each file. At the end of the scan you will get a summary:
    Scanning...   ! Error: Can't get duplication information for '\\?\p:\System Volume Information\storageconfiguration.xml'. Access is denied   Summary:   Directories: 3,758   Files: 47,507 3.71 TB (4,077,933,565,417   File parts: 48,240 3.83 TB (4,214,331,221,046     * Inconsistent directories: 0   * Inconsistent files: 0   * Missing file parts: 0 0 B (0     ! Error reading directories: 0   ! Error reading files: 1 Any inconsistent files will be reported here, and any scan errors will be as well. For example, in this case I can't scan the System Volume Information folder because as an Administrator, I don't have the proper access to do that (LOCAL SYSTEM does).
     
    Another great use for this command is actually something that has been requested often, and that is the ability to generate audit logs. People want to be absolutely sure that each file on their pool is properly duplicated, and they want to know exactly where it's stored. This is where the maximum detail level of this command comes in handy:
    dpcmd check-pool-fileparts P:\ 4
     
    This will show you how many copies are stored of each file on your pool, and where they're stored.
     
    The output looks something like this:
    Detail level: File Parts   Listing types:     + Directory   - File   -> File part   * Inconsistent duplication   ! Error   Listing format:     [{0}/{1} IM] {2}     {0} - The number of file parts that were found for this file / directory.     {1} - The expected duplication count for this file / directory.     I   - This directory is inheriting its duplication count from its parent.     M   - At least one sub-directory may have a different duplication count.     {2} - The name and size of this file / directory.   ... + [3x/2x] p:\Media -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media [Device 0] -> \Device\HarddiskVolume3\PoolPart.6a76681a-3600-4af1-b877-a31815b868c8\Media [Device 0] -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media [Device 2] - [2x/2x] p:\Media\commandN Episode 123.mov (80.3 MB - 84,178,119 -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 123.mov [Device 0] -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 123.mov [Device 2] - [2x/2x] p:\Media\commandN Episode 124.mov (80.3 MB - 84,178,119 -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 124.mov [Device 0] -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 124.mov [Device 2] ... The listing format and listing types are explained at the top, and then for each folder and file on the pool, a record like the above one is generated.
     
    Of course like any command output, it could always be piped into a log file like so:
    dpcmd check-pool-fileparts P:\ 4 > check-pool-fileparts.log
     
    I'm sure with a bit of scripting, people will be able to generate daily audit logs of their pool
     
    Now this is essentially the first version of this command, so if you have an idea on how to improve it, please let us know.
     
    Also, check out set-duplication-recursive. It lets you set the duplication count on multiple folders at once using a file pattern rule (or a regular expression). It's pretty cool.
     
    That's all for now.
  4. Like
    Alex got a reaction from KingfisherUK in Introducing the StableBit Cloud   
    The StableBit Cloud is a brand new online service developed by Covecube Inc. that enhances our existing StableBit products with cloud-enabled features and centralized management capabilities. The StableBit Cloud also serves as a foundational technology platform for the future development of new Covecube products and services.
    To get started, visit: https://stablebit.cloud
    Read more about it in our blog post: https://blog.covecube.com/2021/01/stablebit-cloud/
  5. Like
    Alex got a reaction from JesterEE in Surface scan and SSD   
    Hi Saiyan, I'm the developer.
     
    The Scanner never writes to SSDs while performing a surface scan and therefore does not in any way impact the lifespan of the SSD.
     
    However, SSDs do benefit from full disk surface scans, just like spinning hard drive, in that the surface scan will bring the drive's attention to any latent sectors that may become unreadable in the future. The Scanner's disk surface scan will force your SSD to remap the damaged sectors before the data becomes unreadable.
     
    In short, there is no negative side effect to running the Scanner on SSDs, but there is a positive one.
     
    Please let me know if you need more information.
  6. Thanks
    Alex got a reaction from Spider99 in M.2 Drives - No Smart data - NVME and Sata   
    As per your issue, I've obtained a similar WD M.2 drive and did some testing with it. Starting with build 3193 StableBit Scanner should be able to get SMART data from your M.2 WD SATA drive. I've also added SMART interpretation rules to BitFlock for these drives as well.
    You can get the latest development BETAs here: http://dl.covecube.com/ScannerWindows/beta/download/
    As for Windows Server 2012 R2 and NVMe, currently, NVMe support in the StableBit Scanner requires Windows 10 or Windows Server 2016.
  7. Like
    Alex got a reaction from Jaga in M.2 Drives - No Smart data - NVME and Sata   
    As per your issue, I've obtained a similar WD M.2 drive and did some testing with it. Starting with build 3193 StableBit Scanner should be able to get SMART data from your M.2 WD SATA drive. I've also added SMART interpretation rules to BitFlock for these drives as well.
    You can get the latest development BETAs here: http://dl.covecube.com/ScannerWindows/beta/download/
    As for Windows Server 2012 R2 and NVMe, currently, NVMe support in the StableBit Scanner requires Windows 10 or Windows Server 2016.
  8. Thanks
    Alex got a reaction from Penryn in Ridiculous questions on CloudDrive   
    1c. If a cloud drive fails the checksum / HMAC verification, the error is returned to the OS in the kernel (and eventually the app caller). It's treated exactly the same as a physical drive failing a checksum error. StableBit DrivePool won't automatically pull the data from a duplicate, but you have a few options here.
    First, you will see error notifications in the StableBit CloudDrive UI, and at this point you can either:
    Ignore the the verification failure and pull in the corrupted data to continue using your drive. In StableBit CloudDrive, see Options -> Troubleshooting -> Chunks -> Ignore chunk verification. This may be a good idea if you don't care about that particular file. You can simply delete it and continue normally. Remove the affected drive from the pool and StableBit DrivePool will automatically reduplicated the data onto another known good drive. And thank you for your support, we really appreciate it.
    As far as Issue #27859, I've got a potential fix that looks good. It's currently being qualified for the Microsoft Server 2016 certification, Normally this takes about 12 hours, so a download should be ready by Monday (could even be as soon as tomorrow if all goes well).
  9. Like
    Alex got a reaction from Jaga in Forum Downtime   
    My apologies. We've been having issues with the forum's database for the past couple of days, and as a result the forum has been up and down.
    I believe that the problem is resolved and there should be no further downtime.
  10. Like
    Alex reacted to msq in Yay - B2 provider added :)   
    In the very latest beta 1.1.0.991 the B2 provided has been added
    Downloaded, installed, set up and using already. Thank you guys!
  11. Thanks
    Alex got a reaction from Jaga in check-pool-fileparts   
    If you're not familiar with dpcmd.exe, it's the command line interface to StableBit DrivePool's low level file system and was originally designed for troubleshooting the pool. It's a standalone EXE that's included with every installation of StableBit DrivePool 2.X and is available from the command line.
     
    If you have StableBit DrivePool 2.X installed, go ahead and open up the Command Prompt with administrative access (hold Ctrl + Shift from the Start menu), and type in dpcmd to get some usage information.
     
    Previously, I didn't recommend that people mess with this command because it wasn't really meant for public consumption. But the latest internal build of StableBit DrivePool, 2.2.0.659, includes a completely rewritten dpcmd.exe which now has some more useful functions for more advanced users of StableBit DrivePool, and I'd like to talk about some of these here.
     
    Let's start with the new check-pool-fileparts command.
     
    This command can be used to:
    Check the duplication consistency of every file on the pool and show you any inconsistencies. Report any inconsistencies found to StableBit DrivePool for corrective actions. Generate detailed audit logs, including the exact locations where each file part is stored of each file on the pool. Now let's see how this all works. The new dpcmd.exe includes detailed usage notes and examples for some of the more complicated commands like this one.
     
    To get help on this command type: dpcmd check-pool-fileparts
     
    Here's what you will get:
     
    dpcmd - StableBit DrivePool command line interface Version 2.2.0.659 The command 'check-pool-fileparts' requires at least 1 parameters. Usage: dpcmd check-pool-fileparts [parameter1 [parameter2 ...]] Command: check-pool-fileparts - Checks the file parts stored on the pool for consistency. Parameters: poolPath - A path to a directory or a file on the pool. detailLevel - Detail level to output (0 to 4). (optional) isRecursive - Is this a recursive listing? (TRUE / false) (optional) Detail levels: 0 - Summary 1 - Also show directory duplication status 2 - Also show inconsistent file duplication details, if any (default) 3 - Also show all file duplication details 4 - Also show all file part details Examples: - Perform a duplication check over the entire pool, show any inconsistencies, and inform StableBit DrivePool >dpcmd check-pool-fileparts P:\ - Perform a full duplication check and output all file details to a log file >dpcmd check-pool-fileparts P:\ 3 > Check-Pool-FileParts.log - Perform a full duplication check and just show a summary >dpcmd check-pool-fileparts P:\ 0 - Perform a check on a specific directory and its sub-directories >dpcmd check-pool-fileparts P:\MyFolder - Perform a check on a specific directory and NOT its sub-directories >dpcmd check-pool-fileparts "P:\MyFolder\Specific Folder To Check" 2 false - Perform a check on one specific file >dpcmd check-pool-fileparts "P:\MyFolder\File To Check.exe" The above help text includes some concrete examples on how to use this commands for various scenarios. To perform a basic check of an entire pool and get a summary back, you would simply type:
    dpcmd check-pool-fileparts P:\
     
    This will scan your entire pool and make sure that the correct number of file parts exist for each file. At the end of the scan you will get a summary:
    Scanning...   ! Error: Can't get duplication information for '\\?\p:\System Volume Information\storageconfiguration.xml'. Access is denied   Summary:   Directories: 3,758   Files: 47,507 3.71 TB (4,077,933,565,417   File parts: 48,240 3.83 TB (4,214,331,221,046     * Inconsistent directories: 0   * Inconsistent files: 0   * Missing file parts: 0 0 B (0     ! Error reading directories: 0   ! Error reading files: 1 Any inconsistent files will be reported here, and any scan errors will be as well. For example, in this case I can't scan the System Volume Information folder because as an Administrator, I don't have the proper access to do that (LOCAL SYSTEM does).
     
    Another great use for this command is actually something that has been requested often, and that is the ability to generate audit logs. People want to be absolutely sure that each file on their pool is properly duplicated, and they want to know exactly where it's stored. This is where the maximum detail level of this command comes in handy:
    dpcmd check-pool-fileparts P:\ 4
     
    This will show you how many copies are stored of each file on your pool, and where they're stored.
     
    The output looks something like this:
    Detail level: File Parts   Listing types:     + Directory   - File   -> File part   * Inconsistent duplication   ! Error   Listing format:     [{0}/{1} IM] {2}     {0} - The number of file parts that were found for this file / directory.     {1} - The expected duplication count for this file / directory.     I   - This directory is inheriting its duplication count from its parent.     M   - At least one sub-directory may have a different duplication count.     {2} - The name and size of this file / directory.   ... + [3x/2x] p:\Media -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media [Device 0] -> \Device\HarddiskVolume3\PoolPart.6a76681a-3600-4af1-b877-a31815b868c8\Media [Device 0] -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media [Device 2] - [2x/2x] p:\Media\commandN Episode 123.mov (80.3 MB - 84,178,119 -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 123.mov [Device 0] -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 123.mov [Device 2] - [2x/2x] p:\Media\commandN Episode 124.mov (80.3 MB - 84,178,119 -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 124.mov [Device 0] -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 124.mov [Device 2] ... The listing format and listing types are explained at the top, and then for each folder and file on the pool, a record like the above one is generated.
     
    Of course like any command output, it could always be piped into a log file like so:
    dpcmd check-pool-fileparts P:\ 4 > check-pool-fileparts.log
     
    I'm sure with a bit of scripting, people will be able to generate daily audit logs of their pool
     
    Now this is essentially the first version of this command, so if you have an idea on how to improve it, please let us know.
     
    Also, check out set-duplication-recursive. It lets you set the duplication count on multiple folders at once using a file pattern rule (or a regular expression). It's pretty cool.
     
    That's all for now.
  12. Like
    Alex got a reaction from Antoineki in Amazon Cloud Drive - Why is it not supported?   
    Some people have mentioned that we're not transparent enough with regards to what's going on with Amazon Cloud Drive support. That was definitely not the intention, and if that's how you guys feel, then I'm sorry about that and I want to change that. We really have nothing to gain by keeping any of this secret.
     
    Here, you will find a timeline of exactly what's happening with our ongoing communications with the Amazon Cloud Drive team, and I will keep this thread up to date as things develop.
     
    Timeline:
    On May 28th the first public BETA of StableBit CloudDrive was released without Amazon Cloud Drive production access enabled. At the time, I thought that the semi-automated whitelisting process that we went through was "Production Access". While this is similar to other providers, like Dropbox, it became apparent that for Amazon Cloud Drive, it's not. Upon closer reading of their documentation, it appears that the whitelisting process actually imposed "Developer Access" on us. To get upgraded to "Production Access" Amazon Cloud Drive requires direct email communications with the Amazon Cloud Drive team. We submitted our application for approval for production access originally on Aug. 15 over email: On Aug 24th I wrote in requesting a status update, because no one had replied to me, so I had no idea whether the email was read or not. On Aug. 27th I finally got an email back letting me know that our application was approved for production use. I was pleasantly surprised. On Sept 1st, after some testing, I wrote another email to Amazon letting them know that we are having issues with Amazon not respecting the Content-Type of uploads, and that we are also having issues with the new permission scopes that have been changes since we initially submitted our application for approval. No one answered this particular email until... On Sept 7th I received a panicked email from Amazon addressed to me (with CCs to other Amazon addresses) letting me know that Amazon is seeing unusual call patterns from one of our users.  On Sept 11th I replied explaining that we do not keep track of what our customers are doing, and that our software can scale very well, as long as the server permits it and the network bandwidth is sufficient. Our software does respect 429 throttling responses from the server and it does perform exponential backoff, as is standard practice in such cases. Nevertheless, I offered to limit the number of threads that we use, or to apply any other limits that Amazon deems necessary on the client side. I highlighted on this page the key question that really needs answered. Also, please note that obviously Amazon knows who this user is, since they have to be their customer in order to be logged into Amazon Cloud Drive. Also note that instead of banning or throttling that particular customer, Amazon has chosen to block the entire user base of our application.
    On Sept. 16th I received a response from Amazon. On Sept. 21st I haven't heard anything back yet, so I sent them this. I waited until Oct. 29th and no one answered. At that point I informed them that we're going ahead with the next public BETA regardless. Some time in the beginning of November an Amazon employee, on the Amazon forums, started claiming that we're not answering our emails. This was amidst their users asking why Amazon Cloud Drive is not supported with StableBit CloudDrive. So I did post a reply to that, listing a similar timeline. One Nov. 11th I sent another email to them. On Nov 13th Amazon finally replied. I redacted the limits, I don't know if they want those public.
    On Nov. 19th I sent them another email asking to clarify what the limits mean exactly. On Nov 25th Amazon replied with some questions and concerns regarding the number of API calls per second: On Dec. 2nd I replied with the answers and a new build that implemented large chunk support. This was a fairly complicated change to our code designed to minimize the number of calls per second for large file uploads.

    You can read my Nuts & Bolts post on large chunk support here: http://community.covecube.com/index.php?/topic/1622-large-chunks-and-the-io-manager/ On Dec. 10th 2015, Amazon replied: I have no idea what they mean regarding the encrypted data size. AES-CBC is a 1:1 encryption scheme. Some number of bytes go in and the same exact number of bytes come out, encrypted. We do have some minor overhead for the checksum / authentication signature at the end of every 1 MB unit of data, but that's at most 64 bytes per 1 MB when using HMAC-SHA512 (which is 0.006 % overhead). You can easily verify this by creating an encrypted cloud drive of some size, filling it up to capacity, and then checking how much data is used on the server.

    Here's the data for a 5 GB encrypted drive:



    Yes, it's 5 GBs.  
     
      To clarify the error issue here (I'm not 100% certain about this), Amazon doesn't provide a good way to ID these files. We have to try to upload them again, and then grab the error ID to get the actual file ID so we can update the file.  This is inefficient and would be solved with a more robust API that included a search functionality, or a better file list call. So, this is basically "by design" and required currently.  -Christopher     Unfortunately, we haven't pursued this with Amazon recently. This is due to a number of big bugs that we have been following up on.  However, these bugs have lead to a lot of performance, stability and reliability fixes. And a lot of users have reported that these fixes have significantly improved the Amazon Cloud Drive provider.  That is something that is great to hear, as it may help to get the provider to a stable/reliable state.    That said, once we get to a more stable state (after the next public beta build (after 1.0.0.463) or a stable/RC release), we do plan on pursuing this again.     But in the meanwhile, we have held off on this as we want to focus on the entire product rather than a single, problematic provider.  -Christopher     Amazon has "snuck in" additional guidelines that don't bode well for us.  https://developer.amazon.com/public/apis/experience/cloud-drive/content/developer-guide Don’t build apps that encrypt customer data  
    What does this mean for us? We have no idea right now.  Hopefully, this is a guideline and not a hard rule (other apps allow encryption, so that's hopeful, at least). 
     
    But if we don't get re-approved, we'll deal with that when the time comes (though, we will push hard to get approval).
     
    - Christopher (Jan 15. 2017)
     
    If you haven't seen already, we've released a "gold" version of StableBit CloudDrive. Meaning that we have an official release! 
    Unfortunately, because of increasing issues with Amazon Cloud Drive, that appear to be ENTIRELY server side (drive issues due to "429" errors, odd outages, etc), and that we are STILL not approved for production status (despite sending off additional emails a month ago, requesting approval or at least an update), we have dropped support Amazon Cloud Drive. 

    This does not impact existing users, as you will still be able to mount and use your existing drives. However, we have blocked the ability to create new drives for Amazon Cloud Drive.   
     
    This was not a decision that we made lightly, and while we don't regret this decision, we are saddened by it. We would have loved to come to some sort of outcome that included keeping full support for Amazon Cloud Drive. 
    -Christopher (May 17, 2017)
  13. Like
    Alex got a reaction from KiaraEvirm in How to Contribute Test Data   
    I've started putting up disk and controller test data in this forum, as it relates to the StableBit Scanner's ability to gather SMART and Identify data using various disk controllers.
     
    Direct I/O Test
     
    "Direct I/O" is a set of technologies that the StableBit Scanner uses to read data directly from the disk. I collect my test results using an internal Direct I/O testing tool.
     

     
    You can get the latest version here: Download
     
    The tool will probe your disk and controller for various forms of data that the StableBit Scanner uses and will display either a green check mark or a red X to indicate whether the probe was successful. At the bottom, it will list the probing "methods" that were successfully used to probe the controller / disk.
     
    If you're interested in contributing your test data to this forum, then just run the tool and select a disk that is connected to the controller that you want to probe.
     
    Make sure that your computer is not doing anything else while probing. There is a small chance that the probing process will crash your system.
  14. Like
    Alex got a reaction from Minaleque in Large Chunks and the I/O Manager   
    In this post I'm going to talk about the new large storage chunk support in StableBit CloudDrive 1.0.0.421 BETA, why that's important, and how StableBit CloudDrive manages provider I/O overall.
     
    Want to Download it?
     
    Currently, StableBit CloudDrive 1.0.0.421 BETA is an internal development build and like any of our internal builds, if you'd like, you can download it (or most likely a newer build) here:
    http://wiki.covecube.com/Downloads
     
    The I/O Manager
     
    Before I start talking about chunks and what the current change actually means, let's talk a bit about how StableBit CloudDrive handles provider I/O. Well, first let's define what provider I/O actually is. Provider I/O is the combination of all of the read and write request (or download and upload requests) that are serviced by your provider of choice. For example, if your cloud drive is storing data in Amazon S3, provider I/O consists of all of the download and upload requests from and to Amazon S3.
     
    Now it's important to differentiate provider I/O from cloud drive I/O because provider I/O is not really the same thing as cloud drive I/O. That's because all I/O to and from the drive itself is performed directly in the kernel by our cloud drive driver (cloudfs_disk.sys). But as a result of some cloud drive I/O, provider I/O can be generated. For example, this happens when there is an incoming read request to the drive for some data that is not stored in the local cache. In this case, the kernel driver cooperatively coordinates with the StableBit CloudDrive system service in order generate provider I/O and to complete the incoming read request in a timely manner.
     
    All provider I/O is serviced by the I/O Manager, which lives in the StableBit CloudDrive system service.
     
    Particularly, the I/O Manager is responsible for:
    As an optimization, coalescing incoming provider read and write I/O requests into larger requests. Parallelizing all provider read and write I/O requests using multiple threads. Retrying failed provider I/O operations. Error handling and error reporting logic. Chunks
     
    Now that I've described a little bit about the I/O manager in StableBit CloudDrive, let's talk chunks. StableBit CloudDrive doesn't inherently work with any types of chunks. They are simply the format in which data is stored by your provider of choice. They are an implementation that exists completely outside of the I/O manager, and provide some convenient functions that all chunked providers can use.
     
    How do Chunks Work?
     
    When a chunked cloud provider is first initialized, it is asked about its capabilities, such as whether it can perform partial reads, whether the remote server is performing proper multi-threaded transactional synchronization, etc... In other words, the chunking system needs to know how advanced the provider is, and based on those capabilities it will construct a custom chunk I/O processing pipeline for that particular provider.
     
    The chunk I/O pipeline provides automatic services for the provider such as:
    Whole and partial caching of chunks for performance reasons. Performing checksum insertion on write, and checksum verification on read. Read or write (or both) transactional locking for cloud providers that require it (for example, never try to read chunk 458 when chunk 458 is being written to). Translation of I/O that would end up being a partial chunk read / write request into a whole chunk read / write request for providers that require this. This is actually very complicated. If a partial chunk needs to be read, and the provider doesn't support partial reads, the whole chunk is read (and possibly cached) and only the part needed is returned. If a partial chunk needs to be written, and the provider doesn't support partial writes, then the whole chunk is downloaded (or retrieved from the cache), only the part that needs to be written to is updated, and the whole chunk is written back.If while this is happening another partial write request comes in for the same chunk (in parallel, on a different thread), and we're still in the process of reading that whole chunk, then coalesce the [whole read -> partial write -> whole write] into [whole read -> multiple partial writes -> whole write]. This is purely done as an optimization and is also very complicated. And in the future the chunk I/O processing pipeline can be extended to support other services as the need arises. Large Chunk Support
     
    Speaking of extending the chunk I/O pipeline, that's exactly what happened recently with the addition of large chunk support (> 1 MB) for most cloud providers.
     
    Previously, most cloud providers were limited to a maximum chunk size of 1 MB. This limit was in place because:
    Cloud drive read I/O requests, which can't be satisfied by the local cache, would generate provider read I/O that needed to be satisfied fairly quickly. For providers that didn't support partial reads, this meant that the entire chunk needed to be downloaded at all times, no matter how much data was being read. Additionally, if checksumming was enabled (which would be typical), then by necessity, only whole chunks could be read and written. This had some disadvantages, mostly for users with fast broadband connections:
    Writing a lot of data to the provider would generate a lot of upload requests very quickly (one request per 1 MB uploaded). This wasn't optimal because each request would add some overhead. Generating a lot of upload requests very quickly was also an issue for some cloud providers that were limiting their users based on the number of requests per second, rather than the total bandwidth used. Using smaller chunks with a fast broadband connection and a lot of threads would generate a lot of requests per second. Now, with large chunk support (up to 100 MB per chunk in most cases), we don't have those disadvantages.
     
    What was changed to allow for large chunks?
    In order to support the new large chunks a provider has to support partial reads. That's because it's still very necessary to ensure that all cloud drive read I/O is serviced quickly. Support for a new block based checksumming algorithm was introduced into the chunk I/O pipeline. With this new algorithm it's no longer necessary to read or write whole chunks in order to get checksumming support. This was crucial because it is very important to verify that your data in the cloud is not corrupted, and turning off checksumming wasn't a very good option. Are there any disadvantages to large chunks?
    If you're concerned about using the least possible amount of bandwidth (as opposed to using less provider calls per second), it may be advantageous to use smaller chunks. If you know for sure that you will be storing relatively small files (1-2 MB per file or less) and you will only be updating a few files at a time, there may be less overhead when you use smaller chunks. For providers that don't support partial writes (most cloud providers), larger chunks are more optimal if most of your files are > 1 MB, or if you will be updating a lot of smaller files all at the same time. As far as checksumming, the new algorithm completely supersedes the old one and is enabled by default on all new cloud drives. It really has no disadvantages over the old algorithm. Older cloud drives will continue to use the old checksumming algorithm, which really only has the disadvantage of not supporting large chunks. Which providers currently support large chunks?
    I don't want to post a list here because it would get obsolete as we add more providers. When you create a new cloud drive, look under Advanced Settings. If you see a storage chunk size > 1 MB available, then that provider supports large chunks.  
    Going Chunk-less in the Future?
     
    I should mention that StableBit CloudDrive doesn't actually require chunks. If it at all makes sense for a particular provider, it really doesn't have to store its data in chunks. For example, it's entirely possible that StableBit CloudDrive will have a provider that stores its data in a VHDX file. Sure, why not. For that matter, StableBit CloudDrive providers don't even need to store their data in files at all. I can imagine writing providers that can store their data on an email server or a NNTP server (a bit of a stretch, yes, but theoretically possible).
     
    In fact, the only thing that StableBit CloudDrive really needs is some system that can save data and later retrieve it (in a timely manner). In that sense, StableBit CloudDrive can be a general purpose drive virtualization solution.
  15. Like
    Alex got a reaction from Minaleque in How to Contribute Test Data   
    I've started putting up disk and controller test data in this forum, as it relates to the StableBit Scanner's ability to gather SMART and Identify data using various disk controllers.
     
    Direct I/O Test
     
    "Direct I/O" is a set of technologies that the StableBit Scanner uses to read data directly from the disk. I collect my test results using an internal Direct I/O testing tool.
     

     
    You can get the latest version here: Download
     
    The tool will probe your disk and controller for various forms of data that the StableBit Scanner uses and will display either a green check mark or a red X to indicate whether the probe was successful. At the bottom, it will list the probing "methods" that were successfully used to probe the controller / disk.
     
    If you're interested in contributing your test data to this forum, then just run the tool and select a disk that is connected to the controller that you want to probe.
     
    Make sure that your computer is not doing anything else while probing. There is a small chance that the probing process will crash your system.
  16. Like
    Alex got a reaction from Minaleque in SSD Optimizer Balancing Plugin   
    I've just finished coding a new balancing plugin for StableBit DrivePool, it's called the SSD Optimizer. This was actually a feature request, so here you go.
     
    I know that a lot of people use the Archive Optimizer plugin with SSD drives and I would like this plugin to replace the Archive Optimizer for that kind of balancing scenario.
     
    The idea behind the SSD Optimizer (as it was with the Archive Optimizer) is that if you have one or more SSDs in your pool, they can serve as "landing zones" for new files, and those files will later be automatically migrated to your slower spinning drives. Thus your SSDs would serve as a kind of super fast write buffer for the pool.
     
    The new functionality of the SSD Optimizer is that now it's able to fill your Archive disks one at a time, like the Ordered File Placement plugin, but with support for SSD "Feeder" disks.
     
    Check out the attached screenshot of what it looks like
     
    Notes: http://dl.covecube.com/DrivePoolBalancingPlugins/SsdOptimizer/Notes.txt
    Download: http://stablebit.com/DrivePool/Plugins
     
    Edit: Now up on stablebit.com, link updated.

  17. Like
    Alex got a reaction from Minaleque in Forum Downtime   
    Our forum, wiki and blog web sites experienced an issue with the database server that caused those sites to be down for the past 34 hours. The issue has been resolved and everything is back up and running. I'm sorry for the inconvenience.
     
    StableBit.com, the download server, and software activation services were not affected.
  18. Like
    Alex got a reaction from Antoineki in How to Contribute Test Data   
    I've started putting up disk and controller test data in this forum, as it relates to the StableBit Scanner's ability to gather SMART and Identify data using various disk controllers.
     
    Direct I/O Test
     
    "Direct I/O" is a set of technologies that the StableBit Scanner uses to read data directly from the disk. I collect my test results using an internal Direct I/O testing tool.
     

     
    You can get the latest version here: Download
     
    The tool will probe your disk and controller for various forms of data that the StableBit Scanner uses and will display either a green check mark or a red X to indicate whether the probe was successful. At the bottom, it will list the probing "methods" that were successfully used to probe the controller / disk.
     
    If you're interested in contributing your test data to this forum, then just run the tool and select a disk that is connected to the controller that you want to probe.
     
    Make sure that your computer is not doing anything else while probing. There is a small chance that the probing process will crash your system.
  19. Like
    Alex got a reaction from jaynew in Large Chunks and the I/O Manager   
    In this post I'm going to talk about the new large storage chunk support in StableBit CloudDrive 1.0.0.421 BETA, why that's important, and how StableBit CloudDrive manages provider I/O overall.
     
    Want to Download it?
     
    Currently, StableBit CloudDrive 1.0.0.421 BETA is an internal development build and like any of our internal builds, if you'd like, you can download it (or most likely a newer build) here:
    http://wiki.covecube.com/Downloads
     
    The I/O Manager
     
    Before I start talking about chunks and what the current change actually means, let's talk a bit about how StableBit CloudDrive handles provider I/O. Well, first let's define what provider I/O actually is. Provider I/O is the combination of all of the read and write request (or download and upload requests) that are serviced by your provider of choice. For example, if your cloud drive is storing data in Amazon S3, provider I/O consists of all of the download and upload requests from and to Amazon S3.
     
    Now it's important to differentiate provider I/O from cloud drive I/O because provider I/O is not really the same thing as cloud drive I/O. That's because all I/O to and from the drive itself is performed directly in the kernel by our cloud drive driver (cloudfs_disk.sys). But as a result of some cloud drive I/O, provider I/O can be generated. For example, this happens when there is an incoming read request to the drive for some data that is not stored in the local cache. In this case, the kernel driver cooperatively coordinates with the StableBit CloudDrive system service in order generate provider I/O and to complete the incoming read request in a timely manner.
     
    All provider I/O is serviced by the I/O Manager, which lives in the StableBit CloudDrive system service.
     
    Particularly, the I/O Manager is responsible for:
    As an optimization, coalescing incoming provider read and write I/O requests into larger requests. Parallelizing all provider read and write I/O requests using multiple threads. Retrying failed provider I/O operations. Error handling and error reporting logic. Chunks
     
    Now that I've described a little bit about the I/O manager in StableBit CloudDrive, let's talk chunks. StableBit CloudDrive doesn't inherently work with any types of chunks. They are simply the format in which data is stored by your provider of choice. They are an implementation that exists completely outside of the I/O manager, and provide some convenient functions that all chunked providers can use.
     
    How do Chunks Work?
     
    When a chunked cloud provider is first initialized, it is asked about its capabilities, such as whether it can perform partial reads, whether the remote server is performing proper multi-threaded transactional synchronization, etc... In other words, the chunking system needs to know how advanced the provider is, and based on those capabilities it will construct a custom chunk I/O processing pipeline for that particular provider.
     
    The chunk I/O pipeline provides automatic services for the provider such as:
    Whole and partial caching of chunks for performance reasons. Performing checksum insertion on write, and checksum verification on read. Read or write (or both) transactional locking for cloud providers that require it (for example, never try to read chunk 458 when chunk 458 is being written to). Translation of I/O that would end up being a partial chunk read / write request into a whole chunk read / write request for providers that require this. This is actually very complicated. If a partial chunk needs to be read, and the provider doesn't support partial reads, the whole chunk is read (and possibly cached) and only the part needed is returned. If a partial chunk needs to be written, and the provider doesn't support partial writes, then the whole chunk is downloaded (or retrieved from the cache), only the part that needs to be written to is updated, and the whole chunk is written back.If while this is happening another partial write request comes in for the same chunk (in parallel, on a different thread), and we're still in the process of reading that whole chunk, then coalesce the [whole read -> partial write -> whole write] into [whole read -> multiple partial writes -> whole write]. This is purely done as an optimization and is also very complicated. And in the future the chunk I/O processing pipeline can be extended to support other services as the need arises. Large Chunk Support
     
    Speaking of extending the chunk I/O pipeline, that's exactly what happened recently with the addition of large chunk support (> 1 MB) for most cloud providers.
     
    Previously, most cloud providers were limited to a maximum chunk size of 1 MB. This limit was in place because:
    Cloud drive read I/O requests, which can't be satisfied by the local cache, would generate provider read I/O that needed to be satisfied fairly quickly. For providers that didn't support partial reads, this meant that the entire chunk needed to be downloaded at all times, no matter how much data was being read. Additionally, if checksumming was enabled (which would be typical), then by necessity, only whole chunks could be read and written. This had some disadvantages, mostly for users with fast broadband connections:
    Writing a lot of data to the provider would generate a lot of upload requests very quickly (one request per 1 MB uploaded). This wasn't optimal because each request would add some overhead. Generating a lot of upload requests very quickly was also an issue for some cloud providers that were limiting their users based on the number of requests per second, rather than the total bandwidth used. Using smaller chunks with a fast broadband connection and a lot of threads would generate a lot of requests per second. Now, with large chunk support (up to 100 MB per chunk in most cases), we don't have those disadvantages.
     
    What was changed to allow for large chunks?
    In order to support the new large chunks a provider has to support partial reads. That's because it's still very necessary to ensure that all cloud drive read I/O is serviced quickly. Support for a new block based checksumming algorithm was introduced into the chunk I/O pipeline. With this new algorithm it's no longer necessary to read or write whole chunks in order to get checksumming support. This was crucial because it is very important to verify that your data in the cloud is not corrupted, and turning off checksumming wasn't a very good option. Are there any disadvantages to large chunks?
    If you're concerned about using the least possible amount of bandwidth (as opposed to using less provider calls per second), it may be advantageous to use smaller chunks. If you know for sure that you will be storing relatively small files (1-2 MB per file or less) and you will only be updating a few files at a time, there may be less overhead when you use smaller chunks. For providers that don't support partial writes (most cloud providers), larger chunks are more optimal if most of your files are > 1 MB, or if you will be updating a lot of smaller files all at the same time. As far as checksumming, the new algorithm completely supersedes the old one and is enabled by default on all new cloud drives. It really has no disadvantages over the old algorithm. Older cloud drives will continue to use the old checksumming algorithm, which really only has the disadvantage of not supporting large chunks. Which providers currently support large chunks?
    I don't want to post a list here because it would get obsolete as we add more providers. When you create a new cloud drive, look under Advanced Settings. If you see a storage chunk size > 1 MB available, then that provider supports large chunks.  
    Going Chunk-less in the Future?
     
    I should mention that StableBit CloudDrive doesn't actually require chunks. If it at all makes sense for a particular provider, it really doesn't have to store its data in chunks. For example, it's entirely possible that StableBit CloudDrive will have a provider that stores its data in a VHDX file. Sure, why not. For that matter, StableBit CloudDrive providers don't even need to store their data in files at all. I can imagine writing providers that can store their data on an email server or a NNTP server (a bit of a stretch, yes, but theoretically possible).
     
    In fact, the only thing that StableBit CloudDrive really needs is some system that can save data and later retrieve it (in a timely manner). In that sense, StableBit CloudDrive can be a general purpose drive virtualization solution.
  20. Like
    Alex got a reaction from Antoineki in SSD Optimizer Balancing Plugin   
    I've just finished coding a new balancing plugin for StableBit DrivePool, it's called the SSD Optimizer. This was actually a feature request, so here you go.
     
    I know that a lot of people use the Archive Optimizer plugin with SSD drives and I would like this plugin to replace the Archive Optimizer for that kind of balancing scenario.
     
    The idea behind the SSD Optimizer (as it was with the Archive Optimizer) is that if you have one or more SSDs in your pool, they can serve as "landing zones" for new files, and those files will later be automatically migrated to your slower spinning drives. Thus your SSDs would serve as a kind of super fast write buffer for the pool.
     
    The new functionality of the SSD Optimizer is that now it's able to fill your Archive disks one at a time, like the Ordered File Placement plugin, but with support for SSD "Feeder" disks.
     
    Check out the attached screenshot of what it looks like
     
    Notes: http://dl.covecube.com/DrivePoolBalancingPlugins/SsdOptimizer/Notes.txt
    Download: http://stablebit.com/DrivePool/Plugins
     
    Edit: Now up on stablebit.com, link updated.

  21. Like
    Alex got a reaction from Antoineki in Forum Downtime   
    Our forum, wiki and blog web sites experienced an issue with the database server that caused those sites to be down for the past 34 hours. The issue has been resolved and everything is back up and running. I'm sorry for the inconvenience.
     
    StableBit.com, the download server, and software activation services were not affected.
  22. Like
    Alex got a reaction from Antoineki in Large Chunks and the I/O Manager   
    In this post I'm going to talk about the new large storage chunk support in StableBit CloudDrive 1.0.0.421 BETA, why that's important, and how StableBit CloudDrive manages provider I/O overall.
     
    Want to Download it?
     
    Currently, StableBit CloudDrive 1.0.0.421 BETA is an internal development build and like any of our internal builds, if you'd like, you can download it (or most likely a newer build) here:
    http://wiki.covecube.com/Downloads
     
    The I/O Manager
     
    Before I start talking about chunks and what the current change actually means, let's talk a bit about how StableBit CloudDrive handles provider I/O. Well, first let's define what provider I/O actually is. Provider I/O is the combination of all of the read and write request (or download and upload requests) that are serviced by your provider of choice. For example, if your cloud drive is storing data in Amazon S3, provider I/O consists of all of the download and upload requests from and to Amazon S3.
     
    Now it's important to differentiate provider I/O from cloud drive I/O because provider I/O is not really the same thing as cloud drive I/O. That's because all I/O to and from the drive itself is performed directly in the kernel by our cloud drive driver (cloudfs_disk.sys). But as a result of some cloud drive I/O, provider I/O can be generated. For example, this happens when there is an incoming read request to the drive for some data that is not stored in the local cache. In this case, the kernel driver cooperatively coordinates with the StableBit CloudDrive system service in order generate provider I/O and to complete the incoming read request in a timely manner.
     
    All provider I/O is serviced by the I/O Manager, which lives in the StableBit CloudDrive system service.
     
    Particularly, the I/O Manager is responsible for:
    As an optimization, coalescing incoming provider read and write I/O requests into larger requests. Parallelizing all provider read and write I/O requests using multiple threads. Retrying failed provider I/O operations. Error handling and error reporting logic. Chunks
     
    Now that I've described a little bit about the I/O manager in StableBit CloudDrive, let's talk chunks. StableBit CloudDrive doesn't inherently work with any types of chunks. They are simply the format in which data is stored by your provider of choice. They are an implementation that exists completely outside of the I/O manager, and provide some convenient functions that all chunked providers can use.
     
    How do Chunks Work?
     
    When a chunked cloud provider is first initialized, it is asked about its capabilities, such as whether it can perform partial reads, whether the remote server is performing proper multi-threaded transactional synchronization, etc... In other words, the chunking system needs to know how advanced the provider is, and based on those capabilities it will construct a custom chunk I/O processing pipeline for that particular provider.
     
    The chunk I/O pipeline provides automatic services for the provider such as:
    Whole and partial caching of chunks for performance reasons. Performing checksum insertion on write, and checksum verification on read. Read or write (or both) transactional locking for cloud providers that require it (for example, never try to read chunk 458 when chunk 458 is being written to). Translation of I/O that would end up being a partial chunk read / write request into a whole chunk read / write request for providers that require this. This is actually very complicated. If a partial chunk needs to be read, and the provider doesn't support partial reads, the whole chunk is read (and possibly cached) and only the part needed is returned. If a partial chunk needs to be written, and the provider doesn't support partial writes, then the whole chunk is downloaded (or retrieved from the cache), only the part that needs to be written to is updated, and the whole chunk is written back.If while this is happening another partial write request comes in for the same chunk (in parallel, on a different thread), and we're still in the process of reading that whole chunk, then coalesce the [whole read -> partial write -> whole write] into [whole read -> multiple partial writes -> whole write]. This is purely done as an optimization and is also very complicated. And in the future the chunk I/O processing pipeline can be extended to support other services as the need arises. Large Chunk Support
     
    Speaking of extending the chunk I/O pipeline, that's exactly what happened recently with the addition of large chunk support (> 1 MB) for most cloud providers.
     
    Previously, most cloud providers were limited to a maximum chunk size of 1 MB. This limit was in place because:
    Cloud drive read I/O requests, which can't be satisfied by the local cache, would generate provider read I/O that needed to be satisfied fairly quickly. For providers that didn't support partial reads, this meant that the entire chunk needed to be downloaded at all times, no matter how much data was being read. Additionally, if checksumming was enabled (which would be typical), then by necessity, only whole chunks could be read and written. This had some disadvantages, mostly for users with fast broadband connections:
    Writing a lot of data to the provider would generate a lot of upload requests very quickly (one request per 1 MB uploaded). This wasn't optimal because each request would add some overhead. Generating a lot of upload requests very quickly was also an issue for some cloud providers that were limiting their users based on the number of requests per second, rather than the total bandwidth used. Using smaller chunks with a fast broadband connection and a lot of threads would generate a lot of requests per second. Now, with large chunk support (up to 100 MB per chunk in most cases), we don't have those disadvantages.
     
    What was changed to allow for large chunks?
    In order to support the new large chunks a provider has to support partial reads. That's because it's still very necessary to ensure that all cloud drive read I/O is serviced quickly. Support for a new block based checksumming algorithm was introduced into the chunk I/O pipeline. With this new algorithm it's no longer necessary to read or write whole chunks in order to get checksumming support. This was crucial because it is very important to verify that your data in the cloud is not corrupted, and turning off checksumming wasn't a very good option. Are there any disadvantages to large chunks?
    If you're concerned about using the least possible amount of bandwidth (as opposed to using less provider calls per second), it may be advantageous to use smaller chunks. If you know for sure that you will be storing relatively small files (1-2 MB per file or less) and you will only be updating a few files at a time, there may be less overhead when you use smaller chunks. For providers that don't support partial writes (most cloud providers), larger chunks are more optimal if most of your files are > 1 MB, or if you will be updating a lot of smaller files all at the same time. As far as checksumming, the new algorithm completely supersedes the old one and is enabled by default on all new cloud drives. It really has no disadvantages over the old algorithm. Older cloud drives will continue to use the old checksumming algorithm, which really only has the disadvantage of not supporting large chunks. Which providers currently support large chunks?
    I don't want to post a list here because it would get obsolete as we add more providers. When you create a new cloud drive, look under Advanced Settings. If you see a storage chunk size > 1 MB available, then that provider supports large chunks.  
    Going Chunk-less in the Future?
     
    I should mention that StableBit CloudDrive doesn't actually require chunks. If it at all makes sense for a particular provider, it really doesn't have to store its data in chunks. For example, it's entirely possible that StableBit CloudDrive will have a provider that stores its data in a VHDX file. Sure, why not. For that matter, StableBit CloudDrive providers don't even need to store their data in files at all. I can imagine writing providers that can store their data on an email server or a NNTP server (a bit of a stretch, yes, but theoretically possible).
     
    In fact, the only thing that StableBit CloudDrive really needs is some system that can save data and later retrieve it (in a timely manner). In that sense, StableBit CloudDrive can be a general purpose drive virtualization solution.
  23. Like
    Alex got a reaction from Ginoliggime in Amazon Cloud Drive - Why is it not supported?   
    Some people have mentioned that we're not transparent enough with regards to what's going on with Amazon Cloud Drive support. That was definitely not the intention, and if that's how you guys feel, then I'm sorry about that and I want to change that. We really have nothing to gain by keeping any of this secret.
     
    Here, you will find a timeline of exactly what's happening with our ongoing communications with the Amazon Cloud Drive team, and I will keep this thread up to date as things develop.
     
    Timeline:
    On May 28th the first public BETA of StableBit CloudDrive was released without Amazon Cloud Drive production access enabled. At the time, I thought that the semi-automated whitelisting process that we went through was "Production Access". While this is similar to other providers, like Dropbox, it became apparent that for Amazon Cloud Drive, it's not. Upon closer reading of their documentation, it appears that the whitelisting process actually imposed "Developer Access" on us. To get upgraded to "Production Access" Amazon Cloud Drive requires direct email communications with the Amazon Cloud Drive team. We submitted our application for approval for production access originally on Aug. 15 over email: On Aug 24th I wrote in requesting a status update, because no one had replied to me, so I had no idea whether the email was read or not. On Aug. 27th I finally got an email back letting me know that our application was approved for production use. I was pleasantly surprised. On Sept 1st, after some testing, I wrote another email to Amazon letting them know that we are having issues with Amazon not respecting the Content-Type of uploads, and that we are also having issues with the new permission scopes that have been changes since we initially submitted our application for approval. No one answered this particular email until... On Sept 7th I received a panicked email from Amazon addressed to me (with CCs to other Amazon addresses) letting me know that Amazon is seeing unusual call patterns from one of our users.  On Sept 11th I replied explaining that we do not keep track of what our customers are doing, and that our software can scale very well, as long as the server permits it and the network bandwidth is sufficient. Our software does respect 429 throttling responses from the server and it does perform exponential backoff, as is standard practice in such cases. Nevertheless, I offered to limit the number of threads that we use, or to apply any other limits that Amazon deems necessary on the client side. I highlighted on this page the key question that really needs answered. Also, please note that obviously Amazon knows who this user is, since they have to be their customer in order to be logged into Amazon Cloud Drive. Also note that instead of banning or throttling that particular customer, Amazon has chosen to block the entire user base of our application.
    On Sept. 16th I received a response from Amazon. On Sept. 21st I haven't heard anything back yet, so I sent them this. I waited until Oct. 29th and no one answered. At that point I informed them that we're going ahead with the next public BETA regardless. Some time in the beginning of November an Amazon employee, on the Amazon forums, started claiming that we're not answering our emails. This was amidst their users asking why Amazon Cloud Drive is not supported with StableBit CloudDrive. So I did post a reply to that, listing a similar timeline. One Nov. 11th I sent another email to them. On Nov 13th Amazon finally replied. I redacted the limits, I don't know if they want those public.
    On Nov. 19th I sent them another email asking to clarify what the limits mean exactly. On Nov 25th Amazon replied with some questions and concerns regarding the number of API calls per second: On Dec. 2nd I replied with the answers and a new build that implemented large chunk support. This was a fairly complicated change to our code designed to minimize the number of calls per second for large file uploads.

    You can read my Nuts & Bolts post on large chunk support here: http://community.covecube.com/index.php?/topic/1622-large-chunks-and-the-io-manager/ On Dec. 10th 2015, Amazon replied: I have no idea what they mean regarding the encrypted data size. AES-CBC is a 1:1 encryption scheme. Some number of bytes go in and the same exact number of bytes come out, encrypted. We do have some minor overhead for the checksum / authentication signature at the end of every 1 MB unit of data, but that's at most 64 bytes per 1 MB when using HMAC-SHA512 (which is 0.006 % overhead). You can easily verify this by creating an encrypted cloud drive of some size, filling it up to capacity, and then checking how much data is used on the server.

    Here's the data for a 5 GB encrypted drive:



    Yes, it's 5 GBs.  
     
      To clarify the error issue here (I'm not 100% certain about this), Amazon doesn't provide a good way to ID these files. We have to try to upload them again, and then grab the error ID to get the actual file ID so we can update the file.  This is inefficient and would be solved with a more robust API that included a search functionality, or a better file list call. So, this is basically "by design" and required currently.  -Christopher     Unfortunately, we haven't pursued this with Amazon recently. This is due to a number of big bugs that we have been following up on.  However, these bugs have lead to a lot of performance, stability and reliability fixes. And a lot of users have reported that these fixes have significantly improved the Amazon Cloud Drive provider.  That is something that is great to hear, as it may help to get the provider to a stable/reliable state.    That said, once we get to a more stable state (after the next public beta build (after 1.0.0.463) or a stable/RC release), we do plan on pursuing this again.     But in the meanwhile, we have held off on this as we want to focus on the entire product rather than a single, problematic provider.  -Christopher     Amazon has "snuck in" additional guidelines that don't bode well for us.  https://developer.amazon.com/public/apis/experience/cloud-drive/content/developer-guide Don’t build apps that encrypt customer data  
    What does this mean for us? We have no idea right now.  Hopefully, this is a guideline and not a hard rule (other apps allow encryption, so that's hopeful, at least). 
     
    But if we don't get re-approved, we'll deal with that when the time comes (though, we will push hard to get approval).
     
    - Christopher (Jan 15. 2017)
     
    If you haven't seen already, we've released a "gold" version of StableBit CloudDrive. Meaning that we have an official release! 
    Unfortunately, because of increasing issues with Amazon Cloud Drive, that appear to be ENTIRELY server side (drive issues due to "429" errors, odd outages, etc), and that we are STILL not approved for production status (despite sending off additional emails a month ago, requesting approval or at least an update), we have dropped support Amazon Cloud Drive. 

    This does not impact existing users, as you will still be able to mount and use your existing drives. However, we have blocked the ability to create new drives for Amazon Cloud Drive.   
     
    This was not a decision that we made lightly, and while we don't regret this decision, we are saddened by it. We would have loved to come to some sort of outcome that included keeping full support for Amazon Cloud Drive. 
    -Christopher (May 17, 2017)
  24. Like
    Alex got a reaction from Ginoliggime in Forum Downtime   
    Our forum, wiki and blog web sites experienced an issue with the database server that caused those sites to be down for the past 34 hours. The issue has been resolved and everything is back up and running. I'm sorry for the inconvenience.
     
    StableBit.com, the download server, and software activation services were not affected.
  25. Like
    Alex got a reaction from KiaraEvirm in Forum Downtime   
    Our forum, wiki and blog web sites experienced an issue with the database server that caused those sites to be down for the past 34 hours. The issue has been resolved and everything is back up and running. I'm sorry for the inconvenience.
     
    StableBit.com, the download server, and software activation services were not affected.
×
×
  • Create New...