
Cloud Providers


Alex


As I was writing the various providers for StableBit CloudDrive I got a sense for how well each one performs / scales and the various quirks of some cloud services. I'm going to use this thread to describe my observations.

 

As StableBit CloudDrive grows, I'm sure that my understanding of the various providers will improve as well. Also remember, this is from a developer's point of view.

 

Google Cloud Storage

http://cloud.google.com/storage

 

Pros:

  • Reasonably fast.
  • Simple API.
  • Reliable.
  • Scales very well.

Cons:

  • Not the fastest provider.
  • Especially slow when deleting existing chunks (e.g. when destroying a drive).
  • Difficult to use and bloated SDK (a development issue really).

 

This was the first cloud service that I wrote a StableBit CloudDrive provider for. Initially I wrote the code against their SDK, which I later realized was a mistake. I replaced the SDK entirely with my own API code, which improved the reliability of this provider and solved a lot of the issues that the SDK was causing.

 

Another noteworthy thing about this provider is that it's not as fast as some of the other providers (Amazon S3 / Microsoft Azure Storage).

 

Amazon S3

http://aws.amazon.com/s3/

 

Pros:

  • Very fast.
  • Reliable.
  • Scales very well.
  • Beautiful, compact and functional SDK.

Cons:

  • Configuration is a bit confusing.

Here the SDK is uniquely good. It's a single DLL and super simple to use. Most importantly, it's reliable: it handles multi-threading correctly and its error handling logic is straightforward. It is one of the few SDKs that StableBit CloudDrive uses out of the box. All of the other providers (except Microsoft Azure Storage) utilize custom written SDKs.

 

This is a great place to store your mission critical data. I back up all of my code to this provider.

 

Microsoft Azure Storage

http://azure.microsoft.com/en-us/services/storage/

 

Pros:

  • Very fast.
  • Reliable.
  • Scales very well.
  • Easy to configure.

Cons:

  • No reasonably priced support option that makes sense.

This is a great cloud service. It's definitely on par with Amazon S3 in terms of speed and seems to be very reliable from my testing.

 

Having used Microsoft Azure services for almost all of our web sites and the database back-end, I can tell you that there is one major issue with Microsoft Azure. There is no one that you can contact when something goes wrong (and things seem to go wrong quite often), without paying a huge sum of money.

 

For example, take a look at their support prices: http://azure.microsoft.com/en-us/support/plans/

 

If you want someone from Microsoft to take a look at an issue that you're having within 2 hours that will cost you $300 / month. Other than that, it's a great service to use.

 

OneDrive for Business

https://onedrive.live.com/about/en-US/business/

 

Pros:

  • Reasonable throttling limits in place.

Cons:

  • Slow.
  • API is lacking leading to reliability issues.
  • Does not scale well, so you are limited in the amount of data that you can store before everything grinds to a halt.
  • Especially slow when deleting existing chunks (e.g. when destroying a drive).

This service is actually a rebranded version of Microsoft SharePoint hosted in the cloud for you. It has absolutely nothing to do with the "regular" OneDrive other than the naming similarity.

 

This service does not scale well at all, and this is really a huge issue. The more data that you upload to this service, the slower it gets. After uploading about 200 GB, it really starts to slow down. It seems to be sensitive to the number of files that you have, and for that reason StableBit CloudDrive sets the chunk size to 1MB by default, in order to minimize the number of files that it creates.
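To get a feel for how many files are involved, here is a rough back-of-the-envelope sketch. It assumes binary units and one file per chunk; the helper name `chunk_count` is made up for illustration and is not part of StableBit CloudDrive.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def chunk_count(drive_bytes: int, chunk_bytes: int) -> int:
    """Number of chunk files needed to hold drive_bytes (rounded up)."""
    return (drive_bytes + chunk_bytes - 1) // chunk_bytes

# 200 GiB of data stored as 1 MiB chunks
print(chunk_count(200 * GIB, 1 * MIB))  # 204800
```

So by the time the slowdown sets in at around 200 GB, the drive already consists of roughly two hundred thousand files under these assumptions.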

 

By default, Microsoft SharePoint expects each folder to contain no more than 5000 files, or else certain features simply stop working (including deleting said files). This is by design and here's a page that explains in detail why this limit is there and how to work around it: https://support.office.com/en-us/article/Manage-lists-and-libraries-with-many-items-11ecc804-2284-4978-8273-4842471fafb7

 

If you're going to use this provider to store large amounts of data, then I recommend following the instructions on the page linked above. Although, for me, it didn't really help much at all.

 

I've worked hard to try and resolve this by utilizing a nested directory structure in order to limit the number of files in each directory, but nothing seems to make any difference. If there are any SharePoint experts out there that can figure out what we can do to speed this provider up, please let me know.
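For illustration, a nested layout along these lines could look like the sketch below. This is not the actual StableBit CloudDrive layout; the two-level scheme, the `chunk-` file naming and the function name are all assumptions made up for the example.

```python
from pathlib import PurePosixPath

MAX_FILES_PER_DIR = 5000  # the SharePoint per-folder limit mentioned above

def chunk_path(index: int, fanout: int = MAX_FILES_PER_DIR) -> PurePosixPath:
    """Map a chunk index to top/sub/file so no folder exceeds `fanout` entries."""
    if index < 0:
        raise ValueError("chunk index must be non-negative")
    folder = index // fanout        # which leaf folder the chunk lands in
    top = folder // fanout          # which top-level folder holds that leaf
    sub = folder % fanout
    return PurePosixPath(f"{top:04d}") / f"{sub:04d}" / f"chunk-{index:010d}"

print(chunk_path(5000))  # 0000/0001/chunk-0000005000
```

Each leaf folder holds at most 5000 chunks and each top-level folder holds at most 5000 leaf folders, which covers 25 million chunks per top-level folder.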

 

OneDrive

https://onedrive.live.com/

Experimental

 

Pros:

  • Clean API.

Cons:

  • Heavily and unreasonably throttled.

From afar, OneDrive looks like the perfect storage provider. It's very fast, reliable, easy to use and has an inexpensive unlimited storage option. But after you upload / download some data you start hitting the throttling limits. The throttling limits are excessive and unreasonable, so much so, that using this provider with StableBit CloudDrive is dangerous. For this reason, the OneDrive provider is currently disabled in StableBit CloudDrive by default.

 

What makes the throttling limits unreasonable is the amount of time that OneDrive expects you to wait before making another request. In my experience that can be as high as 20 minutes to 1 hour. Can you imagine when trying to open a document in Microsoft Windows hitting an error that reads "I see that you've opened too many documents today, please come back in 1 hour". Not only is this unreasonable, it's also technically infeasible to implement this kind of a delay on a real-time disk.

 

Box

https://www.box.com/

 

At this point I haven't used this provider for an extended period of time to render an opinion on how it behaves with large amounts of data.

 

One thing that I can say is that the API is a bit quirky in how it's designed necessitating some extra HTTP traffic that other providers don't require.

 

Dropbox

http://www.dropbox.com/

 

Again, I haven't used this provider much so I can't speak to how well it scales or how well it performs.

 

The API here is very robust and very easy to use. One very cool feature that they have is an "App Folder". When you authorize StableBit CloudDrive to use your Dropbox account, Dropbox creates an isolated container for StableBit CloudDrive and all of the data is stored there. This is nice because you don't see the StableBit CloudDrive data in your regular Dropbox folder, and StableBit CloudDrive has no way to access any other data that's in your Dropbox or any data in some other app folder.

 

Amazon Cloud Drive

https://www.amazon.com/clouddrive/home

 

Pros:

  • Fast.
  • Scales well.
  • Unlimited storage option.
  • Reasonable throttling limits.

Cons:

  • Data integrity issues.

I know how important it is for StableBit CloudDrive to support this service and so I've spent many hours and days trying to make a reliable provider that works. This single provider delayed the initial public BETA of StableBit CloudDrive by at least 2 weeks.

 

The initial issue that I had with Amazon Cloud Drive is that it returns various errors as a response to I/O requests. These errors range from 500 Internal Server Error to 400 Bad Request. Reissuing the same request seems to work, so there doesn't appear to be a problem with the actual request, but rather with the server.
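The "reissue the same request" approach can be sketched roughly as follows. This is a minimal illustration and not the provider's actual code: the `send` callable, the backoff values and the choice to treat both 400 and 500 as retryable are assumptions based only on the behavior described above.

```python
import time
from typing import Callable, Tuple

# Status codes the post reports as succeeding when reissued; treating them
# as retryable is an assumption of this sketch.
RETRYABLE = {400, 500}

def request_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Reissue the same request until it succeeds or attempts run out."""
    last_status = None
    for attempt in range(attempts):
        last_status, body = send()
        if last_status == 200:
            return body
        if last_status not in RETRYABLE:
            break  # not an error this sketch considers transient
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise IOError(f"request failed, last HTTP status: {last_status}")
```

The `sleep` parameter is injectable only so that the delay can be stubbed out when testing.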

 

I later discovered a more serious issue with this service. Apparently, after uploading a file, sometimes (very rarely) that file cannot be downloaded, which means that the file's data gets permanently lost (as far as I can tell). This is very rare and hard to reproduce. My test case scenario needs to run for one whole day before it can reproduce the problem. I finally solved this issue by forcing Upload Verification to be enabled in StableBit CloudDrive. When this issue occurs, upload verification will detect this scenario, delete the corrupt file and retry the upload. That apparently fixed this particular issue.
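In outline, the verify / delete / retry cycle described here looks something like the sketch below. The `store` interface (`upload`, `download`, `delete`) and the use of SHA-256 are hypothetical choices for the example; the post does not say how StableBit CloudDrive compares the data.

```python
import hashlib

def upload_verified(store, name: str, data: bytes, attempts: int = 3) -> None:
    """Upload data, read it back, and retry if the copy cannot be verified.

    `store` is any object with upload(name, data), download(name) and
    delete(name) methods; this interface is hypothetical.
    """
    expected = hashlib.sha256(data).digest()
    for _ in range(attempts):
        store.upload(name, data)
        try:
            verified = hashlib.sha256(store.download(name)).digest() == expected
        except IOError:
            verified = False  # the uploaded file could not be downloaded
        if verified:
            return
        store.delete(name)  # discard the corrupt copy before retrying
    raise IOError(f"could not verify upload of {name!r} after {attempts} attempts")
```

The cost of this approach is that every upload is followed by a download of the same data, so it roughly doubles the traffic for writes.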

 

The next thing that I discovered with this service (after I released the public BETA) is that some 400 Bad Request errors appear at a later time, long after the initial upload / verification step is complete. After extensive debugging, I was able to confirm this with the Amazon Cloud Drive web interface as well, so this is not a provider code issue; the problem actually occurs on the server. If a file gets into this state, a 400 Bad Request error is issued, and if you examine that request, the error message in the response says 404 Not Found. Apparently, the file metadata is there, but the file's contents are gone.

 

The short story is that this service has data integrity issues that are not limited to StableBit CloudDrive in particular, and I'm trying to identify exactly what they are, how they are triggered and apply possible workarounds.

 

I've already applied another possible workaround in the latest internal BETA (1.0.0.284), but I'm still testing whether the fix is effective. I am considering disabling this provider in future builds, and moving it into the experimental category.

 

Local Disk / File Share

 

These providers don't use the cloud, so there's really nothing to say here.


Well, Alex has covered the development side of this. I've been looking into the pricing of the different providers.
 
While "unlimited" is clearly the best option for many here, I want to focus on the "big" providers (Amazon S3, Azure Storage, and Google Cloud Storage). Unfortunately, for many users, the high cost associated with these providers may immediately put them out of range. But we still think it is a good idea to compare the pricing (at least for reference's sake).
 
All three of these providers charge for storage (how much data you're storing), requests (how many API requests you make), and data transfer (how much data you've transferred to and from the provider). All prices are listed for the US Standard region at the moment.
I've tried to reorder the lists so that each provider is shown using the same layout for its different tiers.
 
Amazon S3
 
 
Storage Pricing (Amount stored)
 
                       Reduced Redundancy   Standard           Glacier
Up to   1TB / Month    $0.0240 per GB       $0.0300 per GB     $0.0100 per GB
Up to  50TB / Month    $0.0236 per GB       $0.0295 per GB     $0.0100 per GB
Up to 500TB / Month    $0.0232 per GB       $0.0290 per GB     $0.0100 per GB
Up to   1PB / Month    $0.0228 per GB       $0.0285 per GB     $0.0100 per GB
Up to   5PB / Month    $0.0224 per GB       $0.0280 per GB     $0.0100 per GB
Over    5PB / Month    $0.0220 per GB       $0.0275 per GB     $0.0100 per GB
 
Specifically, Amazon lists these as "for the next" tiers, so the pricing is cumulative: each rate applies only to the portion of your data that falls within that tier.
Also, "Reduced Redundancy" means that your data is stored with fewer redundant copies than Standard, so it is cheaper but less durable.
This works out to about $25 per TB per month of storage for Reduced Redundancy, about $30 per TB per month for Standard and $10.24 per TB per month for Glacier.
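To make the "for the next" tiering concrete, here is a small sketch that charges each slice of stored data at the rate of its own tier. It uses only the first three Standard tiers from the table above, assumes 1 TB = 1024 GB, and the function and constant names are made up for the example.

```python
GB_PER_TB = 1024

# (cumulative upper bound in GB, $ per GB-month) for S3 Standard,
# first three tiers from the table above
S3_STANDARD = [
    (1 * GB_PER_TB, 0.0300),
    (50 * GB_PER_TB, 0.0295),
    (500 * GB_PER_TB, 0.0290),
]

def tiered_cost(gb: float, tiers) -> float:
    """Charge each slice of usage at the rate of the tier it falls into."""
    if gb > tiers[-1][0]:
        raise ValueError("usage exceeds the tiers provided")
    cost, lower = 0.0, 0.0
    for upper, price in tiers:
        if gb <= lower:
            break
        cost += (min(gb, upper) - lower) * price
        lower = upper
    return cost

print(round(tiered_cost(10 * GB_PER_TB, S3_STANDARD), 2))  # 302.59
```

For 10 TB, the first TB is billed at $0.0300 per GB and the remaining 9 TB at $0.0295 per GB.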
 
This may seem like a deal, but let's look at the data transfer pricing.
 
Transfer Pricing
 
Data Transfer IN to S3 (upload)                 $0.000 per GB
Data Transfer OUT to the internet (download)
First    1GB / Month                            $0.000 per GB
Up to   10TB / Month                            $0.090 per GB
"Next"  40TB / Month (50TB)                     $0.085 per GB
"Next" 100TB / Month (150TB)                    $0.070 per GB
"Next" 350TB / Month (500TB)                    $0.050 per GB
"Next" 524TB / Month (1024TB)                   Contact Amazon S3 for special consideration
 

That's $92 per TB per month, up to 10TBs

Chances are that, unless you have a very fast connection, that first tier is where you're going to be "stuck".

 

So, that boils down to $115/month to store and access 1TB per month. Your usage may vary, but this may get very expensive, very quickly (fortunately, upload is free, so getting the storage there isn't that expensive, it's getting it back that will be).

 

Additionally, Amazon S3 charges you per transaction (API call), as well.

 

Request Pricing (API Calls)

PUT, COPY, POST, LIST Requests            $0.005 per 1,000 requests
Glacier Archive and Restore Requests      $0.050 per 1,000 requests
DELETE Requests                           Free (caveat for Glacier)
GET and other requests                    $0.004 per 10,000 requests
Glacier Data Restores                     Free (due to infrequent usage expected, can restore up to 5% monthly for free)

 

Keep in mind that every time you list contents, you may be making multiple requests (we minimize this as much as possible with the caching/prefetching options, but that only limits it to a degree). This one is hard to quantify without actual usage.
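While real usage is hard to predict, a rough lower bound is easy to sketch. The example below assumes a 1 MB chunk size and exactly one request per chunk, both of which are assumptions for illustration only; listing, retries and metadata calls would add to this.

```python
def request_cost(count: int, price: float, per: int) -> float:
    """Cost of `count` requests billed at `price` dollars per `per` requests."""
    return count / per * price

CHUNK_MB = 1                                 # assumed chunk size
chunks_per_tb = 1024 * 1024 // CHUNK_MB      # 1 TB expressed in 1 MB chunks

upload = request_cost(chunks_per_tb, 0.005, 1_000)     # PUT pricing from the table
download = request_cost(chunks_per_tb, 0.004, 10_000)  # GET pricing from the table
print(f"PUT: ${upload:.2f}, GET: ${download:.2f}")
```

Under these assumptions, the request charges for moving 1 TB are small next to the transfer charges, but they grow quickly with smaller chunks.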

 

 

Microsoft Azure Storage
 
Storage Pricing (Amount stored) for Block Blob
 
                                    LRS               ZRS               GRS               RA-GRS
First      1TB / Month              $0.0240 per GB    $0.0300 per GB    $0.0480 per GB    $0.0610 per GB
"Next"    49TB / Month (50TB)       $0.0236 per GB    $0.0295 per GB    $0.0472 per GB    $0.0599 per GB
"Next"   450TB / Month (500TB)      $0.0232 per GB    $0.0290 per GB    $0.0464 per GB    $0.0589 per GB
"Next"   500TB / Month (1000TB)     $0.0228 per GB    $0.0285 per GB    $0.0456 per GB    $0.0579 per GB
"Next"  4000TB / Month (5000TB)     $0.0224 per GB    $0.0280 per GB    $0.0448 per GB    $0.0569 per GB
Over    5000TB / Month              Contact Microsoft Azure for special consideration
 
The LRS and ZRS "zones" are priced identically to Amazon S3's Reduced Redundancy and Standard tiers here.
However, let's explain these terms:
LRS: Multiple copies of the data on different physical servers, at the same datacenter (one location).
ZRS: Three copies at different datacenters within a region, or in different regions. For "blob storage only".
GRS: Same as LRS, but with multiple (asynchronous) copies at another datacenter.
RA-GRS: Same as GRS, but with read access to the secondary datacenter.
 
And this is ~$25 per TB per month of storage for the LRS, about $30 per TB per month for ZRS, about $50 per TB per month for GRS, and about $60 per TB per month for RA-GRS.
 
Microsoft Azure offers other storage types, but they get much more expensive, very quickly (double what's listed for Blob storage, or higher).
 
Transfer Pricing
 
Unfortunately, Microsoft isn't as forthcoming about their transfer pricing. They tuck it away on another page, so it's harder to find. However, here it is:
 
Data Transfer IN (upload)                       $0.000 per GB
Data Transfer OUT to the internet (download)
First    5GB / Month                            $0.000 per GB
Up to   10TB / Month                            $0.087 per GB
"Next"  40TB / Month (50TB)                     $0.083 per GB
"Next" 100TB / Month (150TB)                    $0.070 per GB
"Next" 350TB / Month (500TB)                    $0.050 per GB
"Next" 524TB / Month (1024TB)                   Contact Microsoft Azure for special consideration
 

That's $89 per TB per month, up to 10TBs

Chances are that, unless you have a very fast connection, that first tier is where you're going to be "stuck".

 

This is slightly cheaper than Amazon S3, but not by a whole lot, and it heavily depends on the level of redundancy and storage type you use. 

 

Request Pricing (API Calls)

Any Request                         $0.0036 per 10000 requests

Import/Export (HDDs)           $80 per drive, may not be suitable for CloudDrive

 

 

This is definitely much cheaper than Amazon S3's request pricing.

It's still going to run you around $114 per TB per month to store (LRS) and transfer, which is a bit better than Amazon S3. And that's not counting the request pricing.

 

Google Cloud Storage
 
 
Storage Pricing (Amount stored)
 
DRA Storage          Standard             Cloud Storage Nearline
$0.0200 per GB       $0.0260 per GB       $0.0100 per GB
 
DRA (Durable Reduced Availability) means that the data is not always available. While this is cheaper than Standard, it will definitely cause latency issues (or worse).
Cloud Storage Nearline is a step cheaper still, with reduced performance and less availability.
 
However, this is a flat rate, so it's very simple to figure out what your cost will be here.
 
And this is $20.48 per TB per month of storage for DRA Storage, $26.62 per TB per month for Standard and $10.24 per TB per month for Cloud Storage Nearline.
 
Now let's look at the transfer pricing.
Transfer Pricing
 
Data Transfer IN to Google (upload)             $0.000 per GB
Data Transfer OUT to the internet (download)
First    1TB / Month                            $0.120 per GB
"Next"  10TB / Month                            $0.110 per GB
Over    40TB / Month (50TB)                     $0.080 per GB
 

That's about $122 per TB per month, up to 10TBs

 

 

So, that boils down to $140/month to store and access 1TB per month. This is definitely more expensive than either Amazon S3 or Azure Storage.

 

Additionally, Google Cloud Storage does charge you per API call, as well.

 

Request Pricing (API Calls)

LIST, PUT, COPY, POST Requests       $0.010 per 1000 requests

GET, and others Requests                     $0.001 per 1000 requests

DELETE Requests                                  Free

 

Google is definitely significantly more expensive when it comes to API calls. 

 

 

 

Backblaze B2
 
 
Storage Pricing (Amount stored)
 
 Flat Storage Rate
   $0.005 per GB
 
 
The first 10 GB are free, but that's a small amount, so we won't even bother computing it (it's a $0.05 difference, specifically).
But that's basically $5.12 per TB per month for storage.
 
 
Transfer Pricing
 
Data Transfer IN to B2 (upload)                 $0.000 per GB
Data Transfer OUT to the internet (download)
First 1GB / Month                               $0.000 per GB
Past  1GB / Month                               $0.050 per GB
 

That's $51 per TB per month transferred. This is by far, the cheapest option here.

And chances are that, unless you have a very fast connection, that's the rate you're going to be "stuck" at.

 

So, that boils down to $56/month to store and access 1TB per month. Your usage may vary, but this may get very expensive, very quickly (fortunately, upload is free, so getting the storage there isn't that expensive, it's getting it back that will be).

 

Additionally, Backblaze B2 charges you per transaction (API call), as well.

 

Request Pricing (API Calls)

DELETE bucket/file version, HIDE, UPLOAD Requests            Free

GET, DOWNLOAD file by ID/Name                                          $0.004 per 10,000 requests

Authorize, CREATE, GET, LIST, UPDATE Requests               $0.004 per 1000 requests


 

The first 2,500 requests are free each day, which is different from the other providers. However, as above, it's hard to predict the cost without actual usage.
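As a rough sketch of how the daily free allowance affects the bill: the example below assumes the 2,500 free requests apply per day to the request class being priced and do not carry over to the next day. That reading, and the function name, are assumptions for illustration.

```python
def b2_request_cost(daily_requests, price: float, per: int,
                    free_per_day: int = 2500) -> float:
    """Bill only the requests above the free daily allowance."""
    billable = sum(max(0, n - free_per_day) for n in daily_requests)
    return billable / per * price

# 30 days of 10,000 downloads per day at $0.004 per 10,000 requests
month = [10_000] * 30
print(f"${b2_request_cost(month, 0.004, 10_000):.2f}")
```

In this example 7,500 requests per day are billable, or 225,000 over the month.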

 

 


Is there a clear winner here? No. It depends on the availability you need, the amount of data and traffic, and how you want to use the provider.

 

Well, in regards to pricing, Backblaze is clearly the winner here. But given other issues with Backblaze (e.g., sourcing, reporting statistically insignificant findings, etc.), the question is "Will they be able to maintain their B2 business?" And that is a significant one. Only time will tell.
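For anyone who wants to redo the comparison with their own numbers, here is a small sketch. It uses only the first-tier prices from the tables in this post, assumes 1 TB = 1024 GB, and ignores request charges and the small free download allowances, so treat the output as a ballpark.

```python
GB_PER_TB = 1024

# provider: (storage $ per GB-month, download $ per GB), first-tier prices from this post
PRICES = {
    "Amazon S3 (Standard)": (0.0300, 0.090),
    "Azure Storage (LRS)": (0.0240, 0.087),
    "Google Cloud (Standard)": (0.0260, 0.120),
    "Backblaze B2": (0.0050, 0.050),
}

def monthly_cost(storage_tb: float, download_tb: float,
                 storage_price: float, download_price: float) -> float:
    """Monthly cost of storing storage_tb and downloading download_tb."""
    return GB_PER_TB * (storage_tb * storage_price + download_tb * download_price)

# Store 1 TB and download 1 TB per month, cheapest first
for name, prices in sorted(PRICES.items(), key=lambda kv: monthly_cost(1, 1, *kv[1])):
    print(f"{name:24s} ${monthly_cost(1, 1, *prices):7.2f}")
```

Changing the two `1` arguments lets you model a backup-style workload (lots stored, little downloaded) where the ranking matters much less than the storage price alone.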

Edited by Christopher (Drashna)
Added Backblaze B2

  • 1 month later...

I was really looking forward to possibly using the CloudPool solution for simple data backup to the cloud, but man, those are some pretty steep costs.

That's for the enterprise storage solutions. 

 

 

But yes, they are.  But you have to keep in mind the number of drives these companies are purchasing (even on a daily basis). Also, factor in the cost of their internet connection (which is a very redundant, multiple connection setup), the cost of powering just the datacenter, and then the cost of the cooling solution.  It's not cheap for them, either.

 

 

Also, Amazon's Glacier service is significantly cheaper, but not viable for our usage (it has up to a 4 hour wait time to access uploaded data!)

 

 

But you could have multiple Dropbox accounts, and pool them together. :)


  • 2 months later...
  • 2 years later...
On 10/10/2015 at 12:38 PM, Reptile said:

Google Drive costs $40 dollars / month for unlimited business storage. Rock solid provider with reasonable upload limits.

And is limited to uploading 750GB/day/account.  That's roughly 70-75 Mbps of sustained upload.
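The conversion from a daily cap to a sustained line rate is simple enough to check. A quick sketch, where the only assumption is whether "GB" means decimal or binary gigabytes:

```python
SECONDS_PER_DAY = 86_400

def sustained_mbps(bytes_per_day: float) -> float:
    """Average megabits per second needed to move bytes_per_day in 24 hours."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6

print(round(sustained_mbps(750e9), 1))           # 69.4 (750 GB, decimal)
print(round(sustained_mbps(750 * 1024**3), 1))   # 74.6 (750 GiB, binary)
```

So an upload connection faster than that will hit the daily cap before the day is over.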

And unlimited is TECHNICALLY only if you have 5+ accounts. So that's more than just $40.  They may not enforce this currently, but that may change. 


On 4/20/2018 at 6:05 AM, Leonardo said:

It would be nice if you could update this review  

For the most part, there isn't a need to.  Not a lot has changed, unfortunately. 

I could add Google Drive to the list, but for the most part, that may not be necessary. 

 

However, when we add more providers, it may become relevant to do so. 

