
Alex

Administrators
  • Posts: 253
  • Joined
  • Last visited
  • Days Won: 49

Reputation Activity

  1. Like
    Alex got a reaction from Minaleque in Large Chunks and the I/O Manager   
    In this post I'm going to talk about the new large storage chunk support in StableBit CloudDrive 1.0.0.421 BETA, why that's important, and how StableBit CloudDrive manages provider I/O overall.
     
    Want to Download it?
     
    Currently, StableBit CloudDrive 1.0.0.421 BETA is an internal development build and like any of our internal builds, if you'd like, you can download it (or most likely a newer build) here:
    http://wiki.covecube.com/Downloads
     
    The I/O Manager
     
    Before I start talking about chunks and what the current change actually means, let's talk a bit about how StableBit CloudDrive handles provider I/O. Well, first let's define what provider I/O actually is. Provider I/O is the combination of all of the read and write requests (or download and upload requests) that are serviced by your provider of choice. For example, if your cloud drive is storing data in Amazon S3, provider I/O consists of all of the download and upload requests from and to Amazon S3.
     
    Now it's important to differentiate provider I/O from cloud drive I/O, because they are not really the same thing. All I/O to and from the drive itself is performed directly in the kernel by our cloud drive driver (cloudfs_disk.sys). But as a result of some cloud drive I/O, provider I/O can be generated. For example, this happens when there is an incoming read request to the drive for some data that is not stored in the local cache. In this case, the kernel driver cooperatively coordinates with the StableBit CloudDrive system service in order to generate provider I/O and to complete the incoming read request in a timely manner.
     
    All provider I/O is serviced by the I/O Manager, which lives in the StableBit CloudDrive system service.
     
    Particularly, the I/O Manager is responsible for:
    • As an optimization, coalescing incoming provider read and write I/O requests into larger requests.
    • Parallelizing all provider read and write I/O requests using multiple threads.
    • Retrying failed provider I/O operations (see the sketch below).
    • Error handling and error reporting logic.
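    To make the parallelize-and-retry part a bit more concrete, here is a minimal sketch in Python. This is purely illustrative and not the actual service code; the upload_chunk function and its arguments are hypothetical placeholders, and request coalescing is left out for brevity.
```python
import concurrent.futures
import random
import time

def upload_chunk(chunk_id, data):
    # Hypothetical placeholder for a real provider upload call.
    return len(data)

def with_retries(fn, *args, max_attempts=5):
    # Retry a failed provider operation with exponential backoff plus jitter.
    for attempt in range(max_attempts):
        try:
            return fn(*args)
        except IOError:
            if attempt == max_attempts - 1:
                raise  # out of retries; error reporting would happen here
            time.sleep((2 ** attempt) + random.random())

def upload_chunks(chunks, threads=4):
    # Parallelize independent provider requests on a bounded thread pool.
    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(with_retries, upload_chunk, cid, data)
                   for cid, data in chunks.items()]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any error that survived the retries
```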
     
    Chunks
     
    Now that I've described a little bit about the I/O Manager in StableBit CloudDrive, let's talk chunks. StableBit CloudDrive doesn't inherently work with any type of chunks. Chunks are simply the format in which data is stored by your provider of choice. Chunking is implemented completely outside of the I/O Manager, and it provides some convenient functions that all chunked providers can use.
     
    How do Chunks Work?
     
    When a chunked cloud provider is first initialized, it is asked about its capabilities, such as whether it can perform partial reads, whether the remote server is performing proper multi-threaded transactional synchronization, etc... In other words, the chunking system needs to know how advanced the provider is, and based on those capabilities it will construct a custom chunk I/O processing pipeline for that particular provider.
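    As a rough illustration of the idea (a sketch only; the field names here are made up and are not the actual StableBit CloudDrive interface), the capability flags might drive pipeline construction like this:
```python
from dataclasses import dataclass

@dataclass
class ProviderCapabilities:
    partial_reads: bool        # can the provider return just a byte range of a chunk?
    partial_writes: bool       # can it update part of an already-stored chunk?
    server_side_locking: bool  # does the server synchronize concurrent access itself?

def build_chunk_pipeline(caps: ProviderCapabilities):
    # Assemble only the stages this particular provider actually needs.
    stages = ["chunk cache", "checksums"]
    if not caps.partial_reads or not caps.partial_writes:
        stages.append("partial-to-whole chunk translation")
    if not caps.server_side_locking:
        stages.append("read/write transactional locking")
    return stages
```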
     
    The chunk I/O pipeline provides automatic services for the provider such as:
    • Whole and partial caching of chunks for performance reasons.
    • Performing checksum insertion on write, and checksum verification on read.
    • Read or write (or both) transactional locking for cloud providers that require it (for example, never try to read chunk 458 when chunk 458 is being written to).
    • Translation of I/O that would end up being a partial chunk read / write request into a whole chunk read / write request, for providers that require this. This is actually very complicated. If a partial chunk needs to be read, and the provider doesn't support partial reads, the whole chunk is read (and possibly cached) and only the part that is needed is returned. If a partial chunk needs to be written, and the provider doesn't support partial writes, then the whole chunk is downloaded (or retrieved from the cache), only the part that needs to be written to is updated, and the whole chunk is written back. If, while this is happening, another partial write request comes in for the same chunk (in parallel, on a different thread), and we're still in the process of reading that whole chunk, then we coalesce the [whole read -> partial write -> whole write] into [whole read -> multiple partial writes -> whole write]. This is purely done as an optimization and is also very complicated. (A sketch of the basic read / write translation follows this list.)
     
    And in the future the chunk I/O processing pipeline can be extended to support other services as the need arises.
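    Here is the read / write translation from the list above expressed as a minimal sketch. The provider.download_chunk / provider.upload_chunk calls and the cache object are hypothetical, a 1 MB chunk size is assumed, and the parallel-write coalescing optimization is omitted.
```python
CHUNK_SIZE = 1024 * 1024  # 1 MB chunks

def read_partial(provider, cache, chunk_id, offset, length):
    # Serve a partial read from a provider that can only return whole chunks.
    chunk = cache.get(chunk_id)
    if chunk is None:
        chunk = provider.download_chunk(chunk_id)  # whole-chunk download
        cache[chunk_id] = chunk                    # possibly cache it
    return chunk[offset:offset + length]

def write_partial(provider, cache, chunk_id, offset, data):
    # Read-modify-write: fetch the whole chunk, patch the part that changed,
    # and write the whole chunk back to the provider.
    chunk = bytearray(read_partial(provider, cache, chunk_id, 0, CHUNK_SIZE))
    chunk[offset:offset + len(data)] = data
    provider.upload_chunk(chunk_id, bytes(chunk))
    cache[chunk_id] = bytes(chunk)
```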
     
    Large Chunk Support
     
    Speaking of extending the chunk I/O pipeline, that's exactly what happened recently with the addition of large chunk support (> 1 MB) for most cloud providers.
     
    Previously, most cloud providers were limited to a maximum chunk size of 1 MB. This limit was in place because:
    • Cloud drive read I/O requests, which can't be satisfied by the local cache, would generate provider read I/O that needed to be satisfied fairly quickly. For providers that didn't support partial reads, this meant that the entire chunk needed to be downloaded every time, no matter how much data was being read.
    • Additionally, if checksumming was enabled (which would be typical), then by necessity, only whole chunks could be read and written.
     
    This had some disadvantages, mostly for users with fast broadband connections:
    • Writing a lot of data to the provider would generate a lot of upload requests very quickly (one request per 1 MB uploaded). This wasn't optimal because each request would add some overhead.
    • Generating a lot of upload requests very quickly was also an issue for some cloud providers that were limiting their users based on the number of requests per second, rather than the total bandwidth used. Using smaller chunks with a fast broadband connection and a lot of threads would generate a lot of requests per second.
     
    Now, with large chunk support (up to 100 MB per chunk in most cases), we don't have those disadvantages.
     
    What was changed to allow for large chunks?
    • In order to support the new large chunks, a provider has to support partial reads. That's because it's still very necessary to ensure that all cloud drive read I/O is serviced quickly.
    • Support for a new block-based checksumming algorithm was introduced into the chunk I/O pipeline. With this new algorithm it's no longer necessary to read or write whole chunks in order to get checksumming support. This was crucial because it is very important to verify that your data in the cloud is not corrupted, and turning off checksumming wasn't a very good option. (A sketch of the block-based approach follows this list.)
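    The block-based checksumming idea, sketched under the assumption that a signature is stored after every 1 MB block of a chunk. The real on-disk layout may differ; this only shows why a single block can now be verified without reading the whole chunk.
```python
import hashlib
import hmac

BLOCK = 1024 * 1024   # checksum every 1 MB unit independently
TAG_LEN = 64          # HMAC-SHA512 digest size in bytes

def add_block_checksums(chunk: bytes, key: bytes) -> bytes:
    # Append a signature after every 1 MB block of the chunk.
    out = bytearray()
    for i in range(0, len(chunk), BLOCK):
        block = chunk[i:i + BLOCK]
        out += block
        out += hmac.new(key, block, hashlib.sha512).digest()
    return bytes(out)

def read_verified_block(stored: bytes, block_index: int, key: bytes) -> bytes:
    # Verify and return a single block without touching the rest of the chunk.
    stride = BLOCK + TAG_LEN
    start = block_index * stride
    block = stored[start:start + BLOCK]
    tag = stored[start + BLOCK:start + stride]
    if not hmac.compare_digest(hmac.new(key, block, hashlib.sha512).digest(), tag):
        raise ValueError("checksum mismatch in block %d" % block_index)
    return block
```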
     
    Are there any disadvantages to large chunks?
    • If you're concerned about using the least possible amount of bandwidth (as opposed to making fewer provider calls per second), it may be advantageous to use smaller chunks. If you know for sure that you will be storing relatively small files (1-2 MB per file or less) and you will only be updating a few files at a time, there may be less overhead when you use smaller chunks.
    • For providers that don't support partial writes (most cloud providers), larger chunks are more optimal if most of your files are > 1 MB, or if you will be updating a lot of smaller files all at the same time.
    • As far as checksumming goes, the new algorithm completely supersedes the old one and is enabled by default on all new cloud drives. It really has no disadvantages over the old algorithm. Older cloud drives will continue to use the old checksumming algorithm, which really only has the disadvantage of not supporting large chunks.
     
    Which providers currently support large chunks?
    I don't want to post a list here because it would become obsolete as we add more providers. When you create a new cloud drive, look under Advanced Settings. If you see a storage chunk size > 1 MB available, then that provider supports large chunks.
    Going Chunk-less in the Future?
     
    I should mention that StableBit CloudDrive doesn't actually require chunks. If it makes sense for a particular provider, it really doesn't have to store its data in chunks at all. For example, it's entirely possible that StableBit CloudDrive will have a provider that stores its data in a VHDX file. Sure, why not. For that matter, StableBit CloudDrive providers don't even need to store their data in files at all. I can imagine writing providers that can store their data on an email server or an NNTP server (a bit of a stretch, yes, but theoretically possible).
     
    In fact, the only thing that StableBit CloudDrive really needs is some system that can save data and later retrieve it (in a timely manner). In that sense, StableBit CloudDrive can be a general purpose drive virtualization solution.
  2. Like
    Alex got a reaction from Minaleque in SSD Optimizer Balancing Plugin   
    I've just finished coding a new balancing plugin for StableBit DrivePool: it's called the SSD Optimizer. This was actually a feature request, so here you go.
     
    I know that a lot of people use the Archive Optimizer plugin with SSD drives and I would like this plugin to replace the Archive Optimizer for that kind of balancing scenario.
     
    The idea behind the SSD Optimizer (as it was with the Archive Optimizer) is that if you have one or more SSDs in your pool, they can serve as "landing zones" for new files, and those files will later be automatically migrated to your slower spinning drives. Thus your SSDs would serve as a kind of super fast write buffer for the pool.
     
    The new functionality of the SSD Optimizer is that now it's able to fill your Archive disks one at a time, like the Ordered File Placement plugin, but with support for SSD "Feeder" disks.
     
    Check out the attached screenshot of what it looks like
     
    Notes: http://dl.covecube.com/DrivePoolBalancingPlugins/SsdOptimizer/Notes.txt
    Download: http://stablebit.com/DrivePool/Plugins
     
    Edit: Now up on stablebit.com, link updated.

  3. Like
    Alex got a reaction from Antoineki in Amazon Cloud Drive - Why is it not supported?   
    Some people have mentioned that we're not transparent enough with regards to what's going on with Amazon Cloud Drive support. That was definitely not the intention, and if that's how you guys feel, then I'm sorry about that and I want to change that. We really have nothing to gain by keeping any of this secret.
     
    Here, you will find a timeline of exactly what's happening with our ongoing communications with the Amazon Cloud Drive team, and I will keep this thread up to date as things develop.
     
    Timeline:
    • On May 28th the first public BETA of StableBit CloudDrive was released without Amazon Cloud Drive production access enabled. At the time, I thought that the semi-automated whitelisting process that we went through was "Production Access". While this is similar to other providers, like Dropbox, it became apparent that for Amazon Cloud Drive, it's not. Upon closer reading of their documentation, it appears that the whitelisting process actually imposed "Developer Access" on us. To get upgraded to "Production Access", Amazon Cloud Drive requires direct email communication with the Amazon Cloud Drive team.
    • We originally submitted our application for production access approval on Aug. 15th over email.
    • On Aug. 24th I wrote in requesting a status update, because no one had replied to me, so I had no idea whether the email had been read or not.
    • On Aug. 27th I finally got an email back letting me know that our application was approved for production use. I was pleasantly surprised.
    • On Sept. 1st, after some testing, I wrote another email to Amazon letting them know that we were having issues with Amazon not respecting the Content-Type of uploads, and that we were also having issues with the new permission scopes that had been changed since we initially submitted our application for approval. No one answered this particular email until...
    • On Sept. 7th I received a panicked email from Amazon addressed to me (with CCs to other Amazon addresses) letting me know that Amazon was seeing unusual call patterns from one of our users.
    • On Sept. 11th I replied explaining that we do not keep track of what our customers are doing, and that our software can scale very well, as long as the server permits it and the network bandwidth is sufficient. Our software does respect 429 throttling responses from the server and it does perform exponential backoff, as is standard practice in such cases. Nevertheless, I offered to limit the number of threads that we use, or to apply any other limits that Amazon deems necessary on the client side. I highlighted on this page the key question that really needs to be answered. Also, please note that Amazon obviously knows who this user is, since they have to be an Amazon customer in order to be logged into Amazon Cloud Drive. Also note that instead of banning or throttling that particular customer, Amazon has chosen to block the entire user base of our application.
    • On Sept. 16th I received a response from Amazon.
    • On Sept. 21st I hadn't heard anything back yet, so I sent them this.
    • I waited until Oct. 29th and no one answered. At that point I informed them that we were going ahead with the next public BETA regardless.
    • Some time in the beginning of November an Amazon employee, on the Amazon forums, started claiming that we weren't answering our emails. This was amidst their users asking why Amazon Cloud Drive is not supported by StableBit CloudDrive. So I did post a reply to that, listing a similar timeline.
    • On Nov. 11th I sent another email to them.
    • On Nov. 13th Amazon finally replied. I redacted the limits; I don't know if they want those public.
    • On Nov. 19th I sent them another email asking them to clarify what the limits mean exactly.
    • On Nov. 25th Amazon replied with some questions and concerns regarding the number of API calls per second.
    • On Dec. 2nd I replied with the answers and a new build that implemented large chunk support. This was a fairly complicated change to our code designed to minimize the number of calls per second for large file uploads.

    You can read my Nuts & Bolts post on large chunk support here: http://community.covecube.com/index.php?/topic/1622-large-chunks-and-the-io-manager/
     
    On Dec. 10th 2015, Amazon replied. I have no idea what they mean regarding the encrypted data size. AES-CBC is a 1:1 encryption scheme: some number of bytes go in, and the same exact number of bytes come out, encrypted. We do have some minor overhead for the checksum / authentication signature at the end of every 1 MB unit of data, but that's at most 64 bytes per 1 MB when using HMAC-SHA512 (which is 0.006 % overhead). You can easily verify this by creating an encrypted cloud drive of some size, filling it up to capacity, and then checking how much data is used on the server.

    Here's the data for a 5 GB encrypted drive:

    Yes, it's 5 GBs.  
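    For anyone who wants to check the math behind that 0.006 % figure, here is the arithmetic as a small Python snippet (assuming the 64-byte HMAC-SHA512 signature per 1 MB described above):
```python
drive_bytes = 5 * 1024**3                    # a 5 GB drive
units = drive_bytes // (1024**2)             # 5,120 one-megabyte units
overhead_bytes = units * 64                  # 327,680 bytes, about 0.3 MB
print(100.0 * overhead_bytes / drive_bytes)  # ~0.0061 % overhead
```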
     
    To clarify the error issue here (I'm not 100% certain about this): Amazon doesn't provide a good way to ID these files. We have to try to upload them again, and then grab the error ID to get the actual file ID so we can update the file. This is inefficient and would be solved with a more robust API that included search functionality, or a better file list call. So, this is basically "by design" and required currently. -Christopher
     
    Unfortunately, we haven't pursued this with Amazon recently. This is due to a number of big bugs that we have been following up on. However, these bugs have led to a lot of performance, stability and reliability fixes, and a lot of users have reported that these fixes have significantly improved the Amazon Cloud Drive provider. That is great to hear, as it may help to get the provider to a stable/reliable state. That said, once we get to a more stable state (after the next public beta build (after 1.0.0.463) or a stable/RC release), we do plan on pursuing this again. But in the meanwhile, we have held off on this as we want to focus on the entire product rather than a single, problematic provider. -Christopher
     
    Amazon has "snuck in" additional guidelines that don't bode well for us: https://developer.amazon.com/public/apis/experience/cloud-drive/content/developer-guide
    "Don't build apps that encrypt customer data"
    What does this mean for us? We have no idea right now.  Hopefully, this is a guideline and not a hard rule (other apps allow encryption, so that's hopeful, at least). 
     
    But if we don't get re-approved, we'll deal with that when the time comes (though, we will push hard to get approval).
     
    - Christopher (Jan 15. 2017)
     
    If you haven't seen already, we've released a "gold" version of StableBit CloudDrive. Meaning that we have an official release! 
    Unfortunately, because of increasing issues with Amazon Cloud Drive that appear to be ENTIRELY server side (drive issues due to "429" errors, odd outages, etc), and because we are STILL not approved for production status (despite sending off additional emails a month ago requesting approval, or at least an update), we have dropped support for Amazon Cloud Drive. 

    This does not impact existing users, as you will still be able to mount and use your existing drives. However, we have blocked the ability to create new drives for Amazon Cloud Drive.   
     
    This was not a decision that we made lightly, and while we don't regret this decision, we are saddened by it. We would have loved to come to some sort of outcome that included keeping full support for Amazon Cloud Drive. 
    -Christopher (May 17, 2017)
  4. Like
    Alex got a reaction from Antoineki in SSD Optimizer Balancing Plugin   

  5. Like
    Alex got a reaction from Minaleque in Forum Downtime   
    Our forum, wiki and blog web sites experienced an issue with the database server that caused those sites to be down for the past 34 hours. The issue has been resolved and everything is back up and running. I'm sorry for the inconvenience.
     
    StableBit.com, the download server, and software activation services were not affected.
  6. Like
    Alex got a reaction from KiaraEvirm in SSD Optimizer Balancing Plugin   

  7. Like
    Alex reacted to Christopher (Drashna) in Amazon Cloud Drive - Why is it not supported?   
    To be honest, it's not really a high priority as we want to get a stable version of StableBit CloudDrive out sooner rather than later. And fighting with Amazon Cloud Drive may take weeks, months or longer. 
     
    However, the "Chunk 0" issue is new, due to some of the backend changes. And Alex is aware of it, and looking into it. 
    Edit: Fixed in the 1.0.0.459 build: http://dl.covecube.com/CloudDriveWindows/beta/download/
     
     
    But again, we really do recommend against using the Amazon Cloud Drive provider right now. As we've said, it's just not reliable or stable. 
  8. Like
    Alex reacted to Christopher (Drashna) in Originality and Sci Fi   
    Cool!
    And, just add onto my already long list of things to read.
  9. Like
    Alex got a reaction from steffenmand in Question about loading multiple files   
    Technically speaking, StableBit CloudDrive doesn't really deal with files directly. It actually emulates a virtual disk for the Operating System, and other applications or the Operating System can then use that disk for whatever purpose that suits them.
     
    Normally, a file system kernel driver (such as NTFS) will be mounted on the disk and will provide file based access. So when you say that you're accessing 2 files on the cloud drive, you're actually dealing with the file system (typically NTFS). The file system then translates your file-based requests (e.g. read 10248 bytes at offset 0 of file \Test.bin) into disk-based requests (e.g. read 10248 bytes at position 12,447,510 on disk #4).
     
    But you may ask, well, eventually these file-based requests must be scheduled by something, so who prioritizes which requests get serviced first?
     
    Yes, and here's how that works:
    • The I/O pipeline in the kernel is inherently asynchronous. In other words, multiple I/O requests can be "in-flight" at the same time.
    • Scheduling of disk-based I/O requests occurs right before our virtual disk, by the NT disk subsystem (multiple drivers involved). It determines which requests get serviced first by using the concept of I/O priorities.
     
    To read more about this topic (probably much more than you'd ever want to know) see:
    https://www.microsoftpressstore.com/articles/article.aspx?p=2201309&seqNum=3
     
    Skip to the I/O Prioritization section for the relevant information. You can also read about the concept of bandwidth reservation which deals with ensuring smooth media playback.
  10. Like
    Alex got a reaction from danfer in The Roadmap   
    As of 1/28/2014, in an effort to streamline the development process and to centralize the management of planned feature requests I've moved the current development status to our development wiki.
     
    You can access it here:
    http://wiki.covecube.com/Development_Status
     
    In addition to implementing new features we now have a separate system for identifying and tracking existing issues. See the wiki page for a bit more on that.
  11. Like
    Alex got a reaction from Ginoliggime in SSD Optimizer Balancing Plugin   

  12. Like
    Alex got a reaction from Shane in check-pool-fileparts   
    If you're not familiar with dpcmd.exe, it's the command line interface to StableBit DrivePool's low level file system and was originally designed for troubleshooting the pool. It's a standalone EXE that's included with every installation of StableBit DrivePool 2.X and is available from the command line.
     
    If you have StableBit DrivePool 2.X installed, go ahead and open up the Command Prompt with administrative access (hold Ctrl + Shift from the Start menu), and type in dpcmd to get some usage information.
     
    Previously, I didn't recommend that people mess with this command because it wasn't really meant for public consumption. But the latest internal build of StableBit DrivePool, 2.2.0.659, includes a completely rewritten dpcmd.exe which now has some more useful functions for more advanced users of StableBit DrivePool, and I'd like to talk about some of these here.
     
    Let's start with the new check-pool-fileparts command.
     
    This command can be used to:
    • Check the duplication consistency of every file on the pool and show you any inconsistencies.
    • Report any inconsistencies found to StableBit DrivePool for corrective action.
    • Generate detailed audit logs, including the exact locations where each file part of each file on the pool is stored.
     
    Now let's see how this all works. The new dpcmd.exe includes detailed usage notes and examples for some of the more complicated commands like this one.
     
    To get help on this command type: dpcmd check-pool-fileparts
     
    Here's what you will get:
     
    dpcmd - StableBit DrivePool command line interface
    Version 2.2.0.659
    
    The command 'check-pool-fileparts' requires at least 1 parameters.
    
    Usage:
      dpcmd check-pool-fileparts [parameter1 [parameter2 ...]]
    
    Command:
      check-pool-fileparts - Checks the file parts stored on the pool for consistency.
    
    Parameters:
      poolPath    - A path to a directory or a file on the pool.
      detailLevel - Detail level to output (0 to 4). (optional)
      isRecursive - Is this a recursive listing? (TRUE / false) (optional)
    
    Detail levels:
      0 - Summary
      1 - Also show directory duplication status
      2 - Also show inconsistent file duplication details, if any (default)
      3 - Also show all file duplication details
      4 - Also show all file part details
    
    Examples:
      - Perform a duplication check over the entire pool, show any inconsistencies, and inform StableBit DrivePool
        >dpcmd check-pool-fileparts P:\
      - Perform a full duplication check and output all file details to a log file
        >dpcmd check-pool-fileparts P:\ 3 > Check-Pool-FileParts.log
      - Perform a full duplication check and just show a summary
        >dpcmd check-pool-fileparts P:\ 0
      - Perform a check on a specific directory and its sub-directories
        >dpcmd check-pool-fileparts P:\MyFolder
      - Perform a check on a specific directory and NOT its sub-directories
        >dpcmd check-pool-fileparts "P:\MyFolder\Specific Folder To Check" 2 false
      - Perform a check on one specific file
        >dpcmd check-pool-fileparts "P:\MyFolder\File To Check.exe"
    
    The above help text includes some concrete examples on how to use this command for various scenarios. To perform a basic check of an entire pool and get a summary back, you would simply type:
    dpcmd check-pool-fileparts P:\
     
    This will scan your entire pool and make sure that the correct number of file parts exist for each file. At the end of the scan you will get a summary:
    Scanning...
    
    ! Error: Can't get duplication information for '\\?\p:\System Volume Information\storageconfiguration.xml'. Access is denied
    
    Summary:
      Directories: 3,758
      Files: 47,507   3.71 TB (4,077,933,565,417 bytes)
      File parts: 48,240   3.83 TB (4,214,331,221,046 bytes)
    
      * Inconsistent directories: 0
      * Inconsistent files: 0
      * Missing file parts: 0   0 B (0 bytes)
    
      ! Error reading directories: 0
      ! Error reading files: 1
    
    Any inconsistent files will be reported here, and any scan errors will be as well. For example, in this case I can't scan the System Volume Information folder because, as an Administrator, I don't have the proper access to do that (LOCAL SYSTEM does).
     
    Another great use for this command is actually something that has been requested often, and that is the ability to generate audit logs. People want to be absolutely sure that each file on their pool is properly duplicated, and they want to know exactly where it's stored. This is where the maximum detail level of this command comes in handy:
    dpcmd check-pool-fileparts P:\ 4
     
    This will show you how many copies are stored of each file on your pool, and where they're stored.
     
    The output looks something like this:
    Detail level: File Parts
    
    Listing types:
      +  Directory
      -  File
      -> File part
      *  Inconsistent duplication
      !  Error
    
    Listing format:
      [{0}/{1} IM] {2}
      {0} - The number of file parts that were found for this file / directory.
      {1} - The expected duplication count for this file / directory.
      I   - This directory is inheriting its duplication count from its parent.
      M   - At least one sub-directory may have a different duplication count.
      {2} - The name and size of this file / directory.
    
    ...
    + [3x/2x] p:\Media
      -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media [Device 0]
      -> \Device\HarddiskVolume3\PoolPart.6a76681a-3600-4af1-b877-a31815b868c8\Media [Device 0]
      -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media [Device 2]
    - [2x/2x] p:\Media\commandN Episode 123.mov (80.3 MB - 84,178,119 bytes)
      -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 123.mov [Device 0]
      -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 123.mov [Device 2]
    - [2x/2x] p:\Media\commandN Episode 124.mov (80.3 MB - 84,178,119 bytes)
      -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 124.mov [Device 0]
      -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 124.mov [Device 2]
    ...
    
    The listing format and listing types are explained at the top, and then a record like the one above is generated for each folder and file on the pool.
     
    Of course like any command output, it could always be piped into a log file like so:
    dpcmd check-pool-fileparts P:\ 4 > check-pool-fileparts.log
     
    I'm sure that with a bit of scripting, people will be able to generate daily audit logs of their pool.
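    For example, here is a minimal sketch of such a script. Python is used here just for illustration (a batch file or PowerShell would work just as well); it assumes dpcmd is on the PATH and that P:\ is your pool, as in the examples above, and it could be run daily from Task Scheduler.
```python
import datetime
import subprocess

# Run a full-detail check and save the output to a dated log file.
log_name = "check-pool-fileparts-%s.log" % datetime.date.today().isoformat()
with open(log_name, "w") as log:
    subprocess.run(["dpcmd", "check-pool-fileparts", "P:\\", "4"],
                   stdout=log, stderr=subprocess.STDOUT, check=False)
```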
     
    Now this is essentially the first version of this command, so if you have an idea on how to improve it, please let us know.
     
    Also, check out set-duplication-recursive. It lets you set the duplication count on multiple folders at once using a file pattern rule (or a regular expression). It's pretty cool.
     
    That's all for now.
  13. Like
    Alex reacted to Christopher (Drashna) in Amazon Cloud Drive, near future.?   
    Specifically, it scales better than Amazon is able to accommodate (because they're not throttling properly on the server side).  
  14. Like
    Alex got a reaction from jaynew in Large Chunks and the I/O Manager   
  15. Like
    Alex got a reaction from Kraevin in +1 for GoogleDrive for Work support   
    Ok, I have some good news and more good news regarding the Google Drive provider.
     
    I've been playing around with the Google Drive provider and it had some issues that were not related to OAuth at all. In fact, the problem was that Google was returning 403 Forbidden HTTP status codes when too many requests were sent. By default, StableBit CloudDrive treats all 403 codes as an OAuth token failure. But Google, being Google, has a very thorough documentation page explaining all of the various error codes, what they mean and how to handle them.
     
    Pretty awesome: https://developers.google.com/drive/web/handle-errors
     
    So I've gone ahead and implemented all of that information into the Google Drive provider.
     
    Another really neat thing that Google does for their APIs is not hide the limits.
     
    Check this out:

    That's absolute heaven for a developer. They even have a link to apply for a higher quota, how cool is that? (Are you listening Amazon?)
     
    And I can even edit the per user limit myself!
     
    So having fixed the 403 bug mentioned above, StableBit CloudDrive started working pretty well. However, it was bumping into that 10 requests / second limit. But this wasn't a huge problem because Google was kind enough to encourage exponential backoff (they suggested that very thing in the docs, and even provided the exact formula to use).
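    A rough sketch of that logic in Python. The reason strings and the shape of the error body follow Google's linked error-handling page, and the request callable is a stand-in for the actual HTTP call; this is illustrative, not the provider's real code.
```python
import json
import random
import time

RETRYABLE_403_REASONS = {"rateLimitExceeded", "userRateLimitExceeded"}

def is_rate_limit_403(body: str) -> bool:
    # A 403 from Google can mean several different things; only the
    # rate-limit reasons should be retried, not treated as an OAuth failure.
    try:
        reason = json.loads(body)["error"]["errors"][0]["reason"]
    except (ValueError, KeyError, IndexError):
        return False
    return reason in RETRYABLE_403_REASONS

def call_with_backoff(request, max_attempts=5):
    # Exponential backoff with random jitter, as Google's docs suggest.
    for attempt in range(max_attempts):
        status, body = request()
        if status != 403 or not is_rate_limit_403(body):
            return status, body  # success, or an error we shouldn't retry
        time.sleep((2 ** attempt) + random.random())
    return status, body
```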
     
    Having fixed Google Drive support, I started thinking... Hmm... Google doesn't seem to care about how much bandwidth you're using; all they care about is the number of requests. So how awesome would it be if I implemented chunks larger than 1 MB? This could dramatically lower the number of requests per second and at the same time increase overall throughput. I know I said in the past that I wouldn't do that, but with Google being so upfront with their limits, I decided to give it a shot.
     
    And I implemented it, and it works. But there's one caveat. Chunks > 1 MB cannot have checksumming enabled. Because in order to verify a checksum of a chunk, the entire thing needs to be downloaded, and in order to make large chunks work I had to enable partial chunk downloading. There may be a way around this if we change our checksumming algorithm to checksum every 1MB, regardless of the actual chunk size. But that's not what it does now.
     
    Download the latest internal BETA (1.0.0.410): http://dl.covecube.com/CloudDriveWindows/beta/download/
     
    Please note that this is very much an experimental build and I haven't stress tested it at all, so I have no idea how it will behave under load yet. I've started uploading 900 GB right now using 10 MB chunks, and we'll see how that goes.
  16. Like
    Alex reacted to Christopher (Drashna) in Amazon Cloud Drive - Why is it not supported?   
    They did respond, but it's ... well more of "developer" stuff. However, the outlook is good. (basically, it looks like the calls per second are within acceptable values).  
     
    It looks like Alex hasn't had a chance to respond (they replied on Wednesday... so really bad timing, basically). But hopefully, the ball is actually rolling!
  17. Like
    Alex reacted to Christopher (Drashna) in Amazon Glacier for Archival purposes?   
    Nope. 
     
    The problem is that there is a 4+ hour wait period to be able to access the data that we've uploaded. That basically makes it unusable for us. Both because file systems like to recheck data, and because it would make the upload verification impossible.
     
    We have already looked into it, and discovered this. Sorry.
  18. Like
    Alex reacted to Christopher (Drashna) in RSS feed for CloudDrive forum   
    Fixed.
     
     
    You can also "follow" the forum (button at top) to get email notifications of topics.
  19. Like
    Alex reacted to Christopher (Drashna) in NetDrive now having same issue as CloudDrive?   
    It's  not quite the same as what happened to us. 
     
    However, it does confirm that Amazon is absolutely having infrastructure issues with Amazon Cloud Drive. 
    Basically, too much traffic generated and their network and database infrastructures are struggling to deal with all the demand. 
  20. Like
    Alex reacted to Christopher (Drashna) in Amazon Cloud Drive - Why is it not supported?   
    The problem with a faster internet connection is that it isn't that simple. Just because you may have gigabit internet doesn't mean that the server you're connecting to can support that speed.   
     
    But to summarize here, it's a complicated issue, and we do plan on re-evaluating in the "near future".
     
    But this won't affect the performance of the drive. Specifically, each chunk is accessed by a single thread. That means that the more threads you use, the more of these chunks are accessed in parallel.  So, with enough threads, you could ABSOLUTELY saturate your bandwidth.
     
    Actually, this SPECIFICALLY was the issue with Amazon Cloud Drive... and what Alex means by "scales well".
     
     
    To clarify, we had at least one user who was uploading at 45 MB/s (360 Mbps). That's with ~45 x 1 MB chunks at the same time.
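    As a back-of-the-envelope check on those numbers (illustrative only, assuming each 1 MB request takes roughly one second end to end):
```python
chunk_mb = 1                # chunk size in MB
threads = 45                # parallel upload threads
seconds_per_request = 1.0   # rough end-to-end time per request (assumption)
throughput_mb_s = chunk_mb * threads / seconds_per_request
print(throughput_mb_s)      # ~45 MB/s, i.e. roughly 360 Mbps
```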
     
    So you may be able to see why the 1 MB size really isn't a bad limit here. It ensures responsiveness of the file system, even under heavy load.
     
    See above.  Larger chunk sizes aren't strictly necessary.  And we do plan on re-evaluating this issue in the future.  
     
    But currently, there isn't a way to get the checksum without downloading the entire chunk. Anything over 1MB will be using the partial chunk retrieval.   And changing this would require rewriting the providers.
     
     
    And there is still the upload verification option.
  21. Thanks
    Alex got a reaction from Jaga in check-pool-fileparts   
  22. Like
    Alex got a reaction from CosmicPuppy in check-pool-fileparts   
    If you're not familiar with dpcmd.exe, it's the command line interface to StableBit DrivePool's low level file system and was originally designed for troubleshooting the pool. It's a standalone EXE that's included with every installation of StableBit DrivePool 2.X and is available from the command line.
     
    If you have StableBit DrivePool 2.X installed, go ahead and open up the Command Prompt with administrative access (hold Ctrl + Shift from the Start menu), and type in dpcmd to get some usage information.
     
    Previously, I didn't recommend that people mess with this command because it wasn't really meant for public consumption. But the latest internal build of StableBit DrivePool, 2.2.0.659, includes a completely rewritten dpcmd.exe which now has some more useful functions for more advanced users of StableBit DrivePool, and I'd like to talk about some of these here.
     
    Let's start with the new check-pool-fileparts command.
     
    This command can be used to:
    Check the duplication consistency of every file on the pool and show you any inconsistencies. Report any inconsistencies found to StableBit DrivePool for corrective actions. Generate detailed audit logs, including the exact locations where each file part is stored of each file on the pool. Now let's see how this all works. The new dpcmd.exe includes detailed usage notes and examples for some of the more complicated commands like this one.
     
    To get help on this command type: dpcmd check-pool-fileparts
     
    Here's what you will get:
     
    dpcmd - StableBit DrivePool command line interface Version 2.2.0.659 The command 'check-pool-fileparts' requires at least 1 parameters. Usage: dpcmd check-pool-fileparts [parameter1 [parameter2 ...]] Command: check-pool-fileparts - Checks the file parts stored on the pool for consistency. Parameters: poolPath - A path to a directory or a file on the pool. detailLevel - Detail level to output (0 to 4). (optional) isRecursive - Is this a recursive listing? (TRUE / false) (optional) Detail levels: 0 - Summary 1 - Also show directory duplication status 2 - Also show inconsistent file duplication details, if any (default) 3 - Also show all file duplication details 4 - Also show all file part details Examples: - Perform a duplication check over the entire pool, show any inconsistencies, and inform StableBit DrivePool >dpcmd check-pool-fileparts P:\ - Perform a full duplication check and output all file details to a log file >dpcmd check-pool-fileparts P:\ 3 > Check-Pool-FileParts.log - Perform a full duplication check and just show a summary >dpcmd check-pool-fileparts P:\ 0 - Perform a check on a specific directory and its sub-directories >dpcmd check-pool-fileparts P:\MyFolder - Perform a check on a specific directory and NOT its sub-directories >dpcmd check-pool-fileparts "P:\MyFolder\Specific Folder To Check" 2 false - Perform a check on one specific file >dpcmd check-pool-fileparts "P:\MyFolder\File To Check.exe" The above help text includes some concrete examples on how to use this commands for various scenarios. To perform a basic check of an entire pool and get a summary back, you would simply type:
    dpcmd check-pool-fileparts P:\
     
    This will scan your entire pool and make sure that the correct number of file parts exist for each file. At the end of the scan you will get a summary:
Scanning...

! Error: Can't get duplication information for '\\?\p:\System Volume Information\storageconfiguration.xml'. Access is denied

Summary:

  Directories: 3,758
  Files: 47,507          3.71 TB (4,077,933,565,417 B)
  File parts: 48,240     3.83 TB (4,214,331,221,046 B)

  * Inconsistent directories: 0
  * Inconsistent files: 0
  * Missing file parts: 0     0 B (0 B)

  ! Error reading directories: 0
  ! Error reading files: 1

Any inconsistent files will be reported here, as will any scan errors. For example, in this case I can't scan the System Volume Information folder because, as an Administrator, I don't have the proper access to do that (LOCAL SYSTEM does).
     
Another great use for this command is something that has been requested quite often: the ability to generate audit logs. People want to be absolutely sure that each file on their pool is properly duplicated, and they want to know exactly where it's stored. This is where the maximum detail level of this command comes in handy:
    dpcmd check-pool-fileparts P:\ 4
     
    This will show you how many copies are stored of each file on your pool, and where they're stored.
     
    The output looks something like this:
Detail level: File Parts

Listing types:

  +  Directory
  -  File
  -> File part
  *  Inconsistent duplication
  !  Error

Listing format:

  [{0}/{1} IM] {2}

  {0} - The number of file parts that were found for this file / directory.
  {1} - The expected duplication count for this file / directory.
  I   - This directory is inheriting its duplication count from its parent.
  M   - At least one sub-directory may have a different duplication count.
  {2} - The name and size of this file / directory.

...

+ [3x/2x] p:\Media
  -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media [Device 0]
  -> \Device\HarddiskVolume3\PoolPart.6a76681a-3600-4af1-b877-a31815b868c8\Media [Device 0]
  -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media [Device 2]

- [2x/2x] p:\Media\commandN Episode 123.mov (80.3 MB - 84,178,119 B)
  -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 123.mov [Device 0]
  -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 123.mov [Device 2]

- [2x/2x] p:\Media\commandN Episode 124.mov (80.3 MB - 84,178,119 B)
  -> \Device\HarddiskVolume2\PoolPart.5823dcd3-485d-47bf-8cfa-4bc09ffca40e\Media\commandN Episode 124.mov [Device 0]
  -> \Device\HarddiskVolume8\PoolPart.d1033a47-69ef-453a-9fb4-337ec00b1451\Media\commandN Episode 124.mov [Device 2]

...

The listing format and listing types are explained at the top, and then a record like the ones above is generated for each folder and file on the pool.
     
Of course, like any command output, it can always be redirected to a log file like so:
    dpcmd check-pool-fileparts P:\ 4 > check-pool-fileparts.log
     
I'm sure that with a bit of scripting, people will be able to generate daily audit logs of their pool.
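For instance, here's a minimal sketch of what such a script could look like. The pool drive letter (P:\), the C:\PoolAudit output folder, and the assumption that dpcmd.exe is on the PATH are all just placeholders for this example; adjust them for your own setup. Scheduled daily with Windows Task Scheduler (running elevated, since dpcmd needs administrative access), it would produce one dated audit log per day:

# A minimal sketch of a daily audit script (not an official StableBit tool).
# Assumptions: the pool is mounted at P:\, dpcmd.exe is on the PATH, and logs
# should go to C:\PoolAudit -- adjust all of these for your own setup.

import subprocess
from datetime import date
from pathlib import Path

log_dir = Path(r"C:\PoolAudit")                  # hypothetical output folder
log_dir.mkdir(parents=True, exist_ok=True)
log_file = log_dir / f"check-pool-fileparts-{date.today():%Y-%m-%d}.log"

with open(log_file, "w") as out:
    # Full-detail (level 4) check of the entire pool, same as the command shown above.
    subprocess.run(["dpcmd", "check-pool-fileparts", "P:\\", "4"], stdout=out)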
     
    Now this is essentially the first version of this command, so if you have an idea on how to improve it, please let us know.
     
    Also, check out set-duplication-recursive. It lets you set the duplication count on multiple folders at once using a file pattern rule (or a regular expression). It's pretty cool.
     
    That's all for now.
  23. Like
    Alex got a reaction from Christopher (Drashna) in +1 for GoogleDrive for Work support   
    Yeah, it's basically just a flip of a switch if this works. All providers use a common infrastructure for chunking.
  24. Like
    Alex got a reaction from Christopher (Drashna) in Local cache data   
I called this "Full Round Trip Encryption", for lack of a better term, and it was very much a core part of the original design.
     
Specifically, what this means is that any byte that gets written to an encrypted cloud drive is never stored in its unencrypted form on any permanent storage medium (either locally or in the cloud).
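To make the ordering that implies a bit more concrete, here's a rough illustrative sketch, not CloudDrive's actual write path; the function names and the block size are made up for the example, and the encrypt callable stands in for a real cipher:

BLOCK_SIZE = 1024 * 1024  # illustrative block size, not CloudDrive's actual value

def write_block(cache_path, block_index, plaintext, encrypt):
    """Encrypt first, then persist: only ciphertext ever reaches permanent storage."""
    ciphertext = encrypt(plaintext)          # plaintext exists only in RAM
    with open(cache_path, "r+b") as cache:   # the local cache file on disk
        cache.seek(block_index * BLOCK_SIZE)
        cache.write(ciphertext)              # what later gets uploaded is this same ciphertext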
  25. Like
    Alex got a reaction from Talyrius in Mount Existing Files   
When I said unique, I didn't mean that we're unique in offering encryption for the cloud, or in mounting your cloud data as a drive. Surely that's been done by other products.
     
And by the way, I'm not trying to convince you to use StableBit CloudDrive; in fact, I'm saying the exact opposite. If you want to share your files using the tools that your cloud provider offers (such as web access), StableBit CloudDrive simply can't do what you're asking.
     
    I'm simply trying to explain the motivation behind the design.
     
    Think about TrueCrypt. When you create a TrueCrypt container and mount that as a drive, and then copy some files into that encrypted drive, do those files get stored on some file system somewhere in an encrypted form? Does the encrypted data retain its folder structure? No, the files get stored in an encrypted container, which is a large random blob of data.
     
    You can think of StableBit CloudDrive as TrueCrypt for the cloud, in the sense that the files that you copy into a cloud drive get stored in random blobs in the cloud. They don't retain any structure.
     
    (I know, we're not open source, but that's the best analogy that I can come up with)
     
    Now you may ask, why do this? Why not simply encrypt the files and upload them to the cloud as is?
     
    Because the question that StableBit CloudDrive is trying to answer is, how can we build the best full drive encryption product for the cloud? And once you're encrypting your data, any services offered by your cloud provider, such as web access, become impossible, regardless of whether you're uploading files or unstructured blobs of data.
     
    So really, once you're encrypting, it doesn't matter which approach you use. Either way, encrypted data is opaque.
     
    The reasons that I chose the opaque block approach for StableBit CloudDrive are:
1. You get full random data access, regardless of whether the provider supports "range" (partial file) requests or not.
Say, for example, that you upload a 4 GB file to the cloud (or maybe even a 50 GB file or larger). This file may be a database file or a movie file. Now say that you try to open this file from your cloud drive with something like a movie player, and the movie player needs to read some bytes from the end of the file before it can start playback.
     
If this file is an actual file in the cloud, uploaded as one single contiguous chunk (which is not what StableBit CloudDrive does), how long do you have to wait until playback starts? Your cloud drive would need to download most of the file before that read request could be satisfied. In other words, you may have to wait a long while.
     
With the block-based approach (which StableBit CloudDrive does use), we calculate the block that we need to download, download it, read the part of the block that the movie player is requesting, and we're done. Very quick. (See the sketch after this list for the offset math involved.)
2. For StableBit CloudDrive, the cloud provider doesn't need to support any kind of structured file system. All we need is key/value pair storage. In that sense, StableBit CloudDrive is definitely geared more towards enterprise-grade providers such as Amazon S3, Google Cloud Storage, Microsoft Azure, and other similar providers.

3. Partial file caching becomes a lot simpler. Say, for example, that you've got a huge database file, and a few tables in that file are accessed frequently. StableBit CloudDrive can easily recognize that some parts of that file need to be stored locally. It's not aware of file access; it's simply looking at the entire drive and caching the most frequently accessed areas locally. So those frequently accessed tables will be cached locally, even if they're not stored contiguously.

4. And lastly, because the file-based approach has already been done over and over again. Why make a "me too" product?
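Here's the sketch mentioned in the first point above: a rough illustration of the offset arithmetic behind the block-based approach. The 10 MB chunk size and the function name are assumptions made for the example, not CloudDrive's actual internals:

CHUNK_SIZE = 10 * 1024 * 1024  # assumed chunk size, for illustration only

def chunks_for_read(drive_offset, length, chunk_size=CHUNK_SIZE):
    """Map a random read on the drive to the chunk(s) that need to be downloaded."""
    first_chunk = drive_offset // chunk_size
    last_chunk = (drive_offset + length - 1) // chunk_size
    offset_in_first = drive_offset % chunk_size
    return first_chunk, last_chunk, offset_in_first

# A movie player reading 64 KB near the end of a 4 GB file touches a single chunk,
# instead of forcing a download of most of the file:
print(chunks_for_read(4 * 1024**3 - 65536, 65536))   # -> (409, 409, 6225920)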
    With StableBit CloudDrive, it is my hope to offer a different approach to encrypted cloud storage.
     
I don't know if we've nailed it, but there it is; let the market decide. It'll succeed if it should.