
Prioritizing chunk upload order?


Jonibhoni

Recommended Posts

Hi,

isn't there some prioritization taking place under the hood when deciding which chunk to upload first?

I just did a few experiments with Google Cloud Storage and a 100 MB chunk size, 1 GB cache (initially empty except for pinned metadata and folders), no prefetching, latest public CloudDrive:

a) pause upload
b) copy a 280 MB file to the cloud drive
c) resume upload

With this sequence, the whole plan of actions should be well defined before the actual transfer starts, so there is plenty of opportunity for CloudDrive to batch, queue in a sensible order, etc.
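What I naively had in mind is something along these lines: while the upload is paused, collect all pending writes, coalesce them per chunk, and only then decide which chunks need a download at all. A minimal sketch of the idea (not CloudDrive's actual internals; all names and structures here are made up for illustration):

```python
from collections import defaultdict

CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB, as configured in my test

def coalesce_pending_writes(pending_writes):
    """Group every queued write by chunk, so each chunk is touched at most once.

    pending_writes: iterable of (chunk_id, offset, data) tuples collected
    while the upload was paused.
    """
    per_chunk = defaultdict(list)
    for chunk_id, offset, data in pending_writes:
        per_chunk[chunk_id].append((offset, data))
    return per_chunk

def build_upload_plan(per_chunk):
    """One provider write per chunk; a prior download is only needed if the
    new data does not cover the whole chunk (assuming non-overlapping writes)."""
    plan = []
    for chunk_id, writes in per_chunk.items():
        covered = sum(len(data) for _, data in writes)
        plan.append((chunk_id, covered, covered < CHUNK_SIZE))  # (id, new bytes, needs read-modify-write)
    return plan
```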

Observing the "Technical Details" window for the latest try, the actual provider I/O (in this order) was:

  • Chunk 3x1 Read 100 MB because: "WholeChunkIoPartialMasterRead", length 72 MB
  • Chunk 3x1 Write 100 MB because: "WholeChunkIoPartialMasterWrite", length 72 MB
  • Chunk 4x1 Write 100 MB because: "WholeChunkIoPartialMasterWrite", length 80 MB
  • Chunk 10x1 Read 100 MB because: "WholeChunkIoPartialMasterRead", length 4 kB, + 3 "WholeChunkIoPartialSharedWaitForRead" of a few kB each (4 kB, 4 kB, 8 kB)
  • Chunk 10x1 Write 100 MB because: "WholeChunkIoPartialMasterWrite", length 4 kB, + 3 "WholeChunkIoPartialSharedWaitForCompletion" of a few kB each (4 kB, 4 kB, 8 kB)
  • Chunk 0x1 Read 100 MB because: "WholeChunkIoPartialMasterRead", length 4 kB
  • Chunk 0x1 Write 100 MB because: "WholeChunkIoPartialMasterWrite", length 4 kB
  • Chunk 4x1 Read 100 MB because: "WholeChunkIoPartialMasterRead", length 23 MB
  • Chunk 4x1 Write 100 MB because: "WholeChunkIoPartialMasterWrite", length 23 MB
  • Chunk 5x1 Write 100 MB, length 100 MB
  • Chunk 6x1 Write 100 MB because: "WholeChunkIoPartialMasterWrite", length 11 MB
  • Chunk 10x1 Read 100 MB because: "WholeChunkIoPartialMasterRead", length 16 kB, + 4 "WholeChunkIoPartialSharedWaitForRead" of a few kB each (4 kB, 4 kB, 4 kB, 12 kB)
  • Chunk 10x1 Write 100 MB because: "WholeChunkIoPartialMasterWrite", length 16 kB, + 4 "WholeChunkIoPartialSharedWaitForCompletion" of a few kB each (4 kB, 4 kB, 4 kB, 12 kB)
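To spell out what each of those "WholeChunkIoPartial*" round trips on chunk 10x1 costs, as I read the log (a back-of-the-envelope sketch, not official numbers):

```python
CHUNK_MB = 100

def rmw_cost(dirty_kb):
    """A partial write to an existing chunk seems to mean: download the full
    100 MB chunk, patch it, upload the full 100 MB again -- regardless of how
    little actually changed."""
    return {"download_MB": CHUNK_MB, "upload_MB": CHUNK_MB, "new_data_MB": dirty_kb / 1024}

# Chunk 10x1 was cycled twice, for roughly 20 kB and 40 kB of metadata:
print(rmw_cost(20))  # {'download_MB': 100, 'upload_MB': 100, 'new_data_MB': 0.0195...}
print(rmw_cost(40))  # ~200 MB on the wire each time, for a few kB of payload
```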

So here are my questions / suggestions / hints at things that, in my opinion, shouldn't happen:

  1. Chunk 10x1 is obviously just filesystem metadata or something; it's a few kB, for which a 100 MB chunk has to be downloaded and uploaded - so far, so unavoidable (as described here). Now the elephant in the room: why is it downloaded and uploaded TWICE? The whole copy operation and all changes were clear from the beginning of the transfer (that's why I paused the upload until copying had completely finished). OK, maybe Windows decided to write a desktop.ini or something while CloudDrive was doing the work. But then why did it have to be read again - why wasn't it in the cache on the second read? Caching was enabled with enough space, and metadata pinning was enabled too, so shouldn't it be one of the first chunks to cache?
  2. Why is chunk 4x1 uploaded TWICE (2 x 100 MB), with 80 MB of productive data the first time and 23 MB the second?! Isn't this an obvious candidate for batching?
  3. If chunk 5x1 is known to be entirely new data (a full 100 MB worth of actual upload), why does it come after 3x1, 4x1 and 10x1, which were all only "partial" writes that needed the full chunk to be downloaded first, only to write the full chunk back with just a fraction of it actually being new data? Wouldn't it be more efficient to upload completely new chunks first (see the sketch below for the kind of ordering I mean)? Especially the filesystem chunks (10x1 and 0x1, I'm looking at you) are very likely to change *very* often; so prioritizing them (with 2x99 MB of wasted transferred bytes) over 100 MB of actual new data (e.g. in chunk 5x1) seems a bad decision for finishing the job fast, doesn't it? Also, each upload triggers a new 100 MB file version at e.g. Google Cloud Storage, which gets billed (storage, early deletion charges, ops...) without any actual benefit for me.
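What I would have expected, roughly, is an ordering like this (only a sketch of the idea, with payload sizes taken from my log above; not a claim about how CloudDrive actually schedules uploads):

```python
from dataclasses import dataclass

CHUNK_SIZE = 100_000_000  # 100 MB (decimal, just to keep the example simple)

@dataclass
class PendingChunk:
    chunk_id: str
    dirty_bytes: int  # how much of the chunk is actually new data

def upload_priority(c: PendingChunk):
    """Fully new chunks (no download needed) first, then partial
    read-modify-write chunks by payload size -- so tiny, frequently
    re-dirtied metadata chunks like 0x1/10x1 wait at the back of the
    queue and can soak up further small writes before their single
    expensive 100 MB round trip."""
    fully_new = c.dirty_bytes >= CHUNK_SIZE
    return (0 if fully_new else 1, -c.dirty_bytes)

# Rough shape of my test, payload sizes from the log above:
queue = [
    PendingChunk("5x1", 100_000_000),  # completely new data
    PendingChunk("3x1", 72_000_000),
    PendingChunk("4x1", 80_000_000),
    PendingChunk("6x1", 11_000_000),
    PendingChunk("0x1", 4_096),        # filesystem metadata
    PendingChunk("10x1", 60_000),      # filesystem metadata
]
queue.sort(key=upload_priority)
print([c.chunk_id for c in queue])  # ['5x1', '4x1', '3x1', '6x1', '10x1', '0x1']
```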

So regarding network use (which is billed by cloud providers!):

  • Naive point of view: I want to upload 280 MB of productive data
  • Justified because of chunking etc.: 300 MB download (partial chunks 0x1, 3x1, 10x1) + 600 MB upload (4x1, 5x1, 6x1, 0x1, 3x1, 10x1)
  • Actually transferred in the end: 500 MB download + 800 MB upload. That's about 67% and 33% more than needed, respectively? (Quick tally below.)
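For reference, that last line is just adding up the provider operations from the log above:

```python
CHUNK_MB = 100

# Provider operations observed above (each one moves a full 100 MB chunk)
downloads = ["3x1", "10x1", "0x1", "4x1", "10x1"]                      # 5 reads
uploads   = ["3x1", "4x1", "10x1", "0x1", "4x1", "5x1", "6x1", "10x1"] # 8 writes

actual_down, actual_up = len(downloads) * CHUNK_MB, len(uploads) * CHUNK_MB  # 500 MB, 800 MB
ideal_down, ideal_up = 3 * CHUNK_MB, 6 * CHUNK_MB                            # 300 MB, 600 MB

print(f"download overhead: {actual_down / ideal_down - 1:.0%}")  # 67%
print(f"upload overhead:   {actual_up / ideal_up - 1:.0%}")      # 33%
```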

Would you mind opening a ticket about this?  
https://stablebit.com/contact/

This is definitely diving into the more technical aspects of the software, and I'm not confident in how well I understand them, so I'd prefer to point Alex, the developer, to this discussion directly.

However, I think some of the "twice" is part of the upload verification process, which can be changed/disabled. Also, the file system has duplicate blocks enabled for its storage, for extra redundancy in case of provider issues (*cough* Google Drive *cough*). But it also sounds like this may not be related to that.

