
CloudDrive and Thousands of Tiny Files


womegunite

Question

I'm a little confused about something, and I'm hoping you all can help me better understand what CD should have been doing versus what it was actually doing.

Edit 1: I'm on 1 Gbps fiber, and tested to theoretical maximums.

First off, my setup (using bullets to show the hierarchy):

  • Cloud Drives
    • I have 2 CloudDrives:
      • CloudDrive-GDrive-Account1 (1PB)
      • CloudDrive-GDrive-Account2 (1PB)
    • Both have the following specs:
      • Local Cache is 50 GB
      • Download Threads = 10
      • Upload threads = 5
      • Background IO enabled
      • Prefetching
        • Trigger = 20 MB
        • Forward = 175 MB
        • Window = 10 s
      • Data Duplication = On
      • Pinning = directories and metadata
  • Drivepool; Duplication x3 (should duplicate across all 3 of the below drives).
    • Drivepool-Cloud-GDrive-All; balanced using SSD Optimized w/ 20GB NVME marked as SSD, and Ordered Placement to fill up each Archive Drive in order from 00 to 01, etc. 
      • Drivepool-Cloud-GDrive-00; balanced using Disk Space Equalizer (by free space remaining); no automatic rebalancing. The purpose here is just to do a best-effort split of files across the accounts.
        • CloudDrive-GDrive-Account1-Partition-00 (50TB)
        • CloudDrive-GDrive-Account2-Partition-00 (50TB)
      • Drivepool-Cloud-GDrive-01; balanced using Disk Space Equalizer (by free space remaining); no automatic rebalancing. The purpose here is just to do a best-effort split of files across the accounts.
        • CloudDrive-GDrive-Account1-Partition-01 (50TB)
        • CloudDrive-GDrive-Account2-Partition-01 (50TB)
    • NVME-Local-1.5TB
    • USB-HDD-5TB

The problem I had is this: I had a directory (browser) with tens of thousands of 1 KB files, but the space taken was only a few MB. I understand CloudDrive to be block-based storage. Given my setup, I would have expected these files to cover at most 8 blocks (2 per account if the files crossed a block boundary, times 2 accounts, times 2 for data duplication). I would have expected CloudDrive to analyze the files that needed to be uploaded, intelligently determine which blocks needed to be downloaded, update those blocks, then re-upload them. This should, IMO, have taken only a few minutes.
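To make the arithmetic behind that expectation concrete, here is a minimal sketch. The block size (20 MiB) and the file count (20,000) are assumptions picked for illustration, since neither is stated above, and `expected_blocks` is a hypothetical helper, not anything CloudDrive exposes.

```python
import math

def expected_blocks(total_bytes, block_bytes, accounts=2, duplication=2,
                    crosses_boundary=True):
    """Upper bound on blocks touched if the data lands contiguously.

    Assumes each account holds one contiguous run of the data and that
    Data Duplication stores every block twice (both are assumptions).
    """
    per_account = math.ceil(total_bytes / block_bytes)
    if crosses_boundary:
        # A contiguous run can straddle one extra block boundary.
        per_account += 1
    return per_account * accounts * duplication

# Hypothetical numbers: 20,000 files of 1 KiB each, 20 MiB blocks.
total = 20_000 * 1024
block = 20 * 1024 * 1024
print(expected_blocks(total, block))
```

With these assumed numbers the data fits in one block per account, two if it straddles a boundary, which is where the figure of 8 comes from.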

Instead, what I witnessed was hours (probably 8-10) of CloudDrive appearing to continuously download and upload blocks. I did look at the details and could tell that CD was trying to queue up changes in the same block, but I have no idea why it took so long. I could see this being a problem if the data were continuously changing, but it wasn't. It was all transferred through the DrivePool in a few minutes, and was sitting on the SSD drive.
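For a sense of scale, the sketch below compares the transfer time of the coalesced case I expected against a purely hypothetical worst case in which every file write triggers its own download and re-upload of a whole block. I don't know that CloudDrive actually behaves this way; the 20 MiB block size, the 20,000 file count, and the `transfer_hours` helper are all assumptions for illustration.

```python
def transfer_hours(block_rewrites, block_bytes, link_bits_per_s=1e9):
    """Hours spent moving data if each rewrite downloads and then
    re-uploads one full block, one after the other, at line rate."""
    bytes_moved = block_rewrites * block_bytes * 2  # download + upload
    return bytes_moved * 8 / link_bits_per_s / 3600

block = 20 * 1024 * 1024   # assumed block size (not stated in the post)
files = 20_000             # assumed file count ("tens of thousands")

# Expected case: 8 blocks, each rewritten once.
best = transfer_hours(8, block)

# Hypothetical worst case: one block rewrite per file,
# doubled because Data Duplication is on.
worst = transfer_hours(files * 2, block)

print(f"coalesced: {best * 3600:.1f} s, per-file rewrites: {worst:.1f} h")
```

Under these assumptions the coalesced case takes seconds on a 1 Gbps link, while uncoalesced per-file rewrites run to hours even before API rate limits or latency are considered.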

I'd like to know what should have happened, since that may help me re-architect my setup if needed.
