CloudDrive and Thousands of Tiny Files

womegunite · September 6, 2021

I'm a little confused about something, and I'm hoping you all can help me better understand what CD should have been doing versus what it was doing.

Edit 1: I'm on 1 Gbps fiber and have tested the connection at its theoretical maximum.

First off, my setup (using bullets to show the hierarchy):

Cloud Drives
- I have 2 CloudDrives:
  - CloudDrive-GDrive-Account1 (1PB)
  - CloudDrive-GDrive-Account2 (1PB)
- Both have the following specs:
  - Local cache = 50 GB
  - Download threads = 10
  - Upload threads = 5
  - Background I/O enabled
  - Prefetching:
    - Trigger = 20 MB
    - Forward = 175 MB
    - Window = 10 s
  - Data Duplication = On
  - Pinning = directories and metadata

DrivePool: duplication x3 (should duplicate across all three of the drives below).
- Drivepool-Cloud-GDrive-All; balanced using SSD Optimizer with a 20 GB NVMe drive marked as SSD, and Ordered Placement to fill each archive drive in order (00, then 01, etc.).
  - Drivepool-Cloud-GDrive-00; balanced using Disk Space Equalizer (by free space remaining); no automatic rebalancing. The purpose here is just to do best-effort splitting of files across the accounts.
    - CloudDrive-GDrive-Account1-Partition-00 (50TB)
    - CloudDrive-GDrive-Account2-Partition-00 (50TB)
  - Drivepool-Cloud-GDrive-01; balanced using Disk Space Equalizer (by free space remaining); no automatic rebalancing. The purpose here is just to do best-effort splitting of files across the accounts.
    - CloudDrive-GDrive-Account1-Partition-01 (50TB)
    - CloudDrive-GDrive-Account2-Partition-01 (50TB)
- NVME-Local-1.5TB
- USB-HDD-5TB
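
The cloud half of that hierarchy can be modeled as a small placement routine. This is a simplified sketch under stated assumptions, not DrivePool's actual balancer logic: it covers only the copy that lands in Drivepool-Cloud-GDrive-All (ignoring the NVMe SSD landing drive and the NVME-Local-1.5TB and USB-HDD-5TB copies), tries the sub-pools in order, and within a sub-pool picks the partition with the most free space. `place`, `Partition` and `SubPool` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

TB = 1000**4  # decimal terabytes, for the "50TB" partition sizes


@dataclass
class Partition:
    name: str
    free: int  # bytes of free space remaining


@dataclass
class SubPool:
    name: str
    partitions: List[Partition]


def place(size: int, subpools: List[SubPool]) -> str:
    """Pick a partition for one file copy under a simplified model.

    Sub-pools are tried in order (00, then 01), standing in for ordered
    placement; inside a sub-pool the partition with the most free space
    wins, standing in for the free-space equalizing described above.
    """
    for pool in subpools:
        fits = [p for p in pool.partitions if p.free >= size]
        if fits:
            target = max(fits, key=lambda p: p.free)  # first one wins a tie
            target.free -= size
            return target.name
    raise RuntimeError("no partition has room for this file")


pools = [
    SubPool("Drivepool-Cloud-GDrive-00", [
        Partition("CloudDrive-GDrive-Account1-Partition-00", 50 * TB),
        Partition("CloudDrive-GDrive-Account2-Partition-00", 50 * TB),
    ]),
    SubPool("Drivepool-Cloud-GDrive-01", [
        Partition("CloudDrive-GDrive-Account1-Partition-01", 50 * TB),
        Partition("CloudDrive-GDrive-Account2-Partition-01", 50 * TB),
    ]),
]

# Place four 1 KB files and show where each one lands.
targets = [place(1024, pools) for _ in range(4)]
print(targets)
```

Under this model, consecutive tiny files alternate between the two accounts' 00 partitions, so each account receives roughly half of the files, and sub-pool 01 is untouched until 00 fills up.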

The problem I had is this. I had a directory (browser) with tens of thousands of 1 KB files, but the total space taken was only a few MB. I understand CloudDrive to be block-based storage. Given my setup, I would have expected these files to cover at most 8 blocks (2 per account if the files crossed a block boundary, then duplicated). I would have expected CloudDrive to analyze the files that needed to be uploaded, intelligently determine which blocks needed to be downloaded, update those blocks, and then re-upload them. IMO, this should only have taken a few minutes.
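
The 8-block estimate can be written out as arithmetic. This sketch uses hypothetical inputs (30,000 files, about 5 MB of data, and a 20 MB chunk size are all assumptions, since the post gives only "tens of thousands" and "a few MB", and the real chunk size depends on how each drive was created). It ignores file-system metadata writes and contrasts the best case (each account's share written contiguously) with a rough scattered case (every file in its own chunk).

```python
import math

# Illustrative inputs only; FILE_COUNT, TOTAL_BYTES and CHUNK_SIZE are assumptions.
FILE_COUNT = 30_000
TOTAL_BYTES = 5 * 1024**2   # ~5 MB of file data
CHUNK_SIZE = 20 * 1024**2   # hypothetical 20 MB chunk size
ACCOUNTS = 2                # Account1 and Account2 partitions
CLOUD_COPIES = 2            # "Data Duplication = On" read as two stored copies


def chunks_if_contiguous(total_bytes: int, chunk_size: int, accounts: int) -> int:
    """Chunks touched per copy if each account's share is written contiguously."""
    per_account = math.ceil(total_bytes / accounts)
    # A contiguous run can straddle one extra chunk boundary.
    return accounts * (math.ceil(per_account / chunk_size) + 1)


def chunks_if_scattered(file_count: int) -> int:
    """Rough count per copy if every file lands in a different chunk."""
    return file_count


best = chunks_if_contiguous(TOTAL_BYTES, CHUNK_SIZE, ACCOUNTS) * CLOUD_COPIES
worst = chunks_if_scattered(FILE_COUNT) * CLOUD_COPIES
print(f"contiguous: {best} chunks; scattered: about {worst} chunks")
```

With these inputs the script prints `contiguous: 8 chunks; scattered: about 60000 chunks`. The estimate of 8 holds only if the files were laid out next to each other on the partitions; the post does not establish whether they were.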

Instead, what I witnessed was hours (probably 8 to 10) of CloudDrive appearing to continuously download and upload blocks. I did look at the details and could tell that CD was trying to queue up changes to the same block, but I have no idea why it took so long. I could see this being a problem if the data were continuously changing, but it wasn't: it was all transferred through DrivePool in a few minutes and was sitting on the SSD.
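
One way to frame the gap between minutes and hours is whether pending writes are coalesced per block before the read-modify-write cycle. The sketch below is a hypothetical model, not a description of CloudDrive's internals: `fetch` and `store` stand in for downloading and uploading a chunk, and the code counts transfers when every small write does its own read-modify-write versus when writes are grouped by chunk first.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A pending write: (chunk_id, offset within the chunk, bytes to write).
Write = Tuple[int, int, bytes]
Fetch = Callable[[int], bytes]        # hypothetical "download chunk" call
Store = Callable[[int, bytes], None]  # hypothetical "upload chunk" call


def apply_per_write(writes: List[Write], fetch: Fetch, store: Store) -> int:
    """Read-modify-write each pending write on its own; returns transfer count."""
    transfers = 0
    for chunk_id, offset, data in writes:
        buf = bytearray(fetch(chunk_id))
        buf[offset:offset + len(data)] = data
        store(chunk_id, bytes(buf))
        transfers += 2  # one download plus one upload
    return transfers


def apply_coalesced(writes: List[Write], fetch: Fetch, store: Store) -> int:
    """Group writes by chunk first, so each chunk is downloaded and uploaded once."""
    pending: Dict[int, List[Tuple[int, bytes]]] = defaultdict(list)
    for chunk_id, offset, data in writes:
        pending[chunk_id].append((offset, data))  # keeps per-chunk write order

    transfers = 0
    for chunk_id, updates in pending.items():
        buf = bytearray(fetch(chunk_id))
        for offset, data in updates:
            buf[offset:offset + len(data)] = data
        store(chunk_id, bytes(buf))
        transfers += 2
    return transfers


CHUNK = 64 * 1024  # small hypothetical chunk size for the demo
# 100 writes of 1 KB each, alternating between chunk 0 and chunk 1.
writes = [(i % 2, (i // 2) * 1024, bytes([i]) * 1024) for i in range(100)]

naive_cloud: Dict[int, bytes] = {}
batched_cloud: Dict[int, bytes] = {}
n_naive = apply_per_write(
    writes, lambda c: naive_cloud.get(c, bytes(CHUNK)), naive_cloud.__setitem__)
n_batched = apply_coalesced(
    writes, lambda c: batched_cloud.get(c, bytes(CHUNK)), batched_cloud.__setitem__)
print(n_naive, n_batched, naive_cloud == batched_cloud)  # 200 4 True
```

Both strategies leave identical chunk contents, but the per-write version's transfer count grows with the number of files while the coalesced version's grows with the number of distinct chunks. If each transfer of a multi-megabyte chunk costs even a second or two, a per-write pattern over tens of thousands of files would be consistent with hours of activity.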

I'd like to understand what should have happened; it may help me re-architect my setup if needed.
