womegunite

Posts posted by womegunite

  1. I'm a little confused about something, and I'm hoping you all can help me better understand what CD should have been doing versus what it was doing.

    Edit 1: I'm on 1 Gbps fiber, and tested to theoretical maximums.

    First off, my setup (using bullets to show the hierarchy):

    • Cloud Drives
      • I have 2 CloudDrives:
        • CloudDrive-GDrive-Account1 (1PB)
        • CloudDrive-GDrive-Account2 (1PB)
      • Both have the following specs:
        • Local Cache is 50 GB
        • Download Threads = 10
        • Upload threads = 5
        • Background IO enabled
        • Prefetching
          • Trigger = 20 MB
          • Forward = 175 MB
          • Window = 10 s
        • Data Duplication = On
        • Pinning = directories and metadata
    • DrivePool; Duplication x3 (should duplicate across all 3 of the drives below).
      • Drivepool-Cloud-GDrive-All; balanced using the SSD Optimizer with a 20 GB NVMe volume marked as SSD, and Ordered Placement to fill up each archive drive in order from 00 to 01, etc.
        • Drivepool-Cloud-GDrive-00; balanced using Disk Space Equalizer (by free space remaining); no automatic rebalancing. The purpose here is just to do best-effort splitting of files across the accounts.
          • CloudDrive-GDrive-Account1-Partition-00 (50TB)
          • CloudDrive-GDrive-Account2-Partition-00 (50TB)
        • Drivepool-Cloud-GDrive-01; balanced using Disk Space Equalizer (by free space remaining); no automatic rebalancing. The purpose here is just to do best-effort splitting of files across the accounts.
          • CloudDrive-GDrive-Account1-Partition-01 (50TB)
          • CloudDrive-GDrive-Account2-Partition-01 (50TB)
      • NVME-Local-1.5TB
      • USB-HDD-5TB

    The problem I had is this. I had a directory (browser) with tens of thousands of 1 KB files, but the space taken was only a few MB. I understand CloudDrive to be block-based storage. Given my setup, I would have expected these files to cover at most 8 blocks (2 from each account if the files crossed a block range, then duplicated). I would have expected CloudDrive to analyze the files that needed to be uploaded, intelligently determine the blocks that needed to be downloaded, update those blocks, then re-upload them. This should, IMO, only have taken a few minutes.

    Instead, what I witnessed was hours (probably 8-10) of CloudDrive appearing to continuously download and upload blocks. I did look at the details and could tell that CD was trying to queue up changes in the same block, but I have no idea why it took so long. I could see this being a problem if the data were continuously changing, but it wasn't. It was all transferred through the DrivePool in a few minutes, and was sitting on the SSD drive.

    I'd like to know what should have happened; it may help me re-architect my setup, if needed.
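
    To put numbers on that expectation, here is a rough back-of-the-envelope sketch in Python. The chunk size and total data size are assumptions (I don't have the actual chunk size of these drives in front of me); the point is only that a few MB of tiny files should dirty a handful of chunks per drive, which is minutes of work on gigabit, not hours.

        import math

        # Back-of-the-envelope: how many storage chunks should a burst of tiny files touch?
        # ASSUMPTIONS (hypothetical numbers, not read from the actual drives):
        #   - total data is ~5 MB, as described ("a few MB" of ~1 KB files)
        #   - CloudDrive stores each drive as fixed-size chunks of ~20 MB
        #   - the writes are reasonably contiguous on the volume, so they straddle
        #     at most one extra chunk boundary
        CHUNK_SIZE = 20 * 2**20      # hypothetical chunk size
        TOTAL_BYTES = 5 * 2**20      # "a few MB" of small files

        chunks_per_volume = math.ceil(TOTAL_BYTES / CHUNK_SIZE) + 1   # +1 for a boundary straddle

        # The pool's best-effort splitting spreads the files across both GDrive
        # accounts, so both accounts' partitions see writes, and CloudDrive's own
        # data duplication doubles the stored chunks on each drive.
        ACCOUNTS_TOUCHED = 2
        CLOUDDRIVE_DUPLICATION = 2
        chunks_to_upload = chunks_per_volume * ACCOUNTS_TOUCHED * CLOUDDRIVE_DUPLICATION

        print(chunks_to_upload)   # 8 chunks in this scenario, i.e. minutes of upload, not hours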

  2. I have a complicated DrivePool nesting that includes my local SSD, CloudDrives, and some USB HDDs. I have always noticed in the past, when accessing a USB HDD directly, that Explorer will hang until a cold USB HDD has gone through the power-up and handshake phases, and then response is normal.

    I still get the same experience on my DrivePool if my USB HDDs are cold. I'd expect this phenomenon if I were reading/writing a file directly. If DrivePool is pinning metadata, which I assume includes the MFTs, why would this be the case? What am I missing?
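
    In case it helps narrow this down, here is a small diagnostic sketch I can run with the HDD asleep; the path is a placeholder for a pooled folder whose files live on the USB disk. If the pure metadata pass (enumeration) is the slow part, something in the listing path is still touching the spun-down disk rather than pinned/cached metadata; if only the data read is slow, the pinning is doing its job.

        import os
        import time

        POOL_DIR = r"D:\Pool\SomeFolder"   # placeholder: a pooled folder whose files live on the USB HDD

        def timed(label, fn):
            """Run fn once and print how long it took."""
            start = time.perf_counter()
            result = fn()
            print(f"{label}: {time.perf_counter() - start:.2f} s")
            return result

        # Pure metadata: names and file/directory flags. If directory metadata is served
        # from pinned/cached structures, this should be fast even with the HDD asleep.
        entries = timed("enumerate", lambda: [(e.name, e.is_file()) for e in os.scandir(POOL_DIR)])

        # Actual data read: expected to block until the drive spins up.
        first_file = next(name for name, is_file in entries if is_file)
        timed("read 4 KB", lambda: open(os.path.join(POOL_DIR, first_file), "rb").read(4096))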

  3. Following what I can figure to be best practices found here on the forum, I have decided to create a CloudDrive for each GSuite account that I have. Each CD will have multiple 50 TB partitions. Those partitions will be pooled into their own DrivePools (no letter) for each account, and then those DPs will be pooled into a single lettered DrivePool.

    My question is around the optimal settings for the CloudDrives. I intend to use the drives for a variety of content, including large media like movies and TV shows, as well as small files like documents, and really anything in between. I found some recommended Plex settings that an admin posted on here (disable background IO, minimum download size = 20 MB, prefetch trigger = 20 MB, prefetch forward = 175 MB, prefetch time window = 10 seconds).

    • Will these settings also work for regular access to small files?
    • Are there better settings I should change to?
    • Would it just be better to have two sets of CDs to support large and small files? I don't really want to do this, because the overhead of management is significant, but I can if the benefits are significant.

    I'm on Gigabit fiber with a good provider.
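
    My rough mental model of those prefetch settings, and why I suspect they might already be fine for small files, is sketched below; this is just my interpretation of the documented knobs, not CloudDrive's actual logic.

        # Rough model of the prefetcher settings (my interpretation, not CloudDrive internals):
        # prefetching arms only after TRIGGER_BYTES of sequential reads arrive within
        # WINDOW_SECONDS, and it then fetches FORWARD_BYTES ahead of the read position.
        TRIGGER_BYTES = 20 * 2**20     # "prefetch trigger = 20 MB"
        FORWARD_BYTES = 175 * 2**20    # "prefetch forward = 175 MB"
        WINDOW_SECONDS = 10            # "prefetch time window = 10 s"

        def would_prefetch(file_size_bytes, read_rate_bytes_per_s):
            """Would sequential reads of this file ever arm the prefetcher?"""
            readable_in_window = read_rate_bytes_per_s * WINDOW_SECONDS
            return min(file_size_bytes, readable_in_window) >= TRIGGER_BYTES

        # A 500 KB document can never reach the 20 MB trigger, so prefetch stays out of the way.
        print(would_prefetch(500 * 1024, 10 * 2**20))    # False
        # An 8 GB movie streaming at ~10 MB/s trips the trigger and gets 175 MB read ahead.
        print(would_prefetch(8 * 2**30, 10 * 2**20))     # True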

  4. 18 minutes ago, Christopher (Drashna) said:

    Yes.  A service account can only see data it created.  Since we use multiple accounts to reduce load, the next time you authorize the drive, you would likely see nothing.   Not exactly good behavior! 

    Also, normal access to the files is blocked. There are cases that we need to grab the "-METADATA" file, and this would be impossible, as well (from what I understand)

     

    I can probably test the first part. I'll use the same service account for two different apps: CloudBerry, and probably rclone. If I can see the data from both apps, then the first part is not a problem, right?

    I'm not quite sure what you mean in the second part. How would I be able to see this file normally, so I can try to replicate it with a service account?

  5. 1 hour ago, Christopher (Drashna) said:

    From here: https://developers.google.com/drive/v3/web/appdata

    We use a pool of keys.  That means that you would ABSOLUTELY have to get the same App ID again.  This means it would be a gamble if your drive shows up, at all.  

    So no, we couldn't do that.  At best, we may be able to add support for a custom path in the future. 

    Alright. Custom Path would be the best route then, with support for Team Drives (rclone can recognize them, so I know the API is there). So long as the number of potential files stays under 100k, a Team Drive would hide that activity from my main account. For the crypto-malware issue, I'll just have to use other software that supports service accounts, and back up the files in the CloudDrive (I could do a direct copy of the CD files, I suppose).
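
    For reference, the Drive v3 API does expose Team Drives directly (which I assume is what rclone is using); a minimal sketch is below. The drive_credentials object is a placeholder for whatever OAuth2 credentials the app already holds with Drive scope.

        from googleapiclient.discovery import build   # pip install google-api-python-client

        # drive_credentials is a placeholder for whatever OAuth2 credentials object the
        # app already holds for the account, with Drive scope.
        service = build("drive", "v3", credentials=drive_credentials)

        # Enumerate the Team Drives this credential can see...
        for d in service.drives().list(pageSize=50).execute().get("drives", []):
            print(d["id"], d["name"])

            # ...and list a few files inside each one; the supportsAllDrives /
            # includeItemsFromAllDrives flags are required for Team Drive content.
            files = service.files().list(
                corpora="drive",
                driveId=d["id"],
                includeItemsFromAllDrives=True,
                supportsAllDrives=True,
                pageSize=10,
                fields="files(id, name)",
            ).execute()
            for f in files.get("files", []):
                print("  ", f["id"], f["name"])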

     

    Are you sure the service account can't be used?

  6. Here is the Google Cloud doc about service accounts: https://cloud.google.com/iam/docs/service-accounts?hl=en_US.

    I believe anyone with a G Suite account, and therefore a Drive account, can create a project in GCP and create a Service Account; I know for certain that anyone with at least the $10/month tier of G Suite can do this. The Service Account is then given Drive API access. The Service Account shares quota with the other members, but for all intents and purposes has its own Drive. Nothing the SA does shows up in anyone else's account, unless it is purposefully made to manipulate other accounts. I can't confirm it, but I'm pretty sure it has its own per-user API limit, as any other user does. Furthermore, it would allow me to separate out accounts for different uses and exposure, in case crypto-malware were to hit. I'm using this tactic with CloudBerry Backup to sync to a Service Account all on its own, while still allowing me to expose the main Drive to the PC without fear that malware would ruin everything.
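
    To make the "has its own Drive" part concrete, this is roughly how another app talks to Drive as a Service Account; the key file name is a placeholder for the JSON key downloaded from GCP. Anything created this way lives in the Service Account's own Drive, not in any human user's My Drive, unless the SA explicitly shares it.

        from google.oauth2 import service_account       # pip install google-auth
        from googleapiclient.discovery import build     # pip install google-api-python-client

        SCOPES = ["https://www.googleapis.com/auth/drive"]
        KEY_FILE = "my-service-account-key.json"         # placeholder: JSON key from the GCP project

        creds = service_account.Credentials.from_service_account_file(KEY_FILE, scopes=SCOPES)
        drive = build("drive", "v3", credentials=creds)

        # Whatever this identity creates sits in the Service Account's own Drive and is
        # invisible to the human users in the domain unless it is explicitly shared.
        created = drive.files().create(
            body={"name": "sa-owned-test.txt"},
            fields="id, name",
        ).execute()
        print("created:", created["id"], created["name"])

        # Listing shows only what this identity owns or has had shared with it.
        for f in drive.files().list(pageSize=10, fields="files(id, name)").execute().get("files", []):
            print(f["id"], f["name"])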

  7. I want to hide CloudDrive's activity from the rest of my GDrive. There are two ways to do this: have the app use the special hidden AppData folder within GDrive, or use a Service Account. I have created Service Accounts for use with CloudBerry Backup, and it works great. Can CloudDrive be manually set up to do this, or would this have to be a feature request?
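
    For context, the hidden AppData folder is the appDataFolder space in the Drive v3 API: an app authorized with the drive.appdata scope can parent files to the appDataFolder alias and query that space, and those files never appear in the visible Drive. A minimal sketch of that API is below (app_credentials is a placeholder for an OAuth2 credential granted that scope); whether CloudDrive can be pointed at it is exactly the question.

        from googleapiclient.discovery import build   # pip install google-api-python-client

        # app_credentials is a placeholder for an OAuth2 credentials object that was
        # granted the https://www.googleapis.com/auth/drive.appdata scope.
        drive = build("drive", "v3", credentials=app_credentials)

        # Files parented to the special "appDataFolder" alias live in the app's hidden
        # configuration area, not in the user's visible My Drive.
        hidden = drive.files().create(
            body={"name": "example-hidden-file.dat", "parents": ["appDataFolder"]},
            fields="id",
        ).execute()

        # Only queries that explicitly target the appDataFolder space can see them.
        listing = drive.files().list(spaces="appDataFolder", fields="files(id, name)").execute()
        for f in listing.get("files", []):
            print(f["id"], f["name"])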
