
Posts posted by Firerouge

  1. Can this be re-evaluated? With Workspace drives now being limited by Google, Team Drives are the only Google option that has no size-based quota.

     

    Large chunk sizes could help mitigate the 400k file limit. With 100MB chunks, 40TB Team Drives may be possible.
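
    As a quick back-of-the-envelope check of that claim (a sketch using the figures in this thread, not official Google numbers):

        # Rough capacity estimate: per-drive file cap x chunk size.
        file_limit = 400_000      # approximate Team Drive item cap
        chunk_size_mb = 100       # proposed large chunk size, in MB

        max_size_tb = file_limit * chunk_size_mb / 1_000_000
        print(f"~{max_size_tb:.0f} TB usable")   # ~40 TB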

  2. On 11/29/2017 at 1:01 PM, Kraevin said:

    Just to add, while the team drive does sound like a good idea, i do not see it really being possible, as a team drive has a limit of 100k files per drive. I could see hitting that pretty quickly with the way cloudrive uses chunks.

     

    On 10/1/2019 at 3:09 PM, Christopher (Drashna) said:

    Unfortunately, no, there really won't be an update on this.  Google Team drive is limited on the number of files that can be hosted. And that limits the size.

    That would be ~10TB normally, or 5TB if duplication is enabled for the provider. 

    Additionally, because it's a shared drive, it's a potential issue too. 

    With the new hierarchical chunk organization, shouldn't this now be technically possible?

  3. 3 hours ago, JulesTop said:

    I would not delete the cache. I have a 270TB drive with 55TB used and on 1307 I am upgrading at about 10% per 24 hour period. Which is faster than before.

    I would stay the course if I were you.

    Whoa, that's way slower than I expected. You're seeing only about 5.5TB migrated per day!

    What sort of system specs and resource consumption are you seeing? Does it seem bottlenecked by anything other than Google's concurrency limit?
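
    For context, a rough sketch of the timeline those numbers imply (assuming the migration really does progress at ~10% per day):

        # Rough migration ETA from the figures quoted above.
        used_tb = 55            # data on the drive being migrated
        rate_per_day = 0.10     # ~10% of the migration completes per 24 hours

        print(f"~{used_tb * rate_per_day:.1f} TB/day")      # ~5.5 TB/day
        print(f"~{1 / rate_per_day:.0f} days to finish")    # ~10 days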

  4. It's certainly possible, but cloud hosting can be prohibitively expensive if you intend to get a system capable of hardware transcoding (a computational must for more than one or two high-resolution streams) along with the necessary bandwidth capacity.

    Furthermore, you'll probably want to look at providers that have locality to your client streaming locations.

     

    It's also important, since you mention using a (sketchy) seedbox host, that you don't attempt to download torrents directly into your cloud drive. You will almost certainly fragment the filesystem and nullify the capabilities of the prefetcher.

     

    But fundamentally, migrating the cloud drive is as simple as unmounting it in one location and remounting it in another.

  5. The new beta looks to have major improvements to the migration process; make sure you're on it before reporting any additional bugs or doing something drastic like deleting the local cache.

    .1307
    * Added detailed logging to the Google Drive migration process that is enabled by default.
    * Redesigned the Google Drive migration process to be quicker in most cases:
        - For drives that have not run into the 500,000 files per folder limit, the upgrade will be nearly instantaneous.
        - Is able to resume from where the old migration left off.
    * [Issue #28410] Output a descriptive Warning to the log when a storage provider's data organization upgrade fails.

     

  6. I'm still patiently holding off on the conversion. It sounds like it works, but I'm waiting to get a better idea of the time it takes relative to drive data size.

     

    I've noticed that, without any settings changes these past few days, I've gotten a couple of yellow I/O error warnings about the user upload rate limit being exceeded (which otherwise hasn't been a problem), and I've seen Google Drive-side upload throttling at lower than normal concurrency, only 4 workers at 70Mbit.

     

    I'm guessing some of the rate limit errors people are seeing during conversion are transient, caused by Google Drive being under high load.

  7. I'm guessing this latest beta changelog is referencing the solution to this:

    .1305
    * Added a progress indicator when performing drive upgrades.
    * [Issue #28394] Implemented a migration process for Google Drive cloud drives to hierarchical chunk organization:
        - Large drives with > 490,000 chunks will be automatically migrated.
            - Can be disabled by setting GoogleDrive_UpgradeChunkOrganizationForLargeDrives to false.
        - Any drive can be migrated by setting GoogleDrive_ForceUpgradeChunkOrganization to true.
        - The number of concurrent requests to use when migrating can be set with GoogleDrive_ConcurrentRequestCount (defaults to 10).
        - Migration can be interrupted (e.g. system shutdown) and will resume from where it left off on the next mount.
        - Once a drive is migrated (or in progress), an older version of StableBit CloudDrive cannot be used to access it.
    * [Issue #28394] All new Google Drive cloud drives will use hierarchical chunk organization with a limit of no more than 100,000 children per folder.

    Some questions: seeing as the limit appears to be around 500,000, is there an option to set the new hierarchical chunk organization folder limit higher than 100,000?

    Has anyone performed the migration yet? Approximately how long does it take to move a 500,000-chunk drive to the new format? Seeing as there are concurrency options, does the process also entail a large amount of upload or download bandwidth?

    After migrating, is there any performance difference compared to the prior non-hierarchical chunk organization?

    Edit: if the chunk limit is 500,000 and chunks are 20MB, shouldn't this be occurring on all drives over 10TB in size?

    Note: I haven't actually experienced this issue, and I have a few large drives under my own API key, so it may be a very slow rollout or an A/B test.
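
    For a sense of scale, here's a rough sketch of why the hierarchical layout should remove the old ceiling. The two-level nesting is an assumption on my part (the changelog only states the 100,000-children-per-folder limit, not the depth):

        # Old flat layout: every chunk in one folder, capped around 500,000 files.
        chunk_mb = 20
        print(f"flat cap ~ {500_000 * chunk_mb / 1_000_000:.0f} TB")        # ~10 TB

        # Hierarchical layout (hypothetical two-level nesting): up to 100,000
        # subfolders, each holding up to 100,000 chunks.
        per_folder = 100_000
        hier_chunks = per_folder * per_folder
        print(f"hierarchical cap ~ {hier_chunks * chunk_mb / 1_000_000_000:.0f} PB")  # ~200 PB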

  8. It's worth mentioning that in low disk space scenarios, the drive will also stop writing entirely.

    With about 3GB of space left on the cache hosting disk (with expandable cache set to minimal), it will entirely disable upload I/O. This is independent of upload size; for example, with 3GB of space left on the cache drive, you'll still be unable to upload a 700MB file.

    Upload I/O is also significantly slowed when the cache hosting drive has only 4-6GB of free space.


    This is worth noting, as it can lead to scenarios where you're trying to move files off the cache hosting drive into the cloud drive, but are unable to make more room for the cache.
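
    If you want to guard against this, here's a minimal sketch that checks the cache volume's free space before kicking off a big move. The ~3GB and ~6GB thresholds are the behavior I observed, not documented limits, and the cache path is just an example:

        import shutil

        STOP_GB = 3    # below this, uploads appear to stop entirely
        SLOW_GB = 6    # below this, uploads are noticeably throttled

        def cache_headroom(cache_path):
            free_gb = shutil.disk_usage(cache_path).free / 1024**3
            if free_gb <= STOP_GB:
                return f"{free_gb:.1f} GB free: uploads will likely stall, clear space first"
            if free_gb <= SLOW_GB:
                return f"{free_gb:.1f} GB free: expect heavily throttled uploads"
            return f"{free_gb:.1f} GB free: OK"

        print(cache_headroom("D:\\"))   # hypothetical cache-hosting volume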

  9. I actually think I know what I was observing.

    It would seem that if the cache hosting drive nears (or perhaps hits) its full capacity, the entirety of the cache appears to get wiped.

    This is probably intended behavior, so I've simply set the cache to a smaller size, which seems to more or less resolve the issue.

  10. Is there a way to set the cache cleanup/expiration time to be higher or infinite?

     

    Essentially, I have a large expandable cache set, but over time the cache shrinks as it automatically removes data, presumably if it's not accessed soon or often enough.

     

    I'd like the cache to remain at its maximum level until I see fit to clear it myself, or until there are no longer any files on the drive to cache.

     

    Is this possible? Perhaps with one of the other cache modes besides expandable?

  11. I caught it happening on 861. As you can see, a 2+ minute read operation on one chunk...

     

    [screenshot: 18WUnqi.png]

     

    I've attempted to trace this (I didn't catch the start, but it should have recorded this read). Upon completion of the connection, it jumped back up to its normal one to two dozen parallel write operations (most in a variant of the SharedWait state).

     

    I'll hopefully be switching to a faster transit VPS shortly, in an effort to disprove network misconfiguration as the cause.

     

    I also realize that this is in part a limitation of the program utilizing the CloudDrive, as it seems to wait until all (or most) of the burst of operations completes before starting the next wave, so even a relatively slow 20-second read can block additional writes. However, a fast fix for the worst offenders (multi-minute connections) would be quite beneficial.

  12. It's even a drop down text box.  But this is a common issue, so maybe we need to change that? 

     

    I too have noticed this is a common user oversight in the current design.

     

     

    If I can make a suggestion, I think the Windows 7 screen resolution slider (sadly now gone) is a decent case study of how this could be cleanly implemented, by listing only the extremes and the common middle options. Obviously a slider limits fine granularity, so for users not inclined to max out sliders, the box should still accept typed values.

    I suspect a majority of users would fall into one of these common drive sizes: 1, 10, 50, 100, 256, or 512GB, or 1, 10, 100, or 256TB, probably dictated mostly by the available storage options from each provider.

  13. The settings you're trying to mess with don't do what you think.

     

    You want "IoManager_HttpConnectionTimeoutMS" here, as that is closer to what you want. 

    That said, could you enable drive tracing and reproduce the issue? 

    http://wiki.covecube.com/StableBit_CloudDrive_Drive_Tracing

     

    I just now noticed that setting in the wiki as well; it isn't listed in the default config. I'm going to experiment with some variations of that setting as a solution.

     

     

    As for recording it, I've only ever noticed it twice, and that was just the luck of glancing at the technical log at the right time and noticing that it had dropped to a single read request, which, on a closer look, showed the slow speed and the 1min+ connection timer. I'll try to create logs, but I might not have much luck pinning it down.

     

     

     

     

    Minor side point while I have your attention: a brand-new Windows VPS running almost exclusively 854 + rtorrent + rclone occasionally has unexpected reboots during peak disk I/O. The problem seems to be described in issue 27416, ostensibly fixed a month ago, but in a seemingly unreleased version 858. Can we expect a new RC soon? The issue tracker seems to imply you're already past version 859 internally.

  14. Remember that a fixed cache will also throttle all write requests once the cache is full too. So you'll only be able to write to the drive at the speed of the upstream connection. If you're using a torrent client to interact directly with the drive that could slow everything down overall.

     

    That's true, but so will a flexible cache, which queues up writes on top of the existing cache and throttles if the cache drive itself gets within 6GB of being full, whereas the fixed cache will shrink the existing cache until it's all write queue before throttling.

    My cache is 15GB smaller than the 60GB of free SSD it sits on, so with a flexible cache I'd only get about 9GB of queued writes before throttling, whereas the fixed cache can dedicate all 45GB of the cache to writes (at the loss of all other cached torrent data) before throttling.
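
    To make that arithmetic explicit (using the numbers above: 60GB of free SSD, a 45GB cache, and the ~6GB low-space throttle threshold; these are my figures, not general values):

        free_ssd_gb = 60
        cache_gb = 45
        throttle_free_gb = 6   # writes throttle below ~6GB of free space

        # Flexible: writes pile on top of the existing cache until the disk is nearly full.
        flexible_headroom = free_ssd_gb - cache_gb - throttle_free_gb   # ~9 GB

        # Fixed: the write queue can displace the existing cache, so the whole
        # cache allocation is available to queued writes.
        fixed_headroom = cache_gb                                       # 45 GB

        print(flexible_headroom, fixed_headroom)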

     

    Better still, since that initial preallocation write queue has replaced a portion of the cache (whereas a flexible cache doesn't necessarily retain any of the recent write queue in cache after uploading), downloads are usually immediately faster, as they'll modify more zeroed chunks straight from the local cache.

  15. I should add that the fixed cache type is another setting that directly benefits torrenting.

     

    From the CoveCube blog "Overall, the fixed cache is optimized for accessing recently written data over the most frequently accessed data."

     

    A new torrent is likely to have the majority of seeding requests, so fixed is the best cache if you're continually downloading new torrents. Plus I prefer the predictable size of the drive cache when performing a large file preallocation.

  16. Almost every read operation finishes so quickly that it's nearly impossible to even see the connection speeds for them in the log.

     

    Occasionally, maybe once per 100GB read, I'll get an incredibly slow read, sometimes taking over a minute to download the 20MB chunk (the longest I've seen was 1:50), with speeds around 200-500KB/s.

     

    These slow reads tend to block other operations for the program I'm using. This is pretty bad.

     

    To try to circumvent this, I edited the IoManager_ReadAbort value in advanced settings, down from 1:55 to 30 seconds.

    However, this setting doesn't work as expected. Instead of aborting the read and retrying, if a connection exceeds this timeframe it actually disconnects the drive (unmounts it) and presents the retry and reauthorize options in the CloudDrive UI. Retry will always reconnect it right away, but this doesn't solve the errant slow connection issue.

     

     

    I believe IoManager_ReadAbort would be better suited to simply reattempting the read connection on a timeout, instead of assuming a full provider failure.

     

    With that in mind, I propose that if IoManager_ReadAbort is triggered it should utilize the IoManager_ReadRetries variable to attempt a specified number of reconnects.

     

    Alternatively, a new flag, IoManager_MinSustainedReadSpeed (defined in KB/s), could be implemented to specifically retry connections with very slow read speeds, which would likely detect and rectify these connections more quickly than waiting out a timeout period before retrying.
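
    A rough sketch of the behavior I'm proposing (the flag name, the retry count reuse, and the open_stream helper are all hypothetical; this is not how CloudDrive currently works):

        import time

        MIN_SUSTAINED_KBPS = 500        # proposed IoManager_MinSustainedReadSpeed
        READ_RETRIES = 3                # reusing the existing IoManager_ReadRetries idea
        GRACE_BYTES = 512 * 1024        # don't judge the speed on the first few packets

        def download_with_speed_floor(open_stream, chunk_size=20 * 1024**2):
            """open_stream() is a placeholder returning a file-like provider stream."""
            for _ in range(READ_RETRIES):
                stream, buf, start = open_stream(), bytearray(), time.monotonic()
                too_slow = False
                while len(buf) < chunk_size:
                    piece = stream.read(64 * 1024)
                    if not piece:
                        break
                    buf.extend(piece)
                    kbps = len(buf) / 1024 / max(time.monotonic() - start, 1e-6)
                    if len(buf) >= GRACE_BYTES and kbps < MIN_SUSTAINED_KBPS:
                        too_slow = True      # abandon this connection and retry
                        break
                stream.close()
                if not too_slow:
                    return bytes(buf)        # finished (or stream ended) at a healthy rate
            raise IOError("chunk read stayed below the minimum sustained speed")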

  17. If your minimum download is set higher it should not be able to download only 1MB at a time. Mine is set to 10MB, for example. It simply cannot download less than a full chunk on that particular drive at a time. That's one of the reasons that I find your hashing prefetcher settings to be a bit redundant. If you're not prefetching any more than one chunk at a time, the minimum download could handle that setting all by itself.

     

    No disk usage should ever dismount your drive. That indicates other problems. Specifically, the drive dismounts because of read/write errors to the provider. If it's happening during heavy usage it's probably related to timeouts from your system I/O. Adjust the relevant setting in the config file in your CloudDrive directory and see if that helps. See my guide here, near the bottom, for specifics: https://www.reddit.com/r/PleX/comments/61ppfi/stablebit_clouddrive_plex_and_you_a_guide/

     

    And I agree, it wasn't a controlled test by any means; other tools were using the drive at the time I tried to defrag it. I haven't given it a second attempt. I needed to create a new disk anyway, and preallocation eliminates the fragmentation problem.

     

    Similar point with the minimum download size: my initial drive configuration had a 1MB minimum, my new one uses 5MB, which should hopefully perform better (fewer API requests as well).

     

    Hopefully the final builds will better guide users in setting these, or ideally configure them more dynamically as needed.

     

    Speaking of which, are there any other tips from the advanced config? LocalIo_ReleaseHandlesDelayMS in particular looks interesting.

  18. Right. So, what's important is not the 1MB part, but the 1MB in relation to the time window you've set. YOUR setup will only prefetch if 1MB of data is requested in less than 3 seconds. That's a pretty big request, particularly for a torrent client--where many downloads are still measured in the KB/sec range. But you say you have different settings for seeding, so I guess that's fine. I honestly think I would just disable the prefetcher for hashing files. I'm not sure if it really adds anything there.

    Don't disable it for hashing. Watching the technical log shows that (at least rtorrent) hashes files by requesting them 1MB at a time, and only requests the next megabyte after the previous read finishes. Furthermore, each 1MB request shows a download speed, implying each megabyte of the CD chunk is being downloaded independently. Hashing rates skyrocket with the prefetch settings I've used versus no prefetcher.

     

     

    In any case I think you're both dramatically overestimating the importance of file data being stored in sequential chunks, and underestimating the intelligence of the CloudDrive prefetch algorithms. I think you're making assumptions about the nature of the prefetcher that may not be true, though; until documentation is completed, we can probably only speculate.

     

    For what it's worth, you can defragment a CloudDrive--if you just want to eliminate the problem altogether. 

    One thing I'm certain of is that the prefetcher currently only queries subsequent chunk numbers. This is obvious from the technical logs as well. It has some clever logic around existing cached blocks, but it does not find the next chunk number for a file, simply the next chunk in the CD. In my experience, the prefetcher will never prefetch the correct file data if the file is not stored in sequentially numbered chunks.

     

    Though likely only Alex could give us the definitive answer on how it works at the moment.

     

    I actually tried the Windows disk defragmenter, but for me it caused the drive to disconnect on version 854 during the analyze step.

  19. Your prefetcher settings are probably too conservative depending on what you're trying to accomplish.

     

    Mine are set up this way because I want CloudDrive to start pulling down content to the local disks when someone starts actively downloading one of my seeds. As such, I want it to respond at a much lower rate than 1MB in 3 secs because many people download from my seeds much slower than that.

     

    My config is different for seeding (a longer time window, and more data fetched than one block if using a sequential drive); what I gave was my hashing config.

    Since we both have 1MB triggers, we both should start caching after the client loads the first megabyte to give to the peer. You are correct that a longer time window (while producing more false positives) will allow prefetching for slower peer connections.

     

    But that impact seems minimal; particularly on scrambled drives, the minimum download size should result in caching slightly more than you need with each read, and if connection speeds are really slow, CD probably isn't the bottleneck.

     

     

    I also see no reason for you to limit yourself to a single CD chunk of prefetch unless your storage situation is so dire that you simply do not have any overhead to spare.

     

    The limit of a single CD chunk is because, if the file's chunks are non-sequential, the next chunk will contain a totally different and useless file. More is better only if the data was preallocated, or written to the CD after a full local download first.

     

    Moral of the story: always preallocate. There are still significant additional improvements that could be made to the cache and prefetcher to better support this type of usage (as mentioned in my earlier post).

  20. My seed drive can hash FAR more than 20GB/day. Now I'm just wondering about your settings. What are they?

     

    I recently changed quite a few settings that have greatly improved performance; rtorrent downloads about twice as fast now.

     

    Overall two settings are crucial:

    1. Having the torrent client preallocate files (so that chunks are sequential). This solves many problems, specifically the prefetcher not fetching useful chunks.
    2. Optimal prefetch settings; the breakdown (see the sketch after this list) is:
    • 1MB prefetch trigger = the size the torrent client attempts to hash at a time
    • 20MB read ahead = the provider chunk size the filesystem was set up with (you might want this 1MB lower, as it actually flows into the next chunk, or possibly the exact torrent piece size of the file you're hashing). This can (and should) be higher if you know a torrent has its data stored in sequential chunks in the CloudDrive, but if that is not the case, the additional prefetched data will not be useful.
    • 3 second time window = roughly the longest any read request should take. You want this low enough that apps trickle-reading data don't have enough time to read the trigger amount and cause a useless prefetch, but high enough that the hash check has time to read the 1MB. 1 second works for me as well.
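
    To illustrate how I understand those three settings interacting, here's a sketch of my mental model from watching the technical log; it is not the actual CloudDrive prefetch implementation:

        from collections import deque
        import time

        TRIGGER_BYTES = 1 * 1024**2      # prefetch trigger
        WINDOW_SECS = 3.0                # time window
        READ_AHEAD_BYTES = 20 * 1024**2  # forward amount fetched once triggered

        recent_reads = deque()           # (timestamp, bytes) of recent sequential reads

        def on_read(nbytes, prefetch):
            """Call for each sequential read; prefetch(nbytes) is a placeholder."""
            now = time.monotonic()
            recent_reads.append((now, nbytes))
            while recent_reads and now - recent_reads[0][0] > WINDOW_SECS:
                recent_reads.popleft()   # drop reads outside the time window
            if sum(n for _, n in recent_reads) >= TRIGGER_BYTES:
                prefetch(READ_AHEAD_BYTES)   # pull down the next ~20MB chunk
                recent_reads.clear()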

     

    The remainder of my settings are the established optimal Plex settings, the same as yours in all other respects, except a 5MB minimum download and different thread counts.

     

    The solution really shouldn't depend on users reconfiguring the cache for whatever scenario they're running, though. This should be an easy change to the cache: if you see a chunk being read 1MB at a time, maybe you should just automatically cache the full chunks following that initial 1MB. Even better if the prefetcher were file-boundary and chunk-placement aware, and could pull the file's next chunk even if it wasn't the sequentially next one.

  21. You're talking about the inarguable issues that it has downloading torrent content. I'm talking about long-term storage for seeding for months or even years after a download. Many old torrents are rarely downloaded and poorly seeded. A CloudDrive, particularly paired with one of the unlimited cloud providers, can host that content essentially indefinitely. 

     

    You've hit exactly upon my goal: long-term seeding without long-term storage costs. I'm trying to perform that feat from within the same instance that downloads, which is important because the drive can only be mounted on one PC at a time. This instance is a high-bandwidth, unmetered VPS with a tightly capped SSD capacity; both of these details are important, as you'll see later.

     

    The crux of the problem I'm trying to get addressed here is that performing hash checks on torrents sent straight to the drive from a torrent client (an important distinction from files placed on a CloudDrive after being fully downloaded first) does not work.

     

    Incomplete downloads are not an issue, since seeds are already downloaded. Hash checks take time, but like any server-based storage solution, proper management can minimize if not eliminate the need for them once they're hashed once.

    Ultimately, though, if you just want a drive to sit there and store content and seed it to your trackers CloudDrive works just fine.

     

    The fact that hashing straight from the CloudDrive takes about a day per 20GB with 4 cores means no amount of infrastructure optimization or upgrades will make it scale, particularly since most clients also restrict hashing to one torrent at a time.

     

    This actually has one important implication for your simple solution.

     

    The simple solution is obviously to just download to a local drive, upload the completed content to your CloudDrive, hash it (once), and seed from there forever more. 

     

    This is the ideal pipeline, since the quantity of torrents is limited only by the bandwidth available to empty the upload queue:
    add_torrent -> (download -> upload_queue) -> (upload -> clouddrive) -> seed
    The only time data is written to local disk is when placing it in the CloudDrive upload queue. This allows for torrents larger than the entirety of local storage capacity.
     
     
     
    While this is all fine and dandy, hashing may still need to be performed, whether due to fastresume corruption or unscheduled shutdowns. This simply can't be done on an online copy, so a new process must be performed every time hashing is required:
    torrent_needing_checking -> (download -> local) -> client_hash -> (upload -> clouddrive or delete & symlink) -> seed
    

    This requires the entirety of the torrent to be stored on local disk until it has completely finished checking. That means parallelizing the first half of the process is storage limited, torrents larger than local capacity can't be fixed, and you are practically restricted to checking one or two files in parallel.

     

    This works all right, as I've said before (see the individual client problems in my earlier post), but if anything goes wrong during a download or afterwards, the process must be performed entirely anew for every faulty torrent, which is tedious to do manually, harder to do programmatically, and not compatible with my requirement of working with files larger than local storage.

     

    Thanks to StableBit's seamless presentation in Explorer, provided by the cache and upload queuing, files many times larger than local storage capacity should be simple to store and use directly from the cloud, but the hashing problem makes any direct torrenting (especially of large torrents) impractical for anything beyond very light home usage.

     

    This style of usage (to my knowledge) can only be done with StableBit, or by fusing an rclone mount with a caching system, which is not yet heavily documented and is still under development for Windows.

  22. In any case, CloudDrive DOES work for torrents. In particular, it makes a great drive to hold long-term seeds. The downside, as observed, is that hash checks and such will take a long time initially, but once that's completed you should notice few differences as long as your network speeds can accommodate the overhead for CloudDrive. 

     

    I'd say it's far from a great drive to torrent to, but in a pinch it works. To recap the state of the Windows torrent clients:

     

    All of them hash impossibly slowly (never let a torrent client close with a partial download, ever).

     

    rtorrent hashes a tiny bit quicker, but has download speeds under half of what should be sustainable (probably Cygwin overhead).

     

    qBittorrent will slowly accumulate more and more disk overloads, before locking up at 100% overload after a few hours.

     

    Vuze will download, pause while it flushes to disk, and write an unusually large amount of extra data; overall it's probably the slowest client, though it seems to share the same minor hashing optimization as rtorrent.

     

    Transmission, uTorrent, and Deluge all work fairly similarly to the above; I haven't yet done detailed performance testing, as they have a bad habit of crashing and then needing to re-hash even completed downloads.

     

     

     

    None of them properly utilize the upload queue, as it never exceeds a small number of queued operations.

     

    I've become quite certain that the clients are causing severe fragmentation, which harms prefetching performance. For this reason, I believe the single most important setting any of these clients should have enabled is preallocate storage. This creates a considerable write queue, but decreases out-of-sequence chunks and improves initial download speeds (the file is already cached). Use with caution, as starting multiple torrents simultaneously can be too much for the cache to handle.
