
bblue

Members · 27 posts · Days Won: 1 (last won the day on February 6, 2014)

  1. No one with any ideas or suggestions about why drives that should be available for the pool aren't showing up?
  2. Hi, I've been using the above for at least two years with no issues. The main host is W8.1 Pro running WHS2011 as a virtual guest. There is a 16-drive pool on the host, and a four-drive pool on the WHS2011 guest. Those four drives are offline in the host so they can be assigned as physical drives in WHS, and that all works fine. But recently I had two of the four drives in a non-duplicating pool on WHS fail within days of each other: a hardware failure on one, and a big chunk of bad sectors on the other. WHS got all confused and couldn't do anything useful, and there was too much data loss to try to save anything, so I started over after fixing WHS's problems.
     So drives F: and G: on WHS are new drives, and J: and K: are the drives that have been there for the last couple of years. To be consistent, I did a quick format and drive letter assignment on all four drives. Now, in WHS, all drives are empty and available, mounted correctly and all. But when DrivePool is run, it only shows the F: and G: drives as available for the pool (non-pooled). I assigned them, no problem. But what could be causing the other two not to show up? All four look identical when viewed in the OS (with hidden files showing), and since they were all quick-formatted, they should be in identical condition. I'm at a loss.
     I've gone ahead and started backups on the half-size pool I have, and can add the other two when I figure out the problem. Any assistance here? Is there some place I need to delete an entry for the other two because they were the active ones in the previous pool? Thanks for any info. --Bill
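     (For anyone comparing drives in the same situation: a quick way to check whether the old drives still carry leftover pool metadata is to list the hidden folders at each drive root, since a pooled drive usually carries a hidden PoolPart.* folder. This is just a throwaway sketch; the drive letters are the ones from this post, so adjust as needed.)

        # Sketch: list hidden folders at the root of each drive so the new
        # drives (F:, G:) can be compared against the ones DrivePool ignores
        # (J:, K:). Windows-only; drive letters are from the post above.
        import os, stat

        def hidden_root_folders(drive_letter):
            """Return the hidden directories at the root of the given drive."""
            root = drive_letter + ":\\"
            return [e.name for e in os.scandir(root)
                    if e.is_dir()
                    and e.stat().st_file_attributes & stat.FILE_ATTRIBUTE_HIDDEN]

        for letter in ("F", "G", "J", "K"):
            print(letter + ":", hidden_root_folders(letter))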
  3. Alex, I just read your analysis. Are those times shown in minutes? Or what? I'm not sure what they mean, since drive throughput at certain times is the issue. I don't think you're quite duplicating the setup here wrt client computers, as your backups seem to be quite a bit smaller, and they don't take into account what happens when the global files I referenced start becoming huge and you can actually see the effects of the slow reads and writes specifically on those files. That's when much time is wasted in the total backup time, because those files are referenced repeatedly for each cluster of blocks sent from the client to the WHS server. That can only be seen after the WHS server has backed up several systems with fairly good-sized filesystems. The global .dat file will be in excess of 40GB, and the per-volume file for each drive on each machine may be as much as half of that.
     Even in your test, though, simply by watching the DrivePool read/write activity, when the read and write indicators are both active on one filename, the throughput will be very, very slow: 4-45MB/s, but mostly in the low 20s or lower. In your test case, since you aren't accumulating global data in the WHS backup hierarchy, the amount of time spent doing these read/writes is proportionately much smaller relative to the backup, and if you're not watching DP you might not be aware of it.
     I've attached a directory listing of the client backup folder. I draw your attention to the size of the file GlobalCluster.4096.dat, at over 38GB. This file contains a summary of all backups of all drives on the client machines. Also S-1-5-21-2138122482-2019343856-1460700813-1026.F.VolumeCluster.4096.dat, at a little over 34GB; that is drive F: on machine 1026 (my DAW). You'll also see a number of multi-GB files of these types. These are what DP has speed problems with.
     If you watch DP and a network monitor for inbound traffic from the clients, you see that the actual data blocks come in quite rapidly, at about interface speed, just below 125MB/s. After that, there is a r/w access first to the specific drive-and-machine VolumeCluster file, which is many minutes long at the slow speeds I have referenced. When that is done, there's a little housekeeping, then r/w access to GlobalCluster.4096.dat at the same snail's-pace speeds. This file in particular is hit after each cluster of filesystem updates from the client has been received. The transfers take maybe 3-4 minutes, but this r/w operation can take up to 20 minutes! Each time! Work out the math: it's slower than slow and kills the backup speeds. During those times DP is showing what's going on and the speed at which it's going! Can't miss it.
     Even without large filesystems on individual machines, GlobalCluster.4096.dat grows and grows with each daily backup cycle, and thus WHS gets slower each time. If I run a series of backups on a non-DP pool drive, the speed is pretty decent even with the large-file access. So the issue has to do with whatever WHS and DP are doing to end up with such slow speeds when r/w'ing large files. Though I thought I covered all this before, maybe it was too fragmented to understand, so I hope this explanation and the directory list are helpful. To put this in perspective for a single-machine backup, my DAW would take just shy of 4 DAYS to back up the first time because of this speed issue. --Bill Client-Bak-Dir.zip
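     (To put rough numbers on the "work out the math" remark above, here's a quick back-of-the-envelope sketch. The 38GB size and the MB/s figures come from the post; the script itself is only illustrative.)

        # Rough arithmetic: time for one full read/write pass over a 38GB
        # GlobalCluster.4096.dat at the rates mentioned above.
        size_mb = 38 * 1024                      # ~38GB expressed in MB

        for rate_mb_s in (20, 45, 250):          # observed low, observed high, native SATA
            minutes = size_mb / rate_mb_s / 60
            print(f"{rate_mb_s:>3} MB/s -> {minutes:5.1f} minutes per full pass")

        # At ~20MB/s a full pass is over half an hour, so even a partial
        # read/write of the file easily accounts for 20-minute stalls after
        # each 3-4 minute cluster transfer.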
  4. Alex, Christopher, Any news on this issue? --Bill
  5. Right. OK, running .481 and with FileBalance_BackgroundIO set to False, I'm not seeing any significant change in on-the-pool I/O. It could be ever so slightly higher, maybe 5-10%, but since I can't really A/B the exact same scenarios I can't be sure. For the most part it's still sub-30MB/s on any of those operations. --Bill
  6. Yes, I updated to .467 some time ago, and will upgrade again (later today) to .481 after WHS gets done with its post-backup gyrations. Thanks Christopher, I will make the FileBalance_BackgroundIO change when I have the service down for the version upgrade to .481. --Bill
  7. Hi Alex, I've pretty much come to the conclusion that it's not balancing- or duplication-I/O related. I'm not using duplication on this particular pool, and there seems to be very little (if any) background balancing ever occurring. The four drives in this pool are way out of balance from a usage standpoint, and very little seems to be done to change that. A visual guess at the usage numbers:
     Drive 1 (G) at 15-20%
     Drive 2 (H) at 55%
     Drive 3 (J) at 50%
     Drive 4 (K) at 1% (designated as a feeder)
     Balancing is set for once per day at 7pm, and the balancers, in order, are: Archive Optimizer, Scanner (it is disabled in services for now), Volume EQ, Prevent Drive Overfill. I believe these are at the default settings or very close.
     For testing, I had moved the large GlobalCluster.4096.dat to the feeder disk (K), but a couple of days later, during normal operation of WHS, it now appears on (G). So apparently the balancing is doing something, or WHS re-creates it during maintenance. But if WHS did it, wouldn't a new file be preferentially placed on the feeder (K)?
     Regarding file I/O priority: does DrivePool_BackgroundTasksVeryLowPriority ("Sets the CPU priority of background tasks such as re-measuring, background duplication and re-balancing to IDLE (Windows Priority 1). Note that this setting does not affect I/O priority, which is set separately.") refer to CoveFs_BoostNetworkIoPriorityWrite ("Boost the priority of write network I/O."), or something else (which I can't find)?
     Standard write from network I/O seems to be right at whatever the network speed is, or about 125MB/s in my case when the GigE interface is maxed. Have you found out anything interesting from your tests? --Bill
  8. That makes sense, but in practice, at least in WHS2011 during backups, it doesn't seem to hold up. If I'm interpreting what happens in WHS during the backup phase correctly, the very time-consuming tasks involve pretty large files that are somehow being appended to, or inserted into, from smaller files. I don't know how they are actually accomplishing this, but it's hard to imagine they are reading and writing the entire file for that length of time and ending up with a larger file of the same name. The sequence seems to be:
     1. Read system and block information from the backed-up computer.
     2. Perform analysis of which blocks will be sent (i.e. have changed on the computer).
     3. Send the blocks as a cluster of individual files.
     4. One at a time, "append" those to <SID>-1026.<drive>.VolumeCluster.4096.dat.
     4a. During this time the DP Disk Performance display shows the drive which hosts the above file being read and written at the nominal rate of 25MB/s.
     5. Then either the same data or a summary of it is appended to GlobalCluster.4096.dat, which can be a very large file containing a summary of all the blocks backed up from all drives on all machines. This file in particular grows each day, and at the moment mine is about 35GB in size.
     5a. During these long append times, the DP Disk Performance display shows the Global file on its hosting drive as the sole file being read and written to.
     I believe that in steps 4a and 5a, besides being extremely slow, DP is somehow misrepresenting what is actually occurring, which is a read from one file and a write to another on the same drive (usually). What is displayed in DP makes sense in most cases, but not these two. --Bill
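     (To make steps 4-5a concrete, here's a toy sketch of the append pattern described above. The file names follow the post, the paths are made up, and the real WHS backup engine is obviously doing something more involved, so treat this purely as an illustration.)

        # Toy illustration of steps 4-5a: each received cluster file gets
        # appended onto a huge .dat on the same drive, so the spindle spends
        # the whole operation alternating between a read and a write.
        import shutil

        COPY_BUFFER = 4 * 1024 * 1024   # 4MiB chunks

        def append_cluster(cluster_path, target_path):
            """Append one received cluster file onto the large .dat file."""
            with open(cluster_path, "rb") as src, open(target_path, "ab") as dst:
                shutil.copyfileobj(src, dst, COPY_BUFFER)

        # Hypothetical paths, for illustration only:
        # append_cluster(r"D:\Backups\incoming-block.new",
        #                r"D:\Backups\GlobalCluster.4096.dat")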
  9. Haven't heard anything from StableBit yet, but I have continued my testing. And rather than relying on the stats displayed by the Disk Performance display in DrivePool, which, as Christopher suggests, only show what is going through the DrivePool driver, I have taken to monitoring drive read/write speeds in the Windows Performance Monitor software, which can display the data as updating text or different graph types. Independently of that, I have checked other possibilities by 1) disabling all balancing plugins (no difference) and 2) enabling one drive as a feeder (non-duplicating pool), which is said to be faster overall. It may be under some circumstances, but for my issue it makes no difference.
     It seems that *any* writing that is controlled internally by DrivePool has a very limited throughput on the disk. If data comes in through the DrivePool filesystem, the throughput is considerably higher and very close to the native capabilities of the hard drives. For example, typical write speed for functions occurring within DrivePool (a copy to the same or different disks, shuffling files between drives as in balancing) seems to be limited to the range of 10MB/s to 45MB/s. Read speeds for the same operations can be up to three times that, but are typically around 60-65MB/s. Native disk throughput for newer-model SATA 6G drives and SATA III controllers is 250MB/s or more.
     It's this internal data rate that is causing the significant problems with WHS2011, but it's pretty slow for anything and is undoubtedly the cause of long balancing and duplication times. So now the question is whether it's by design or a bug. And in the case of a dedicated backup server, can it be made optional or variable? --Bill
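     (As a sanity check on the native-throughput figure, this is the sort of throwaway script that measures raw sequential write speed on a drive outside the pool. The path is a placeholder and the numbers will obviously vary with drive and controller. Note that writing to the pool's drive letter goes through the normal filesystem path, so it measures the fast case, not the internal balancing/duplication case.)

        # Quick sequential-write test. Path is a placeholder; run it against
        # a plain NTFS drive, and optionally against the pool's drive letter
        # for the normal (filesystem) write path.
        import os, time

        def write_test(path, total_mb=2048, block_mb=4):
            """Write total_mb of data to `path` and return the MB/s achieved."""
            block = os.urandom(block_mb * 1024 * 1024)
            start = time.perf_counter()
            with open(path, "wb") as f:
                for _ in range(total_mb // block_mb):
                    f.write(block)
                f.flush()
                os.fsync(f.fileno())        # make sure it actually hit the disk
            elapsed = time.perf_counter() - start
            os.remove(path)
            return total_mb / elapsed

        print("non-pooled drive:", round(write_test(r"E:\speedtest.bin")), "MB/s")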
  10. A suggestion, if possible. Could the code in Disk Performance be changed so that instead of showing one filename (unknown as to whether it is the read or the write file), hovering over the up arrow shows the read filename and hovering over the down arrow shows the write filename? It could be very useful if practical in the code. --Bill
  11. I think I have a handle on what's going on with this. The reason it looks like a file is being copied is because it is, sorta. Actually, one file is being appended to, or inserted into, another each time there is an exchange of data packets from the WHS client software. There are other data files transferred, but there are two of particular concern because they grow in size. The two are:
     GlobalCluster.4096.dat
     S-<ID #>-<backup client #>.<drive letter>.VolumeCluster.4096.dat, e.g. S-1-5-21-2138122482-2019343856-1460700813-1026.F.VolumeCluster.4096.dat
     This naming follows pretty much throughout the transfers and is unique for each machine and drive letter on that machine. As I look at the filesystem right now (backups are live), GlobalCluster.4096.dat is up to 16,844,800KB (16.8GB) and the other is at 12,277,760KB. At specific points in time after large transfers of data from the client, DP will go to its read/write mode on one drive, and when it's done, one or the other file will have increased in size. Both are increased during one full procedural cycle.
     Now, during these apparent read/write cycles, DP's throughput is between 9 and 25MB/s. Horribly slow. That seems to be the standard rate for any of these operations (balancing, duplication, etc.), which is probably why there have been many comments about the slowness. And what's worse, during backup exchanges it blocks the progress of the data exchange from the client! So the minutes spent dawdling around take precious time away from the data transfer. But it's only these particular types of operations that are slow. Receiving data directly into the pool can easily exceed 110MB/s from a network link.
     This behavior does not appear during a transfer which bypasses DP and goes straight to a drive. The operations are just so fast that the time spent on the read/write cycle is barely significant. Also, the effect is far less if a pool consists of just one drive. You start really noticing the slowdowns and blocking with two drives or more.
     I ran the trace mode for about 15 minutes to capture two or three full exchange cycles and will post 'bblue-Service.zip' as instructed in a few minutes. Hopefully it will help find the bottleneck. A backup of my workstation, for example, is 3.44TB across four filesystems, and this alone will take about 1.5 days solid to back up. Of course, subsequent backups are much smaller, but still... Oh, I upgraded DP to 2-BETA-467 just before this testing. It seems to behave the same as .432 in this regard. --Bill
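     (Since the names carry the SID, the client number and the drive letter, a tiny parser makes it easier to see which machine/drive a given VolumeCluster file belongs to. This is only a sketch based on the example name above; the "4096" is presumably the cluster/block size.)

        # Split a VolumeCluster file name into its parts, based on the
        # example name above. Sketch only.
        import re

        name = "S-1-5-21-2138122482-2019343856-1460700813-1026.F.VolumeCluster.4096.dat"

        m = re.match(r"^(S(?:-\d+)+?)-(\d+)\.([A-Z])\.VolumeCluster\.(\d+)\.dat$", name)
        if m:
            sid_prefix, client, drive, cluster = m.groups()
            print("SID prefix   :", sid_prefix)
            print("client number:", client)      # 1026 -> the DAW in this thread
            print("drive        :", drive + ":")
            print("cluster size :", cluster)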
  12. I'm performing some backups right now to a single 3TB drive with no DrivePool in the loop at all. It's very fast, reaching and holding GigE cable speeds (900Mbit/s or better). During this, the drive is writing in real time and more than keeping up. There are no long pauses of heavy drive activity, and no blocking of network traffic. It looks like the WHS client software on the machine being backed up writes to the net in chunks of typically 40GB, 4GB at a time. It does so one right after another, as *.new files. Those accumulate until there's a little burst of small control files and a commit.dat. Then it renames all the *.new's to *.dat's and waits for the client to send the next batch, which takes no more than a minute or so, depending on the filesystem. So the multi-minute pauses after each *.new file that I was seeing appear to be caused by DrivePool, or perhaps just the version I was using (2.x BETA .432). The next test will be the same single drive in a pool of one, with the 2.x BETA .467 version of DP. --Bill
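     (A toy version of that accumulate-then-commit pattern, just to make the observed sequence concrete: *.new files pile up, a commit.dat appears, everything is renamed to *.dat. The folder path is a placeholder, and this only mirrors the behaviour described above, not actual WHS code.)

        # Toy sketch of the observed commit pattern described above.
        from pathlib import Path

        backup_dir = Path(r"D:\Backups")        # placeholder path

        def commit_pending(folder):
            """If the commit marker exists, promote all *.new files to *.dat."""
            if not (folder / "commit.dat").exists():
                return 0
            renamed = 0
            for new_file in folder.glob("*.new"):
                new_file.rename(new_file.with_suffix(".dat"))
                renamed += 1
            return renamed

        print(commit_pending(backup_dir), "files committed")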
  13. You're saying what I'm saying, which is what I'd expect to happen. But when you watch the Disk Performance area under Disks in the DrivePool UI, it clearly is doing a specific copy, with both an up and a down arrow showing the same throughput, and it's always copying one of the file types I mentioned to the same drive. That drive will be the only one whose LED is on 100% of the time. As soon as that stops, all the network activity and normal multi-drive activity resumes. It's very strange. Whatever OS is connecting to the physical drives will manage them, virtual or not (if they are declared as physical drives). But that really isn't an issue, because what they're managing is a data stream, just like any other drive. Yes, I do understand what Christopher is suggesting: essentially, for a test, bypass the pool altogether and make the target a single drive. That's coming up. --Bill
  14. Thanks Christopher. I will do that over the weekend, along with another test I want to try. I'd have to ask, though: if WHS only knew about the drive pool as a unit (a single 'drive'), how would it know how to copy to the same physical drive it was reading from? I don't believe it could, hence my suspicion of another culprit. --Bill