
bblue

Members
  • Posts

    27
  • Joined

  • Last visited

  • Days Won

    1

Everything posted by bblue

  1. No one with ideas or suggestions about why drives that should be available for the pool aren't showing up??
  2. Hi, I've been using the above for at least two years with no issues. The main host is W8.1 Pro running a virtual host of WHS2011. There is a 16-drive pool on the host, and a four-drive pool on the WHS2011 guest. Those four drives are offline in the host so they can be assigned as physical drives in WHS, and that all works fine.

     But recently I had two of the four drives in a non-duplicating pool on WHS fail within days of each other. It was a hardware failure on one, and a big chunk of bad sectors on the other. WHS got all confused and couldn't do anything useful, and there was too much data loss to try and save anything, so I started over after fixing WHS's problems. So drives F: and G: on WHS are new drives, and J: and K: are the drives that have been there for the last couple of years. To be consistent, I did a quick format and drive letter assignment on all four drives.

     Now, in WHS all drives are empty and available, mounted correctly and all. But when DrivePool is run, it only shows the F: and G: drives as available for the pool (non-pooled). I assigned them, no problem. But what could be causing the other two not to show up? All four look identical when viewed in the OS (with hidden files showing). And since they were all quick-formatted, they should be in an identical condition.

     I'm at a loss. I've gone ahead and started backups going on the half-size pool I have, and can add the other two when I figure out the problem. Any assistance here? Is there some place I need to delete an entry for the second two because they were the active ones in the previous pool? Thanks for any info. --Bill
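
     One thing worth checking here (a hedged sketch, not a confirmed diagnosis): DrivePool marks a drive as a pool member by creating a hidden PoolPart.<GUID> folder in the drive root, and a leftover marker from the old pool could explain why a drive isn't offered as non-pooled. Something like this quick check, assuming Python is available on the server, would show any leftovers:

        # List any leftover hidden PoolPart.* folders on each candidate drive.
        # Drive letters are the ones from this post; adjust as needed.
        import os

        for drive in ("F:\\", "G:\\", "J:\\", "K:\\"):
            try:
                # os.listdir also returns hidden folders on Windows
                leftovers = [d for d in os.listdir(drive) if d.startswith("PoolPart.")]
            except OSError as e:
                print(f"{drive} unreadable: {e}")
                continue
            print(f"{drive} -> {leftovers or 'no PoolPart folders'}")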
  3. Alex, I just read your analysis. Are those times shown in minutes? Or what? I'm not sure what they mean, since drive throughput at certain times is the issue.

     I don't think you're quite duplicating the setup here wrt client computers, as your backups seem to be quite a bit smaller, and they don't take into account what happens when the global files I referenced start becoming huge and you can actually see the effects of the slow reads and writes specifically on those files. That's when much time is wasted in the total backup time, by those files being referenced repeatedly for each cluster of blocks sent from the client to the WHS server. That can only be seen after the WHS backup filesystem has accumulated backups of several systems that are fairly good sized. The global .dat file will be in excess of 40GB, and the global file for each filesystem of the drive on each machine may be as much as half of that.

     Even in your test, though, by simply watching the DrivePool read/write activity, when the read and write indicators are both showing active on one filename, the throughput will be very, very slow: 4-45MB/s, but mostly in the low 20's or lower. In your test case you aren't accumulating global data in the WHS backup hierarchy, so the amount of time spent doing these reads/writes is proportionately much smaller relative to the backup, and if you're not watching DP you might not be aware of it.

     I've attached a directory listing of the client backup folder. I draw your attention to the size of the file GlobalCluster.4096.dat, at over 38GB. This file contains a summary of all backups of all drives on the client machines. And S-1-5-21-2138122482-2019343856-1460700813-1026.F.VolumeCluster.4096.dat at a little over 34GB. That is drive F: on machine 1026 (my DAW). You'll also see a number of multi-GB files of these types. These are what DP has speed problems with.

     If you watch DP and a network monitor for inbound traffic from the clients, you see that the actual data blocks come in quite rapidly, at about interface speed, just below 125MB/s. After that, there is a r/w access first to the specific drive and machine VolumeCluster file, which takes many minutes at the slow speeds I have referenced. When that is done, there's a little housekeeping, then r/w access to GlobalCluster.4096.dat at the same snail's-pace speeds. This file in particular is hit after each cluster of filesystem updates from the client has been received. The updates take maybe 3-4 minutes, but this r/w operation can take up to 20 minutes! Each time! Work out the math; it's slower than slow and kills the backup speeds. During those times DP is showing what's going on and the speed at which it's going! Can't miss it.

     Even without large filesystems on individual machines, GlobalCluster.4096.dat grows with each daily backup cycle, and thus WHS gets slower each time. If I run a series of backups on a non-DP pool drive, the speed is pretty decent even with the large file access. So the issue has to do with what WHS and DP are doing together to produce such slow speeds when r/w'ing large files.

     Though I thought I covered all this before, maybe it was too fragmented to understand, so I hope this explanation and the directory list are helpful. To put this in perspective, as a single machine backup, my DAW would take just shy of 4 DAYS to back up the first time because of this speed issue. --Bill

     Client-Bak-Dir.zip
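
     To make the "work out the math" above concrete, a quick back-of-the-envelope calculation using only the sizes and rates reported in this post:

        # Rough time for one full pass over the 38GB GlobalCluster file
        # at the observed DrivePool-internal rate vs. the GigE wire rate.
        size_mb = 38 * 1024        # GlobalCluster.4096.dat, ~38GB
        slow_mbs = 20              # typical observed internal throughput
        wire_mbs = 125             # inbound client data rate at GigE speed

        print(f"one pass at {slow_mbs} MB/s: {size_mb / slow_mbs / 60:.0f} min")
        print(f"one pass at {wire_mbs} MB/s: {size_mb / wire_mbs / 60:.0f} min")
        # ~32 min vs ~5 min, repeated after every cluster of blocks;
        # this is where the multi-day first backups come from.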
  4. Alex, Christopher, Any news on this issue? --Bill
  5. Right. OK, running 481 with FileBalance_BackgroundIO set to False, I'm not seeing any significant change in on-pool I/O. It could be ever so slightly higher, maybe 5-10%, but since I can't really A/B the exact same scenarios, I can't be sure. For the most part it's still sub-30MB/s on any of those operations. --Bill
  6. Yes, I updated to .467 some time ago, and will upgrade again (later today) to .481 after WHS gets done with its post-backup gyrations. Thanks Christopher, I will make the FileBalance_BackgroundIO change when I have the service down for the version upgrade to .481. --Bill
  7. Hi Alex, I've pretty much come to the conclusion that it's not balance or duplication I/O related. I'm not using duplication on this particular pool, and there seems to be very little (if any) background balancing ever occurring. The four drives in this pool are way out of balance from a usage standpoint, and very little seems to be done to change that. A visual guess at the usage numbers:

     Drive 1 (G) at 15-20%
     Drive 2 (H) at 55%
     Drive 3 (J) at 50%
     Drive 4 (K) at 1% (designated as a feeder)

     Balancing is set for once per day at 7pm, and the balancers in order are:

     Archive Optimizer
     Scanner (it is disabled in services for now)
     Volume EQ
     Prevent Drive Overfill

     I believe these are at the default settings or very close. For testing, I had moved the large GlobalCluster.4096.dat to the feeder disk (K), but a couple of days later, during normal operation of WHS, it now appears on (G). So apparently the balancing is doing something, or WHS re-creates it during maintenance. But if WHS did it, wouldn't a new file be preferentially placed on the feeder (K)?

     Regarding file I/O priority, does

     DrivePool_BackgroundTasksVeryLowPriority - Sets the CPU priority of background tasks such as re-measuring, background duplication and re-balancing to IDLE (Windows Priority 1). Note that this setting does not affect I/O priority, which is set separately.

     refer to

     CoveFs_BoostNetworkIoPriorityWrite - Boost the priority of write network I/O.

     Or something else (which I can't find)? Standard write from network I/O seems to be right at whatever the network speed is, or about 125MB/s in my case when the GigE interface is maxed.

     Have you found out anything interesting from your tests? --Bill
  8. That makes sense, but in practice, at least in WHS2011 during backups, it doesn't seem to hold up. If I'm interpreting what happens in WHS in the backup phase correctly, the very time-consuming tasks involve pretty large files that are somehow being appended to or inserted into from smaller files. I don't know how they are actually accomplishing this, but it's hard to imagine they are reading and writing the entire file for that length of time and ending up with a larger file of the same name. The sequence seems to be:

     1. Read system and block information from the backed-up computer.
     2. Perform analysis of which blocks will be sent (have been changed on the computer).
     3. Send the blocks as a cluster of individual files.
     4. One at a time, "append" those to <SID>1026.<drive>.VolumeCluster.4096.dat.
     4a. During this time the DP Disk Performance display shows the drive which hosts the above file being read and written at the nominal rate of 25MB/s.
     5. Then either the same data or a summary of it is appended to GlobalCluster.4096.dat, which can be a very large file containing a summary of all the blocks backed up from all drives in all machines. This file especially grows each day, and at the moment mine is about 35GB in size.
     5a. During these long append times, the DP Disk Performance display shows the file on the drive hosting the Global file as the sole file being read and written.

     I believe that in steps 4a and 5a, besides being extremely slow, DP is somehow misrepresenting what is actually occurring, which is a read from one file and a write to another, on the same drive (usually). What is displayed in DP makes sense in most cases, but not these two. --Bill
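
     For illustration, here's a sketch of the cost difference between a true append and a full read-plus-rewrite, which is my assumption about steps 4a/5a based on the matching read and write rates DP shows (sizes and rates are the ones reported above):

        # Minutes to 'append' to a large file under two models of what
        # WHS/DP might be doing. Numbers are the ones reported above.
        def full_rewrite_min(file_gb, mb_per_s):
            # read the whole file and write it back: two passes over every byte
            return file_gb * 1024 * 2 / mb_per_s / 60

        def true_append_min(delta_mb, mb_per_s):
            # only the new blocks are written; existing size barely matters
            return delta_mb / mb_per_s / 60

        print(f"full rewrite of 35GB at 25MB/s: {full_rewrite_min(35, 25):.0f} min")
        print(f"true append of 4GB at 25MB/s: {true_append_min(4096, 25):.1f} min")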
  9. Haven't heard anything from StableBit yet, but have continued my testing. Rather than rely on stats displayed by the Disk Performance display in DrivePool, which as Christopher suggests only shows what is going through the DrivePool driver, I have taken to monitoring drive read/write speeds in the Windows Performance Monitor software, which can display the data as updating text or different graph types.

     Independently of that, I have checked other possibilities by 1) disabling all balancing plugins (no difference) and 2) enabling one drive as a feeder (non-duplicating pool), which is said to be faster overall. It may be under some circumstances, but for my issue it makes no difference.

     It seems that *any* writing that is controlled internally by DrivePool has a very limited throughput on the disk. If data comes in through the DrivePool filesystem, throughput is considerably higher and very close to the native capabilities of the hard drives. For example, typical write speed for functions occurring within DrivePool (a copy to the same or different disks, shuffling files between drives as in balancing) seems to be limited to the range of 10MB/s to 45MB/s. Read speeds for the same operations can be up to three times that, but are typically around 60-65MB/s. Native disk throughput for newer model SATA 6G drives and SATA III controllers is 250MB/s or more.

     It's the internal data rate that is causing the significant problems with WHS2011, but it's pretty slow for anything, and is undoubtedly the cause of long balancing and duplication times. So now the question is whether it's by design or a bug? And in the case of a dedicated backup server, can it be made optional or variable? --Bill
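
     For anyone wanting to reproduce this kind of measurement without clicking through Performance Monitor, a minimal per-disk throughput logger (assumes the third-party psutil package; it reads the same OS counters PerfMon does):

        # Log per-physical-disk read/write throughput every 5 seconds.
        import time
        import psutil

        prev = psutil.disk_io_counters(perdisk=True)
        while True:
            time.sleep(5)
            cur = psutil.disk_io_counters(perdisk=True)
            for disk, c in cur.items():
                p = prev[disk]
                rd = (c.read_bytes - p.read_bytes) / 5 / 2**20
                wr = (c.write_bytes - p.write_bytes) / 5 / 2**20
                if rd + wr > 1:  # only show disks doing real work
                    print(f"{disk}: read {rd:6.1f} MB/s  write {wr:6.1f} MB/s")
            prev = cur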
  10. A suggestion, if possible. Could the code in Disk Performance be changed so that instead of showing one filename (unknown as to whether it is the read or the write file), hovering over the up arrow shows the read filename, and hovering over the down arrow shows the write filename? Could be very useful if practical in the code. --Bill
  11. I think I have a handle on what's going on with this. The reason it looks like a file is being copied is because it is, sorta. Actually, one is being appended to or inserted into another each time there is an exchange of data packets from the WSS client software. There are other data files transferred, but there are two of particular concern because they grow in size. The two are:

     GlobalCluster.4096.dat

     and

     S<ID #>-<backup client #>.<drive-letter>.VolumeCluster.4096.dat
     e.g. S-1-5-21-2138122482-2019343856-1460700813-1026.F.VolumeCluster.4096.dat

     This naming follows pretty much throughout the transfers and is unique for each machine and drive letter on that machine. As I look at the filesystem right now (backups are live), GlobalCluster.4096.dat is up to 16,844,800KB (16.8GB) and the other is at 12,277,760KB. At specific points in time after large transfers of data from the client, DP will go into its read/write mode on one drive, and when it's done, one or the other file will have increased in size. Both are increased during one full procedural cycle.

     Now, during these apparent read/write cycles, DP's throughput is between 9 and 25MB/s. Horribly slow. That seems to be the standard rate for any of these operations, balancing, duplication, etc., which is probably why there have been many comments about the slowness. And what's worse, during backup exchanges it blocks the progress of the data exchange from the client! So the minutes spent dawdling around take precious time away from the data transfer. But it's only these particular types of operations that are slow. Receiving data directly into the pool can easily exceed 110MB/s from a network link.

     This behavior does not appear during a transfer which bypasses DP and goes straight to a drive. The operations are just so fast the time spent on the read/write cycle is barely significant. Also, the effect is far less if a pool consists of just one drive. You start really noticing the slowdowns and blocking with two drives or more.

     I ran the trace mode for about 15 minutes to capture two or three full exchange cycles and will post 'bblue-Service.zip' as instructed in a few minutes. Hopefully it will help to find the bottleneck. A backup of my workstation, for example, is 3.44TB across four filesystems, and this alone will take about 1.5 days solid to back up. Of course, subsequent backups are much smaller, but still...

     Oh, I upgraded DP to 2-BETA-467 just before this testing. It seems to behave the same as *432 in this regard. --Bill
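
     As an aside, the naming convention makes it easy to inventory which of these files are growing. A sketch (the backup folder path is a guess at a WHS2011 default, so treat it as an assumption; the filename pattern is the one above):

        # Report sizes of the growing .dat files in the client backup folder.
        import os
        import re

        BACKUP_DIR = r"D:\ServerFolders\Client Computer Backups"  # assumed path
        pat = re.compile(r"(S-[\d-]+)\.(\w)\.VolumeCluster\.4096\.dat$")

        for name in sorted(os.listdir(BACKUP_DIR)):
            size_gb = os.path.getsize(os.path.join(BACKUP_DIR, name)) / 2**30
            m = pat.match(name)
            if m:
                print(f"machine {m.group(1)}, drive {m.group(2)}: {size_gb:.1f} GB")
            elif name == "GlobalCluster.4096.dat":
                print(f"global cluster file: {size_gb:.1f} GB")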
  12. I'm performing some backups right now to a single 3T drive with no DrivePool in the loop at all. It's very fast, reaching and holding at GigE cable speeds (900Mbits or better). During this, the drive is writing in real time and more than keeping up. There are no long pauses of heavy drive activity and no blocking of network traffic.

     It looks like the WHS client software on the machine being backed up writes to the net in typically 40GB chunks, 4GB at a time. It does it one right after another as *.new files. Those accumulate until there's a little burst of small control files and a commit.dat. Then it renames all the *.new's to *.dat's and waits for the client to send the next batch, which isn't more than a minute or so, depending on the filesystem.

     So the multi-minute pauses after each *.new file that I was seeing appear to be caused by DrivePool, or perhaps just the version I was using, BETA 2*.432. Next test will be the same single drive in a pool of one and the BETA 2*.467 version of DP. --Bill
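
     A crude way to watch that commit cycle from the server side, if anyone wants to time the pauses themselves (the folder path is an assumption, and the polling is deliberately simple):

        # Poll the backup folder and log how many *.new files are pending
        # and whether a commit.dat marker is present.
        import glob
        import os
        import time

        BACKUP_DIR = r"D:\ServerFolders\Client Computer Backups"  # assumed path

        while True:
            pending = glob.glob(os.path.join(BACKUP_DIR, "*.new"))
            commit = os.path.exists(os.path.join(BACKUP_DIR, "commit.dat"))
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp}  {len(pending):3d} *.new pending  commit.dat: {commit}")
            time.sleep(10)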
  13. You're saying what I'm saying, which is what I'd expect to happen. But when you watch the Disk Performance area under Disks in the DrivePool UI, it clearly is doing a specific copy, with both an up and a down arrow showing the same throughput, and it's always copying one of the file types I mentioned to the same drive. It will be the only drive whose LED is on 100%. As soon as that stops, all the network activity and normal multi-drive activity resumes. It's very strange.

     Whatever OS is connecting to the physical drives will manage them, virtual or not (if they are declared as physical drives). But that really isn't an issue, because what they're managing is a data stream, just like any other drive. Yes, I do understand what Christopher is suggesting. Essentially, for a test, bypass the pool altogether and make the target a single drive. That's coming up. --Bill
  14. Thanks Christopher. I will do that over the weekend, along with another test I want to try. I'd have to ask: if WHS only knew about the drive pool as a unit (a single 'drive'), how would it know how to copy to the same physical drive it was reading from? I don't believe it could, thus my suspicion of another culprit. --Bill
  15. I've been wrestling with a problem that really has me baffled. I'm running W8.1 Pro as a Hyper-V host. On the host there are ten 4TB drives with DrivePool and Scanner (currently disabled). There's one guest OS, WHS2011, running four 2TB drives on native motherboard ports. They are offline in the host, then allocated to WHS in the guest configuration, and become part of a second DrivePool running on WHS. Like on the host, Scanner is disabled. I'm using DrivePool Beta *.432 presently. The host is a quad-core i7 with hyper-threading disabled, and two cores allocated to the WHS guest. It runs around a 3GHz clock with very low CPU utilization on both host and guest. There's plenty of memory.

     Everything 'works', no errors, no blowups, but during backups of other computers in the house, WHS takes prizes for slowness. It's probably 8x slower than my ancient WHSv1 system. The 4-drive pool is the target for client backups. I've been studying just what is happening and noticed there are only occasional bursts of data being sent from the machine being backed up to WHS. They are a couple of minutes in length and typically run in the area of 900Mbits over the net (seen at WHS). But they stop for sometimes as much as ten minutes before another burst. Since these machines are being backed up for the first time (on WHS2011), all blocks should be 'unbacked' and subject to being sent. The standard data blocks written to the DrivePool are 4GB in size (seen in the directory), but there are occasional global data files that could be 17GB or more.

     During the time of no data from the backed-up machine, hovering over the Disk Performance indicator, the drive which is showing solidly active is also showing both read and write activity at the same speed, and it is very, very slow. Like around 15-20MBytes/second. Imagine how long it takes to copy a 4-20GB file at 20MB/second! As soon as that process is done, the data flow from the backed-up computer immediately starts up again. Sometimes the file indicated for the drive enduring the slow transfer is *.new, *.tmp or *.dat. It appears to be copying these files to the same physical drive. It's as if DrivePool is converting what should be a file move or rename into a copy then delete. I can't see any reason for all these copies going on.

     When there is no drive running solid and data is streaming in from the backed-up machine, the drive activity is upwards of 100MB/sec, so I don't think it's the drives themselves, though older ones would certainly be much slower than the latest generation.

     Is this a WHS issue, or a DrivePool issue? It makes using WHS2011 downright painful. I'm on my third day of backing up just one machine and not even halfway done! I don't have problems writing my own data from a remote host to the WHS pool; this strange behavior happens only during backups and only during these copy processes. What to do?

     Oh, I tried without duplication and without balancing, but nothing made any difference. Whether a balancing pass is in effect at the moment doesn't seem to change anything (make it better or worse). I've removed the drives from port adapters and connected all of them directly to the motherboard 6G SATA ports. That improved it slightly, but not significantly. Anyone with thoughts or suggestions? --Bill
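
     For anyone wanting to reproduce the comparison implied here, a sketch of the A/B timing test: copy the same large file onto the pool and onto a raw drive, and compare the rates (paths and drive letters are placeholders; the source should be a single multi-GB file):

        # Time the same large copy onto the pool vs. onto a raw drive to
        # separate DrivePool-internal throughput from native disk speed.
        import os
        import shutil
        import time

        SRC = r"C:\temp\big_test_file.bin"                   # placeholder
        TARGETS = {"pool": r"P:\test.bin", "raw": r"E:\test.bin"}

        mb = os.path.getsize(SRC) / 2**20
        for label, dst in TARGETS.items():
            t0 = time.perf_counter()
            shutil.copyfile(SRC, dst)
            print(f"{label}: {mb / (time.perf_counter() - t0):.0f} MB/s")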
  16. These almost look like auto-nightly builds. Is 472 known to be in reasonable condition? It seems to be working fine, and watching the memory, there hasn't been any significant increase that I notice. So far no directory stalls, either. So that's a good thing. Do you ever find yourself needing to modify mftzone in fsutil? It seems not to apply unless you have gazillions of very small files. Not usually something you would see in a media oriented server. --Bill
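
     For reference, the mftzone setting mentioned here can be inspected with a stock Windows command (shown below wrapped in Python for consistency with the other sketches; run from an elevated prompt):

        # Query the current MFT zone reservation. "fsutil behavior set
        # mftzone 2" would roughly double the default reserved zone, which
        # mainly matters on volumes with very large numbers of small files.
        import subprocess

        out = subprocess.run(
            ["fsutil", "behavior", "query", "mftzone"],
            capture_output=True, text=True,
        )
        print(out.stdout.strip())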
  17. My Placement balancer shows all drives present, with both the duplicated and non-duplicated checkboxes checked for each. That was the default; is it not correct? I'm using 2.1.0.432, which is the latest beta I can find anywhere. Where do I download 46x? There is no troubleshooting command that I can see in 432.

     Why is it a hack? It seems like a good adjustment to make for a system that has numerous drives with high directory and file counts. I would run out of directory caching on certain drives frequently. While it updates and shuffles things around, even completing the directory listing on the requested drive can take a minute or two, and that was before I added DrivePool for testing with ten more drives. Seems quite logical to me. Any further details on the 'hackness' of it? Thanks. --Bill
  18. I've read that DrivePool should correctly handle spinning up drives in the pool as needed, but if a full directory listing is done, all drives may have to be powered up, and this can take some five seconds per drive.

     Right now I'm on my second server build with W8.1 Pro as the host. This server has 18 drives, ten of which are in one drive pool containing audio and video media, to the tune of about 34T, mirrored. Most of the time there is no issue, but occasionally, after several hours away from the computer, I'll still have a window open to one of the media directories and will proceed to drag-'n-drop copy to that folder. If the copy contains a bunch of smaller files like an album, it will copy one or two, and then stop. I've waited up to 15 minutes for it to do something, but nothing happens and I cannot close the window of the open folder, though I can kill the copy process.

     I see something similar occasionally when, after a long time (sufficient for all-drive power down), I open a folder to the top level of the pool. I can see its directory (probably cached), but trying to access a subdirectory, I get the same behavior. This suggests one or more drives not powering up appropriately, or DP somehow not being aware that it/they did. Is any of this familiar?

     I'm not running the Scanner, and the service for it is currently disabled (that'll be another issue I'll post about in a few days). Both of these are the latest V2 BETAs (on the web site). The drives are all Seagate 3T or 4T and one WD 3T. Mostly 4T's are used. Any comments or advice? --Bill
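
     A possible workaround worth trying while the stalls are investigated (a sketch under the assumption that staggered spin-up is the culprit; the drive letters are examples for the underlying pool members, which would need letters or mount points assigned to be reachable directly):

        # Pre-wake each pool member disk before a big drag-and-drop copy.
        import os
        import time

        POOL_MEMBERS = ["D:\\", "E:\\", "F:\\", "G:\\"]  # example letters

        t0 = time.perf_counter()
        for drive in POOL_MEMBERS:
            os.listdir(drive)  # a small read usually forces the disk to spin up
            print(f"{drive} awake after {time.perf_counter() - t0:.1f}s")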
  19. But if it had, there would be corresponding entries in the SMART table, specifically Reallocated_Sector_Ct and, if a reallocation is pending, Current_Pending_Sector. Both of these remained at zero. On point #2, that easily could have happened early on in the logfile, but in the current and previous week the machine was completely static, sealed up and unbothered. So it wouldn't have been likely unless there is a cabling problem of some sort. Will do, thanks Alex.
  20. So Drashna, do I or do I not want to use DirectIO for SMART requests? I can't tell which you are advocating.
  21. I don't see anything labeled UnsafeDirectio. There is a checkbox option "Do not use Direct I/O when querying SMART". Those were all unchecked. I think what you're suggesting is that they should all be checked?

     For the six media drives in the pool, all are on 5-port port multipliers. The first group of five is on ATA Channel 0; the second port multiplier is on ATA Channel 1, but only has one drive on it. The drive with the read error was the second on the first port multiplier. I believe what is being designated as ATA Channel 0 and 1 is the ASMedia AHCI controller with just two ports, part of the 8 SATA ports on the motherboard. Other drives in the system are on the Intel 8 Series/C220 Series SATA AHCI controller, also part of the MB. There are 6 ports on it, but none of those are used for the media server drive pool.

     Another odd thing about all this is that when I placed the 'defective' drive in another machine and did a complete read surface scan, there were no errors. Following that, a complete long format (high level) yielded no errors. SMART showed no reallocated sectors and no pending reallocation sectors. The drive also passed the smartctl long drive test with no issues. Now I'm wondering if this was really a drive error in the first place? It could have been transient, I suppose. --Bill
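
     For completeness, the two attributes in question can be re-checked from the command line with smartmontools; a sketch (the pdN device path is Windows smartctl syntax for physical drive N, and the number here is only an example):

        # Print just the reallocated / pending sector lines from smartctl.
        import subprocess

        out = subprocess.run(
            ["smartctl", "-A", "/dev/pd1"],   # example physical drive
            capture_output=True, text=True,
        ).stdout

        for line in out.splitlines():
            if "Reallocated_Sector_Ct" in line or "Current_Pending_Sector" in line:
                print(line)  # raw values should stay at 0 on a healthy drive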
  22. This is W8.1 Pro, with six 4T drives in the media pool. Two weeks ago, Scanner had finished scanning all drives and pronounced them healthy. One day last week I didn't have any content in my media directory and found that all the drives were missing. Not just offline, but not even showing up in the Windows Disk Manager. Scanner was unresponsive. I rebooted the machine and all media drives re-appeared. A couple of minutes later I started the Scanner UI and noticed that on one drive it had found a bad (512b) sector, and as of the reboot had scanned past it about 4 (tiny) blocks. About that time I received an email from Scanner:

     StableBit Scanner
     Unreadable Sectors Found on "PMSERVER".
     One or more disks are damaged and have data on them that is unreadable:
     ST4000DM000-1F2168 ATA Device - 512 B unreadable (1 sectors)
     Model: ST4000DM000-1F2168
     Serial number: W300M44F
     Case: Server Tower
     Bay: 2-1
     You are receiving this message because you have set up email notifications to be sent to this address from the StableBit Scanner.

     But it was dated as of the reboot, not when this event occurred several days prior. During that time, Scanner was (apparently) locked up and the media drives were not available. Looking at the Windows Event Viewer, there were several Scanner error reports referenced, which I have attached. Maybe they'll help make some sense of this. --Bill

     ScannerErrorRpts.zip
  23. I have a Norco 4220 I'd love to get rid of. It comes with 20 3.5" drive trays. I have the original noisy fans, and lower-velocity, quieter fans are in it now. The lower-velocity fans are suitable for the later generation of 1T to 4T drives. You can use a standard 750-850 watt power supply, or I also have the 3-section power supply with 3 AC connections, designed to make power redundant if the AC inputs are on electrically diverse sources or UPSes. This power supply is loud, but effective. Also have the deep rack slide rails for the case. There are also two AMCC/3Ware RAID controller cards available, with interfacing cables for 8 SATA drives each. There's probably other stuff too, once I start looking through everything. This is a heavy case, so shipping could become an issue if you're too far from the San Diego (southern CA) area. I'm not going to be picky about the prices of things; I just need the room. Any questions about this stuff, please email me at bblue@netoldies.com. Make a reasonable offer and it's all yours. --Bill
  24. I thought I read somewhere that you could just install the newer version over the existing and the installer will do all the right things. ??