Alex
Administrators
  • Posts: 253
  • Joined
  • Last visited
  • Days Won: 49

Reputation Activity

  1. Like
    Alex got a reaction from Tardas-Zib in The Roadmap   
    I've been thinking about how I can better communicate what is in store for each of our products; there are three now, and another one is in the works. Starting today I'll be setting up topics in each forum that I'll be updating on a regular basis. Each post will track what the future holds for that product.
     
    I try to keep a lot of the development driven by user feedback, but most of that feedback doesn't happen in the public forum (it usually comes in through tech support tickets). I'd just like to take this opportunity to highlight the direction that each product is heading in, a kind of roadmap.
     
    I'll be setting up those posts today, so look for them popping up soon in each respective forum.
  2. Like
    Alex got a reaction from imxjihsk in The Roadmap   
    I've been thinking about how I can better communicate what is in store for each of our products; there are three now, and another one is in the works. Starting today I'll be setting up topics in each forum that I'll be updating on a regular basis. Each post will track what the future holds for that product.
     
    I try to keep a lot of the development driven by user feedback, but most of that feedback doesn't happen in the public forum (it usually comes in through tech support tickets). I'd just like to take this opportunity to highlight the direction that each product is heading in, a kind of roadmap.
     
    I'll be setting up those posts today, so look for them popping up soon in each respective forum.
  3. Like
    Alex got a reaction from Tardas-Zib in StableBit DrivePool - Controlling Folder Placement   
    I like writing these posts because they give me feedback as to what the community is really interested in. I can see that my last post about the Scanner was not very interesting; it was probably too technical, and there's probably not much to add to what I've already said.
     
    Well, this time let's talk about StableBit DrivePool. In particular, I'd like to talk about DrivePool beyond 2.0.
     
    Controlling Folder Placement
     
    I think that I have a few great ideas for DrivePool 2.1+ but some of them depend on the ability to control folder (or file) placement, per pool part. I've kind of hinted at this capability in the thread that talked about taking out per-folder duplication, but I think that I've figured out how we can make this work.
     
    What I would like to be able to do in future versions is give you guys the ability to associate folders with one or more disks that are part of the pool, so that any files in those folders would be stored on those pool parts only (unless they're full).
     
    This should be trivial to implement on the file system level, but the balancing framework would need to be enhanced to support this, and I think that I've figured out how to make that work.
     
    Theoretically, you should even be able to use wildcard patterns such as /Virtual Machines/Windows* to associate all of those files with a group of pooled disks.
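     
    Just to make the idea a bit more concrete, here's a rough sketch (plain C, not actual DrivePool code) of how a wildcard placement rule could be evaluated when deciding where a new file may land; the rule structure and every name in it are made up for illustration:
     
    #include <stdio.h>
    #include <string.h>
    
    /* Minimal wildcard matcher: '*' matches any run of characters, '?' matches one. */
    static int wildcard_match(const char *pattern, const char *path)
    {
        if (*pattern == '\0')
            return *path == '\0';
        if (*pattern == '*')
            return wildcard_match(pattern + 1, path) ||
                   (*path != '\0' && wildcard_match(pattern, path + 1));
        if (*path != '\0' && (*pattern == '?' || *pattern == *path))
            return wildcard_match(pattern + 1, path + 1);
        return 0;
    }
    
    /* Hypothetical placement rule: files matching the pattern may only be
       placed on the listed pool parts (unless those disks are full). */
    struct placement_rule {
        const char *pattern;        /* e.g. "/Virtual Machines/Windows*" */
        const char *pool_parts[4];  /* allowed disks for matching files  */
    };
    
    int main(void)
    {
        struct placement_rule rule = {
            "/Virtual Machines/Windows*",
            { "Disk 1", "Disk 4", NULL, NULL }
        };
        const char *new_file = "/Virtual Machines/Windows 8 Test.vhdx";
    
        if (wildcard_match(rule.pattern, new_file)) {
            printf("'%s' is restricted to:\n", new_file);
            for (int i = 0; rule.pool_parts[i] != NULL; i++)
                printf("  %s\n", rule.pool_parts[i]);
        } else {
            printf("'%s' can go on any pool part.\n", new_file);
        }
        return 0;
    }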
     
    What do you guys think, is this worthwhile doing?
  4. Like
    Alex got a reaction from meeldilla in Welcome to the new Forum   
    Welcome everyone to the new Forum!
     
    In an effort to support our growing community, the old Vanilla-based forum was retired. It will remain available at its old URL http://forum.covecube.com, but you won't be able to post anything new there.
    This new forum can be accessed at http://community.covecube.com.
     
    Login Credentials
     
    If you've created an account on the old Vanilla forum, you can actually use the same username and password to log into this forum.
    In addition, this new forum now supports logging in with your Windows Live ID.
     
    Community
     
    With this new forum, my hope is to give our community a place to talk about technology as it surrounds our products, and not just focus on the products themselves.
     
    To this end, I've set up a few sub-forums:
      • StableBit Scanner - Compatibility
        With all the different hardware out there (disk controllers, hard drives, etc.) it's tough to know what works best, especially when dealing with SMART passthrough. I'd like this to be the place to discuss what works and what doesn't.
      • StableBit DrivePool - Hardware
        StableBit DrivePool is a very fast pooling solution that is capable of creating very large pooling arrays. Share your setup, brag about your pool size, and get hardware advice here.
      • Nuts & Bolts
        Recently I've been overhauling the manuals section over at stablebit.com/support (yes, it's way overdue), and I found it amazing how many intricacies there are to these little programs. I realized that it might be fun to talk about these things, so I've set up a new forum called Nuts & Bolts where I will be posting a topic every now and again about some technical aspect of our software. The topics will be open for discussion, so we can go back and forth about what people think about that particular topic.
     
    Moderators
     
    Our "Resident Gurus" Shane and saitoh183 will be joining us in the new forum as moderators, and we have Drashna (technical support) and myself as the Admins.
  5. Like
    Alex got a reaction from Christopher (Drashna) in Software I use and recommend   
    I just have to mention that, personally, I've used SyncBack SE (not Pro) for years. It's a great piece of software and reasonably priced too.
  6. Like
    Alex got a reaction from gringott in Going to the Cloud   
    Secure cloud storage has been foremost on my mind. I've been thinking about different options that would offer practical and affordable cloud storage. There will be more coming from Covecube regarding cloud storage, so stay tuned; the wheels are already in motion.
  7. Like
    Alex reacted to daveyboy37 in StableBit DrivePool - Controlling Folder Placement   
    This is something that I have hoped for, for a very long time. I really have never been keen on having things scattered around the pool. Especially with music... 10 album tracks scattered over 5 or 6 drives.. But then I'm probably a bit O.C.D.
     
    I'm sure for people who have various devices streaming to different rooms this must be a good thing. Knowing that all the Disney films are all on one hard drive for the rug rats, and the teenage daughter can watch her Twilight knowing it's on a separate drive, so no risk of intensive I/O and so on. And yes, I know that this could be achieved by organising multiple pools. However, when you get to 13 drives and around 22TB of data, creating new pools seems like a hassle. 
     
    First thought is that this would eliminate the need.
    Second thought is that, once implemented, folder placement would to my mind simplify the operation of creating separate pools and may actually lead me to do it, instead of just thinking about it 
     
    I'm all for it!!!
     
    .
  8. Like
    Alex reacted to Rychek in StableBit DrivePool - Controlling Folder Placement   
    At first I wasn't sure I would have any use for such a feature, but the more I think about it, the more I like the idea of having more control over what goes where.  It could be very useful as my children get old enough and skilled enough to use the server.  Bring on the progress!
     
    Oh, yeah, and thanks so much for all your hard work Alex!  Drivepool and Scanner are awesome and the WS 2012 E integration in the last update felt like an early Christmas present.
  9. Like
    Alex got a reaction from Christopher (Drashna) in StableBit DrivePool - Controlling Folder Placement   
    I can be a bit wordy by my very nature. But yes, that's exactly what I'm talking about, controlling which files go onto which pool part.
     
    And, in the future, a pool part may not necessarily represent a single local physical disk, which would make this even more interesting
  10. Like
    Alex got a reaction from Christopher (Drashna) in Going to the Cloud   
    Oh and thank you
     
    I'm no expert at public speaking, but I do like to talk about the products that I've built, which I believe in very strongly.
  11. Like
    Alex got a reaction from Christopher (Drashna) in WS2012E Dashboard Responsiveness with Scanner Scanning   
    Just to follow up on this,
     
    I've done extensive troubleshooting on this issue with Kihim in a support ticket. It seems like this is caused by the system getting bogged down at certain times leading to a periodic slowdown in the UI. At least that's what I can see from the Scanner's built-in UI performance profiler (accessed by pressing P in a BETA build).
     
    I don't see any Scanner UI task running away with the CPU. I've also set up a local test rig with similar specs, 26 disks and a 2 GHz 8-core AMD server, and have run the Scanner UI for hours looking for slowdowns, but I couldn't find any.
     
    But I think the bottom line here is that the Scanner uses WPF to render its UI, and WPF does require some more CPU resources than your typical Win32 application. I think that that's really the core issue here. So in the future I would like to offer a "lite" version of the UI for situations where CPU resources are scarce. I imagine that there will be a simple drop down that will let you switch between "standard" and "lite" versions of the UI, letting you run whichever one you want.
  12. Like
    Alex reacted to gringott in Going to the Cloud   
    I like my "cloud" local and private. Therefore, no external access for me. Feel free to develop for others, however.
     
    I do the math every year or so. Any serious online storage [TBs] costs more than buying and rotating drives every three years, which of course I haven't had to do that often.
     
    You can check for yourself, how much does it cost for 55 plus TBs online 24/7?
     
    Alex, I "saw" [heard] you being interviewed on utube "StableBit on Home Server Show 219".
     
    Very good representation of the product. Yes, I know it was from April but I just found it this morning.
  13. Like
    Alex got a reaction from nouxuntainutt in Going to the Cloud   
    It seems like just about every application today is going to the cloud.
     
    What do you guys think of us adding tighter integration with the cloud (so to speak)?
     
    For instance:
      • How about saving your StableBit Scanner disk scan history online? This will mean that if you plug the same hard drive into a different computer, it will instantly know when that disk was last scanned.
      • Perhaps we can keep track of your disk's temperature history as well, and synchronize that with the cloud? You would be able to query the temperature history of any disk, from any point in time. We can do the same with disk performance and disk uptime (when it goes to sleep or when it's running).
      • For DrivePool, we can augment remote control peer discovery with a centralized server, so DrivePool will automatically know about every machine running DrivePool with the same Activation ID.
      • Whenever you add or remove a disk, DrivePool would save that event to the cloud. You would be able to see pool participation history and disk space usage over time.
      • We can build some mobile apps around this data to let you query and access it.
     
    Would you guys find this service valuable, and would you be willing to pay a small yearly fee for a service like this (say, $4.99 / yr)?
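     
    Just to make the first couple of ideas a bit more concrete, here's a purely hypothetical sketch of the kind of per-disk record that could be synchronized; every field name in it is invented for the example and is not an actual Covecube format:
     
    #include <stdio.h>
    #include <time.h>
    
    /* Purely hypothetical record layout -- invented just to make the idea
       concrete; keyed by a unique disk identifier rather than by machine. */
    struct cloud_scan_record {
        char   disk_id[64];         /* unique disk identifier (e.g. serial)     */
        char   activation_id[64];   /* ties the record to one Activation ID     */
        time_t last_surface_scan;   /* when the surface was last fully scanned  */
        int    last_temperature_c;  /* most recent reported temperature, in C   */
    };
    
    int main(void)
    {
        struct cloud_scan_record r = { "WD-WCAZA1234567", "XXXX-XXXX", 0, 38 };
        r.last_surface_scan = time(NULL);
    
        /* Another machine that sees the same disk_id could download this record
           and immediately know when the disk was last scanned. */
        printf("%s last scanned at %lld, %d C\n",
               r.disk_id, (long long)r.last_surface_scan, r.last_temperature_c);
        return 0;
    }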
  14. Like
    Alex got a reaction from gringott in StableBit Scanner - Identifying Disks Uniquely   
    Keeping this board on topic, I'd like to talk about something pretty technical that is central to how the StableBit Scanner operates, and perhaps get some feedback on the topic.
     
    One of the benefits of running the StableBit Scanner is not just to predict drive failure, but to prevent it. The technical term for what the StableBit Scanner performs on your drives to prevent data loss is called Data Scrubbing (see: http://en.wikipedia.org/wiki/Data_scrubbing). By periodically scanning the entire surface of the drive you are actually causing the drive to inspect its own surface for defects and to recognize those defects before they turn into what is technically called a latent sector error (i.e. a sector that can't be read).
     
    In order to do the periodic surface scan of a disk, the StableBit Scanner needs to know when it scanned a disk last, which means that it needs to identify a disk uniquely and remember which sectors it has scanned last and when. The StableBit Scanner uses sector ranges to remember exactly which parts of which disk were scanned when, but that's a whole other discussion.
     
    I would like to focus this post on the issue of identifying a disk uniquely, which is absolutely required for data scrubbing to function properly, and this was overhauled in the latest BETA (2.5).
     
    The original StableBit Scanner (1.0) used a very simple method to identify disks: the MBR signature, which it used to differentiate disks from each other.
     
    For those who don't know what an MBR signature is, I'll explain it briefly here. When you buy a new disk from the store, it's probably completely blank. In other words, the disk contains all 0's written to it throughout. There is absolutely nothing written to it to differentiate it from any other disks (other than the serial number, which may not be accessible from Windows).
     
    When you first connect such a blank disk to a Windows machine it will ask you whether you want to "initialize" it. This "initialization" is actually the writing of the MBR (master boot record, see: https://en.wikipedia.org/wiki/Master_boot_record), or GPT (GUID Partition Table) if you so choose. The MBR and GPT define the header (and perhaps footer) of the disk, kind of like when you write a letter to someone and you have a standard header and footer that always follow the same format.
     
    One of the things that initializing a disk does is write a unique "signature" to it in the MBR or GPT. It's simply a long random number that identifies a disk uniquely. The problem with an MBR signature is that the random number is not large enough, so it is only meant to be unique on one single system. So if you connect a disk from a different computer, the disk signature on the foreign disk has a minuscule chance of being the same as that of a disk on the system that it's being connected to.
     
    Well, for the StableBit Scanner 1.0 this would be a problem. It would recognize the new disk as being the old disk, which would cause all sorts of issues. For one, you can't have the same disk connected to the same computer twice. That's simply not possible and we would write out an error report and crash.
     
    StableBit Scanner 2.0 improved things a bit by utilizing the GPT signature, which was guaranteed to be unique across multiple systems. The only problem with using the GPT disk signature to identify disks uniquely is that disk cloning software is capable of placing the same signature on 2 different physical disks which would end up causing the same problem. In addition, many disks still utilize MBR, so we can't solely rely on GPT to resolve this issue.
     
    As you can see this has not been an easy problem to solve
     
    In the latest StableBit Scanner 2.5 BETA I've completely overhauled how we associate disk scan history (and other persistent settings) with each disk in the system. This is a major change from how things used to work before.
     
    In 2.5 we now have a brand new Disk ID system. The Disk ID system is heuristic based and picks the best disk scan history that it knows of based on the available information. We no longer rely on a single factor such as a MBR or GPT. Instead, we survey a combination of disk identifiers and pick the disk scan history that fits the available data best.
     
    Here is the list of factors that we use, starting from the highest priority:
      1. Direct I/O disk serial number
      2. GPT Signature + WMI Serial number + Disk size
      3. GPT Signature + WMI Model + Disk size
      4. GPT Signature + Disk size
      5. MBR Signature + WMI Serial number + Disk size
      6. MBR Signature + WMI Model + Disk size
      7. MBR Signature + Disk size
     
    See the change log for more info on what this change entails. I hope that you give the new build a try.
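     
    As a rough illustration of how a priority-based match like this could work (this is only a simplified sketch, not the Scanner's actual code; the structure and field names are invented), each stored scan history would be compared against the identifiers surveyed for the connected disk, and the history that matches at the best (lowest) priority would win:
     
    #include <stdio.h>
    #include <string.h>
    
    /* Hypothetical identifiers surveyed for a disk; any field may be missing (NULL/0). */
    struct disk_identity {
        const char *direct_io_serial;   /* serial number read via Direct I/O        */
        const char *gpt_signature;      /* GPT disk GUID, if the disk is GPT         */
        unsigned long mbr_signature;    /* 32-bit MBR signature, if the disk is MBR  */
        const char *wmi_serial;         /* serial number as reported by WMI          */
        const char *wmi_model;          /* model string as reported by WMI           */
        unsigned long long size;        /* disk size in bytes                        */
    };
    
    static int str_eq(const char *a, const char *b)
    {
        return a && b && strcmp(a, b) == 0;
    }
    
    /* Returns the priority (1 = best) of the rule that matches a stored scan
       history against a newly surveyed disk, or 0 if nothing matches. */
    static int match_priority(const struct disk_identity *stored,
                              const struct disk_identity *seen)
    {
        int gpt  = str_eq(stored->gpt_signature, seen->gpt_signature);
        int mbr  = stored->mbr_signature && stored->mbr_signature == seen->mbr_signature;
        int size = stored->size && stored->size == seen->size;
    
        if (str_eq(stored->direct_io_serial, seen->direct_io_serial))    return 1;
        if (gpt && str_eq(stored->wmi_serial, seen->wmi_serial) && size) return 2;
        if (gpt && str_eq(stored->wmi_model, seen->wmi_model) && size)   return 3;
        if (gpt && size)                                                 return 4;
        if (mbr && str_eq(stored->wmi_serial, seen->wmi_serial) && size) return 5;
        if (mbr && str_eq(stored->wmi_model, seen->wmi_model) && size)   return 6;
        if (mbr && size)                                                 return 7;
        return 0;
    }
    
    int main(void)
    {
        /* A disk whose serial is not visible through Direct I/O, but whose
           GPT signature, model and size match a stored history entry. */
        struct disk_identity stored = { "WD-123456", "{9f0c...}", 0, "WD-123456", "WDC WD20EARS", 2000398934016ULL };
        struct disk_identity seen   = { NULL,        "{9f0c...}", 0, NULL,        "WDC WD20EARS", 2000398934016ULL };
    
        printf("matched at priority %d\n", match_priority(&stored, &seen)); /* prints 3 */
        return 0;
    }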
     
    This post has definitely been on the technical side, but that's what this forum is all about
     
    Let me know if you have any comments or suggestions (or can find a better way to identify disks uniquely).
  15. Like
    Alex reacted to mrbiggles in StableBit DrivePool Per-Folder Duplication   
    I agree with DrParis - per-folder duplication is a must have of DP, along with >2 duplication counts.
     
    For me, the simplicity of managing one pool, with variable duplication counts depending on the importance and volume of my data, is the whole attraction of DP and the thing that makes it stand head and shoulders above the others.  I never have to worry about (manually) juggling data between individual disks or backup schemes or complex raid / parity schemes or any of that tedium again.  For me it's the perfect balance between efficient storage and reliable resiliency to disk failures (and I've had a few).  And I don't have to worry about my future needs, I can just adjust a duplication count here and there, add some storage and grow my pool reliably and smoothly.
     
    To explain my rationale...
     
    I have lots of disks, large volumes (90%+) of low priority data (TV recordings etc), and small volumes of very high priority data (family pictures etc) - and I can't imagine I'm alone in this balance.  I love the fact that I don't have to duplicate the low priority data (wasting precious and expensive space), yet can keep lots of copies of my important docs and photos and never worry about another hard drive failure again.  I can just throw in another disk when I run out of space and add it to the pool.  A marvellous, almost maintenance free, reliable and efficient system - with one big simple pool.
     
    On parity - Parity wouldn't be any good to me as I'd waste a large amount of space adding lots of parity data for data I don't care much about, and it would waste my biggest (and generally newest) hard drive as that's the one required for parity.  It assumes all your data is equally important.  So in my PVR machine for example, where I have lots of odd disk sizes, it becomes complicated and inefficient.  I'd much rather just pool together the mismatched disks into one lovely simple space for my unduplicated recordings, and have some other folders duplicated 3 or more times for important files on that computer (so I can use the PVR as a network backup for important stuff).  And whilst I have the space, I can duplicate my low priority stuff also - and then just remove the duplication as I start to run out of space, or just add another disk or two to the pool, change a duplication setting and voila, it all just gets rebalanced in the background.  So perfect and simple! Not to mention wonderfully scalable and future proof.
     
    On using multiple pools for differing redundancy - definitely not.  DP doesn't allow me to add multiple pools to the same set of disks, and even if it did this approach would be a real pain for me.  I'd end up having to setup a different pool for each type of data which I might conceivably want to vary the duplication for - photos, TV, docs - so would end up being a cumbersome mess.  Else I'd have to start manually shovelling data between pools to manage things when I change a duplication count, and that would be so messy. 
     
    PS: I acknowledge that a per-folder parity system, with variable parity, would possibly be (architecturally) the perfect solution - but I'm more than happy to waste a bit of space for the simplicity and reliability of the DP per-folder file duplication approach.  If I could trust a parity implementation, and all my disks were the same size, and all my data was the same priority, and I knew exactly what my future redundancy requirements are, and I knew that they'd never change, I'd consider parity. But this is not the case!
     
    In short - please don't remove these two fabulous features!  
  16. Like
    Alex got a reaction from Christopher (Drashna) in Beta Updater   
    I've been debating how to handle automatic updates while we're in BETA.
     
    There are 2 issues I think:
      1. Too many updates are annoying (e.g. Flash).
      2. Because this is a BETA, there is the potential for a build to have some issues, and pushing that out to everyone might not be such a good idea.
     
    So I've come up with a compromise. The BETA automatic updates will not push out every build, only builds that have accumulated some major changes since the last update. Also, I don't want to push the automatic update as soon as the build is out, because I want to give our users the chance to submit feedback in case there are problems with that build.
     
    Once we go to final release, every final build will be pushed out at the same time that it's published.
  17. Like
    Alex got a reaction from Christopher (Drashna) in Beta 320 BSOD   
    I've looked at the dump that was referencing this thread and this has already been fixed and the fix will be available in the next public build.
  18. Like
    Alex got a reaction from Henrik in Server Backup and duplication question   
    If you are looking for a way to back up only the "master" copy of a file and not the duplicated part, then that is not possible.
     
    DrivePool has no notion of a master copy vs. a backup copy. Each copy of the same file is exactly identical, and the copies can reside on any pool part.
     
    If you want to back up your duplicated pooled files without backing them up twice, you will need to use a backup solution that does deduplication (e.g. the client computer backup engine that is part of Windows Home Server 2011 / Windows Server 2012 Essentials). Alternatively, you can use a file-based backup system to back up directly from the pool (such as SyncBackSE).
  19. Like
    Alex got a reaction from Christopher (Drashna) in Symbolic Link Support   
    Thanks for testing it out.
     
    My initial implementation in build 281 above was based on the premise that we can reuse the reparse functionality that's already present in NTFS.
     
    I've been reading up some more on exactly how this is supposed to work and playing around with some different approaches, and it looks like the entire concept of reusing NTFS for this is not going to work.
     
    So don't use build 281
     
    I'm going to take the current reparse implementation out and rewrite it from scratch using a different approach.
     
    Using this new approach, reparse points (or symbolic links) will appear as regular files or directories on the underlying NTFS volume, but will work like reparse points on the pool. This will also eliminate the burden of accounting for reparse points when duplicating or rebalancing, since they will be regular files on the underlying NTFS volume.
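     
    For some background on what "working like a reparse point" means to applications, here's a generic Win32 sketch (not CoveFS code) showing how a program can tell a reparse point apart from a regular file or directory; this is the distinction the pool will present even when the data is stored as a plain file on the underlying NTFS volume:
     
    #include <windows.h>
    #include <stdio.h>
    
    /* Report whether a path is a reparse point (e.g. a symbolic link or junction)
       and, if so, print its reparse tag. Plain Win32, just for illustration. */
    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "C:\\Users";
    
        DWORD attrs = GetFileAttributesA(path);
        if (attrs == INVALID_FILE_ATTRIBUTES) {
            printf("Cannot query '%s' (error %lu)\n", path, GetLastError());
            return 1;
        }
    
        if (attrs & FILE_ATTRIBUTE_REPARSE_POINT) {
            /* When FILE_ATTRIBUTE_REPARSE_POINT is set, FindFirstFile returns
               the reparse tag in dwReserved0. */
            WIN32_FIND_DATAA fd;
            HANDLE h = FindFirstFileA(path, &fd);
            if (h != INVALID_HANDLE_VALUE) {
                printf("'%s' is a reparse point, tag 0x%08lX\n", path, fd.dwReserved0);
                FindClose(h);
            }
        } else {
            printf("'%s' is a regular file or directory\n", path);
        }
        return 0;
    }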
     
    Look for that in a future build. I don't think that it will make it into the next build because that one is concentrating on updating the locking and caching model, which is a big change as it is.
  20. Like
    Alex got a reaction from Shane in Questions regarding hard drive spindown/standby   
    This is actually a fairly complicated topic.
     
    Let's start by talking about how normal standby works without the StableBit Scanner getting involved.
     
    Windows "Put the Disk to Sleep" Feature
     
    Normally, Windows will monitor the disk for activity and if there is no disk activity for some preset amount of time it will put the disk to "sleep" by flushing all of the data in the cache to the disk and sending a special standby command to it. At the same time, it will remember that the disk is asleep in case any other application asks.
     
    Shortly after the disk goes to sleep, the StableBit Scanner will indicate the fact that the disk is asleep in the Power column. Normally, the Scanner gets the power status of the disk by querying Windows and not the disk.
     
    It does not query the disk directly for the power state because Windows considers this power query disk activity and wakes up the disk as a result.
     
    Now, things get a bit more complicated if you want to include the on-disk power management in this picture.
     
    Disks can optionally support these features, which can put them to sleep without Windows knowing it:
      • Advanced Power Management
      • Standby Timer
     
    Advanced Power Management
     
    This is a technology that implements power consumption profiles. For instance, if you don't care about performance but want maximum power savings, then you can tell your disk just that. Simply set the Advanced Power Management to Minimum Power Consumption. Or you can do the exact opposite by setting it to Maximum Performance (which guarantees no standby).
     
    With Advanced Power Management you don't concern yourself with "sleep timeouts", like in Windows. You simply state your intent and the disk will adjust various parameters, including the standby time, according to your setting.
     
    The implementation of Advanced Power Management is completely up to the manufacturer of the drive, and there are no specifications that explicitly state what each power mode does. This entire feature may not even be supported, depending on the disk model.
     
    Standby Timer
     
    The Standby timer is more widely supported because it is an older feature. You simply specify after how much disk inactivity you would like the disk to be put to sleep. This is similar to how things work in Windows, except that the low power mode will be initiated by the disk firmware itself.
     
    Again, the implementation of this is up to the manufacturer of the drive.
     
    StableBit Scanner "Put into Standby"
     
    In the StableBit Scanner, you can right click on a disk and put it into standby mode. What this does is send a power down command to the disk. This type of power down is equivalent to what Advanced Power Management or the Standby Timer would do.
     
    More importantly, when a disk is powered down in this way, Windows will not be aware that the disk is in a low power state, and will continue to report that the disk is still powered up. This is not an issue because the disk will simply spin up the next time that Windows tries to access it.
     
    But this leaves the StableBit Scanner with a dilemma. If we can't query the disk for the power state directly, how do we report the true power state of the disk? What the StableBit Scanner implements is a power state in which it's not sure whether the disk is in standby or active, and this is what you were seeing.
     
    Forcing the StableBit Scanner to Query the Power Mode from the Disk
     
    If you want to use on-disk power management exclusively, and you don't care about Windows putting your disks to sleep, you can instruct the StableBit Scanner to query the power mode directly from the disk.
     

     
    When this is enabled, you will no longer see the standby or active message, but Windows will never try to put that disk to sleep. That's why this is off by default.
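     
    For the technically curious, this is roughly what querying the power mode directly from the disk involves. The sketch below is a generic, untested example (not the Scanner's implementation) that issues the ATA CHECK POWER MODE command (0xE5) through the Windows ATA pass-through IOCTL; it needs to run as Administrator, requires ntddscsi.h from the Windows SDK/WDK, and pass-through support varies by controller:
     
    #include <windows.h>
    #include <winioctl.h>
    #include <ntddscsi.h>
    #include <stdio.h>
    
    /* Ask the drive itself for its power state with ATA CHECK POWER MODE (0xE5).
       The drive reports the state in the sector count register:
       0x00 = standby, 0x80 = idle, 0xFF = active or idle. */
    int main(void)
    {
        HANDLE h = CreateFileA("\\\\.\\PhysicalDrive0",
                               GENERIC_READ | GENERIC_WRITE,
                               FILE_SHARE_READ | FILE_SHARE_WRITE,
                               NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            printf("CreateFile failed (%lu); run as Administrator\n", GetLastError());
            return 1;
        }
    
        ATA_PASS_THROUGH_EX apt = {0};
        apt.Length             = sizeof(apt);
        apt.AtaFlags           = ATA_FLAGS_DRDY_REQUIRED;
        apt.TimeOutValue       = 5;      /* seconds */
        apt.CurrentTaskFile[6] = 0xE5;   /* CHECK POWER MODE command */
    
        DWORD returned = 0;
        if (DeviceIoControl(h, IOCTL_ATA_PASS_THROUGH,
                            &apt, sizeof(apt), &apt, sizeof(apt),
                            &returned, NULL)) {
            /* On completion the task file registers are written back;
               index 1 is the sector count register. */
            UCHAR count = apt.CurrentTaskFile[1];
            printf("power mode register: 0x%02X (%s)\n", count,
                   count == 0x00 ? "standby" :
                   count == 0x80 ? "idle"    : "active or idle");
        } else {
            printf("IOCTL_ATA_PASS_THROUGH failed (%lu)\n", GetLastError());
        }
    
        CloseHandle(h);
        return 0;
    }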
     
    SMART
     
    And just to make things even more complicated, sometimes a disk will wake up when it's queried for SMART data.
     
    To this end the StableBit Scanner implements some more settings to deal with this:
     

     
    I hope that this clears things up.
  21. Like
    Alex reacted to Shane in First OFF TOPIC! New competition is coming to Town!   
    Thanks for the explanation, Alex. It's a shame that shell extensions are so inefficient; if I have to choose between "works well" and "looks slick" I'll pick the former every time.
  22. Like
    Alex reacted to saitoh183 in First OFF TOPIC! New competition is coming to Town!   
    Nothing can stop you from adding it as a option down the line
  23. Like
    Alex reacted to Doug in RocketRAID 2760 PCI-Express 2.0 SATA III (6.0Gb/s)   
    Specifications:
      • Speed: SATA 3 (6.0 Gb/s)
      • Ports: 6x Internal Mini SAS SFF-8087
      • Slot: PCI-Express 2.0 x16
      • Chipset: Marvell 9485 SAS/SATA Controller Chip
     
    Firmware
      • Firmware: 1.3
      • AHCI compatible: No (proprietary driver required)
      • Link: http://www.highpoint-tech.com/USA_new/CS-PCI-E_2_0_x16_Configuration.html
     
    Driver
      • Version: 1.2.12.1023 (10/23/2012), rr276x.sys
      • Link: http://www.highpoint-tech.com/USA_new/CS-PCI-E_2_0_x16_Configuration.html
     
    Performance
      • SATA III SSD - Burst: 481 MB/s, Drive: Intel SSDSC2CT180A4, OS Tested: Windows Home Server 2011
      • SATA II HDD - Burst: 171 MB/s, Drive: Seagate ST3200045AS, OS Tested: Windows Home Server 2011
      • SATA III HDD - Burst: 459 MB/s, Drive: Seagate ST2000DM001, OS Tested: Windows Home Server 2011
  24. Like
    Alex got a reaction from Shane in 2.x BETA - "Duplicate" default is off?   
    To be honest, I wasn't 100% comfortable with the arrow not having any text next to it, but I doubt that any designer is ever 100% satisfied with their design. You always want to keep tweaking it to make it perfect, but there are time constraints (plus, we can't exactly afford Jony Ive here).
     
    I decided to ship it and listen for feedback, and based on that feedback I've slightly modified the pool options menu.
     
    It looks like this now:
     

     
    Let me know what you think.
  25. Like
    Alex got a reaction from DrParis in BSOD in srv2.sys on Windows 8 / 2012   
    Continuing the thread from the old forum: http://forum.covecube.com/discussion/1129/critical-bsod-when-accessing-dp-over-a-network-share
     
    Just to recap, I've received a number of memory dumps over the past month or so showing a system crash in srv2.sys. Srv2.sys is Microsoft's file sharing driver that translates network file I/O requests into local file I/O requests on the server, but the implication is of course that StableBit DrivePool may somehow be the cause of these crashes. The crashes only occur on Windows 8 / Windows Server 2012 (including Essentials).
     
    Paris has submitted the best dump to date on this issue and I've analyzed it in detail.
     
    I'm going to post a technical description of the crash for anyone who can read this sort of thing.
     

     
    What we have
      • A full memory dump of the crash.
      • Verifier enabled on all drivers at the time of the crash (giving us additional data about the crash).
      • ETW logging enabled on CoveFS / CoveFSDisk, giving us everything that CoveFS did right before the crash.
     
    The system
    3: kd> vertarget
    Windows 8 Kernel Version 9200 MP (4 procs) Free x64
    Product: Server, suite: TerminalServer DataCenter SingleUserTS
    Built by: 9200.16581.amd64fre.win8_gdr.130410-1505
    Kernel base = 0xfffff800`65214000 PsLoadedModuleList = 0xfffff800`654e0a20
    Debug session time: Fri May 31 04:45:08.610 2013 (UTC - 4:00)
    System Uptime: 0 days 0:21:01.550
     
    3: kd> !sysinfo cpuinfo
    [CPU Information]
    ~MHz = REG_DWORD 3100
    Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0
    Identifier = REG_SZ Intel64 Family 6 Model 58 Stepping 9
    ProcessorNameString = REG_SZ Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
    Update Status = REG_DWORD 2
    VendorIdentifier = REG_SZ GenuineIntel
    MSR8B = REG_QWORD ffffffff00000000
     
    The crash
    3: kd> !Analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    PAGE_FAULT_IN_NONPAGED_AREA (50)
    Invalid system memory was referenced.  This cannot be protected by try-except,
    it must be protected by a Probe.  Typically the address is just plain bad or it
    is pointing at freed memory.
    Arguments:
    Arg1: fffffa8303aa41ea, memory referenced.
    Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
    Arg3: fffff8800556e328, If non-zero, the instruction address which referenced the bad memory address.
    Arg4: 0000000000000000, (reserved)
    
    Debugging Details:
    ------------------
    
    READ_ADDRESS: fffffa8303aa41ea Nonpaged pool
    
    FAULTING_IP:
    srv2!Smb2ContinueUncachedRead+26c28
    fffff880`0556e328 410fb644240a    movzx   eax,byte ptr [r12+0Ah]
    
    MM_INTERNAL_CODE: 0
    IMAGE_NAME: srv2.sys
    DEBUG_FLR_IMAGE_TIMESTAMP: 51637dde
    MODULE_NAME: srv2
    FAULTING_MODULE: fffff880054ff000 srv2
    DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
    BUGCHECK_STR: AV
    PROCESS_NAME: System
    CURRENT_IRQL: 0
    
    TRAP_FRAME: fffff88004d3fb60 -- (.trap 0xfffff88004d3fb60)
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed or incorrect.
    rax=fffffa830396c150 rbx=0000000000000000 rcx=0000000000000000
    rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
    rip=fffff8800556e328 rsp=fffff88004d3fcf0 rbp=fffff9801e9e0af0
     r8=00000000000006e5  r9=fffff88005544680 r10=fffffa83035a6c40
    r11=0000000000000001 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0         nv up ei ng nz na po cy
    srv2!Smb2ContinueUncachedRead+0x26c28:
    fffff880`0556e328 410fb644240a    movzx   eax,byte ptr [r12+0Ah] ds:00000000`0000000a=??
    Resetting default scope
    
    LAST_CONTROL_TRANSFER: from fffff8006532f3f1 to fffff8006526e440
    
    STACK_TEXT:
    fffff880`04d3f978 fffff800`6532f3f1 : 00000000`00000050 fffffa83`03aa41ea 00000000`00000000 fffff880`04d3fb60 : nt!KeBugCheckEx
    fffff880`04d3f980 fffff800`652a8acb : 00000000`00000000 fffffa83`03aa41ea fffffa83`035be040 fffff880`03cc4419 : nt! ?? ::FNODOBFM::`string'+0x33c2b
    fffff880`04d3fa20 fffff800`6526beee : 00000000`00000000 fffff980`254b6950 fffff980`1e9e0b00 fffff880`04d3fb60 : nt!MmAccessFault+0x55b
    fffff880`04d3fb60 fffff880`0556e328 : 00000000`00000000 fffff880`00000000 ffff2f2d`390b1a54 fffff980`01dc8f20 : nt!KiPageFault+0x16e
    fffff880`04d3fcf0 fffff880`055470de : fffffa83`03c3d1e0 fffff980`01dc8f20 fffff980`254b6950 fffffa83`01f99040 : srv2!Smb2ContinueUncachedRead+0x26c28
    fffff880`04d3fd50 fffff880`055455bd : 00000000`00000002 fffffa83`01f99040 fffff980`254b6c60 fffff800`6524acbe : srv2!Smb2ExecuteRead+0x6ce
    fffff880`04d3fde0 fffff880`05545a64 : fffffa83`0084cd18 fffff980`254b6950 fffff980`1b44efd0 fffff980`254b6950 : srv2!Smb2ExecuteProviderCallback+0x6d
    fffff880`04d3fe50 fffff880`05543180 : fffff980`1b54cd80 fffff980`1b54cd80 00000000`00000001 fffff980`254b6950 : srv2!SrvProcessPacket+0xed
    fffff880`04d3ff10 fffff800`65268b27 : fffff880`04d3ff01 00000000`00000000 fffff980`254b6960 fffff880`05546000 : srv2!SrvProcpWorkerThreadProcessWorkItems+0x171
    fffff880`04d3ff80 fffff800`65268aed : fffff980`1b54cd01 00000000`0000c000 00000000`00000003 fffff800`652c3ab8 : nt!KxSwitchKernelStackCallout+0x27
    fffff880`07f9c9e0 fffff800`652c3ab8 : fffffa83`00000012 fffff980`1b54cd01 00000000`00000006 fffff880`07f97000 : nt!KiSwitchKernelStackContinue
    fffff880`07f9ca00 fffff800`652c63f5 : fffff880`05543010 fffff980`1b54cd80 fffff980`1b54cd00 fffff980`00000000 : nt!KeExpandKernelStackAndCalloutInternal+0x218
    fffff880`07f9cb00 fffff880`05500da5 : fffff980`1b54cd80 fffff980`1b54cd00 fffff980`1b54cd80 fffff800`65277cc4 : nt!KeExpandKernelStackAndCalloutEx+0x25
    fffff880`07f9cb40 fffff800`652ac2b1 : fffffa83`035be040 fffff880`05546000 fffff980`1b54cde0 fffff900`00000000 : srv2!SrvProcWorkerThreadCommon+0x75
    fffff880`07f9cb80 fffff800`65241045 : fffffa83`03058660 00000000`00000080 fffff800`652ac170 fffffa83`035be040 : nt!ExpWorkerThread+0x142
    fffff880`07f9cc10 fffff800`652f5766 : fffff800`6550c180 fffffa83`035be040 fffff800`65566880 fffffa83`0088d980 : nt!PspSystemThreadStartup+0x59
    fffff880`07f9cc60 00000000`00000000 : fffff880`07f9d000 fffff880`07f97000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
    
    STACK_COMMAND: kb
    
    FOLLOWUP_IP:
    srv2!Smb2ContinueUncachedRead+26c28
    fffff880`0556e328 410fb644240a    movzx   eax,byte ptr [r12+0Ah]
    
    SYMBOL_STACK_INDEX: 4
    SYMBOL_NAME: srv2!Smb2ContinueUncachedRead+26c28
    FOLLOWUP_NAME: MachineOwner
    BUCKET_ID_FUNC_OFFSET: 26c28
    FAILURE_BUCKET_ID: AV_VRF_srv2!Smb2ContinueUncachedRead
    BUCKET_ID: AV_VRF_srv2!Smb2ContinueUncachedRead
    Followup: MachineOwner
    ---------
     
    From the auto analysis we can see that the memory address 0xfffffa8303aa41ea was being accessed from some code at address 0xfffff8800556e328.
     
    We can also see that the function that crashed the system is srv2!Smb2ContinueUncachedRead.
     
    We can check that memory address and it is indeed invalid:
    3: kd> dd 0xfffffa8303aa41ea
    fffffa83`03aa41ea  ???????? ???????? ???????? ????????
    fffffa83`03aa41fa  ???????? ???????? ???????? ????????
    fffffa83`03aa420a  ???????? ???????? ???????? ????????
    fffffa83`03aa421a  ???????? ???????? ???????? ????????
    fffffa83`03aa422a  ???????? ???????? ???????? ????????
    fffffa83`03aa423a  ???????? ???????? ???????? ????????
    fffffa83`03aa424a  ???????? ???????? ???????? ????????
    fffffa83`03aa425a  ???????? ???????? ???????? ????????
     
    What was srv2 trying to do?
     
    So the next question to ask is what was srv2 trying to do and why did it fail?
     
    I've gone ahead and decompiled the portion of srv2 that is causing the crash, and here it is:
    if ( mdl1 && mdl1 != mdl2 && !(mdl1->MdlFlags & MDL_SOURCE_IS_NONPAGED_POOL) )
    {
        do
        {
            mdlCurrent = mdl1;
            mdlFlags = mdl1->MdlFlags;
            mdl1 = mdl1->Next;
            if ( mdlFlags & (MDL_PARTIAL_HAS_BEEN_MAPPED | MDL_PAGES_LOCKED) )
            {
                MmUnlockPages(mdlCurrent);
            }
            IoFreeMdl(mdlCurrent);
        } while ( mdl1 );
        *(_DWORD *)(Length + 4) = 0;
    }
     
    An MDL is a kernel structure that simply describes some memory (http://msdn.microsoft.com/en-us/library/windows/hardware/ff554414(v=vs.85).aspx).
     
    The MDL variables:
    mdl1: 0xfffffa83`03aa41e0 (invalid memory pointer)
    mdl2: 0xfffffa83`03c3d1e0
     
    3: kd> dt nt!_MDL fffffa83`03c3d1e0
       +0x000 Next             : (null)
       +0x008 Size             : 0n568
       +0x00a MdlFlags         : 0n4
       +0x00c AllocationProcessorNumber : 0xf980
       +0x00e Reserved         : 0xffff
       +0x010 Process          : (null)
       +0x018 MappedSystemVa   : 0xfffffa83`03bfd000 Void
       +0x020 StartVa          : 0xfffffa83`03bfd000 Void
       +0x028 ByteCount        : 0x40150
       +0x02c ByteOffset       : 0
     
    The crash occurs at the point when the function tries to access the MdlFlags member of mdl1 (mdl1->MdlFlags). Since mdl1 points to an invalid memory address, we can't read the flags in.
     
    The assembly instructions look like this:
    srv2!Smb2ContinueUncachedRead+0x26c28:
    fffff880`0556e328 410fb644240a    movzx   eax,byte ptr [r12+0Ah]
    fffff880`0556e32e a804            test    al,4
    fffff880`0556e330 0f853294fdff    jne     srv2!Smb2ContinueUncachedRead+0x68 (fffff880`05547768)
     
    r12 is mdl1, and we crash when trying to read in the flags.
     
    The connection to Fast I/O
     
    In every single crash dump that I've seen, the crash always occurs after a successful (non-waiting) Fast I/O read. In fact, the function that calls the crashing function (srv2!Smb2ExecuteRead+0x6ce) has an explicit condition to test for this.
     
    Where did mdl1 go?
     
    So the question is, why is mdl1 invalid? Did it exist before and was freed, or was there some kind of memory corruption?
     
    Here are my observations on this:
    In every dump that I've seen, the addresses look right. What I mean by that is that the seemingly invalid mdl1 address falls roughly into the same address range as mdl2. It always starts correctly and always ends with 1e0.

    If this crash was due to faulty RAM, then I would expect to see this address fluctuate wildly.
      The crash always occurs in the same place (plus or minus a few lines of code).

    To me, this indicates that there is a bug somewhere. Based on these observations I'm assuming that the mdl1 address is indeed valid, and so it must have been previously freed.
     
    But who freed it?
     
    We can answer that with a simple verifier query:
    3: kd> !verifier 0x80 fffffa8303aa41e0
    Log of recent kernel pool Allocate and Free operations:
    There are up to 0x10000 entries in the log.
    Parsing 0x0000000000010000 log entries, searching for address 0xfffffa8303aa41e0.
    ======================================================================
    Pool block fffffa83`03aa3000, Size 00000000000018e0, Thread fffff80065566880
    fffff80065864a32 nt!VfFreePoolNotification+0x4a
    fffff80065486992 nt!ExFreePool+0x8a0
    fffff80065855597 nt!VerifierExFreePoolWithTag+0x47
    fffff880013b32bf vmbkmcl!VmbChannelPacketComplete+0x1df
    fffff88003f91997 netvsc63!NvscMicroportCompleteMessage+0x67
    fffff88003f916a3 netvsc63!ReceivePacketMessage+0x1e3
    fffff88003f913ff netvsc63!NvscKmclProcessPacket+0x23f
    fffff880013b2844 vmbkmcl!InpProcessQueue+0x164
    fffff880013b402f vmbkmcl!InpFillAndProcessQueue+0x6f
    fffff880013b7cb6 vmbkmcl! ?? ::FNODOBFM::`string'+0xb16
    fffff880014790d7 vmbus!ChildInterruptDpc+0xc7
    fffff80065296ca1 nt!KiExecuteAllDpcs+0x191
    fffff800652968e0 nt!KiRetireDpcList+0xd0
    ======================================================================
    Pool block fffffa8303aa3000, Size 00000000000018d0, Thread fffff80065566880
    fffff80065855a5d nt!VeAllocatePoolWithTagPriority+0x2d1
    fffff88001058665 VerifierExt!ExAllocatePoolWithTagPriority_internal_wrapper+0x49
    fffff80065855f02 nt!VerifierExAllocatePoolEx+0x2a
    fffff880013b2681 vmbkmcl!InpFillQueue+0x641
    fffff880013b4004 vmbkmcl!InpFillAndProcessQueue+0x44
    fffff880013b7cb6 vmbkmcl! ?? ::FNODOBFM::`string'+0xb16
    fffff880014790d7 vmbus!ChildInterruptDpc+0xc7
    fffff80065296ca1 nt!KiExecuteAllDpcs+0x191
    fffff800652968e0 nt!KiRetireDpcList+0xd0
    fffff800652979ba nt!KiIdleLoop+0x5a
    ======================================================================
     
    Googling, I found that vmbkmcl.sys is a "Hyper-V VMBus KMCL", and netvsc63.sys is the "Virtual NDIS6.3 Miniport".
     
    File times
     
    Here are the file times of the drivers that are involved in this complicated interaction.
    3: kd> lmvm srv2
    start             end                 module name
    fffff880`054ff000 fffff880`055a0000   srv2       (private pdb symbols)  c:\symbols\srv2.pdb\B796522F4D804083998D25552950C4202\srv2.pdb
        Loaded symbol image file: srv2.sys
        Image path: \SystemRoot\System32\DRIVERS\srv2.sys
        Image name: srv2.sys
        Timestamp:        Mon Apr 08 22:33:02 2013 (51637DDE)
        CheckSum:         000A6B64
        ImageSize:        000A1000
        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
     
    3: kd> lmvm vmbkmcl
    start             end                 module name
    fffff880`013b0000 fffff880`013c6000   vmbkmcl    (pdb symbols)          c:\symbols\vmbkmcl.pdb\82188957E5784EDD91906B760767302E1\vmbkmcl.pdb
        Loaded symbol image file: vmbkmcl.sys
        Image path: \SystemRoot\System32\drivers\vmbkmcl.sys
        Image name: vmbkmcl.sys
        Timestamp:        Wed Jul 25 22:28:33 2012 (5010AB51)
        CheckSum:         000250C9
        ImageSize:        00016000
        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
     
    3: kd> lmvm netvsc63
    start             end                 module name
    fffff880`03f90000 fffff880`03faa000   netvsc63   (private pdb symbols)  c:\symbols\netvsc63.pdb\BD38B199A4C94771860A5F2390CC30E61\netvsc63.pdb
        Loaded symbol image file: netvsc63.sys
        Image path: netvsc63.sys
        Image name: netvsc63.sys
        Timestamp:        Sat Feb 02 02:23:05 2013 (510CBED9)
        CheckSum:         0001B2D9
        ImageSize:        0001A000
        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
     
    Possible sequence of events
     
    In short, it seems to me that:
      1. Some memory was allocated to process a network request.
      2. That memory was passed to srv2.sys, which is processing that request.
      3. The original driver has decided that the memory is no longer needed and freed the memory.
      4. srv2.sys is ignorantly trying to access the now freed memory.
     
    Workarounds
     
    As a potential workaround, turning off Fast I/O should prevent the code that is causing the problem from running.
     
    DrivePool 2.0 doesn't yet contain a setting for this but I'll add it in the next build. Turning on Network I/O Boost should also prevent the problem because we do some extra processing on networked read requests when that is enabled, which bypasses Fast I/O.
     
    Connection to DrivePool
     
    I'm still trying to find a connection to DrivePool in all of this, but I can't. I still can't reproduce this crash on any of the test servers here (4 of them running the windows 8 kernel), nor can I reproduce this on any of my VMs (using VirtualBox).
     
    Fast I/O doesn't deal with MDLs at all, so DrivePool never sees any of the variables that I've talked about here. The Fast I/O code in CoveFS is fairly short and easy to check.
     
    Because of the potential Hyper-V connection shown above, I'll try to test for this on a Hyper-V VM.
     
    As far as what DrivePool was doing right before the crash, I can see from the in-memory logs that it has just completed a Fast I/O read request and has successfully read in 262,144 bytes.
     
    Because I don't have a definitive reproducible case, I can't be 100% certain as to what the issue is. I'll keep digging and will let you guys know if I find anything else.