
Performance / Threads Limit???


zim2323

Question

So, I looked for the previous conversation I had created on this, but I couldn't find it.  I am assuming it was purged as "unanswered" or just fell off the list because of age.

 

Last year I started a conversation that I was unable to respond to because of life, work, etc.  Ultimately, the testing I needed to do required more time and hardware than I had to give.  In short, I run virtual machine labs via VMware Workstation Pro.  I virtualize everything from your everyday Windows desktop OS to Server, Linux, and OS X.  From a hardware perspective I'm using a Rampage IV Extreme with 64GB of RAM and a 6-core i7 processor.  I have installed a 4-port PCI-e SATA III card for additional drives and also had an external 8-bay USB 3.0 (UASP)/eSATA enclosure from Startech.  Storage is as follows: 1 Samsung 850 Pro 512GB SSD for the operating system, 1 Samsung 850 Pro 1TB SSD for gaming, and 8 internal drives of random size, speed, and form factor, all of them mechanical.  The 8-bay external holds 8 additional mechanical drives, also of random size, speed, and form factor.

 

All 16 (non-SSD) mechanical drives are joined into a single pool drive.  Duplication is enabled for archive data folders and I made sure that guest VM folders were NOT included.  At one time I enabled duplication on the guest VM folders at 2x, 3x, and 4x to try and take advantage of higher read speeds.  I found this more troublesome for guest VMs, and got more performance from migrating to split disks within the VM infrastructure for high-use virtual disks.  I use the balance plugin, so if I have a 20GB virtual disk split into 4GB files, then the virtual disk is essentially split 5 ways.  Any one file is still read from a single disk, but as the VMs grow the read data is split further and performance is still better than duplication read speeds from the pool.  This has been my experience and could be related to the issue below.

 

The issue I had/have is that when I start more than 2 "actions", i.e. file copy, vm guest etc., each of them will randomly lock up and I will only see activity on no more than 2 of these actions at a time.  During this time most of the actions are "not responding", performance tanks, and I can see wait times as long as 20-30 minutes before one that was "working" goes not responding, and one of the other tasks starts "working" again.

 

The information requested at the time was basically about eliminating or isolating different hardware and configurations in order to see if this issue continued.  Since that time I have done the following:

1) Removed the external 8-bay from the pool.  NO CHANGE IN "NOT RESPONDING"

2) Removed all duplication from the pool.  NO CHANGE IN "NOT RESPONDING"

3) Removed all drives in the pool but 1, a WD Black 2TB 7200 RPM drive.  NO CHANGE IN "NOT RESPONDING"

4) Added a 4-port 1Gb server NIC so I have 5 teamed 1Gb ports, and configured LACP on my managed switch to see if adding sessions would help.  NO CHANGE IN "NOT RESPONDING"

5) Wiped and rebuilt OS (Windows 7 x64) and tested each of the above again.  NO CHANGE IN "NOT RESPONDING"

6) Added a Samsung 840 Pro 500GB SSD and isolated all disk activity here (guest VMs, file copies, etc.) and had NO ISSUE.  Never saw "not responding" in this configuration.  This included 4 threads from an external FTP (work PC), 11 guest VMs, which included 2 FreeNAS virtual machines with 6 virtual disks each, a DC, a WSUS server, 2 Hyper-V 2012 R2, 2 Hyper-V 2016, 2 ESX 6.5, and a management server with SQL 2016, vCenter Server, Hyper-V, and SCVMM, along with two separate 5GB ISO copies.  I was absolutely KILLING this drive and it never hiccuped a single time.  On the pool, I can only do a combination of any 2 of those.

 

To this day I still have this issue and am at the point where either I am using DrivePool in a way that it was never intended or capable of being used, or I have an issue on my system.  Hardware was all tested "good", all drivers, Windows updates, etc. up to date and everything operates fine.

 

It seems that there is a maximum thread count on the DrivePool driver or in how the system handles activity on that pool/infrastructure.  This issue completely disappears when I move all activity away from the DrivePool pool.

 

 

Thoughts, ideas?  At this point, my only option seems to be to try something like DriveBender to see if it has the same performance hit.  Perhaps there's a "secret setting" that a dev knows about that would fix this for me.  I completely understand that I may be the .000001% of the use cases that use the product in this way.  I may have this problem no matter what I do, but I'd rather not have to go back to splitting all my files and VM's across disks manually.

 

 

Thanks for your time and any help you can give!

 

Chris


13 answers to this question


It was probably this thread: 

http://community.covecube.com/index.php?/topic/2043-limitation-settings/&do=findComment&comment=14113

 

Otherwise, we don't delete or hide threads unless they're duplicates, or explicitly requested, or spam. 

 

 

As for the issues, that's ... odd.  The pool should be highly parallelized.  I can't reproduce this issue on my own systems, so I'm not sure why you're seeing this. 

 

That said, I think you've opened a ticket for this already.  If not, then please do. 

 

And enable logging and reproduce the issue:

http://wiki.covecube.com/StableBit_DrivePool_2.x_Log_Collection

 

 

And if you have any antivirus software installed, let me know. 


The other thread was specific to issues I was having with a group of cloud drives combined into a single DrivePool.  I have since given up on that setup because reboots are troublesome: the timing of when CloudDrive initializes versus when the DrivePool comes online matters, and I have some other dependencies loaded on the DrivePool that I don't care to change at the moment.

 

As for antivirus, I have ESET Smart Security v10.  I've used v8 and v9 since this issue started.  I have all StableBit folders explicitly excluded from all scans, so they are ignored entirely by the product.

 

I wish I had the money for a set of SSD drives.  I'm also wondering if this is a limitation of the mechanical disks I'm using (all SATA III).  That is, if all drives were SSDs, would I ever see this issue?  That said, the external enclosure was all 2TB 7200 RPM WD Black drives at one time.  Both USB (UASP) and eSATA had the same issue.  I isolated all of these drives into their own pool.

 

I'll have to see what I can gather for logs.  I do remember that was something that I was trying to do.  If I remember right I had an issue with the logs not saving/working because the entire system hangs when this happens.  I could never gather logs to send.  I'll have to see if I can do this again.

 

Honestly, I don't like forcing this issue to happen anymore, because when I do I run the risk of corruption.  I have lost some data because of this in the past.  I corrupted almost 300gb of VM's during one of these "episodes".  I now do a full sync of the drive using Syncovery to an external USB 3.0 backup drive, but it's still something I don't want to risk any more than I have to.

 

I'll see what I can do to try and gather logs again.


Well, Alex is going through active DrivePool issues.  I think this may have already been flagged for him (or a similar ticket). We do have a few users reporting this sort of issue, but ... I haven't been able to reproduce this myself (and I don't think that Alex has either).  Unfortunately, without catching it "in action" it can be hard to tell why it's happening, let alone deal with it. 

 

 

That said, please make sure you're on the latest beta version, as the issue *may* already be addressed.

http://dl.covecube.com/DrivePoolWindows/beta/download/StableBit.DrivePool_2.2.0.754_x64_BETA.exe

 

If you have been using a more recent version and have seen this .... please let me know. 

 

As for ESET, .... I hate to ask, but does this occur when the antivirus is uninstalled? 
The reason I ask is that even when it is disabled or properly excluded, the file system filter that the real-time protection uses can still cause issues.  In theory, it should just "ignore all requests", but that doesn't always happen (properly). 

 

Additionally, it may be worth toggling the "bypass file system filters" option, as some antivirus solutions ... have had issues when we do bypass them.  

 

 

And yeah, if you can pinpoint an action that triggers this and outline it in detail, that would be incredibly helpful for us.  Both in attempting to reproduce, but also for identifying the cause. 
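
For anyone trying to pin this down, one option is a small script that starts several writers at once and records the longest pause each one sees. The following is only a rough sketch and not an official StableBit tool: the function names, file names, and sizes are made up for illustration, and it only exercises sequential writes, so it may not trigger the same behaviour as a mix of VMs and file copies.

```python
import os
import tempfile
import threading
import time


def write_worker(path, total_bytes, chunk_bytes, gaps, index):
    """Write total_bytes to path in chunks; record the longest pause between chunks."""
    chunk = b"\0" * chunk_bytes
    longest = 0.0
    written = 0
    last = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the data out of the OS cache
            now = time.monotonic()
            longest = max(longest, now - last)
            last = now
            written += chunk_bytes
    gaps[index] = longest


def run_stall_test(target_dir, workers=4, total_bytes=16 * 1024 * 1024,
                   chunk_bytes=1024 * 1024):
    """Run `workers` concurrent writers in target_dir.

    Returns a list with each worker's longest pause in seconds.
    The scratch files (hypothetical names) are deleted afterwards.
    """
    gaps = [None] * workers
    paths = [os.path.join(target_dir, f"stall_test_{i}.bin") for i in range(workers)]
    threads = [
        threading.Thread(target=write_worker,
                         args=(paths[i], total_bytes, chunk_bytes, gaps, i))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for p in paths:
        os.remove(p)
    return gaps


if __name__ == "__main__":
    # Assumption: replace the temporary directory with a scratch folder on the
    # pool, then with one on a non-pooled disk, and compare the two runs.
    with tempfile.TemporaryDirectory() as target_dir:
        for i, gap in enumerate(run_stall_test(target_dir)):
            print(f"worker {i}: longest pause {gap:.3f} s")
```

Running it with 2, 3, and 4 workers against the pool and against a single disk would at least give comparable numbers to attach to a ticket.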

 

 

 

As for the data, I absolutely understand.  It's shitty anytime that data is at risk. But hopefully, we are able to identify and fix the issue.

 

 

Additionally, it may be worth opening up StableBit Scanner or running "resmon" and watching the disk queue length when this happens.  If you're seeing this skyrocket, it may be because one or more disks is being overworked or otherwise causing issues. 

(this doesn't apply to SSDs, which can hit very high values and then drop back to zero rapidly). 
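
If watching resmon by hand is impractical because the system hangs, the same counter can be logged to a file with the built-in typeperf tool, for example `typeperf "\PhysicalDisk(*)\Current Disk Queue Length" -si 1 -sc 600 -o queue.csv`, and reviewed afterwards. Below is a minimal sketch of a parser for that CSV output. The threshold of 2.0 and the sample rows are assumptions for illustration only, not measurements from this system.

```python
import csv
import io


def busiest_samples(csv_text, threshold=2.0):
    """Return (timestamp, counter, value) for every sample above threshold."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first column is the timestamp, the rest are counters
    hits = []
    for row in reader:
        if not row:
            continue
        timestamp = row[0]
        for counter, raw in zip(header[1:], row[1:]):
            try:
                value = float(raw)
            except ValueError:
                continue  # skip blank or non-numeric cells
            if value > threshold:
                hits.append((timestamp, counter, value))
    return hits


# Illustrative sample only; host and disk names are hypothetical.
SAMPLE = r'''"(PDH-CSV 4.0)","\\HOST\PhysicalDisk(0 C:)\Current Disk Queue Length","\\HOST\PhysicalDisk(1 D:)\Current Disk Queue Length"
"01/01/2017 12:00:00.000","0.000000","1.000000"
"01/01/2017 12:00:01.000","0.000000","14.000000"
'''

if __name__ == "__main__":
    for timestamp, counter, value in busiest_samples(SAMPLE):
        print(timestamp, counter, value)
```

To use it on a real capture, read `queue.csv` into a string and pass that instead of `SAMPLE`.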


I'm on 2.2.0.753 right now.  I've been trying to keep up with the latest and greatest.  I'll download the new version and get it installed.

 

When I reloaded the operating system I tested without ESET installed and same issue.

 

I have bypass file system filters checked.  Did you mean toggle on or off?

 

StableBit Scanner always shows a disk queue length between 1 and 2 during these times.

 

Right now I'm running into a performance issue after adding the external drive enclosure and 8 drives.  Before adding to the pool I tested a large copy and averaged between 80-90MB/s for each drive.  I can't get full speed because I'm on Win7, the USB 3.0 UASP driver from ASUS hasn't been updated since the board was released, and the configuration tool won't even run with the Startech 8-bay connected.  When I get their tool/driver to work I can get 4Gb/s over that USB 3.0 UASP connection.  I'm planning to upgrade to Win10 soon to also take advantage of SMB 3.0 multichannel for my bigger 4x1Gb Synology SANs.  That should fix the USB 3.0 speed issues because it will use native Windows drivers then.

 

I added all 8 drives to the current pool and re-measured.  Re-balancing started at about 60-70MB/s and after about 3 drives has dropped to the 5-7MB/s range and has been working for the last 18 hours to finish balancing to the rest.  Still at 91.8%.  I'm curious if that drop in performance is part of the problem I'm seeing in the other issue.  I have noticed the same issue when re-balancing a large amount of data on the internal drives and it drops to that speed.  I'm not re-balancing 2TB of data though so I'm not thinking about it as much.  I've been monitoring everything with Scanner.


The drop in performance may be unrelated. 

 

Specifically, for balancing and duplication, we use a background IO priority. This means that these operations may not see high speeds, even when the drive can normally reach those speeds.  

 

There are some advanced settings to tweak that, though. But I don't think that this is related to what you're seeing. 

 

 

As for the "bypass file system filters", I mean toggle it. If it was on, turn it off, if it was off, turn it on.  See if that helps or affects anything.  

 

 

 

So if this issue is continuing, please do grab the logs.


Thanks Chris!

 

I have toggled the setting on and off and haven't seen a difference.  I have enabled logging.

 

I'd like to know what those tweaks are for balancing.  I'd like to maximize throughput and get those done as quickly as possible.  It took almost 2 days to balance 2TB across 9 additional drives I added to the 7 drive pool.  I clicked the "higher priority" setting.  I wasn't using the computer at all.  I feel like this should have been done a lot quicker than that given the circumstance.  Anything I can do to improve that would be greatly appreciated.
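
To put numbers on that, here is a quick back-of-the-envelope calculation using the figures already mentioned in this thread (2TB in roughly 2 days, 80MB/s per drive in a plain copy, and the 5-7MB/s seen during balancing). It assumes decimal units (1TB = 10^12 bytes) and treats "almost 2 days" as exactly 48 hours, so the results are approximate.

```python
def average_mb_per_s(total_bytes, seconds):
    """Average throughput in MB/s (decimal megabytes)."""
    return total_bytes / seconds / 1e6


def hours_to_move(total_bytes, mb_per_s):
    """Hours needed to move total_bytes at a steady mb_per_s."""
    return total_bytes / (mb_per_s * 1e6) / 3600


TWO_TB = 2 * 10**12       # assumption: decimal terabytes
TWO_DAYS = 2 * 24 * 3600  # assumption: "almost 2 days" rounded up to 48 h

if __name__ == "__main__":
    print(round(average_mb_per_s(TWO_TB, TWO_DAYS), 1))  # 11.6 MB/s average
    print(round(hours_to_move(TWO_TB, 80), 1))           # 6.9 h at 80 MB/s
    print(round(hours_to_move(TWO_TB, 6), 1))            # 92.6 h at 6 MB/s
```

So the balance averaged somewhere around 11-12MB/s overall, against roughly 7 hours if a single drive's copy speed had been sustained.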

 

 

 

btw...I'm also going to have to go through the process of getting my external enclosure SMART working.  I've used the tool and it mimics the other posts you've worked on where I need to gather HD Sentinel information and get it submitted.  HD Sentinel picks up all the SMART data as USB no issue, so it's just a matter of adding the code/info to be properly recognized I imagine.

 

Thanks!


To change the Background IO settings:

 

http://wiki.covecube.com/StableBit_DrivePool_Advanced_Settings

 

Set "FileBalance_BackgroundIO" and "FileDuplication_BackgroundIO" to "False" and reboot. 

 

This should cause balancing and duplication tasks to run at a higher priority.  

 

 

 

As for the external enclosure, 

 

What enclosure are you using, and how is it connected (USB2/USB3, eSATA, etc.)?  And what is it connected to?  The motherboard, or a controller card?  And what model? 


Thanks Chris!

 

So, big changes recently!  I've been starting to wonder about a lot of things in my setup.  The ASUS ROG Rampage IV Extreme is showing its age.  To get full USB 3.0 speeds you have to really monkey around with things and I just haven't been able to achieve what I wanted to.  The controller on the board (for USB 3.0) is an Asmedia 1042.  4 separate hubs for 5Gb/s (yeah right!) per port.  The external enclosure is the Startech 8-bay eSATA/USB 3.0 UASP JBOD enclosure.  It's loaded with 8 1TB/500GB (de-stroked) WD Enterprise SATA II drives.  I could achieve a solid 90-100 MB/s from the enclosure.

 

As of today I changed to this:

Startech PEXUSB312A USB 3.1 Gen 2 Type A 2-port card.  I now have this installed and the enclosure connected to it.  I'm getting a full 250-300 MB/s transfer rate from the enclosure, which is about as good as SATA II can give me.  All 4 of my PCIe slots are v3.0.  So there's good news on the horizon.  I'm currently in the process of moving my pool to this external enclosure.
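
For reference, the roughly 300 MB/s ceiling for SATA II follows from its 3Gb/s line rate and 8b/10b encoding. The sketch below works that out alongside the other interfaces mentioned in this thread. These are theoretical payload rates before protocol overhead, so real transfers will land somewhat lower, and the helper name is made up for illustration.

```python
def payload_mb_per_s(line_rate_gbps, data_bits, coded_bits):
    """Usable MB/s after line coding, before any protocol overhead."""
    return line_rate_gbps * 1e9 * data_bits / coded_bits / 8 / 1e6


# (name, line rate in Gb/s, data bits, coded bits)
INTERFACES = [
    ("SATA II", 3, 8, 10),
    ("SATA III", 6, 8, 10),
    ("USB 3.0", 5, 8, 10),
    ("USB 3.1 Gen 2", 10, 128, 132),
]

if __name__ == "__main__":
    for name, rate, data_bits, coded_bits in INTERFACES:
        print(f"{name}: {payload_mb_per_s(rate, data_bits, coded_bits):.0f} MB/s")
```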

 

I also purchased more SSDs, so I am going to build a new pool just for SSDs and move all my VMs there.  4 x 500GB Samsung 850 EVOs, 1 x 1TB Samsung 850 EVO, 1 x 512GB Samsung 840 Pro, and 2 x 256GB OCZ Vector 4s.  All of these SSDs will live on my internal SATA controllers and the old Startech 4-port SATA III 6Gb/s card.  So I have 8 6Gb/s ports and 4 3Gb/s ports that will comprise this SSD pool.  I'm making sure my standalone Samsung 850 Pro 512GB (OS) and 850 Pro 1TB (installs) are on the 6Gb/s ports.

 

I'm hoping to get rid of all my issues by getting away from the onboard Asmedia 1042 controller and using a more solid USB 3.1 Gen 2 interface for better performance, etc.

 

I'll update with how things go, and perhaps the other issues will just magically disappear.  Once this is done/stable, I'll be wiping and reloading Windows 10.  I just want to make sure my pools and data are configured the way I want them before I do so.


Quick update...

 

Installed SSD's to internal controllers, moved all other drives except OS and INSTALLS to external.  Created a new pool just for VM's using the SSD's.  Copied all VM's from backup into new pool.

 

I made sure that the lab VM's (11) I'm running are balanced across the drives 2 each except for 1, via rules.  No duplication.

 

I still get HORRIBLE performance from the pool and "not responding" errors, although much less frequently with the SSD pool vs the mechanical drive pool.  I actually got better performance running every VM from a single non-pooled SSD than I do spreading the load across a pool with no more than 2 VMs per drive.

 

At this point I'm going to destroy the pool, install DriveBender, create a new DriveBender pool and restore them there and see how it compares in performance.

 

 

Chris
