
Beta after 747 have memory issues?


browned

Question

Just a quick note to say that any beta I have tried after 747 has after a few hours caused the server to die a terrible and slow death of memory exhaustion.

 

I do not have logs and have not bothered to mention it before as I have been too busy. But after trying most builds after 747 to see if the performance and memory issues were fixed, I thought I had better post since I had some time.

 

I have VMware-based Windows 2016, 4 cores, 8GB RAM (now 12GB), 2 x HPE P410 Smart Array cards with 7 x RAID 0 arrays. Then I have a pool within a pool: Pool A consists of 7 x RAID 0 disks (20TB), Pool B consists of 7 x RAID 0 disks (20TB), and Pool C consists of Pool A and Pool B duplicated.

 

One thing I noticed is that StableBit was checking Pool C and it stayed at 3% for many hours, with the file details changing every 5 to 10 seconds.

 

Hopefully you have noticed something similar, or can replicate without logs as the impact on my server is too drastic for it to run more than a few hours.


  • 0

I'm on the latest beta and I've had big memory problems too, consistently using 90-100% of my memory after having the machine on for a while. I use both DrivePool and CloudDrive, haven't really been able to find clear evidence tied to either one, but RAM Map always ends up with a huge Non-Paged Pool and no processes tied to it.


  • 0

I'm running 2.2.0.754 on my personal server and not seeing a memory issue (aside from running too much on it). 

 

 

That said, if you suspect that you're having an issue with memory usage (e.g., a memory leak), please do this: 

http://wiki.covecube.com/StableBit_DrivePool_System_Freeze

 

This will dump the memory to disk by causing a BSOD, and is a great snapshot of what was going on.  The linked page has an upload form as well.  

 

Compress the file and upload the crash dump.

 

 

 

This way, we can take a look at what is going on DIRECTLY, and verify the problem.  And if it is a memory leak, then we can take a look and see what is causing it. 


  • 0

Not sure it is a leak; on my system it seems more like a river. As I said in my first post, I had 8GB RAM, increased to 12GB. This was exhausted in a matter of hours. I did note that there was 4.5GB of paged pool used, and about 4GB of non-paged pool used. Nothing else stood out, as no actual applications listed excessive memory usage.

 

As a comparison, my server reverted to 747 has been running for a day and a half and is using 3.4GB of 12GB RAM. Paged pool is 659MB and non-paged pool is 257MB.

 

Maybe next time I upgrade, won't be for a while, I will hopefully have more time to investigate.
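
To put the two sets of readings above side by side, here is a minimal Python sketch. The figures are the ones reported in this post (with 1GB taken as 1024MB and "about 4GB" treated as exactly 4GB); the name `growth_factor` is made up for illustration.

```python
# Kernel pool readings quoted in the post above, in megabytes.
# 1 GB is treated as 1024 MB; the "about 4GB" figure is approximate.
AFFECTED_BUILD_MB = {"paged pool": 4.5 * 1024, "non-paged pool": 4 * 1024}
BUILD_747_MB = {"paged pool": 659, "non-paged pool": 257}


def growth_factor(affected_mb: float, baseline_mb: float) -> float:
    """Return how many times larger the affected reading is than the baseline."""
    if baseline_mb <= 0:
        raise ValueError("baseline must be positive")
    return affected_mb / baseline_mb


if __name__ == "__main__":
    for pool, baseline in BUILD_747_MB.items():
        factor = growth_factor(AFFECTED_BUILD_MB[pool], baseline)
        print(f"{pool}: roughly {factor:.0f}x the 747 reading")
```

With these numbers the paged pool comes out at roughly 7x and the non-paged pool at roughly 16x the 747 readings.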


  • 0

I've been crashing lately as well. It seems that the latest beta (whatever was the latest 2 days ago) is at least better than the ones before it. I haven't dug into it. Just replying here as I've been having a crash about every 3 days, and AFAIK the only change on my system was updating DrivePool... Think I might go back to 747 if I have another problem and see if it goes away.


  • 0

Well, a leak is a leak, regardless how much it may be "hemorrhaging"  :)

 

But either way, a memory leak is NOT good, at all.  And there are a couple of changes that could have caused this.

 

 

That said, the reason I ask for a memory dump is that this is going to be harder to track down than with a normal application.  The CoveFS.sys driver is what "is" the pool, and this runs in "kernel mode", meaning that it usually shows up as part of "system".  

 

So, it may not show up in an easy to see way.  Getting a memory dump lets us see exactly what was happening at that point in time, and allows Alex to delve deep into the issue. 

 

 

 

tl;dr: the literal best thing you can do for us when you're experiencing a memory leak is getting a memory dump. 

 


Crashes, or the system freezing up?

 

If it's crashing, then get us the crash dumps. 

 

You can do this manually: 

http://wiki.covecube.com/StableBit_DrivePool_System_Crashes

 

Or you can grab the StableBit TroubleShooter, and run that. This will grab the minidumps, which are useful for diagnosing BSODs.

 

 

Otherwise, if the system is locking up, follow the "system freeze" link from above and grab a crash dump. Then roll back to a version you know works better. 


  • 0

grab the troubleshooter from here:  http://wiki.covecube.com/StableBit_Troubleshooter 

 

logs submitted

 

I'm installing 754 as recommended now...

 

 

Beta Versions I've had installed (that are still in my history... in case it helps)
.762
.758
.748
.744
grabbing 754 now
 
Well... not so fast. When I try to install, it says there's a version already installed. Do I need to un-license the current version? And what about the SSD plugin?

  • 0

I'm not glad to say it, but I'm not the only one having this issue. My 64GB of memory is maxed out after a few hours of balancing and such.

 

NTFS or ReFS pool?

 

Also, could you grab a memory dump of the system when this is occurring?

http://wiki.covecube.com/StableBit_DrivePool_System_Freeze

 

 

 

 

From a quick look at the logs you've submitted, I'm seeing a LOT of crashes being caused by the "StorAHCI" driver.  This may indicate an issue with your storage controller (the drives for it), rather than StableBit DrivePool.   But it's entirely possible that it's an odd interaction.

 

I've flagged the logs for Alex anyways, just in case. 

 

https://stablebit.com/Admin/IssueAnalysis/27539

 

 

And to downgrade, uninstall the software, reboot, and reinstall.  It will retain the balancing settings and the license in this case. 


  • 0

So ... The Controller or the drive? I ask because one drive (the SSD) StableBit is saying "doesn't have smart errors but some things known to us indicate this drive is garbage" <- I'm sure it's worded slightly differently... I'm hoping it's the drive? 

 

S.M.A.R.T. is not predicting imminent disk failure. However, some well known S.M.A.R.T. attributes that are indicators of mechanical problems are showing signs that the drive could be failing.


  • 0

The storage controller.  Eg, the AHCI controller on the motherboard, most likely. 

 

That said, if a drive is failing (or going downhill), it *could* cause issues with the controller it's attached to, and cause a BSOD like this. I mean... that's not the weirdest issue I've seen (when I had a HighPoint card, I had a bad drive that would cause the system to crash with a TCPIP.sys BSOD, because the HighPoint management software was web enabled and using it ....  so...)

 

 

But that's why I flagged these for Alex.  He's better at reading them than I am, so he should have a better idea of what is going on. 

 

 

That said, a memory test on the system in question isn't a bad idea.


  • 0

I had some time and grabbed this yesterday. Been running for almost 24 hours and total system ram use is 3.2GB, paged pool 850MB, Non Paged 256MB. Memory leak seems to be resolved as my system used to die after a few hours before.

 

Great work thanks.


  • 0

Fantastic. That's great to hear. 

 

 

And yeah, there were 4 big changes recently, so that's probably why the leak "snuck in"

 

But we're glad to get confirmation. 


  • 0

Drashna, can someone smarter than me show me how to determine which program is memleaking?  For the longest time I thought it was the Drivepool Beta.  Then you guys fixed it and others said the problem was solved.

 

However, within a few hours of balancing I'm still using all 64GB of RAM.  That shouldn't be normal.

 

I'm looking at Task Manager as well as Resource Monitor and can't see any program using 50GB plus of RAM.

 

Is there something I can do to determine the problem?


  • 0

Well, the problem is that it's not a "program" per se, most likely.  If it's our software, it's the "system" that is leaking, because of the kernel driver.  

 

As for actually troubleshooting: download the Windows Driver Kit for "poolmon", enable "pool tagging", and reboot.  Then run poolmon and dig into it.

 

Likely, it's going to be "covefs.sys", which is the pool driver.
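
One way to "dig into it" is to note the per-tag byte counts that poolmon shows at two points in time and see which tags grew the most. Here is a minimal Python sketch of that comparison. The function name, the tags and the numbers are all invented for illustration, and the snapshots are assumed to be copied by hand into dictionaries, since this does not parse poolmon output.

```python
def rank_pool_growth(before: dict[str, int], after: dict[str, int], top: int = 5) -> list[tuple[str, int]]:
    """Return the pool tags whose byte usage grew the most between two snapshots."""
    # Tags that only appear in the later snapshot count as growth from zero.
    growth = {tag: after[tag] - before.get(tag, 0) for tag in after}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Hypothetical snapshots (tag -> bytes in use) taken some time apart.
    # The tags and numbers are invented for illustration.
    snapshot_1 = {"TagA": 40_000_000, "TagB": 120_000_000, "TagC": 5_000_000}
    snapshot_2 = {"TagA": 41_000_000, "TagB": 2_900_000_000, "TagC": 5_100_000}
    for tag, grown in rank_pool_growth(snapshot_1, snapshot_2):
        print(f"{tag}: +{grown / 1_000_000:.0f} MB")
```

A tag that keeps climbing between snapshots is the one to look at. Which driver owns a given tag is a separate lookup that this sketch does not attempt.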

 

That said... enabling "verifier" and getting a memory dump is going to be super helpful for us to identify and fix the problem.

 

To enable verifier, run "verifier", create a new profile, selecting "covefs.sys" for the driver, and then reboot.   Then get a memory dump like normal, and then re-run verifier and delete the profile. 

 

The "Driver Verifier" is great for diagnostics and development, but it can (will) cause issues.... for instance, it can cause measuring a large pool to take DAYS, rather than minutes or an hour.  I say this from personal experience.... and this is why internal beta versions should flag you if the verifier is enabled. 


  • 0

Chris, I would love to provide that data but despite following the instructions in your wiki and previous posts, I can't get the system to crash and create a dump.

 

Here's what I have done: disabled all my other startup programs and then gone beta by beta, loading each one to see if the memory still leaks.  I'm on .738 now and 2 hours into balancing I'm at 62GB RAM used.

 

How far back can I go before I lose REFS support since about 1/4 of my drives are converted to REFS?


  • 0

Drashna, one other thing to consider that I'm testing: I had a RAM disk set up and no virtual paging file (it was set to 0).  During balancing (and there's a lot of that, since I'm removing drives and formatting to ReFS), perhaps it's using a lot of virtual memory that I don't have, since I wanted everything to be kept in RAM?

 

I don't know.  I'm not a programmer.


  • 0

 

This should work:

http://wiki.covecube.com/StableBit_DrivePool_System_Freeze

 

But it does require a reboot before it becomes effective.

 

If that doesn't work, then "NotMyFault" will definitely work:

https://technet.microsoft.com/en-us/sysinternals/notmyfault.aspx

 

Just run it first, and leave it running until you need it.  (don't click "crash" until it's ready). 

 

As for ReFS, it really depends on the version of Windows.  The first version that ReFS existed in is Windows 8 or Server 2012.  But as a nice user posted, this is a good chart: 

https://gist.github.com/0xbadfca11/da0598e47dd643d933dc

 

If you're running Windows 10 Creators Update, you can run "fsutil fsinfo refsinfo x:" on the drive in question to get the specific ReFS version. 

 

 

 

Oh, yeah, you need a page file for this to work at all.  Period.  Without it, it won't properly create a crash dump.  

 

But you should be able to get away with a small one (such as 1GB), at least temporarily. 


  • 0

Don't want to completely uninstall Drivepool and look for a replacement but can't deal with the memleaks.  I max out the full 64gb RAM every 3 hours that drivepool is balancing.

 

Yes, I know you want a memdump but the process is convoluted and not user friendly.

 

My last-ditch effort will be to install the x86 version instead of the x64 version.  Will this limit RAM usage to 4GB, the way x86 versions of Windows do?  Or will I still face the inevitable memleak?


  • 0

Is this only happening when balancing?  

Or? 

 

If it's just happening when balancing, then try disabling balancing and see if that helps (or at least slows down the rate). 

 

As for the memory dump, yeah, it's not exactly straightforward. 

 

That said, "NotMyFault" may make it easier, as it's a GUI program that will initiate a crash dump. 

The caveat here is that you do need a page file specified, *and* you need to make sure it's configured to write out the memory dump.

http://wiki.covecube.com/System_Crashes
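
If you want to check what the system is currently set to write on a crash, the setting lives in the registry as `CrashDumpEnabled` under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`. Here is a minimal Python sketch that reads it and prints what the value means. The function names are made up for illustration, the registry read only works on Windows, and it does not check which dump type the wiki page asks for or whether a page file is configured.

```python
# Meanings of the CrashDumpEnabled registry value under
# HKLM\SYSTEM\CurrentControlSet\Control\CrashControl.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_setting(value: int) -> str:
    """Translate a CrashDumpEnabled value into a readable description."""
    return DUMP_TYPES.get(value, f"unknown setting ({value})")


def read_dump_setting() -> int:
    """Read CrashDumpEnabled from the registry. Works on Windows only."""
    import winreg  # imported here so the rest of the file loads on any OS

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if __name__ == "__main__":
    print(describe_dump_setting(read_dump_setting()))
```

If it prints "none", no dump will be written no matter how the crash is triggered.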

 

As for installing the 32 bit version, no.  It will only work on the matching OS.  So if it's a 64 bit version of Windows, the x86 installer won't install.  

 

 

That said, the earliest version that supports ReFS is the public beta.  (well, specifically 2.2.0.650, but close enough). 

 

 

That said, if you can identify which build this starts with, it would help.  That way, we at least know what change is triggering the issue. (potentially). 
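
If installing every build in turn is too slow, the search can be halved at each step. Here is a minimal Python sketch of that idea. It assumes that once a build leaks, every later build leaks too, which may not hold here. The function name and the `leaks` check are hypothetical; in practice the check means "install the build, balance for a few hours, look at memory".

```python
from typing import Callable, Optional, Sequence


def first_leaking_build(builds: Sequence[int], leaks: Callable[[int], bool]) -> Optional[int]:
    """Binary-search a sorted list of builds for the first one that leaks.

    Assumes every build after the first leaking one also leaks.
    Returns None if no build in the list leaks.
    """
    low, high = 0, len(builds)
    while low < high:
        mid = (low + high) // 2
        if leaks(builds[mid]):
            high = mid  # the first bad build is at mid or earlier
        else:
            low = mid + 1  # the first bad build is later than mid
    return builds[low] if low < len(builds) else None


if __name__ == "__main__":
    # Build numbers mentioned in this thread; the leak test is a stand-in for
    # "install the build, balance for a few hours, check memory".
    builds = [738, 744, 747, 748, 754, 758, 762]
    pretend_first_bad = 748  # hypothetical, not a confirmed result
    print(first_leaking_build(builds, lambda build: build >= pretend_first_bad))
```

With seven builds this takes at most three installs instead of seven.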

 

 

But again, to be blunt, getting the memory dump would be best.

 

But if you are using a RAM disk, it may not be a bad idea to temporarily disable it.

The same goes with antivirus. 


  • 0

Just keeping you updated, Drashna.

 

Uninstalled DrivePool completely last night before bed.  System has been up 12 hours and has only consumed 7.2GB RAM.

 

It's definitely DrivePool that is memleaking for me.

 

I have to reinstall it eventually to try out Storage Spaces and move data.  I will try again to provide a memdump for you.
