
SMB access slow to get started (high latency) when the shared dir is in a DrivePool


Ka Lam

Question

Current environment description:

  • Host with DrivePool installed runs Win10 Pro 64-bit; DrivePool version is 2.2.4.1162.
  • Client OS is also Win10 Pro 64-bit.
  • Both are connected via 1 Gb wired Ethernet through a switch.

The issue:

I have shared out some directories from a pool on the host. I've always noticed an access-delay issue, but have worked around the problem for years (more than 3, if not 5; the systems back then were running Win 7).

Recently I spent some effort digging into the problem, and here is what I have found so far:

On the host, on a drive (T:) that's part of the pool (E:), I created an exact copy of a sub-directory that's part of the pool,
i.e.

E:/Drivepool/TestSubject1

has the same content as

T:/TestSubject2

(size of the sub-dir is around 2.?? GB)

The copy on the host from TestSubject1 to TestSubject2 is "fast" (in the expected 100 MB/s range, and the copy starts as soon as I drag and drop).
I also share out TestSubject2 as an SMB share.
On the client, I open separate File Explorer windows for each of the host's SMB shares, and drag-and-drop copy from the TestSubject1 and TestSubject2 shares to a local HDD on the client.
The copy from TestSubject1 takes 30+ seconds to get started, but once it starts, the speed is "normal" (80-90 MB/s).
The copy from TestSubject2 starts right away (no 30-second pause), and the speed is in the same range (80-90 MB/s).
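
To put a number on the startup delay without copying any data, the first directory enumeration against each share can be timed from PowerShell. This is just a minimal sketch: \\HOST is a placeholder for the server name, and the share names are the ones from my test above.

# Time the first directory listing of each share ("HOST" is a placeholder).
foreach ($share in 'TestSubject1', 'TestSubject2') {
    $t = Measure-Command { Get-ChildItem -Path "\\HOST\$share" | Out-Null }
    "{0}: {1:N1} s to enumerate" -f $share, $t.TotalSeconds
}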


I also have an RDP window open from the client to the host, so I can observe the DrivePool window (where I can see the "Disk activity").
During the 30-second pause, there is no disk activity on the pool; once the copying starts (as observed in the copy progress dialog box), the DrivePool window shows activity.

I have searched this forum and noticed a few possibly related posts:
https://community.covecube.com/index.php?/topic/3984-pauses-in-file-operations-on-shares-caused-by-drivepool/
and some previous suggestions:
https://wiki.covecube.com/StableBit_DrivePool_Q542215 (I have done this on the host to disable Windows Search indexing)
https://wiki.covecube.com/StableBit_DrivePool_Q7420208 (I didn't do this, as I don't see how this would apply.)

Background:

I started digging into this issue because I have a VM running dietpi (OS) running Plex Server, accessing a music folder on the host (same host as above) via Samba. When Plex was scanning the music folder, it took forever (as in many days), and I noticed (via netdata) that there was high iowait. Eventually I gave up on accessing the host over Samba, made a complete copy of all my music files inside the VM, and let Plex re-scan it, and I noticed a night-and-day difference! The same Plex VM also accesses my video folder via Samba, and it always takes a while (30-ish seconds :P) to start streaming a video, but it has no problem streaming 1080p or 4K content (so transfer bandwidth is not the problem). This led me to think it's a latency issue (it takes a long while to start; once started, it's 'fast').
I always felt that something was "wrong", but until the Plex music folder experience above, I couldn't pinpoint what. I also thought it was the host itself, but I did a re-install of the OS (when I upgraded from Win7 to Win10) and it didn't make a difference. Then I tested an SMB share on a non-pool drive, and that was 'fast' (latency-wise), so I now suspect it has something to do with DrivePool (thus this post).

Further background:
My drive pool currently has 1x14TB, 3x8TB, and 2x4TB HDDs; files are mostly 2x duplicated, and some sub-folders are 3x duplicated. The pool has evolved over time, from 4x2TB to mostly 4TB drives to the current state.

I also used Windows Performance Monitor to observe the host and the client; I got the idea from a Microsoft webpage. On the host, I observed "SMB Server" and "SMB Server Sessions" (Read Bytes/sec, Read Requests/sec, Avg. Read Queue Length, Current Data Queue Length, Current Open File Count, Current Pending Requests, etc.). On the client, I observed "SMB Client Shares" for TestSubject1 and TestSubject2 independently (Avg. Bytes/Read, Avg. Data Queue Length, Avg. Read Queue Length, Avg. sec/Data Request, Avg. sec/Read, Current Data Queue Length, etc.). The observation: during the 'pause', the client-side counters show no activity; after the 'pause' (i.e., once the file copy has started), the counter activity for TestSubject1 and TestSubject2 is comparable. On the host side, the SMB Server / Server Sessions counters also show no activity during the pause (apart from a Data Queue Length of 1) and normal-looking activity during the actual copy operation (for both copies).
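
If anyone wants to reproduce this, the same counters can be sampled from PowerShell. A sketch; the counter-set and counter names are the ones listed above, and the instance names will differ per system:

# Enumerate the client-side SMB counters, then sample per-read latency for a minute.
(Get-Counter -ListSet "SMB Client Shares").Counter
Get-Counter -Counter "\SMB Client Shares(*)\Avg. sec/Read" -SampleInterval 1 -MaxSamples 60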


If extra details, logs, debug traces, or a video capture of the debugging steps above are needed, I can provide them. (I work in the computer field and am quite technical, and I am willing to work with someone to figure this out.)


13 answers to this question

Recommended Posts

  • 1

I'm stubborn, so I had to figure this out myself.
Wireshark showed:

SMB2	131	Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND
SMB2	131	Create Response, Error: STATUS_FILE_IS_A_DIRECTORY
SMB2	131	GetInfo Response, Error: STATUS_ACCESS_DENIED
SMB2	166	Lease Break Notification
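
(For anyone repeating the capture: a display filter along these lines should isolate the interesting frames. These are standard Wireshark SMB2 fields, nothing specific to this setup; command 18 is the Oplock/Lease Break message, and the status clause catches the error responses above.)

smb2.cmd == 18 || smb2.nt_status != 0x0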

I thought it might be NTFS permissions, but even after re-applying security settings per DP's KB: https://wiki.covecube.com/StableBit_DrivePool_Q5510455 I still had issues.

The timer is 30 seconds, plus about 5 seconds for the SMB handshake to collapse. It's the oplock break via the Lease Break Ack Timer.
This MS KB helped: Cannot access shared files or folders on a drive in Windows Server 2012 or Windows Server 2012 R2

Per MS (above), to disable SMB2/3 leasing entirely, do this:

REG ADD HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters /v DisableLeasing /t REG_DWORD /d 1 /f

I didn't need to restart SMB2/3; the change was instant, and file lookups, and even a simple right-click in Explorer, came up instantly. A process that took 8+ days finished in an hour or so :)
Glad to be rid of this problem. Leases are disabled, yes, but SMB oplocks are still available.
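
(If you prefer PowerShell, the same setting can be flipped with the built-in SmbShare module; as far as I can tell this is equivalent to the registry change above:)

# Disable SMB2/3 leasing (equivalent to DisableLeasing = 1 in the registry).
Set-SmbServerConfiguration -EnableLeasing $false -Force
# Verify the change took effect:
Get-SmbServerConfiguration | Select-Object EnableLeasing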


  • 0

Have you tried changing the auto-tuning level?  

netsh int tcp set global autotuninglevel=highlyrestricted

This can have a significant impact, though you'd want to run it on each Windows system that connects to the server.
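
(To see the current value first, and to undo the change if it doesn't help, these are the standard netsh options:)

netsh int tcp show global
netsh int tcp set global autotuninglevel=normal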

Also, on the network adapter(s) in Device Manager, try disabling any option that has "checksum" or "offload" in the name, as well as "green" Ethernet and interrupt moderation. Tweaking jumbo frames may also help.
And as above, do this on all of the Windows systems connecting.
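
(The offload settings can also be toggled in bulk from PowerShell via the NetAdapter module; a sketch, assuming you want this on every adapter:)

# Disable checksum offload and large-send offload on all adapters.
Disable-NetAdapterChecksumOffload -Name "*"
Disable-NetAdapterLso -Name "*"
# Review the remaining advanced properties (green Ethernet, interrupt moderation, jumbo frames, ...):
Get-NetAdapterAdvancedProperty -Name "*"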

 

There is also a "Network I/O Boost" option in the performance options for the pool. Try toggling this, as it tries to prioritize network access over local, at the cost of CPU cycles. 


  • 0

I don't know exactly why you are experiencing delays on file transfers via SMB with DrivePool. All of my computers use the DrivePool drive (J: on my system) as a mapped network drive, and there is no difference between file transfers off DrivePool and any other network drive. I find the only limit in DrivePool is the speed of the physical HDDs the file is sitting on. Since I am using 16 HDDs on USB 3.0 for my DrivePool, that is the only speed limit I have noticed. I do have 1 SSD in my DrivePool, and obviously files cached on that drive transfer much faster.

I currently use Windows 10 Remote Desktop on my client computers to monitor my host server, and when I request a file from DrivePool, there is no delay at all.

I did have an issue with SMB on my Amazon Fire TV Stick and Kodi. For some reason, Kodi was having a problem negotiating file lookups with SMB set to automatic. Evidently, there are 3 or 4 versions of SMB; I set my Kodi to SMB version 2.0 and that solved my SMB issue specific to Kodi on the Fire TV Stick. But that had nothing to do with my computers.


  • 0
On 11/16/2020 at 8:42 PM, Ka Lam said:

The copy from TestSubject1 takes 30+ seconds to get started, but once it starts, the speed is "normal" (80-90 MB/s). […]

I realize this is a long shot, but did you ever solve this issue? I just finally bit the bullet and upgraded my Win 7 DrivePool server to Win 10, and I am encountering the exact same sluggishness when initially accessing shares that are on the pooled drive, but not when accessing shares that are on the drives directly.


  • 0

No. I never got to the bottom of it, so I still observe the issue.  

A few new things I have observed:
1) A Win11 OS as the SMB client still has the same latency issue. (I built a new desktop: new CPU/MB plus a fresh OS install.)
2) (Windows) scp to/from the pool drive has no latency issue.
3) I also have another (new to me) desktop, with a new SSD and a freshly installed Win10 OS, on a USB Wi-Fi connection; it shows the same latency issue.

I already have the "Network I/O Boost" option turned on for the pool.

I will give the auto-tuning level a try next time I have a chance, though it will likely be 7 to 10 days before I can.


  • 0

Hi, I think I am experiencing the same issue.

Each file takes 35 seconds to start transferring, after which it transfers normally. These events (ID 1020) appear in Event Viewer for each file (details removed):

File system operation has taken longer than expected.

Client Name: 
Client Address: 
User Name: 
Session ID: 
Share Name: 
File Name: 
Command: 8
Duration (in milliseconds): 35379
Warning Threshold (in milliseconds): 15000

Guidance:

The underlying file system has taken too long to respond to an operation. This typically indicates a problem with the storage and not SMB.

The duration is always just over 35 seconds.
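
(If it helps anyone compare, these warnings can be pulled in bulk from the SMB server's operational log; a sketch using the standard log name:)

# Collect recent event-ID-1020 warnings ("File system operation has taken longer than expected").
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-SMBServer/Operational'
    Id      = 1020
} -MaxEvents 50 | Format-List TimeCreated, Message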

This event tells me that something is wrong at the filesystem level.

I've also noticed that when the SMB client first connects to the share (which also takes around 35 seconds), I can't load the shared folder in File Explorer on the server. I also get the same warning in Event Viewer for the desktop.ini file.

I have occasionally experienced the same latency issue when browsing to a folder or transferring files on the server itself, but not often, and certainly not every single time as with SMB.

Edit: The above is what happens consistently when the SMB client is Windows 10 and the share is on the DrivePool volume. I don't have any issues when the SMB client is Total Commander on Android.

Edited by orondf343
Tested on Android

  • 0

I have tried the tuning command, Network I/O Boost, stopping WSearch. I have yet to try the network adapter options, but have tried disabling LargeSendOffload in the past.

I have also managed to reproduce this issue in a Win11 virtual machine running on the server itself.


  • 0

This is still an issue for me as well.

I have two DrivePool setups, both on Windows Server 2016.
One is affected (physical), and the other is not (VM).

Drive speed on the affected system is 1 GB/s+, and all possible tweaks (above) have been tried, including disabling opportunistic locking, but each file transfer takes 35 seconds to begin. Imagine 10 files of 1 MB each... it takes literal minutes. I am reducing my use of DrivePool to local access only, as this issue lies with SMB and DrivePool.

There are no DFS issues on the affected server; reads/writes on the disks are all 1000 MB/s+ (fast arrays).

The delays are 35 seconds on the nose. This also happens to be the default 'Lease Break Acknowledgment Time' for SMB v2 in Windows.
I hope a dev can have another look at this; the 35-second oplock/lease-break timer is the only 35-second timeout I can find in SMB and Windows.
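
(For anyone landing here: to check whether the DisableLeasing workaround from earlier in the thread is already applied, query the value the Microsoft KB sets:)

reg query HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters /v DisableLeasing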
Thanks!

