Understanding File System Minifilter and Legacy Filter Load Order

March 25, 2013, 4:39 pm

Hello, my name is Fred Jeng from the Global Escalation Services team. For today’s post, I want to go over how Windows 7 and Windows Server 2008 R2 load file system mini-filters in a mixed environment when legacy filters are also present. I recently came across an issue where the filters were being loaded out of order based on their altitudes. This can cause all sorts of problems with a filter driver’s functionality if they are incorrectly positioned on the stack. Take for example the following filter stack, obtained using the fltmc command from the cmd prompt:

C:\Windows\system32>fltmc

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
AVLegacy                                        329998.99   <Legacy>
EncryptionLegacy                                149998.99   <Legacy>
AVMiniFilter                            3       328000         0
luafv                                   1       135000         0
FileInfo                               13       45000          0

At first glance it looks like there is a problem causing the legacy encryption filter to be loaded above the antivirus minifilter, which has a higher altitude. This may cause issues with AVMiniFilter as the IOs that it receives are still encrypted. Due to limitations in how the filter drivers attach to the driver stack, this is actually the intended behavior. However, there is a solution to manipulate the load order to load the legacy filters correctly based on their altitude.

First some background information regarding legacy filters and minifilters.

In the old days before minifilters, legacy drivers could only attach at the top of the driver stack, so the load order also controlled the attachment order. The earlier a legacy driver loads, the lower it can attach on the file system stack. Minifilters, on the other hand, can load at any time, but their positions relative to other minifilters are controlled by their altitude. When a minifilter loads, it needs to register with an appropriate frame created by fltmgr. Each frame is a fltmgr device object and represents a range of altitudes. There can be more than one frame on the file system stack, but the range of altitudes that each frame represents cannot overlap with the altitude range of another frame. For interoperability with legacy drivers, minifilters must still maintain a load order group. The frames are created and managed by fltmgr, which itself is a legacy driver. The ramification of this is that fltmgr must follow the old legacy filter driver rules and attach only at the top of the stack.

Using the above example, let’s walk through how the legacy filters and minifilters are loaded so that we end up in a state where the altitudes appear to be out of order.

First, here are the details for the 5 drivers.

Driver Name	Type	Load Order	Start Type	Altitude
AVLegacy	Legacy	FsFilter Anti-Virus	SERVICE_BOOT_START	329998.99
AVMiniFilter	Minifilter	FsFilter Anti-Virus	SERVICE_BOOT_START	328000
EncryptionLegacy	Legacy	FsFilter Encryption	SERVICE_BOOT_START	149998.99
Luafv	Minifilter	FsFilter Virtualization	SERVICE_AUTO_START	135000
FileInfo	Minifilter	FSFilter Bottom	SERVICE_BOOT_START	45000

MSDN has an article that describes load order groups and altitudes for minifilters: http://msdn.microsoft.com/en-us/library/windows/hardware/ff549689%28v=vs.85%29.aspx.

Referencing this article regarding load order groups and altitudes for minifilter drivers, we can determine that our filters will load in the following order.

FileInfo

EncryptionLegacy

AVLegacy

AVMiniFilter

luafv

On system bootup, when fltmgr.sys loads it will create Frame 0 with a default altitude range of 0 to 49999. When FileInfo loads with an altitude of 45000, it will fit into the default Frame 0. Next to load is EncryptionLegacy. Since this is a legacy driver, it will attach on top of the legacy driver fltmgr.sys. At this point the file system stack consists of EncryptionLegacy attached above the Frame 0 instance of fltmgr, which hosts FileInfo.

Next up is the AVLegacy driver. This is a legacy driver, so it has to attach above EncryptionLegacy.

Now AVMiniFilter will load with an altitude of 328000. The OS will check if it will fit in the Frame 0 fltmgr instance, but this frame only supports an altitude range of 0-49999. Before deciding to create a new fltmgr frame instance, it will check if there are any legacy filters attached above Frame 0 and adjust Frame 0’s altitude range if there are. In our case, we do have legacy filters on the file system stack at this point, so we go up the list of legacy drivers. First we see EncryptionLegacy with an altitude of 149998.99, so we adjust Frame 0 to cover from 0 to 149998.99. We continue up the list and see AVLegacy with an altitude of 329998.99, so we again adjust the altitude range of Frame 0 to now cover 0-329998.99. The reason we do this is that Frame 0 must now handle all minifilters below 329998.99. Since we can only attach legacy filters to the top of the stack, if we add an additional fltmgr frame instance, it has to sit above AVLegacy and can only support minifilters with an altitude of 329998.99 or higher. Now that Frame 0 supports 0-329998.99, we can register AVMiniFilter with Frame 0.

At this point, you can already see that AVMiniFilter, which has a higher altitude than EncryptionLegacy, will be loaded below EncryptionLegacy. The last driver to load is the luafv minifilter, and it will fit into Frame 0.

A couple of things to point out.

Why can’t we insert a Frame between AVLegacy and EncryptionLegacy when AVMiniFilter loads?

This is due to how the file system stack is constructed with legacy drivers only being able to attach to the top of the stack. Since FltMgr is a legacy driver, it has to conform to these rules.

Why do we adjust the altitude in Frame 0 to cover 0-329998.99? Why not stop at 149998.99?

If Frame 0 only adjusts its altitude range to the legacy filter directly attached above it and not all the way to the highest attached legacy filter, we won’t be able to handle some range of minifilters. For example, assume we only adjust Frame 0 to cover 0-149998.99. Then, when AVMiniFilter with an altitude of 328000 comes along, it won’t fit in Frame 0, and we’re unable to insert a Frame between AVLegacy (329998.99) and EncryptionLegacy (149998.99). We would either be unable to load AVMiniFilter, or we would have to create Frame 1 above AVLegacy and load AVMiniFilter there, in which case we would again be faced with the altitude disordering issue.

If this is the expected behavior, how do we resolve the problem of EncryptionLegacy being loaded above AVMiniFilter? The solution is to inject a dummy minifilter that loads at the appropriate time to force fltmgr to create a Frame between the legacy filters. For our case above, I used the DDK to create the NullFilter minifilter driver, changed the load order group to FSFilter Compression, gave it an altitude of 160030 (which is within the altitude range assigned to FSFilter Compression), and set the start type to SERVICE_BOOT_START. Please note that I only used this driver in a test environment; production minifilter drivers must use an altitude assigned by Microsoft.

For information on minifilter load order groups and altitude, reference http://msdn.microsoft.com/en-us/windows/hardware/gg462963.aspx.

Our list of filters is as follows:

Driver Name	Type	Load Order	Start Type	Altitude
AVLegacy	Legacy	FsFilter Anti-Virus	SERVICE_BOOT_START	329998.99
AVMiniFilter	Minifilter	FsFilter Anti-Virus	SERVICE_BOOT_START	328000
EncryptionLegacy	Legacy	FsFilter Encryption	SERVICE_BOOT_START	149998.99
Luafv	Minifilter	FsFilter Virtualization	SERVICE_AUTO_START	135000
FileInfo	Minifilter	FSFilter Bottom	SERVICE_BOOT_START	45000
NullFilter	Minifilter	FSFilter Compression	SERVICE_BOOT_START	160030

So with the new NullFilter dummy driver, our filter load order should be as follows:

FileInfo

EncryptionLegacy

NullFilter

AVLegacy

AVMiniFilter

luafv

After FileInfo and EncryptionLegacy load, the stack is the same as what we had earlier.

Now when the NullFilter minifilter loads with an altitude of 160030, we see that it doesn’t fit in Frame 0. As before, we check for any attached legacy filter drivers and see EncryptionLegacy so we adjust Frame 0 to cover 0-149998.99. Since NullFilter still does not fit in Frame 0, we will create a new Frame and attach it above the EncryptionLegacy driver.

The AVLegacy driver will load next, and since it is a legacy driver, it will attach above the Frame 1 instance of FltMgr.

The last two minifilters to load are AVMiniFilter and luafv. When AVMinifilter loads into Frame 1 with an altitude of 328000, it will see that Frame 1 at the time only supports 149998.99-160030. It follows the same algorithm to check if there are any legacy filters attached above the frame. In this case, we have AVLegacy attached above Frame 1 so we adjust Frame 1 to cover 149998.99-329998.99 before inserting AVMinifilter into Frame 1.

By strategically injecting a dummy minifilter driver, we can get the legacy and minifilter drivers to all load at the correct altitude.
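The two walkthroughs above can be condensed into a small simulation. This is only a sketch of the behavior as described in this post, not fltmgr's actual implementation: the class and function names are hypothetical, and the rule that a newly created frame starts where the previous frame's range ends is an assumption taken from the Frame 1 range quoted above.

```python
from dataclasses import dataclass, field

DEFAULT_FRAME0_TOP = 49999.0  # default upper bound of Frame 0, per the post


@dataclass
class Frame:
    low: float
    high: float
    minifilters: list = field(default_factory=list)  # (name, altitude) pairs


@dataclass
class Legacy:
    name: str
    altitude: float


def simulate(load_order):
    """Replay a load order of (name, kind, altitude) tuples.

    kind is "legacy" or "mini". Returns the stack ordered bottom to top.
    """
    stack = [Frame(0.0, DEFAULT_FRAME0_TOP)]
    for name, kind, altitude in load_order:
        if kind == "legacy":
            # Legacy filters can only attach at the top of the stack.
            stack.append(Legacy(name, altitude))
            continue
        # Minifilter: use an existing frame whose range already covers it.
        frame = next((item for item in stack
                      if isinstance(item, Frame)
                      and item.low <= altitude <= item.high), None)
        if frame is None:
            # Stretch the topmost frame up to the highest legacy filter
            # attached above it (everything above it is a legacy filter).
            top_idx = max(i for i, item in enumerate(stack)
                          if isinstance(item, Frame))
            frame = stack[top_idx]
            above = [item.altitude for item in stack[top_idx + 1:]]
            if above:
                frame.high = max(frame.high, max(above))
            if altitude > frame.high:
                # Still does not fit: a new frame attaches at the top.
                frame = Frame(frame.high, altitude)
                stack.append(frame)
        frame.minifilters.append((name, altitude))
    return stack


def flatten(stack):
    """List filter names from the top of the stack to the bottom."""
    names = []
    for item in reversed(stack):
        if isinstance(item, Legacy):
            names.append(item.name)
        else:
            ordered = sorted(item.minifilters, key=lambda m: -m[1])
            names.extend(n for n, _ in ordered)
    return names


WITHOUT_NULLFILTER = [
    ("FileInfo", "mini", 45000),
    ("EncryptionLegacy", "legacy", 149998.99),
    ("AVLegacy", "legacy", 329998.99),
    ("AVMiniFilter", "mini", 328000),
    ("luafv", "mini", 135000),
]
WITH_NULLFILTER = [
    ("FileInfo", "mini", 45000),
    ("EncryptionLegacy", "legacy", 149998.99),
    ("NullFilter", "mini", 160030),
    ("AVLegacy", "legacy", 329998.99),
    ("AVMiniFilter", "mini", 328000),
    ("luafv", "mini", 135000),
]

if __name__ == "__main__":
    print(flatten(simulate(WITHOUT_NULLFILTER)))
    print(flatten(simulate(WITH_NULLFILTER)))
```

Under these assumptions, the first load order leaves EncryptionLegacy above AVMiniFilter, while the second places every filter in altitude order.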


Debugging a Network Connectivity Issue - TrackNblOwner to the Rescue

March 29, 2013, 2:59 pm

Hello Debug community, this is Karim Elsaid again. Today I’m going to discuss a recent interesting case where a server was intermittently losing access to the network. No communication (even pings) could be done from or to the server when the issue hit.

We went through the normal exercise and asked the customer to obtain a Kernel memory dump from the machine while it was in the problematic state, hoping that we will find some data to help us to demystify the issue.

One of the very first commands we run upon receiving a hang dump is the very famous “!locks” command. This yielded the following:

8: kd> !locks

**** DUMP OF ALL RESOURCE OBJECTS ****

KD: Scanning for held locks..

Resource @ nt!IopDeviceTreeLock (0xfffff80001a81c80) Shared 1 owning threads

Threads: fffffa800cd8a040-01<*>

KD: Scanning for held locks.

Resource @ nt!PiEngineLock (0xfffff80001a81b80) Exclusively owned

Contention Count = 6

Threads: fffffa800cd8a040-01<*>

KD: Scanning for held locks

84372 total locks, 2 locks currently held

What I’m looking for is locks with exclusive owners and waiters. From the above output we can see that thread fffffa800cd8a040 exclusively owns a Plug and Play (Pi prefix) lock and has shared ownership of an I/O Manager (Io prefix) device tree lock.

There are no waiters for the exclusive lock; however, PnP locks are always worth investigating. While debugging I treat everything as a possible suspect unless proven otherwise, so let’s dump this thread:

8: kd> !thread fffffa800cd8a040 e

THREAD fffffa800cd8a040 Cid 0004.005c Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable

fffff88002b0f118 SynchronizationEvent

IRP List:

fffffa8016527510: (0006,0310) Flags: 00000000 Mdl: 00000000

Not impersonating

DeviceMap fffff8a000006100

Owning Process fffffa800cd56040 Image: System

Attached Process N/A Image: N/A

Wait Start TickCount 14791337 Ticks: 15577 (0:00:04:03.002)

Context Switch Count 835317 IdealProcessor: 2

UserTime 00:00:00.000

KernelTime 00:00:26.863

Win32 Start Address nt!ExpWorkerThread (0xfffff8000188f530)

Stack Init fffff88002b0fc70 Current fffff88002b0ee30

Base fffff88002b10000 Limit fffff88002b0a000 Call 0

Priority 12 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

*** ERROR: Module load completed but symbols could not be loaded for myfault.sys

Child-SP RetAddr Call Site

fffff880`02b0ee70 fffff800`0187ba32 nt!KiSwapContext+0x7a

fffff880`02b0efb0 fffff800`0188cd8f nt!KiCommitThreadWait+0x1d2

fffff880`02b0f040 fffff800`018e1816 nt!KeWaitForSingleObject+0x19f

fffff880`02b0f0e0 fffff880`01618fcd nt! ?? ::FNODOBFM::`string'+0x12ff6

fffff880`02b0f150 fffff880`0173f54e tcpip!FlPnpEvent+0x17d

fffff880`02b0f1c0 fffff880`00f87b2f tcpip!Fl48PnpEvent+0xe

fffff880`02b0f1f0 fffff880`00f884b7 NDIS!ndisPnPNotifyBinding+0xbf

fffff880`02b0f280 fffff880`00fa1911 NDIS!ndisPnPNotifyAllTransports+0x377

fffff880`02b0f3f0 fffff880`00fa2c5b NDIS!ndisCloseMiniportBindings+0x111

fffff880`02b0f500 fffff880`00f3bbc2 NDIS!ndisPnPRemoveDevice+0x25b

fffff880`02b0f6a0 fffff880`00fa5b69 NDIS!ndisPnPRemoveDeviceEx+0xa2

fffff880`02b0f6e0 fffff800`01aec8d9 NDIS!ndisPnPDispatch+0x609

fffff880`02b0f780 fffff800`01c6c1e1 nt!IopSynchronousCall+0xc5

fffff880`02b0f7f0 fffff800`0197f733 nt!IopRemoveDevice+0x101

fffff880`02b0f8b0 fffff800`01c6bd34 nt!PnpRemoveLockedDeviceNode+0x1a3

fffff880`02b0f900 fffff800`01c6be40 nt!PnpDeleteLockedDeviceNode+0x44

fffff880`02b0f930 fffff800`01cfcd04 nt!PnpDeleteLockedDeviceNodes+0xa0

fffff880`02b0f9a0 fffff800`01cfd35c nt!PnpProcessQueryRemoveAndEject+0xc34

fffff880`02b0fae0 fffff800`01be65ce nt!PnpProcessTargetDeviceEvent+0x4c

fffff880`02b0fb10 fffff800`0188f641 nt! ?? ::NNGAKEGL::`string'+0x5ab9b

fffff880`02b0fb70 fffff800`01b1ce5a nt!ExpWorkerThread+0x111

fffff880`02b0fc00 fffff800`01876d26 nt!PspSystemThreadStartup+0x5a

fffff880`02b0fc40 00000000`00000000 nt!KiStartSystemThread+0x16

Interesting. Looking at the stack above, we can see that the thread is doing some NDIS PnP work. This thread has been waiting for more than 4 minutes, but hold on, what is “nt! ?? ::FNODOBFM::`string'”? This doesn’t seem to be a useful function name, and it is not! This is a side effect of Basic Block Tools (BBT) optimization. With public symbols the debugger finds it hard to resolve the right symbol, but there is a nice trick you can use to get to the right function.

P.S. For a nice x64 deep dive, please refer to our archive.

Let’s display the function data for the return address fffff800`018e1816:

8: kd> .fnent fffff800`018e1816

Debugger function entry 000000e8`f28f14f8 for:

(fffff800`018c4790) nt! ?? ::FNODOBFM::`string'+0x12ff6 | (fffff800`018c47c8) nt!vDbgPrintExWithPrefixInternal

BeginAddress = 00000000`000da7d0

EndAddress = 00000000`000da81c

UnwindInfoAddress = 00000000`001c8a54

Unwind info at fffff800`019cfa54, 10 bytes

version 1, flags 4, prolog 0, codes 0

Chained info:

BeginAddress = 00000000`000182f0

EndAddress = 00000000`00018358

UnwindInfoAddress = 00000000`001bf910

Unwind info at fffff800`019c6910, 6 bytes

version 1, flags 0, prolog 4, codes 1

00: offs 4, unwind op 2, op info c UWOP_ALLOC_SMALL.

For optimized binaries, you will find a section “Chained Info”. Add the BeginAddress to the start address of the module and you should hit the correct function so:

8: kd> ln nt+000182f0

(fffff800`0181f2f0) nt!ExWaitForRundownProtectionReleaseCacheAware | (fffff800`0181f358) nt!KeGetRecommendedSharedDataAlignment

Exact matches:

nt!ExWaitForRundownProtectionReleaseCacheAware (<no parameter info>)

Bingo! You got the function. So tcpip!FlPnpEvent was calling ExWaitForRundownProtectionReleaseCacheAware. This function will basically wait for the rundown protection to drop down to 0.
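The debugger did the addition for us in `ln nt+000182f0`, but the arithmetic is easy to check by hand. The sketch below uses only numbers copied from the .fnent and ln output above; the helper names are hypothetical. It derives the load address of nt from the unwind info line (which shows both the RVA and the resolved virtual address) and then adds the chained BeginAddress to it.

```python
# Values copied from the .fnent output in this post.
UNWIND_INFO_RVA = 0x001C8A54         # UnwindInfoAddress (relative to module base)
UNWIND_INFO_VA = 0xFFFFF800019CFA54  # "Unwind info at ..." (virtual address)
CHAINED_BEGIN_RVA = 0x000182F0       # BeginAddress under "Chained info"


def module_base(virtual_address, rva):
    """Recover a module's load address from one known VA/RVA pair."""
    return virtual_address - rva


def rva_to_va(base, rva):
    """Turn a relative virtual address into a virtual address."""
    return base + rva


nt_base = module_base(UNWIND_INFO_VA, UNWIND_INFO_RVA)
function_start = rva_to_va(nt_base, CHAINED_BEGIN_RVA)

if __name__ == "__main__":
    print(f"nt base        = {nt_base:#018x}")
    print(f"function start = {function_start:#018x}")
```

The computed function start is fffff800`0181f2f0, the same address that ln resolves to nt!ExWaitForRundownProtectionReleaseCacheAware.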

A thread can call ExAcquireRundownProtectionEx against a shared object for safe access. Rundown protection provides a way to protect an object from being deleted until all outstanding accesses have finished (run down). ExWaitForRundownProtectionReleaseCacheAware does exactly that: it waits for all outstanding rundown protection references to be released.

The question is which structure’s rundown we are waiting on, and that depends on what we are dealing with. Because of code optimization the debugger is not showing the full picture. Through code review I found that in this particular dump there is an inlined call to the function “FlpUninitializePacketProviderInterface”.

So the stack in reality should look like this:

Child-SP RetAddr Call Site

fffff880`02b0ee70 fffff800`0187ba32 nt!KiSwapContext+0x7a

fffff880`02b0efb0 fffff800`0188cd8f nt!KiCommitThreadWait+0x1d2

fffff880`02b0f040 fffff800`018e1816 nt!KeWaitForSingleObject+0x19f

fffff880`02b0f0e0 fffff880`01618fcd nt!ExWaitForRundownProtectionReleaseCacheAware

----inline function---- tcpip!FlpUninitializePacketProviderInterface

fffff880`02b0f150 fffff880`0173f54e tcpip!FlPnpEvent+0x17d

fffff880`02b0f1c0 fffff880`00f87b2f tcpip!Fl48PnpEvent+0xe

…

So we need to un-initialize a network interface, but before doing that we need to make sure that there are no outstanding references to packets and that no packets are pending. When we say packets, starting in NDIS 6 we basically mean the “NET_BUFFER” and “NET_BUFFER_LIST” structures. So we need to check for any outstanding NET_BUFFER_LISTs (NBLs) that are pending; one reference corresponds to one pending NBL.

To the rescue: the “NDISKD” debugger extension has a very handy command to display all pending NBLs and their owners, “!pendingnbls”. For the command to work, you must first enable “TrackNblOwner” through the registry. By default, this registry value is not enabled on server SKUs as it may cause a performance hit. On client SKUs it is enabled by default.

When you run !pendingnbls on a clean Windows 2008 R2 install you get:

8: kd> !ndiskd.pendingnbls

This command requires NBL tracking to be enabled on the debugee target

machine. (By default, client operating systems have level 1, and servers

have level 0). To enable, set this REG_DWORD value to a nonzero value on

the target machine and reboot the target machine:

HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters ! TrackNblOwner

Possible Values (features are cumulative)

* 0: Disable all tracking.

* 1: Track the most recent owner of each NBL (enables !ndiskd.pendingnbls)

Show me all allocated NBLs so I can manually find the one I want

You can find all allocated NBLs with the command “!ndiskd.nblpool -force -find ((@$extin.Flags)&0x108)==0x100)”, but you still don’t get any owner information.

So I asked the customer to turn on “TrackNblOwner” and reboot, wait for the next occurrence of the issue and get a new memory dump.
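For reference, the registry change that ndiskd asks for can be scripted. The following is a minimal sketch, assuming the standard Windows `reg add` command is available and the script runs elevated on the target machine; the function names are hypothetical, and the key path and value name are the ones printed by !ndiskd.pendingnbls above. A reboot is still required afterwards.

```python
import subprocess

# Key and value name as printed by !ndiskd.pendingnbls.
NDIS_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters"
VALUE_NAME = "TrackNblOwner"


def track_nbl_owner_command(level=1):
    """Build the `reg add` command line that sets TrackNblOwner.

    level 0 disables tracking; level 1 tracks the most recent owner of
    each NBL, which is what !ndiskd.pendingnbls needs.
    """
    if not isinstance(level, int) or level < 0:
        raise ValueError("level must be a non-negative integer")
    return ["reg", "add", NDIS_PARAMETERS_KEY,
            "/v", VALUE_NAME,
            "/t", "REG_DWORD",
            "/d", str(level),
            "/f"]  # /f overwrites an existing value without prompting


def enable_tracking(level=1):
    """Run the command (Windows only, elevated prompt)."""
    subprocess.run(track_nbl_owner_command(level), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(track_nbl_owner_command(1)))
```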

Two days later we received the memory dump file. I verified that they are having the same issue I found in the last dump and that TrackNblOwner is configured correctly:

23: kd> dp NDIS!ndisTrackNblOwner L1

fffff880`00ef1a30 00000000`00000001

Then I immediately checked all pending NBLs to claim the prize, and it was not surprising to see why the NIC card was not un-initializing:

23: kd> !ndiskd.pendingnbls

PHASE 1/3: Found 20 NBL pool(s).

PHASE 2/3: Found 550 freed NBL(s).

Pending Nbl Currently held by

fffffa801dc559f0 fffffa80142d31a0 - My Ethernet 1Gb 4-port Adapter [Miniport]

fffffa801dc81680 fffffa80142d31a0 - My Ethernet 1Gb 4-port Adapter [Miniport]

fffffa80131d2aa0 fffffa80142d31a0 - My Ethernet 1Gb 4-port Adapter [Miniport]

……………………………….

Rest of the repeated output omitted

PHASE 3/3: Found 1854 pending NBL(s) of 3005 total NBL(s).

Search complete.

So we currently have 1854 NBLs pending on the NIC miniport “fffffa80142d31a0”. This is the miniport that is currently holding all of these NBLs:

23: kd> !ndiskd.miniport fffffa80142d31a0

MINIPORT

My Ethernet 1Gb 4-port Adapter

Ndis handle fffffa80142d31a0

Ndis API version v6.20

Adapter context fffffa80138cc000

Miniport driver fffffa800d4f7530 - MyMiniPortDriver v1.0

Network interface fffffa800d25e870

Media type 802.3

Device instance PCI\VEN_1111&DEV_1111&SUBSYS_169D103C&REV_01\4&2263a140&0&0010

Device object fffffa80142d3050 More information

MAC address xx-xx-xx-xx-xx-xx

STATE

Miniport Running

Device PnP QUERY_REMOVED

Datapath Normal

Operational status DORMANT

Operational flags DORMANT_PAUSED

Admin status ADMIN_UP

Media Connected

Power D0

References 9

Total resets 0

Pending OID None

Flags BUS_MASTER, 64BIT_DMA, SG_DMA, DEFAULT_PORT_ACTIVATED,

SUPPORTS_MEDIA_SENSE, DOES_NOT_DO_LOOPBACK,

MEDIA_CONNECTED

PnP flags PM_SUPPORTED, DEVICE_POWER_ENABLED, RECEIVED_START,

HARDWARE_DEVICE

…

What you notice from the above is that the device received a “QUERY_REMOVED” PnP request and is currently in a DORMANT_PAUSED state.

From: http://msdn.microsoft.com/en-us/library/ff566737.aspx:

NET_IF_OPER_STATUS_DORMANT_PAUSED

The operational status is set to NET_IF_OPER_STATUS_DORMANT because the miniport adapter is in the paused or pausing state.

NDIS 6.0 and up allow miniport adapters to be paused and the documentation here shows what the miniport driver should do when it receives a pause request.

Because the adapter was in a paused state, basic network commands like “ping” ceased to work, as described earlier in the symptoms. The next action is to involve the miniport adapter vendor to trace this further and find out why all these pending NBLs were not completed.

Until the next adventure!

Best Regards,

Karim


Commitment Failures, Not Just a Failed Love Story

April 16, 2013, 1:21 pm

I was working on a debug the other day when I ran the “!vm” command and saw that the system had some 48,000 commit requests that failed. This was strange as the system was not out of memory and the page file was not full. I was left scratching my head and thinking “I wish I knew where !vm got that information from.” So I went on a quest to find out, here is what I found.

13: kd> !vm 1

*** Virtual Memory Usage ***

Physical Memory: 12580300 ( 50321200 Kb)

Page File: \??\C:\pagefile.sys

Current: 50331648 Kb Free Space: 50306732 Kb

Minimum: 50331648 Kb Maximum: 50331648 Kb

Available Pages: 4606721 ( 18426884 Kb)

ResAvail Pages: 12189247 ( 48756988 Kb)

Locked IO Pages: 0 ( 0 Kb)

Free System PTEs: 33460257 ( 133841028 Kb)

Modified Pages: 20299 ( 81196 Kb)

Modified PF Pages: 6154 ( 24616 Kb)

NonPagedPool 0 Used: 19544 ( 78176 Kb)

NonPagedPool 1 Used: 22308 ( 89232 Kb)

NonPagedPool Usage: 53108 ( 212432 Kb)

NonPagedPool Max: 9408956 ( 37635824 Kb)

PagedPool 0 Usage: 168921 ( 675684 Kb)

PagedPool 1 Usage: 4149241 ( 16596964 Kb)

PagedPool 2 Usage: 17908 ( 71632 Kb)

PagedPool Usage: 4336070 ( 17344280 Kb)

PagedPool Maximum: 33554432 ( 134217728 Kb)

Session Commit: 3438 ( 13752 Kb)

Shared Commit: 6522 ( 26088 Kb)

Special Pool: 0 ( 0 Kb)

Shared Process: 53597 ( 214388 Kb)

PagedPool Commit: 4336140 ( 17344560 Kb)

Driver Commit: 5691 ( 22764 Kb)

Committed pages: 5565215 ( 22260860 Kb)

Commit limit: 25162749 ( 100650996 Kb)

********** 48440 commit requests have failed **********

It turns out that this number comes from a global ULONG array named “nt!MiChargeCommitmentFailures”. The array has 4 members, which are used to track the types of commit failures that have taken place. This is done by first calculating the new commit size: NewCommitValue = CurrentCommitValue + SystemReservedMemory. Based on this calculation, commit errors are tracked in a few different ways, which are listed below with the corresponding member of the array that is incremented.

MiChargeCommitmentFailures[0] - If the system failed a commit request and an expansion of the pagefile has failed.

MiChargeCommitmentFailures[1] - If the system failed a commit and we have already reached the maximum pagefile size.

MiChargeCommitmentFailures[2] - If the system failed a commit while the pagefile lock is held.

MiChargeCommitmentFailures[3] - If the system failed a commit and the NewCommitValue is less than or equal to CurrentCommitValue.

In order to calculate the count of failures, "!vm" adds up the values stored in the members of the array. Members 0 and 1 are always counted, member 2 is counted if the OS version is Windows 2003/XP, and member 3 is counted if the build is newer than Windows 2003/XP.

Let's look at the array in the dump I was debugging:

13: kd> dc nt!MiChargeCommitmentFailures L4

fffff800`01e45ce0 00000000 0000bd38 00000000 00000000 ....8...........

Converting this to decimal, we find the 48000+ commit failures I was seeing in the output of !vm.

13: kd> ?0000bd38

Evaluate expression: 48440 = 00000000`0000bd38
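The summing rule can be written out as a short sketch. It follows the description above rather than the actual !vm source, so the function name and the boolean flag are assumptions; the array values are the ones shown in the dc output.

```python
def vm_failed_commit_count(failures, is_xp_or_2003):
    """Sum MiChargeCommitmentFailures the way the post describes !vm doing it.

    failures: the four ULONG members of nt!MiChargeCommitmentFailures.
    is_xp_or_2003: True for Windows XP/2003 targets, False for newer builds.
    """
    if len(failures) != 4:
        raise ValueError("expected exactly 4 array members")
    total = failures[0] + failures[1]      # members 0 and 1 always count
    if is_xp_or_2003:
        total += failures[2]               # member 2 only on XP/2003
    else:
        total += failures[3]               # member 3 only on newer builds
    return total


# Values from "dc nt!MiChargeCommitmentFailures L4" in the dump above.
dump_values = [0x00000000, 0x0000BD38, 0x00000000, 0x00000000]

if __name__ == "__main__":
    print(vm_failed_commit_count(dump_values, is_xp_or_2003=False))
```

For this dump only member 1 is nonzero, so every one of the 48440 failures falls into the "already reached the maximum pagefile size" category.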

Since I now had my answer, “where does the number come from?”, I was left wanting to know a bit more about the overall flow of why a VirtualAlloc fails to commit.

When memory is allocated by VirtualAlloc the newly allocated memory is not committed to physical memory. Only when the memory is accessed by a read or write is it backed by physical memory.

When this newly allocated memory is accessed for the first time it will need to be backed by commit space. Under normal conditions this is a smooth process, however when the system hits what’s called the commit limit and can’t expand this limit we see commit failures.

So how is the commit limit calculated? Let’s say we have a system with 4GB of physical memory and a pagefile that is 6GB in size. To determine the commit limit we add physical memory and the pagefile size together; in this example the commit limit would be 10GB. Since the memory manager will not let any user mode allocation consume every last morsel of commit space, it keeps a small amount of the commit space for the system to avoid hangs. When the limit is reached, the system tries to grow the page file. If there is no more room to grow the pagefile or the pagefile has reached its configured maximum size, the system will try to free some committed memory to make room for more requests. If neither the expansion of the page file nor the attempt to free memory allows the allocation to complete, the allocation fails and MiChargeCommitmentFailures is incremented.

To sum it all up, the commit limit is RAM + pagefile, and commit failures happen when we hit the commit limit and the system is unable to grow the pagefile because it is already at its max. It’s that simple, well almost.
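As a quick check of the arithmetic, here is the RAM + pagefile sum for both the 4GB/6GB example and the numbers from the !vm output earlier in this post. The function name is hypothetical. Note that for the dump the simple sum comes out slightly higher than the commit limit !vm reports; this sketch only shows the difference and does not try to account for it.

```python
KB_PER_GB = 1024 * 1024


def commit_limit_kb(physical_kb, pagefile_kb):
    """Approximate commit limit: physical memory plus page file size."""
    return physical_kb + pagefile_kb


# The worked example from the text: 4GB of RAM and a 6GB pagefile.
example_kb = commit_limit_kb(4 * KB_PER_GB, 6 * KB_PER_GB)

# Numbers taken from the "!vm 1" output in this post (all in Kb).
physical_kb = 50321200
pagefile_kb = 50331648
reported_limit_kb = 100650996

estimate_kb = commit_limit_kb(physical_kb, pagefile_kb)
difference_kb = estimate_kb - reported_limit_kb

if __name__ == "__main__":
    print(f"example:  {example_kb // KB_PER_GB} GB")
    print(f"estimate: {estimate_kb} Kb")
    print(f"reported: {reported_limit_kb} Kb (difference {difference_kb} Kb)")
```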

For those who want to know more about how the memory manager works, please see the post from Somak: The Memory Shell Game.

Randy Monteleone


Interpreting Event 153 Errors

April 30, 2013, 3:55 pm

Hello, my name is Bob Golding and I would like to share with you a new event that you may see in the system event log. Event ID 153 is an error associated with the storage subsystem. This event was new in Windows 8 and Windows Server 2012 and was added to Windows 7 and Windows Server 2008 R2 starting with hot fix KB2819485.

An event 153 is similar to an event 129. An event 129 is logged when the storport driver times out a request to the disk; I described event 129 messages in a previous article. The difference is that a 129 is logged when storport times out a request, whereas a 153 is logged when the storport miniport driver times out a request. The miniport driver may also be referred to as an adapter driver or HBA driver; it is typically written by the hardware vendor.

Because the miniport driver has better knowledge of the request execution environment, some miniport drivers time the request themselves instead of letting storport handle request timing. This is because the miniport driver can abort the individual request and return an error rather than storport resetting the drive after a timeout. Resetting the drive is disruptive to the I/O subsystem and may not be necessary if only one request has timed out. The error returned from the miniport driver is bubbled up to the class driver, which can log an event 153 and retry the request.

Below is an example event 153:

Event 153 Example

This error means that a request failed and was retried by the class driver. In the past no message would be logged in this situation because storport did not time out the request. The lack of messages resulted in confusion when troubleshooting disk errors because timeouts would occur but there would be no evidence of the error.

The details section of the event log record shows which error caused the retry and whether the request was a read or write. Below is the details output:

Event 153 Details

In the example above at byte offset 29 is the SCSI status, at offset 30 is the SRB status that caused the retry, and at offset 31 is the SCSI command that is being retried. In this case the SCSI status was 00 (SCSISTAT_GOOD), the SRB status was 09 (SRB_STATUS_TIMEOUT), and the command was 28 (SCSIOP_READ).

The most common SCSI commands are:

SCSIOP_READ - 0x28

SCSIOP_WRITE - 0x2A

The most common SRB statuses are below:

SRB_STATUS_TIMEOUT - 0x09

SRB_STATUS_BUS_RESET - 0x0E

SRB_STATUS_COMMAND_TIMEOUT - 0x0B

A complete list of SCSI operations and statuses can be found in scsi.h in the WDK. A list of SRB statuses can be found in srb.h.
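The byte offsets and codes above are enough for a small decoder. The sketch below is illustrative only: the function name is hypothetical, the lookup tables contain just the values listed in this post, and the sample buffer is synthetic (zero-filled except for the three bytes of interest) rather than a real event record.

```python
# Byte offsets within the event 153 binary data, as described in the post.
SCSI_STATUS_OFFSET = 29
SRB_STATUS_OFFSET = 30
SCSI_COMMAND_OFFSET = 31

# Only the codes mentioned in this post; scsi.h and srb.h have the full lists.
SCSI_STATUS_NAMES = {0x00: "SCSISTAT_GOOD"}
SRB_STATUS_NAMES = {
    0x09: "SRB_STATUS_TIMEOUT",
    0x0B: "SRB_STATUS_COMMAND_TIMEOUT",
    0x0E: "SRB_STATUS_BUS_RESET",
}
SCSI_COMMAND_NAMES = {0x28: "SCSIOP_READ", 0x2A: "SCSIOP_WRITE"}


def _name(table, value):
    """Look up a code, falling back to its hex value when it is not listed."""
    return table.get(value, f"unknown (0x{value:02X})")


def decode_event_153(data):
    """Decode the SCSI status, SRB status and SCSI command bytes."""
    if len(data) <= SCSI_COMMAND_OFFSET:
        raise ValueError("event data is too short")
    return {
        "scsi_status": _name(SCSI_STATUS_NAMES, data[SCSI_STATUS_OFFSET]),
        "srb_status": _name(SRB_STATUS_NAMES, data[SRB_STATUS_OFFSET]),
        "scsi_command": _name(SCSI_COMMAND_NAMES, data[SCSI_COMMAND_OFFSET]),
    }


# Synthetic sample matching the values discussed above: 00, 09, 28.
sample = bytearray(32)
sample[SCSI_STATUS_OFFSET] = 0x00
sample[SRB_STATUS_OFFSET] = 0x09
sample[SCSI_COMMAND_OFFSET] = 0x28

if __name__ == "__main__":
    for key, value in decode_event_153(bytes(sample)).items():
        print(f"{key}: {value}")
```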

The timeout errors (SRB_STATUS_TIMEOUT and SRB_STATUS_COMMAND_TIMEOUT) indicate a request timed out in the adapter. In other words a request was sent to the drive and there was no response within the timeout period. The bus reset error (SRB_STATUS_BUS_RESET) indicates that the device was reset and that the request is being retried due to the reset since all outstanding requests are aborted when a drive receives a reset.

A system administrator who encounters event 153 errors should investigate the health of the computer’s disk subsystem. Although an occasional timeout may be part of the normal operation of a system, the frequent need to retry requests indicates a performance issue with the storage that should be corrected.


Our Bangalore Team is Hiring - Windows Server Escalation Engineer

May 9, 2013, 3:32 pm

Would you like to join the world’s best and most elite debuggers to enable the success of Microsoft solutions?

As a trusted advisor to our top customers you will be working with the most experienced IT professionals and developers in the industry. You will influence our product teams in sustained engineering efforts to drive improvements in our products.

This role involves deep analysis of product source code and debugging to solve problems in multi-million dollar configurations and will give you an opportunity to stretch your critical thinking skills. During the course of debugging, you will uncover opportunities to improve the customer experience while influencing the current and future design of our products.

In addition to providing support to customers while being the primary interface to our sustained engineering teams, you will also have the opportunity to work with new technologies and unreleased software. Through our continuous investment in in-depth training and hands-on experience with tough customer challenges, you will become the world’s best in this area. Expect to partner with many different roles at Microsoft while launching a very successful career!

This position is located at the Microsoft Global Technical Support Center in Bangalore, India.

Learn more about what an Escalation Engineer does at:

Profile: Ron Stock, CTS Escalation Engineer - Microsoft Customer Service & Support - What is CSS?

Microsoft JobsBlog JobCast with Escalation Engineer Jeff Dailey

Microsoft JobsBlog JobCast with Escalation Engineer Scott Oseychik

Apply here:

https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=1&jid=109989&jlang=en&pp=ss


Remoting Your Debug Crash Cart With KDNET

May 9, 2013, 5:58 pm

This is Christian Sträßner from the Global Escalation Services team based in Munich, Germany.

Back in January, my colleague Ron Stock posted an interesting article about Kernel Debugging using a serial cable: How to Setup a Debug Crash Cart to Prevent Your Server from Flat Lining

Today we look at a new kernel debugging transport introduced in Windows 8 and Windows Server 2012 that makes the cabling much easier: a network cable can now be used as a debug cable. The new KDNET transport utilizes a PCI Ethernet network card in the Target. Most major NIC vendors have compatible NICs. You can find a list of supported NICs here:

http://msdn.microsoft.com/en-us/library/windows/hardware/hh830880.aspx

Note that this will not work with Wireless or USB attached NICs in the Target.

In the example below, we utilized an Acer AC 100 Server as the Target. It ships with an onboard Intel 82579LM Gigabit NIC:

Network Adapters

The great thing about KDNET is that the NIC can still be used for normal network activity. The “Microsoft Kernel Debug Network Adapter” driver is the magic behind this. When KDNET.DLL is active, the NIC’s driver will be “banged out” and KDNET takes control of the NIC.

BCD Configuration

To configure KDNET, you first need to determine the IPv4 Address of the machine with the debugger. In our example, ipconfig.exe tells us that it is 192.168.1.35:

ipconfig

Next go to your Target machine.

The kernel debug settings used to configure KDNET are stored globally in the BCD Store in the {dbgsettings} area. The kernel debug settings apply to all boot entries.

Use bcdedit.exe /dbgsettings net hostip:<addr> port:<port> to set the transport to KDNET, the IP Address of the debugger and the port. You can connect multiple targets to the same debug host by using a different port for each target.

BCD will generate a cryptographic key for you automatically the first time. You can generate a new cryptographic key by appending the ‘newkey’ keyword. Copy the ‘Key’ to a secure location - you will need it in the debugger.

You can display the debug settings using: bcdedit /dbgsettings

Next, for safety, copy the {current} entry to a new entry (bcdedit /copy {current} /d <description>).

Then enable kernel debugging on the copy (bcdedit.exe /debug {new-guid} on).

If required, also use this (new) entry to enable the checked kernel (bcdedit /set {new-guid} hal <path> and bcdedit.exe /set {new-guid} kernel <path>).

bcdedit

Debugger

On your Debugger Machine open WinDbg->File->Kernel Debugging (Ctrl-K) and choose the NET tab:

Copy and paste the ‘Key’ here and set the port to the value specified on the Target (the default is 50000):

Kernel Debugging

Next a dialog from Windows Firewall might pop up (depending on your configuration). You want to allow access at this point.

Windows Firewall

You need to make sure that your debug host machine allows inbound UDP traffic on the configured port (50000 in this example, and by default) for the network type in use.

If your company has implemented IPSec Policies, make sure you have exceptions in place that allow unsecured communication on the port used (KDNET does not talk IPSec).

The Debugger Window will now look like this:

windbg waiting to reconnect

The Debugger is now set up and ready to go.

Reboot the target system now.

When the target comes back online, it will try to connect to the IP Address and Port that was configured with the bcdedit.exe command. The Debugger Command Window will look something like the screenshot below.

windbg connected

You now can break in as usual. This is a good time to fix your symbol setup if you have not done it yet.

Operation

You still can communicate normally over the NIC and IP that you use on the target. You do not need an additional NIC in the target to use KDNET. When debugging production servers with heavy traffic, we recommend using a dedicated NIC for debugging (note, 10GigE NICs are currently not supported).

If you don’t want the NIC to be used by the OS as well, it can be disabled via: bcdedit.exe -set loadoptions NO_KDNIC

Normal Network IO

Although you can use KDNET to debug power state transitions (in particular Connected Standby), it is best avoided. The KDNET protocol polls on a regular basis and as such, many systems will not drop to a lower power state. Instead, use USB, 1394, or serial.

Disconnecting the NIC from media (unplugging the NIC in the target machine) is not supported and will most likely blue screen the target machine.

Note 1:

If you have more than one NIC in your target, please read the following (copied from the debugger help):

If there is more than one network adapter in the target computer, use Device Manager to determine the PCI bus, device, and function numbers for the adapter you want to use for debugging. Then in an elevated Command Prompt window, enter the following command, where b, d, and f are the bus number, device number, and function number of the adapter:

bcdedit /set {dbgsettings} busparams b.d.f

Note 2:

If you use the Windows NIC Teaming (LBFO) in Server 2012: KDNET is not compatible with NIC Teaming as indicated by the Whitepaper:

http://download.microsoft.com/download/F/6/5/F65196AA-2AB8-49A6-A427-373647880534/[Windows%20Server%202012%20NIC%20Teaming%20(LBFO)%20Deployment%20and%20Management].docx

How does it look on the network?

This is a packet sent from the target to the debug host machine.

Network Packet

The TTL of the packets sent from the target to the debug host is currently set to 16 (this is not configurable).

This screenshot shows that your connection can only run over 16 IP hops max. This is a theoretical limitation, but it highlights some important facts. Your host is not talking to the Windows IP stack on the target, instead it talks to a basic IPv4/UDP implementation in KDNET. The transport is UDP/IPv4 based, so there is not much tolerance for poor network conditions aside from retry operations at the Debugger Transport Protocol Level.

A few words on performance.

The performance is generally limited by the latency of the link between the host and target. Therefore, even with LAN-like latency (<=1ms), you will not get close to the wire speed of a 1GigE connection. Expect to see speeds between 1.5 and 2.5 MB/s.

Keep this in mind when you plan to pull large portions of memory from the target over KDNET (like the .dump command). This screenshot was taken while executing the .dump /f command (Full Kernel Dump):

Network Activity
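To put these numbers in perspective, here is a back-of-the-envelope sketch in C. The function name and the sample size in the note below are illustrative assumptions, not measurements; only the 1.5 to 2.5 MB/s range comes from the text above.

```c
#include <assert.h>

/* Rough estimate of how long a transfer over KDNET takes.
 * size_mb:       amount of memory to pull from the target, in megabytes
 * rate_mb_per_s: sustained throughput, in megabytes per second
 * Returns the estimated duration in seconds. */
double kdnet_transfer_seconds(double size_mb, double rate_mb_per_s)
{
    assert(rate_mb_per_s > 0.0);   /* a zero rate would never finish */
    return size_mb / rate_mb_per_s;
}
```

For a hypothetical target with 4096 MB of memory and a sustained 2 MB/s, kdnet_transfer_seconds(4096.0, 2.0) gives 2048 seconds, or roughly 34 minutes.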

Even with the performance restrictions mentioned, KDNET is a valuable extension of the existing debugging methods. It allows you to debug a Windows machine without the need for special hardware (1394) or legacy ports (serial) that not every machine has today (especially tablets and notebooks). It also saves you from using USB2 debugging - which requires special cables and a good amount of hope that the machine’s vendor has attached the single debug capable USB port to an external port on the chassis.

Also, there is no need for you to physically enter the Datacenter where the target is located. You can do all these steps from your convenient office chair. :)

To see network kernel debugging in action, watch Episode #27 of Defrag Tools on Channel 9.

Thanks to Andrew Richards and Joe Ballantyne for their help in writing this article.

↧

Another Who Done It

May 31, 2013, 1:51 pm

Hi my name is Bob Golding, I am an EE in GES. I want to share an interesting problem I recently worked on. The initial symptom was the system bugchecked with a Stop 0xA which means there was an invalid memory reference. The cause of the crash was a driver making I/O requests while Asynchronous Procedure Calls (APCs) were disabled. The bugcheck caused by an invalid memory reference was the result of the problem and not the cause.

An APC is queued to a thread during I/O completion. This is to guarantee the last phase of the I/O completion occurs in the same context as the process that issued the request.

The stack of the trap is presented below. The call stack shows that APCs are being enabled allowing queued APCs to run.

Child-SP RetAddr Call Site

fffff880`07bf3598 fffff800`030b85a9 nt!KeBugCheckEx

fffff880`07bf35a0 fffff800`030b7220 nt!KiBugCheckDispatch+0x69

fffff880`07bf36e0 fffff800`030d8b56 nt!KiPageFault+0x260

fffff880`07bf3870 fffff800`030959ff nt!IopCompleteRequest+0xc73

fffff880`07bf3940 fffff800`0306c0d9 nt!KiDeliverApc+0x1d7

fffff880`07bf39c0 fffff800`033f8a1a nt!KiCheckForKernelApcDelivery+0x25

fffff880`07bf39f0 fffff800`033cce2f nt!MiMapViewOfSection+0x2bafa

fffff880`07bf3ae0 fffff800`030b8293 nt!NtMapViewOfSection+0x2be

fffff880`07bf3bb0 00000000`772df93a nt!KiSystemServiceCopyEnd+0x13

00000000`0015dea8 00000000`00000000 0x772df93a

The trap occurred because drivers commonly issue requests to lower drivers with code similar to the following:

…

KEVENT event;

status = IoCallDriver( DeviceObject, irp );

// Wait for the event to be signaled if STATUS_PENDING is returned.

if (status == STATUS_PENDING) {

(VOID)KeWaitForSingleObject( &event, // event is a local which is declared on the stack

Executive,

KernelMode,

FALSE,

NULL );

}

…

As the code above shows, if IoCallDriver does not return STATUS_PENDING, the code skips the wait and continues to the exit. Part of the last phase of I/O processing, which takes place in the APC, is signaling the event. When IoCallDriver returns success, it is critical that the APC execute immediately, before the stack unwinds, because the event is on the stack. Since APCs were disabled, execution of the APC was delayed, and in the meantime the event became invalid. The APCs were delayed because the memory manager was in a critical area and APCs could not run.
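The ordering problem can be illustrated with a small user-mode model in C. This is only a sketch of the sequence described above: ModelThread and both functions are hypothetical names, no real kernel structures are involved, and a flag stands in for the lifetime of the caller's stack frame.

```c
#include <stdbool.h>

/* User-mode model only: none of these types are real kernel structures. */
typedef struct {
    bool frame_alive;   /* true while the caller's stack frame exists      */
    bool apc_pending;   /* completion APC queued but not yet delivered     */
    bool touched_dead;  /* set if the APC ran after the frame was gone     */
} ModelThread;

/* Last phase of I/O completion: signals the caller's on-stack event. */
static void deliver_completion_apc(ModelThread *t)
{
    if (!t->apc_pending)
        return;
    t->apc_pending = false;
    if (!t->frame_alive)
        t->touched_dead = true;   /* the event's memory is no longer valid */
}

/* Models a request that completes without returning STATUS_PENDING,
 * so the caller skips the wait. Returns true if the completion APC
 * ended up writing to a stack frame that had already unwound. */
bool completion_hits_dead_frame(bool apcs_disabled)
{
    ModelThread t = { .frame_alive = true, .apc_pending = true,
                      .touched_dead = false };

    if (!apcs_disabled)
        deliver_completion_apc(&t);  /* APC runs before the caller returns */

    t.frame_alive = false;           /* caller returns, stack unwinds */

    deliver_completion_apc(&t);      /* APCs re-enabled: deferred APC runs */
    return t.touched_dead;
}
```

In this model completion_hits_dead_frame(false) returns false because the APC runs while the event is still valid, and completion_hits_dead_frame(true) returns true, which corresponds to the invalid memory reference seen in the bugcheck.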

I needed to determine which driver did this so I enabled IRP logging in Driver Verifier to trace I/O requests. With this enabled the next dump should contain a transaction log that will help identify what driver is performing I/O while APCs are disabled. The command line to enable this is:

verifier /flags 0x410 /all

The new dump with verifier enabled also crashed after delivering an APC to the thread and completing the IRP. From the debug output below I can find the IRPs that were issued and the thread that issued them; this is what I need in order to find them in the log.

1: kd> !thread

THREAD fffffa80064c9b50 Cid 0200.0204 Teb: 000007fffffde000 Win32Thread: 0000000000000000 RUNNING on processor 1

IRP List:

fffff9800a33ec60: (0006,03a0) Flags: 40060070 Mdl: 00000000

fffff9800a250c60: (0006,03a0) Flags: 40060070 Mdl: 00000000

fffff9800a3f4ee0: (0006,0118) Flags: 40060070 Mdl: 00000000

Not impersonating

DeviceMap fffff8a000007890

Owning Process fffffa80064bbb30 Image: csrss.exe

Attached Process N/A Image: N/A

Wait Start TickCount 1656 Ticks: 0

Context Switch Count 25 IdealProcessor: 0

UserTime 00:00:00.000

KernelTime 00:00:00.000

Win32 Start Address 0x000000004a061540

Stack Init fffff88003b21c70 Current fffff88003b20890

Base fffff88003b22000 Limit fffff88003b1c000 Call 0

Priority 14 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

Child-SP RetAddr Call Site

fffff880`03b21428 fffff800`0307a54c nt!KeBugCheckEx

fffff880`03b21430 fffff800`030d02ee nt!MmAccessFault+0xffffffff`fff9c15c

fffff880`03b21590 fffff800`030c8db9 nt!KiPageFault+0x16e

fffff880`03b21728 fffff800`030e6ab3 nt!memcpy+0x229

fffff880`03b21730 fffff800`030c4bd7 nt!IopCompleteRequest+0x5a3

fffff880`03b21800 fffff800`0307ba85 nt!KiDeliverApc+0x1c7

fffff880`03b21880 fffff800`0331d96a nt!KiCheckForKernelApcDelivery+0x25

fffff880`03b218b0 fffff800`033e742e nt!MiMapViewOfSection+0xffffffff`fff36baa

fffff880`03b219a0 fffff800`030d1453 nt!NtMapViewOfSection+0x2bd

fffff880`03b21a70 00000000`7761159a nt!KiSystemServiceCopyEnd+0x13

00000000`0025f078 00000000`00000000 0x7761159a

The command “!verifier 100” will dump the transaction log. Below is the relevant portion of the log containing the IRPs for our thread.

IRP fffff9800a3f4ee0, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -4, SpecialApcDisable = -1

fffff80003573a68 nt!IovAllocateIrp+0x28

fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

fffff880010f8bcc fltmgr!FltGetVolumeGuidName+0x18c

fffff88004e4fbe1 baddriver+0x12be1

fffff88004e73523 baddriver +0x36523

fffff88004e7300c baddriver +0x3600c

fffff88004e72cce baddriver +0x35cce

fffff88004e5f715 baddriver +0x22715

fffff88004e4c6c7 baddriver +0xf6c7

fffff88004e48342 baddriver +0xb342

fffff88004e5e44e baddriver +0x2144e

fffff88004e5e638 baddriver +0x21638

IRP fffff9800a250c60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

fffff80003573a68 nt!IovAllocateIrp+0x28

fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

fffff88001010a6b mountmgr!QueryDeviceInformation+0x207

fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

fffff80003574c16 nt!IovCallDriver+0x566

fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

fffff88004e4fbe1 baddriver +0x12be1

fffff88004e73523 baddriver +0x36523

fffff88004e7300c baddriver +0x3600c

IRP fffff9800a33ec60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

fffff80003573a68 nt!IovAllocateIrp+0x28

fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

fffff88001010afd mountmgr!QueryDeviceInformation+0x299

fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

fffff80003574c16 nt!IovCallDriver+0x566

fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

fffff88004e4fbe1 baddriver +0x12be1

fffff88004e73523 baddriver +0x36523

fffff88004e7300c baddriver +0x3600c

From the IRP log in verifier I can see that baddriver.sys is calling FltGetVolumeGuidName while APCs are disabled. Further investigation found that baddriver.sys had registered a function for image load notification, and the memory manager has APCs disabled when it calls the image notification routine. The image notification routine in baddriver.sys called FltGetVolumeGuidName, which issued the I/O. The log output shows both KernelApcDisable and SpecialApcDisable; the issue is SpecialApcDisable being -1. The I/O completion APCs are considered special APCs, so kernel APC disable would not affect them.

The solution was for the driver to check for APCs disabled before issuing the FltGetVolumeGuidName and not make this call if APCs are disabled.
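The check can be sketched as a user-mode model in C. Everything here is hypothetical (the ModelThreadState type, the MODEL_ constants, and both functions); it only mirrors the two counters shown in the verifier log. In a real driver, the WDK routine KeAreAllApcsDisabled is the kind of check one would reach for, but this post does not say which routine baddriver.sys ended up using.

```c
#include <stdbool.h>

#define MODEL_PASSIVE_LEVEL 0
#define MODEL_APC_LEVEL     1

/* Hypothetical snapshot of the per-thread state shown in the verifier log. */
typedef struct {
    int irql;
    int kernel_apc_disable;   /* nonzero: normal kernel APCs are disabled  */
    int special_apc_disable;  /* nonzero: special kernel APCs are disabled */
} ModelThreadState;

/* True when even special kernel APCs (such as I/O completion) cannot run. */
bool model_all_apcs_disabled(const ModelThreadState *t)
{
    return t->special_apc_disable != 0 || t->irql >= MODEL_APC_LEVEL;
}

/* The guard: only issue I/O that relies on a completion APC when
 * that APC can actually be delivered. */
bool model_safe_to_issue_io(const ModelThreadState *t)
{
    return !model_all_apcs_disabled(t);
}
```

With the values from the log (IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1) the guard reports that the call is not safe. With only KernelApcDisable nonzero it reports safe, which matches the point above that kernel APC disable does not affect I/O completion APCs.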

↧

We Are Hiring in the US and India – Windows Escalation Engineers

June 3, 2013, 12:27 pm

Would you like to join the world’s best and most elite debuggers to enable the success of Microsoft solutions?

We have positions open at our sites in Bangalore, India; Charlotte, NC USA; and Issaquah, WA USA. The openings in Bangalore and Charlotte will primarily work with our enterprise IT customers; the opening in Issaquah will work with OEM vendors.

Learn more about what an Escalation Engineer does at:

Profile: Ron Stock, CTS Escalation Engineer - Microsoft Customer Service & Support - What is CSS?

Microsoft JobsBlog JobCast with Escalation Engineer Jeff Dailey

Microsoft JobsBlog JobCast with Escalation Engineer Scott Oseychik

Apply here:

Bangalore: https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=1&jid=109989&jlang=en&pp=ss

Charlotte: http://www.microsoft-careers.com/job/Charlotte-Escalation-Engineer-Job-NC-28201/2630084/

Issaquah: http://www.microsoft-careers.com/job/Issaquah-Escalation-Engineer-Job-WA-98006/2630069/

↧

How To Deadlock Yourself (Don’t Do This)

July 19, 2012, 10:52 am

Some APIs should come with a warning in big red letters saying “DANGER!”, or perhaps more subtly “PROCEED WITH CAUTION”. One such API is ExSetResourceOwnerPointer. Although the documentation contains an explanation of what limited activity you can do with the resource after making this call, its warning is not very strongly worded.

You may see evidence of a call to ExSetResourceOwnerPointer in a debug. A lock in !locks will have an unusual owner field, such as the one shown below:

2: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks...

Resource @ 0xfffffa8011efede8 Exclusively owned
Contention Count = 20
NumberOfSharedWaiters = 16
Threads: fffff88007fab7f3-02<*> *** Unknown owner, possibly FileSystem
fffffa80169538a0-01 fffffa801ea69b60-01 fffffa8017dfd430-01 fffffa800cd76b60-01
fffffa801512a410-01 fffffa801279b340-01 fffffa8016d079a0-01 fffffa8015452aa0-01
fffffa801607bb60-01 fffffa8012f79b60-01 fffffa8013b4e040-01 fffffa801b03e300-01
fffffa800cd77040-01 fffffa8013a8e040-01 fffffa800cd76040-01 fffffa80172d7490-01

The error “*** Unknown owner, possibly FileSystem” is an indicator that the owner field of this eresource has likely been modified by ExSetResourceOwnerPointer. Fortunately for us debuggers, programmers often point the owner field to a location on the thread stack. You can pass an address to the !thread command and it will interpret the address as a stack value.

2: kd> !thread fffff88007fab7f3 e
fffff88007fab7f3 is not a thread object, interpreting as stack value...
THREAD fffffa80169538a0 Cid 0004.0638 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable
fffffa80139ea3e0 Semaphore Limit 0x7fffffff
IRP List:
fffffa8016fd1010: (0006,0310) Flags: 00000884 Mdl: 00000000
Not impersonating
DeviceMap fffff8a000008aa0
Owning Process fffffa800cd6a5f0 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 27606952 Ticks: 141 (0:00:00:02.199)
Context Switch Count 90787
UserTime 00:00:00.000
KernelTime 00:00:02.496
Win32 Start Address nt!ExpWorkerThread (0xfffff80002293a50)
Stack Init fffff88007fabdb0 Current fffff88007fab3a0
Base fffff88007fac000 Limit fffff88007fa6000 Call 0
Priority 14 BasePriority 13 UnusualBoost 1 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
fffff880`07fab3e0 fffff800`0228da52 nt!KiSwapContext+0x7a
fffff880`07fab520 fffff800`0228fbaf nt!KiCommitThreadWait+0x1d2
fffff880`07fab5b0 fffff800`0224ec9e nt!KeWaitForSingleObject+0x19f
fffff880`07fab650 fffff800`022ad98c nt!ExpWaitForResource+0xae
fffff880`07fab6c0 fffff880`0140fc10 nt!ExAcquireSharedStarveExclusive+0x1bc
fffff880`07fab720 fffff880`0140f8e2 sis!SipDereferenceCSFile+0x40
fffff880`07fab750 fffff880`0140f608 sis!SipDereferencePerLink+0x62
fffff880`07fab780 fffff880`014102e7 sis!SipDereferenceScb+0x184
fffff880`07fab7c0 fffff800`025796e6 sis!SiFilterContextFreedCallback+0xaf
fffff880`07fab7f0 fffff880`016b9bcc nt!FsRtlTeardownPerStreamContexts+0xe2
fffff880`07fab840 fffff880`016b98d5 Ntfs!NtfsDeleteScb+0x108
fffff880`07fab880 fffff880`0162ccb4 Ntfs!NtfsRemoveScb+0x61
fffff880`07fab8c0 fffff880`016b72dc Ntfs!NtfsPrepareFcbForRemoval+0x50
fffff880`07fab8f0 fffff880`01635882 Ntfs!NtfsTeardownStructures+0xdc
fffff880`07fab970 fffff880`016ce813 Ntfs!NtfsDecrementCloseCounts+0xa2
fffff880`07fab9b0 fffff880`016a838f Ntfs!NtfsCommonClose+0x353
fffff880`07faba80 fffff880`016cd7ef Ntfs!NtfsFspClose+0x15f
fffff880`07fabb50 fffff880`01635c0d Ntfs!NtfsCommonCreate+0x193f
fffff880`07fabd30 fffff800`0227e787 Ntfs!NtfsCommonCreateCallout+0x1d
fffff880`07fabd60 fffff800`0227e741 nt!KySwitchKernelStackCallout+0x27 (TrapFrame @ fffff880`07fabc20)
fffff880`085fffe0 fffff800`0229620a nt!KiSwitchKernelStackContinue
fffff880`08600000 fffff880`01635b2f nt!KeExpandKernelStackAndCalloutEx+0x29a
fffff880`086000e0 fffff880`016d29c0 Ntfs!NtfsCommonCreateOnNewStack+0x4f
fffff880`08600140 fffff880`013330b6 Ntfs!NtfsFsdCreate+0x1b0
fffff880`086002f0 fffff800`0258d717 fltmgr!FltpCreate+0xa6
fffff880`086003a0 fffff800`0258379f nt!IopParseDevice+0x5a7
fffff880`08600530 fffff800`02588b16 nt!ObpLookupObjectName+0x32f
fffff880`08600630 fffff800`0258f827 nt!ObOpenObjectByName+0x306
fffff880`08600700 fffff800`02599438 nt!IopCreateFile+0x2b7
fffff880`086007a0 fffff880`01405bcf nt!NtCreateFile+0x78
fffff880`08600830 fffff880`01405fbf sis!SipOpenBackpointerStream+0x10b
fffff880`086008f0 fffff880`0140657d sis!SipOpenCSFileWork+0x3bf
fffff880`08600c70 fffff800`02293b61 sis!SipOpenCSFile+0x21
fffff880`08600cb0 fffff800`0252ea26 nt!ExpWorkerThread+0x111
fffff880`08600d40 fffff800`02264866 nt!PspSystemThreadStartup+0x5a
fffff880`08600d80 00000000`00000000 nt!KxStartSystemThread+0x16

Looking at the call stack for the above thread we can see that sis.sys is trying to acquire the eresource shared. Ordinarily, if a thread already owns an eresource exclusive, it can obtain it shared without first releasing the exclusive ownership. In this scenario the kernel will compare the eresource’s owner field to the current thread and if they match the thread will be allowed to take shared ownership of the eresource. This is where the danger of ExSetResourceOwnerPointer comes into play. If you change the owner field with ExSetResourceOwnerPointer then this check fails because the owner field doesn’t match the current thread.

The result of this scenario is that the thread waits for the exclusive owner to release the lock so this thread can get shared access. Unfortunately this thread is the exclusive owner, and it is the shared waiter. The thread has deadlocked on itself.
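The self-deadlock can be modeled in a few lines of user-mode C. This is a simplified sketch, not the kernel's actual ERESOURCE algorithm: the ModelResource type and the model_ functions are hypothetical, and only the owner comparison described above is represented. Setting the low two bits of the owner pointer is an assumption about the encoding, chosen to match the ...7f3 owner value in the !locks output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for an eresource. */
typedef struct {
    bool     held_exclusive;
    uint64_t owner;          /* owning thread address, or an owner pointer */
} ModelResource;

typedef enum { ACQUIRED, MUST_WAIT } AcquireResult;

void model_acquire_exclusive(ModelResource *r, uint64_t thread)
{
    r->held_exclusive = true;
    r->owner = thread;
}

/* Stand-in for ExSetResourceOwnerPointer: the owner field no longer
 * identifies the thread. The low two bits are set in this model
 * (an assumption, see the lead-in). */
void model_set_owner_pointer(ModelResource *r, uint64_t owner_pointer)
{
    r->owner = owner_pointer | 3;
}

/* Shared acquire: succeeds at once if nobody holds the resource
 * exclusively, or if the exclusive owner is the calling thread. */
AcquireResult model_acquire_shared(const ModelResource *r, uint64_t thread)
{
    if (!r->held_exclusive)
        return ACQUIRED;
    if (r->owner == thread)
        return ACQUIRED;     /* recursive acquire by the exclusive owner */
    return MUST_WAIT;        /* wait for the exclusive owner to release  */
}
```

In this model, a thread at fffffa80169538a0 that takes the resource exclusively can still take it shared. After model_set_owner_pointer replaces the owner with fffff88007fab7f3, the same call from the same thread returns MUST_WAIT. The only thread that could release the resource is the one now waiting, so it never wakes up.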

Even if you are careful in your handling of the resource after calling ExSetResourceOwnerPointer, there is often a risk that your driver may be re-entered in the same thread and you may end up in a scenario you didn’t initially anticipate. This is why using this API is dangerous, and should be avoided when not absolutely necessary.

The issue demonstrated in this article was addressed in KB2608658 (issue 3), which is available for download from the Microsoft Download Center.

↧

Troubleshooting Pool Leaks Part 1 – Perfmon

July 31, 2012, 2:52 pm

Over the years the NTDebugging Blog has published several articles about pool memory and pool leaks. However, we haven’t taken a comprehensive approach to understanding and troubleshooting pool memory usage. This upcoming series of articles is going to tackle pool leaks from the basics to advanced troubleshooting techniques. Most of the examples will use the Windows Sysinternals tool NotMyFault to generate a leak so our readers will be able to reproduce the described behavior and repeat the troubleshooting steps.

We need to start by understanding what pool is and how it is used. Pool is virtual memory that is used by drivers in much the same way user mode applications use heap. A driver developer calls ExAllocatePoolWithTag to get a block of memory that can be used in much the same way a user mode programmer would use memory returned by HeapAlloc or malloc. The memory manager, which is responsible for managing pool, is able to efficiently handle small allocations by taking a page of virtual memory (typically 4KB) and breaking it up into smaller blocks. The memory manager is also able to allocate pool in blocks larger than a page. There are two types of pool a developer can request from ExAllocatePoolWithTag: paged pool and nonpaged pool. As the names suggest, one type of pool memory can be paged out and the other cannot. Paged pool is used for most allocations; nonpaged pool is used for memory that will be written or read at an IRQL of DISPATCH_LEVEL or above.

Pool leaks happen when a driver calls ExAllocatePoolWithTag but never calls the corresponding ExFreePool or ExFreePoolWithTag routine. A leak is different than just high memory utilization, which may happen in normal conditions as load increases. For example, the srv.sys driver creates work items for incoming requests, and when there is a large amount of SMB traffic to a server the pool usage from srv.sys may increase to handle this traffic. Typically the differentiation between a leak and high memory usage due to load is that a leak never decreases. Memory usage that is load related should decrease when the load is reduced. Monitoring is required to differentiate between these two scenarios. Performance Monitor (aka perfmon) is typically the most effective tool to begin such an investigation.

The symptom of a pool leak is often poor system performance when the system runs out of pool, or on 64-bit systems the pool may begin to consume a large amount of the available memory. This symptom makes perfmon an ideal tool to begin troubleshooting as it can be used to identify a wide variety of potential causes of poor performance. Perfmon is most useful when it is started before a system enters a state of poor performance so that trend data can be analyzed leading up to the problem.

You can use the below commands from an elevated command prompt to collect perfmon data from such a scenario.

First create the data collector. This command collects data from a variety of counters at a 5 minute interval and is designed to be run for several hours prior to and during the time a system experiences poor performance (shorter intervals can be used for leaks that happen faster than several hours). We often recommend collecting these counters to perform general performance troubleshooting because we usually don’t know that there is a memory leak until after this data is collected and analyzed.

Logman.exe create counter PerfLog-Long -o "c:\perflogs\\%computername%_PerfLog-Long.blg" -f bincirc -v mmddhhmm -max 300 -c "\LogicalDisk(*)\*" "\Memory\*" "\Cache\*" "\Network Interface(*)\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Processor(*)\*" "\Processor Information(*)\*" "\Process(*)\*" "\Redirector\*" "\Server\*" "\System\*" "\Server Work Queues(*)\*" "\Terminal Services\*" -si 00:05:00

Then start collecting data:

Logman.exe start PerfLog-Long

When the performance problem is being experienced, stop collecting data:

Logman.exe stop PerfLog-Long

After you have collected the data, open the .blg file in the Performance Monitor MMC snap-in. Browse to the Memory object, and add the counters “Pool Nonpaged Bytes” and “Pool Paged Bytes”. This should leave you with a view similar to the below screenshot.

The steadily increasing line in the above screenshot, without a substantial decrease in the line, is an indicator that nonpaged pool memory is being leaked. If we look at the maximum count we see that nonpaged pool has consumed 540MB. The significance of this value depends on the amount of RAM in the system. In this instance the system has 1GB of RAM so nonpaged pool is consuming 54% of the available memory. We can now conclude that the cause of the performance problem is a nonpaged pool memory leak, which is consuming a large amount of RAM and preventing other components from using this RAM.
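The rule of thumb used here (usage that keeps climbing without a substantial decrease) can be written down as a small C helper. This is a sketch under stated assumptions: the function name and the tolerance parameter are hypothetical, and the samples are expected to be values you exported yourself from the “Pool Nonpaged Bytes” or “Pool Paged Bytes” counter, in any consistent unit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the samples never fall by more than `tolerance`
 * from one sample to the next and the last sample is higher than the
 * first. A substantial drop suggests load-related usage instead. */
bool looks_like_leak(const double *samples, size_t count, double tolerance)
{
    if (count < 2)
        return false;                 /* not enough data for a trend */

    for (size_t i = 1; i < count; i++) {
        if (samples[i] < samples[i - 1] - tolerance)
            return false;             /* usage was released again */
    }
    return samples[count - 1] > samples[0];
}
```

A helper like this does not replace looking at the graph; it only encodes the same judgment, and the right tolerance depends on how noisy the counter is on the system being examined.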

Next we need to start investigating which driver has allocated the most pool. We will begin that in part 2.

↧

Troubleshooting Pool Leaks Part 2 – Poolmon

August 30, 2012, 12:52 pm

In our previous article we discussed how to identify a pool leak using perfmon. Although it may be interesting to know that you have a pool leak, most customers are interested in identifying the cause of the leak so that it can be corrected. In this article we will begin the process of identifying what kernel mode driver is leaking pool, and possibly identify why.

Often when we are collecting data for a poor performance scenario there are two pieces of data that we collect. Perfmon log data is one, as we discussed in our previous article. The other piece of data is poolmon logs. The memory manager tracks pool usage according to the tag associated with the pool allocations, using a technique called pool tagging. Poolmon gathers this data and displays it in an easy to use format. Poolmon can also be configured to dump data to a log, and in some scenarios it is beneficial to schedule poolmon to periodically collect such logs. There are several available techniques to schedule poolmon, however that is beyond the scope of this article.

Poolmon has shipped with many different packages over the years; it is currently available with the Windows Driver Kit. If you install the WDK to the default folder, poolmon will be in “C:\Program Files (x86)\Windows Kits\8.0\Tools\x64”. Poolmon does not have dependencies on other modules in this folder; you can copy it to your other computers when you need to investigate pool usage.

How does pool tagging work? When a driver allocates pool it calls the ExAllocatePoolWithTag API. This API accepts a tag - a four-letter string - that will be used to label the allocation. It is up to a driver developer to choose this tag. Ideally each developer will choose a tag that is unique to their driver and use a different tag for each code path which calls ExAllocatePoolWithTag. Because each tag should be unique to each driver, if we can identify the tag whose usage corresponds with the leak we can then begin to identify the driver which is leaking the memory. The tag may also give the driver developer clues as to why the memory is being leaked, if they use a unique tag for each code path.
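The tag is passed to ExAllocatePoolWithTag as a 32-bit value rather than a string. The sketch below, with a hypothetical helper name, shows how four characters map onto that value so that the bytes appear in reading order in memory on a little-endian machine. This is also why driver source commonly writes the character literal reversed (for example 'kaeL' for the tag “Leak”).

```c
#include <stdint.h>

/* Packs a four-character tag into a 32-bit value with the first
 * character in the lowest byte, so that a little-endian machine stores
 * the bytes in reading order. */
uint32_t pack_pool_tag(const char tag[4])
{
    return  (uint32_t)(unsigned char)tag[0]
         | ((uint32_t)(unsigned char)tag[1] << 8)
         | ((uint32_t)(unsigned char)tag[2] << 16)
         | ((uint32_t)(unsigned char)tag[3] << 24);
}
```

For “Leak” this yields 0x6B61654C: 'L' (0x4C) sits in the low byte and 'k' (0x6B) in the high byte.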

To view the pool usage associated with each tag run “poolmon -b” from a command prompt. This will sort by the number of bytes associated with each tag. If you are tracking pool usage over a period of time, you can log the data to a file with “poolmon -b -n poolmonlog1.txt”, replacing 1 with increasing numbers to obtain a series of logs. Once you have a series of logs you may be able to view usage increasing for a specific tag, in a corresponding fashion to what you see in perfmon.
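Once the per-tag byte counts from two logs are in hand, picking out the growing tag is a simple comparison. The sketch below assumes the numbers have already been extracted from the logs (parsing is left out), and the TagUsage type and function name are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One tag and its byte count, as taken from a single poolmon log. */
typedef struct {
    char tag[5];                 /* four characters plus terminator */
    unsigned long long bytes;
} TagUsage;

/* Returns the index into `later` of the tag whose byte count grew the
 * most compared with `earlier`, or -1 if no tag grew. A tag that is
 * missing from `earlier` is treated as having used 0 bytes. */
int fastest_growing_tag(const TagUsage *earlier, size_t n_earlier,
                        const TagUsage *later, size_t n_later)
{
    int best = -1;
    unsigned long long best_growth = 0;

    for (size_t i = 0; i < n_later; i++) {
        unsigned long long before = 0;
        for (size_t j = 0; j < n_earlier; j++) {
            if (strcmp(later[i].tag, earlier[j].tag) == 0) {
                before = earlier[j].bytes;
                break;
            }
        }
        if (later[i].bytes > before &&
            later[i].bytes - before > best_growth) {
            best_growth = later[i].bytes - before;
            best = (int)i;
        }
    }
    return best;
}
```

Comparing the first and last log in a series this way gives a quick candidate; the intermediate logs are still worth checking to confirm the growth is steady rather than a single jump.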

When analyzing poolmon the important data is at the top. Typically the tag with the largest usage in bytes is the cause of the leak.

Poolmon -b

In the above data we can see that the tag with the most pool usage is “Leak”. Now that we know what tag is leaking we need to identify what driver is using this tag. Techniques for associating a tag with a driver vary, but findstr is often effective. Most drivers are located in c:\windows\system32\drivers, so that is a good starting point when looking for the driver. If you don’t find a result in that folder, go up a folder and try again, repeating until you get to the root of the drive.

C:\>findstr /s Leak *.sys

Users\Administrator\Desktop\myfault.sys:

AìI♦A╕Leak;┴☼B┴Aì

·∟ §£♂ Θ─☺ A╗☻ E☼"├Θ╡☺ Hï♣╔♂ ╞ $Θª☺ Hï♣:Hc┴ ┴ê\♦@ë

└δ_Aï ╞♣@∟ ☺ë♣▓← Aï@♦ë♣¼← δCAâ∙♦u╓AïAìI♦A╕Leak;┴☼B┴3╔ï╨§

In the above output we can see that “Leak” is used in myfault.sys. If we hadn’t forced this leak with notmyfault, the next step in troubleshooting would be an internet search for the tag and the driver. Often such a search will allow you to identify a specific fault within the driver and a solution.
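What findstr does here is a byte search: the four tag characters are embedded in the driver image, so they can be found without any knowledge of the file format. A minimal sketch of the same idea in C follows; the function name is hypothetical, and loading the file into memory is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of the four tag bytes in
 * the image, or -1 if the tag does not appear. */
long find_pool_tag(const unsigned char *image, size_t length,
                   const char tag[4])
{
    if (length < 4)
        return -1;

    for (size_t i = 0; i + 4 <= length; i++) {
        if (memcmp(image + i, tag, 4) == 0)
            return (long)i;
    }
    return -1;
}
```

As with findstr, a match only shows that the four bytes occur somewhere in the file. It is a strong hint, not proof, that the driver allocates pool with that tag.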

Don’t panic if findstr doesn’t find your tag, or if you find the tag but it is not unique to one driver. In future articles we will cover additional techniques for associating drivers with tags, and for associating allocations with specific code within a driver.

↧

Troubleshooting Pool Leaks Part 3 – Debugging

August 31, 2012, 1:03 pm

In our previous articles we discussed identifying a pool leak with perfmon, and narrowing the source of the leak with poolmon. These tools are often preferred because they are easy to use, provide verbose information, and can be run on a system without forcing downtime. However, it is not always possible to get perfmon and poolmon data. If a system is experiencing poor performance you may have a business need to get the system up and running as quickly as possible without allowing time to troubleshoot. It is also possible to completely exhaust memory through a pool leak, leaving the system in a state where tools such as perfmon and poolmon will not work. In these scenarios it may be possible to troubleshoot the poor performance by forcing a bugcheck, gathering a memory dump, and performing a post mortem analysis.

Although a dump is less than ideal data for troubleshooting a leak, it can be done. I say less than ideal because a dump is a snapshot of the system memory, and does not provide the historical data which perfmon would provide. The lack of historical data makes it difficult to differentiate between high memory usage due to load and high memory usage due to a leak. It is up to you, as the troubleshooter, to determine if the dump is sufficient evidence of a leak. Sometimes identifying the tag and the driver will help you identify a known issue that causes a leak, or your knowledge of the driver architecture may allow you to determine if the memory usage is normal or not. In some scenarios you may decide to start monitoring with perfmon and collect additional data for a future occurrence.

The first step to debug a pool leak using a dump is to load the dump in windbg, set the symbol path, and reload symbols.

1: kd> .symfix c:\symbols

1: kd> .reload

Loading Kernel Symbols

...............................................................

................................................................

.....

The !vm command will show memory utilization; the 1 flag limits the verbosity of this command. For the scenario of a pool leak, the significant values are “NonPagedPool Usage:” and “PagedPool Usage:”. If the debugger identifies a value that is out of the normal range it will flag it, and we can see here that the debugger has flagged excessive nonpaged pool usage. This is similar to the information we obtained in Part 1 using perfmon, but unlike perfmon we do not have trend data to indicate if this is temporary high pool usage due to load or if this is a leak.

1: kd> !vm 1

*** Virtual Memory Usage ***

Physical Memory: 403854 ( 1615416 Kb)

Page File: \??\C:\pagefile.sys

Current: 1048576 Kb Free Space: 1015644 Kb

Minimum: 1048576 Kb Maximum: 4194304 Kb

Available Pages: 106778 ( 427112 Kb)

ResAvail Pages: 225678 ( 902712 Kb)

Locked IO Pages: 0 ( 0 Kb)

Free System PTEs: 33533355 ( 134133420 Kb)

Modified Pages: 4844 ( 19376 Kb)

Modified PF Pages: 4838 ( 19352 Kb)

NonPagedPool Usage: 155371 ( 621484 Kb)

NonPagedPool Max: 191078 ( 764312 Kb)

********** Excessive NonPaged Pool Usage *****

PagedPool 0 Usage: 27618 ( 110472 Kb)

PagedPool 1 Usage: 3848 ( 15392 Kb)

PagedPool 2 Usage: 299 ( 1196 Kb)

PagedPool 3 Usage: 283 ( 1132 Kb)

PagedPool 4 Usage: 344 ( 1376 Kb)

PagedPool Usage: 32392 ( 129568 Kb)

PagedPool Maximum: 33554432 ( 134217728 Kb)

Session Commit: 7764 ( 31056 Kb)

Shared Commit: 6371 ( 25484 Kb)

Special Pool: 0 ( 0 Kb)

Shared Process: 5471 ( 21884 Kb)

PagedPool Commit: 32394 ( 129576 Kb)

Driver Commit: 2458 ( 9832 Kb)

Committed pages: 326464 ( 1305856 Kb)

Commit limit: 665998 ( 2663992 Kb)
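The page counts and Kb values in the !vm output are related by the page size. The sketch below is only illustrative arithmetic: it assumes 4 KB pages (which matches the numbers above) and uses hypothetical helper names. It reproduces the Kb figures and shows how close nonpaged pool usage is to its maximum. It does not attempt to reproduce whatever threshold the debugger uses to print the “Excessive” warning.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages, consistent with the Kb values in !vm


def pages_to_kb(pages):
    """Convert a page count from !vm into kilobytes."""
    return pages * PAGE_SIZE_KB


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


nonpaged_usage_pages = 155371  # "NonPagedPool Usage" from the !vm output
nonpaged_max_pages = 191078    # "NonPagedPool Max" from the !vm output

print(pages_to_kb(nonpaged_usage_pages), "Kb in use")
print(pages_to_kb(nonpaged_max_pages), "Kb maximum")
print(round(percent_of(nonpaged_usage_pages, nonpaged_max_pages), 1), "% of maximum")
```

In this dump nonpaged pool is a little over 81 percent of its maximum, which fits with the warning the debugger printed.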

The debugger can parse the pool tagging database and present information similar to poolmon. The !poolused command will do this: the /t5 option limits output to the top 5 consumers, and the 2 flag sorts by nonpaged pool usage (use the 4 flag if your leak is in paged pool).

1: kd> !poolused /t5 2

Sorting by NonPaged Pool Consumed

NonPaged Paged

Tag Allocs Used Allocs Used

Leak 601 615424000 0 0 UNKNOWN pooltag 'Leak', please update pooltag.txt

Pool 6 1717840 0 0 Pool tables, etc.

nVsC 664 1531552 0 0 UNKNOWN pooltag 'nVsC', please update pooltag.txt

netv 4369 1172224 1 144 UNKNOWN pooltag 'netv', please update pooltag.txt

Thre 607 774048 0 0 Thread objects , Binary: nt!ps

TOTAL 43424 634209952 63565 126487760

The above output shows that the tag “Leak” is associated with almost all of the nonpaged pool usage. This is the same information we obtained in Part 2 using poolmon.
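To put a number on “almost all”, the figures from the !poolused output can be compared directly. This is a small sketch using only values copied from the output above; the helper names are hypothetical.

```python
# (tag, nonpaged allocations, nonpaged bytes) copied from the !poolused output
rows = [
    ("Leak", 601, 615424000),
    ("Pool", 6, 1717840),
    ("nVsC", 664, 1531552),
    ("netv", 4369, 1172224),
    ("Thre", 607, 774048),
]
total_nonpaged_bytes = 634209952  # NonPaged "Used" value on the TOTAL line


def share_percent(used, total):
    """Percentage of the total nonpaged pool consumed by one tag."""
    return 100.0 * used / total


def average_size(allocs, used):
    """Average bytes per allocation for one tag."""
    return used / allocs


for tag, allocs, used in rows:
    print(tag, round(share_percent(used, total_nonpaged_bytes), 2), average_size(allocs, used))
```

The “Leak” tag accounts for roughly 97 percent of nonpaged pool, and its allocations average exactly 1,024,000 bytes each. Many allocations of one identical, large size under a single tag is a pattern worth noting when deciding whether a dump shows a leak.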

Now we must identify which drivers use the pool tag “Leak”. Because we have a snapshot of the system memory we can search the loaded modules in the dump for this tag, and then match each address found to a module using the command lm a.

1: kd> !for_each_module s -a @#Base @#End "Leak"

fffff880`044b63aa 4c 65 61 6b 3b c1 0f 42-c1 41 8d 49 fd 8b d0 ff Leak;..B.A.I....

fffff880`044b6621 4c 65 61 6b 3b c1 0f 42-c1 33 c9 8b d0 ff 15 cc Leak;..B.3......

1: kd> lm a fffff880`044b63aa

start end module name

fffff880`044b5000 fffff880`044bc000 myfault (no symbols)

1: kd> lm a fffff880`044b6621

start end module name

fffff880`044b5000 fffff880`044bc000 myfault (no symbols)
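The lm a lookup is a range check against each module's start and end address. The sketch below shows the same check in a few lines, using the myfault range from the output above; the function name find_module is hypothetical, and the end address is treated as exclusive.

```python
# (module name, start, end) taken from the lm output above
modules = [
    ("myfault", 0xfffff880044b5000, 0xfffff880044bc000),
]


def find_module(address, module_list):
    """Return (name, offset) for the module containing address, or None."""
    for name, start, end in module_list:
        if start <= address < end:
            return name, address - start
    return None


for hit in (0xfffff880044b63aa, 0xfffff880044b6621):
    name, offset = find_module(hit, modules)
    print(f"{hit:#x} is {name}+{offset:#x}")
```

Both search hits fall inside myfault, at offsets 0x13aa and 0x1621 from the start of the module.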

The tag and driver name can be used to search the internet for known problems. If a known issue is found a driver update may be available, and installing this update may prevent a future memory leak.

If there are no updates available for the driver, or if this is your driver and you need to identify the cause of the leak, don’t panic. In future articles we will show techniques for getting call stacks of pool allocations. These call stacks can be used to identify under what conditions the driver leaks memory.

↧

Troubleshooting Pool Leaks Part 4 – Debugging Multiple Users for a Tag

September 28, 2012, 12:02 pm

≫ Next: Troubleshooting Pool Leaks Part 5 – PoolHitTag

≪ Previous: Troubleshooting Pool Leaks Part 3 – Debugging

In our previous articles we discussed various techniques for identifying a pool memory leak and narrowing the scope of the leak to an individual pool tag. Knowing the leaking pool tag is often sufficient to identify the cause of the problem and find a solution. However, there may be a scenario where multiple drivers use the same pool tag (such as DDK) or when one driver uses the same tag in multiple places. In this scenario you will need more information to identify the source of the leak. In our next several articles we will present techniques to get this information.

This article will present a basic technique where we modify each pool tag to identify what code in which driver is allocating the memory that gets leaked.

This technique requires a live debug of the problematic system. There are many resources with steps for how to configure a system for a live debug. The debugging tools have instructions in the debugger.chm help file, under Debugging Tools for Windows\Debuggers\Installation and Setup\Kernel-Mode Setup.

DebuggerHelp

Using the same technique as in Part 3, identify where the tag in question is used.

0: kd> !for_each_module s -a @#Base @#End "Leak"

fffff880`0496e3aa 4c 65 61 6b 3b c1 0f 42-c1 41 8d 49 fd 8b d0 ff Leak;..B.A.I....

fffff880`0496e621 4c 65 61 6b 3b c1 0f 42-c1 33 c9 8b d0 ff 15 cc Leak;..B.3......

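Next, edit each instance so that it is unique. The ASCII code for numeral 1 is 0x31, and the codes for each numeral increase sequentially. Using this information edit each tag to be Lea1, Lea2, etc. The last character of a four-character tag is at offset 3 from the start of the tag, which is why the commands below write to each address +3.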

0: kd> eb fffff880`0496e3aa+3 31

0: kd> eb fffff880`0496e621+3 32

Confirm your edits resulted in the expected tags using the dc command.

0: kd> dc fffff880`0496e3aa l1

fffff880`0496e3aa 3161654c Lea1

0: kd> dc fffff880`0496e621 l1

fffff880`0496e621 3261654c Lea2
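The dc output shows each tag both as text and as a 32-bit value. The value looks reversed because the four ASCII bytes are read as a little-endian DWORD. The following sketch converts in both directions; the function names are hypothetical.

```python
def tag_to_dword(tag):
    """Return the little-endian 32-bit value for a four-character pool tag."""
    if len(tag) != 4:
        raise ValueError("a pool tag is exactly four characters")
    return int.from_bytes(tag.encode("ascii"), "little")


def dword_to_tag(value):
    """Return the four-character tag stored in a little-endian 32-bit value."""
    return value.to_bytes(4, "little").decode("ascii")


for tag in ("Leak", "Lea1", "Lea2"):
    print(tag, format(tag_to_dword(tag), "08x"))
```

This prints 3161654c for Lea1 and 3261654c for Lea2, matching the dc output. The same conversion is useful any time a debugger command expects a tag as a numeric value instead of a string.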

Now wait for the leak to happen and repeat the steps from Part 3 to identify which of the tags is leaked. This tells you what code allocates the memory that gets leaked. Below we can see that Lea2 is the tag being leaked.

0: kd> !poolused /t5 2

Sorting by NonPaged Pool Consumed

NonPaged Paged

Tag Allocs Used Allocs Used

Lea2 257 263168000 0 0 UNKNOWN pooltag 'Lea2', please update pooltag.txt

nVsC 664 1531552 0 0 UNKNOWN pooltag 'nVsC', please update pooltag.txt

netv 4369 1172224 1 144 UNKNOWN pooltag 'netv', please update pooltag.txt

Leak 1 1024000 0 0 UNKNOWN pooltag 'Leak', please update pooltag.txt

EtwB 94 945136 4 163840 Etw Buffer, Binary: nt!etw

TOTAL 41296 281814544 44077 68102368

Knowing what code allocates the leaked pool may be very valuable to a driver developer who needs to narrow the scope of the problem. Often this information is sufficient for a developer to code review the use of this memory and identify why it would be leaked.

There are times when more information is needed to determine the cause of the leak. A developer may need the call stacks of memory being allocated and freed. We will capture this information in Part 5.

↧

Troubleshooting Pool Leaks Part 5 – PoolHitTag

September 28, 2012, 1:14 pm

≫ Next: Breaking down the "Cl" in !irp

≪ Previous: Troubleshooting Pool Leaks Part 4 – Debugging Multiple Users for a Tag

In Part 4 we narrowed the source of the leaked pool memory to the specific driver which is allocating it, and we identified where in the driver this allocation was taking place. However, we did not capture contextual information such as the call stack leading up to this code. Also, we didn’t capture information about when this allocated pool is freed. In this article we will use the PoolHitTag feature to break into the debugger when a specific tag is used.

As in Part 4, a live debug must be configured to use this feature. The debugging tools have instructions in the debugger.chm help file, under Debugging Tools for Windows\Debuggers\Installation and Setup\Kernel-Mode Setup (see screenshot in part 4).

These steps are typically only effective if you are able to perform them while the leak is happening. There may be a scenario in which a developer wants to know what “normal” looks like, but most often the steps in this article are used to investigate “broken”.

The PoolHitTag is a global in the kernel binary. When this global is set to a pool tag, the system will break into the debugger whenever pool with this tag is allocated or freed. By default the PoolHitTag is set to ffffff0f.

1: kd> dc nt!PoolHitTag l1

fffff800`016530fc ffffff0f ....

To turn on this feature, edit the PoolHitTag to the tag that is known to leak. The value 3261654c is little endian ASCII for the string ‘Lea2’. I found this value in the “Confirm your edits” step in Part 4.

1: kd> ed nt!PoolHitTag 3261654c

1: kd> dc nt!PoolHitTag l1

fffff800`016530fc 3261654c Lea2

With PoolHitTag now set to the leaking tag, issue the ‘g’ command to release the debugger, and it will automatically break in whenever the Lea2 tag is used.

1: kd> g

Break instruction exception - code 80000003 (first chance)

nt! ?? ::FNODOBFM::`string'+0x24a2a:

fffff800`014798f6 cc int 3

In the above example the debugger broke in because the ‘int 3’ instruction triggered a breakpoint. The symbols seem to indicate that we are in a function named “?? ::FNODOBFM::`string'”, but this is simply a lack of symbolic information for this optimized code. Unassembling the surrounding code shows that this code is a piece of ExpAllocateBigPool, one of the functions used in the kernel to allocate pool allocations larger than 4096 bytes.

1: kd> u fffff800`014798f6

nt! ?? ::FNODOBFM::`string'+0x24a2a:

fffff800`014798f6 cc int 3

fffff800`014798f7 e9ee8e0800 jmp nt!ExpAllocateBigPool+0x13a (fffff800`015027ea)

At this point we can dump the call stack and see the full context of what is happening when this memory is allocated.

1: kd> k

Child-SP RetAddr Call Site

fffff880`04ec1680 fffff800`0161090e nt! ??::FNODOBFM::`string'+0x24a2a

fffff880`04ec1770 fffff880`0496e634 nt!ExAllocatePoolWithTag+0x82e

fffff880`04ec1860 fffff880`0496e727 myfault+0x1634

fffff880`04ec19b0 fffff800`017fca97 myfault+0x1727

fffff880`04ec1a10 fffff800`017fd2f6 nt!IopXxxControlFile+0x607

fffff880`04ec1b40 fffff800`014e0ed3 nt!NtDeviceIoControlFile+0x56

fffff880`04ec1bb0 00000000`7756138a nt!KiSystemServiceCopyEnd+0x13

00000000`000df4c8 000007fe`fd5fa249 ntdll!ZwDeviceIoControlFile+0xa

00000000`000df4d0 00000000`7740683f KERNELBASE!DeviceIoControl+0x75

00000000`000df540 00000000`ff222384 kernel32!DeviceIoControlImplementation+0x7f

00000000`000df590 00000000`00000000 NotMyfault+0x2384
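The call stack can be tied back to the tag edits from Part 4. The return address of the ExAllocatePoolWithTag frame (fffff880`0496e634, shown as myfault+0x1634) should sit just after the call instruction that follows one of the tag locations. The sketch below, which uses only addresses from the debugger output and a hypothetical helper name, finds the closest tag location at or before the return address.

```python
# Tag locations found with the s -a search and edited in Part 4
tag_sites = {
    "Lea1": 0xfffff8800496e3aa,
    "Lea2": 0xfffff8800496e621,
}

# RetAddr of the nt!ExAllocatePoolWithTag frame in the k output
return_address = 0xfffff8800496e634


def nearest_preceding_site(address, sites):
    """Return (tag, distance) for the closest tag site at or before address."""
    distances = {tag: address - site for tag, site in sites.items() if site <= address}
    if not distances:
        return None
    tag = min(distances, key=distances.get)
    return tag, distances[tag]


tag, distance = nearest_preceding_site(return_address, tag_sites)
print(tag, hex(distance))
```

The return address is 0x13 bytes past the Lea2 tag. My reading of the bytes dumped by the search (4c 65 61 6b 3b c1 0f 42 c1 33 c9 8b d0 ff 15 ...) is that the tag is followed by 9 bytes of register setup and then a 6-byte indirect call starting with ff 15, which ends exactly 0x13 bytes after the start of the tag. Treat that disassembly as an assumption and confirm it with the u command in your own session.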

Repeating the ‘g’ and ‘k’ commands multiple times will begin to give you an understanding of the various ways this code may be used. This can be automated by modifying the ‘int 3’ instruction and using a breakpoint. Note that system performance may suffer because output to the debug port is serialized.

The commands shown below use addresses specific to big pool allocations (larger than 4KB). The ‘int 3’ instruction may be located elsewhere depending on the scenario you are debugging.

To modify the operation from a debug break to a breakpoint, change the ‘int 3’ to ‘nop’. In x86 and x64 the opcode for ‘nop’ is 90. Conveniently, both instructions are one byte long, so the edit does not disturb the surrounding code.

1: kd> eb fffff800`014798f6 90

Confirm that the instruction was reset properly.

1: kd> u fffff800`014798f6 l1

nt! ?? ::FNODOBFM::`string'+0x24a2a:

fffff800`014798f6 90 nop

Set a breakpoint on the ‘nop’ instruction and configure the breakpoint to automatically dump the stack and resume execution.

1: kd> bp fffff800`014798f6 "k;g"

1: kd> g

If you find that the pool is sometimes allocated and occasionally freed, you may need to edit the ‘int 3’ used when ExFreePool is called, and set a similar breakpoint on that address.

Break instruction exception - code 80000003 (first chance)

nt!ExDeferredFreePool+0xb57:

fffff800`0160f5b7 cc int 3

1: kd> eb fffff800`0160f5b7 90

1: kd> bp fffff800`0160f5b7 "k;g"

1: kd> g

Once you have sufficient data to understand the scenario where the memory is allocated and freed, use Ctrl+Break to break into the debugger, clear the breakpoints and reset the PoolHitTag. Then issue the ‘g’ command to allow the system to continue normal operation.

1: kd> bc *

1: kd> ed nt!PoolHitTag ffffff0f

1: kd> g

The data collected with these steps should provide an indication to a developer of what memory is being leaked and when.

PoolHitTag isn’t the only option for collecting call stack information. Our final articles will cover alternative techniques for obtaining this information.

↧

Breaking down the "Cl" in !irp

October 29, 2012, 4:18 pm

≫ Next: Troubleshooting Pool Leaks Part 6 – Driver Verifier

≪ Previous: Troubleshooting Pool Leaks Part 5 – PoolHitTag

Hey there NTDEBUGGERS, my name is Randy Monteleone and today we are going to talk about IRPs. In the past we have talked about the IRP structure in passing and showed a field here and there that can be pulled out and used to find answers to stalled IO. I was recently working on a debugger extension and found something interesting in the IRP I was looking at. I had been looking at a !irp output much like the one pictured below. I found myself asking, what do the "Success Error Cancel" fields mean?

After doing some digging and working with a few of my co-workers we solved the mystery of the meaning behind these words and why we see them in our output. Let's break this IRP down starting with the ">" marker that indicates the current stack frame. In the output below you see this marker is indicating that we are working on something in partmgr.

1: kd> !irp fffffa809a1f3af0

Irp is active with 9 stacks 5 is current (= 0xfffffa809a1f3ce0)

Mdl=fffffa814e9c4f40: No System Buffer: Thread fffffa80d05b67e0: Irp stack trace.

cmd flg cl Device File Completion-Context

[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

>[ 4,34] 1c e0 fffffa80920c0060 00000000 fffff880010301b0-00000000 Success Error Cancel<-- What are you trying to tell me?

\Driver\Disk partmgr!PmReadWriteCompletion

Args: 00001000 00000000 14625000 00000000

[ 4, 0] 1c e0 fffffa80920c0b90 00000000 fffff88001063010-fffffa80b01eec00 Success Error Cancel

\Driver\partmgr volmgr!VmpReadWriteCompletionRoutine

Args: 3993d69568d 00000000 14625000 00000000

[ 4, 0] c e0 fffffa80b01eeab0 00000000 fffff88001d59150-fffffa80d05b2180 Success Error Cancel

\Driver\volmgr volsnap!VspRefCountCompletionRoutine

Args: 00001000 00000000 3993d69568a 00000000

[ 4, 0] c e1 fffffa80d05b2030 00000000 fffff88001845344-fffff8800d52bb38 Success Error Cancel pending

\Driver\volsnap Ntfs!NtfsMasterIrpSyncCompletionRoutine

Args: 00001000 00000000 14615000 00000000

[ 4, 0] 0 0 fffffa80c2829030 fffffa80d04f18a0 00000000-00000000

\FileSystem\Ntfs

Args: 00001000 00000000 00002000 00000000

In the example below we see the “Success Error Cancel” fields and in this case we can also see that the pending flag has been set. The pending field indicates that STATUS_PENDING was returned to the caller. This is used so that I/O completion can determine whether or not to fully complete the I/O operation requested by the packet. Drivers can do this by calling IoMarkIrpPending.

Now look at the "cl" column as it holds the key to unlocking what "Success Error Cancel pending" really means.

1: kd> !irp 0xfffffa80`920c2340

Irp is active with 3 stacks 3 is current (= 0xfffffa80920c24a0) <--- _IO_STACK_LOCATION

Mdl=fffffa814e9c4f40: No System Buffer: Thread 00000000: Irp stack trace.

cmd flg cl Device File Completion-Context

[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[ f, 0] 1c 0 fffffa809209a060 00000000 fffff880010061a0-fffffa80920bfcc0

\Driver\mpio mpio!MPIOPdoCompletion

Args: fffffa80920c22b0 00000000 00000000 fffffa80920bfcc0

>[ f, 0] 1c e1 fffffa809209a060 00000000 fffff88001a01a00-fffffa80920c2190 Success Error Cancel pending

\Driver\mpio CLASSPNP!TransferPktComplete

Args: fffffa80920c22b0 00000000 00000000 fffffa80920bfcc0

Focusing on the cl column we see that our active stack frame is working in MPIO, but what is it trying to tell us about its status and intent? Well to figure that out we need to break down the value "e1" we see listed in our cl or Control column. This field is being used as a flag where the high and low parts represent two different values. We get this value in !irp from an _IO_STACK_LOCATION Control member. This can be found by dumping the stack location displayed by !irp.

1: kd> dt nt!_IO_STACK_LOCATION 0xfffffa80920c24a0

nt!_IO_STACK_LOCATION

+0x000 MajorFunction : 0xf ''

+0x001 MinorFunction : 0 ''

+0x002 Flags : 0x1c ''

+0x003 Control : 0xe1 '' < -- Control flag

+0x008 Parameters : <unnamed-tag>

+0x028 DeviceObject : 0xfffffa80`9209a060 _DEVICE_OBJECT

+0x030 FileObject : (null)

+0x038 CompletionRoutine : 0xfffff880`01a01a00 long CLASSPNP!TransferPktComplete+0

+0x040 Context : 0xfffffa80`920c2190 Void

So now that we know where we get this value, we still have to decode what it means and how it results in what we see in our !irp output. Let's split our number into two, our high and low parts. In our example above we had the value e1. Taking our value apart gives us an e and a 1. The low part of our value indicates what was returned as our IRP was processed. In our case this is a 1, meaning that our IRP is pending. Thus we see the word pending at the end of the current stack frame in !irp. The low part holds two flags: 1 means pending was returned and 2 means an error was returned. A value of 0 means neither.

Moving to our high part we are left with the "e". Let's take this number and convert it to binary. We end up with 1110. This upper number indicates which invoke types were requested for the completion routine for the driver listed on that stack frame. If we look at MSDN we see that IoSetCompletionRoutine takes three BOOLEAN values to set these flags. These options specify whether the completion routine is called if the IRP is completed with that corresponding status.

VOID IoSetCompletionRoutine(

_In_ PIRP Irp,

_In_opt_ PIO_COMPLETION_ROUTINE CompletionRoutine,

_In_opt_ PVOID Context,

_In_ BOOLEAN InvokeOnSuccess,

_In_ BOOLEAN InvokeOnError,

_In_ BOOLEAN InvokeOnCancel

);

Doing some source review I was able to piece together how these values translate to the !irp output. If we look back at our binary value of "e" we see that we have a set of bits that get set based on what the driver wanted to do when an IRP was completed with one of our defined status values.

Cancel = 2

Success = 4

Error = 8

Add each of these values up and the sum is 14, or e in hex. Going back to our binary "1110", we see that the lowest bit of this high part is zero; it is not one of the three invoke flags. The upper three bits represent the Cancel, Success and Error BOOLEAN values that the driver passed when it called IoSetCompletionRoutine for this stack location. The pending and Error Returned values that I mentioned above are kept separately, in the low part of the Control value.

Example: IoSetCompletionRoutine(pIrp, pCompletionRoutine, pContext, TRUE, FALSE, TRUE); would yield a value of 6 (Success 4 + Cancel 2).

Remember, the important thing here is not that the !irp output is trying to tell us that one of these things happened. It's telling us that this driver would like to be notified if one of those things does happen. This area also provides us with information if pending or an error is returned.
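The decoding described above can be sketched in a few lines. The flag values below are the SL_* Control definitions from wdm.h. The function names, and the text used for the error-returned flag, are my own assumptions and are not taken from the !irp source.

```python
# Control flag values as defined for IO_STACK_LOCATION.Control in wdm.h
SL_PENDING_RETURNED = 0x01
SL_ERROR_RETURNED = 0x02
SL_INVOKE_ON_CANCEL = 0x20
SL_INVOKE_ON_SUCCESS = 0x40
SL_INVOKE_ON_ERROR = 0x80


def decode_control(control):
    """Return the words a Control byte would produce, in the order !irp shows them."""
    labels = []
    if control & SL_INVOKE_ON_SUCCESS:
        labels.append("Success")
    if control & SL_INVOKE_ON_ERROR:
        labels.append("Error")
    if control & SL_INVOKE_ON_CANCEL:
        labels.append("Cancel")
    if control & SL_PENDING_RETURNED:
        labels.append("pending")
    if control & SL_ERROR_RETURNED:
        # Hypothetical label: the exact text !irp prints for this flag is not shown here.
        labels.append("error returned")
    return labels


def control_from_invoke_flags(on_success, on_error, on_cancel):
    """Build the invoke bits the way the three BOOLEAN arguments would set them."""
    control = 0
    if on_success:
        control |= SL_INVOKE_ON_SUCCESS
    if on_error:
        control |= SL_INVOKE_ON_ERROR
    if on_cancel:
        control |= SL_INVOKE_ON_CANCEL
    return control


print(decode_control(0xE1))
print(hex(control_from_invoke_flags(True, False, True)))
```

Decoding 0xe1 gives Success, Error, Cancel and pending, matching the current stack frame in the !irp output. The TRUE, FALSE, TRUE example produces 0x60, whose high part is the 6 mentioned above.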

Well that's all I have for now, for more documentation for the _IRP and _IO_STACK_LOCATION structures please see the following links to MSDN.

More on IRP: http://msdn.microsoft.com/en-us/library/windows/hardware/ff550694(v=vs.85).aspx

More on _IO_STACK_LOCATION: http://msdn.microsoft.com/en-us/library/windows/hardware/ff550659(v=vs.85).aspx

↧

Troubleshooting Pool Leaks Part 6 – Driver Verifier

October 31, 2012, 1:36 pm

≫ Next: Troubleshooting Pool Leaks Part 7 – Windows Performance Toolkit

≪ Previous: Breaking down the "Cl" in !irp

In Part 5 we used PoolHitTag to get call stacks of pool being allocated and freed. This information is often essential to identifying the cause of a memory leak; however it is not always feasible to configure a live kernel debug to obtain this information. Fortunately there are alternative methods to get such call stacks.

Driver verifier has an option to enable pool tracking for a specific driver, or for multiple drivers. This functionality was first introduced in Windows Vista and Windows Server 2008. This information is also captured when driver verifier is used to enable special pool; however, for the purposes of this article we will focus on using pool tracking.

The data stored by driver verifier requires a debugger to view. Any method of debugging can be used for this. You can use a live kernel debug as we described in part 4, you can get a memory dump (kernel or complete, a small dump is insufficient), or you can use livekd.

If you have used the steps from Part 1, Part 2, or Part 3, you probably have an idea which drivers are involved in creating the pool leak. In this example we are generating the leak using notmyfault, the same tool we have been using in prior examples. As seen in Part 2, the relevant driver is myfault.sys.

Although driver verifier has a GUI, the easiest way to enable this functionality is with the below command from an elevated command prompt:

Verifier /flags 8 /driver myfault.sys

The above command will provide the following output, allowing you to confirm that the expected settings are enabled:

New verifier settings:

Special pool: Disabled

Pool tracking: Enabled

Force IRQL checking: Disabled

I/O verification: Disabled

Deadlock detection: Disabled

DMA checking: Disabled

Security checks: Disabled

Force pending I/O requests: Disabled

Low resources simulation: Disabled

IRP Logging: Disabled

Miscellaneous checks: Disabled

Verified drivers:

myfault.sys

You must restart this computer for the changes to take effect.

After rebooting the system, reproduce the memory leak and attach a debugger or generate a memory dump after the memory has been leaked.

Break in with the debugger (Ctrl+Break or Ctrl+C) or load the dump in windbg (File – Open Crash Dump).

Set the symbol path and reload symbols.

1: kd> .symfix c:\symbols

1: kd> .reload

Loading Kernel Symbols

...............................................................

The !verifier command has various options to view information about driver verifier. To view the pool allocations which have been tracked by verifier for myfault.sys, use the following:

0: kd> !verifier 3 myfault.sys

Verify Level 8 ... enabled options are:

All pool allocations checked on unload

Summary of All Verifier Statistics

RaiseIrqls 0x0

AcquireSpinLocks 0x0

Synch Executions 0x0

Trims 0x0

Pool Allocations Attempted 0xb

Pool Allocations Succeeded 0xb

Pool Allocations Succeeded SpecialPool 0xa

Pool Allocations With NO TAG 0x1

Pool Allocations Failed 0x0

Resource Allocations Failed Deliberately 0x0

Current paged pool allocations 0x0 for 00000000 bytes

Peak paged pool allocations 0x1 for 00000080 bytes

Current nonpaged pool allocations 0xa for 009CE000 bytes

Peak nonpaged pool allocations 0xa for 009CE000 bytes

Driver Verification List

Entry State NonPagedPool PagedPool Module

fffffa80031b5830 Loaded 009ce000 00000000 myfault.sys

Current Pool Allocations 0000000a 00000000

Current Pool Bytes 009ce000 00000000

Peak Pool Allocations 0000000a 00000001

Peak Pool Bytes 009ce000 00000080

PoolAddress SizeInBytes Tag CallersAddress

fffffa8005400000 0x000fb000 Leak fffff8800447d634

fffffa80052fb000 0x000fb000 Leak fffff8800447d634

fffffa8005200000 0x000fb000 Leak fffff8800447d634

fffffa80050fb000 0x000fb000 Leak fffff8800447d634

fffffa8005000000 0x000fb000 Leak fffff8800447d634

fffffa8004efb000 0x000fb000 Leak fffff8800447d634

fffffa8004e00000 0x000fb000 Leak fffff8800447d634

fffffa8004cfb000 0x000fb000 Leak fffff8800447d634

fffffa8004c00000 0x000fb000 Leak fffff8800447d634

fffffa8004a66000 0x000fb000 Leak fffff8800447d634

At the bottom of the above output is the list of allocations made by myfault.sys. For our purposes we are going to assume that these allocations have been leaked, as opposed to just being normal allocations that were not yet freed when the debugger was attached.
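A quick consistency check is to confirm that the listed allocations account for all of the bytes verifier reports for the driver. The sketch below uses only values from the !verifier 3 output above.

```python
# SizeInBytes for each of the ten tracked allocations in the !verifier 3 output
allocation_sizes = [0x000FB000] * 10

# "Current Pool Bytes" (NonPagedPool) reported for myfault.sys
current_nonpaged_bytes = 0x009CE000

listed_total = sum(allocation_sizes)
print(hex(listed_total), listed_total == current_nonpaged_bytes)

# Difference between the tracked size and the 1,024,000 bytes per
# allocation seen under the "Leak" tag in Part 3
print(hex(allocation_sizes[0] - 1024000))
```

The ten listed allocations add up to exactly 0x9ce000 bytes, so nothing else is outstanding for this driver. Each tracked size is 0x1000 bytes (one 4 KB page) larger than the 1,024,000 bytes per allocation seen under the “Leak” tag in Part 3. The output does not say why, so treat any explanation of that difference as a guess.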

The !verifier command has an option to view call stacks for one of the tracked allocations. Keep in mind that the size of the database is limited and only more recent allocations will be kept in the database.

0: kd> !verifier 80 fffffa8005400000

Log of recent kernel pool Allocate and Free operations:

There are up to 0x10000 entries in the log.

Parsing 0x0000000000010000 log entries, searching for address 0xfffffa8005400000.

======================================================================

Pool block fffffa8005400000, Size 00000000000fa000, Thread fffffa80044ceb60

fffff80001927cc6 nt!VeAllocatePoolWithTagPriority+0x2b6

fffff80001927d3d nt!VerifierExAllocatePoolEx+0x1d

fffff8800447d634 myfault!MyfaultDeviceControl+0x358

fffff8800447d727 myfault!MyfaultDispatch+0xb7

fffff80001932750 nt!IovCallDriver+0xa0

fffff800017a7a97 nt!IopXxxControlFile+0x607

fffff800017a82f6 nt!NtDeviceIoControlFile+0x56

fffff8000148bed3 nt!KiSystemServiceCopyEnd+0x13

Parsed entry 0000000000010000/0000000000010000...

Finished parsing all pool tracking information.

The above output shows the call stack leading to the pool allocation. This is the same information we had seen in Part 5, however we are able to obtain this information using a dump or livekd, whereas the steps from Part 5 required an invasive debug and extended system downtime.

When you have completed troubleshooting, disable driver verifier with the following command and reboot:

verifier /reset

↧

Troubleshooting Pool Leaks Part 7 – Windows Performance Toolkit

November 30, 2012, 11:58 am

≫ Next: Determining the source of Bug Check 0x133 (DPC_WATCHDOG_VIOLATION) errors on Windows Server 2012

≪ Previous: Troubleshooting Pool Leaks Part 6 – Driver Verifier

In Part 1 of this series we identified a pool leak in non paged pool. In Part 2 and Part 3 of this series we identified what pool tag was leaking. In Part 5 and Part 6 we got call stacks showing the memory being allocated. In this article we are going to discuss a tool that combines this information into one piece of data.

Starting with Windows 7 and Windows Server 2008 R2, Windows has new functionality to track pool allocations and frees using the Windows Performance Toolkit, commonly referred to as xperf. For this example we will be using the WPT from the Windows 8 ADK. When installing the ADK select only the Windows Performance Toolkit option to minimize download time.

ADK Install

Before collecting pool usage data on a 64-bit system, you must disable the paging of data such as drivers and call stacks. The first time you run the Windows Performance Recorder UI and click the Start button you will be prompted that Disable Paging Executive is not set. If you click OK to this dialog WPR will set DisablePagingExecutive and ask that you reboot. To set this ahead of time run the following command from an elevated command prompt, and reboot afterwards:

wpr -disablepagingexecutive on

There are three methods to collect data from a pool leak using the Windows Performance Toolkit. The WPR UI, WPR command line, and xperf command line each provide different methods to collect this data.

Windows Performance Recorder UI:

The easiest way to record a trace is with the Windows Performance Recorder, which presents a GUI to make it easy for a user to record a trace. To collect data regarding a pool leak simply check the “Pool usage” checkbox and click the Start button. Reproduce the leak and after a few minutes click the Save button. Use Cancel to stop the trace after your log is saved. Note that by default WPR will use a circular log in memory; if you record for a long period of time the log will wrap and data will be lost. A sequential file log will be captured by selecting File as the “Logging mode”. However, these logs can become very large in a short period of time on a busy system, so it is not recommended to leave the log running for longer than a few minutes.

WPR Pool usage

Because these traces can quickly become large, it can be helpful to trace just one pool tag. This can be done in WPR using a custom WPR recording profile. Below is a sample profile that collects pool usage information only for the tag “Leak”, which we identified as the leaking tag in previous articles. Save this text in a file ending in .wprp (e.g. PoolTagLeak.wprp) and load it in WPR using the Add Profiles button. Check the “Pool usage Tag ‘Leak’” option under Custom measurements. Use the Start button to begin collecting data, reproduce the leak, and use the Save button to save the log. After you have collected the log click Cancel to stop collecting data.

<?xml version="1.0" encoding="utf-8"?>

</SystemCollector>

</Keywords>

</Stacks>

</PoolTags>

</SystemProvider>

</SystemCollectorId>

</Collectors>

</Profile>

</Profiles>

</WindowsPerformanceRecorder>

WPR Custom Profile

Windows Performance Recorder command line:

WPR can also be run from a command line if you need to script its operation, or if you prefer typing text over clicking buttons. By default WPR will be installed at C:\Program Files (x86)\Windows Kits\8.0\Windows Performance Toolkit\wpr.exe.

To start a trace, run the following from a command line:

wpr -start GeneralProfile -start Pool

To save the trace run:

wpr -stop pool.etl "pool leak"

Cancel the trace with:

wpr -cancel

Optionally, you can use the custom profile defined earlier in this article to trace just one pool tag. This command assumes PoolTagLeak.wprp is in the same folder as wpr.exe; use the full path if your custom profile is located elsewhere. Save and cancel the trace using the above steps.

wpr -start PoolTagLeak.wprp
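If you do need to script WPR, the three commands above can be wrapped so that the start, save and cancel steps always use the same arguments. This is a minimal sketch with hypothetical function names. It only builds the argument lists and uses no options beyond the ones shown in this article.

```python
def wpr_start_args(profiles):
    """Build 'wpr -start <profile> [-start <profile> ...]' as an argument list."""
    args = ["wpr"]
    for profile in profiles:
        args += ["-start", profile]
    return args


def wpr_stop_args(etl_path, description):
    """Build 'wpr -stop <file> <description>' as an argument list."""
    return ["wpr", "-stop", etl_path, description]


def wpr_cancel_args():
    """Build 'wpr -cancel' as an argument list."""
    return ["wpr", "-cancel"]


print(wpr_start_args(["GeneralProfile", "Pool"]))
print(wpr_stop_args("pool.etl", "pool leak"))
print(wpr_cancel_args())
```

Each list can be passed to subprocess.run from an elevated prompt on a system where wpr.exe is on the path. Passing a list instead of a single string avoids quoting problems with a description that contains spaces.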

Xperf command line:

The third way to enable this tracing is with xperf. This is an older tool which has been replaced by WPR; however, xperf provides circular file logging that isn’t available in WPR. Circular logging can be useful if you need to run xperf over a longer period of time. To enable tracing with xperf, and use a circular buffer, use the below command:

xperf -on Base+CSwitch+POOL -stackwalk PoolAlloc+PoolAllocSession -PoolTag Leak -BufferSize 1024 -MaxBuffers 1024 -MaxFile 1024 -FileModeCircular

To save and cancel the xperf trace in one command:

xperf -d pool.etl

Analyzing data with Windows Performance Analyzer:

After you have collected a trace using the method that works best for your scenario, open the etl file in the Windows Performance Analyzer. The below output is from a trace collected with WPR.

[Figure: WPA Graph Explorer]

Pool analysis will require symbols. Configure the symbol path using the option in the Trace menu. Often the symbol path will be pre-populated; if it is not, use srv*c:\symbols*http://msdl.microsoft.com/download/symbols. Click the Load Symbols option from the Trace menu, and be patient while WPA downloads symbols from the symbol server.

In the Graph Explorer click the + next to Memory to drop down the available memory graphs. Right click the Pool Total Allocation Size graph and choose Add graph to New Analysis View. If you are working with a small resolution screen you may want to click the X at the top of the Graph Explorer to close it; the Graph Explorer can be restored from the Window menu.

The key to effective xperf analysis is to sort the data by the appropriate columns. Columns can be added to the chart at the bottom of the view by right clicking the header and choosing the appropriate fields. To perform pool analysis the Type, Paged, Pool Tag, and Stack columns are necessary. Drag each of these columns to the left of the yellow line and sort them in the order shown below. Click the Size column to sort it as the primary.

[Figure: WPA Pool Graphs]

The Type column indicates when the pool memory was allocated and when it was freed. The term “AIFO” means the pool was Allocated Inside the timeframe of the trace and it was Freed Outside the timeframe of the trace (or perhaps it was never freed at all). The term “AIFI” means the pool was Allocated Inside the timeframe of the trace, and it was also Freed Inside the timeframe of the trace (this memory was not leaked). Because we are interested in memory that was not freed, start by clicking the + next to AIFO.
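
The naming rule behind these Type values can be sketched in a few lines. This is a toy model, not how WPA is implemented: the timestamps are arbitrary numbers, and the "AO" prefix for allocations made outside the trace is an assumption that simply extends the same naming pattern.

```python
def classify(alloc_time, free_time, trace_start, trace_end):
    """Return the Type label for one pool allocation.

    free_time is None when the allocation was never freed.
    """
    alloc_inside = trace_start <= alloc_time <= trace_end
    freed_inside = free_time is not None and trace_start <= free_time <= trace_end
    return ("AI" if alloc_inside else "AO") + ("FI" if freed_inside else "FO")


if __name__ == "__main__":
    # Trace runs from t=0 to t=10.
    print(classify(5, None, 0, 10))  # allocated in the trace, never freed
    print(classify(5, 8, 0, 10))     # allocated and freed in the trace
    print(classify(5, 20, 0, 10))    # freed only after the trace ended
```

Note that an allocation freed after the trace ended is indistinguishable from a leak in this view, which is why the trace should run long enough to cover the normal lifetime of the allocations.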

The Paged column indicates if the pool allocations recorded are Paged or NonPaged. From the perfmon analysis in Part 1 we know that the leak we are troubleshooting in this example is in NonPaged pool. If a perfmon log is not available, the Size column is an indicator of what type of pool was leaked. Click the + next to whichever type of pool is largest in your trace.

The Pool Tag column displays the pool tag associated with each pool allocation. Again, the Size column is an indicator of which tag is leaking. Click the + next to the largest pool tag in your trace.

The Stack column displays the call stack leading up to the allocation. This is the information we are most interested in; it will indicate what driver is allocating the pool and it may indicate why. Click the + next to the largest Stack in your trace. Depending on how many times a particular code path is repeated, your Stack may only partially display and there may be more + options; you can use the right arrow key as a shortcut to open each of these until you see the call to ExAllocatePool. When the complete stack has been displayed the right arrow key will stop expanding stacks.

[Figure: WPA Tag 'Leak' Call Stack]

In the above output we can see that there was a NonPaged pool leak in the tag Leak. The call stack shows that the allocations were made by myfault.sys. A driver developer would have a great use for this information. If this was data from an actual leak the developer would use this output to determine that the leak is occurring due to an IOCTL sent from NotMyfault.exe!WinMain which leads to an allocation made in myfault.sys!MyfaultDeviceControl. A developer can use this information to perform a code review and identify under what conditions MyfaultDeviceControl allocates this pool, under what conditions it should be expected to free it, and why it may not free the memory.

This article concludes our series on troubleshooting pool leaks. We have demonstrated various techniques which each have their own strengths and weaknesses. Each of these techniques has a place in your debugging toolkit and is applicable to different circumstances depending on what your scenario is and what data you have available.

↧

Determining the source of Bug Check 0x133 (DPC_WATCHDOG_VIOLATION) errors on Windows Server 2012

December 7, 2012, 1:21 pm


What is a bug check 0x133?

Starting in Windows Server 2012, a DPC watchdog timer is enabled which will bug check a system if too much time is spent in DPC routines. This bug check was added to help identify drivers that are deadlocked or misbehaving. The bug check is of type "DPC_WATCHDOG_VIOLATION" and has a code of 0x133. (Windows 7 also included a DPC watchdog but by default, it only took action when a kernel debugger was attached to the system.) A description of DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff544084(v=vs.85).aspx.

The DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways. First, if a single DPC exceeds a specified number of ticks, the system will stop with 0x133 with parameter 1 of the bug check set to 0. In this case, the system's time limit for a single DPC will be in parameter 3, with the number of ticks taken by this DPC in parameter 2. Alternatively, if the system exceeds a larger timeout of time spent cumulatively in all DPCs since the IRQL was raised to DPC level, the system will stop with a 0x133 with parameter 1 set to 1. Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds; however, the actual timeout values on the system are set much higher.

How to debug a 0x133 (0, …

In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver. For example, here is a debug of a 0x133 (0,…) kernel dump:

0: kd> .bugcheck

Bugcheck code 00000133

Arguments 00000000`00000000 00000000`00000283 00000000`00000282 00000000`00000000

Per MSDN, we know that this DPC has run for 0x283 ticks, when the limit was 0x282.
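
The reading of the arguments can be written down as a small helper. This is a sketch that encodes only the parameter meanings described in this article; the function name is made up for illustration, and parameters 2 and 3 are not interpreted for the cumulative case because the article does not describe them.

```python
def describe_0x133(param1, param2, param3):
    """Summarize the first three arguments of a 0x133 bug check."""
    if param1 == 0:
        # Single DPC case: parameter 2 is the ticks taken, parameter 3 the limit.
        return (f"single DPC ran for {param2:#x} ticks; "
                f"the limit was {param3:#x} ticks")
    if param1 == 1:
        # Cumulative case: look at an ETW trace instead of the call stack.
        return "cumulative time at DPC level exceeded the system limit"
    return f"unrecognized first parameter {param1:#x}"


if __name__ == "__main__":
    # Values from the .bugcheck output above.
    print(describe_0x133(0x0, 0x283, 0x282))
```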

0: kd> k

Child-SP RetAddr Call Site

fffff803`08c18428 fffff803`098525df nt!KeBugCheckEx

fffff803`08c18430 fffff803`09723f11 nt! ??::FNODOBFM::`string'+0x13ba4

fffff803`08c184b0 fffff803`09724d98 nt!KeUpdateRunTime+0x51

fffff803`08c184e0 fffff803`09634eba nt!KeUpdateTime+0x3f9

fffff803`08c186d0 fffff803`096f24ae hal!HalpTimerClockInterrupt+0x86

fffff803`08c18700 fffff803`0963dba2 nt!KiInterruptDispatchLBControl+0x1ce

fffff803`08c18898 fffff803`096300d0 hal!HalpTscQueryCounter+0x2

fffff803`08c188a0 fffff880`04be3409 hal!HalpTimerStallExecutionProcessor+0x131

fffff803`08c18930 fffff880`011202ee ECHO!EchoEvtTimerFunc+0x7d //Here is our driver, and we can see it calls into StallExecutionProcessor

fffff803`08c18960 fffff803`097258b4 Wdf01000!FxTimer::TimerHandler+0x92

fffff803`08c189a0 fffff803`09725ed5 nt!KiProcessExpiredTimerList+0x214

fffff803`08c18ae0 fffff803`09725d88 nt!KiExpireTimerTable+0xa9

fffff803`08c18b80 fffff803`0971fe76 nt!KiTimerExpiration+0xc8

fffff803`08c18c30 fffff803`0972457a nt!KiRetireDpcList+0x1f6

fffff803`08c18da0 00000000`00000000 nt!KiIdleLoop+0x5a

Let’s view the driver’s unassembled DPC routine and see what it is doing:

0: kd> ub fffff880`04be3409

ECHO!EchoEvtTimerFunc+0x54:

fffff880`04be33e0 448b4320 mov r8d,dword ptr[rbx+20h]

fffff880`04be33e4 488b0d6d2a0000 mov rcx,qword ptr [ECHO!WdfDriverGlobals (fffff880`04be5e58)]

fffff880`04be33eb 4883631800 and qword ptr [rbx+18h],0

fffff880`04be33f0 488bd7 mov rdx,rdi

fffff880`04be33f3 ff150f260000 call qword ptr [ECHO!WdfFunctions+0x838(fffff880`04be5a08)]

fffff880`04be33f9 bbc0d40100 mov ebx,1D4C0h

fffff880`04be33fe b964000000 mov ecx,64h

fffff880`04be3403 ff15f70b0000 call qword ptr[ECHO!_imp_KeStallExecutionProcessor (fffff880`04be4000)] //It is calling KeStallExecutionProcessor with 0x64 (decimal 100) as a parameter

0: kd> u fffff880`04be3409

ECHO!EchoEvtTimerFunc+0x7d:

fffff880`04be3409 4883eb01 sub rbx,1

fffff880`04be340d 75ef jne ECHO!EchoEvtTimerFunc+0x72 (fffff880`04be33fe) //Here we can see it is jumping back to call KeStallExecutionProcessor in a loop

fffff880`04be340f 488b5c2430 mov rbx,qword ptr[rsp+30h]

fffff880`04be3414 4883c420 add rsp,20h

fffff880`04be3418 5f pop rdi

fffff880`04be3419 c3 ret

fffff880`04be341a cc int 3

fffff880`04be341b cc int 3

0: kd> !pcr

KPCR for Processor 0 at fffff80309974000:

Major 1 Minor 1

NtTib.ExceptionList: fffff80308c11000

NtTib.StackBase: fffff80308c12080

NtTib.StackLimit: 000000d70c7bf988

NtTib.SubSystemTib: fffff80309974000

NtTib.Version: 0000000009974180

NtTib.UserPointer: fffff803099747f0

NtTib.SelfTib: 000007f7ab80c000

SelfPcr: 0000000000000000

Prcb: fffff80309974180

Irql: 0000000000000000

IRR: 0000000000000000

IDR: 0000000000000000

InterruptMode: 0000000000000000

IDT: 0000000000000000

GDT: 0000000000000000

TSS: 0000000000000000

CurrentThread: fffff803099ce880

NextThread: fffffa800261cb00

IdleThread: fffff803099ce880

DpcQueue: 0xfffffa80020ce790 0xfffff880012e4e9c [Normal] NDIS!NdisReturnNetBufferLists

0xfffffa800185f118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize

0xfffff8030994fda0 0xfffff8030972bc30 [Normal] nt!KiBalanceSetManagerDeferredRoutine

0xfffffa8001dbc118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize

0xfffffa8002082300 0xfffff88001701df0 [Normal] USBPORT

The !pcr output shows us queued DPCs for this processor. If you want to see more information about DPCs and the DPC Watchdog, you could dump the PRCB listed in the !pcr output like this:

dt nt!_KPRCB fffff80309974180 Dpc*

Often the driver will be calling into a function like KeStallExecutionProcessor in a loop, as in our example debug. To resolve this problem, contact the driver vendor to request an updated driver version that spends less time in its DPC Routine.
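
To put a number on the loop in our example, we can multiply the loop counter loaded into ebx (1D4C0h) by the stall requested on each pass (64h microseconds). This is a back-of-the-envelope sketch that assumes every stall lasts at least the requested time and ignores all other work in the routine.

```python
ITERATIONS = 0x1D4C0        # mov ebx,1D4C0h  (loop counter)
STALL_MICROSECONDS = 0x64   # mov ecx,64h     (argument to KeStallExecutionProcessor)
RECOMMENDED_MAX_US = 100    # recommended maximum DPC run time from this article

total_us = ITERATIONS * STALL_MICROSECONDS
total_seconds = total_us / 1_000_000
times_over = total_us // RECOMMENDED_MAX_US

if __name__ == "__main__":
    print(f"{ITERATIONS} iterations x {STALL_MICROSECONDS} us = {total_us} us")
    print(f"about {total_seconds} seconds, {times_over} times the recommendation")
```

If the loop had been allowed to finish, this one DPC would have stalled the processor for roughly 12 seconds.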

How to troubleshoot a 0x133 (1, …

Determining the cause of a stop 0x133 with a first parameter of 1 is a bit more difficult because the problem is a result of DPCs running from multiple drivers, so the call stack is insufficient to determine the culprit. To troubleshoot this stop, first make sure that the NT Kernel Logger or Circular Kernel Context Logger ETW traces are enabled on the system. (For directions on setting this up, see http://blogs.msdn.com/b/ntdebugging/archive/2009/12/11/test.aspx.)

Once the logging is enabled and the system bug checks, dump out the list of ETW loggers using !wmitrace.strdump. Find the ID of the NT Kernel logger or the Circular logger. You can then use !wmitrace.logsave (ID) (path to ETL) to write out the ETL log to a file. Load it up with Windows Performance Analyzer and add the DPC or DPC/ISR Duration by Module, Function view (located in the Computation group) to your current analysis window:

Next, make sure the table is also shown by clicking the box in the upper right of the view:

Ensure that the Address column is added on the left of the gold bar, then expand each address entry to see individual DPC enters/exits for each function. Using this data, you can determine which DPC routines took the longest by looking at the inclusive duration column, which should be added to the right of the gold bar:

In this case, these DPCs took 1 second, which is well over the recommended maximum of 100 microseconds. The module column (and possibly the function column, if you have symbols) will show which driver is responsible for that DPC routine. Since our ECHO driver was based on WDF, that is the module named here.

For an example of doing this type of analysis in xperf, see http://blogs.msdn.com/b/ntdebugging/archive/2008/04/03/windows-performance-toolkit-xperf.aspx.

More Information

For additional information about Stop 0x133 errors, see this page on MSDN: http://msdn.microsoft.com/en-us/library/windows/hardware/jj154556(v=vs.85).aspx.

For DPC timing recommendations and for advice on capturing DPC timing information using tracelog, see http://msdn.microsoft.com/en-us/library/windows/hardware/ff545764(v=vs.85).aspx.

Guidelines for writing DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff546551(v=vs.85).aspx.

-Matt Burrough

↧

Use Caution When Implementing IPC for Performance Counters

December 31, 2012, 3:06 pm


Recently I was working with a developer who had created performance counters that work in Performance Monitor but are never collected in a user defined data collector set. The customer explained that their counters update named shared memory inside the application which should be read by perfmon or the data collector set.

Putting counter data in shared memory is a common technique for performance counter developers. A programmer can update performance data in a block of shared memory in their application and then use a performance extension dll (aka an “extensible counter”) to read from the shared memory.

Shared memory is created by calling CreateFileMapping and MapViewOfFile. This memory is then accessed by another application by calling OpenFileMapping. All applications which use this shared memory must pass the same lpName to CreateFileMapping or OpenFileMapping. An example of using these APIs to implement shared memory is available on MSDN.

Based on the customer’s explanation that they are populating shared memory in their application, and their counters work in Performance Monitor but do not work in a user defined data collector set, I suspected that OpenFileMapping was failing for the data collector set.

User defined data collector sets run in a rundll32.exe process. If you have multiple rundll32.exe processes you may need to identify which one is related to your data collector set. The relevant process has a command line similar to “rundll32.exe C:\Windows\system32\pla.dll,PlaHost”. There are several tools that can be used to identify the command line of the process such as tlist.exe, which is included with the Debugging Tools for Windows.

After attaching a debugger to rundll32.exe, I wanted to break on the ret instruction at the end of the OpenFileMappingW function. This would allow me to determine if the function succeeds or fails. According to MSDN “If the function fails, the return value is NULL. To get extended error information, call GetLastError.”

The uf command is an easy way to unassemble a function and find the ret instruction to break on.

0:001> uf kernelbase!OpenFileMappingW

kernelbase!OpenFileMappingW:

75b88e0d 8bff mov edi,edi

<snip>

kernelbase!OpenFileMappingW+0x8f:

75b88e79 c9 leave

75b88e7a c20c00 ret 0Ch

0:001> bp 75b88e7a

0:001> g

Breakpoint 0 hit

eax=00000000 ebx=00008022 ecx=7ffd8000 edx=00000002 esi=05abf03c edi=00000000

eip=75b88e7a esp=05abeb20 ebp=05abeb3c iopl=0 nv up ei pl zr na pe nc

cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

kernelbase!OpenFileMappingW+0x90:

75b88e7a c20c00 ret 0Ch

In the above output we can see that the eax register is NULL, indicating that the call to OpenFileMapping failed. The !gle command will show the last error and last status.

0:008> !gle

LastErrorValue: (Win32) 0x2 (2) - The system cannot find the file specified.

LastStatusValue: (NTSTATUS) 0xc0000034 - Object Name not found.

The failure is that OpenFileMapping cannot find the file. The file name is the third parameter to OpenFileMapping. We can get the first three parameters from the kb command.

0:008> kb

ChildEBP RetAddr Args to Child

05abf0d0 6abae355 0002001f 00000000 05abeb7c kernelbase!OpenFileMappingW+0x90

WARNING: Stack unwind information not available. Following frames may be wrong.

05abf0f8 7784fe67 02a7ae90 05abf224 05abf254 ninjaprf+0x10edb

05abf110 7784fc97 00472158 02a7ae90 05abf224 advapi32!CallExtObj+0x17

05abf270 7784efaf 05abf2bc 60fcfa02 05abf778 advapi32!QueryExtensibleData+0x735

05abf654 75ff0468 80000004 05abf778 00000000 advapi32!PerfRegQueryValue+0x5da

05abf748 75ffd505 80000004 05abf778 05abf790 kernel32!LocalBaseRegQueryValue+0x366

05abf7b4 61247dc5 80000004 02a7ae90 00000000 kernel32!RegQueryValueExW+0xb7

05abf830 61250595 80000004 02a7ae58 02a7ae90 pdh!GetSystemPerfData+0x92

05abf89c 6124c753 02a407d0 05abf8e8 61241928 pdh!GetQueryPerfData+0xa4

05abf8b8 61254463 02a407d0 05abf8e8 60fcf32f pdh!PdhiCollectQueryData+0x32

05abf90c 611c6d04 02a58f08 00000000 75ffc3e0 pdh!PdhUpdateLogW+0xa2

05abf9bc 611be128 0045c968 00000000 00000000 pla!HPerformanceCounterDataCollector::Cycle+0x48

05abf9bc 00000000 0045c968 00000000 00000000 pla!PlaiCollectorControl+0x3b7

0:008> da 05abeb7c

05abeb7c "Local\NINJAPERF_S-1-5-18"

The user defined data collector set is failing to open the file "Local\NINJAPERF_S-1-5-18". This is the name that the performance extension dll ninjaprf.dll has given to its shared memory.

Based on the customer’s description this operation works in Performance Monitor. Next, I attached a debugger to perfmon and set the same breakpoint.

Breakpoint 0 hit

eax=000009f8 ebx=00008022 ecx=a7330000 edx=080ee678 esi=06798070 edi=00000000

eip=760be9bb esp=0a84e564 ebp=0a84e580 iopl=0 nv up ei pl zr na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

kernelbase!OpenFileMappingW+0x90:

760be9bb c20c00 ret 0Ch

In the above output we can see that eax is a handle number, indicating that the function succeeded.

Looking at the file being opened we can see why this works in one scenario and not in another.

0:016> kb 1

ChildEBP RetAddr Args to Child

0a84e580 698e4ab9 0002001f 00000000 0a84e5c0 kernelbase!OpenFileMappingW+0x90

0:016> da 0a84e5c0

0a84e5c0 "Local\NINJAPERF_S-1-5-21-123578"

0a84e5e0 "095-571698237-1598563147-18961"

The file name used is unique for each user. The ninjaprf dll has chosen a file name which includes the SID of the current user. This works for Performance Monitor because the user who starts the application is the same as the user who runs Performance Monitor. However, a user defined data collector set runs in rundll32.exe as the Local System account.

It is possible to run the user defined data collector set as a different user; however, the file name being used will not work in that scenario either. By using the “Local\” prefix, the file is created in the local session namespace. The application runs in the user’s session, while rundll32.exe is started by the Task Scheduler service and runs in session 0. This prevents the user defined data collector set from seeing the file created by the application.
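
Both reasons for the failure can be seen in a toy model of the lookup. This sketch is not the Windows object manager: it only models “Local\” names as being keyed by session, the class and function names are invented for illustration, and the application's session number (1) is an assumption.

```python
class LocalNamespaces:
    """Toy model: a "Local\\" name is only visible inside its own session."""

    def __init__(self):
        self._objects = set()

    def create(self, session_id, name):
        self._objects.add((session_id, name))

    def open(self, session_id, name):
        # True models a valid handle; False models ERROR_FILE_NOT_FOUND.
        return (session_id, name) in self._objects


def mapping_name(sid):
    """Build the per-user name seen in the debugger output above."""
    return "Local\\NINJAPERF_" + sid


USER_SID = "S-1-5-21-123578095-571698237-1598563147-18961"
SYSTEM_SID = "S-1-5-18"

namespaces = LocalNamespaces()
namespaces.create(1, mapping_name(USER_SID))  # application, user's session

perfmon_ok = namespaces.open(1, mapping_name(USER_SID))          # same user, same session
collector_system = namespaces.open(0, mapping_name(SYSTEM_SID))  # different name
collector_as_user = namespaces.open(0, mapping_name(USER_SID))   # same name, session 0

if __name__ == "__main__":
    print(perfmon_ok, collector_system, collector_as_user)
```

Changing the account fixes the name but not the session, so the last lookup still fails.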

If your performance counter uses shared memory to communicate, be aware that your performance extension dll may be run in the context of a different user account and a different user session. Your inter process communication techniques must account for this, or your customers will ask you why your counters do not work outside of Performance Monitor.

↧

Case of the Unexplained Services exe Termination

January 30, 2013, 3:06 pm


Hello Debuggers! This is Ron Stock from the Global Escalation Services team and I recently worked an interesting case dispatched to our team because Services.exe was terminating. Nothing good ever happens when Services.exe exits. In this particular case, client RDP sessions were forcibly disconnected from the server and the server machine was shutting down unexpectedly. This is the message encountered at the console on the server.

The customer was able to trigger the crash by changing anything to do with a particular non-Microsoft service in the Services MMC (e.g. changing the Startup Type, changing the dependencies, stopping the service, etc.). To protect our vendor friend, I refer to this service as FriendlyService throughout this article. We could have stopped the investigation right here and implicated the FriendlyService vendor; however, as you will see, this service was merely a victim.

Investigation

When investigating a process termination, I routinely gather a process dump to start my investigation. The frustrating thing about this instance was how none of our debugging tools were generating a dump file when Services.exe terminated. I tried the usual toolset including AdPlus, ProcDump and DebugDiag to no avail. Despite the lack of data from a memory dump, I was still able to piece together a stack and attack the problem through a creative approach. Debugging is an art.

First I reviewed the application log which was loaded with entries like this one.

11/29/2012 04:29:05 PM Information HE2NTSP208 1004 Application Error N/A Faulting application services.exe, version 5.2.3790.4455, faulting module msvcr80.dll, version 8.0.50727.6195, fault address 0x000000000001df67.

From the event log entry above I was able to determine the faulting module, msvcr80.dll, and the relative offset, 0x000000000001df67. This address is basically the return address of the faulting call. As you can see from the ln output below, the function name was msvcr80!wcscpy_s. As I noted above, I wasn’t able to gather a dump during process termination so I had the customer use ProcDump to snap a dump of Services.exe during normal process operation (not a crash dump).

0:000> ln 00000000`78130000 + 0x000000000001df67
(00000000`7814ded0) msvcr80!wcscpy_s+0x97 | (00000000`7814df80) msvcr80!wcsncpy_s
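
The arithmetic behind this ln output is simple enough to check by hand. The sketch below uses only the values shown above: the module base passed to ln, the fault offset from the event log, and the two symbol addresses that ln printed.

```python
MODULE_BASE = 0x78130000        # base of msvcr80.dll used in the ln command
FAULT_OFFSET = 0x1DF67          # fault address from the event log entry
WCSCPY_S_START = 0x7814DED0     # msvcr80!wcscpy_s
WCSNCPY_S_START = 0x7814DF80    # msvcr80!wcsncpy_s, the next symbol

fault_address = MODULE_BASE + FAULT_OFFSET
inside_wcscpy_s = WCSCPY_S_START <= fault_address < WCSNCPY_S_START
offset_in_function = fault_address - WCSCPY_S_START

if __name__ == "__main__":
    print(hex(fault_address), inside_wcscpy_s, hex(offset_in_function))
```

The fault address falls between the two symbols, which is why ln reports it as msvcr80!wcscpy_s+0x97.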

Next I needed to determine which DLLs in the Services.exe process were calling msvcr80!wcscpy_s by reviewing the import tables of the binaries loaded in the process. I used the !peb command to dump out the Process Environment Block (PEB). This gave me the list of base addresses for each loaded DLL. I focused mainly on non-Microsoft binaries. To protect our vendor friends, I renamed the DLL to ThirdPartyServiceMonitor.Dll in this article.

This is the output from the !peb command with the base addresses in the left column.

0:000> !peb

PEB at 000007fffffde000

InheritedAddressSpace: No

ReadImageFileExecOptions: No

BeingDebugged: Yes

ImageBaseAddress: 0000000100000000

Ldr 0000000077fa9f20

Ldr.Initialized: Yes

Ldr.InInitializationOrderModuleList: 00000000000d2df0 . 00000000001a3600

Ldr.InLoadOrderModuleList: 00000000000d2d20 . 000000000019cbc0

Ldr.InMemoryOrderModuleList: 00000000000d2d30 . 000000000019cbd0

Base TimeStamp Module

100000000 49882047 Feb 03 04:45:27 2009 C:\WINDOWS\system32\services.exe

77ec0000 4ecbcd57 Nov 22 10:27:03 2011 C:\WINDOWS\system32\ntdll.dll

77d40000 49c51cdd Mar 21 11:59:09 2009 C:\WINDOWS\system32\kernel32.dll

7ff7fc00000 45d6ccae Feb 17 03:36:46 2007 C:\WINDOWS\system32\msvcrt.dll

7ff7fee0000 4a61f064 Jul 18 10:55:16 2009 C:\WINDOWS\system32\ADVAPI32.dll

7ff7fd30000 4c6ba77a Aug 18 04:27:22 2010 C:\WINDOWS\system32\RPCRT4.dll

7ff7e9c0000 4a37438e Jun 16 02:02:38 2009 C:\WINDOWS\system32\Secur32.dll

77c20000 45e7c5c2 Mar 02 00:35:46 2007 C:\WINDOWS\system32\USER32.dll

7ff7fc90000 490062ac Oct 23 06:40:28 2008 C:\WINDOWS\system32\GDI32.dll

7ff7c680000 45d6ccab Feb 17 03:36:43 2007 C:\WINDOWS\system32\USERENV.dll

7ff7c450000 45d6cc90 Feb 17 03:36:16 2007 C:\WINDOWS\system32\SCESRV.dll

7ff7e490000 45d6cc04 Feb 17 03:33:56 2007 C:\WINDOWS\system32\AUTHZ.dll

7ff77370000 4fedd464 Jun 29 11:14:28 2012 C:\WINDOWS\system32\NETAPI32.dll

7ff7c410000 45d6cca5 Feb 17 03:36:37 2007 C:\WINDOWS\system32\umpnpmgr.dll

7ff7d4d0000 45d6ccb8 Feb 17 03:36:56 2007 C:\WINDOWS\system32\WINSTA.dll

7ff65470000 45d6cc5c Feb 17 03:35:24 2007 E:\Program Files\ThirdPartyDirectory\ThirdParty2.dll

400000 424360e9 Mar 24 19:52:57 2005 C:\WINDOWS\system32\msvcp60.dll

7ff7d500000 45d6cc3b Feb 17 03:34:51 2007 C:\WINDOWS\system32\IMM32.DLL

e50000 4df9462e Jun 15 18:54:22 2011 E:\Program Files\ThirdPartyDirectory\ThirdPartyServiceMonitor.dll

Using the base address of ThirdPartyServiceMonitor, I dumped the header to find the Import Address Table Directory.

0:000> !dh 00000000`00e50000

1E000 [ 758] address [size] of Import Address Table Directory

Using the dps command I dumped all of the functions in the import table of ThirdPartyServiceMonitor.dll. I found msvcr80!wcscpy_s in the function list. This indicates ThirdPartyServiceMonitor.dll makes calls to msvcr80!wcscpy_s.

0:000> dps 00000000`00e50000+ 1E000 l758/@$ptrsize

00000000`00e6e458 00000000`7814d890 msvcr80!_wcsnicmp

00000000`00e6e460 00000000`7814db20 msvcr80!_wcsicmp

00000000`00e6e498 00000000`7814ded0 msvcr80!wcscpy_s
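
The range expression in the dps command (l758/@$ptrsize) turns the table size in bytes into a number of pointer-sized entries. As a quick sanity check, this sketch redoes that arithmetic with the values from the !dh and dps output, assuming 8-byte pointers because this is a 64-bit process.

```python
MODULE_BASE = 0x00E50000   # base of ThirdPartyServiceMonitor.dll from !peb
IAT_RVA = 0x1E000          # Import Address Table Directory address from !dh
IAT_SIZE = 0x758           # Import Address Table Directory size from !dh
POINTER_SIZE = 8           # @$ptrsize in a 64-bit process (assumption)

iat_start = MODULE_BASE + IAT_RVA
iat_end = iat_start + IAT_SIZE
entry_count = IAT_SIZE // POINTER_SIZE

WCSCPY_S_SLOT = 0x00E6E498  # slot that dps resolved to msvcr80!wcscpy_s
slot_in_table = iat_start <= WCSCPY_S_SLOT < iat_end
slot_index = (WCSCPY_S_SLOT - iat_start) // POINTER_SIZE

if __name__ == "__main__":
    print(hex(iat_start), hex(iat_end), entry_count, slot_in_table, slot_index)
```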

Since this was the only third party DLL with msvcr80!wcscpy_s in its import table, I was able to continue piecing together my stack. ThirdPartyServiceMonitor.dll was calling msvcr80!wcscpy_s and causing Services.exe to crash. At this point in the investigation, the stack looks like this:

msvcr80!wcscpy_s

ThirdPartyServiceMonitor+<offset>

In my quest to continue building my “conceptual” stack without a crash dump file, I reviewed the status code from the “System Shutdown” dialog displayed when Services.exe terminated. Notice the status code -1073741811 in the error. What the heck does that mean?

Well, I easily resolved the cryptic status code by passing 0n-1073741811 to the !error command in the debugger. The “0n” prefix indicates the value should be interpreted as decimal rather than hex by the debugger. I also included the negative symbol “-” because it appears in the status code.

0:030> !error 0n-1073741811

Error code: (NTSTATUS) 0xc000000d (3221225485) - An invalid parameter was passed to a service or function.
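
If you do not have a debugger handy, the same conversion can be done by treating the negative number as a 32-bit value. This is a minimal sketch of that arithmetic; it only reproduces the numeric forms shown in the !error output and does not look up the message text.

```python
def status_to_unsigned(signed_status):
    """Reinterpret a signed 32-bit status code as an unsigned 32-bit value."""
    return signed_status & 0xFFFFFFFF


if __name__ == "__main__":
    value = status_to_unsigned(-1073741811)
    print(f"{value:#010x} ({value})")
```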

Armed with the status code information, I reviewed the msvcr80!wcscpy_s assembly code to determine if this status code was returned at any point. In the assembly I found a call to a function named msvcr80!_invalid_parameter. In context of the message, “an invalid parameter was passed to a service or function”, this certainly sounds like the code path taken.

0:000> uf msvcr80!wcscpy_s

msvcr80!wcscpy_s+0x1a

18 00000000`7814deea e88184feff call msvcr80!_errno(00000000`78136370)

18 00000000`7814deef 4533c9 xor r9d,r9d

18 00000000`7814def2 4533c0 xor r8d,r8d

18 00000000`7814def5 33d2 xor edx,edx

18 00000000`7814def7 33c9 xor ecx,ecx

18 00000000`7814def9 48c744242000000000 mov qword ptr [rsp+20h],0

18 00000000`7814df02 c70016000000 mov dword ptr[rax],16h

18 00000000`7814df08 e873d1feff call msvcr80!_invalid_parameter (00000000`7813b080)

18 00000000`7814df0d b816000000 mov eax,16h

34 00000000`7814df12 4883c438 add rsp,38h

34 00000000`7814df16 c3 ret

It was reasonable to add this call to my conceptual stack because the call tree makes sense.

msvcr80!_invalid_parameter

msvcr80!wcscpy_s

ThirdPartyServiceMonitor+<offset>

Because I’m a curious kind of guy, I unassembled msvcr80!_invalid_parameter to peel back another layer of the onion. To my surprise I found a call to msvcr80!_imp_TerminateProcess. BOOM! This explains why the debugger wasn’t catching the process crash. The process was terminating ‘organically’ through a TerminateProcess call rather than crashing due to an exception, however it was unexpectedly terminating. In other words, all of the services running on the machine were not expecting Services.exe to terminate.

msvcr80!_invalid_parameter+0xd5

88 00000000`7813b155 ff156d200900 call qword ptr [msvcr80!_imp_GetCurrentProcess (00000000`781cd1c8)]

88 00000000`7813b15b ba0d0000c0 mov edx,0C000000Dh

88 00000000`7813b160 488bc8 mov rcx,rax

88 00000000`7813b163 ff1557200900 call qword ptr [msvcr80!_imp_TerminateProcess(00000000`781cd1c0)]

Now I was able to cobble together a fairly accurate stack without a dump file. At this point I could tell ThirdPartyServiceMonitor.dll was passing bad parameters to msvcr80!wcscpy_s. However, this didn’t explain how FriendlyService (mentioned in the blog introduction) was triggering the issue. I needed to go deeper with a live debug by leveraging the stack information I devised.

msvcr80!_imp_TerminateProcess

msvcr80!_invalid_parameter

msvcr80!wcscpy_s

ThirdPartyServiceMonitor+<offset>

On the customer’s server I attached Windbg to the Services.exe process and set a breakpoint on msvcr80!_invalid_parameter.

0:001> bp msvcr80!_invalid_parameter

Then I had the customer reproduce the issue by changing the startup type on FriendlyService in the Services MMC. As I mentioned above, this was one way to trigger the issue. BOOM! My breakpoint hit and looked exactly like the conceptual stack I pieced together. Now I was able to determine what ThirdPartyServiceMonitor was passing to msvcr80!wcscpy_s while broken in with the debugger.

0:035> k

Child-SP RetAddr Call Site

00000000`105deb00 00000000`7814df67 msvcr80!_invalid_parameter+0xe3

00000000`105df0c0 00000000`00e53045 msvcr80!wcscpy_s+0x97

00000000`105df100 00000000`00e5947e ThirdPartyServiceMonitor+0x3045

00000000`105df130 00000000`00e58405 ThirdPartyServiceMonitor+0x947e

00000000`105df180 000007ff`7fd69c75 ThirdPartyServiceMonitor+0x8405

00000000`105df1b0 000007ff`7fe9ccc9 rpcrt4!Invoke+0x65

00000000`105df200 000007ff`7fe9d58d rpcrt4!NdrStubCall2+0x54d

I reviewed the assembly of ThirdPartyServiceMonitor at the point in which it calls msvcr80!wcscpy_s. I discovered that the vendor hardcoded the size of the destination string buffer with 200 hex (512 decimal) while the size of the source string buffer was greater than 512 decimal.

0:035> ub 00000000`00e53045

ThirdPartyServiceMonitor+0x3020:

00000000`00e53020 898324040000 mov dword ptr [rbx+424h],eax

00000000`00e53026 488b4738 mov rax,qword ptr [rdi+38h]

00000000`00e5302a 4c8b4010 mov r8,qword ptr [rax+10h] <<<< Source Buffer

00000000`00e5302e 4d85c0 test r8,r8

00000000`00e53031 7412 je ThirdPartyServiceMonitor+0x3045 (00000000`00e53045)

00000000`00e53033 488d8b28040000 lea rcx,[rbx+428h] <<<< Destination Buffer

00000000`00e5303a ba00020000 mov edx,200h <<<<< Buffer Size hardcoded to 200 hex

00000000`00e5303f ff1553b40100 call qword ptr [ThirdPartyServiceMonitor!PlugControl+0xff38 (00000000`00e6e498)] <<<< call to msvcr80!wcscpy_s

00000000`00e53045 eb16 jmp ThirdPartyServiceMonitor +0x305d (00000000`00e5305d) <<<< return from msvcr80!wcscpy_s

0:035> .formats 200

Evaluate expression:

Hex: 00000000`00000200

Decimal: 512

Here is the definition of wcscpy_s from MSDN:

errno_t wcscpy_s(

wchar_t *strDestination, // Location of destination string buffer

size_t numberOfElements, // Size of the destination string buffer.

const wchar_t *strSource // Null-terminated source string buffer.

);

Here are the parameters passed to msvcr80!wcscpy_s. As a reminder, the x64 compiler uses rcx to pass the first parameter, rdx for the second, and r8 for the third. In this case the buffer size was the second parameter in rdx and the source buffer was in r8, the third parameter.

0:035> dq 00000000`00e6e498 l1

00000000`00e6e498 00000000`7814ded0

0:035> ln 00000000`7814ded0

(00000000`7814ded0) msvcr80!wcscpy_s | (00000000`7814df80) msvcr80!wcsncpy_s

Exact matches:

msvcr80!wcscpy_s

Dumping the source string in r8 showed the string was clearly longer than 512 characters.

0:035> dc 00000000`0103b438

00000000`0103b438 003a0000 0049005c 0074006e 00670065 ..:.\.F.r.e.i.n.

00000000`0103b448 00610072 00690074 006e006f 00670041 d.l.y.S.e.r.v.i.

00000000`0103b458 006e0065 005c0074 0069006c 005c0062 c.e.\.l.i.b.\..

0:035> dc

00000000`0103b738 00650070 002e0072 00700061 002e0070 p.e.r...a.p.p...

00000000`0103b748 00610070 00610072 0065006d 00650074 p.a.r.a.m.e.t.e.

00000000`0103b758 002e0072 003d0031 00770020 00610072 r...1.=. .w.r.a.

00000000`0103b768 00700070 00720065 0061002e 00700070 p.p.e.r...a.p.p.

00000000`0103b778 0070002e 00720061 006d0061 00740065 ..p.a.r.a.m.e.t.

00000000`0103b788 00720065 0032002e 0020003d 00720077 e.r...2.=. .w.r.

00000000`0103b798 00700061 00650070 002e0072 00700061 a.p.p.e.r...a.p.

00000000`0103b7a8 002e0070 00610070 00610072 0065006d p...p.a.r.a.m.e.

0:035> dc

00000000`0103b7b8 00650074 002e0072 003d0033 00770020 t.e.r...3.=. .w.

00000000`0103b7c8 00610072 00700070 00720065 0061002e r.a.p.p.e.r...a.

00000000`0103b7d8 00700070 0070002e 00720061 006d0061 p.p...p.a.r.a.m.

00000000`0103b7e8 00740065 00720065 0034002e 0020003d e.t.e.r...4.=. .

00000000`0103b7f8 00720077 00700061 00650070 002e0072 w.r.a.p.p.e.r...

00000000`0103b808 00700061 002e0070 00610070 00610072 a.p.p...p.a.r.a.

00000000`0103b818 0065006d 00650074 002e0072 003d0035 m.e.t.e.r...5.=.

00000000`0103b828 00770020 00610072 00700070 00720065 .w.r.a.p.p.e.r.

It was not hard to figure out that this string was the ImagePath value located under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\FriendlyService. You can see from the string that the service takes parameters, which pushed the length of the image path beyond 512 characters.

FriendlyService\lib\friendly.3\sbin\friendlywindows-x86-32.exe-s E:\ FriendlyService\conf\wrapper.conf set.DATA_APP=apia "set.DATA_APP_LONG= FriendlyServiceVendor" set.DATA_EXE=E:\ FriendlyService\bin\..\lib\friendly.3\bin set.DATA_HOME=E:\ FriendlyService\bin\..\lib\friendly.3 set.INSTALL_DIR=E:\ FriendlyService\bin\.. wrapper.working.dir=E:\ FriendlyService\bin\.. wrapper.app.parameter.1= wrapper.app.parameter.2= wrapper.app.parameter.3= wrapper.app.parameter.4= wrapper.app.parameter.5= wrapper.app.parameter.6= wrapper.app.parameter.7= wrapper.app.parameter.8= wrapper.app.parameter.9=

ThirdPartyServiceMonitor hooks any change made to a service (e.g. changing the Startup Type, changing the dependencies, or stopping the service). After hooking the change, ThirdPartyServiceMonitor.dll performs a string copy of the service’s image file path. In most cases this works like a champ. In this instance, however, the FriendlyService image path from vendor 1 is really long, and the ThirdPartyServiceMonitor from vendor 2 doesn’t account for service image paths exceeding 512 characters, so the copy into its 512-element buffer fails. This is the perfect storm! ThirdPartyServiceMonitor needs to remove the hardcoded buffer size and size the destination from the actual length of the path.
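To make the suggested fix concrete, here is a minimal sketch in portable C of a copy that sizes the destination from the source string instead of assuming 512 elements. The function name copy_image_path is hypothetical and is not taken from ThirdPartyServiceMonitor; the vendor's actual code is not available, so this only illustrates the general idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not from the original module): returns a heap copy
 * of src whose size is derived from the source itself rather than from a
 * fixed 512-element buffer. The caller frees the result. Returns NULL on
 * NULL input or allocation failure. */
wchar_t *copy_image_path(const wchar_t *src)
{
    if (src == NULL) {
        return NULL;
    }

    size_t len = wcslen(src);            /* characters, excluding the terminator */
    wchar_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }

    wmemcpy(dst, src, len + 1);          /* copies the terminator as well */
    assert(wcslen(dst) == len);          /* sanity check on the copy */
    return dst;
}
```

With the Microsoft CRT, the same approach would presumably keep wcscpy_s and simply pass len + 1 as numberOfElements, so the size check stays in place but reflects the real length of the path.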

This example occurred on Windows Server 2003. Starting in Windows 7 and Windows Server 2008 R2, Windows has added functionality for catching processes that exit ‘organically’ through a TerminateProcess call. You can find information on this added functionality on MSDN.
