How to Setup a Debug Crash Cart to Prevent Your Server from Flat Lining

January 31, 2013, 3:10 pm


This is Ron Stock from the Global Escalation Services team, and I recently had the task of live debugging a customer’s remote server. In debug circles we use what is known as a crash cart to live debug production servers. The phrase conjures up visions of a wheeled cabinet containing an emergency defibrillator, a heart monitor and latex gloves. Luckily for our purposes, the term merely denotes a machine set up with the Debugging Tools for Windows. This life-saving machine is attached to the ailing production server for debugging, and no medical degree is required.

The ailing production server is referred to as the Target Computer and the Debugging Tools for Windows are installed on the Host computer. The machines are attached with either a null-modem cable, 1394 cable, a special USB cable, or an ethernet cable (network debugging was added in Windows 8). Below I outline serial debugging because this is the most common technique. In future articles I plan to discuss configuring the other methods.

Serial Connection Setup

A null-modem cable is a serial cable used to send data between two serial ports and it can be cheaply purchased at most electronics stores. Be aware these are different from standard serial cables because the transmit and receive lines are cross linked.

Plug the null-modem cable into a serial port on each of the computers. The serial port on the target computer must be built into the system; add-on components such as PCI cards will not work for serial debugging on the target computer.

Target Computer setup

1. To enable debugging enter the following command from an elevated command prompt.

bcdedit /debug on

2. In most systems the default debug settings are sufficient. The current settings can be verified with the below command.

bcdedit /dbgsettings

3. Use the command below if you need to change the debug settings, where x is the number of the COM port connected to the null-modem cable on the target machine and 115200 is the baud rate used for debugging. Substitute a different rate if your connection requires one; 115200 is the usual choice.

bcdedit /dbgsettings serial debugport:x baudrate:115200

4. Reboot the target computer.

Host Computer setup

1. First install the Windows Debugging Tools on the host computer. Navigate to the Windows Software Development Kit (SDK) currently located at this link http://msdn.microsoft.com/en-US/windows/hardware/hh852363 and choose the download option.

a. If you are not able to install the SDK on the host computer, the Debugging Tools for Windows can be installed on a different system and the debugger directory can then be copied to the host computer.

2. Click Next until you see the Select the features you want to install screen.

3. Select only the option named Debugging Tools for Windows and click the Install button. I typically install the tools to a directory named C:\debugger

4. After the Windows Debugging Tools are installed I set my symbol path by setting the environment variable _NT_SYMBOL_PATH. I recommend setting it to the public symbol server SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols. If you prefer, you can specify any path in place of ‘c:\localsymbols’.

5. Open the debugger by running windbg.exe from the c:\debugger folder.

6. On the File menu, choose Kernel Debug.

7. In the Kernel Debugging dialog box, open the COM tab.

8. In the Baud rate box, enter the same rate you selected for the Target Machine in the steps above. This is usually 115200.

9. In the Port box, enter COMx where x is the COM port connected to the null modem cable on this computer. In my example I plugged my null modem cable to com port 1 so I typed com1 in the field.

a. It is not necessary to use the same port number on both the target and the host. For example, it is possible to use com1 on the target and com2 on the host.

10. Click OK and you’ll receive a message indicating the Host computer is waiting to connect.

11. From the Debug menu, select Break. This causes the debugger to break into the target machine and gives you the opportunity to debug your ailing production server. Good luck!
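The symbol path from step 4 of the host setup follows a fixed SRV*cache*server pattern, so it can be assembled programmatically if you configure several crash carts. The following is a minimal sketch; the helper name symbol_path is my own, and setting the variable through os.environ only affects the current process and the programs it launches.

```python
import os


def symbol_path(cache_dir: str, server_url: str) -> str:
    """Build a symbol path in the SRV*<local cache>*<symbol server> form."""
    return f"SRV*{cache_dir}*{server_url}"


# Values taken from step 4 of the host setup.
path = symbol_path(r"c:\localsymbols", "http://msdl.microsoft.com/download/symbols")

# This only changes the environment of this process and its children;
# a persistent setting still has to be made in the system settings.
os.environ["_NT_SYMBOL_PATH"] = path
print(path)
```

Any debugger started from this process would inherit the variable; a different cache directory can be passed as the first argument.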

↧

Debugging a Debugger to Debug a Dump

February 27, 2013, 9:30 am


Recently I came across an instance where my debugger did not do what I wanted. Rarely do computers disobey me, but this one was unusually stubborn. There was no other option; I had to bend the debugger to my will.

There are many ways to make a computer program do what you want. If you have the source code you can rewrite and recompile the program. If you have a hex editor you can edit the code of the binary. A shim can be used to modify a program at runtime. In this instance I was in a hurry and I was ok with a temporary solution, so I used a debugger to change the execution of the debugger while it ran.

Debug a debugger? Can you do such a thing? Of course you can.

In this example a memory dump was captured of a system and I was asked to determine if the system had run out of desktop heap. Usually the !dskheap command can be used to determine how much heap has been used by each desktop. Unfortunately, this command failed me.

23: kd> !dskheap

Error Reading TotalFreeSize from nt!_HEAP @ fffffa8019c65c00

Failed to GetHeapInfo for desktop @fffffa8019c65c00

EnumDsktps failed on Winsta: 19c4f090FillWinstaArray failed

The error indicates that the command couldn’t read from the _HEAP structure at fffffa8019c65c00 for desktop fffffa8019c65c00. Further investigation found that the reason I got this error is that the heap for the desktop in question is not valid memory. Because the memory is described by a prototype PTE, I assume that the heap has not been initialized (Note: See Windows Internals’ Memory Management chapter for more information about proto PTEs).

23: kd> dt win32k!tagDESKTOP fffffa8019c65c00

+0x000 dwSessionId : 0

+0x008 pDeskInfo : 0xfffff900`c05e0a70 tagDESKTOPINFO

+0x010 pDispInfo : 0xfffff900`c0581e50 tagDISPLAYINFO

+0x018 rpdeskNext : 0xfffffa80`19c6ef20 tagDESKTOP

+0x020 rpwinstaParent : 0xfffffa80`19c4f090 tagWINDOWSTATION

+0x028 dwDTFlags : 0x110

+0x030 dwDesktopId : 0x19c65c00`00000003

+0x038 spmenuSys : (null)

+0x040 spmenuDialogSys : (null)

+0x048 spmenuHScroll : (null)

+0x050 spmenuVScroll : (null)

+0x058 spwndForeground : (null)

+0x060 spwndTray : (null)

+0x068 spwndMessage : 0xfffff900`c05e0d90 tagWND

+0x070 spwndTooltip : 0xfffff900`c05e0fa0 tagWND

+0x078 hsectionDesktop : 0xfffff8a0`00ef08e0 Void

+0x080 pheapDesktop : 0xfffff900`c05e0000 tagWIN32HEAP

+0x088 ulHeapSize : 0x18000

+0x090 cciConsole : _CONSOLE_CARET_INFO

+0x0a8 PtiList : _LIST_ENTRY [ 0xfffffa80`19c65ca8 - 0xfffffa80`19c65ca8 ]

+0x0b8 spwndTrack : (null)

+0x0c0 htEx : 0n0

+0x0c4 rcMouseHover : tagRECT

+0x0d4 dwMouseHoverTime : 0

+0x0d8 pMagInputTransform : (null)

23: kd> dd 0xfffff900`c05e0000

fffff900`c05e0000 ???????? ???????? ???????? ????????

fffff900`c05e0010 ???????? ???????? ???????? ????????

fffff900`c05e0020 ???????? ???????? ???????? ????????

fffff900`c05e0030 ???????? ???????? ???????? ????????

fffff900`c05e0040 ???????? ???????? ???????? ????????

fffff900`c05e0050 ???????? ???????? ???????? ????????

fffff900`c05e0060 ???????? ???????? ???????? ????????

fffff900`c05e0070 ???????? ???????? ???????? ????????

23: kd> !pte fffff900`c05e0000

VA fffff900c05e0000

PXE at FFFFF6FB7DBEDF90 PPE at FFFFF6FB7DBF2018 PDE at FFFFF6FB7E403010 PTE at FFFFF6FC80602F00

contains 000000076245C863 contains 0000000762569863 contains 000000045FA17863 contains F8A000F4F9780400

pfn 76245c ---DA--KWEV pfn 762569 ---DA--KWEV pfn45fa17 ---DA--KWEV not valid

Proto: FFFFF8A000F4F978
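To see how the debugger arrives at the Proto address, the raw PTE value can be decoded by hand. The sketch below assumes the commonly described x64 layout (bit 0 is the valid bit, a valid entry keeps the page frame number in bits 12 to 47, and an invalid entry with bit 10 set is a prototype PTE whose address sits in bits 16 to 63). The function names are mine and the layout is an assumption, not something taken from the !pte source.

```python
def sign_extend_48(value: int) -> int:
    """Sign-extend a 48-bit value to a full 64-bit virtual address."""
    if value & (1 << 47):
        value |= 0xFFFF << 48
    return value


def decode_pte(pte: int) -> dict:
    """Decode only the fields needed for this example (assumed layout)."""
    if pte & 1:
        # Valid entry: page frame number assumed to be in bits 12-47.
        return {"valid": True, "pfn": (pte >> 12) & ((1 << 36) - 1)}
    if pte & (1 << 10):
        # Invalid entry with the prototype bit set.
        return {
            "valid": False,
            "prototype": True,
            "proto_address": sign_extend_48(pte >> 16),
        }
    return {"valid": False, "prototype": False}


pde = decode_pte(0x000000045FA17863)   # PDE value from the !pte output
pte = decode_pte(0xF8A000F4F9780400)   # PTE value from the !pte output
print(f"PDE pfn: {pde['pfn']:x}")
print(f"PTE proto address: {pte['proto_address']:016X}")
```

Under these assumptions the decoded values agree with the pfn and Proto fields that !pte printed.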

There are many desktops in this session and I wanted to know about the usage of the other desktops, but the !dskheap command stopped after just one error. I needed to force it to continue after this error, so I launched a debugger to debug my debugger. There is a command to do this: just run .dbgdbg.

23: kd> .dbgdbg

Debugger spawned, connect with

"-remotenpipe:icfenable,pipe=cdb_pipe,server=NINJA007"

For clarity I will call the original debugger where I ran !dskheap debugger1, and the new debugger spawned by .dbgdbg debugger2.

Before switching to debugger2 I need to know what I am going to debug. The error message gives a hint about where to set a breakpoint: I am looking for a failure from GetHeapInfo.

23: kd> !dskheap

Error Reading TotalFreeSize from nt!_HEAP @ fffffa8019c65c00

Failed to GetHeapInfo for desktop @fffffa8019c65c00

EnumDsktps failed on Winsta: 19c4f090FillWinstaArray failed

I need to know which module GetHeapInfo is in. The .extmatch command indicates which module contains the !dskheap command.

23: kd> .extmatch dskheap

!kdexts.dskheap

Switching to debugger2 I set a breakpoint on kdexts!GetHeapInfo. Use Ctrl+C to trigger a debug break in cdb (this is the same as a Ctrl+Break in windbg).

0:004> bp kdexts!GetHeapInfo

0:004> g

Switch back to debugger1 and run the !dskheap command.

23: kd> !dskheap

In debugger2 I have hit the breakpoint.

Breakpoint 0 hit

kdexts!GetHeapInfo:

000007f9`4237b9b0 4055 push rbp

The error says GetHeapInfo failed, so I am interested in what this function returns. To see what GetHeapInfo returns I go up one level in the stack and set a new breakpoint on the code just after it returns. This new breakpoint will also dump the return value of GetHeapInfo (on x64, integer and pointer return values are passed back in the rax register).

0:000> gu

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> r rax

rax=0000000000000000

0:000> bc *

0:000> bp 000007f9`4237b483 "r rax"

0:000> g

The next time the breakpoint hit the return value was 1, which in this instance means GetHeapInfo failed. This is where I exerted my control over the computer: I forced the return value to 0.

rax=0000000000000001

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> r rax=0

I ran through the other breakpoints and changed rax as necessary.

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000001

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> r rax=0

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

Everything was going well, until the computer defied me again. The !dskheap output computes the percentage of heap usage by dividing the bytes used by the size of the heap. Unfortunately, the size of the heap was left at 0 for the two heaps where I changed the return value. It is well known that only Chuck Norris can divide by zero; to prevent a roundhouse kick to your computer the processor generates an exception.

(2d0.928): Integer divide-by-zero - code c0000094 (first chance)

First chance exceptions are reported before any exception handling.

This exception may be expected and handled.

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> r r11

r11=0000000000000000

0:000> r rax

rax=0000000000000000

0:000> g

Fortunately debugger1 handles the divide by zero exception and it is easy to run !dskheap again.

23: kd> !dskheap

Back in debugger2 I set a new breakpoint on the div instruction that outputs the divisor. When the divisor (r11) is 0 I changed it to a non-zero value to avoid the wrath of Mr. Norris.

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> bp 000007f9`4237b90e

0:000> bp 000007f9`4237b90e "r r11"

breakpoint 1 redefined

0:000> g

rax=0000000000000001

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> r rax=0

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

rax=0000000000000001

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> r rax=0

0:000> g

rax=0000000000000000

kdexts!EnumDsktps+0x197:

000007f9`4237b483 4885c0 test rax,rax

0:000> g

r11=0000000000033333

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> g

r11=0000000000000000

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> r r11=1

0:000> g

r11=00000000000007ae

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> g

r11=0000000000013333

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> g

r11=0000000000013333

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> g

r11=0000000000013333

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> g

r11=0000000000000000

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> r r11=1

0:000> g

r11=0000000000013333

kdexts!DisplayInfo+0x2ee:

000007f9`4237b90e 49f7f3 div rax,r11

0:000> g

Finally, back in debugger1 I now have complete output for !dskheap. After a few strategic modifications of the program’s execution I got it to output the data I wanted. As it turns out we aren’t out of desktop heap after all.

23: kd> !dskheap

Error Reading TotalFreeSize from nt!_HEAP @ fffffa8019c65c00

Error Reading TotalFreeSize from nt!_HEAP @ fffffa801a53ea30

Winstation\Desktop Heap Size(KB) Used Rate(%)

------------------------------------------------------------

WinSta0\Default 20480 0%

WinSta0\Disconnect 0 0%

WinSta0\Winlogon 192 2%

Service-0x0-3e7$\Default 7680 1%

Service-0x0-3e4$\Default 7680 0%

Service-0x0-3e5$\Default 7680 0%

Service-0x0-26f46a$\Default 0 0%

Service-0x0-2706f2$\Default 7680 0%

------------------------------------------------------

Total Desktop: ( 51392 KB - 8 desktops)

Session ID: 0

============================================================
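The r11 values seen at the div instruction line up with the heap sizes in this table. The sketch below is an inference from the numbers in the transcript, not something confirmed from the kdexts source: it assumes the divisor is the heap size in bytes divided by 100, so that dividing the bytes used by it yields a percentage. The variable and function names are mine.

```python
# Heap sizes in KB, in the order they appear in the final !dskheap output.
heap_sizes_kb = [20480, 0, 192, 7680, 7680, 7680, 0, 7680]

# r11 values printed by the "r r11" breakpoint, in the order they were hit.
observed_r11 = [0x33333, 0x0, 0x7AE, 0x13333, 0x13333, 0x13333, 0x0, 0x13333]


def one_percent_of_heap(size_kb: int) -> int:
    """One hundredth of the heap size in bytes, using integer division."""
    return size_kb * 1024 // 100


computed = [one_percent_of_heap(kb) for kb in heap_sizes_kb]
for kb, got, seen in zip(heap_sizes_kb, computed, observed_r11):
    print(f"{kb:>6} KB -> divisor {got:#x} (observed {seen:#x})")

# Positions where the divisor is zero and the div instruction would fault.
zero_divisors = [i for i, d in enumerate(computed) if d == 0]
print("zero divisors at rows:", zero_divisors)
```

If the assumption holds, the two rows with a zero divisor are the same two desktops whose heaps could not be read, which is why r11 had to be patched exactly twice.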

↧

Understanding Pool Corruption Part 1 – Buffer Overflows

June 14, 2013, 1:50 pm


Before we can discuss pool corruption we must understand what pool is. Pool is kernel mode memory used as a storage space for drivers. Pool is organized in a similar way to how you might use a notepad when taking notes from a lecture or a book. Some notes may be 1 line, others may be many lines. Many different notes are on the same page.

Memory is also organized into pages; typically a page of memory is 4KB. The Windows memory manager breaks up this 4KB page into smaller blocks. One block may be as small as 8 bytes or possibly much larger. Each of these blocks exists side by side with other blocks.

The !pool command can be used to see the pool blocks stored in a page.

kd> !pool fffffa8003f42000

Pool page fffffa8003f42000 region is Nonpaged pool

*fffffa8003f42000 size: 410 previous size: 0 (Free) *Irp

Pooltag Irp : Io, IRP packets

fffffa8003f42410 size: 40 previous size: 410 (Allocated) MmSe

fffffa8003f42450 size: 150 previous size: 40 (Allocated) File

fffffa8003f425a0 size: 80 previous size: 150 (Allocated) Even

fffffa8003f42620 size: c0 previous size: 80 (Allocated) EtwR

fffffa8003f426e0 size: d0 previous size: c0 (Allocated) CcBc

fffffa8003f427b0 size: d0 previous size: d0 (Allocated) CcBc

fffffa8003f42880 size: 20 previous size: d0 (Free) Free

fffffa8003f428a0 size: d0 previous size: 20 (Allocated) Wait

fffffa8003f42970 size: 80 previous size: d0 (Allocated) CM44

fffffa8003f429f0 size: 80 previous size: 80 (Allocated) Even

fffffa8003f42a70 size: 80 previous size: 80 (Allocated) Even

fffffa8003f42af0 size: d0 previous size: 80 (Allocated) Wait

fffffa8003f42bc0 size: 80 previous size: d0 (Allocated) CM44

fffffa8003f42c40 size: d0 previous size: 80 (Allocated) Wait

fffffa8003f42d10 size: 230 previous size: d0 (Allocated) ALPC

fffffa8003f42f40 size: c0 previous size: 230 (Allocated) EtwR
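The listing above can be checked by hand: each block should begin where the previous one ends, and the sizes should add up to one 4KB page. Here is a minimal sketch of that check, with the offsets and sizes copied from the !pool output (the function name page_is_consistent is mine).

```python
PAGE_SIZE = 0x1000

# (offset within the page, block size in bytes), copied from the !pool output.
blocks = [
    (0x000, 0x410), (0x410, 0x040), (0x450, 0x150), (0x5A0, 0x080),
    (0x620, 0x0C0), (0x6E0, 0x0D0), (0x7B0, 0x0D0), (0x880, 0x020),
    (0x8A0, 0x0D0), (0x970, 0x080), (0x9F0, 0x080), (0xA70, 0x080),
    (0xAF0, 0x0D0), (0xBC0, 0x080), (0xC40, 0x0D0), (0xD10, 0x230),
    (0xF40, 0x0C0),
]


def page_is_consistent(blocks, page_size=PAGE_SIZE) -> bool:
    """Return True if the blocks are contiguous and exactly fill the page."""
    expected = 0
    for offset, size in blocks:
        if offset != expected:
            return False          # gap or overlap between neighbouring blocks
        expected = offset + size
    return expected == page_size


print(page_is_consistent(blocks))
```

This is the same chain the debugger walks using the size and previous size fields, which is why damage to one header can make the rest of the page unreadable.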

Because many pool allocations are stored in the same page, it is critical that every driver only use the space it has allocated. If DriverA uses more space than it allocated, it will write into the next driver’s space (DriverB) and corrupt DriverB’s data. This overwrite into the next driver’s space is called a buffer overflow. Later either the memory manager or DriverB will attempt to use this corrupted memory and will encounter unexpected information. This unexpected information typically results in a blue screen.

The NotMyFault application from Sysinternals has an option to force a buffer overflow. This can be used to demonstrate pool corruption. Choosing the “Buffer overflow” option and clicking “Crash” will cause a buffer overflow in pool. The system may not immediately blue screen after clicking the Crash button. The system will remain stable until something attempts to use the corrupted memory. Using the system will often eventually result in a blue screen.

NotMyFault

Often pool corruption appears as a stop 0x19 BAD_POOL_HEADER or stop 0xC2 BAD_POOL_CALLER. These stop codes make it easy to determine that pool corruption is involved in the crash. However, the results of accessing unexpected memory can vary widely, so pool corruption can result in many different types of bugchecks.

As with any blue screen dump analysis the best place to start is with !analyze -v. This command will display the stop code and parameters, and do some basic interpretation of the crash.

kd> !analyze -v

*******************************************************************************

* *

* Bugcheck Analysis *

* *

*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)

An exception happened while executing a system service routine.

Arguments:

Arg1: 00000000c0000005, Exception code that caused the bugcheck

Arg2: fffff8009267244a, Address of the instruction which caused the bugcheck

Arg3: fffff88004763560, Address of the context record for the exception that caused the bugcheck

Arg4: 0000000000000000, zero.

In my example the bugcheck was a stop 0x3B SYSTEM_SERVICE_EXCEPTION. The first parameter of this stop code is c0000005, which is a status code for an access violation. An access violation is an attempt to access invalid memory (this error is not related to permissions). Status codes can be looked up in the WDK header ntstatus.h.
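A status code such as c0000005 can also be taken apart by its bit fields before looking it up. The sketch below follows the documented NTSTATUS layout (severity in the top two bits, a customer bit, a 12-bit facility and a 16-bit code); the function name is mine.

```python
def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severities = ["SUCCESS", "INFORMATIONAL", "WARNING", "ERROR"]
    return {
        "severity": severities[(status >> 30) & 0x3],
        "customer": bool((status >> 29) & 0x1),
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }


fields = decode_ntstatus(0xC0000005)   # Arg1 from the bugcheck above
print(fields)
```

For c0000005 this gives an error severity with facility 0 and code 5, which is the entry to look for in ntstatus.h.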

The !analyze -v command also provides a helpful shortcut to get into the context of the failure.

CONTEXT: fffff88004763560 -- (.cxr 0xfffff88004763560;r)

Running this command shows us the registers at the time of the crash.

kd> .cxr 0xfffff88004763560

rax=4f4f4f4f4f4f4f4f rbx=fffff80092690460 rcx=fffff800926fbc60

rdx=0000000000000000 rsi=0000000000001000 rdi=0000000000000000

rip=fffff8009267244a rsp=fffff88004763f60 rbp=fffff8009268fb40

r8=fffffa8001a1b820 r9=0000000000000001 r10=fffff800926fbc60

r11=0000000000000011 r12=0000000000000000 r13=fffff8009268fb48

r14=0000000000000012 r15=000000006374504d

iopl=0 nv up ei pl nz na po nc

cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010206

nt!ExAllocatePoolWithTag+0x442:

fffff800`9267244a 4c8b4808 mov r9,qword ptr [rax+8] ds:002b:4f4f4f4f`4f4f4f57=????????????????

From the above output we can see that the crash occurred in ExAllocatePoolWithTag, which is a good indication that the crash is due to pool corruption. Often an engineer looking at a dump will stop at this point and conclude that a crash was caused by corruption. However, we can go further.

The instruction that we failed on was dereferencing rax+8. The rax register contains 4f4f4f4f4f4f4f4f, which does not fit with the canonical form required for pointers on x64 systems. This tells us that the system crashed because the data in rax is expected to be a pointer but it is not one.
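The canonical form check is simple enough to express in a few lines. This sketch assumes 48-bit virtual addressing, where bits 47 through 63 of a valid pointer must all be equal; the function name is mine.

```python
def is_canonical(address: int) -> bool:
    """Return True if bits 47 through 63 of a 64-bit address are all equal."""
    upper = (address >> 47) & 0x1FFFF      # the 17 bits that must match
    return upper == 0 or upper == 0x1FFFF


for value in (0x4F4F4F4F4F4F4F4F, 0x4F4F4F4F4F4F4F57, 0xFFFFFA8001A1B820):
    print(f"{value:016x} canonical: {is_canonical(value)}")
```

The value in rax and the address rax+8 both fail the check, while a genuine kernel pointer such as the one in r8 passes it.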

To determine why rax does not contain the expected data we must examine the instructions prior to where the failure occurred.

kd> ub .

nt!KzAcquireQueuedSpinLock [inlined in nt!ExAllocatePoolWithTag+0x421]:

fffff800`92672429 488d542440 lea rdx,[rsp+40h]

fffff800`9267242e 49875500 xchg rdx,qword ptr [r13]

fffff800`92672432 4885d2 test rdx,rdx

fffff800`92672435 0f85c3030000 jne nt!ExAllocatePoolWithTag+0x7ec (fffff800`926727fe)

fffff800`9267243b 48391b cmp qword ptr [rbx],rbx

fffff800`9267243e 0f8464060000 je nt!ExAllocatePoolWithTag+0xa94 (fffff800`92672aa8)

fffff800`92672444 4c8b03 mov r8,qword ptr [rbx]

fffff800`92672447 498b00 mov rax,qword ptr [r8]

The assembly shows that rax originated from the data pointed to by r8. The .cxr command we ran earlier shows that r8 is fffffa8001a1b820. If we examine the data at fffffa8001a1b820 we see that it matches the contents of rax, which confirms this memory is the source of the unexpected data in rax.

kd> dq fffffa8001a1b820 l1

fffffa80`01a1b820 4f4f4f4f`4f4f4f4f

To determine if this unexpected data is caused by pool corruption we can use the !pool command.

kd> !pool fffffa8001a1b820

Pool page fffffa8001a1b820 region is Nonpaged pool

fffffa8001a1b000 size: 810 previous size: 0 (Allocated) None

fffffa8001a1b810 doesn't look like a valid small pool allocation, checking to see

if the entire page is actually part of a large page allocation...

fffffa8001a1b810 is not a valid large pool allocation, checking large session pool...

fffffa8001a1b810 is freed (or corrupt) pool

Bad previous allocation size @fffffa8001a1b810, last size was 81

***

*** An error (or corruption) in the pool was detected;

*** Attempting to diagnose the problem.

***

*** Use !poolval fffffa8001a1b000 for more details.

Pool page [ fffffa8001a1b000 ] is __inVALID.

Analyzing linked list...

[ fffffa8001a1b000 --> fffffa8001a1b010 (size = 0x10 bytes)]: Corrupt region

Scanning for single bit errors...

None found

The above output does not look like the !pool command we used earlier. This output shows corruption to the pool header which prevented the command from walking the chain of allocations.

The above output shows that there is an allocation at fffffa8001a1b000 of size 810. If we look at the memory immediately following this allocation, at fffffa8001a1b000 + 810, we should see the pool header of the next block. Instead what we see is a pattern of 4f4f4f4f`4f4f4f4f.

kd> dq fffffa8001a1b000 + 810

fffffa80`01a1b810 4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f

fffffa80`01a1b820 4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f

fffffa80`01a1b830 4f4f4f4f`4f4f4f4f 00574f4c`46524556

fffffa80`01a1b840 00000000`00000000 00000000`00000000

fffffa80`01a1b850 00000000`00000000 00000000`00000000

fffffa80`01a1b860 00000000`00000000 00000000`00000000

fffffa80`01a1b870 00000000`00000000 00000000`00000000

fffffa80`01a1b880 00000000`00000000 00000000`00000000
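The values in this dump are printable ASCII, so it can help to view them as text. The sketch below repacks the first three rows of the dq output in little-endian order to recover the bytes as they sit in memory. Reading the result as a fill string written by the test driver is my interpretation, not something the dump itself states.

```python
import struct

# The first three rows of the dq output, as 64-bit values in display order.
qwords = [
    0x4F4F4F4F4F4F4F4F, 0x4F4F4F4F4F4F4F4F,
    0x4F4F4F4F4F4F4F4F, 0x4F4F4F4F4F4F4F4F,
    0x4F4F4F4F4F4F4F4F, 0x00574F4C46524556,
]

# x64 is little-endian, so pack each value low byte first to get memory order.
raw = b"".join(struct.pack("<Q", q) for q in qwords)

# Drop the trailing NUL terminator and decode the rest as ASCII.
text = raw.rstrip(b"\x00").decode("ascii")
print(text)
```

The same view is available in the debugger with the db or da commands, which display memory as bytes and as an ASCII string.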

At this point we can be confident that the system crashed because of pool corruption.

Because the corruption occurred in the past, and a dump is a snapshot of the current state of the system, there is no concrete evidence to indicate how the memory came to be corrupted. It is possible the driver that allocated the pool block immediately preceding the corruption is the one that wrote to the wrong location and caused this corruption. This pool block is marked with the tag “None”, so we can search for this tag in memory to determine which drivers use it.

kd> !for_each_module s -a @#Base @#End "None"

fffff800`92411bc2 4e 6f 6e 65 e9 45 04 26-00 90 90 90 90 90 90 90 None.E.&........

kd> u fffff800`92411bc2-1

nt!ExAllocatePool+0x1:

fffff800`92411bc1 b84e6f6e65 mov eax,656E6F4Eh

fffff800`92411bc6 e945042600 jmp nt!ExAllocatePoolWithTag (fffff800`92672010)

fffff800`92411bcb 90 nop
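The immediate 656E6F4Eh in the mov instruction is the tag itself. A pool tag is four ASCII characters stored as a 32-bit value, so on a little-endian machine the number reads backwards compared with the text. A small sketch of the conversion in both directions (the helper names are mine):

```python
import struct


def tag_from_dword(value: int) -> str:
    """Convert a 32-bit pool tag value to its four-character text."""
    return struct.pack("<I", value).decode("ascii")


def dword_from_tag(tag: str) -> int:
    """Convert four-character tag text to the 32-bit value seen in code."""
    return struct.unpack("<I", tag.encode("ascii"))[0]


print(tag_from_dword(0x656E6F4E))
print(hex(dword_from_tag("None")))
```

This is why searching module images for the ASCII string finds the instruction that loads the tag into eax.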

The file Pooltag.txt lists the pool tags used for pool allocations by kernel-mode components and drivers supplied with Windows, the associated file or component (if known), and the name of the component. Pooltag.txt is installed with Debugging Tools for Windows (in the triage folder) and with the Windows WDK (in \tools\other\platform\poolmon). Pooltag.txt shows the following for this tag:

None - <unknown> - call to ExAllocatePool

Unfortunately what we find is that this tag is used when a driver calls ExAllocatePool, which does not specify a tag. This does not allow us to determine what driver allocated the block prior to the corruption. Even if we could tie the tag back to a driver it may not be sufficient to conclude that the driver using this tag is the one that corrupted the memory.

The next step should be to enable special pool and hope to catch the corruptor in the act. We will discuss special pool in our next article.

↧

Understanding Pool Corruption Part 2 – Special Pool for Buffer Overruns

August 22, 2013, 12:21 pm


In our previous article we discussed pool corruption that occurs when a driver writes too much data in a buffer. In this article we will discuss how special pool can help identify the driver that writes too much data.

Pool is typically organized to allow multiple drivers to store data in the same page of memory, as shown in Figure 1. By allowing multiple drivers to share the same page, pool provides for an efficient use of the available kernel memory space. However, this sharing requires that each driver be careful in how it uses pool. Any bug where a driver uses pool improperly may corrupt the pool of other drivers and cause a crash.

Figure 1 – Uncorrupted Pool

With pool organized as shown in Figure 1, if DriverA allocates 100 bytes but writes 120 bytes it will overwrite the pool header and data stored by DriverB. In Part 1 we demonstrated this type of buffer overflow using NotMyFault, but we were not able to identify which code had corrupted the pool.

Figure 2 – Corrupted Pool

To catch the driver that corrupted pool we can use special pool. Special pool changes the organization of the pool so that each driver’s allocation is in a separate page of memory. This helps prevent drivers from accidentally writing to another driver’s memory. Special pool also configures the driver’s allocation at the end of the page and sets the next virtual page as a guard page by marking it as invalid. The guard page causes an attempt to write past the end of the allocation to result in an immediate bugcheck.

Special pool also fills the unused portion of the page with a repeating pattern, referred to as “slop bytes”. These slop bytes will be checked when the page is freed. If any errors are found in the pattern, a bugcheck will be generated to indicate that the memory was corrupted. This type of corruption is not a buffer overflow; it may be an underflow or some other form of corruption.

Figure 3 – Special Pool

Because special pool stores each pool allocation in its own 4KB page, it causes an increase in memory usage. When special pool is enabled the memory manager will configure a limit on how much special pool may be allocated on the system. When this limit is reached the normal pools will be used instead. This limitation may be especially pronounced on 32-bit systems, which have less kernel space than 64-bit systems.

Now that we have explained how special pool works, we should use it.

There are two methods to enable special pool. Driver verifier allows special pool to be enabled on specific drivers. The PoolTag registry value described in KB188831 allows special pool to be enabled for a particular pool tag. Starting in Windows Vista and Windows Server 2008, driver verifier captures additional information for special pool allocations so this is typically the recommended method.

To enable special pool using driver verifier, use the following command line or choose the option from the verifier GUI. Use the /driver flag to specify the drivers you want to verify. This is the place to list drivers you suspect as the cause of the problem. You may want to verify drivers you have written and want to test, or drivers you have recently updated on the system. In the command line below I am only verifying myfault.sys. A reboot is required to enable special pool.

verifier /flags 1 /driver myfault.sys

After enabling verifier and rebooting the system, repeat the activity that causes the crash. For some problems the activity may just be to wait for a period of time. For our demonstration we are running NotMyFault (see Part 1 for details).

The crash resulting from a buffer overflow in special pool will be a stop 0xD6, DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION.

kd> !analyze -v

*******************************************************************************

* *

* Bugcheck Analysis *

* *

*******************************************************************************

DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION (d6)

N bytes of memory was allocated and more than N bytes are being referenced.

This cannot be protected by try-except.

When possible, the guilty driver's name (Unicode string) is printed on

the bugcheck screen and saved in KiBugCheckDriver.

Arguments:

Arg1: fffff9800b5ff000, memory referenced

Arg2: 0000000000000001, value 0 = read operation, 1 = write operation

Arg3: fffff88004f834eb, if non-zero, the address which referenced memory.

Arg4: 0000000000000000, (reserved)

We can debug this crash and determine that myfault.sys wrote beyond its pool buffer.

The call stack shows that myfault.sys accessed invalid memory and this generated a page fault.

kd> k

Child-SP RetAddr Call Site

fffff880`04822658 fffff803`721333f1 nt!KeBugCheckEx

fffff880`04822660 fffff803`720acacb nt! ?? ::FNODOBFM::`string'+0x33c2b

fffff880`04822700 fffff803`7206feee nt!MmAccessFault+0x55b

fffff880`04822840 fffff880`04f834eb nt!KiPageFault+0x16e

fffff880`048229d0 fffff880`04f83727 myfault+0x14eb

fffff880`04822b20 fffff803`72658a4a myfault+0x1727

fffff880`04822b80 fffff803`724476c7 nt!IovCallDriver+0xba

fffff880`04822bd0 fffff803`7245c8a6 nt!IopXxxControlFile+0x7e5

fffff880`04822d60 fffff803`72071453 nt!NtDeviceIoControlFile+0x56

fffff880`04822dd0 000007fc`4fe22c5a nt!KiSystemServiceCopyEnd+0x13

00000000`004debb8 00000000`00000000 0x000007fc`4fe22c5a

The !pool command shows that the address being referenced by myfault.sys is special pool.

kd> !pool fffff9800b5ff000

Pool page fffff9800b5ff000 region is Special pool

fffff9800b5ff000: Unable to get contents of special pool block

The page table entry shows that the address is not valid. This is the guard page used by special pool to catch overruns.

kd> !pte fffff9800b5ff000

VA fffff9800b5ff000

PXE at FFFFF6FB7DBEDF98 PPE at FFFFF6FB7DBF3000 PDE at FFFFF6FB7E6002D0 PTE at FFFFF6FCC005AFF8

contains 0000000001B8F863 contains 000000000138E863 contains 000000001A6A1863 contains 0000000000000000

pfn 1b8f ---DA--KWEV pfn 138e ---DA--KWEV pfn 1a6a1 ---DA--KWEV not valid

The allocation prior to this memory is a 0x800 byte block of nonpaged pool tagged as “Wrap”. “Wrap” is the tag used by verifier when pool is allocated without a tag; it is the equivalent of the “None” tag we saw in Part 1.

kd> !pool fffff9800b5ff000-1000

Pool page fffff9800b5fe000 region is Special pool

*fffff9800b5fe000 size: 800 data: fffff9800b5fe800 (NonPaged) *Wrap

Owning component : Unknown (update pooltag.txt)
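The addresses in this output fit the special pool layout described earlier: the allocation is pushed to the end of its page and the next page is the guard page. The sketch below reproduces that arithmetic for this crash. It ignores any alignment rounding the memory manager may apply (none is needed for a 0x800 byte request), and the function and key names are mine.

```python
PAGE_SIZE = 0x1000


def special_pool_layout(page_base: int, size: int) -> dict:
    """Place an allocation at the end of its page (alignment rounding ignored)."""
    data_start = page_base + PAGE_SIZE - size
    return {
        "data_start": data_start,
        # Bytes in the page that precede the caller's data.
        "unused_before_data": data_start - page_base,
        # First address past the allocation; touching it faults immediately.
        "guard_page": page_base + PAGE_SIZE,
    }


layout = special_pool_layout(0xFFFFF9800B5FE000, 0x800)
for name, value in layout.items():
    print(f"{name}: {value:x}")
```

The computed data address matches the data field that !pool reported, and the guard page address matches Arg1 of the bugcheck, the first byte myfault.sys touched beyond its buffer.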

Special pool is an effective mechanism to track down buffer overflow pool corruption. It can also be used to catch other types of pool corruption which we will discuss in future articles.

↧

Missing System Writer Case Explained

August 27, 2013, 12:27 pm

≫ Next: ResAvail Pages and Working Sets

≪ Previous: Understanding Pool Corruption Part 2 – Special Pool for Buffer Overruns

I worked on a case the other day where all I had was a procmon log and event logs to troubleshoot a problem where the System Writer did not appear in the VSSADMIN LIST WRITERS output. This might be review for the folks that know this component pretty well but I figured I would share how I did it for those that are not so familiar with the System Writer.

WHAT WE KNOW:

System State Backups fail
Running VSSADMIN LIST WRITERS does not list the System Writer

Looking at the event logs I found the error shown below. This error indicates there was a failure while “Writer Exposing its Metadata Context”. Each writer is responsible for providing a list of files, volumes, and other resources it is designed to protect. This list is called metadata and is formatted as XML. In the example we are working with the error is “Unexpected error calling routine XML document is too long”. While helpful, this message alone does not provide us with a clear reason why the XML document is too long.

Event Type: Error

Event Source: VSS

Event ID: 8193

Description: Volume Shadow Copy Service error: Unexpected error calling routine XML document is too long. hr = 0x80070018, The program issued a command but the command length is incorrect. . Operation: Writer Exposing its Metadata Context: Execution Context: Requestor Writer Instance ID: {636923A0-89C2-4823-ADEF-023A739B2515} Writer Class Id: {E8132975-6F93-4464-A53E-1050253AE220} Writer Name: System Writer

The second event that was logged was also not very helpful as it only indicates that the writer did have a failure. It looks like we are going to need to collect more data to figure this out.

Event Type: Error

Event Source: VSS

Event ID: 8193

Description: Volume Shadow Copy Service error: Unexpected error calling routine CreateVssExamineWriterMetadata. hr = 0x80042302, A Volume Shadow Copy Service component encountered an unexpected error. Check the Application event log for more information. . Operation: Writer Exposing its Metadata Context: Execution Context: Requestor Writer Instance ID: {636923A0-89C2-4823-ADEF-023A739B2515} Writer Class Id: {E8132975-6F93-4464-A53E-1050253AE220} Writer Name: System Writer

From the error above we learned that there was an issue with the metadata file for the System Writer. These errors are among some of the most common issues seen with this writer. There are some not so well documented limitations within the writer due to some hard set limits on path depth and the number of files in a given path. These limitations are frequently exposed by the C:\Windows\Microsoft.Net\ path. Often, this path is used by development software like Visual Studio as well as server applications like IIS. Below I have listed a few known issues that should help provide some scope when troubleshooting System Writer issues.

Known limitations and common points of failure:

More than 1,000 folders in a folder causes writer to fail during OnIdentify
More than 10,000 files in a folder causes writer to fail during OnIdentify (frequently C:\Windows\Microsoft.Net)
Permissions issues (frequently in C:\Windows\WinSXS and C:\Windows\Microsoft.Net)
Permissions issues with COM+ Event System Service
- This service needs to be running and needs to have Network Service with Service User Rights

What data can I capture to help me find where the issue is?

The best place to start is with a Process Monitor (Procmon) capture. To prepare for this capture you will need to download Process Monitor, open the Services MMC snap-in, as well as open an administrative command prompt which will be used in a later step of the process.

You should first stop the Cryptographic Services service using the Services MMC.

Once the service is stopped, open Procmon; note that by default Procmon starts capturing as soon as it is opened. With Procmon open and capturing data, start the Cryptographic Services service. This will allow you to capture any errors during service initialization. Once the service is started, use the command prompt opened earlier to run “vssadmin.exe list writers”. This will signal the writers on the system to capture their metadata, which is a common place we see failures with the System Writer. When the vssadmin command completes, stop the Procmon capture and save this data to disk.

Now that we have data how do we find the errors?

Open your newly created Procmon file. First, add a new filter for the path field that contains “VSS\Diag”.

We do this because this is the registry path that all writers will log to when entering and leaving major functions. Now that we have our filtered view we need to look for an entry from the System Writer. You can see the highlighted line below shows the “IDENTIFY” entry for the System Writer. From here we can ascertain the PID of the svchost.exe that the system writer is running in. We now want to include only this svchost. To accomplish this you can right click on the PID for the svchost.exe and select “Include ‘(PID)’”.

Now that we have found the System Writer's svchost, we will want to remove the filter for “VSS\Diag”. To do that, return to the filter tool in Procmon and uncheck the box next to the entry.

We now have a complete view of what this service was doing at the time it started and completed the OnIdentify. Our next step is to locate the IDENTIFY (Leave) entry as this is often a great marker for where your next clue will be. While in most cases we can’t directly see the error the writer hit we can make some educated connections based on the common issues we spoke about above. If we take a look at the events that took place just before the IDENTIFY (Leave) we can see that we were working in the C:\Windows\Microsoft.NET\assembly\ directory. This is one of the paths that the System Writer is responsible for protecting. As mentioned above, there are some known limitations to the depth of paths and number of files in the “C:\Windows\Microsoft.NET” folder. This is a great example of that limitation as seen in our procmon capture. The example below shows the IDENTIFY (Leave) with the line before that being where the last work was taking place. Meaning this is what the writer was touching when it failed.

What does this tell us and what should we do next?

Given the known path limitations, we need to check out the number of files and folders in the C:\Windows\Microsoft.Net\ path and see where the bloat is. Some of these files can be safely removed, however only files located in the Temp locations (Temporary ASP.NET Files) are safe to delete.
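One way to find the bloat is to count the immediate files and subfolders of every directory under the path and flag the ones that exceed the limits listed earlier. The following is a minimal Python sketch of that idea; the function name find_bloated_folders is hypothetical, and the thresholds are simply the 1,000-folder and 10,000-file figures from the known limitations above.

```python
import os

# Thresholds taken from the known-limitations list earlier in this article.
MAX_SUBFOLDERS = 1_000
MAX_FILES = 10_000


def find_bloated_folders(root, max_subfolders=MAX_SUBFOLDERS, max_files=MAX_FILES):
    """Return (path, folder_count, file_count) for each directory over a limit."""
    offenders = []
    # Directories that cannot be read are skipped rather than aborting the scan.
    for current, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        if len(dirnames) > max_subfolders or len(filenames) > max_files:
            offenders.append((current, len(dirnames), len(filenames)))
    # Largest directories first.
    offenders.sort(key=lambda item: item[1] + item[2], reverse=True)
    return offenders


if __name__ == "__main__":
    for path, folders, files in find_bloated_folders(r"C:\Windows\Microsoft.NET"):
        print(f"{folders:>7} folders {files:>7} files  {path}")
```

The script only reports counts. Deciding what can be removed is still a manual step, and only the Temp locations mentioned above are safe to clean.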

Recently we released KB2807849 which addresses the issue shown above.

There are other possible causes of the event log errors mentioned above, such as issues with file permissions. For those problems, follow the same steps as above and you are likely to see the IDENTIFY (Leave) just after a file access error in your procmon log. For these failures you will need to investigate the permissions on the file the writer failed on. Most likely the file is missing permissions for the writer’s service account, Network Service or Local System. All that is needed here is to add back the missing permissions for the failed file.

While these issues can halt your nightly backups, it is often fairly easy to find the root cause. It just takes time and a little bit of experience with Process Monitor.

Good luck and successful backups!

↧

ResAvail Pages and Working Sets

September 4, 2013, 12:23 pm

≫ Next: Performance Monitor Averages, the Right Way and the Wrong Way

≪ Previous: Missing System Writer Case Explained

Hello everyone, I'm Ray and I'm here to talk a bit about a dump I recently looked at and a little-referenced memory counter called ResAvail Pages (resident available pages).

The problem statement was: The server hangs after a while.

Not terribly informative, but that's where we start with many cases. First some good housekeeping:

0: kd> vertarget

Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x64

Product: Server, suite: TerminalServer SingleUserTS

Built by: 7601.18113.amd64fre.win7sp1_gdr.130318-1533

Machine Name: "ASDFASDF1234"

Kernel base = 0xfffff800`01665000 PsLoadedModuleList = 0xfffff800`018a8670

Debug session time: Thu Aug 8 09:39:26.992 2013 (UTC - 4:00)

System Uptime: 9 days 1:08:39.307

Of course Windows 7 Server == Server 2008 R2.

One of the basic things I check at the beginning of these hang dumps with vague problem statements is the memory information.

0: kd> !vm 21

*** Virtual Memory Usage ***

Physical Memory: 2097038 ( 8388152 Kb)

Page File: \??\C:\pagefile.sys

Current: 12582912 Kb Free Space: 12539700 Kb

Minimum: 12582912 Kb Maximum: 12582912 Kb

Available Pages: 286693 ( 1146772 Kb)

ResAvail Pages: 135 ( 540 Kb)

********** Running out of physical memory **********

Locked IO Pages: 0 ( 0 Kb)

Free System PTEs: 33526408 ( 134105632 Kb)

******* 12 system cache map requests have failed ******

Modified Pages: 4017 ( 16068 Kb)

Modified PF Pages: 4017 ( 16068 Kb)

NonPagedPool Usage: 113241 ( 452964 Kb)

NonPagedPool Max: 1561592 ( 6246368 Kb)

PagedPool 0 Usage: 35325 ( 141300 Kb)

PagedPool 1 Usage: 28162 ( 112648 Kb)

PagedPool 2 Usage: 24351 ( 97404 Kb)

PagedPool 3 Usage: 24350 ( 97400 Kb)

PagedPool 4 Usage: 24516 ( 98064 Kb)

PagedPool Usage: 136704 ( 546816 Kb)

PagedPool Maximum: 33554432 ( 134217728 Kb)

********** 222 pool allocations have failed **********

Session Commit: 6013 ( 24052 Kb)

Shared Commit: 6150 ( 24600 Kb)

Special Pool: 0 ( 0 Kb)

Shared Process: 1214088 ( 4856352 Kb)

Pages For MDLs: 67 ( 268 Kb)

PagedPool Commit: 136768 ( 547072 Kb)

Driver Commit: 15548 ( 62192 Kb)

Committed pages: 1648790 ( 6595160 Kb)

Commit limit: 5242301 ( 20969204 Kb)

So we're failing to allocate pool, but we aren't out of virtual memory for paged pool or nonpaged pool. Let's look at the breakdown:

0: kd> dd nt!MmPoolFailures l?9

fffff800`01892160 000001be 00000000 00000000 00000002

fffff800`01892170 00000000 00000000 00000000 00000000

fffff800`01892180 00000000

Where, reading the nine DWORDs in order:

DWORDs 1-3 = Nonpaged high/medium/low priority failures

DWORDs 4-6 = Paged high/medium/low priority failures

DWORDs 7-9 = Session paged high/medium/low priority failures
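To make the breakdown easier to read, the nine values can be split into three groups of three. The following is a minimal Python sketch, assuming each line of dd output carries up to four space-separated DWORDs and that the counters are ordered nonpaged, paged, session paged, each with high/medium/low priority; the function name parse_pool_failures is made up for this illustration.

```python
# dd output for nt!MmPoolFailures, written with one DWORD per field.
DD_OUTPUT = """\
fffff800`01892160  000001be 00000000 00000000 00000002
fffff800`01892170  00000000 00000000 00000000 00000000
fffff800`01892180  00000000
"""

POOL_TYPES = ("Nonpaged", "Paged", "Session paged")
PRIORITIES = ("high", "medium", "low")


def parse_pool_failures(dd_text):
    """Map nine DWORDs of dd output to {pool type: {priority: failure count}}."""
    values = []
    for line in dd_text.splitlines():
        fields = line.split()
        # The first field on each line is the address; the rest are hex DWORDs.
        values.extend(int(field, 16) for field in fields[1:])
    if len(values) != len(POOL_TYPES) * len(PRIORITIES):
        raise ValueError(f"expected 9 DWORDs, got {len(values)}")
    return {
        pool: dict(zip(PRIORITIES, values[i * 3:i * 3 + 3]))
        for i, pool in enumerate(POOL_TYPES)
    }


failures = parse_pool_failures(DD_OUTPUT)
for pool, counts in failures.items():
    print(f"{pool:<14} {counts}")
```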

So we actually failed both nonpaged AND paged pool allocations in this case. Why? We're "Running out of physical memory", obviously. So where does this running out of physical memory message come from? In the above example this is from the ResAvail Pages counter.

ResAvail Pages is the amount of physical memory there would be if every working set was at its minimum size and only what needs to be resident in RAM was present (e.g. PFN database, system PTEs, driver images, kernel thread stacks, nonpaged pool, etc).

Where did this memory go then? We have plenty of Available Pages (Free + Zero + Standby) for use. So something is claiming memory it isn't actually using. In this type of situation one of the things I immediately suspect is process working set minimums. Working set basically means the physical memory used by a process.

So let's check.

0: kd> !process 0 1

<a lot of processes in this output>.

PROCESS fffffa8008f76060

SessionId: 0 Cid: 0adc Peb: 7fffffda000 ParentCid: 0678

DirBase: 204ac9000 ObjectTable: 00000000 HandleCount: 0.

Image: cscript.exe

VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

DeviceMap fffff8a000008a70

Token fffff8a0046f9c50

ElapsedTime 9 Days 01:08:00.134

UserTime 00:00:00.000

KernelTime 00:00:00.015

QuotaPoolUsage[PagedPool] 0

QuotaPoolUsage[NonPagedPool] 0

Working Set Sizes (now,min,max) (5, 50, 345) (20KB, 200KB, 1380KB)

PeakWorkingSetSize 1454

VirtualSize 65 Mb

PeakVirtualSize 84 Mb

PageFaultCount 1628

MemoryPriority BACKGROUND

BasePriority 8

CommitCharge 0

I have only shown one example process above for brevity's sake, but there were thousands returned. 241,423 to be precise. None had abnormally high process working set minimums, but cumulatively their usage adds up.

The “now” process working set is lower than the minimum working set. How is that possible? Well, the minimum and maximum are not hard limits, but suggested limits. For example, the minimum working set is honored unless there is memory pressure, in which case it can be trimmed below this value. There is a way to set the min and/or max as hard limits on specific processes by passing the QUOTA_LIMITS_HARDWS_MIN_ENABLE and/or QUOTA_LIMITS_HARDWS_MAX_ENABLE flags to SetProcessWorkingSetSizeEx.

You can check whether the minimum and maximum working set values are configured as hard limits in the _EPROCESS->Vm->Flags structure (the MinimumWorkingSetHard and MaximumWorkingSetHard bits). Note that these numbers are from another system, as this structure was already torn down for the processes we were looking at.

0: kd> dt _EPROCESS fffffa8008f76060 Vm

nt!_EPROCESS

+0x398 Vm : _MMSUPPORT

0: kd> dt _MMSUPPORT fffffa8008f76060+0x398

nt!_MMSUPPORT

+0x000 WorkingSetMutex : _EX_PUSH_LOCK

+0x008 ExitGate : 0xfffff880`00961000 _KGATE

+0x010 AccessLog : (null)

+0x018 WorkingSetExpansionLinks : _LIST_ENTRY [ 0x00000000`00000000 - 0xfffffa80`08f3c410 ]

+0x028 AgeDistribution : [7] 0

+0x044 MinimumWorkingSetSize : 0x32

+0x048 WorkingSetSize : 5

+0x04c WorkingSetPrivateSize : 5

+0x050 MaximumWorkingSetSize : 0x159

+0x054 ChargedWslePages : 0

+0x058 ActualWslePages : 0

+0x05c WorkingSetSizeOverhead : 0

+0x060 PeakWorkingSetSize : 0x5ae

+0x064 HardFaultCount : 0x41

+0x068 VmWorkingSetList : 0xfffff700`01080000 _MMWSL

+0x070 NextPageColor : 0x2dac

+0x072 LastTrimStamp : 0

+0x074 PageFaultCount : 0x65c

+0x078 RepurposeCount : 0x1e1

+0x07c Spare : [2] 0

+0x084 Flags : _MMSUPPORT_FLAGS

0: kd> dt _MMSUPPORT_FLAGS fffffa8008f76060+0x398+0x84

nt!_MMSUPPORT_FLAGS

+0x000 WorkingSetType : 0y000

+0x000 ModwriterAttached : 0y0

+0x000 TrimHard : 0y0

+0x000 MaximumWorkingSetHard : 0y0

+0x000 ForceTrim : 0y0

+0x000 MinimumWorkingSetHard : 0y0

+0x001 SessionMaster : 0y0

+0x001 TrimmerState : 0y00

+0x001 Reserved : 0y0

+0x001 PageStealers : 0y0000

+0x002 MemoryPriority : 0y00000000 (0)

+0x003 WsleDeleted : 0y1

+0x003 VmExiting : 0y1

+0x003 ExpansionFailed : 0y0

+0x003 Available : 0y00000 (0)
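If you only have the raw flags byte at offset +0x000 rather than the formatted dt output, the same bits can be decoded by hand. The following is a minimal Python sketch, assuming the bitfields are packed from the least significant bit upward in the order dt lists them for this build. The layout is taken from the output above and may differ on other Windows versions, and the function name is hypothetical.

```python
# (field name, width in bits) for the first byte of _MMSUPPORT_FLAGS,
# in the order shown by dt above. Assumed to start at bit 0.
FIRST_BYTE_LAYOUT = (
    ("WorkingSetType", 3),
    ("ModwriterAttached", 1),
    ("TrimHard", 1),
    ("MaximumWorkingSetHard", 1),
    ("ForceTrim", 1),
    ("MinimumWorkingSetHard", 1),
)


def decode_first_flags_byte(value):
    """Split one byte into the bitfields listed in FIRST_BYTE_LAYOUT."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte")
    decoded = {}
    shift = 0
    for name, width in FIRST_BYTE_LAYOUT:
        decoded[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return decoded


# The process above has all of these bits clear, so neither limit is hard.
print(decode_first_flags_byte(0x00))
# A hypothetical value with both hard-limit bits set, for comparison.
print(decode_first_flags_byte(0xA0))
```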

How about some more detail?

0: kd> !process fffffa8008f76060

PROCESS fffffa8008f76060

SessionId: 0 Cid: 0adc Peb: 7fffffda000 ParentCid: 0678

DirBase: 204ac9000 ObjectTable: 00000000 HandleCount: 0.

Image: cscript.exe

VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

DeviceMap fffff8a000008a70

Token fffff8a0046f9c50

ElapsedTime 9 Days 01:08:00.134

UserTime 00:00:00.000

KernelTime 00:00:00.015

QuotaPoolUsage[PagedPool] 0

QuotaPoolUsage[NonPagedPool] 0

Working Set Sizes (now,min,max) (5, 50, 345) (20KB, 200KB, 1380KB)

PeakWorkingSetSize 1454

VirtualSize 65 Mb

PeakVirtualSize 84 Mb

PageFaultCount 1628

MemoryPriority BACKGROUND

BasePriority 8

CommitCharge 0

No active threads

0: kd> !object fffffa8008f76060

Object: fffffa8008f76060 Type: (fffffa8006cccc90) Process

ObjectHeader: fffffa8008f76030 (new version)

HandleCount: 0 PointerCount: 1

The highlighted information shows us that this process has no active threads left but the process object itself (and its 20KB working set use) were still hanging around because a kernel driver had a reference to the object that it never released. Sampling other entries shows the server had been leaking process objects since it was booted.

Unfortunately trying to directly track down pointer leaks on process objects is difficult and requires an instrumented kernel, so we tried to check the easy stuff first before going that route. We know it has to be a kernel driver doing this (since it is a pointer and not a handle leak) so we looked at the list of 3rd party drivers installed. Note: The driver names have been redacted.

0: kd> lm

start end module name

<snip>

fffff880`04112000 fffff880`04121e00 driver1 (no symbols) <-- no symbols usually means 3rd party

fffff880`04158000 fffff880`041a4c00 driver2 (no symbols)

<snip>

0: kd> lmvm driver1

Browse full module list

start end module name

fffff880`04112000 fffff880`04121e00 driver1 (no symbols)

Loaded symbol image file: driver1.sys

Image path: \SystemRoot\system32\DRIVERS\driver1.sys

Image name: driver1.sys

Browse all global symbols functions data

Timestamp: Wed Dec 13 12:09:32 2006 (458033CC)

CheckSum: 0001669E

ImageSize: 0000FE00

Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4

0: kd> lmvm driver2

Browse full module list

start end module name

fffff880`04158000 fffff880`041a4c00 driver2 (no symbols)

Loaded symbol image file: driver2.sys

Image path: \??\C:\Windows\system32\drivers\driver2.sys

Image name: driver2.sys

Browse all global symbols functions data

Timestamp: Thu Nov 30 12:12:07 2006 (456F10E7)

CheckSum: 0004FE8E

ImageSize: 0004CC00

Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4

Fortunately for both the customer and us we turned up a pair of drivers that predated Windows Vista (meaning they were designed for XP/2003) that raised an eyebrow. Of course we need a more solid evidence link than just "it's an old driver", so I did a quick search of our internal KB. This turned up several other customers who had these same drivers installed, experienced the same problem, then removed them and the problem went away. That sounds like a pretty good evidence link. We implemented the same plan for this customer successfully.

↧

Performance Monitor Averages, the Right Way and the Wrong Way

September 30, 2013, 7:09 am

≫ Next: Debugging a Generation 2 Virtual Machine

≪ Previous: ResAvail Pages and Working Sets

Performance Monitor (perfmon) is the preferred tool to measure the performance of Windows systems. The perfmon tool provides an analysis view with a chart and metrics of the Last, Average, Minimum, and Maximum values.

There are scenarios where the line in the chart is the most valuable piece of information, such as a memory leak. Other times we are not looking for a trend, and the Last, Average, Minimum, and Maximum metrics are what matter. One example where the metrics are valuable is when evaluating average disk latency over a period of time. In this article we are going to use disk latency counters to illustrate how metrics are calculated for performance counters. The concepts we will illustrate with disk latency apply to all performance counters. This article will not be a deep dive into understanding disk latency; there are already many sources of information on that topic.

Most performance counter metrics are pretty straightforward. The minimum and maximum metrics are self-explanatory. The last metric is the last entry in the data. The metric that is confusing is the average. When calculating averages it is important to consider the cardinality of the data. This is especially important when working with data that is already an average, such as the Avg. Disk sec/Read counter which displays the average time per each read from a disk.

Perfmon logs are gathered at a specific time interval, such as every 15 seconds. At every interval the counters are read and an entry is written to the log. In this interval there may have been many reads from the disk, a few reads, or none at all. The number of reads performed is a critical aspect of the average calculation; this is the cardinality of the data.

Consider the following 10 entries in a perfmon log:

1 read took 150 ms

0 reads took 0 ms (this entry repeats for the other nine intervals)

Often, averages are calculated by adding a column of numbers and dividing by the number of entries. However this calculation does not work for the above data. If we simply add and divide we get an average latency of 15ms (150 / 10) per read, but this is clearly incorrect. There has been 1 read performed and it took 150ms, therefore the average latency is 150ms per read. Depending on the system configuration, an average read latency of less than 20ms may be considered fast and more than 20ms may be considered slow. If we perform the calculation incorrectly we may believe the disk is performing adequately while the correct calculation shows the disk is actually very slow.
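The difference between the two calculations is easy to reproduce. The following is a minimal Python sketch, assuming the log holds one interval with a single 150 ms read and nine intervals with no reads, which is consistent with the 150 / 10 arithmetic above; the function names are made up for this illustration.

```python
# (reads completed, total seconds spent on those reads) for each log interval.
# Assumption: one interval with a single 150 ms read, nine idle intervals.
intervals = [(1, 0.150)] + [(0, 0.0)] * 9


def naive_average(samples):
    """Average the per-interval averages. This ignores how many reads occurred."""
    per_interval = [(total / reads if reads else 0.0) for reads, total in samples]
    return sum(per_interval) / len(per_interval)


def weighted_average(samples):
    """Divide total time by total reads, so each read counts exactly once."""
    total_reads = sum(reads for reads, _ in samples)
    total_time = sum(total for _, total in samples)
    return total_time / total_reads if total_reads else 0.0


print(f"naive average   : {naive_average(intervals) * 1000:.0f} ms per read")
print(f"weighted average: {weighted_average(intervals) * 1000:.0f} ms per read")
```

The naive result lands on the "fast" side of a 20 ms threshold while the weighted result lands well on the "slow" side, which is exactly the misreading described above.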

What data is used to calculate averages?

Let’s take a look at the data perfmon is working with. Perfmon stores data in two different structures. Formatted values are stored as PDH_FMT_COUNTERVALUE. Raw values are stored as PDH_RAW_COUNTER.

Formatted values are just plain numbers. They contain only the result of calculating the average of one or more raw values, but not the raw data used to obtain that calculation. Data stored in a perfmon CSV or TSV file is already formatted, which means they contain a column of floating point numbers. If our previous example was stored in a CSV or TSV we would have the following data:

0.15

0.00 (repeated for the remaining entries)

The above numbers contain no information about how many reads were performed over the course of this log. Therefore it is impossible to calculate an accurate average from these numbers. That is not to say CSV and TSV files are worthless, there are many performance scenarios (such as memory leaks) where the average is not important.

Raw counters contain the raw performance information, as delivered by the performance counter to pdh.dll. In the case of Avg. Disk sec/Read the FirstValue contains the total time for all reads and the SecondValue contains the total number of reads performed. This information can be used to calculate the average while taking into consideration the cardinality of the data.

Again using the above example, the raw data would look like this:

FirstValue: 0

SecondValue: 0

FirstValue: 2147727

SecondValue: 1

FirstValue: 2147727

SecondValue: 1

…

At first glance the above raw data does not resemble our formatted data at all. In order to calculate the average we need to know what the correct algorithm is. The Avg. Disk sec/Read counter is of type PERF_AVERAGE_TIMER and the average calculation is ((N_x - N_0) / F) / (D_x - D_0), where the subscripts 0 and x denote the earlier and later samples. N refers to FirstValue in the raw counter data, F refers to the number of ticks per second, and D refers to SecondValue. Ticks per second can be obtained from the PerformanceFrequency parameter of KeQueryPerformanceCounter; in my example it is 14318180.

Using the algorithm for PERF_AVERAGE_TIMER the calculation for the formatted values would be:

((2147727 - 0) / 14318180) / (1 - 0) = 0.15

((2147727 - 2147727) / 14318180) / (1 - 1) = 0*

…

*If the denominator is 0 there is no new data and the result is 0.

Because the raw counter contains both the number of reads performed during each interval and the time it took for these reads to complete, we can accurately calculate the average for many entries.
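For illustration only, the calculation above can be written out in a few lines. The following is a minimal Python sketch using the raw values and tick frequency from this example. It is not a substitute for PdhComputeCounterStatistics, and the function name average_timer is hypothetical.

```python
TICKS_PER_SECOND = 14318180  # performance frequency from the example above

# (FirstValue, SecondValue) pairs from the raw data listing above:
# cumulative ticks spent on reads, cumulative number of reads.
raw_samples = [(0, 0), (2147727, 1), (2147727, 1)]


def average_timer(earlier, later, frequency=TICKS_PER_SECOND):
    """Apply ((N_x - N_0) / F) / (D_x - D_0) to two raw samples."""
    delta_ticks = later[0] - earlier[0]
    delta_reads = later[1] - earlier[1]
    if delta_reads == 0:
        return 0.0  # no new reads in this span, so the result is reported as 0
    return (delta_ticks / frequency) / delta_reads


# Formatted value for each interval (each sample compared with the previous one).
per_interval = [average_timer(a, b) for a, b in zip(raw_samples, raw_samples[1:])]

# Average over the whole log: compare the first raw sample with the last one.
overall = average_timer(raw_samples[0], raw_samples[-1])

print("per interval:", [round(value, 4) for value in per_interval])
print("whole log   :", round(overall, 4))
```

Because the raw values are cumulative, comparing the first and last samples weights every read equally, no matter how many idle intervals fall in between.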

If you’ve taken the time to read this far you may be wondering why I have taken the time to explain such a mundane topic. It is important to explain how this works because many performance tools are not using the correct average calculation and many users are trying to calculate averages using data that is not appropriate for such calculations (such as CSV and TSV files). Programmers should use PdhComputeCounterStatistics to calculate averages and should not sum and divide by the count or duplicate the calculations described in MSDN.

Recently we have found that under some conditions perfmon will use the incorrect algorithm to calculate averages. When reading from log files perfmon has been formatting the values, summing them, and dividing by the number of entries. This issue has been corrected in perfmon for Windows 8/Server 2012 with KB2877211 and for Windows 8.1/Server 2012 R2 as part of KB2883200. We recommend using these fixes when analyzing perfmon logs to determine the average of a performance counter. Note that KB2877211/KB2883200 only change the behavior when analyzing logs, there is no change when the data is collected. This means you can collect performance logs from any version of Windows and analyze them on a system with these fixes installed.

↧

Debugging a Generation 2 Virtual Machine

October 24, 2013, 11:11 am

≫ Next: Great power. Great responsibility.

≪ Previous: Performance Monitor Averages, the Right Way and the Wrong Way

Hyper-V is based on the 440BX (PCI) chipset for emulation. The decision to use this chipset started years ago with Connectix Virtual PC. The advantage of using an emulated chipset based on a popular motherboard like the 440BX, along with associated peripherals, is the compatibility with a large number of operating systems.

Windows Server 2012 R2 introduced the Generation 2 Virtual Machine. It is a UEFI based design, removing emulated devices and replacing them with synthetic devices. Generation 2 VMs no longer support the following devices:

Legacy BIOS
COM Ports
Floppy Controller
DMA Controller
i8042 keyboard controller
PS/2 devices
Legacy NIC
IDE Controller
S3 video
PCI BUS
Programmable Interrupt Controller
Programmable Interrupt Timer
Super I/O Device

After reading this list you might ask the question – how do I debug a Generation 2 VM?

The COM port is not actually removed from a Generation 2 VM. The port is turned off by default and not present in the user interface. To enable it for debugging use the following steps.

1. Shut down the VM. You can verify the VM is off using the below PowerShell command.

2. Turn off secure boot using the following PowerShell Command.

set-vmfirmware

3. Set a COM port path using the following PowerShell command, where the path is the name of the named pipe.

set-vmcomport

4. To confirm the COM port settings after making the change, use the following command.

get-vmcomport

5. Restart the Virtual Machine using the following command.

Start-VM -Name VM2

6. Inside the guest VM, you can confirm that Secure Boot has been disabled with the following command. The result is False if Secure Boot was successfully disabled in step 2 above.

Confirm-SecureBootUEFI

7. Enable Kernel Debugging using BCDEdit.

BCDEdit /debug ON

8. Configure the debugger to connect to the pipe:

KernelDebug1

9. Connect the debugger and break in with Ctrl+Break:

KernelDebug2
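The steps above can be collected into one place. The following is a minimal Python sketch that only assembles the command lines as text; it does not run anything. The VM name VM2 comes from step 5, while the pipe name DebugPipe, the COM port number 1, and the function name are assumptions made for this illustration. The parameters shown (-EnableSecureBoot, -Number, -Path, and the windbg com:pipe connection string) are the commonly documented forms, so check them against your own host before use.

```python
def gen2_debug_commands(vm_name, pipe_name, com_port=1):
    """Return the command lines for each stage of the setup as a dict of lists."""
    # Named pipe path as seen from the Hyper-V host.
    pipe_path = "\\\\.\\pipe\\" + pipe_name
    return {
        # Run in an elevated PowerShell session on the Hyper-V host.
        "host_powershell": [
            f"Stop-VM -Name {vm_name}",
            f"Get-VM -Name {vm_name}",
            f"Set-VMFirmware -VMName {vm_name} -EnableSecureBoot Off",
            f"Set-VMComPort -VMName {vm_name} -Number {com_port} -Path {pipe_path}",
            f"Get-VMComPort -VMName {vm_name}",
            f"Start-VM -Name {vm_name}",
        ],
        # Run inside the guest from an elevated prompt.
        "guest": [
            "Confirm-SecureBootUEFI",
            "bcdedit /debug on",
        ],
        # Run on the host to attach the kernel debugger to the pipe.
        "debugger": [
            f"windbg -k com:pipe,port={pipe_path},resets=0,reconnect",
        ],
    }


if __name__ == "__main__":
    for stage, commands in gen2_debug_commands("VM2", "DebugPipe").items():
        print(f"[{stage}]")
        for command in commands:
            print("  " + command)
```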

↧

Great power. Great responsibility.

October 29, 2013, 9:35 am

≫ Next: We Are Hiring Windows Escalation Engineers in Charlotte and Issaquah

≪ Previous: Debugging a Generation 2 Virtual Machine

When it comes to the registry, administrators are given great power to manually configure Windows to suit their needs, but even slight, seemingly innocuous changes to a particular key or value can have a drastic impact on basic operations of the system, even affecting its ability to boot properly.

I recently had the pleasure of debugging a black-screen system hang that occurred after applying security updates and rebooting. After ruling out any “low-hanging fruit” such as deadlocks on executive resources, resource depletion, etc., I decided to survey how far along the boot had gotten. In the output below we can tell that it’s fairly early in the boot process and that session zero is currently being set up.

When a new session is created, the “Session Leader” (i.e. the smss.exe instance not associated with a particular session) launches a new instance of smss.exe. This new instance is tasked with ensuring that the Windows subsystem gets set up properly, which includes loading and initializing win32k.sys and launching csrss.exe.

3: kd> !process 0 0

**** NT ACTIVE PROCESS DUMP ****

PROCESS 8e282840 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000

DirBase: 00122000 ObjectTable: 97801e18 HandleCount: 554.

Image: System

PROCESS 94153ad8 SessionId: none Cid: 0390 Peb: 7ffdf000 ParentCid: 0004

DirBase: 03368020 ObjectTable: a13564f0 HandleCount: 19.

Image: smss.exe

PROCESS 92a55d90 SessionId: 0 Cid: 03c8 Peb: 7ffd9000 ParentCid: 0390

DirBase: 03368040 ObjectTable: a95606e0 HandleCount: 10.

Image: smss.exe

PROCESS 92a56c48 SessionId: 0 Cid: 03d4 Peb: 7ffd9000 ParentCid: 03c8

DirBase: 03368060 ObjectTable: a959ea28 HandleCount: 30.

Image: csrss.exe

So let’s dump out the threads for these session zero processes and see what they’re doing:

1. Notice how the Session Manager thread has been waiting for more than fifteen minutes for the Windows subsystem to load and initialize.

3: kd> !process /s 0 0 0x17

Searching processes with session id 0

**** NT ACTIVE PROCESS DUMP ****

PROCESS 92a55d90 SessionId: 0 Cid: 03c8 Peb: 7ffd9000 ParentCid: 0390

DirBase: 03368040 ObjectTable: a95606e0 HandleCount: 10.

Image: smss.exe

VadRoot 941e0578 Vads 8 Clone 0 Private 21. Modified 535. Locked 0.

DeviceMap 97808b98

Token a95767b8

ElapsedTime 00:15:32.714

UserTime 00:00:00.000

KernelTime 00:00:00.000

QuotaPoolUsage[PagedPool] 6952

QuotaPoolUsage[NonPagedPool] 384

Working Set Sizes (now,min,max) (125, 50, 345) (500KB, 200KB, 1380KB)

PeakWorkingSetSize 125

VirtualSize 2 Mb

PeakVirtualSize 4 Mb

PageFaultCount 120

MemoryPriority BACKGROUND

BasePriority 8

CommitCharge 29

THREAD 92a5d030 Cid 03c8.03cc Teb: 7ffdf000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable

941df930 SynchronizationEvent

92a56c48 ProcessObject

Not impersonating

DeviceMap 97808b98

Owning Process 92a55d90 Image: smss.exe

Attached Process N/A Image: N/A

Wait Start TickCount 1896 Ticks: 59778 (0:00:15:32.542)

Context Switch Count 94 IdealProcessor: 0

UserTime 00:00:00.000

KernelTime 00:00:00.046

Win32 Start Address smss!NtProcessStartupW (0x4857d9a2)

Stack Init 9dc89000 Current 9dc888c0 Base 9dc89000 Limit 9dc86000 Call 0

Priority 9 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5

Kernel stack not resident.

ChildEBP RetAddr Args to Child

9dc888d8 81eb923a 92a5d030 97cb5120 92a5d0b8 nt!KiSwapContext+0x26

9dc8891c 81eb4bca 92a5d030 00000000 00000002 nt!KiSwapThread+0x44f

9dc88970 82040e83 00000002 9dc88aa8 00000001 nt!KeWaitForMultipleObjects+0x53d

9dc88bfc 82040bf2 00000002 00000001 00000000 nt!ObpWaitForMultipleObjects+0x256

9dc88d48 81e57c96 00000002 0008fb38 00000001 nt!NtWaitForMultipleObjects+0xcc

9dc88d48 778d5d14 00000002 0008fb38 00000001 nt!KiSystemServicePostCall

0008fac4 778d54a0 4857cc7e 00000002 0008fb38 ntdll!KiFastSystemCallRet

0008fac8 4857cc7e 00000002 0008fb38 00000001 ntdll!NtWaitForMultipleObjects+0xc

0008fb40 48579296 0008fb78 0008fb68 0008fbb0 smss!SmscpLoadSubSystem+0x9b

0008fb80 4857ca8a 0008fbb0 00000000 00000000 smss!SmpExecuteCommand+0x8d

0008fbc4 4857d0bc 00000000 00000000 00000000 smss!SmscpLoadSubSystemsForMuSession+0x182

0008fbe8 4857b678 00000003 002417d8 00000000 smss!SmscMain+0xc2

0008fc7c 4857d988 00000003 002417d8 002417e8 smss!wmain+0x50

0008fcc0 77886885 00241898 779bde2d 00000000 smss!NtProcessStartupW_AfterSecurityCookieInitialized+0x221

0008fd00 778b15d6 4857d9a2 7ffd9000 ffffffff ntdll!__RtlUserThreadStart+0x35

0008fd18 00000000 4857d9a2 7ffd9000 00000000 ntdll!_RtlUserThreadStart+0x1b

2. Also, we can see that there’s a single active thread within the csrss.exe process, which is a red flag because we know that csrss.exe hosts the Desktop Thread and Raw Input Thread, among others.

The user-mode portion of the Windows subsystem is implemented in csrss.exe and associated “ServerDlls” such as csrsrv.dll, winsrv.dll, basesrv.dll and, on Windows 7 and later, sxssrv.dll. Also, csrss.exe hosts the Desktop thread and Raw Input thread, whose primary functions include handling inputs from the various input devices.

PROCESS 92a56c48 SessionId: 0 Cid: 03d4 Peb: 7ffd9000 ParentCid: 03c8

DirBase: 03368060 ObjectTable: a959ea28 HandleCount: 30.

Image: csrss.exe

VadRoot 9391c128 Vads 33 Clone 0 Private 193. Modified 60. Locked 0.

DeviceMap 97808b98

Token a9598b30

ElapsedTime 00:15:32.558

UserTime 00:00:00.000

KernelTime 00:00:03.182

QuotaPoolUsage[PagedPool] 48312

QuotaPoolUsage[NonPagedPool] 1584

Working Set Sizes (now,min,max) (582, 50, 345) (2328KB, 200KB, 1380KB)

PeakWorkingSetSize 7285

VirtualSize 23 Mb

PeakVirtualSize 48 Mb

PageFaultCount 49628

MemoryPriority BACKGROUND

BasePriority 13

CommitCharge 248

THREAD 942c5590 Cid 03d4.03e4 Teb: 00000000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable

915e4078 NotificationEvent

Not impersonating

DeviceMap 97808b98

Owning Process 92a56c48 Image: csrss.exe

Attached Process N/A Image: N/A

Wait Start TickCount 2516 Ticks: 59158 (0:00:15:22.870)

Context Switch Count 1 IdealProcessor: 0

UserTime 00:00:00.000

KernelTime 00:00:00.000

Win32 Start Address ati2mtag!IRQMGR_WorkerThreadRoutine (0xa1ccf340)

Stack Init 9dcfd000 Current 9dcfcc30 Base 9dcfd000 Limit 9dcfa000 Call 0

Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5

ChildEBP RetAddr Args to Child

9dcfcc48 81eb923a 942c5590 942c5618 00000000 nt!KiSwapContext+0x26

9dcfcc8c 81e54f38 942c5590 00000000 942c5590 nt!KiSwapThread+0x44f

9dcfcce4 a1d78724 915e4078 00000000 00000000 nt!KeWaitForSingleObject+0x492

9dcfcd00 a1c13340 908be398 915e4070 00000000 VIDEOPRT!VideoPortWaitForSingleObject+0x53

9dcfcd14 a1cce17f 908be398 915e4070 00000000 ati2mtag!IRQMgrMP_WaitForSingleObject+0x20

9dcfcd6c a1ccf355 93f45000 93f45000 9dcfcdc0 ati2mtag!PassiveRing_WorkerThreadRoutine+0x6f

9dcfcd7c 81fe301c 93f45000 ad8fc28d 00000000 ati2mtag!IRQMGR_WorkerThreadRoutine+0x15

9dcfcdc0 81e4beee a1ccf340 93f45000 00000000 nt!PspSystemThreadStartup+0x9d

00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

3. Having seen this, we now know why the system is perpetually hung: Csrss.exe is not running properly. Because there is a video driver worker thread running, but the Desktop Thread and Raw Input Thread are not running, it appears that csrss has attempted to terminate. The termination has not completed because of the video-driver worker thread performing a non-alertable wait.

4. Next, we need to check for any state in the dump that might tell us why csrss.exe attempted to terminate:

3: kd> dt nt!eprocess 92a56c48 LastThreadExitStatus

+0x184 LastThreadExitStatus : 0n-1073741619

3: kd> !error 0n-1073741619

Error code: (NTSTATUS) 0xc00000cd (3221225677) - The name limit for the local computer network adapter card was exceeded.

After a quick search for STATUS_TOO_MANY_NAMES (0xc00000cd) through the source code, I was able to theorize that csrss.exe may have attempted the termination due to invalid command-line parameters.
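
The debugger prints LastThreadExitStatus as a signed decimal, which hides the familiar NTSTATUS form. As a small sketch (the function name is my own, not part of any debugger API), the conversion that !error performs for us can be reproduced like this:

```python
def ntstatus_to_unsigned(signed_value: int) -> int:
    """Reinterpret a signed 32-bit decimal as the unsigned NTSTATUS value."""
    return signed_value & 0xFFFFFFFF

# Value taken from the dt nt!eprocess output above.
status = ntstatus_to_unsigned(-1073741619)
print(f"0x{status:08x}")  # prints 0xc00000cd
```

Masking with 0xFFFFFFFF is enough here because the field is a 32-bit value.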

3: kd> vertarget

Windows Server 2008/Windows Vista Kernel Version 6002 (Service Pack 2) MP (16 procs) Free x86 compatible

Product: Server, suite: Enterprise TerminalServer SingleUserTS

Built by: 6002.18881.x86fre.vistasp2_gdr.130707-1535

Machine Name:

Kernel base = 0x81e0d000 PsLoadedModuleList = 0x81f24c70

Debug session time: Fri Oct 25 05:10:34.030 2013 (UTC - 5:00)

System Uptime: 0 days 0:16:02.134

3: kd> .process /p /r 92a56c48

Implicit process is now 92a56c48

Loading User Symbols

.............

3: kd> !peb

PEB at 7ffd9000

…

CommandLine: 'C:\Windows\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,1024 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16'

Sure enough, there was an additional command-line parameter that is not recognized on Vista/Windows Server 2008 SP2 (it is supported only on Windows 7 and later). Once the invalid command-line parameter was removed, the server was able to boot normally again.

So how did the invalid value get there? It turns out that a logon script was setting the following registry value using an export from a Windows 7/Windows Server 2008 R2 machine, where ServerDll=sxssrv,4 is a valid value.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems

Name: Windows

Type: REG_EXPAND_SZ
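
A check along the following lines could flag this kind of mismatch before the value is pushed to a server. This is only a sketch: the set of expected server DLLs is my assumption for Vista/Windows Server 2008, based on the ServerDll list described earlier, and the function names are made up for illustration.

```python
import re

# Command line copied from the !peb output above.
COMMAND_LINE = (
    r"C:\Windows\system32\csrss.exe ObjectDirectory=\Windows "
    "SharedSection=1024,20480,1024 Windows=On SubSystemType=Windows "
    "ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 "
    "ServerDll=winsrv:ConServerDllInitialization,2 ServerDll=sxssrv,4 "
    "ProfileControl=Off MaxRequestThreads=16"
)

# Assumption for illustration: server DLLs expected on Vista / Server 2008.
EXPECTED_SERVER_DLLS = {"basesrv", "winsrv"}

def server_dlls(command_line):
    """Return the DLL name of every ServerDll= argument, in order."""
    return re.findall(r"ServerDll=([A-Za-z0-9_]+)", command_line)

def unexpected_server_dlls(command_line, expected=EXPECTED_SERVER_DLLS):
    """Return ServerDll names that are not in the expected set."""
    return [name for name in server_dlls(command_line)
            if name.lower() not in expected]

print(unexpected_server_dlls(COMMAND_LINE))  # prints ['sxssrv']
```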

Well, that concludes today’s segment, but in the timeless words of Uncle Ben remember “with great power comes great responsibility.” As we just saw, this applies not only to those possessing a spider-sense, but also to Windows administrators. :)

Until next time, happy debugging!

↧

We Are Hiring Windows Escalation Engineers in Charlotte and Issaquah

October 30, 2013, 2:41 pm

Would you like to join the world’s best and most elite debuggers to enable the success of Microsoft solutions?

As a trusted advisor to our top customers you will be working with the most experienced IT professionals and developers in the industry. You will influence our product teams in sustained engineering efforts to drive improvements in our products.

This role involves deep analysis of product source code and debugging to solve problems in multi-million dollar configurations and will give you an opportunity to stretch your critical thinking skills. During the course of debugging, you will uncover opportunities to improve the customer experience while influencing the current and future design of our products.

In addition to providing support to customers while being the primary interface to our sustained engineering teams, you will also have the opportunity to work with new technologies and unreleased software. Through our continuous investment in in-depth training and hands-on experience with tough customer challenges you will become the world’s best in this area. Expect to partner with many different roles at Microsoft while launching a very successful career!

We have positions open at our sites in Charlotte, NC USA; and Issaquah, WA USA.

Learn more about what an Escalation Engineer does at:

Profile: Ron Stock, CTS Escalation Engineer - Microsoft Customer Service & Support - What is CSS?

Microsoft JobsBlog JobCast with Escalation Engineer Jeff Dailey

Microsoft JobsBlog JobCast with Escalation Engineer Scott Oseychik

Apply here:

Charlotte: http://www.microsoft-careers.com/job/Charlotte-Escalation-Engineer-Job-NC-28201/23321500/

Issaquah: https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=8&jid=122974&jlang=EN&pp=SS

↧

The Compiler Did What?

November 7, 2013, 4:07 pm

I was recently investigating a crash in an application. As I researched the issue I found a very old defect in the code that was only recently being exposed by the compiler.

The crash occurred at the below instruction because the ebx register does not hold a valid pointer.

0:001> r

eax=d9050cf7 ebx=003078c0 ecx=6e2e0000 edx=00000000 esi=00000001 edi=0c334468

eip=65637fbe esp=010eb408 ebp=010eb878 iopl=0 nv up ei pl nz na po nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

riched20!CTxtSelection::CreateCaret+0x429:

65637fbe 8b4b1c mov ecx,dword ptr [ebx+1Ch] ds:002b:003078dc=????????

0:001> dd 003078c0

003078c0 ???????? ???????? ???????? ????????

003078d0 ???????? ???????? ???????? ????????

003078e0 ???????? ???????? ???????? ????????

003078f0 ???????? ???????? ???????? ????????

00307900 ???????? ???????? ???????? ????????

00307910 ???????? ???????? ???????? ????????

00307920 ???????? ???????? ???????? ????????

00307930 ???????? ???????? ???????? ????????

Examining the assembly leading up to the crash, ebx came from [ebp-40c].

0:001> ub .

riched20!CTxtSelection::CreateCaret+0x408:

65637f9d 6a08 push 8

65637f9f ff156cf06465 call dword ptr [riched20!_imp__CreateBitmap (6564f06c)]

65637fa5 898784000000 mov dword ptr [edi+84h],eax

65637fab eb06 jmp riched20!CTxtSelection::CreateCaret+0x41e (65637fb3)

65637fad 8bb5e4fbffff mov esi,dword ptr [ebp-41Ch]

65637fb3 8b9df4fbffff mov ebx,dword ptr [ebp-40Ch]

65637fb9 ff775c push dword ptr [edi+5Ch]

65637fbc 6a01 push 1

0:001> dd @ebp-40c l1

010eb46c 003078c0

Looking at the whole function, [ebp-40c] was populated at the beginning of the function as the contents of edi+1C. The contents of edi+1Ch were first moved into ecx and later the value of ecx was moved into [ebp-40Ch]. Further examination of the whole function showed the edi register is unchanged at the time of the crash, so I can use its current value to determine what [ebp-40c] should contain.

0:001> uf riched20!CTxtSelection::CreateCaret

riched20!CTxtSelection::CreateCaret:

65637b95 8bff mov edi,edi

65637b97 55 push ebp

65637b98 8bec mov ebp,esp

65637b9a 81ec5c040000 sub esp,45Ch

65637ba0 a100e06465 mov eax,dword ptr [riched20!__security_cookie (6564e000)]

65637ba5 33c5 xor eax,ebp

65637ba7 8945fc mov dword ptr [ebp-4],eax

65637baa 53 push ebx

65637bab 56 push esi

65637bac 57 push edi

65637bad 8bf9 mov edi,ecx

65637baf 8b4f1c mov ecx,dword ptr [edi+1Ch] <<< The value originates from [edi+1Ch]

65637bb2 0fbf4740 movsx eax,word ptr [edi+40h]

65637bb6 898df4fbffff mov dword ptr [ebp-40Ch],ecx <<< Store the value on the stack

<snip>

65637fb3 8b9df4fbffff mov ebx,dword ptr [ebp-40Ch] <<< Read the value from the stack

<snip>

65637fbe 8b4b1c mov ecx,dword ptr [ebx+1Ch] <<< Crash here because ebx is invalid

<snip>

The expected value of [ebp-40C], and thus the expected value of the ebx register, is 091978c0 based on the value in [edi+1Ch] at the time of the crash. This would be a valid pointer, but it is not what is currently in [ebp-40C] or ebx. It is noteworthy that at the time of the crash ebx is similar to what should be there: it differs only in the high word of the dword.

0:001> r ebx

ebx=003078c0

0:001> dd @edi+1c l1

0c334484 091978c0
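
To make the "differs only in the high word" observation concrete, here is a quick sketch that compares the two values from the debugger output above. The variable names are mine.

```python
expected = 0x091978C0  # value read from [edi+1Ch]
actual = 0x003078C0    # value found in ebx at the time of the crash

diff = expected ^ actual                 # bits that differ between the two
low_word_matches = (diff & 0xFFFF) == 0  # low 16 bits are identical
high_word_actual = actual >> 16          # what now sits in the high word

print(f"xor = 0x{diff:08x}")                    # prints xor = 0x09290000
print(low_word_matches)                         # prints True
print(f"high word = 0x{high_word_actual:04x}")  # prints high word = 0x0030
```

A corruption confined to exactly one 16-bit half of a pointer is a strong hint that something wrote a WORD-sized value over it.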

The expected value, 091978c0, is a valid pointer.

0:001> dd 091978c0

091978c0 091978c8 00000000 00000501 05000000

091978d0 00000015 076c1a27 2a372f35 0c2e3998

091978e0 000049aa 00000000 00000000 00000000

091978f0 00000000 00000000 00000000 00000000

09197900 00000000 00000000 00000000 00000000

09197910 00000000 00000000 00000000 00000000

09197920 1a3098a8 00000000 00000000 00000000

09197930 00000000 00000000 00000000 00000000

Somehow the value at ebp-40C was changed between instruction 65637bb6, where [ebp-40C] was set, and instruction 65637fb3 where [ebp-40C] was read. Fortunately I had a mechanism to reproduce this crash so I was able to set a breakpoint and trace through how this happened.

First I set a breakpoint on the instruction that populates [ebp-40C].

0:003> bp 65637bb6

0:003> g

Breakpoint 0 hit

eax=ffffffff ebx=0c334468 ecx=091978c0 edx=00000060 esi=091978c0 edi=0c334468

eip=65637bb6 esp=010eb410 ebp=010eb878 iopl=0 nv up ei pl nz na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206

riched20!CTxtSelection::CreateCaret+0x21:

65637bb6 898df4fbffff mov dword ptr [ebp-40Ch],ecx ss:002b:010eb46c=00000000

Next I calculated ebp-40C and set a break-on-write access breakpoint on that address.

0:001> ?@ebp-40c

Evaluate expression: 17740908 = 010eb46c

0:001> ba w4 010eb46c

0:001> g

Breakpoint 1 hit

eax=00000030 ebx=00000000 ecx=00000000 edx=00000020 esi=00000001 edi=0c334468

eip=65637f67 esp=010eb40c ebp=010eb878 iopl=0 nv up ei pl zr na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

riched20!CTxtSelection::CreateCaret+0x3d2:

65637f67 66898475f4fbffff mov word ptr [ebp+esi*2-40Ch],ax ss:002b:010eb46e=0919

The write breakpoint hit at a location I was not expecting. The instruction where the breakpoint hit is not modifying the variable that was stored at [ebp-40C].

Although I cannot share the Windows source code on this blog, the code in question roughly resembles the below example. Note that a proficient assembly language reader could figure out the code flow, so this example is not sharing any magic.

Struct1* p1;

WORD array[512];

…

p1 = GetStruct1();

…

array[i-2] = 0x30;

…

p1->p = variable2; // Crash here because p1 is not a valid pointer

…

We are crashing because p1 is not a valid pointer. The high word of p1 is being overwritten as 0030 by the line “array[i-2] = 0x30;” because i is 1, leading to an underflow of the array. This underflow is corrupting the pointer in p1.

0:001> r ebx

ebx=003078c0
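
The underflow can be modeled without touching real memory. The sketch below is a simulation under my own assumptions, not the product code: a small little-endian byte buffer stands in for the stack frame, with p1 placed directly below the array as in the crashing binary. The guarded variant shows one possible bounds check and is not necessarily how the product was fixed.

```python
import struct

P1_OFFSET = 0     # models [ebp-40Ch], where p1 is spilled
ARRAY_OFFSET = 4  # models [ebp-408h], where array[0] lives

def make_frame(p1_value):
    """Build a tiny little-endian stack frame: p1 followed by four WORDs."""
    frame = bytearray(4 + 2 * 4)
    struct.pack_into("<I", frame, P1_OFFSET, p1_value)
    return frame

def store_word(frame, i, value):
    """Model 'array[i-2] = value' with no bounds check."""
    struct.pack_into("<H", frame, ARRAY_OFFSET + 2 * (i - 2), value)

def store_word_checked(frame, i, value):
    """Same store, but skipped when it would land before array[0]."""
    if i >= 2:
        store_word(frame, i, value)

def read_p1(frame):
    """Read p1 back from the frame as a 32-bit value."""
    return struct.unpack_from("<I", frame, P1_OFFSET)[0]

frame = make_frame(0x091978C0)
store_word(frame, 1, 0x30)         # i == 1 writes array[-1]
print(f"{read_p1(frame):08x}")     # prints 003078c0

safe = make_frame(0x091978C0)
store_word_checked(safe, 1, 0x30)  # the guarded store is skipped
print(f"{read_p1(safe):08x}")      # prints 091978c0
```

The first result matches the bad ebx value seen in the crash.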

Clearly there is a defect in the above code. If it is legitimate for i to be 1 (and it is), then a check must be made to prevent an underflow of the array. However, further research found that this code has been consistent for many years and many releases of the product. Why is this suddenly crashing now? As the bank robber in Dirty Harry said, “I gots to know.”

In the above assembly we calculate that “array” starts at ebp-408 (assuming i is always 2 or greater, 2*2-40c is -408). In the earlier assembly we see that p1 is placed at ebp-40c. In this configuration an underflow of “array” will always corrupt p1.

Examining the assembly on a system that does not crash, I found that the local variables are stored differently in a different version of this binary. In the beginning of the function we see that p1 is stored in ebx. In this version of the binary ebx is never stored on the stack, so it cannot be corrupted by an underflow.

0:000> uf riched20!CTxtSelection::CreateCaret

riched20!CTxtSelection::CreateCaret:

74e75c53 8bff mov edi,edi

74e75c55 55 push ebp

74e75c56 8bec mov ebp,esp

74e75c58 81ec58040000 sub esp,458h

74e75c5e a19010e974 mov eax,dword ptr [riched20!__security_cookie (74e91090)]

74e75c63 53 push ebx

74e75c64 56 push esi

74e75c65 8bf1 mov esi,ecx

74e75c67 8b5e1c mov ebx,dword ptr [esi+1Ch]

The code that populates array[i-2] with 0x30 is later in the function. In this version, array is stored at ebp-404. If there is an underflow it will corrupt ebp-408.

riched20!CTxtSelection::CreateCaret+0x3e1:

74e76034 66c7847df8fbffff3000 mov word ptr [ebp+edi*2-408h],30h

The value stored at ebp-408 is used in several places in this function; however, it is never used after instruction 74e76034 executes. This means any underflow in the array only corrupts memory that is not used after the corruption, and as a result the corruption never results in a crash. Although this defect has existed for a long time, the compiler has protected us until now.

74e75d3f 0b85f8fbffff or eax,dword ptr [ebp-408h]

…

74e75e51 ffb5f8fbffff push dword ptr [ebp-408h]

…

74e75e8a 8b8df8fbffff mov ecx,dword ptr [ebp-408h]

…

74e75f20 398df8fbffff cmp dword ptr [ebp-408h],ecx

…

74e75fec 8b85f8fbffff mov eax,dword ptr [ebp-408h]
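
The difference between the two builds comes down to which local variable sits just below the array. As a rough sketch (the helper name is mine, and it assumes ebp is 4-byte aligned, which it is in the dumps above), the displacement from each store instruction can be used to work out which dword an underflow with i equal to 1 lands in:

```python
def clobbered_dword(array_base_disp, i):
    """Return the ebp-relative displacement of the dword hit by the store.

    array_base_disp is the constant in the store instruction's operand,
    i.e. the disp in [ebp + reg*2 + disp], and i is the index register.
    """
    byte_disp = array_base_disp + 2 * i  # displacement of the WORD written
    return byte_disp & ~3                # round down to the containing dword

# Crashing build: mov word ptr [ebp+esi*2-40Ch],ax
print(hex(clobbered_dword(-0x40C, 1)))   # prints -0x40c

# Older build: mov word ptr [ebp+edi*2-408h],30h
print(hex(clobbered_dword(-0x408, 1)))   # prints -0x408
```

In the crashing build the hit lands on ebp-40C, where p1 lives. In the older build it lands on ebp-408, a value that is no longer read after the store.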

The issue discussed in this article was addressed as part of KB2883200.

↧

Understanding ARM Assembly Part 1

November 22, 2013, 3:38 pm

My name is Marion Cole, and I am a Sr. EE in the Microsoft Platforms Serviceability group. You may be wondering why Microsoft support would need to know ARM assembly. Doesn’t Windows only run on x86 and x64 machines? No. Windows has run on a variety of processors in the past. Those include i860, Alpha, MIPS, Fairchild Clipper, PowerPC, Itanium, SPARC, 286, 386, IA-32, x86, x64, and the newest one is ARM. Most of these processors are antiquated now. The common ones now are IA-32, x86, x64. However, Windows has started supporting ARM processors in order to jump into the portable devices arena. You will find them in the Microsoft Surface RT, Windows Phones, and other things in the future I am sure. So you may be saying that these devices are locked, and cannot be debugged. That is true from a live debug perspective, but you can get memory dumps and application dumps from them and those can be debugged.

Processor

There are limitations on the ARM processors that Windows supports. Three System on Chip (SOC) vendors are supported: nVidia, Texas Instruments, and Qualcomm. Windows only supports the ARMv7 (Cortex, Scorpion) architecture in the ARMv7-A (Application Profile) variant. This implements a traditional ARM architecture with multiple modes and supports a Virtual Memory System Architecture (VMSA) based on an MMU. It supports the ARM and Thumb-2 instruction sets, which allows for a mixture of 16-bit (Thumb) and 32-bit (ARM) opcodes, so it will look strange in the assembly. Luckily the debuggers know this and handle it for you. This also helps to shrink the size of the assembly code in memory. The processor also has to have the optional ISA extensions of VFP (Hardware Floating Point) and NEON (128-bit SIMD Architecture).
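
For the curious, here is a sketch of how a disassembler can tell the two opcode widths apart in Thumb-2 code. The rule is not stated in this article; it comes from the ARM architecture manual: a first halfword whose top five bits are 11101, 11110 or 11111 starts a 32-bit instruction, and anything else is a 16-bit instruction. The function name is my own.

```python
def thumb_instruction_size(first_halfword):
    """Return 2 or 4: the size in bytes of a Thumb-2 instruction."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2

# Two 16-bit opcodes that appear later in this article.
print(thumb_instruction_size(0x429A))  # cmp r2,r3 -> prints 2
print(thumb_instruction_size(0xD104))  # bne       -> prints 2

# An example first halfword of a 32-bit encoding.
print(thumb_instruction_size(0xF000))  # prints 4
```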

In order to understand the assembly that you will see you need to understand the processor internals.

ARM is a Reduced Instruction Set Computer (RISC), much like some of the previous processors that Windows ran on. It is a 32-bit load/store style processor. It has a “weakly-ordered” memory model, similar to Alpha and IA64, and it requires specific memory barriers to enforce ordering. On ARM devices these are the ISB, DSB, and DMB instructions.

Registers

The processor has 16 available registers r0 – r15.

0: kd> r

r0=00000001 r1=00000000 r2=00000000 r3=00000000 r4=e1820044 r5=e17d0580

r6=00000001 r7=e17f89b9 r8=00000002 r9=00000000 r10=1afc38ec r11=e1263b78

r12=e127813c sp=e1263b20 lr=e16c12c3 pc=e178b6d0 psr=00000173 ----- Thumb

r0, r1, r2, r3, and r12 are volatile registers. Volatile registers are scratch registers presumed by the caller to be destroyed across a call. Nonvolatile registers are required to retain their values across a function call and must be saved by the callee if used.

On Windows four of these registers have a designated purpose. They are listed below, together with the separate CPSR status register:

PC (r15) – Program Counter (EIP on x86)
LR (r14) – Link Register. Used as a return address to the caller.
SP (r13) – Stack Pointer (ESP on x86).
R11 – Frame Pointer (EBP on x86).
CPSR – Current Program Status Register (Flags on x86).

In Windbg all but r11 will be labeled appropriately for you. So you may be asking why r11 is not labeled “fp” in the debugger. That is because r11 is only used as a frame pointer when you are calling a non-leaf subroutine. The way it works is this: when a call to a non-leaf subroutine is made, the called subroutine pushes the value of the previous frame pointer (in r11) to the stack (right after the lr) and then r11 is set to point to this location in the stack, so eventually we end up with a linked list of frame pointers in the stack that easily enables the construction of the call stack. The frame pointer is not pushed to the stack in leaf functions. We will discuss leaf functions later.
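
The linked list of frame pointers can be walked with very little code. The following is a toy sketch with made-up addresses: a dictionary stands in for stack memory, and I assume the layout described above, where [r11] holds the caller's r11 and the saved lr sits in the next slot at [r11 + 4].

```python
def walk_frames(memory, r11, max_frames=64):
    """Follow the saved-r11 chain and collect the saved return addresses.

    memory maps a 32-bit address to the 32-bit value stored there.
    Layout assumed per frame: [r11] = caller's r11, [r11 + 4] = saved lr.
    """
    return_addresses = []
    while r11 != 0 and len(return_addresses) < max_frames:
        return_addresses.append(memory[r11 + 4])
        r11 = memory[r11]
    return return_addresses

# Hypothetical stack contents, made up for this example.
stack = {
    0x1000: 0x1020, 0x1004: 0xAAAA0001,  # innermost frame
    0x1020: 0x1040, 0x1024: 0xBBBB0001,
    0x1040: 0x0000, 0x1044: 0xCCCC0001,  # outermost frame ends the chain
}
print([hex(a) for a in walk_frames(stack, 0x1000)])
# prints ['0xaaaa0001', '0xbbbb0001', '0xcccc0001']
```

The max_frames limit is there only so that a corrupted chain cannot loop forever.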

CPSR (Current Program Status Register)

Now we need to understand some about the CPSR register. Here is the bit breakdown:

Bit(s)	31	30	29	28	27	26:25	24	23:20	19:16	15:10	9	8	7	6	5	4:0
Field	N	Z	C	V	Q	IT[1:0]	J	Reserved	GE[3:0]	IT[7:2]	E	A	I	F	T	M[4:0]

Bits [31:28] – Condition Code Flags

N – bit 31 – If this bit is set, the result was negative. If bit is cleared the result was positive or zero.
Z – bit 30 – If set this bit indicates the result was zero or values compared were equal. If it is cleared, the value is non-zero or the compared values are not equal.
C – bit 29 – If this bit is set the instruction resulted in a carry condition, e.g. adding two unsigned values resulted in a value too large to be stored.
V – bit 28 – If this bit is set then the instruction resulted in an overflow condition. E.g. An overflow of adding two signed values.

Instruction variants ending with ‘s’ set the condition codes (mov/movs).

E – bit 9 – Endianness (big = 1/Little = 0)
T – bit 5 – Set if executing Thumb instructions
M – bits [4:0] – CPU Mode (User 10000/Supervisor 10011)
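
As a worked example, the psr value from the register dump earlier (00000173) can be decoded with a few shifts and masks. This is only a sketch: it covers just the fields listed above, and the mode table contains only the two modes this article mentions.

```python
# Only the two modes listed above; other encodings exist.
MODES = {0b10000: "User", 0b10011: "Supervisor"}

def decode_cpsr(value):
    """Extract the CPSR fields described in this section."""
    mode = value & 0x1F
    return {
        "N": (value >> 31) & 1,
        "Z": (value >> 30) & 1,
        "C": (value >> 29) & 1,
        "V": (value >> 28) & 1,
        "E": (value >> 9) & 1,
        "T": (value >> 5) & 1,
        "M": mode,
        "mode_name": MODES.get(mode, "other"),
    }

fields = decode_cpsr(0x00000173)  # psr value from the register dump above
print(fields["T"], fields["mode_name"])  # prints 1 Supervisor
```

The T bit being set agrees with the "Thumb" label the debugger printed next to the psr value.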

So why do I need to know about the CPSR (Current Program Status Register)? You will need to know where some of these bits are due to how some of the assembly instructions affect these flags. An example of this is:

ADD will add two registers together, or add an immediate value to a register. However it will not affect the flags.

ADDS will do the same as ADD, but it does affect the flags.

MOV will allow you to move a value into a register, and a value between registers. This is not like the x86/x64. MOV will not let you read or write to memory. This does not affect the flags.

MOVS does the same thing as MOV, but it does affect the flags.

I hope you are seeing a trend here. There are instructions that will look the same. However if they end in “S” then you need to know that this will affect the flags. I am not going to list all of those assembly instructions here. Those are already listed in the ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition at http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b/index.html.
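
To illustrate the difference, here is a small model of a 32-bit ADD and ADDS. It is a sketch of the flag rules described above for register values only, not an emulator, and the function names are mine.

```python
MASK32 = 0xFFFFFFFF

def add(a, b):
    """Model a 32-bit ADD: returns the result and produces no flags."""
    return (a + b) & MASK32

def adds(a, b):
    """Model a 32-bit ADDS: return (result, flags) with N, Z, C and V."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "N": sign_r,                  # result is negative
        "Z": int(result == 0),        # result is zero
        "C": int(full > MASK32),      # unsigned carry out of bit 31
        "V": int(sign_a == sign_b and sign_r != sign_a),  # signed overflow
    }
    return result, flags

print(adds(0xFFFFFFFF, 1))  # prints (0, {'N': 0, 'Z': 1, 'C': 1, 'V': 0})
print(adds(0x7FFFFFFF, 1))  # prints (2147483648, {'N': 1, 'Z': 0, 'C': 0, 'V': 1})
```

The first call shows an unsigned carry without signed overflow, and the second shows the reverse.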

So now we have an idea of what can set the flags. Now we need to understand what the flags are used for. They are mainly used for branching instructions. Here is an example:

003a11d2 429a cmp r2,r3

003a11d4 d104 bne |MyApp!FirstFunc+0x28 (003a11e0)|

The first instruction in this code (cmp) compares the value stored in register r2 to the value stored in register r3. This comparison instruction sets or resets the Z flag in the CPSR register. The second instruction is a branch instruction (b) with the condition code ne which means that if the result of the previous comparison was that the values are not equal (the CPSR flag Z is zero) then branch to the address MyApp!FirstFunc+0x28 (003a11e0). Otherwise the execution continues.

There are a few compare instructions. “cmp” subtracts two register values, sets the flags, and discards the result. “cmn” adds two register values, sets the flags, and discards the result. “tst” does a bitwise AND of two register values, sets the flags, and discards the result. There is even an If Then (it) instruction. I am not going to discuss that one here as I have never seen it in any of the Windows code.

So is “bne” the only branch instruction? No. There are a lot of them. Here is a table of the extensions that can be seen beside “b”, and what they check in the CPSR register:

Mnemonic Extension	Meaning (Integer)	Condition Flags (in CPSR)
EQ	Equal	Z==1
NE	Not Equal	Z==0
MI	Negative (Minus)	N==1
PL	Positive or Zero (Plus)	N==0
HI	Unsigned higher	C==1 and Z==0
LS	Unsigned lower or same	C==0 or Z==1
GE	Signed greater than or equal	N==V
LT	Signed less than	N!=V
GT	Signed greater than	Z==0 and N==V
LE	Signed less than or equal	Z==1 or N!=V
VS	Overflow	V==1
VC	No overflow	V==0
CS	Carry set	C==1
CC	Carry clear	C==0
None (AL)	Execute always
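
The table translates almost directly into code. The sketch below models a cmp of two 32-bit values and then evaluates a condition from the table against the resulting flags. The names are my own. One detail that is not in the table: for a subtraction, ARM sets C when no borrow occurred.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Model 'cmp a, b': compute a - b and keep only the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "N": sign_r,
        "Z": int(result == 0),
        "C": int(a >= b),  # for subtraction, C set means no borrow
        "V": int(sign_a != sign_b and sign_r != sign_a),
    }

# One entry per row of the table above.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "AL": lambda f: True,
}

def branch_taken(condition, flags):
    """Return True if a branch with this condition would be taken."""
    return CONDITIONS[condition](flags)

print(branch_taken("NE", cmp_flags(3, 5)))  # prints True
print(branch_taken("NE", cmp_flags(5, 5)))  # prints False
```

This mirrors the cmp/bne pair shown earlier: the branch is taken only when the two registers hold different values.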

Floating Point Registers

As mentioned earlier the processor also has to have the ISA extensions of VFP (Hardware Floating Point) and NEON (128-bit SIMD Architecture). Here is what they are.

Floating Point

The floating point unit has 16 64-bit registers (d0-d15) that are overlaid with 32 32-bit registers (s0-s31). There are varieties of the ARM processor that have 32 64-bit registers (d0-d31). Windows 8 will support both the 16 and 32 register variants. You have to be careful when using these, because if you access unaligned floats you may cause an exception.

SIMD (NEON)

The SIMD (NEON) extension adds 16 128-bit registers (q0-q15) that overlay the floating point registers. So if you reference Q0 it is the same as referencing D0-D1 or S0-S1-S2-S3.
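
The overlay follows a simple pattern that extends the Q0 example: Qn shares storage with D(2n) and D(2n+1), and, where single-precision registers exist, with S(4n) through S(4n+3). Here is a sketch with a helper name of my own. It assumes only s0-s31 exist, so only q0-q7 have S aliases.

```python
def q_register_aliases(n):
    """Return the D and S registers that share storage with Qn."""
    if not 0 <= n <= 15:
        raise ValueError("Q registers are q0-q15")
    d_regs = [f"d{2 * n}", f"d{2 * n + 1}"]
    # Assumption: only s0-s31 exist, so only q0-q7 have S aliases.
    s_regs = [f"s{4 * n + k}" for k in range(4)] if n <= 7 else []
    return d_regs, s_regs

print(q_register_aliases(0))  # prints (['d0', 'd1'], ['s0', 's1', 's2', 's3'])
print(q_register_aliases(8))  # prints (['d16', 'd17'], [])
```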

In part 2 we will discuss how Windows utilizes this processor.

↧

Event ID 157 "Disk # has been surprise removed"

December 27, 2013, 1:57 pm

Hello, my name is Bob Golding and I would like to share information on a new error you may see in the system event log. It is Event ID 157 "Disk <n> has been surprise removed" with Source: disk. This error indicates that the CLASSPNP driver has received a “surprise removal” request from the plug and play manager (PNP) for a non-removable disk.

What does this error mean?

The PNP manager does what is called enumerations. An enumeration is a request sent to a driver that controls a bus, such as PCI, to take an inventory of devices on the bus and report back a list of the devices. The SCSI bus is enumerated in a similar manner, as are devices on the IDE bus.

These enumerations can happen for a number of reasons. For example, hardware can request an enumeration when it detects a change in configuration. Also a user can initiate an enumeration by selecting “scan for new devices” in device manager.

When an enumeration request is received, the bus driver will rescan the bus for all devices. It will issue commands to the existing devices as though it was looking for new ones. If these commands fail on an existing unit, the driver will mark the device as “missing”. When the device is marked “missing”, it will not be reported back to PNP in the inventory. When PNP determines that the device is not in the inventory it will send a surprise removal request to the bus driver so the bus driver can remove the device object.

Since the CLASSPNP driver sits in the device stack and receives requests that are destined for disks, it sees the surprise removal request and logs an event if the disk is supposed to be non-removable. An example of a non-removable disk is a hard drive on a SCSI or IDE bus. An example of a removable disk is a USB thumb drive.

Previously nothing was logged when a non-removable disk was removed. As a result, disks would disappear from the system with no indication. The Event ID 157 error was implemented in Windows 8.1 and Windows Server 2012 R2 to log a record of a disk disappearing.
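
The sequence described above can be summarized as a toy model. This is only an illustration of the logic, with made-up device records and function names. It is not how the bus driver or CLASSPNP is implemented.

```python
def enumerate_bus(known_devices, responds):
    """Re-probe every known device and split them into found and missing."""
    inventory = [d for d in known_devices if responds(d)]
    missing = [d for d in known_devices if not responds(d)]
    return inventory, missing

def surprise_removal_events(missing):
    """Return the event text logged for each missing non-removable disk."""
    return [
        f"Disk {d['number']} has been surprise removed"
        for d in missing
        if not d["removable"]
    ]

# Hypothetical devices for this example.
devices = [
    {"number": 0, "removable": False, "online": True},
    {"number": 1, "removable": False, "online": False},  # e.g. SAN path lost
    {"number": 2, "removable": True, "online": False},   # thumb drive pulled
]

inventory, missing = enumerate_bus(devices, lambda d: d["online"])
events = surprise_removal_events(missing)
print(events)  # prints ['Disk 1 has been surprise removed']
```

Disk 2 is also missing from the inventory, but because it is removable no event is logged for it.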

Why does this error happen?

These errors are most often caused when something disrupts the system’s communication with a disk, such as a SAN fabric error or a SCSI bus problem. The errors can also be caused by a disk that fails, or when a user unplugs a disk while the system is running. An administrator who sees these errors needs to verify the health of the disk subsystem.

Event ID 157 Example:

↧

Understanding Pool Corruption Part 3 – Special Pool for Double Frees

December 31, 2013, 2:37 pm

In Part 1 and Part 2 of this series we discussed pool corruption and how special pool can be used to identify the cause of such corruption. In today’s article we will use special pool to catch a double free of pool memory.

A double free of pool will cause a system to blue screen, however the resulting crash may vary. In the most obvious scenario a driver that frees a pool allocation twice will cause the system to immediately crash with a stop code of C2 BAD_POOL_CALLER, and the first parameter will be 7 to indicate “Attempt to free pool which was already freed”. If you experience such a crash, enabling special pool should be high on your list of troubleshooting steps.

BAD_POOL_CALLER (c2)

The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.

Arguments:

Arg1: 0000000000000007, Attempt to free pool which was already freed

Arg2: 00000000000011c1, (reserved)

Arg3: 0000000004810007, Memory contents of the pool block

Arg4: fffffa8001b10800, Address of the block of pool being deallocated

A less obvious crash would be if the pool has been reallocated. As we showed in Part 2, pool is structured so that multiple drivers share a page. When DriverA calls ExFreePool to free its pool block the block is made available for other drivers. If memory manager gives this memory to DriverF, and then DriverA frees it a second time, a crash may occur in DriverF when the pool allocation no longer contains the expected data. Such a problem may be difficult for the developer of DriverF to identify without special pool.

Special pool will place each driver’s allocation in a separate page of memory (as discussed in Part 2). When a driver frees a pool block in special pool the whole page will be freed, and any access to a free page will cause an immediate bugcheck. Additionally, special pool will place this page on the tail of the list of pages to be used again. This increases the likelihood that the page will still be free when it is freed a second time, decreasing the likelihood of the DriverA/DriverF scenario shown above.
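
Why putting freed pages at the tail of the list helps can be seen in a toy model. The class below is purely illustrative and is not how the memory manager is implemented: each allocation gets its own page, freed pages are reused last, and freeing a page that is not allocated raises an error in place of a bugcheck.

```python
from collections import deque

class ToySpecialPool:
    """Toy model: one allocation per page, freed pages reused last."""

    def __init__(self, page_count):
        self.free_pages = deque(range(page_count))
        self.in_use = set()

    def allocate(self):
        page = self.free_pages.popleft()  # take from the head of the list
        self.in_use.add(page)
        return page

    def free(self, page):
        if page not in self.in_use:
            # Stands in for the immediate bugcheck on a freed page.
            raise RuntimeError(f"bugcheck: page {page} is not allocated")
        self.in_use.remove(page)
        self.free_pages.append(page)      # freed pages go to the tail

pool = ToySpecialPool(page_count=4)
a = pool.allocate()    # DriverA gets page 0
pool.free(a)           # the first free is legitimate
f = pool.allocate()    # DriverF gets page 1, not the page DriverA freed
try:
    pool.free(a)       # the double free is caught at once
    caught = False
except RuntimeError:
    caught = True
print(f != a, caught)  # prints True True
```

Because DriverF was handed a different page, the second free by DriverA fails on its own page instead of silently releasing DriverF's memory.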

To demonstrate this failure we will once again use the Sysinternals tool NotMyFault. Choose the “Double free” option and click “Crash”. Most likely you will get the stop C2 bugcheck mentioned above. Enable special pool and reboot to get a more informative error.

verifier /flags 1 /driver myfault.sys

Choosing the “Double free” option with special pool enabled resulted in the following crash. The bugcheck code PAGE_FAULT_IN_NONPAGED_AREA means some driver tried to access memory that was not valid. This invalid memory was the freed special pool page.

PAGE_FAULT_IN_NONPAGED_AREA (50)

Invalid system memory was referenced. This cannot be protected by try-except,

it must be protected by a Probe. Typically the address is just plain bad or it

is pointing at freed memory.

Arguments:

Arg1: fffff9800a7fe7f0, memory referenced.

Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.

Arg3: fffff80060263888, If non-zero, the instruction address which referenced the bad memory address.

Arg4: 0000000000000002, (reserved)

Looking at the call stack we can see myfault.sys was freeing pool and ExFreePoolSanityChecks took a page fault that led to the crash.

kd> kn

# Child-SP RetAddr Call Site

00 fffff880`0419fe28 fffff800`5fd7e28a nt!DbgBreakPointWithStatus

01 fffff880`0419fe30 fffff800`5fd7d8de nt!KiBugCheckDebugBreak+0x12

02 fffff880`0419fe90 fffff800`5fc5b544 nt!KeBugCheck2+0x79f

03 fffff880`041a05b0 fffff800`5fd1c5bc nt!KeBugCheckEx+0x104

04 fffff880`041a05f0 fffff800`5fc95acb nt! ?? ::FNODOBFM::`string'+0x33e2a

05 fffff880`041a0690 fffff800`5fc58eee nt!MmAccessFault+0x55b

06 fffff880`041a07d0 fffff800`60263888 nt!KiPageFault+0x16e

07 fffff880`041a0960 fffff800`6024258c nt!ExFreePoolSanityChecks+0xe8

08 fffff880`041a09a0 fffff880`04c9b5d9 nt!VerifierExFreePoolWithTag+0x3c

09 fffff880`041a09d0 fffff880`04c9b727 myfault!MyfaultDeviceControl+0x2fd

0a fffff880`041a0b20 fffff800`60241a4a myfault!MyfaultDispatch+0xb7

0b fffff880`041a0b80 fffff800`600306c7 nt!IovCallDriver+0xba

0c fffff880`041a0bd0 fffff800`600458a6 nt!IopXxxControlFile+0x7e5

0d fffff880`041a0d60 fffff800`5fc5a453 nt!NtDeviceIoControlFile+0x56

0e fffff880`041a0dd0 000007fd`ea212c5a nt!KiSystemServiceCopyEnd+0x13

Using the address from the bugcheck code, we can verify that the memory is in fact not valid:

kd> dd fffff9800a7fe7f0

fffff980`0a7fe7f0 ???????? ???????? ???????? ????????

fffff980`0a7fe800 ???????? ???????? ???????? ????????

fffff980`0a7fe810 ???????? ???????? ???????? ????????

fffff980`0a7fe820 ???????? ???????? ???????? ????????

fffff980`0a7fe830 ???????? ???????? ???????? ????????

fffff980`0a7fe840 ???????? ???????? ???????? ????????

fffff980`0a7fe850 ???????? ???????? ???????? ????????

fffff980`0a7fe860 ???????? ???????? ???????? ????????

kd> !pte fffff9800a7fe7f0

VA fffff9800a7fe7f0

PXE at FFFFF6FB7DBEDF98 PPE at FFFFF6FB7DBF3000 PDE at FFFFF6FB7E600298 PTE at FFFFF6FCC0053FF0

contains 0000000002A91863 contains 0000000002A10863 contains 0000000000000000

pfn 2a91 ---DA--KWEV pfn 2a10 ---DA--KWEV not valid

So far we have enough evidence to prove that myfault.sys was freeing invalid memory, but how do we know this memory is being freed twice? If there was a double free we need to determine if the first or second call to ExFreePool was incorrect. To do so we need to determine what code freed the memory first.

Driver Verifier special pool keeps track of the last 0x10000 calls to allocate and free pool. You can dump this database with the !verifier 80 command. To limit the data output you can also pass this command the address of the memory you suspect was double freed.

Don’t assume the address in the bugcheck code is the address being freed. Instead, get the address from the function that called VerifierExFreePoolWithTag.

In the above call stack the call below VerifierExFreePoolWithTag is frame 9 (start counting with 0, or use kn).

kd> .frame /r 9

09 fffff880`041a09d0 fffff880`04c9b727 myfault+0x15d9

rax=0000000000000000 rbx=fffff9800a7fe800 rcx=fffff9800a7fe800

rdx=fffffa8001a37fa0 rsi=fffffa80035975e0 rdi=fffffa8003597610

rip=fffff88004c9b5d9 rsp=fffff880041a09d0 rbp=fffffa80034568d0

r8=fffff9800a7fe801 r9=fffff9800a7fe7f0 r10=fffff9800a7fe800

r11=0000000000000000 r12=0000000000000000 r13=0000000000000000

r14=fffff800600306c7 r15=fffffa8004381b80

iopl=0 nv up ei ng nz na po nc

cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000286

myfault+0x15d9:

fffff880`04c9b5d9 eb7a jmp myfault+0x1655 (fffff880`04c9b655)

On x64 systems the first parameter is passed in rcx. The below assembly shows that rcx originated from rbx.

kd> ub fffff880`04c9b5d9

myfault+0x15ba:

fffff880`04c9b5ba ff15a80a0000 call qword ptr [myfault+0x2068 (fffff880`04c9c068)]

fffff880`04c9b5c0 33d2 xor edx,edx

fffff880`04c9b5c2 488bc8 mov rcx,rax

fffff880`04c9b5c5 488bd8 mov rbx,rax

fffff880`04c9b5c8 ff154a0a0000 call qword ptr [myfault+0x2018 (fffff880`04c9c018)]

fffff880`04c9b5ce 33d2 xor edx,edx

fffff880`04c9b5d0 488bcb mov rcx,rbx

fffff880`04c9b5d3 ff153f0a0000 call qword ptr [myfault+0x2018 (fffff880`04c9c018)]

Run !verifier 80 using the address from rbx:

kd> !verifier 80 fffff9800a7fe800

Log of recent kernel pool Allocate and Free operations:

There are up to 0x10000 entries in the log.

Parsing 0x0000000000010000 log entries, searching for address 0xfffff9800a7fe800.

======================================================================

Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0

fffff80060251a32 nt!VfFreePoolNotification+0x4a

fffff8005fe736c9 nt!ExFreePool+0x595

fffff80060242597 nt!VerifierExFreePoolWithTag+0x47

fffff88004c9b5ce myfault!MyfaultDeviceControl+0x2f2

fffff88004c9b727 myfault!MyfaultDispatch+0xb7

fffff80060241a4a nt!IovCallDriver+0xba

fffff800600306c7 nt!IopXxxControlFile+0x7e5

fffff800600458a6 nt!NtDeviceIoControlFile+0x56

fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13

======================================================================

Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0

fffff80060242a5d nt!VeAllocatePoolWithTagPriority+0x2d1

fffff8006024b20e nt!XdvExAllocatePoolInternal+0x12

fffff80060242f69 nt!VerifierExAllocatePool+0x61

fffff88004c9b5c0 myfault!MyfaultDeviceControl+0x2e4

fffff88004c9b727 myfault!MyfaultDispatch+0xb7

fffff80060241a4a nt!IovCallDriver+0xba

fffff800600306c7 nt!IopXxxControlFile+0x7e5

fffff800600458a6 nt!NtDeviceIoControlFile+0x56

fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13

The above output shows the pool block being allocated by myfault.sys and then freed by myfault.sys. If we combine this information with the call stack leading up to our bugcheck we can conclude that the pool was freed once in MyfaultDeviceControl at offset 0x2f2, then freed again in MyfaultDeviceControl at offset 0x2fd.
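The 0x2fd offset is not printed anywhere in the debugger output, so here is the arithmetic behind it as a small Python sketch. It only uses addresses that appear above; the variable names are ours.

```python
# Return address logged by !verifier 80 for the first free,
# which the debugger resolved to MyfaultDeviceControl+0x2f2.
first_free_return = 0xFFFFF88004C9B5CE
first_free_offset = 0x2F2

# rip of frame 9 in the bugcheck call stack (the caller of
# VerifierExFreePoolWithTag at the time of the crash).
second_free_return = 0xFFFFF88004C9B5D9

# Return address logged for the allocation (MyfaultDeviceControl+0x2e4).
allocation_return = 0xFFFFF88004C9B5C0

function_start = first_free_return - first_free_offset
second_free_offset = second_free_return - function_start
allocation_offset = allocation_return - function_start

print(hex(function_start))
print(hex(allocation_offset))   # matches the +0x2e4 in the log
print(hex(second_free_offset))  # the offset of the second free
```

The two frees are only 0xb bytes apart, which matches the back-to-back calls through [myfault+0x2018] in the ub output.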

Now we know which driver is causing the problem, and if this is our driver we know which area of the code to investigate.

↧

Debugging a Windows 8.1 Store App Crash Dump

January 13, 2014, 9:12 am


Quality reports on the App Summary page

Microsoft provides triage dumps of your Windows Store application’s crashes and hangs through the Quality section of the App Summary page on the Dev Center - Windows Store apps portal.

Back in June 2012, the Windows Store team posted an article on this feature and the basics of debugging the dumps provided: Improving apps with Quality reports, http://blogs.msdn.com/b/windowsstore/archive/2012/06/27/improving-apps-with-quality-reports.aspx.

This article digs further into the debugging of Windows Store application crash dump files, and explains the recent changes made to exception reporting in Windows 8.1.

The files being debugged can be obtained from the Quality page or by collecting them yourself using Windows Error Reporting (WER) or the AeDebug feature of Windows.

An example AeDebug tool is Sysinternals ProcDump. To configure crash dumping, execute the following from an elevated command prompt:

C:\>md c:\dumps

C:\>procdump.exe -ma -i c:\dumps

Windows Runtime Architecture

The Windows Runtime (WinRT API) is at the core of all Windows Store applications. Similar to how Win32 and .NET sit between the Desktop app and the kernel, the WinRT API sits between the Windows Store app and the kernel.

In between the WinRT API and the app is a layer called the Language Projection layer. This layer projects the C++-centric concepts of WinRT into language-specific concepts.

The projection of errors through the Language Projection layer is the focus of this article.

In WinRT, errors are modeled as IErrorInfo and IRestrictedErrorInfo interfaces.
In CLR languages, errors are modeled as exceptions and are represented as class objects derived from System.Exception.
In JavaScript, errors are also modeled as exceptions and are represented as JavaScript Exception (JSE) objects.
In C/C++, errors are modeled as an interface or a pure HRESULT.

Because each language has a different concept of how errors are handled, the projection layer needs to use a least common denominator. For errors, that means that just an HRESULT (Error Code) and HSTRING (Error Message) are sent through the projection layer. Any additional information held by WinRT’s interface is not available in the receiving language. Conversely, any additional information held by the language’s object is not available to WinRT.

If the error becomes unhandled, the HRESULT becomes the Exception Code reported in the Exception Record (of a live debug session or dump file).

Visual Studio 2013

Opening a dump file in Visual Studio allows you to see the Exception Record via the MiniDump File Summary. The Exception Code is listed in the Dump Summary section.

[Screenshot: Minidump File Summary]

If you Debug the application, the Exception Code’s description will be listed in the Output window.

[Screenshot: Output window]

The call stack of the Exception Record’s context is viewable in the Call Stack window. Depending on the dump’s state, the $exceptionstack pseudo variable can be used in a Watch (or Locals) window to see the stack.

[Screenshot: Visual Studio Call Stack window]

Note, having the Private PDBs of the application will make the stack output more complete/accurate.

Debugging Tools for Windows

Using the Debugging Tools for Windows, the Exception Record can be displayed using the .exr -1 command. The Exception Code’s description can (sometimes) be looked up using the !error <code> command. The .ecxr command switches the debugger to the context of the exception. The stack is displayed with the k command (knL adds frame numbers and omits source line information).

0:004> .exr -1

ExceptionAddress: 722248e8 (msvcr110!Concurrency::details::_ReportUnobservedException+0x00000022)

ExceptionCode: c0000409 (Security check failure or stack buffer overrun)

ExceptionFlags: 00000001

NumberParameters: 1

Parameter[0]: 00000005

0:004> !error c0000409

Error code: (NTSTATUS) 0xc0000409 (3221226505) - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.

0:004> .ecxr

eax=00000001 ebx=ffffffff ecx=00000005 edx=0a6ee048 esi=13672424 edi=0546c33c

eip=722248e8 esp=02ceeef8 ebp=02ceef14 iopl=0 nv up ei pl nz na po nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

msvcr110!Concurrency::details::_ReportUnobservedException+0x22:

722248e8 cd29 int 29h

0:004> knL

*** Stack trace for last set context - .thread/.cxr resets it

# ChildEBP RetAddr

00 02ceeef4 00f2f6cb msvcr110!Concurrency::details::_ReportUnobservedException+0x22

WARNING: Stack unwind information not available. Following frames may be wrong.

01 02ceef14 00f2fad7 MyBadApp+0xef6cb

02 02ceef40 01122720 MyBadApp+0xefad7

03 02ceef64 011228eb MyBadApp+0x2e2720

04 02ceef70 00f3960a MyBadApp+0x2e28eb

05 02ceefb0 010cc804 MyBadApp+0xf960a

06 02ceefc0 0112108e MyBadApp+0x28c804

07 02ceeff8 72422b61 MyBadApp+0x2e108e

08 02cef034 72427e27 Microsoft_Xbox!DllGetClassObject+0x61352

09 02cef040 76095c3e Microsoft_Xbox!DllGetClassObject+0x66618

0a 02cef060 7610f497 rpcrt4!Invoke+0x2a

0b 02cef6ec 75c241f8 rpcrt4!NdrStubCall2+0x33c

0c 02cef734 75c1f58a combase!CStdStubBuffer_Invoke+0xc1

0d 02cef7c0 75b24617 combase!SyncStubInvoke+0x144

0e (Inline) -------- combase!StubInvoke+0x9a

0f 02cef8e8 75b97d8d combase!CCtxComChnl::ContextInvoke+0x222

10 02cef90c 75c24cc9 combase!DefaultInvokeInApartment+0x30

11 (Inline) -------- combase!ASTAInvokeInApartment+0x35

12 02cef9b4 75c1fdc7 combase!AppInvoke+0x5ae

13 02cefb00 75c24c71 combase!ComInvokeWithLockAndIPID+0x5ed

14 02cefb20 75b93118 combase!ComInvoke+0x153

15 02cefb30 75b97b11 combase!ThreadDispatch+0x23

16 02cefb44 75be53b5 combase!CComApartment::ASTAHandleMessage+0xe6

17 02cefb68 75ba8f22 combase!ASTAWaitContext::DispatchCallsOnExitNonBlockingProcessEventsIfAppropriate+0x9e

18 02cefb8c 75b5917e combase!ASTAWaitContext::~ASTAWaitContext+0x1a9

19 02cefb98 74acb13d combase!CoEndProcessEvents+0x37

1a 02cefbf4 00f733d7 windows_ui!Windows::UI::Core::CDispatcher::ProcessEvents+0x29ac1

1b 02cefc64 00f77f46 MyBadApp+0x1333d7

1c 02cefc94 74f6f45e MyBadApp+0x137f46

1d 02cefca0 74f6f322 twinapi_appcore!Windows::ApplicationModel::Core::CoreApplicationView::Run+0x27

1e 02cefcc0 74b1008a twinapi_appcore!<lambda_f0454c86bc54370cf843d844d6c13e00>::operator()+0xb2

1f 02cefd44 75f4a534 SHCore!_WrapperThreadProc+0xe2

20 02cefd50 77dd8f8b kernel32!BaseThreadInitThunk+0xe

21 02cefd94 77dd8f61 ntdll!__RtlUserThreadStart+0x20

22 02cefda4 00000000 ntdll!_RtlUserThreadStart+0x1b

Simple so far...

Using Visual Studio or the Debugging Tools for Windows is relatively simple when the Exception Record is associated with the call stack of the issue. However, this is not always the case. It depends on which side of the projection layer the issue occurred. If the error (exception) was not handled on the language side, the exception is marshaled (projected) to the WinRT side for its exception handling. When this occurs, it becomes very tricky to see which stack caused the issue...

Language Exceptions - Error Code 0xC000027B

In the initial design of the WinRT API, the projection of errors was done through a call to the RoOriginateError function. This function takes an HRESULT and an HSTRING. Of note, there is no call stack captured. The limits of the RoOriginateError function were recognized and a new (associated) function was created for Windows 8.1.

The RoOriginateLanguageException function takes an HRESULT, an HSTRING and a marshalable interface pointer. When RoOriginateLanguageException is called, the current call stack is captured and is passed as part of the error.

The purpose of RoOriginateLanguageException is to marshal the interface pointer so that additional language information is available on the other side of the projection layer. This behavior is achieved by using a specific exception code. Instead of using the (user defined) HRESULT, a value of 0xC000027B is used. This error code indicates to the receiver that there is data to unmarshal. The data includes the HRESULT and HSTRING, and also the interface pointer.

The important point to understand here is that all async exceptions raised in Windows 8.1 Windows Store apps now result in a 0xC000027B error code in the Exception Record, not the error code passed by the caller.

Debugging Language Exceptions

The Exception Record of a Language Exception contains (of note) the exception code (0xC000027B) and two parameters.

0:004> .exr -1

ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

ExceptionCode: c000027b

ExceptionFlags: 00000001

NumberParameters: 2

Parameter[0]: 0bdaf240

Parameter[1]: 00000001

0:006> !error c000027b

Error code: (NTSTATUS) 0xc000027b (3221226107) - An application-internal exception has occurred.

The first parameter is the address of a pointer array (of unmarshalled data). The second parameter is the count of pointers in the pointer array.

So why is there a need for a count? Since applications have multiple threads, it is possible for multiple threads to call RoOriginateLanguageException simultaneously. Equally, there can be a cyclic nature to the experience, where exceptions are caught and then re-thrown. Since WinRT processes the errors asynchronously, multiple errors regularly exist at the same time. The first exception in the array should be the focus of the investigation.

[Tip] Even though Microsoft publishes the private symbols for combase.dll (allowing you to view the local variables of combase!RoFailFast* functions), these locals regularly resolve to invalid addresses due to register reuse and other code flow optimizations. The pointer array in Parameter[0] is the correct place to get the address of the language exception pointer array.

Original Error Code

The first step when debugging a Language Exception is to determine the actual error code of the caller, instead of the 0xC000027B error code.

Casting an address to a pointer array of a specific type in Visual Studio is, put simply, too difficult to undertake. The easiest option is to use the Debugging Tools for Windows. Even though these tools are all command-line driven and use an obscure syntax, the commands below are relatively easy to follow and get you to the important information.

If not done already, set your symbol path to the Microsoft Public Symbol server:

0:004> .sympath SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

Symbol search path is: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

Expanded Symbol search path is: srv*c:\Symbols*http://msdl.microsoft.com/download/symbols

************* Symbol Path validation summary **************

Response Time (ms) Location

Deferred SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

Force the load of the symbols using the .reload /f command:

0:004> .reload /f

...

The next step is to display the pointer array as the original structure type. First, we need to know what structure to cast the pointer array to. Using the Parameter[0] value from .exr -1, we will generate a dt command that will display the header of the first record. We use Parameter[0] as the address in this command.

dt <Parameter[0]> combase!_STOWED_EXCEPTION_INFORMATION_HEADER*

Here’s an example:

0:004> .exr -1

ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

ExceptionCode: c000027b

ExceptionFlags: 00000001

NumberParameters: 2

Parameter[0]: 0180cf90

Parameter[1]: 00000003

0:004> dt 0180cf90 combase!_STOWED_EXCEPTION_INFORMATION_HEADER*

0x070884a4

+0x000 Size : 0x20

+0x004 Signature : 0x53453031

The value of the Signature member (0x53453031) is converted to a string using .formats <value>.

0:006> .formats 0x53453031

Evaluate expression:

Hex: 53453031

Decimal: 1397043249

Octal: 12321230061

Binary: 01010011 01000101 00110000 00110001

Chars: SE01

Time: Wed Apr 09 04:34:09 2014

Float: low 8.46917e+011 high 0

Double: 6.90231e-315

The chars “SE01” map to a structure name of combase!_STOWED_EXCEPTION_INFORMATION_V1. It can be assumed that v2 uses a signature of “SE02” and a structure name of combase!_STOWED_EXCEPTION_INFORMATION_V2, and so on…
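If you want to do the same conversion outside the debugger, a minimal Python sketch follows. The helper names are hypothetical, and the mapping for versions beyond V1 simply follows the naming pattern assumed in the paragraph above.

```python
def signature_to_text(signature):
    """Decode a 32-bit Signature value into its four ASCII characters."""
    # The value is read most significant byte first, which is the same
    # order the Chars line of .formats uses.
    return signature.to_bytes(4, byteorder="big").decode("ascii")


def structure_name(signature):
    """Map a signature such as 'SE01' to the matching combase type name."""
    text = signature_to_text(signature)
    if not text.startswith("SE") or not text[2:].isdigit():
        raise ValueError("unexpected signature: %r" % text)
    # Assumption: version N always maps to ..._VN, as the text supposes.
    return "combase!_STOWED_EXCEPTION_INFORMATION_V%d" % int(text[2:])


print(signature_to_text(0x53453031))
print(structure_name(0x53453031))
```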

Now that we know the type, we can again use the values from .exr -1 to generate a dt command that will display each record. We use Parameter[0] as the address, and Parameter[1] as the count in the command. We add an “*” to the end of the type as this is an array of pointers to the type, not structures packed next to each other.

In this example, there are 3 pointers, so 3 records are displayed:

dt -a<Parameter[1]> <Parameter[0]> combase!_STOWED_EXCEPTION_INFORMATION_V1*

Note, there is no space between the -a and <Parameter[1]>.
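When triaging many dumps it can help to generate this command rather than type it. The sketch below is one way to do that in Python. build_dt_command is a hypothetical helper, and it passes both values through exactly as .exr -1 printed them (minus the leading zeros on the count) rather than reinterpreting them.

```python
def build_dt_command(parameter0, parameter1,
                     type_name="combase!_STOWED_EXCEPTION_INFORMATION_V1"):
    """Build the dt command from the two values printed by .exr -1.

    Both arguments are the hexadecimal strings shown by the debugger,
    for example "0180cf90" and "00000003".
    """
    if int(parameter1, 16) < 1:
        raise ValueError("the exception record holds no stowed exceptions")
    int(parameter0, 16)  # raises ValueError if the address is not hex
    count = parameter1.lstrip("0")
    # No space between -a and the count; a space before the address;
    # a trailing * because the array holds pointers to the structure.
    return "dt -a%s %s %s*" % (count, parameter0, type_name)


print(build_dt_command("0180cf90", "00000003"))
```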

0:004> .exr -1

ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

ExceptionCode: c000027b

ExceptionFlags: 00000001

NumberParameters: 2

Parameter[0]: 0180cf90

Parameter[1]: 00000003

0:004> dt -a3 0180cf90 combase!_STOWED_EXCEPTION_INFORMATION_V1*

[0] @ 0180cf90

---------------------------------------------

0x070884a4

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80131500

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000010100111100 (0x53c)

+0x010 ExceptionAddress : 0x7721ea23 Void

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 5

+0x01c StackTrace : 0x06f48418 Void

+0x010 ErrorText : 0x7721ea23 "?????"

[1] @ 0180cf94

---------------------------------------------

0x071ca274

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80131500

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000010100111100 (0x53c)

+0x010 ExceptionAddress : (null)

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 0x19

+0x01c StackTrace : 0x071c926c Void

+0x010 ErrorText : (null)

[2] @ 0180cf98

---------------------------------------------

0x071c922c

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80131534

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000010100111100 (0x53c)

+0x010 ExceptionAddress : (null)

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 9

+0x01c StackTrace : 0x071c8224 Void

+0x010 ErrorText : (null)

The ResultCode member is 80131500 in the first two records, and 80131534 in the third record. A quick use of the !error <code> command looks up the descriptions:

0:007> !error 80131500

Error code: (HRESULT) 0x80131500 (2148734208) - <Unable to get error code text>

0:007> !error 80131534

Error code: (HRESULT) 0x80131534 (2148734260) - <Unable to get error code text>

In this case, neither is a well-known error code. This is common, as API-specific error codes aren’t in the OS error lookup routines.
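Even when !error has no text for a code, the HRESULT itself tells you which component defined it. The Python sketch below splits a value the same way the HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros in winerror.h do. The small facility table is deliberately incomplete and only lists two facilities we are sure of.

```python
def split_hresult(hresult):
    """Return (severity, facility, code) for a 32-bit HRESULT."""
    severity = (hresult >> 31) & 0x1     # 1 means failure
    facility = (hresult >> 16) & 0x1FFF  # same mask as HRESULT_FACILITY
    code = hresult & 0xFFFF
    return severity, facility, code


# Partial table for illustration; winerror.h defines many more.
FACILITY_NAMES = {
    0x7: "FACILITY_WIN32",
    0x13: "FACILITY_URT (.NET runtime)",
}

for value in (0x80131500, 0x80131534, 0x80070057):
    severity, facility, code = split_hresult(value)
    name = FACILITY_NAMES.get(facility, "unknown facility")
    print("%08x: facility 0x%x (%s), code 0x%x" % (value, facility, name, code))
```

Both unknown codes in the records above carry facility 0x13, which points at the CLR rather than the OS. That is a good hint that the CLR Last Exception object is worth a look.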

Here are some examples of known error codes, found by looking at a random selection of dumps. Some are quite common (80004003, 80004005 and 80070057) while others are quite rare:

0:004> !error 80004003

Error code: (HRESULT) 0x80004003 (2147500035) - Invalid pointer

0:004> !error 80004005

Error code: (HRESULT) 0x80004005 (2147500037) - Unspecified error

0:005> !error 8000ffff

Error code: (HRESULT) 0x8000ffff (2147549183) - Catastrophic failure

0:004> !error 80070057

Error code: (HRESULT) 0x80070057 (2147942487) - The parameter is incorrect.

0:006> !error 80073db8

Error code: (HRESULT) 0x80073db8 (2147958200) - Loading the state store failed.

0:005> !error 800f1000

Error code: (HRESULT) 0x800f1000 (2148470784) - No installed components were detected.

0:006> !error 88985004

Error code: (HRESULT) 0x88985004 (2291683332) - A font file exists but could not be opened due to access denied, sharing violation, or similar error.

Original Call Stack

Regardless of whether the error code is known or unknown, it is useful to determine the location of the issue by viewing the call stack.

Symbol Pointers

If the ExceptionForm member has a value of 0y01, the structure’s union represents a call stack.

Unlike call stacks associated with threads, where the symbol pointers are placed throughout the stack next to local variables, these symbols pointers are packed tightly at the address specified in the StackTrace member. The dpS command is used to display the call stack.

It is important to include a limit (L) as the call stack is regularly longer than the default 10 rows displayed by dpS. The limit’s value is in the StackTraceWords member.
Note that capital S is used (dps vs dpS) because we want to omit the first column normally displayed by dps; the location of the symbol pointer is irrelevant.
If you aren’t using a debugger of the same bitness as the target, use ddS for StackTraceWordSize = 4, and dqS for StackTraceWordSize = 8.
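The three rules above can be folded into one small helper. This is a sketch in Python, not part of any debugger API. stack_dump_command and its debugger_pointer_size argument are our own names, and the count is written in hexadecimal to match the way StackTraceWords is used in the examples in this article.

```python
def stack_dump_command(stack_trace, word_size, words, debugger_pointer_size):
    """Pick dpS, ddS or dqS and build the command for a stowed stack.

    stack_trace, word_size and words are the StackTrace,
    StackTraceWordSize and StackTraceWords members of the record.
    debugger_pointer_size is 4 or 8, depending on the debugger in use.
    """
    if word_size not in (4, 8):
        raise ValueError("unexpected StackTraceWordSize: %r" % word_size)
    if word_size == debugger_pointer_size:
        name = "dpS"   # pointer-sized display already matches the target
    elif word_size == 4:
        name = "ddS"   # force 32-bit words
    else:
        name = "dqS"   # force 64-bit words
    return f"{name} 0x{stack_trace:0{word_size * 2}x} L{words:x}"


print(stack_dump_command(0x06F48418, 4, 5, 4))
print(stack_dump_command(0x06F48418, 4, 5, 8))
```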

0:004> dt -a3 0180cf90 combase!_STOWED_EXCEPTION_INFORMATION_V1*

[0] @ 0180cf90

---------------------------------------------

0x070884a4

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80131500

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000010100111100 (0x53c)

+0x010 ExceptionAddress : 0x7721ea23 Void

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 5

+0x01c StackTrace : 0x06f48418 Void

+0x010 ErrorText : 0x7721ea23 "?????"

...

0:007> dpS 0x06f48418 L5

7723f217 combase!RoOriginateLanguageException+0x3b

72e29bfd clr!SetupErrorInfo+0x1e1

72ef27e1 clr!MarshalNative::GetHRForException_WinRT+0x7d

71981170 Windows_UI_Xaml_ni+0x291170

72b02a36 clr!COMToCLRDispatchHelper+0x28

Unicode String Pointer

If the ExceptionForm member has a value of 0y10, the structure’s union represents an error message.

The call stack is (hopefully) contained within the Unicode string pointed at by the ErrorText member. As the text is defined by the caller, the existence of a call stack text isn’t guaranteed.

0:005> dt -a1 13f117e0 combase!_STOWED_EXCEPTION_INFORMATION_V1*

[0] @ 13f117e0

---------------------------------------------

0x0471f3c0

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 8000ffff

+0x00c ExceptionForm : 0y10

+0x00c ThreadId : 0y000000000000000000010101110100 (0x574)

+0x010 ExceptionAddress : 0x0de38f7c Void

+0x014 StackTraceWordSize : 0

+0x018 StackTraceWords : 0

+0x01c StackTrace : (null)

+0x010 ErrorText : 0x0de38f7c "System.Exception.. at Windows.UI.Xaml.VisualStateManager.GoToState(Control control, String stateName, Boolean useTransitions).. at MyBadApp.Common.LayoutAwarePage.InvalidateVisualState().. at MyBadApp.Common.LayoutAwarePage.WindowSizeChanged(Object sender, WindowSizeChangedEventArgs e)"
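To tie the two forms together, here is a hedged Python sketch that parses the 32-bit layout shown by dt (offsets 0x000 through 0x01c, 0x20 bytes in total). It assumes that ExceptionForm occupies the low 2 bits of the dword at +0x00c with ThreadId in the upper 30 bits, which is the usual compiler layout for such a bit field. The sample buffer is rebuilt from the values of the first record above, not read from a real dump.

```python
import struct

# Eight little-endian 32-bit fields: Size, Signature, ResultCode,
# the ExceptionForm/ThreadId bit field, then the 16-byte union.
RECORD_32 = struct.Struct("<IIIIIIII")


def parse_stowed_v1_32(buffer):
    """Parse one 32-bit _STOWED_EXCEPTION_INFORMATION_V1 record."""
    (size, signature, result_code, form_and_thread,
     word_10, word_14, word_18, word_1c) = RECORD_32.unpack_from(buffer)
    record = {
        "Size": size,
        "Signature": signature,
        "ResultCode": result_code,
        "ExceptionForm": form_and_thread & 0x3,  # assumed low 2 bits
        "ThreadId": form_and_thread >> 2,        # assumed upper 30 bits
    }
    if record["ExceptionForm"] == 0b01:
        # The union holds a packed array of symbol pointers.
        record.update(ExceptionAddress=word_10, StackTraceWordSize=word_14,
                      StackTraceWords=word_18, StackTrace=word_1c)
    elif record["ExceptionForm"] == 0b10:
        # The union holds a pointer to a Unicode error message.
        record["ErrorText"] = word_10
    return record


# Synthetic buffer built from the first record shown earlier.
sample = RECORD_32.pack(0x20, 0x53453031, 0x80131500, (0x53C << 2) | 0b01,
                        0x7721EA23, 4, 5, 0x06F48418)
for key, value in parse_stowed_v1_32(sample).items():
    print(key, hex(value))
```

This also explains why dt prints a garbage ErrorText for 0y01 records and a meaningless ExceptionAddress for 0y10 records: both members share offset +0x010, and only one of them is meaningful for a given form.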

CLR - Last Exception Object

Sometimes, the call stack retrieved from the record isn’t that useful. It may just be the call stack leading up to the RoOriginateLanguageException function call, or it might not relate to any of the code that the application author has written. In these cases, the CLR provides one more chance to understand the issue.

When the CLR throws an exception on a managed thread, the address of the exception object is kept in an (internal) per-thread variable. This address is what the !sos.pe (print exception) command reads to display the CLR Last Exception of a thread.

Note, if you use the Windows 8.1 SDK version of the Debugging Tools for Windows, SOS will be automatically loaded for you, including the download of any required DLLs. As such, it is highly suggested that you use the Windows 8.1 version.

Example #1

Looking at this example, we can see that there is a single record with an "Invalid pointer" error.

0:006> .exr -1

ExceptionAddress: 00007ffb87c46960 (twinapi_appcore!Microsoft::WRL::ComPtr<Windows::ApplicationModel::Core::UnhandledErrorDetectedEventArgs>::{dtor})

ExceptionCode: c000027b

ExceptionFlags: 00000001

NumberParameters: 2

Parameter[0]: 0000003fbc80c8a0

Parameter[1]: 0000000000000001

0:006> dt -a1 0000003fbc80c8a0 combase!_STOWED_EXCEPTION_INFORMATION_V1*

[0] @ 0000003f`bc80c8a0

---------------------------------------------

0x0000003f`bfc3c5b8

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80004003

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000100111001000 (0x9c8)

+0x010 ExceptionAddress : 0x00007ffb`981e1f1c Void

+0x018 StackTraceWordSize : 8

+0x01c StackTraceWords : 0x18

+0x020 StackTrace : 0x0000003f`bd7ac9c0 Void

+0x010 ErrorText : 0x00007ffb`981e1f1c "???"

0:006> !error 80004003

Error code: (HRESULT) 0x80004003 (2147500035) - Invalid pointer

This is a common call stack. A CLR exception is being marshaled to the unhandled error reporting sub-system of WinRT.

0:006> dpS 0x0000003f`bd7ac9c0 L18

00007ffb`98238d27 combase!RoOriginateLanguageException+0x57

00007ffb`71e0f926 mscorlib_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(Int32, System.String, IntPtr)+0xe6

00007ffb`71ff7084 mscorlib_ni!System.Runtime.InteropServices.WindowsRuntime.WindowsRuntimeMarshal.RoOriginateLanguageException(Int32, System.String, IntPtr)+0x44

00007ffb`71ff6b8d mscorlib_ni!System.Runtime.InteropServices.WindowsRuntime.WindowsRuntimeMarshal.ReportUnhandledError(System.Exception)+0x12d

00007ffb`885042f4 System_Runtime_WindowsRuntime_ni!System.Threading.WinRTSynchronizationContext+Invoker.InvokeCore()+0x73e04

00007ffb`7ee6b915 clr!ExceptionTracker::CallHandler+0xc5

00007ffb`7ee6b80b clr!ExceptionTracker::CallCatchHandler+0x7f

00007ffb`7ee6b728 clr!ProcessCLRException+0x2e6

00007ffb`9a30a7fd ntdll!RtlpExecuteHandlerForUnwind+0xd

00007ffb`9a2b36ba ntdll!RtlUnwindEx+0x366

00007ffb`7ee6d1c0 clr!ClrUnwindEx+0x40

00007ffb`7ee6d174 clr!ProcessCLRException+0x2b2

00007ffb`9a30a77d ntdll!RtlpExecuteHandlerForException+0xd

00007ffb`9a2b29fb ntdll!RtlDispatchException+0x19b

00007ffb`9a2b2668 ntdll!RtlRaiseException+0xf0

00007ffb`976c8384 KERNELBASE!RaiseException+0x68

There is a CLR Last Exception object and the exception code of it matches the record’s code:

0:006> !sos.pe

Exception object: 0000003fa2be4830

Exception type: System.NullReferenceException

Message: Object reference not set to an instance of an object.

InnerException: <none>

StackTrace (generated):

SP IP Function

0000003FBC80D190 00007FFB1F72FC18 MyBadApp!MyBadApp.Utilities.Authentication.GetAliasFromSecurityToken()+0x18

0000003FBC80D1D0 00007FFB1F72FAC2 MyBadApp!MyBadApp.MainPage.MainPage_AuthenticateUserCompleted(System.Object, System.EventArgs)+0x82

0000003FBC80D210 00007FFB1F72F5E5 MyBadApp!MyBadApp.MainPage+<AuthenticateUser_Async>d__0.MoveNext()+0x305

0000003FBC80EE60 00007FFB724F0B31 mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>b__4(System.Object)+0x4d61d1

0000003FBC80EE90 00007FFB88490523 System_Runtime_WindowsRuntime_ni!System.Threading.WinRTSynchronizationContext+Invoker.InvokeCore()+0x33

StackTraceString: <none>

HResult: 80004003

In this case, you can surmise that the System.NullReferenceException exception was thrown within the MyBadApp!MyBadApp.Utilities.Authentication.GetAliasFromSecurityToken() function, and that it was unhandled.

Extraction of the CLR Last Exception object can also sometimes be done in Visual Studio. When you do a Debug with Managed Only on the dump file, the Locals window sometimes contains a pseudo variable called $exception that represents the exception.

[Screenshot: Visual Studio Locals window]

The Text Visualizer of the StackTrace member allows you to see the call stack.

[Screenshot: Text Visualizer]

Example #2

Looking at another example, we can see that again there is a single record, this time with an "Unspecified error" exception code.

0:004> .exr -1

ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

ExceptionCode: c000027b

ExceptionFlags: 00000001

NumberParameters: 2

Parameter[0]: 0bdaf240

Parameter[1]: 00000001

0:004> dt -a1 0bdaf240 combase!PSTOWED_EXCEPTION_INFORMATION_V1

[0] @ 0bdaf240

---------------------------------------------

0x0af64034

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80004005

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000010010011110 (0x49e)

+0x010 ExceptionAddress : (null)

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 5

+0x01c StackTrace : 0x0af6302c Void

+0x010 ErrorText : (null)

0:006> !error 80004005

Error code: (HRESULT) 0x80004005 (2147500037) - Unspecified error

The call stack of the record suggests that this is associated with GetNavigationState:

0:004> dpS 0x0af6302c L5

72ec4e7d Windows_UI_Xaml!DirectUI::NavigationHistory::WritePageStackEntryToString+0x1f7fde

72ec4ef0 Windows_UI_Xaml!DirectUI::NavigationHistory::GetNavigationState+0x1f7ddf

72ccd0fa Windows_UI_Xaml!DirectUI::Frame::GetNavigationStateImpl+0x3a

72cccced Windows_UI_Xaml!DirectUI::FrameGenerated::GetNavigationState+0x2f

737d00eb Windows_UI_Xaml_ni+0x2400eb

But the CLR Last Exception object doesn’t have the same exception code as the record:

0:004> !sos.pe

Exception object: 02cb3cb8

Exception type: <Unknown>

Message: <Invalid Object>

InnerException: System.Runtime.InteropServices.COMException, Use !PrintException e2a09bc6 to see more.

StackTrace (generated):

SP IP Function

052DF6C0 07141094 MyBadApp!UNKNOWN+0x544

052DF8A4 73E0D17A mscorlib_ni!System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)+0x5e

052DF8B4 73E0D115 mscorlib_ni!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)+0x35

052DF8C0 071409DB MyBadApp!UNKNOWN+0xb3

052DF8CC 7458458F mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>b__4(System.Object)+0x33

052DF8D4 6F9EF994 System_Runtime_WindowsRuntime_ni!System.Threading.WinRTSynchronizationContext+Invoker.InvokeCore()+0x24

StackTraceString: <none>

HResult: 80131500

0:006> !error 80131500

Error code: (HRESULT) 0x80131500 (2148734208) - <Unable to get error code text>

It does however have a nested CLR Exception object that does have the same exception code as the record. It too has a call stack that indicates GetNavigationState is having an issue.

0:004> !PrintException /d 02caf968

Exception object: 02caf968

Exception type: System.Runtime.InteropServices.COMException

Message: <Invalid Object>

InnerException: <none>

StackTrace (generated):

SP IP Function

00000000 00000001 Windows_UI_Xaml_ni!Windows.UI.Xaml.Controls.Frame.GetNavigationState()+0x2

052DF778 071411B5 MyBadApp!UNKNOWN+0x1d

052DF788 07140BF3 MyBadApp!UNKNOWN+0xa3

StackTraceString: <none>

HResult: 80004005

Summary

The asynchronous and projected nature of Windows Store applications makes them significantly harder to debug than desktop applications. Knowing the error code and call stack is just the first step in understanding the root cause of a crash in a Store application. Hopefully this blog post has made those first steps easier to undertake, and has pointed you in the right direction.

The solutions to some of the more common issues have been talked about on episodes of Channel 9 Defrag Tools. These episodes show the code changes required to avoid the hang or crash:

↧

NTFS Misreports Free Space (Part 3)

May 8, 2014, 11:41 am


It’s been a while since my last post on this topic, and I wanted to take some time to update everyone on a cool new feature in Windows Server 2012 R2 and Windows 8.1. Today we declare part 1 and part 2 of this blog as obsolete - at least for Windows Server 2012 R2 and Windows 8.1 users.

The latest fsutil.exe now allows for the creation of an allocation report which summarizes how all of your disk space is being used by NTFS. This new fsutil.exe functionality is implemented through some new file system controls that only exist on Windows Server 2012 R2 and Windows 8.1, so the binary is not portable to previous versions of Windows.

USAGE: fsutil volume allocationreport X:

X: is the drive letter of an NTFS volume on your system.

Allocation Report

The allocation report gives a summary of total reserved, free, and allocated clusters. Reserved clusters are clusters that NTFS reserves just in case it needs to allocate space for a critical operation (like expanding a compressed file or extending the $MFT). If you’re experiencing insufficient disk space errors on a volume that has plenty of free space, the issue could be caused by opening many compressed NTFS files at the same time. Please refer to Understanding Ntfs Compression for more information on how to troubleshoot this.

Allocation report:
Total clusters              : 244100351 (999835037696 bytes)
Free clusters               : 232507563 (952350978048 bytes)
Reserved clusters           : 18352 (75169792 bytes)
Total allocated             : 47484059648 bytes
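As a quick sanity check on how these numbers relate, the following Python sketch derives the cluster size from the report and recomputes the byte totals. It uses only the figures printed above. Whether reserved clusters are counted as free or as allocated is not something this arithmetic settles.

```python
# Figures copied from the allocation report above.
total_clusters, total_bytes = 244100351, 999835037696
free_clusters, free_bytes = 232507563, 952350978048
reserved_clusters, reserved_bytes = 18352, 75169792
reported_allocated_bytes = 47484059648

# The cluster size is not printed, but it falls out of the first line.
cluster_size, remainder = divmod(total_bytes, total_clusters)
print("cluster size:", cluster_size, "bytes, remainder", remainder)

# Every cluster count converts to its byte figure with the same factor.
print(free_clusters * cluster_size == free_bytes)
print(reserved_clusters * cluster_size == reserved_bytes)

# "Total allocated" is everything that is not free.
allocated_bytes = (total_clusters - free_clusters) * cluster_size
print(allocated_bytes == reported_allocated_bytes)
```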

System Files

If you suspect that there’s something you can’t see that’s taking up disk space, check the System Files section to see how much disk space is used by the system. In this example, I have 884,703,232 bytes in use by NTFS metadata, and the breakdown of each system file’s usage is outlined below. For details on each system file type, refer to http://blogs.technet.com/b/askcore/archive/2009/12/30/ntfs-metafiles.aspx.

System files                : Count: 29. Total allocated: 884703232 bytes.
    $Mft                    : File ID 0x0001000000000000. Total allocated: 238063616 bytes.
    $MftMirr                : File ID 0x0001000000000001. Total allocated: 4096 bytes.
    $LogFile                : File ID 0x0002000000000002. Total allocated: 67108864 bytes.
    $Volume                 : File ID 0x0003000000000003. Total allocated: 0 bytes.
    $AttrDef                : File ID 0x0004000000000004. Total allocated: 4096 bytes.
    Root folder             : File ID 0x0005000000000005. Total allocated: 8192 bytes.
    $Bitmap                 : File ID 0x0006000000000006. Total allocated: 30515200 bytes.
    $Boot                   : File ID 0x0007000000000007. Total allocated: 8192 bytes.
    $BadClus                : File ID 0x0008000000000008. Total allocated: 0 bytes.
    $Secure                 : File ID 0x0009000000000009. Total allocated: 1855488 bytes.
    $UpCase                 : File ID 0x000a00000000000a. Total allocated: 131072 bytes.
    $Extend                 : File ID 0x000b00000000000b. Total allocated: 0 bytes.
    $ObjId                  : File ID 0x0001000000000019. Total allocated: 24576 bytes.
    $Quota                  : File ID 0x0001000000000018. Total allocated: 0 bytes.
    $Reparse                : File ID 0x000100000000001a. Total allocated: 786432 bytes.
    $UsnJrnl                : File ID 0x0002000000012f66. Total allocated: 34144256 bytes.
    $RmMetadata             : File ID 0x000100000000001b. Total allocated: 0 bytes.
    $Repair                 : File ID 0x000100000000001c. Total allocated: 94371840 bytes.
    $Txf                    : File ID 0x000100000000001e. Total allocated: 4096 bytes.
    $TxfLog                 : File ID 0x000100000000001d. Total allocated: 4096 bytes.
    $Tops                   : File ID 0x000100000000001f. Total allocated: 396623872 bytes.
    $TxfLog.blf             : File ID 0x0001000000000020. Total allocated: 65536 bytes.
    Other system files      : Count: 4. Total allocated: 0 bytes.
    Other system files under $Txf folder:
        Count               : 1
        Total allocated     : 8192 bytes.
    Other system files under $TxfLog folder:
        Count               : 2
        Total allocated     : 20971520 bytes.

System Volume Information

If the usage in System Volume Information is higher than expected, the issue is likely to be caused by storage of diff areas for VSS volume shadow copies. Deleting the volume shadow copies with VSSAdmin or Diskshadow will return the free space. System Volume Information is also the home of the chunk store used by NTFS deduplication.

System Volume Information   : Total allocated: 5366915072 bytes.
    Files                   : Count: 18. Total allocated: 5366882304 bytes.
    Folders                 : Count: 7. Total allocated: 32768 bytes.

User Folders

It costs something to maintain the folder structure of a volume, and the user folders section summarizes the overall cost. Within this section is also a summary of how many NTFS compressed folders exist. As you can see below, I have 145 folders with a compressed attribute flag but the total number of compressed bytes is zero. I puzzled over the idea of zero compressed bytes until I discovered that this measurement is of how many compressed bytes exist in the context of folder indexes, and indexes are never compressed. Only user data streams are compressed natively by NTFS.

User folders                : Count: 23101. Total allocated: 77889536 bytes.
    Default streams         : 4689
        Allocated           : 4689
        Total allocated     : 77885440 bytes.
    Named streams           : 7
        Allocated           : 0
        Total allocated     : 0 bytes.
    Local metadata streams : 95566
        Allocated           : 1
        Total allocated     : 4096 bytes.
Within these folders there are:
    Compressed              : 145
        Total allocated     : 0 bytes
        Total size          : 0 bytes.
        Savings             : 0.00 %
    Sparse                  : 0
        Total allocated     : 0 bytes
        Total size          : 0 bytes.
        Savings             : 0.00 %
    Encrypted               : 0
        Total allocated     : 0 bytes

    With named streams      : 7
        Compressed          : 0
        Sparse              : 0
        Encrypted           : 0
    With no allocation      : 18412

User Files

In the user files section, we have a total of all user files and the compression statistics to show how much space is being saved by native NTFS compression. There is also a nice summary of alternate named stream (ANS) usage. ANS allocations do not show up in DIR or Explorer, so this is a quick and easy way to see exactly how your named streams are affecting overall disk usage. On my volume, I had 3115 files with named streams and zero bytes were allocated. This seems to be another paradox, but there’s a logical explanation for what’s happening. If a file has a named stream and the stream size is small enough for it to be resident, then the stream lives in the file’s MFT record, which this report accounts for as part of $Mft (File ID 0x0001000000000000, total allocated 238063616 bytes).

User files                  : Count: 94128. Total allocated: 41154551808 bytes.
    Default streams         : 94128
        Allocated           : 72123
        Total allocated     : 41087229952 bytes.
    Named streams           : 4637
        Allocated           : 4562
        Total allocated     : 66740224 bytes.
    Local metadata streams : 333248
        Allocated           : 142
        Total allocated     : 581632 bytes.
Within these files there are:
    Compressed              : 2006
        Total allocated     : 374972416 bytes
        Total size          : 816416626 bytes.
        Savings             : 54.07 %
    Sparse                  : 1519
        Total allocated     : 1572864 bytes
        Total size          : 273374082 bytes.
        Savings             : 99.42 %
    Encrypted               : 0
        Total allocated     : 0 bytes

    With named streams      : 3115
        Compressed          : 0
        Sparse              : 0
        Encrypted           : 0
    With no allocation      : 20485
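
As a sanity check, the four section totals in this sample report add up to the "Total allocated" figure in the summary, and the Savings percentages can be reproduced from the allocated and total sizes. The Python sketch below uses the numbers from the sample output. The savings formula (1 - allocated / size) is an assumption, inferred only from the fact that it reproduces the reported values.

```python
# Section totals copied from the sample report (bytes).
sections = {
    "System files": 884703232,
    "System Volume Information": 5366915072,
    "User folders": 77889536,
    "User files": 41154551808,
}
total_allocated = 47484059648  # "Total allocated" from the summary


def savings_percent(allocated: int, size: int) -> float:
    """Assumed formula: share of the logical size that needs no allocation."""
    if size == 0:
        return 0.0
    return round((1 - allocated / size) * 100, 2)


section_sum = sum(sections.values())
compressed_savings = savings_percent(374972416, 816416626)
sparse_savings = savings_percent(1572864, 273374082)

print(f"sum of sections : {section_sum}")
print(f"total allocated : {total_allocated}")
print(f"compressed      : {compressed_savings} %")
print(f"sparse          : {sparse_savings} %")
```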

As you can see, this new functionality in fsutil makes it easier and faster to determine what is using space on an NTFS volume.

Understanding ARM Assembly Part 2

May 15, 2014, 11:10 am

My name is Marion Cole, and I am a Sr. Escalation Engineer in Microsoft Platforms Serviceability group. This is Part 2 of my series of articles about ARM assembly. In part 1 we talked about the processor that is supported. Here we are going to talk about how Windows utilizes that ARM processor.

As we discussed in part 1, Windows runs on ARMv7-A with NEON. We also discussed the CPSR register in part 1. There are a few bits that are important in the CPSR. The first one is the Endian State bit:

Bit(s)	Field
31	N
30	Z
29	C
28	V
27	Q
26-25	IT
24	J
23-20	Reserved
19-16	GE
15-10	IT
9	E
8	A
7	I
6	F
5	T
4-0	M

Bit 9 (the E bit) indicates the EndianState. This bit should always be a 0 because Windows only runs in Little-Endian state. So if you get a dump, and see the CPSR bit 9 is set then you have a problem. Here is an example from the debugger:

1: kd> r

r0=00000001 r1=00000001 r2=00000000 r3=00000000 r4=e1074044 r5=c555b580

r6=00000001 r7=e104ca39 r8=00000001 r9=00000000 r10=e9bf06c7 r11=d5f1ea08

r12=e16b213c sp=d5f1e9b0 lr=e0f0fe2f pc=e0fdebd0 psr=00000133 ----- Thumb

nt!DbgBreakPointWithStatus:

e0fdebd0 defe __debugbreak

1: kd> .formats 00000133

Evaluate expression:

Hex: 00000133

Decimal: 307

Octal: 00000000463

Binary: 00000000 00000000 00000001 00110011 <-- Bit 9 is 0. Note that the rightmost bit is bit 0.

Chars: ...3

Time: Wed Dec 31 18:05:07 1969

Float: low 4.30199e-043 high 0

Double: 1.51678e-321

So how could Bit 9 ever be a 1? The SETEND instruction in the ARM ISA allows even user mode code to change the current endianness. Doing so is dangerous for an application and is discouraged. If an exception is generated while in big-endian mode the behavior is unpredictable, but may lead to an application fault (user mode) or bugcheck (kernel mode).

The next bit we are going to discuss is bit 5, the Thumb bit (the T bit). This should be a 1 if executing Thumb instructions. So let’s discuss the different instruction sets the ARM processor has.

ARMv7 has five different ISAs for programming.

ARM - basic ARM instruction set including conditional execution.
Thumb - This mode uses a 16 bit instruction encoding to reduce code footprint. It has limitations with respect to register access and some system instructions aren't implemented for Thumb.
Thumb2 - This extension of the Thumb instruction set adds 32 bit opcode encodings and adds enough facilities to author an entire OS. Support for Thumb2 is guaranteed in the ARMv7 architecture revision.
Jazelle - Java code interpretation.
ThumbEE - a limited version of Thumb2 intended as a code generation target for JIT scenarios.

Windows requires Thumb2 support. The advantage of using Thumb2 is that the combination of 16 and 32 bit opcodes along with some other ISA improvements allows for saving 20-30% code footprint at a 1-2% performance loss. In addition the cache hit rate is improved due to increased density of the code.

CPSR Bit 5 should always be 1 as Windows only runs in Thumb2 mode. Also note that this bit is combined with bit 24, the Java state bit (the J bit). Bit 24 should always be 0 when running Windows.

The next bits to discuss are the CPU Mode bits 4-0 (M). Windows only runs in two modes: User Mode (10000) and Supervisor Mode (10011). If bits 4-0 hold any other value, an exception will be raised. The kernel runs in Supervisor Mode, and applications run in User Mode.
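
To make the bit positions concrete, here is a small Python sketch that pulls the E, T, J and mode fields out of a CPSR value. The function and constant names are made up for illustration; the bit positions and the two mode encodings are the ones described above, and the sample value is the psr from the register dump.

```python
# CPSR bit positions discussed above.
E_BIT = 9         # endianness: 0 = little-endian
T_BIT = 5         # Thumb state
J_BIT = 24        # Java state
MODE_MASK = 0x1F  # bits 4-0

# The two modes Windows uses, per the text above.
MODES = {0b10000: "User", 0b10011: "Supervisor"}


def decode_cpsr(psr: int) -> dict:
    """Pull out the fields of interest from a 32-bit CPSR value."""
    mode = psr & MODE_MASK
    return {
        "big_endian": bool((psr >> E_BIT) & 1),
        "thumb": bool((psr >> T_BIT) & 1),
        "jazelle": bool((psr >> J_BIT) & 1),
        "mode": MODES.get(mode, f"unexpected ({mode:05b})"),
    }


fields = decode_cpsr(0x00000133)  # psr value from the register dump above
print(fields)
```

For the sample value this reports little-endian, Thumb state, J clear, and Supervisor mode, which is what we expect for a kernel dump.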

That brings up another point. How does the processor switch between User Mode and Supervisor Mode? On x86 processors this is done via SYSENTER/SYSEXIT, and on x64 processors via SYSCALL/SYSRET. On ARM it is done via SVC, the Supervisor Call. This call is made to have the kernel provide a service. When invoked in ntdll.dll the service number is in r12. Here is an example:

1: kd> u ntdll!ZwQueryVolumeInformationFile

771e8674    f04f0c8d    mov   r12,#0x8D
771e8678    df01        svc   #1
771e867a    4770        bx    lr

When SVC is called the previous CPSR register is saved in the SPSR register (the Saved Program Status Register), and the pc register is saved in the lr register (the Link Register). The processor then changes to kernel mode (0x13) with interrupts disabled. The lr and SPSR values are used to generate a return from the SVC call. When an exception is taken the stack is untouched, the previous mode's SP and LR are left alone, the new mode's SP becomes active, the exception address is stored in the new mode's LR, and the previous CPSR is copied into the new mode's SPSR. When returning from the exception the SPSR is copied back into the CPSR, and execution returns to the address in LR.

Data Types

ARMv7 processors support four data types from 8 bits to 64 bits, but the definitions are different from the ones in Windows. In Windows a word is defined as 16 bits; on ARM a word is 32 bits.

Byte	8 bits
HalfWord	16 bits
Word	32 bits
DoubleWord	64 bits

These can be signed or unsigned.

Unsigned 32 bit integer
Signed 32 bit integer
Unsigned 16 bit integer (zero extended)
Signed 16 bit integer (sign extended)
Unsigned 8 bit integer (zero extended)
Signed 8 bit integer (sign extended)
Two 16 bit integers
Four 8 bit integers
The upper or lower 32 bits of a 64 bit signed value whose other half is in another register
The upper or lower 32 bits of a 64 bit unsigned value whose other half is in another register
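
The difference between zero extension and sign extension is easiest to see with a value whose top bit is set. The following Python sketch (helper names are illustrative, not part of any ARM or Windows API) widens a halfword and a byte to 32 bits both ways, which is roughly what an unsigned load such as ldrh does compared with a signed load such as ldrsh.

```python
def zero_extend(value: int, bits: int) -> int:
    """Widen an unsigned value to 32 bits: the upper bits become 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Widen a signed value to 32 bits: the upper bits copy the sign bit."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    # Subtract 2**bits when the sign bit is set, then wrap to 32 bits.
    return ((value ^ sign) - sign) & 0xFFFFFFFF


halfword = 0xFFFE  # -2 as a signed 16-bit value, 65534 as unsigned
byte = 0x80        # -128 as a signed 8-bit value, 128 as unsigned

print(f"halfword zero extended: {zero_extend(halfword, 16):08x}")
print(f"halfword sign extended: {sign_extend(halfword, 16):08x}")
print(f"byte zero extended    : {zero_extend(byte, 8):08x}")
print(f"byte sign extended    : {sign_extend(byte, 8):08x}")
```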

Memory Model

The ARM memory model is much like other architectures that we have supported. ARM has a "weak ordering" memory model. This means that two memory operations that occur in program order, may be observed from another processor or DMA controller in any order. When an instruction stalls because it is waiting for the result of a preceding instruction, the core can continue executing subsequent instructions that do not need to wait for the unmet dependencies. There are three instructions that allow you to configure memory barriers:

ISB - Instruction Synchronization Barrier
DMB - Data Memory Barrier
DSB - Data Synchronization Barrier

An excellent blog article on this topic with an explanation of these three instructions is available at:

http://blogs.arm.com/software-enablement/594-memory-access-ordering-part-3-memory-access-ordering-in-the-arm-architecture/

Alignment and Atomicity

Windows enables the ARM hardware to handle misaligned integer accesses transparently; however, there are still several situations where alignment faults may be generated on misaligned accesses. Follow the rules below:

Halfword and word-sized integer loads and stores do NOT need to be aligned (hardware will handle them efficiently and transparently)
Floating-point loads and stores SHOULD be aligned (the kernel will handle them transparently, but with significant overhead)
Load/store double (LDRD/STRD) and multiple (LDM/STM) operations SHOULD be aligned (the kernel will handle most of them transparently, but with significant overhead)
All uncached memory accesses MUST be aligned, even for integer accesses (you will get an alignment fault)

Note that the memcpy() implementation provided by the Windows CRT presumes the copies are to/from cached memory, and thus leverages the hardware’s support for transparently handling misaligned integer reads and writes with little penalty. This means that memcpy() CANNOT be used when the source or destination is uncached memory. Instead, use the separate function _memcpy_strict_align(), which only performs aligned accesses.

There are two types of atomicity supported: Single-copy and Multi-copy.

Single-copy atomicity

There are rules around atomicity that are intended to specify the cases where memory access behavior in relation to program order can be guaranteed. So certain accesses (aligned word accesses) are guaranteed by the architecture to return sensible results even if other threads are accessing the same memory. These rules are necessary in order to guarantee that the programmer (and compiler) can rely on correct behavior of memory in the majority of the cases.

Multi-copy atomicity

These rules are similar, but relate specifically to multi-processing environments in which several observers may be using a particular item in memory. To be able to guarantee correct behavior you need to be able to assume that memory behaves in a consistent way.

More on Single-Copy and Multi-Copy atomicity in the ARM Architecture Reference Manual available from http://infocenter.arm.com/help/index.jsp.

Common Assembly Instructions

We are going to cover some common Thumb2 instructions.

ldr r0, [r4] (ldrex, ldrh, ldrb, ldrd, ldrexd, etc.)
This is the Load Register instruction. In the above example r0 is the destination register, and r4 is the base register. This will take the address that is in r4, go to that memory location and copy the contents of that memory location into r0.
str r2, [r4, #0x08] (strex, strh, strexh, strd, etc.)
This is the Store Register instruction. In the above example r2 is the source register, and r4 is the base register. This will take the address in r4 and add 8 to that address. It will take the value that is in r2, and store it at the address pointed to by r4 plus 8.
mov r1, r4 (movs – sets the condition codes)
This is the Move instruction. In the above example r1 is the destination register, and r4 is the source register. It will do the same thing as x86 in that it just copies what is in r4 to r1. It can optionally update the condition flags based on the value.
adds r1, r5, #0 (add)
This is the Add instruction. In the above example r1 is the destination register. This will take the value that is in r5 and add 0 to it. It will store the result in r1. Because this has an (s) at the end of add it will update the flags.
sub sp, sp, #0x14 (subs)
This is the Subtract instruction. In the above example sp is the destination. This will take the value that is in sp, subtract 14h from it, and store the result in sp. Because this does not have an (s) at the end it will not update the flags.
push {r4-r9, r11, lr}
This is the Push instruction. It can push multiple registers to the stack in one instruction. You can specify a consecutive range of registers by separating the first and last register with "-", as seen above. You can also list registers individually, separated by ",". This operates the same as an x86 processor in that it subtracts 4 from the stack pointer for each register pushed.
pop {r4-r9, r11, lr}
This is the Pop instruction. It pulls values from the stack back into the registers you list. The register list works just like the one for the push instruction. This operates the same as an x86 processor in that it adds 4 to the stack pointer for each register popped.
b?? |MyApp!main+0x60 (00b81348)|
This is the Branch instruction. This is equivalent to the jmp instruction in x86. However it has several conditional variants such as beq, bge, etc.
bx r3
This is the Branch and Exchange instruction. This causes a branch to an address and instruction set specified by a register (r3 here). This can do a long branch anywhere in the 32-bit address range.
bl |MyApp!Function (00b815c4)|
This is the Branch with Link instruction. This calls a subroutine at a PC-relative address. This will update the lr register.
blx r3
This is the Branch with Link and Exchange. This calls a subroutine at an address and instruction set specified by a register (r3 here). This will do a long branch anywhere in the 32-bit address range, and update the lr register.
dmb
This is the Data Memory Barrier instruction. It is a memory barrier that ensures the ordering of observations of memory accesses.
cmp r3, #0
This is the Compare instruction. It will subtract 0 from the value in r3, and set the flags accordingly.
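
To illustrate the stack pointer arithmetic for push and pop with a register list, here is a toy model in Python. It is only a sketch: memory is a plain dictionary, the register contents are made up, and the register list is assumed to be written in ascending order as in the examples above. The lowest register ends up at the lowest address.

```python
def expand(reg_list: str) -> list:
    """Expand a list such as 'r4-r9, r11, lr' into individual register names."""
    regs = []
    for item in (part.strip() for part in reg_list.split(",")):
        if "-" in item:
            first, last = item.split("-")
            regs += [f"r{n}" for n in range(int(first[1:]), int(last[1:]) + 1)]
        else:
            regs.append(item)
    return regs


def push(regs: dict, memory: dict, reg_list: str) -> None:
    names = expand(reg_list)
    regs["sp"] -= 4 * len(names)        # 4 bytes per register
    for i, name in enumerate(names):    # lowest register at lowest address
        memory[regs["sp"] + 4 * i] = regs[name]


def pop(regs: dict, memory: dict, reg_list: str) -> None:
    names = expand(reg_list)
    base = regs["sp"]
    for i, name in enumerate(names):
        regs[name] = memory[base + 4 * i]
    regs["sp"] = base + 4 * len(names)


regs = {f"r{n}": n for n in range(13)}  # made-up register contents
regs.update(sp=0x1000, lr=0xCAFE)
memory = {}

push(regs, memory, "r4-r9, r11, lr")
sp_after_push = regs["sp"]              # 8 registers * 4 bytes lower
regs["r4"] = 0                          # clobber a saved register
pop(regs, memory, "r4-r9, r11, lr")
print(hex(sp_after_push), hex(regs["sp"]), regs["r4"])
```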

In ARM addressing the base register points to the memory being referenced. The offset can be an immediate or an index register. The memory at the base register’s address plus the offset is accessed. The base register remains unchanged. Example:

ldr r5, [r9, #0x1c]

This will take the value that is in r9 and add 0x1C to it, go to that memory location, retrieve the value there, and store it in r5. The value in r9 remains the same.

ARM also has some interesting options for indexing: Pre-Indexed Addressing, Offset Addressing, and Post-Indexed Addressing.

Pre-Indexed Addressing. The value of the base register is first modified by the offset, then the memory pointed to by the modified base register is accessed. Example:

str r2, [r4, #0x4]!

The "!" at the end of the instruction is not a mistake. This is how you tell it is a Pre-Indexed address.

Offset Addressing. The offset is added to the base register, and the result is used as the address for the memory access. If the "!" was not there in the previous example then this would just be Offset addressing. Example:

str r2, [r4, #0x4]

Post-Indexed Addressing. The memory address in the base register is accessed, then afterwards the base register is modified by the offset value. Example:

ldr pc, [sp], #0x1c

Notice the "!" is missing here. Also notice the offset is outside the "[ ]". That is how you can identify a Post-Indexed address.
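
The three forms differ only in which address is accessed and whether the base register is written back. A minimal Python model of that behavior follows; the function name and the starting value of r4 are illustrative assumptions.

```python
def access(regs: dict, base: str, offset: int, mode: str) -> int:
    """Return the address accessed, updating the base register if required."""
    if mode == "offset":   # [rN, #imm]  : base register unchanged
        return regs[base] + offset
    if mode == "pre":      # [rN, #imm]! : base updated first, then used
        regs[base] += offset
        return regs[base]
    if mode == "post":     # [rN], #imm  : base used first, then updated
        address = regs[base]
        regs[base] += offset
        return address
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"r4": 0x2000}
results = []
for mode in ("offset", "pre", "post"):
    address = access(regs, "r4", 4, mode)
    results.append((mode, address, regs["r4"]))
    print(f"{mode:6} accessed {address:#x}, r4 is now {regs['r4']:#x}")
```

Starting from r4 = 0x2000, all three accesses touch 0x2004, but r4 ends up at 0x2000, 0x2004 and 0x2008 respectively after each step.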

Part 3 of this series will cover Calling Conventions, Prolog/Epilog, and Rebuilding the stack.

Debugging a Windows 8.1 Store App Crash Dump (Part 2)

May 28, 2014, 3:51 pm

In Part 1, we covered the debugging of a Windows Store Application crash dump that contains a Stowed Exceptions Version 1 (SE01) structure.

This post continues on from Part 1, covering the changes introduced in March 2014. These Windows Updates changed the way language exceptions (RoOriginateLanguageException) are recorded in Windows Store Application crash dump files. The new Stowed Exception Version 2 (SE02) structure adds additional fields that directly associate the exception with a language exception object.

You’ll recall from Part 1 that the CLR Exception is loosely associated with the Stowed Exception v1 structure by comparing the HRESULT of the Stowed Exception with the HRESULT of the last CLR Exception on the default thread (the exception record thread). V2 makes this relationship direct. You’ll discover that the Last CLR Exception no longer exists in the v2 dump and that it must be referenced directly by the address stored in the Stowed Exception.

The direct association was added to v2 to also aid triage dump carving (done by Windows Error Reporting). It allows WER to explicitly add the memory associated with the relevant Language (CLR) Exception. This eliminates the risk of the garbage collector freeing the memory associated with the last CLR Exception before the dump is taken. This also helps identify which exception is related to the final crash, which can be difficult when there are multiple exceptions in the dump.

Debug Steps

The steps to debug a v2 structure are similar to v1. You first determine the number of stowed exception entries (.exr -1), look at the header to determine the version, display the array of stowed exceptions cast to the correct type (dt -aN …), and then extract the native stack (dpS) or text (du) for each entry.

Instead of then comparing the HRESULT to the last CLR Exception (!sos.pe), you use the Nested Exception member to get to the innermost CLR Exception. Due to the way object pointers are handled by the CLR, the address is a CCW (COM Callable Wrapper) address, not a CLR object address. To get the CLR object’s address, you use the !sos.dumpccw command. This provides the CLR object address, which can be passed to the !sos.pe command to display the exception.

OK, let’s do all of that, showing the commands and data fields of note along the way. (A lot of this is similar to the previous post.)

If not done already, set your symbol path to the Microsoft Public Symbol server:

0:003> .sympath SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

Symbol search path is: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

Expanded Symbol search path is: srv*c:\Symbols*http://msdl.microsoft.com/download/symbols

************* Symbol Path validation summary **************

Response Time (ms) Location

Deferred SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

Force the load of the symbols using the .reload /f command:

0:003> .reload /f

...

To determine the version, run .exr -1 and display the header at the address in Parameter[0]:

dt <Parameter[0]> combase!STOWED_EXCEPTION_INFORMATION_HEADER*

Here’s an example:

0:003> .exr -1

ExceptionAddress: 7575b152 (combase!RoFailFastWithErrorContextInternal+0x0000010b)

ExceptionCode: c000027b

ExceptionFlags: 00000001

NumberParameters: 2

Parameter[0]: 00c6d3d0

Parameter[1]: 00000002

0:003> dt 00c6d3d0 combase!_STOWED_EXCEPTION_INFORMATION_HEADER*

0x07a690dc

+0x000 Size : 0x28

+0x004 Signature : 0x53453032

The value of the Signature member (0x53453032) is converted to a string using .formats <value>.

0:003> .formats 0x53453032

Evaluate expression:

Hex: 53453032

Decimal: 1397043250

Octal: 12321230062

Binary: 01010011 01000101 00110000 00110010

Chars: SE02

Time: Wed Apr 09 04:34:10 2014

Float: low 8.46917e+011 high 0

Double: 6.90231e-315

“SE01” maps to combase!STOWED_EXCEPTION_INFORMATION_V1
“SE02” maps to combase!STOWED_EXCEPTION_INFORMATION_V2
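
If you want to do the same conversion outside the debugger, the signature is just four ASCII bytes. The Python sketch below reproduces the Chars line of .formats and looks up the matching type; the helper and dictionary names are illustrative, and the mapping is the one listed above.

```python
# Signature-to-type mapping taken from the list above.
STOWED_TYPES = {
    "SE01": "combase!STOWED_EXCEPTION_INFORMATION_V1",
    "SE02": "combase!STOWED_EXCEPTION_INFORMATION_V2",
}


def signature_chars(signature: int) -> str:
    """Render a 32-bit signature the way the Chars line of .formats does."""
    return signature.to_bytes(4, "big").decode("ascii")


chars = signature_chars(0x53453032)  # Signature value from the header above
print(chars, "->", STOWED_TYPES[chars])
```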

Now that we know the type, we can again use the values from .exr -1 to generate a dt command that will display each record. We use the Parameter[0] as the address, and Parameter[1] as the count in the command. We add a “P” to the start of the type as this is an array of pointers to the type, not structures packed next to each other.

In this example, there are 2 pointers, so 2 records are displayed:

dt -a<Parameter[1]><Parameter[0]> combase!PSTOWED_EXCEPTION_INFORMATION_V2

Note, there is no space between the -a and <Parameter[1]>.
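
When triaging many dumps it can be handy to generate this command from the .exr -1 output. The Python sketch below is one possible way to do that; the function name is hypothetical, and it assumes the parameters are printed in hex (as in the example above) and that the count is small enough for decimal and hex to agree.

```python
import re

# Relevant lines from the .exr -1 output shown earlier.
EXR_OUTPUT = """\
ExceptionCode: c000027b
NumberParameters: 2
Parameter[0]: 00c6d3d0
Parameter[1]: 00000002
"""


def build_dt_command(exr_text: str, version: int = 2) -> str:
    """Turn the two parameters shown by .exr -1 into a dt -a command."""
    params = dict(re.findall(r"Parameter\[(\d)\]:\s*([0-9a-fA-F]+)", exr_text))
    address = params["0"]
    # Assumption: the debugger prints the count in hex; small counts such as
    # 2 read the same in decimal, so no radix prefix is added here.
    count = int(params["1"], 16)
    return f"dt -a{count} {address} combase!PSTOWED_EXCEPTION_INFORMATION_V{version}"


command = build_dt_command(EXR_OUTPUT)
print(command)
```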

0:003> dt -a2 00c6d3d0 combase!PSTOWED_EXCEPTION_INFORMATION_V2

[0] @ 00c6d3d0

---------------------------------------------

0x07a690dc

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80004001

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000100000001111 (0x80f)

+0x010 ExceptionAddress : 0x756b3bff Void

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 3

+0x01c StackTrace : 0x0619a368 Void

+0x010 ErrorText : 0x756b3bff "???"

+0x020 NestedExceptionType : 0x314f454c

+0x024 NestedException : 0x063a95d4 Void

[1] @ 00c6d3d4

---------------------------------------------

0x0619b6a8

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80004001

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000000000000000 (0)

+0x010 ExceptionAddress : (null)

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 0x3f

+0x01c StackTrace : 0x0639bf4c Void

+0x010 ErrorText : (null)

+0x020 NestedExceptionType : 0

+0x024 NestedException : (null)

Native Call Stack

Regardless of whether the error code (ResultCode) is known or unknown, it is useful to determine the location of the (native) issue by viewing the (native) call stack.

Symbol Pointers

If the ExceptionForm member has a value of 0y01, the structure’s union represents a call stack.

Unlike call stacks associated with threads, where the symbol pointers are placed throughout the stack next to local variables, these symbol pointers are packed tightly at the address specified in the StackTrace member. Think of it as an array of EBP addresses. The dpS command is used to display the call stack.

It is important to include a limit (L) as the call stack is regularly longer than the default 10 rows displayed by dpS. The limit’s value is in the StackTraceWords member.
Note that capital S is used (dps vs dpS) because we want to omit the first column normally displayed by dps; the location of the symbol pointer is irrelevant.
If you aren’t using the same bitness debugger as the target’s bitness, use ddS for StackTraceWordSize = 4 (32-bit), and dqS for StackTraceWordSize = 8 (64-bit).

0:003> dt -a2 00c6d3d0 combase!PSTOWED_EXCEPTION_INFORMATION_V2

[0] @ 00c6d3d0

---------------------------------------------

0x07a690dc

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 80004001

+0x00c ExceptionForm : 0y01

+0x00c ThreadId : 0y000000000000000000100000001111 (0x80f)

+0x010 ExceptionAddress : 0x756b3bff Void

+0x014 StackTraceWordSize : 4

+0x018 StackTraceWords : 3

+0x01c StackTrace : 0x0619a368 Void

+0x010 ErrorText : 0x756b3bff "???"

+0x020 NestedExceptionType : 0x314f454c

+0x024 NestedException : 0x063a95d4 Void

...

0:003> dpS 0x619a368 L3

756ea9f1 combase!RoOriginateLanguageException+0x3b

63b2b04d clr!SetupErrorInfo+0x1e1

63bf4511 clr!MarshalNative::GetHRForException_WinRT+0x7d

Unicode String Pointer

If the ExceptionForm member has a value of 0y10, the structure’s union represents an error message.

The call stack is (hopefully) contained within the Unicode string pointed at by the ErrorText member. As the text is defined by the caller, the existence of a call stack text isn’t guaranteed.

0:003> dt -a1 13f117e0 combase!PSTOWED_EXCEPTION_INFORMATION_V1

[0] @ 13f117e0

---------------------------------------------

0x0471f3c0

+0x000 Header : _STOWED_EXCEPTION_INFORMATION_HEADER

+0x008 ResultCode : 8000ffff

+0x00c ExceptionForm : 0y10

+0x00c ThreadId : 0y000000000000000000010101110100 (0x574)

+0x010 ExceptionAddress : 0x0de38f7c Void

+0x014 StackTraceWordSize : 0

+0x018 StackTraceWords : 0

+0x01c StackTrace : (null)

Note - These records aren’t used with v2 language exceptions (or if they are, they are extremely rare based on the Windows Error Reporting telemetry).

Nested Exceptions

The new fields in the v2 structure are the NestedExceptionType and NestedException members. The NestedExceptionType member is one of the following values. Much like the Signature field, you can use .formats <value> to see the characters each code represents. The possible values and their associated meaning are:

W32E – Win32 Exception – points to an EXCEPTION_RECORD structure
STOW – Stowed Exception – points to a STOWED_EXCEPTION_INFORMATION_* structure
CLR1 – CLR Object – points (directly) to a CLR Object
LEO1 – Language Exception Object – points indirectly to a CLR Exception object

LEO1 is the only style being generated by Windows Error Reporting for CLR Exceptions raised in Windows Store Applications.

Looking at the example dump file we have been using, it can be seen that the first Stowed Exception has values for the NestedException and NestedExceptionType fields, and they are NULL in the second. Using .formats tells us that the NestedExceptionType member is of type “LEO1”. Note that this is displayed backwards in the output below, in accordance with little-endian order of Intel memory layout.

0:003> dt -a2 00c6d3d0 combase!PSTOWED_EXCEPTION_INFORMATION_V2

[0] @ 00c6d3d0

---------------------------------------------

0x07a690dc

...

+0x020 NestedExceptionType : 0x314f454c

+0x024 NestedException : 0x063a95d4 Void

...

0:003> .formats 0x314f454c

Evaluate expression:

Hex: 314f454c

Decimal: 827278668

Octal: 06123642514

Binary: 00110001 01001111 01000101 01001100

Chars: 1OEL

Time: Tue Mar 19 16:37:48 1996

Float: low 3.01619e-009 high 0

Double: 4.0873e-315
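
The reversed "1OEL" is simply the value's bytes read from most significant to least significant. Reading them in the order they sit in little-endian memory gives the expected code, as this short Python sketch shows. The helper name is illustrative, and the table of meanings is copied from the list above.

```python
import struct

# Codes and meanings from the list above.
NESTED_TYPES = {
    "W32E": "Win32 Exception",
    "STOW": "Stowed Exception",
    "CLR1": "CLR Object",
    "LEO1": "Language Exception Object",
}


def fourcc(value: int) -> str:
    """Return the four characters in the order they sit in little-endian memory."""
    return struct.pack("<I", value).decode("ascii")


nested_type = fourcc(0x314F454C)  # NestedExceptionType from the dump above
print(nested_type, "->", NESTED_TYPES[nested_type])
```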

Passing the address to !sos.dumpccw provides the CLR Exception object’s address.

0:003> !sos.dumpccw 0x063a95d4

CCW: 0499f880

Managed object: 02517288

Outer IUnknown: 00000000

Ref count: 1

Flags:

RefCounted Handle: 00a31478 (STRONG)

COM interface pointers:

IP MT Type

The address can be used with !sos.pe to display the CLR Exception object. The call stack that the failure investigation should focus on is in this output.

0:003> !sos.pe 02517288

Exception object: 02517288

Exception type: System.NotImplementedException

Message: The method or operation is not implemented.

InnerException: <none>

StackTrace (generated):

SP IP Function

04F2E38C 00B81382 CrashStore!CrashStore.MainPage.Load_Click_1(System.Object, Windows.UI.Xaml.RoutedEventArgs)+0x62

StackTraceString: <none>

HResult: 80004001

There you have it. This is the CLR Exception that you need to find to start your code analysis or to point you in the right direction when beginning tracing.

But what if SOS is not available?

What do you do if SOS isn’t available? You can check if it is loaded by running the .chain command, and you can check if it is functional by running !sos.dumpccw command (without a parameter).

Firstly, make sure you are using the same bitness of the debugger as the bitness of the target.

If the dump says “x86” or “ARM (Thumb2)” in the version command or the initial debug spew, use the 32bit debugger.

Windows 8 Version 9600 MP (4 procs) Free x86 compatible

If the dump says “x64” in the version command or the initial debug spew, use the 64bit debugger.

Windows 8 Version 9200 MP (4 procs) Free x64

If you still don’t have SOS loaded (or working) after matching the bitness, or you get one of the following errors, you’ll have to debug the dump on a system with the same version of the CLR installed. Some CLR versions weren’t indexed and this causes the automatic download of sos.dll and mscordacwks.dll to fail.

0:003> !sos.dumpccw

Failed to load data access DLL, 0x80004005

Verify that 1) you have a recent build of the debugger (6.2.14 or newer)

2) the file mscordacwks.dll that matches your version of clr.dll is

in the version directory or on the symbol path

3) or, if you are debugging a dump file, verify that the file

mscordacwks_<arch>_<arch>_<version>.dll is on your symbol path.

4) you are debugging on supported cross platform architecture as

the dump file. For example, an ARM dump file must be debugged

on an X86 or an ARM machine; an AMD64 dump file must be

debugged on an AMD64 machine.

You can also run the debugger command .cordll to control the debugger's

load of mscordacwks.dll. .cordll -ve -u -l will do a verbose reload.

If that succeeds, the SOS command should work on retry.

If you are debugging a minidump, you need to make sure that your executable

path is pointing to clr.dll as well.

0:003> .cordll -ve -u -l

CLRDLL: C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscordacwks.dll:4.0.30319.18444 f:8

doesn't match desired version 4.0.30319.34011 f:8

CLRDLL: Unable to find mscordacwks_x86_x86_4.0.30319.34011.dll by mscorwks search

CLRDLL: Unable to find 'mscordacwks_x86_x86_4.0.30319.34011.dll' on the path

CLRDLL: Unable to get version info for 'c:\my\sym\cl\clr.dll\52968A96698000\mscordacwks_x86_x86_4.0.30319.34011.dll', Win32 error 0n87

Cannot Automatically load SOS

CLRDLL: ERROR: Unable to load DLL mscordacwks_x86_x86_4.0.30319.34011.dll, Win32 error 0n87

CLR DLL status: ERROR: Unable to load DLL mscordacwks_x86_x86_4.0.30319.34011.dll, Win32 error 0n87

0:003> .chain

Extension DLL search Path:

...

Extension DLL chain:

C:\Windows\Microsoft.NET\Framework\v4.0.30319\sos: image 4.0.30319.18444, API 1.0.0, built Wed Oct 30 14:40:34 2013

[path: C:\Windows\Microsoft.NET\Framework\v4.0.30319\sos.dll]

pde.dll: image 9, 4, 0, 0, API 9.4.0, built Thu May 08 20:03:58 2014

[path: c:\debuggers_x86\winext\pde.dll]

dbghelp: image 6.3.9600.16384, API 6.3.6, built Wed Aug 21 20:59:03 2013

[path: c:\debuggers_x86\dbghelp.dll]

ext: image 6.3.9600.16384, API 1.0.0, built Wed Aug 21 21:11:11 2013

[path: c:\debuggers_x86\winext\ext.dll]

exts: image 6.3.9600.16384, API 1.0.0, built Wed Aug 21 21:04:14 2013

[path: c:\debuggers_x86\WINXP\exts.dll]

uext: image 6.3.9600.16384, API 1.0.0, built Wed Aug 21 21:04:09 2013

[path: c:\debuggers_x86\winext\uext.dll]

ntsdexts: image 6.3.9600.16384, API 1.0.0, built Wed Aug 21 21:04:34 2013

[path: c:\debuggers_x86\WINXP\ntsdexts.dll]

Summary

As discussed in the previous article, the asynchronous and projected nature of Windows Store applications makes them significantly harder to debug than desktop applications. Stowed Exceptions v2 helps definitively determine the error code and call stack of the exception that caused the crash.

Solutions to some of the more common issues have been covered in episodes of Channel 9 Defrag Tools, in the Avoiding Windows Store App Failures talk at //build/ 2014, and in the Hardcore Debugging talk at TechEd 2014.

If you have any questions, please feel free to email us at DefragTools@microsoft.com. We’ll be happy to help you.

Understanding ARM Assembly Part 3

May 29, 2014, 2:56 pm

My name is Marion Cole, and I am a Sr. Escalation Engineer in Microsoft Platforms Serviceability group. This is Part 3 of my series of articles about ARM assembly. In part 1 we talked about the processor that is supported. In part 2 we talked about how Windows utilizes that ARM processor. In this part we will cover Calling Conventions, Prolog/Epilog, and Rebuilding the stack.

Calling Conventions

In ARM there is only one calling convention, and it is simple. The first four 32 bit or smaller variables are passed in R0-R3. The remaining values go onto the stack. If any of the first four variables are 8 or 16 bit in size then they are extended (zero- or sign-extended, depending on the type) to fill the 32-bit register. If any of the first four variables are 64 bit in size then they have to be 64 bit aligned. That means that the variable will be split across an even/odd register pair, for example R0/R1 or R2/R3. Here are three examples:

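Foo (int I0, int I1, int I2, int I3)

Registers: R0 = I0, R1 = I1, R2 = I2, R3 = I3
Stack: (nothing)

Foo (int I0, double D, int I1)

Registers: R0 = I0, R1 = unused, R2/R3 = D
Stack: I1

Foo (int I0, int I1, double D)

Registers: R0 = I0, R1 = I1, R2/R3 = D
Stack: (nothing)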

In the first example the function Foo takes four integer values. All of these are passed in the registers R0 - R3. This one is pretty simple.

In the second example the function Foo takes an integer, a double, and another integer. The first integer is put into R0. However, the double has to be in an even/odd pair, so R1 is left unused and the double is put into R2/R3. The last integer is pushed onto the stack. Programmers are advised to avoid this ordering and instead organize the parameters so they fit as in the third example. Also in this example the stack has to stay 8-byte aligned, so an additional unused word is pushed and popped to keep the alignment. Note that on ARM a Byte is 8 bits, a Halfword is 16 bits, and a Word is 32 bits.

In the third example the function Foo takes two integers and a double. The first two variables are integers and they go in R0 and R1 respectively. The last variable, the double, is then already aligned and goes into R2/R3.
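The register assignment in the three examples can be reproduced with a few lines of C. This is a simplified sketch of the rules described above, assuming only 4-byte and 8-byte integer-class arguments; the names assign_int_args and ON_STACK are hypothetical.

```c
#include <assert.h>

enum { ON_STACK = -1 };  /* hypothetical marker for "passed on the stack" */

/* For each argument size (4 or 8 bytes), records the first core register
   used (0..3 for R0..R3) or ON_STACK. A 64-bit argument must start on an
   even register and occupies two consecutive registers. */
void assign_int_args(const int *sizes, int count, int *first_reg)
{
    int next = 0;                       /* next free core register */
    for (int i = 0; i < count; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8 && (next % 2) != 0)
            next++;                     /* skip the odd register */
        int needed = sizes[i] / 4;
        if (next + needed <= 4) {
            first_reg[i] = next;
            next += needed;
        } else {
            first_reg[i] = ON_STACK;
            next = 4;                   /* later arguments also go on the stack */
        }
    }
}
```

Calling it with sizes {4, 8, 4} models Foo (int I0, double D, int I1): the result is R0, R2 and the stack, with R1 skipped. With sizes {4, 4, 8} nothing is skipped and nothing spills.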

The registers R4-R11 are used to hold the values of the local variables of a subroutine. A subroutine is required to preserve on the stack the contents of the registers R4-R8, R10, R11, and SP.

Return values are always in R0, unless they are 64 bits in size, in which case the combination of R0 and R1 is used.

The calling convention for floating point operations is pretty much the same. A function can have up to 16 single-precision values in S0-S15, or 8 double-precision values in D0-D7, or 4 SIMD vectors in Q0-Q3. For example, if you have a function that takes the following combination:

Float, double, double, float

They will go into S0, D1, D2, S1 respectively. These are aggressively back-filled.
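Back-filling is easier to see when the allocation is written out. The sketch below models S0-S15 as a 16-bit mask, where a double occupies an aligned pair of S registers (Dn overlaps S2n and S2n+1). It is a simplified model that ignores arguments spilling to the stack, and vfp_alloc is a hypothetical name.

```c
/* Allocates one floating-point argument and returns the index of the first
   S register used, or -1 when no register is free. A double returned at
   S-index 2n corresponds to Dn. used_mask tracks occupied S registers. */
int vfp_alloc(unsigned *used_mask, int is_double)
{
    int step = is_double ? 2 : 1;           /* doubles need an even start */
    unsigned need = is_double ? 0x3u : 0x1u;
    for (int s = 0; s + step <= 16; s += step) {
        if ((*used_mask & (need << s)) == 0) {
            *used_mask |= need << s;        /* claim the slot(s) */
            return s;
        }
    }
    return -1;
}
```

Allocating float, double, double, float in that order returns 0, 2, 4, 1, that is S0, D1, D2 and then S1, because the last float fills the gap left next to S0.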

Floating point return values are in S0/D0/Q0 as appropriate by size.

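The remaining floating-point registers are split: S16-S31 (D8-D15, Q4-Q7) are non-volatile and must be preserved by the callee, while D16-D31 (Q8-Q15) are volatile.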

Prolog and Epilog

The Prolog on an ARM processor does the same thing as on the x86 processor: it stores registers on the stack and adjusts the frame pointer. Let's look at a simple example from hal!KfLowerIrql.

Prolog:

push {r3,r4,r11,lr} ; save non-volatiles regs used, r11, lr
addw r11,sp,#8 ; new frame pointer value in r11...

... ; stack used in prolog is multiple of 8

As you can see the push instruction is different than x86. On x86 we would have four push instructions to do the same thing that ARM is doing in one instruction. This stores the registers in consecutive memory locations ending just below the address in SP, and updates SP to point to the start of the stored location. The lowest numbered register is stored in the lowest memory address, through to the highest numbered register to the highest memory address. We can see that here:

1: kd> r

r0=0000000f r1=e1070180 r2=00000000 r3=e0eb3675 r4=e1048cc8 r5=e10651fc

r6=00001000 r7=0000006a r8=c5561d10 r9=0000000f r10=e10acc80 r11=c5561d08

r12=ef890f1c sp=c5561cc8 lr=e1298a0f pc=e0eb3678 psr=400001b3 -Z--- Thumb

hal!KfLowerIrql+0x4:

1: kd> dds c5561cc8 c5561d08

c5561cc8 e0eb3675 <-- r3

c5561ccc e1048cc8 <-- r4

c5561cd0 c5561d08 <-- r11

c5561cd4 e1298a0f <-- lr
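The ordering rule can be modeled in a few lines of C. This is only a sketch: memory is treated as an array of 32-bit words, the stack pointer is a word index rather than a real address, and push_regs is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models a PUSH of `count` registers onto a full-descending stack.
   `values` holds the register contents in ascending register-number order.
   Returns the new stack pointer as a word index into `mem`. */
size_t push_regs(uint32_t *mem, size_t sp, const uint32_t *values, size_t count)
{
    assert(sp >= count);             /* hypothetical guard: room left on the stack */
    sp -= count;                     /* SP moves down 4 bytes per register */
    for (size_t i = 0; i < count; i++)
        mem[sp + i] = values[i];     /* lowest-numbered register, lowest address */
    return sp;
}
```

Pushing the r3, r4, r11 and lr values from the dump above leaves r3 at the new stack pointer and lr in the highest word, the same layout the dds output shows.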

The addw instruction is setting up the new frame pointer. This will add 8 to the value in sp, and store that in r11 which is the frame pointer. Here is what that looks like in the debugger:

kd> r

r0=0000000f r1=00000002 r2=00000002 r3=e133b675 r4=77e31f15 r5=02cc9ad5

r6=00000000 r7=e1035580 r8=0000000f r9=00000000 r10=e22cb710 r11=e22cb5b8

r12=26ebcf96 sp=e22cb5b0 lr=e0f2560b pc=e133b67c psr=400000b3 -Z--- Thumb

hal!KfLowerIrql+0x8:

As you can see r11 is now 8 higher than sp.

Now let's look at the Epilog for hal!KfLowerIrql. It is pretty simple as it is one instruction.

Epilog:

pop {r3,r4,r11,pc} ; restore non-volatile regs, r11, return

This is going to pop the first three registers from the stack back into their original registers. However, the last one is popping what was the link register (lr) into the program counter (pc). This acts as a return, performing a similar function to the RET instruction on x86 but without using a unique instruction. Program flow is controlled by manipulating the pc register. Here is what this looks like in the debugger.

The registers before the pop instruction runs:

kd> r

r0=0000000f r1=00000006 r2=00000000 r3=e1035000 r4=0000000f r5=306f0a07

r6=00000000 r7=e1035580 r8=0000000f r9=00000000 r10=e22c9260 r11=e22c9108

r12=26ebaae6 sp=e22c9100 lr=e0f2560b pc=e133b6b4 psr=200000b3 --C-- Thumb

hal!KfLowerIrql+0x40:

e133b6b4 e8bd8818 pop {r3,r4,r11,pc}

The registers after the pop instruction runs:

kd> r

r0=0000000f r1=00000006 r2=00000000 r3=e133b675 r4=51cae4a2 r5=2aede545

r6=00000000 r7=e1035580 r8=0000000f r9=00000000 r10=e22c8d20 r11=e22c8c10

r12=26eba5a6 sp=e22c8bd0 lr=e0f2560b pc=e0f2560a psr=200000b3 --C-- Thumb

Now we are going to complicate this a bit by showing a function that has local variables, NtCreateFile.

Prolog:

push {r4,r5,r11,lr} ; save non-volatiles regs used, r11, lr

addw r11,sp,#8 ; new frame pointer value in r11
sub sp,sp,#0x30 ; local variables

... ; stack used in prolog is multiple of 8

Notice that this looks the same as the previous prolog, but one line is added. The sub sp,sp,#0x30 is used to make stack space available for local variables. This adds one instruction to the Epilog as well.

Epilog :

add sp,sp,#0x30 ; cleanup local variables
pop {r4,r5,r11,pc} ; restore non-volatile regs, r11, return

The add sp,sp,#0x30 is used to clean up the stack of the local variables.

One more prolog/epilog example. This one is of IopCreateFile. It saves the arguments that come in to the stack first.

Prolog :

push {r0-r3} ; save r0-r3
push {r4-r11,lr} ; save non-volatiles r4-r10, r11, lr
addw r11,sp,#0x1c ; new frame pointer value in r11
sub sp,sp,#0x3c ; local variables

... ; stack used in prolog is multiple of 8

As you can see this prolog is mostly the same, there is just one additional line for pushing the r0-r3 argument registers to the stack.

The epilog for this one is a little different.

Epilog:

add sp,sp,#0x4c ; cleanup local variables from stack
pop {r4-r11} ; restore non-volatiles, frame pointer r11
ldr pc,[sp],#0x14 ; return and cleanup 0x14 bytes (lr,r0-r3)

Notice that the pop is not putting lr into pc for a return. Instead the last instruction takes care of the pc register. It is a post-indexed load: it loads pc from the address in sp, which is where the prolog saved lr, and then adds 0x14 to sp. This returns and cleans up lr and the saved r0-r3 arguments from the stack at the same time. This ldr instruction is similar to the ret instruction on x86.
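The effect of ldr pc,[sp],#0x14 can be written as two steps in C: read the word at the address in sp, then advance sp by the immediate. The sketch below is a model only; the stack array, its base address and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models "ldr pc,[sp],#imm" (post-indexed addressing).
   `stack` holds the words starting at address `stack_base`.
   Returns the value loaded into pc and advances *sp by imm. */
uint32_t ldr_pc_post_indexed(const uint32_t *stack, uint32_t stack_base,
                             uint32_t *sp, uint32_t imm)
{
    assert(*sp >= stack_base && (*sp - stack_base) % 4 == 0);
    uint32_t pc = stack[(*sp - stack_base) / 4];  /* load from [sp] first */
    *sp += imm;                                   /* then write back sp + imm */
    return pc;
}
```

With the saved lr at sp followed by the four saved argument registers, an immediate of 0x14 steps sp over all five words (4 bytes for lr plus 16 bytes for r0-r3).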

The last thing we are going to cover is called a "Leaf function". A Leaf function executes in the context of the caller. It does not have a prolog and does not use the stack. It only uses the volatile registers r0-r3 and r12. It returns via the "bx lr" instruction. An example of this is KeGetCurrentIrql. Here is what it looks like in the debugger.

kd> uf hal!KeGetCurrentIrql

hal!KeGetCurrentIrql 211 e132b650 f3ef8300 mrs r3,cpsr

216 e132b654 f0130f80 tst r3,#0x80

216 e132b658 d103 bne hal!KeGetCurrentIrql+0x12 (e132b662)

hal!KeGetCurrentIrql+0xa

216 e132b65a b672 cpsid i

216 e132b65c 0000 movs r0,r0

216 e132b65e 2201 movs r2,#1

216 e132b660 e000 b hal!KeGetCurrentIrql+0x14 (e132b664)

hal!KeGetCurrentIrql+0x12

216 e132b662 2200 movs r2,#0

hal!KeGetCurrentIrql+0x14

217 e132b664 ee1d3f90 mrc p15,#0,r3,c13,c0,#4

217 e132b668 7f18 ldrb r0,[r3,#0x1C]

218 e132b66a b10a cbz r2,hal!KeGetCurrentIrql+0x20 (e132b670)

hal!KeGetCurrentIrql+0x1c

218 e132b66c b662 cpsie i

218 e132b66e 0000 movs r0,r0

hal!KeGetCurrentIrql+0x20

220 e132b670 4770 bx lr

The stack must remain 4 byte aligned at all times, and must be 8 byte aligned at any function boundary. This is due to the frequent use of interlocked operations on 64-bit stack variables.

Functions which need to use a frame pointer (for example, if alloca is used) or which dynamically change the stack pointer within their body, must set up the frame pointer in the function prologue and leave it unchanged until the epilog. Functions which do not need a frame pointer must perform all stack updating in the prolog and leave the SP unchanged until the epilog.

Rebuilding the Stack

Here we are going to discuss how to rebuild the stack from the frame pointer.

The frame pointer points to the top of the stack area for the current function, or it is zero if not being used. By using the frame pointer and storing it at the same offset for every function call, it creates a singly linked list of activation records.

The frame pointer register points to the stack backtrace structure for the currently executing function.

The saved frame pointer value is (zero or) a pointer to the stack backtrace structure created by the function which called the current function.

The saved frame pointer in this structure is a pointer to the stack backtrace structure for the function that called the function that called the current function; and so on back until the first function.

In the below diagram Main calls Foo which calls Bar
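The Main, Foo, Bar chain can be walked with a short loop. The sketch below assumes the frame layout produced by the prologs shown earlier, where the frame pointer addresses the saved r11 and the saved lr sits in the next word. Memory is modeled as an array of words starting at a made-up base address, and walk_frames is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Follows the saved-r11 links starting at `fp` and collects the saved lr
   of each frame into ret_addrs. Stops at a zero frame pointer or after
   `max` frames. Returns the number of frames visited. */
size_t walk_frames(const uint32_t *mem, size_t words, uint32_t base,
                   uint32_t fp, uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    while (fp != 0 && n < max) {
        assert(fp >= base);
        size_t idx = (fp - base) / 4;
        assert(idx + 1 < words);         /* both words must be inside mem */
        ret_addrs[n++] = mem[idx + 1];   /* saved lr: return address */
        fp = mem[idx];                   /* saved r11: caller's frame */
    }
    return n;
}
```

Starting from Bar's frame pointer, the loop yields the return address into Foo, then the one into Main, and stops when Main's saved frame pointer is zero.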

For more information about ARM Debugging check out this article from T.Roy at Code Machine:

http://codemachine.com/article_armasm.html

Bugchecking a Computer on A Usermode Application Crash

June 19, 2014, 2:13 pm

Hello, my name is Gurpreet Singh Jutla and I would like to share how to bugcheck a box on any usermode application crash. We sometimes need a complete memory dump to investigate kernel mode information related to a usermode application crash or closure. When the application crash is reproducible, we can get one by setting the application as a critical process.

We will use the operating system’s ability to mark a process as critical and cause the system to bugcheck when the critical process closes unexpectedly. This will generate either a CRITICAL_PROCESS_DIED or a CRITICAL_OBJECT_TERMINATION bugcheck.

For this demonstration I will use the following code sample which waits for the user input and then causes an Access Violation. You can use the following steps to collect a complete memory dump for any application crash that launches fine but crashes under known repro conditions.

Code Sample

#include <conio.h>

int main(void)
{
_getch(); // Wait for a key press
*(volatile char*)0xdeaddead = 'B'; // Write to an invalid address to cause the Access Violation
return 0; // Never reached
}

Please follow the steps below.

Set the system for a complete memory dump: open “Advanced system settings” under System properties in Control Panel, then on the Advanced tab set “Write debugging information” under “Startup and Recovery” to “Complete memory dump”.
Also enable debug mode by running the following command from an elevated command prompt
bcdedit -debug on
Restart the box so that the “Complete memory dump” and debug mode changes take effect.
Run the application you want to set up as a critical process, but do not run the repro steps. I have compiled my test application as test.exe
Download and install the Debugging Tools for Windows, part of the SDK, which you can download from http://msdn.microsoft.com/en-us/windows/desktop/bg162891.aspx. Note: when the installer launches you can uncheck every feature except Debugging Tools for Windows.
We need to set up the debugger to use the public symbols. Create a folder c:\symbols. Run WinDbg with admin privileges, choose the “File” menu and then “Symbol File Path”, and type SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
For more details check http://support.microsoft.com/kb/311503/en-us
Assuming you have the debugger installed and set up with the public symbols, launch the debugger with admin privileges.
From the File menu select “Kernel Debug”, choose the “Local” tab, and hit the OK button. This connects WinDbg to the local kernel. You should see an “lkd>” prompt.
Run the following command in WinDbg to get the process information. The examples below show both x64 and x86 architectures.

x64
0: kd> !process 0 0 test.exe

PROCESS fffffa82fa924b30

SessionId: 0 Cid: 036c Peb: 7fffffda000 ParentCid: 02e4

DirBase: 1085d76000 ObjectTable: fffff8a0042d7970 HandleCount: 11.

Image: test.exe

x86
0: kd> !process 0 0 test.exe

PROCESS 89038a08 SessionId: 0 Cid: 10f0 Peb: 7ffde000 ParentCid: 0f10

DirBase: bfa19900 ObjectTable: e669b630 HandleCount: 11.

Image: test.exe

Take the process address (the value after PROCESS) from the output and run the following command, which shows the process flags. In the example the output shows the flags as 0x144d0801 for x64 and 0x450801 for x86.

x64
0: kd> dt nt!_eprocess fffffa82fa924b30 flags

+0x440 Flags : 0x144d0801

x86
0: kd> dt 89038a08 nt!_eprocess flags

+0x240 Flags : 0x450801

Run the ed command to edit the memory and set the process flags to mark the process critical. Adding the value 0x2000 marks the process critical.

x64
0: kd> ed fffffa82fa924b30+0x440 0x144d0801+0x2000

x86
0: kd> ed 89038a08+0x240 0x450801+0x2000
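The value written by the ed command is simply the current flags with the 0x2000 bit set. The small C sketch below shows the same arithmetic; the constant and function names are hypothetical, and only the 0x2000 value comes from the steps above. It uses a bitwise OR instead of addition so the result stays correct even if the bit is already set.

```c
#include <stdint.h>

/* Hypothetical name for the flag bit described above. */
#define CRITICAL_PROCESS_BIT 0x2000u

/* Returns the flags value to write back with the critical bit set. */
uint32_t mark_critical(uint32_t flags)
{
    return flags | CRITICAL_PROCESS_BIT;
}
```

For the flags shown earlier this gives 0x144d2801 on x64 and 0x452801 on x86, the same values the ed expressions evaluate to.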

Now close the debugger and proceed with the repro steps to crash or close the application.
In our case the test application with the code mentioned above should cause the machine to bugcheck as soon as any key is pressed.

The complete memory dump will contain the process information as well as kernel data for investigation.
