Regardless of the reason for a system crash, the function that actually performs the crash is KeBugCheckEx, documented in the Windows Driver Kit (WDK). This function takes a stop code (sometimes called a bugcheck code) and four parameters that are interpreted on a per–stop code basis. After KeBugCheckEx masks out all interrupts on all processors of the system, it switches the display into a low-resolution VGA graphics mode (one implemented by all Windows-supported video cards), paints a blue background, and then displays the stop code, followed by some text suggesting what the user can do. Finally, KeBugCheckEx calls any registered device driver bugcheck callbacks (registered by calling the KeRegisterBugCheckCallback function), allowing drivers an opportunity to stop their devices. It then calls registered reason callbacks (registered with KeRegisterBugCheckReasonCallback), which allow drivers to append data to the crash dump or write crash dump information to alternate devices.
The first line in the Technical information section in the sample Windows blue screen shown in Figure 14-1 lists the stop code and the four additional parameters passed to KeBugCheckEx. A text line near the top of the screen provides the text equivalent of the stop code’s numeric identifier. According to the example in Figure 14-1, the stop code 0x000000D1 is a DRIVER_IRQL_NOT_LESS_OR_EQUAL crash. When a parameter contains an address of a piece of operating system or device driver code (as in Figure 14-1), Windows displays the base address of the module the address falls in, the date stamp, and the file name of the device driver. This information alone might help you pinpoint the faulty component.
Although there are more than 300 unique stop codes, most are rarely, if ever, seen on production systems. Instead, just a few common stop codes represent the majority of Windows system crashes. Also, the meaning of the four additional parameters depends on the stop code (and not all stop codes have extended parameter information). Nevertheless, looking up the stop code and the meaning of the parameters (if applicable) might at least assist you in diagnosing the component that is failing (or the hardware device that is causing the crash).
You can find stop code information in the section “Bug Checks (Blue Screens)” in the Debugging Tools for Windows help file. (For information on the Debugging Tools for Windows, see Chapter 1, “Concepts and Tools,” in Part 1.) You can also search Microsoft’s Knowledge Base (http://support.microsoft.com) for the stop code and the name of the suspect hardware or driver. You might find information about a workaround, an update, or a service pack that fixes the problem you’re having. The Bugcodes.h file in the WDK contains a complete list of the 300 or so stop codes, with some additional details on the reasons for some of them. Last but not least, these stop codes are listed and documented at http://msdn.microsoft.com/en-us/library/windows/hardware/hh406232(v=vs.85).aspx.
Based on data collected from the release of Windows 7 through the release of Windows 7 SP1, the top 20 stop codes account for 91 percent of crashes and can be grouped into the following categories:
Page fault A page fault on memory backed by data in a paging file or a memory-mapped file occurs at an IRQL of DPC/dispatch level or above, which would require the memory manager to have to wait for an I/O operation to occur. The kernel cannot wait or reschedule threads at an IRQL of DPC/dispatch level or higher. (See Chapter 3, “System Mechanisms,” in Part 1 for details on IRQLs.) The common stop codes are:
Power management A device driver or an operating system function running in kernel mode is in an inconsistent or invalid power state. Most frequently, some component has failed to complete a power management I/O request operation within the default period of 10 minutes. The common stop code is:
Exceptions and traps A device driver or an operating system function running in kernel mode incurs an unexpected exception or trap. The common stop codes are:
Access violations A device driver or an operating system function running in kernel mode incurs a memory access violation, which is caused either by attempting to write to a read-only page or by attempting to read an address that isn’t currently mapped and therefore is not a valid memory location. The common stop codes are:
Display The display device driver detects that it can no longer control the graphics processing unit. This indicates that an attempt to reset the display driver failed. The common stop code is:
Pool The kernel pool manager detects a corrupt pool header or an improper pool reference. The common stop codes are:
Memory management The kernel memory manager detects a corruption of memory management data structures or an improper memory management request. The common stop codes are:
Hardware A hardware error, such as a machine check or a nonmaskable interrupt (NMI), occurs. This category also includes disk failures when the memory manager is attempting to read data to satisfy page faults. The common stop codes are:
USB An unrecoverable error occurs in a universal serial bus operation. The common stop code is:
Critical object A fatal error occurs in a critical object without which Windows cannot continue to run. The common stop code is:
NTFS file system A fatal error is detected by the NTFS file system. The common stop code is:
Figure 14-2 shows the distribution of these categories for Windows 7 and Windows 7 SP1 in May 2012: