Diffstat (limited to 'zh_TW.UTF-8/books/developers-handbook/kerneldebug/chapter.xml')
-rw-r--r-- | zh_TW.UTF-8/books/developers-handbook/kerneldebug/chapter.xml | 848 |
1 files changed, 848 insertions, 0 deletions
diff --git a/zh_TW.UTF-8/books/developers-handbook/kerneldebug/chapter.xml b/zh_TW.UTF-8/books/developers-handbook/kerneldebug/chapter.xml new file mode 100644 index 0000000000..a2cd16fa99 --- /dev/null +++ b/zh_TW.UTF-8/books/developers-handbook/kerneldebug/chapter.xml @@ -0,0 +1,848 @@ +<?xml version="1.0" encoding="utf-8"?> +<!-- + The FreeBSD Documentation Project + + $FreeBSD$ +--> +<chapter xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" xml:id="kerneldebug"> + <info><title>Kernel Debugging</title> + <authorgroup> + <author><personname><firstname>Paul</firstname><surname>Richards</surname></personname><contrib>Contributed by </contrib></author> + <author><personname><firstname>Jörg</firstname><surname>Wunsch</surname></personname></author> + </authorgroup> + </info> + + + + <sect1 xml:id="kerneldebug-obtain"> + <title>Obtaining a Kernel Crash Dump</title> + + <para>When running a development kernel (e.g., &os.current;), a + kernel under extreme conditions (e.g., very high load averages, + tens of thousands of connections, exceedingly high number of + concurrent users, hundreds of &man.jail.8;s, etc.), or using a + new feature or device driver on &os.stable; (e.g., + <acronym>PAE</acronym>), sometimes a kernel will panic. In the + event that it does, this chapter will demonstrate how to extract + useful information out of a crash.</para> + + <para>A system reboot is inevitable once a kernel panics. Once a + system is rebooted, the contents of a system's physical memory + (<acronym>RAM</acronym>) are lost, as well as any bits that were + on the swap device before the panic. To preserve the bits in + physical memory, the kernel makes use of the swap device as a + temporary place to store the bits that are in RAM across a + reboot after a crash. 
In doing this, when &os; boots after a + crash, a kernel image can now be extracted and debugging can + take place.</para> + + <note><para>A swap device that has been configured as a dump + device still acts as a swap device. Dumps to non-swap devices + (such as tapes or CDRWs, for example) are not supported at this time. A + <quote>swap device</quote> is synonymous with a <quote>swap + partition.</quote></para></note> + + <para>To be able to extract a usable core, it is required that at + least one swap partition be large enough to hold all of the bits + in physical memory. When a kernel panics, before the system + reboots, the kernel is smart enough to check to see if a swap + device has been configured as a dump device. If there is a + valid dump device, the kernel dumps the contents of what is in + physical memory to the swap device.</para> + + <sect2 xml:id="config-dumpdev"> + <title>Configuring the Dump Device</title> + + <para>Before the kernel will dump the contents of its physical + memory to a dump device, a dump device must be configured. A + dump device is specified by using the &man.dumpon.8; command + to tell the kernel where to save kernel crash dumps. The + &man.dumpon.8; program must be called after the swap partition + has been configured with &man.swapon.8;. This is normally + handled by setting the <varname>dumpdev</varname> variable in + &man.rc.conf.5; to the path of the swap device (the + recommended way to extract a kernel dump).</para> + + <para>Alternatively, the dump device can be hard-coded via the + <literal>dump</literal> clause in the &man.config.5; line of + a kernel configuration file. 
This approach is deprecated and should + be used only if a kernel is crashing before &man.dumpon.8; can be executed.</para> + + <tip><para>Check <filename>/etc/fstab</filename> or + &man.swapinfo.8; for a list of swap devices.</para></tip> + + <important><para>Make sure the <varname>dumpdir</varname> + specified in &man.rc.conf.5; exists before a kernel + crash!</para> + + <screen>&prompt.root; <userinput>mkdir /var/crash</userinput> +&prompt.root; <userinput>chmod 700 /var/crash</userinput></screen> + + <para>Also, remember that the contents of + <filename>/var/crash</filename> are sensitive and very likely + contain confidential information such as passwords.</para> + </important> + </sect2> + + <sect2 xml:id="extract-dump"> + <title>Extracting a Kernel Dump</title> + + <para>Once a dump has been written to a dump device, the dump + must be extracted before the swap device is mounted. + To extract a dump + from a dump device, use the &man.savecore.8; program. If + <varname>dumpdev</varname> has been set in &man.rc.conf.5;, + &man.savecore.8; will be called automatically on the first + multi-user boot after the crash and before the swap device + is mounted. The extracted core is placed in the directory named by + the &man.rc.conf.5; variable <varname>dumpdir</varname>, by + default <filename>/var/crash</filename>, and will be named + <filename>vmcore.0</filename>.</para> + + <para>In the event that there is already a file called + <filename>vmcore.0</filename> in + <filename>/var/crash</filename> (or whatever + <varname>dumpdir</varname> is set to), &man.savecore.8; will + increment the trailing number for every crash to avoid + overwriting an existing <filename>vmcore</filename> (e.g., + <filename>vmcore.1</filename>). 
While debugging, it is + highly likely that you will want to use the highest version + <filename>vmcore</filename> in + <filename>/var/crash</filename> when searching for the right + <filename>vmcore</filename>.</para> + + <tip> + <para>If you are testing a new kernel but need to boot a different one in + order to get your system up and running again, boot it only into single + user mode using the <option>-s</option> flag at the boot prompt, and + then perform the following steps:</para> + + <screen>&prompt.root; <userinput>fsck -p</userinput> +&prompt.root; <userinput>mount -a -t ufs</userinput> # make sure /var/crash is writable +&prompt.root; <userinput>savecore /var/crash /dev/ad0s1b</userinput> +&prompt.root; <userinput>exit</userinput> # exit to multi-user</screen> + + <para>This instructs &man.savecore.8; to extract a kernel dump + from <filename>/dev/ad0s1b</filename> and place the contents in + <filename>/var/crash</filename>. Do not forget to make sure the + destination directory <filename>/var/crash</filename> has enough + space for the dump. Also, do not forget to specify the correct path to your swap + device as it is likely different than + <filename>/dev/ad0s1b</filename>!</para></tip> + + <para>The recommended, and certainly the easiest way to automate + obtaining crash dumps is to use the <varname>dumpdev</varname> + variable in &man.rc.conf.5;.</para> + </sect2> + </sect1> + + <sect1 xml:id="kerneldebug-gdb"> + <title>Debugging a Kernel Crash Dump with <command>kgdb</command></title> + + <note> + <para>This section covers &man.kgdb.1; as found in &os; 5.3 + and later. In previous versions, one must use + <command>gdb -k</command> to read a core dump file.</para> + </note> + + <para>Once a dump has been obtained, getting useful information + out of the dump is relatively easy for simple problems. 
Before + launching into the internals of &man.kgdb.1; to debug + the crash dump, locate the debug version of your kernel + (normally called <filename>kernel.debug</filename>) and the path + to the source files used to build your kernel (normally + <filename>/usr/obj/usr/src/sys/KERNCONF</filename>, + where <filename>KERNCONF</filename> + is the <varname>ident</varname> specified in a kernel + &man.config.5;). With those two pieces of info, let the + debugging commence!</para> + + <para>To enter into the debugger and begin getting information + from the dump, the following steps are required at a minimum:</para> + + <screen>&prompt.root; <userinput>cd /usr/obj/usr/src/sys/KERNCONF</userinput> +&prompt.root; <userinput>kgdb kernel.debug /var/crash/vmcore.0</userinput></screen> + + <para>You can debug the crash dump using the kernel sources just like + you can for any other program.</para> + + <para>This first dump is from a 5.2-BETA kernel and the crash + comes from deep within the kernel. The output below has been + modified to include line numbers on the left. This first trace + inspects the instruction pointer and obtains a back trace. The + address that is used on line 41 for the <command>list</command> + command is the instruction pointer and can be found on line + 17. Most developers will request having at least this + information sent to them if you are unable to debug the problem + yourself. If, however, you do solve the problem, make sure that + your patch winds its way into the source tree via a problem + report, mailing lists, or by being able to commit it!</para> + + <screen> 1:&prompt.root; <userinput>cd /usr/obj/usr/src/sys/KERNCONF</userinput> + 2:&prompt.root; <userinput>kgdb kernel.debug /var/crash/vmcore.0</userinput> + 3:GNU gdb 5.2.1 (FreeBSD) + 4:Copyright 2002 Free Software Foundation, Inc. 
+ 5:GDB is free software, covered by the GNU General Public License, and you are + 6:welcome to change it and/or distribute copies of it under certain conditions. + 7:Type "show copying" to see the conditions. + 8:There is absolutely no warranty for GDB. Type "show warranty" for details. + 9:This GDB was configured as "i386-undermydesk-freebsd"... +10:panic: page fault +11:panic messages: +12:--- +13:Fatal trap 12: page fault while in kernel mode +14:cpuid = 0; apic id = 00 +15:fault virtual address = 0x300 +16:fault code: = supervisor read, page not present +17:instruction pointer = 0x8:0xc0713860 +18:stack pointer = 0x10:0xdc1d0b70 +19:frame pointer = 0x10:0xdc1d0b7c +20:code segment = base 0x0, limit 0xfffff, type 0x1b +21: = DPL 0, pres 1, def32 1, gran 1 +22:processor eflags = resume, IOPL = 0 +23:current process = 14394 (uname) +24:trap number = 12 +25:panic: page fault +26 cpuid = 0; +27:Stack backtrace: +28 +29:syncing disks, buffers remaining... 2199 2199 panic: mi_switch: switch in a critical section +30:cpuid = 0; +31:Uptime: 2h43m19s +32:Dumping 255 MB +33: 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 +34:--- +35:Reading symbols from /boot/kernel/snd_maestro3.ko...done. +36:Loaded symbols for /boot/kernel/snd_maestro3.ko +37:Reading symbols from /boot/kernel/snd_pcm.ko...done. +38:Loaded symbols for /boot/kernel/snd_pcm.ko +39:#0 doadump () at /usr/src/sys/kern/kern_shutdown.c:240 +40:240 dumping++; +41:<prompt>(kgdb)</prompt> <userinput>list *0xc0713860</userinput> +42:0xc0713860 is in lapic_ipi_wait (/usr/src/sys/i386/i386/local_apic.c:663). 
+43:658 incr = 0; +44:659 delay = 1; +45:660 } else +46:661 incr = 1; +47:662 for (x = 0; x < delay; x += incr) { +48:663 if ((lapic->icr_lo & APIC_DELSTAT_MASK) == APIC_DELSTAT_IDLE) +49:664 return (1); +50:665 ia32_pause(); +51:666 } +52:667 return (0); +53:<prompt>(kgdb)</prompt> <userinput>backtrace</userinput> +54:#0 doadump () at /usr/src/sys/kern/kern_shutdown.c:240 +55:#1 0xc055fd9b in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:372 +56:#2 0xc056019d in panic () at /usr/src/sys/kern/kern_shutdown.c:550 +57:#3 0xc0567ef5 in mi_switch () at /usr/src/sys/kern/kern_synch.c:470 +58:#4 0xc055fa87 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:312 +59:#5 0xc056019d in panic () at /usr/src/sys/kern/kern_shutdown.c:550 +60:#6 0xc0720c66 in trap_fatal (frame=0xdc1d0b30, eva=0) +61: at /usr/src/sys/i386/i386/trap.c:821 +62:#7 0xc07202b3 in trap (frame= +63: {tf_fs = -1065484264, tf_es = -1065484272, tf_ds = -1065484272, tf_edi = 1, tf_esi = 0, tf_ebp = -602076292, tf_isp = -602076324, tf_ebx = 0, tf_edx = 0, tf_ecx = 1000000, tf_eax = 243, tf_trapno = 12, tf_err = 0, tf_eip = -1066321824, tf_cs = 8, tf_eflags = 65671, tf_esp = 243, tf_ss = 0}) +64: at /usr/src/sys/i386/i386/trap.c:250 +65:#8 0xc070c9f8 in calltrap () at {standard input}:94 +66:#9 0xc07139f3 in lapic_ipi_vectored (vector=0, dest=0) +67: at /usr/src/sys/i386/i386/local_apic.c:733 +68:#10 0xc0718b23 in ipi_selected (cpus=1, ipi=1) +69: at /usr/src/sys/i386/i386/mp_machdep.c:1115 +70:#11 0xc057473e in kseq_notify (ke=0xcc05e360, cpu=0) +71: at /usr/src/sys/kern/sched_ule.c:520 +72:#12 0xc0575cad in sched_add (td=0xcbcf5c80) +73: at /usr/src/sys/kern/sched_ule.c:1366 +74:#13 0xc05666c6 in setrunqueue (td=0xcc05e360) +75: at /usr/src/sys/kern/kern_switch.c:422 +76:#14 0xc05752f4 in sched_wakeup (td=0xcbcf5c80) +77: at /usr/src/sys/kern/sched_ule.c:999 +78:#15 0xc056816c in setrunnable (td=0xcbcf5c80) +79: at /usr/src/sys/kern/kern_synch.c:570 +80:#16 0xc0567d53 in wakeup 
(ident=0xcbcf5c80) +81: at /usr/src/sys/kern/kern_synch.c:411 +82:#17 0xc05490a8 in exit1 (td=0xcbcf5b40, rv=0) +83: at /usr/src/sys/kern/kern_exit.c:509 +84:#18 0xc0548011 in sys_exit () at /usr/src/sys/kern/kern_exit.c:102 +85:#19 0xc0720fd0 in syscall (frame= +86: {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = -1, tf_ebp = -1077940712, tf_isp = -602075788, tf_ebx = 672411944, tf_edx = 10, tf_ecx = 672411600, tf_eax = 1, tf_trapno = 12, tf_err = 2, tf_eip = 671899563, tf_cs = 31, tf_eflags = 642, tf_esp = -1077940740, tf_ss = 47}) +87: at /usr/src/sys/i386/i386/trap.c:1010 +88:#20 0xc070ca4d in Xint0x80_syscall () at {standard input}:136 +89:---Can't read userspace from dump, or kernel process--- +90:<prompt>(kgdb)</prompt> <userinput>quit</userinput></screen> + + + <para>This next trace is an older dump from the FreeBSD 2 time + frame, but is more involved and demonstrates more of the + features of <command>gdb</command>. Long lines have been folded + to improve readability, and the lines are numbered for + reference. Despite this, it is a real-world error trace taken + during the development of the pcvt console driver.</para> + +<screen> 1:Script started on Fri Dec 30 23:15:22 1994 + 2:&prompt.root; <userinput>cd /sys/compile/URIAH</userinput> + 3:&prompt.root; <userinput>gdb -k kernel /var/crash/vmcore.1</userinput> + 4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel +...done. + 5:IdlePTD 1f3000 + 6:panic: because you said to! + 7:current pcb at 1e3f70 + 8:Reading in symbols for ../../i386/i386/machdep.c...done. 
+ 9:<prompt>(kgdb)</prompt> <userinput>backtrace</userinput> +10:#0 boot (arghowto=256) (../../i386/i386/machdep.c line 767) +11:#1 0xf0115159 in panic () +12:#2 0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698) +13:#3 0xf010185e in db_fncall () +14:#4 0xf0101586 in db_command (-266509132, -266509516, -267381073) +15:#5 0xf0101711 in db_command_loop () +16:#6 0xf01040a0 in db_trap () +17:#7 0xf0192976 in kdb_trap (12, 0, -272630436, -266743723) +18:#8 0xf019d2eb in trap_fatal (...) +19:#9 0xf019ce60 in trap_pfault (...) +20:#10 0xf019cb2f in trap (...) +21:#11 0xf01932a1 in exception:calltrap () +22:#12 0xf0191503 in cnopen (...) +23:#13 0xf0132c34 in spec_open () +24:#14 0xf012d014 in vn_open () +25:#15 0xf012a183 in open () +26:#16 0xf019d4eb in syscall (...) +27:<prompt>(kgdb)</prompt> <userinput>up 10</userinput> +28:Reading in symbols for ../../i386/i386/trap.c...done. +29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\ +30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\ +31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\ +32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\ +33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\ +34:ss = -266427884}) (../../i386/i386/trap.c line 283) +35:283 (void) trap_pfault(&frame, FALSE); +36:<prompt>(kgdb)</prompt> <userinput>frame frame->tf_ebp frame->tf_eip</userinput> +37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done. 
+38:#0 0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\ +39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403) +40:403 return ((*linesw[tp->t_line].l_open)(dev, tp)); +41:<prompt>(kgdb)</prompt> <userinput>list</userinput> +42:398 +43:399 tp->t_state |= TS_CARR_ON; +44:400 tp->t_cflag |= CLOCAL; /* cannot be a modem (:-) */ +45:401 +46:402 #if PCVT_NETBSD || (PCVT_FREEBSD >= 200) +47:403 return ((*linesw[tp->t_line].l_open)(dev, tp)); +48:404 #else +49:405 return ((*linesw[tp->t_line].l_open)(dev, tp, flag)); +50:406 #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */ +51:407 } +52:<prompt>(kgdb)</prompt> <userinput>print tp</userinput> +53:Reading in symbols for ../../i386/i386/cons.c...done. +54:$1 = (struct tty *) 0x1bae +55:<prompt>(kgdb)</prompt> <userinput>print tp->t_line</userinput> +56:$2 = 1767990816 +57:<prompt>(kgdb)</prompt> <userinput>up</userinput> +58:#1 0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\ +59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126) +60: return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p)); +61:<prompt>(kgdb)</prompt> <userinput>up</userinput> +62:#2 0xf0132c34 in spec_open () +63:<prompt>(kgdb)</prompt> <userinput>up</userinput> +64:#3 0xf012d014 in vn_open () +65:<prompt>(kgdb)</prompt> <userinput>up</userinput> +66:#4 0xf012a183 in open () +67:<prompt>(kgdb)</prompt> <userinput>up</userinput> +68:#5 0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\ +69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\ +70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \ +71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \ +72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673) +73:673 error = (*callp->sy_call)(p, args, rval); +74:<prompt>(kgdb)</prompt> <userinput>up</userinput> +75:Initial frame selected; you cannot go up. 
+76:<prompt>(kgdb)</prompt> <userinput>quit</userinput></screen> + <para>Comments to the above script:</para> + + <variablelist> + <varlistentry> + <term>line 6:</term> + + <listitem> + <para>This is a dump taken from within DDB (see below), hence the + panic comment <quote>because you said to!</quote>, and a rather + long stack trace; the initial reason for going into DDB has been a + page fault trap though.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>line 20:</term> + + <listitem> + <para>This is the location of function <function>trap()</function> + in the stack trace.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>line 36:</term> + + <listitem> + <para>Force usage of a new stack frame; this is no longer necessary. + The stack frames are supposed to point to the right + locations now, even in case of a trap. + From looking at the code in source line 403, there is a + high probability that either the pointer access for + <quote>tp</quote> was messed up, or the array access was out of + bounds.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>line 52:</term> + + <listitem> + <para>The pointer looks suspicious, but happens to be a valid + address.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>line 56:</term> + + <listitem> + <para>However, it obviously points to garbage, so we have found our + error! 
 (For those unfamiliar with that particular piece of code: + <literal>tp->t_line</literal> refers to the line discipline of + the console device here, which must be a rather small integer + number.)</para> + </listitem> + </varlistentry> + </variablelist> + + <tip><para>If your system is crashing regularly and you are running + out of disk space, deleting old <filename>vmcore</filename> + files in <filename>/var/crash</filename> could save a + considerable amount of disk space!</para></tip> + </sect1> + + <sect1 xml:id="kerneldebug-ddd"> + <title>Debugging a Crash Dump with DDD</title> + + <para>Examining a kernel crash dump with a graphical debugger like + <command>ddd</command> is also possible (you will need to install + the <package>devel/ddd</package> port in order to use the + <command>ddd</command> debugger). Add the <option>-k</option> + option to the <command>ddd</command> command line you would use + normally. For example:</para> + + <screen>&prompt.root; <userinput>ddd -k /var/crash/kernel.0 /var/crash/vmcore.0</userinput></screen> + + <para>You should then be able to go about looking at the crash dump using + <command>ddd</command>'s graphical interface.</para> + </sect1> + + <sect1 xml:id="kerneldebug-post-mortem"> + <title>Post-Mortem Analysis of a Dump</title> + + <para>What do you do if a kernel dumped core but you did not expect it, + and the kernel was therefore not compiled using <command>config -g</command>? Not + everything is lost here. Do not panic!</para> + + <para>Of course, you still need to enable crash dumps. See above for the + options you have to specify in order to do this.</para> + + <para>Go to your kernel config directory + (<filename>/usr/src/sys/arch/conf</filename>) + and edit your configuration file. Uncomment (or add, if it does not + exist) the following line:</para> + + <programlisting>makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols</programlisting> + + <para>Rebuild the kernel. 
Due to the time stamp change on the Makefile, + some other object files will be rebuilt, for example + <filename>trap.o</filename>. With a bit of luck, the added + <option>-g</option> option will not change the generated + code, so you will finally get a new kernel with similar code to the + faulting one but with some debugging symbols. You should at least verify the + old and new sizes with the &man.size.1; command. If there is a + mismatch, you probably need to give up here.</para> + + <para>Go and examine the dump as described above. The debugging symbols + might be incomplete for some places, as can be seen in the stack trace + in the example above where some functions are displayed without line + numbers and argument lists. If you need more debugging symbols, remove + the appropriate object files, recompile the kernel again and repeat the + <command>gdb -k</command> + session until you know enough.</para> + + <para>All this is not guaranteed to work, but it works fine in most + cases.</para> + </sect1> + + <sect1 xml:id="kerneldebug-online-ddb"> + <title>On-Line Kernel Debugging Using DDB</title> + + <para>While <command>gdb -k</command> as an off-line debugger provides a very + high level of user interface, there are some things it cannot do. The + most important of these are setting breakpoints and single-stepping kernel + code.</para> + + <para>If you need to do low-level debugging on your kernel, there is an + on-line debugger available called DDB. It allows setting of + breakpoints, single-stepping kernel functions, examining and changing + kernel variables, etc. However, it cannot access kernel source files, + and only has access to the global and static symbols, not to the full + debug information like <command>gdb</command> does.</para> + + <para>To configure your kernel to include DDB, add the option line + + <programlisting>options DDB</programlisting> + + to your config file, and rebuild. 
(See <link xlink:href="&url.books.handbook;/index.html">The FreeBSD Handbook</link> for details on + configuring the FreeBSD kernel).</para> + + <note> + <para>If you have an older version of the boot blocks, your + debugger symbols might not be loaded at all. Update the boot blocks; + the recent ones load the DDB symbols automatically.</para> + </note> + + <para>Once your DDB kernel is running, there are several ways to enter + DDB. The first, and earliest way is to type the boot flag + <option>-d</option> right at the boot prompt. The kernel will start up + in debug mode and enter DDB prior to any device probing. Hence you can + even debug the device probe/attach functions.</para> + + <para>The second scenario is to drop to the debugger once the + system has booted. There are two simple ways to accomplish + this. If you would like to break to the debugger from the + command prompt, simply type the command:</para> + + <screen>&prompt.root; <userinput>sysctl debug.enter_debugger=ddb</userinput></screen> + + <para>Alternatively, if you are at the system console, you may use + a hot-key on the keyboard. The default break-to-debugger + sequence is <keycombo action="simul"><keycap>Ctrl</keycap> + <keycap>Alt</keycap><keycap>ESC</keycap></keycombo>. For + syscons, this sequence can be remapped and some of the + distributed maps out there do this, so check to make sure you + know the right sequence to use. There is an option available + for serial consoles that allows the use of a serial line BREAK on the + console line to enter DDB (<literal>options BREAK_TO_DEBUGGER</literal> + in the kernel config file). It is not the default since there are a lot + of serial adapters around that gratuitously generate a BREAK + condition, for example when pulling the cable.</para> + + <para>The third way is that any panic condition will branch to DDB if the + kernel is configured to use it. 
For this reason, it is not wise to + configure a kernel with DDB for a machine running unattended.</para> + + <para>The DDB commands roughly resemble some <command>gdb</command> + commands. The first thing you probably need to do is to set a + breakpoint:</para> + + <screen><userinput>b function-name</userinput> +<userinput>b address</userinput></screen> + + <para>Numbers are taken as hexadecimal by default. To keep them distinct + from symbol names, hexadecimal numbers starting with the letters + <literal>a-f</literal> need to be preceded with <literal>0x</literal> + (this is optional for other numbers). Simple expressions are allowed, + for example: <literal>function-name + 0x103</literal>.</para> + + <para>To continue the operation of an interrupted kernel, simply + type:</para> + + <screen><userinput>c</userinput></screen> + + <para>To get a stack trace, use:</para> + + <screen><userinput>trace</userinput></screen> + + <note> + <para>Note that when entering DDB via a hot-key, the kernel is currently + servicing an interrupt, so the stack trace might not be of much use + to you.</para> + </note> + + <para>If you want to remove a breakpoint, use:</para> + + + <screen><userinput>del</userinput> +<userinput>del address-expression</userinput></screen> + + <para>The first form will be accepted immediately after a breakpoint hit, + and deletes the current breakpoint. 
The second form can remove any + breakpoint, but you need to specify the exact address; this can be + obtained from:</para> + + <screen><userinput>show b</userinput></screen> + + <para>To single-step the kernel, try:</para> + + <screen><userinput>s</userinput></screen> + + <para>This will step into functions, but you can make DDB trace them until + the matching return statement is reached by:</para> + + <screen><userinput>n</userinput></screen> + + <note> + <para>This is different from <command>gdb</command>'s + <command>next</command> statement; it is like <command>gdb</command>'s + <command>finish</command>.</para> + </note> + + <para>To examine data from memory, use (for example): + + <screen><userinput>x/wx 0xf0133fe0,40</userinput> +<userinput>x/hd db_symtab_space</userinput> +<userinput>x/bc termbuf,10</userinput> +<userinput>x/s stringbuf</userinput></screen> + + for word/halfword/byte access, and hexadecimal/decimal/character/ string + display. The number after the comma is the object count. 
To display + the next 0x10 items, simply use:</para> + + <screen><userinput>x ,10</userinput></screen> + + <para>Similarly, use + + <screen><userinput>x/ia foofunc,10</userinput></screen> + + to disassemble the first 0x10 instructions of + <function>foofunc</function>, and display them along with their offset + from the beginning of <function>foofunc</function>.</para> + + <para>To modify memory, use the write command:</para> + + <screen><userinput>w/b termbuf 0xa 0xb 0</userinput> +<userinput>w/w 0xf0010030 0 0</userinput></screen> + + <para>The command modifier + (<literal>b</literal>/<literal>h</literal>/<literal>w</literal>) + specifies the size of the data to be written, the first following + expression is the address to write to and the remainder is interpreted + as data to write to successive memory locations.</para> + + <para>If you need to know the current registers, use:</para> + + <screen><userinput>show reg</userinput></screen> + + <para>Alternatively, you can display a single register value by e.g. + + <screen><userinput>p $eax</userinput></screen> + + and modify it by:</para> + + <screen><userinput>set $eax new-value</userinput></screen> + + <para>Should you need to call some kernel functions from DDB, simply + say:</para> + + <screen><userinput>call func(arg1, arg2, ...)</userinput></screen> + + <para>The return value will be printed.</para> + + <para>For a &man.ps.1; style summary of all running processes, use:</para> + + <screen><userinput>ps</userinput></screen> + + <para>Now you have examined why your kernel failed, and you wish to + reboot. Remember that, depending on the severity of previous + malfunctioning, not all parts of the kernel might still be working as + expected. 
Perform one of the following actions to shut down and reboot + your system:</para> + + <screen><userinput>panic</userinput></screen> + + <para>This will cause your kernel to dump core and reboot, so you can + later analyze the core on a higher level with <command>gdb</command>. This command + usually must be followed by another <command>continue</command> + command.</para> + + <screen><userinput>call boot(0)</userinput></screen> + + <para>This attempts to cleanly shut down the running system, + <function>sync()</function> all disks, and finally reboot. As long as + the disk and filesystem interfaces of the kernel are not damaged, this + might be a good way to get an almost clean shutdown.</para> + + <screen><userinput>call cpu_reset()</userinput></screen> + + <para>This is the final way out of disaster and almost the same as hitting the + Big Red Button.</para> + + <para>If you need a short command summary, simply type:</para> + + <screen><userinput>help</userinput></screen> + + <para>However, it is highly recommended to have a printed copy of the + &man.ddb.4; manual page ready for a debugging + session. Remember that it is hard to read the on-line manual while + single-stepping the kernel.</para> + </sect1> + + <sect1 xml:id="kerneldebug-online-gdb"> + <title>On-Line Kernel Debugging Using Remote GDB</title> + + <para>This feature has been supported since FreeBSD 2.2, and it is + actually a very neat one.</para> + + <para>GDB has supported <emphasis>remote debugging</emphasis> for + a long time. This is done using a very simple protocol over a serial + line. Unlike the other methods described above, you will need two + machines for doing this. 
One is the host providing the debugging + environment, including all the sources, and a copy of the kernel binary + with all the symbols in it, and the other one is the target machine that + simply runs a similar copy of the very same kernel (but stripped of the + debugging information).</para> + + <para>You should configure the kernel in question with <command>config + -g</command>, include <option>DDB</option> into the configuration, and + compile it as usual. This gives a large binary, due to the + debugging information. Copy this kernel to the target machine, strip + the debugging symbols off with <command>strip -x</command>, and boot it + using the <option>-d</option> boot option. Connect the serial line + of the target machine that has "flags 080" set on its sio device + to any serial line of the debugging host. + Now, on the debugging machine, go to the compile directory of the target + kernel, and start <command>gdb</command>:</para> + + <screen>&prompt.user; <userinput>gdb -k kernel</userinput> +GDB is free software and you are welcome to distribute copies of it + under certain conditions; type "show copying" to see the conditions. +There is absolutely no warranty for GDB; type "show warranty" for details. +GDB 4.16 (i386-unknown-freebsd), +Copyright 1996 Free Software Foundation, Inc... 
+<prompt>(kgdb)</prompt> </screen> + + <para>Initialize the remote debugging session (assuming the first serial + port is being used) with:</para> + + <screen><prompt>(kgdb)</prompt> <userinput>target remote /dev/cuaa0</userinput></screen> + + <para>Now, on the target host (the one that entered DDB even before + starting the device probe), type:</para> + + <screen>Debugger("Boot flags requested debugger") +Stopped at Debugger+0x35: movb $0, edata+0x51bc +<prompt>db></prompt> <userinput>gdb</userinput></screen> + + <para>DDB will respond with:</para> + + <screen>Next trap will enter GDB remote protocol mode</screen> + + <para>Every time you type <command>gdb</command>, the mode will be toggled + between remote GDB and local DDB. To force the next trap + immediately, simply type <command>s</command> (step). Your hosting GDB + will now gain control over the target kernel:</para> + + <screen>Remote debugging using /dev/cuaa0 +Debugger (msg=0xf01b0383 "Boot flags requested debugger") + at ../../i386/i386/db_interface.c:257 +<prompt>(kgdb)</prompt></screen> + + <para>You can use this session almost like any other GDB session, including + full access to the source, running it in gud-mode inside an Emacs window + (which gives you an automatic source code display in another Emacs + window), etc.</para> + </sect1> + + <sect1 xml:id="kerneldebug-kld"> + <title>Debugging Loadable Modules Using GDB</title> + + <para>When debugging a panic that occurred within a module, or + using remote GDB against a machine that uses dynamic modules, + you need to tell GDB how to obtain symbol information for those + modules.</para> + + <para>First, you need to build the module(s) with debugging + information:</para> + + <screen>&prompt.root; <userinput>cd /sys/modules/linux</userinput> +&prompt.root; <userinput>make clean; make COPTS=-g</userinput></screen> + + <para>If you are using remote GDB, you can run + <command>kldstat</command> on the target machine to find out + where the 
module was loaded:</para> + + <screen>&prompt.root; <userinput>kldstat</userinput> +Id Refs Address Size Name + 1 4 0xc0100000 1c1678 kernel + 2 1 0xc0a9e000 6000 linprocfs.ko + 3 1 0xc0ad7000 2000 warp_saver.ko + 4 1 0xc0adc000 11000 linux.ko</screen> + + <para>If you are debugging a crash dump, you will need to walk the + <literal>linker_files</literal> list, starting at + <literal>linker_files->tqh_first</literal> and following the + <literal>link.tqe_next</literal> pointers until you find the + entry with the <literal>filename</literal> you are looking for. + The <literal>address</literal> member of that entry is the load + address of the module.</para> + + <para>Next, you need to find out the offset of the text section + within the module:</para> + + <screen>&prompt.root; <userinput>objdump --section-headers /sys/modules/linux/linux.ko | grep text</userinput> + 3 .rel.text 000016e0 000038e0 000038e0 000038e0 2**2 + 10 .text 00007f34 000062d0 000062d0 000062d0 2**2</screen> + + <para>The one you want is the <literal>.text</literal> section, + section 10 in the above example. The fourth hexadecimal field + (sixth field overall) is the offset of the text section within + the file. Add this offset to the load address of the module to + obtain the relocation address for the module's code. In our + example, we get 0xc0adc000 + 0x62d0 = 0xc0ae22d0. Use the + <command>add-symbol-file</command> command in GDB to tell the + debugger about the module:</para> + + <screen><prompt>(kgdb)</prompt> <userinput>add-symbol-file /sys/modules/linux/linux.ko 0xc0ae22d0</userinput> +add symbol table from file "/sys/modules/linux/linux.ko" at text_addr = 0xc0ae22d0? +(y or n) <userinput>y</userinput> +Reading symbols from /sys/modules/linux/linux.ko...done. 
+<prompt>(kgdb)</prompt></screen> + + <para>You should now have access to all the symbols in the + module.</para> + </sect1> + + <sect1 xml:id="kerneldebug-console"> + <title>Debugging a Console Driver</title> + + <para>Since you need a console driver to run DDB on, things are more + complicated if the console driver itself is failing. In that case, + consider using a serial console (either with modified boot blocks, or by + specifying <option>-h</option> at the <prompt>Boot:</prompt> prompt), + and hook up a standard terminal to your first serial port. DDB works + on any configured console driver, including a serial + console.</para> + </sect1> + + <sect1 xml:id="kerneldebug-deadlocks"> + <title>Debugging Deadlocks</title> + + <para>You may experience so-called deadlocks, a situation where the + system stops doing useful work. To provide a helpful bug report + in this situation, use DDB as described above. Please + include the output of <command>ps</command> and + <command>trace</command> for suspected processes in the + report.</para> + + <para>If possible, consider doing further investigation. The recipe + below is especially useful if you suspect that the deadlock occurs in the + VFS layer. Add the options + <programlisting>makeoptions DEBUG=-g + options INVARIANTS + options INVARIANT_SUPPORT + options WITNESS + options DEBUG_LOCKS + options DEBUG_VFS_LOCKS + options DIAGNOSTIC</programlisting> + + to the kernel config. When a deadlock occurs, in addition to the + output of the <command>ps</command> command, provide the output + of <command>show allpcpu</command>, <command>show + alllocks</command>, <command>show lockedvnods</command> and + <command>show alltrace</command>.</para> + + <para>For threaded processes, to obtain meaningful backtraces, use + <command>thread thread-id</command> to switch to the thread + stack, and obtain a backtrace with <command>where</command>.</para> + </sect1> +</chapter>