Diffstat (limited to 'cddl/contrib/dtracetoolkit/Notes')
19 files changed, 1227 insertions, 0 deletions
diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLcolors_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLcolors_notes.txt new file mode 100644 index 000000000000..bed6f9509b67 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLcolors_notes.txt @@ -0,0 +1,127 @@ +************************************************************************** +* The following are additional notes on all programs that print a colorized +* ("colourised") output, *color*.d. +* +* $Id: ALLcolors_notes.txt 58 2007-10-01 13:36:29Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + +* The colors aren't working, I see rubbish characters + +Try using a terminal that supports colors, such as gnome-terminal or dtterm. + +The following text should test the spectrum of colors for your terminal. +Read this using "more" or "cat" (not "less" or "vim") to check if your +terminal will print colors, and what they will look like: + + Color Test String Dark Background + --------------------------------------------------------- + black [30mcolor test[0m [30;40mcolor test[0m + red [31mcolor test[0m [31;40mcolor test[0m + green [32mcolor test[0m [32;40mcolor test[0m + yellow [33mcolor test[0m [33;40mcolor test[0m + blue [34mcolor test[0m [34;40mcolor test[0m + magenta [35mcolor test[0m [35;40mcolor test[0m + cyan [36mcolor test[0m [36;40mcolor test[0m + white [37mcolor test[0m [37;40mcolor test[0m + +and now for a test of attributes: + + Color Bold Faint + --------------------------------------------------------- + black [1;30mcolor test[0m [2;30mcolor test[0m + red [1;31mcolor test[0m [2;31mcolor test[0m + green [1;32mcolor test[0m [2;32mcolor test[0m + yellow [1;33mcolor test[0m [2;33mcolor test[0m + blue [1;34mcolor test[0m [2;34mcolor test[0m + magenta [1;35mcolor test[0m [2;35mcolor test[0m + cyan [1;36mcolor test[0m [2;36mcolor test[0m + white [1;37mcolor test[0m [2;37mcolor test[0m + + +* Why so much green and violet in the toolkit scripts? + +As DTrace can examine the entire software stack, it is conceivable that +your script could print events from many different layers each with their +own color. Color scripts in the DTraceToolkit generally start by tracing +two layers, with extra layers added by the end user as needed (you). The +general plan is: + + Software Layer Example Provider Color + ------------------------------------------------------- + Dynamic Language perl violet + User Library pid:libperl blue + OS Library pid:libc cyan + System Calls syscall green + Kernel and Drivers fbt red + +How these colors will look will depend on your terminal software. Useful +variations can be made, for example using red/bold for kernel abstraction +providers (io, vminfo, ...); and red/faint for raw kernel tracing (fbt). + +The color examples in this toolkit usually trace the syscall and dynamic +language layers, hense the green and violet. + + +* I don't like the choosen terminal colors / your colors suck + +It should be easy to customize them by tweaking the script. I've tried +to use the following convention for declaring colors in D scripts: + + dtrace:::BEGIN + { + color_shell = "\033[2;35m"; /* violet, faint */ + color_line = "\033[1;35m"; /* violet, bold */ + color_syscall = "\033[2;32m"; /* green, faint */ + color_off = "\033[0m"; /* default */ + } + +That way, printf() statements can print these string variables to turn +on and off colors, as needed. 
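For example, here is a minimal sketch of a complete script that uses this convention to color one layer of output green (the probe, process name, and colors below are only for illustration; this is not one of the toolkit scripts):

    #!/usr/sbin/dtrace -s

    #pragma D option quiet

    dtrace:::BEGIN
    {
            color_syscall = "\033[2;32m";   /* green, faint */
            color_off = "\033[0m";          /* default */
            printf("Tracing write()... Hit Ctrl-C to end.\n");
    }

    syscall::write:entry
    /execname == "bash"/
    {
            printf("%s%s wrote %d bytes%s\n", color_syscall, execname,
                arg2, color_off);
    }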
These strings contain an escape sequence to +inform your terminal software to change the output color. Customizations +can be made by tweaking the variables; refer to documentation for your +terminal software to see what numbers will print what colors. + +For my terminal (dtterm), the numbers are (from dtterm(5)): + + Attributes + + 1 bold + 2 faint + + Forground colors + + 30 black + 31 red + 32 green + 33 yellow + 34 blue + 35 magenta + 36 cyan + 37 white + + Background colors + + 40 black + 41 red + ... etc, as above + + +* I'd like to use this colored output on a website. + +The easiest way would be to change the script to output HTML rather than +escape sequences. eg: + + dtrace:::BEGIN + { + color_shell = "<font color=\"#FFAAFF\">"; /* violet, faint */ + color_line = "<font color=\"#FF44FF\">"; /* violet, bold */ + color_syscall = "<font color=\"#44CC44\">"; /* green, faint */ + color_off = "</font>"; /* default */ + } + +Other tweaks can be made to either print the output in a <pre> tagged block; +or as seperate lines ending in <br> along with changing the font to be +fixed width. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLelapsed_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLelapsed_notes.txt new file mode 100644 index 000000000000..9e8f314feae2 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLelapsed_notes.txt @@ -0,0 +1,46 @@ +************************************************************************** +* The following are notes for all scripts that measure elapsed time. +* +* $Id: ALLelapsed_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* What is "elapsed" time? + +Elapsed time is the absolute time from one point to another. This time +includes everything that happened between these points, including +off-CPU time due to other system events such as I/O, scheduling, +interrupts, etc. It also includes the small overheads of DTrace itself. + +Elapsed times are useful for identifying where latencies are, since +regardless of their nature (CPU, I/O, ...), they will be visible in +elapsed time. + +Since elapsed times don't filter out anything, they are suseptible to +"noise" - random system events that are unrelated to the analysis target. +For that reason, it may be best to take several measurements of elapsed +time and take the average (or run your workload several times and let +DTrace take the average). + +See Notes/ALLoncpu_notes.txt for a description of a different time +measurement, "on-CPU" time. + + +* How is "elapsed" time measured? + +In DTrace, the following template provides elapsed time as "this->elapsed", + + <start-probe> + { + self->start = timestamp; + } + + <end-probe> + { + this->elapsed = timestamp - self->start; + self->start = 0; + ... + } + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLexclusive_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLexclusive_notes.txt new file mode 100644 index 000000000000..7aeb9ffde84c --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLexclusive_notes.txt @@ -0,0 +1,78 @@ +************************************************************************** +* Notes for all scripts that print exclusive function times (or method, +* or subroutine). +* +* $Id: ALLexclusive_notes.txt 45 2007-09-17 08:54:56Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* What is "exclusive" function time? 
+ +This is the time of function execution, from when the function begins to +when it completes, excluding the time spent executing any child function. + +Exclusive function time can be calculated like this, + + exclusive function time = time(function end) - time(function start) - + time(total child exclusive time) + +To do this, the DTrace script needs to keep trace of child function execution +time, so that it can be subtracted from the parent execution time. + +Consider this Bourne shell program, + 1 #!./sh + 2 + 3 func_c() + 4 { + 5 echo "Function C" + 6 sleep 1 + 7 } + 8 + 9 func_b() + 10 { + 11 echo "Function B" + 12 sleep 1 + 13 func_c + 14 } + 15 + 16 func_a() + 17 { + 18 echo "Function A" + 19 sleep 1 + 20 func_b + 21 } + 22 + 23 func_a + +func_a() calls func_b() which calls func_c(). Tracing the flow using +sh_flowtime.d shows, + +# ./sh_flowtime.d | cat -n + 1 C TIME(us) FILE DELTA(us) -- NAME + 2 0 3052991099265 func_abc.sh 2 -> func_a + 3 0 3052991099324 func_abc.sh 59 > echo + 4 0 3052992111638 func_abc.sh 1012314 | sleep + 5 0 3052992111678 func_abc.sh 39 -> func_b + 6 0 3052992111729 func_abc.sh 51 > echo + 7 0 3052993121633 func_abc.sh 1009903 | sleep + 8 0 3052993121693 func_abc.sh 60 -> func_c + 9 0 3052993121745 func_abc.sh 52 > echo + 10 0 3052994131634 func_abc.sh 1009888 | sleep + 11 0 3052994131685 func_abc.sh 50 <- func_c + 12 0 3052994131699 func_abc.sh 14 <- func_b + 13 0 3052994131707 func_abc.sh 7 <- func_a + +the output of DTrace was piped through "cat -n" to enumerate the lines. + +Exclusive function time for func_a() in the above output would be the +time from line 2 to line 13 minus the time from line 5 to 12 to subtract +the time spent in both func_b() and func_c(). Or, you could say that +exclusive time for func_a() is the time from lines 2 to 4. + +Looking back at the code, exclusive time for func_a() is the time spent +in code lines 18 and 19 (and not line 20). + +See Notes/ALLinclusive_notes.txt for details on "inclusive" function time. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLfbt_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLfbt_notes.txt new file mode 100644 index 000000000000..ef5f2b8eff3e --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLfbt_notes.txt @@ -0,0 +1,77 @@ +************************************************************************** +* The following are notes for any script that uses the "fbt" provider. +* To identify these scripts, check the "STABILITY" section of the script's +* man page, or try grepping for "fbt" on the script. +* +* $Id: ALLfbt_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +What is the "fbt" provider?... + +* A DTrace library of probes that instruments raw kernel function calls. +* An "unstable" provider; meaning, scripts written using "fbt" are not + guarenteed to work on future versions of the OS - including after + patching the kernel. + +In a perfect world... + +* None of the DTraceToolkit scripts would use the "fbt" provider; instead + they would all use stable providers such as "proc", "sched", "io", etc. +* All the DTraceToolkit scripts would run on any system that supports DTrace. + +In the real world... + +* Not all stable providers exist yet. Many are in development, such as + stable networking providers. 
+* In the meantime, useful tools such as "tcpsnoop" and "tcptop" can + only be written using the unstable "fbt" provider (and these scripts have + broken several times due to kernel changes since they were first written). +* "fbt" provider based scripts, + - only run on a particular OS (eg, Solaris) + - may only run on a particular version of an OS (eg, Solaris 10 3/05) + - are likely to break for future OS releases (eg, Solaris 10 6/06) +* "fbt" provider based scripts also make the impossible possible, albiet + in a very unstable way, as a temporary solution while stable providers + are still in development. +* Once stable providers exist, "fbt" scripts can be rewritten to use them; + however these new scripts will only run on newer OS builds that support + the stable providers. (in other words, this won't help you if you remain + on Solaris 10 6/06; you'll need to upgrade, or survive with "fbt"). +* Only some of the DTraceToolkit scripts use "fbt", and only a portion of + those have encountered stability issues - so this issue is limited. + +The "fbt" provider exports raw kernel implementation, which isn't guarenteed +to be stable nor should it ever be (to do so would freeze kernel development +and bug fixes). The only practical solution is the development and +integration of stable providers (although that doesn't help people who keep +running older versions of the OS). + +More harm than good?... + +Is the inclusion of these "fbt" scripts more harm than good? Consider, + +* the good, + - shows what is possible with DTrace + - should help a number of people solve specific performance issues, + on systems where they run + - a customer who really wants these scripts but on an OS version + where they don't work, have at least the source as a starting + point (and in some cases, the fix was trivial) + +* the bad, + - teases and frustrates people who find these scripts don't work + on their OS + +To minimise this issue, only a small number of "fbt" scripts have been +included, and they have been documented (see their man page) as unstable. + +Can I help?... + +If you really like an "fbt" based script and would like to keep using it +in a stable way, it may help to raise that with your vendor (Sun for Solaris, +Apple for MacOS). Sun has OpenSolaris forums, such as dtrace-discuss, which +are read by their engineers and the public. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLflow_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLflow_notes.txt new file mode 100644 index 000000000000..4571491d467e --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLflow_notes.txt @@ -0,0 +1,64 @@ +************************************************************************** +* Notes for all scripts that print a function or method flow. +* +* $Id: ALLflow_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* What is a flow? + +Output that has some meaningful indent, such as function flow indented by +stack depth. eg, + + # ./pl_flow.d + C TIME(us) FILE -- SUB + 0 2963130861619 func_abc.pl -> func_a + 0 2963131870998 func_abc.pl -> func_b + 0 2963132871121 func_abc.pl -> func_c + 0 2963133881150 func_abc.pl <- func_c + 0 2963133881166 func_abc.pl <- func_b + 0 2963133881174 func_abc.pl <- func_a + ^C + + +* The output looks shuffled? 
+ +Eg, + + # ./pl_flow.d + C TIME(us) FILE -- SUB + 0 2963130861619 func_abc.pl -> func_a + 0 2963131870998 func_abc.pl -> func_b + 0 2963132871121 func_abc.pl -> func_c + 0 2963133881166 func_abc.pl <- func_b + 0 2963133881174 func_abc.pl <- func_a + 1 2963133881150 func_abc.pl <- func_c + ^C + +Yes, this is shuffled. DTrace has been designed with a number of important +goals in mind - including minimising the enabled performance overhead. To do +this, per-CPU kernel buffers have been used to collect output, which are +(currently) dumped in sequence by /usr/sbin/dtrace whenever it wakes +up ("switchrate" tunable). So, on multi-CPU servers, there is always the +possibility that any DTrace script can print out-of-order data. + +To deal with this behaviour, the flow scripts may, + +- print a "C" CPU column. If this changes from one line to the next then + the output is probably shuffled around that point. This is why the "C" + column appears in these flow scripts. +- print a "TIME(us)" column. You can eyeball this for shuffles, or just + post sort the dtrace output. + +Now have a closer look at the pl_flow.d output above. The change in C +indicates that a shuffle may have happened, and the out-of-order TIME(us) +shows that it did happen. + +It is possible that DTrace will be enhanced to always sort output before +printing, and this behaviour is no longer an issue. + +See "The output seems shuffled?" in Notes/ALLsnoop_notes.txt for more +notes on this behaviour. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLinclusive_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLinclusive_notes.txt new file mode 100644 index 000000000000..eea4b5d88d9e --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLinclusive_notes.txt @@ -0,0 +1,74 @@ +************************************************************************** +* Notes for all scripts that print inclusive function times (or method, +* or subroutine). +* +* $Id: ALLinclusive_notes.txt 45 2007-09-17 08:54:56Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* What is "inclusive" function time? + +This is the time of function execution, from when the function begins to +when it completes. This includes the times from all child functions called. + +Inclusive function time is calculated in a very simple way, + + inclusive function time = time(function end) - time(function start) + +Consider this Bourne shell program, + + 1 #!./sh + 2 + 3 func_c() + 4 { + 5 echo "Function C" + 6 sleep 1 + 7 } + 8 + 9 func_b() + 10 { + 11 echo "Function B" + 12 sleep 1 + 13 func_c + 14 } + 15 + 16 func_a() + 17 { + 18 echo "Function A" + 19 sleep 1 + 20 func_b + 21 } + 22 + 23 func_a + +func_a() calls func_b() which calls func_c(). Tracing the flow using +sh_flowtime.d shows, + +# ./sh_flowtime.d | cat -n + 1 C TIME(us) FILE DELTA(us) -- NAME + 2 0 3052991099265 func_abc.sh 2 -> func_a + 3 0 3052991099324 func_abc.sh 59 > echo + 4 0 3052992111638 func_abc.sh 1012314 | sleep + 5 0 3052992111678 func_abc.sh 39 -> func_b + 6 0 3052992111729 func_abc.sh 51 > echo + 7 0 3052993121633 func_abc.sh 1009903 | sleep + 8 0 3052993121693 func_abc.sh 60 -> func_c + 9 0 3052993121745 func_abc.sh 52 > echo + 10 0 3052994131634 func_abc.sh 1009888 | sleep + 11 0 3052994131685 func_abc.sh 50 <- func_c + 12 0 3052994131699 func_abc.sh 14 <- func_b + 13 0 3052994131707 func_abc.sh 7 <- func_a + +the output of DTrace was piped through "cat -n" to enumerate the lines. 
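As an aside, measuring inclusive time in D needs only an entry and a return probe for each function. The sketch below is a generic illustration only (it uses the pid provider on a compiled a.out rather than the sh provider used above, and it ignores recursion):

    /* record the entry timestamp, per thread and function */
    pid$target:a.out::entry
    {
            self->start[probefunc] = timestamp;
    }

    /* sum inclusive (entry to return) time by function name */
    pid$target:a.out::return
    /self->start[probefunc]/
    {
            @inclusive[probefunc] = sum(timestamp - self->start[probefunc]);
            self->start[probefunc] = 0;
    }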
+ +Inclusive function time for func_a() in the above output would be the +time from line 2 to line 13. This inclusive time includes the time +for both func_b() and func_c(). + +Looking back at the code, inclusive time for func_a() is the time spent +in code lines 18, 19 and 20. + +See Notes/ALLexclusive_notes.txt for details on "exclusive" function time. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLjava_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLjava_notes.txt new file mode 100644 index 000000000000..2d033c18a822 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLjava_notes.txt @@ -0,0 +1,35 @@ +************************************************************************** +* Notes for all scripts that trace Java using the hotspot provider. +* +* $Id: ALLjava_notes.txt 52 2007-09-24 04:28:01Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + +* I see "drops" + +If you see the following output, + + dtrace: 2547 drops on CPU 0 + +This means that JVM events (usually methods) were executed too quickly for +DTrace to keep up, and as a safety measure DTrace has let events slip by. +This means, at least, that the output is missing lines. At worst, the +output may contain corrupted values (time deltas between events that were +dropped). + +If you see drops, you should first ask yourself whether you need to be +tracing such frequent events at all - is there another way to get the same +data? For example, see the j_profile.d script, which uses a different +technique (sampling) than most of the other Java scripts (tracing). + +You can try tweaking DTrace tunables to prevent DTrace from dropping events. +A key tunable is "bufsize", and can be set in scripts like so, + + #pragma D option bufsize=32m + +That line means that 32 Mbytes will be allocated to the DTrace primary +buffer per-CPU (how depends on bufpolicy). If you have many CPUs, say 8, +then the above line means that 256 Mbytes (32 * 8) will be allocated as a +buffer while your D script is running. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLoncpu_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLoncpu_notes.txt new file mode 100644 index 000000000000..41aead027bc9 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLoncpu_notes.txt @@ -0,0 +1,42 @@ +************************************************************************** +* The following are notes for all scripts that measure on-CPU times. +* +* $Id: ALLoncpu_notes.txt 58 2007-10-01 13:36:29Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* What is "on-CPU" time? + +This is the time that a thread spent running on a CPU. It does not include +time spent off-CPU time such as sleeping for I/O or waiting for scheduling. + +On-CPU times are useful for showing who is causing the CPUs to be busy, +since they measure how much CPU time has been consumed by that thread. + +On-CPU times are also less susceptible to system "noise" than elapsed times, +since much of the noise will be filtered out. DTrace itself also tries +to subtract the small overheads of DTrace from the on-CPU time, to improve +the accuracy of this time. + +See Notes/ALLelapsed_notes.txt for a description of a different time +measurement, "elapsed" time. + + +* How is "on-CPU" time measured? 
+ +In DTrace, the following template provides on-CPU time as "this->oncpu", + + <start-probe> + { + self->vstart = vtimestamp; + } + + <end-probe> + { + this->oncpu = vtimestamp - self->vstart; + self->vstart = 0; + ... + } + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLoverhead.txt b/cddl/contrib/dtracetoolkit/Notes/ALLoverhead.txt new file mode 100644 index 000000000000..844b3c08c1d9 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLoverhead.txt @@ -0,0 +1,96 @@ +************************************************************************** +* The following are notes regarding the overheads of running DTrace. +* +* $Id: ALLoverhead.txt 58 2007-10-01 13:36:29Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +The following are notes regarding the overheads of running DTrace. + +* What are the overheads of running DTrace? + +Often negligible. + +It depends what the DTrace script does, in particular, the frequency of +events that it is tracing. + +The following tips should explain what the overheads probably are, + +- if your script traces less than 1000 events per second, then the overhead + is probably negligible. ie, less than 0.1% CPU. +- if your script traces more than 100,000 events per second, then the + overhead will start to be significant. If you are tracing kernel events, + then perhaps this could be 10% per CPU. If you are tracing user land + application events, then the overhead can be greater than 30% per CPU. +- if your script produes pages of output, then the CPU cost of drawing + this output to the screen and rendering the fonts is usually far greater + than DTrace itself. Redirect the output of DTrace to a file in /tmp + ("-o" or ">"). +- a ballpark figure for the overhead of a DTrace probe would be 500 ns. + This can be much less (kernel only), or much more (many user to kerel + copyin()s); I've provided it to give you a very rough idea. Of course, + as CPUs become faster, this overhead will become smaller. + +If overheads are a concern - then perform tests to measure their magnitude +for both your workload and the scripts applied, such as benchmarks with +and without DTrace running. Also read the scripts you are using, and +consider how frequent the probes will fire, and if you can customise the +script to reduce the frequency of probes. + +For example, scripts that trace, + + pid$target:::entry, + pid$target:::return + +would usually cause significant performance overhead, since they fire two +probes for every function called (and can easily reach 100,000 per second). +You could reduce this by modifying the script to only trace the libraries +you are interested in. For example, if you were only interested in +libsocket and libnsl, then change the above lines wherever they appeared to, + + pid$target:libsocket::entry, + pid$target:libsocket::return, + pid$target:libnsl::entry, + pid$target:libnsl::return + +and you may notice the overheads are significantly reduced (especially anytime +you drop libc and libdl). To go further, only list functions of interest, + + pid$target:libsocket:connect:entry, + pid$target:libsocket:connect:return, + pid$target:libsocket:listen:entry, + pid$target:libsocket:listen:return, + [...] + +There are additional notes in Docs/Faq about the DTraceToolkit's scripts +and performance overhead. + + +* When are the overheads a problem? 
+ +When they are significant (due to frequent events), and you are tracing +in a production environment that is sensitive to additional CPU load. + +Overheads should be considered if you are measuring times (delta, elapsed, +on-CPU, etc) for performance analysis. In practice, overheads aren't +that much of a problem -- the script will either identify your issues +correctly (great), or not (keep looking). And it is usually easy to quickly +confirm what DTrace does find by using other tools, or by hacking quick +code changes. You might be using DTrace output that you know has a +significant margin of error - but that becomes moot after you prove that +the performance fix works through benchmarking a quick fix. + +At the end of the day, if DTrace helps find real measurable performance wins +(and it should), then it has been successful. + + +* When are overheads not a problem? + +When the script is not tracing extremely frequent events. + +Also, when you are in development and tracing events for troubleshooting +purposes (args to functions, for example), DTrace overheads are usually +not an issue at all. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLperl_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLperl_notes.txt new file mode 100644 index 000000000000..24039504f698 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLperl_notes.txt @@ -0,0 +1,44 @@ +************************************************************************** +* The following are notes for all the Perl tracing scripts. +* +* $Id: ALLperl_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* Where did those "BEGIN" subroutine calls come from? + +The following counts subroutines from the example program, Code/Perl/hello.pl, + + # pl_subcalls.d + Tracing... Hit Ctrl-C to end. + ^C + FILE SUB CALLS + +no subroutines were called, so there is no data to output. + +Now a similar program is traced, Code/Perl/hello_strict.pl, which uses +the "strict" pragma, + + # pl_subcalls.d + Tracing... Hit Ctrl-C to end. + ^C + FILE SUB CALLS + hello_strict.pl BEGIN 1 + strict.pm bits 1 + strict.pm import 1 + +not only were functions from "strict.pm" traced, but a "BEGIN" function +ran from the "hello_strict.pl" program - which doesn't appear to use "BEGIN", + + # cat -n ../Code/Perl/hello_strict.pl + 1 #!./perl -w + 2 + 3 use strict; + 4 + 5 print "Hello World!\n"; + +Perl appears to add a BEGIN block to process the "use" keyword. This makes +some degree of sense. + diff --git a/cddl/contrib/dtracetoolkit/Notes/ALLsnoop_notes.txt b/cddl/contrib/dtracetoolkit/Notes/ALLsnoop_notes.txt new file mode 100644 index 000000000000..b43c70a1aefe --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/ALLsnoop_notes.txt @@ -0,0 +1,94 @@ +************************************************************************** +* The following are additional notes on ALL of the *snoop programs (such as +* execsnoop, iosnoop, ..., and dapptrace, dtruss). +* +* $Id: ALLsnoop_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* The output seems shuffled? + +Beware - due to the (current) way DTrace works, on multi-CPU systems there +is no guarantee that if you print traced events the output is in the same +order that the events occurred.
+ +This is because events details are placed in kernel per-CPU buffers, and then +dumped in sequence by the DTrace consumer (/usr/sbin/dtrace) whenever it +wakes up ("switchrate" tunable). The DTrace consumer reads and prints the +buffers one by one, it doesn't combine them and sort them. + +To demonstrate this, + + # dtrace -n 'profile:::profile-3hz { trace(timestamp); }' + dtrace: description 'profile-3hz ' matched 1 probe + CPU ID FUNCTION:NAME + 0 41241 :profile-3hz 1898015274778547 + 0 41241 :profile-3hz 1898015608118262 + 0 41241 :profile-3hz 1898015941430060 + 1 41241 :profile-3hz 1898015275499014 + 1 41241 :profile-3hz 1898015609173485 + 1 41241 :profile-3hz 1898015942505828 + 2 41241 :profile-3hz 1898015275351257 + 2 41241 :profile-3hz 1898015609180861 + 2 41241 :profile-3hz 1898015942512708 + 3 41241 :profile-3hz 1898015274803528 + 3 41241 :profile-3hz 1898015608120522 + 3 41241 :profile-3hz 1898015941449884 + ^C + +If you read the timestamps carefully, you'll see that they aren't quite +in chronological order. If you look at the CPU column while reading the +timestamps, the way DTrace works should become clear. + +Most of the snoop tools have a switchrate of 10hz, so events may be shuffled +within a tenth of a second - not hugely noticable. + +This isn't really a problem anyway. If you must have the output in the correct +order, find the switch that prints timestamps and then sort the output. +As an example, + + # iosnoop -t > out.iosnoop + ^C + # sort -n out.iosnoop + + TIME UID PID D BLOCK SIZE COMM PATHNAME + 183710958520 0 3058 W 10507848 4096 sync /var/log/pool/poold + 183710990358 0 3058 W 6584858 1024 sync /etc/motd + 183711013469 0 3058 W 60655 9216 sync <none> + 183711020149 0 3058 W 60673 1024 sync <none> + +All shell-wrapped scripts should have some way to print timestamps, and +many DTrace-only scripts print timestamps by default. If you find a script +that doesn't print timestamps, it should be trivial for you to add an +extra column. + +To add a microsecond-since-boot time column to a script, try adding this +before every printf() you find, + + printf("%-16d ", timestamp / 1000); + +except for the header line, where you can add this, + + printf("%-16s ", "TIME(us)"); + +Now you will be able to post sort the script output on the TIME(us) column. + +In practise, I find post sorting the output a little annoying at times, +and use a couple of other ways to prevent shuffling from happening in the +first place: + +- offline all CPUs but one when running flow scripts. Naturally, you + probably don't want to do this on production servers, this is a trick + that may be handy for when developing on workstations or laptops. Bear + in mind that if you are trying to DTrace certain issues, such as + multi-thread locking contention, then offlining most CPUs may eliminate + the issue you are trying to observe. +- pbind the target process of interest to a single CPU. Most OSes provide + a way to glue a process to a single CPU; Solaris has both pbind and psrset. + +Another way to solve this problem would be to enhance DTrace to always print +in-order output. Maybe this will be done one day; maybe by the time you +are reading this it has already been done? 
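To make the earlier TIME(us) suggestion concrete, here is a minimal snoop-style sketch with a sortable time column; the proc probe and the columns were chosen only for illustration:

    #pragma D option quiet

    dtrace:::BEGIN
    {
            printf("%-16s %-6s %s\n", "TIME(us)", "PID", "COMM");
    }

    proc:::exec-success
    {
            printf("%-16d %-6d %s\n", timestamp / 1000, pid, execname);
    }

Since the time is printed as the first column, the saved output can be restored to event order with "sort -n", as shown with iosnoop -t above.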
+ diff --git a/cddl/contrib/dtracetoolkit/Notes/Readme b/cddl/contrib/dtracetoolkit/Notes/Readme new file mode 100644 index 000000000000..99a1807e087e --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/Readme @@ -0,0 +1,21 @@ +Notes - Discussion about tools and their output + + This directory contains files that provide deeper discussions about + tools and their output. + + Files are either named, + + ALL*_notes.txt - notes that cover a collection of tools + *_notes.txt - notes that cover a specific tool + + These files exist as an informal place to dump "stuff". This might + range from caveats to bear in mind when interpreting tool output, to + general or bizarre knowledge. Tool documentation is placed in, + + /Man - formal man pages + /Examples - demos and how to read the output + *.d - implementation notes within the code itself + /Notes - everything else + + Many of the scripts and man pages refer to files in this directory. + diff --git a/cddl/contrib/dtracetoolkit/Notes/cputimes_notes.txt b/cddl/contrib/dtracetoolkit/Notes/cputimes_notes.txt new file mode 100644 index 000000000000..cdf7ecdac5a6 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/cputimes_notes.txt @@ -0,0 +1,138 @@ +************************************************************************** +* The following are additional notes on the cputimes command. +* +* $Id: cputimes_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* How cputimes works + +cputimes measures time consumed by the kernel, idle threads and processes, +by tracking the activity of the scheduler. In particular we track on-cpu +and off-cpu events for kernel threads, measuring the timestamps at each event. + + +* Why cputimes? + +If you are interested in how much time processes are consuming, the data +given by "prstat" or "prstat -m" is fine. However, there is no easy way to +see kernel consumed time, which is the idea behind cputimes. + + +* What does it mean? + +The output shows categories of threads by the sum of time, in nanoseconds. + +A nanosecond is 10^-9, or 0.000000001 of a second. This program uses +nanoseconds as units, but does not have nanosecond accuracy. It would be +reasonable to assume that this has microsecond accuracy (10^-6), so in +practice ignore the last three digits of the times. + +The sections reported are, + +PROCESSES - the sum of all the process time on the CPU. +KERNEL - the sum of the time spent in the kernel. +IDLE - the time the kernel spent in the idle thread, waiting for some work. + +If your system isn't doing much, then the idle time will be quite large. If +your system is running many applications, then there may be no idle time +at all - instead most of the time appearing under processes. + + +* When is there a problem? + +Expect to see most of the time in processes or idle, depending on how busy +your server is. Seeing a considerable amount of time in kernel would +definitely be interesting. + +The kernel generally doesn't use much CPU time, usually less than 5%. +If it were using more, that may indicate heavy activity from an interrupt +thread, or activity caused by DTrace. + +For example, + + # cputimes 1 + 2005 Apr 27 23:49:32, + THREADS TIME (ns) + IDLE 28351679 + KERNEL 436022725 + PROCESS 451304688 + +In this sample the kernel is using a massive amount of the CPUs, around 47%.
+This sample was taken during heavy network utilisation, the time consumed +by the TCP/IP and network driver threads (and DTrace). The "intrstat" command +could be used for further analysis of the interrupt threads responsible +for servicing the network interface. + + +* Problems with cputimes + +The way cputimes measures schedular activity turns out to be a lot of work. +There are many scheduling events per second where one thread steps onto a +CPU and another leaves. It turns out that cputimes itself causes some degree +of kernel load. + +Here we run 1 cputimes, + + # cputimes 1 + 2005 May 15 12:00:41, + THREADS TIME (ns) + KERNEL 12621985 + PROCESS 982751579 + 2005 May 15 12:00:42, + THREADS TIME (ns) + KERNEL 12267577 + PROCESS 983513765 + [...] + +Now a second cputimes is run at the same time, + + # cputimes 1 + 2005 May 15 12:02:06, + THREADS TIME (ns) + KERNEL 17366426 + PROCESS 978804165 + 2005 May 15 12:02:07, + THREADS TIME (ns) + KERNEL 17614829 + PROCESS 978671601 + [...] + +And now a third, + + # cputimes 1 + 2005 May 15 12:03:09, + THREADS TIME (ns) + KERNEL 21303089 + PROCESS 974925124 + 2005 May 15 12:03:10, + THREADS TIME (ns) + KERNEL 21222992 + PROCESS 975152727 + [...] + +Each extra cputimes is consuming an extra 4 to 5 ms of the CPU as kernel time. +Around 0.5%. This can be used as an estimate of the kernel load caused by +running cputimes, and a similar strategy could be used to measure the kernel +load of other DTrace scripts. + +However the following CPU characteristics must be taken into consideration, + + # psrinfo -v + Status of virtual processor 0 as of: 05/15/2005 12:06:05 + on-line since 04/30/2005 13:32:32. + The i386 processor operates at 867 MHz, + and has an i387 compatible floating point processor. + +as well as the type of activity that was also running on the system, which +cputimes was monitoring (frequency of scheduling events). + +A system with a slower CPU will use a larger proportion of kernel time to +perform the same tasks. Also, a system that is context switching more +(switching between different processes) is likely to consume more kernel time +as well. + + + diff --git a/cddl/contrib/dtracetoolkit/Notes/dappprof_notes.txt b/cddl/contrib/dtracetoolkit/Notes/dappprof_notes.txt new file mode 100644 index 000000000000..d617f2aeabaa --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/dappprof_notes.txt @@ -0,0 +1,14 @@ +************************************************************************** +* The following are extra notes on the dappprof command. +* +* $Id: dappprof_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* Can I trust the elapsed and on-cpu times? + +See the documentation for this point in the dtruss_notes.txt file. + + diff --git a/cddl/contrib/dtracetoolkit/Notes/dapptrace_notes.txt b/cddl/contrib/dtracetoolkit/Notes/dapptrace_notes.txt new file mode 100644 index 000000000000..579c2dfd8c01 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/dapptrace_notes.txt @@ -0,0 +1,19 @@ +************************************************************************** +* The following are extra notes on the dapptrace command. +* +* $Id: dapptrace_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* Can I trust the elapsed and on-cpu times? 
+ +See the documentation for this point in the dtruss_notes.txt file. + + + +* The output appears shuffled? + +Read the answer to this in ALLsnoop_notes.txt. + diff --git a/cddl/contrib/dtracetoolkit/Notes/dtruss_notes.txt b/cddl/contrib/dtracetoolkit/Notes/dtruss_notes.txt new file mode 100644 index 000000000000..8ecbecf864bc --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/dtruss_notes.txt @@ -0,0 +1,97 @@ +************************************************************************** +* The following are additional notes on the dtruss program. +* +* $Id: dtruss_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* Can I trust the elapsed and on-cpu times? + +Firstly, let's see dtruss do something cool, + + # dtruss -eo -n bash + PID/LWP ELAPSD CPU SYSCALL(args) = return + 6215/1: 57 37 write(0x2, "h\0", 0x1) = 1 0 + 6215/1: 357210 45 read(0x0, "e\0", 0x1) = 1 0 + 6215/1: 53 37 write(0x2, "e\0", 0x1) = 1 0 + 6215/1: 359510 46 read(0x0, "l\0", 0x1) = 1 0 + 6215/1: 57 42 write(0x2, "l\0", 0x1) = 1 0 + 6215/1: 166495 47 read(0x0, "l\0", 0x1) = 1 0 + 6215/1: 56 40 write(0x2, "l\0", 0x1) = 1 0 + 6215/1: 346076 44 read(0x0, "o\0", 0x1) = 1 0 + 6215/1: 54 38 write(0x2, "o\0", 0x1) = 1 0 + 6215/1: 349852 45 read(0x0, " \0", 0x1) = 1 0 + 6215/1: 54 39 write(0x2, " \0", 0x1) = 1 0 + +In the above, the slow elapsed times for reads are due to the process context +switching off the CPU while we wait for the next keystroke. For example, +the second line shows an on-CPU time of 45 us and an elapsed time of 357210 us. +In fact, the elapsed times are equal to the inter-keystroke delays. + + +What about the writes? Their elapsed times are longer than the on-CPU times +also. Did we context switch off for them too? ... Let's run a different demo, + + # dtruss -eo date + ELAPSD CPU SYSCALL(args) = return + Mon Jul 25 21:41:40 EST 2005 + 44 23 resolvepath("/usr/bin/date\0", 0x80476CC, 0x3FF) = 13 0 + 10 1 sysconfig(0x6, 0xB25A1, 0xFEC1D444) = 4096 0 + 36 28 resolvepath("/usr/lib/ld.so.1\0", 0x80476CC, 0x3FF) = 12 0 + 18 9 xstat(0x2, 0x8047FEB, 0x8047AF8) = 0 0 + 25 16 open("/var/ld/ld.config\0", 0x0, 0x0) = -1 Err#2 + 27 18 xstat(0x2, 0xD27FBF38, 0x80473B0) = 0 0 + 17 9 resolvepath("/lib/libc.so.1\0", 0x8047438, 0x3FF) = 14 0 + 21 13 open("/lib/libc.so.1\0", 0x0, 0x0) = 3 0 + 30 22 mmap(0x10000, 0x1000, 0x5) = -763559936 0 + 15 6 mmap(0x10000, 0xCE000, 0x0) = -764411904 0 + 24 16 mmap(0xD2700000, 0xB5A45, 0x5) = -764411904 0 + 21 12 mmap(0xD27C6000, 0x5EB3, 0x3) = -763600896 0 + 18 9 mmap(0xD27CC000, 0x15C0, 0x3) = -763576320 0 + 14 5 munmap(0xD27B6000, 0x10000) = 0 0 + 186 176 memcntl(0xD2700000, 0x1B8D8, 0x4) = 0 0 + 17 7 close(0x3) = 0 0 + [...] + +For every syscall, the elapsed time is around 10 us (microseconds) slower +than the on-cpu time. These aren't micro context switches, this is due to +DTrace slowing down the program! The more closely we measure something the +more we affect it. (See Heisenberg's uncertainty principle). + +Ok, so for the above output we can tell that each elapsed time is around 10 us +longer than it should be. That's fine, since it's fairly consistent and not +a huge difference. This is an x86 server with an 867 MHz CPU.
+ + +Now let's try the same on an Ultra 5 with a 360 MHz CPU, + + # dtruss -eo date + ELAPSD CPU SYSCALL(args) = return + 216 142 resolvepath("/usr/bin/date\0", 0xFFBFF338, 0x3FF) = 13 0 + 234 187 resolvepath("/usr/lib/ld.so.1\0", 0xFFBFF338, 0x3FF) = 12 0 + 113 67 stat("/usr/bin/date\0", 0xFFBFF818, 0xFFBFFFEB) = 0 0 + 136 90 open("/var/ld/ld.config\0", 0x0, 0x0) = -1 Err#2 + 107 61 stat("/opt/onbld/lib/libc.so.1\0", 0xFFBFF330, 0xFFBFF55C) = -1 Err#2 + 98 54 stat("/opt/SUNWspro/lib/libc.so.1\0", 0xFFBFF330, 0xFFBFF55C) = -1 Err#2 + 96 53 stat("/opt/SUNWmlib/lib/libc.so.1\0", 0xFFBFF330, 0xFFBFF55C) = -1 Err#2 + 97 54 stat("/usr/sfw/lib/libc.so.1\0", 0xFFBFF330, 0xFFBFF55C) = -1 Err#2 + 96 53 stat("/lib/libc.so.1\0", 0xFFBFF330, 0xFFBFF55C) = 0 0 + 134 92 resolvepath("/lib/libc.so.1\0", 0xFFBFEF30, 0x3FF) = 14 0 + 109 69 open("/lib/libc.so.1\0", 0x0, 0x0) = 3 0 + 177 132 mmap(0x10000, 0x2000, 0x5) = -12976128 0 + [...] + +Now the time difference is around 40 us, and fairly consistent. + + +This difference is fine so long as we bear it in mind. Or, run DTrace +on faster servers where the difference is much less. + + + +* The output appears shuffled? + +Read the answer to this in ALLsnoop_notes.txt. + diff --git a/cddl/contrib/dtracetoolkit/Notes/iosnoop_notes.txt b/cddl/contrib/dtracetoolkit/Notes/iosnoop_notes.txt new file mode 100644 index 000000000000..af3ab9bbbe20 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/iosnoop_notes.txt @@ -0,0 +1,99 @@ +************************************************************************** +* The following are additional notes on the iosnoop program. +* +* $Id: iosnoop_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* What does the output represent? + +The output is disk events - I/O operations that cause the disk to physically +read or write data. The output is not application I/O events, many of which +will be absorbed by memory caches. The output really is +physical disk events. + +iosnoop uses probes from the "io" provider - which traces the block device +driver before disk events happen. The stack goes like this, + + application + | + V + syscall + | + V + vfs + | + V + ufs/zfs/... + | + V + block device driver + | + V + physical device driver + | + V + disk + +Due to caching (after vfs) few events will make it to the disk for iosnoop +to see. If you want to trace all I/O activity, try using syscall provider +based scripts first. + + +* What do the elapsed and delta times mean? + +Glad you asked! + +The times may *not* be as useful as they appear. I should also add that +this quickly becomes a very complex topic. + +There are two different delta times reported. -D prints the +elapsed time from the disk request (strategy) to the disk completion +(iodone); -o prints the time for the disk to complete that event +since its last event (time between iodones, or since idle->strategy). + +The elapsed time is equivalent to the response time from the application +request to the application completion. The delta time resembles the +service time for this request (resembles means it will be generally +correct, but not 100% accurate). The service time is the time for the +disk to complete the request, after it has travelled through any bus or +queue. + +buuuttt.... you need to think carefully about what these times mean before +jumping to conclusions.
For example, + + You troubleshoot an application by running iosnoop and filtering + on your application's PID. You notice large times for the disk events + (response, service, for this example it doesn't matter). + Does this mean there is a problem with that application? + What could be happening is that a different application is also using + the disks at the same time, and is causing the disk heads to seek + elsewhere on the disk surface - increasing both service and response time. + +hmmm! so you can't just look at one application, one set of numbers, and +understand fully what is going on. + +But it gets worse. Disks implement "tagged queueing", where events in the +queue are reshuffled to promote "elevator seeking" of the disk heads (this +reduces head seeking). So the time for a disk event can be affected not +just by the previous event (and the previous location the heads had sought to), +but also by the surrounding events that enter the queue. + +So the good and the bad. The good news is that iosnoop makes it easy to +fetch disk event data on a live system; the bad news is that understanding +all the data is not really easy. + +For further information on disk measurements see, + + "How do disks really work?" - Adrian Cockcroft, SunWorld Online, June 1996 + "Sun Performance and Tuning" - Adrian Cockcroft, Richard Pettit + "Solaris Internals" - Richard McDougall, Jim Mauro + + + +* The output appears shuffled? + +Read the answer to this in ALLsnoop_notes.txt. + diff --git a/cddl/contrib/dtracetoolkit/Notes/iotop_notes.txt b/cddl/contrib/dtracetoolkit/Notes/iotop_notes.txt new file mode 100644 index 000000000000..9663ec7b5498 --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/iotop_notes.txt @@ -0,0 +1,48 @@ +************************************************************************** +* The following are additional notes on the iotop program. +* +* $Id: iotop_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* When using -P, how can a process exceed 100% I/O? + +These percentages are based on disk time. They are in terms of a single disk. + +200% could mean 2 disks @ 100%, or 4 @ 50%, or some such combination. + +I could have capped it at 100% by dividing by disk count. I didn't. Disk +utilisation is an asymmetric resource (unlike CPUs, which are (mostly) +symmetric), so it's unfair to divide by all the disks' capacity as an +application cannot use every disk's capacity (eg, writing to a /opt disk only). + +Would it be wise to report utilisation as 10% of overall capacity, if it +could mean that 1 disk was SATURATED out of ten? A value of 10% could +understate the problem. + +Instead I add the utilisations and don't divide. 1 disk saturated out of 10 +would be reported as 100% utilisation. This has the danger of overstating +the problem (consider all ten disks at 10% utilisation, this would also be +reported as 100%). + +Nothing is perfect when you are summarising to a single value! + + + +* Beware of overcounting metadevices, such as SVM and Veritas. + +The current version of iotop reports on anything the kernel believes to be +a block disk device. A problem happens when a metadevice contains physical +disk devices, and iotop reports on activity to both the metadevice and +the physical devices, which overcounts activity. + +Consider a metadevice that contains two physical disks which are both +running at 100% utilised.
iotop -P may report 300% utilisation, which is +200% for the disks + 100% for the metadevice. We'd probably want to see +a value of 200%, not 300%. Eliminating the counting of metadevices in DTrace +isn't easy (without inelegant "hardwiring" of device types), however I do +intend to find a way to fix this in future versions. + + diff --git a/cddl/contrib/dtracetoolkit/Notes/procsystime_notes.txt b/cddl/contrib/dtracetoolkit/Notes/procsystime_notes.txt new file mode 100644 index 000000000000..5e1c52f4e52c --- /dev/null +++ b/cddl/contrib/dtracetoolkit/Notes/procsystime_notes.txt @@ -0,0 +1,14 @@ +************************************************************************** +* The following are extra notes on the procsystime command. +* +* $Id: procsystime_notes.txt 44 2007-09-17 07:47:20Z brendan $ +* +* COPYRIGHT: Copyright (c) 2007 Brendan Gregg. +************************************************************************** + + +* Can I trust the elapsed and on-cpu times? + +See the documentation for this point in the dtruss_notes.txt file. + + |
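For reference, the following is a minimal D sketch of summing per-syscall elapsed and on-CPU times for one process. It only illustrates the general technique (it is not the procsystime implementation), and the probe-effect caveats from dtruss_notes.txt apply to both sums:

    /* run with -p PID or -c command to set $target */
    syscall:::entry
    /pid == $target/
    {
            self->start = timestamp;        /* wall clock */
            self->vstart = vtimestamp;      /* on-CPU clock */
    }

    syscall:::return
    /self->start/
    {
            @elapsed[probefunc] = sum(timestamp - self->start);
            @oncpu[probefunc] = sum(vtimestamp - self->vstart);
            self->start = 0;
            self->vstart = 0;
    }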