diff options
-rw-r--r-- | en_US.ISO8859-1/captions/2009/asiabsdcon/rao-kernellocking-2.sbv | 305 |
1 files changed, 150 insertions, 155 deletions
diff --git a/en_US.ISO8859-1/captions/2009/asiabsdcon/rao-kernellocking-2.sbv b/en_US.ISO8859-1/captions/2009/asiabsdcon/rao-kernellocking-2.sbv index 16f08a914a..bf785730a9 100644 --- a/en_US.ISO8859-1/captions/2009/asiabsdcon/rao-kernellocking-2.sbv +++ b/en_US.ISO8859-1/captions/2009/asiabsdcon/rao-kernellocking-2.sbv @@ -1,8 +1,8 @@ 0:00:00.530,0:00:01.590 -So, basically, +So basically, 0:00:04.590,0:00:10.029 -we are going to look mainly in this second part +we are going to look, mainly in this second part, at how to 0:00:10.029,0:00:11.519 @@ -15,32 +15,32 @@ locking problems that categorize in the kernel. 0:00:17.910,0:00:24.410 -Here there are described two kinds of problems -you can get with locks, that are pretty much common +Here, there are described two kinds of problems +you can get with locks, that are pretty much common. 0:00:24.410,0:00:27.859 -The first one is called Lock Order Reversal +The first one is called Lock Order Reversal (LOR). 0:00:27.859,0:00:30.140 -when you have for example a thread A +When you have for example a thread A, 0:00:30.140,0:00:32.340 which owns 0:00:32.340,0:00:35.870 -a lock code, for example, L1 +a lock code, for example L1 0:00:35.870,0:00:37.920 and another thread B 0:00:37.920,0:00:40.070 -which owns another lock, L2 +which owns the lock, L2 0:00:40.070,0:00:43.150 -Then thread A tries to +Then thread A tries to.. 0:00:43.150,0:00:44.730 -Right, it's wrong. +Right.. it's wrong. 0:00:44.730,0:00:46.220 The slide is wrong. @@ -96,7 +96,7 @@ knows that locks should maintain 0:01:32.910,0:01:34.319 -an ordering regard of each other. +an ordering in regard of each other. 0:01:34.319,0:01:38.859 That's not very simple when @@ -110,25 +110,26 @@ that there are 3 kinds of classes of locks 0:01:44.850,0:01:49.180 is going to count because you can -never mix two different kinds of locks +never mix two different kinds of locks. 0:01:49.180,0:01:50.680 -for example +For example 0:01:50.680,0:01:51.610 -a spin lock +a spinlock 0:01:51.610,0:01:53.770 -and a mutex. +and a mutex 0:01:53.770,0:01:59.120 -You can mix in this way. +can be mixed in this way. 0:01:59.120,0:02:01.720 - +You can have the mutex first and the spinlock later, +while the opposite is not actually true. 0:02:01.720,0:02:07.060 -So, you will see that this kind +So, you will see that these kind of deadlocks are possible 0:02:07.060,0:02:09.290 @@ -141,26 +142,26 @@ like for example 2 mutex or 2 spin mutex, or such. 0:02:14.569,0:02:16.090 -Um... + 0:02:16.090,0:02:17.409 -Also +Also, 0:02:17.409,0:02:19.949 -Even if it's not very well documented, +even if it's not very well documented, 0:02:19.949,0:02:22.880 -for example, spin locks, +for example spinlocks 0:02:22.880,0:02:26.599 -in previous deep, as a way to +in FreeBSD, have a way to identify such kind of deadlocks. 0:02:26.599,0:02:27.619 -And it's pretty much implemented... +And it's pretty much implemented. 0:02:27.619,0:02:29.709 -a very much in it would + 0:02:29.709,0:02:32.449 It's a feature enabled in the code. @@ -178,7 +179,7 @@ if it exceeds an exaggerated result, 0:02:41.379,0:02:47.870 -it means that they are probable +it means that they are probably under a deadlock and the system panics. 0:02:47.870,0:02:52.489 @@ -224,7 +225,7 @@ you can end up having some threads sleeping on this wait channel 0:03:30.569,0:03:34.589 -and nobody is going to wake up them again. +and nobody is going to wake them up again. 0:03:34.589,0:03:37.629 This is usually called missed wakeup @@ -240,10 +241,10 @@ The problem is that 0:03:46.719,0:03:52.109 it's very difficult to differentiate -between missed wakeup and, +between missed wakeup and 0:03:52.109,0:03:53.480 -for example, +for example 0:03:53.480,0:03:56.189 forever sleep @@ -256,7 +257,7 @@ that is not likely to be awaken. 0:04:01.859,0:04:07.109 So these kind of deadlocks are -very difficult to be discovered +very very difficult to be discovered 0:04:07.109,0:04:11.669 and will require some bit of @@ -278,7 +279,7 @@ kernel systems and some things integrated into the debugger. 0:04:22.240,0:04:22.979 -Um, + 0:04:22.979,0:04:25.520 In FreeBSD, @@ -292,7 +293,7 @@ with kernel problems. 0:04:32.080,0:04:36.539 The first one (and the most important) -is called witness. +is called WITNESS. 0:04:36.539,0:04:39.169 It was introduced @@ -301,7 +302,7 @@ It was introduced in the context of SMPng 0:04:42.080,0:04:44.979 -and has been written in the recent past, +and has been rewritten in the recent past, 0:04:44.979,0:04:47.919 mainly by a contribution of @@ -313,7 +314,7 @@ Isilon systems. They contributed back then 0:04:52.270,0:04:54.989 -to the writing of witness. +to the writing of WITNESS. 0:04:54.989,0:04:57.389 This subsystem is very important @@ -322,7 +323,7 @@ This subsystem is very important because it tracks down exactly every order 0:05:02.730,0:05:03.949 -of the locks +of the locks. 0:05:03.949,0:05:07.810 So that, if there is an ordering violation like a LOR, @@ -356,7 +357,7 @@ Doing that, we can identify 0:05:36.539,0:05:38.419 -deadlocks possible +deadlocks, possibly 0:05:38.419,0:05:39.500 even @@ -368,7 +369,7 @@ on the reader's path. 0:05:45.529,0:05:49.609 -We could say that witness is pretty big, +We could say that WITNESS is pretty big, 0:05:49.609,0:05:52.289 so activating it @@ -381,10 +382,10 @@ It's mainly used when you are going to develop a new feature in the kernel 0:05:59.929,0:06:02.110 -and you are going to test it heavily, +and you are going to test it heavily. 0:06:02.110,0:06:05.479 -in particular if it has +In particular if it has 0:06:05.479,0:06:06.819 some @@ -393,7 +394,7 @@ some relation to locking. 0:06:10.509,0:06:13.089 -Uh, + 0:06:13.089,0:06:17.840 We could also tell that with the new code @@ -403,10 +404,10 @@ provided by Isilon and Nokia, basically 0:06:19.150,0:06:21.689 -the orad + 0:06:21.689,0:06:25.479 -offered by witness is greatly reduced to about +offered by WITNESS is greatly reduced to about 0:06:25.479,0:06:27.699 the 10th part of @@ -415,7 +416,7 @@ the 10th part of what we had before. 0:06:30.240,0:06:36.150 -Witness is very good at tracking LOR, +WITNESS is very good at tracking LOR, 0:06:36.150,0:06:37.849 but @@ -448,7 +449,7 @@ and basically 0:06:58.879,0:07:04.159 -and basically, it's in the 8th release, +it's in the 8th release, we have new features 0:07:04.159,0:07:05.759 @@ -467,18 +468,18 @@ and their orderings and 0:07:17.549,0:07:23.550 -and shows some graphs of the relations. -Even from the user space, +it shows some graphs of the relations +even from the user space. 0:07:23.550,0:07:28.550 -you don't have to go into the kernel -degubber to look at it's output. +You don't have to go into the kernel +debugger to look at it's output. 0:07:28.550,0:07:35.550 -Well + 0:07:35.620,0:07:37.380 -... + 0:07:37.380,0:07:42.250 Well, I see that sometimes when @@ -507,36 +508,36 @@ when a deadlock is in the kernel. 0:07:59.590,0:08:02.490 -... + 0:08:02.490,0:08:03.389 Usually, 0:08:03.389,0:08:07.939 if you want to find a deadlock -that's happening in the kernel +that's happening in the kernel, 0:08:07.939,0:08:10.909 -your first line of analysis start from the DDB +your first line of analysis starts from the DDB 0:08:10.909,0:08:13.919 instead of a post-mortem analysis, 0:08:13.919,0:08:16.839 -which is even more important, +which is even more important. 0:08:16.839,0:08:22.330 -but using DDB you will get more +But, using DDB you will get more processes and better information. 0:08:22.330,0:08:24.970 -Uh, + 0:08:24.970,0:08:28.499 The most important unit in order to find the deadlock 0:08:28.499,0:08:34.389 -are the LORs reported by witness in order +are the LORs reported by WITNESS in order to see if there is something strange 0:08:34.389,0:08:36.690 @@ -547,10 +548,10 @@ You want to know the state of all the threads that are running on the system that is deadlocking. 0:08:41.700,0:08:42.900 -... + 0:08:42.900,0:08:47.050 -You can see that you're deadlocking if you see that +You can see that you're deadlocking, if you see that 0:08:47.050,0:08:48.070 on the runqueue @@ -570,21 +571,21 @@ and you have all the threads sleeping in their own containers. 0:09:02.450,0:09:07.850 -You need to know which are exactly locks +You need to know which are the exact locks that are acquired 0:09:07.850,0:09:11.270 in the system 0:09:11.270,0:09:15.570 -and that's something that witness provides, +and that's something that WITNESS provides 0:09:15.570,0:09:20.720 -and the very important things is +and the very important thing is to know why the threads are stopping. 0:09:20.720,0:09:24.250 -So one on the most important things is +So one of the most important things is retrieving what the threads were doing 0:09:24.250,0:09:26.320 @@ -594,13 +595,13 @@ when they were put asleep. 0:09:28.960,0:09:30.070 -... + 0:09:30.070,0:09:33.009 The backtraces of all the threads involved 0:09:33.009,0:09:37.130 -are so printed out in order to identify deadlocks. +are printed out in order to identify deadlocks. 0:09:37.130,0:09:38.589 In the case that @@ -609,13 +610,13 @@ In the case that buffered cache and VFS are 0:09:42.830,0:09:45.910 -probably parts of the deadlocking +probably parts of the deadlocking, 0:09:45.910,0:09:50.790 you should also print out 0:09:50.790,0:09:53.420 -the info about vnodes +the information about vnodes 0:09:53.420,0:09:58.250 and what we're interested in is which vnodes are called, @@ -630,10 +631,10 @@ are actually referenced and 0:10:03.530,0:10:10.530 -which way they were called. +in which way they were called. 0:10:11.030,0:10:13.380 -So +So, 0:10:13.380,0:10:15.770 this is an example @@ -648,7 +649,7 @@ thread states in the case of a deadlock. 0:10:20.760,0:10:27.480 -This is an example of a real deadlock +This is an real example of a deadlock 0:10:27.480,0:10:28.900 but you can see @@ -663,10 +664,10 @@ this is not totally complete. But you can see that all the threads are sleeping. 0:10:38.450,0:10:39.870 -Uh... + 0:10:39.870,0:10:43.580 -And this one is the message +This one is the message 0:10:43.580,0:10:44.790 used by the wait channel @@ -678,7 +679,7 @@ on which they're sleeping on or used by 0:10:48.710,0:10:54.480 -the container like the turnsile or the sleep queue. +the container like the turnstile or the sleepqueue. 0:10:54.480,0:10:59.410 If I recall correctly, it's a forced amount @@ -688,7 +689,7 @@ that does deadlocking at some point. I'm not really sure 0:11:01.290,0:11:04.190 -because I have to take a look at it. +because I should have looked at it. 0:11:04.190,0:11:08.810 You can see that the revelant command here @@ -698,7 +699,7 @@ is -ps that DDB supports. 0:11:11.220,0:11:14.220 -Um, + 0:11:14.220,0:11:17.520 Another important thing @@ -716,43 +717,39 @@ As you can see there, usually 0:11:25.210,0:11:31.600 -is because you can add some data structures corrupted +its because you can add some data structures corrupted 0:11:31.600,0:11:34.320 in the per-CPU datas. 0:11:34.320,0:11:38.830 -That's a very common situation when you get deadlocks, +That's a very common situation where you can get deadlocks, 0:11:38.830,0:11:40.280 -because, for example, +because, for example, 0:11:40.280,0:11:43.149 leaving a corrupted LPD will lead -0:11:43.149,0:11:48.750 -I loved you too much review shellacking double -falls and things like that about that - 0:11:48.750,0:11:55.290 to you having a bigger massive breakage like -double-faults. In general. it's a good idea to -look at all the CPUs involved in the system. +double-faults and things like that. Usually it's always a +good idea to look at all the CPUs involved in the system. 0:11:55.290,0:11:57.310 The command 0:11:57.310,0:12:00.120 -is ""show allpcpu"" +is """"-show allpcpu"". 0:12:00.120,0:12:04.960 -Um, + 0:12:04.960,0:12:06.959 This one 0:12:06.959,0:12:12.009 -this one is a witness specific command -show alllocks +is a WITNESS specific command ""-show alllocks"" and it's going to show all the locks, 0:12:12.009,0:12:13.130 @@ -771,7 +768,7 @@ a mount, and the thread is this one, 0:12:21.270,0:12:23.660 -what lock is holding, +what the lock is holding, 0:12:23.660,0:12:24.970 that's the address @@ -783,7 +780,7 @@ and where it was acquired. It gives you lines and file. 0:12:31.140,0:12:32.770 -... + 0:12:32.770,0:12:34.730 Actually, @@ -792,7 +789,7 @@ Actually, that's just possible 0:12:37.620,0:12:40.859 -with witness, because otherwise, +with WITNESS, because otherwise, 0:12:40.859,0:12:44.410 trying to keep the oldest information @@ -806,13 +803,13 @@ Then, the most important thing is the backtrace for any thread. 0:12:59.730,0:13:01.150 -... + 0:13:01.150,0:13:03.390 It's going to show the backtrace 0:13:03.390,0:13:05.700 -for old threads. +for all the threads. 0:13:05.700,0:13:08.380 the seas @@ -827,7 +824,7 @@ the thread with these addresses TID and PID basically got sleeping 0:13:15.350,0:13:17.140 -on a pnode. +on a vnode. 0:13:17.140,0:13:22.020 You will see that the backend in this case is FFF @@ -839,7 +836,7 @@ and that's the context switching function, 0:13:25.729,0:13:26.900 -next + 0:13:26.900,0:13:32.220 those are the sleepqueues of the containter @@ -867,10 +864,11 @@ you will have a lot of these kinds of traces, but they are very important 0:13:53.079,0:13:59.270 -so as developers to understand what is going on. +for the developers in order to understand +what is going on. 0:13:59.270,0:14:02.590 -This ones are the locked vnodes +These ones are the locked vnodes 0:14:02.590,0:14:05.830 that are also very important when @@ -891,10 +889,10 @@ they are specific to some handling of the vnodes such as recycling, 0:14:23.850,0:14:26.020 -and completely freeing +and completely freeing. 0:14:26.020,0:14:27.290 -that's the mount point +That's the mount point 0:14:27.290,0:14:28.770 where the vnodes @@ -927,7 +925,7 @@ For example, it tells you that the lock 0:14:52.830,0:14:55.040 -the lock is in exclusive mode +is in exclusive mode 0:14:55.040,0:14:56.280 and @@ -942,13 +940,13 @@ on its queues. That's also 0:15:04.090,0:15:06.370 -the node number +the node number. 0:15:06.370,0:15:09.140 -... + 0:15:09.140,0:15:13.880 -there are also under information you could receive +There is also other information you could receive from the DDB linked to, for example, 0:15:13.880,0:15:16.980 @@ -961,18 +959,18 @@ like sleep chains, for any 0:15:19.310,0:15:24.250 -wait channel if you have the address +wait channel, if you have the address 0:15:24.250,0:15:27.150 -and, for example, +and for example, 0:15:27.150,0:15:32.650 you can also print the wall table of -the lock relations from witness +the lock relations from WITNESS 0:15:32.650,0:15:38.010 but it's mostly never useful -because you already know that. +because you should already know that. 0:15:38.010,0:15:41.100 So you will just need to know which is the one @@ -981,14 +979,12 @@ So you will just need to know which is the one that 0:15:41.980,0:15:43.019 -um, + 0:15:43.019,0:15:47.750 -can give trouble. +can give the trouble. 0:15:47.750,0:15:51.640 -if you're going to say I mean some problem -is a problem 0:15:51.640,0:15:53.980 So if you are going to submit some problems @@ -1020,8 +1016,8 @@ I think that it is a very good thing to talk about it. 0:16:25.670,0:16:31.420 -Along with the witness, we have another -mechanism that could help us, +Along with the WITNESS, we have another +important mechanism that could help us with deadlocks 0:16:31.420,0:16:34.620 and it's called KTR. @@ -1030,17 +1026,17 @@ and it's called KTR. KTR is 0:16:36.100,0:16:40.630 -basically a logger of events, +basically a logger, a kernel logger, of events. 0:16:40.630,0:16:42.550 -it's +It's 0:16:42.550,0:16:45.090 highly configurable, 0:16:45.090,0:16:48.280 as you can, for example, -handle different classes of events, +handle different classes of events. 0:16:48.280,0:16:53.940 In FreeBSD we have @@ -1071,7 +1067,7 @@ enable several classes, like 0:17:13.610,0:17:16.470 -the time class of the KTR +the ten classes of the KTR 0:17:16.470,0:17:21.000 and then you are just interested in three of them @@ -1114,7 +1110,7 @@ and not the information, for example, 0:17:48.390,0:17:50.160 -a string, +the strings, 0:17:50.160,0:17:55.000 it doesn't make copies, you need to just pass @@ -1139,10 +1135,10 @@ you can also look at it from the user space through the ktrdump interface. 0:18:13.430,0:18:15.820 -... + 0:18:15.820,0:18:17.030 -And now... + 0:18:17.030,0:18:19.669 Why is that important for locking? @@ -1163,8 +1159,8 @@ on which CPU branches, and the order it happened in. 0:18:30.020,0:18:34.580 -and this is very important when you're -going to track down for example races, +This is very important when you're +going to track down for example traces, 0:18:34.580,0:18:37.720 when you are not sure about the order of operations and @@ -1185,14 +1181,14 @@ a typical trace of KTR, where you have 0:18:52.410,0:18:56.890 -the CPU where the event happened, the index, +the CPU where the event happened, thats the index, 0:18:56.890,0:18:58.620 that's a timestamp, 0:18:58.620,0:19:03.400 I think it's retrieved directly from the TSC, -but i'm not actually sure. +but i'm actually not sure. 0:19:03.400,0:19:04.889 In this case, @@ -1202,7 +1198,7 @@ i was tracking down the scheduler class, 0:19:10.210,0:19:16.100 so I was interested mainly in scheduler -workloads and i could see +workloads and I could see 0:19:16.100,0:19:19.210 for example @@ -1238,10 +1234,10 @@ this one and other things. 0:19:46.420,0:19:48.770 -Well + 0:19:48.770,0:19:50.820 -you can enable it +You can enable 0:19:50.820,0:19:55.280 the option KTR, but you must handle it carefully. @@ -1266,7 +1262,7 @@ enough entries to have a reliable tracking. 0:20:11.500,0:20:13.580 -... + 0:20:13.580,0:20:16.780 For example, if you are going to track a lot of events, @@ -1293,23 +1289,23 @@ let you compile some classes, or mask them, 0:20:38.370,0:20:43.770 -or mask a CPU, +or even mask the CPU. 0:20:43.770,0:20:46.289 -if you have a big SMP environment, +If you have a big SMP environment, 0:20:46.289,0:20:50.160 so that you can selectively enable some of them. 0:20:50.160,0:20:54.700 For example, this is very good for -tracking down races in the sleeping queue. +tracking down traces in the sleeping queue. 0:20:54.700,0:21:01.700 You can find referrals here. 0:21:02.770,0:21:04.820 -... + 0:21:04.820,0:21:06.220 So, @@ -1332,7 +1328,7 @@ I think that actually our locking system is pretty complete, 0:21:21.750,0:21:26.919 -but it's also confusing for newcomers, +but it's also pretty confusing for newcomers, it's not widely documented. 0:21:26.919,0:21:32.280 @@ -1361,14 +1357,14 @@ who just need to do simple tasks. For example, I saw a lot of guys coming from Linux World 0:21:56.660,0:22:00.620 -who wanted to actually use spinlocks for time +who wanted to actually use spinlocks for time. 0:22:00.620,0:22:05.720 -it's obvious they are missing something from our -architecture +It's obvious they are missing something from our +architecture. 0:22:05.720,0:22:07.250 -From, uh, ... +From 0:22:07.250,0:22:11.010 just a technical point of view, @@ -1384,7 +1380,7 @@ we have lockmgr and sxlog 0:22:22.900,0:22:27.990 which are both read/write locks and -are both serverd by sleep queues. +are both servered by sleep queues. 0:22:27.990,0:22:31.800 They have some differences, obviously, @@ -1394,21 +1390,20 @@ but, mainly, 0:22:32.660,0:22:38.920 we could manage the missing bits and -just one of the 2 interfaces. +just use one of the two interfaces. 0:22:38.920,0:22:42.059 -on the scene where he hasn't told you before -visiting all + 0:22:42.059,0:22:43.920 In the same way, as i told you before, 0:22:43.920,0:22:47.340 -the sleeping point, true end sleep, +the sleeping point, true-end sleep, read/write sleep and sxsleep 0:22:47.340,0:22:50.350 -should probably be merged with cond vars +should probably be managed with cond vars 0:22:50.350,0:22:52.930 and superdoff our kernel @@ -1418,10 +1413,10 @@ and we should probably drop sema, 0:22:55.070,0:23:00.290 because it is obsolete, and can be -replaced by condova and mutex. +replaced by condvars and mutex. 0:23:00.290,0:23:02.620 -... + 0:23:02.620,0:23:03.830 From @@ -1500,7 +1495,7 @@ Instead, the other one 0:24:10.290,0:24:16.180 uses spinning on a local variable -which is not shared by the threads +which is not shared by the threads. 0:24:16.180,0:24:18.030 and the time spent @@ -1512,10 +1507,10 @@ on that local variable increases 0:24:22.780,0:24:25.440 -um... + 0:24:25.440,0:24:28.320 -... + 0:24:28.320,0:24:31.780 with the passing of time. @@ -1527,10 +1522,10 @@ Another interesting thing would be benchmarking different wake-up algorithms for blocking primitives. 0:24:37.930,0:24:42.390 -We have an algorithms that has proven to be +We have an algorithm that has proven to be 0:24:42.390,0:24:42.910 -... + 0:24:42.910,0:24:45.200 quite good @@ -1549,10 +1544,10 @@ a higher overhead but could give time improvements on a big SMP environment. 0:24:59.760,0:25:02.500 -Um, ... + 0:25:02.500,0:25:07.000 -another thing that would be very interesting +Another thing that would be very interesting to fix is the priority inversion problem 0:25:07.000,0:25:08.670 @@ -1584,7 +1579,7 @@ is often just a single atomic operation, and 0:25:30.010,0:25:33.770 -if it fails +if it fails, 0:25:33.770,0:25:36.900 it falls down and the art pattern tries to do @@ -1595,7 +1590,7 @@ in this case the owner of record technic was going to make the fastpack too simple 0:25:40.210,0:25:42.640 -Basically +Basically, 0:25:42.640,0:25:46.310 it just considers @@ -1611,7 +1606,7 @@ And it practically lands the priority to this owner of record which does it's right log. 0:26:02.210,0:26:06.900 -... + 0:26:06.900,0:26:11.900 Another important thing obviously is improving locking @@ -1647,4 +1642,4 @@ like the one we saw before with the malloc command, that needs to sleep. 0:26:44.070,0:26:44.320 -Questions? +Any questions? |