0:00:00.000,0:00:02.740
My name is Attilio Rao and
0:00:02.740,0:00:05.960
I think that we are in time for the presentation
0:00:05.960,0:00:10.870
I want to apologize for my English, because it's not really British English, but I will
0:00:10.870,0:00:12.480
try to make this
0:00:12.480,0:00:16.359
a little bit more comfortable
0:00:16.359,0:00:21.300
Better?
0:00:21.300,0:00:24.609
OK. Thank you. So we are going to speak about
0:00:24.609,0:00:28.639
the locking infrastructure in the FreeBSD kernel
which
0:00:28.639,0:00:33.440
is quite an interesting topic because
0:00:33.440,0:00:38.890
it's going to be, over time, very widely discussed on our mailing lists, not only
0:00:38.890,0:00:43.100
from the developers' perspective but also from the users' perspective,
0:00:43.100,0:00:49.470
and we will see why later
0:00:49.470,0:00:52.990
In this presentation we will specifically see what
0:00:52.990,0:00:55.100
was the situation
0:00:55.100,0:00:57.010
of the first
0:00:57.010,0:00:59.150
FreeBSD implementations
0:00:59.150,0:01:01.120
and what changed from that
0:01:01.120,0:01:06.690
specifically in what's called the SMPng era,
0:01:06.690,0:01:07.639
and what
0:01:07.639,0:01:10.500
we had prior to that.
0:01:10.500,0:01:12.780
we are going to discuss
0:01:12.780,0:01:13.579
specifically
0:01:13.579,0:01:19.160
locking primitives that have been introduced over time until now
0:01:19.160,0:01:20.910
and
0:01:20.910,0:01:24.730
problems linked to
0:01:24.730,0:01:27.620
parallelism in general and how we solve them in
0:01:27.620,0:01:30.950
the FreeBSD kernel
0:01:30.950,0:01:36.200
You can see a slightly more detailed table of contents,
0:01:36.200,0:01:39.850
listing precisely what we will cover,
0:01:39.850,0:01:43.210
including some problems like
0:01:43.210,0:01:46.159
Priority Inheritance
0:01:46.159,0:01:53.159
and Adaptive Spinning that we are going to discuss fruitfully.
0:01:53.370,0:01:58.890
Up to about FreeBSD 4.x
0:01:58.890,0:02:00.830
we had already moved to multitasking,
0:02:00.830,0:02:05.210
so the slide is a little bit confusing, but
it was a multitasking and preemptive system
0:02:05.210,0:02:06.360
since
0:02:06.360,0:02:10.379
that transition was not very
0:02:10.379,0:02:14.180
was not very difficult to implement in such systems
because
0:02:14.180,0:02:17.479
if you look at a uniprocessor machine
0:02:17.479,0:02:18.929
you can see that
0:02:18.929,0:02:20.019
well
0:02:20.019,0:02:24.029
the sequential execution was
just
0:02:24.029,0:02:25.699
stopped by
0:02:25.699,0:02:26.309
preemption
0:02:26.309,0:02:29.400
and by the arrival of interrupts,
0:02:29.400,0:02:33.969
so you had to ensure the consistency of data structures
0:02:33.969,0:02:36.289
against these two issues.
0:02:36.289,0:02:37.079
more precisely
0:02:37.079,0:02:39.079
we were handling
0:02:39.079,0:02:41.779
the interrupts and transitions through
0:02:41.779,0:02:43.370
a mechanism
0:02:43.370,0:02:45.779
called SPL
0:02:45.779,0:02:50.769
and for kernel threads, threads running in the kernel, we were disabling
0:02:50.769,0:02:51.379
preemption
0:02:51.379,0:02:53.019
in order to avoid
0:02:53.019,0:02:55.809
the corruption of the data structure
0:02:55.809,0:02:57.519
This approach, while it was
0:02:57.519,0:03:00.629
pretty good on uniprocessor machines
0:03:00.629,0:03:02.269
was actually
0:03:02.269,0:03:04.270
impractical for
0:03:04.270,0:03:06.219
the SMP environments
0:03:06.219,0:03:10.199
more precisely because we had more cores, each of which
0:03:10.199,0:03:12.959
was running a thread at a time,
0:03:12.959,0:03:13.909
and so
0:03:13.909,0:03:14.980
parallel
0:03:14.980,0:03:19.309
accesses to the data structures were possible
0:03:19.309,0:03:21.290
in order to
0:03:21.290,0:03:22.469
to avoid
0:03:22.469,0:03:24.149
big problems in the kernel
0:03:24.149,0:03:25.799
we had to just
0:03:25.799,0:03:26.739
allow
0:03:26.739,0:03:28.989
the entering of
0:03:28.989,0:03:32.309
one thread at a time into the kernel.
0:03:32.309,0:03:35.379
while that was a pretty good approach
0:03:35.379,0:03:39.049
for workloads that were mostly user space,
0:03:39.049,0:03:40.969
for workloads
0:03:40.969,0:03:45.619
requiring a lot of I/O, for example, it was wasteful because they weren't
0:03:45.619,0:03:47.839
getting any advantage from the new
0:03:47.839,0:03:49.819
SMP architecture
0:03:49.819,0:03:52.749
like the parallelism was basically zero
0:03:52.749,0:03:55.189
at least in the kernel
0:03:55.189,0:03:55.949
in order
0:03:55.949,0:04:00.650
to fix that a new project was created
called SMP
0:04:00.650,0:04:01.470
New generation
0:04:01.470,0:04:05.169
or NG
0:04:05.169,0:04:07.309
as you can see from the slide,
0:04:07.309,0:04:10.329
entry into the kernel was restricted
0:04:10.329,0:04:12.569
by using a big lock,
0:04:12.569,0:04:19.569
basically called the BKL.
0:04:23.199,0:04:28.109
With FreeBSD 5.x we had the SMP new generation project
0:04:28.109,0:04:30.110
basically it was
0:04:30.110,0:04:31.509
a sanitization of the
0:04:31.509,0:04:34.539
of all
0:04:34.539,0:04:40.039
our kernel and a re-engineering of a lot of mechanisms inside it. We could see
0:04:40.039,0:04:44.709
FreeBSD 4.x and FreeBSD 5.x as essentially two different kernels
0:04:44.709,0:04:45.550
because
0:04:45.550,0:04:50.150
substantial subsystems were rewritten, and
0:04:50.150,0:04:51.830
were written with the
0:04:51.830,0:04:56.949
idea of using and implementing real parallelism in mind.
0:04:56.949,0:05:02.610
we can say that basically it was a major task, a very big task,
0:05:02.610,0:05:04.029
and that it required
0:05:04.029,0:05:06.669
many years to be brought
0:05:06.669,0:05:08.900
into good shape, at least.
0:05:08.900,0:05:11.379
Initially, people made
0:05:11.379,0:05:13.069
a lot of
0:05:13.069,0:05:16.350
complaints about the
0:05:16.350,0:05:20.430
lack of robustness of FreeBSD 5.x, but
0:05:20.430,0:05:22.249
probably that's because they couldn't even
0:05:22.249,0:05:28.929
see that the changes were really really important and really huge
0:05:28.929,0:05:34.490
however, FreeBSD 5.x based this initial SMP system
0:05:34.490,0:05:37.070
on code inherited from BSD/OS,
0:05:37.070,0:05:39.309
which kindly
0:05:39.309,0:05:42.699
released this code, and on top of that
0:05:42.699,0:05:44.009
the
0:05:44.009,0:05:46.579
process was broken up into
0:05:46.579,0:05:51.069
some precise tasks, at least initially.
0:05:51.069,0:05:55.429
Mainly, the first thing was introducing into the kernel
0:05:55.429,0:06:00.180
a new set of atomic instructions and locking primitives.
0:06:00.180,0:06:01.520
Then introducing
0:06:01.520,0:06:05.380
an abstraction called interrupt threads, which we are going to discuss
0:06:05.380,0:06:06.929
later, but
0:06:06.929,0:06:12.319
it basically reworked completely the interrupt mechanism that was in FreeBSD 4.x.
0:06:14.439,0:06:16.490
The BKL
0:06:16.490,0:06:19.210
lock was turned into a real
0:06:19.210,0:06:20.679
mutex called Giant,
0:06:20.679,0:06:23.180
that still exists in our kernel
0:06:23.180,0:06:26.660
and some threading primitives were introduced,
0:06:26.660,0:06:28.019
the
0:06:28.019,0:06:30.499
M-on-N
0:06:30.499,0:06:32.280
threading primitives
0:06:32.280,0:06:34.319
also called KSE,
0:06:34.319,0:06:37.009
which are actually never used in our kernel
0:06:37.009,0:06:41.620
and which have been axed out in the past year,
0:06:41.620,0:06:43.409
and
0:06:43.409,0:06:45.259
slowly the porting of
0:06:45.259,0:06:50.459
all the older subsystems to finer-grained locking was started.
0:06:50.459,0:06:55.919
I have to say this task is still not completed, it's still going on, but
0:06:55.919,0:06:58.889
we are in really good shape about that:
0:06:58.889,0:07:02.429
just a few subsystems remain that are still Giant-protected,
0:07:02.429,0:07:05.939
and with the new release that we're going to ship this year, I think that we made
0:07:05.939,0:07:10.220
a huge step forward in this direction.
0:07:12.319,0:07:18.599
Really, SMPng has been considered closed since around the end of
0:07:18.599,0:07:20.600
2007
0:07:20.600,0:07:22.579
but the
0:07:22.579,0:07:23.819
the
0:07:23.819,0:07:27.539
important parts were these initial moves.
0:07:27.539,0:07:32.669
Another thing that's not listed here, but that I can tell you, is that
0:07:32.669,0:07:38.279
even if Giant was initially preventing any parallelism,
0:07:38.279,0:07:43.219
a new kernel memory allocator was imported
0:07:43.219,0:07:45.009
that I discovered
0:07:45.009,0:07:48.439
and the scheduler was moved under a separate lock
0:07:48.439,0:07:50.449
in order to
0:07:50.449,0:07:52.080
start getting some
0:07:52.080,0:07:54.699
a little bit of concurrency
0:07:54.699,0:07:59.099
a real concurrency
0:07:59.099,0:08:01.520
the
0:08:01.520,0:08:06.280
Before speaking about FreeBSD specifics we can start digging into
0:08:06.280,0:08:08.219
what kind of
0:08:08.219,0:08:12.729
locking primitives you can find in our kernel.
0:08:12.729,0:08:15.780
from a more historical point of view
0:08:15.780,0:08:19.710
we have some versions of mutex which
0:08:19.710,0:08:20.919
I assume
0:08:20.919,0:08:24.809
people here know about that, but I'm going to give a little explanation
0:08:24.809,0:08:26.449
for people who don't:
0:08:26.449,0:08:28.939
a mutex is basically
0:08:28.939,0:08:30.739
a lock allowing access
0:08:30.739,0:08:36.700
to some protected data by just one thread at a time,
0:08:36.700,0:08:38.150
so if a thread
0:08:38.150,0:08:39.690
owns the lock,
0:08:39.690,0:08:40.760
owns the mutex
0:08:40.760,0:08:42.539
other threads
0:08:42.539,0:08:44.039
won't be able to
0:08:44.039,0:08:46.090
access the data until
0:08:46.090,0:08:48.730
this lock is released
0:08:48.730,0:08:50.430
we also offer
0:08:50.430,0:08:54.890
a kind of lock called R/W lock, read/write lock,
0:08:54.890,0:08:57.920
which are basically
0:08:57.920,0:09:03.050
locks that can be acquired in two different modes:
0:09:03.050,0:09:04.060
one mode
0:09:04.060,0:09:07.980
is the write lock, which is the same as the mutex: just one thread
0:09:07.980,0:09:10.010
in the protected path at a time,
0:09:10.010,0:09:13.860
and the other one is the read mode, which basically
0:09:13.860,0:09:15.100
allows
0:09:15.100,0:09:18.410
all the threads willing to acquire it in read mode to
0:09:18.410,0:09:23.699
concurrently access the structure, but prevents threads from
0:09:23.699,0:09:25.390
writing into the protected path
0:09:25.390,0:09:28.890
while there are readers.
0:09:28.890,0:09:30.280
Then we also have
0:09:30.280,0:09:33.030
locks called read-mostly locks,
0:09:33.030,0:09:37.570
which are basically the same as read/write locks, but
0:09:37.570,0:09:42.500
they have some optimizations in order to make the read
0:09:42.500,0:09:44.180
path really fast
0:09:44.180,0:09:46.930
and to have like
0:09:46.930,0:09:48.180
zero overhead
0:09:48.180,0:09:51.410
zero overhead kind of lock
0:09:51.410,0:09:53.350
on the read path, while
0:09:53.350,0:09:55.590
probably the write path is even
0:09:55.590,0:09:59.210
heavier than the other one but if you think about cases that
0:09:59.210,0:10:01.710
just
0:10:01.710,0:10:02.750
where
0:10:02.750,0:10:06.980
there are a lot of reader cases and very few writer cases, you can find that a
0:10:06.980,0:10:08.220
very useful
0:10:08.220,0:10:11.070
very useful primitive
0:10:11.070,0:10:11.850
then we have
0:10:11.850,0:10:14.360
some form of Wait channels
0:10:14.360,0:10:16.030
Wait channels
0:10:16.030,0:10:17.140
basically are
0:10:17.140,0:10:21.700
generalizations of what other people call
0:10:22.470,0:10:24.240
condition variables, and
0:10:24.240,0:10:28.240
they basically let a thread sleep
0:10:28.240,0:10:30.870
under some conditions that are
0:10:30.870,0:10:35.200
previously stated with some
0:10:35.200,0:10:36.610
some variables
0:10:36.610,0:10:37.150
usually
0:10:37.150,0:10:39.500
having a wait channel means that its
0:10:39.500,0:10:45.080
races are controlled through another locking primitive, like a mutex
0:10:45.080,0:10:46.640
or R/W lock,
0:10:46.640,0:10:52.010
and so often the wait channel is associated with
0:10:52.010,0:10:53.620
its locking primitive.
0:10:53.620,0:11:00.140
Usually, if you find it necessary to use a wait channel without a primitive,
0:11:00.140,0:11:04.150
a locking primitive, you probably have bad code,
0:11:04.150,0:11:06.830
but there are some edge cases
0:11:06.830,0:11:09.660
where that seems possible.
0:11:09.660,0:11:13.550
As a last thing, FreeBSD supports a primitive counting semaphore,
0:11:13.550,0:11:15.290
even if that's considered not featured,
0:11:15.290,0:11:17.710
as we are going to see, and
0:11:17.710,0:11:23.570
its usage is pretty much discouraged
0:11:23.570,0:11:28.320
Basically, in FreeBSD you can consider locking primitives divided into three classes,
0:11:28.320,0:11:31.250
three classes of
0:11:31.250,0:11:32.450
locking,
0:11:32.450,0:11:34.090
based mainly in
0:11:34.090,0:11:35.600
particular
0:11:35.600,0:11:37.340
from an outside perspective
0:11:37.340,0:11:38.690
based on the behavior of
0:11:38.690,0:11:42.680
the contending threads with regard to the lock.
0:11:42.680,0:11:48.100
For example, in the case of a mutex you can see that
0:11:48.100,0:11:53.360
spinning and blocking mutexes do very different things with the contenders,
0:11:53.360,0:11:59.680
as we are going to see more of this later
0:11:59.680,0:12:03.410
usually in the traditional literature,
0:12:03.410,0:12:05.430
there are just two
0:12:05.430,0:12:07.280
lock classes, mainly:
0:12:07.280,0:12:08.620
you will find the
0:12:08.620,0:12:11.200
spinning lock and the blocking lock
0:12:11.200,0:12:14.370
or what they call the sleeping lock.
0:12:14.370,0:12:16.670
I think that
0:12:16.670,0:12:21.020
once we see why we have three types things will be clear, but
0:12:21.020,0:12:27.100
if you have any questions please ask. That's not a problem.
0:12:27.100,0:12:29.930
Spinning primitives, as I told you,
0:12:29.930,0:12:32.810
allow the contending thread to
0:12:32.810,0:12:36.120
check the status of the lock periodically,
0:12:36.120,0:12:37.590
and the
0:12:37.590,0:12:40.420
and they just do busy waiting around
0:12:40.420,0:12:41.890
the locking variable
0:12:41.890,0:12:46.400
As a spinning primitive FreeBSD just offers the mutex.
0:12:46.400,0:12:50.689
What are the problems linked with this kind of, with this class
0:12:50.689,0:12:53.869
of locks? Mainly it's that the CPU
0:12:53.869,0:12:58.130
remains busy without really doing anything useful.
0:12:58.130,0:12:59.740
it happens
0:12:59.740,0:13:03.620
that if several threads contend on the
0:13:03.620,0:13:04.870
same lock,
0:13:04.870,0:13:08.210
basically they share the same cache line where the lock is
0:13:08.210,0:13:10.220
where the lock is
0:13:10.220,0:13:12.400
that means that
0:13:12.400,0:13:17.470
contending on or sharing a cache line causes a lot of underlying activity
0:13:17.470,0:13:20.150
on a lot of architectures, like for example
0:13:20.150,0:13:23.660
having a lot of snoop messages between CPUs
0:13:23.660,0:13:26.450
and some bus,
0:13:26.450,0:13:28.120
some bus traffic,
0:13:28.120,0:13:31.980
which means a variety of operations.
0:13:31.980,0:13:35.740
The last thing, and the most important, you can note is that interrupts
0:13:35.740,0:13:37.120
are disabled
0:13:37.120,0:13:39.330
while spin locks are held
0:13:39.330,0:13:40.810
that was
0:13:40.810,0:13:42.979
that happens mainly because there are,
0:13:42.979,0:13:45.140
there were identified in the past some
0:13:45.140,0:13:47.970
kinds of possible deadlocks
0:13:47.970,0:13:50.180
if you were going to leave
0:13:50.180,0:13:51.710
the spin locks
0:13:51.710,0:13:55.900
the interrupts enabled while holding a spin lock. In particular
0:13:55.900,0:13:58.180
you could find that there are
0:13:58.180,0:14:02.530
some problems with the interrupt handling code in the bottom half that was
0:14:02.530,0:14:05.040
going to deadlock
0:14:05.040,0:14:10.250
It's not a very simple thing to understand, so I've left it out,
0:14:10.250,0:14:12.360
but if you want to know
0:14:12.360,0:14:15.990
we could speak later probably
0:14:17.820,0:14:21.320
Besides spinning primitives we also have blocking primitives.
0:14:21.320,0:14:22.890
blocking primitives
0:14:25.260,0:14:26.860
allow
0:14:26.860,0:14:28.440
basically the contenders to be
0:14:28.440,0:14:30.980
descheduled from the runqueue
0:14:30.980,0:14:35.790
to be put on another kind of container
0:14:35.790,0:14:38.000
put on another kind of container
0:14:38.000,0:14:40.489
and basically
0:14:40.489,0:14:41.399
context switch immediately
0:14:41.399,0:14:44.360
immediately.
0:14:44.360,0:14:49.440
Then they are put again on the runqueue of the scheduler just when the owner
0:14:49.440,0:14:51.570
is going to release the lock
0:14:51.570,0:14:53.260
and it will be the owner
0:14:53.260,0:14:56.930
the owner that is going to
0:14:56.930,0:15:00.310
do all the operations for that.
0:15:00.310,0:15:05.550
We have several primitives implemented as blocking primitives, like mutexes,
0:15:05.550,0:15:10.470
R/W locks and RM locks.
0:15:11.430,0:15:13.140
with
0:15:13.140,0:15:16.890
basically with
0:15:16.890,0:15:21.780
blocking primitives we have a lot of advantages over the spinning mutex,
0:15:21.780,0:15:24.650
like having the contenders
0:15:24.650,0:15:26.560
that
0:15:26.560,0:15:27.590
sleep
0:15:27.590,0:15:31.840
or block, which avoids keeping the CPU busy,
0:15:31.840,0:15:34.660
and mainly we can leave the
0:15:34.660,0:15:37.150
we can leave the
0:15:37.150,0:15:42.040
we can basically leave the interrupts enabled.
0:15:42.040,0:15:45.760
That happens mainly because the interrupt code is only allowed,
0:15:45.760,0:15:50.710
at least the bottom-half one is only allowed,
0:15:50.710,0:15:52.070
to use spin locks
0:15:52.070,0:15:56.049
probably if it was going to use blocking primitives
0:15:56.049,0:16:01.060
we wouldn't have been able to disable interrupts here
0:16:01.060,0:16:02.239
There are however some
0:16:02.239,0:16:04.790
big drawbacks that as you will see
0:16:04.790,0:16:07.210
we handle in FreeBSD
0:16:07.210,0:16:11.280
in order to make the blocking primitives our,
0:16:11.280,0:16:13.540
how could I tell,
0:16:13.540,0:16:16.440
the first choice in terms of locking:
0:16:16.440,0:16:19.690
there is the problem called priority inversion,
0:16:19.690,0:16:21.899
and we have
0:16:21.899,0:16:27.589
the problem that context switches are very heavy, in particular
0:16:27.589,0:16:30.209
on the machines that FreeBSD uses as reference,
0:16:30.209,0:16:33.500
like i386 and amd64,
0:16:33.500,0:16:37.940
but as you're going to see we've used two techniques in order to
0:16:37.940,0:16:40.020
cope with that.
0:16:42.020,0:16:45.830
Another thing is that, since you can't
0:16:45.830,0:16:47.920
allow
0:16:47.920,0:16:50.089
context switches while having,
0:16:50.089,0:16:52.570
while holding a spin lock,
0:16:52.570,0:16:55.249
it's obvious you can't
0:16:55.249,0:16:59.580
acquire a blocking primitive while holding a spin lock.
0:16:59.580,0:17:02.110
that's an important rule in FreeBSD
0:17:02.110,0:17:06.089
that is sometimes confused, and often it's not
0:17:06.089,0:17:07.470
observed
0:17:07.470,0:17:09.929
that leads to block refusal
0:17:12.170,0:17:16.610
Usually you will always prefer a blocking primitive to a spin lock,
0:17:16.610,0:17:22.159
except in some very particular conditions, like what
0:17:22.159,0:17:25.010
I already said about interrupts, and even
0:17:25.010,0:17:26.090
about the
0:17:28.160,0:17:30.570
some paths that are very, very short.
0:17:30.570,0:17:33.629
We should have some examples in the kernel, even if I can't
0:17:33.629,0:17:35.390
tell you one right now;
0:17:35.390,0:17:38.770
I have no idea actually
0:17:38.770,0:17:39.500
so that
0:17:39.500,0:17:43.740
we're going to see the problems linked with the blocking primitives. The first one is
0:17:43.740,0:17:45.679
called Priority Inversion
0:17:45.679,0:17:46.389
basically
0:17:46.389,0:17:49.130
it could happen that a thread A,
0:17:49.130,0:17:51.410
which has a low priority,
0:17:51.410,0:17:55.380
owns a lock, call it L for example;
0:17:55.380,0:17:58.710
then another thread, with a higher priority than this one,
0:17:58.710,0:18:00.690
blocks on this lock.
0:18:00.690,0:18:03.299
what happens is that the second thread
0:18:03.299,0:18:04.120
the thread B
0:18:04.120,0:18:05.870
for example
0:18:05.870,0:18:08.920
will need to wait for a lower priority thread
0:18:08.920,0:18:13.070
to finish its workload.
0:18:13.070,0:18:15.120
we
0:18:15.120,0:18:17.780
solve this problem actually in the
0:18:17.780,0:18:21.170
kernel using a technique called priority propagation.
0:18:21.170,0:18:22.020
Basically
0:18:22.020,0:18:24.620
what happens is that the priority of thread B
0:18:25.760,0:18:27.880
is lent to thread A
0:18:27.880,0:18:31.460
until it releases the lock.
0:18:31.460,0:18:34.760
It's directly implemented in the container,
0:18:34.760,0:18:36.180
the turnstile.
0:18:37.870,0:18:39.530
while that could be done
0:18:39.530,0:18:44.290
even in the primitive, it has been much more convenient to use the container for
0:18:44.290,0:18:45.190
that
0:18:45.190,0:18:45.990
because
0:18:45.990,0:18:52.990
it was going to offer some advantages we are going to see right now.
0:18:53.030,0:18:54.240
just note that
0:18:54.240,0:18:56.090
Read locks
0:18:56.090,0:18:57.310
cannot support
0:18:57.310,0:19:03.430
priority propagation for read locks. That happens because
0:19:03.430,0:19:07.290
the turnstile would have to keep track of all the readers,
0:19:07.290,0:19:11.100
and this would be very, very expensive
0:19:11.100,0:19:12.880
from a,
0:19:12.880,0:19:15.540
from the point of view of the overhead.
0:19:15.540,0:19:19.800
and even I think I've tried to do something in this regard and I
0:19:19.800,0:19:24.050
saw that there were some races that required
0:19:24.050,0:19:29.390
acquiring a spin lock even in the fast path, so it was
0:19:29.390,0:19:31.320
an impractical way,
0:19:31.320,0:19:32.380
I would say,
0:19:32.380,0:19:37.200
at least for what we found so far
0:19:37.200,0:19:37.630
basically
0:19:37.630,0:19:39.070
what happens
0:19:39.070,0:19:42.150
with priority propagation is that
0:19:42.150,0:19:44.830
the threads and the turnstiles
0:19:44.830,0:19:47.000
are chained together:
0:19:47.000,0:19:48.350
the thread
0:19:48.350,0:19:50.970
owns a pointer
0:19:50.970,0:19:53.710
to the turnstile it is sleeping on,
0:19:53.710,0:19:58.540
and the turnstile owns a pointer to
0:19:58.540,0:20:00.549
the owner of the lock.
0:20:00.549,0:20:04.620
what happens is that for example in this case we have
0:20:05.080,0:20:08.070
a sleeper which is going to sleep on a turnstile
0:20:08.070,0:20:08.990
for the first lock,
0:20:08.990,0:20:13.470
which has a priority of one hundred and twenty eight
0:20:14.120,0:20:15.520
the turnstile
0:20:15.520,0:20:18.370
through the pointer
0:20:18.370,0:20:20.570
ts_owner knows who its owner is,
0:20:20.570,0:20:26.150
and this owner has a priority of two hundred and fifty six
0:20:26.150,0:20:31.120
well, as you know, a higher value means lower priority, so this is
0:20:31.120,0:20:34.960
a suitable case for priority propagation,
0:20:34.960,0:20:40.820
but what happens is that this owner is actually sleeping on another turnstile
0:20:40.820,0:20:43.419
and the other owner,
0:20:43.419,0:20:48.820
of the second turnstile, has again the same priority as its sleeper,
0:20:48.820,0:20:50.750
so
0:20:50.750,0:20:55.530
just propagating priority to the first owner would be useless because the first
0:20:55.530,0:20:56.340
one
0:20:56.340,0:20:57.320
could
0:20:57.320,0:20:58.760
still
0:20:58.760,0:21:00.580
keep the chain to a
0:21:00.580,0:21:04.820
lower priority, so it is going to be propagated up to the first
0:21:04.820,0:21:07.679
actually running
0:21:07.679,0:21:09.870
owner of the chain
0:21:09.870,0:21:14.670
This is the situation after the propagation: as you can see, all the threads in the chain
0:21:14.670,0:21:16.559
have the same priority,
0:21:16.559,0:21:17.950
the highest possible,
0:21:17.950,0:21:24.480
in this case that of the last one arriving.
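The chain walk described above can be sketched as a small user-space model. This is only an illustration under assumptions, not the kernel code: the class names `Thread` and `Turnstile`, the field `blocked_on`, and the function `propagate_priority` are hypothetical; only the idea of a turnstile pointing at its owner (the `ts_owner` pointer from the slide) and the numbers 128 and 256 come from the talk. Restoring the original priorities on release is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    name: str
    priority: int                              # lower value = higher priority
    blocked_on: Optional["Turnstile"] = None   # turnstile this thread sleeps on


@dataclass
class Turnstile:
    owner: Thread                              # models the ts_owner pointer


def propagate_priority(sleeper: Thread) -> None:
    """Lend the sleeper's priority to every lock owner along the chain."""
    ts = sleeper.blocked_on
    while ts is not None:
        owner = ts.owner
        if owner.priority <= sleeper.priority:
            break                              # owner already runs at least as high
        owner.priority = sleeper.priority      # lend the priority
        ts = owner.blocked_on                  # the owner may itself be blocked


# The chain from the slide: sleeper (128) -> first owner (256) -> second owner (256)
second_owner = Thread("second owner", 256)
first_owner = Thread("first owner", 256, blocked_on=Turnstile(second_owner))
sleeper = Thread("sleeper", 128, blocked_on=Turnstile(first_owner))

propagate_priority(sleeper)
```

After the call, every thread in the modeled chain runs at priority 128, which matches the "after propagation" situation described above.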
0:21:25.750,0:21:31.720
Are there questions about that?
0:21:31.720,0:21:34.780
no?
0:21:34.780,0:21:36.760
yeah when the
0:21:36.760,0:21:39.720
when, for example, the third owner,
0:21:39.720,0:21:41.679
the second owner there
0:21:41.679,0:21:43.659
when it goes to release the lock
0:21:43.659,0:21:47.010
it basically brings back the priority to the
0:21:47.010,0:21:49.340
to the
0:21:49.340,0:21:52.490
two hundred and fifty-six for all the chains
0:21:52.490,0:21:54.650
it is responsible for,
0:21:54.650,0:22:01.179
so it just happens at the unlock operation.
0:22:01.179,0:22:04.159
and that is what we do about the Priority Inversion
0:22:04.159,0:22:09.970
In order to fix, instead, the overhead given by the
0:22:09.970,0:22:14.030
large number of context switches we use another technique called adaptive spinning.
0:22:14.030,0:22:16.030
basically
0:22:16.030,0:22:20.260
as the context switch brings a lot of overhead
0:22:22.310,0:22:26.090
we prefer not to do
0:22:26.090,0:22:27.770
a context switch at all
0:22:27.770,0:22:30.760
in case the lock owner is still running
0:22:30.760,0:22:32.190
on a runqueue,
0:22:32.190,0:22:38.340
because there is a very good chance that the owner is going to release the lock very soon.
0:22:40.440,0:22:43.990
that means that for example
0:22:43.990,0:22:46.070
we choose just to spin
0:22:46.070,0:22:49.149
waiting until the state of the
0:22:49.149,0:22:52.240
lock changes or the state of the owner
0:22:52.240,0:22:57.660
changes, like the owner going to sleep on another turnstile.
0:22:57.660,0:22:59.140
and the
0:22:59.140,0:23:03.270
basically, there have been very extensive measurements, even in
0:23:03.270,0:23:07.510
another operating system like Solaris,
0:23:07.510,0:23:12.300
where I think this approach was brought in the first time,
0:23:12.300,0:23:16.430
that were showing
0:23:16.430,0:23:23.430
a very big improvement in performance from this technique.
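The decision a contender makes under adaptive spinning can be sketched as follows. This is a toy model under assumptions, not the FreeBSD implementation: the names `OwnerState` and `contender_action` are hypothetical, and a real kernel re-evaluates the decision in a loop while the lock and owner state change.

```python
from enum import Enum


class OwnerState(Enum):
    RUNNING = "running"      # owner is currently executing
    BLOCKED = "blocked"      # owner is itself off the CPU, e.g. on a turnstile


def contender_action(lock_held: bool, owner_state: OwnerState) -> str:
    """Decide what a thread that wants the lock should do right now."""
    if not lock_held:
        return "acquire"     # lock is free, just take it
    if owner_state is OwnerState.RUNNING:
        return "spin"        # owner will likely release soon: skip the context switch
    return "block"           # owner is not running: spinning would only waste CPU
```

For example, `contender_action(True, OwnerState.RUNNING)` yields `"spin"`, while the same call with `OwnerState.BLOCKED` yields `"block"`, which is the case where the contender falls back to the turnstile.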
0:23:25.790,0:23:30.640
Apart from these two types of primitives, there are sleeping primitives.
0:23:30.640,0:23:36.120
now there is a consideration we have to make about that
0:23:36.120,0:23:38.110
basically sleeping primitives
0:23:38.110,0:23:42.320
should be in theory just the
0:23:42.320,0:23:44.340
the wait channels
0:23:44.340,0:23:49.170
wait channels should have been the only ones implemented using the
0:23:49.170,0:23:50.630
container called
0:23:50.630,0:23:52.760
sleepqueue
0:23:52.760,0:23:53.910
but
0:23:53.910,0:23:56.170
due to some legacy
0:23:56.170,0:24:01.000
the sleepqueues were actually used to implement other
0:24:01.000,0:24:03.290
kinds of locks, like the
0:24:03.290,0:24:04.219
lockmgr
0:24:04.219,0:24:08.080
and the sx locks and,
0:24:08.080,0:24:11.100
basically, the
0:24:11.100,0:24:13.679
semaphores, and condvars too.
0:24:13.679,0:24:16.010
This is
0:24:16.010,0:24:18.809
going to give some problems actually
0:24:18.809,0:24:19.350
because
0:24:20.450,0:24:24.820
as we're going to see
0:24:24.820,0:24:26.889
and as you can see on the slide too,
0:24:26.889,0:24:27.929
in FreeBSD
0:24:27.929,0:24:31.600
sleeping threads should not hold any kind of lock,
0:24:31.600,0:24:33.809
neither blocking nor spinning.
0:24:33.809,0:24:36.770
That's a simple thing to explain:
0:24:36.770,0:24:40.200
we just want to enforce very
0:24:40.200,0:24:43.490
we just want to enforce
0:24:43.490,0:24:46.060
correct semantics of locking
0:24:46.060,0:24:47.880
so imagine keeping a lock,
0:24:47.880,0:24:50.190
a blocking primitive while
0:24:50.190,0:24:50.729
sleeping
0:24:50.729,0:24:53.010
it's going to waste a lot of time
0:24:53.010,0:24:56.530
because all the contenders are going to
0:24:56.530,0:24:58.760
are going to stall on the
0:24:58.760,0:25:01.400
lock owner, which is sleeping.
0:25:01.400,0:25:03.120
Basically, in fact,
0:25:03.120,0:25:07.169
as you should know, what condition variables usually do is drop the lock
0:25:07.169,0:25:11.070
once it is passed to the primitive.
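The "drop the lock while sleeping" behavior of a condition variable can be observed with a user-space analogy. This sketch uses Python's standard `threading.Condition`, not the FreeBSD kernel primitives; the `about_to_wait` event and the `state` and `log` containers are hypothetical helpers added only to make the ordering visible.

```python
import threading

cond = threading.Condition()           # condition variable with its own lock
about_to_wait = threading.Event()      # helper: signals that the waiter holds the lock
state = {"ready": False}               # the data protected by the lock
log = []


def waiter():
    with cond:                         # take the lock protecting `state`
        about_to_wait.set()            # set while the lock is still held
        while not state["ready"]:
            cond.wait()                # releases the lock while asleep and
                                       # reacquires it before returning
        log.append("woken")


t = threading.Thread(target=waiter)
t.start()
about_to_wait.wait()                   # the waiter now holds the lock or is asleep
with cond:                             # succeeds only because wait() dropped the lock
    state["ready"] = True
    cond.notify()
t.join(timeout=5)
```

If `wait()` kept the lock while sleeping, the main thread could never enter its `with cond:` block and every contender would stall behind the sleeping owner, which is exactly the situation the rule is meant to prevent.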
0:25:11.070,0:25:12.380
in this case
0:25:14.170,0:25:18.249
basically we just don't allow that. This means that
0:25:18.249,0:25:23.160
the same condition applies even to other kinds of locks,
0:25:23.160,0:25:25.540
lockmgr and the sx lock,
0:25:25.540,0:25:26.860
so you can't hold
0:25:26.860,0:25:29.410
a mutex, for example,
0:25:29.410,0:25:33.640
a blocking mutex or an R/W lock, while trying to acquire
0:25:33.640,0:25:38.559
a lockmgr or sx lock.
0:25:38.559,0:25:41.850
this is going to create some problems because
0:25:41.850,0:25:46.830
in some paths that is unavoidable, so you have to drop the lock, for example, and try
0:25:46.830,0:25:48.190
to acquire
0:25:48.190,0:25:49.770
the other primitive
0:25:49.770,0:25:51.320
which is going to
0:25:53.400,0:25:59.110
and so can create some race problems.
0:26:00.130,0:26:04.779
As the sleepqueues were born just to serve wait channels,
0:26:04.779,0:26:09.190
they don't track the owner either, so they don't care about priority propagation and the priority inversion problem,
0:26:09.190,0:26:14.430
just because sleepqueues entirely should not have work
0:26:14.430,0:26:20.150
so, for example, lockmgr and sx have no priority propagation
0:26:20.150,0:26:22.360
system,
0:26:22.360,0:26:29.360
so their use is discouraged, mainly for this reason.
0:26:31.590,0:26:34.930
sure
0:26:36.780,0:26:39.000
it's you mean why it's not
0:26:39.000,0:26:41.790
why don't blocking primitives exist, yeah?
0:26:41.790,0:26:44.250
so imagine that for example the
0:26:44.250,0:26:45.570
you have a wait channel
0:26:45.570,0:26:47.679
a condvar, a condition variable,
0:26:47.679,0:26:50.950
or msleep,
0:26:50.950,0:26:52.090
msleep,
0:26:52.090,0:26:54.910
the primitive that allows you to sleep on
0:26:54.910,0:26:57.850
a condition variable for example
0:26:57.850,0:26:58.870
however
0:27:00.510,0:27:02.270
if you are
0:27:02.270,0:27:03.350
using the blocking,
0:27:03.350,0:27:06.930
using the turnstile, you will always go through
0:27:06.930,0:27:12.110
the mechanism of priority propagation and priority inversion handling. It's
0:27:12.110,0:27:13.760
not very
0:27:13.760,0:27:14.970
it's pretty
0:27:14.970,0:27:17.320
it's not a simple operation
0:27:17.320,0:27:20.219
it even acquires some kinds of spin locks
0:27:20.219,0:27:22.650
in order to avoid some races,
0:27:22.650,0:27:23.340
and so
0:27:23.340,0:27:24.289
it
0:27:24.289,0:27:26.590
so it has an overhead
0:27:26.590,0:27:31.770
in this case it would not be useful, it would be completely useless, to have
0:27:31.770,0:27:34.159
a mechanism like that so
0:27:34.159,0:27:37.410
in theory, if you had just used
0:27:37.410,0:27:41.320
the sleepqueue for wait channels,
0:27:41.320,0:27:42.990
you would have had a
0:27:42.990,0:27:46.640
bigger performance boost than just using the turnstile
0:27:46.640,0:27:49.330
for the same problem
0:27:49.330,0:27:51.310
in theory
0:27:51.310,0:27:54.780
but what happened is that other locks were implemented
0:27:54.780,0:27:55.839
using the sleepqueue,
0:27:55.839,0:27:58.070
which should not have happened
0:27:58.070,0:27:59.260
in principle.
0:27:59.260,0:28:02.960
really I'm not sure who introduced the sx lock
0:28:02.960,0:28:04.440
I'm actually not sure
0:28:04.440,0:28:06.280
and even the lockmgr
0:28:06.280,0:28:09.870
but
0:28:09.870,0:28:12.340
however
0:28:12.340,0:28:17.669
as you could have seen before the three containers create a heirarchy that
0:28:17.669,0:28:20.090
should not be broken like
0:28:20.090,0:28:21.639
you have spinqueues
0:28:21.639,0:28:26.900
you have spin locks you have blocking primitives and sleeping primitives and
0:28:26.900,0:28:31.470
you cannot acquire, you cannot mix them; there are precise rules, like
0:28:31.470,0:28:33.710
on the top the sleeping primitive
0:28:33.710,0:28:37.690
in the middle the blocking primitives and at the bottom the spinning primitives
0:28:38.900,0:28:44.440
the main choice will be to use blocking primitives always
0:28:44.440,0:28:48.240
because, as you can see, we handled a lot of the problems that they have
0:28:48.240,0:28:49.659
and in practice
0:28:49.659,0:28:52.229
they have proven to be very
0:28:52.229,0:28:53.799
very helpful
0:28:53.799,0:28:54.999
but sometimes
0:28:56.789,0:28:58.790
some nasty conditions can happen
0:28:58.790,0:29:02.900
for example one of the most widespread is the
0:29:02.900,0:29:06.350
using malloc with the M_WAITOK flag
0:29:06.350,0:29:11.240
in FreeBSD that means that if the allocator is pretty busy you are going to
0:29:11.240,0:29:12.680
to sleep
0:29:12.680,0:29:15.760
in order to retrieve your memory
0:29:15.760,0:29:17.890
and if you do that with a lock held
0:29:17.890,0:29:22.080
you're going to violate one of our rules and it's not
0:29:22.080,0:29:23.440
possible
0:29:23.440,0:29:25.320
another one is what we just
0:29:25.320,0:29:28.299
said before, like acquiring a sleeping lock while
0:29:28.299,0:29:32.090
holding a blocking primitive
0:29:33.390,0:29:37.530
in the next example I'm going to show you a way to
0:29:37.530,0:29:41.140
to handle, for example, the malloc case
0:29:41.140,0:29:42.520
and similar
0:29:42.520,0:29:45.000
but the that usually
0:29:46.830,0:29:47.620
usually that
0:29:47.620,0:29:49.980
are not very common cases
0:29:49.980,0:29:52.920
at least for simple parts
0:29:52.920,0:29:56.280
you should even try to avoid the
0:29:56.280,0:30:03.280
the
0:30:04.620,0:30:06.180
yes
0:30:06.180,0:30:07.050
even in the
0:30:07.050,0:30:09.120
in the
0:30:09.120,0:30:10.220
wait channel
0:30:10.220,0:30:14.530
as in FreeBSD you can differentiate between condition variables and
0:30:14.530,0:30:15.720
msleep
0:30:15.720,0:30:17.510
usually msleep was
0:30:17.510,0:30:22.210
really, msleep was introduced as the first primitive
0:30:22.210,0:30:26.190
but it has an interface very very difficult to
0:30:26.190,0:30:28.460
to make saner and to understand
0:30:28.460,0:30:30.470
at least for
0:30:30.470,0:30:31.220
for people
0:30:31.220,0:30:32.120
who are
0:30:32.120,0:30:34.960
comfortable with
0:30:34.960,0:30:39.260
with the interface of condition variables that we all saw, but they are
0:30:39.260,0:30:40.649
a newer primitive
0:30:40.649,0:30:42.660
mainly there is
0:30:42.660,0:30:44.400
so for newer code
0:30:44.400,0:30:46.960
what you should do is just to
0:30:46.960,0:30:49.000
use condition variables
0:30:49.000,0:30:50.659
and not msleep
0:30:50.659,0:30:51.630
basically
0:30:51.630,0:30:56.220
msleep should be dropped, but it has a very nice feature, which
0:30:56.220,0:31:02.669
is the possibility to specify a wakeup priority for the sleeping threads
0:31:02.669,0:31:04.740
once they are asleep
0:31:04.740,0:31:07.470
that condvars still don't have
0:31:07.470,0:31:12.430
maybe if we could port this feature to condition variables we would be able
0:31:12.430,0:31:13.659
to completely drop msleep
0:31:13.659,0:31:18.529
from the work arena
0:31:18.529,0:31:20.450
this is a
0:31:20.450,0:31:25.580
simple case that is going to show a way to
0:31:26.620,0:31:30.670
a simple way to deal with, for example, the
0:31:30.670,0:31:34.100
condition I mentioned before: malloc being willing to
0:31:34.100,0:31:35.390
to sleep
0:31:35.390,0:31:38.260
and doing that while holding a lock
0:31:38.260,0:31:45.070
as you see, we have some fake C that has some members like flags
0:31:45.070,0:31:47.659
and an object called instructful
0:31:47.659,0:31:49.940
which needs to be allocated
0:31:49.940,0:31:54.400
and they are protected by an internal lock
0:31:54.400,0:31:58.810
you imagine that for example the fake C create
0:31:58.810,0:32:02.269
holds the lock of the object and does some things
0:32:02.269,0:32:04.460
which are not important
0:32:04.460,0:32:07.650
then in the end for example it's going to
0:32:07.650,0:32:09.170
to allocate
0:32:09.170,0:32:14.110
the FC object and that should be protected in
0:32:14.110,0:32:16.470
in an atomic part
0:32:16.470,0:32:20.030
something you can do is just to set the flag
0:32:20.030,0:32:22.160
for that
0:32:22.160,0:32:22.730
saying
0:32:22.730,0:32:28.460
the allocation is going to happen; if you access this structure concurrently,
0:32:28.460,0:32:29.899
just skip the allocation
0:32:29.899,0:32:31.500
and that's what we do
0:32:31.500,0:32:32.919
we check for this flag
0:32:32.919,0:32:37.969
and if it's present it means that another thread is still
0:32:37.969,0:32:40.149
is already allocating and we just skip
0:32:40.149,0:32:46.360
so otherwise we set it and then we unlock the mutex
0:32:46.360,0:32:49.100
then we allocate the memory for the
0:32:49.100,0:32:50.610
for the object
0:32:50.610,0:32:52.450
acquire again the lock
0:32:52.450,0:32:54.860
and we simply assign it
0:32:54.860,0:33:00.200
please note that I've used temporary storage for that in order to make
0:33:00.200,0:33:01.830
some search on
0:33:01.830,0:33:03.280
like the MS
0:33:03.280,0:33:04.180
about the
0:33:04.180,0:33:05.500
the pointer
0:33:05.500,0:33:10.700
it was just a tricky note that you verify that really the structure was not
0:33:10.700,0:33:14.330
already allocated
0:33:14.330,0:33:16.600
and so that we can get some
0:33:16.600,0:33:21.870
kind of assertion about that
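
The slide being described is not part of this transcript, so the following is only a minimal userspace sketch of the pattern as it is explained above: set a flag under the lock, drop the lock, allocate, take the lock again, assert, and assign. Everything in it is an assumption made for illustration: the names `struct fakec`, `struct foo`, `FC_ALLOCATING` and `fakec_alloc_obj` are hypothetical, a pthread mutex stands in for the kernel mutex, and `calloc` stands in for a kernel `malloc` called with `M_WAITOK`. The extra check for an already installed object is also an addition, not something stated in the talk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define FC_ALLOCATING 0x1       /* hypothetical flag: an allocation is in flight */

struct foo { int value; };      /* hypothetical object that must be allocated */

struct fakec {
    pthread_mutex_t lock;       /* protects flags and obj */
    int flags;
    struct foo *obj;
};

/* Returns 1 if this call installed the object, 0 if it skipped or failed. */
int fakec_alloc_obj(struct fakec *fc)
{
    struct foo *tmp;

    pthread_mutex_lock(&fc->lock);
    if (fc->obj != NULL || (fc->flags & FC_ALLOCATING) != 0) {
        /* Already allocated, or another thread is allocating: skip. */
        pthread_mutex_unlock(&fc->lock);
        return 0;
    }
    fc->flags |= FC_ALLOCATING;
    pthread_mutex_unlock(&fc->lock);    /* do not block with the lock held */

    tmp = calloc(1, sizeof(*tmp));      /* may block; no lock is held here */

    pthread_mutex_lock(&fc->lock);
    assert(fc->obj == NULL);            /* the flag kept every other thread out */
    fc->obj = tmp;                      /* publish from the temporary storage */
    fc->flags &= ~FC_ALLOCATING;
    pthread_mutex_unlock(&fc->lock);
    return tmp != NULL;
}
```

The temporary pointer is what makes the assertion possible: the shared field is only written after the lock has been reacquired and the "not yet allocated" condition has been checked.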
0:33:22.640,0:33:26.340
one of the biggest innovations that was brought to FreeBSD
0:33:26.340,0:33:30.120
about the locking primitives
0:33:30.120,0:33:33.770
are the interrupt threads
0:33:34.640,0:33:36.850
mainly
0:33:36.850,0:33:40.820
this is pretty simple to explain maybe
0:33:40.820,0:33:44.070
As the top half remains basically the same
0:33:44.070,0:33:49.790
and was going to handle the ISR for the interrupt line for example
0:33:49.790,0:33:54.330
the bottom half changed: instead of running the interrupt
0:33:54.330,0:33:58.700
handler directly on that line, as traditionally happened,
0:33:58.700,0:34:02.140
it was just going to schedule a thread
0:34:02.140,0:34:04.980
that was going to run the
0:34:04.980,0:34:06.940
the interrupt handler in a
0:34:06.940,0:34:12.389
--- context and not the kind of --it was going to happen
0:34:12.389,0:34:15.509
traditionally in a lot of Unix systems
0:34:16.699,0:34:23.179
this has the big advantage that in using your own context you can
0:34:23.179,0:34:24.429
basically
0:34:24.990,0:34:29.889
you're not forced to use spin locks and you can do a lot of other fancy things
0:34:29.889,0:34:32.209
this necessity came up because
0:34:32.209,0:34:33.149
often
0:34:33.149,0:34:38.529
interrupt handlers need to access some
0:34:38.529,0:34:42.589
need to access some subsystem locks, and
0:34:42.589,0:34:45.799
as we were going to use blocking ---around
0:34:45.799,0:34:50.379
we had the necessity to support the
0:34:50.379,0:34:52.589
the locking of the
0:34:52.589,0:34:57.119
the possibilities of wide mutex actually
0:34:57.559,0:35:01.759
A similar thing was implemented using taskqueues
0:35:01.759,0:35:02.879
previously
0:35:02.879,0:35:04.010
and the sometimes it
0:35:04.010,0:35:05.740
I think I saw Linux too
0:35:05.740,0:35:08.439
using taskqueues maybe
0:35:08.439,0:35:10.029
but the
0:35:10.029,0:35:14.709
it was basically something similar but not exactly in this way
0:35:14.709,0:35:16.809
actually, in FreeBSD
0:35:16.809,0:35:20.559
from release 7
0:35:20.559,0:35:22.579
the interrupt threads
0:35:22.579,0:35:24.659
this model is changed a little bit
0:35:24.659,0:35:26.499
in order to include the
0:35:26.499,0:35:29.739
a new mechanism called filtering
0:35:29.739,0:35:36.249
we have interrupt filters that basically, instead of directly
0:35:36.249,0:35:39.809
directly
0:35:39.809,0:35:40.879
scheduling the thread
0:35:40.879,0:35:43.209
linked to the parked line
0:35:43.209,0:35:46.619
they just check for
0:35:46.619,0:35:50.939
they just let run some new thing in the kernel or context
0:35:50.939,0:35:52.449
that will decide if
0:35:52.449,0:35:56.709
to handle the request directly or just schedule the thread
0:35:56.709,0:35:59.739
it's like if you have the old bottom handler
0:35:59.739,0:36:04.529
and add the possibility to register a handler
0:36:04.529,0:36:08.869
still running in interrupt context and at the same time
0:36:08.869,0:36:12.009
decide whether to schedule the thread or not
0:36:12.009,0:36:14.499
so that it's no
0:36:14.499,0:36:18.579
no longer mandatory
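
The filter idea described above can be modeled in a few lines of plain C. This is only a userspace sketch under stated assumptions, not kernel code: `struct fake_dev`, `fake_filter`, `fake_interrupt` and the two result codes are hypothetical names invented here, and incrementing a counter stands in for scheduling the interrupt thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch only. */
enum filter_result { FILTER_DONE, FILTER_NEEDS_THREAD };

struct fake_dev {
    bool needs_heavy_work;      /* pretend device status bit */
    int  handled_in_filter;     /* requests served directly by the filter */
    int  threads_scheduled;     /* times the handler thread was "scheduled" */
};

/* Models code running in interrupt context: short, and it never sleeps. */
static enum filter_result fake_filter(struct fake_dev *dev)
{
    if (dev->needs_heavy_work)
        return FILTER_NEEDS_THREAD;     /* defer to the thread */
    dev->handled_in_filter++;           /* cheap case: handle it right here */
    return FILTER_DONE;
}

/* Models the dispatch step: the thread is scheduled only on request. */
void fake_interrupt(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (fake_filter(dev) == FILTER_NEEDS_THREAD)
        dev->threads_scheduled++;       /* stands in for waking the thread */
}
```

The point of the design is that the cost of a context switch to the interrupt thread is paid only when the filter decides the work cannot be finished in interrupt context.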
0:36:18.579,0:36:22.919
So I think that the first part is going to finish so if you have some questions we can
0:36:22.919,0:36:23.430
handle them
0:36:23.430,0:36:28.699
right now
0:36:28.699,0:36:35.699
this should be material for the second part actually
0:36:45.279,0:36:48.529
newbus, for example
0:36:48.529,0:36:51.259
some
0:36:51.259,0:36:55.769
some drivers that are not frequently used, I'm not sure which ones, but all
0:36:55.769,0:37:00.049
the big ones are converted to finer locking
0:37:00.049,0:37:03.109
um
0:37:03.109,0:37:07.479
actually the problem is not which parts are under Giant
0:37:07.479,0:37:08.530
but how we could
0:37:08.530,0:37:12.380
optimize the locking of some subsystems because
0:37:12.380,0:37:15.079
for example we have the virtual memory
0:37:15.079,0:37:17.910
which is not under Giant but it's
0:37:17.910,0:37:19.719
not locked
0:37:19.719,0:37:24.400
optimally and it's going to bring a lot of contention
0:37:24.400,0:37:26.230
so
0:37:26.230,0:37:30.329
it's not under Giant but it should be optimized
0:37:30.329,0:37:37.329
because the parts under Giant are very tiny. Newbus, for example,
0:37:37.599,0:37:44.599
some parts related to the VFS, in mounting,
but those are very short parts
0:37:44.979,0:37:51.979
I'm not sure about others
0:37:57.479,0:37:59.170
sorry
0:38:02.069,0:38:08.549
well usually it should be moved completely but
0:38:08.549,0:38:11.019
yes
0:38:11.019,0:38:12.539
it could
0:38:32.909,0:38:34.809
okay although
0:38:34.809,0:38:38.289
in the kernel we have a basically
0:38:38.289,0:38:39.450
um
0:38:39.450,0:38:43.019
as you should know, we already imported DTrace, for example
0:38:43.019,0:38:47.839
and I have wondered, I have submitted by developed
0:38:47.839,0:38:48.669
my country
0:38:48.669,0:38:51.479
called ---some patches that brings the
0:38:51.479,0:38:54.689
the ----- directly in our locking
0:38:54.689,0:38:55.699
in order to
0:38:55.699,0:38:58.890
allow it to be tracked with DTrace.
0:38:58.890,0:39:02.009
which is very nice but it's still not completed
0:39:02.009,0:39:03.310
we are reviewing
0:39:03.310,0:39:08.309
beyond that we have another very useful tool called lock profiling
0:39:08.309,0:39:12.039
that has been very helpful in the past in order to
0:39:12.039,0:39:14.110
find the most contended locks
0:39:14.110,0:39:17.469
and to try to move them to finer locking
0:39:17.469,0:39:20.589
so at least for the kernel we have such mechanism
0:39:20.589,0:39:22.719
I'm not sure what should
0:39:22.719,0:39:26.640
have been in user space. I'm sure we don't have something similar
0:39:26.640,0:39:28.310
but maybe other systems
0:39:28.310,0:39:29.469
have
0:39:29.469,0:39:30.749
similar tools
0:39:30.749,0:39:36.039
I don't know I just know FreeBSD so
0:39:58.479,0:39:59.220
not sure
0:39:59.220,0:39:59.919
would you repeat
0:39:59.919,0:40:03.879
some voice please. No, I can't hear
0:40:03.879,0:40:05.509
It seems to me that
0:40:05.509,0:40:08.269
you don't have to do all the work that you do with locking
0:40:08.269,0:40:11.469
well if you're not on SMP right?
0:40:11.469,0:40:13.029
well no
0:40:13.029,0:40:15.259
it's not right because the
0:40:15.259,0:40:20.210
you have to protect even against some mechanism like preemption
0:40:20.210,0:40:25.989
which is going to be tricky. It is implemented differently than in FreeBSD 4.x, so
0:40:25.989,0:40:28.909
it's going to be, with preemption it's like
0:40:28.909,0:40:30.099
from
0:40:30.099,0:40:34.479
it's like if you have a real SMP system from our technical point of view
0:40:34.479,0:40:35.809
so you have to handle
0:40:35.809,0:40:38.339
problems typical of that
0:40:38.339,0:40:43.249
really, in the kernel we have other kinds of synchronization, like atomics
0:40:43.249,0:40:45.500
I don't, I should have had
0:40:45.500,0:40:50.609
a slide about that but it disappeared so I can tell you by voice
0:40:50.609,0:40:55.170
it's, well, like we have the possibility to use atomic instructions in the
0:40:55.170,0:40:57.369
in FreeBSD kernel directly
0:40:57.369,0:40:59.249
but the
0:40:59.249,0:41:03.119
to use even memory barriers linked with them
0:41:03.119,0:41:08.869
the only pitfall is that you cannot really trust about the
0:41:08.869,0:41:10.469
cache coherency
0:41:10.469,0:41:14.339
because as long as it's Im be specific you can just
0:41:14.339,0:41:16.989
you can just trust
0:41:16.989,0:41:21.879
what happens on your CPU, where you use the atomic and where you use the memory barrier
0:41:21.879,0:41:26.349
you cannot make assumptions about what happens, about whether other CPUs
0:41:26.349,0:41:29.289
can see your modifications or not
0:41:29.289,0:41:31.640
and if the cache can handle that
0:41:31.640,0:41:37.119
we have specific primitives in order to, for example, disable preemption
0:41:37.119,0:41:39.379
which are the critical sections
0:41:39.379,0:41:42.179
critical_enter and critical_exit
0:41:42.179,0:41:45.309
when you call them,
0:41:45.309,0:41:48.219
preemption is simply not allowed
0:41:48.219,0:41:54.749
that's a very fast primitive so there is not much overhead
0:41:54.749,0:41:56.049
so there's not much overhead
0:41:56.049,0:42:00.679
we also have a way to disable interrupts, which is unofficial. I will tell
0:42:00.679,0:42:03.079
that
0:42:03.079,0:42:07.720
because you can do that in a machine-dependent way
0:42:07.720,0:42:10.619
with spinlock_enter and spinlock_exit
0:42:10.619,0:42:14.989
and then
0:42:14.989,0:42:16.049
yeah that you can
0:42:16.049,0:42:17.389
even disable
0:42:17.389,0:42:19.479
some thread migration
0:42:19.479,0:42:22.940
using the sched_pin primitives
0:42:22.940,0:42:25.319
that are very useful
0:42:25.319,0:42:29.779
when you are going to access, for example, per-CPU data
0:42:29.779,0:42:33.270
and you have several chases and you don't want the CPU to migrate
0:42:33.270,0:42:34.200
from that
0:42:34.200,0:42:36.619
thread migrate from that CPU
0:42:36.619,0:42:38.729
because you could read different
0:42:38.729,0:42:45.369
values from different CPU then
0:42:45.369,0:42:46.479
I'm not sure
0:42:46.479,0:42:52.079
if there is something else okay
0:42:52.079,0:42:57.229
questions? no?
0:42:57.229,0:42:58.189
so I'll see you later"