User:NcsuJH/sandbox
In computer science, a ticket lock is a synchronization mechanism, or locking algorithm, that is a type of spinlock which uses "tickets" to control which thread of execution is allowed to enter a critical section.
Overview
[edit]The basic concept of a ticket lock is similar to the ticket queue management system. This is the method that many bakeries and delis use to serve customers in the order that they arrive, without making them stand in a line. Generally, there is some type of dispenser that customers pull sequentially numbered tickets from upon arrival. The dispenser usually has a sign above or near it stating something like "Please take a number". There is also typically a dynamic sign (usually digital) that displays the ticket number that is now being served. Each time the next ticket number (customer) is ready to be served, the "Now Serving" sign is incremented and the number called out. This allows all of the waiting customers to know how many people are still ahead of them in the queue (line).
Like this system, a ticket lock is a first in first out (FIFO) queue based mechanism. It adds the benefit of fairness of lock acquisition and works as follows; there are two integer values which begin at 0. The first value is the queue ticket, the second is the dequeue ticket. The queue ticket is the thread's position in the queue, and the dequeue ticket is the ticket (queue position) that now has the lock (Now Serving).
When a thread arrives, it atomically obtains and then increments the queue ticket. The atomicity of this operation is required to prevent two threads from simultaneously being able to obtain the same ticket number. It then compares its ticket value (before the increment) with the dequeue ticket's value. If they are the same, the thread is permitted to enter the critical section. If they are not the same, then another thread must already be in the critical section and this thread must busy wait or yield. When a thread leaves the critical section controlled by the lock, it atomically increments the dequeue ticket. This permits the next waiting thread, the one with the next sequential ticket number, to enter the critical section.[1]
History
[edit]The ticket lock was introduced by Mellor-Crummey and Scott in 1991.[2] This algorithm was introduced into the Linux kernel in 2008 due to its advantages,[3] but was omitted in paravirtualized environments where it had disadvantages.[4] As of July 2010[update], work is in progress to enable the use of ticket locks in paravirtualization.[5][6] This type of locking scheme is employed in several systems. Currently, as of March 2015, Red Hat Enterprise Linux employs ticket locking in their system.[7]
Fairness of Lock Acquisition
[edit]The notion of fairness in lock acquisition applies to the order in which threads acquire a lock successfully.[8] If some type of fairness is implemented, it prevents a thread from being starved out of execution for a long time due to inability to acquire a lock in favor of other threads. With no fairness guarantees, the situation where a thread (or multiple threads) can take a disproportionately long time to execute as compared to others can arise. A simple example will now be presented to show how a thread could be excessively delayed due to a lack of fairness in lock acquisition.
Assume a case where three threads, each executing on one of three processors, are executing the following pseudo code that uses a lock with no consideration for fairness.
while(1) {
lock {
…
critical section
…
}
unlock
}
Now further assume the physical arrangement of the three processors, P1, P2, and P3, results in a non-uniform memory access time to the location of the shared lock variable. The order of increasing access time to the lock variable for the three processors is P1 < P2 < P3. So P1 is always the most advantaged at acquiring the lock, followed by P2, with P3 being most disadvantaged. How this situation leads to thread starvation in the absence of a fairness guarantee is shown in the following illustration of the execution of the above pseudo code by these three processors.
Time | P1 | P2 | P3 |
---|---|---|---|
1 | lock attempt (success) | lock attempt (failed) | lock attempt (failed) |
2 | critical section | spin | spin |
3 | release lock | lock attempt (success) | lock attempt (failed) |
4 | ... | critical section | spin |
5 | lock attempt (failed) | ... | ... |
6 | lock attempt (success) | release lock | lock attempt (failed) |
7 | critical section | spin | spin |
... | ... | ... | ... |
Initially, the lock is free and all three processors attempt to acquire the lock simultaneously (Time 1). Due to P1 having the fastest access time to the lock, it acquires it first and enters the critical section. P2 and P3 now spin while P1 is in the critical section (Time 2). Upon exiting the critical section (Time 3), P1 executes an unlock, releasing the lock. Since P2 has faster access to the lock than P3, it acquires the lock next and enters the critical section (Time 4). While P2 is in the critical section, P1 once again attempts to acquire the lock but can’t (Time 5), forcing it to spin wait along with P3. Once P2 finishes the critical section and issues an unlock, both P1 and P3 simultaneously attempt to acquire it once again (Time 6). But P1, with its faster access time wins again, thus entering the critical section (Time 7). This pattern of P3 being unable to obtain the lock will continue indefinitely until either P1 or P2 stops attempting to acquire it.
This illustrates the need to ensure some level of fairness in lock acquisition in certain circumstances.
Implementation
[edit]In a Non-Uniform Memory Architecture (NUMA) system it is important to have a lock implementation that guarantees some level of fairness of lock acquisition. The ticket lock is an implementation of spinlock that adds this desired attribute. The following pseudocode[1][2] shows the operations for initializing the lock, acquiring the lock, and releasing the lock. A call to ticketLock_acquire would precede the critical section of the code and ticketLock_release would follow it. Each processor will keep track of its turn via the value of each processor's my_ticket.
ticketLock_init(int *next_ticket, int *now_serving)
{
*now_serving = *next_ticket = 0;
}
ticketLock_acquire(int *next_ticket, int *now_serving)
{
my_ticket = fetch_and_inc(next_ticket);
while(*now_serving != my_ticket) {}
}
ticketLock_release(int *now_serving)
{
now_serving++;
}
Following along with the pseudocode above we can see that each time a processor tries to acquire a lock with ticketLock_acquire(), fetch_and_inc is called which returns the current value of next_ticket into my_ticket and increments next_ticket for the next processor. It is important to note that the fetch and increment is done atomically, thereby not allowing any other concurrent attempts at access. Once my_ticket has been received each processor spins until now_serving is equal to that processor's my_ticket. When now_serving is equal to a given processor's my_ticket they are allowed to return from ticketLock_acquire() and enter the critical section of code. After a critical section of code the processor performs ticketLock_release() which will increment now_serving and allow the processor that arrived at the ticket lock after the current one to exit from ticketLock_acquire() and enter the critical section.[2] Thus, fairness of lock acquisition is ensured, by enforcing a FIFO ordering.
The following table shows an example of ticket lock in action in a system with four processors (P1, P2, P3, P4) competing for access to the critical section. We can follow along the "action" column and see the outcome based on the pseudocode above. For each row, the variable values shown are those after the indicated action(s) have completed. The key points to note from the example is that the initial attempts by all four processors to acquire the lock results in only the first to arrive actually achieving the lock. All subsequent attempts, while the first still holds the lock, serves to form the queue (or line) of processors waiting their turn in the critical section. This is followed by each getting the lock in turn, allowing the next in line to acquire it as it leaves.
Action | next_ticket | now_serving | P1 my_ticket | P2 my_ticket | P3 my_ticket | P4 my_ticket |
---|---|---|---|---|---|---|
Initialized to 0 | 0 | 0 | - | - | - | - |
P1 tries to acquire lock | 1 | 0 | 0 | - | - | - |
P2 tries to acquire lock | 2 | 0 | 0 | 1 | - | - |
P3 tries to acquire lock | 3 | 0 | 0 | 1 | 2 | - |
P4 tries to acquire lock | 4 | 0 | 0 | 1 | 2 | 3 |
P1 releases lock, P2 acquires lock | 4 | 1 | 0 | 1 | 2 | 3 |
P2 releases lock, P3 acquires lock | 4 | 2 | 0 | 1 | 2 | 3 |
P3 releases lock, P4 acquires lock | 4 | 3 | 0 | 1 | 2 | 3 |
P4 releases lock | 4 | 4 | 0 | 1 | 2 | 3 |
... | 4 | 4 | 0 | 1 | 2 | 3 |
Let us look at another example that takes place in a different order, shown in the following table. This example demonstrates that processors do not have to arrive at the ticket lock at any specified time and may arrive in any order. This also demonstrates that next_ticket and now_serving may be incremented at any time. This example proves that after 3 processors have acquired and released the lock for a particular critical section a fourth processor can come, get the next ticket and proceed to the critical section when it's turn is available.
Action | next_ticket | now_serving | P1 my_ticket | P2 my_ticket | P3 my_ticket | P4 my_ticket |
---|---|---|---|---|---|---|
Initialized to 0 | 0 | 0 | - | - | - | - |
P1 tries to acquire lock | 1 | 0 | 0 | - | - | - |
P3 tries to acquire lock | 2 | 0 | 0 | - | 1 | - |
P2 tries to acquire lock | 3 | 0 | 0 | 2 | 1 | - |
P1 releases lock, P3 acquires lock | 3 | 1 | 0 | 2 | 1 | - |
P3 releases lock, P2 acquires lock | 3 | 2 | 0 | 2 | 1 | - |
P4 tries to acquire lock | 4 | 2 | 0 | 2 | 1 | 3 |
... | 4 | 2 | 0 | 2 | 1 | 3 |
P2 releases lock, P4 acquires lock | 4 | 3 | 0 | 2 | 1 | 3 |
P4 releases lock | 4 | 4 | 0 | 2 | 1 | 3 |
... | 4 | 4 | 0 | 2 | 1 | 3 |
Comparison of Locks
[edit]The Linux kernel implementation can have lower latency than the simpler test-and-set or exchange based spinlock algorithms on modern machines. Consider the table below when comparing various types of spin based locks. The more basic locking mechanisms have lower uncontended latency than the advanced locking mechanisms.[1]
Criteria | test-and-set | Test and Test-and-set | Load-link/store-conditional | Ticket | ABQL |
---|---|---|---|---|---|
Uncontended latency | Lowest | Lower | Lower | Higher | Higher |
1 Release max traffic | Ө(p) | Ө(p) | Ө(p) | Ө(p) | Ө(1) |
Wait traffic | High | - | - | - | - |
Storage | Ө(1) | Ө(1) | Ө(1) | Ө(1) | Ө(p) |
Fairness guarantee | No | No | No | Yes | Yes |
Advantages
[edit]One advantage of a ticket lock over other spinlock algorithms is that it is fair. The waiting threads are processed in a first-in first-out basis as the dequeue ticket integer increases, thus preventing starvation.
A ticket lock algorithm also prevents the thundering herd problem occurring since only one thread at a time tries to enter the critical section.
Storage is not necessarily a problem as all threads spin on one variable, unlike Array-based queueing locks (ABQL) who have threads spin on individual elements of an array.[1]
Disadvantages
[edit]One disadvantage is that there is a higher uncontended latency because of the extra instructions required to read and test the value that all threads are spinning on before a lock is released. [1]
Another major disadvantage of a ticket lock is that it is non-scalable. Research indicated that as processors are added to a Ticket Lock system, performance appears to decay exponentially. [9]
Another problem comes from releasing a lock. All threads are spinning on one variable, so when the lock is released there are Ө(p) invalidations (as well as Ө(p) acquisitions). This is because all threads must reload their block into the cache and perform a test to determine their admittance to the critical section. [1]
Related work
[edit]- Lamport's bakery algorithm also uses atomic operations (loads and stores) to ensure that the ticket numbers are unique, but requires O(n) space and time, whereas the ticket lock is O(1) and uses an atomic increment operation. It was designed for fault tolerance rather than performance. Rather than all processors continuously examining the release counter, the bakery lock spins on examining the tickets of its peers.[2]
- Queue-based spin locks guarantee fairness by maintaining a queue of waiters and by granting the lock to the first waiter during an unlock operation.[10]
See also
[edit]- fetch-and-add, another atomic instruction that can be used in place of fetch-and-incrment to implement a ticket lock
References
[edit]- ^ a b c d e f Solihin, Yan (2009). Fundamentals of parallel computer architecture : multichip and multicore systems. Solihin Pub. pp. 262–269. ISBN 978-0-9841630-0-7.
- ^ a b c d John M. Mellor-Crummey and Michael L. Scott; et al. (February 1991). "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors". ACM TOCS.
{{cite web}}
: Explicit use of et al. in:|last=
(help) - ^ Jonathan Corbet, Ticket spinlocks, Linux Weekly News, February 6, 2008. Retrieved 23. July, 2010.
- ^ Jeremy Fitzhardinge, paravirt: introduce a "lock-byte" spinlock implementation, Linux kernel, July 7, 2008
- ^ x86/spinlocks: replace pv spinlocks with pv ticketlocks
- ^ Revert "Merge branch 'xen/pvticketlock' into xen/next"
- ^ "Ticket Spinlocks".
- ^ Sottile, Matthew J.; Mattson, Timothy G.; Rasmussen, Craig E (Sep 28, 2009). Introduction to Concurrency in Programming Languages. Boca Raton, FL, USA: CRC Press. p. 56. ISBN 978-1-4200-7214-3.
{{cite book}}
:|access-date=
requires|url=
(help) - ^ Boyd-Wickizer, Silas, et al. "Non-scalable locks are dangerous." Proceedings of the Linux Symposium. 2012. http://pdos.csail.mit.edu/papers/linux:lock.pdf
- ^ Michael L. Scott and William N. Scherer III. Scalable Queue-Based Spin Locks with Timeout. Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, pp. 44-52, 2001.