User:Swpb/sandbox
Ideal division of a disambiguation page
The purpose of a disambiguation (dab) page is to let readers find their target article with as little reading as possible. How many sections, then, should a dab page have, and how long should those sections be?
Suppose we have a dab page with a total of t entries, which we can divide into n sections. Section headers average a words in length, and entries average b words in length. We want to find n that results in the fewest words having to be read, on average.
Questionable assumptions
- The disambiguation page will be divided into equal-sized sections, with no sub-sections.
- Readers will first read section headers until they find the one they want, then read entries in that section until they find the one they want.
- Each entry is equally likely to be the one the reader is looking for. The position of the desired section, and of the desired entry within that section, are random.
- Section names and entries are clear and unambiguous. Once a reader reads a section name or entry, they know with 100% certainty whether it is what they want or not.
How questionable are these assumptions?
- Equal-sized sections: this is not a very realistic assumption, but it serves as a workable average, and the effect of differently sized sections on n is not large.
- Headers first, then entries: this is a good assumption.
- Equally likely entries in random positions: this is a good assumption.
- Clear, unambiguous names: the strength of this assumption depends on how well subject areas are selected and how well headers and entries are written, but the reader's certainty should be near 100%.
Solve
Given n sections, the average reader will have to read (n+1)/2 headers to find the one they want. They will then have to read ((t/n)+1)/2 entries to find the one they want. Thus, the average number of words that must be read is w = a*((n+1)/2) + b*(((t/n)+1)/2). To find the value of n that minimizes w, we take the derivative of w with respect to n and see where it equals 0.
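As a sanity check on the formula for w, the sketch below compares it against a direct enumeration of every possible target entry under the stated assumptions. The function names are illustrative, and the enumeration assumes n divides t evenly so that every section holds exactly t/n entries.

```python
from fractions import Fraction


def expected_words(n, t, a, b):
    """Closed-form average: a*(n+1)/2 + b*((t/n)+1)/2, kept as an exact fraction."""
    return Fraction(a) * Fraction(n + 1, 2) + Fraction(b) * (Fraction(t, n) + 1) / 2


def enumerated_words(n, t, a, b):
    """Average words read, found by listing every (section, entry) target.

    Assumes n divides t evenly, so each section has t // n entries.
    """
    per_section = t // n
    total = 0
    for section in range(1, n + 1):              # headers read to reach the section
        for entry in range(1, per_section + 1):  # entries read inside that section
            total += a * section + b * entry
    return Fraction(total, t)


# Illustrative values: 30 entries, 5 sections, 3-word headers, 10-word entries.
print(expected_words(5, 30, 3, 10), enumerated_words(5, 30, 3, 10))
```

Both functions should agree whenever n divides t, since averaging the header count over n sections gives (n+1)/2 and averaging the entry count over t/n entries gives ((t/n)+1)/2.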
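The derivative of w with respect to n is a/2 - bt/(2n^2). Setting this expression equal to zero and rearranging, we find n = sqrt(bt/a). The second derivative, bt/n^3, is positive for any positive n, so this value of n is a minimum.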
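The closed-form result can be checked numerically. The following is a minimal sketch, with illustrative function names, that compares n = sqrt(bt/a) against a brute-force search over every whole number of sections. Note that the formula for w is evaluated here even when t/n is not a whole number, which is a simplification.

```python
import math


def expected_words(n, t, a, b):
    """Average words read with n sections: a*(n+1)/2 + b*((t/n)+1)/2."""
    return a * (n + 1) / 2 + b * (t / n + 1) / 2


def optimal_sections(t, a, b):
    """Real-valued n that minimizes expected_words: sqrt(b*t/a)."""
    return math.sqrt(b * t / a)


def best_integer_sections(t, a, b):
    """Whole number of sections (1..t) with the smallest expected_words."""
    return min(range(1, t + 1), key=lambda n: expected_words(n, t, a, b))


# Illustrative values: 50 entries, 2-word headers, 8-word entries.
print(optimal_sections(50, 2, 8), best_integer_sections(50, 2, 8))
```

In practice the real-valued optimum has to be rounded, and the brute-force result should land on one of the two whole numbers on either side of it.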
Let's plug in some realistic numbers:
- Section headers average a = 3 words in length
- Entries average b = 10 words in length
Then n = sqrt(10/3)*sqrt(t) ~ 1.8*sqrt(t).
Suppose our disambiguation page has 30 entries. In that case, n = sqrt(10*30/3) = 10. If we divide the dab page into 10 sections of 3 entries each, the reader will have to read an average of w = 3*(11/2) + 10*(4/2) = 36.5 words.
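The same arithmetic can be repeated for other page sizes. The sketch below uses the a = 3 and b = 10 figures from above; the function name and the extra page sizes are illustrative. The "no sections" figure, b*(t+1)/2, is an added baseline that assumes a reader scans a flat list with no headers at all, which is not part of the model above.

```python
import math

A_WORDS = 3   # average header length, from the text
B_WORDS = 10  # average entry length, from the text


def section_plan(t, a=A_WORDS, b=B_WORDS):
    """Return (optimal n, words read at that n, words read with no sections)."""
    n = math.sqrt(b * t / a)                   # n = sqrt(bt/a)
    w = a * (n + 1) / 2 + b * (t / n + 1) / 2  # average words at that n
    flat = b * (t + 1) / 2                     # assumed baseline: one flat list
    return n, w, flat


for t in (10, 30, 100):
    n, w, flat = section_plan(t)
    print(f"t={t:>3}: n ~ {n:.1f}, w ~ {w:.1f} words (no sections: {flat:.1f})")
```

For t = 30 this gives n = 10 and w = 36.5 words, against 155 words for the flat list under the added baseline assumption.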