This module exports two functions, nsort and ncmp; they are used in implementing
my idea of a "natural sorting" algorithm. Under natural sorting, numeric
substrings are compared numerically, and other word-characters are compared
lexically.
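As a rough illustration of the core idea, here is a minimal Python sketch. It is not the module's Perl implementation, and the name simple_natural_key is hypothetical. It splits a string into digit runs and non-digit runs and compares the digit runs as integers. It deliberately skips the finer points of the module's definition, such as ignoring \W substrings.

```python
import re

def simple_natural_key(s: str):
    # Group 1 captures digit runs, group 2 captures everything else.
    # Each chunk is tagged (0 = text, 1 = number) so Python never has to
    # compare a str with an int; text is case-folded for case-insensitivity.
    return [(1, int(num)) if num else (0, text.casefold())
            for num, text in re.findall(r"(\d+)|(\D+)", s)]

print(sorted(["file10", "file2", "file1"], key=simple_natural_key))
```

With a plain sorted() call, "file10" sorts before "file2"; with this key it sorts after it.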

This is the way I define natural sorting:

    * Non-numeric word-character substrings are sorted lexically,
case-insensitively: "Foo" comes between "fish" and "fowl".
    * Numeric substrings are sorted numerically: "100" comes after "20",
not before.
    * \W substrings (neither word characters nor digits) are ignored. Our use
of \w, \d, \D, and \W is locale-sensitive: Sort::Naturally
uses a "use locale" statement.
    * When comparing two strings, where a numeric substring in one string
lines up against something other than a numeric substring in the other, the
non-numeric always comes first. This is fudged by pretending that the missing
number substring has the value -1.
    * The start of a string is exceptional: leading \W (non-word,
non-digit) components are ignored, and numbers come before letters.
    * I define "numeric substring" simply as a sequence matching m/\d+/;
scientific notation, commas, decimals, etc., are not recognized. If your data has
thousands separators in numbers ("20,000 Leagues Under The Sea" or "20.000
lieues sous les mers"), consider stripping them before feeding them to nsort or
ncmp.
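The rules above can be approximated in Python. This is a sketch under stated assumptions, not Sort::Naturally's actual code: natural_key, natural_sort, and strip_thousands_separators are hypothetical names, Python's re module is Unicode-aware rather than locale-sensitive, and results may differ from nsort on edge cases. Each key alternates word slots (case-folded strings) and number slots (integers); a missing number becomes -1, and an empty word slot lets a leading number sort before letters.

```python
import re
from typing import Iterable, List, Union

# Digit runs (group 1) or runs of non-digit word characters (group 2).
# \W characters match neither group, so findall() skips them, leading ones included.
_TOKEN = re.compile(r"(\d+)|([^\W\d]+)")

# Hypothetical helper: remove "," or "." between a digit and exactly three
# more digits ("20,000" -> "20000"). It would also mangle a decimal like "1.500".
_SEPARATOR = re.compile(r"(?<=\d)[,.](?=\d{3}(?!\d))")


def strip_thousands_separators(s: str) -> str:
    return _SEPARATOR.sub("", s)


def natural_key(s: str) -> List[Union[str, int]]:
    """Alternate word slots (even indexes) and number slots (odd indexes)."""
    key: List[Union[str, int]] = []
    for number, word in _TOKEN.findall(s):
        if number:
            if len(key) % 2 == 0:
                # A word slot is due: fill it with "" so that a string starting
                # with a number sorts before one starting with letters.
                key.append("")
            key.append(int(number))
        else:
            if len(key) % 2 == 1:
                # A number slot is due: a missing number counts as -1, so the
                # non-numeric continuation sorts before any real number.
                key.append(-1)
            key.append(word.casefold())  # case-insensitive comparison
    return key


def natural_sort(items: Iterable[str]) -> List[str]:
    return sorted(items, key=natural_key)
```

For example, natural_sort(["fowl", "Foo", "fish"]) gives ["fish", "Foo", "fowl"], and "x y" sorts before "x 5" because the missing number in "x y" counts as -1. Because every key alternates str and int slots in the same positions, Python's list comparison never has to compare a string with an integer.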

WWW: http://search.cpan.org/dist/Sort-Naturally/