[Pdx-pm] Library Title sort ordering
Andy Lester
andy at petdance.com
Wed Jan 19 21:10:20 PST 2005
> Public libraries sort titles with various stop words ignored.
All libraries do.
> Das Boot
> The Castle
> Return of the King, The
>
> I would like to find a string comparison module that incorporates
> all the funny rules that librarians use to do this ordering.
We don't do it with rules. We use non-filing character counts, entered
by humans.
In a MARC record, fields that can have non-filing characters will have
an indicator that tells how many there are. For example, from
http://search.cpan.org/dist/MARC-Record/lib/MARC/Doc/Tutorial.pod,
    LDR 00903pam 2200265 a 4500
    100 1  _aLogan, Robert K.
           _d1939-
    245 14 _aThe alphabet effect /
           _cRobert K. Logan.
The "1" and "4" after the 245 tag are the indicators, and the "4" means
"skip the first 4 characters when filing this." In this case, it means to
ignore "The " and just file as "alphabet effect".
The lack of rules is because there's just no reliable way for the
computer to figure it out. Take the Spanish article "La". If you have
a book called "La Luna", you have an NFC of 3, but what if it's the
movie "LA Confidential"? Or say you have the book "A Bell For Adano",
and clearly it should be NFC of 2 because of the obvious "A ", but what
about the book "A B C Play With Me"?
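As a rough sketch of how a human-entered count gets used, assuming the
title and its non-filing character count have already been pulled out of
the record (filing_title is a made-up helper, not part of MARC::Record,
and the counts below are the ones a cataloger would plausibly enter for
the titles mentioned above):

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first $nfc characters of a title, where
# $nfc is the non-filing character count (second indicator of the 245).
sub filing_title {
    my ($title, $nfc) = @_;
    return substr($title, $nfc);
}

# Titles paired with assumed, human-entered counts.
my @records = (
    [ 'The alphabet effect', 4 ],
    [ 'La Luna',             3 ],
    [ 'LA Confidential',     0 ],
    [ 'A Bell For Adano',    2 ],
    [ 'A B C Play With Me',  0 ],
);

# Sort case-insensitively on the filing form, not the displayed title.
my @sorted = sort {
    lc(filing_title(@$a)) cmp lc(filing_title(@$b))
} @records;

print "$_->[0]\n" for @sorted;
```

The sort never has to guess whether a leading "La" or "A " is an article;
it only trusts the number stored with each record.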
Yes, there are times we'll take shortcuts when mangling data and do a
simple
$titlesort = $title;
$titlesort =~ s/^(A|An|The) //;
which isn't really accurate, but can be close enough for prototyping.
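To make the "not really accurate" part concrete, here is the same
shortcut wrapped in a function (shortcut_sort_key is just an illustrative
name) and run against a few of the titles from this thread:

```perl
use strict;
use warnings;

# The prototyping shortcut: strip a leading English article from a copy
# of the title and leave the original untouched.
sub shortcut_sort_key {
    my ($title) = @_;
    (my $key = $title) =~ s/^(?:A|An|The) //;
    return $key;
}

for my $title ('The Castle', 'A Bell For Adano', 'A B C Play With Me') {
    printf "%-20s => %s\n", $title, shortcut_sort_key($title);
}
# The Castle           => Castle
# A Bell For Adano     => Bell For Adano
# A B C Play With Me   => B C Play With Me
```

The last line is the failure case: the "A" there is a letter of the
alphabet, not an article, so the title gets filed under B.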
Library software guy for 14+ years now,
xoxo,
Andy
--
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance