[Pdx-pm] Library Title sort ordering

Andy Lester andy at petdance.com
Wed Jan 19 21:10:20 PST 2005


> Public libraries sort titles with various stop words ignored.

All libraries do.

> Das Boot
> The Castle
> Return of the King, The
>
> I would like to find a string comparison module that incorporates
> all the funny rules that librarians use to do this ordering.

We don't do it with rules.  We use non-filing character counts, entered 
by humans.

In a MARC record, fields that can have non-filing characters will have 
an indicator that tells how many there are.  For example, from 
http://search.cpan.org/dist/MARC-Record/lib/MARC/Doc/Tutorial.pod,

   LDR 00903pam  2200265 a 4500
   100 1  _aLogan, Robert K.
          _d1939-
   245 14 _aThe alphabet effect /
          _cRobert K. Logan.

The "1" and "4" after the 245 tag are the indicators, and the "4" means 
"skip the first 4 characters when filing this."  In this case, it means 
to ignore "The " and just file as "alphabet effect".
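
A minimal sketch of how that count gets used when sorting, assuming the 
titles and their non-filing counts have already been pulled out of the 
records into plain Perl pairs.  The filing_title helper and the sample 
list are made up for illustration; they are not part of MARC::Record.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first $nfc characters to get the
# form of the title that is used for filing.
sub filing_title {
    my ($title, $nfc) = @_;
    return substr($title, $nfc);
}

# Illustrative [title, non-filing count] pairs.  The counts are
# entered by a human cataloger, not computed.
my @records = (
    [ 'LA Confidential',     0 ],
    [ 'The Castle',          4 ],
    [ 'Das Boot',            4 ],
    [ 'The alphabet effect', 4 ],
);

# Sort on the filing form, case-insensitively, but keep the full title.
my @sorted = map  { $_->[0] }
             sort { lc(filing_title(@$a)) cmp lc(filing_title(@$b)) }
             @records;

print "$_\n" for @sorted;
# The alphabet effect
# Das Boot
# The Castle
# LA Confidential
```

The sort never has to know what an article is; it only trusts the count 
that came with each record.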

The lack of rules is because there's just no reliable way for the 
computer to figure it out.  Take the Spanish article "La".  If you have 
a book called "La Luna", you have a non-filing character count (NFC) of 
3, but what if it's the movie "LA Confidential"?  Or say you have the 
book "A Bell For Adano", and clearly it should be an NFC of 2 because of 
the obvious "A ", but what about the book "A B C Play With Me"?

Yes, there are times we'll take shortcuts when mangling data and do a 
simple

   $titlesort = $title;
   $titlesort =~ s/^(A|An|The) //;

but that's not really accurate, though it can be close enough for 
prototyping.
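
To see where that shortcut goes wrong, here is a small sketch that runs 
it over a few of the titles from this thread.  The guess_titlesort name 
is hypothetical, just a wrapper around the regex so it can be applied to 
a list.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the prototyping shortcut: strip one
# leading English article and return the result as the sort key.
sub guess_titlesort {
    my ($title) = @_;
    (my $titlesort = $title) =~ s/^(?:A|An|The) //;
    return $titlesort;
}

my @titles = (
    'The Castle',
    'A Bell For Adano',
    'A B C Play With Me',   # "A" is a letter here, not an article
    'Das Boot',             # German article, not in the pattern
);

printf "%-20s => %s\n", $_, guess_titlesort($_) for @titles;
# The Castle           => Castle
# A Bell For Adano     => Bell For Adano
# A B C Play With Me   => B C Play With Me
# Das Boot             => Das Boot
```

The first two come out right, the third loses a letter that belongs in 
the title, and the fourth is missed entirely.  Adding more articles to 
the pattern fixes some misses and creates new false matches.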

Library software guy for 14+ years now,
xoxo,
Andy

-- 
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance


