SPUG: Software to expand contractions?

Jeremy G Kahn kahn at cpan.org
Sun Oct 26 19:53:10 CST 2003


WARNING: non-language geeks, avert your eyes.


Well, you could cope with "wouldn't" and "can't" by having an exception 
table with just "can't" and handling the rest by rule (where that rule 
is s/n't\b/ not/g, so that it also fires in the middle of a line).

You can handle "can't" and "won't" with special cases and handle the 
remaining "-n't", "-'d", "-'ll" and "-'ve" cases with general rules, but 
you're not going to be able to solve the general case for 's without 
some heuristics or some grammatical knowledge.
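
Here is a minimal sketch of that exception-table-plus-rules idea. The 
names (%exception, %ending, expand_word, expand_text) are made up for 
illustration, and the tables are nowhere near complete. I've left "'d" 
out because it can stand for either "would" or "had", and "'s" out for 
the reasons below. Capitalization of the exceptions is ignored.

```perl
use strict;
use warnings;

# Irregular forms that the general rules would get wrong.
my %exception = (
    "can't" => "cannot",
    "won't" => "will not",
);

# Regular endings and what they expand to.
my %ending = (
    "n't" => " not",
    "'ll" => " will",
    "'ve" => " have",
);

# Expand one word: exceptions first, then the general rules.
sub expand_word {
    my ($word) = @_;
    return $exception{ lc $word } if exists $exception{ lc $word };
    for my $end (sort keys %ending) {
        return $1 . $ending{$end} if $word =~ /^(\w+)\Q$end\E$/;
    }
    return $word;    # unknown ending (including 's): leave it alone
}

# Expand every apostrophe-containing word in a string.
sub expand_text {
    my ($text) = @_;
    $text =~ s/(\w+'\w+)/expand_word($1)/ge;
    return $text;
}

print expand_text("I can't go, she wouldn't, and they'll say we've left."), "\n";
# prints: I cannot go, she would not, and they will say we have left.
```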

For example:

Larry's been going at it. ("Larry has ...")

Larry's hat  ("Larry's")

The traditional grammarian explanation is that
"'s" represents "has" when the following word is a past participle:

  Larry's beaten the system => "has"
  Larry's gone home  => "has"
  Larry's had a bad day  => "has"

"beaten", "gone", "had" => all past participles.

This looks doable (with a dictionary). Here's the English rule, 
pseudo-emperlified and untested (also modulo concerns about 
capitalization):

  s/'s (\w+)/ is_past_part($1) ? " has $1":"'s $1"/eg;
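
To make that one-liner runnable you need some is_past_part(). The 
version here is only a stand-in: a hypothetical four-word hash where a 
real script would consult an actual dictionary of participles.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real dictionary lookup.
my %past_part = map { $_ => 1 } qw(been beaten gone had);

sub is_past_part {
    my ($word) = @_;
    return exists $past_part{ lc $word };
}

# Apply the "'s + past participle => has" rule to a string.
sub expand_has {
    my ($text) = @_;
    $text =~ s/'s (\w+)/ is_past_part($1) ? " has $1" : "'s $1" /eg;
    return $text;
}

print expand_has("Larry's gone home"), "\n";    # Larry has gone home
print expand_has("Larry's hat"), "\n";          # Larry's hat
```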

BUT.

But what are the conditions in which "'s" represents "is"?

  Larry's skiing trip was needed. => "'s"
  Larry's skiing is good => "'s"
  Larry's hat is phat. => "'s"
  Larry's jealousy => "'s"

  Larry's coding up a storm.     => "is"
  Larry's phat.  => "is"
  Larry's jealous. => "is"

There are two cases where "is" is the correct expansion:

  * when the following unit is a present progressive verb phrase
    ("coding up a storm")
  * when the following unit is a predicative adjective phrase
    ("scheduled for another release", "phat", "jealous").

Unfortunately, one kind of adjectival phrase can be formed from the past 
participle form of the verb in English, which pretty much shoots down 
any chance for regular-expression-based solutions, at least with any 
kind of elegance:

  Larry's scheduled for another conference. => "is" (not "has"!)
  Larry's scheduled another conference. => "has" (not "is"!)

It gets worse: possessive "'s" really needs a following noun phrase 
(without article), but you can make one of these using a past participle!

  Larry's scheduled conference conflicted with mine. => "'s" (not "has" or "is")
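
To see the failure concretely, here is the past-participle rule run 
over all three "scheduled" sentences. The one-entry %past_part hash and 
the name naive_expand are made up for the demonstration. The rule sees 
only the next word, so it answers "has" every time and is right only 
once.

```perl
use strict;
use warnings;

# Hypothetical one-word participle list, just enough for the examples.
my %past_part = (scheduled => 1);

# The naive rule: 's followed by a past participle becomes " has".
sub naive_expand {
    my ($text) = @_;
    $text =~ s/'s (\w+)/ $past_part{ lc $1 } ? " has $1" : "'s $1" /eg;
    return $text;
}

for my $sentence (
    "Larry's scheduled another conference.",                 # want: has
    "Larry's scheduled for another conference.",             # want: is
    "Larry's scheduled conference conflicted with mine.",    # want: 's
) {
    print naive_expand($sentence), "\n";    # all three come out "Larry has ..."
}
```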

This is the sort of thing that drove philosophers to give up on natural 
language and build lambda calculus. I make this sort of nightmare my 
problem; natural language via computers is what I study.

The "right" solution, Tim, is probably a stochastic parser that would 
decide what the most likely handling is for the constituent following 
the 's.  That's a pretty neat project -- I can point you to some links 
if you like -- but I doubt that's really what you want. It's probably 
less work to design a mini-script that shows you each context, presents 
the choices (you type "'", "h", or "i"), and rewrites each one 
accordingly.
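
Something like the following would do as a starting point. It's a 
sketch, not a finished tool: the names (rewrite_s, ask, %replacement), 
the 30-character context window, and the convention of passing the file 
name as the first argument are all my assumptions. The prompting is 
done through a callback so the rewriting logic can be tried without a 
terminal.

```perl
use strict;
use warnings;

# What each keystroke means: ' keeps the possessive, h => has, i => is.
my %replacement = ("'" => "'s", h => " has", i => " is");

# $choose is a callback: it is given a snippet of context and returns
# one of the keys above. Any other answer leaves the 's untouched.
sub rewrite_s {
    my ($text, $choose) = @_;
    $text =~ s{(\w+)'s\b(?=(.{0,30}))}{
        my ($word, $after) = ($1, $2);    # copy before the callback runs
        my $key = $choose->("${word}'s$after");
        $key = "'" unless defined $key && exists $replacement{$key};
        $word . $replacement{$key};
    }ges;
    return $text;
}

# Interactive chooser: prompt on STDERR, read the answer from STDIN.
sub ask {
    my ($context) = @_;
    print STDERR "$context\n  ['] possessive  [h] has  [i] is > ";
    my $answer = <STDIN>;
    chomp $answer if defined $answer;
    return $answer;
}

# Usage (hypothetical file name): perl expand_s.pl chapter.txt > chapter.new
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    print rewrite_s($text, \&ask);
}
```

The context is captured in a lookahead so that a second 's inside the 
window still gets its own prompt.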

Incidentally, contractions are information-losing -- it's much easier to 
go the other way with a script if you should ever need to.
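
Going the other way needs no grammar at all, because each expanded form 
maps to exactly one contraction and a lookup table is enough. A rough 
sketch, with a made-up and very partial table (lowercase only, so the 
capitalization concerns still apply):

```perl
use strict;
use warnings;

# A few expanded forms and their contractions (illustrative, not complete).
my %contract = (
    "is not"    => "isn't",
    "would not" => "wouldn't",
    "cannot"    => "can't",
    "will not"  => "won't",
    "they are"  => "they're",
    "we are"    => "we're",
);

# Longest keys first, so the alternation prefers the longest match.
my $pattern = join "|",
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %contract;

sub contract_text {
    my ($text) = @_;
    $text =~ s/\b($pattern)\b/$contract{$1}/g;
    return $text;
}

print contract_text("they are sure it is not broken"), "\n";
# prints: they're sure it isn't broken
```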

good luck,

jeremy
  who really should be working on a parser this very minute



Tim Maher wrote:

>Dudes,
>
>I'm getting complaints from some early reviewers of my book
>about all the "contractions" I'm using in my writing!
>So last night I set out to write a Perl script to make changes
>like the following:
>
>	That's => That is
>	He's => He is
>	They're => They are
>	We're => We are
>
>Then I realized that the 's ending usually needs replacement by
>"is", so I tried
>
>	"'s" => ' is'
>which doesn't quite work, because of the undesirability of changing
>
>	Larry's hat  => Larry is hat
>
>After making an exception for that case, I found other complications,
>such as
>
>	wouldn't => would not
>	can't => can not
>
>These follow different rules, because the 'wouldn' loses its 'n', but the
>'can' doesn't!
>
>So at that point I realized that this is a bigger problem than I first
>thought, and decided to look for a solution on CPAN.  Searching for
>"contractions" didn't turn up anything relevant, so now I'm wondering if
>anybody knows of a module that will do this job -- convert a standard set of English
>contractions into their expanded forms.  
>
>Can't y'all gimme the help I'm searchin' for?
> 
>-Tim
>*------------------------------------------------------------*
>| Tim Maher (206) 781-UNIX  (866) DOC-PERL  (866) DOC-UNIX   |
>| tim(AT)Consultix-Inc.Com  TeachMeUnix.Com  TeachMePerl.Com |
>*+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-*
>|  UNIX Fundamentals Class: 11/10-13   Perl Class: 12/01-05  |
>|  Watch for my Book: "Minimal Perl for Shell Programmers"   |
>*------------------------------------------------------------*
>_____________________________________________________________
>Seattle Perl Users Group Mailing List  
>POST TO: spug-list at mail.pm.org  http://spugwiki.perlocity.org
>ACCOUNT CONFIG: http://mail.pm.org/mailman/listinfo/spug-list
>MEETINGS: 3rd Tuesdays, U-District, Seattle WA
>WEB PAGE: http://www.seattleperl.org
>
>  
>




