SPUG: extracting text between <a> and </a>

Tim Maher/CONSULTIX tim at consultix-inc.com
Thu Oct 5 17:57:21 CDT 2000


On Thu, Oct 05, 2000 at 10:58:34AM -0700, Chuck Keagle wrote:
> Now I'm new with Perl (just had Dr. Tim's beginner class a couple weeks

Glad to see you're putting your education to use! 8-}

> ago), but would a pattern match do the trick?
> 
>     m|<a[\w:"/= \.]*> ([\w ]*)</a>| and $text = $1;
> 
> If I'm way off base, please don't chastise me too harshly.

Nice try, but you've fallen into the trap of underestimating the
difficulty of getting the regular expression right, and even if it were
perfect in itself, you'd still have to worry about eliminating matches
within comments, an entire problem unto itself.
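
To make that concrete, here is a small sketch that runs Chuck's pattern,
exactly as quoted above, against a few anchors. The sample strings and the
helper name link_text are made up for illustration; they are not from the
original thread.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the quoted pattern to one string and
# return the captured link text, or undef if nothing matched.
sub link_text {
    my ($html) = @_;
    return $html =~ m|<a[\w:"/= \.]*> ([\w ]*)</a>| ? $1 : undef;
}

# Illustrative inputs only (assumptions, not data from the thread).
my @samples = (
    # The one shape the pattern expects: plain URL, space after '>'
    '<a href="http://www.halcyon.com/spug/"> SPUG home</a>',
    # No space after '>': the literal space in the pattern fails
    '<a href="http://www.halcyon.com/spug/">SPUG home</a>',
    # Hyphen in the URL: '-' is not in the attribute character class
    '<a href="http://www.consultix-inc.com"> Consultix</a>',
    # Punctuation in the link text: '.' and "'" are not in [\w ]
    q{<a href="x.html"> Dr. Tim's page</a>},
    # Anchor inside a comment: matches, although it should be ignored
    '<!-- <a href="old.html"> Old page</a> -->',
);

for my $html (@samples) {
    my $text = link_text($html);
    printf "%-10s %s\n", defined $text ? 'MATCH' : 'no match', $html;
}
```

Only the first and last samples match, and the last one is exactly the
comment problem: the pattern has no way of knowing the anchor is commented
out.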

Best to use a debugged module written by a person whose Hubris will
promote greater accuracy than a hand-rolled solution.  Damian's
Text::Balanced is what I'd suggest; I showed a sample run in a
separate posting.
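
For readers without that other posting to hand, here is a minimal sketch
of one way Text::Balanced might be used for this. It is not a copy of the
sample run mentioned above. It assumes extract_tagged with an explicit
opening-tag pattern, an explicit closing tag, and a prefix pattern that
skips ahead to the first <a ...> tag; the input string is invented.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Illustrative input (an assumption, not data from the thread).
my $html = q{See <a href="http://www.consultix-inc.com">Dr. Tim's classes</a> for details.};

# In list context extract_tagged returns:
#   (extracted, remainder, prefix, open tag, inside text, close tag)
my ($extracted, $remainder, $prefix, $open, $inside, $close) =
    extract_tagged(
        $html,
        '<a\b[^>]*>',          # pattern for the opening tag
        '</a>',                # closing tag
        '(?s).*?(?=<a\b)',     # prefix: skip everything before the first <a
    );

if (defined $extracted && length $extracted) {
    print "Link text: $inside\n";
}
else {
    print "No <a>...</a> pair found\n";
}
```

Note that this sketch does nothing about anchors inside comments; that
problem still has to be dealt with separately.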

-"Dr. Tim"
*========================================================================*
| Dr. Tim Maher, CEO, Consultix       (206) 781-UNIX/8649;  ask for FAX# | 
| Email: tim at consultix-inc.com        Web: http://www.consultix-inc.com  |
|Training- TIM MAHER: Unix, Perl  DAMIAN CONWAY: Adv. Perl, OOP, Parsing |
|CLASSES: 10/9: Adv OO-Perl/Parsing   10/16: Int. Perl  10/23 Perl Prog. |
*========================================================================*



 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/




