[pm-h] relocate metatags in hyperlatex-generated HTML

G. Wade Johnson gwadej at anomaly.org
Sun Feb 11 14:05:45 PST 2007


On Sat, 10 Feb 2007 08:57:51 -0600
"Russell L. Harris" <rlharris at oplink.net> wrote:

> * G. Wade Johnson <gwadej at anomaly.org> [070209 23:28]:
> > Now, let's get back to the quick and dirty solution.
> > 
> > Let's start with some assumptions, let me know if I fail one of
> > them.
> > 
> > 1. The </title> end tag is not broken across a line boundary (pretty
> > safe).
> >
> > 2. The 'keywords' metatag is all on one line.
> >
> > 3. There is nothing else on the line with the 'keywords' metatag.
> > 
> > 4. The 'description' metatag is all on one line.
> > 
> > 5. There is nothing else on the line with the 'description' metatag.
> > 
> > If any of the above is not true, we will need to get a little more
> > complicated.
> 
> The only assumptions which fail are numbers 2 and 4.  However, there
> is a work-around.

Since we can't count on each of the tags of interest being on a single
line, let's take a completely different approach. Again, this is not
recommended for more comprehensive processing; we are beginning to
reach the point where one of the HTML processing modules might be a
good idea.

That said, let's try a new approach. We'll change the main loop to:

-------------------------------
#!/usr/bin/perl -i.bak

use strict;
use warnings;

# slurp the whole file as a single string.
undef $/;

while(my $file = <>)
{
    # processing steps here.
    my $metatags = q{};

    # If you don't print this out, the new file will be empty.
    print $file;
}
-------------------------------

In this case, we will need to process the entire file as a single
string. This also means that we need to handle regular expressions
slightly differently to deal with linebreaks in the text.
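
To illustrate the linebreak point, here is a small sketch. The sample
HTML and the variable names are made up for the example. A negated
character class such as [^>]* matches newlines as well, so a tag that
wraps across lines is still captured in one piece once the file is a
single string. (The /s flag only changes what '.' matches and /m only
changes '^' and '$', so neither is needed for this particular pattern.)

```perl
use strict;
use warnings;

# Hypothetical sample: a 'keywords' metatag wrapped across two lines.
my $html = qq{<head>\n<meta name="keywords"\n  content="perl, html">\n<title>Demo</title>\n</head>\n};

# [^>]* matches any character except '>', including "\n",
# so the whole tag is captured even though it spans two lines.
my ($tag) = $html =~ m{(<meta\s+name=["']keywords["'][^>]*>)};

print defined $tag ? "matched:\n$tag\n" : "no match\n";
```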

The main tool you'll use here is the substitute operator s///. We will
use it in two ways:

1. Extract the meta tags from the file.
2. Insert a string after the title.

The first task will consist of the following code for each tag you wish
to move.

   if($file =~ s{(<meta\s+name=["']keywords["'][^>]*>)}{}sm)
   {
       $metatags .= "$1\n";
   }

This replaces the 'keywords' metatag with nothing, removing it from the
string. The parens also capture the metatag, making it available in
the variable $1.
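
Since the same step is repeated for each tag, it could be wrapped in a
loop. What follows is only a sketch, assuming the tags of interest are
'keywords' and 'description' as in the assumptions listed earlier. The
sub name extract_metatags and the sample HTML are made up for the
example.

```perl
use strict;
use warnings;

# Hypothetical helper: removes each named metatag from the string that
# $file_ref points to, and returns the removed tags, one per line.
sub extract_metatags
{
    my ($file_ref, @names) = @_;
    my $metatags = q{};

    foreach my $name (@names)
    {
        # \Q...\E quotes the name in case it contains regex metacharacters.
        if(${$file_ref} =~ s{(<meta\s+name=["']\Q$name\E["'][^>]*>)}{})
        {
            $metatags .= "$1\n";
        }
    }

    return $metatags;
}

# Made-up sample input; the 'description' tag wraps across two lines.
my $file = qq{<head>\n<meta name="description"\n content="A demo page">\n}
         . qq{<meta name="keywords" content="perl, html">\n<title>Demo</title>\n</head>\n};

my $metatags = extract_metatags(\$file, 'keywords', 'description');
print $metatags;
```

The tags come back in the order the names were passed in, not the
order they appeared in the file.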

To insert the $metatags string after the title, use something like

   $file =~ s{(</title>)}{$1\n$metatags}sm;

This will probably end up with some extra blank lines where the
metatags were removed and inserted, but cleaning it up shouldn't be too
hard.
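
For the cleanup, one possibility is to collapse runs of blank lines
once both substitutions are done. This is a sketch rather than the only
way to do it, and it assumes nothing in the page (a <pre> block, for
example) depends on consecutive blank lines. The sub name
relocate_metatags and the sample HTML are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file as one string and returns it
# with the metatags moved to just after the title.
sub relocate_metatags
{
    my ($file) = @_;
    my $metatags = q{};

    # Pull each metatag out of the string, keeping a copy.
    foreach my $name (qw(keywords description))
    {
        if($file =~ s{(<meta\s+name=["']$name["'][^>]*>)}{})
        {
            $metatags .= "$1\n";
        }
    }

    # Put the saved metatags right after the title end tag.
    $file =~ s{(</title>)}{$1\n$metatags};

    # Cleanup: a newline followed by one or more empty or
    # whitespace-only lines becomes a single newline.
    $file =~ s{\n(?:[ \t]*\n)+}{\n}g;

    return $file;
}

# Made-up sample input; the 'keywords' tag wraps across two lines.
my $html = qq{<head>\n<title>Demo</title>\n<meta name="keywords"\n content="perl">\n}
         . qq{<meta name="description" content="A demo">\n</head>\n};

print relocate_metatags($html);
```

In the main loop, the body of the while would then be something like
print relocate_metatags($file);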

G. Wade
-- 
There are 2 possible outcomes: If the result confirms the hypothesis,
then you've made a measurement. If the result is contrary to the
hypothesis, then you've made a discovery.               -- Enrico Fermi

