[pm-h] how to split one file into multiple files?

G. Wade Johnson gwadej at anomaly.org
Mon Jun 23 05:15:27 PDT 2008


Hi Russell,

On Mon, 23 Jun 2008 02:59:56 -0500
"Russell L. Harris" <rlharris at oplink.net> wrote:

> My problem:
> 
> I have several text source files, each of which contains several
> chapters of a book.  I need to split each of these source files into
> the component chapter files, with a coherent naming scheme for the
> chapter files.
> 
> Details:
> 
>     => No single file contains every chapter of the book.
> 
>     => The chapter numbers must be determined from a hash of the
>        chapter names; they cannot be determined from the sequence of
>        the chapter in the source file.
> 
>     => Each chapter is separated by a line consisting of the word
>        "chapterbreak".
> 
>     => Each chapter begins with a unique string which provides the
>        chapter title ("LION", "BEAR", "DUCK", "MOOSE", etc.).
> 
> Examples:
> 
>     source file No. 1 ::
> 
>         LIONtext of chapter three\n
>         chapterbreak\n
>         MOOSEtext of chapter one\n
>         chapterbreak\n
>         KANGAROOtext of chapter four\n
>  
>     source file No. 2 ::
> 
>         BEARtext of chapter five\n
>         chapterbreak\n
>         PENGUINtext of chapter six\n
>         chapterbreak\n
>         DUCKtext of chapter two\n
> 
> I wish to split each source file on the pattern "chapterbreak" and
> place each chapter into a separate file, with the chapter filename
> being of the form "chapternumber.txt".

Okay.

> %%%%%%%%%%%%%%%%%%%%
> 
> I know how to create a hash keyed on the first several characters of
> the chapter names, with the chapter numbers as the hash elements:
> 
>             key: MOOS  DUCK  LION  KANG  BEAR  PENG
>    hash element: 1     2     3     4     5     6  

Got that.

> ----------
> 
> I think that I know how to read a file using the <> operator and split
> the chapters into an array (but I am a little fuzzy on this).

One useful trick for this is to set the input record separator to the
string you want to split on.

$/ = "\nchapterbreak\n";

Now <> returns all text up to and including "\nchapterbreak\n" on each
call. The chomp operator removes that same string from the end of the
record. (So it removes "\nchapterbreak\n" instead of "\n".) The last
chapter in a file has no trailing separator, so chomp leaves it as is.
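
Here is a minimal sketch of that trick, in case it helps. It reads from
an in-memory filehandle so it needs no files on disk; the sample text is
your source file No. 1, and the names $source, $in, and @chapters are
only placeholders I made up for illustration.

```perl
use strict;
use warnings;

# Sample text standing in for one source file (source file No. 1 above).
my $source = "LIONtext of chapter three\n"
           . "chapterbreak\n"
           . "MOOSEtext of chapter one\n"
           . "chapterbreak\n"
           . "KANGAROOtext of chapter four\n";

# Read from an in-memory filehandle so the sketch needs no files on disk.
open( my $in, '<', \$source ) or die "Unable to open in-memory file: $!\n";

my @chapters;
{
    # Localize the change so other reads in the program are unaffected.
    local $/ = "\nchapterbreak\n";
    while ( my $chapter = <$in> ) {
        # Removes a trailing "\nchapterbreak\n" if present. The last
        # chapter ends in a plain "\n", which chomp leaves alone here.
        chomp $chapter;
        push @chapters, $chapter;
    }
}
close $in;

print scalar(@chapters), " chapters read\n";
```

With a real file you would open it by name instead of using \$source;
the loop stays the same.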

> ----------
> 
> And I think that I know how to use the match operator to obtain the
> first several characters of the chapter name from each string scalar
> in the array, which is the needed hash key:
> 
>    /(.{4})/
> 
> ----------
> 
> But, after several hours of reading in the O'Reilly Perl books, I
> still do not understand how to open a new file using the hash element
> (the string "1", "2", "3", etc.) as the filename.

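Use the three-argument form of open with '>' so the file is created for
writing, and build the filename from the hash element:

open( my $fh, '>', "$hash{'MOOS'}.txt" )
    or die "Unable to open file: $!\n";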
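
Putting the key lookup and the open together might look roughly like
the sketch below. It is only one way to do it: write_chapter is a
hypothetical helper name, %hash is filled in from your table, and the
temporary directory is there just so the example does not clutter your
working directory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Chapter numbers keyed by the first four characters of each title,
# copied from the table earlier in this message.
my %hash = (
    MOOS => 1, DUCK => 2, LION => 3,
    KANG => 4, BEAR => 5, PENG => 6,
);

# Hypothetical helper: write one chapter to "<number>.txt" inside $dir
# and return the path of the file it wrote.
sub write_chapter {
    my ( $chapter, $dir ) = @_;

    # The first four characters of the chapter are the hash key.
    my ($key) = $chapter =~ /^(.{4})/s
        or die "Chapter is too short to contain a key\n";
    my $number = $hash{$key};
    defined $number or die "No chapter number known for '$key'\n";

    my $path = File::Spec->catfile( $dir, "$number.txt" );
    open( my $out, '>', $path ) or die "Unable to open $path: $!\n";
    print {$out} $chapter;
    close $out or die "Unable to close $path: $!\n";
    return $path;
}

# Demonstration: write one chapter into a scratch directory that is
# removed when the program exits.
my $dir  = tempdir( CLEANUP => 1 );
my $path = write_chapter( "MOOSEtext of chapter one\n", $dir );
print "Wrote $path\n";
```

In the real script you would call write_chapter once for each chapter
read from the source files, passing the directory where the chapter
files should end up.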

> %%%%%%%%%%%%%%%%%
> 
> If I am trying to do this the hard way, kindly advise.

G. Wade
-- 
Why do your people ask if someone's ready right before you are going to
do something massively unwise?        -- Delenn - "The War without End"

