Phoenix.pm: update of HUGE text file

Eric Thelin eric at thelin.org
Mon Oct 2 16:36:13 CDT 2000


On Mon, 2 Oct 2000, Mark A. Sharkey wrote:

> Is there a way in perl to do an update on a huge (100MB+ text file) without
> bringing the whole thing into memory?
> 
> I have a client that is using a text file as a make-shift database table.  The
> client needs to be able to add/modify/delete lines in the text file.  He does a
> search against the file, edits the results, and then needs to make the edits
> "stick".
> 
> Can I make modifications to certain lines in the file, without reading the whole
> thing into memory?
> 
> If this is explained by the Camel someplace, even just pointing me in the right
> direction would be very helpful.

If you are going to add, delete, or modify data in a way that changes
the file's length, you will need to rewrite the file.  Note that this
doesn't require bringing it all into RAM.  You can open two files: the
old file for reading and a new file for writing.  Then read a certain
number of characters (or lines) from the input file, make any
modifications, and write that block to the output file.  Repeat with the
next block until you have read the whole file.  You will have to make
sure that your changes either don't span two blocks or are handled
specially when they do.
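Since the client's file holds one record per line, the same two-file
rewrite can also be done a line at a time, so a single-record edit never
straddles a block boundary.  This is only a sketch under assumptions not in
the question: records are tab-separated with a unique key in the first
field, and apply_edits, rewrite_file, %$changes and @$additions are
hypothetical names made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out one line (one record) at a time.
# Assumes the record key is the first tab-separated field.
#   $changes->{$key} = "new line\n"  replaces that record
#   $changes->{$key} = undef         deletes that record
# Lines in @$additions are appended after the last existing record.
sub apply_edits {
	my ($in, $out, $changes, $additions) = @_;
	while (my $line = <$in>) {                 # only one line in memory
		my ($key) = $line =~ /^([^\t\n]*)/;    # first field (may be empty)
		if (exists $changes->{$key}) {
			my $new = $changes->{$key};
			print {$out} $new if defined $new; # undef means "delete"
		}
		else {
			print {$out} $line;                # unchanged record
		}
	}
	print {$out} $_ for @$additions;           # new records go at the end
}

# Hypothetical wrapper: write "$file.new", then rename it over the original.
sub rewrite_file {
	my ($file, $changes, $additions) = @_;
	my $tmp = "$file.new";
	open(my $in,  '<', $file) or die "Can't read $file: $!";
	open(my $out, '>', $tmp)  or die "Can't write $tmp: $!";
	apply_edits($in, $out, $changes, $additions);
	close($in);
	close($out) or die "Can't close $tmp: $!";
	rename($tmp, $file) or die "Can't rename $tmp to $file: $!";
}
```

For example, rewrite_file("myinfo.dat", { 42 => undef }, []) would drop
the record whose key is 42 and leave everything else untouched.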

A very simple example:


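use strict;
use warnings;
use File::Copy qw(move);   # move() is not a builtin; it comes from File::Copy

my $blocksize = 100;       # read 100 lines into memory at a time
my $file      = "myinfo.dat";

open(my $in,  '<', $file)       or die "Can't read $file: $!";
open(my $out, '>', "$file.new") or die "Can't write $file.new: $!";

while (!eof($in))
{
	my $block = '';        # start each block empty, or old blocks get re-printed
	my $x = 0;
	my $line;
	# defined() so a final line of "0" with no newline isn't treated as EOF
	while ($x < $blocksize && defined($line = <$in>))
	{
		$block .= $line;
		$x++;
	}

	# transform $block here
	# e.g. $block =~ s/bad/good/g;

	print $out $block;
}

close($in);
close($out) or die "Can't close $file.new: $!";   # flush before moving
move("$file.new", $file) or die "Can't move $file.new to $file: $!";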

(I typed this just for this email, so there may be syntax errors or minor
issues I forgot.  Consider yourself warned.)


This doesn't handle locking or the possibility of needing to change a
string that is more than one line long and could therefore straddle the
end of one block and the start of the next.  But it should show you the
idea.  This is the style of solution that you will find in many programs
that are designed to work with files larger than the machine's memory.
In the example I am reading 100 lines at a time; you could also read a
given number of characters with read() or sysread().
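Here is a rough sketch of the read() variant that also handles the
block-boundary problem, but only for one narrow case: replacing a fixed
literal string (not a general regex).  It reads fixed-size chunks and holds
back the last length($pat)-1 characters of each chunk, so a match that
straddles two chunks is still found.  The function name replace_in_stream
and its arguments are hypothetical, and it does nothing about locking.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, replacing every occurrence of the
# literal string $pat with $rep, while holding at most about
# $chunk_size + length($pat) characters in memory.
sub replace_in_stream {
	my ($in, $out, $pat, $rep, $chunk_size) = @_;
	die "empty pattern" unless length $pat;
	my $keep = length($pat) - 1;   # a partial match can be at most this long
	my $buf  = '';                 # unprocessed original text
	while (1) {
		my $n = read($in, my $chunk, $chunk_size);
		die "read failed: $!" unless defined $n;
		$buf .= $chunk;
		my $at_eof = ($n == 0);
		# Any match starting before $cut lies entirely inside $buf.
		my $cut = length($buf) - ($at_eof ? 0 : $keep);
		my $pos = 0;                   # end of the text already written
		while ($buf =~ /\Q$pat\E/g) {
			my $start = $-[0];
			last if $start >= $cut;    # might be incomplete; retry next round
			print {$out} substr($buf, $pos, $start - $pos), $rep;
			$pos = $+[0];
		}
		if ($pos < $cut) {             # no match starts in [$pos, $cut)
			print {$out} substr($buf, $pos, $cut - $pos);
			$pos = $cut;
		}
		$buf = substr($buf, $pos);     # carry the held-back tail forward
		last if $at_eof;
	}
}
```

Only the original text is ever scanned, never the replacement, so the
result should match running s/\Q$pat\E/$rep/g over the whole file at once.
For a regex whose matches have no fixed maximum length there is no such
simple bound, which is one reason the line-at-a-time approach is often
easier for a line-oriented file.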

Hope this helps.

Eric


Eric Thelin                                          erict at aztechbiz.com
           AZtechBiz.com: Where Arizona Does Tech Business
               Voice: 480-377-6743   Fax: 480-377-6755



