[tpm] Fast(er) way of deleting X lines from start of a file

Madison Kelly linux at alteeve.com
Thu Oct 1 08:58:24 PDT 2009


Liam R E Quin wrote:
> On Wed, 2009-09-30 at 11:53 -0400, Uri Guttman wrote:
>>>>>>> "MK" == Madison Kelly <linux at alteeve.com> writes:
>>   MK> Hi all,
>>   MK>   Thus far, I've opened a file, read it in, shifted/ignored the first
>>   MK> X number of lines and then written out the resulting file. This works,
>>   MK> but strikes me as horribly inefficient. Doubly so when the file could
>>   MK> be larger than RAM.
> 
> 
> In this case I'd consider a short C program that used mmap() and bcopy()
> and ftruncate().  The point is to avoid per-line processing, and to avoid
> having the whole file in memory (mmap() maps the file on disk into
> memory locations without actually reading it).
> 
> In perl, you can use sysread and syswrite to avoid per-line processing.

I will look into these, thanks.
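
For the archives, here is a minimal sketch of the sysread/syswrite idea 
above. It is only one way to do it, not a tested module. The name 
drop_leading_lines and the 64 KiB default chunk size are made up for 
illustration. It finds the byte offset just past the Xth newline, slides 
the rest of the file forward one chunk at a time, then truncates, so 
memory use stays at one chunk no matter how big the file is. It assumes 
"\n" line endings and that nothing else writes to the file while it runs.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper: remove the first $skip lines of $path in place.
# Returns the new file size in bytes.
sub drop_leading_lines {
    my ($path, $skip, $chunk) = @_;
    $chunk ||= 64 * 1024;    # assumed default; tune to taste
    return if $skip <= 0;    # nothing to remove

    open(my $fh, '+<', $path) or die "open $path: $!";
    binmode($fh);

    # Pass 1: find the byte offset just past the $skip-th newline.
    # If the file has fewer lines, $read_pos ends up at end of file.
    my ($read_pos, $seen, $buf) = (0, 0, '');
    SCAN: while (my $got = sysread($fh, $buf, $chunk)) {
        my $from = 0;
        while ((my $nl = index($buf, "\n", $from)) >= 0) {
            $from = $nl + 1;
            if (++$seen == $skip) { $read_pos += $from; last SCAN; }
        }
        $read_pos += $got;
    }

    # Pass 2: copy everything from $read_pos onward to the start of the
    # file. The write position always trails the read position, so data
    # is never overwritten before it has been read.
    my $write_pos = 0;
    while (1) {
        defined(sysseek($fh, $read_pos, SEEK_SET)) or die "seek: $!";
        my $got = sysread($fh, $buf, $chunk);
        die "read: $!" unless defined $got;
        last if $got == 0;
        defined(sysseek($fh, $write_pos, SEEK_SET)) or die "seek: $!";
        my $off = 0;
        while ($off < $got) {    # syswrite may write less than asked
            my $w = syswrite($fh, $buf, $got - $off, $off);
            die "write: $!" unless defined $w;
            $off += $w;
        }
        $read_pos  += $got;
        $write_pos += $got;
    }

    # Cut off the stale tail left behind by the shift.
    truncate($fh, $write_pos) or die "truncate: $!";
    close($fh) or die "close: $!";
    return $write_pos;
}
```

Calling drop_leading_lines('/path/to/file', 100) would drop the first 
100 lines. If the file has fewer lines than that, it ends up empty. Note 
that a crash part-way through pass 2 leaves the file half shifted, which 
is the trade-off for not writing a second copy.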

> Watch out, if something is writing to the file while you do this, you
> can end up with corruption, though. You can use file locking, 

Does perl support file locking? If so, how does an active file handle 
know when another perl process has locked a file? Will it pause and wait 
for the file to be unlocked?

This is a good point I'd not thought of. My programs do fork, and so 
could have multiple file handles writing to the same file. If perl 
doesn't, then I could use my own lock file that each process checks 
before trying to write to the file, going into a sleep loop until the 
lock file is removed.

Another good data point!
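
On the locking question: perl has a built-in flock(). The lock is 
advisory, so it only protects the file if every writer asks for the lock 
first. A process requesting LOCK_EX simply blocks until the current 
holder releases it, unless LOCK_NB is added, in which case flock returns 
false right away instead of waiting. A rough sketch, assuming all 
writers go through the same routine (the name locked_append is made up 
for illustration):

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Hypothetical helper: append one line to $path under an exclusive lock.
sub locked_append {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or die "open $path: $!";

    # Blocks here until no other process holds the lock.
    flock($fh, LOCK_EX) or die "flock $path: $!";

    # The file may have grown while we waited, so move to the real end.
    seek($fh, 0, SEEK_END) or die "seek: $!";
    print {$fh} $line, "\n" or die "print: $!";

    # close flushes the buffer and releases the lock.
    close($fh) or die "close: $!";
    return 1;
}
```

A routine that rewrites the file would take the same LOCK_EX on it 
before touching it, which removes the need for a separate lock file and 
a sleep loop. Locking over network filesystems can behave differently, 
so that is worth testing separately.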

> However, Liam's Rule of Optimization :-) is that the fastest way to do
> something is not to do it at all.  For example, what if you had a
> separate file for every X lines? Then you'd just delete the oldest file.
> 
> Or, buy more memory so it all fits :-)
> 
> Liam

I am hoping that my modules will one day be used by other people, so I 
want to keep memory requirements as modest as possible. Besides, I've 
never been fond of the "buy your way out of your problem" approach... If 
I did, I'd be writing this email on a Mac. ;)

Madi


