[San-Diego-pm] odd chars in file "Killing" my console

Shlomi Fish shlomif at iglu.org.il
Thu Nov 11 01:30:36 PST 2010


On Thursday 11 November 2010 04:56:12 Christopher Hahn wrote:
> Hey team,
> 
> I am trying to parse a huge (7 GB) file that is line oriented but has
> large sections that are any kind of binary character.
> 
> (this is a p42svn dump file of a large perforce repository)
> 
> I tried several smarter things, but found that after running for a while
> my console would just close....dead, gone:
> ============================
> administrator@cmSVNDumper-09:/p42svn/testing$ ./p4dump-parse-new.pl
> Killed
> ============================
> 
> I am sure that there are odd chars in the file that are doing this....
> 
> I tried setting binmode on the input file handle, and just loading the
> entire file into a buffer, just as a test, as we have enough memory to do
> this.
> 
> The result:
> ===========================================
> open(OUTF, ">SM_amanda_238037_fixed.dump")
>   or die "Opening output file failed: $!";
> 
> open(INF, "SM_amanda_238037_bad.dump")
>   or die "Opening input file failed: $!";
> binmode INF;
> 
> my @buffer = <INF>;
> 

Are you sure you want to load every line of a 7 GB file into an array? Perl
arrays carry a lot of per-element overhead, so doing this would waste a great
deal of memory. How much RAM do you have? You'll need much more than 7 GB for
that.
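
If each line only needs to be looked at once, reading the file one line at a
time keeps memory use roughly constant. Below is a minimal sketch of that idea,
not a drop-in fix: copy_by_line is a hypothetical name, and the per-line
processing is left as a comment because I don't know what your script does with
each line. Note that a long binary run without any newline is still read as a
single "line".

```perl
use strict;
use warnings;

# Hypothetical helper: copy a file line by line, so that only the current
# line is held in memory instead of the whole file.
sub copy_by_line {
    my ($in_path, $out_path) = @_;

    # ':raw' has the same effect as calling binmode() on the handle.
    open(my $in, '<:raw', $in_path)
        or die "Opening input file failed: $!";
    open(my $out, '>:raw', $out_path)
        or die "Opening output file failed: $!";

    my $count = 0;
    while (my $line = <$in>) {
        # Any per-line fix-up would go here (assumption: none is shown).
        print {$out} $line;
        $count++;
    }

    close($in);
    close($out) or die "Closing output file failed: $!";

    return $count;    # number of lines copied
}
```

With the file names from your script, the call would be
copy_by_line("SM_amanda_238037_bad.dump", "SM_amanda_238037_fixed.dump").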

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Stop Using MSIE - http://www.shlomifish.org/no-ie/

<rindolf> She's a hot chick. But she smokes.
<go|dfish> She can smoke as long as she's smokin'.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

