[San-Diego-pm] odd chars in file "Killing" my console

Christopher Hahn xrz1138 at gmail.com
Fri Nov 12 12:46:11 PST 2010


Hello all,

A colleague handled this nasty file....using sed!

I have used sed many times in the past, but only as a standard
filter....I never thought that it could "look back" like Perl can.

His answer was sufficiently nasty/beautiful that I wanted to share it:
============================================
#!/bin/sed -nf

\|^Node-path: /names/mangled/serversetup/pdf/install_en.pdf| {
        p
        n
        \|^Node-kind: file| {
                p
                n
                \|^Node-action: add| {
                        N
                        \|Text-content-length: 40|{
                                N
                                \|Text-content-md5: e87b1296fc0de5556340adcc7c904901| {
                                        s/Node-action: add/Node-action: change/
                                        }
                                }
                        }
                }
        }
p
============================================

(It had to find the file PROP blocks that had the right md5 and then
reset their Node-action from "add" to "change".)
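
For comparison, here is a rough Perl sketch of the same idea that reads the
dump one line at a time instead of loading it into an array, so memory use
stays small however big the file is. It is not the script from this thread:
the sub name fix_node_action and the in-memory sample are made up for
illustration, it matches on path, kind and md5 only (not the content length),
and it assumes each node's header lines end at the first blank line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values copied from the sed script; adjust for your own dump.
my $PATH = '/names/mangled/serversetup/pdf/install_en.pdf';
my $MD5  = 'e87b1296fc0de5556340adcc7c904901';

# Copy a dump from $in to $out line by line. Only the header lines of
# the node we care about are held back, until we know whether the md5
# matches. (Hypothetical helper, not part of p42svn.)
sub fix_node_action {
    my ($in, $out) = @_;
    binmode $in;     # the dump contains arbitrary binary data
    binmode $out;

    while (defined(my $line = <$in>)) {
        if ($line ne "Node-path: $PATH\n") {
            print {$out} $line;
            next;
        }

        # Collect this node's headers, up to the blank line ending them.
        my @header = ($line);
        while (defined(my $h = <$in>)) {
            push @header, $h;
            last if $h eq "\n";
        }

        my $is_file = grep { $_ eq "Node-kind: file\n" } @header;
        my $md5_ok  = grep { $_ eq "Text-content-md5: $MD5\n" } @header;
        if ($is_file && $md5_ok) {
            s/^Node-action: add$/Node-action: change/ for @header;
        }
        print {$out} @header;
    }
    return;
}

# Tiny made-up sample, just to exercise the sub.
my $sample = join '',
    "Node-path: $PATH\n",
    "Node-kind: file\n",
    "Node-action: add\n",
    "Text-content-length: 40\n",
    "Text-content-md5: $MD5\n",
    "\n";
open my $in,  '<', \$sample   or die "in-memory open failed: $!";
open my $out, '>', \my $fixed or die "in-memory open failed: $!";
fix_node_action($in, $out);
close $out;
print $fixed;
```

To run it over a real dump, drop the sample and call
fix_node_action(\*STDIN, \*STDOUT) instead, then use the script as a filter.
Like the sed version, it would be fooled by file content that happens to
contain a matching Node-path line.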

Take care all,

Chris

On Thu, Nov 11, 2010 at 1:35 AM, Christopher Hahn <xrz1138 at gmail.com> wrote:
> Hey Shlomi,
>
> Thanks for taking the time.
>
> Yes, there is more than enough RAM to load the thing....and
> I was only trying that in desperation.
>
> Given that the day is over, I will try a smaller dump file to write
> my script to.
>
> Onward and upward,
>
> Chris
>
> On Thu, Nov 11, 2010 at 1:30 AM, Shlomi Fish <shlomif at iglu.org.il> wrote:
>>
>> On Thursday 11 November 2010 04:56:12 Christopher Hahn wrote:
>> > Hey team,
>> >
>> > I am trying to parse a huge (7 GB) file that is line-oriented but has
>> > large sections that can contain any kind of binary character.
>> >
>> > (this is a p42svn dump file of a large perforce repository)
>> >
>> > I tried several smarter things, but found that after running for a while
>> > my console would just close....dead, gone:
>> > ============================
>> > administrator at cmSVNDumper-09:/p42svn/testing$ ./p4dump-parse-new.pl
>> > Killed
>> > ============================
>> >
>> > I am sure that there are odd chars in the file that are doing this....
>> >
>> > I tried setting binmode on the input file handle, and just loading the
>> > entire file into a buffer, just as a test, as we have enough memory to
>> > do this.
>> >
>> > The result:
>> > ===========================================
>> > open(OUTF, ">SM_amanda_238037_fixed.dump")
>> >   or die "Opening output file failed: $!";
>> >
>> > open(INF, "SM_amanda_238037_bad.dump")
>> >   or die "Opening input file failed: $!";
>> > binmode INF;
>> >
>> > my @buffer = <INF>;
>> >
>>
>> Are you sure you want to load the many lines of a 7 GB file into an array?
>> Perl arrays have a lot of overhead, and doing this would be very memory
>> wasteful. How much RAM do you have? You'll need much more than 7 GB for that.
>>
>> Regards,
>>
>>        Shlomi Fish
>>
>> --
>> -----------------------------------------------------------------
>> Shlomi Fish       http://www.shlomifish.org/
>> Stop Using MSIE - http://www.shlomifish.org/no-ie/
>>
>> <rindolf> She's a hot chick. But she smokes.
>> <go|dfish> She can smoke as long as she's smokin'.
>>
>> Please reply to list if it's a mailing list post - http://shlom.in/reply .
>
>
>
> --
> Realisant mon espoir, je me lance vers la gloire.
> Christopher Hahn == xrz1138 at gmail.com
>



-- 
Realisant mon espoir, je me lance vers la gloire.
Christopher Hahn == xrz1138 at gmail.com

