[oak perl] regexp for discussion
George Woolley
george at metaart.org
Thu Mar 11 18:58:22 CST 2004
On Thursday 11 March 2004 10:20 am, B.E.G wrote:
> ...
> Now one from a log parser I use (Hi George, did iPro parse like this?).
Hi, Elijah.
Yes, some of the filters used a regex something like
the one you show below.
Many of the filters, however, began with a split.
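For anyone who hasn't seen the split style, here is a minimal sketch of the
idea. It is not the iPro code: the sample line is made up, and the field
positions are my assumption for a combined-format line whose URI contains
no spaces.

```perl
use strict;
use warnings;

# Hypothetical sample line in Apache combined format (not real data).
my $line = '192.0.2.1 - - [11/Mar/2004:10:20:00 -0800] '
         . '"GET /index.html HTTP/1.0" 200 1234 '
         . '"http://example.com/" "Mozilla/4.0 (compatible)"';

# Split on runs of whitespace. The leading fields keep fixed positions
# as long as the requested URI contains no spaces.
my @f = split ' ', $line;

my ($ip, $status, $bytes) = @f[0, 8, 9];
(my $method = $f[5]) =~ s/^"//;    # strip the opening quote of the request

print "$ip $method $f[6] $status $bytes\n";
```

The trade-off is that anything with embedded spaces, such as the user agent,
comes out spread over several fields, so a filter like this would only trust
the leading columns.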
> The particular parser understands (== has regexps for) four different
> formats. As George likes to attest, this is not expected to work on
> all log lines of the target format, just better than 99% of them.
Right, I don't recall anyone being concerned with losing 1% of the data.
For some accounts even 2% might be OK.
> combined => [ # Standard apache 'combined' log format
> # Column names, for (captures)
> [ 'ip', 'identd', 'username', 'date', 'time', 'tz', 'method', 'file',
> 'protocol', 'status', 'bytes', 'referer', 'client', 'other', ],
> # Regexp
> qr%^ # anchor
> ([\w.]+) # IP
> \s+ # whitespace
> (\S+) # ident check
> \s+ # whitespace
> (\S+) # auth user
> \s+ # whitespace
> \[(\d\d/\w\w\w/\d\d\d\d) # date
>
> :(\d\d:\d\d:\d\d) # time
>
> \s+([-+\d]+)\] # timezone
> \s+ # whitespace
> "(\w+) # GET/POST/HEAD, etc
> \s+ # whitespace
> (\S+) # URI/URL
> (?: # grouping for optional version
> \s+ # whitespace
> (HTTP/[\d.]+) # protocol version
> )? # end grouping
> " # end of request line
> \s+ # whitespace
> (\d\d\d) # response code, 200 success, etc
> \s+ # whitespace
> (\d+|-) # bytes written
> \s+ # whitespace
> "(\S+)" # referrer
> \s+ # whitespace
> "([^"]+)" # user agent
> \s* # whitespace
> (.*) # other
> $ # anchor
> %xi,
> ],
> ...
> Elijah
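A table entry like that (column names paired with a regexp that has one
capture per name) can be applied with a hash slice. What follows is only a
sketch of how I would guess it gets used, not Elijah's parser: the 'short'
format is an abbreviated stand-in I made up, and parse_line and the sample
lines are hypothetical.

```perl
use strict;
use warnings;

# Abbreviated stand-in for the 'combined' entry: only the leading
# fields are captured, and the rest of the line is ignored.
my %formats = (
    short => [
        [ 'ip', 'identd', 'username', 'date', 'time', 'tz' ],
        qr%^                              # anchor
           ([\w.]+)                       # IP
           \s+ (\S+)                      # ident check
           \s+ (\S+)                      # auth user
           \s+ \[(\d\d/\w\w\w/\d\d\d\d)   # date
           :(\d\d:\d\d:\d\d)              # time
           \s+ ([\d-]+)\]                 # timezone
        %xi,
    ],
);

# Return a hash ref of column name => captured value, or nothing
# when the line does not match the named format.
sub parse_line {
    my ($format, $line) = @_;
    my ($names, $re) = @{ $formats{$format} };
    my @values = $line =~ $re or return;   # no match: caller counts a miss
    my %rec;
    @rec{@$names} = @values;               # hash slice pairs names with captures
    return \%rec;
}

my ($ok, $missed) = (0, 0);
for my $line (
    '192.0.2.1 - - [11/Mar/2004:10:20:00 -0800] "GET / HTTP/1.0" 200 12',
    'garbage that is not a log line',
) {
    if (my $rec = parse_line(short => $line)) {
        $ok++;
        print "$rec->{ip} $rec->{date} $rec->{time} $rec->{tz}\n";
    }
    else {
        $missed++;
    }
}
print "parsed $ok, skipped $missed\n";
```

Counting the misses rather than dying on them is what makes the
"better than 99%" approach workable: the skipped count shows how much
data is actually being lost.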