APM: Musings: Current state of log capture and analysis...

Montgomery Conner montgomery.conner at gmail.com
Thu Jul 1 09:03:52 PDT 2010


I'm looking into using the Spread Toolkit (http://www.spread.org/), which may
be more complex than your needs dictate but has some real advantages for my
use-cases. There is an excellent description of its use as a logging
mechanism in Theo Schlossnagle's 'Scalable Internet Architectures' (
http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X/ref=sr_1_1?ie=UTF8&s=books&qid=1278000081&sr=8-1),
which I highly recommend; he also wrote the first of many Perl modules that
speak Spread (all available via the CPAN).

If you're at all concerned about deriving value (via analysis) from the
collected data at the scale you're dealing with, you might want to consider
Hadoop (and the Hadoop file system, HDFS) as an end point for storage as
well as an analysis platform. There are some tools in various states of
development designed to import massive amounts of data into Hadoop: Scribe,
Chukwa, and Flume (which was open-sourced just this Monday by Cloudera) are
among the growing list of alternatives in this space.
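
To give a rough feel for the kind of analysis Hadoop is suited to, here is a
minimal map/reduce sketch in Python that counts log lines per host. It is only
an illustration: the syslog-style line layout and the function names
(map_line, reduce_pairs, count_by_host) are my own assumptions, and it runs
locally rather than on a cluster.

```python
from itertools import groupby


def map_line(line):
    """Map step: emit (host, 1) for one log line, or None if it is too short."""
    fields = line.split()
    # Assumed layout: "Jul  1 09:03:52 host program: message"
    if len(fields) < 5:
        return None
    return fields[3], 1


def reduce_pairs(pairs):
    """Reduce step: sum counts per host. Input must already be sorted by key."""
    for host, group in groupby(pairs, key=lambda kv: kv[0]):
        yield host, sum(count for _, count in group)


def count_by_host(lines):
    """Run map, sort (standing in for the shuffle phase), then reduce."""
    mapped = sorted(p for p in map(map_line, lines) if p is not None)
    return dict(reduce_pairs(mapped))
```

The sort in count_by_host stands in for the shuffle that Hadoop performs
between the map and reduce phases; on a real cluster the two steps would run
as separate jobs over data stored in HDFS.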

Hope that helps,
Montgomery

On Thu, Jul 1, 2010 at 10:47 AM, <jameschoate at austin.rr.com> wrote:

> I'm looking into a solution for collecting logs from at least a hundred or so
> servers, and possibly somewhere in the neighborhood of 5 million endpoints
> (and that could grow 2-3x).
>
> I've been googling around and found:
>
> Snare - mix of proprietary and open source solution, is based around a
> central collection service/server which is very appealing
> AWStats - this one is more for single server analysis and just doesn't feel
> right
> MindTreeInsight - Java and open source, will likely look a little deeper
> into this one
> LASSO - Open Source and seems to be Windows only
> syslog-ng - this has been around forever and is script-based, doesn't
> scale the way I'd like
> Analog - this one I'm not familiar with, currently researching
> Webalizer - is more focused on single server analysis and may have scaling
> issues, currently researching
> Yaala - not familiar with this one at all, still researching
>
> Any that you know of that I missed? If you have a favorite can you share in
> 3-5 sentences why? Scaling is important.
>
> I was also looking at a JSON based log analysis tool but didn't find any.
> This tech looks like a good way to approach this problem. Scaling might be
> an issue.
>
> --
>  -- -- -- --
> Venimus, Vidimus, Dolavimus
>
> jameschoate at austin.rr.com
> james.choate at g.austincc.edu
> james.choate at twcable.com
> h: 512-657-1279
> w: 512-845-8989
> http://hackerspaces.org/wiki/Confusion_Research_Center
>
> Adapt, Adopt, Improvise
>  -- -- -- --
> _______________________________________________
> Austin mailing list
> Austin at pm.org
> http://mail.pm.org/mailman/listinfo/austin
>