From jameschoate at austin.rr.com  Thu Jul  1 08:47:03 2010
From: jameschoate at austin.rr.com (jameschoate at austin.rr.com)
Date: Thu, 1 Jul 2010 15:47:03 +0000
Subject: APM: Musings: Current state of log capture and analysis...
Message-ID: <20100701154703.37JJL.74098.root@hrndva-web02-z01>

I'm looking into a solution for collecting logs on at least a hundred or so servers, and possibly somewhere in the neighborhood of 5 million endpoints (and that could grow 2-3x).

I've been googling around and found:

Snare - a mix of proprietary and open source pieces; it is based around a central collection service/server, which is very appealing
AWStats - this one is more for single-server analysis and just doesn't feel right
MindTreeInsight - Java and open source, will likely look a little deeper into this one
LASSO - open source and seems to be Windows only
syslog-ng - this has been around forever and is script based; it doesn't scale the way I'd like
Analog - this one I'm not familiar with, currently researching
Webalizer - more focused on single-server analysis and may have scaling issues, currently researching
Yaala - not familiar with this one at all, still researching

Any that you know of that I missed? If you have a favorite, can you share in 3-5 sentences why? Scaling is important.

I was also looking for a JSON-based log analysis tool but didn't find any. That tech looks like a good way to approach this problem, though scaling might be an issue.

--
-- -- -- --
Venimus, Vidimus, Dolavimus

jameschoate at austin.rr.com
james.choate at g.austincc.edu
james.choate at twcable.com
h: 512-657-1279
w: 512-845-8989
http://hackerspaces.org/wiki/Confusion_Research_Center

Adapt, Adopt, Improvise
-- -- -- --

From montgomery.conner at gmail.com  Thu Jul  1 09:03:52 2010
From: montgomery.conner at gmail.com (Montgomery Conner)
Date: Thu, 1 Jul 2010 11:03:52 -0500
Subject: APM: Musings: Current state of log capture and analysis...
In-Reply-To: <20100701154703.37JJL.74098.root@hrndva-web02-z01>
References: <20100701154703.37JJL.74098.root@hrndva-web02-z01>
Message-ID:

I'm looking into using the Spread Toolkit (http://www.spread.org/), which may be more complex than your needs dictate but has some real advantages for my use cases. There is an excellent description of its use as a logging mechanism in Theo Schlossnagle's 'Scalable Internet Architectures' (http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X/ref=sr_1_1?ie=UTF8&s=books&qid=1278000081&sr=8-1), which I highly recommend; he also wrote the first of many Perl modules that speak Spread (all available via the CPAN).

If you're at all concerned about deriving value (via analysis) from the collected data at the scale you're dealing with, you might want to consider Hadoop (and the Hadoop file system, HDFS) as an endpoint for storage as well as an analysis platform. There are some tools in various states of development designed to import massive amounts of data into Hadoop: Scribe, Chukwa, and Flume, which was open-sourced just this Monday by Cloudera, are among the growing list of alternatives in this space.
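To give a feel for the Perl side of that, a publisher doesn't have to be much
code. This is only a sketch: it assumes the Spread module from CPAN and a
spread daemon listening on 4803@localhost, and the group and client names are
made up for illustration.

#!/usr/bin/perl
# Minimal Spread log publisher: read lines on stdin (e.g. piped from
# "tail -F /var/log/messages") and multicast them to a well-known group
# that the collectors join.
use strict;
use warnings;
use Sys::Hostname;
use Spread;    # CPAN wrapper around the Spread client library

my ($mbox, $private_group) = Spread::connect({
    spread_name  => '4803@localhost',        # local spread daemon (assumed)
    private_name => 'plog' . ($$ % 10000),   # must be unique per daemon
});
die "Spread::connect failed: $Spread::sperrno\n" unless defined $mbox;

my $group = 'logs';    # collectors Spread::join() this same group
my $host  = hostname();

while (my $line = <STDIN>) {
    chomp $line;
    # SAFE_MESS is the strongest delivery guarantee; RELIABLE_MESS is cheaper
    # if occasional loss during membership changes is acceptable.
    Spread::multicast($mbox, SAFE_MESS, $group, 0, "$host $line")
        or warn "multicast failed: $Spread::sperrno\n";
}

Spread::disconnect($mbox);

The collector end is the mirror image: connect, Spread::join the same group,
and loop on Spread::receive, writing whatever arrives to disk or handing it
off to whatever storage layer you settle on.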
Hope that helps,
Montgomery

On Thu, Jul 1, 2010 at 10:47 AM,  wrote:

> I'm looking into a solution for collecting logs on at least a hundred or so
> servers, and possibly somewhere in the neighborhood of 5 million endpoints
> (and that could grow 2-3x).
>
> I've been googling around and found:
>
> Snare - a mix of proprietary and open source pieces; it is based around a
> central collection service/server, which is very appealing
> AWStats - this one is more for single-server analysis and just doesn't feel
> right
> MindTreeInsight - Java and open source, will likely look a little deeper
> into this one
> LASSO - open source and seems to be Windows only
> syslog-ng - this has been around forever and is script based; it doesn't
> scale the way I'd like
> Analog - this one I'm not familiar with, currently researching
> Webalizer - more focused on single-server analysis and may have scaling
> issues, currently researching
> Yaala - not familiar with this one at all, still researching
>
> Any that you know of that I missed? If you have a favorite, can you share
> in 3-5 sentences why? Scaling is important.
>
> I was also looking for a JSON-based log analysis tool but didn't find any.
> That tech looks like a good way to approach this problem, though scaling
> might be an issue.
>
> --
> -- -- -- --
> Venimus, Vidimus, Dolavimus
>
> jameschoate at austin.rr.com
> james.choate at g.austincc.edu
> james.choate at twcable.com
> h: 512-657-1279
> w: 512-845-8989
> http://hackerspaces.org/wiki/Confusion_Research_Center
>
> Adapt, Adopt, Improvise
> -- -- -- --
> _______________________________________________
> Austin mailing list
> Austin at pm.org
> http://mail.pm.org/mailman/listinfo/austin
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jameschoate at austin.rr.com  Thu Jul  1 11:19:32 2010
From: jameschoate at austin.rr.com (jameschoate at austin.rr.com)
Date: Thu, 1 Jul 2010 18:19:32 +0000
Subject: APM: [Lopsa-us-tx-austin] Musings: Current state of log capture and analysis...
In-Reply-To: <93D9FF29-7C93-4297-9FE1-FD6CE3CADAC3@zenoss.com>
Message-ID: <20100701181932.SEFU2.75087.root@hrndva-web02-z01>

Hi Matt,

You won't remember me, but I talked to you after your talk here in Austin a few months ago at the Linux expo (I don't remember its actual name now). We discussed your tool's ability to capture large SNMP trap populations.

I'll give this a look. Thanks.

---- Matt Ray wrote:
> I actually discussed this with one of the admins from Twitter at Velocity.
> They were using Splunk, but ran into scaling issues eventually and replaced
> it with a home-grown solution of Scribe + Hadoop File System
> (http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html). If
> you're going to large-scale installations, that might be a path to explore.

--
-- -- -- --
Venimus, Vidimus, Dolavimus

jameschoate at austin.rr.com
james.choate at g.austincc.edu
james.choate at twcable.com
h: 512-657-1279
w: 512-845-8989
http://hackerspaces.org/wiki/Confusion_Research_Center

Adapt, Adopt, Improvise
-- -- -- --

From jameschoate at austin.rr.com  Thu Jul  1 12:29:50 2010
From: jameschoate at austin.rr.com (jameschoate at austin.rr.com)
Date: Thu, 1 Jul 2010 14:29:50 -0500
Subject: APM: [Lopsa-us-tx-austin] Musings: Current state of log capture and analysis...
In-Reply-To:
Message-ID: <20100701192950.3I8E6.75500.root@hrndva-web02-z01>

---- Mark Farver wrote:
> My experience boiled down to two choices.
>
> 1. Splunk. Which is just awesome, scales OK up to a couple hundred gigs a
> day, and is easy to use. It is priced by the GB, and the price is
> heart-attack inducing.

Yep, that kills it right there.

> 2. Roll your own... basically a bunch of syslog collectors writing to
> Hadoop/HDFS (if you expect to actually analyze all of that data)

I guess it's time to talk IP then...
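For what it's worth, the collector side of "roll your own" can start out very
small: syslog-ng (or plain syslogd) writes per-host files into a local spool,
and a cron job ships finished batches into HDFS with the stock hadoop fs
commands. A sketch only; the spool path and date layout are placeholders, and
it assumes the hadoop CLI is installed on the collector.

#!/usr/bin/perl
# Ship completed log batches from a local spool into a date-partitioned
# HDFS directory using the standard "hadoop fs" CLI; run from cron.
use strict;
use warnings;
use POSIX qw(strftime);
use File::Basename;

my $spool    = '/var/spool/logbatches';                      # placeholder path
my $hdfs_dir = '/logs/' . strftime('%Y/%m/%d', localtime);   # e.g. /logs/2010/07/01

# Make the day's directory; the error when it already exists is harmless here.
system('hadoop', 'fs', '-mkdir', $hdfs_dir);

for my $file (glob "$spool/*.log") {
    my $target = "$hdfs_dir/" . basename($file);
    if (system('hadoop', 'fs', '-put', $file, $target) == 0) {
        unlink $file or warn "uploaded but could not remove $file: $!\n";
    } else {
        warn "hadoop fs -put failed for $file; leaving it for the next run\n";
    }
}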
> Either way, you'll need to build a rack or two of high-disk-capacity
> machines to house the data on. The nice thing is Hadoop works pretty well
> on generic server hardware and consumer-grade disks. Stuff a machine with
> 2TB disks and you can pack as much as 12TB into 1U. I recommend starting
> with about 10-20 machines and scaling up... much less than that and you'll
> have the disk space but probably not enough CPU to do analysis.

The current plan is a single machine to store the files from the various head-ends on, keeping five days' worth from each. The analysis will get done on other boxes and at this point isn't my problem. I'm opting for a pull mechanism from the server, since I see a push from the clients as taking too much maintenance.

> Expect that this system is going to require at least a full-time employee
> seat or two. Probably a Hadoop admin and a programmer/report writer.
> Hadoop is pretty easy to set up, but actual data analysis takes some
> skill. I can give you some pointers, or you might be able to find a
> Rackspace Hadoop person (there are quite a few in SA) who would moonlight.

In one of these comments I mentioned the fiscal responsibility of cable companies...

Thanks for the feedback, Mark.

--
-- -- -- --
Venimus, Vidimus, Dolavimus

jameschoate at austin.rr.com
james.choate at g.austincc.edu
james.choate at twcable.com
h: 512-657-1279
w: 512-845-8989
http://hackerspaces.org/wiki/Confusion_Research_Center

Adapt, Adopt, Improvise
-- -- -- --

From brian.litke at sedl.org  Thu Jul  1 13:52:46 2010
From: brian.litke at sedl.org (Brian Litke)
Date: Thu, 1 Jul 2010 15:52:46 -0500
Subject: APM: Another log analyzer
In-Reply-To:
References:
Message-ID: <39EAB836-CCD8-4C29-B573-60ACDE64BB3B@sedl.org>

Hi,

I've been using the "wusage" access_log analyzer by Boutell for years. I believe it is Perl-based software. It works on UNIX and Windows: http://www.boutell.com/wusage/

I recently had to upgrade to version 8 because my 10-year-old wusage binary did not run on the new virtualized server my site was moved to.

Wusage has a single-domain option ($25) for commercial sites, plus 5-domain ($75) and unlimited-domain ($295) options. The non-profit price is one-third of the prices shown above.
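For comparison, the quick-and-dirty end of access_log analysis is classic
Perl territory. A sketch only: it assumes the Apache common/combined log
format on stdin, and real analyzers like wusage or AWStats obviously do far
more than a requests-and-bytes-per-day tally.

#!/usr/bin/perl
# Requests and bytes served per day from an Apache access_log read on stdin.
use strict;
use warnings;

my (%hits, %bytes);
while (<>) {
    # e.g. 1.2.3.4 - - [01/Jul/2010:10:47:03 -0500] "GET / HTTP/1.1" 200 5120
    next unless m{\[ ([^:\]]+) : [^\]]* \] \s+ "[^"]*" \s+ \d{3} \s+ (\S+)}x;
    my ($day, $size) = ($1, $2);
    $hits{$day}++;
    $bytes{$day} += $size if $size =~ /^\d+$/;   # size field is "-" on some responses
}

# Keys look like 01/Jul/2010; the sort is lexical, which is fine for a quick look.
for my $day (sort keys %hits) {
    printf "%s  %8d requests  %12d bytes\n", $day, $hits{$day}, $bytes{$day} || 0;
}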
Brian Litke
Web Administrator
SEDL
4700 Mueller Blvd.
Austin, TX 78723
512-391-6529 (voice)
512-476-2286 (fax)
http://www.sedl.org
"Advancing Research, Improving Education"

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jameschoate at austin.rr.com  Fri Jul  2 08:10:34 2010
From: jameschoate at austin.rr.com (jameschoate at austin.rr.com)
Date: Fri, 2 Jul 2010 15:10:34 +0000
Subject: APM: CRC: Early web server planning...
Message-ID: <20100702151034.MJC74.77849.root@hrndva-web21-z02>

I've started looking at what the framework should look like once the initial server is up. At this point, here's where I'm at:

- Linux OS (using a basic LAMP architecture with normal Unix user services like IRC, email, etc.)
- Services and resources distributed across multiple servers in different cities (Austin and Cheyenne to start)
- Drupal for the web services
- Hadoop (incl. Chukwa) for managing the distributed-computing aspects
- BOINC for dealing with application-layer distributed computing
- Need a mechanism using DynDNS to provide emergency fail-over (check with the provider to see if they have any kind of DynDNS support or problem with our using it); a rough sketch of the update call follows below
- The web architecture should be based on Web 3.0/Semantic Web toolkits; this should include NLP and AI agents
- Needs to have a revision control mechanism available (and used!)

I'll be in Cheyenne the week of Aug. 16 for three days. I expect to have the server available at the beginning of Sept.

I'm going to continue to pursue additional members for the Board of Directors so we can incorporate in Texas as a non-profit. If that is not resolved by Jan 1, 2011, then I'm going to look at other alternatives to get a workspace open.

--
-- -- -- --
Venimus, Vidimus, Dolavimus

jameschoate at austin.rr.com
james.choate at g.austincc.edu
james.choate at twcable.com
h: 512-657-1279
w: 512-845-8989
http://hackerspaces.org/wiki/Confusion_Research_Center

Adapt, Adopt, Improvise
-- -- -- --
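A bare-bones version of the DynDNS fail-over bullet could look like the
sketch below. The hostname, addresses, credentials, and the health probe are
placeholders, and whether the provider allows this kind of automated update
is exactly the question to check first; the members.dyndns.org /nic/update
call (hostname and myip parameters over HTTP basic auth) is the standard
DynDNS update interface.

#!/usr/bin/perl
# If the primary web box stops answering, repoint the DynDNS hostname at the
# standby. Meant to run from cron on a third machine that can see both.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $host    = 'crc.example.dyndns.org';   # hostname managed through DynDNS (placeholder)
my $primary = '203.0.113.10';             # Austin box (placeholder)
my $standby = '198.51.100.20';            # Cheyenne box (placeholder)
my ($user, $pass) = ('dyndns-user', 'secret');

my $ua = LWP::UserAgent->new(agent => 'crc-failover/0.1', timeout => 15);

# Health probe; anything smarter (port checks, content checks, retries to
# avoid flapping) slots in here.
exit 0 if $ua->get("http://$primary/")->is_success;

# Primary looks dead: point the hostname at the standby address.
my $req = HTTP::Request->new(
    GET => "https://members.dyndns.org/nic/update?hostname=$host&myip=$standby");
$req->authorization_basic($user, $pass);

my $res = $ua->request($req);
print 'DynDNS update: ', $res->status_line, ' ', ($res->decoded_content || ''), "\n";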