From william.l.lewis at usa.net Tue Jul 1 11:15:58 2003
From: william.l.lewis at usa.net (bill lewis)
Date: Thu Aug 5 00:18:24 2004
Subject: Log file parsing
Message-ID: <536HgaqP79728S13.1057076158@uwdvg013.cms.usa.net>

Hi all,

Long time listener, first time poster. I have scoured around looking for a fast way to pull lines out of very large log files. These files are on the order of 300+ MB.

I am finding a list of Proc/pid combinations that we log, and creating an array of those, then foreach'ing over that list looking through the log for matching lines.

Each new proc/pid means a new search on the entire file, which is obviously time consuming. Right now, I am using the OS grep to pipe into Perl to get the lines for each proc/pid and then processing those lines.

Unfortunately, there is not enough memory to slurp the whole thing into RAM and process that way. Though, if I could, how would I go about that?

Anyone done anything like this before?

thanks,
Bill Lewis
-----
William L. Lewis
email: william.l.lewis@usa.net
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." --Ben Franklin

From timc+perl at divide.net Tue Jul 1 11:29:43 2003
From: timc+perl at divide.net (Tim Chambers)
Date: Thu Aug 5 00:18:24 2004
Subject: Log file parsing
References: <536HgaqP79728S13.1057076158@uwdvg013.cms.usa.net>
Message-ID: <013c01c33fed$fb5fbb60$a91b1944@cephas>

Bill,

> I have scoured around looking for a
> fast way to pull lines out of very large log files. These files are on the
> order of 300+ MB.
> Unfortunately, there is not enough memory to slurp the whole thing into RAM
> and process that way. Though, if I could, how would I go about that?

Have you thought about slurping it into a database? If the logs don't change often, that's the way to go. Then the database queries would be very efficient. I'd use MySQL.

As for slurping the whole file into memory -- just give it a try. 300 megs isn't much swap -- even for Windoze.
Create a data structure (probably an associative array) and do your queries on that. You might find that the data structure is significantly smaller than the raw file.

<>< Tim

From hierophant at pcisys.net Tue Jul 1 12:45:53 2003
From: hierophant at pcisys.net (Keary Suska)
Date: Thu Aug 5 00:18:24 2004
Subject: Log file parsing
In-Reply-To: <536HgaqP79728S13.1057076158@uwdvg013.cms.usa.net>
Message-ID:

on 7/1/03 10:15 AM, william.l.lewis@usa.net purportedly said:

> Long time listener, first time poster. I have scoured around looking for a
> fast way to pull lines out of very large log files. These files are on the
> order of 300+ MB.
>
> I am finding a list of Proc/pid combinations that we log, and creating an
> array of those, then foreach'ing over that list looking through the log for
> matching lines.
>
> Each new proc/pid means a new search on the entire file, which is obviously
> time consuming. Right now, I am using the OS grep to pipe into Perl to get
> the lines for each proc/pid and then processing those lines.
>
> Unfortunately, there is not enough memory to slurp the whole thing into RAM
> and process that way. Though, if I could, how would I go about that?
>
> Anyone done anything like this before?

I have done various manipulations on Apache log files on the order of 1+ GB, and I find Perl can chew through a large text file in no time flat (less than a minute for 1 GB files). Of course, that depends on what you need to do to each line.

If you use a hash instead of an array for PIDs, you will likely eke out more performance:

1) extract the PID from the line
2) check if it exists in the hash
3) do whatever with the line

With new PIDs, if you can maintain state, you can simply search the file for just the new PIDs (as opposed to the whole list all over again). If this isn't acceptable performance-wise, you could also try maintaining a bitmap of the file (one bit per line, on if the line's PID is known, off otherwise) and then only deal with candidate lines.
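Keary's three steps amount to reading the file once and doing one hash lookup per line, instead of running one full grep per proc/pid. A minimal sketch follows, written in Python for brevity (a Perl hash works the same way). The log-line format is an assumption, since the thread never shows one: `PID_RE` hypothetically matches a syslog-style `name[1234]` token and would need adjusting to the real logs. The function names are illustrative, not from the thread. Because the file object is iterated line by line, memory use is bounded by the matching lines kept, not by the 300+ MB file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# ASSUMPTION: log lines carry a syslog-style "name[1234]" token.
# This pattern is hypothetical; adjust it to the real log format.
PID_RE = re.compile(r"(\w+)\[(\d+)\]")


def group_by_procpid(lines: Iterable[str], wanted: Set[str]) -> Dict[str, List[str]]:
    """Single pass: keep every line whose "name/pid" key is in `wanted`."""
    matches: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        m = PID_RE.search(line)          # 1) extract proc/pid from the line
        if m is None:
            continue
        key = f"{m.group(1)}/{m.group(2)}"
        if key in wanted:                # 2) constant-time hash lookup
            matches[key].append(line.rstrip("\n"))  # 3) do whatever with the line
    return dict(matches)


def scan_new_pids(lines: Iterable[str], wanted: Set[str], done: Set[str]) -> Dict[str, List[str]]:
    """Search only for keys not handled on an earlier run; `done` is updated in place."""
    new = wanted - done
    if not new:
        return {}                        # nothing new: skip reading the lines entirely
    result = group_by_procpid(lines, new)
    done |= new
    return result


def scan_file(path: str, wanted: Set[str]) -> Dict[str, List[str]]:
    """Iterate the file object so the whole file is never read into memory at once."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return group_by_procpid(fh, wanted)


if __name__ == "__main__":
    sample = [
        "Jul  1 10:00:00 host sshd[101]: session opened\n",
        "Jul  1 10:00:01 host cron[202]: job started\n",
        "Jul  1 10:00:02 host sshd[101]: session closed\n",
    ]
    done: Set[str] = set()
    print(scan_new_pids(sample, {"sshd/101"}, done))
    print(scan_new_pids(sample, {"sshd/101", "cron/202"}, done))  # only cron/202 is searched
```

`scan_new_pids` covers the "maintain state" suggestion: the caller keeps the `done` set between runs, so a later call rescans the file only for proc/pids it has not seen. On a real log you would pass an open file handle, or call something like `scan_file("app.log", {"sshd/101"})`, where the file name is hypothetical.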
Of course, this applies only if using a database isn't an acceptable option; I agree with Tim that, for performance and flexibility, a database is the best option.

Keary Suska
Esoteritech, Inc.
"Leveraging Open Source for a better Internet"

From jtevans at kilnar.com Wed Jul 2 12:23:42 2003
From: jtevans at kilnar.com (John Evans)
Date: Thu Aug 5 00:18:24 2004
Subject: Perl Mongers Lunch Tomorrow!
In-Reply-To:
Message-ID:

On Mon, 30 Jun 2003, John Evans wrote:

Who: Pikes Peak Perl Mongers
What: Lunch. Food. Grub. Sustenance. Flavored Grease.
When: Thursday, July 3rd 2003
Time: 11:30 AM
Where: Village Inn, 8050 N Academy Blvd.

No takers this month on picking where we're going to eat, so I've made (as my boss likes to say) an "executive decision" and chosen Village Inn. It's on North Academy near the Chapel Hills Mall.

http://www.mapquest.com/maps/map.adp?address=8050+N.+Academy&zipcode=80920

See you guys there at 11:30!

--
John Evans
http://jtevans.kilnar.com/

-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS d- s++:- a- C+++>++++ ULSB++++$ P+++$ L++++$ E--- W++ N+ o? K?
w O- M V PS+ !PE Y+ PGP t(--) 5-- X++(+++) R+++ tv+ b+++(++++) DI+++ D++>+++ G+ e h--- r+++ y+++
------END GEEK CODE BLOCK------

From timc+perl at divide.net Sat Jul 5 13:41:38 2003
From: timc+perl at divide.net (Tim Chambers)
Date: Thu Aug 5 00:18:24 2004
Subject: Fw: Newsletter from O'Reilly UG Program, July 3
Message-ID: <00d201c34325$158d7970$e306ee0f@cephas>

O'Reilly User Group Program Newsletter
July 3, 2003

Highlights This Week:

----------------------------------------------------------------
Book News
----------------------------------------------------------------
-Digital Video Pocket Guide
-Learning Web Design, 2nd Edition
-Building Wireless Community Networks, 2nd Edition

----------------------------------------------------------------
Conferences
----------------------------------------------------------------
-Registration Is Open for the Second Annual O'Reilly Mac OS X Conference
-Put Up an O'Reilly Mac OS X Conference Banner, Get A Free Book
-Can't Make OSCON the whole time? Day Passes are available

----------------------------------------------------------------
Safari
----------------------------------------------------------------
-"Go On Safari" Tip of the Week Winner--Christian Gagnon, MELUG-North

----------------------------------------------------------------
News
----------------------------------------------------------------
-The Future of Mozilla Application Development
-Put Up an O'Reilly ThinkGeek Banner, Get A Free Book
-Writing for O'Reilly
-Defending Your Site Against Spam
-MySQL FULLTEXT Searching
-Using the Jakarta Commons, Part 1
-Understanding Interfaces in .NET
-Super-Efficient Image Rollovers
-Self-Enhancing Stylesheets
-Making Movies with the Apple iSight

================================================
Book News
================================================
Review books are available--email me for a copy.

***Please include the book order number on your requests.

Let me know if you need your books by a certain date.
Allow at least four weeks for shipping.

Send or email me copies of your newsletters and book reviews.

Don't forget, your members get 20% off any O'Reilly book they purchase directly from O'Reilly. Just use code DSUG when ordering.
http://www.oreilly.com/

***Group purchases with better discounts are available***
Please let me know if you are interested.

Press releases are available on our press page:
http://press.oreilly.com/

***Digital Video Pocket Guide
Order Number: 5237

The "Digital Video Pocket Guide" is organized into three chapters: "What Is It?", "How Does It Work?", and "How Do I...Tips, Tricks, and Techniques." Its compact size, organization, and detailed illustrations make it easy to find the information you need. This is the ultimate shooting companion that will help you create the videos you want to show to friends, family, or even the world at large.
http://www.oreilly.com/catalog/dvideopg/

Sample excerpts, "Tip 4: How to overcome backlighting," and "Tip 5: How to cope with wind," are available online:
http://www.oreilly.com/catalog/dvideopg/chapter/index.html

***Learning Web Design, 2nd Edition
Order Number: 4842

In "Learning Web Design," author Jennifer Niederst shares the knowledge she's gained from years of web design experience, both as a designer and a teacher. This book starts from the beginning--defining the Internet, the Web, browsers, and URLs--so you don't need to have any previous knowledge about how the Web works. After reading this book, you'll have a solid foundation in HTML, graphics, and design principles that you can immediately put to use in creating effective web pages.
http://www.oreilly.com/catalog/learnweb2/

Chapter 6, "Creating a Simple Page," is available online:
http://www.oreilly.com/catalog/learnweb2/chapter/index.html

***Building Wireless Community Networks, 2nd Edition
Order Number: 5024

"Building Wireless Community Networks" is about getting people online using wireless network technology.
The 802.11b standard (also known as WiFi) makes it possible to network towns, schools, neighborhoods, small businesses, and almost any kind of organization. All that's required is a willingness to cooperate and share resources. The first edition of this book helped thousands of people engage in community networking activities. This revised and expanded edition adds coverage of new network monitoring tools and techniques, regulations affecting wireless deployment, and IP network administration, including DNS and IP Tunneling.
http://www.oreilly.com/catalog/wirelesscommnet2/

Chapter 3, "Network Layout," is available online:
http://www.oreilly.com/catalog/wirelesscommnet2/chapter/index.html

================================================
Conference News
================================================
***Registration Is Open for the Second Annual O'Reilly Mac OS X Conference

Do you want to tame Panther quickly, or live Apple's iLife to the fullest? If so, the second annual O'Reilly Mac OS X Conference can take you where you want to go with Apple's newest software and hardware. One day of in-depth tutorials and three days of conference sessions cover topics like network security, Cocoa, Java, Rendezvous, Quartz, AirPort Extreme, workflow management, Unix administration, and much more. Some of the many experts presenting at the conference include David Pogue, Adam Engst, mmalcolm Crawford, Dan Wood, Andy Ihnatko, Robb Beal, and Dan Frakes.

O'Reilly Mac OS X Conference
October 27-30, 2003
Westin Santa Clara, Santa Clara, CA
http://conferences.oreilly.com/macosxcon/

User Group members who register before September 12, 2003 get a double discount. Use code DSUG when you register, and receive 20% off the "Early Bird" price.
To register, go to:
http://conferences.oreillynet.com/cs/macosx2003/create/ord_mac03

***Put Up an O'Reilly Mac OS X Conference Banner, Get A Free Book

Yet another new banner offer--
We are looking for user groups to display our conference banners on their web sites. If you send me the link to your user group site with our O'Reilly Mac OS X Conference banner, I will send you the O'Reilly book of your choice.

O'Reilly Mac OS X Conference Banners:
http://ug.oreilly.com/banners/macosx2003/

***Can't Make OSCON the whole time? Day Passes are available

(Sorry, UG discount not available.)

To register, go to:
http://conferences.oreillynet.com/cs/os2003/create/ord_os03

O'Reilly Open Source Convention
Portland Marriott Downtown, Portland, OR
July 7-11, 2003
http://conferences.oreilly.com/oscon/

================================================
Safari News
================================================
***"Go On Safari" Tip of the Week Winner--Christian Gagnon, MELUG-North

"Presently, I have a number of reference books in multiple locations, a number of books I keep in my car, and a few that I carry around with me. (Perhaps your situation is similar?) My next experiment with Safari will be to add these references to my bookshelf and lighten my load."

Your group can also participate in this introductory program just for user group members. To "Go on Safari," any of your members who sign up for our Safari 14-day free trial can send comments on their experiences, or tips and tricks for how they used Safari (it only needs to be 2 sentences long, but it may be longer) to safari_talk@oreilly.com. (Please include your UG name in the email.) Every week someone will be chosen from the tips or comments submitted to receive fun stuff from O'Reilly (T-shirts, book bags, or other surprises). If a member of your user group is selected, your group receives free gifts, too.
Whatever the individual member receives, your UG will get one, too, to give away at your next meeting, or use however you see fit. Recipients--and their comments--will be announced in the User Group Newsletter.

**Please use this special UG URL to sign up for the 14-day trial**
http://www.oreilly.com/safari/ug

For more information on Safari:
http://safari.oreilly.com/

================================================
News From O'Reilly & Beyond
================================================
---------------------
General News
---------------------
***The Future of Mozilla Application Development

Recently, mozilla.org announced a major update to its development roadmap. David Boswell and Brian King provide an analysis of the new roadmap, and demonstrate how to convert an existing XPFE-based application into an application that uses the new XUL toolkit.
http://www.oreillynet.com/pub/a/mozilla/2003/06/27/mozilla.html

Brian and David are coauthors of "Creating Applications with Mozilla."
Order Number: 0529
http://www.oreilly.com/catalog/mozilla/index.html

***Put Up an O'Reilly ThinkGeek Banner, Get A Free Book

We are looking for user groups to display our ThinkGeek banners on their web sites. If you send me the link to your user group site with one of our O'Reilly ThinkGeek banners, I will send you the O'Reilly book of your choice.

O'Reilly ThinkGeek Banners:
http://ug.oreilly.com/banners/thinkgeek/

***Writing for O'Reilly

We're always looking for new authors and new book ideas. Our ideal author has real technical competence and a passion for explaining things clearly. We're happy to work with first-time authors, and encourage inquiries about virtually any topic. However, it helps if you know that we tend to publish "high end" books rather than books for dummies, and generally don't want yet another book on a topic that's already well covered.
For more information, please check out:
http://oreilly.com/oreilly/author/intro.html

---------------------
Open Source
---------------------
***Defending Your Site Against Spam

To users, unsolicited commercial e-mail is an annoyance. To mail server administrators, it's a threat. Dru Nelson recently had his network attacked by spammers. He explains the various defenses he considered for protecting against future attacks.
http://linux.oreillynet.com/pub/a/linux/2003/06/26/blocklist.html

***MySQL FULLTEXT Searching

Storing text in your database is handy, but searching can be a pain. MySQL's FULLTEXT search can save your sanity. Joe Stump demonstrates how it works and gives several ideas on how to use it in your own applications.
http://www.onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html

---------------------
Java
---------------------
***Using the Jakarta Commons, Part 1

Ever find yourself thinking "Someone's surely solved this problem before?" That's the beauty of open source. In this first of three articles, Vikram Goyal explores the Jakarta Commons, a set of mature, well-defined, reusable Java components.
http://www.onjava.com/pub/a/onjava/2003/06/25/commons.html

---------------------
.NET
---------------------
***Understanding Interfaces in .NET

.NET introduces the potentially confusing concept of an interface. An interface is a contract that defines the signature of some piece of functionality. Throughout the .NET framework, interfaces are used to specify that certain types have well-known behaviors. Nick Harrison explains what interfaces are and how to use them in your own classes.
http://www.ondotnet.com/pub/a/dotnet/2003/06/30/interfaces.html

---------------------
Web
---------------------
***Super-Efficient Image Rollovers

Danny Goodman shows you how to reduce the number of individual image files downloaded to a browser to accomplish three-state image rollovers.
http://www.oreillynet.com/pub/a/javascript/2003/07/01/bonusrecipe.html

Danny is the author of "JavaScript & DHTML Cookbook."
Order Number: 4672
http://www.oreilly.com/catalog/jvdhtmlckbk/index.html

---------------------
XML
---------------------
***Self-Enhancing Stylesheets

Developing new stylesheets can be a chore. So why not let XSLT take the load? This article shows how to easily check the coverage of your XSLT and create skeleton stylesheets.
http://www.xml.com/pub/a/2003/07/02/xslt2.html

---------------------
Mac
---------------------
***Making Movies with the Apple iSight

Online conferencing is great, but what else can you do with your new iSight? Actually, quite a bit. In this first installment of an ongoing series, Derrick Story shows you how to make professional-looking QuickTime movies with just an iSight and some very inexpensive software.
http://www.macdevcenter.com/pub/a/mac/2003/07/01/isight.html

Until next time--
Marsee