From matthew_heusser at mcgraw-hill.com Mon Oct 1 06:47:43 2001
From: matthew_heusser at mcgraw-hill.com (matthew_heusser@mcgraw-hill.com)
Date: Wed Aug 4 00:01:17 2004
Subject: Lies, Damned Lies, and ... Statistics
Message-ID: <85256AD8.00418B5F.00@corpny55wls01.mcgraw-hill.com>

(Sorry about the Subject, but I wanted people to read this :-)

Folks:

As some of you know, I'm currently taking a grad class in Software
Development Management where we have to create and support a project
plan for a database-driven inventory control system.

At Friday's meeting, I requested some statistics on transactions per
second, uptime, scalability, security, etc. for open-source (and even
closed-source) databases. A few folks said they would get details to
me. (Personal experience is good, URLs are better.)

If you could email me details ASAP (within a couplea days), I would
personally appreciate it. Even "Our shop does roughly XXY on a ZZZ-box,
I'll get more solid details later" is a good start so I can pencil you
into the document, to be penned later.

For those not at the meeting, please feel free to chime in. My goal is
to find a few options of SQL databases that (preferably) support stored
procedures, transactions, and fast backup.

Good details = Free Soda [12-oz] or something ...

thanks in advance,

Matt Heusser

From matthew_heusser at mcgraw-hill.com Fri Oct 5 09:41:38 2001
From: matthew_heusser at mcgraw-hill.com (matthew_heusser@mcgraw-hill.com)
Date: Wed Aug 4 00:01:17 2004
Subject: More theoretical Database Stuff
Message-ID: <85256ADC.00517BD6.00@corpny55wls01.mcgraw-hill.com>

Folks:

I'd like to get your opinions/experiences/references on something.

Let's say we're developing a 24/7 multi-user (avg: 5 at a time, all the
time) database for a warehousing project for a multi-site food company.
(GFS, Spartan, whatever.)

Here's what we're thinking:

- Server: Linux or Solaris-Running x86's, two of them, each with
  dual processors and RAID 5.
  Not true clustering, the 2nd one just "mirrors" the first and
  routinely backs up transactions to tape. This costs no downtime.
  In the event of server 1 failure, we switch over to server two, swap
  some hard drives around, restore, and re-run. Operational downtime
  of about 3 minutes in the event of a failure, with general uptime
  of, say, 99.9% or so.

We'd pay the extra $ for the hardware that scales and doesn't fail, and
a tape backup that can run while the database is running. (Question:
Can PostgreSQL and Linux handle this? I think we might need Solaris for
its transactioning file system.)

My thought was that we didn't need the extra "guarantees" (system
mirroring) if we had tape backup and RAID 5, but my partner is a team
leader at Meijer and insists that 1 hour of downtime will cost more
than just buying an extra system and mirroring it. Which begs the
questions:

1) Can we do tape backup in Linux in real-time with PostgreSQL and
2) What do you think of the scheme above?

regards,

Matt H.

From albert.tobey at priority-health.com Fri Oct 5 12:58:24 2001
From: albert.tobey at priority-health.com (Albert P Tobey)
Date: Wed Aug 4 00:01:17 2004
Subject: less theoretical and more realistic Database Stuff
In-Reply-To: <85256ADC.00517BD6.00@corpny55wls01.mcgraw-hill.com>
References: <85256ADC.00517BD6.00@corpny55wls01.mcgraw-hill.com>
Message-ID: <1002304704.4555.45.camel@linuxws1>

> - Server: Linux or Solaris-Running x86's, two of them, each
> with dual processors and RAID 5. Not true clustering, the
> 2nd one just "mirrors" the first and routinely backs up
> transactions to tape. This costs no downtime. In the event
> of server 1 failure, we switch over to server two, swap some
> hard drives around, restore, and re-run. Operational downtime
> of about 3 minutes in the event of a failure, with general uptime
> of, say, 99.9% or so.

Really, on x86, Linux will smoke Solaris in both performance and
reliability.
Linux's filesystem(s) are much faster and Linux is actually _native_ to
the ix86 platform instead of a port from SPARC. Both can do RAID5, but
Linux pulls ahead when you want to do failover for cheap.

There is a filesystem for Linux called GFS developed by Sistina
(www.sistina.com) that would eliminate the need for disk swaps and make
the whole thing automatable. GFS will allow you to have two servers
with full access to the same physical disks (one read, one read/write)
so that in the event of a failure, you can bring up the database on the
other server with little hassle. I haven't used this, but it's there
for your perusal. There are also some more primitive versions of
mirrored filesystems out there, but they won't perform as well and
aren't as reliable.

For either Linux or Solaris, you'll want to check out the linux-ha
(www.linux-ha.org) project for high-availability software that is free
and can handle what you're going to need.

> We'd pay the extra $ for the hardware that scales and doesn't
> fail, and a tape backup that can run while the database is running.
> (Question: Can PostgreSQL and Linux handle this? I think we might
> need Solaris for its transactioning file system.)

The word you're thinking of is "journaling" filesystem. Solaris has
vxfs (veritas) which is journaling, but Linux has a plethora of
journaling filesystems such as reiserfs, ext3, jfs, and xfs. All four
are considered stable and production ready. xfs and jfs have some neat
features that may interest you further, such as ACLs. Remember,
journaling filesystems only provide additional data security in the
event of hardware failures (such as loss of power) and faster boot
times once the problem has been resolved (fsck).

Buying an HP NetServer or a Penguin Computing
(www.penguincomputing.com) machine will provide all of the fault
tolerance and scalability you'll need with support and great
performance.
The price difference for a SPARC system is staggering and probably
won't buy you much reliability as far as I've seen (I used to work in a
Sun shop).

Both PostgreSQL and MySQL support hot backups of various sorts. The fun
part of each is that TMTOWTDI. Both have a command for dumping the
database to flat SQL text, and both have hot-backup facilities,
although I think you have to buy it for MySQL. The underlying OS has
nothing to do with the database's ability to be backed up hot or cold.

>
> My thought was that we didn't need the extra "guarantees"
> (system mirroring) if we had tape backup and RAID 5,
> but my partner is a team leader at Meijer and insists that 1 hour
> of downtime will cost more than just buying an extra system and
> mirroring it. Which begs the questions:
>
> 1) Can we do tape backup in Linux in real-time with PostgreSQL and
> 2) What do you think of the scheme above?

1. If PostgreSQL can do it, either OS can do the tape backup. The OS
   really makes no difference whatsoever in this regard.
2. Looks good. I could add a ton of stuff for making it more
   highly-available (esp. on Linux), but the failover scheme looks good
   for minimizing downtime. With Linux + GFS + linux-ha, I'd say you
   could get your downtime to quickly approach 0, but you'll be paying
   more for a disk array that supports GFS and more time to set it all
   up. It might be worth it in the long run.

Really, if PostgreSQL fulfills your needs, it makes no difference which
operating system you use. I, of course, would recommend Linux over
Solaris because it will likely perform better on the x86 platform and
there is more high-availability software available in the public
domain.

Another option you might want to check out is RedHat's RedHat Database,
which is a derivative of PostgreSQL. I don't know much about it, but
they can certainly offer you support and maybe even some features above
and beyond what the OSS version can.
Also, think about using SuSE Linux if you want a journaling filesystem
out of the box (reiserfs). RedHat 7.1 can support reiserfs and 7.2 will
come with ext3. Debian likely supports any of the filesystems I
mentioned earlier in this message, although you'll have to apt-get them
after install.

I could go on all day about how to set up a Linux machine for
production/HA, but I'll leave that for when you're ready ;P

-Al

--
"Open source" means that anyone can get a copy of the source code.
Developers can find security weaknesses very easily with Linux.
The same is not true with Microsoft Windows.
Microsoft, "What Every Retailer Should Know", February 2001

********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the Priority Health Information Services Department at (616) 942-0954.
********************************************************************

From steve at bbdltd.com Fri Oct 5 13:02:01 2001
From: steve at bbdltd.com (Steve Johnson)
Date: Wed Aug 4 00:01:17 2004
Subject: More theoretical Database Stuff
In-Reply-To: <85256ADC.00517BD6.00@corpny55wls01.mcgraw-hill.com>
Message-ID:

1) Can we do tape backup in Linux in real-time with PostgreSQL

Yes, use pg_dump to do a live dump and create unix-pipeable output
which can feed tar or be pulled/pushed to another system or to the file
system and then cut to tape. You only need to do a shutdown if you are
backing up at the file system level.

2) What do you think of the scheme above?

There is always a cost associated with recovery time. Pay more for a
shorter recovery time (duplicated unused hardware). Or pay even more
when the disaster hits (rush around and buy it then while the users
wait...).

But specifically:

> Server: Linux or Solaris-Running x86's, two of them...
> In the event of server 1 failure, we switch over to server two, swap
> some hard drives around, restore, and re-run....

Easier than that, you can set this up to fail over automatically using
some basic Linux functionality. With the free stuff, you only lose db
connections. Everything else fails over automagically (connections then
get re-established). Check out the HOW-TOs.

I've read that PostgreSQL can be made to handle syncing between
databases, but I haven't done it. If you can't get it to work, then
just snapshot the data every x hours and snapshot the transaction
journal every fraction of x hours.

> (Question: Can PostgreSQL and Linux handle this? I think we might
> need Solaris for its transactioning file system.)

You can add a transactioning file system to Linux, but that probably
won't matter. You want a transactional database, which PG is. Adding
transactions to the file system just makes that more reliable and
allows faster recovery than fsck'ing the disks.

> My thought was that we didn't need the extra "guarantees"
> (system mirroring) if we had tape backup and RAID 5,

These help make things safer, but...

> but my partner is a team leader at Meijer and insists that 1 hour
> of downtime will cost more than just buying an extra system and
> mirroring it

How about 1 minute? That's what you can get to with a fast failover
model that you can create with the free tools and similar hardware
(doesn't have to be exactly the same).

The question is, do you need fast recovery (less than 30 minutes)? Then
you need a live system. If you don't need it that fast, then you'd
better be ready for times measured in days for hardware repair, system
recovery/re-install, database recovery, journal recovery, and system
audit.

That's my 2 cents.
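
For a sense of scale on the numbers thrown around in this thread (99.9%
uptime, 1-minute and 3-minute failovers, the 1-hour outage), here is a
rough back-of-the-envelope sketch in Python. The function names and the
365-day year are assumptions made for illustration; nothing here comes
from anyone's actual setup.

```python
def downtime_budget_minutes(availability_pct, days=365):
    """Minutes of downtime allowed over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100.0)


def outages_within_budget(availability_pct, outage_minutes, days=365):
    """How many outages of `outage_minutes` each fit inside that budget."""
    return int(downtime_budget_minutes(availability_pct, days) // outage_minutes)


if __name__ == "__main__":
    budget = downtime_budget_minutes(99.9)
    print("99.9%% uptime leaves about %.1f minutes of downtime a year" % budget)
    # Outage lengths mentioned in the thread (assumed to be the whole outage).
    for label, minutes in [("1-minute failover", 1),
                           ("3-minute failover", 3),
                           ("1-hour outage", 60)]:
        count = outages_within_budget(99.9, minutes)
        print("%-18s fits %d times in that budget" % (label, count))
```

At 99.9% the yearly budget works out to roughly 525.6 minutes (about
8.76 hours), so a handful of hour-long outages uses it up, while short
failovers leave a lot of room.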

sj

From Tim.Maletic at priority-health.com Fri Oct 5 13:38:50 2001
From: Tim.Maletic at priority-health.com (Tim.Maletic@priority-health.com)
Date: Wed Aug 4 00:01:17 2004
Subject: More theoretical Database Stuff
Message-ID: <46D57156F98CD511991D00D0B7BFF75F70D29E@guam.internal.priority-health.com>

I don't have much to add to Al and Steve's points, except this...

Your goal should be to reduce *application* downtime to 0, not host
downtime. Hosts *must* go down (for kernel-level security patches, if
nothing else). So use a front-end load-balancer (or the poor-man's
version: round-robin DNS) to distribute your traffic across 2 or more
systems, and use linux-ha to fail an IP address from one machine to
another in the cluster. Then you can bring down a host for maintenance,
and the world will never know. (Well, ok, it takes about 5-10 seconds
for the IP to fail over.)

-Tim

PS. I'm sure that Perl comes to the rescue at some point in this
strategy, I just can't remember where :)

From matthew_heusser at mcgraw-hill.com Mon Oct 8 06:51:29 2001
From: matthew_heusser at mcgraw-hill.com (matthew_heusser@mcgraw-hill.com)
Date: Wed Aug 4 00:01:17 2004
Subject: less theoretical and more realistic Database Stuff
Message-ID: <85256ADF.0041521A.00@corpnj148ls01.mcgraw-hill.com>

Thanks for the help, guys. ESPECIALLY the hard links to companies that
provide those functions.

As for Perl, I would probably use Perl scripts to do some of the
backup/administration, and put them into the CRON to be periodic.
(Linux has oneathem, right?)
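
A minimal sketch of what such a cron-driven backup job might look like,
built on the pg_dump suggestion earlier in the thread. It is written in
Python rather than Perl purely for illustration, and the database name
"inventory", the /var/backups directory, and the function names are all
hypothetical. It relies only on pg_dump writing plain SQL to stdout
when no output file is given.

```python
import datetime
import gzip
import shutil
import subprocess
import sys


def dump_command(dbname):
    """Build the pg_dump call; with no -f option the SQL goes to stdout."""
    return ["pg_dump", dbname]


def backup_path(dbname, when, directory="/var/backups"):
    """Timestamped target file (directory is a hypothetical default)."""
    return "%s/%s-%s.sql.gz" % (directory, dbname, when.strftime("%Y%m%d-%H%M"))


def run_backup(dbname, directory="/var/backups"):
    """Stream a live dump through gzip to disk and return the file written."""
    target = backup_path(dbname, datetime.datetime.now(), directory)
    proc = subprocess.Popen(dump_command(dbname), stdout=subprocess.PIPE)
    with gzip.open(target, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)  # copy the dump as it is produced
    if proc.wait() != 0:
        raise RuntimeError("pg_dump exited with status %d" % proc.returncode)
    return target


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: pg_backup.py DBNAME
    print(run_backup(sys.argv[1]))
```

Saved as, say, /usr/local/bin/pg_backup.py (a made-up path), a crontab
entry along these lines would run it nightly at 03:00:

    0 3 * * * /usr/bin/python3 /usr/local/bin/pg_backup.py inventory

The resulting file can then be cut to tape with whatever tool the site
already uses.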

And, um, the phrase I was searching for was "transactional file
system", but you're right, you could use the phrase "Journalling"
instead. (At least according to Tanenbaum, author of "Minix." But then
again, he ain't no Al Tobey.)

Seriously - Thanks. Unless anyone objects, I'm going to quote some of
you and stick you at the appendix of my report. :-)

regards,

Matt H.

From matthew_heusser at mcgraw-hill.com Tue Oct 9 07:06:26 2001
From: matthew_heusser at mcgraw-hill.com (matthew_heusser@mcgraw-hill.com)
Date: Wed Aug 4 00:01:17 2004
Subject: MajorDomo
Message-ID: <85256AE0.0042B18F.00@corpnj148ls01.mcgraw-hill.com>

So, here's the theoretical question of the hour:

You walk into a meeting about List-Serv management software, someone
throws out the number "We will pay around $1,500."

You respond "uh, dude, MajorDomo is Free."

Which begs the questions "But we'd have to recompile it and port it to
Win32."

"Well, uh, like, no. For $1,500 we could buy a linux box and just like,
run it and stuff. For $750 we could do that."

"But we don't do UNIX."

"So we could rent space somewhere for $10/month and run a list-serv."

"We don't do off-site hosting."

"But it's not web-hosting, and other divisions outsource their
list-services; that's why we need to get our own."

"We'd have to re-compile it."

"Well, uh, no, MajorDomo is a Perl script."

"If it's open source, we'd have to look at the code."

"Well, not really, and it's in Perl, which is one of our core
competencies."

"MajorDomo has security holes."

"Really? It's been around for years. Everybody uses it. A search on
Yahoo for 'Security Majordomo' lists a few common problems, but those
are really just problems for administrators that fail to perform due
diligence. Majordomo is relatively secure."

"No. It's got all kinds of security holes."

"Is this proof by repeated assertion, or just Fear, Uncertainty, and
Doubt? (FUD) - I can never tell those two apart ..."

-- So, all that said, hypothetically, where would you take this
discussion? (And keep it civil ...)

regards,

Matt H.

From albert.tobey at priority-health.com Wed Oct 10 16:11:16 2001
From: albert.tobey at priority-health.com (Albert P Tobey)
Date: Wed Aug 4 00:01:17 2004
Subject: MajorDomo
In-Reply-To: <85256AE0.0042B18F.00@corpnj148ls01.mcgraw-hill.com>
References: <85256AE0.0042B18F.00@corpnj148ls01.mcgraw-hill.com>
Message-ID: <1002748276.14567.363.camel@linuxws1>

On Tue, 2001-10-09 at 08:06, matthew_heusser@mcgraw-hill.com wrote:
>
> So, here's the theoretical question of the hour:
>
> You walk into a meeting about List-Serv management software,
> someone throws out the number "We will pay around $1,500."
>
> You respond "uh, dude, MajorDomo is Free."
>
> Which begs the questions "But we'd have to recompile it and
> port it to Win32."
>
> "Well, uh, like, no. For $1,500 we could buy a linux box and just
> like, run it and stuff. For $750 we could do that."
>
> "But we don't do UNIX."
>
> "So we could rent space somewhere for $10/month and run a list-serv."
>
> "We don't do off-site hosting."
>
> "But it's not web-hosting, and other divisions outsource their list-services;
> that's why we need to get our own."
>
> "We'd have to re-compile it."
>
> "Well, uh, no, MajorDomo is a Perl script."
>
> "If it's open source, we'd have to look at the code."
>
> "Well, not really, and it's in Perl, which is one of our core competencies."
>
> "MajorDomo has security holes."
>
> "Really? It's been around for years. Everybody uses it. A search on
> Yahoo for 'Security Majordomo' lists a few common problems, but those
> are really just problems for administrators that fail to perform due diligence.
> Majordomo is relatively secure."
>
> "No. It's got all kinds of security holes."
>
> "Is this proof by repeated assertion, or just Fear, Uncertainty, and Doubt?
> (FUD) - I can never tell those two apart ..."
>
> -- So, all that said, hypothetically, where would you take this discussion?
> (And keep it civil ...)
>
> regards,
>
> Matt H.
>

Quote Bruce Schneier or any other well-known security expert - they all
agree that open standards and open source are, by design, more secure
because security problems can be quickly spotted and fixed by the
community.

Also, see my signature at the end of this message. It's a quote from a
whitepaper that Microsoft published - This is the full context, but
Microsoft just didn't seem to 'get it' when they published the paper.

Also, most of the security holes they're referring to (you might also
call them on it and ask which holes they're talking about) are old
sendmail hacks that, also, have been fixed for ages. Make sure they can
cite specific examples of holes from Bugtraq or the like.

Another important point to make is that any other mailing list software
out there is going to seem rare compared to Majordomo. The fact that
Majordomo is ubiquitous makes it more prone to having bugs discovered
(and subsequently fixed) than any of the other software available. How
can I find security holes in company XXXYYYZZZ's software if there
aren't any installations in the wild to hack? It is considerably easier
to find holes in a list if you're on the list also, and I'd wager that
most of the $1500 software based lists have a fairly select member list
and don't advertise themselves.

But really, my favorite way to solve this dilemma is to scream
obscenities and beat people with large cardboard shipping tubes until
they see things my way.

"My way or ER stay"

-Al Tobey

--
"Open source" means that anyone can get a copy of the source code.
Developers can find security weaknesses very easily with Linux.
The same is not true with Microsoft Windows.
Microsoft, "What Every Retailer Should Know", February 2001