
Application Servers

Application servers for Web publishing are generally systems that let you write database-backed Web pages in Java.

The first problem with this idea is that Java, because it must be compiled, is usually a bad choice of programming language for Web services (see my book chapter on server-side programming).

The second problem with this idea is that, if what you really want to do is write some Java code that talks to data in your database, you can execute Java right inside of your RDBMS (Oracle 8.1, Informix 9.x). Java executing inside the database server's process is always going to have faster access to table data than Java running as a client. In fact, at least with Oracle on a Unix box, you could bind Port 80 to a program that would call a Java program running in the Oracle RDBMS. You don't even need a Web server, much less an application server. It is possible that you'll get higher performance and easier development by adding a thin-layer Web server like AOLserver or Microsoft's IIS/ASP, but certainly you can't get higher reliability by adding a bunch of extra programs and computers to a system that need only rely on one program and one computer.
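Here is a minimal sketch of the "one program, one computer" architecture in Python, using only the standard library. `sqlite3` runs inside the same process as the `http.server` code. That loosely stands in for Java running inside Oracle; it is not Oracle's mechanism. The `factories` table, its rows, and the names `factories_near`, `PageHandler`, and `serve` are hypothetical.

```python
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical table and rows. sqlite3 is a library running inside this
# process, so page code reaches the data without any middle tier.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE factories (name TEXT, zip TEXT)")
DB.executemany(
    "INSERT INTO factories VALUES (?, ?)",
    [("Acme Plating", "02139"), ("Widget Works", "02139"), ("Far Away Mill", "94110")],
)
DB_LOCK = threading.Lock()  # one shared connection, so serialize access to it


def factories_near(zip_code):
    """Return factory names for a zip code, straight from the database."""
    with DB_LOCK:
        rows = DB.execute(
            "SELECT name FROM factories WHERE zip = ? ORDER BY name", (zip_code,)
        ).fetchall()
    return [name for (name,) in rows]


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        zip_code = parse_qs(urlparse(self.path).query).get("zip", [""])[0]
        body = "\n".join(factories_near(zip_code)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def serve(port=8080):
    """Build (but do not start) the server; port 80 usually needs root on Unix."""
    return ThreadingHTTPServer(("127.0.0.1", port), PageHandler)
```

Calling `serve().serve_forever()` and requesting `/?zip=02139` should return the two matching factory names, one per line. The sketch defaults to port 8080 because binding port 80, as the text describes, normally requires root privileges on Unix.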

This document works through some of these issues in greater detail, pointing out the grievous flaws in Netscape Application Server (formerly "Kiva") and explaining the situations in which Oracle Application Server is useful.

There is no scalability problem

My friend Jin and I spent some spare evenings building http://www.scorecard.org for the Environmental Defense Fund. When a user types in his zip code, the server shows him a map of the factories near his house. Clicking on a factory will list the chemicals released. Clicking on a chemical will list its health effects. The site was featured on ABC World News, in Newsweek, in the New York Times, on CNN, and was a Yahoo Pick of the Week. Every single page on the site is generated on-the-fly by querying a relational database management system (RDBMS). Some pages require five SQL queries. Each page requires at least one. The site gets about 30 requests/second at peaks (on days when traffic is over 500,000 hits). There are only a handful of sites on the Internet that serve a larger number of db-backed pages.
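The drill-down described above (zip code, then factory, then chemical, then health effects) maps naturally onto one or more SQL queries per page. The following Python sketch uses an in-memory `sqlite3` database as a stand-in for Oracle. The four-table schema, the made-up sample rows, and the page-function names are all hypothetical, not Scorecard's actual design.

```python
import sqlite3

# Hypothetical, simplified schema with made-up sample rows.
SCHEMA = """
CREATE TABLE factories (id INTEGER PRIMARY KEY, name TEXT, zip TEXT);
CREATE TABLE chemicals (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE releases (factory_id INTEGER, chemical_id INTEGER, pounds REAL);
CREATE TABLE health_effects (chemical_id INTEGER, effect TEXT);
INSERT INTO factories VALUES (1, 'Acme Plating', '02139'), (2, 'Widget Works', '02139');
INSERT INTO chemicals VALUES (10, 'Toluene'), (11, 'Lead');
INSERT INTO releases VALUES (1, 10, 1200.0), (1, 11, 35.5), (2, 10, 400.0);
INSERT INTO health_effects VALUES
    (10, 'neurotoxicity'), (11, 'neurotoxicity'), (11, 'developmental toxicity');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def zip_page(conn, zip_code):
    """Factories near a zip code: one query."""
    rows = conn.execute(
        "SELECT name FROM factories WHERE zip = ? ORDER BY name", (zip_code,)
    ).fetchall()
    return [name for (name,) in rows]


def factory_page(conn, factory_id):
    """A factory and the chemicals it releases: two queries."""
    row = conn.execute("SELECT name FROM factories WHERE id = ?", (factory_id,)).fetchone()
    if row is None:
        return None
    releases = conn.execute(
        "SELECT c.name, r.pounds FROM releases r "
        "JOIN chemicals c ON c.id = r.chemical_id "
        "WHERE r.factory_id = ? ORDER BY r.pounds DESC",
        (factory_id,),
    ).fetchall()
    return row[0], releases


def chemical_page(conn, chemical_id):
    """Health effects of one chemical: one query."""
    rows = conn.execute(
        "SELECT effect FROM health_effects WHERE chemical_id = ? ORDER BY effect",
        (chemical_id,),
    ).fetchall()
    return [effect for (effect,) in rows]
```

Nothing here is cached: every call goes back to the database, which is the "generated on-the-fly" model the paragraph describes.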

Our hardware for this monstrously popular site? A Sun Microsystems SPARC Ultra 2 pizza box Unix machine, built in 1996. Its dual 167-MHz CPUs would be laughed at by the average Quake-playing 10-year-old. The CPUs sit idle 80% of the time. The disks sit idle most of the time, partly because I spent $4,000 on enough RAM to hold the entire 750 MB data set. Oh yes, the machine also serves a few hundred thousand hits/day for other customers of arsdigita.com and runs the street cleaning and birthday reminder services that we built.

If we tarred up the site and moved it to a mid-range Unix server such as the HP K460 that sits behind http://www.photo.net, we could probably serve at least 5 million hits/day. If we moved it to the highest-end HP server, I'd bet that we could get close to the 100-million hit/day mark that sites like Yahoo serve.
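The daily hit counts above translate into per-second rates with simple arithmetic. This Python sketch makes three simplifying assumptions, none of them claims from the text. It spreads the average over 86,400 seconds. It treats "hits" and "requests" as the same unit. It also assumes the peak-to-average ratio seen on Scorecard's busiest days (30 requests/second against roughly 5.8 on average) carries over to the larger estimates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted in the text for Scorecard's busiest days.
PEAK_DAY_HITS = 500_000
PEAK_RATE = 30  # requests/second at peak


def average_rate(hits_per_day):
    """Hits/second if a day's traffic were spread evenly over 24 hours."""
    return hits_per_day / SECONDS_PER_DAY


# Assumption: the busiest moment is this many times the daily average.
PEAK_TO_AVERAGE = PEAK_RATE / average_rate(PEAK_DAY_HITS)  # about 5.2


def projected_peak(hits_per_day):
    """Peak hits/second, assuming the same peak-to-average ratio."""
    return average_rate(hits_per_day) * PEAK_TO_AVERAGE


for label, hits in [("HP K460 estimate", 5_000_000),
                    ("high-end HP estimate", 100_000_000)]:
    print(f"{label}: {average_rate(hits):.0f} hits/s average, "
          f"~{projected_peak(hits):.0f} hits/s at peak")
```

Under these assumptions, 5 million hits/day is about 58 hits/second on average and about 300 at peak. The 100-million mark is about 1,157 hits/second on average and about 6,000 at peak.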

Why has "scalable" become the buzzword du jour? People get burned because they do stupid things. They connect their Web server to their RDBMS via CGI, which forces the machine to fork a new process and open a fresh database connection for every page request, making it work 10-20 times as hard for no good reason. They run Windows NT. They run some unproven junkware/middleware that came in an attractive box. Services get wedged and they run out and buy another dozen (or thousand, as with www.microsoft.com) physical computer systems. Now that they have a whole machine room full of hardware, they know that they can't keep it all running simultaneously, so they look for software to yoke it all together somehow such that the death of one machine won't be noticed.

How do my friends and I avoid scalability problems? We know that we're stupid. We run the Oracle 8 RDBMS like the rest of the world and don't try to figure out if some new competitor's hype has any relationship to reality. We talk to the RDBMS via AOLserver, which has been doing connection pooling from a Tcl API since 1995. So we get the safety and ease of software development of Perl/CGI, but the computer never has to fork a CGI process and the database connections are shared among the scripts. We've served roughly 1 billion hits with AOLserver, so we're pretty sure that it works. Linux and NT get magazine writers excited, but we run the same commercial versions of Unix on which the Fortune 500 relies for its enterprise computing.
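The connection pooling credited to AOLserver above is a general technique. A program opens a fixed number of database connections once and lends them to requests, instead of connecting per request as a CGI script must. Below is a minimal Python sketch that uses `sqlite3` as a stand-in database. `ConnectionPool`, `handle_request`, and `demo` are hypothetical names, and this is not AOLserver's Tcl API.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused by every request."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it to the next request instead of closing it


def handle_request(pool, n):
    """One 'page': borrow a connection, run a query, give the connection back."""
    with pool.connection() as conn:
        return conn.execute("SELECT ?", (n,)).fetchone()[0]


def demo(workers=8, per_worker=25, size=4):
    opened = []  # one entry per physical connection ever opened

    def connect():
        # check_same_thread=False lets a pooled connection move between threads;
        # the pool guarantees only one thread uses it at a time.
        conn = sqlite3.connect(":memory:", check_same_thread=False)
        opened.append(conn)
        return conn

    pool = ConnectionPool(connect, size)
    results = []
    lock = threading.Lock()

    def worker(start):
        for n in range(start, start + per_worker):
            value = handle_request(pool, n)
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker, args=(i * per_worker,))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(opened), sorted(results)
```

`demo()` serves 200 requests from 8 threads but opens only 4 database connections. A CGI-style program that connects once per request would have opened 200, in addition to forking 200 processes.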