Linux WWW-HOWTO
by Peter Dreuw, pdreuw@wing.gun.de
v0.7.6, 6 October 1996

This document contains information about setting up WWW services under Linux (both server and client) and how to maintain them. It is not meant to be a detailed manual, but an overview and a good pointer to further information.

1. Introduction

Many people are stepping into Linux because they are looking for a really good Internet-capable platform. Others use Linux for the fun of installing a free OS on their system. Some of those want to get in touch with the Internet, of course. Furthermore, there are institutes, universities and other mostly not-for-profit organisations which want to or need to set up Internet sites at low cost. This is where the WWW-HOWTO comes in.

This document tries to explain how to set up clients and servers for the (in my mind) largest online part of the net - the World Wide Web.

1.1. Copyright

This document is Copyright (c) 1996 by Peter Dreuw. Please copy and distribute it widely, but do not modify the text or omit my name. If you sell this HOWTO on a CD, in a book or on other media, I would really like to have a copy for reference. Trademarks are owned by their respective owners.

1.2. Disclaimer

This document is meant as an introduction to WWW techniques used or usable on Linux. I am neither a WWW nor a security expert!

I AM NOT RESPONSIBLE FOR ANY DAMAGES INCURRED DUE TO ACTIONS TAKEN BASED ON THE INFORMATION INCLUDED IN THIS DOCUMENT.

1.3. Feedback

Any feedback is really welcome. Just mail to pdreuw@wing.gun.de.

1.4. New versions of this Document

New versions of this document can be retrieved via anonymous FTP from sunsite.unc.edu under /pub/Linux/docs/HOWTO and from almost any friendly Linux ftp mirror site. Furthermore, you can download it via as a gzipped tar archive containing an sgml, text, latex and ps version. The html version is directly available under

2. Setting up WWW client software

The following chapter is dedicated to the web users.
It contains some hacks and tricks for setting up current versions of common web browsers. Please feel free to contact me if your favorite web browser is not mentioned here. (As this is a really early version of the WWW-HOWTO, most of them are likely not to be listed...)

Personally, I prefer the Emacs W3 browser and Lynx, as they have some speed advantages and there is no need to retrieve all the graphics through my slow dial-up line ;)

2.1. Overview

Lynx is the smallest web browser I know and use - but it has many special features, so don't skip this chapter.

Emacs - well, there is nothing much to say about the Emacs W3 browser; it's just Emacs, like the Emacs news reader, the Emacs mail reader etc.

Netscape Navigator is the only browser mentioned here which is capable of those funny new things like JavaScript and the nice tag feature needed to run Java. Please report if there is any other web browser which can do one or the other. I'd really like to know.

There are rumors that Microsoft is going to port the Internet Explorer to various Unix platforms - maybe including Linux. If you DO know something more reliable, please drop me a mail.

2.2. Lynx

The smallest (?, hm, something around 650 K executable) and maybe fastest web browser available. It uses neither much bandwidth nor many system resources, as it only deals with text displays like a console, terminal or xterm. You need neither an X Window system nor additional megabytes of system memory to run this little browser. Furthermore, the source code is available, too.

2.2.1. Where to get

The latest version is 2.5 and can be retrieved from or from almost any friendly Linux ftp server like ftp://sunsite.unc.edu under /pub/Linux/system/Network/info-systems/www/ or a mirror site. Or, take a look at the Lynx enhanced pages for information on using Lynx.

2.2.2. How to install

Just retrieve the archive, unpack it, read the README and follow the steps given in the INSTALLATION file.
If you don't want a source distribution, you may retrieve a binary distribution for Linux on Intel-based systems, available on sunsite. Lynx compiles and runs on my system without any problems on both Linux 1.2.13 and 2.0.x.

2.2.3. Special features

Well, there are some. For a complete description, just read the manuals and doc files that come with Lynx. To get a nice glimpse, just type in lynx --help and be impressed.

In my humble opinion, the most special feature of Lynx compared to all other web browsers is its capability for batch mode retrieval. One can write a shell script which retrieves a document, file or anything like that via http, ftp, gopher, WAIS, NNTP or file:// URLs and saves it to disk. Furthermore, one can fill data into HTML forms in batch mode by simply redirecting the standard input and using the -post_data option.

2.3. Emacs-W3

There is one sad thing about the Emacs W3 browser ;) If you have GNU Emacs or XEmacs running, you probably have the W3 browser running, too. So there is not much work left for this HOWTO. If you feel that there should be more information about this, please let me know.

The Emacs W3 mode is a nearly fully featured web browser system written in Emacs Lisp. It mostly deals with text, but can display graphics, too - at least if you run Emacs under the X Window system.

The most recent GNU Emacs package is available under , the most recent XEmacs can be retrieved from .

2.4. Netscape Navigator Gold 3.0

Yeah, you made it. The Queen of WWW browsers. It is almost what Emacs is in the world of text editors. Netscape Navigator can do nearly everything (except making coffee... but maybe Java will do...). On the other hand, it is the most memory-hungry and resource-eating piece of web browser, news reader, mail reader (pop3), mail & news editor I've ever seen.

My latest version of the Netscape Navigator Gold (export version) is from 28-Aug-1996 and (c) 1995, 1996 Netscape Communications Corp.
(As I live in Europe, I can only get the export version...)

2.4.1. Where to get

The first place to get the Netscape Navigator for Linux as a binary distribution is on . The second - as these servers are heavily loaded - may be any friendly Netscape mirror site. You might as well ask archie about this. Maybe you'll be lucky and find it on a CD-ROM - this will save some bandwidth, as the archive is quite large (2.5 MB).

2.4.2. Unpacking & Installing

Unpack the archive and read the README file! There is really nothing strange about this, you know.

2.4.3. Java applets with the navigator

There are some reports of problems running Java applets with the Netscape Navigator Gold 3.0 even if Java is activated in the options dialog. The archive known to me contained a file java_30 which must be renamed to java_30.zip. After this, any Java applet should work fine within the Netscape environment.

If you continue to have problems using Java applets, e.g. Netscape Navigator hangs or just terminates after downloading a Java applet, take a look at your libc version. Just do a ldconfig -v | less (maybe you have to be root to do so...) and watch out for an entry libc.so.5 => libc.so.5.xx.yy, where your libc version is 5.xx.yy. If your libc isn't 5.2.18, this may be the problem. There are many reports that Linux 1.2.13 systems should be upgraded to libc 5.2.18 when they need to run Netscape Navigator at all. Additionally, it may be a good idea to downgrade your libc from 5.3.xx to 5.2.18 if you run Netscape Navigator and a Linux 2.0.x kernel. (In fact, the libc 5.3.xx series is for beta testing purposes, so you should know what you're doing.) Some releases of the 5.3.xx series break Netscape Navigator and the Java classes code.

For more information on Java on Linux or Java programming, please read the JAVA-HOWTO or visit .

3. Setting up WWW server systems

This section contains information on different http server software packages and additional server side tools like script languages for CGI programs etc. For a technical description of the http mechanism, take a look at the RFC documents mentioned in the chapter "For further reading" of this HOWTO.

3.1. cern httpd

As the original cern httpd server is reported to have some ugly bugs and to be quite slow and resource hungry, it is not described in this HOWTO for now. If you volunteer to contribute some facts or chapters, please send them to me and I'll add them to this doc.

3.2. apache

- to be written - sorry - Features, Overview, Advantages

3.2.1. Where to get

3.2.2. Installing

3.2.3. Configuring

3.2.4. Special Features

Apache httpd has got some special features in the current version.

3.2.4.1. Host multicasting

BlaBla??? how to set up ....

3.2.4.2. Module system

how to include other modules ... where to get info about module programming ...

3.3. CGI scripts systems

- to be written - sorry - CGI (common gateway interface)

3.3.1. How does CGI work in principle?

- to be written - sorry - calling structure, http structure, program parameter format (slightly touched), things to keep in mind

3.3.2. Perl

- to be written - sorry - something easy in perl (sample script)

3.3.3. PHP/FI

- to be written - sorry - something easy in PHP/FI (sample script)

3.3.4. W3-mSQL

- to be written - sorry - something even more easy (sample script), hint about setting up !!!

3.3.5. some useful scripts

- to be written - sorry - FaxInbound to nice table, including php/fi script and shell script

4. Maintaining a WWW site or some Web Pages

If you have to maintain a web site, or at least a web page, you have to think about what you offer to the network and about how you approach the reader / user of your web pages.

4.1. The mainstream: HTML technical

Well, I'm not gonna tell you how HTML is encoded and how you have to design your pages.
I'll just give you some pointers to where you can find more advanced information. You should take a look at for the latest HTML language specification. Take a look at the list at the end of this article; you'll find more hints there on where to read on.

4.2. Some thoughts about bandwidth

Many users connect to the Internet via slow modem lines. A speed range from 14,400 bps to 28,800 bps is state of the art for "private sites". In Europe, ISDN is growing, but a speed of 64,000 bps isn't that much faster in comparison to - let's keep it simple - 10,000,000 bps ethernet. And 10 Mbps ethernet isn't really a high speed LAN connection nowadays.

Once you realize that many users don't have fast access to the net, you should keep an eye on the ratio between information and bytes. Optimize it towards 1:1 - if you can.

You may use graphics in your web pages following the multimedia trend, but always remember the goals of your page and of the graphic you're going to put in. If most of your users are connected via a slow modem line and the graphic serves only aesthetic reasons or some eye-catching effect, you'd better ban it from your pages, or - at least - rerender it to the smallest possible file size and use the best compression. Your users will like it. Always remember, nobody really likes an eye-catcher that comes up about 3-5 minutes after the text message.

4.3. Some thoughts about server load

On a web server, there is normally at least one server task running. When this task reads a request from an http client, it duplicates itself (on Linux this is called forking) and the new copy serves the request, while the original keeps listening for new requests. After finishing the request, the copy terminates. (In fact, some servers - like apache - always keep a default of five server copies ready and waiting for requests in parallel to the master incarnation, for speed reasons.)
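The fork-per-request model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it needs a Unix-like system where os.fork is available, the names handle_request and serve_forking and the fixed reply are made up for this sketch, and it is not how any particular httpd is implemented. The pre-forked pool of waiting copies is not shown.

```python
import os
import socket

# Fixed reply used by this sketch; a real server would parse the request.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"served by a forked child\n"
)


def handle_request(conn):
    """Read one request and send the fixed reply (stand-in for real work)."""
    conn.recv(4096)  # the request contents are ignored in this sketch
    conn.sendall(RESPONSE)
    conn.close()


def serve_forking(listener, max_requests):
    """Accept max_requests connections, forking one child per connection."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: this copy serves exactly one request, then terminates.
            listener.close()
            handle_request(conn)
            os._exit(0)
        # Parent: drop its copy of the connection and keep listening.
        conn.close()
        # Collect already finished children without blocking.
        try:
            while os.waitpid(-1, os.WNOHANG)[0] != 0:
                pass
        except ChildProcessError:
            pass
    # Wait for the children that are still running before returning.
    while True:
        try:
            os.waitpid(-1, 0)
        except ChildProcessError:
            break
```

To try it, one could create a listening socket (for example bound to 127.0.0.1 on some free port), pass it to serve_forking together with the number of requests to serve, and point a browser such as Lynx at that port. Every request costs one complete process, which is why the memory used by a single server copy matters so much for the server load.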
Some web browsers, like the Netscape Navigator series, send many requests in parallel to the same server, which increases the server load spent on a single user. These browsers e.g. retrieve the HTML page, parse it while retrieving and issue new requests for other information like the embedded graphics, applet files, sound files or any other additional mime-encoded data. In contrast, 'simple' browsers request and retrieve one file after another, which keeps the server load per user as low as possible.

Many users prefer browsers that use the multi request technique, like the Netscape Navigator, because they bring up a more complete overview of the requested page before the single request browser does. This is, in my opinion, because many page designers insist on embedding the information into the graphics, shutting out the text-only browsers.

So we - as server maintainers - have the problem that most of the users issue multiple requests to our server within the same page retrieval. We can limit this by configuring the server software not to serve more than "x" requests from the same requesting system at the same time.

But how to get this "x"? It's not easy to calculate, and a lot of personal experience with your site is necessary to determine it. But I'll give you some hints. We have to take our connection bandwidth into account, our server memory size, some feeling about our server's cpu/disk performance and ... well, that's enough for a first glimpse.

You should take a look at the memory usage of a single server task. Then think about how many of them could be kept in memory at all. Think about what percentage of your web pages could remain in your server's disk cache. Balance the number of web server tasks against the disk cache size and you're really near to your personal "x".

Furthermore, you can take the other jobs of the server into account. E.g. if your system also serves ftp, you might limit the maximum possible connections to keep some minimum room for the ftp server task.
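The memory part of this estimate is simple arithmetic and can be written down as a small Python function. This is only a rough sketch: the function name estimate_max_servers and all numbers in the example are hypothetical and have to be replaced by values measured on your own system, and bandwidth and cpu load are not considered at all.

```python
def estimate_max_servers(total_ram_kb, per_server_kb, disk_cache_kb, reserved_kb):
    """Rough upper bound for the number of server copies that fit in memory.

    total_ram_kb  - physical memory of the machine
    per_server_kb - memory used by one web server task (measure it!)
    disk_cache_kb - memory you want to keep free as disk cache for your pages
    reserved_kb   - room for other jobs such as an ftp server or a database
    """
    available_kb = total_ram_kb - disk_cache_kb - reserved_kb
    if available_kb <= 0 or per_server_kb <= 0:
        return 0
    # Whole server tasks only, so round down.
    return available_kb // per_server_kb


# Hypothetical example: 32 MB RAM, 600 KB per server task,
# 8 MB kept as disk cache, 4 MB reserved for ftp and the system itself.
example_x = estimate_max_servers(32 * 1024, 600, 8 * 1024, 4 * 1024)
print(example_x)
```

A result like this is only a starting point. Raising the disk cache or the reserved room lowers the number of server tasks, and it is this trade-off you have to tune for your own site.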
If your web server also does some database services, you'd better reserve some cpu cycles for them and shrink your "x" as well. Play around with these values and test them. And (!) read the following chapter about CGI scripting, which also costs server performance and - depending on the CGI jobs - memory.

4.4. CGI vs. Applet / Client side script

- to be written - sorry - overview of advantages/disadvantages and hints on when to use which.

4.5. Style ideas

Uh, a really difficult theme to put into a short sentence. I won't try to mess with your ingenious design ideas. Nor am I gonna push my personal design strategies on you. I'd just like to add one or two statements to the above ideas on server load and bandwidth.

Numerous studies of human behavior with user interfaces and on-screen presentation have brought out interesting results. There are some simple facts one should keep in mind when designing WWW pages.

· Keep text in short blocks. This HOWTO is ugly to read on screen, but nice to read in paper print. (Try it yourself!) Human beings often have difficulties reading lengthy text on screen. They lose their place in the sentence; their concentration suffers.

· Don't mix up graphics and text blocks. This is a good-looking but ugly-to-read feature. You can spread headlines and eye-catchers, but, please, don't mix up block text with graphics. Behaviorists found out that humans are much more attracted by graphics on screen than by text. People find it easier to take in a graphic on screen than on paper, in contrast to text, which is easier to "see and decode" on paper than on screen. Did you know this?

If you'd like to get more information on that, search for GUI style guides and the ergonomics research results produced by many universities and software companies (including MS).

4.6. HTML editors under Linux

Hm, there are some. In fact, there are reported to be many. But as I have already made my choice, I didn't test them all.
But I am really curious and looking forward to reading the reports you're gonna mail.

4.6.1. vi, vim

vi and vim are perfectly usable for writing HTML code... (don't flame me on that) because HTML code only uses ASCII text characters. I don't want to provide fuel for another editor war. Those who know vi/vim and use it daily can use it for HTML code, too.

You can make vi/vim help you develop HTML code by writing some macros for it. But as this is no VI-HOWTO, I'll leave it at that here. Just take it that it is possible to use vi/vim for HTML editing (at least for some short changes). If you already know how to program vi/vim, you'll certainly know how to adapt that to HTML, too. If you don't, well, never mind.

4.6.2. emacs & XEmacs

- to be written - sorry -

4.6.3. asWedit

- to be written - sorry -

4.6.4. other pointers

Ah, there was some reference to a package named phoenix, based on tkWWW, but I was not able to get them running on my system. I think it was a problem with my tcl/tk versions, but you never know. I didn't spend much time on them, so maybe they'll both run on your system. Just go and ask archie. Maybe you can drop me a mail if you are successful.

If you miss your favorite HTML editor here, just write a mail to me. Maybe I'll add some pointers to web pages about HTML editors for Linux, too. Just send me some nice URLs.

4.7. Graphics

Thoughts, Ideas, Hints? Well, you may read the comp.graphics newsgroup. And you can visit .

4.7.1. Format gif

GIF (Graphics Interchange Format) was introduced in 1987 by CompuServe, Inc. and revised in 1989. It uses the LZW compression algorithm, which is covered by a U.S. patent. So there might be some legal problems using this graphics format on the Internet - despite the fact that nearly everybody does.

GIF is a good format for small pictures with simply structured graphics like computer graphics or banners.
GIF has some advantages, as it is one of the most widespread (if not the most widespread) graphic formats in online systems:

· offers a good compression
· compresses without information loss
· has an interlace capability, i.e. pictures can be viewed in full size (with less resolution) before they're retrieved completely
· can hold more than one picture within one file
· can hold a small animation in one file
· nearly any graphical web browser supports gif
· can hold a transparent color
· fast decompression

The disadvantages are:

· only 256 color pictures possible
· license and patent problems (?)
· not ideal file size

4.7.2. Format jpeg

The Joint Photographic Experts Group (JPEG) designed the jpeg/jpg/jfif graphic format. This format is based on a discrete cosine transform (DCT) and Huffman encoding. JPEG works with a significant information loss, which can make your pictures somewhat less colorful or less sharp. The typical compression factor ranges from 1:5 to 1:50. (Above 1:10 anybody is able to see the artifacts caused by the compression/decompression cycle.)

JPEG is a good format for photographs, large graphics and really complex pictures.

The advantages are:

· strong compression, small files and therefore fast download...
· any graphical browser knows about jpeg

The disadvantages are:

· slow compression/decompression
· possible information loss

4.7.3. Format png

Portable Network Graphics (PNG) - the new format on the net. PNG is favored by the W3 consortium. For some more specific information visit and . There you'll find a technical specification, some programmers' information etc. PNG is an ideal format for replacing GIF. The PNG homepage is on .

For the users, PNG will have some advantages and some disadvantages.
Here they are:

For the advantages:

· can replace the license-burdened GIF - PNG has no license problems
· 256 color palette system as well as grayscale and true color capability, including a transparency element
· complex interlace mode in which not only different lines are sequenced, but a two-dimensional scheme serializes the picture, so that the user recognizes the picture content earlier
· fast decompression algorithm is possible
· publicly available description - license free
· publicly available sample code - license free
· extensible design

For the disadvantages:

· not widely spread (Netscape does not support it for now, some plugins do)
· not such strong compression of pictures
· no final specification ready, in working draft state

PNG is currently supported on Linux through the following programs: ImageMagick (Version >=3.7), GhostScript 4.0, Gimp, PovRay 3.0, the netpbm package. For xv 3.10a there exists an unofficial patch.

4.7.4. Converters

- to be written - sorry - netpbm, xv, ghostscript, gimp, ImageMagick, CorelDraw on Wine :-)))

4.8. Specials

There are now many specials beyond the HTML'n'Image range. There are applets written in Java, JavaScript pages and many things beyond.

4.8.1. Java

There is nothing to add about Java in general; just read the Java section in the Netscape Navigator chapter of this HOWTO and the overview on Java applet vs. CGI script in this HOWTO. Then, you can also read the really good and compact Linux JAVA-HOWTO. For programming Java, please refer to the really good books on that.

4.8.2. ActiveX

ActiveX is, at the time of writing, still a Microsoft child. Microsoft claimed that they would release it to the public domain or at least hand it over to an ActiveX consortium. ActiveX has nothing to do with the X Window system nor with XFree. It is derived from Microsoft's OLE system. After the specs are released, there should be a Unix port. But we have to wait till then. Nothing for Linux, yet.

5. FAQ

There aren't any frequently asked questions - yet...

6. For further reading

· RFC1866 by T. Berners-Lee and D. Connolly, "Hypertext Markup Language - 2.0", 11/03/1995
· RFC1867 by E. Nebel and L. Masinter, "Form-based File Upload in HTML", 11/07/1995
· RFC1942 by D. Raggett, "HTML Tables", 05/15/1996
· RFC1945 by T. Berners-Lee, R. Fielding, H. Nielsen, "Hypertext Transfer Protocol -- HTTP/1.0", 05/17/1996
· RFC1630 by T. Berners-Lee, "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", 06/09/1994
· RFC1959 by T. Howes, M. Smith, "An LDAP URL Format", 06/19/1996

7. Thanks

Special thanks to Greg Hankins gregh@cc.gatech.edu for encouraging me to write this work, and for the fun I had doing it. I'd also like to thank Chris Hendricks, Fido: 2:2433/443@fidonet.org, for his engagement in Linux and my personal race to keep at least one nose ahead :-)