
IFX Group


Web Page Cleaning

  • Do you want your web content to be available for all users no matter what web browser they have?
  • Do you want a more accessible web site for users with disabilities?
  • Do you want your web site to be more available to search engines?

If you answered NO to any of these questions, rest assured that you are in good company with most amateur web site designers and developers.

On the other hand, if you are more interested in getting your information into all search engines and allowing more users (customers) to navigate your web site, the following information will help start you down the path to more compatible, standards-compliant web development.

The Tools

The first step is to have the right tools for the job.

Web Browsers

Please note this is plural! Install at least three different web browsers on your workstation, or have multiple workstations available, all looking at the same content while you develop your web site. Anyone developing web content with only one web browser is missing a huge part of their market and is likely creating web sites that look ugly, or are even unusable, to potential customers. In any other setting, an employee who turns potential customers away at the door would be fired. It won't be long before this is true on the web too.

At the time of this writing, the top five web browsers by volume were Google Chrome, Microsoft Internet Explorer, Mozilla Firefox, Apple Safari and Opera, but don't stop there. If you have access to a Mac or a Linux system, additional web browsers are available there that are well worth using for compatibility testing.

On a side note, it is very important to know that virtually all web browser rankings are based on the user-agent string sent with every web request. Microsoft Internet Explorer is the only web browser that does not give the end user an easy way to change its user-agent string. Virtually all other web browsers do, and their users sometimes must change the browser's identity to get around ignorant web site owners who require Internet Explorer to access their content. For this reason, Internet Explorer is likely to appear more popular than it really is. Don't be fooled.
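To see why these rankings can mislead, here is a minimal Python sketch of how a log-based statistics script might classify browsers purely by their user-agent strings. The matching rules, the function names `classify_user_agent` and `tally`, and the sample strings are illustrative assumptions, not the logic of any real statistics service.

```python
from collections import Counter


def classify_user_agent(ua: str) -> str:
    """Guess a browser family from a user-agent string (naive, illustrative rules).

    Order matters: Chrome's string also contains "Safari/", so Chrome is
    checked before Safari, and Opera/IE tokens are checked before the rest.
    """
    if "Opera" in ua or "OPR/" in ua:
        return "Opera"
    if "MSIE" in ua or "Trident/" in ua:
        return "Internet Explorer"
    if "Firefox/" in ua:
        return "Firefox"
    if "Chrome/" in ua:
        return "Chrome"
    if "Safari/" in ua:
        return "Safari"
    return "Other"


def tally(user_agents):
    """Count requests per browser family, the way a log-based ranking would."""
    return Counter(classify_user_agent(ua) for ua in user_agents)


# Hypothetical sample strings for Internet Explorer 8 and Firefox 20.
IE8 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
FIREFOX = "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0"

# Hypothetical log of three requests: a real Internet Explorer 8, a Firefox,
# and a second Firefox whose user has switched to Internet Explorer 8's string.
log = [IE8, FIREFOX, IE8]

if __name__ == "__main__":
    print(tally(log))
```

The log really came from one Internet Explorer and two Firefox users, but because the script can only see the header, `tally(log)` counts two Internet Explorer requests and one Firefox request.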

If you have Microsoft Windows XP or newer and have installed Internet Explorer 8 or newer, then according to Microsoft you are officially out of luck when testing for Internet Explorer 6.0 or 7.0 compatibility. Microsoft offers a very large Virtual PC™ image of Windows XP that you can download for Internet Explorer compatibility testing, but the images have a time-bomb that forces you to download a fresh copy every 6 months or so.

If you find yourself in this situation, it is strongly suggested that you get the free VirtualBox program and create a Linux Mint virtual machine on your desktop. Then add the Internet Explorer Virtual Machines collection. The full collection can take more than 30 or 40 gigabytes of space on your hard drive because, sadly, each version of Internet Explorer requires a complete copy of Windows to test. For testing Internet Explorer 6 and older inside Linux Mint, you can install the IEs4LINUX package and run multiple versions of Internet Explorer (6.*, 5.*, etc.) at the same time as Firefox, Chrome and Opera, something no version of Microsoft Windows can do. This method never expires and allows easy regression and compatibility testing for all of your web content.

Error Checking

Anyone writing software knows that it is nearly impossible to develop a useful, bug-free program without a source-level error checker. Most compilers report syntax errors and warnings, but nothing reports them when you create web pages. Anyone with a plain text editor and knowledge of a few HTML tags can create web pages, but some errors are hard to spot until someone complains, if anyone cares enough to complain. This is a sign of lazy (or possibly ignorant) web developers relying on web browsers to forgive their mistakes. The problem only gets worse when the web developer learns nothing about HTML itself and relies exclusively on a graphical web editor to do the work.

Every web developer is strongly encouraged to install the HTML Validator add-on for Firefox (after installation, choose SERIAL validation when asked) and then view the source of your web pages in the browser. This is the one tool I use most when cleaning up web sites designed by others. Only after all of the Errors and Warnings are fixed should you start working on compatibility with different web browsers.

There are some very nice online error-checking tools for web pages, but they all suffer from one key problem: they cannot check a web page stored only on your workstation. For this reason, a validation tool inside the web browser is much more useful for checking content before it is posted on any web server. Don't let this stop you from using the online validation tools in addition to your local tools, though. Different checking tools often find different kinds of problems. The more you find and fix, the more compatible your web site will be, and along the way you learn what to look for before you even test.
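The value of checking files locally is easy to demonstrate with nothing more than Python's standard library. This is a minimal sketch, assuming UTF-8 files; `TagBalanceChecker` and `check_file` are hypothetical names, and the checker only reports tags that are opened and closed out of order, a small fraction of what a full validator checks. It also treats HTML's optional end tags (such as a missing `</p>`) as errors, which is stricter than the HTML standard.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Collect messages about unclosed or mismatched tags (rough sketch only)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of (tag, line number) still waiting to be closed
        self.problems = []    # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # XHTML-style "<br />": nothing is left open, so nothing is pushed.
        pass

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.open_tags]:
            self.problems.append(f"line {line}: </{tag}> has no matching <{tag}>")
            return
        # Pop back to the matching start tag; anything popped first was never closed.
        while self.open_tags:
            name, start_line = self.open_tags.pop()
            if name == tag:
                break
            self.problems.append(f"line {start_line}: <{name}> is not closed before </{tag}>")

    def close(self):
        super().close()  # flush any buffered input first
        for name, start_line in self.open_tags:
            self.problems.append(f"line {start_line}: <{name}> is never closed")
        self.open_tags.clear()


def check_file(path, encoding="utf-8"):
    """Run the checker over one HTML file stored on the local workstation."""
    checker = TagBalanceChecker()
    with open(path, encoding=encoding) as f:
        checker.feed(f.read())
    checker.close()
    return checker.problems
```

Usage: `for problem in check_file("index.html"): print(problem)`, where `index.html` stands for any page on your workstation that has not been posted to a server yet.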


The main purpose behind the web is communication. It does not matter if you are a business communicating with your customers or a blogger communicating with your friends. If communication breaks down between your web page and the web browser, your web content has less value.

Miscommunication between the web page and the web browser can show up in subtle ways, sometimes only in a few web browsers or on a few operating systems.

There are typically three settings that tell the web browser how to display a page to the user:

  1. DOCTYPE (the language standard used in the web page)
  2. MIME (the type of data sent by the server)
  3. Charset (the character set used to show the text of the page)

1. If the DOCTYPE (which must be the first line of the HTML file) is wrong, or you use HTML tag syntax that does not match the DOCTYPE you declare, you can really confuse a web rendering engine. It is safe to stick with HTML 4.01 Transitional if you like GUI editing programs for web page development. Definitely avoid mixing XHTML or XML syntax with HTML in the same page, no matter which DOCTYPE you use.

2. If your web server does not recognize the file extension of your web pages, especially if you use something other than .HTM or .HTML, it may tell the web browser that it is sending plain text rather than something that should be rendered. Some browsers ignore the MIME type and render the page based on its content anyway; others obey the MIME type, and you see the raw HTML markup displayed as plain text.

3. If the words or some of the characters on the screen look odd, as if they were graphics or a different language, that could point to a wrong or undefined charset. The charset is set either in the Content-Type HTTP header sent by the web server or in a META tag at the top of each HTML page. When in doubt, always choose the charset used by the workstation that created the text content.
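The three settings above (DOCTYPE, MIME type and charset) can be checked together for a single page. The following is a minimal Python sketch, not a validator: `check_page`, `parse_content_type` and `MetaCharsetFinder` are hypothetical names, only a few W3C DOCTYPEs are recognized, and it assumes you already have the page text plus the `Content-Type` header value the server actually sent (a server that does not recognize the file extension may send `text/plain` there).

```python
import codecs
import re
from email.message import Message
from html.parser import HTMLParser

# 1. DOCTYPE: public identifiers of a few W3C DOCTYPEs (HTML5's DOCTYPE has none).
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
}
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]+)")?[^>]*>', re.IGNORECASE)
XHTML_EMPTY_TAG_RE = re.compile(r"<\w+[^<>]*/>")  # e.g. <br /> or <img src="x.png" />


def parse_content_type(value):
    """Split a Content-Type value into (MIME type, lower-cased charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_type() falls back to text/plain for a missing or invalid type.
    return msg.get_content_type(), msg.get_content_charset()


def same_charset(a, b):
    """Compare charset names so that aliases such as UTF8 and utf-8 match."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a == b


class MetaCharsetFinder(HTMLParser):
    """Remember the first charset declared in a <meta> tag (either form)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        attrs = {name: value or "" for name, value in attrs}
        if attrs.get("charset"):                                     # <meta charset="...">
            self.charset = attrs["charset"].strip().lower()
        elif attrs.get("http-equiv", "").lower() == "content-type":  # HTML 4.01 style
            self.charset = parse_content_type(attrs.get("content", ""))[1]


def check_page(header_content_type, html_text):
    """Return warnings about the DOCTYPE, MIME type and charset of one page."""
    warnings = []

    # 1. The DOCTYPE must come first (a UTF-8 byte-order mark may precede it).
    text = html_text.lstrip("\ufeff")
    match = DOCTYPE_RE.match(text)
    if not match:
        warnings.append("no DOCTYPE at the start of the page")
    else:
        public_id = match.group(1)
        name = "HTML5" if public_id is None else KNOWN_DOCTYPES.get(public_id, public_id)
        if name.startswith("HTML 4.01") and XHTML_EMPTY_TAG_RE.search(text):
            warnings.append(f"{name} DOCTYPE mixed with XHTML-style empty tags")

    # 2. The MIME type the server actually sent.
    mime_type, header_charset = parse_content_type(header_content_type)
    if mime_type != "text/html":
        warnings.append(f"served as {mime_type} instead of text/html")

    # 3. The charset from the HTTP header and from a META tag.
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    meta_charset = finder.charset
    if header_charset is None and meta_charset is None:
        warnings.append("no charset in the HTTP header or in a META tag")
    elif header_charset and meta_charset and not same_charset(header_charset, meta_charset):
        warnings.append(f"HTTP header says {header_charset} but META tag says {meta_charset}")
    return warnings
```

For example, `check_page("text/plain", "<html><body>hi</body></html>")` returns three warnings: no DOCTYPE, the wrong MIME type and no charset. The odd-looking characters described in point 3 are easy to reproduce: `"café".encode("utf-8").decode("iso-8859-1")` yields `'cafÃ©'`, which is what UTF-8 text looks like when the browser is told it is ISO-8859-1.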

First published 2007-09-11. The last major review or update of this information was on 2013-03-15.