Unit 4: New Web Technologies: XML, XHTML, CSS
Lesson 2: XHTML
Reference: XHTML: The Clean Code Solution by Peter Wiggin
- What is XHTML?
- Why use XHTML?
- Differences between XHTML and HTML documents
- What is new in XHTML?
What is XHTML?
- XML + HTML = XHTML
- HTML is a markup language described in SGML
- XML is a restricted form of SGML
- XHTML is the reformulation of HTML 4.0 as an application of XML
- It is the W3C's new version of HTML
Why use XHTML?
- Web browsers are bloated with code to handle poorly formed HTML
- As more people use handheld and TV browsers we need smaller, simpler browsers
- Since XHTML is an XML application, it is designed to be extensible
- It is a stepping stone to future versions of HTML
Differences between XHTML and HTML documents
- All HTML tags and attribute names must be in lowercase
- All attribute values must be quoted
- All tags, including those of non-empty elements, must be terminated
- Elements must nest, not overlap
- Required elements
Lowercase
All HTML tags and attribute names must be in lowercase
HTML:
<BODY BGCOLOR="#ffffff">
XHTML:
<body bgcolor="#ffffff">
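The lowercase rule can be checked mechanically. The following is a minimal sketch, assuming the markup is already well-formed XML (so the sample tags are closed here, unlike the one-line examples above). The function name non_lowercase_names is hypothetical, and the check ignores namespaces.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(markup):
    """Return every element or attribute name that is not all lowercase."""
    root = ET.fromstring(markup)
    offenders = []
    for element in root.iter():  # the root element is visited first
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for name in element.attrib:
            if name != name.lower():
                offenders.append(name)
    return offenders


print(non_lowercase_names('<BODY BGCOLOR="#ffffff"></BODY>'))
print(non_lowercase_names('<body bgcolor="#ffffff"></body>'))
```

An XML parser accepts uppercase names as long as start and end tags match exactly, so this check is needed on top of plain parsing. It is the XHTML DTDs that declare the names in lowercase.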
Quoted Attribute Values
All attribute values must be quoted
HTML:
<table border=0>...
XHTML:
<table border="0">...
Terminate all tags
All tags, including those of non-empty elements, must be terminated
HTML:
Paragraph 1<p>
Paragraph 2<p>
XHTML:
<p>Paragraph 1</p>
<p>Paragraph 2</p>
Properly Nested Tags
Elements must nest, not overlap
HTML:
<p>here is a bolded <b>word.</p></b>
XHTML:
<p>here is a bolded <b>word.</b></p>
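The quoting, termination, and nesting rules are all well-formedness rules, so any XML parser will reject markup that breaks them. As a rough sketch, the helper below (is_well_formed, a hypothetical name) feeds fragments to Python's standard xml.etree.ElementTree parser. The wrapping <div> elements are added only so that each sample has a single root.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


samples = [
    '<table border=0></table>',                         # unquoted value
    '<table border="0"></table>',                       # quoted value
    '<div>Paragraph 1<p>Paragraph 2<p></div>',          # unterminated <p>
    '<div><p>Paragraph 1</p><p>Paragraph 2</p></div>',  # terminated <p>
    '<p>here is a bolded <b>word.</p></b>',             # overlapping
    '<p>here is a bolded <b>word.</b></p>',             # nested
]
for markup in samples:
    print(is_well_formed(markup), markup)
```

This only tests well-formedness. Whether a document is valid XHTML also depends on the DTD it declares.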
Required elements
These should be in HTML anyway but are required in XHTML
- The <head> and <body> elements cannot be omitted.
- The <title> element is a required element within the <head> element.
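A short sketch of how the required elements could be checked in code. It assumes the document is well-formed and uses the XHTML 1.0 namespace http://www.w3.org/1999/xhtml; the function name missing_required is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"  # namespace from the XHTML 1.0 Recommendation


def missing_required(document):
    """Return the names of required elements that the document lacks."""
    root = ET.fromstring(document)
    ns = {"x": XHTML_NS}
    missing = []
    if root.find("x:head", ns) is None:
        missing.append("head")
    if root.find("x:head/x:title", ns) is None:
        missing.append("title")
    if root.find("x:body", ns) is None:
        missing.append("body")
    return missing


complete = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>Hello</p></body></html>"
)
no_title = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head></head><body></body></html>"
)
print(missing_required(complete))
print(missing_required(no_title))
```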
What's New?
- All documents must have a doctype declaration
- Three different DTDs for XHTML 1.0
- <html> tag
- Processing Instructions
- Empty elements must be terminated
- Attribute value pairs cannot be minimized
Doctype Declaration
- All documents must have a doctype declaration
- At the top, before the <html> tag
- Determines which DTD your document will validate against
For example, the HTML 4.0 Transitional declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD
HTML 4.0 Transitional//EN">
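Because the doctype names the DTD through its public identifier, a program can read that identifier to see which DTD a document claims to follow. The sketch below uses Python's xml.dom.minidom and recognizes only the three XHTML 1.0 public identifiers. The function name declared_dtd and the return labels "none" and "unknown" are hypothetical. An XML parser needs the system identifier (the DTD URL) as well, so the two-part HTML 4.0 form above would not parse here.

```python
from xml.dom import minidom


def declared_dtd(document):
    """Return which XHTML 1.0 DTD the document's doctype names."""
    doctype = minidom.parseString(document).doctype
    if doctype is None:
        return "none"
    public_id = doctype.publicId or ""
    for flavor in ("Strict", "Transitional", "Frameset"):
        if public_id == f"-//W3C//DTD XHTML 1.0 {flavor}//EN":
            return flavor
    return "unknown"


strict_doc = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head><body></body></html>"
)
print(declared_dtd(strict_doc))
```

The parser does not download or apply the DTD in this sketch. It only reports what the declaration says.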
Three different DTDs for XHTML 1.0
- Strict
- Transitional
- Frameset
Strict
- Used when you're doing all of your formatting in Cascading Style Sheets (CSS)
- No <font> and <table> tags to control how the browser displays your documents
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Transitional
- Used when you need to use presentational markup in your document.
- Most popular because it offers features similar to HTML 4.0
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Frameset
- Used when your documents have frames
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Validating XHTML
- Since the DTD defines what's legal and what isn't, you can validate your document against the definition.
- There are many programs that validate documents; one is the W3C's own validator
- The simplest way to use it is to put a link to http://validator.w3.org/check/referer on your web page
<html> tag
- Must include a new namespace attribute xmlns in the opening HTML tag
- The namespace attribute defines which namespace the document uses
<html xmlns="http://www.w3.org/1999/xhtml">
- An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names
- Namespaces are new to XML
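To see what the xmlns attribute does, it can help to look at how a namespace-aware parser reports element names. This is a small sketch using Python's xml.etree.ElementTree, which writes a qualified name as {namespace-URI}local-name. The helper split_qualified_name is hypothetical, and the namespace URI is the one from the XHTML 1.0 Recommendation.

```python
import xml.etree.ElementTree as ET


def split_qualified_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


root = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head><body></body></html>"
)
for element in root.iter():
    print(split_qualified_name(element.tag))
```

Every element inherits the default namespace declared on <html>, so <head>, <title>, and <body> are reported in the same namespace without repeating the attribute.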
Processing Instructions
- The XML declaration, which looks like a processing instruction (PI), is optionally the first item in any XML document
- It looks like this
<?xml version="1.0" encoding="UTF-8"?>
- It does two things:
- It states which version of XML the document is based on
- It declares the character encoding that the document is using
- Some HTML browsers render it as text, so you may want to leave it off if you can. You can omit it when the document uses one of the default character encodings, UTF-8 or UTF-16.
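The effect of the encoding declaration can be observed with a parser. In this sketch, which assumes Python's xml.etree.ElementTree and a hypothetical helper named text_of, the same ISO-8859-1 bytes parse only when the declaration names that encoding, while UTF-8 bytes parse without any declaration.

```python
import xml.etree.ElementTree as ET


def text_of(raw_bytes):
    """Return the root element's text, or None if the bytes do not parse."""
    try:
        return ET.fromstring(raw_bytes).text
    except ET.ParseError:
        return None


# "caf\xe9" is "café" encoded as ISO-8859-1 (Latin-1).
latin1_declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
latin1_undeclared = b"<p>caf\xe9</p>"
utf8_undeclared = "<p>café</p>".encode("utf-8")

print(text_of(latin1_declared))
print(text_of(latin1_undeclared))
print(text_of(utf8_undeclared))
```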
Empty elements must be terminated
- An empty element doesn't contain anything
- In XHTML, these tags need to be terminated
- You could add a closing </br> to the opening <br>. This is valid in XML but it doesn't render properly in all browsers
- Instead, XHTML recommends the use of a modified empty element: <br />
- The space after the element name is not required by XML but helps to make it compatible with current and older browsers
HTML:
<br>
<hr>
<img src="image.gif">
XHTML:
<br />
<hr />
<img src="image.gif" />
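To an XML parser, <br></br> and <br /> are the same empty element. As a brief illustration, the sketch below parses both forms with Python's xml.etree.ElementTree and writes them back out. The helper name normalize is hypothetical. ElementTree's serializer happens to emit the short form with a space before the slash.

```python
import xml.etree.ElementTree as ET


def normalize(fragment):
    """Parse a fragment and serialize it again as a string."""
    return ET.tostring(ET.fromstring(fragment), encoding="unicode")


print(normalize("<p>line one<br></br>line two</p>"))
print(normalize("<p>line one<br />line two</p>"))
print(normalize('<img src="image.gif"></img>'))
```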
Attribute value pairs cannot be minimized
- An attribute is said to be minimized when only its name is written, with no value
- For example, in <option value="somevalue" selected>, the attribute "selected" has been minimized.
HTML:
<input type="radio" checked>
<input type="checkbox" checked>
<dl compact>
XHTML:
<input type="radio" checked="checked" />
<input type="checkbox" checked="checked" />
<dl compact="compact">
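Minimized attributes can be located with an HTML parser before conversion. This sketch relies on Python's html.parser, which reports an attribute written without a value as a (name, None) pair. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MinimizedAttributeFinder(HTMLParser):
    """Collect (tag, attribute) pairs for attributes written without a value."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:  # html.parser uses None for a missing value
                self.found.append((tag, name))


def find_minimized(html_text):
    finder = MinimizedAttributeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


print(find_minimized('<input type="radio" checked><dl compact>'))
print(find_minimized('<input type="radio" checked="checked" />'))
```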
Converting HTML to XHTML
- These aren't all the differences; see the XHTML 1.0 Specification for the rest
- To automatically convert from HTML to XHTML you can use HTML Tidy
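To give a feel for what such a converter does, here is a toy sketch built on Python's html.parser. It is not HTML Tidy and does not use Tidy's API. It applies only the tag-level rules from this lesson: lowercase names, quoted values, expanded minimized attributes, and terminated empty elements. It does not add missing end tags, repair overlapping elements, or keep comments and doctypes. All names in it are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Elements that HTML 4 defines as empty (no content, no end tag).
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img",
                  "input", "link", "meta", "param"}


class XhtmlRewriter(HTMLParser):
    """Re-emit each tag using the XHTML rules covered in this lesson."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _emit_start(self, tag, attrs, closer):
        # html.parser already lowercases tag and attribute names.
        rendered = "".join(
            ' {}="{}"'.format(name, escape(name if value is None else value))
            for name, value in attrs
        )
        self.parts.append("<" + tag + rendered + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, " />" if tag in EMPTY_ELEMENTS else ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " />")

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:  # empty elements were closed already
            self.parts.append("</" + tag + ">")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&" + name + ";")

    def handle_charref(self, name):
        self.parts.append("&#" + name + ";")


def to_xhtml_like(html_text):
    rewriter = XhtmlRewriter()
    rewriter.feed(html_text)
    rewriter.close()
    return "".join(rewriter.parts)


print(to_xhtml_like('<INPUT TYPE=radio CHECKED>'))
print(to_xhtml_like('<P>Line one<BR>Line two</P>'))
```

For real documents a dedicated tool such as HTML Tidy remains the better choice, because it also repairs structure that this sketch leaves untouched.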