Colin Jones

Web Developer

Valid XHTML - How to Lose the Non-SGML Characters

Posted on 05/31/2007 by Colin

Here at Plexus, we pride ourselves on our use of Web standards, which lead to greater usability and searchability, in addition to better preparation for future browsers and other Web technology. We proudly display links to validation services at w3.org for our XHTML 1.0 Strict and CSS, and they show that we're writing valid code.

On our sites that allow users to enter content, however, we sometimes encounter validation errors such as:

This page is not Valid XHTML 1.0 Strict
Error Line 60 column 118: non SGML character number 128.

[the problem here was a left quote pasted in from Microsoft Word]

We've looked at writing scripts that convert each "non SGML character" to one that will pass XHTML validation, but it turns out to be very difficult to track down every individual invalid character.
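To give a sense of what such a script involves, here is a minimal Python sketch that only locates the offending bytes. It assumes the page is stored in a single-byte encoding (as with ISO-8859-1) and treats any byte in the range 128 to 159 as suspect, since ISO-8859-1 reserves that range for control characters while Windows-1252 uses it for curly quotes and similar punctuation. The function name and the sample markup are hypothetical, and the 1-based line and column counting may not match the validator's exactly.

```python
def find_c1_bytes(data: bytes):
    """Return (line, column, byte value) for each byte in the 0x80-0x9F range.

    Assumption: the input is a single-byte encoding such as ISO-8859-1,
    so one byte corresponds to one character and one column.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, byte in enumerate(line, start=1):
            if 0x80 <= byte <= 0x9F:
                hits.append((line_no, col_no, byte))
    return hits


# Hypothetical sample: the second line holds Windows-1252 curly quotes
# (bytes 0x93 and 0x94), the kind of thing pasted in from Microsoft Word.
sample = b"<p>plain</p>\n<p>\x93quoted\x94</p>"
print(find_c1_bytes(sample))  # [(2, 4, 147), (2, 11, 148)]
```

Finding the bytes is the easy part. Deciding what each one should become is where a character-by-character approach gets tedious.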

The better solution (assuming you're like us and have been using ISO-8859-1 encoding) is simply to change the character encoding to UTF-8. That is, put a META tag like this in your document template:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

With that in place you should be home free, provided the pages are actually saved and served as UTF-8 to match the declaration. You can then feel confident that your code will still pass XHTML 1.0 Strict validation even when clients paste in Microsoft characters.
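If content was already saved before the switch, its bytes may still be in the old single-byte encoding and would need converting once. The following Python sketch shows one way to do that, assuming the stored text is really Windows-1252 (which is what Word's curly quotes suggest) rather than strict ISO-8859-1. The function name is hypothetical, and bytes that Windows-1252 leaves undefined are replaced with U+FFFD instead of raising an error.

```python
def legacy_to_utf8(legacy: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8.

    Assumption: the input really is Windows-1252. Undefined bytes
    (for example 0x81) become U+FFFD, the Unicode replacement character.
    """
    text = legacy.decode("cp1252", errors="replace")
    return text.encode("utf-8")


# Curly double quotes (0x93, 0x94) become U+201C and U+201D in UTF-8.
converted = legacy_to_utf8(b"\x93hi\x94")
print(converted.decode("utf-8") == "\u201chi\u201d")  # True
```

Run this only once per piece of stored content: applying it to text that is already UTF-8 would garble it.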

Tagged:  non SGML characters, UTF-8, xhtml
