camen design

How to Learn HTML5

I received this email:

Hi Kroc,

I stumbled on your website today and was quite impressed with quite a few different things about it (the design, the tone of your writing, &c.).

The one thing in particular, and something I wanted to question you on, was your reference to HTML5. Something I know nothing about. In the post I came across you talked about a guy named Sam Ruby and how he referred to it as minimalist code. Anyway that’s beside the point. I’m really interested in using no IDs and/or classes, but know nothing about HTML5.

So here’s my question, ... do you recommend any other resources or tutorials for someone like me? I rely very heavily on classes and my stuff is sloppy for the most part. I’d love to convert my site but have no idea where to start.

Any advice or words of wisdom would be greatly appreciated. Thanks in advance.

Joe Holst

Hello Sir,

You require only one thing, and then need to do three things, in order to succeed in your goals here.

Firstly, you need some willpower. You already seem to have that, as you’ve taken the interest to email me and ask where to start. Without the desire to write better code, no amount of learning in the world can help!

It should be noted that absolutely nothing I’m doing is undocumented; in fact, none of the code on my website is even special. All it is, is the representation of my personal drive for quality, as I measure it.

I don’t think willpower is going to be a problem for you if you already twig that HTML5 and cleaner code are the direction you might want to go. Some people never progress beyond writing IE-only junk, and that’s not down to skill level; it’s a willpower problem.

Secondly, you need to do three things:

  1. revise,
  2. revise,
  3. revise.

I proof-read and break my code over and over again, to chip away at whatever niggles me here and there. Getting rid of classes is a one-by-one process, because each class has a use that may be totally unrelated to the other classes in the document. Each class is its own problem: some big, some small, some even requiring a complete rewrite.

I’m quite happy to break my entire codebase for a week—rewriting and reorganising everything—just to get one tiny annoyance out of sight and out of mind.

If you proof-read your codebase you’ll spot various things that could be better, all of varying difficulty, some maybe without a clear solution. Start by picking something that annoys you, that you know could be done better, and fix it… even if it’s just tidying up a comment so that it looks nicer. Polish your code.

That is how I work. I scan through my code and look at it objectively. I think about where I can make it cleaner, tidier, less complicated, better documented, easier for others to understand; anything that catches my eye.

I pick something that I personally feel motivated toward (I have no boss on my personal site, so I only have to fix what I care about; and in that, the quality, through passion, is maintained) and I set about fixing it. Sometimes that’s a big problem, like what I’m working on right now (a Markdown clone, so as to reduce the amount of HTML I’m writing for minor things like abbreviations, links and citations), and sometimes it’s a small thing like shaving a few lines off here and there. I do whatever my heart feels capable of doing that day. I try never to strain myself by doing work I’m not interested in; that’s for secular work.

Once I’ve made my fix, I go through the whole process again. Often, implementing one fix unearths other annoyances that I want to solve, and I’ll either get distracted and go off and fix those, or wait until I’ve finished the thing I started on and be well set with another task to do afterwards, with a much stronger understanding of the problem to drive me forward with the design issues.

O, the design issues.

I am a slave to my code sometimes. I will not accept a sloppy solution. I pace around a lot. I wrestle with the architectural design of the code in my mind for days. I spend 100 hours writing 29 lines of code. Because to me, good code is not about how l33t your programming knowledge is; good code is about how much you rethink what you’re trying to do. What cohesive statement are you trying to make with your code?

Sometimes your code is not just about trying to solve a problem,
it’s trying to solve a problem using your personality.

But getting back to HTML5

Before you learn HTML5, first learn HTML4. I know that sounds stupid; you may very well think you already know HTML4. In learning HTML5, I first referred to this list. Apart from the deprecated ones, do you know how to use all of them? I found that there were a number of HTML4 tags that I rarely used but that could easily replace sloppy <div>s and classes.

I see this often, and it really annoys me.

<div id="header">
	<h1>Title</h1>
</div>

Elements that are not <div>s can be styled just the same as a <div>. An <h1> is not somehow magically unable to have borders or backgrounds or margins or padding or anything else a <div> can have.

People seem to get this mindset that only <div>s can be used as boxes.
Here’s a list of elements you can use instead of <div>s.

P, BLOCKQUOTE (with P inside), H1/H2/H3…, ADDRESS, DL/DT/DD, UL/OL/LI, HR

(Note: P and the headings (H1/H2/H3…) may only contain inline content, so you can’t put a block-level element inside them; i.e. you can’t have an H1 inside a P or vice-versa.)

All of these can be styled with any effect, meaning that you can get rid of a <div> and/or class by just referring to the element directly, or by its parent.

For example:

<h1>Website Title</h1>
<ol id="menu">
	<li>menu 1</li>
	<li>menu 2</li>
</ol>
<h2>Article Title</h2>

Is better than

<div id="title">Website Title</div>
<div id="menu">
	<div class="menuitem">menu 1</div>
	<div class="menuitem">menu 2</div>
</div>
<div class="title2">Article title</div>

And for that matter, why would you have a class for a menu item when the menu is perfectly identifiable? Even in this bad example, you could still get rid of the menuitem class and just refer to the items with #menu div {...}.

An <ol> makes a perfect menu. It is after all, an Ordered List. Get to know what each of the element names means, how you would think of that in a standard word processor document, and then how that can be applied to your site, imagining your site as a word processor document without any CSS. Your menu would be a table of contents of sorts, and therefore would obviously be an Ordered List.
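If the menu is generated by a script, the same idea carries over: emit a plain <ol> and let CSS reach the items through the list's ID. Below is a minimal PHP sketch; the menu() helper is hypothetical, and it assumes each item is a link (the example above leaves the links out).

```php
<?php
// Hypothetical helper: render a site menu as a plain ordered list.
// There are no per-item classes; style the items in CSS with "#menu li".
function menu (array $links): string {
	$html = "<ol id=\"menu\">\n";
	foreach ($links as $href => $label) {
		// escape both the URL and the label before putting them in HTML
		$html .= sprintf ("\t<li><a href=\"%s\">%s</a></li>\n",
			htmlspecialchars ($href, ENT_QUOTES, 'UTF-8'),
			htmlspecialchars ($label, ENT_QUOTES, 'UTF-8')
		);
	}
	return $html."</ol>\n";
}

echo menu (['/' => 'Home', '/blog/' => 'Blog']);
```

The list carries the meaning (a table of contents), so the only hook the stylesheet needs is the single ID on the <ol>.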

Here’s a different example I helped someone with:

<div id="leftcol">
	<h2 class="Blue">Recent Project</h2>

</div>
<div id="rightcol">
	<h2 class="Green">News Updates</h2>

</div>

He wanted the headings on the left blue, and the headings on the right, green. Which is fair enough.
However, could you not just select the column, and do away with the need for the classes?

/* as a rough example */
#leftcol h2	{color: blue;}
#rightcol h2	{color: green;}

Knowing and using more elements, instead of resorting to <div>s all the time, allows you to use CSS to select those elements broadly as well as specifically. Here, both of these are <h2>s. If they were <div>s, we would have to use classes, because there would likely be yet more <div>s in the columns, and you couldn’t say #leftcol div without turning many things blue or green instead of just what you wanted.

Getting rid of <span>s with classes requires knowing the meaning of the many inline elements. Google them.

ABBR, ACRONYM, BIG, CITE, CODE, DEL, DFN, EM, INS, KBD, Q, SAMP, SMALL, STRONG, SUB, SUP, VAR

Think of these elements outside of the browser, on a printed piece of paper. You can then bombard them with CSS to make them look like anything (even if they don’t look anything like what they’re supposed to represent), but their use will be semantically sound, having the right meaning in your website.


Learn the selectors.

If you know the selectors and the tags well enough, then you only need a class (or ID) when you cannot differentiate two elements from each other in the browsers you are supporting. Since I’m using CSS3 and not supporting IE at all, I don’t need any classes: I’ve made the right choice of tags, and can differentiate all of them with the right CSS selectors.

If you want clean code using few, if any, classes then right away ditch IE6. Stop supporting it, tell people to upgrade. Without + and > selectors, IE6 is too frightened to go anywhere that isn’t within sight of a class or ID.

IE7 does support + and >, and whilst it is lacking in many other areas, it has the necessary basics to write good HTML/CSS. Check what your targeted browsers support.


Once you have made a decent HTML4 site, look at the HTML5 specification and it will make sense; you will know what to do with it.


Kind regards,

Improved Title Case Function for PHP

John Gruber originally made available his script to title-case text, working around the fringe cases.

From this, a number of ports of the script were made, of which David Gouch’s JavaScript port is particularly noteworthy: it was smaller, simpler and handled more fringe cases.

I’ve ported this to PHP and put it to use on this site. My version is based on David Gouch’s JavaScript port, unlike the WordPress port, which is, frankly, crap.

Code below.

//original Title Case script © John Gruber <daringfireball.net>
//javascript port © David Gouch <individed.com>
//PHP port of the above by Kroc Camen <camendesign.com>

//this keeps PHP's `mb_*` functions (such as `mb_strtolower`/`mb_strtoupper`) from breaking unicode characters in your titles
//you can place this near the top of your script, or within the function itself
mb_internal_encoding ("UTF-8");

function titleCase ($s_title) {
	//remove HTML, storing it for later
	//         html elements to ignore  | tags  | entities
	$regex = '/<(code|var)[^>]*>.*?<\/\1>|<[^>]+>|&#?\w+;/';
	preg_match_all ($regex, $s_title, $html, PREG_OFFSET_CAPTURE);
	$result = preg_replace ($regex, '', $s_title);
	
	//break by punctuation, find the start of words
	//(PREG_OFFSET_CAPTURE gives byte offsets, so the look-arounds on $result use byte-based `substr`)
	preg_match_all ('/[\w&`\'‘’"“\.@:\/\{\(\[<>_]+-? */', $result, $matches, PREG_OFFSET_CAPTURE);
	foreach ($matches[0] as &$m) {
		//find words that should be lowercase
		if ($m[1]>0 && substr ($result, $m[1]-2, 1) !== ':' && preg_match (
			'/^(a(nd?|s|t)?|b(ut|y)|en|for|i[fn]|o[fnr]|t(he|o)|vs?\.?|via)[ \-]/i', $m[0]
		)) {
			$m[0] = mb_strtolower ($m[0]);
			
		//brackets and other wrappers
		} elseif (preg_match ('/[\'"_{(\[]/', substr ($result, $m[1], 3))) {
			$m[0] = mb_substr ($m[0], 0, 1).mb_strtoupper (mb_substr ($m[0], 1, 1)).
				mb_substr ($m[0], 2)
			;
			
		//both of these cases are no change, thus if not matched fall back to capitalisation
		//(`max` stops the first word from wrapping round to look at the end of the title)
		} elseif (!(
			preg_match ('/[A-Z]+|&|[\w]+[._][\w]+/', mb_substr ($m[0], 1)) ||
			preg_match ('/[\])}]/', substr ($result, max (0, $m[1]-1), 3))
		)) {
			$m[0] = mb_strtoupper (mb_substr ($m[0], 0, 1)).mb_substr ($m[0], 1);
		}
		//substitute the change into the title
		$result = substr_replace ($result, $m[0], $m[1], strlen ($m[0]));
	}
	//restore the HTML
	foreach ($html[0] as $tag) $result = substr_replace ($result, $tag[0], $tag[1], 0);
	return $result;
}
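The densest line above is the lowercase test. To see what it actually catches, here is that regex pulled out into a standalone sketch (isSmallWord is a hypothetical name, not part of titleCase). Because each matched word carries its trailing space, the regex only fires when a small word is followed by a space or hyphen; as a side effect, the last word of a title (which has no trailing space) is never lowercased.

```php
<?php
// Sketch: the "small words" test from titleCase(), on its own.
// The trailing [ \-] means "and " matches but "android " does not.
function isSmallWord (string $word): bool {
	return preg_match (
		'/^(a(nd?|s|t)?|b(ut|y)|en|for|i[fn]|o[fnr]|t(he|o)|vs?\.?|via)[ \-]/i', $word
	) === 1;
}

foreach (['and ', 'android ', 'The ', 'vs. ', 'fox ', 'of'] as $word) {
	printf ("%-12s %s\n", "'$word'", isSmallWord ($word) ? 'lowercase' : 'capitalise');
}
```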

If anything is broken, please let me know.
Kind regards,

Gosh, the stuff you can do in FF3.1 is sick. Rotated, skewed, blurred and colour-adjusted <video> all with no slow down, using standards—

So sick, I don’t know how to make use of all this new technology without going back to 1996 and blinging everything to the max.

I should not have to type “http://” into any web form, ever. This is a fundamental usability flaw rife across the web.
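The fix belongs on the receiving end: accept whatever the user typed and add the scheme yourself. A minimal PHP sketch, assuming the field holds a web address and that defaulting to “http://” is acceptable (normaliseUrl is a hypothetical name):

```php
<?php
// Hypothetical form-handling helper: if the submitted address has no
// scheme, assume "http://" rather than rejecting the input.
function normaliseUrl (string $input): string {
	$url = trim ($input);
	if ($url === '') return '';
	// already has a scheme such as http://, https:// or ftp://
	if (preg_match ('~^[a-z][a-z0-9+.\-]*://~i', $url)) return $url;
	return 'http://'.$url;
}

echo normaliseUrl ('camendesign.com'), "\n";
```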

How to Use <abbr> in HTML5, and in General

Before I begin, I should confess that I am completely guilty of having never followed any of these rules in the past. However, the whole reason for writing this article was to solve that problem. Since moving to my new website back-end, I decided to go through the entire site’s contents with a fine brush and polish all of the code.

In doing that, I discovered how vague I was on the semantics of the abbr element. Working through all the test-cases that have sprung up in the wealth of HTML I’ve written for this site, I’ve documented here my new understanding of the often-abused abbr element.

Ⅰ. Abbreviations Are Not Dictionary Definitions

Let’s first define abbreviation clearly:

An abbreviation is where you have shortened one or more words into:
either one word, or an alternative phrase or acronym

The problem with the use of <abbr> so far has been that developers have assumed that every abbreviation and acronym has to be defined in full. This is incorrect.

BAD:	I made some <abbr title="American Standard Code for Information Interchange">ASCII</abbr> art.

An abbr element expands its contents into the desired spoken form. When you read a document, you naturally expand the abbreviations as appropriate in your mind.

GOOD:	Red <abbr title="versus">vs.</abbr> Blue

You would not read the abbreviated “et cetera” in “Granny went to the market and bought apples, bread & milk etc.” aloud as “eee-tee-see”, and so it should be with HTML abbreviations. Here are some examples:

BAD:	My <abbr title="Cascading Style Sheets">CSS</abbr> is tweaked almost daily.
GOOD:	My <abbr title="style sheet">CSS</abbr> is tweaked almost daily.

Here we’ve used the abbr element to span over an abbreviation and provide an alternative, natural way of reading the abbreviation.
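If you generate markup with a script, a small helper keeps this consistent: the written form goes inside the element, the spoken form goes in the title, and no title is emitted at all when there is no spoken alternative. A minimal PHP sketch (abbr() is a hypothetical helper, not part of any library; nullable types need PHP 7.1 or later):

```php
<?php
// Hypothetical helper: wrap an abbreviation in <abbr>, putting the *spoken*
// form in the title attribute. Omit $spoken and no title is emitted.
function abbr (string $written, ?string $spoken = null): string {
	$text = htmlspecialchars ($written, ENT_QUOTES, 'UTF-8');
	if ($spoken === null) return "<abbr>$text</abbr>";
	$title = htmlspecialchars ($spoken, ENT_QUOTES, 'UTF-8');
	return "<abbr title=\"$title\">$text</abbr>";
}

echo 'Red ', abbr ('vs.', 'versus'), " Blue\n";
echo 'price ', abbr ('!=', 'does not equal'), ' ', abbr ('TCO', 'total cost of ownership'), "\n";
```

This reproduces the two GOOD examples above; the escaping matters because a spoken form may contain quotes.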

GOOD:	price <abbr title="does not equal">!=</abbr> <abbr title="total cost of ownership">TCO</abbr>

We have adapted something unpronounceable as letters into something perfectly readable.

In general, abbreviations should maintain the grammar. Whilst not necessary, this example demonstrates how grammatical flow can be improved, whilst also expanding a Latin abbreviation:

Along the way, open-source has forgotten what it really means (<abbr title="that is,">i.e.</abbr> in real life) to give.

Try to communicate your intentions. If you would personally read something one way, define the abbreviation how you intend it to be read:

Switch to using the <abbr title="“wizzy-wig”">WYSIWYG</abbr> editor, instead.

In the example below, however, there’s an abbreviation, “CDs”, inside the abbreviation’s title:

<abbr title="recordable CDs">CD-Rs</abbr> and <abbr title="recordable DVDs">DVD±Rs</abbr> are susceptible to literal bit-rot.

Isn’t this wrong? No, because remember that the point of an abbreviation is to expand one phrase into another. The user is assumed to already know what a CD is; it doesn’t have to be spelt out for them.

This follows neatly into the next point: when and where to expand abbreviations at all…

Ⅱ. The title Attribute Is Optional

Oh man, this is so important. The abbr element is misused because almost everybody assumes that abbr elements must have a title attribute; after all, it’d seem pointless otherwise!

Your users do not need to know the definition of every single acronym and abbreviated technical term. In fact, they don’t care. They don’t have to know what the V in DVD stands for if they know a DVD when they see one.

Only title abbreviations that you expect people to read as the expanded form in their mind, or out aloud.

An abbr element without a title attribute should be used on any abbreviation / acronym that is written in all-capitals (unless you are providing a spoken alternative, like the WYSIWYG example from earlier), to communicate that the abbreviation is either unpronounceable as a word, or that it is capitalised—not for emphasis—but because each letter has an individual meaning. E.g.

The <abbr>FBI</abbr> are like the British <abbr>MI5</abbr>.
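This rule is mechanical enough to automate, for example in a Markdown-style text processor. A rough PHP sketch, assuming plain-text input with no existing HTML (markAcronyms is a hypothetical name): it wraps any run of two or more capitals or digits in a bare <abbr>. It cannot tell an acronym from a word capitalised for emphasis, and it knows nothing of spoken alternatives like the WYSIWYG example, so treat it as a first pass only.

```php
<?php
// Rough first pass, assuming plain text (no tags or attributes to skip):
// wrap all-capital words such as FBI or MI5 in an <abbr> with no title.
function markAcronyms (string $text): string {
	return preg_replace ('/\b([A-Z][A-Z0-9]+)\b/', '<abbr>$1</abbr>', $text);
}

echo markAcronyms ('The FBI are like the British MI5.'), "\n";
```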

Ⅲ. Citations Are Not Abbreviations

This one is very sneaky and can easily catch you out.

BAD:	The site will be built using <abbr title="Hypertext Pre-Processor">PHP</abbr>.

Firstly, this reads wrong; the abbreviation breaks the grammar. Secondly, remember that abbreviations are to communicate how things should be read, not to define terms.

But thirdly, it is not an abbreviation. It is not a section of the document that has been shortened or re-phrased by the author to fit their chosen grammar. It is not a personal rendering of words. The sentence is referring to a software product. This is a citation.

GOOD:	The site will be built using <cite>PHP</cite>.

Even though a cited name can be an abbreviation of something else, the name seals that abbreviation and turns it into a real word of sorts (a brand). Names that are already made from an abbreviation can then even be abbreviated themselves, since they behave as normal words! For example, “Mac OS X” is already an abbreviation of “Macintosh Operating System version Ten”, and people often abbreviate it further, calling it “OS X” or referring to the version number or name, “10.5” or “Leopard”.

What Counts as a Citation?

A citation represents the title of a work, where you are referring to it in the context of your sentence, or in passing. A work is defined as an intellectual human creation.

A work can be a book, a poem, a published piece of writing, a piece of art, a website, a song, a film, a TV show, a game &c. and also software.

However, this does not include people’s names, the name of a ship, or real products in general, such as a packet of crisps, a stereo or computer hardware.

There Are Exceptions

I won’t go into details, but there are exceptions here and there that mostly come down to context: whether you are referring to the work itself, or to the use of that work in a specific case. This is particularly true of broadly used technologies like HTML, CSS and PHP.

I am referring directly to the <cite>PHP</cite> language/technology.
My website’s <abbr>PHP</abbr> is small.

That said, details like this will boil down to personal taste, and it’ll never really hurt to just stick to using one element or the other for all such instances, regardless.

Ⅳ. Abbreviations Should Be Meek

An abbreviation is merely anything that is read differently from how it is written, and vice-versa. It does not need to be in-your-face, Javascript-powered “intelli-text”.

What if the Reader Does Not Know What a Technical Abbreviation / Acronym Means?

Isn’t the point of an abbr element so that these technical terms can be defined by hovering the mouse over the term?

There are two valid answers to this:

  1. That’s what the <dfn> element is for, and …
  2. It is not your responsibility to be an encyclopædia.
    Being paranoid about your reader’s abilities is just going to make your life difficult.

You, the author, only have to take the responsibility to know your audience and define those terms which you think they won’t know, or that you may be newly introducing to them.

If a user does not know a term, your website is not the only resource in existence where they can then find the definition! The user can easily google the term. In many browsers they can just right-click the word and choose to search the web for it. On a Mac, there’s a system-wide integrated dictionary you can access in a number of ways. There is no end to the ways a user can find out what a term means if they need to.

How to Style Your Abbreviations

The traditional way to style abbreviations is a grey dotted line, like so:

abbr	{border-bottom: 1px dotted #666;}

However, this was under the previous model of using abbr as some kind of inline dictionary. Abbreviations are for the benefit of screen readers, search engines and enthusiasts like me. Generally, abbreviations shouldn’t be styled at all.

That said, abbreviations still do provide a useful service by allowing readers to uncover how something should be read. We need a subtle approach that doesn’t fill the user’s screen with grey dotted lines, but at the same time does allow them to discover where you’ve provided reading “hints”.

The method I’m using is to only show the grey-dotted underline when the user’s mouse is within the paragraph containing the abbreviations, so that when the user moves their mouse into the surrounding text, the abbreviations (with titles) will be marked, and the user can hover over them to then see the tooltip.

*:hover>abbr[title]	{border-bottom: 1px dotted #666; cursor: help;}

Update: The above code only works with abbreviations directly within paragraphs; if the abbr element is wrapped in a link or any other kind of tag, the grey dotted line won’t appear until you hover directly over that tag. The new CSS below fixes this (where section is the element/ID containing your blog posts):
/* first, the immediate descendants of the content area are set to highlight abbreviations on hover, but avoiding lists; as I don’t want *all* abbreviations highlighted when you hover on a root list… */
section>*:not(ol):not(ul):not(dl):hover abbr[title],
/* …only when hovering on each list-item */
 p:hover abbr[title], li:hover abbr[title], dl>*:hover abbr[title] {
	border-bottom: 1px dotted #666; cursor: help;
}

I hope this article provides you with some practical guidance and enthusiasm.

If you spot any flaws in the HTML of my articles, please do contact me and let me know; I’ve got so many thousands of lines that I’m sure to have made mistakes everywhere. Also, you’re free to e-mail me if you’ve any questions about this article and using abbr, cite and HTML5 in general.

Special thanks goes to Adam of firsttube.com for reviewing the article whilst it was being prepared and spotting a number of flaws.