camen design

Under the Hood #5:
New Website-Ish

Welcome to Camen Design v0.2-ish. I’ve replaced the publishing code in the site, leaving the HTML5 & CSS intact. They will be replaced in the next update. I plan to target Firefox 3.1 (and hopefully Safari 4 may be out by then too), allowing me to make use of CSS animation/transitions and border-image.

In fact, because my site has its PHP / HTML / CSS entirely separated, any one can be replaced without touching a line of the others.

On the subject of future-proof CSS, I noted:

A CSS file is such that you can throw it away easily and start again. I could design my website any way I wanted without ever changing the HTML.

Clean and separated HTML/CSS means that parts can be replaced. That’s what future-proof is about – the ability to adapt to changes.

Being bit-rot proof is an entirely different matter!

That’s a different topic for another day though. This article is about the new back-end:

Clean URLs

In the previous version of the site, a PHP script handled the database, spitting out the data as requested. v0.2 is now all static XHTML5 files, ensuring faster load speed and better caching. The publishing script generates pre-gzipped “.xhtml” files for each of the articles, as well as each of the index pages. The home page is “1.xhtml”, the second page “2.xhtml” and so on. Each content-type is a folder, containing another set of numbered files for the pages. e.g. “blog/1.xhtml” and so on.

Mod_Rewrite is used to mask the “.xhtml” extension, so it isn’t required, giving nice looking URLs with no querystring as before. “/?blog&page=2” now becomes “/blog/2”.

The ‘.htaccess’ file I wrote now handles everything dynamic, applying the ‘application/xhtml+xml’ mime-type to the HTML, but falling back to ‘text/html’ for browsers that can’t deal with that.

I’ve opened up my ‘.htaccess’ file so you can view it fully, but a detailed break-down is covered below.

Serving Compressed XHTML5

A big problem with the old code was that a single PHP page was not being cached very well, relying on me manually setting all the HTTP-Headers for the various pages requested. In this new version each page is a separate file, and so Apache and your browser can handle things fine.

FileETag MTime Size
AddDefaultCharset utf-8

This declaration tells Apache which file attributes to build the ETag HTTP-Header from. The ETag identifies a particular version of the file, so that the browser knows when the file has actually changed. Apache sends ETags automatically anyway, but uses the default “MTime INode Size”, which ties the ETag to the file’s inode (where it is stored on the disk). If you were to upload the same file again, despite its contents not changing, Apache would send a different ETag in that case.

# .xhtml files are gzipped html5 documents ready to serve
AddType application/xhtml+xml .xhtml
AddEncoding gzip .xhtml

This creates a new file-type “.xhtml”, and serves it as ‘application/xhtml+xml’ by default. Though it is possible to serve HTML5 as ‘text/html’ in Firefox 3 & Safari, the <legend> tag will not work correctly when used inside a <figure> element. This is due to the all-round broken-ness of the <legend> tag in all browsers (caused by pandering to IE’s even more broken implementation).

The publishing script uses the gzencode PHP function when saving the files to gzip the contents for bandwidth-savings and fast delivery. The “AddEncoding” declaration applies this to Apache, adding the necessary “Content-Encoding: gzip” HTTP-Header automatically.
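
As a rough illustration of the saving step described above, this sketch compresses a page with gzencode and writes it to disk. The function name `save_gzipped` and the compression level are assumptions made for the example; this is not the actual publishing script.

```php
<?php
// hypothetical helper, not taken from the real publishing script:
// gzip a string of HTML and save it, ready for Apache to serve as-is
function save_gzipped ($file_path, $html) {
	// level 9 is the strongest compression; the file is written once and read often
	$compressed = gzencode ($html, 9);
	if ($compressed === false) return false;
	return file_put_contents ($file_path, $compressed) !== false;
}
```

Because the file is compressed once at publish time, Apache has no compression work left to do per request.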

# load page 1 by default
DirectoryIndex 1.xhtml index.php index.html

The home page is just page 1 of however many pages of the full archive of the site. Therefore “1.xhtml” is set as the default page to go to in a folder so that “/art/” returns “/art/1.xhtml”.

# if the url contains the ".xhtml", show the source code
RewriteCond %{THE_REQUEST} \b([^\.]*[^/])\.xhtml\b
RewriteRule ^ - [T=text/plain,L]

Viewing the HTML source of the pages on this site is an integral part of its design, so I wanted to make it very easy to do so. Just click the “html” link at the top of any page. The code above checks if the URL typed into the browser had the “.xhtml” included and if so, keeps the URL as is, but serves it as “text/plain” instead, preventing the browser from rendering the HTML.

# leave the ".xhtml" off (clean urls)
RewriteRule ^([^\.]*[^/])$ /$1.xhtml [L]

This matches any URL that has no dot in it (so no file-extension) and does not end in a slash, however many folders deep. It then rewrites the URL to append “.xhtml” as the actual file to return, so that “/blog/hello” secretly returns the file “/blog/hello.xhtml”.
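
To see which paths the pattern catches, here is the same regular expression tried out in PHP. This is only a sketch for experimenting: the function name `clean_url_target` is made up, and it assumes the path is given without its leading slash, which is how mod_rewrite presents it to rules in an ‘.htaccess’ file.

```php
<?php
// hypothetical test-bed for the RewriteRule pattern; mod_rewrite itself is not involved
function clean_url_target ($path) {
	// same pattern as the rule: no dots, and the last character is not a slash
	if (preg_match ('#^([^\.]*[^/])$#', $path, $m)) return '/'.$m[1].'.xhtml';
	// no match: the rule would leave this URL alone
	return null;
}
```

Here “blog/hello” gives “/blog/hello.xhtml”, whilst “blog/” and “design/design.css” give null and are left for the other rules (or DirectoryIndex) to deal with.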

# although I don’t support IE, I do have to fall back to text/html,
# otherwise it will try and download the page instead of rendering it
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml
RewriteCond %{REQUEST_FILENAME} .*\.xhtml
RewriteRule ^ - [T=text/html,L]

As described, this will check the browser capabilities to see if it does not accept ‘application/xhtml+xml’ and revert to ‘text/html’. If this is not done, IE will try and download the file instead of showing it. In 2008.
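
The decision these three lines make can be written out in PHP, purely to show the logic; the site itself does this in ‘.htaccess’ and the function name `xhtml_content_type` is an assumption for the example.

```php
<?php
// illustration only: the site makes this decision in `.htaccess`, not in PHP
function xhtml_content_type ($accept_header) {
	// a plain substring test, mirroring the unanchored RewriteCond pattern
	return strpos ($accept_header, 'application/xhtml+xml') !== false
		? 'application/xhtml+xml'
		: 'text/html'
	;
}
```

A browser that lists ‘application/xhtml+xml’ anywhere in its Accept header gets the real thing; anything else, including an empty header, falls back to ‘text/html’.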

Compressed CSS

# "csz" compressed CSS filetype
AddType text/css .csz
AddEncoding gzip .csz

As with the “.xhtml” definition, this creates a “.csz” filetype of mime-type “text/css”, and default gzip (compressed) encoding. The publishing script takes the normal ‘design.css’ file and spins off a compressed copy ‘design.csz’.

# on my localhost, don’t use a cached CSS file
RewriteCond %{DOCUMENT_ROOT} "^/Users/kroc/Sites/Camen Design/upload"
RewriteRule ^design/$ /design/design.css [L]
RewriteRule ^design/$ /design/design.csz [L]

When I’m editing the website on my computer, I’m refreshing constantly to see new CSS changes. This code checks if the webroot is that of my Mac’s localhost and passes the standard ‘.css’ file and stops processing. The next line passes the compressed CSS file in the case the document root match was not made (live server).

Compressed RSS

The publishing script creates a compressed ‘rss.rsz’ file in each folder and in the site root.

AddType application/rss+xml .rsz
AddEncoding gzip .rsz

RewriteRule ^([a-z0-9-]+/)?rss$ /$1rss.rsz [L]

This redirects URLs ending in “rss” to the compressed “rsz” file. e.g. “/tweet/rss” becomes “/tweet/rss.rsz”.

Static Publishing

When I mentioned my plans for v0.2, I noted one particular fallacy:

A simple text field is never going to replicate the editing power I have with TextMate. I’ve got no search and replace, no syntax highlighting, no keyboard shortcuts.

Trying to add these things is just re-implementing the wheel, and thus breaks my very own design principle №3, Let Everybody Else Do Their Job

Therefore, I removed the administration interface, in favour of a Laguna 2 (sadly offline now) style system.

The publish script is available to view online, but is not much use out of context. You can download a stub copy of this website with everything necessary to roll your own using the enclosure in your RSS reader, or the attachment at the bottom of this article.

Content on Disk

The source content of this website is just a folder, with a sub folder for each of the “content-types” (blog | tweet | photo &c.). In each folder is a file containing a JSON meta-data header and the raw HTML of the article. This layout directly maps to the new clean URLs too.

camen design v0.2’s data folder layout

Creating a new blog post is nothing more than creating a new file. Because content is now disk files, instead of database entries, I can use my text-editor’s global search and replace and HTML editing capabilities and I can use my O.S. to manage the files instead of having to implement more and more server-side administration pages to do the same thing.

Now I can blog the same way I create the stuff I blog about.

Inside a Content Entry

A content file looks like this inside: (this one is for this article)

{	"date"		:	200807101232,
	"updated"	:	200807101232,
	"title"		:	"Under The Hood #3: ¬Using A Quick &amp; Easy SQLite Database",
	"licence"	:	"cc-by",
	"tags"		:	["code-is-art", "web-dev"],
	"enclosure"	:	"sqlite.php"
}

HTML content goes here...

Pretty self-explanatory. When creating a new article, the “date” and “updated” fields are left out, and the publishing script then adds them in automatically. If I want to mark an article as updated, and thus push it to the top of the RSS regardless of its original publishing date, I just delete the “updated” line and the publishing script puts in a new timestamp.
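
To make the header handling concrete, here is a rough sketch of how such a file could be read and the missing timestamps filled in. It is not the actual publishing script: the function name `parse_entry`, the rule used to find the end of the header, and the `$now` parameter are all assumptions for illustration.

```php
<?php
// hypothetical reader for a content file: a JSON header followed by raw HTML
function parse_entry ($raw, $now = null) {
	// assumption: the header ends at the first "}" that begins a line
	$split = strpos ($raw, "\n}");
	if ($split === false) return null;

	$meta = json_decode (substr ($raw, 0, $split + 2), true);
	if (!is_array ($meta)) return null;

	// timestamps are YYYYMMDDHHMM numbers (this needs 64-bit integers)
	if ($now === null) $now = (int) date ('YmdHi');
	if (!isset ($meta['date']))    $meta['date']    = $now;
	if (!isset ($meta['updated'])) $meta['updated'] = $now;

	// everything after the header is the HTML of the article
	return array ($meta, ltrim (substr ($raw, $split + 2)));
}
```

Deleting the “updated” line from a file and running it through this again would stamp it with the current time, which is the behaviour described above.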


The attached zip file is updated every time I publish, so it always contains the most up to date code.

Under the Hood #4:
Getting a File’s Mime-Type From Apache

There is no sane way to get a file’s mime-type in PHP.
Firstly, the mime_content_type function is deprecated and not installed by default in PHP5.
Secondly, the FileInfo PECL extension is not installed by default and can be insanely difficult to install.

Thirdly, you can use a Unix call:

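//`escapeshellarg` quotes the path safely, even if it contains spaces or quote marks
$mime_type = exec ('file -i -b '.escapeshellarg ($file_path));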

But that only works on Linux / BSD / Mac systems and not for Windows users, and it also requires that if you’re on a shared webhost that they allow you to use the exec command. This method also doesn’t always give accurate results.

Lastly of course, you could write a simple function to look up a file-extension and return the appropriate mime-type:

function mimeType ($extension) {
	switch ($extension) {
		case 'gif':	return 'image/gif';			break;
		case 'jpg':	return 'image/jpeg';			break;
		case 'png':	return 'image/png';			break;
		case 'pdf':	return 'application/pdf';		break;
		case 'zip':	return 'application/zip';		break;
		case 'exe':	return 'application/octet-stream';	break;
	}
}

But this is hardly automatic, requires constant maintenance when you come across new types that enter your system, and looks plain ugly in your code. It lacks elegance.


I found a cute solution to this problem through looking at the headers returned by Apache. For every file you access, Apache sets various HTTP Headers. Here’s an example from a photo:

Date: Sat, 12 Jul 2008 11:24:33 GMT
Server:	Apache/2
Last-Modified: Tue, 17 Jun 2008 13:34:34 GMT
Etag: "2678b24-5edfe-cdfaae80"
Accept-Ranges: bytes
Content-Length: 388606
Content-Type: image/jpeg

And lo, Apache is returning the mime-type automatically for any file-type. Since PHP has the ability to access remote URLs with many of its functions, we could theoretically ask PHP to ping the local file we want to know the mime-type of and retrieve the relevant information from the HTTP headers.

This is indeed possible: the get_headers function will give the HTTP Headers returned from a URL request as an array (an associative one if the second parameter is “1”). This function does not work without a full URL (“http://”), and therefore in order to get the mime-type of a local file, it needs to be in a publicly available location (even if only temporarily). You just prepend your domain name to the file path.

Here’s the final example:

$domain = 'http://camendesign.com';
$file_path = 'data/content-media/photo/DSC00013.jpg';

$url_headers = get_headers ("$domain/$file_path", 1);
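//`reset` takes its argument by reference, so the exploded array needs a variable of its own
$content_type = explode (';', $url_headers['Content-Type']);
$mime_type = reset ($content_type);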

echo ($mime_type);

Which correctly outputs:

image/jpeg

Apache sometimes returns a Content-Type value with the character set appended, e.g. “text/html;charset=UTF-8”, so “explode (';', …)” is used to break this apart, and reset returns the first array element.
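
The tidying-up of the Content-Type value can be pulled out into a small function of its own. This is a sketch under assumptions: the name `mime_from_content_type` is made up, and it allows for get_headers handing back an array of values when a header is repeated (as I believe happens when the request is redirected), taking the last one as the final response.

```php
<?php
// hypothetical helper: reduce a Content-Type value from `get_headers` to the bare mime-type
function mime_from_content_type ($content_type) {
	// repeated headers arrive as an array; assume the last entry is the final response
	if (is_array ($content_type)) $content_type = end ($content_type);
	// drop any parameters such as ";charset=UTF-8"
	$parts = explode (';', $content_type);
	return strtolower (trim ($parts[0]));
}
```

It would be called as `mime_from_content_type ($url_headers['Content-Type'])` in the example above.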

Limitations

There are a lot of issues with this practice, which is why it is not used on this site:


Update: Using the ‘Mime.types’ File

Hiếu Hoàng writes:

Debian’s default lighttpd.conf executes a
“/usr/share/lighttpd/create-mime.assign.pl” file to get mime-type assignments.
The “/etc/mime.types” which it uses is from the package mime-support.

This would alleviate writing the look-up function.

Sample output:

$ /usr/share/lighttpd/create-mime.assign.pl
mimetype.assign = (
	".ez" => "application/andrew-inset",
	".anx" => "application/annodex",
	".atom" => "application/atom+xml",
	".atomcat" => "application/atomcat+xml",
	".atomsrv" => "application/atomserv+xml",
	[....]
	".avi" => "video/x-msvideo",
	".movie" => "video/x-sgi-movie",
	".mpv" => "video/x-matroska",
	".ice" => "x-conference/x-cooltalk",
	".sisx" => "x-epoc/x-sisx-app",
	".vrm" => "x-world/x-vrml",
)

This file isn’t included in the Mac OS X PHP distribution, but I’ve copied it below with some small modifications (Output a PHP array, and Mac OS X’s ‘mime.types’ file is in ‘/etc/apache2/’ instead of ‘/etc/’).

#!/usr/bin/perl -w
# (I don’t know Perl, and how to colour this right)
use strict;
open MIMETYPES, "/etc/apache2/mime.types" or exit;
print "\$mime_types = array (\n";
my %extensions;
while(<MIMETYPES>) {
	chomp;
	s/\#.*//;
	next if /^\w*$/;
	if(/^([a-z0-9\/+-.]+)\s+((?:[a-z0-9.+-]+[ ]?)+)$/) {
		foreach(split / /, $2) {
			# mime.types can have same extension for different
			# mime types
			next if $extensions{$_};
			$extensions{$_} = 1;
			print "\".$_\" => \"$1\",\n";
		}
	}
}
print ");\n";

This could then be used to quickly put together a comprehensive (though massive) look-up function. I’ve saved a copy of the output of this script, enclosed at the bottom of this article.

The ‘mime.types’ file is interesting as there may be a way to write a really small function (probably regex) to pull out a mime-type on request, which would be far more elegant (and practical) than my own solution. I’ll have to give that some thought.
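
One possible shape for such a function, offered as a sketch rather than a finished answer: it scans the ‘mime.types’ file line by line instead of using one big regex. The function name `mime_type_from_file` is an assumption, and the default path is simply the Mac OS X location mentioned above.

```php
<?php
// hypothetical look-up: find the mime-type for a file-extension in a `mime.types` file
function mime_type_from_file ($extension, $mime_types_path = '/etc/apache2/mime.types') {
	$lines = @file ($mime_types_path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
	if ($lines === false) return null;

	foreach ($lines as $line) {
		// strip comments, skip lines that are then empty
		$line = trim (preg_replace ('/#.*/', '', $line));
		if ($line === '') continue;
		// first field is the mime-type, the rest are its extensions (if any)
		$fields = preg_split ('/\s+/', $line);
		$type   = array_shift ($fields);
		if (in_array (strtolower ($extension), $fields, true)) return $type;
	}
	//extension not listed
	return null;
}
```

Like the Perl script, this takes the first mime-type listed for an extension. It re-reads the file on every call, so anything calling it in a loop would want to cache the result.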

An Example of Code-Is-Art,
HTML5 + MathML + SVG

This is great. A beautiful hand-typed example of practical web design; Dana Lee Ling writes test papers for his college algebra and statistics courses in XHTML, with SVG graphs and MathML equations.

He also maintains a brilliantly pure resolve to never break a link:

Ted Nelson always objected that the world wide web is not what he envisioned when he coined the term hypertext in 1965. Like Vannevar Bush’s Memex envisioned in 1945, the links would be paths that do not “break.” Some form of massive distributed database would, theoretically, have kept track of material and kept links from breaking when material was “moved” or reorganized. Of course that would not be possible nor desirable.

My own paean, however, to Ted is to never move or rename a page once I put it up - even if I have misspelled the file name. Yes, I could use a meta to redirect, but my simpler solution is to not move a page. Nor reorganize my site. Organic growth unanticipated in 1998 has complicated site management, but that’s life - complicated in places.

My hat goes off to him for having such beautiful code tucked away in places we’d never think to look.
I don’t even have any SVG in my site, and certainly not the brains to use MathML, so please give some attention to the wonderful work within his site.

Under the Hood #3:
Using a Quick & Easy SQLite Database

  1. The Database Class
  2. Instantiating
    1. Making Queries
      1. Standard Queries
      2. Executing Without Results
      3. Returning An Array
      4. Returning A Single Value
      5. Return A Flat Array Of Single Values
      6. Compile A Query For Re-Use
    2. Limitations

When it came to writing this website, the thought of using a MySQL database made me laugh. There is nothing lightweight about MySQL. It is difficult to set up, difficult to maintain, difficult to fix, difficult-everything.

I set myself the goal of having no more than 10 fields in the database. There are 7.

It therefore made sense to use SQLite, a micro database system available to PHP that saves all its data into a file on disk, rather than through a client↔server architecture. It is quick, easy to use, and requires minimal fuss to operate.

To begin with I was using an excellent SQLite wrapper for PHP, “SQLiteDB”. Whilst this is functionally very good, there were a number of issues I wanted to solve in my design that would mean modifying SQLiteDB heavily, and my website is supposed to be lightweight, using only the lines of code needed, and there was much I wasn’t using in SQLiteDB.

These considerations were:

  1. The database must be able to create itself, without having to connect and run an SQL check every page-load

  2. The database code must only connect to the database when necessary to reduce load.
    Connecting every page-load when the database isn’t used for that page is no good

  3. Very compact code

The Database Class

This is my SQLite database class that allows you to create and interact with SQLite databases with ease:
(I’ll break the code down and give examples of use afterwards)

This code requires PHP’s PDO to be installed and enabled in your environment, which is the default.

class database {
	//query types supported, see the `query` method for descriptions
	const query_standard     = 0;
	const query_array        = 1;
	const query_single       = 2;
	const query_single_array = 3;
	const query_prepare      = 4;
	
	private $filepath;
	private $handle;
	private $sql;
	
	function __construct ($filepath, $sql = '') {
		$this->filepath = $filepath;
		$this->sql      = $sql;
	}
	
	private function connect () {
		//does the database file already exist on disk?
		$exists = file_exists ($this->filepath);
		//connect to the database (automatically creates the file on disk if it doesn’t exist)
		$this->handle = new PDO ('sqlite:'.$this->filepath);
		
		//if the database is new, build the tables from the sql originally passed to the class
		if (!$exists) $this->exec ($this->sql);
	}
	
	//execute sql statement(s) without returning a recordset. instead returns the number of rows affected, or false on failure
	public function exec ($sql) {
		//no connection is made to the database until a query is made
		if (!isset ($this->handle)) $this->connect ();
		return $this->handle->exec ($sql);
	}
	
	public function query ($sql, $mode = self::query_standard) {
		//no connection is made to the database until a query is made
		if (!isset ($this->handle)) $this->connect ();
		
		return 	//return the entire results as an array
			$mode == self::query_array  ? $this->handle->query ($sql)->fetchAll (PDO::FETCH_NUM) : (
			//return just the value of the very first column of the first row
			$mode == self::query_single ? $this->handle->query ($sql)->fetchColumn () : (
			//return a flat array of the first value of each row
			$mode == self::query_single_array
			? $this->handle->query ($sql)->fetchAll (PDO::FETCH_COLUMN) : (
			//compile an sql query for repeat execution
			$mode == self::query_prepare ? $this->handle->prepare ($sql)
			//else: return a standard result set
			: $this->handle->query ($sql, PDO::FETCH_NUM)
		)));
	}
	
	function __destruct () {
		$this->handle = null;
	}
}

Instantiating

To create a database / connect to an existing database and pre-populate it with some SQL if it doesn’t exist, instantiate a copy of the class and provide a filepath to a .sqlite file (in a writeable directory), and some SQL statements (separated by semi-colons).

Here’s how it’s done on this site:

$database = new database (_root.'/data/content.sqlite',
	'CREATE TABLE [content] ('.
		'[when]      INT PRIMARY KEY,'.	//INT instead of INTEGER, disables auto-numbering of primary key
		'[updated]   INTEGER,'.		//last edit timestamp (for the RSS) YYYYMMDDHHMM
		'[title]     TEXT,'.		//html
		'[content]   TEXT,'.		//html
		'[tags]      TEXT,'.		//"|tag|tag|tag|tag|"
		'[enclosure] TEXT'.		//"mime-type;filename;preview_filename"
	');'.
	'CREATE TABLE [tags] ('.
		'[tag] CHAR(20) PRIMARY KEY'.
	');'
);

The $database variable is set to an instance of the database class, and the SQL string containing two CREATE TABLE commands is provided, should the database not already exist on the disk.
See the SQLite website for help on the SQL syntax.

Inside the database class, the filepath and SQL statement are saved for later, no action is taken at all. We do not connect to the database yet, nor create it, until such a task is absolutely required.

function __construct ($filepath, $sql = '') {
	$this->filepath = $filepath;
	$this->sql      = $sql;
}

Making Queries

Standard Queries

This code returns the date, tags and title of 3 blog entries from the website. An object-orientated PDOStatement is returned that you can manipulate, or loop over using a regular foreach.

$rows = $database->query (
	'SELECT [when], [tags], [title] FROM [content] ORDER BY 1 DESC LIMIT 3;',
	database::query_standard
);

print_r ($rows);

foreach ($rows as $row) {
	print_r ($row);
}

Which outputs:

PDOStatement Object
(
    [queryString] => SELECT [when], [tags], [title] FROM [content] ORDER BY 1 DESC LIMIT 3;
)
Array
(
    [0] => 200807081323
    [1] => |code|cc-by|code-is-art|web-dev|
    [2] => Under The Hood #2: <br />Internal / External Links, The CSS3 Way
)
Array
(
    [0] => 200807051806
    [1] => |code|cc-by|code-is-art|web-dev|
    [2] => Under The Hood #1: <br />Is A PNG 32-Bit? In One Line
)
Array
(
    [0] => 200807050142
    [1] => |blog|cc-by|web-dev|
    [2] => Real-World Test Successful
)

Executing Without Results

When you’re doing INSERT, UPDATE or DELETE queries there’s no recordset returned. The exec method allows you to execute an SQL statement and get back only the number of rows affected, or false if it failed (so compare against false with “===”, as a successful statement can affect 0 rows). For example, this just empties the content table.

$database->exec ('DELETE FROM [content];');

Returning an Array

By using database::query_array, a normal PHP array will be returned of all the columns and rows. It’s not recommended to do this on large data-sets as it would fill up PHP’s memory and impact performance.

print_r ($database->query (
	'SELECT [when], [tags], [title] FROM [content] ORDER BY 1 DESC LIMIT 3;',
	database::query_array
));

This outputs:

Array
(
    [0] => Array
        (
            [0] => 200807081323
            [1] => |code|cc-by|code-is-art|web-dev|
            [2] => Under The Hood #2: <br />Internal / External Links, The CSS3 Way
        )

    [1] => Array
        (
            [0] => 200807051806
            [1] => |code|cc-by|code-is-art|web-dev|
            [2] => Under The Hood #1: <br />Is A PNG 32-Bit? In One Line
        )

    [2] => Array
        (
            [0] => 200807050142
            [1] => |blog|cc-by|web-dev|
            [2] => Real-World Test Successful
        )

)

Returning a Single Value

If you’re only interested in the very first value in the first row, you can return just that value without it being wrapped in an array or object. This example echoes the title of a particular blog entry.

echo ($database->query ('SELECT [title] FROM [content] WHERE [when]=200806181021;', database::query_single));

And outputs:

Hello.

Return a Flat Array of Single Values

There are instances where you wish to get the values from a single column from a number of rows. For example, I wish to retrieve just a list of the titles from a few blog entries; if I used database::query_array, like this:

print_r ($database->query (
	'SELECT [title] FROM [content] LIMIT 3;', database::query_array
));

It would give me a difficult to use array:

Array
(
    [0] => Array
        (
            [0] => The Real Reason Microsoft About-Faced on IE8 Standards Opt-In
        )

    [1] => Array
        (
            [0] => Will Microsoft please stop pulling the damn strings of A List Apart please. …
        )

    [2] => Array
        (
            [0] => Why don’t cans of paint have the colour in hexadecimal printed on them?
        )

)

What I really need is to flatten this array. The database class has a database::query_single_array type to do this for you.

print_r ($database->query (
	'SELECT [title] FROM [content] LIMIT 3;', database::query_single_array
));

Which now gives:

Array
(
    [0] => The Real Reason Microsoft About-Faced on IE8 Standards Opt-In
    [1] => Will Microsoft please stop pulling the damn strings of A List Apart please. …
    [2] => Why don’t cans of paint have the colour in hexadecimal printed on them?
)

Compile a Query for Re-Use

If you have an SQL statement that you have to execute over and over, but with different values, you can prepare an SQL statement with blanks that you can fill in afterwards each time you execute it.
Refer to the PHP Manual for instructions on how to prepare SQL statements.

Using my class, just call the query method with the statement to prepare, using database::query_prepare as the query type to get a compiled PDOStatement object back.
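
A minimal sketch of preparing and re-using a statement. It uses PDO directly against an in-memory database so that it runs on its own, and the table and titles are cut down from the examples above; the function name `prepared_insert_demo` is made up. With the class, `$database->query ($sql, database::query_prepare)` should hand back the same kind of PDOStatement object to call `execute` on.

```php
<?php
// uses PDO directly (not the database class); requires the pdo_sqlite extension
function prepared_insert_demo () {
	$pdo = new PDO ('sqlite::memory:');
	$pdo->exec ('CREATE TABLE [content] ([when] INT PRIMARY KEY, [title] TEXT);');

	// compile once, with named blanks to fill in on each execution
	$insert = $pdo->prepare ('INSERT INTO [content] ([when], [title]) VALUES (:when, :title);');
	$entries = array (
		array ('200806181021', 'Hello.'),
		array ('200807050142', 'Real-World Test Successful')
	);
	foreach ($entries as $entry) {
		$insert->execute (array (':when' => $entry[0], ':title' => $entry[1]));
	}

	// newest first, as a flat array of titles
	return $pdo->query ('SELECT [title] FROM [content] ORDER BY [when] DESC;')->fetchAll (PDO::FETCH_COLUMN);
}
```

The values are passed separately from the SQL, so there is no quoting or escaping to get wrong.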

Limitations

SQLite only supports a subset of what MySQL does, and so it’s best for where you want to keep things simple. If you have serious data to crunch then consider MySQL, but if you just want to store some HTML and get it back, SQLite offers a light and simple way to do it.

On some hosted web-servers, PDO may not be installed or enabled. If not, you can use the built-in procedural SQLite 2 commands in PHP. Here’s my older version of the database class ;) sqlite2.php

Under the Hood #2:
Internal / External Links, the CSS3 Way

Edit #4: Added TinyURL to favicon list.
Edit #3: Added “mailto:” and “itms:” to scheme. Bug fixes.
Edit #2: External icon overriding file icon on external files. Opera fixes.
Edit #1: Updated CSS to prevent Safari preloading all the favicons.

If, unlike me, you don’t have a sixth-sense that means you know when a link will be internal / external, or will open in a new window, it is increasingly common practice to add little images to links to show that they lead to external websites. The benefit is that one can queue up external links in background-tabs, or avoid things one is not interested in.

For example, external links in Wikipedia:

A screenshot of links on a Wikipedia article. image from

There are a number of things I wish to alert my users to on this website through the link scheme.
Float over each of the example links for a demonstration.

An internal link, to another page on this site
The dotted-line is used to signify a weaker hyperlink that does not break the boundary of this website.
A link to an external website
A normal hyperlink is used to represent the interlinked web, going from one site to another.
The image is not shown until the mouse is placed over the link so as to reduce visual clutter and to not add stutter to the reading rhythm. The image juts out to the left, so as to not cause the text to spasm about and break the reading flow, nor is it placed to the right where it may cover up the next word, and thus also break the reading flow.
Some links to popular websites, and a redirect
Adding the favicon to external links to some sites will help the user recognise what sort of content they will be led to. In some cases this will better help them decide if the link is useful or a waste of time, and what sort of context is meant when the link text is not descriptive of what it is.
A link to an email address, and an application protocol
Links to other protocols may cause programs on the user’s computer to launch, or may require them to copy & paste the link into a piece of software.
A direct link to a file, rather than a webpage
Links will behave differently when leading directly to a file. Users need to be made aware of this, especially if they want to avoid PDF links, or to proceed with due caution. A link to a file is a lot like an enclosure in an email; it should be distinctly marked with some icon to show its type. As a bonus, if you’re using Firefox, that icon above will be the one from your computer for that filetype.

Beginning With Good Markup

Although all of this can be done without any markup beyond the href, the CSS would be 10× larger. It could be done with a set of CSS classes, but then this website has no classes, and ultimately these link effects should be zero-maintenance and automatic.

We can reduce the length of selectors needed by using a few HTML attributes that have been around for ages.

<a href="http://…" rel="external" />
The rel attribute defines the relationship between this page and the linked page.
In this case we are stating that the page linked to is external.
<a href="a.pdf" type="application/pdf" />
The same as when specifying a stylesheet or javascript file in the <head>, you can provide the mime-type of the content being linked to.

These are both clean and meaningful ways to mark up links in a way that robots can understand. They don’t rely upon class names, which tie you to your design and won’t work interchangeably with syndicated content on other people’s sites.

I could type these extra attributes manually as I write my articles, but I knew that I’d miss one or two here and there, and I’d prefer something a bit more automatic.

Automatic Markup With PHP

Here is some code that searches for links starting with “http” and adds “rel="external"” to the tag. Internal links are relative (e.g. “href="?blog"”) and don’t contain my domain name, but the code can be easily modified to look for links that don’t start with your own domain name if your CMS always writes full URLs, even to internal pages.

//add `rel="external"` to outside links:
$content = preg_replace_callback (
	//this finds links that begin with a protocol, e.g. "http"
	'/<a[^>]*href="(?:[a-z]+):[^"]+"[^>]*>/',
	//this does the substitution, either adding a rel attribute, or appending "external" to an existing one
	create_function ('$m',
		'return (strpos($m[0],"rel=\"")!==false)'.		//does 'rel="..."' already exist?
		'?str_replace("rel=\"","rel=\"external ",$m[0])'.	//insert "external" into `rel`
		':str_replace("<a ","<a rel=\"external\" ",$m[0]);'	//add `rel="external"`
	), $content
);
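
create_function was the way to do this when the code above was written. It has since been removed from PHP (as of PHP 8), so here is a sketch of the same substitution using an anonymous function, which needs PHP 5.3 or later. The wrapper name `mark_external_links` is an assumption; the regex and replacement logic are unchanged.

```php
<?php
// hypothetical wrapper around the same regex, using a closure instead of `create_function`
function mark_external_links ($content) {
	return preg_replace_callback (
		//links that begin with a protocol, e.g. "http"
		'/<a[^>]*href="(?:[a-z]+):[^"]+"[^>]*>/',
		function ($m) {
			return strpos ($m[0], 'rel="') !== false
				? str_replace ('rel="', 'rel="external ', $m[0])	//insert "external" into `rel`
				: str_replace ('<a ', '<a rel="external" ', $m[0])	//add `rel="external"`
			;
		},
		$content
	);
}
```

Relative links such as “href="?blog"” have no protocol, so they pass through untouched.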

The second example here, is finding links that lead directly to a file, rather than a page:

//add 'type="mime/type"' to links in the content:
$content = preg_replace_callback (
	//this regex finds links to the listed file types, and adds 'type="mime/type"'
	'/<a([^>]*)href="([^"]+)\.(gif|jpg|png|pdf|zip|exe)"([^>]*)>/',
	//this does the insertion, recreating the link, with the added attribute
	create_function ('$m',
		'return "<a type=\"".mimeType($m[3])."\"${m[1]}href=\"${m[2]}.${m[3]}\"${m[4]}>";'
	), $content
);

//the "mimeType" function called above, which returns a mime-type from a file-extension
function mimeType ($extension) {
	switch ($extension) {
		case 'gif':	return 'image/gif';			break;
		case 'jpg':	return 'image/jpeg';			break;
		case 'png':	return 'image/png';			break;
		case 'pdf':	return 'application/pdf';		break;
		case 'zip':	return 'application/zip';		break;
		case 'exe':	return 'application/octet-stream';	break;
	}
}
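
For newer versions of PHP, where create_function is no longer available, the type-adding step could look something like this sketch. The function name `add_link_types` is made up, and the switch is swapped for a plain look-up array, which is my own change; the regex and the rebuilt tag are the same as above.

```php
<?php
// hypothetical closure-based version: add 'type="mime/type"' to links that lead to files
function add_link_types ($content) {
	//look-up array in place of the `mimeType` switch
	$mime_types = array (
		'gif' => 'image/gif',       'jpg' => 'image/jpeg',
		'png' => 'image/png',       'pdf' => 'application/pdf',
		'zip' => 'application/zip', 'exe' => 'application/octet-stream'
	);
	return preg_replace_callback (
		'/<a([^>]*)href="([^"]+)\.(gif|jpg|png|pdf|zip|exe)"([^>]*)>/',
		//recreate the link with the added attribute
		function ($m) use ($mime_types) {
			return '<a type="'.$mime_types[$m[3]].'"'.$m[1].'href="'.$m[2].'.'.$m[3].'"'.$m[4].'>';
		},
		$content
	);
}
```

Adding a new file type means touching two places: the look-up array and the list of extensions in the regex.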

The CSS

Internal Links

For links that are not external… which is easy now that they are automatically marked up by the PHP.
(A description of the CSS3 selectors used can be found here)

a:not([rel~="external"]) {
	text-decoration: none; border-bottom: dotted 1px;
}

A colour is not given on the border-bottom property so as to keep the existing link colour - even the user’s chosen browser link colour if the link colour has not been overridden anywhere.

Links to Files

We will cover these next, as the CSS for external links makes reference to these.

a[type]		{padding: 0 5px 0 25px; text-decoration: none;
		 /* start with the default "unknown file-type" icon */
		 background: #dedede url("/design/icons/page_white.png") no-repeat 5px 50%;
		 /* rounded, borders. the bottom border is removed for if the link is internal */
		 -moz-border-radius: 4px; -webkit-border-radius: 4px; border-bottom: 0 !important;}
a[type]:hover	{background-color: #eea;}

/* these icons © Mark James, <famfamfam.com/lab/icons/silk> */
a[href$=".gif"], a[href$=".jpg"], a[href$=".png"]
		{background-image: url("/design/icons/page_white_picture.png");}
a[href$=".pdf"]	{background-image: url("/design/icons/page_white_acrobat.png");}
a[href$=".zip"]	{background-image: url("/design/icons/page_white_zip.png");}
a[href$=".exe"]	{background-image: url("/design/icons/application_xp_terminal.png");}

/* Firefox users will get their own native icons from their OS.
   I’m sure this can be done in Safari, but I don’t know how */
@-moz-document url-prefix() {
	/* `@-moz-document` isolates the following CSS for Firefox (gecko) only */
	/* get the "unknown file-type" icon from the OS */
	a[type]		{background-image: url("moz-icon://.?size=16");}
	/* and the other file type icons */
	a[href$=".gif"]	{background-image: url("moz-icon://.GIF?size=16");}
	a[href$=".jpg"]	{background-image: url("moz-icon://.JPG?size=16");}
	a[href$=".png"]	{background-image: url("moz-icon://.PNG?size=16");}
	a[href$=".pdf"]	{background-image: url("moz-icon://.PDF?size=16");}
	a[href$=".zip"]	{background-image: url("moz-icon://.ZIP?size=16");}
	a[href$=".exe"]	{background-image: url("moz-icon://.EXE?size=16");}
}

External Links

External links already have the underline as part of the defaults.

/* set the default external-link icon (this icon taken from Wikipedia) */
a[rel~="external"]:not([type]) {
	background: url('/design/icons/external.png') no-repeat 0 50%;
}
/* hide the icon when not hovering on the link (whilst keeping the icon on standby)
   `:not([type])` is needed to not break the file-links which already have an image */
a[rel~="external"]:not([type]):not(:hover) {
	background-image: none;
}
/* when you hover over the link, jut the favicon over the left side */
a[rel~="external"]:not([type]):hover {
	/* `background-color` is set to prevent text clashing with heavily transparent favicons, like Google’s */
	margin-left: -18px; padding-left: 18px; background-color: #fcfcfc;
}

/* some favicons for common websites I link to.
   the `:hover` is only required by Safari to prevent it from preloading these */
a[href*="apple."]:hover		{background-image: url('http://apple.com/favicon.ico');}
a[href*="archive.org"]:hover	{background-image: url('http://web.archive.org/favicon.ico');}
a[href*="deviantart."]:hover	{background-image: url('http://i.deviantart.com/icons/favicon.png');}
a[href*="google."]:hover	{background-image: url('http://google.com/favicon.ico');}
a[href*="osnews."]:hover	{background-image: url('http://osnews.com/favicon.ico');}
a[href*="php.net"]:hover	{background-image: url('http://static.php.net/www.php.net/favicon.ico');}
a[href*="slashdot."]:hover	{background-image: url('http://slashdot.org/favicon.ico');}
a[href*="tinyurl."]:hover	{background-image: url('http://tinyurl.com/favicon.ico');}
a[href*="wikipedia."]:hover	{background-image: url('http://en.wikipedia.org/favicon.ico');}
a[href*="youtube."]:hover	{background-image: url('http://s.ytimg.com/yt/favicon-vfl1123.ico');}

/* icons for other protocols */
a[href^="mailto:"]:hover	{background-image: url('/design/icons/email.png');}
a[href^="itms:"]:hover		{background-image: url('/design/icons/itms.png');}

Enjoy.

Limitations

Requires the :not CSS3 selector, available in Firefox, Safari & Opera 9.5.
As you can imagine, this does not work in IE. But then as you know, I don’t care.