This is a document written using ReMarkable, a shorthand syntax for generating HTML.
{ "date" : 200810291521,
"updated" : 201001011938,
"licence" : "cc-by",
"tags" : ["web-dev"]
}
<section>
# Improved Title Case Function for PHP #
<aside>
*Update:* _
Small words (“in”, “and” {&c.|et cetera}) now capitalise after an em- or en-dash.
</aside>
*John Gruber* originally <made available (//daringfireball.net/2008/05/title_case)> his script to Title Case text, which works around the fringe cases.
A <number of ports (//daringfireball.net/2008/08/title_case_update)> of the script followed. Particularly noteworthy is <David Gouch’s JavaScript port (//individed.com/code/to-title-case/)>, which is smaller, simpler and handles more <fringe cases (//individed.com/code/to-title-case/tests.html)>.
I’ve ported this to PHP and put it to use on this site. My version is based on David Gouch’s JavaScript port, ---unlike the <WordPress port (//files.nanovivid.com/wordpress/title-case.php)> which is, frankly, crap---. Ironically, there’s now a <WordPress port (//wordpress.org/extend/plugins/to-title-case/)> that uses my port. The circle is complete! `:P`
¬¬
Code below.
~~~ PHP ~~~>
//original Title Case script © John Gruber <daringfireball.net>
//javascript port © David Gouch <individed.com>
//PHP port of the above by Kroc Camen <camendesign.com>
function titleCase ($title) {
	//remove HTML, storing it for later
	//        HTML elements to ignore   | tags  | entities
	$regx = '/<(code|var)[^>]*>.*?<\/\1>|<[^>]+>|&\S+;/';
	preg_match_all ($regx, $title, $html, PREG_OFFSET_CAPTURE);
	$title = preg_replace ($regx, '', $title);
	//find each word (including punctuation attached)
	preg_match_all ('/[\w\p{L}&`\'‘’"“\.@:\/\{\(\[<>_]+-? */u', $title, $m1, PREG_OFFSET_CAPTURE);
	foreach ($m1[0] as &$m2) {
		//shorthand these- "match" and "index"
		list ($m, $i) = $m2;
		//correct offsets for multi-byte characters (`PREG_OFFSET_CAPTURE` returns *byte*-offset)
		//we fix this by recounting the text before the offset using multi-byte aware `strlen`
		$i = mb_strlen (substr ($title, 0, $i), 'UTF-8');
		//find words that should always be lowercase…
		//(never on the first word, and never if preceded by a colon)
		$m = $i>0 && mb_substr ($title, max (0, $i-2), 1, 'UTF-8') !== ':' &&
			!preg_match ('/[\x{2014}\x{2013}] ?/u', mb_substr ($title, max (0, $i-2), 2, 'UTF-8')) &&
			preg_match ('/^(a(nd?|s|t)?|b(ut|y)|en|for|i[fn]|o[fnr]|t(he|o)|vs?\.?|via)[ \-]/i', $m)
		?	//…and convert them to lowercase
			mb_strtolower ($m, 'UTF-8')
			//else: brackets and other wrappers
		: (	preg_match ('/[\'"_{(\[‘“]/u', mb_substr ($title, max (0, $i-1), 3, 'UTF-8'))
		?	//convert first letter within wrapper to uppercase
			mb_substr ($m, 0, 1, 'UTF-8').
			mb_strtoupper (mb_substr ($m, 1, 1, 'UTF-8'), 'UTF-8').
			mb_substr ($m, 2, mb_strlen ($m, 'UTF-8')-2, 'UTF-8')
			//else: do not uppercase these cases
		: (	preg_match ('/[\])}]/', mb_substr ($title, max (0, $i-1), 3, 'UTF-8')) ||
			preg_match ('/[A-Z]+|&|\w+[._]\w+/u', mb_substr ($m, 1, mb_strlen ($m, 'UTF-8')-1, 'UTF-8'))
		?	$m
			//if all else fails, then no more fringe-cases; uppercase the word
		:	mb_strtoupper (mb_substr ($m, 0, 1, 'UTF-8'), 'UTF-8').
			mb_substr ($m, 1, mb_strlen ($m, 'UTF-8'), 'UTF-8')
		));
		//resplice the title with the change (`substr_replace` is not multi-byte aware)
		$title = mb_substr ($title, 0, $i, 'UTF-8').$m.
			mb_substr ($title, $i+mb_strlen ($m, 'UTF-8'), mb_strlen ($title, 'UTF-8'), 'UTF-8')
		;
	}
	//restore the HTML
	foreach ($html[0] as &$tag) $title = substr_replace ($title, $tag[0], $tag[1], 0);
	return $title;
}
<~~~
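The byte-offset correction inside the loop is probably the least obvious step, so here is a minimal sketch of it in isolation. `charOffsets` is a hypothetical helper written for illustration, not part of the script above. It assumes UTF-8 input and the `mbstring` extension, and uses a simplified word pattern.

```php
<?php
//hypothetical helper (not part of `titleCase`): list each match of `$pattern`
//with the byte offset PCRE reports and the character offset derived from it
function charOffsets (string $text, string $pattern): array {
	preg_match_all ($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
	$result = [];
	foreach ($matches[0] as [$word, $byteOffset]) {
		//`substr` cuts at the byte offset; `mb_strlen` then counts the
		//characters (not bytes) that precede the match
		$charOffset = mb_strlen (substr ($text, 0, $byteOffset), 'UTF-8');
		$result[] = [$word, $byteOffset, $charOffset];
	}
	return $result;
}

//"é" takes two bytes in UTF-8, so the offsets diverge after "Café"
foreach (charOffsets ('Café au lait', '/[\w\p{L}]+/u') as [$word, $byte, $char]) {
	echo "$word: byte $byte, character $char\n";
}
```

Here “au” is reported at byte 6 but is at character offset 5, and “lait” at byte 9 but character offset 8. Passing the byte offset straight to `mb_substr` would therefore splice the title one character too far along.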
If anything is broken, please let me know. _
Kind regards,
</section>