Right now, RDFa is starting to appear in my blog’s templates. Already, if you run one of my pages through a validator, it’ll tell you whether or not the page validates as XHTML + RDFa.
What the?
XHTML is simply not as semantically rich as it could be. It describes the kind of information you’re creating, but not the meaning of the information that’s in there.
RDFa stands for Resource Description Framework in Attributes, and it’s a way to explain things semantically just by adding some attributes to the XHTML markup we already know and love. I think it’s awesomely cool, so I’m learning about it through doing it. Here’s a fairly simple primer.
A little grammar lesson for the Web
If you think back to English grammar lessons you’ll remember that concepts can be expressed as having a subject (usually a noun), predicate (verb), and object (usually another noun). The concepts in a longer sentence, or even a whole paragraph, can also be broken up into their component basic ideas, using subjects, predicates, and objects to describe each of those ideas.
Here’s an example. Imagine you read the following sentence on another Web site:
Raena is a chick in Melbourne who loves beer.
It might be marked up like so:
<p>
<a href="http://www.heyraena.com">Raena</a> is a chick in Melbourne who loves beer.
</p>
Being an intelligent human, you’re able to break this whole concept up into those basic ideas. Here they are, in the form of subject-predicate-object:
- This person’s name is Raena
- Raena has a Web site at heyraena.com
- Raena loves beer
- Raena is female
- Raena is in Melbourne
But your Web browser only knows HTML, so the only meaningful information it can get out of this is:
- There is some text and a hyperlink
- The hyperlink leads to http://www.heyraena.com.
That’s not a lot of information.
RDFa changes that — it allows us to give a machine some clues about what the content actually means by defining the subject, predicate and object of each idea. Here’s an example of the above statement, with additional RDFa clues in the markup. For this example, the clues come from a set of definitions called FOAF (Friend of a Friend), used for describing facts about people and their relationships to other things.
<p xmlns:foaf="http://xmlns.com/foaf/0.1/" typeof="foaf:Person">
<a property="foaf:name" rel="foaf:weblog"
href="http://www.heyraena.com">
Raena
</a>
is a
<span property="foaf:gender" content="female">
chick
</span>
in
<span property="foaf:location" content="Melbourne, Australia">
Melbourne
</span>
who loves
<span property="foaf:interest">
beer
</span>.
</p>
In Semantic Web speak, basic ideas are called triples, and a set of definitions is called a vocabulary. If you’re ever reading up about this stuff and it’s all like “blah blah triples blah vocabulary blah blah?” just remember that triples are basic facts, and a vocabulary is a set of defined terms.
This markup allows the machine to extract the following triples. The key parts of each one come from the property, rel and content attributes.
- This paragraph is some information about a person
- The person’s name is Raena
- That hyperlink goes to heyraena.com and is related to Raena — it’s Raena’s weblog
- Raena’s gender is female
- Raena is located in Melbourne, Australia
- Raena is interested in beer
Sweet! Now that information is ready to be mashed up into something else.
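If it helps to see which attribute plays which role, here's a minimal sketch of the same idea with the subject spelled out. The about attribute and the #me address are my own additions for illustration (the example above doesn't use them), and the xmlns:foaf line is only there so the snippet stands on its own.

```html
<!-- Subject: the thing named by about (a made-up address, purely for illustration) -->
<p xmlns:foaf="http://xmlns.com/foaf/0.1/"
   about="http://www.heyraena.com/#me" typeof="foaf:Person">
  <!-- Predicate: foaf:name. Object: the text inside the element, "Raena" -->
  <span property="foaf:name">Raena</span>
  <!-- Predicate: foaf:gender. Object: the content attribute, "female", not the visible text -->
  is a <span property="foaf:gender" content="female">chick</span>
  <!-- Predicate: foaf:weblog. Object: the address in href -->
  with a <a rel="foaf:weblog" href="http://www.heyraena.com">blog</a>.
</p>
```

Read each comment as one triple: subject, predicate, object. The property attribute gives a plain-text object, while rel with href gives an object that is itself another resource on the Web.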
Ew, look at all those extra span elements. I thought overusing elements was silly.
Yep, but these are informative spans. They tell a machine that the text inside them has a meaning.
Can’t computers already understand language? Google seems pretty smart.
Sure, some systems can guess at what a sentence might mean, but it’s not really easy for computers to do that and they don’t always get it right. They’re more likely to get it right with an unambiguous way to define concepts.
If J. Random Hacker wants to develop an app in her spare time that uses data from the Web, using semantic information will make it easier for her to get at the meaning of that data. Google has about eight gazillion bucks and a pile of geniuses at its disposal, so it can presumably afford to spend a bunch of time and money on developing natural language interpretation. J doesn’t have that luxury.
Hardly anyone is doing this; why bother?
Actually, Yahoo is already indexing RDFa content. And now that RDFa is a fully-fledged recommendation from the W3C, hopefully various companies and developers are going to start feeling more comfortable with implementing RDFa. When that happens, I’ll be ready!
I’ve heard about this thing called microformats; is this it?
Not really, but they do pretty much the same thing. I don’t really feel that microformats are as graceful as RDFa, though; here’s why:
Some of the practices for microformats have resulted in dodgy accessibility and misuse of certain elements, and the attitude towards that kind of rubs me up wrongways. On the one hand the microformats community loves to say “Humans first, machines second.” On the other they thought it was cool to shove horrible time/date formats into the title attribute of the abbr element.
Using RDFa, you need to add namespaces to show the machine where it can find the vocabulary you plan to use. With microformats, there’s no namespacing and not necessarily any specificity in the names they use, either, so it’s possible to have clashing definitions; see accepted limitations of microformats for a discussion on what this could mean.
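As a rough sketch of what that namespacing looks like in XHTML + RDFa: each prefix is tied to a vocabulary's address with an xmlns declaration, so two terms that happen to share a name can't be confused. The page content here is made up for illustration; foaf:title and dc:title are real terms from FOAF and Dublin Core.

```html
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head><title>Namespace sketch</title></head>
  <body>
    <!-- dc:title is the title of this document (the heading text is hypothetical) -->
    <h1 property="dc:title">Beer notes</h1>
    <!-- foaf:title is a person's honorific: same short name, different vocabulary -->
    <p typeof="foaf:Person">
      <span property="foaf:title">Ms</span>
      <span property="foaf:name">Raena</span>
    </p>
  </body>
</html>
```

Both terms end in "title", but the prefix expands each one to a different full address, so a machine has no trouble telling them apart.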
Speaking of no namespacing, microformats (eg hCard) use the class attribute to define things. That means if some classes I’m using today wind up in a microformat in the future, and I don’t hear about it, I could be outputting dodgy looking metadata without even realising it. I’m in but I didn’t opt in, and I’m probably cocking it up to boot! That’s a bit rude!
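For comparison, here's roughly what the two microformats habits I'm grumbling about look like in markup. The event and its date are invented for the example; the class names (vcard, fn, url, vevent, summary, dtstart) are the ones hCard and hCalendar define.

```html
<!-- hCard: ordinary class names carry the meaning, with no prefix to say where they come from -->
<div class="vcard">
  <a class="fn url" href="http://www.heyraena.com">Raena</a>
</div>

<!-- hCalendar: the machine-readable date sits in the title attribute of abbr -->
<div class="vevent">
  <span class="summary">Beer tasting</span> on
  <abbr class="dtstart" title="2008-10-18T15:00:00+11:00">18 October at 3pm</abbr>
</div>
```

A screen reader that expands abbr titles may read that timestamp out loud, which is the accessibility gripe. And since fn, url and summary are just class names, nothing stops a stylesheet from using the same words for something else entirely.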
What now?
Since I’m a learn-by-doing kinda gal, I’m going to try and get into the good habit of adding RDFa (and probably microformats also) whenever I can.
