AxKit.org [logo courtesy of http://xml.com]

The XPathScript API

Along with the code delimiters, XPathScript provides stylesheet developers with a full API for accessing and transforming the source XML file. This API can be used in conjunction with the delimiters above to provide a stylesheet language that is as powerful as XSLT, yet offers all the features of a full programming language (in this case Perl, but I'm certain that implementations in other languages such as Python or Java would be possible).

Extracting Values

A simple example to get us started is to use the API to bring in the title from a DocBook article. A DocBook article title looks like this:

<article>
 <artheader>
  <title>XPathScript - A Viable Alternative to XSLT?</title>
  ...
The XPath expression to retrieve the text in the title element is:
/article/artheader/title/text()
Putting this all together to make this text into the HTML title, we get the following XPathScript stylesheet:
<html>
	<head>
		<title><%= findvalue("/article/artheader/title/text()") %></title>
	</head>
	<body>
		This was a DocBook Article. We're only extracting the title for now!
		<p>
		The title was: <%= findvalue("/article/artheader/title/text()") %>
	</body>
</html>

The expression syntax we used to find that "node" has lots of features, and this syntax is called XPath. It is a W3C standard for finding and matching XML document nodes. The standard is fairly readable and is at http://www.w3.org/TR/xpath. Alternatively, I can recommend Norm Walsh's XPath introduction, which covers a slightly older version of the specification, although I didn't notice anything in the article that is missing or different from the current recommendation.
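To give a flavour of what else the expression syntax can do, here is a minimal sketch of the same stylesheet using two other standard XPath 1.0 constructs: the count() function and a positional predicate. It assumes the article contains sect1 elements with title children directly under the root element; adjust the paths to suit your own document.

```
<html>
	<head>
		<title><%= findvalue("/article/artheader/title/text()") %></title>
	</head>
	<body>
		<p>
		Number of top-level sections (assumes sect1 children of article):
		<%= findvalue("count(/article/sect1)") %>
		<p>
		Title of the first section:
		<%= findvalue("/article/sect1[1]/title/text()") %>
	</body>
</html>
```

Both expressions go through the same findvalue() call as before; only the XPath changes.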

Extracting Nodes

The above example showed us how to extract single values, but what if we have a list of things we wish to extract values from? Here's how we might get a table of contents from DocBook article sections:

...
<%
for my $sect1 (findnodes("/article/sect1")) {
	print $sect1->findvalue("title/text()"), "<br>\n";
	for my $sect2 ($sect1->findnodes("sect2")) {
		print " + ", $sect2->findvalue("title/text()"), "<br>\n";
		for my $sect3 ($sect2->findnodes("sect3")) {
			print " + + ", $sect3->findvalue("title/text()"), "<br>\n";
		}
	}
}
%>
...
This gives us a table of contents down to three levels (adding links to the actual part of the document is left as an exercise). The first call to findnodes gives us all sect1 nodes that are children of the root element (article). The XPath expressions following that are relative to the current node; you can see that by the absence of the leading /. Again, XPath is a very interesting query language, and you would be best to visit the XPath specification to learn more.

Note that in the above we don't use the global function findnodes() after finding the sect1 nodes. Instead we call the node method findnodes(), which does exactly the same thing, but makes the node you are calling it on the context of the XPath expression.
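The difference between the two forms can be seen side by side in the following small sketch. It uses only the findnodes() and findvalue() calls already shown, and assumes the same DocBook structure (sect1 elements containing a title and optional sect2 elements); the variable names are just for illustration.

```
<%
# Global function: the path is absolute, evaluated from the document root.
my @sections = findnodes("/article/sect1");

for my $sect1 (@sections) {
	# Node method: the path is relative, evaluated with $sect1 as context.
	my @subsections = $sect1->findnodes("sect2");

	print $sect1->findvalue("title/text()"),
		" (", scalar(@subsections), " subsections)<br>\n";
}
%>
```

Because "sect2" has no leading /, each inner call only sees the sect2 elements that belong to the sect1 node it was called on.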

Declarative Templates

The examples up to now have all covered a concept of a single global template with a search/replace type functionality from the source XML document. This is a powerful concept in itself, especially when combined with loops and the ability to change the context of searches. But that style of template is limited in utility to well-structured data, rather than processing large documents. To ease the processing of documents, XPathScript also includes a declarative template processing model, so that you can simply specify the format for a particular element and let XPathScript do the work for you.

In order to support this method, XPathScript introduces one more API function: apply_templates(). The name is intended to appeal to people already familiar with XSLT. The apply_templates() function takes either a list of start nodes, or an XPath expression (that must result in a node set) and an optional context. Starting at the start nodes, it traverses the document tree, applying the templates defined by the $t hash reference.

First, a simple example to introduce this feature. Let's assume for a moment that our source XML file is valid XHTML, and we want to change all anchor links to italics. Here is the very simple XPathScript template that will do that:

<%
$t->{'a'}{pre} = '<i>';
$t->{'a'}{post} = '</i>';
$t->{'a'}{showtag} = 1;
%>
<%= apply_templates() %>
Note that apply_templates() has to be output using <%= %>. That's because apply_templates() returns a string representation of the transformation; it doesn't send the output to the browser for you.
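As described earlier, apply_templates() can also be given an XPath expression or a list of start nodes instead of being called with no arguments. The following is a sketch only, switching back to the DocBook article used earlier: it assumes sect1 elements with title children, and it assumes that setting showtag to 0 suppresses the original tag, so that each title is wrapped in h2 instead.

```
<%
# Assumption: showtag = 0 drops the <title> tag and keeps only pre/post.
$t->{'title'}{pre} = '<h2>';
$t->{'title'}{post} = '</h2>';
$t->{'title'}{showtag} = 0;
%>
<html>
	<body>
		<%= apply_templates("/article/sect1/title") %>
	</body>
</html>
```

The list-of-nodes form would pass the result of findnodes() instead, as in apply_templates(findnodes("/article/sect1/title")).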

The first thing this example does is set up a hash reference $t that XPathScript knows about (let's call it magical). The keys of $t are element names (including namespace prefix if we are using namespaces). The hash can have the following sub-keys:

  • pre
  • post
  • showtag
  • testcode
We'll cover testcode in more depth later in The Template Hash, but for now know that it is a placeholder for code that allows for more complex templates.

Unlike XSLT's declarative transformation syntax, the keys of $t do not specify XPath match expressions. Instead they are simple element names. This is a trade-off of speed of execution over flexibility. Perl hash lookups are extremely quick compared to XPath matching. Luckily, because of the testcode option, more complex matches are quite possible with XPathScript.

The simple explanation for now is that pre specifies output to appear before the tag, post specifies output to appear after the tag, and showtag specifies that the tag itself should be output as well as the pre and post values.
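To make the roles of pre, post and showtag concrete, here is a small stand-alone Perl sketch of how the three values might combine for a single element. The render_element() helper is hypothetical and is not part of XPathScript or AxKit; it ignores attributes and takes the already-rendered content of the element as a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the XPathScript implementation: combines the
# pre, post and showtag values of one template entry around some content.
sub render_element {
    my ($rule, $name, $inner) = @_;
    my $pre  = defined $rule->{pre}  ? $rule->{pre}  : '';
    my $post = defined $rule->{post} ? $rule->{post} : '';

    # With showtag set, the element's own tag is kept around its content.
    my $body = $rule->{showtag} ? "<$name>$inner</$name>" : $inner;

    return $pre . $body . $post;
}

# The same template entry as in the anchor example above.
my $t = {};
$t->{'a'}{pre}     = '<i>';
$t->{'a'}{post}    = '</i>';
$t->{'a'}{showtag} = 1;

print render_element($t->{'a'}, 'a', 'AxKit'), "\n";   # <i><a>AxKit</a></i>
```

With showtag set to 0 in this sketch, the same call would produce <i>AxKit</i>, that is, the pre and post values with the original tag left out.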

