[mdx] metadata verification; ingress filtering

Ian Young ian at iay.org.uk
Wed May 20 09:08:26 PDT 2009


I mentioned this in another context to the Shibb team earlier in the  
week, but it seems tangentially relevant to aggregation engines so I'm  
reposting here with some modifications.

Historically, we've done a couple of things in our build process:

* metadatatool is used for signing, and it performs the same checks as  
the 1.3 IdP does when you run it on its own output

* one output of the build process is the notorious HTML "stats page",  
generated by a huge and difficult to maintain XSLT stylesheet; this  
was tweaked to include a section at the top that flagged known  
problems; the person doing the signing run needed to eyeball this  
every time

* we run a cron job that uses both metadatatool and siterefresh to  
download the production metadata

Until recently, this collection of hacks has been sufficient to detect  
everything important we've encountered in real life. Then we found  
that a particular typo in a KeyName element in the right place could  
take down all of our 140+ 1.3 SPs and I started thinking about getting  
more serious about verification.

What I've done to address this for now is to migrate the XSLT-based
checking out of the normal Xalan-driven stats page generation to a much
earlier point in the pipeline, where we build the metadata to be signed.
I still use XSLT, but instead of using command-line Xalan, I have
written a little app to support the checking process:

http://svn.ca.iay.org.uk/ukfedmeta/trunk/src/uk/org/ukfederation/apps/mdcheck/MetadataCheck.java

This currently just runs a given XSLT stylesheet against the provided  
metadata file and then discards the output; the stylesheet  
communicates with the application using <xsl:message> elements to  
signal errors.  It may be possible to extend this by having the  
application pull in other relevant information and pass it to the  
checking stylesheet as parameters.
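
In outline, the app doesn't need to be much more than the kind of
JAXP-driven shell sketched below (this is an illustration only, not the
real MetadataCheck; class and argument names are mine, and it assumes,
as with Xalan, that a terminating <xsl:message> surfaces as a
TransformerException):

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.Writer;

public class CheckDriver {

    /** A Writer that throws everything away; only the messages matter. */
    private static class NullWriter extends Writer {
        public void write(char[] cbuf, int off, int len) { }
        public void flush() { }
        public void close() { }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = checking stylesheet, args[1] = metadata file to check
        Transformer checker = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(args[0]));
        try {
            // Run the check; the transformation output itself is discarded.
            checker.transform(new StreamSource(args[1]),
                              new StreamResult(new NullWriter()));
        } catch (TransformerException e) {
            // <xsl:message terminate="yes"> in the stylesheet ends up here.
            System.err.println("check failed: " + e.getMessageAndLocation());
            System.exit(1);
        }
        System.out.println("check passed: " + args[1]);
    }
}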

The current checking stylesheet is here:

http://svn.ca.iay.org.uk/ukfedmeta/trunk/build/check.xsl

This is a lot smaller and simpler to change than the HTML-generating  
version I had before, and it's also possible to add new checks far  
more easily because you don't need to care about generating valid  
output.

Here's the check for that 1.3 SP issue:

<xsl:template match="ds:KeyInfo/*[namespace-uri() != 'http://www.w3.org/2000/09/xmldsig#' 
]">
  <xsl:call-template name="fatal">
    <xsl:with-param name="m">ds:KeyInfo child element not in ds  
namespace</xsl:with-param>
  </xsl:call-template>
</xsl:template>

I've used a callable template to report errors so that I can add extra
information about the entity context to the error message without
duplicating a lot of XSLT all over the place.

One final wrinkle is that I plan to make fairly serious use of the
ability of the Xalan processor in particular to call out to custom
Java classes.  At present, for example, there is a call out to this
class:

http://svn.ca.iay.org.uk/ukfedmeta/trunk/src/uk/org/ukfederation/xalan/Members.java

The main check.xsl passes this an auxiliary metadata document we  
maintain in a custom format that describes federation members; the  
class digests that to extract the canonical names of those members  
into a Set<String> so that we can do this later on:

<xsl:template match="md:EntityDescriptor[md:Organization/ 
md:OrganizationName]
    [not(ukfxm:isOwnerName($members, md:Organization/ 
md:OrganizationName))]">
  <xsl:call-template name="fatal">
    <xsl:with-param name="m">unknown owner name: <xsl:value-of  
select="md:Organization/md:OrganizationName"/></xsl:with-param>
  </xsl:call-template>
</xsl:template>

(Note the call to ukfxm:isOwnerName)
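
To give a flavour of the shape such an extension class can take, here's
an illustrative guess (not the real Members.java, which is at the URL
above; the <Name> element and the constructor signature are invented):

import java.util.HashSet;
import java.util.Set;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Members {

    /** Canonical names of all federation members. */
    private final Set<String> ownerNames = new HashSet<String>();

    /**
     * Digest the members document handed over by the stylesheet; Xalan
     * converts the XPath node-set into a DOM NodeList for us.
     */
    public Members(NodeList membersDocument) {
        for (int i = 0; i < membersDocument.getLength(); i++) {
            collectNames(membersDocument.item(i));
        }
    }

    /** Recursively pull out the text of each (invented) <Name> element. */
    private void collectNames(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "Name".equals(node.getLocalName())) {
            ownerNames.add(textOf(node).trim());
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collectNames(child);
        }
    }

    /** Concatenated text of an element's immediate text children. */
    private static String textOf(Node element) {
        StringBuilder text = new StringBuilder();
        for (Node c = element.getFirstChild(); c != null;
                c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                text.append(c.getNodeValue());
            }
        }
        return text.toString();
    }

    /** Called from XSLT as ukfxm:isOwnerName($members, ...). */
    public boolean isOwnerName(String name) {
        return ownerNames.contains(name.trim());
    }
}

In a setup like this, $members gets built from the auxiliary document
via Xalan's Java extension mechanism and then handed to isOwnerName()
as its first argument, as in the template above.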

This makes some kinds of check that would be almost impossible (and
very CPU-intensive) in plain XSLT pretty trivial. Ideally, I'd start
using OpenSAML in this role a lot more, once I get round the fact that  
some of my other XSLT (used in command-line Xalan calls elsewhere)  
apparently breaks if you hand it the same endorsed libraries as  
OpenSAML 2 needs :-(

Relevance to MDX follows...

The arch document talks about transformation pipelines (T blocks) in  
the aggregation engine.  Cases like this make me think that:

1) most aggregation engines, certainly ones publishing to end  
entities, are going to want to check for known-evil metadata  
constructs [1]

2) I think one simple way of doing this is to allow one of the things
in a T block pipeline to be something that is explicitly there to check
-- rather than transform -- the metadata as it goes by.  It probably
makes most sense for this to be on an entity-by-entity basis, and in
fact that's how I've been assuming the transformation pipelines would
work anyway.  Checking is different mainly in that the output is
discarded, and replaced with the input if nothing bad happened during
the check.  There's a rough sketch of what I mean just below.
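
To illustrate, something along these lines in Java (the names here are
made up for the purpose, not anything from the arch document):

import org.w3c.dom.Element;

/** Thrown by a checking stage when it finds something known-evil. */
class MetadataCheckException extends Exception {
    MetadataCheckException(String message) {
        super(message);
    }
}

/** One stage in a T block pipeline, applied entity by entity. */
interface EntityStage {
    /**
     * A transforming stage returns a (possibly rewritten)
     * EntityDescriptor; a checking stage inspects the descriptor,
     * throws if it finds a problem, and otherwise returns its input.
     */
    Element process(Element entityDescriptor) throws MetadataCheckException;
}

/**
 * Adapter that turns a check into a pipeline stage: any output of the
 * check is discarded, and the input is passed through untouched if the
 * check raised no objection.
 */
abstract class CheckingStage implements EntityStage {
    protected abstract void check(Element entityDescriptor)
            throws MetadataCheckException;

    public Element process(Element entityDescriptor)
            throws MetadataCheckException {
        check(entityDescriptor);
        return entityDescriptor;
    }
}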

	-- Ian

[1] ... or to be able to process incoming metadata into a form that is  
known to be non-evil by construction before handing metadata along to  
consumers.  This is harder in general but is the sort of thing you  
want to do if you want to enforce a particular set of conventions,  
dropping everything that falls outside them "just in case".  I'm not  
interested in this case here.