[mdx] Joe on section 3.2.1

Thu Sep 26 06:35:58 PDT 2013

Joe:

> -- 3.2.1, Request: How long can a request be? Hypothetically, what
>   if I have a laundry list of a thousand IDs, all transformed into
>   SHA1 hashes? That could make for a long request!

Scott commented:

> There really is no clean way to address URL length that I've ever seen.

My remarks:

I don't think this arises in practice, because the "+" operator in a query is equivalent to set intersection (it's defined as asking for the entity or entities that have id1 AND id2 AND id3).  It doesn't seem likely that you'd want to query for the intersection of a thousand named sets.
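To make the intersection reading concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory index mapping each identifier to the set of entities it names; the names `resolve`, `index`, "alpha", "group-x" and so on are made up for illustration and are not from the spec. Each extra "+" term can only shrink the result, which is why a thousand-term query is hard to motivate.

```python
from functools import reduce


def resolve(index: dict[str, frozenset[str]], ids: list[str]) -> frozenset[str]:
    """Return the entities named by *every* identifier in ids.

    Models the "+" operator as set intersection: an unknown identifier
    contributes the empty set, so the whole result becomes empty.
    """
    if not ids:
        raise ValueError("a query needs at least one identifier")
    sets = [index.get(i, frozenset()) for i in ids]
    return reduce(lambda a, b: a & b, sets)


# Hypothetical index: one entity may be reachable under several identifiers.
index = {
    "alpha": frozenset({"E1"}),
    "group-x": frozenset({"E1", "E2"}),
    "group-y": frozenset({"E2", "E3"}),
}

print(sorted(resolve(index, ["group-x", "group-y"])))  # ['E2']
print(sorted(resolve(index, ["alpha", "group-x"])))    # ['E1']
print(sorted(resolve(index, ["alpha", "group-y"])))    # []
```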

The spec does not at present include any way of asking for id1 OR id2 OR id3, which is what I think would introduce that kind of concern. At the moment, if you want metadata for a thousand identifiers, that means one query per identifier. I don't think we have a use case that justifies extending the query language in that way, but who knows what the future holds. I will observe that if we thought such an extension might one day be required, it might make sense to reserve some more characters in identifiers now (not just "+" and "{").

Joe:

>   In 3.2.1, are the curly braces around the IDs literal curly braces
>   or is that just a semantic representation issue?

Leif:

>> cf note about needing to discuss {} above

Scott:

> It's a specification of the actual character, after any decoding.

My remarks:

I'm not sure that's true.  I think we're talking about this:

<base_url>/entities/{ID}+{ID}+…

Judging from the example, wherever "{ID}" appears in the template above, the intention is actually to include an appropriately encoded identifier, not surrounded by braces; if a literal opening brace appears, it is taken as a signal for a transformation indicator.

So this actually looks like a typo to me, and should probably be more like:

<base_url>/entities/<ID>+<ID>+…

Of course, it might be better to have a proper grammar for this. I don't think the current spec even says that <base_url> is a non-terminal, for example; the meaning of <X> in that context is left completely implicit.
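As a sketch of what the corrected template <base_url>/entities/<ID>+<ID>+… implies for a client, here is one possible reading in Python. It assumes two things the spec does not yet pin down: that a transformed identifier is written as "{sha1}" followed by the lowercase hex SHA-1 of the identifier (by analogy with the "{md5…" form), and that the server splits on "+" and looks for "{" before percent-decoding, so that percent-encoding plain identifiers keeps any "+", "{" or "/" inside them from being read as syntax. The function names and example URLs are hypothetical; a proper grammar would settle the decoding order either way.

```python
import hashlib
from urllib.parse import quote


def sha1_form(identifier: str) -> str:
    # ASSUMED transformed form: "{sha1}" + lowercase hex SHA-1 of the UTF-8 bytes.
    return "{sha1}" + hashlib.sha1(identifier.encode("utf-8")).hexdigest()


def encode_plain(identifier: str) -> str:
    # Percent-encode everything except unreserved characters, so a "+", "{"
    # or "/" inside an identifier cannot be mistaken for query syntax.
    return quote(identifier, safe="")


def entities_path(base_url: str, segments: list[str]) -> str:
    # Join already-formed identifier segments with the "+" (AND) operator.
    return base_url.rstrip("/") + "/entities/" + "+".join(segments)


print(entities_path("https://mdx.example.org", [encode_plain("a+b"), "c"]))
# https://mdx.example.org/entities/a%2Bb+c
print(entities_path("https://mdx.example.org/",
                    [encode_plain("https://idp.example.org/idp"),
                     sha1_form("https://sp.example.org/sp")]))
```

Under this reading a literal "{" reaches the server only when the client means "transformed identifier follows".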

Comments?

>   I'd also prefer to see specificity wrt what happens for an ill-formed
>   request (e.g., an unterminated or otherwise malformed ID value, for 
>   example, on a query that requests multiple IDs -- should the query
>   fail, or should partial results be returned or ?)

It's hard to see how you would get an unterminated or otherwise malformed ID value in the simple case, because there's no grammar for IDs.  So, for example, leaving characters off the end would just give you an identifier that doesn't return any results.

The only cases I can think of where you can have a syntactic error as things stand would be:

* <base_url>/entities/FOO+        (the second ID is missing)

* <base_url>/entities/{md5f3678248a29ab8e8e5b1b00bee4060e0      (no '}')

These both seem like natural 400 status (malformed request) cases to me, but do we need to make that explicit?

Any comments as to whether these are the only cases, and on what to say about how they should be handled?
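For discussion, here is a minimal Python sketch of a server-side check that treats exactly the two cases above as 400 (Bad Request) and lets everything else through to the lookup. The names `MalformedRequest`, `parse_ids` and `status_for` are hypothetical, and the input is assumed to be the text after "/entities/" at whatever decoding stage the server parses it. Note that a truncated identifier without a brace is still well-formed here; it simply won't match anything.

```python
from typing import Optional


class MalformedRequest(ValueError):
    """Raised for requests that should get a 400 (Bad Request) response."""


def parse_ids(query: str) -> list[str]:
    # query is the text after "/entities/", e.g. "FOO+BAR".
    ids = query.split("+")
    for pos, ident in enumerate(ids):
        if ident == "":
            # Covers "FOO+" (second ID missing), "+FOO", "FOO++BAR" and "".
            raise MalformedRequest(f"empty identifier at position {pos}")
        if ident.startswith("{") and "}" not in ident:
            # Transformation indicator opened but never closed.
            raise MalformedRequest(f"unterminated '{{' in {ident!r}")
    return ids


def status_for(query: str) -> Optional[int]:
    """Return 400 for a malformed query, else None (go on to the lookup,
    which decides between returning results and "not found")."""
    try:
        parse_ids(query)
    except MalformedRequest:
        return 400
    return None


print(status_for("FOO+"))                                   # 400
print(status_for("{md5f3678248a29ab8e8e5b1b00bee4060e0"))   # 400
print(status_for("{md5}f3678248a29ab8e8e5b1b00bee4060e0"))  # None
```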

	-- Ian
