Loading External Resources

A guide to using external resources with XmlPrime

Overview

For security reasons, by default the set of available documents, collections and unparsed text resources is empty. This means that the doc, document, collection and unparsed-text functions will always raise an error, and doc-available and unparsed-text-available will always return false. This guide explains how to load external documents and query them with XmlPrime.

The Document Set

The doc, document, collection and unparsed-text functions are all defined to be stable. This means that every call with the same arguments within an XQuery program or XPath expression must return the same object. Since the same guarantee cannot be made for external resources, the accessibility and content of the documents must be cached throughout the evaluation of the query. The caching of resources is handled by the DocumentSet.

 
 
Note
XmlPrime treats documents, collections and unparsed text resources independently. For example, the fact that a document exists at a particular URI does not imply that the unparsed text at that URI is retrievable, or that it has the same content as the document.
 

Because a document set contains the documents used during query evaluation, the document set must be bound to a name table, which is specified in the constructor.

When a resource is used by a query or expression, it is requested from the document set. If the resource (or the fact that the resource is unavailable) is cached in the document set then it is returned (or an error is raised). Otherwise the document set proceeds to retrieve the resource through its resolvers.
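The lookup rule above can be illustrated with a small self-contained sketch. StableResourceCache is a hypothetical stand-in written for this guide, not an XmlPrime type, and it models the resolver as a plain delegate. It only shows the idea that both successful results and failures are remembered, so repeated requests for the same URI always have the same outcome.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in, not part of XmlPrime: illustrates stable lookups only.
public sealed class StableResourceCache
{
    // A remembered outcome: either the resource text or the error raised for it.
    private sealed class Outcome
    {
        public string Value;
        public Exception Error;
    }

    private readonly Dictionary<string, Outcome> _outcomes = new Dictionary<string, Outcome>();
    private readonly Func<string, string> _resolver;
    private readonly object _gate = new object();

    // The delegate returns the resource text, or null if it does not handle the URI.
    public StableResourceCache(Func<string, string> resolver)
    {
        _resolver = resolver;
    }

    public string Get(string uri)
    {
        lock (_gate)
        {
            Outcome outcome;
            if (!_outcomes.TryGetValue(uri, out outcome))
            {
                // First request for this URI: ask the resolver and remember the result.
                outcome = new Outcome();
                try
                {
                    outcome.Value = _resolver(uri);
                    if (outcome.Value == null)
                    {
                        outcome.Error = new InvalidOperationException("Resource not available: " + uri);
                    }
                }
                catch (Exception e)
                {
                    outcome.Error = e;
                }
                _outcomes[uri] = outcome;
            }

            // Later requests replay the remembered outcome without calling the resolver.
            if (outcome.Error != null)
            {
                throw outcome.Error;
            }
            return outcome.Value;
        }
    }
}
```

With this sketch, two calls to Get for the same URI invoke the delegate only once, whether the first call succeeded or failed.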

To avoid reloading resources, and to allow sharing of the cached documents, the document set can be shared between different queries and XPath expressions. The document set is designed to be thread-safe, so it can also be shared between queries and expressions executing concurrently (assuming that the name table used is also thread-safe, for example ConcurrentNameTable). The document set to be used for evaluation of a query or expression is specified by the DynamicContextSettings.DocumentSet property.

It is recommended that any documents passed as arguments to an XQuery 1.0 program or an XPath 2.0 expression are loaded through the document set to improve consistency.
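As a minimal sketch of sharing, the helper below assigns one document set to two separate settings objects through the DynamicContextSettings.DocumentSet property. The class and method names are hypothetical, and the sketch assumes that DynamicContextSettings has a public parameterless constructor.

```csharp
using XmlPrime;

// Hypothetical helper, named for illustration only.
static class SharedDocumentSetExample
{
    // Returns two settings objects that read from the same cached resources.
    public static DynamicContextSettings[] CreateSettings(DocumentSet sharedDocumentSet)
    {
        // Assumption: DynamicContextSettings has a public parameterless constructor.
        DynamicContextSettings first = new DynamicContextSettings();
        first.DocumentSet = sharedDocumentSet;

        DynamicContextSettings second = new DynamicContextSettings();
        second.DocumentSet = sharedDocumentSet;

        return new[] { first, second };
    }
}
```

A document loaded while evaluating with the first settings object is then already cached when the second evaluation requests the same URI.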

Pre-populating the Document Set

The document set can be populated programmatically. This provides bindings from URIs to resources that override those specified by the resolvers. Documents, collections and unparsed-text resources can all be added to the document set before it is used.

Any documents that contain nodes supplied in the context item or in any parameters, and that have a non-empty document URI, are automatically added in a similar fashion.

XmlPrime provides the IncludeWellKnownDocumentTypeDefinitions method to pre-populate the document set with the XHTML 1.0 DTDs.

Resolving Resources

The document set defines which resources are available through the document resolver, collection resolver and resource resolver, which are passed to the constructor. These are used to retrieve any resources that are not already in the cache.

XmlPrime provides specialized interfaces to resolve resources rather than using an XmlResolver. This is so that resources already loaded in memory do not have to be serialized and reparsed. It also allows flexibility in which document representations are used.

 
 
Security note
A compromised query can cause any documents available in the document set to be retrieved. Avoid allowing sensitive information to be retrieved from a document set, by carefully defining your resolvers. If only a few resources are required, then consider constructing a document set without any resolvers, and pre-populating it with accessible resources instead.
 

Document Resolvers

A document resolver is a class implementing the IDocumentResolver interface. The interface includes the ResolveDocument method, which is called to resolve external documents as requested by the doc and document functions.

The method is passed the URI of the document to resolve, the document set itself and the name table to use when loading any new documents. The method returns null if the URI is not handled, returns the document if it was retrieved successfully, or throws an exception if there was an error retrieving the document. The document resolver should not attempt to add or retrieve the document with the requested URI from the document set, as this will result in a deadlock.

 
 
Note
To help increase consistency, XmlPrime imposes the requirement that any document returned must either have an empty document URI, or a document URI that is equivalent to the URI of the document being requested.
 

The document set is passed in for the case that a resolver wants to use other available resources to retrieve the document.

Two default implementations of IDocumentResolver are provided by XmlPrime:

UnparsedTextDocumentResolver
This resolver retrieves the unparsed text with the specified URI from the document set, and then attempts to parse it as a document.
XmlReaderDocumentResolver
This resolver uses the supplied XmlReaderSettings to retrieve the document at the specified URL. Note that this does not make the resource available to the unparsed-text function.

Collection Resolvers

A collection resolver is a class implementing the ICollectionResolver interface. The interface includes the ResolveCollection method, which is called to resolve external collections as requested by the collection function.

The method is passed the URI of the collection to resolve, the document set itself and the name table to use when loading any new documents. The method returns null if the URI is not handled, returns the collection if it is retrieved successfully, or throws an exception if there was an error retrieving the collection. If a null URI is passed in then this indicates that the default collection should be resolved.

 
 
Warning

Any nodes returned as part of a collection must either have an empty document URI, or must be in the document set. This is to enforce the rule in XQuery that doc(document-uri($N)) is $N is always true for any document node $N.

This is easiest to enforce if all documents returned are loaded from the document set.

 
 
 
Note
The default collection can be specified on a per-query basis, even when sharing a document set, by setting the DefaultCollectionURI, which maps the default collection to a particular URI instead of null.
 

The collection resolver should not attempt to add or retrieve the collection with the requested URI from the document set, as this will result in a deadlock.

Resource Resolvers

A resource resolver is a class implementing the IResourceResolver interface. The interface includes the ResolveResource method, which is called to resolve external resources as requested by the unparsed-text function.

The method is passed the URI of the resource to resolve. It returns null if the URI is not handled, returns the resource if it was retrieved successfully, or throws an exception if there was an error retrieving the resource.

The XmlResourceResolver is a resource resolver that wraps the specified XmlResolver.

Using an XmlResolver to Resolve Documents

The DocumentSet(XmlResolver, XmlReaderSettings) constructor initializes a new document set with an UnparsedTextDocumentResolver and an XmlResourceResolver wrapping the XmlReaderSettings and XmlResolver passed in. Any document requested will first be retrieved as unparsed text, and then parsed to create a document. This ensures that the resources returned by unparsed-text and doc remain consistent.
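A minimal sketch of calling this constructor follows. The argument order matches the signature named above. Setting NameTable on the reader settings mirrors the other sample in this guide; whether this constructor requires it is an assumption.

```csharp
using System.Xml;
using XmlPrime;

XmlReaderSettings readerSettings = new XmlReaderSettings();
// Assumption: the document set takes its name table from the reader settings.
readerSettings.NameTable = new NameTable();

// XmlResolver first, XmlReaderSettings second.
DocumentSet documentSet = new DocumentSet(new XmlUrlResolver(), readerSettings);
```

With a document set built this way, doc and unparsed-text on the same URI are both served from the same cached text.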

If a query or expression never uses the unparsed-text function, then this results in the raw data of every document retrieved being unnecessarily cached in memory. In this case it is better to construct the DocumentSet using an XmlReaderDocumentResolver to avoid caching the unparsed data. The code below shows how to set up a document set with an XmlUrlResolver without caching unparsed data.

 
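using System.Xml;
using XmlPrime;

// The name table is shared by the reader settings and the document set.
XmlNameTable nameTable = new NameTable();

// Fetches documents from file and HTTP URIs.
XmlResolver resolver = new XmlUrlResolver();

XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.NameTable = nameTable;
readerSettings.XmlResolver = resolver;

// Parses documents directly from the reader, so no unparsed text is cached.
IDocumentResolver documentResolver = new XmlReaderDocumentResolver(readerSettings);

// No collection resolver and no resource resolver are supplied.
DocumentSet documentSet = new DocumentSet(nameTable, documentResolver, null, null);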
 

Setting the Types of External Resources

It is often advantageous to specify the types of external resources. This can indicate that particular documents will conform to a particular schema for example, and can help improve static type checking and aid in optimization. A call to doc, document or collection calls upon the document type resolver or collection type resolver to identify the static type of the document or collection respectively.

If a document or collection retrieved during evaluation of an XQuery program or XPath expression does not match the type declared by the document type resolver or collection type resolver, then an XPTY0004 type error is raised.

Document Type Resolvers

A document type resolver implements the IDocumentTypeResolver interface. When the URI of a document can be determined statically, the ResolveDocumentType method is called, which returns the static type of the document. If the URI of the document cannot be determined statically, or the ResolveDocumentType method returns null, then the type is set to the value of the DefaultDocumentType property, or document-node() if it returns null.

An implementation of IDocumentTypeResolver can be retrieved from a document set with the DocumentTypeResolver property. This resolver resolves all the documents requested statically, and returns their actual type.

The document type resolver is set by setting the StaticContextSettings.DocumentTypeResolver property.
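The two properties named in this section can be connected as in the sketch below, which uses the document set's own type resolver for static typing. The class and method names are hypothetical. The settings object is taken as a parameter so that the sketch makes no assumption about how it is constructed.

```csharp
using XmlPrime;

// Hypothetical helper, named for illustration only.
static class TypeResolverWiring
{
    // Makes static typing of doc and document calls use the types of the
    // documents that the document set resolves.
    public static void UseDocumentSetTypes(StaticContextSettings settings, DocumentSet documentSet)
    {
        settings.DocumentTypeResolver = documentSet.DocumentTypeResolver;
    }
}
```

Because this resolver loads documents at compile time, the same document set should normally also be supplied for evaluation, so that the cached documents are reused.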

Collection Type Resolvers

A collection type resolver implements the ICollectionTypeResolver interface. When the URI of a collection can be determined statically, the ResolveCollectionType method is called, which returns the static type of the collection. If the URI of the collection cannot be determined statically, or the ResolveCollectionType method returns null, then the type is set to the value of the DefaultCollectionType property, or node()* if it returns null.

The static type of the default collection is determined by first passing a null URI to ResolveCollectionType. If this returns null the process proceeds as above.

An implementation of ICollectionTypeResolver can be retrieved from a document set with the CollectionTypeResolver property. This resolver resolves all the collections requested statically, and returns their actual type.