Loading External Resources
A guide to using external resources with XmlPrime
Overview
For security reasons, the set of available documents, collections and unparsed text resources is empty by default. This means that the doc, document, collection and unparsed-text functions will always raise an error, and doc-available and unparsed-text-available will always return false. This guide explains how to load external documents and query them with XmlPrime.
The Document Set
The doc, document, collection and unparsed-text functions are all defined to be stable. This means that every time they are called with the same arguments within an XQuery program or XPath expression they must return the same result. Since we cannot make the same guarantee for external resources, the accessibility and content of the documents must be cached throughout the evaluation of the query. The caching of resources is handled by the DocumentSet.
Because a document set contains the documents used during query evaluation, the document set must be bound to a name table, which is specified in the constructor.
When a resource is used by a query or expression, it is requested from the document set. If the resource (or the fact that the resource is unavailable) is cached in the document set then it is returned (or an error is raised). Otherwise the document set proceeds to retrieve the resource through its resolvers.
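The lookup sequence described above can be sketched with a small in-memory model. This is a minimal illustration that uses only the .NET base class library; it is not XmlPrime's implementation. StableResourceCache, its Get and IsAvailable methods, and the example URIs are all hypothetical names. The model caches both successful results and failures, so repeated requests for the same URI always see the same outcome.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical model of a stable resource cache. It is NOT XmlPrime's DocumentSet.
public sealed class StableResourceCache
{
    // Outcome of a lookup: the content, or the error message recorded on failure.
    private sealed class Outcome
    {
        public string Content;
        public string Error;
    }

    // Lazy<T> ensures each URI is resolved at most once, even under concurrent access.
    private readonly ConcurrentDictionary<string, Lazy<Outcome>> _cache =
        new ConcurrentDictionary<string, Lazy<Outcome>>();

    // Each resolver returns null when it does not handle a URI.
    private readonly Func<string, string>[] _resolvers;

    public StableResourceCache(params Func<string, string>[] resolvers)
    {
        _resolvers = resolvers;
    }

    // Returns the cached content, or throws the cached failure.
    public string Get(string uri)
    {
        Outcome outcome = Lookup(uri);
        if (outcome.Error != null)
            throw new InvalidOperationException(outcome.Error);
        return outcome.Content;
    }

    // Availability is answered from the same cached outcome.
    public bool IsAvailable(string uri)
    {
        return Lookup(uri).Error == null;
    }

    private Outcome Lookup(string uri)
    {
        return _cache.GetOrAdd(uri, u => new Lazy<Outcome>(() => Resolve(u))).Value;
    }

    // Tries each resolver in turn; the first non-null result wins.
    private Outcome Resolve(string uri)
    {
        foreach (Func<string, string> resolver in _resolvers)
        {
            try
            {
                string content = resolver(uri);
                if (content != null)
                    return new Outcome { Content = content };
            }
            catch (Exception e)
            {
                // Record the failure so that later requests fail the same way.
                return new Outcome { Error = e.Message };
            }
        }
        return new Outcome { Error = "Resource not available: " + uri };
    }
}

public static class StableCacheDemo
{
    public static void Main()
    {
        int calls = 0;
        // This resolver returns different text on every call, like a changing remote file.
        var cache = new StableResourceCache(uri =>
            uri == "urn:example:counter" ? "version " + (++calls) : null);

        Console.WriteLine(cache.Get("urn:example:counter"));
        Console.WriteLine(cache.Get("urn:example:counter")); // same text as the first call
        Console.WriteLine(cache.IsAvailable("urn:example:missing"));
    }
}
```

Because the resolver is consulted only on the first request, the second call returns the cached text even though the underlying source would have produced something different.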
To avoid reloading resources, and to allow sharing of the cached documents, the document set can be shared between different queries and XPath expressions. The document set is designed to be thread-safe, so it can also be shared between queries and expressions executing concurrently (assuming that the name table used is also thread-safe, for example ConcurrentNameTable). The document set to be used for evaluation of a query or expression is specified by the DynamicContextSettings.DocumentSet property.
It is recommended that any documents passed as arguments to an XQuery 1.0 program or an XPath 2.0 expression are loaded through the document set to improve consistency.
Pre-populating the Document Set
The document set can be populated programmatically. This provides bindings from URIs to resources that override those specified by the resolvers. Documents, collections and unparsed-text resources can all be added to the document set before it is used.
Any documents with a non-empty document URI that contain nodes specified in the context item or in any parameters are automatically added in a similar fashion.
XmlPrime provides the IncludeWellKnownDocumentTypeDefinitions method to pre-populate the document set with the XHTML 1.0 DTDs.
Resolving Resources
The document set defines which documents are available via the document resolver, collection resolver and resource resolver, which are passed in to the constructor. These are used to retrieve any documents that are not already in the cache.
XmlPrime provides specialized interfaces to resolve resources rather than using an XmlResolver. This is so that resources already loaded in memory do not have to be serialized and reparsed. It also allows flexibility in which document representations are used.
Document Resolvers
A document resolver is a class implementing the IDocumentResolver interface. The interface includes the ResolveDocument method, which is called to resolve external documents as requested by the doc and document functions.
The method is passed the URI of the document to resolve, the document set itself and the name table to use when loading any new documents. The method returns null if the URI is not handled, returns the document if it was retrieved successfully, or throws an exception if there was an error retrieving the document. The document resolver should not attempt to add or retrieve the document with the requested URI from the document set, as this will result in a deadlock.
The document set is passed in for the case that a resolver wants to use other available resources to retrieve the document.
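The three outcomes of a resolver (null for an unhandled URI, a document on success, an exception on failure) can be illustrated with a small stand-in. The sketch below deliberately does not use XmlPrime's types: ISketchDocumentResolver, InMemoryDocumentResolver and the example.org URIs are hypothetical, the document set parameter is omitted, and XDocument stands in for whatever document representation the real IDocumentResolver returns. Consult the XmlPrime API reference for the actual signature.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical stand-in for the resolver contract; not XmlPrime's IDocumentResolver.
public interface ISketchDocumentResolver
{
    XDocument ResolveDocument(Uri documentUri, XmlNameTable nameTable);
}

// Serves documents from XML text that is already held in memory.
public sealed class InMemoryDocumentResolver : ISketchDocumentResolver
{
    private readonly Dictionary<Uri, string> _sources = new Dictionary<Uri, string>();

    public void Add(Uri uri, string xml)
    {
        _sources[uri] = xml;
    }

    public XDocument ResolveDocument(Uri documentUri, XmlNameTable nameTable)
    {
        string xml;
        if (!_sources.TryGetValue(documentUri, out xml))
            return null; // URI not handled: let another resolver try.

        // Parse with the supplied name table; malformed XML throws XmlException.
        var settings = new XmlReaderSettings { NameTable = nameTable };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return XDocument.Load(reader);
        }
    }
}

public static class ResolverDemo
{
    public static void Main()
    {
        var resolver = new InMemoryDocumentResolver();
        resolver.Add(new Uri("http://example.org/books.xml"), "<books><book/></books>");

        XmlNameTable nameTable = new NameTable();
        XDocument found = resolver.ResolveDocument(new Uri("http://example.org/books.xml"), nameTable);
        XDocument unknown = resolver.ResolveDocument(new Uri("http://example.org/other.xml"), nameTable);

        Console.WriteLine(found.Root.Name.LocalName); // name of the root element
        Console.WriteLine(unknown == null);           // unhandled URIs yield null
    }
}
```

Returning null rather than throwing for an unknown URI is what lets several resolvers be consulted in turn; an exception is reserved for a URI the resolver does handle but cannot load.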
Two default implementations of IDocumentResolver are provided by XmlPrime:
- UnparsedTextDocumentResolver: This resolver retrieves the unparsed text with the specified URI from the document set, and then attempts to parse it as a document.
- XmlReaderDocumentResolver: This resolver uses the supplied XmlReaderSettings to retrieve the document at the specified URL. Note that this does not make the resource available to the unparsed-text function.
Collection Resolvers
A collection resolver is a class implementing the ICollectionResolver interface. The interface includes the ResolveCollection method, which is called to resolve external collections as requested by the collection function.
The method is passed the URI of the collection to resolve, the document set itself and the name table to use when loading any new documents. The method returns null if the URI is not handled, returns the collection if it is retrieved successfully, or throws an exception if there was an error retrieving the collection. If a null URI is passed in then this indicates that the default collection should be resolved.
Any nodes returned as part of a collection must either have an empty document URI, or must be in the document set. This is to enforce the rule in XQuery that doc(document-uri($N)) is $N is always true for any document node $N.
This is easiest to enforce if all documents returned are loaded from the document set.
The collection resolver should not attempt to add or retrieve the collection with the requested URI from the document set, as this will result in a deadlock.
Resource Resolvers
A resource resolver is a class implementing the IResourceResolver interface. The interface includes the ResolveResource method, which is called to resolve external resources as requested by the unparsed-text function.
The method is passed the URI of the resource to resolve. It returns null if the URI is not handled, returns the resource if it was retrieved successfully, or throws an exception if there was an error retrieving the resource.
The XmlResourceResolver is a resource resolver that wraps the specified XmlResolver.
Using an XmlResolver to Resolve Documents
The DocumentSet(XmlResolver, XmlReaderSettings) constructor initializes a new document set with an UnparsedTextDocumentResolver and an XmlResourceResolver wrapping the XmlReaderSettings and XmlResolver passed in. Any document requested will first be retrieved as unparsed text, and then parsed to create a document. This ensures that the resources returned by unparsed-text and doc remain consistent.
If a query or expression never uses the unparsed-text function then this results in the raw data of every document retrieved being unnecessarily cached in memory. In this case it is better to construct the DocumentSet using an XmlReaderDocumentResolver to avoid caching the unparsed data. The code below shows how to set up a document set with an XmlUrlResolver without caching unparsed data.
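    // The same name table is shared by the reader settings and the document set.
    XmlNameTable nameTable = new NameTable();
    XmlResolver resolver = new XmlUrlResolver();

    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.NameTable = nameTable;
    readerSettings.XmlResolver = resolver;

    // Documents are read directly with an XmlReader, so no unparsed text is cached.
    IDocumentResolver documentResolver = new XmlReaderDocumentResolver(readerSettings);

    // Pass null for the collection and resource resolvers, which are not needed here.
    DocumentSet documentSet = new DocumentSet(nameTable, documentResolver, null, null);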
Setting the Types of External Resources
It is often advantageous to specify the types of external resources. This can indicate, for example, that particular documents will conform to a particular schema, and can help improve static type checking and aid in optimization. A call to doc, document or collection calls upon the document type resolver or collection type resolver to identify the static type of the document or collection respectively. If a document or collection retrieved during evaluation of an XQuery program or XPath expression does not match the type declared by the document type resolver or collection type resolver then an XPTY0004 type error is raised.
Document Type Resolvers
A document type resolver implements the IDocumentTypeResolver interface. When the URI of a document can be determined statically, the ResolveDocumentType method is called, which returns the static type of the document. If the URI of the document cannot be determined statically, or the ResolveDocumentType method returns null, then the type is set to the value of the DefaultDocumentType property, or document-node() if that is null.
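The fallback order described above amounts to a short chain of null checks. The following sketch models it with plain strings in place of real type objects; StaticTypeFallback, its DocumentType method and the example URI are hypothetical and are not part of XmlPrime. A null staticUri stands for a URI that cannot be determined statically.

```csharp
using System;

// Hypothetical model of the static type fallback; types are represented as strings.
public static class StaticTypeFallback
{
    public static string DocumentType(
        string staticUri,                          // null when the URI is not known statically
        Func<string, string> resolveDocumentType,  // returns null when it has no type for the URI
        string defaultDocumentType)                // may itself be null
    {
        // Only consult the resolver when the URI is known at compile time.
        string resolved = staticUri != null ? resolveDocumentType(staticUri) : null;

        // Fall back to the default type, then to the most general document type.
        return resolved ?? defaultDocumentType ?? "document-node()";
    }

    public static void Main()
    {
        Func<string, string> resolver = uri =>
            uri == "http://example.org/books.xml" ? "document-node(element(books))" : null;

        Console.WriteLine(DocumentType("http://example.org/books.xml", resolver, null));
        Console.WriteLine(DocumentType("http://example.org/other.xml", resolver, "document-node(element(*))"));
        Console.WriteLine(DocumentType(null, resolver, null));
    }
}
```

The three calls in Main exercise the three branches in order: a type supplied by the resolver, the default type, and the final document-node() fallback.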
An implementation of IDocumentTypeResolver can be retrieved from a document set with the DocumentTypeResolver property. This resolver resolves all the documents requested statically, and returns their actual type.
The document type resolver is specified by setting the StaticContextSettings.DocumentTypeResolver property.
Collection Type Resolvers
A collection type resolver implements the ICollectionTypeResolver interface. When the URI of a collection can be determined statically, the ResolveCollectionType method is called, which returns the static type of the collection. If the URI of the collection cannot be determined statically, or the ResolveCollectionType method returns null, then the type is set to the value of the DefaultCollectionType property, or node()* if that is null.
The static type of the default collection is determined by first passing a null URI to ResolveCollectionType. If this returns null the process proceeds as above.
An implementation of ICollectionTypeResolver can be retrieved from a document set with the CollectionTypeResolver property. This resolver resolves all the collections requested statically, and returns their actual type.