Loading External Resources
A guide to using external resources with XmlPrime
Overview
For security reasons, the set of available documents, collections and unparsed text resources is empty by default. This means that the doc, document, collection and unparsed-text functions always raise an error, and doc-available and unparsed-text-available always return false. This guide explains how to load external documents and query them with XmlPrime.
The Document Set
The doc, document, collection and unparsed-text functions are all defined to be stable. This means that every time they are called with the same argument within an XQuery program or XPath expression, they must return the same object. Since the same guarantee cannot be made for external resources, the accessibility and content of each resource must be cached throughout the evaluation of the query. This caching is handled by the DocumentSet.
Because a document set contains the documents used during query evaluation, it must be bound to a name table, which is specified in the constructor.
When a resource is used by a query or expression, it is requested from the document set. If the resource (or the fact that the resource is unavailable) is cached in the document set then it is returned (or an error is raised). Otherwise the document set proceeds to retrieve the resource through its resolvers.
To avoid reloading resources, and to allow cached documents to be shared, the document set can be shared between different queries and XPath expressions. The document set is designed to be thread-safe, so it can also be shared between queries and expressions executing concurrently, provided that the name table used is also thread-safe (for example ConcurrentNameTable). The document set to be used for evaluation of a query or expression is specified by the DynamicContextSettings.DocumentSet property.
It is recommended that any documents passed as arguments to an XQuery 1.0 program or an XPath 2.0 expression are loaded through the document set to improve consistency.
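The caching behaviour described above can be modelled in plain C#. The sketch below is hypothetical: `StableCache<T>` and the `Resolver<T>` delegate are invented for illustration and are not XmlPrime types. It shows each URI being resolved at most once, the result (including "unavailable") being cached, and concurrent callers all receiving the same object.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical resolver shape for this sketch: returns null when it does not handle the URI.
public delegate T? Resolver<T>(string uri) where T : class;

// Hypothetical model of a stable, thread-safe resource cache (not the XmlPrime DocumentSet).
public sealed class StableCache<T> where T : class
{
    private readonly Resolver<T>[] _resolvers;

    // Lazy<T?> in ExecutionAndPublication mode runs each factory at most once, even when
    // several threads request the same URI concurrently. An exception thrown by the factory
    // is cached and rethrown on every later access, so errors are stable too.
    private readonly ConcurrentDictionary<string, Lazy<T?>> _entries =
        new ConcurrentDictionary<string, Lazy<T?>>();

    private int _resolutions;

    public StableCache(params Resolver<T>[] resolvers) => _resolvers = resolvers;

    // Number of URIs that actually went through the resolvers.
    public int Resolutions => Volatile.Read(ref _resolutions);

    // Returns the cached resource, or null if the URI is known to be unavailable.
    public T? Get(string uri) =>
        _entries.GetOrAdd(
            uri,
            u => new Lazy<T?>(() => ResolveOnce(u), LazyThreadSafetyMode.ExecutionAndPublication))
        .Value;

    private T? ResolveOnce(string uri)
    {
        Interlocked.Increment(ref _resolutions);
        foreach (var resolver in _resolvers)
        {
            // The first resolver that handles the URI wins.
            var resource = resolver(uri);
            if (resource != null)
                return resource;
        }
        return null; // no resolver handled it: "unavailable" is cached like any other result
    }
}
```

Because only the `Lazy<T?>` stored in the dictionary is ever evaluated, a losing `GetOrAdd` race costs an unused wrapper object, never a second fetch.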
Pre-populating the Document Set
The document set can be populated programmatically. This provides bindings from URIs to resources that override those specified by the resolvers. Documents, collections and unparsed-text resources can all be added to the document set before it is used.
Any document that contains a node supplied as the context item or as a parameter value, and that has a non-empty document URI, is automatically added in the same way.
XmlPrime provides the IncludeWellKnownDTDs method to pre-populate the document set with the XHTML 1.0 DTDs.
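Pre-populated bindings take precedence over the resolvers. As a hedged illustration only, the hypothetical `BindingTable` below (plain System.Xml, not an XmlPrime type) consults explicit bindings first and falls back to a resolver, caching whichever result it obtains. Refusing to rebind a URI that is already in the table is a design choice of the sketch that keeps later lookups stable.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical model of pre-populated bindings; not the XmlPrime API.
public sealed class BindingTable
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, XmlDocument?> _entries = new Dictionary<string, XmlDocument?>();
    private readonly Func<string, XmlDocument?> _resolver;

    public BindingTable(Func<string, XmlDocument?> resolver) => _resolver = resolver;

    // Binds a URI to an in-memory document; intended to be called before the table is used.
    public void Add(string uri, XmlDocument document)
    {
        lock (_gate)
        {
            // Rebinding after a lookup would break stability, so it is rejected.
            if (_entries.ContainsKey(uri))
                throw new InvalidOperationException($"'{uri}' is already bound or was already resolved.");
            _entries.Add(uri, document);
        }
    }

    public XmlDocument? Get(string uri)
    {
        lock (_gate)
        {
            if (!_entries.TryGetValue(uri, out var document))
            {
                // Only URIs without a binding reach the resolver; the result (even null) is cached.
                // The resolver runs under the lock here purely for simplicity.
                document = _resolver(uri);
                _entries.Add(uri, document);
            }
            return document;
        }
    }
}
```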
Resolving Resources
The document set determines which resources are available through the document resolver, collection resolver and resource resolver passed to its constructor. These are used to retrieve any resources that are not already in the cache.
XmlPrime provides specialized interfaces to resolve resources rather than using an XmlResolver. This is so that resources already loaded in memory do not have to be serialized and reparsed. It also allows flexibility in which document representations are used.
Document Resolvers
A document resolver is a class implementing the IDocumentResolver interface. The interface includes the ResolveDocument method, which is called to resolve external documents as requested by the doc and document functions.
The method is passed the URI of the document to resolve, the document set itself and the name table to use when loading any new documents. The method returns null if the URI is not handled, returns the document if it was retrieved successfully, or throws an exception if there was an error retrieving the document. The document resolver should not attempt to add or retrieve the document with the requested URI from the document set, as this will result in a deadlock.
The document set is passed in so that a resolver can use other available resources (for example, unparsed text) to construct the document.
Two default implementations of IDocumentResolver are provided by XmlPrime:
- UnparsedTextDocumentResolver: retrieves the unparsed text with the specified URI from the document set, and then attempts to parse it as a document.
- XmlReaderDocumentResolver: uses the supplied XmlReaderSettings to retrieve the document at the specified URL. Note that this does not make the resource available to the unparsed-text function.
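The resolver contract above (null for URIs it does not handle, a document on success, an exception on failure) is sketched below. `InMemoryDocumentResolver` and its `Resolve` method are hypothetical stand-ins written with System.Xml only; they do not reproduce the exact signature of `IDocumentResolver.ResolveDocument`, which should be checked in the API reference. Note that the sketch never calls back into a document set for the URI it is resolving.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical sketch of the resolver contract; not the XmlPrime IDocumentResolver signature.
public sealed class InMemoryDocumentResolver
{
    private readonly IReadOnlyDictionary<string, string> _sources; // URI -> XML text

    public InMemoryDocumentResolver(IReadOnlyDictionary<string, string> sources) => _sources = sources;

    public XmlDocument? Resolve(string uri, XmlNameTable nameTable)
    {
        // Not one of ours: return null so another resolver (or "unavailable") applies.
        if (!_sources.TryGetValue(uri, out var xml))
            return null;

        // Parse with the caller's name table so atomized names are shared with the query.
        var settings = new XmlReaderSettings { NameTable = nameTable };
        using var reader = XmlReader.Create(new StringReader(xml), settings, uri);
        var document = new XmlDocument(nameTable);

        // Malformed XML throws XmlException: an error, not "not handled".
        document.Load(reader);
        return document;
    }
}
```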
Collection Resolvers
A collection resolver is a class implementing the ICollectionResolver interface. The interface includes the ResolveCollection method, which is called to resolve external collections as requested by the collection function.
The method is passed the URI of the collection to resolve, the document set itself and the name table to use when loading any new documents. The method returns null if the URI is not handled, returns the collection if it is retrieved successfully, or throws an exception if there was an error retrieving the collection. If a null URI is passed in then this indicates that the default collection should be resolved.
Any nodes returned as part of a collection must either have an empty document URI or be in the document set. This enforces the XQuery rule that doc(document-uri($N)) is $N is true for any document node $N that has a document URI.
This is easiest to enforce if all documents returned are loaded from the document set.
The collection resolver should not attempt to add or retrieve the collection with the requested URI from the document set, as this will result in a deadlock.
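The rules above can be illustrated with a hypothetical resolver, `MappedCollectionResolver`, which maps collection URIs to lists of member document URIs. It is a sketch using System.Xml only, not the ICollectionResolver signature. The `loadDocument` delegate is an assumption standing in for fetching member documents from the document set, which is what keeps the doc(document-uri($N)) is $N rule intact; the resolver never looks up the collection itself there.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical collection resolver; it does not reproduce the ICollectionResolver signature.
public sealed class MappedCollectionResolver
{
    private readonly IReadOnlyDictionary<string, string[]> _collections; // collection URI -> member document URIs
    private readonly string[] _defaultMembers;                           // members of the default collection
    private readonly Func<string, XmlDocument> _loadDocument;            // stand-in for fetching a member from the document set

    public MappedCollectionResolver(
        IReadOnlyDictionary<string, string[]> collections,
        string[] defaultMembers,
        Func<string, XmlDocument> loadDocument)
    {
        _collections = collections;
        _defaultMembers = defaultMembers;
        _loadDocument = loadDocument;
    }

    // Returns null for collection URIs this resolver does not handle.
    public IReadOnlyList<XmlDocument>? Resolve(string? collectionUri)
    {
        if (collectionUri == null)
            return Load(_defaultMembers); // a null URI asks for the default collection

        if (_collections.TryGetValue(collectionUri, out var members))
            return Load(members);

        return null;
    }

    // Members come from the shared loader rather than being parsed here, so when that loader is
    // the document set, each member is the same node that doc() returns for its URI.
    private List<XmlDocument> Load(string[] memberUris) => memberUris.Select(_loadDocument).ToList();
}
```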
Resource Resolvers
A resource resolver is a class implementing the IResourceResolver interface. The interface includes the ResolveResource method, which is called to resolve external resources as requested by the unparsed-text function.
The method is passed the URI of the resource to resolve. It returns null if the URI is not handled, returns the resource if it was retrieved successfully, or throws an exception if there was an error retrieving the resource.
The XmlResourceResolver is a resource resolver that wraps the specified XmlResolver.
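As a hedged sketch of what wrapping an XmlResolver for unparsed text involves, the hypothetical `TextViaXmlResolver` below (not XmlPrime's XmlResourceResolver) opens the resource with `XmlResolver.GetEntity` and reads it as text. Treating a URI as "not handled" when its scheme is outside a configured list, and decoding as UTF-8 with byte-order-mark detection, are assumptions made for the example.

```csharp
#nullable enable
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sketch (not XmlPrime's XmlResourceResolver): unparsed text through an XmlResolver.
public sealed class TextViaXmlResolver
{
    private readonly XmlResolver _resolver;
    private readonly string[] _handledSchemes; // assumption: "handled" means one of these URI schemes

    public TextViaXmlResolver(XmlResolver resolver, params string[] handledSchemes)
    {
        _resolver = resolver;
        _handledSchemes = handledSchemes;
    }

    public string? Resolve(Uri uri)
    {
        // A URI this resolver does not handle yields null rather than an error.
        if (Array.IndexOf(_handledSchemes, uri.Scheme) < 0)
            return null;

        // Failures such as a missing file surface as exceptions from GetEntity.
        if (!(_resolver.GetEntity(uri, null, typeof(Stream)) is Stream stream))
            throw new IOException($"The resolver returned no stream for '{uri}'.");

        using (stream)
        using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```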
Using an XmlResolver to Resolve Documents
The DocumentSet(XmlResolver, XmlReaderSettings) constructor initializes a new document set with an UnparsedTextDocumentResolver and an XmlResourceResolver wrapping the XmlReaderSettings and XmlResolver passed in. Any document requested is first retrieved as unparsed text and then parsed to create a document. This ensures that the resources returned by unparsed-text and doc remain consistent.
If a query or expression never uses the unparsed-text function, this approach needlessly caches the raw data of every retrieved document in memory. In this case it is better to construct the DocumentSet using an XmlReaderDocumentResolver, which does not cache the unparsed data. The code below shows how to set up a document set with an XmlUrlResolver without caching unparsed data.
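```csharp
using System.Xml;
using XmlPrime; // DocumentSet, IDocumentResolver, XmlReaderDocumentResolver

// This should be the name table used for the query or expression.
XmlNameTable nameTable = new NameTable();
XmlResolver resolver = new XmlUrlResolver();

XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.NameTable = nameTable;
readerSettings.XmlResolver = resolver;

// Documents are parsed directly by an XmlReader, so no unparsed text is cached.
IDocumentResolver documentResolver = new XmlReaderDocumentResolver(readerSettings);

// No collection resolver and no resource resolver are supplied.
DocumentSet documentSet = new DocumentSet(nameTable, documentResolver, null, null);
```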
Setting the Types of External Resources
It is often advantageous to specify the types of external resources. For example, this can indicate that particular documents conform to a particular schema, which can improve static type checking and aid optimization. A call to doc, document or collection consults the document type resolver or collection type resolver, respectively, to identify the static type of the document or collection.
If a document or collection retrieved during evaluation of an XQuery program or XPath expression does not match the type declared by the document type resolver or collection type resolver, an XPST0004 type error is raised.
Document Type Resolvers
A document type resolver implements the IDocumentTypeResolver interface. When the URI of a document can be determined statically, the ResolveDocumentType method is called, which returns the static type of the document. If the URI of the document cannot be determined statically, or the ResolveDocumentType method returns null, the type is set to the value of the DefaultDocumentType property, or to document-node() if that property returns null.
An implementation of IDocumentTypeResolver can be retrieved from a document set through its DocumentTypeResolver property. This resolver retrieves each document requested with a statically known URI and returns its actual type.
The document type resolver to use is specified by the StaticContextSettings.DocumentTypeResolver property.
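The fallback order for document types described above can be written out as a small function. This is a hypothetical model: sequence types are represented as strings, and the `resolveDocumentType` delegate and `defaultDocumentType` parameter stand in for ResolveDocumentType and DefaultDocumentType, whose real signatures are not reproduced here.

```csharp
#nullable enable
using System;

// Hypothetical model of the static-type fallback for doc()/document().
// Sequence types are plain strings here; XmlPrime uses its own type objects.
public static class DocumentTypeFallback
{
    public static string StaticTypeOf(
        string? staticUri,                           // null when the URI cannot be determined statically
        Func<string, string?> resolveDocumentType,   // stand-in for ResolveDocumentType
        string? defaultDocumentType)                 // stand-in for the DefaultDocumentType property
    {
        // Ask the resolver only when the URI is known at compile time.
        string? type = staticUri != null ? resolveDocumentType(staticUri) : null;

        // Fall back to DefaultDocumentType, then to document-node().
        return type ?? defaultDocumentType ?? "document-node()";
    }
}
```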
Collection Type Resolvers
A collection type resolver implements the ICollectionTypeResolver interface. When the URI of a collection can be determined statically, the ResolveCollectionType method is called, which returns the static type of the collection. If the URI of the collection cannot be determined statically, or the ResolveCollectionType method returns null, the type is set to the value of the DefaultCollectionType property, or to node()* if that property returns null.
The static type of the default collection is determined by first passing a null URI to ResolveCollectionType. If this returns null, the type falls back to DefaultCollectionType, or to node()*, as described above.
An implementation of ICollectionTypeResolver can be retrieved from a document set through its CollectionTypeResolver property. This resolver retrieves each collection requested with a statically known URI and returns its actual type.
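The collection-type fallback, including the null URI used for the default collection, can be modelled the same way. As before, this is a hypothetical sketch: types are plain strings, and the delegate and parameter names stand in for ResolveCollectionType and DefaultCollectionType without reproducing their real signatures.

```csharp
#nullable enable
using System;

// Hypothetical model of the static-type fallback for collection(); types are plain strings here.
public static class CollectionTypeFallback
{
    public static string StaticTypeOf(
        bool uriKnownStatically,                        // false when the argument is computed at run time
        string? collectionUri,                          // null means the default collection
        Func<string?, string?> resolveCollectionType,   // stand-in for ResolveCollectionType
        string? defaultCollectionType)                  // stand-in for the DefaultCollectionType property
    {
        // For the default collection, callers pass uriKnownStatically: true and a null URI,
        // so the resolver is asked with null first.
        string? type = uriKnownStatically ? resolveCollectionType(collectionUri) : null;

        // Fall back to DefaultCollectionType, then to node()*.
        return type ?? defaultCollectionType ?? "node()*";
    }
}
```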
