Introduction
Accessing heterogeneous data sources within an enterprise is
not new. Far from it: it is one of the core tasks that every IT department is
chartered with. Whether it is driven by the need for improving
Customer Care, enhancing Risk Assessment, or simply improving
Operational Efficiency, every organization has
to draw on multiple information assets that are stored in relational
databases, applications, mainframes, web sites, documents,
spreadsheets, etc. in order to satisfy rapidly changing business
requirements.
The challenge of integrating
information on an enterprise-wide scale is most pressing for
Fortune-class companies.
Analysts estimate that Fortune-class companies have over 40
enterprise application systems and spend up to 70 percent of
their application development budgets creating ways to access
disparate data.1
Traditionally, information
integration is achieved in one of two ways: either by back-hauling
data to a staging database
(sometimes known as an Operational Data Store) or by writing custom
code where the data relationships are embedded in code. The
downside to both these approaches is that they are costly, time
consuming to implement and often result in replicated and
inconsistent views of the original data. Neither approach is well
suited to addressing the need for incorporating content with
data.
The emergence of tools
specifically designed for Enterprise Information Integration (EII)
promises to reduce the cost and complexity of integrating disparate
data, while providing better manageability and reuse. This new
market segment is sometimes also referred to as data federation,
data aggregation, or even virtual data
warehousing.
Snapbridge offers two information integration products: FDX Information Server
and XStudio.
Snapbridge FDX Information
Server is powered by a patent-pending technology capable of using
data resources within and
beyond the enterprise to create composite views of information that
can be applied or updated as part of a transaction.
Snapbridge XStudio is a visual drag-and-drop environment for
developers to describe their data transformation and processing
requirements.
“Snapbridge FDX is the first
standards-based approach to Data Federation the Delphi Group has
seen emerge from the Enterprise Information Integration (EII)
software segment. FDX’s schema-free architecture… combined with
native support for XML and its associated standards, will allow FDX
to reduce the footprint of large integration initiatives and offer a
greater return on investment from existing
infrastructure.”
--Delphi Group
Snapbridge FDX Information Server
Snapbridge’s FDX is an
XML-based data/content federation product that is optimized for the
complexities of working with XML in real-time in a computationally
economic manner. Unlike many information integration products, it is
capable of bi-directionality, which is key to transactions
processing and enterprise data synchronization. Snapbridge FDX
manages the lifecycle of an enterprise data federation initiative,
enabling a seamless migration from development through
implementation, and features an intuitive GUI design interface that
lets decision-makers manage and choreograph information with
ease.
Snapbridge FDX works with other data integration
technologies such as data warehousing and operational data stores,
Enterprise Applications Integration (EAI) tools, and extract,
transform and load (ETL) tools, further empowering these
technologies and leveraging an enterprise’s existing information
infrastructure and workflow processes. As the Aberdeen Group has
argued, all of these complementary technologies “belong in an
enterprise’s strategic information architecture.”2
However, “what immediately separates FDX data federation from
EAI, data warehousing, and other approaches to complex information
integration is a dynamic data
management approach built on XML transformation capabilities, rather
than a schema-driven architecture.”3
FDX Information Server has a
web-based interface that supports three key functions: (1) the
creation of federations in an
interface called the ‘builder’, (2) the management of XML documents
stored within FDX in
the ‘repository’, which is also accessible via several other
interfaces including WebDAV, and (3) system administration,
including security and performance monitoring.
The Builder provides an
intuitive environment to allow business users to connect, federate,
and publish data. This interface allows users rather than
programmers to perform the most common data federation operations
with an extremely short learning curve.
· Connect: Lets users define a data source, display the data, and
interrogate relational databases, flat files, XML documents, SOAP
services, etc.
· Federate: Lets users create relationships between data sources by
pointing and clicking.
· Publish: Lets users format the newly created data using a
pre-defined template or by creating a new template, with no
knowledge of XML or XSLT required.
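The three Builder operations can be pictured in code. The following is only a minimal sketch in Python, not FDX's actual interface: the in-memory SQLite table, the CSV text, the element names, and the functions federate and publish are all hypothetical stand-ins chosen for illustration.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Connect: two hypothetical sources, a relational table and a flat file.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

ORDERS_CSV = "customer_id,amount\n1,120.00\n1,35.50\n2,990.00\n"


def federate(connection, orders_csv):
    """Federate: relate flat-file orders to database customers by customer id."""
    orders_by_customer = {}
    for row in csv.DictReader(io.StringIO(orders_csv)):
        orders_by_customer.setdefault(int(row["customer_id"]), []).append(row["amount"])

    root = ET.Element("customers")
    query = "SELECT id, name FROM customers ORDER BY id"
    for cust_id, name in connection.execute(query):
        cust = ET.SubElement(root, "customer", {"id": str(cust_id), "name": name})
        for amount in orders_by_customer.get(cust_id, []):
            ET.SubElement(cust, "order", {"amount": amount})
    return root


def publish(root):
    """Publish: serialize the composite view as XML text."""
    return ET.tostring(root, encoding="unicode")


composite = federate(db, ORDERS_CSV)
print(publish(composite))
```

Neither source is copied into a staging store here; the composite view is assembled on request, which is the property the Builder description emphasizes.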
While XML processing provides a
high degree of flexibility, it has the reputation of being
computationally intensive and therefore slow. FDX Information
Server's architecture mitigates this problem through three
complementary approaches.
The first is with incremental,
distributed processing. FDX Information Server promotes breaking XML
processing into a pipelined
sequence of smaller operations that can be handled concurrently and
distributed across machines.
Caching of intermediate results
may be introduced at any step in the processing
chain.
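As a rough illustration of the pipelined idea, the Python sketch below chains three small stages and caches the result of the whole chain. It is an assumption-laden sketch rather than FDX's implementation: the stage names, the sample document, and the use of functools.lru_cache as the cache are all illustrative, and the stages run in a single process rather than concurrently or across machines.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical source document; the element names are illustrative only.
SOURCE = (
    "<orders>"
    "<order id='1' total='40'/>"
    "<order id='2' total='250'/>"
    "<order id='3' total='975'/>"
    "</orders>"
)


def read_orders(xml_text):
    """Stage 1: yield one <order> element at a time."""
    for order in ET.fromstring(xml_text).iter("order"):
        yield order


def keep_large(orders, minimum):
    """Stage 2: pass through only orders at or above a threshold."""
    for order in orders:
        if float(order.get("total")) >= minimum:
            yield order


def to_summary(orders):
    """Stage 3: transform each order into a smaller summary element."""
    for order in orders:
        summary = ET.Element("summary", {"ref": order.get("id")})
        summary.text = order.get("total")
        yield summary


@lru_cache(maxsize=32)
def large_order_summaries(xml_text, minimum):
    """Run the chain once per distinct input and cache the serialized result."""
    pipeline = to_summary(keep_large(read_orders(xml_text), minimum))
    return tuple(ET.tostring(el, encoding="unicode") for el in pipeline)


result = large_order_summaries(SOURCE, 100.0)
print(result)
```

Because each stage only consumes and yields elements, a cache could equally be placed around any single stage instead of the whole chain.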
The second is in the
representation of XML content. While FDX's inputs and outputs may be
XML text, internally FDX streams an optimized representation of
XML's information model between sub-processes, thereby avoiding the
overhead of parsing, validating, copying and serializing content
between processing steps. FDX's adapters to outside data sources
often convert data to this internal representation without the
intermediate steps of converting to an XML text stream which must
then be parsed. This streaming model enables FDX to handle much
larger data sets than other approaches.
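The benefit of streaming can be sketched with Python's standard library. This is not FDX's internal representation, only an analogous technique: xml.etree.ElementTree.iterparse hands each completed element to the next step as an in-memory object, so nothing is re-serialized or re-parsed between steps and the whole document is never held at once. The account and balance element names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield each completed <tag> element, then release its contents."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # free the element once the consumer is done with it


def total_balance(records):
    """Consume records one at a time without building a full document tree."""
    return sum(float(record.findtext("balance")) for record in records)


xml_bytes = (
    b"<accounts>"
    b"<account><balance>10.5</balance></account>"
    b"<account><balance>4.5</balance></account>"
    b"</accounts>"
)
total = total_balance(stream_records(io.BytesIO(xml_bytes), "account"))
print(total)
```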
Finally, FDX contains a native
XML database for persistent data. Queries to the database provide a
dynamic XML view of just the relevant fragments of content from
across a collection of documents. The core of the FDX product suite
is a high-performance federation processing engine that first
indexes, then federates, and finally delivers access. While most
information integration deployments use a schema to map data structures and definitions,
Snapbridge FDX instead uses a schema-less approach, utilizing a
Meta Index combined with an XML based runtime language called
XRAP. Developers can use XRAP and a graphical design environment to
map relationships, business logic, and rules. XRAP (eXtensible
Repository Action Protocol) has been designed specifically for
bi-directional data federation. Because XRAP is a functional
language, it allows FDX to perform real-time federations of complex
data sets.
XRAP is the
input to the FDX federation processing engine. Snapbridge enhances
the processing engine with a patent-pending I/O parallel pipeline
architecture that achieves an order-of-magnitude faster XML query
return time and that mitigates bandwidth bottlenecks. This
achievement, unduplicated among other XML-based data federation
products, distributes the query problem, effectively resolving
thorny processing overhead issues associated with
XML.
It is important to note that data federation is a
virtual technology. It points to data and opens portals for access, as well as transforms the
data for composite viewing, all the while leaving the data
where it resides and in the form in
which it resides. However, Information Retrieval research
has found that 60% of all queries are repeated
ones,4 and therefore in the interest of
processing efficiency as well as other dataflow issues,
Snapbridge FDX includes configurable caches for both individual
fragments of data sources and composite views of virtual data sets.
The data that flows between the XRAP functions—for example, a
composite customer view—can be cached in memory for faster access,
freeing the federation processing engine from having to re-execute
all the supporting data transforms required to produce that
composite view. The lifetime of the data in the cache is determined
by settings configured by the system
administrator.
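A cache with an administrator-configured lifetime can be sketched as follows. This is a minimal Python illustration, not the FDX cache itself: the class TTLCache, its get_or_compute method, the key format, and the 30-second lifetime are all hypothetical, and the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Keep computed views for a fixed lifetime, then recompute on demand."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the supporting transforms
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


# Demonstration with a fake clock so that expiry is deterministic.
calls = []
fake_now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])


def build_view():
    """Stand-in for the transforms that produce a composite customer view."""
    calls.append(1)
    return "<customer id='42'/>"


first = cache.get_or_compute("customer:42", build_view)
second = cache.get_or_compute("customer:42", build_view)  # served from the cache
fake_now[0] = 31.0
third = cache.get_or_compute("customer:42", build_view)  # expired, so recomputed
```

A repeated query inside the lifetime costs one dictionary lookup, while a query after expiry pays the full cost again and refreshes the entry.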
XRAP composite objects can be
stored in the FDX repository to allow for re-use within the
enterprise. FDX provides for a full content management repository
for storing, updating and retrieving composite objects as well as
rich document types.
FDX Information Server Technology & Architecture
Snapbridge FDX is a
patent-pending technology for integrating large amounts of different
kinds of data in real-time. Snapbridge FDX fuses multiple data
sources such as account detail from relational databases, flat file
mainframe data, email correspondence, digital images from content
repositories, feeds from third party resources, other information
from the Internet, etc., to create composite objects that can be
viewed, or updated as part of a transaction—regardless of
where the data is stored, how it is formatted or when it was
created.
FDX capitalizes on the XML
standard for structuring and expressing information, allowing the
system to operate on structured data and semi-structured content
(documents and images) at the same time. Snapbridge FDX information
integration software combines revolutionary, open-standards based
technologies for indexing,
normalization, aggregation, correlation and “semantic” data
delivery, resulting in the industry’s fastest, most flexible,
and most comprehensive information integration
solution.
The architecture of the
Snapbridge FDX Information Server product is shown below and is
described layer by layer in the
following text. As the diagram shows, FDX uses open standards and is
capable of federating virtually any data, whether structured
(e.g., relational database files) or semi-structured (e.g.,
flat files, MS Word, Excel, web services, media files), that
resides in any type of data
store.
FDX builds aggregated data sets
by retrieving, transforming, and presenting information from
disparate data sources. Optimized for data
bidirectionality, the FDX product architecture allows
enterprises to accomplish enterprise information
integration ten times faster and at one-tenth the cost traditionally
spent on custom-coded solutions.
FDX operates with open
standards such as XML, XSLT, XPath, SOAP, and
J2EE.
Snapbridge XStudio
Snapbridge XStudio is a
powerful and flexible development environment for building
information integration
solutions based on XML technologies, fully supporting the creation
of XSLT, and of processes to choreograph the federation and
publication of information. It is unique in its ability to create
XSLT based on visually mapping source documents to target
outputs.
No XSL Coding Necessary. In addition to making XSLT simple for
users who have no XSL experience, XStudio is a full-featured
tool that speeds development for any XSL expert.
Visual XSLT Creation and Testing. Stylesheets are generated by
example using the drag-and-drop interface. Users can iteratively import
source and target documents, and then graphically define
transformations between them. There is no need for DTDs or hand-coding of
XSL. XStudio also incorporates XPath visualization, syntax help, and
an integrated browser preview.
Visual XML Processing. Generate processing scripts in XRAP
for getting external data and for choreographing transformations,
federations, and publication using a pipes-and-filters model. Scripts
can be executed against an instance of Snapbridge FDX Information
Server for integrating large amounts of different kinds of data in
real-time.
XRAP
This document describes the eXtensible Repository Action
Protocol (XRAP).
XRAP is a computer language for scripting the processing of
information; this can be in the form of structured data (databases,
flat files, forms, spreadsheets, etc.) or unstructured content
(documents, images, etc.). Processing includes actions such as the
simple transformation of data from one format to another, accessing
or storing content, correlating data, generation of new content, or
conversion among data representations. XRAP provides a set of
commands for specifying these actions in such a way that the
processing actions can be combined or assembled into complex
information processing
applications.
XRAP leverages open standards: XML, XSL, etc.
XRAP programs are executed within the Snapbridge FDX
Information Server product. The Information Server is the only
comprehensive standards-based, real-time federation solution. This
product replaces the need for much of the custom code in enterprise
information integration and content
management.
This document is structured to introduce the reader to the
core ideas behind the XRAP programming model, the environment in
which XRAP is used, the basic commands that are most commonly used
in XRAP programming, and then finish off with the more advanced
commands.
Wherever possible we have included example XRAP scripts and
explanations of where and why commands would be
used.
