Design, Publish, Search!
Strictly speaking, Spinque is not a search engine, but rather a search engine generator.
With other tools, data is fed into the system, and only the readily available search options can be used. If the search process you had in mind cannot be expressed with such systems, customising them requires a considerable programming effort, if it is possible at all.
With Spinque, there is no pre-defined search engine. You can design your own search engine, following the logical process that seems most reasonable to you. The result of this design (which happens via a graphical interface) is what we call a strategy. Strategies can be saved, changed, shared, optimised and ultimately published as stand-alone search engines. The following drawings summarise how to work with the Spinque platform and how this relates to its software architecture.
Spinque platform - Workflow
The typical workflow with the Spinque platform is that you create an index from the data, you design the desired search strategies to access the data and publish them as search engines on the Web. Modifying the search strategies also modifies the way users search your data.
The steps in the workflow relate to the architecture components (in bold), as follows:
Spinque platform - Architecture
Below, we briefly describe each of the architecture components. Upcoming posts will provide more in-depth information.
Spinque can work with data coming in a variety of formats. XML is natively supported in Spinque. Many other formats are supported and converted into XML under the hood when necessary, including HTML, SQL, JSON, CSV files, PDF and office documents, and RDF documents.
Spinque's flexibility in handling data is enriched by data containers, which define how data is accessed. These include filesystem files and directories, data archives such as TAR and ZIP, SVN repositories, RSS feeds and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Containers can be grouped and chained together to allow extra flexibility. For example, documents from a filesystem can be combined with RSS feeds into a single data input source for the system.
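The grouping and chaining of containers can be pictured with a small Python sketch. This is not Spinque's container API: `in_memory_container`, `unzip` and `group` are hypothetical names, and the in-memory dictionaries merely stand in for a filesystem directory and an RSS feed so that the example runs without any I/O.

```python
import io
import zipfile
from itertools import chain
from typing import Dict, Iterable, Iterator, Tuple

Document = Tuple[str, bytes]  # (name, raw content)


def in_memory_container(files: Dict[str, bytes]) -> Iterator[Document]:
    """Hypothetical stand-in for a filesystem or feed container."""
    for name in sorted(files):
        yield name, files[name]


def unzip(docs: Iterable[Document]) -> Iterator[Document]:
    """Chained container: expand ZIP archives, pass other documents through."""
    for name, data in docs:
        if name.endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for member in archive.namelist():
                    yield f"{name}/{member}", archive.read(member)
        else:
            yield name, data


def group(*containers: Iterable[Document]) -> Iterator[Document]:
    """Grouped container: present several containers as one input source."""
    return chain(*containers)


def make_zip(files: Dict[str, bytes]) -> bytes:
    """Build a small ZIP archive in memory for the demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


local_files = in_memory_container({
    "a.xml": b"<doc>a</doc>",
    "bundle.zip": make_zip({"b.xml": b"<doc>b</doc>"}),
})
feed_items = in_memory_container({"feed-item-1.xml": b"<doc>c</doc>"})

# Chain (unzip on top of local files) and group (local files plus feed items).
source = group(unzip(local_files), feed_items)
names = [name for name, _ in source]
print(names)
```

Because every container is just an iterator of documents, chaining and grouping compose freely, which is the flexibility the paragraph above describes.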
Every data collection comes in different "sizes & shapes", not only in terms of data formats but also in terms of internal structure. Data templates instruct the system about where to find relevant pieces of information within the data at hand. While it is possible to use data-agnostic, zero-configuration data templates for simple search operations like full text search, more focused operations require some detailed knowledge of the data being searched.
As an example, consider a collection of Wikipedia pages. If we planned to support queries for the navigation of the "See also" section, we would need to tell the system how the database entry (Linux, see also, List of operating systems) can be derived from the analysis of the Wikipedia page about the Linux operating system, that is, where to find such links within the document structure.
Such focused data description is made easy by Spinque-customised XSL stylesheets, which contain definitions such as:
<xsl:template match="xpath_to_seealso_section/item">
  ..
  <spinque:relation subject="current_page" predicate="see_also" object="related_page"/>
  ..
</xsl:template>
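To make the idea of a data template concrete, here is a minimal Python sketch of the same extraction, written with the standard library rather than with Spinque's XSL extensions. The page layout (a `page` element with a `title` attribute and a `section` named "See also") is an assumption made up for this example, as is the function name `see_also_relations`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XML rendering of a Wikipedia page.
PAGE = """
<page title="Linux">
  <section name="History">
    <item>Unix</item>
  </section>
  <section name="See also">
    <item>List of operating systems</item>
    <item>Comparison of operating systems</item>
  </section>
</page>
"""


def see_also_relations(xml_text):
    """Return (subject, predicate, object) entries for the 'See also' links."""
    root = ET.fromstring(xml_text)
    subject = root.get("title")
    # Only items inside the "See also" section match this path.
    items = root.findall("./section[@name='See also']/item")
    return [(subject, "see_also", item.text.strip()) for item in items]


relations = see_also_relations(PAGE)
for relation in relations:
    print(relation)
```

The path expression plays the role of `xpath_to_seealso_section/item` in the template above: it tells the system where in the document structure the links live.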
The indexer is a piece of software that takes data containers and data templates as input and populates a relational database with an index to make data searchable. It is designed to achieve high data throughput, although the actual throughput depends on the complexity of the data templates used.
Spinque relies on a Relational Database Management System (RDBMS) as a central repository for storing and querying all the meta-data and indices that are necessary for its operations. Relational database technology provides solid ground for a robust and efficient data management layer. Spinque can talk to any RDBMS, but is optimised to work best with MonetDB/SQL, a high-performance, column-store database engine that excels in heavy analytical tasks on large data.
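As a rough illustration of what "populating a relational database with an index" can mean, the sketch below stores extracted relations in a table and queries them with SQL. It uses SQLite from the Python standard library purely as a stand-in for the RDBMS; the table layout and the function name `build_index` are assumptions for this example and say nothing about Spinque's actual schema.

```python
import sqlite3


def build_index(relations):
    """Load (subject, predicate, object) entries into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relation (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", relations)
    # A database index on the lookup columns keeps queries fast as data grows.
    conn.execute("CREATE INDEX relation_sp ON relation (subject, predicate)")
    conn.commit()
    return conn


conn = build_index([
    ("Linux", "see_also", "List of operating systems"),
    ("MonetDB", "see_also", "Column-oriented DBMS"),
])

rows = conn.execute(
    "SELECT object FROM relation "
    "WHERE subject = ? AND predicate = ? ORDER BY object",
    ("Linux", "see_also"),
).fetchall()
print(rows)
```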
Later posts will talk more specifically about the nuts and bolts of how data is stored and crunched inside the database. We can however highlight the following interesting aspects that make the Spinque platform unique:
SpinQL (pronounced 'spinkle') is a domain-specific language designed to bridge the gap between the expressive power of Spinque's Search by Strategy concept and the unfriendly efficiency of SQL queries to the RDBMS. It is used internally to describe in full detail the strategies that can be designed visually using the Strategy Editor. While it is not necessary to even know of SpinQL's existence in standard scenarios, skilled users can optionally use it to create more customised search primitives.
All Spinque components communicate via a REpresentational State Transfer (REST) API available through a Web service. REST commands are issued over HTTP as simple URIs. The simplicity and usefulness of this layer can be best illustrated with an example:
For instance, a URI can address the app1 project, using strategy1, with query fields such as buildYear=1970. Any custom-built search UI can issue queries to a Spinque server and receive answers, as long as it is able to construct such URIs.
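Constructing such a URI from a client takes only a few lines. The sketch below is an assumption-laden illustration: the host name `spinque.example` and the `/project/strategy?field=value` layout are invented for this example and are not the documented Spinque URI scheme. Only the project, strategy and query field names come from the text above.

```python
from urllib.parse import quote, urlencode


def build_query_uri(base, project, strategy, **fields):
    """Assemble a query URI from a project, a strategy and query fields."""
    # Percent-encode each path segment so unusual names stay valid.
    path = "/".join(quote(part, safe="") for part in (project, strategy))
    # Sort the fields so the same query always yields the same URI.
    query = urlencode(sorted(fields.items()))
    uri = f"{base.rstrip('/')}/{path}"
    return f"{uri}?{query}" if query else uri


uri = build_query_uri(
    "http://spinque.example",  # hypothetical server address
    "app1",
    "strategy1",
    buildYear=1970,
)
print(uri)
```

A search UI would then issue an HTTP GET on the resulting URI and render whatever the server returns.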
The Strategy Editor is a Web service that allows you to design search strategies (remember: Don't program search engines, design them!). Simply drag the available building blocks into the design space and connect them to define how data flows from one operation to the next. Define which parameters will be customisable by end-users. Gain full control over the search process by inspecting the intermediate results at any block in the designed strategy.
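The notion of a strategy as connected blocks with inspectable intermediate results can be mimicked in plain Python. This is only a conceptual sketch: the blocks `filter_by_year` and `rank_by_name`, the `run_strategy` helper and the sample records are all hypothetical and unrelated to the real building blocks offered by the Strategy Editor.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Params = Dict[str, object]
Block = Callable[[List[Record], Params], List[Record]]


def filter_by_year(records: List[Record], params: Params) -> List[Record]:
    """Keep only records matching the end-user parameter 'buildYear'."""
    return [r for r in records if r["buildYear"] == params["buildYear"]]


def rank_by_name(records: List[Record], params: Params) -> List[Record]:
    """Order the surviving records alphabetically by name."""
    return sorted(records, key=lambda r: str(r["name"]))


def run_strategy(blocks: List[Tuple[str, Block]],
                 records: List[Record],
                 params: Params):
    """Pass data from block to block, recording each intermediate result."""
    trace = []
    for name, block in blocks:
        records = block(records, params)
        trace.append((name, list(records)))
    return records, trace


strategy = [("filter", filter_by_year), ("rank", rank_by_name)]
buildings = [
    {"name": "Tower B", "buildYear": 1970},
    {"name": "Hall A", "buildYear": 1970},
    {"name": "Depot", "buildYear": 1985},
]

results, trace = run_strategy(strategy, buildings, {"buildYear": 1970})
print([r["name"] for r in results])
for block_name, intermediate in trace:
    print(block_name, len(intermediate))
```

Changing the order or choice of blocks changes the search behaviour without touching the data, which mirrors the "design, don't program" idea.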
When your strategy reflects the search experience you want to offer on your data, save it and publish it as an independent end-user search engine. How? By using the Spinque Management Console.
The Spinque Management Console is a Web service that implements a dashboard from which you can control workspaces, databases, strategies and more. You can connect one or more databases to the workspace that represents your search application, and select which search strategies you want to publish as stand-alone search engines for end-users.
As soon as a search strategy is published, the search engine and underlying REST API are generated, and ready to be queried. Thanks to the REST interface, the UI design is completely independent from the Spinque platform and can be designed by any third party. This screenshot shows a simple UI designed by Spinque (we are no UI experts!) for a demo search application related to Dutch political data. Upcoming posts will show how to build a simple UI for Spinque in just a few minutes.