Probabilistic graph database

We offer a set of online tools that facilitate search design. The technology underlying this toolset is the probabilistic graph database. As the name implies, this database stores graph data together with a probability estimate for every node and every edge. Below we present the components of our technology stack and how they jointly enable search design.

Knowledge Graphs
A prerequisite for search design is to have all data easily accessible. We accomplish this by storing the entities and concepts contained in data sources as Linked Data. This method of data storage was developed by the World Wide Web Consortium (W3C) to facilitate the integration of data on the web; the same approach to representing data applies equally to the other data sources you encounter in the enterprise.

The basic principle of Linked Data is to store data as ‘triples’. Triples have the form subject - predicate - object: one resource (the subject) is connected to another resource (the object) via a relation (the predicate). In other words, data is stored as tiny fragments of a network (or graph):

Figure: A triple indicating the location of our office.

By assigning unique identifiers to these concepts and relations, these fragments can be combined into a large network, a knowledge graph:

Figure: A small knowledge graph of our domain.
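As a minimal sketch of the idea above, triples can be modelled as plain tuples in Python, and fragments merge into one graph simply because they share identifiers. The `ex:` identifiers and the helper names `merge` and `outgoing` are hypothetical and chosen for illustration only; real Linked Data would use full URIs.

```python
# Hypothetical identifiers; real Linked Data would use full URIs.
office_fragment = [
    ("ex:Spinque", "ex:locatedIn", "ex:Utrecht"),
]
domain_fragment = [
    ("ex:Spinque", "rdf:type", "ex:Company"),
    ("ex:Utrecht", "ex:country", "ex:Netherlands"),
]


def merge(*fragments):
    """Combine triple fragments into one graph, dropping duplicates."""
    graph = set()
    for fragment in fragments:
        graph.update(fragment)
    return graph


def outgoing(graph, subject):
    """Return the sorted (predicate, object) pairs that leave a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


graph = merge(office_fragment, domain_fragment)
print(outgoing(graph, "ex:Spinque"))
```

Because both fragments use the same identifier for the subject, its edges from either fragment end up attached to the same node without any extra bookkeeping.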

The resulting knowledge graph can be linked to other existing graphs (often on the web) to further enrich the data:

Figure: Our knowledge graph enriched with external data on how to reach us.

Knowledge graphs are powerful: they allow many different kinds of information to be linked together into one rich model of a business domain. To accommodate this flexibility, the graph database does not have a fixed schema. It may contain any type of data: small and large, internal and external, structured and unstructured.

Figure: Our knowledge graph stored in a table.
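One way to picture a graph without a fixed schema stored in a table is a single three-column table of triples. The sketch below uses Python's built-in sqlite3 module purely for illustration; the table name, the column names and the example business are assumptions, not the actual storage layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table holds every triple, whatever it describes.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")


def add(subject, predicate, obj):
    """Store one triple (hypothetical helper)."""
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))


def predicates():
    """List the distinct relation types currently in the graph."""
    rows = conn.execute("SELECT DISTINCT predicate FROM triples ORDER BY predicate")
    return [row[0] for row in rows]


add("ex:BusinessA", "ex:locatedIn", "ex:OfficeBuilding1")
add("ex:BusinessA", "ex:description", "leadership coaching for teams")
print(predicates())

# A new kind of information needs no schema change, only a new predicate.
add("ex:BusinessA", "ex:employees", "12")
print(predicates())
```

Structured values (an employee count) and unstructured text (a description) sit side by side in the same table, which is what makes mixing both kinds of data possible.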

With the data unified as a knowledge graph in the database, search designers now need a way to access all the information contained in it in order to answer the various questions their users might have.

SpinQL
We have developed a query language for the graph database: SpinQL. On the one hand, SpinQL can answer questions about structured data in the graph, for example to find the number of businesses in an office building. On the other hand, it can rank data to answer questions about unstructured data in the graph, for example to find the most relevant business on the subject of ‘leadership coaching’. Best of all, SpinQL can combine structured and unstructured queries seamlessly, for example to rank all businesses in an office building that have existed for more than 5 years, have more than 8 employees and are most relevant on the subject of ‘search’.

To allow this fusion of selecting and ranking, we have extended the data representation with a single probability estimate for every result. Selecting structured data yields certain answers derived from facts, so these probability estimates are equal to one or zero. When ranking unstructured data, however, a computer can never be certain about the relevance of a result: the answer to a query is really a series of probable answers, produced by one ranking algorithm or another. SpinQL therefore treats every element as having a ranking score. We append a probability column to all tables representing the knowledge graph, and the results of both types of operation, selections and rankings, are stored in the same way: facts with a probability of 1, ranked results with a probability estimate that expresses their relevance. Crucially, SpinQL combines and propagates these probabilities: for every operation on the graph, it defines how the resulting probability is computed.

Figure: Propagation of probabilities in a query that selects CWI spin-offs and ranks them on ‘search solutions’.
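The following Python sketch illustrates how selection and ranking results could share one probability column and be combined. It is not SpinQL. The businesses, the toy term-frequency ranking and the rule of multiplying probabilities for a conjunction (which assumes the two estimates are independent) are all assumptions made for illustration; the text above does not specify the actual propagation rules.

```python
# Hypothetical businesses in one office building.
businesses = {
    "A": {"years": 7, "employees": 12, "description": "enterprise search design"},
    "B": {"years": 10, "employees": 20, "description": "leadership coaching"},
    "C": {"years": 2, "employees": 30, "description": "search search consultancy"},
    "D": {"years": 6, "employees": 9, "description": "search tools"},
}


def select(entities, condition):
    """Structured selection: every matching entity is a certain fact (p = 1.0)."""
    return {name: 1.0 for name, attrs in entities.items() if condition(attrs)}


def rank(entities, term):
    """Toy ranking: the fraction of description words equal to the term."""
    scores = {}
    for name, attrs in entities.items():
        words = attrs["description"].lower().split()
        score = words.count(term) / len(words) if words else 0.0
        if score > 0:
            scores[name] = score
    return scores


def combine_and(left, right):
    """Conjunction: multiply probabilities (assumes independence)."""
    return {name: left[name] * right[name] for name in left.keys() & right.keys()}


established = select(businesses, lambda b: b["years"] > 5 and b["employees"] > 8)
relevant = rank(businesses, "search")
results = combine_and(established, relevant)
ranking = sorted(results, key=results.get, reverse=True)
print(ranking)
```

Business C scores highest on relevance but is dropped by the selection, while B passes the selection but has no relevance for the term, so only A and D remain, ordered by their combined probability.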

Spinque Desk
Search designers use SpinQL to answer any type of question about their business domain. We provide an editor to compose SpinQL queries graphically: Spinque Desk. By combining building blocks that contain snippets of SpinQL, each performing specific operations on the database, search designers create so-called search strategies that answer their questions. Defining the search strategy is an interactive process where the search designer expresses their knowledge about the business domain to achieve the most relevant search results.

Figure: A search strategy in Spinque Desk.

API deployment: compiler + execution

Once the search designers are satisfied with the output of a search strategy, they deploy it as an API from Spinque Desk. During this process the compiler combines the SpinQL snippets contained in the strategy and converts them to a SQL query. A SpinQL query can be executed in many different sequences and can be expressed in many different ways in SQL. The compiler analyses each strategy, rewrites it and automatically caches the parts that are independent of query parameters. This way it produces an efficient SQL query of which large parts have been pre-computed.

Finally, the resulting SQL queries are executed on MonetDB, a state-of-the-art column-store database that is optimised for analytical tasks. Together, this results in fast and stable execution of search solutions.
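The effect of caching parameter-independent parts can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than MonetDB, and the table names, columns, data and the `run` helper are hypothetical. How the actual compiler rewrites strategies is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (name TEXT, employees INTEGER, topic TEXT, p REAL);
INSERT INTO business VALUES
  ('A', 12, 'search',   0.9),
  ('B', 20, 'coaching', 0.8),
  ('C', 3,  'search',   0.7);
""")

# Parameter-independent part: does not depend on the request,
# so it can be computed once at deployment time and stored.
conn.execute("""
CREATE TABLE cached_large AS
SELECT name, topic, p FROM business WHERE employees > 8
""")

# Parameter-dependent part: executed per request against the cached table.
REQUEST_SQL = "SELECT name, p FROM cached_large WHERE topic = ? ORDER BY p DESC"


def run(topic):
    """Answer one API request for the given topic parameter."""
    return conn.execute(REQUEST_SQL, (topic,)).fetchall()


print(run("search"))
```

Each request then only pays for the part of the query that depends on its parameters; the filter on company size has already been applied.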

Our technology stack enables search designers to store their data as a knowledge graph, to design answers to any type of question about their business domain and to convert these designs into fast and reliable APIs. It supports search design and enables the creation of excellent search solutions.

Would you like to dive deeper into our technology stack?
Check out our online publications.

Videos

Grasp the main concepts in just a few minutes.

CultuurLINK tutorial

Link thesauri with Spinque.

GTOU demonstrator (Dutch)

The demo shows how two media collections, whose media items are annotated with 'aligned' thesauri, can be explored.

Scientific publications and presentations

Explore in full detail the technology that makes Spinque solutions possible. With peer-reviewed publications, we actively seek formal validation of our approach by the research community, to make sure that our claims are backed by solid arguments.

KARS 2017 presentation

Challenges for industrial-strength Information Retrieval on Databases.

KARS 2017 paper

Challenges for industrial-strength Information Retrieval on Databases.

Searching Political Data by Strategy

Searching Political Data by Strategy, in Proceedings of the Integrating IR technologies for Professional Search Workshop, 2013.

Combining document representations for prior-art retrieval

Combining document representations for prior-art retrieval, in Proceedings of CLEF, 2011.

Searching CLEF-IP by strategy

Searching CLEF-IP by strategy, in 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Revised Selected Papers, 2010.

Search by strategy

Search by strategy, poster session, ESAIR'10, Toronto, Canada, 2010.

Copyright Spinque B.V.