Data set criteria

Once upon a time, we were happy with search engines that could retrieve textual documents. Today, we want to search multiple heterogeneous data sets of any kind: images, audio and video, and highly structured as well as free-form material. We also want our search results to be documents, paragraphs, audio segments, or named entities such as persons or companies. Spinque software is designed to get all this out of your own data.
 

Definition of a data set

  • Spinque strategies allow you to search multiple data sets at once
  • Each data set is indexed and can have completely different content and structure
  • Each data set is treated as an object-oriented database

Example

A search engine might search a company's intranet, its product catalogue and the logs from the helpdesk. Each of these sources is regarded as a different data set. The intranet is likely to contain documents with metadata such as a 'created/modified' date and perhaps an author. A product catalogue contains a list of products, associated categories, prices, and perhaps descriptions. A helpdesk log may record the date of the telephone call, a description of the incident, and the proposed solution.
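To make the "data set as a collection of objects" idea concrete, the three sources above can be sketched in Python. This is only an illustration and not Spinque's API or data model: the class names, field names, sample records, and the `naive_search` helper are all hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


# Hypothetical record types, one per data set; the fields mirror the example above.
@dataclass
class IntranetDocument:
    title: str
    modified: date
    author: Optional[str] = None


@dataclass
class Product:
    name: str
    category: str
    price: float
    description: str = ""


@dataclass
class HelpdeskEntry:
    call_date: date
    incident: str
    solution: str


# Each data set has its own structure; the sample records are made up.
data_sets = {
    "intranet": [IntranetDocument("Printer setup guide", date(2020, 1, 15), "J. Smith")],
    "catalogue": [Product("Laser printer X1", "Printers", 199.0, "Compact office printer")],
    "helpdesk": [HelpdeskEntry(date(2020, 2, 3), "Printer will not connect", "Reinstall driver")],
}


def naive_search(term, sets):
    """Return (data set name, record) pairs whose text fields contain the term."""
    term = term.lower()
    hits = []
    for name, records in sets.items():
        for record in records:
            values = [getattr(record, f.name) for f in fields(record)]
            # Only string-valued fields are compared; dates and prices are skipped.
            if any(isinstance(v, str) and term in v.lower() for v in values):
                hits.append((name, record))
    return hits
```

Calling `naive_search("printer", data_sets)` returns one hit from each of the three data sets, even though their records share no common fields. A real search strategy would rank and combine results rather than do substring matching.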

 

Data set content / format

  • Content can be almost anything (documents, relational database, images, audio/video tapes, or a mix of these).
  • Content can have additional descriptions (metadata) or markup (annotations). Note that Spinque does not (at this time) provide annotation services to enrich your data.
  • Content often comes with associated metadata. To index such metadata quickly, it should be stored as RDF, XML, JSON, or plain text, or in an SQL database (most SQL dialects from the various database vendors are supported). Other formats can be handled upon request.

 

Data size

Spinque can provide real-time search for metadata sets of more than 100 GB on a single server. To put this in perspective, the total size of the whole English Wikipedia is about 30 GB.

Additional factors help determine the actual responsiveness of search solutions created on top of your data: the complexity of the search strategies defined and the hardware resources available.

 
