New Methods and Algorithms to Estimate the Selectivity of SQL LIKE Queries
8/1/2017
ŞEHİR faculty member Assist. Prof. Ali Çakmak's project, "New Methods and Algorithms to Estimate the Selectivity of SQL LIKE Queries", has been awarded a TÜBİTAK grant.
Assist. Prof. Ali Çakmak, a faculty member of the İstanbul Şehir University Computer Science and Engineering Department, has been awarded a TÜBİTAK 1001 - Scientific and Technological Research Projects Funding Program grant for his project entitled "New Methods and Algorithms to Estimate the Selectivity of SQL LIKE Queries".

Accurate estimation of a query's cost and running time is one of the major success indicators for database management systems. SQL allows users to express flexible queries on text data. The LIKE operator searches for a specified pattern in a string column (e.g., the predicate name LIKE 'es%' finds people whose names start with 'es'). For the query optimizer to choose an efficient execution plan, it is vital to estimate accurately the selectivity of such flexible predicates, that is, the fraction of rows that satisfy them. In this project, we will study the problem of estimating the selectivity of a LIKE query predicate over a bag of strings.
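To make the notion of selectivity concrete, the following minimal Python sketch computes the exact selectivity of a LIKE predicate over a small bag of strings by scanning every value. It is only an illustration of the quantity to be estimated, not the method proposed in the project. The helper names `like_to_regex` and `like_selectivity` and the sample data are hypothetical, and the sketch assumes case-sensitive matching with no ESCAPE clause, which may differ from the behavior of a particular database system.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regular expression.

    Assumption: '%' matches any run of characters, '_' matches exactly one
    character, and no ESCAPE clause is supported.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_selectivity(values, pattern: str) -> float:
    """Return the fraction of values that match the LIKE pattern.

    This is the exact selectivity, obtained by a full scan. An optimizer
    cannot afford such a scan and has to estimate this number from a
    compact summary of the column.
    """
    if not values:
        return 0.0
    regex = like_to_regex(pattern)
    matches = sum(1 for v in values if regex.fullmatch(v))
    return matches / len(values)


# Hypothetical sample column; a bag may contain duplicates.
names = ["esra", "esin", "ali", "enes", "esra"]

prefix_sel = like_selectivity(names, "es%")   # names starting with 'es'
suffix_sel = like_selectivity(names, "%es")   # names ending with 'es'
print(prefix_sel, suffix_sel)
```

A full scan like this costs time proportional to the size of the column, which is why optimizers rely on summaries and why the accuracy of those summaries matters.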

We propose a new type of pattern-based histogram structure to summarize the data distribution in a particular column. More specifically, we will first mine sequential patterns over a given string database, and then construct a special histogram out of the mined patterns. In addition, we will extend existing sequence mining techniques to compute more specific sequence patterns and thereby increase the selectivity estimation accuracy of the proposed framework. Orthogonal to the proposed techniques, we will also critically examine the metrics currently used in the literature to compare different selectivity estimation methods.