Keywords: DNA sequence compression
European Journal of Molecular & Clinical Medicine,
2020, Volume 7, Issue 6, Pages 203-213
Compressing and indexing DNA (genomic) sequences with standard algorithms involves high complexity as massive datasets grow rapidly. To address this problem, the Tunneled Run-Length Encoded (RLE) Burrows-Wheeler Transform (BWT)-based encoding with Improved Index (TBWT-II) algorithm was proposed; it uses a Text-Label index (TLBW-index) to count and locate labeled patterns. However, it did not effectively reduce the global space consumption of the TLBW-index. In addition, the classic Move-To-Front (MTF) encoding in TBWT-II has a specific local property that can be exploited at encoding time, but its decoding is sequential: each decoded character is a function of the decoded values of all preceding characters. Therefore, this article proposes an Enhanced TBWT-II (ETBWT-II) algorithm that effectively reduces the global space consumption of the TLBW-index. The main goal of this algorithm is to avoid the need for local search capabilities within the compressed database and to minimize space consumption during character retrieval. To achieve this, a locally-decodable MTF encoding replaces the standard MTF in TBWT-II, reducing the time needed to decode a single character while keeping space consumption to a minimum. Finally, experimental results on the SCOPe 1.67 dataset demonstrate the efficiency of the proposed ETBWT-II algorithm compared with existing compression algorithms.
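The abstract does not describe the internals of TBWT-II or of its locally-decodable MTF, so the following is only a minimal, hypothetical sketch of the building blocks it names: a naive BWT, run-length encoding of the BWT output, and classic MTF. It shows why classic MTF decoding is sequential (the table state at position i depends on every earlier symbol). To illustrate what "locally decodable" can mean, it adds one possible technique chosen for this sketch, not necessarily the authors' method: storing MTF table snapshots every k positions so that one character can be decoded by replaying at most k-1 steps. All function names and the parameter k are assumptions introduced here.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform via sorted rotations (illustration only, not scalable)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse runs of equal characters into (char, run_length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def mtf_encode(s: str, alphabet: list[str]) -> list[int]:
    """Classic move-to-front: emit each symbol's current index, then move it to the front."""
    table = list(alphabet)
    out: list[int] = []
    for ch in s:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def mtf_decode(codes: list[int], alphabet: list[str]) -> str:
    """Classic MTF decoding: each symbol depends on the table left by all prior symbols."""
    table = list(alphabet)
    out: list[str] = []
    for idx in codes:
        out.append(table[idx])
        table.insert(0, table.pop(idx))
    return "".join(out)


def build_checkpoints(codes: list[int], alphabet: list[str], k: int) -> list[list[str]]:
    """Hypothetical aid for local decoding: snapshot the MTF table before every k-th position."""
    table = list(alphabet)
    checkpoints: list[list[str]] = []
    for i, idx in enumerate(codes):
        if i % k == 0:
            checkpoints.append(list(table))
        table.insert(0, table.pop(idx))
    return checkpoints


def mtf_decode_at(codes: list[int], checkpoints: list[list[str]], k: int, i: int) -> str:
    """Decode only position i by replaying at most k-1 MTF steps from the nearest snapshot."""
    start = (i // k) * k
    table = list(checkpoints[i // k])
    for j in range(start, i):
        table.insert(0, table.pop(codes[j]))
    return table[codes[i]]


if __name__ == "__main__":
    alphabet = ["$", "A", "C", "G", "T"]
    transformed = bwt("ACGTACGTACGA")
    print(run_length_encode(transformed))
    codes = mtf_encode(transformed, alphabet)
    checkpoints = build_checkpoints(codes, alphabet, k=4)
    print(mtf_decode_at(codes, checkpoints, 4, 7) == transformed[7])
```

The snapshot interval k trades space for decoding time: smaller k stores more tables but replays fewer steps per character. A real index would store snapshots far more compactly than full Python lists.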