Opinion - International Research Journal of Biochemistry and Bioinformatics (2022) Volume 12, Issue 3
Received: 01-Jun-2022, Manuscript No. IRJBB-22-25; Editor assigned: 03-Jun-2022, Pre QC No. IRJBB-22-25(PQ); Reviewed: 17-Jun-2022, QC No. IRJBB-22-25; Revised: 22-Jun-2022, Manuscript No. IRJBB-22-25(R); Published: 29-Jun-2022, DOI: 10.14303/irjbb.2022.25
Next-generation sequencing (NGS) technology offers a great opportunity to revolutionise a wide range of medical and biological research, as well as derived application fields such as medical diagnosis, biotechnology, and virology (Alic et al., 2016). This is because the throughput of NGS keeps increasing while its cost keeps falling dramatically. However, the sequences are not flawless. They contain a variety of errors, including substitutions, insertions, deletions, and uncalled bases. For example, substitution error rates range from 1% to 2.5%, and insertion and deletion error rates can reach 40% (Kelley et al., 2010). These sequencing errors make data analysis very difficult, so the first and most important task is to correct them. Corrected sequencing reads are useful for many downstream applications, including sequence assembly, variant calling, and read mapping (Salmela et al., 2011). Numerous error-correction strategies have been proposed, among them Coral, BLESS (Heo et al., 2014), and MEC (Zhao et al., 2017). These strategies rely heavily on k-mers.
A k-mer is a substring of k consecutive nucleotides in a sequencing read. In a k-mer-based technique, mining solid k-mers is typically the first and most important phase. A k-mer is said to be solid when its frequency exceeds a certain minimum threshold, and all other k-mers are weak. This straightforward definition helps to distinguish solid from weak k-mers, but it still has clear limitations.
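The frequency-only rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that reads are plain strings over A/C/G/T, with `N` marking uncalled bases. It uses Python's `collections.Counter` rather than a dedicated k-mer counter. The function names `count_kmers` and `split_solid_weak` and the threshold value are hypothetical. The usage lines also show why errors matter for k-mer methods: a single substitution in a read creates up to k k-mers that do not occur in the error-free sequence, and these k-mers end up with low counts.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer (length-k substring) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers that contain an uncalled base
                continue
            counts[kmer] += 1
    return counts


def split_solid_weak(counts: Counter, threshold: int) -> Tuple[Set[str], Set[str]]:
    """Frequency-only rule: solid if the count exceeds threshold, otherwise weak."""
    solid = {kmer for kmer, c in counts.items() if c > threshold}
    weak = set(counts) - solid
    return solid, weak


# Five error-free copies of a read plus one copy with a substitution (A -> T at index 4).
reads = ["ACGTACGTGA"] * 5 + ["ACGTTCGTGA"]
counts = count_kmers(reads, k=4)
solid, weak = split_solid_weak(counts, threshold=2)  # hypothetical threshold value
print(sorted(weak))  # ['CGTT', 'GTTC', 'TCGT', 'TTCG']: the 4 k-mers overlapping the error
```

The rule depends only on the global count. For that reason, a genuine k-mer from a poorly covered region would be labelled weak in exactly the same way as these erroneous k-mers.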
The main flaw is that a k-mer with a low frequency may not actually be weak. Systematic biases, such as the difficulty of sequencing regions with high GC content, make the distribution of sequencing depth uneven, so genuine k-mers from poorly covered regions can have low frequencies. To address this, we concentrate on the significant yet understudied problem of refining solid k-mers using a statistical approach.
After counting k-mers with KMC2, our model divides all k-mers into solid and weak k-mers according to their frequency. The weak label is only provisional at this stage. Each k-mer's solidity is then determined jointly from its frequency, its z-score, and other factors.
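The article does not say how the z-score is computed or what the "other factors" are. The sketch below therefore uses a hypothetical definition of the z-score. Each k-mer's count is standardised against the counts of all k-mers in the same read, which serves as a rough proxy for local sequencing depth. A provisionally weak k-mer is promoted to solid when its lowest z-score across the reads that contain it is not below a cut-off. The functions `read_zscores` and `classify`, the parameters `freq_threshold` and `z_threshold`, and their values are all illustrative assumptions. A plain `Counter` stands in for KMC2.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads (reads assumed free of uncalled bases)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def read_zscores(read: str, k: int, counts: Counter) -> List[float]:
    """Hypothetical z-score: standardise each k-mer's count against the counts
    of all k-mers in the same read, as a rough proxy for local depth."""
    freqs = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    if not freqs:  # read shorter than k
        return []
    mean = statistics.fmean(freqs)
    sd = statistics.pstdev(freqs)
    if sd == 0:  # uniform depth along the read: no k-mer stands out
        return [0.0] * len(freqs)
    return [(f - mean) / sd for f in freqs]


def classify(reads: Sequence[str], k: int, freq_threshold: int, z_threshold: float) -> Set[str]:
    """Illustrative joint rule: frequency first, then rescue provisionally weak
    k-mers whose count is not unusually low in any read that contains them."""
    counts = count_kmers(reads, k)
    solid = {km for km, c in counts.items() if c > freq_threshold}
    min_z: Dict[str, float] = {}
    for read in reads:
        for i, z in enumerate(read_zscores(read, k, counts)):
            km = read[i:i + k]
            min_z[km] = min(z, min_z.get(km, z))
    for km in counts:
        if km not in solid and min_z[km] >= z_threshold:
            solid.add(km)  # low frequency, but consistent with local depth
    return solid


# High-depth region (one read carries a substitution) plus a low-depth, GC-rich region.
reads = ["ACGTACGTGA"] * 5 + ["ACGTTCGTGA"] + ["GGGCCCGCGC"] * 2
solid = classify(reads, k=4, freq_threshold=2, z_threshold=-0.5)
print(sorted(km for km in count_kmers(reads, 4) if km not in solid))
# ['CGTT', 'GTTC', 'TCGT', 'TTCG']
```

In this toy example, the GC-rich k-mers occur only twice, but they score a z-score of 0 within their uniformly covered read, so they are rescued. The four k-mers created by the substitution sit well below their neighbours in a deep read (z about -0.78) and stay weak. Taking the minimum z-score over all occurrences is a deliberately conservative choice.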
K-mers are an essential component of many sequencing analyses and are especially helpful for error correction, sequence assembly, variant calling, and similar tasks. They are the cornerstone of several NGS applications. However, k-mers are prone to errors, which poses significant difficulties for downstream data processing. We provide a statistical method for clearly separating solid k-mers from weak k-mers. For each k-mer, we calculate a z-score, and we then use the z-score together with the frequency to decide whether the k-mer is indeed solid. Studies reveal that our method successfully identifies solid k-mers with low frequency.
A k-mer with a low frequency, however, is not necessarily erroneous, because its low count may stem from sequencing bias rather than from a sequencing error. It is therefore not ideal to identify solid k-mers by frequency alone. Existing methods ignore this issue. To address it, we propose using the z-score to identify k-mers that have been incorrectly categorised as weak or solid. Research demonstrates that the z-score can be used to distinguish genuine solid k-mers from weak ones.
Copyright: 2022 International Research Journals. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.