Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 85,056 bioRxiv papers from 365,919 authors.

Most downloaded bioRxiv papers, all time

in category bioinformatics

7,973 results found.

7321: SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data
Posted to bioRxiv 03 Feb 2020
154 downloads bioinformatics

Collin Giguere, Harsh Vardhan Dubey, Vishal Kumar Sarsani, Hachem Saddiki, Shai He, Patrick Flaherty

Recently, it has become possible to collect next-generation DNA sequencing data sets composed of multiple samples from multiple biological units, where each sample may come from a single cell or from bulk tissue. However, no tool yet exists for simulating DNA sequencing data from such a nested sampling arrangement of single-cell and bulk samples, which developers of analysis methods need in order to assess accuracy and precision. We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples, where each sample is designated as bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing variant callers or other genomic tools. The DNA sequencing data generated by our simulator is representative of real data and integrates seamlessly with standard downstream analysis tools.

7322: Predicting candidate genes from phenotypes, functions, and anatomical site of expression
Posted to bioRxiv 31 Mar 2020
154 downloads bioinformatics

Jun Chen, Azza Althagafi, Robert Hoehndorf

Motivation: Over the past years, many computational methods have been developed to incorporate information about phenotypes into the task of disease gene prioritization. These methods generally compute the similarity between a patient's phenotypes and a database of gene-phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is incomplete in humans as well as in many model organisms such as mouse and fish. Information about the functions of gene products and the anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine learning models.
Results: We developed a novel graph-based machine learning method for biomedical ontologies that is able to exploit axioms in ontologies and other graph-structured data. Using this method, we embed genes based on their associated phenotypes, the functions of their gene products, and the anatomical location of their expression. We then developed a machine learning model to predict gene-disease associations based on the associations between genes and multiple biomedical ontologies; this model significantly improves on state-of-the-art methods. Furthermore, we significantly extend phenotype-based gene prioritization to all genes that are associated with phenotypes, functions, or site of expression.
Availability: Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec.
Contact: robert.hoehndorf@kaust.edu.sa

7323: Distribution of disease-causing germline mutations in coiled-coils suggests essential role of their N-terminal region
Posted to bioRxiv 07 Apr 2020
153 downloads bioinformatics

Zsofia E. Kalman, Bálint Mészáros, Zoltán Gáspári, Laszlo Dobson

Next-generation sequencing has led to the identification of a vast number of naturally occurring variations in human proteins. Correctly interpreting the functional effects of these variations requires understanding how they modulate protein structure. Coiled-coils are α-helical structures responsible for a diverse range of functions; most importantly, they facilitate the structural organization of macromolecular scaffolds via oligomerization. In this study, we analyzed a comprehensive set of disease-associated germline mutations in coiled-coil structures. Our results highlight the essential role of residues near the N-terminal part of coiled-coil regions, which in some cases are possibly critical for superhelix assembly and folding. We also show that coiled-coils of different oligomerization states exhibit characteristically distinct patterns of disease-causing mutations. Our study provides structural and functional explanations of how disease emerges through the mutation of these structural motifs.
Competing Interest Statement: The authors have declared no competing interest.

7324: DeepHE: Accurately Predicting Human Essential Genes based on Deep Learning
Posted to bioRxiv 15 Feb 2020
153 downloads bioinformatics

Xue Zhang, Wangxin Xiao, Weijia Xiao

Motivation: Accurately predicting essential genes with computational methods can greatly reduce the time and resources needed to identify them through wet-lab experiments, and can further accelerate drug discovery. Several computational methods have been proposed for predicting essential genes in model organisms by integrating multiple biological data sources, using either centrality measures or machine learning. However, methods for predicting human essential genes are still limited, and their performance still needs improvement. In addition, most machine-learning-based essential gene prediction methods lack a way to handle the class imbalance inherent in the essential gene prediction problem, which might be one factor limiting their performance.
Results: We propose a deep-learning-based method, DeepHE, that predicts human essential genes by integrating features derived from sequence data and a protein-protein interaction (PPI) network. A deep-learning-based network embedding method was used to learn features automatically from the PPI network. In addition, 89 sequence features were derived from the DNA and protein sequences of each gene. These two types of features were integrated to train a multilayer neural network, and a cost-sensitive technique was used to address the imbalanced learning problem when training the network. In experiments on human essential genes, DeepHE predicted gene essentiality with an average AUC above 94%, an area under the precision-recall curve (AP) above 90%, and an accuracy above 90%. We also compared DeepHE with several widely used traditional machine learning models (SVM, Naive Bayes, Random Forest, AdaBoost); DeepHE greatly outperformed all of them.
Conclusions: We demonstrated that human essential genes can be accurately predicted by designing effective machine learning algorithms and integrating representative features captured from available biological data. The proposed deep learning framework is effective for this task.
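The abstract does not specify which cost-sensitive technique DeepHE uses. One common choice is a class-weighted binary cross-entropy, in which errors on the rare (essential) class cost more. The sketch below assumes inverse-frequency class weights; the function names are hypothetical and this is not DeepHE's code.

```python
import math

def inverse_frequency_weights(labels):
    """Class weights n / (2 * n_class), so each class contributes equally in total."""
    n, n_pos = len(labels), sum(labels)
    n_neg = n - n_pos
    return n / (2 * n_pos), n / (2 * n_neg)

def weighted_bce(labels, probs, w_pos, w_neg, eps=1e-12):
    """Mean class-weighted binary cross-entropy over (label, predicted probability) pairs."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -w_pos * math.log(p) if y == 1 else -w_neg * math.log(1 - p)
    return total / len(labels)

labels = [1] + [0] * 9          # one essential gene among ten (imbalanced)
w_pos, w_neg = inverse_frequency_weights(labels)
print(w_pos, w_neg)             # 5.0 and about 0.556
```

With these weights, a model that predicts a low probability for every gene is penalized more than under an unweighted loss, because missing the single positive costs five times as much.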

7325: Loss-functions matter, on optimizing score functions for the estimation of protein models accuracy
Posted to bioRxiv 03 Jun 2019
153 downloads bioinformatics

Tomer Sidi, Chen Keasar

Motivation: Methods for protein structure prediction (PSP) generate multiple alternative structural models (also known as decoys). Thus, supervised learning methods for evaluating and ranking these models are crucial elements of PSP. Supervised learning involves optimizing loss functions, but their influence on performance is typically overlooked. Here we put loss functions in the spotlight and study their effect on prediction performance.
Results: We report the performance of three variants of MESHI-score, a supervised learning method for the estimation of model accuracy (EMA). Each variant was trained with a different loss function and performed better in different aspects of the EMA problem. Most importantly, target-centered loss functions yield better discrimination between models of the same target.
Contact: chen.keasar@gmail.com
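To illustrate what "target-centered" can mean, the sketch below compares a plain mean squared error with one computed after subtracting each target's mean from both predicted and observed quality scores, so a constant per-target offset is not penalized. The centered MSE is an illustrative assumption, not one of the loss functions used for MESHI-score, and the scores are made up.

```python
from collections import defaultdict

def global_mse(predicted, observed):
    """Plain mean squared error over all models of all targets."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)

def target_centered_mse(targets, predicted, observed):
    """MSE after centering predicted and observed scores within each target."""
    groups = defaultdict(list)
    for t, p, y in zip(targets, predicted, observed):
        groups[t].append((p, y))
    total, count = 0.0, 0
    for pairs in groups.values():
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        for p, y in pairs:
            total += ((p - mean_p) - (y - mean_y)) ** 2
            count += 1
    return total / count

# Target "A" predictions are offset by +0.3 but rank its models correctly.
targets   = ["A", "A", "A", "B", "B"]
observed  = [0.2, 0.5, 0.8, 0.4, 0.6]
predicted = [0.5, 0.8, 1.1, 0.4, 0.6]
print(global_mse(predicted, observed))                    # penalizes the offset
print(target_centered_mse(targets, predicted, observed))  # ~0: only within-target errors count
```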

7326: Two biological constants for accurate classification and evolution pattern analysis of Subgen.strobus and subgen. Pinus
Posted to bioRxiv 11 Apr 2018
153 downloads bioinformatics

Huabin zou

Currently, biological classification and the determination of different categories are based on empirical knowledge obtained from morphological and molecular characters. These methods lack absolute quantitative criteria grounded in intrinsic scientific principles. In fact, accurate scientific classification must depend on a correct description of the rules of biological evolution. In this article, a new theoretical approach is proposed in which two characteristic constants are derived from an information-theoretic equation of biological common heredity and variation at its maximum-information states, corresponding to symmetric and asymmetric variation states. These constants are common composition ratios with values of 0.61 and 0.70. By analyzing the common composition ratios of compounds among oleoresins, the two pine subgenera Subgen. Strobus (Sweet) Held and Subgen. Pinus could be cleanly integrated into one class, genus Pinus, when the ratio is 0.61, and clearly classified into two groups when the ratio is 0.70. The results differ somewhat from those of classical classification based on morphological characters. In addition, the evolutionary relationship of the two subgenera was analyzed based on characteristic sequences of the samples, indicating that white pine originated from Pinus tabuliformis. The two constants should be used as classification constants for some biological categories of plants.

7327: FastSK: Fast Sequence Analysis with Gapped String Kernels
Posted to bioRxiv 23 Apr 2020
153 downloads bioinformatics

Derrick Blakely, Eamon Collins, Ritambhara Singh, Yanjun Qi

Gapped k-mer kernels with Support Vector Machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences with modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation, whose cost depends exponentially on the sub-sequence feature length, the number of mismatch positions, and the alphabet size of the task. In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and performs well on a variety of sequence analysis tasks. On 10 DNA transcription factor binding site (TFBS) prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in AUC, while achieving an average 100-fold speedup in kernel computation and 800-fold speedups for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks across all 10 TFBS tasks. We then extend FastSK to 7 English medical named entity recognition datasets and 10 protein remote homology detection datasets, where FastSK consistently matches or outperforms the baselines. Our algorithm is available as a Python package and as C++ source code at https://github.com/Qdata/FastSK/ and can be installed with make or pip install.
Competing Interest Statement: The authors have declared no competing interest.
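The decomposition described above, summing independent counts over every choice of mismatch (masked) positions, can be shown with an exact but naive computation. This is a sketch under stated assumptions: g is the g-mer length, m the number of masked positions, and the kernel is the unnormalized sum, over all position subsets, of the dot product of masked g-mer counts. FastSK's Monte Carlo approximation and any normalization are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def masked_counts(seq, g, keep):
    """Count the g-mers of seq projected onto the kept (unmasked) positions."""
    counts = Counter()
    for i in range(len(seq) - g + 1):
        gmer = seq[i:i + g]
        counts[tuple(gmer[j] for j in keep)] += 1
    return counts

def gapped_kmer_kernel(x, y, g, m):
    """Naive gapped k-mer kernel: for each choice of m masked positions in a g-mer,
    add the dot product of the two sequences' masked g-mer count vectors."""
    total = 0
    for masked in combinations(range(g), m):
        keep = [j for j in range(g) if j not in masked]
        cx = masked_counts(x, g, keep)
        cy = masked_counts(y, g, keep)
        total += sum(n * cy[f] for f, n in cx.items())
    return total

print(gapped_kmer_kernel("ACGTAC", "ACGTTC", g=3, m=1))
```

Each position subset is an independent counting pass, which is what makes the decomposition parallelizable; enumerating all C(g, m) subsets is the part that becomes expensive as g grows, and it is this cost that the abstract's Monte Carlo approximation targets.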

7328: An adaptable analysis workflow for characterization of platelet spreading and morphology
Posted to bioRxiv 31 Jan 2020
153 downloads bioinformatics

Jeremy A. Pike, Victoria A Simms, Christopher W Smith, Neil V. Morgan, Abdullah O. Khan, Natalie S. Poulter, Iain Styles, Steven G. Thomas

The assessment of platelet spreading through light microscopy, and the subsequent quantification of parameters such as surface area and circularity, is a key assay for many platelet biologists. Here we present an analysis workflow which robustly segments individual platelets to facilitate the analysis of large numbers of cells while minimising user bias. Image segmentation is performed by interactive learning and touching platelets are separated with an efficient semi-automated protocol. We also use machine learning methods to robustly automate the classification of platelets into different subtypes. These adaptable and reproducible workflows are made freely available and are implemented using the open source software KNIME and ilastik.
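Circularity, one of the parameters mentioned above, is commonly defined as 4πA/P², which equals 1 for a perfect circle and decreases for elongated or irregular outlines. The sketch below assumes that definition and a segmented platelet outline given as a polygon of (x, y) vertices; the KNIME/ilastik workflow itself may compute shape descriptors differently.

```python
import math

def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as (x, y) vertices in order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to the first vertex."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]))

def circularity(points):
    """4*pi*area / perimeter**2: 1.0 for a circle, lower for elongated or jagged outlines."""
    return 4 * math.pi * polygon_area(points) / polygon_perimeter(points) ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(circularity(square))  # pi/4, about 0.785
```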

7329: Matrix factorization recovers consistent regulatory signals from disparate datasets
Posted to bioRxiv 27 Apr 2020
153 downloads bioinformatics

Anand Sastry, Alyssa Hu, David Heckmann, Saugat Poudel, Erol S. Kavvas, Bernhard O. Palsson

The availability of gene expression data has increased dramatically in recent years. This data deluge could enable detailed inference of the underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We also show that echoes of this structure remain in the proteome, which can accelerate biological discovery through multi-omics analysis. We subsequently combined five transcriptomics datasets into a large compendium of over 800 expression profiles and found that its underlying ICA-based structure was still comparable to that of the individual datasets. ICA thus enables deep analysis of disparate data, uncovering new insights that were not visible in the individual datasets.
Competing Interest Statement: The authors have declared no competing interest.

7330: Functional MRI Investigation on Paradigmatic and Syntagmatic Lexical Semantic Processing
Posted to bioRxiv 13 Jan 2020
153 downloads bioinformatics

Songsheng Ying, Sabine Ploux

Word embeddings related to the paradigmatic and syntagmatic axes are applied in an fMRI encoding experiment to explore the human brain's activity patterns during story listening. This study proposes constructing paradigmatic and syntagmatic semantic embeddings, respectively, by transforming WordNet-like knowledge bases and by subtracting paradigmatic information from a statistical word embedding. It evaluates the semantic embeddings using word-pair proximity ranking tasks and contrasts voxel encoding models trained with the two types of semantic features to reveal the brain's spatial pattern for semantic processing. Results indicate that in listening comprehension, paradigmatic and syntagmatic semantic operations both recruit the inferior (ITG) and middle temporal gyri (MTG), the angular gyrus, the superior parietal lobule (SPL), and the inferior frontal gyrus. A non-continuous line of voxels with a predominance of paradigmatic processing is found in the MTG. The ITG, the middle occipital gyrus, and the surrounding primary and associative visual areas are more engaged by syntagmatic processing. Comparing the brain maps of the two semantic axes does not suggest a neuroanatomical segregation of paradigmatic and syntagmatic processing. The complex yet regular contrast pattern running from the temporal pole along the MTG to the SPL requires further investigation.

7331: Charge-perturbation dynamics - a new avenue towards in silico protein folding
Posted to bioRxiv 03 Apr 2019
153 downloads bioinformatics

Purbaj Pant, Ravi José Tristão Ramos, Crina-Maria Ionescu, Jaroslav Koča

Molecular dynamics (MD) has contributed greatly to understanding and predicting how proteins fold. However, the time scales and complexity of folding are not accessible via classical MD. Furthermore, efficient folding pipelines involving enhanced MD techniques are not routinely accessible. We aimed to determine whether perturbing the electrostatic component of the MD force field can help expedite folding simulations. We developed charge-perturbation dynamics (CPD), an MD-based simulation approach that periodically perturbs the atomic charges to values non-native to the MD force field. CPD obtains suitable sampling via multiple iterations in which a classical MD segment (with native charges) is followed by a very short segment of perturbed MD (using the same force field and conditions, but with non-native charges); partially folded intermediates are subsequently refined via a longer segment of classical MD. Among the partially folded structures sampled from low-energy regions of the free-energy landscape, the lowest-energy conformer with a high root-mean-square deviation from the starting structure and a low radius of gyration is defined as the folded structure. In benchmark tests, we found that medium-length peptides such as an alanine-based pentadecapeptide, an amyloid-β peptide, and the tryptophan-cage mini-protein can fold from their extended linear structure in under 45 ns of CPD (total simulation time), versus over 100 ns of classical MD. CPD not only achieved folding close to the desired conformation but also sampled key intermediates along the folding pathway without prior knowledge of the folding mechanism or final folded structure. Our findings confirm that perturbing the electrostatic component of the classical MD force field can help expedite folding simulations without changing the MD algorithm or using expensive computing architectures.
CPD can be employed to probe the folding dynamics of known, putative, or planned peptides, as well as to improve sampling in more advanced simulations or to guide further experiments.

7332: Computational Analysis of Dynamical Fluctuations of Oncoprotein E7 (HPV) for the Hot Spot Residue Identification Using Elastic Network Model
Posted to bioRxiv 28 Aug 2018
153 downloads bioinformatics

R. M. Malik, F. Nazir, S. Fazal, A. Bhatti, M. Ullah, S. I. Malik, A. Kanwal, S. E. Aziz, S. Azam

After invading the human body, viral proteins alter host protein-protein interaction networks, creating new interactions and destroying or modifying other interactions or proteins. The topological features of these new or modified networks compromise the host system, increasing the production of viral particles. The molecular basis of this altered protein interactivity is short linear peptide motifs that are similar in viruses and humans. In the human body, these motifs are recognized by modular domains (protein subunits), resulting in the stabilization or moderation of these protein interactions. Protein molecules can be modeled with elastic network models that describe the fluctuations of residues when the proteins are biologically active. We focused our computational study on the binding and competing interactions of the E7 protein of HPV with the Rb protein. Our study was based on an analysis of the dynamic fluctuations of E7 in the host cell and a correlation analysis of specific residues in the LxCxE motif, the key region stabilizing the interaction between E7 and Rb. Hot spot residues of E7 were also identified, which could provide a platform for future drug prediction. Overall, our study validates the role of the linear binding motif LxCxE of HPV E7 in interacting with Rb as an important event in the propagation of HPV in human cells and the transformation of infection into cervical cancer.

7333: Iterative point set registration for aligning scRNA-seq data
Posted to bioRxiv 13 May 2020
153 downloads bioinformatics

Amir Alavi, Ziv Bar-Joseph

Several studies profile similar single-cell RNA-Seq (scRNA-Seq) data using different technologies and platforms. A number of alignment methods have been developed to enable the integration and comparison of scRNA-Seq data from such studies. While each performs well on some datasets, to date no method has been able both to perform the alignment in the original expression space and to generalize to new data. To enable such analysis, we developed Single Cell Iterative Point set Registration (SCIPR), which extends methods successfully applied to aligning image data to scRNA-Seq. We discuss the required changes, the resulting optimization function, and algorithms for learning a transformation function for aligning data. We tested SCIPR on several scRNA-Seq datasets and show that it successfully aligns data from several different cell types, improving on prior methods proposed for this task. In addition, we show that the parameters learned by SCIPR can be used to align data not used in training and to identify key cell type-specific genes.
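Point set registration methods from image alignment, such as classic iterative closest point (ICP), alternate between matching each source point to its nearest target point and updating a transformation that reduces the matched distances. The sketch below shows that loop in its simplest form: a translation-only transform, Euclidean nearest neighbors, and toy 2-D points standing in for cells in expression space. SCIPR's actual transformation function, matching scheme, and optimization are not reproduced here.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def icp_translation(source, target, n_iter=50, tol=1e-9):
    """ICP-style loop: match each shifted source point to its nearest target point,
    then move by the mean residual (the least-squares translation for those matches)."""
    dim = len(source[0])
    shift = [0.0] * dim
    for _ in range(n_iter):
        moved = [[p[d] + shift[d] for d in range(dim)] for p in source]
        matches = [min(target, key=lambda q: sq_dist(p, q)) for p in moved]
        step = [sum(q[d] - p[d] for p, q in zip(moved, matches)) / len(moved)
                for d in range(dim)]
        shift = [s + st for s, st in zip(shift, step)]
        if max(abs(st) for st in step) < tol:
            break
    return shift

target = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
source = [(x - 0.5, y + 0.4) for x, y in target]  # same "cells", shifted batch
print(icp_translation(source, target))             # close to [0.5, -0.4]
```

Because the learned transform is an explicit function, it can be applied to new points that were not used when fitting it, which is the kind of generalization the abstract emphasizes.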

7334: The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization
Posted to bioRxiv 03 Jan 2020
153 downloads bioinformatics

Svetlana Poznanović, Fidel Barrera-Cruz, Anna Kirkpatrick, Matthew Ielusic, Christine Heitsch

Prediction of RNA base pairings yields insight into molecular structure, and therefore function. The most common methods predict an optimal structure under the standard thermodynamic model. One component of this model is the equation which governs the cost of branching, where three or more helical "arms" radiate out from a multiloop (also known as a junction). The multiloop initiation equation has three parameters; changing those values can significantly alter the predicted structure. We give a complete analysis of the prediction accuracy, stability, and robustness for all possible parameter combinations for a diverse set of tRNA sequences, and also for 5S rRNA. We find that the accuracy can often be substantially improved on a per sequence basis. However, simultaneous improvement within families, and most especially between families, remains a challenge.
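In the standard nearest-neighbor thermodynamic model, the three-parameter multiloop initiation term is commonly written as a linear function a + b·u + c·h, where u is the number of unpaired nucleotides in the multiloop and h the number of helices bordering it (including the closing helix). The sketch below evaluates that term for one junction under two parameter sets; the values are hypothetical, chosen only to show how strongly the penalty can move, and are not the published Turner parameters.

```python
def multiloop_initiation(a, b, c, unpaired, helices):
    """Linear multiloop initiation penalty a + b*unpaired + c*helices (kcal/mol)."""
    return a + b * unpaired + c * helices

# Hypothetical parameter sets, for illustration only.
param_sets = [(3.0, 0.0, 0.5), (3.0, 0.2, -0.5)]

# A 4-way junction (4 helices including the closing one) with 6 unpaired nucleotides.
for a, b, c in param_sets:
    print((a, b, c), multiloop_initiation(a, b, c, unpaired=6, helices=4))
```

Since the optimal structure minimizes the total free energy, and this term is added once per multiloop, shifting (a, b, c) changes whether branched or less branched alternatives win, which is why prediction accuracy is sensitive to these three values.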

7335: Histopathological Landscape of Molecular Genetics and Clinical Determinants in MDS Patients
Posted to bioRxiv 03 May 2020
152 downloads bioinformatics

Oscar Brück, Susanna Lallukka-Brück, Helena Hohtari, Aleksandr Ianevski, Freja Ebeling, Panu Kovanen, Soili Kytölä, Tero Aittokallio, Pedro Marques Ramos, Kimmo Porkka, Satu Mustjoki

In myelodysplastic syndrome (MDS), bone marrow (BM) histopathology is visually assessed to identify dysplastic cellular morphology, cellularity, and blast excess. Yet, many morphological findings elude the human eye. Here, we extracted visual features of 236 MDS, 87 MDS/MPN, and 10 control BM biopsies with convolutional neural networks. Unsupervised analysis distinguished underlying correlations between tissue composition, leukocyte metrics, and clinical characteristics. We applied morphological features in elastic net-regularized regression models to predict genetic and cytogenetic aberrations, prognosis, and clinical variables. By parallelizing tile, pixel, and leukocyte-level image analysis, we deconvoluted each model to texture and cellular composition to dissect their pathobiological context. Model-based mutation predictions correlated with variant allele frequency and number of affected genes per pathway, demonstrating the models' ability to identify relevant visual patterns. In summary, this study highlights the potential of deep histopathology in hematology by unveiling the fundamental association of BM morphology with genetic and clinical determinants.
Competing Interest Statement: The authors have declared no competing interest.

7336: PSS: An enabling QTY server for designing water-soluble α-helical transmembrane proteins
Posted to bioRxiv 05 Aug 2019
152 downloads bioinformatics

Fei Tao, Hongzhi Tang, Shuguang Zhang, Ping Xu

Membrane proteins, especially α-helical ones such as G-protein coupled receptors (GPCRs), are considered extremely important owing to their significant biological roles. However, their poor solubility in water makes expression and purification difficult, which seriously impedes research progress in this field. Recently, the QTY method, a revolutionary code-based protein engineering approach, was developed to produce soluble transmembrane proteins. Here we describe a web server built for QTY design and related analyses (pss.sjtu.edu.cn). Typically, the Simple Design module is expected to take only 2-4 min of computer time and the Library Design module 2-5 h, depending on the target protein size and the number of transmembrane helices. Further, we describe a protocol for using the server with both the Simple and Library Design modules. Protocols for experiments based on QTY design are also included. In summary, the web server and associated protocols will enable QTY-based protein engineering to be implemented in a convenient, fast, accurate, and rational manner.
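The QTY code is a fixed residue substitution applied within transmembrane helices: leucine (L) to glutamine (Q), isoleucine (I) and valine (V) to threonine (T), and phenylalanine (F) to tyrosine (Y). A minimal sketch, assuming the helix boundaries are already known and given as 0-based, end-exclusive ranges; the sequence and range below are made up, and the PSS server's own helix handling and Library Design are not modeled.

```python
# L -> Q, I/V -> T, F -> Y; all other residues are left unchanged.
QTY_CODE = {"L": "Q", "I": "T", "V": "T", "F": "Y"}

def qty_design(sequence, helices):
    """Apply QTY substitutions only inside the given (start, end) helix ranges."""
    residues = list(sequence)
    for start, end in helices:
        for i in range(start, end):
            residues[i] = QTY_CODE.get(residues[i], residues[i])
    return "".join(residues)

# Hypothetical sequence with one transmembrane segment at positions 3 to 11.
print(qty_design("MKSLLIVFAGLWRK", [(3, 12)]))  # MKSQQTTYAGQWRK
```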

7337: Comparative structural dynamic analysis of GTPases
Posted to bioRxiv 16 Jul 2018
152 downloads bioinformatics

Hongyang Li, Xin-Qiu Yao, Barry J. Grant

GTPases regulate a multitude of essential cellular processes ranging from movement and division to differentiation and neuronal activity. These ubiquitous enzymes operate by hydrolyzing GTP to GDP with associated conformational changes that modulate affinity for family-specific binding partners. There are three major GTPase superfamilies: Ras-like GTPases, heterotrimeric G proteins and protein-synthesizing GTPases. Although they contain similar nucleotide-binding sites, the detailed mechanisms by which these structurally and functionally diverse superfamilies operate remain unclear. Here we compare and contrast the structural dynamic mechanisms of each superfamily using extensive molecular dynamics (MD) simulations and subsequent network analysis approaches. In particular, dissection of the cross-correlations of atomic displacements in both the GTP- and GDP-bound states of Ras, transducin and elongation factor EF-Tu reveals analogous dynamic features. These include similar dynamic communities and subdomain structures (termed lobes). For all three proteins, the GTP-bound state has stronger couplings between equivalent lobes. Network analysis further identifies common and family-specific residues mediating the state-specific coupling of distal functional sites. Mutational simulations demonstrate how disrupting these couplings leads to distal dynamic effects at the nucleotide-binding site of each family. Collectively, our studies extend current understanding of GTPase allosteric mechanisms and highlight previously unappreciated similarities across functionally diverse families.
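Cross-correlations of atomic displacements are commonly computed as C_ij = <Δr_i·Δr_j> / sqrt(<|Δr_i|²><|Δr_j|²>), where Δr is an atom's displacement from its mean position and <·> averages over frames. A minimal pure-Python sketch, assuming a trajectory already superposed onto a reference and stored as frames of (x, y, z) coordinates; the toy trajectory is made up, and the network analysis built on such a matrix is not shown.

```python
def cross_correlation(traj):
    """traj[frame][atom] = (x, y, z); returns the n_atoms x n_atoms correlation matrix."""
    n_frames, n_atoms = len(traj), len(traj[0])
    mean = [[sum(traj[f][a][k] for f in range(n_frames)) / n_frames for k in range(3)]
            for a in range(n_atoms)]
    # Displacement of each atom from its mean position, per frame.
    disp = [[[traj[f][a][k] - mean[a][k] for k in range(3)] for a in range(n_atoms)]
            for f in range(n_frames)]

    def avg_dot(i, j):
        """Frame-averaged dot product of the displacement vectors of atoms i and j."""
        return sum(sum(di * dj for di, dj in zip(disp[f][i], disp[f][j]))
                   for f in range(n_frames)) / n_frames

    var = [avg_dot(a, a) for a in range(n_atoms)]
    return [[avg_dot(i, j) / (var[i] * var[j]) ** 0.5 for j in range(n_atoms)]
            for i in range(n_atoms)]

# Toy 2-frame, 3-atom trajectory: atom 1 moves with atom 0, atom 2 moves against it.
traj = [[(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
        [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]]
for row in cross_correlation(traj):
    print(row)
```

Values near +1 indicate atoms moving together and values near -1 atoms moving in opposite directions; atoms that never move would give a zero variance, which this sketch does not guard against.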

7338: IRIS: an accurate and efficient barcode calling tool for in situ sequencing
Posted to bioRxiv 13 Apr 2020
152 downloads bioinformatics

Yang Zhou, Hao Yu, Qiye Li, Rongqin Ke, Guojie Zhang

Emerging in situ RNA sequencing technologies, which capture and amplify RNA within the original tissue, provide an efficient way to produce spatial expression maps of dozens to thousands of genes. Most recently developed in situ RNA-seq strategies infer expression patterns from the fluorescence signals in images taken during sequencing. However, an automated and convenient tool for decoding signals from image information is still lacking. Here we present IRIS, easy-to-use software that efficiently decodes image signals from in situ sequencing into nucleotide sequences. The software records the quality score and spatial information of the sequencing signals. We also developed an interactive R Shiny app named DAIBC for data visualization. IRIS is designed in modules so that it can easily be extended and made compatible with new technologies.
Competing Interest Statement: The authors have declared no competing interest.

7339: Genetic distance between complex repeats
Posted to bioRxiv 26 Oct 2018
152 downloads bioinformatics

Luca Ferretti, Aurora Ruiz-Herrera, Alice Ledda

Complex nucleotide or amino acid repeats with long units play an important role in proteins. The evolutionary analysis of these variants is challenging due to genetic diversity within repeat units as well as variability in the arrangement of different units along the repeat sequence. Here we present a new approach for the computation of genetic distances between complex repeats. This method takes into account evolutionary processes including point mutations, insertions and deletions of repeat units, as well as duplication of single units. We provide an algorithm for the computation of these distances along with the corresponding global pairwise alignment of repeats. As an example, we apply our approach to the evolution of repeat units in the highly polymorphic zinc-finger repeat domain of the PRDM9 protein across wild populations of house mice. This approach opens the way for new insights into the evolutionary history of polymorphic repeats.
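A simplified version of the idea, aligning two repeats unit by unit so that whole units can be inserted or deleted while unit substitutions are scored by their point differences, can be written as a Needleman-Wunsch-style dynamic program over units. This sketch is a hypothetical simplification: it assumes equal-length units, Hamming distance between units, and a fixed indel cost per unit, and it omits the single-unit duplication events and evolutionary weighting of the authors' method.

```python
def hamming(u, v):
    """Number of point differences between two equal-length repeat units."""
    return sum(a != b for a, b in zip(u, v))

def repeat_distance(x, y, indel_cost=3):
    """Minimum-cost global alignment of two repeats given as lists of units."""
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + hamming(x[i - 1], y[j - 1]),  # unit substitution
                          d[i - 1][j] + indel_cost,                        # unit deleted
                          d[i][j - 1] + indel_cost)                        # unit inserted
    return d[n][m]

# Hypothetical repeats built from 4-letter units.
a = ["ACGT", "ACGA", "TTGA"]
b = ["ACGT", "TTGA"]
print(repeat_distance(a, b))  # 3: the middle unit is deleted
```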

7340: In Silico identification of potential drug targets by subtractive genome analysis of Enterococcus faecium DO
Posted to bioRxiv 15 Feb 2020
152 downloads bioinformatics

Marwah Karim, MD Nazrul Islam, G. M. Nurnabi Azad Jewel

Once believed to be a commensal bacterium, Enterococcus faecium has recently emerged as an important nosocomial pathogen worldwide. A recent outbreak of E. faecium revealed natural and in vitro resistance to a myriad of antibiotics, namely ampicillin, gentamicin and vancomycin, due to over-exposure of the pathogen to these antibiotics. This, combined with the ongoing threat, demands the identification of new therapeutic targets to combat E. faecium infections. In the present study, comparative proteome analysis, a subtractive genomic approach, metabolic pathway analysis and additional drug prioritizing parameters were used to propose potential novel drug targets for E. faecium strain DO. Comparative genomic analysis of Kyoto Encyclopedia of Genes and Genomes annotated metabolic pathways identified a total of 207 putative target proteins in E. faecium DO that showed no similarity to human proteins. Among them, 105 proteins were identified as essential novel proteins that could serve as potential drug targets through further bioinformatic approaches, such as prediction of subcellular localization, calculation of molecular weight, and web-based 3D structural characterization. Eventually, 19 non-homologous essential proteins of E. faecium DO were prioritized and found eligible to become novel broad-spectrum antibiotic targets. Among these targets, aldehyde-alcohol dehydrogenase was found to be involved in the largest number of pathways and was therefore chosen as a novel drug target. Interestingly, the aldehyde-alcohol dehydrogenase enzyme contains two domains, acetaldehyde dehydrogenase and alcohol dehydrogenase, on which 3D homology modeling and in silico molecular docking were performed. Finally, eight molecules were confirmed as the most suitable ligands for aldehyde-alcohol dehydrogenase and hence proposed as potential inhibitors of this target.
In conclusion, because it has no human homolog, aldehyde-alcohol dehydrogenase can be targeted for therapeutic drug development in the future. However, laboratory-based experimental research should be performed to validate these findings in vivo.
