DMINDA 2.0: An integrated DNA motif analysis web server

Overview
What's DMINDA2
What's new
Cite us

DMINDA² is an updated version of our previous motif analysis web server, DMINDA (regulatory DNA motif identification and analysis), which was published in Nucleic Acids Research in April 2014 (PMID: 24753419). Since publication, the DMINDA server has been accessed over 10,000 times, and the corresponding paper has been cited more than 10 times.

The Ma lab has recently published a de novo motif prediction algorithm (PMID: 27507169) based on phylogenetic footprinting (Function I), and a regulon identification method (PMID: 26975728) (Function II) built on our motif analysis tools and an orthologous gene mapping algorithm (PMID: 21965536) (Function III). Both methods can be readily applied to the 2,072 prokaryotic genomes supported by the DOOR² operon database (PMID: 24214966) (Function V). These two new methods, along with a network analysis and visualization function (Function IV), were implemented and connected with all the existing functions of DMINDA (Functions VI-VII), giving rise to this second-generation web server, DMINDA². The complete list of computational motif analysis functions on DMINDA² is:

MP³: De novo regulatory motif prediction based on an integrative phylogenetic footprinting framework for prokaryotic genomes;

Regulon Prediction: Regulon modeling and prediction based on a new computational framework and a novel graph model;

GOST: Orthologous gene mapping through combining sequence similarity and contextual (working partners) information, using a combinatorial optimization framework;

Cytoscape Visualization: Visualizing and analyzing the regulon network based on Cytoscape.js;

DOOR²: Genome-scale operons for 2,072 prokaryotes with complete genomes;

BoBro: De novo motif finding for a given set of promoter sequences, along with a statistical motif evaluation framework; optionally, a control set or a set of reference genomes can be supplied to improve prediction performance (PMID: 21149261);

BoBro 2.0: Scanning for instances of a query motif in provided genomic sequences; comparison and clustering of identified motifs; and co-occurrence analysis of query motifs in given promoter sequences, in support of co-factor identification and the study of co-regulation mechanisms (PMID: 23846744).
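
The motif-scanning function in the list above can be illustrated with a generic position weight matrix (PWM) scan. This is only a minimal sketch and not BoBro 2.0's actual method (its paper describes a novel P-value estimation that is not reproduced here). The example binding sites, the pseudocount, the uniform background frequency, and the score threshold are all assumptions chosen for illustration.

```python
import math

def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    Assumes a uniform background (0.25 per base) and a simple pseudocount.
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 3)))
    return hits

# Hypothetical aligned sites and query sequence, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pwm = build_log_odds(sites)
hits = scan("GGCTATAATGCC", pwm, threshold=4.0)
print(hits)
```

A fixed score threshold is the simplest possible cutoff; a production tool would normally replace it with a significance estimate, as BoBro 2.0 is described as doing.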

DMINDA² is freely available to all users and there is no login requirement. The server runs on a Red Hat Enterprise Linux server with 6 CPUs, and is particularly useful for DNA motif analyses in prokaryotic genomes. We believe that DMINDA², as a new and comprehensive web server for cis-regulatory motif finding and analysis, will benefit the genomic research community in general and prokaryotic genome researchers in particular, by helping to elucidate the mechanisms of transcriptional regulation at the systems level.

Sep/2/2018

Release - DESSO (DEep Sequence and Shape mOtif) web server

Jun/21/2018

Release - BoBro2.1 with performance improvements

May/8/2018

Update - Added an overview of the number of submitted jobs to the submission page

Apr/23/2018

Update - Fixed the file upload issue on Chrome and Firefox

Jan/25/2018

Update - Improved the mail notification interface

Apr/10/2017

The paper "DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses" was officially accepted by Bioinformatics.

Mar/14/2017

The paper "DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses" was conditionally accepted by Bioinformatics.

Feb/22/2017

The paper "DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses" was submitted to Bioinformatics.

Nov/01/2016

DMINDA² was ready for external testing by our collaborators.

Sep/10/2016

DMINDA² was ready for initial testing by BMBL members.

Aug/09/2016

The MP³ paper was published in BMC Genomics. (PMID: 27507169)

Jun/01/2016

DMINDA² was invited for a technology track presentation at ISMB 2016.

Mar/15/2016

The Regulon Prediction paper was published in Scientific Reports. (PMID: 26975728)

Apr/21/2014

The DMINDA server paper was published in Nucleic Acids Research. (PMID: 24753419)

Please cite the relevant papers from the following list if you use results from the corresponding motif-finding programs:

Jinyu Yang, Xin Chen, Adam McDermaid, and Qin Ma$, DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses, Bioinformatics, DOI: 10.1093/bioinformatics/btx223, 2017. (PMID:28419194)

Motivation: Motif identification and analyses are important and have been long-standing computational problems in bioinformatics. Substantial efforts have been made in this field during the past several decades. However, the lack of intuitive and integrative web servers impedes the progress of making effective use of emerging algorithms and tools.

Results: Here we present an integrated web server, DMINDA 2.0, which contains: (i) five motif prediction and analyses algorithms, including a phylogenetic footprinting framework; (ii) 2,125 species with complete genomes to support the above five functions, covering animals, plants, and bacteria; and (iii) bacterial regulon prediction and visualization.

Bingqiang Liu, Jinyu Yang, Yang Li, Adam McDermaid, and Qin Ma$, An algorithmic perspective of de-novo cis-regulatory motif finding based on ChIP-seq data, Briefings in Bioinformatics, DOI: 10.1093/bib/bbx026, 2017. (PMID: 28334268)

Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technology can generate large-scale experimental data for such protein-DNA interactions, providing an unprecedented opportunity to identify TFBSs (a.k.a. cis-regulatory motifs). The bottleneck, however, is the lack of robust mathematical models, as well as efficient computational methods for TFBS prediction to make effective use of massive ChIP-seq data sets in the public domain. The purpose of this study is to review existing motif-finding methods for ChIP-seq data from an algorithmic perspective and provide new computational insight into this field. The state-of-the-art methods were shown through summarizing eight representative motif-finding algorithms along with corresponding challenges, and introducing some important relative functions according to specific biological demands, including discriminative motif finding and cofactor motifs analysis. Finally, potential directions and plans for ChIP-seq-based motif-finding tools were showcased in support of future algorithm development.

Bingqiang Liu, Hanyuan Zhang, Chuan Zhou, Guojun Li, Anne Fennell, Guanghui Wang, Yu Kang, Qi Liu and Qin Ma$, An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes, BMC Genomics, DOI: 10.1186/s12864-016-2982-x, 2016. (PMID: 27507169)

BACKGROUND: Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction.

RESULTS: Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP³). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli K12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP³ consistently outperformed other popular motif finding tools. We have integrated MP³ into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes.

CONCLUSION: The performance evaluation indicated that MP³ is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance progress in elucidating transcription regulation mechanism, thus provide benefit to the genomic research community and prokaryotic genome researchers in particular.

Bingqiang Liu, Chuan Zhou, Guojun Li, Hanyuan Zhang, Erliang Zeng, Qi Liu, Qin Ma$, Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses, Scientific Reports, DOI: 10.1038/srep23030, 2016. (PMID: 26975728)

Regulons are the basic units of the response system in a bacterial cell, and each consists of a set of transcriptionally co-regulated operons. Regulon elucidation is the basis for studying the bacterial global transcriptional regulation network. In this study, we designed a novel co-regulation score between a pair of operons based on accurate operon identification and cis regulatory motif analyses, which can capture their co-regulation relationship much better than other scores. Taking full advantage of this discovery, we developed a new computational framework and built a novel graph model for regulon prediction. This model integrates the motif comparison and clustering and makes the regulon prediction problem substantially more solvable and accurate. To evaluate our prediction, a regulon coverage score was designed based on the documented regulons and their overlap with our prediction; and a modified Fisher Exact test was implemented to measure how well our predictions match the co-expressed modules derived from E. coli microarray gene-expression datasets collected under 466 conditions. The results indicate that our program consistently performed better than others in terms of the prediction accuracy. This suggests that our algorithms substantially improve the state-of-the-art, leading to a computational capability to reliably predict regulons for any bacteria.
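
To make the evaluation idea above concrete, here is a rough sketch of how the overlap between a predicted regulon and a co-expressed module could be tested. It implements the standard one-sided Fisher exact test (a hypergeometric tail), not the modified test used in the paper, whose details are not given here. The function name and the gene counts in the usage line are hypothetical.

```python
from math import comb

def overlap_p_value(total_genes, regulon_size, module_size, overlap):
    """One-sided Fisher exact (hypergeometric tail) P-value.

    Probability of observing at least `overlap` shared genes when a module of
    `module_size` genes is drawn at random from `total_genes` genes, of which
    `regulon_size` belong to the predicted regulon.
    """
    max_overlap = min(regulon_size, module_size)
    denominator = comb(total_genes, module_size)
    tail = sum(
        comb(regulon_size, k) * comb(total_genes - regulon_size, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator

# Hypothetical numbers: a 20-gene regulon and a 30-gene module sharing
# 8 genes in a genome of 4,000 genes.
p = overlap_p_value(total_genes=4000, regulon_size=20, module_size=30, overlap=8)
print(f"P-value: {p:.3e}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for the very small probabilities that arise at genome scale.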

Qin Ma*, Hanyuan Zhang*, Xizeng Mao, Chuan Zhou, Bingqiang Liu, Xin Chen, Ying Xu$, DMINDA: An integrated web server for DNA motif identification and analyses, Nucleic Acids Research, DOI: 10.1093/nar/gku315, 2014. (PMID:24753419)

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular.

Xizeng Mao*, Qin Ma*, Chuan Zhou*, Xin Chen*, Hanyuan Zhang, Jincai Yang, Fenglou Mao, Wei Lai, Ying Xu$, DOOR 2.0: presenting operons and their functions through dynamic and integrated views, Nucleic Acids Research, DOI: 10.1093/nar/gkt1048, 2014. (PMID: 24214966)

We have recently developed a new version of the DOOR operon database, DOOR 2.0, which is available online at http://csbl.bmb.uga.edu/DOOR/ and will be updated on a regular basis. DOOR 2.0 contains genome-scale operons for 2072 prokaryotes with complete genomes, three times the number of genomes covered in the previous version published in 2009. DOOR 2.0 has a number of new features, compared with its previous version, including (i) more than 250,000 transcription units, experimentally validated or computationally predicted based on RNA-seq data, providing a dynamic functional view of the underlying operons; (ii) an integrated operon-centric data resource that provides not only operons for each covered genome but also their functional and regulatory information such as their cis-regulatory binding sites for transcription initiation and termination, gene expression levels estimated based on RNA-seq data and conservation information across multiple genomes; (iii) a high-performance web service for online operon prediction on user-provided genomic sequences; (iv) an intuitive genome browser to support visualization of user-selected data; and (v) a keyword-based Google-like search engine for finding the needed information intuitively and rapidly in this database.

Qin Ma*, Bingqiang Liu*, Chuan Zhou, Yanbin Yin, Guojun Li, Ying Xu$, BoBro2.0: An integrated toolkit for accurate prediction and analysis of cis regulatory elements at a genome scale, Bioinformatics, DOI: 10.1093/bioinformatics/btt397, 2013. (PMID: 23846744)

MOTIVATION: We present an integrated toolkit, BoBro2.0, for prediction and analysis of cis-regulatory motifs. This toolkit can (i) reliably identify statistically significant cis-regulatory motifs at a genome scale; (ii) accurately scan for all motif instances of a query motif in specified genomic regions using a novel method for P-value estimation; (iii) provide highly reliable comparisons and clustering of identified motifs, which takes into consideration the weak signals from the flanking regions of the motifs; and (iv) analyze co-occurring motifs in the regulatory regions.

RESULTS: We have carried out systematic comparisons between motif predictions using BoBro2.0 and the MEME package. The comparison results on Escherichia coli K12 genome and the human genome show that BoBro2.0 can identify the statistically significant motifs at a genome scale more efficiently, identify motif instances more accurately and get more reliable motif clusters than MEME. In addition, BoBro2.0 provides correlational analyses among the identified motifs to facilitate the inference of joint regulation relationships of transcription factors.

Guojun Li*, Bingqiang Liu*, Qin Ma, Ying Xu$, A new framework for identifying cis-regulatory motifs in prokaryotes, Nucleic Acids Research. 2011 Apr;39(7):e42 (PMID:21149261)

We present a new algorithm, BOBRO, for prediction of cis-regulatory motifs in a given set of promoter sequences. The algorithm substantially improves the prediction accuracy and extends the scope of applicability of the existing programs based on two key new ideas: (i) we developed a highly effective method for reliably assessing the possibility for each position in a given promoter to be the (approximate) start of a conserved sequence motif; and (ii) we developed a highly reliable way for recognition of actual motifs from the accidental ones based on the concept of 'motif closure'. These two key ideas are embedded in a classical framework for motif finding through finding cliques in a graph but have made this framework substantially more sensitive as well as more selective in motif finding in a very noisy background. A comparative analysis shows that the performance coefficient was improved from 29% to 41% by our program compared to the best among other six state-of-the-art prediction tools on a large-scale data sets of promoters from one genome, and also consistently improved by substantial margins on another kind of large-scale data sets of orthologous promoters across multiple genomes. The power of BOBRO in dealing with noisy data was further demonstrated through identification of the motifs of the global transcriptional regulators by running it over 2390 promoter sequences of Escherichia coli K12.
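
The performance coefficient mentioned above is commonly defined at the nucleotide level as TP / (TP + FN + FP), which equals the overlap of known and predicted site positions divided by their union. The sketch below assumes that standard definition; whether the paper computes it in exactly this way is an assumption, and the coordinates are made up for illustration.

```python
def performance_coefficient(known_sites, predicted_sites):
    """Nucleotide-level performance coefficient: TP / (TP + FN + FP).

    Sites are (start, end) pairs with `end` exclusive, all on one sequence.
    """
    known = {pos for start, end in known_sites for pos in range(start, end)}
    predicted = {pos for start, end in predicted_sites for pos in range(start, end)}
    union = known | predicted          # TP + FN + FP
    if not union:
        return 0.0
    return len(known & predicted) / len(union)   # TP / (TP + FN + FP)

# Hypothetical example: a known 10-bp site and a prediction shifted by 5 bp.
pc = performance_coefficient([(10, 20)], [(15, 25)])
print(round(pc, 3))
```

Because false positives and false negatives both enlarge the denominator, the coefficient penalizes over-prediction and under-prediction alike.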

Guojun Li*, Qin Ma*, Xizeng Mao, Yanbin Yin, Xiaoran Zhu, Ying Xu$, Integration of sequence-similarity and functional association information can overcome intrinsic problems in orthology mapping across bacterial genomes, Nucleic Acids Research, DOI: 10.1093/nar/gkr766, 2011. (PMID: 21965536)

Existing methods for orthologous gene mapping suffer from two general problems: (i) they are computationally too slow and their results are difficult to interpret for automated large-scale applications when based on phylogenetic analyses; or (ii) they are too prone to making mistakes in dealing with complex situations involving horizontal gene transfers and gene fusion due to the lack of a sound basis when based on sequence similarity information. We present a novel algorithm, Global Optimization Strategy (GOST), for orthologous gene mapping through combining sequence similarity and contextual (working partners) information, using a combinatorial optimization framework. Genome-scale applications of GOST show substantial improvements over the predictions by three popular sequence similarity-based orthology mapping programs. Our analysis indicates that our algorithm overcomes the intrinsic issues faced by sequence similarity-based methods, when orthology mapping involves gene fusions and horizontal gene transfers. Our program runs as efficiently as the most efficient sequence similarity-based algorithm in the public domain. GOST is freely downloadable at http://csbl.bmb.uga.edu/~maqin/GOST.

Guojun Li*, Bingqiang Liu*, Ying Xu$, Accurate Recognition of cis Regulatory Motifs with the Correct Lengths in Prokaryotic Genomes, Nucleic Acids Research, Vol. 38, No. 2, e12, 2010. (PMID: 19906734)

We present a new computational method for solving a classical problem, the identification problem of cis-regulatory motifs in a given set of promoter sequences, based on one key new idea. Instead of scoring candidate motifs individually like in all the existing motif-finding programs, our method scores groups of candidate motifs with similar sequences, called motif closures, using a P-value, which has substantially improved the prediction reliability over the existing methods. Our new P-value scoring scheme is sequence length independent, hence allowing direct comparisons among predicted motifs with different lengths on the same footing. We have implemented this method as a Motif Recognition Computer (MREC) program, and have extensively tested MREC on both simulated and biological data from prokaryotic genomes. Our test results indicate that MREC can accurately pick out the actual motif with the correct length as the best scoring candidate for the vast majority of the cases in our test set. We compared our prediction results with two motif-finding programs Cosmo and MEME, and found that MREC outperforms both programs across all the test cases by a large margin. The MREC program is available at http://csbl.bmb.uga.edu/~bingqiang/MREC1/.

(* Equal contribution)
($ Corresponding author)
