
The protocol of BoBro2.0: BBR, BBS, BBC and BBA

Functions          BoBro2.0   Unique feature
Motif Refining     BBR        Strong ability to filter out noise at a genome scale
Motif Scanning     BBS        p-value assessment for all scanned candidate motifs
Motif Comparison   BBC        Use of weakly conserved signals in motifs' flanking regions when comparing motifs; a motif clustering algorithm
Motif Annotation   BBA        Annotation of motif co-occurrence

Installation

Put "BoBro2.0.tar.gz" in any directory and unpack it:

  • $ tar zxvf BoBro2.0.tar.gz
Then enter the folder "BoBro2.0/" and type
  • $ ./INSTALL

BBR usage

BBR provides the following functions:

  1. De novo motif finding in a fasta format promoter file.
  2. De novo motif finding with background sequences (if available) in fasta format.
  3. Motif finding with a comparative genomic framework.

BBR: CMD line

  • Run the following command to get a brief guide:
$ perl BBR.pl
  • To do de novo motif finding in a fasta format promoter file:
$ perl BBR.pl 1 promoters
  • To do de novo motif finding with background sequences (if available) in fasta format:
$ perl BBR.pl 2 promoters background_file
  • To do motif finding with a comparative genomic framework:
$ perl script/bbr_cmp_method.pl

BBR: Inputs and outputs

When used for de novo motif finding

  • The promoter file and the background file should be in standard fasta format (see the promoter and background files in this folder for examples).
  • The output file is named promoters.closures (see promoters.closures for an example). It contains:
  1. an input data summary;
  2. a command line summary;
  3. detailed information for each motif candidate found:
    • motif seed: the seed sequence used to find this motif (the 'core' of the motif);
    • the motif position weight matrix and consensus;
    • a table showing all aligned instances of the motif.

When used for motif finding with a comparative genomic framework

Three kinds of data are needed for the target genome and the reference genomes:

  1. the genome data (which can be downloaded from NCBI GenBank);
  2. the operon data (which can be downloaded or predicted from the DOOR database);
  3. the orthology relationships between the target and the references (which can be predicted using the RBH method or GOST);
NOTE: Please see the contents of the folder example for details.

The example takes E. coli as the target genome and two other species as references.

  • target_list: a list of GIs from Ecoli;
  • Ecoli.opr: operon structure of Ecoli;
  • Escherichia_coli_K_12_substr__MG1655_uid57779: Ecoli data from NCBI;
  • ncbi_data: the directory containing the reference species data downloaded from NCBI;

All the output files are stored in the folder example_output, which contains:

  1. result.txt: same format as the output of a de novo prediction;
  2. motif.alignment: all alignments of the predicted motifs;
  3. motif.alignment.similarity: similarity scores between each pair of motifs (same as the output of BBC);
  4. Logos for each motif are also given.

Data format

  • operon: operon structure from DOOR;

1: 16077069
2: 16077070
3: 16077071 16077072 255767014
4: 16077074
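The operon listing above can be read with a few lines of Python. This is a minimal sketch, assuming each line is an operon number, a colon, and one or more space-separated gene GIs; the function name parse_operons is hypothetical and not part of BoBro2.0.

```python
def parse_operons(text):
    """Map each operon number to its list of gene GIs (kept as strings)."""
    operons = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "<operon number>: <gi> <gi> ..."
        operon_id, _, genes = line.partition(":")
        operons[int(operon_id)] = genes.split()
    return operons


sample = """1: 16077069
2: 16077070
3: 16077071 16077072 255767014
4: 16077074"""

operons = parse_operons(sample)
print(operons[3])
```

Keeping the GIs as strings avoids any accidental reformatting when they are later matched against the orthology file.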

  • ortholog: orthology information between Ecoli and the reference (standard output of GOST):

145698239, 187933779 5e-90, 6e-90
145698257, 187933775 2e-05, 3e-05
145698262, 187935634 1e-27, 6e-27
145698268, 187932476 2e-25, 9e-23
145698269, 187933610 5e-05, 7e-05
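The orthology lines above can be split into fields as follows. This is a sketch only: it assumes each line holds a target GI, a reference GI and two numeric values separated by commas and spaces, and it does not interpret what the two numbers mean. The name parse_orthologs is hypothetical.

```python
import re


def parse_orthologs(text):
    """Return a list of (target_gi, reference_gi, value1, value2) tuples."""
    pairs = []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        target_gi, ref_gi, v1, v2 = fields
        pairs.append((target_gi, ref_gi, float(v1), float(v2)))
    return pairs


sample = """145698239, 187933779 5e-90, 6e-90
145698257, 187933775 2e-05, 3e-05
145698262, 187935634 1e-27, 6e-27"""

pairs = parse_orthologs(sample)
for target_gi, ref_gi, v1, v2 in pairs:
    print(target_gi, ref_gi, v1, v2)
```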

BBS Usage

This software provides a BoBro Based Searching/Scanning (BBS) tool capable of searching motifs in a set of sequences using known motif patterns.

BBS: Inputs and outputs

The major program in the provided package is BBS. It can search for motifs in a fasta file using known motifs given as an alignment, a matrix or a consensus. Example files are provided (example_alignment, example_matrix and example_consensus).

For basic usage of motif scanning

Use the bundled Perl script BBS.pl:

  • basic usage:
$ perl BBS.pl
  • Search motif in alignment format:
$ perl BBS.pl motif_alignment promoters 1
  • Search motif in matrix format:
$ perl BBS.pl motif_matrix promoters 2
  • Search motif in consensus format:
$ perl BBS.pl motif_consensus promoters 3
  • Search motif considering background genome:
$ perl BBS.pl motif_consensus promoters 1/2/3 background

For advanced usage of motif scanning

Use the program BBS

  • basic usage:
$ ./BBS -h (or simply ./BBS)

  • To search motifs based on an alignment:
$ ./BBS -i example -j example_alignment
$ ./BBS -i example -j example_alignment -D (output seed only)

Note: the shortest sequence in example must not be shorter than the shortest motif in example_alignment.

  • To search motifs based on a frequency matrix:
$ ./BBS -i example -m example_matrix
    BBS writes an output file with the extension '.motifinfo'. It contains one closure for each alignment (or matrix) in example_alignment (or example_matrix), listed in increasing order of closure p-value.
  • To calculate the z-score compared to the background:
$ ./BBS -i example -j example_matrix -z example -u .95
  • To convert consensus format to a matrix:
$ ./BBS -p example_consensus -i example > example_consensus_matrix
  • To compare the similarity between any pair of input motifs:
$ ./BBS -i example -j example_alignment -C
$ ./BBS -i example -j motif.txt -C (uninformative column example)
  • To convert an alignment to a matrix and a consensus:
$ ./BBS -i example -j example_alignment -a
  • The output can be further controlled mainly through four parameters:

-e [1,3] the larger the e value, the more TFBSs are found
-t (0.3,0.9) the smaller the t value, the more TFBSs are found
-n (0,0.3] when e>1, the larger the n value, the stricter the search
-E add -E to get more TFBSs
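When BBS is called from a script, the documented ranges can be checked before the command is launched. The helper below is a hypothetical sketch, not part of the package: it assumes each option takes its value as a separate argument (as -u .95 does in the example above), and it only builds the argument list without running it.

```python
def build_bbs_command(seq_file, motif_file, e=None, t=None, n=None, extra=False):
    """Build an argument list for ./BBS after checking the documented ranges."""
    cmd = ["./BBS", "-i", seq_file, "-j", motif_file]
    if e is not None:
        if not 1 <= e <= 3:           # documented range [1,3]
            raise ValueError("-e must be in [1,3]")
        cmd += ["-e", str(e)]
    if t is not None:
        if not 0.3 < t < 0.9:         # documented range (0.3,0.9)
            raise ValueError("-t must be in (0.3,0.9)")
        cmd += ["-t", str(t)]
    if n is not None:
        if not 0 < n <= 0.3:          # documented range (0,0.3]
            raise ValueError("-n must be in (0,0.3]")
        cmd += ["-n", str(n)]
    if extra:
        cmd.append("-E")
    return cmd


cmd = build_bbs_command("example", "example_alignment", e=2, t=0.5, extra=True)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run from the directory that contains the BBS binary.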

BBS: Input Formats

Matrix

A 5 6 4 5 4 1 0 4 4 3 5 1 2 4 3 3 0 0 3 4 3 3 3 3 3
C 5 0 2 2 1 2 1 5 1 0 0 1 5 0 0 0 0 7 0 0 0 4 4 1 0
G 0 1 0 4 6 5 2 1 5 8 6 8 0 5 8 8 2 2 7 7 8 4 4 4 8
T 1 4 5 0 0 3 8 1 1 0 0 1 4 2 0 0 9 2 1 0 0 0 0 3 0
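To inspect a matrix file outside BBS, the four rows can be loaded and turned into per-column frequencies. This is an illustrative sketch that assumes the layout shown above (one row per base, a base letter followed by integer counts); the function names are hypothetical.

```python
def parse_count_matrix(text):
    """Read rows such as 'A 5 6 4 ...' into {base: [counts]}."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("A", "C", "G", "T"):
            counts[fields[0]] = [int(x) for x in fields[1:]]
    if len({len(row) for row in counts.values()}) != 1:
        raise ValueError("rows have different lengths")
    return counts


def column_frequencies(counts):
    """Convert counts to relative frequencies, column by column."""
    width = len(counts["A"])
    freqs = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT")
        if total == 0:
            raise ValueError("empty column %d" % i)
        freqs.append({b: counts[b][i] / total for b in "ACGT"})
    return freqs


matrix_text = """A 5 6 4 5 4 1 0 4 4 3 5 1 2 4 3 3 0 0 3 4 3 3 3 3 3
C 5 0 2 2 1 2 1 5 1 0 0 1 5 0 0 0 0 7 0 0 0 4 4 1 0
G 0 1 0 4 6 5 2 1 5 8 6 8 0 5 8 8 2 2 7 7 8 4 4 4 8
T 1 4 5 0 0 3 8 1 1 0 0 1 4 2 0 0 9 2 1 0 0 0 0 3 0"""

counts = parse_count_matrix(matrix_text)
freqs = column_frequencies(counts)
print(len(freqs), freqs[16])
```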

Alignment

ATCAACTGAAACAAAACGAAAGATT
GAAAACCATTATCTTTCGTTTTATT
GACTTTCATTATGTTTCTTTTGTGA
ACCAAGTGAAATGAAACGAAAGGCA
AACTTTCAGTTTCTTTTCTATAGAT
AAATTTCGTTTTATTTCTTTTTTCT
GCAATCCCTTTTGCTTCCTTTATCT
GCCTTTCTTTTTCTTTCGTTTTGAT
CAGGGTCAATTAGCTTCGTTTTGAT
GCAAAACGAAATGAAACGAAAGTTT
AAGGTGGGCTTGCATTTGCTTAATA
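An alignment of equal-length sites can be summarized as a count matrix by tallying each base per column. The sketch below shows that generic tally only; it is not the conversion that ./BBS -a performs internally, and the function name is hypothetical.

```python
def alignment_to_counts(sites):
    """Count A/C/G/T per column of equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = {b: [0] * width for b in "ACGT"}
    for site in sites:
        for i, base in enumerate(site.upper()):
            if base in counts:  # ignore characters other than A/C/G/T
                counts[base][i] += 1
    return counts


sites = [
    "ATCAACTGAAACAAAACGAAAGATT",
    "GAAAACCATTATCTTTCGTTTTATT",
    "GACTTTCATTATGTTTCTTTTGTGA",
    "ACCAAGTGAAATGAAACGAAAGGCA",
    "AACTTTCAGTTTCTTTTCTATAGAT",
    "AAATTTCGTTTTATTTCTTTTTTCT",
    "GCAATCCCTTTTGCTTCCTTTATCT",
    "GCCTTTCTTTTTCTTTCGTTTTGAT",
    "CAGGGTCAATTAGCTTCGTTTTGAT",
    "GCAAAACGAAATGAAACGAAAGTTT",
    "AAGGTGGGCTTGCATTTGCTTAATA",
]

aln_counts = alignment_to_counts(sites)
for base in "ACGT":
    print(base, " ".join(str(c) for c in aln_counts[base]))
```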

Consensus

AGGRKTTBCCGA
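The consensus uses IUPAC nucleotide codes, so letters such as R, K and B each stand for a set of bases. The sketch below expands a consensus into per-position base sets and tests whether a site is compatible with it. It illustrates the notation only and is not how BBS scores sites; the function names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_consensus(consensus):
    """Return the set of allowed bases at each position."""
    return [set(IUPAC[c]) for c in consensus.upper()]


def matches(consensus, site):
    """True if every base of the site is allowed by the consensus."""
    allowed = expand_consensus(consensus)
    return len(site) == len(allowed) and all(
        base in bases for base, bases in zip(site.upper(), allowed)
    )


allowed = expand_consensus("AGGRKTTBCCGA")
print(sorted(allowed[3]), sorted(allowed[4]), sorted(allowed[7]))
print(matches("AGGRKTTBCCGA", "AGGAGTTCCCGA"))
```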

BBC Usage

BBC provides the following functions:

  1. Compare different motif profiles.
  2. Cluster motif profiles.
  3. Annotate motifs in aligned sequences.

BBC: Inputs and outputs

The input file can contain only sequences in standard fasta format.

BBC will first run BoBro to predict motifs, then cluster them and annotate them back onto the aligned sequences.

  • $ perl BBC.pl SequenceFile
The three result files will be in the folder 'SequenceFile.BBC':
  1. The motif prediction result is in SequenceFile.BBC/SequenceFile.closures;
  2. The motif comparison result is in SequenceFile.BBC/SequenceFile.similarity;
  3. The motif clustering and annotation result is in SequenceFile.BBC/SequenceFile.BBC.

Users can also provide SequenceFile together with a known motif file in one of four formats:

  • BoBro standard output: 0;
  • alignment: 1;
  • matrix: 2;
  • consensus: 3.

Examples of the last three formats are provided in the current package (motif_alignment, motif_consensus, motif_matrix).

  • $ perl BBC.pl SequenceFile motif_alignment [0/1/2/3]

The three result files will be in the folder 'SequenceFile.BBC':

  1. The motif prediction result is in SequenceFile.BBC/SequenceFile.motif_alignment.closures;
  2. The motif comparison result is in SequenceFile.BBC/SequenceFile.motif_alignment.similarity;
  3. The motif clustering and annotation result is in SequenceFile.BBC/SequenceFile.motif_alignment.BBC.

BBC: Format of outputs

Result file about motif comparison

Here is an example

similarity  Motif-1      Motif-2      Motif-3      Motif-4
Motif-1     0.00 (0-0)   0.18 (1-1)   0.15 (4-1)   0.24 (2-1)
Motif-2     0.18 (1-1)   0.00 (0-0)   0.20 (2-1)   0.11 (3-1)
Motif-3     0.15 (1-4)   0.20 (1-2)   0.00 (0-0)   0.27 (2-1)
Motif-4     0.24 (1-2)   0.11 (1-3)   0.27 (1-2)   0.00 (0-0)

This is the comparison result for 4 motifs. The decimal values in the 4x4 matrix (excluding the main diagonal) are the similarity scores of the corresponding motif pairs.
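A similarity file of this shape can be loaded into a dictionary of pairwise scores for further filtering. This is a sketch under assumptions: each data row starts with the motif name followed by cells of the form "score (a-b)", the row order matches the column order, and the pair of integers in parentheses is kept as-is without interpretation. The function name is hypothetical.

```python
import re

CELL = re.compile(r"(\d+\.\d+)\s*\((\d+)-(\d+)\)")


def parse_similarity(text):
    """Return {(row_motif, column_motif): score} for all off-diagonal cells."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and CELL.search(parts[1]):
            rows.append((parts[0], [float(m[0]) for m in CELL.findall(parts[1])]))
    names = [name for name, _ in rows]  # assumed: columns follow row order
    scores = {}
    for row_name, values in rows:
        for col_name, value in zip(names, values):
            if row_name != col_name:
                scores[(row_name, col_name)] = value
    return scores


sample = """similarityMotif-1 Motif-2 Motif-3 Motif-4
Motif-1 0.00 (0-0)0.18 (1-1)0.15 (4-1)0.24 (2-1)
Motif-2 0.18 (1-1)0.00 (0-0)0.20 (2-1)0.11 (3-1)
Motif-3 0.15 (1-4)0.20 (1-2)0.00 (0-0)0.27 (2-1)
Motif-4 0.24 (1-2)0.11 (1-3)0.27 (1-2)0.00 (0-0)"""

scores = parse_similarity(sample)
print(scores[("Motif-3", "Motif-4")])
```

The regular expression tolerates cells that are separated by spaces as well as cells that run together.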

Result file about motif cluster and annotation

Here is an example of the header of the output file:

BOBRO-Based motif Comparison and Annotation (BBC) result:
Input sequences: data.fasta;
Predicted motifs: data.fasta.closures;

Motifs with hierarchical clustering:

Rank Name Length M(Motif rank)-(1st level cluster)-(2nd level cluster)
1 Motif-1 14 M1_1_1
2 Motif-2 14 M2_2_2
3 Motif-3 14 M3_3_3
4 Motif-4 14 M4_2_4

Above is the clustering result for 4 motifs; the annotation on the aligned sequences follows it in the file. A label such as 'M1_1_1' encodes the cluster information of a motif. The first number is the original rank of the motif (motifs are ranked in decreasing order of z-score). Motifs that share the same second number belong to the same cluster with fair similarity, and motifs that share the same third number have high similarity.
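The cluster labels can be split into their three numbers to list which motifs fall into the same first-level cluster. The sketch below assumes rows of the form "rank name length label" as in the example; the function name is hypothetical.

```python
import re
from collections import defaultdict

LABEL = re.compile(r"^M(\d+)_(\d+)_(\d+)$")


def group_by_first_level(rows):
    """Group motif names by the second number of their M<rank>_<c1>_<c2> label."""
    clusters = defaultdict(list)
    for row in rows:
        fields = row.split()
        if len(fields) != 4:
            continue
        match = LABEL.match(fields[3])
        if match:
            clusters[int(match.group(2))].append(fields[1])
    return dict(clusters)


rows = [
    "1 Motif-1 14 M1_1_1",
    "2 Motif-2 14 M2_2_2",
    "3 Motif-3 14 M3_3_3",
    "4 Motif-4 14 M4_2_4",
]

clusters = group_by_first_level(rows)
print(clusters)
```

In the example, Motif-2 and Motif-4 share the second number 2, so they end up in the same first-level cluster.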

BBA Usage

The script 'BBA.pl' is designed for motif co-occurrence analysis. Note: R must be installed on the computer you use.

BBA: CMD line

Run the following command to get a brief guide:

  • $ perl BBA.pl
To run a co-occurrence analysis on the file 'Input':
  • $ perl BBA.pl Input SequenceNum

BBA: Inputs

The input file lists each TF name followed by the labels of the sequences that contain binding sites of this TF (see 'Motif_position_for_BBA' for an example). The format of the input file is:

>AcrR
2251
1650

This is the motif information for the TF AcrR. The TF name must be prefixed with '>'. The number 2251 means that there is a motif occurrence for AcrR in the 2251st promoter sequence.
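The input layout can be read into a mapping from TF name to the set of promoter indices, which is a convenient form for checking a file before running BBA. This is a minimal sketch assuming one '>' header per TF followed by one integer per line; the function name is hypothetical.

```python
def parse_bba_input(text):
    """Map each TF name to the set of promoter sequence indices listed under it."""
    occurrences = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            occurrences.setdefault(current, set())
        elif current is not None:
            occurrences[current].add(int(line))
    return occurrences


sample = """>AcrR
2251
1650"""

occurrences = parse_bba_input(sample)
print(occurrences)
```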

BBA: Outputs

There are two output files: 'Input.BBA' and 'Input.BBA.all'.

The significantly co-occurring TF pairs are collected in "Input.BBA"; the correlation scores for all TF pairs are stored in "Input.BBA.all".

The data in the result files have 7 columns with the following meanings:

  1. TF1
  2. TF2
  3. Hypergeometric p-value
  4. Total number of sequences
  5. The number of sequences that contain binding sites of TF1
  6. The number of sequences that contain binding sites of TF2
  7. The number of sequences that contain binding sites of both TF1 and TF2
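Columns 4 to 7 are the four counts that a hypergeometric test needs. The sketch below computes the upper-tail probability of seeing at least the observed overlap. Taking the upper tail is an assumption made here for illustration; the exact statistic computed by BBA.pl through R is not specified in this document, and the function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(total, n_tf1, n_tf2, n_both):
    """P(overlap >= n_both) when n_tf2 sequences are drawn from `total`,
    of which n_tf1 contain TF1 sites (hypergeometric upper tail)."""
    upper = min(n_tf1, n_tf2)
    denom = comb(total, n_tf2)
    tail = sum(
        comb(n_tf1, k) * comb(total - n_tf1, n_tf2 - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 4 sequences, 2 with TF1 sites, 2 with TF2 sites, 2 with both.
p = cooccurrence_pvalue(4, 2, 2, 2)
print(p)
```

math.comb works on exact integers, so the sum stays exact until the final division.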

Contact

Questions, problems and bug reports are welcome and should be sent to

  • Qin Ma <maqin2001@uga.edu>
  • Bingqiang Liu <bingqiangsdu@gmail.com>
  • Chuan Zhou <zhouchuan121@gmail.com>

Creation: June 27, 2012