Advanced sections for ChiLin

This section is for developers who are interested in our development and want to participate.

  • Developers: to be sure they are developing the right project, one that fulfills the requirements provided in this document.
  • Users: to get familiar with the idea of the project and suggest other features that would make it even more functional.

Dive into options

Configuration template generator

The template uses Jinja's variable assignment syntax. The template system is extensible and flexible, so you can customize a species-dependent analysis configuration: just modify the {% set %} part of ChiLin.conf under chilin/conf.

Step control

If you don't want to run the pipeline from beginning to end, for example because you only want the peak calling results, use the -d option to stop at whichever step you choose.

Debug mode

Debug mode is enabled with the --debug option. By checking whether the output files already exist in the output directory, the program decides by itself whether to run or skip each step.
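The skip-or-run decision described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `run_step`, its arguments and the example file name are hypothetical and are not taken from the ChiLin source.

```python
import os
import tempfile


def run_step(name, outputs, action):
    """Run `action` unless every expected output file already exists.

    Hypothetical helper: ChiLin's real implementation may differ.
    Returns True if the step ran and False if it was skipped.
    """
    if outputs and all(os.path.exists(path) for path in outputs):
        print("skip %s: outputs already present" % name)
        return False
    print("run %s" % name)
    action()
    return True


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Hypothetical output file name, used only for this demonstration.
    peaks = os.path.join(workdir, "example_peaks.bed")

    def call_peaks():
        # Stand-in for the real peak calling command.
        with open(peaks, "w") as handle:
            handle.write("chr1\t100\t200\n")

    run_step("peak calling", [peaks], call_peaks)  # runs: file is missing
    run_step("peak calling", [peaks], call_peaks)  # skipped: file now exists
```

A step with an empty `outputs` list always runs in this sketch, since there is nothing to check.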

Only quality control

The -k option runs quality control only. If you have already run the dc pipeline, or you just want to see the quality of the preprocessed data, you can run QC alone with -k.

Multithread

For Bowtie and FastQC, the -p option turns on their internal multithreading, which makes processing much faster.

Hidden features

MACS model or not in the conf files

If further analysis is required, you may want to adjust the default options we set here.

Dive into conf

for ceas

For DNase-seq, the default number of peaks is all of them, because of the unique data pattern. For histone and TF data, the default is 3000.

  • bipromoter sizes
  • promoter sizes
  • rel_dist

for conservation

  • w: width of the conservation plot region

for macs2

  • --shift-size --nomodel: optional

motif

1. The MDscan width is the speed-limiting factor in de novo motif discovery.

Conf as follows::

    [seqpos]
    ...
    mdscan_width = 200
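A conf fragment like the one above can be read with Python's standard `configparser` module. The following is a minimal sketch, assuming the conf is plain INI text by the time it is parsed; the function name `read_mdscan_width` and the sample text are hypothetical, and only the section and key names come from the fragment.

```python
import configparser

# Hypothetical conf text: only the [seqpos] section name and the
# mdscan_width key come from the fragment above; the real file
# contains more options.
CONF_TEXT = """
[seqpos]
mdscan_width = 200
"""


def read_mdscan_width(conf_text, default=None):
    """Return mdscan_width from the [seqpos] section as an integer.

    Falls back to `default` when the section or the option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint("seqpos", "mdscan_width", fallback=default)


if __name__ == "__main__":
    print(read_mdscan_width(CONF_TEXT))
```

Using `getint` with a `fallback` keeps the caller free of exception handling when the option is left out of the conf.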

Configuration instructions

[basis]

Lists all the metadata of the current workflow. It consists of the following options:

id

The name of the dataset, which will be the value of %{DatasetID}s. Limit: a string that (1) consists of digits, letters or '_' and (2) is shorter than 20 characters.
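The limit on the id can be checked before the pipeline starts. The sketch below is an illustration only: the function name `is_valid_id` is hypothetical, "letters" is assumed to mean ASCII letters, and an empty string is assumed to be invalid.

```python
import re

# Digits, ASCII letters or '_', and shorter than 20 characters (1 to 19).
# Assumption: an empty id is not allowed.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,19}")


def is_valid_id(value):
    """Return True if `value` satisfies the id limit described above."""
    return ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["GSM12345_rep1", "bad-id", "x" * 20]:
        print(candidate, is_valid_id(candidate))
```

`fullmatch` is used rather than `match` so that trailing characters outside the allowed set are rejected.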

species

The name of the species, written to the QCreport and log. Limit: a string that (1) consists of digits, letters or '_' and (2) is shorter than 20 characters.

factor

The name of the factor, written to the DC summary, QCreport and log. Limit: a string that (1) comes from the GO standard terms.

treat

The paths of the treatment files. Limit: absolute paths of files in supported formats.

control

The paths of the control files. Limit: absolute or relative paths of files in supported formats.

Dive into rule

We keep the naming rules for output and temporary files separate from the analysis code so that they are easier to maintain. If you are not comfortable with our naming scheme, it is simple to adjust.

Dive into background

Built-in Data

The ChiLin package includes all the built-in data for hg19 and mm9. For other species, you may need to download these data from the data source or build them yourself.

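============================ ======== ===================== ========
Data Name                    Used by  Data Source           Format
============================ ======== ===================== ========
Chromosome length            samtools `UCSC table browser`_ 2-column
Chromosome length            CEAS
Genome background annotation CEAS     `CEAS site`_          sqlite3
DHS region                   bedtools Custom                BED
Velcro region                bedtools Custom                BED
Motif database               MDSeqPos `MDSeqPos site`_      xml
FastQC result database       QCreport Custom                bed
Data summary database        QCreport Custom                bed
============================ ======== ===================== ========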
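For a species other than hg19 or mm9, the chromosome length data has to be prepared by hand, so it can help to check the file before use. The sketch below assumes that "2-column" means one chromosome name and one integer length per line, separated by whitespace; the function name `parse_chrom_sizes` is hypothetical.

```python
def parse_chrom_sizes(lines):
    """Parse 2-column lines (chromosome name, length) into a dict.

    Assumption: columns are separated by whitespace. Blank lines and
    lines starting with '#' are ignored.
    """
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, length = line.split()[:2]
        sizes[name] = int(length)
    return sizes


if __name__ == "__main__":
    # Example rows for illustration.
    example = ["chr1\t249250621", "chrM\t16571"]
    print(parse_chrom_sizes(example))
```

A malformed length raises `ValueError` from `int`, which is usually the desired behaviour for a quick sanity check.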

External Data

Some data are too large to be included in the pipeline package, so you need to download them from the data source.

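====================== ================= ================ ======
Data Name              Used by           Data Source      Format
====================== ================= ================ ======
Bowtie pre-built index Bowtie            `Bowtie site`_   ebwt
Conservation profile   Conservation Plot `Cistrome site`_ Bigwig
====================== ================= ================ ======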

Built-in Tools

Built-in tools are scripts that can be run independently from the command line once you have installed the Cpipe package.

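==================== ==================
Tool Name            Modified from
==================== ==================
liftover
Venn Diagram
Conservation Plot
Correlation plot     bigwig_correlation
bamtofastq
BedClip
wigTobigwiggle
RegPotential
sample_contamination
==================== ==================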

External Tools

External tools are the tools that ChiLin invokes by their path.

================ ================= ===============
Tool Name        Download source   Version
================ ================= ===============
FastQC
R
Cython
MACS2            `MACS site`_      2.0.10 20120605
CEAS             `CEAS site`_      0.9.9.7
bedtools         `bedtools site`_  v2.16.2
pybedtools
samtools         `SAMtools site`_  0.1.17
Bowtie           `Bowtie site`_    0.12.8
bedGraphToBigWig `UCSC utilities`_ v4
FastQC           `FastQC site`_    v0.10.1
pdfTeX           `pdfTex site`_    v1.40.10
IGV
================ ================= ===============