This section is for those who are interested in our development and want to participate. Developers: to be sure they are developing the right project, one that fulfills the requirements provided in this document. Users: to get familiar with the idea of the project and to suggest other features that would make it even more functional.
The configuration template uses Jinja syntax for assigning variables. The template system is extensible and flexible, so you can customize a species-dependent analysis configuration: just modify the {% set %} part of ChiLin.conf under chilin/conf.
If you don’t want to run the pipeline from beginning to end, for example because you only want the peak calling results, use the -d option to stop at whichever step you choose.
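The idea of stopping at a chosen step can be sketched as follows. This is a minimal illustration, not ChiLin's implementation: the step names and the `steps_to_run` helper are hypothetical, and the values that -d actually accepts are not shown here.

```python
# Hypothetical step names; the real pipeline defines its own steps and order.
PIPELINE_STEPS = ["read_qc", "mapping", "peak_calling", "annotation", "motif_finding"]


def steps_to_run(end_at=None, steps=PIPELINE_STEPS):
    """Return the steps to execute, stopping after `end_at` (inclusive).

    With no `end_at`, the whole pipeline is returned.
    """
    if end_at is None:
        return list(steps)
    if end_at not in steps:
        raise ValueError("unknown step: %r" % (end_at,))
    return list(steps[: steps.index(end_at) + 1])
```

Calling `steps_to_run("peak_calling")` returns only the steps up to and including peak calling, so later steps are never started.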
We also provide a debug mode, enabled with --debug. By checking whether a step's output files already exist in the output directory, the program decides by itself whether to run that step or skip it.
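A minimal sketch of this skip-if-output-exists logic, assuming each step writes a single known output file. The `run_step` helper and its arguments are hypothetical and only illustrate the check described above.

```python
import os


def run_step(output_path, action, debug=False):
    """Run one step, or skip it in debug mode when its output already exists.

    `action` is any callable that takes `output_path` and writes that file.
    Returns "skipped" or "ran" so the caller can log what happened.
    """
    if debug and os.path.exists(output_path):
        return "skipped"
    action(output_path)
    return "ran"
```

Outside debug mode the step always runs, so a normal run never silently reuses stale output.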
Use the -k option to run quality control only. If you have already run the dc pipeline, or you just want to see the quality of the preprocessed data, you can run QC alone with -k.
For Bowtie and FastQC, the -p option enables an internal multiprocessing pool for much faster processing.
For DNase-seq, the default number of peaks is all peaks, because of the unique data pattern. For histone and TF data, the default is 3000.
1. The MDscan width setting is the rate-limiting factor in de novo motif discovery.
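The defaults above can be written as a small lookup. This is only a sketch: the data type keys, the function names, and the assumption that peaks arrive already ranked are all hypothetical.

```python
# Defaults as stated in the text; the key spellings are an assumption.
DEFAULT_PEAK_COUNT = {"dnase": "all", "histone": 3000, "tf": 3000}


def default_peak_count(data_type):
    """Return "all" for DNase-seq, or 3000 for histone and TF data."""
    key = data_type.strip().lower()
    if key not in DEFAULT_PEAK_COUNT:
        raise ValueError("unknown data type: %r" % (data_type,))
    return DEFAULT_PEAK_COUNT[key]


def select_peaks(ranked_peaks, data_type):
    """Keep the default number of peaks from an already ranked sequence."""
    limit = default_peak_count(data_type)
    peaks = list(ranked_peaks)
    return peaks if limit == "all" else peaks[:limit]
```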
Lists all the metadata of the current workflow. It consists of the following options:
The name of the dataset, which will be the value of %{DatasetID}s. Limit: a string that (1) consists of numbers, letters or '_' and (2) is shorter than 20 characters.
The name of the species, written to the DC summary, the QC report and the log. Limit: a string (1) taken from the GO standard terms.
The paths of treatment files. Limit: absolute path of files in supported formats.
The paths of treatment files. Limit: absolute or relative path of files in supported formats.
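The limits on the dataset name can be checked with a short regular expression. The sketch below assumes ASCII letters and a non-empty name, neither of which the text states explicitly, and `is_valid_dataset_id` is a hypothetical helper name.

```python
import re

# Letters, digits or "_", between 1 and 19 characters (shorter than 20).
_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,19}")


def is_valid_dataset_id(value):
    """Return True when `value` satisfies the dataset name limits."""
    return bool(_DATASET_ID.fullmatch(value))
```

For example, `is_valid_dataset_id("GSM123_rep1")` passes, while a name containing `-` or one that is 20 characters long does not.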
We keep the naming rules for output and temporary files separate from the analysis code so that they are easier to maintain. If you are not comfortable with our naming scheme, it is simple to adjust it.
The ChiLin package includes all the built-in data for hg19 and mm9. For other species, you may need to download these data from the data source or customize them yourself.
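One way to keep naming rules apart from the analysis code is a table of name templates that is filled in with the dataset name. This is a sketch under assumptions: the rule keys and file name patterns are hypothetical, and Python's `%(DatasetID)s` named formatting is used here as a stand-in for the placeholder syntax of the real configuration.

```python
# Hypothetical naming rules; edit the templates here without touching analysis code.
NAME_RULES = {
    "treat_bam": "%(DatasetID)s_treat.bam",
    "peaks_bed": "%(DatasetID)s_peaks.bed",
    "qc_report": "%(DatasetID)s_QCreport.pdf",
}


def resolve_names(dataset_id, rules=NAME_RULES):
    """Fill every template with the dataset name and return the file names."""
    values = {"DatasetID": dataset_id}
    return {key: template % values for key, template in rules.items()}
```

Because the analysis code only looks names up by key, changing a template changes the file name everywhere it is used.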
Data Name | Used by | Data Source | Format |
---|---|---|---|
Chromosome length | samtools | `UCSC table browser`_ | 2-column |
Chromosome length | CEAS | – | – |
Genome background annotation | CEAS | `CEAS site`_ | sqlite3 |
DHS region | bedtools | Custom | BED |
Velcro region | bedtools | Custom | BED |
Motif database | MDSeqPos | `MDSeqPos site`_ | xml |
FastQC result database | QCreport | Custom | bed |
Data summary database | QCreport | Custom | bed |
Some data are too large to be included in the pipeline package, so you need to download them from the data source.
Data Name | Used by | Data Source | Format |
---|---|---|---|
Bowtie pre-built index | Bowtie | `Bowtie site`_ | ebwt |
Conservation profile | Conservation Plot | `Cistrome site`_ | Bigwig |
Built-in tools are scripts that can be run independently from the command line once you have installed the ChiLin package.
Tool Name | Modified from |
---|---|
liftover | |
Venn Diagram | |
Conservation Plot | |
Correlation plot | bigwig_correlation |
bamtofastq | |
BedClip | |
wigTobigwiggle | |
RegPotential | |
sample_contamination | |
External tools are the tools that ChiLin invokes by their path.
Tool Name | Download source | Version |
---|---|---|
R | ||
Cython | ||
MACS2 | `MACS site`_ | 2.0.10 20120605 |
CEAS | `CEAS site`_ | 0.9.9.7 |
bedtools | `bedtools site`_ | v2.16.2 |
pybedtools | ||
samtools | `SAMtools site`_ | 0.1.17 |
Bowtie | `Bowtie site`_ | 0.12.8 |
bedGraphToBigWig | `UCSC utilities`_ | v4 |
FastQC | `FastQC site`_ | v0.10.1 |
pdfTeX | `pdfTex site`_ | v1.40.10 |
IGV | ||
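Since external tools are invoked by path, it can help to check the configured paths before a run. The sketch below is not part of ChiLin: `find_missing_tools` and the shape of the `tool_paths` mapping are hypothetical.

```python
import os


def find_missing_tools(tool_paths):
    """Return the sorted tool names whose configured path is not an executable file.

    `tool_paths` maps a tool name to the path it should be invoked by.
    """
    return sorted(
        name
        for name, path in tool_paths.items()
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    )
```

An empty result means every configured path points at an executable file; it does not check tool versions.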