rDiff: Accurate detection of differential RNA processing

Examples

rDiff can be used in various experimental settings.

Detecting differential relative transcript abundance when the gene annotation is complete
Detecting differential relative transcript abundance when the gene annotation is incomplete
Working without replicates

Using rDiff.parametric

When the gene structure is known, we recommend using rDiff.parametric, which tests for differences in the relative abundance of the annotated transcripts. rDiff.parametric requires as input the bam files for each sample and a gene structure in GFF3 format. In the following example we test for differences between two samples, "1" and "2", with replicates bam1_r1.bam and bam1_r2.bam for sample "1" and bam2_r1.bam and bam2_r2.bam for sample "2". We assume that the bam files are located in the directory bamdir, that the reads are 75 bp long, and that the gene structure is saved in GFF3 format in the file genes.gff3. To start the test, first change into the directory bin:

cd bin

and then typing:

./rdiff -o outdir -d bamdir -a bam1_r1.bam,bam1_r2.bam -b bam2_r1.bam,bam2_r2.bam -g genes.gff3 -m param -L 75 -M 30

The option -M 30 additionally requires a read to be at least 30 bp long to be included in the analysis. The options are described in the following table:

Option	Description
-o	The output directory for the results
-d	The directory containing the bam files
-a	The filenames of the bam files of the first sample, separated by "," without spaces
-b	The filenames of the bam files of the second sample, separated by "," without spaces
-g	The filename of the gene structure. Please provide the absolute path to the file.
-m	Method to be used for testing: 'param' for rDiff.parametric, 'nonparam' for rDiff.nonparametric and 'poisson' for rDiff.poisson
-L	The read length
-M	Minimal length of reads to be used. Shorter reads are excluded from the analysis.
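Because -a and -b expect comma-separated lists without spaces and -g expects an absolute path, it can help to assemble these arguments in a small script. The following is a minimal sketch, assuming bash and the example file names used above; the helper join_by_comma and the variables A_LIST, B_LIST and GFF are hypothetical names, not part of rDiff. It only warns about missing bam files and prints the assembled arguments.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of rDiff): build the comma-separated,
# space-free file lists expected by -a and -b, warn about missing bam files,
# and turn genes.gff3 into an absolute path for -g.

# join_by_comma: print all arguments joined by "," with no spaces
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

BAMDIR=bamdir
SAMPLE1=(bam1_r1.bam bam1_r2.bam)
SAMPLE2=(bam2_r1.bam bam2_r2.bam)

# Warn (on stderr) about any bam file that is not in BAMDIR
for f in "${SAMPLE1[@]}" "${SAMPLE2[@]}"; do
  [ -f "$BAMDIR/$f" ] || echo "warning: $BAMDIR/$f not found" >&2
done

A_LIST=$(join_by_comma "${SAMPLE1[@]}")
B_LIST=$(join_by_comma "${SAMPLE2[@]}")
# Assumption: genes.gff3 lies in the current directory
GFF="$PWD/genes.gff3"

printf '%s\n' "-a $A_LIST -b $B_LIST -g $GFF"
```

The values of A_LIST, B_LIST and GFF can then be passed to ./rdiff in place of the literal arguments in the command above.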

The results are written to outdir. The output files are described in the following table:

Filename	Description
P_values_rDiff_parametric.tab	The p-values of rDiff.parametric. The file is tab-delimited and has three columns: the gene name, the p-value and the test status.
Gene_expression.tab	The gene expression estimates for all replicates. The file is tab-delimited; the first column contains the gene names and the remaining columns contain the read counts of each gene in each replicate.
Alternative_region_counts.mat	The read counts for the alternative regions, in the binary mat format.
genes.mat	The gene structure, in the binary mat format.
variance_function_1.mat	The saved variance function for sample "1": a locfit structure saved in the binary mat format.
variance_function_2.mat	The saved variance function for sample "2": a locfit structure saved in the binary mat format.
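Since P_values_rDiff_parametric.tab is tab-delimited with the gene name in the first column and the p-value in the second, the genes below a chosen significance threshold can be listed with awk. This is a sketch under stated assumptions: the function name significant_genes and the default threshold 0.05 are hypothetical choices, no multiple-testing correction is applied, and rows whose p-value is not numeric (for example a possible header line or a gene that was not tested) are simply skipped.

```shell
#!/usr/bin/env bash
# significant_genes FILE [ALPHA]   (hypothetical helper, not part of rDiff)
# Print "gene<TAB>p-value" for each row of a P_values_rDiff_*.tab file whose
# p-value (column 2) is numeric and below ALPHA (default 0.05), sorted by
# increasing p-value. Rows with a non-numeric p-value are skipped.
significant_genes() {
  local file=$1 alpha=${2:-0.05}
  awk -F'\t' -v alpha="$alpha" \
    '$2 ~ /^[0-9.eE+-]+$/ && $2 + 0 < alpha { print $1 "\t" $2 }' "$file" |
    sort -t $'\t' -k2,2g
}
```

For example, significant_genes outdir/P_values_rDiff_parametric.tab 0.01 would print the genes with a p-value below 0.01, smallest p-value first. The general numeric sort (-g) is used so that values in scientific notation such as 1e-05 are ordered correctly.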

Using rDiff.nonparametric

When the gene structure is incomplete, we recommend using rDiff.nonparametric, which tests for significant differences in read coverage. rDiff.nonparametric requires as input the bam files for each sample and a gene structure in GFF3 format. It estimates the biological variance on the annotated gene structure, so a gene structure that is as complete as possible is advantageous but not required. Apart from the variance estimation, rDiff.nonparametric uses only the gene starts and gene ends for testing.
In the following example we test for differences between two samples, "1" and "2", with replicates bam1_r1.bam and bam1_r2.bam for sample "1" and bam2_r1.bam and bam2_r2.bam for sample "2". We assume that the bam files are located in the directory bamdir, that the reads are 75 bp long, and that the gene structure is saved in GFF3 format in the file genes.gff3. To start the test, first change into the directory bin:

cd bin

and then typing:

./rdiff -o outdir -d bamdir -a bam1_r1.bam,bam1_r2.bam -b bam2_r1.bam,bam2_r2.bam -g genes.gff3 -m nonparam -L 75 -M 30

The option -M 30 additionally requires a read to be at least 30 bp long to be included in the analysis. The options are described in the following table:

Option	Description
-o	The output directory for the results
-d	The directory containing the bam files
-a	The filenames of the bam files of the first sample, separated by "," without spaces
-b	The filenames of the bam files of the second sample, separated by "," without spaces
-g	The filename of the gene structure. Please provide the absolute path to the file.
-m	Method to be used for testing: 'param' for rDiff.parametric, 'nonparam' for rDiff.nonparametric and 'poisson' for rDiff.poisson
-L	The read length
-M	Minimal length of reads to be used. Shorter reads are excluded from the analysis.
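rDiff.parametric and rDiff.nonparametric both write files with the same names (for example Gene_expression.tab, genes.mat and variance_function_1.mat) into the output directory, so running both methods with the same -o would presumably overwrite the first run's files. A minimal sketch that gives each method its own output directory, assuming bash and the example files above; the directory names outdir_param and outdir_nonparam and the DRY_RUN switch are hypothetical. For brevity the minimal read length option is omitted and can be appended to the array.

```shell
#!/usr/bin/env bash
# Sketch: one rdiff call per method, each with its own output directory
# (outdir_param / outdir_nonparam are hypothetical names).
# With DRY_RUN=1 the commands are only printed.
DRY_RUN=1   # set to 0 to actually call ./rdiff (from the bin directory)

for method in param nonparam; do
  # Build the call as an array so paths containing spaces stay intact
  cmd=(./rdiff -o "outdir_$method" -d bamdir
       -a bam1_r1.bam,bam1_r2.bam -b bam2_r1.bam,bam2_r2.bam
       -g "$PWD/genes.gff3" -m "$method" -L 75)
  if [ "$DRY_RUN" -eq 1 ]; then
    printf '%s ' "${cmd[@]}"; printf '\n'
  else
    "${cmd[@]}"
  fi
done
```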

The results are written to outdir. The output files are described in the following table:

Filename	Description
P_values_rDiff_nonparametric.tab	The p-values of rDiff.nonparametric. The file is tab-delimited and has three columns: the gene name, the p-value and the test status.
Gene_expression.tab	The gene expression estimates for all replicates. The file is tab-delimited; the first column contains the gene names and the remaining columns contain the read counts of each gene in each replicate.
Nonparametric_region_counts.mat	The read counts for the alternative regions used to estimate the variance functions, in the binary mat format.
genes.mat	The gene structure, in the binary mat format.
variance_function_1.mat	The saved variance function for sample "1": a locfit structure saved in the binary mat format.
variance_function_2.mat	The saved variance function for sample "2": a locfit structure saved in the binary mat format.
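To see whether a small p-value comes from a highly or lowly expressed gene, the p-value file can be joined with Gene_expression.tab on the gene name. The sketch below assumes bash and awk; the function name annotate_pvalues is hypothetical, and it simply averages all replicate columns of Gene_expression.tab, since which column belongs to which sample is not stated here. Genes without an expression entry get "NA".

```shell
#!/usr/bin/env bash
# annotate_pvalues PVALUE_FILE EXPRESSION_FILE   (hypothetical helper)
# For every gene in PVALUE_FILE (gene, p-value, status), append the mean read
# count over all replicate columns of EXPRESSION_FILE (gene, counts...).
# Genes missing from EXPRESSION_FILE get "NA".
annotate_pvalues() {
  awk -F'\t' -v OFS='\t' '
    # First file read (EXPRESSION_FILE): remember the mean count per gene
    NR == FNR {
      sum = 0
      for (i = 2; i <= NF; i++) sum += $i
      mean[$1] = (NF > 1) ? sum / (NF - 1) : "NA"
      next
    }
    # Second file read (PVALUE_FILE): gene, p-value, status, mean count
    { print $1, $2, $3, (($1 in mean) ? mean[$1] : "NA") }
  ' "$2" "$1"
}
```

A possible call is annotate_pvalues outdir/P_values_rDiff_nonparametric.tab outdir/Gene_expression.tab. The expression file is read first so that the output keeps the order of the p-value file.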

Working without replicates

When only one replicate is available for each sample, the bam files of both samples can be merged for the variance function estimation. To do so, add the option -x to the other options.
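The decision to add -x can be made automatically from the -a and -b lists. This is a sketch, assuming bash; the file names sample1.bam and sample2.bam and the helper has_single_file are hypothetical, the other options mirror the rDiff.parametric example, and the command is only printed.

```shell
#!/usr/bin/env bash
# Sketch: append -x when each sample has a single bam file.
# sample1.bam / sample2.bam are hypothetical placeholder names.

# has_single_file LIST: true if the comma-separated LIST names exactly one file
has_single_file() {
  [[ $1 != *,* ]]
}

A_LIST=sample1.bam
B_LIST=sample2.bam
cmd=(./rdiff -o outdir -d bamdir -a "$A_LIST" -b "$B_LIST"
     -g "$PWD/genes.gff3" -m param -L 75)
if has_single_file "$A_LIST" && has_single_file "$B_LIST"; then
  # No replicates: merge both samples for the variance function estimation
  cmd+=(-x)
fi
printf '%s ' "${cmd[@]}"; printf '\n'
```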
