Examples
rDiff can be used in various experimental settings:
- Detecting differential relative transcript abundance when the gene annotation is complete
- Detecting differential relative transcript abundance when the gene annotation is incomplete
- Working without replicates
Using rDiff.parametric
When the gene structure is known, we recommend using rDiff.parametric. This test detects differences in the relative abundance of annotated transcripts. As input, rDiff.parametric requires the BAM files for each sample as well as a gene structure in GFF3 format. In the following example we test for differences between the two samples "1" and "2", which have the replicates bam1_r1.bam and bam1_r2.bam, and bam2_r1.bam and bam2_r2.bam, respectively. We assume that the BAM files are located in the directory bamdir and that the reads are 75 bp long. Furthermore, we assume that the gene structure is saved in GFF3 format in the file genes.gff3.
The test can then be started by first changing into the directory bin:

    cd bin

and then typing:

    ./rdiff -o outdir -d bamdir -a bam1_r1.bam,bam1_r2.bam -b bam2_r1.bam,bam2_r2.bam -g genes.gff3 -m param -L 75 -M 30

Here we additionally require that a read be at least 30 bp long in order to be included in the analysis. A detailed description of the parameters used can be found in the following table:
Option | Description |
---|---|
-o | The output directory for the results |
-d | The directory where the bam files are |
-a | The filenames of the BAM files of the first sample. The filenames must be separated by "," and contain no spaces. |
-b | The filenames of the BAM files of the second sample. The filenames must be separated by "," and contain no spaces. |
-g | The filename of the gene structure. Please provide the absolute path to the file. |
-m | Method to be used for testing. The value 'param' is for rDiff.parametric, 'nonparam' for rDiff.nonparametric and 'poisson' for rDiff.poisson. |
-L | The read length of the reads |
-M | Minimal length of reads that should be used. Reads shorter than this will not be included in the analysis. |
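
When a sample has many replicates, the comma-separated lists for -a and -b can be assembled by a small helper instead of being typed by hand. The following is a minimal sketch: the function name join_by_comma and the variable names are hypothetical and not part of rDiff, and the resulting command is only printed for inspection, not run.

```shell
# Hypothetical helper (not part of rDiff): join all arguments with ","
# and no spaces, which is the format the -a and -b options expect.
# The body runs in a subshell so the changed IFS does not leak out.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

sample_a=$(join_by_comma bam1_r1.bam bam1_r2.bam)
sample_b=$(join_by_comma bam2_r1.bam bam2_r2.bam)

# Print the command instead of executing it.
echo "./rdiff -o outdir -d bamdir -a $sample_a -b $sample_b -g genes.gff3 -m param -L 75"
```

File names that themselves contain commas or spaces would break this format and should be renamed first.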
The output files can be found in outdir. They are described in the following table:
Filename | Description |
---|---|
P_values_rDiff_parametric.tab | This file contains the p-values of rDiff.parametric. The file is tab-delimited and has three columns. The first column contains the gene names, the second the p-values and the third the test status. |
Gene_expression.tab | This file contains the gene expression estimations for all the replicates. The file is tab-delimited. The first column contains the gene names and the other columns the read counts for each gene for all replicates. |
Alternative_region_counts.mat | This file contains the counts for the alternative regions. The format is the binary mat format. |
genes.mat | This file contains the gene structure. The format is the binary mat format. |
variance_function_1.mat | This file contains the saved variance function for sample "1". It is a locfit-structure saved in the binary mat format. |
variance_function_2.mat | This file contains the saved variance function for sample "2". It is a locfit-structure saved in the binary mat format. |
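
Because P_values_rDiff_parametric.tab is tab-delimited with the gene name, p-value and test status in columns one to three, candidate genes can be extracted with standard tools. The sketch below assumes a threshold of 0.05, chosen only for illustration, and skips any line whose second column is not a plain number, such as a possible header line. The function name and the demo rows are made up.

```shell
# Hypothetical helper: print the gene names (column 1) whose p-value
# (column 2) is below the threshold given as the second argument.
# Lines whose second column is not a plain number are skipped.
significant_genes() {
    awk -F '\t' -v alpha="$2" \
        '$2 ~ /^[0-9]+\.?[0-9]*([eE][+-]?[0-9]+)?$/ && $2 + 0 < alpha + 0 { print $1 }' "$1"
}

# Demo on made-up rows; with real results, pass
# outdir/P_values_rDiff_parametric.tab instead.
demo=$(mktemp)
printf 'gene\tP-value\tstatus\ngeneA\t0.001\tOK\ngeneB\t0.7\tOK\ngeneC\t1e-5\tOK\n' > "$demo"
significant_genes "$demo" 0.05
rm -f "$demo"
```

The filter applies a raw threshold only; it does not take the test status column into account and performs no multiple-testing correction.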
Using rDiff.nonparametric
When the gene structure is incomplete, we recommend using rDiff.nonparametric. This test looks for significant differences in read coverage. As input, rDiff.nonparametric requires the BAM files for each sample as well as a gene structure in GFF3 format. rDiff.nonparametric estimates the biological variance on the annotated gene structure. Therefore, a gene structure that is as complete as possible is advantageous but not required. Apart from the variance estimation, rDiff.nonparametric uses only the gene starts and gene stops for testing.
In the following example we test for differences between the two samples "1" and "2", which have the replicates bam1_r1.bam and bam1_r2.bam, and bam2_r1.bam and bam2_r2.bam, respectively. We assume that the BAM files are located in the directory bamdir and that the reads are 75 bp long. Furthermore, we assume that the gene structure is saved in GFF3 format in the file genes.gff3.
The test can then be started by first changing into the directory bin:

    cd bin

and then typing:

    ./rdiff -o outdir -d bamdir -a bam1_r1.bam,bam1_r2.bam -b bam2_r1.bam,bam2_r2.bam -g genes.gff3 -m nonparam -L 75 -M 30

Here we additionally require that a read be at least 30 bp long in order to be included in the analysis. A detailed description of the parameters used can be found in the following table:
Option | Description |
---|---|
-o | The output directory for the results |
-d | The directory where the bam files are |
-a | The filenames of the BAM files of the first sample. The filenames must be separated by "," and contain no spaces. |
-b | The filenames of the BAM files of the second sample. The filenames must be separated by "," and contain no spaces. |
-g | The filename of the gene structure. Please provide the absolute path to the file. |
-m | Method to be used for testing. The value 'param' is for rDiff.parametric, 'nonparam' for rDiff.nonparametric and 'poisson' for rDiff.poisson. |
-L | The read length of the reads |
-M | Minimal length of reads that should be used. Reads shorter than this will not be included in the analysis. |
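
Since -a and -b take file names relative to the directory given with -d, it can help to confirm that every replicate is actually present before starting a run. The following sketch uses a hypothetical helper, missing_bams, which is not part of rDiff. It takes the BAM directory and one comma-separated list and reports the files it cannot find.

```shell
# Hypothetical helper (not part of rDiff): list the files from a
# comma-separated list that are missing in the given directory.
# Exits with status 1 if at least one file is missing, 0 otherwise.
# The body runs in a subshell so IFS and globbing changes stay local.
missing_bams() (
    dir=$1
    status=0
    set -f          # do not expand wildcards while splitting the list
    IFS=,
    for f in $2; do
        [ -f "$dir/$f" ] || { echo "missing: $dir/$f"; status=1; }
    done
    exit "$status"
)

# Demo in a temporary directory that contains only the first replicate.
demo_dir=$(mktemp -d)
touch "$demo_dir/bam1_r1.bam"
missing_bams "$demo_dir" bam1_r1.bam,bam1_r2.bam || echo "fix the paths before running rdiff"
rm -rf "$demo_dir"
```

In practice the same check would be run once for the -a list and once for the -b list, with bamdir as the directory.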
The output files can be found in outdir. They are described in the following table:
Filename | Description |
---|---|
P_values_rDiff_nonparametric.tab | This file contains the p-values of rDiff.nonparametric. The file is tab-delimited and has three columns. The first column contains the gene names, the second the p-values and the third the test status. |
Gene_expression.tab | This file contains the gene expression estimations for all the replicates. The file is tab-delimited. The first column contains the gene names and the other columns the read counts for each gene for all replicates. |
Nonparametric_region_counts.mat | This file contains the counts for the alternative regions used to estimate the variance functions. The format is the binary mat format. |
genes.mat | This file contains the gene structure. The format is the binary mat format. |
variance_function_1.mat | This file contains the saved variance function for sample "1". It is a locfit-structure saved in the binary mat format. |
variance_function_2.mat | This file contains the saved variance function for sample "2". It is a locfit-structure saved in the binary mat format. |
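
The third column of P_values_rDiff_nonparametric.tab holds the test status, so a quick tally of that column shows how many genes fall under each status. This is a sketch only: the function name is hypothetical, and the status values in the demo rows are invented for illustration and may not match what rDiff writes.

```shell
# Hypothetical helper: count how often each value occurs in column 3
# (the test status) of a tab-delimited p-value file. If the file has a
# header line, its third field is counted as well.
status_tally() {
    awk -F '\t' '{ n[$3]++ } END { for (s in n) print s "\t" n[s] }' "$1" | sort
}

# Demo on made-up rows; with real results, pass
# outdir/P_values_rDiff_nonparametric.tab instead.
demo=$(mktemp)
printf 'geneA\t0.001\tOK\ngeneB\t1\tNOT_TESTED\ngeneC\t0.2\tOK\n' > "$demo"
status_tally "$demo"
rm -f "$demo"
```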
Working without replicates
When only one replicate is available for each sample, the replicates from both samples can be merged for the variance function estimation. This is done by passing the option -x in addition to the other options.