Basic Usage
FAUST is mostly a Python API. It is designed to compute effect sizes and p-values/q-values (see The FAUST null hypothesis) from gRNA-UMI counts. These counts are most commonly enriched via polymerase chain reaction (PCR). We typically generate counts using PoolQ3, from the Broad Institute. It is also possible to generate these counts directly with FAUST.
The core functionality of FAUST may be found in the function faust.utilities.get_summary_df().
This function expects a pandas dataframe df with the following example format:
Example argument df for the function faust.utilities.get_summary_df():

barcode  UMI  gene      input1  input2  ln1  ln2  tumor1  tumor2
TCGA…    T…   gene1     532     501     300  251  10000   5238
ATCG…    C…   gene2     102     112     10   11   12      23
GATC…    G…   gene3     400     390     17   21   33      12
CGAT…    A…   control1  310     290     177  212  120     242
AGCT…    T…   control2  310     290     177  212  120     242
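For readers who want to follow along, the example table can be built as a pandas dataframe roughly as follows. This is a minimal sketch: the barcode and UMI sequences are truncated in the table, so the truncated strings are kept here as opaque placeholders rather than completed.

```python
import pandas as pd

# Barcode and UMI sequences are elided in the example table, so the
# truncated strings are used as-is (placeholders, not real sequences).
df = pd.DataFrame(
    {
        "barcode": ["TCGA…", "ATCG…", "GATC…", "CGAT…", "AGCT…"],
        "UMI": ["T…", "C…", "G…", "A…", "T…"],
        "gene": ["gene1", "gene2", "gene3", "control1", "control2"],
        "input1": [532, 102, 400, 310, 310],
        "input2": [501, 112, 390, 290, 290],
        "ln1": [300, 10, 17, 177, 177],
        "ln2": [251, 11, 21, 212, 212],
        "tumor1": [10000, 12, 33, 120, 120],
        "tumor2": [5238, 23, 12, 242, 242],
    }
)

# Targets in the "gene" column that serve as controls in this example.
controls = ["control1", "control2"]
```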
The next argument, controls, should be a list of control targets.
These generally correspond to gRNAs that target intergenic regions, or that target no site in the genome at all. In the table above, controls would take the value ["control1","control2"].
The next arguments, inputs and outputs, should each be a list of column names of df, corresponding to input and output sites, respectively.
FAUST will compute the ratio, for each gRNA-UMI, between the output and the input sites provided.
The exact way this is done will depend on the argument input_type.
If input_type is ‘single’, FAUST will take the row-wise sum over all input columns in inputs; each entry in each column in outputs will then be divided (row-wise) by this sum to obtain the “factor of expansion” \(F_e\) for each gRNA-UMI.
If input_type is ‘matched’, FAUST will compute this factor of expansion for matched elements in inputs and outputs.
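The two modes can be illustrated with a short pandas sketch. This is not FAUST's implementation; factor_of_expansion is a hypothetical helper written only to show the arithmetic described above, and it assumes that in 'matched' mode the i-th input column is paired with the i-th output column.

```python
import pandas as pd

def factor_of_expansion(df, inputs, outputs, input_type="single"):
    """Illustrative helper (hypothetical, not part of FAUST)."""
    if input_type == "single":
        # Row-wise sum over all input columns, then divide every
        # output column by that pooled input, row by row.
        pooled = df[inputs].sum(axis=1)
        return df[outputs].div(pooled, axis=0)
    if input_type == "matched":
        # Pair inputs and outputs by position; this sketch assumes
        # the two lists have equal length.
        return pd.DataFrame(
            {out: df[out] / df[inp] for inp, out in zip(inputs, outputs)},
            index=df.index,
        )
    raise ValueError("input_type must be 'single' or 'matched'")

# Two rows taken from the example table (gene1 and control1).
counts = pd.DataFrame(
    {
        "input1": [532, 310], "input2": [501, 290],
        "ln1": [300, 177], "ln2": [251, 212],
        "tumor1": [10000, 120], "tumor2": [5238, 242],
    },
    index=["gene1", "control1"],
)

single = factor_of_expansion(counts, ["input1", "input2"], ["ln1", "ln2"], "single")
matched = factor_of_expansion(counts, ["ln1", "ln2"], ["tumor1", "tumor2"], "matched")
```

Here single holds ln1 and ln2 each divided by input1 + input2, while matched holds tumor1 / ln1 and tumor2 / ln2.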
Let’s suppose we want to test the null hypothesis \(H_0\) that the factor of expansion \(F_e\) between a common input aliquot and a particular lymph node is equally likely to be greater or lesser for gRNA-UMIs targeting gene1 vs. gRNA-UMIs targeting control loci.
To do this, set input_type to be ‘single’, controls to be ["control1","control2"], inputs to be ["input1","input2"], and outputs to be ["ln1","ln2"].
FAUST will pool the counts for all the input aliquot measurements (we will test \(H_0\) using a Mann-Whitney U test, so whether we sum or average these input counts won’t affect our final result).
FAUST will then evaluate \(H_0\) separately for ln1 and ln2.
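The remark about summing versus averaging follows from the Mann-Whitney U statistic depending only on how the values rank against each other: averaging two input columns instead of summing them multiplies every \(F_e\) by the same constant, which leaves all pairwise comparisons unchanged. The sketch below checks this in plain Python. The counts are made up for illustration, and mann_whitney_u is a hypothetical helper that computes the raw U statistic by pair counting, not the routine FAUST uses.

```python
def mann_whitney_u(x, y):
    """U statistic of x versus y: each pair with x_i > y_j counts 1, ties count 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )

def mean(values):
    return sum(values) / len(values)

def ratios(outputs, inputs, pool):
    """Output count divided by the pooled input counts, per gRNA-UMI."""
    return [out / pool(inp) for out, inp in zip(outputs, inputs)]

# Hypothetical counts, one entry per gRNA-UMI (not from a real screen).
gene_out = [300, 280, 150]
gene_in = [[532, 501], [400, 410], [350, 300]]
ctrl_out = [177, 160, 200, 90]
ctrl_in = [[310, 290], [305, 300], [280, 260], [150, 170]]

# Pool the inputs by summing, then by averaging.
u_sum = mann_whitney_u(ratios(gene_out, gene_in, sum), ratios(ctrl_out, ctrl_in, sum))
u_mean = mann_whitney_u(ratios(gene_out, gene_in, mean), ratios(ctrl_out, ctrl_in, mean))

# Both pooling choices give the same U statistic.
assert u_sum == u_mean
```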
Let’s now suppose we want to test the null hypothesis \(H_0\) that the factor of expansion \(F_e\) between a particular lymph node and a matched tumor is equally likely to be greater or lesser for gRNA-UMIs targeting gene1 vs. gRNA-UMIs targeting control loci.
To do this, set input_type to be ‘matched’, controls to be ["control1","control2"], inputs to be ["ln1","ln2"], and outputs to be ["tumor1","tumor2"].
FAUST will then evaluate \(H_0\) between ln1 and tumor1, then between ln2 and tumor2.
That is, the ordering of inputs and outputs matters, and they should be lists of the same length.
Attempting to run with input_type ‘matched’ and inputs and outputs of unequal lengths will raise an exception.
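The positional pairing and the length requirement can be sketched as below. pair_matched is a hypothetical helper, not part of FAUST, and the use of ValueError is an assumption: the text above only states that an exception is raised, not which type.

```python
def pair_matched(inputs, outputs):
    """Hypothetical helper: pair the i-th input column with the i-th output column."""
    if len(inputs) != len(outputs):
        # ValueError is an assumed exception type for this sketch.
        raise ValueError(
            f"'matched' needs equal-length lists, got "
            f"{len(inputs)} inputs and {len(outputs)} outputs"
        )
    return list(zip(inputs, outputs))

# Order matters: ln1 is compared with tumor1, and ln2 with tumor2.
pairs = pair_matched(["ln1", "ln2"], ["tumor1", "tumor2"])

# Swapping the order of one list changes which sites are compared.
swapped = pair_matched(["ln2", "ln1"], ["tumor1", "tumor2"])
```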