Basic Usage

FAUST is primarily a Python API. It is designed to compute effect sizes and p-values/q-values (see The FAUST null hypothesis) using gRNA-UMI counts. The gRNA-UMI sequences being counted are most commonly enriched via polymerase chain reaction (PCR). We typically generate counts using PoolQ3, from the Broad Institute. It is also possible to generate these counts directly with FAUST.

The core functionality of FAUST may be found in the function faust.utilities.get_summary_df(). This function expects a pandas dataframe df with the following example format:

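Example argument df for function faust.utilities.get_summary_df()

| barcode | UMI | gene | input1 | input2 | ln1 | ln2 | tumor1 | tumor2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TCGA… | T… | gene1 | 532 | 501 | 300 | 251 | 10000 | 5238 |
| ATCG… | C… | gene2 | 102 | 112 | 10 | 11 | 12 | 23 |
| GATC… | G… | gene3 | 400 | 390 | 17 | 21 | 33 | 12 |
| CGAT… | A… | control1 | 310 | 290 | 177 | 212 | 120 | 242 |
| AGCT… | T… | control2 | 310 | 290 | 177 | 212 | 120 | 242 |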
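As a minimal sketch, the example table can be written out as a pandas dataframe. The barcode and UMI sequences are truncated in the table, so they are kept here as the same truncated placeholder strings; real data would hold full sequences. The variable name df simply mirrors the argument name used above.

```python
import pandas as pd

# Example counts copied from the table above. The barcode and UMI strings are
# the truncated placeholders shown in the table, not real sequences.
df = pd.DataFrame(
    {
        "barcode": ["TCGA…", "ATCG…", "GATC…", "CGAT…", "AGCT…"],
        "UMI": ["T…", "C…", "G…", "A…", "T…"],
        "gene": ["gene1", "gene2", "gene3", "control1", "control2"],
        "input1": [532, 102, 400, 310, 310],
        "input2": [501, 112, 390, 290, 290],
        "ln1": [300, 10, 17, 177, 177],
        "ln2": [251, 11, 21, 212, 212],
        "tumor1": [10000, 12, 33, 120, 120],
        "tumor2": [5238, 23, 12, 242, 242],
    }
)

# Each row is one gRNA-UMI; each count column holds one sample.
print(df)
```

Keeping one row per gRNA-UMI and one column per sample means the sample columns can later be referred to by name.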

The next argument, controls, should be a list of control targets. These generally correspond to gRNAs that target intergenic regions or that target no site in the genome at all. In the table above, controls would take the value ["control1","control2"].

The next arguments, inputs and outputs, should each be a list of columns of df that correspond to input and output sites, respectively. FAUST will compute the ratio, for each gRNA-UMI, between the output and the input sites provided. Exactly how this is done depends on the argument input_type. If input_type is ‘single’, FAUST takes the row-wise sum over all columns in inputs; each entry of each column in outputs is then divided (row-wise) by this sum to obtain the “factor of expansion” \(F_e\) for each gRNA-UMI. If input_type is ‘matched’, FAUST computes the factor of expansion for matched elements of inputs and outputs: the first output column is divided by the first input column, the second by the second, and so on.
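The two modes can be illustrated with plain pandas arithmetic. This is a sketch of the calculation as described above, not FAUST's internal code; the variable names (pooled_input, fe_single, fe_matched) are hypothetical, and the counts are three rows taken from the example table.

```python
import pandas as pd

# Count columns for three rows of the example table (gene1, control1, control2).
df = pd.DataFrame(
    {
        "gene": ["gene1", "control1", "control2"],
        "input1": [532, 310, 310],
        "input2": [501, 290, 290],
        "ln1": [300, 177, 177],
        "ln2": [251, 212, 212],
    }
)
inputs = ["input1", "input2"]
outputs = ["ln1", "ln2"]

# input_type 'single': pool all input columns row-wise, then divide every
# output column by that pooled input.
pooled_input = df[inputs].sum(axis=1)
fe_single = df[outputs].div(pooled_input, axis=0)

# input_type 'matched': divide each output column by the input column at the
# same position in the list (ln1 / input1, ln2 / input2).
fe_matched = pd.DataFrame(
    {out: df[out] / df[inp] for inp, out in zip(inputs, outputs)}
)

print(fe_single)
print(fe_matched)
```

In the 'single' case every output column shares one denominator per row, whereas in the 'matched' case each output column has its own denominator.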

Let’s suppose we want to test the null hypothesis \(H_0\) that the factor of expansion \(F_e\) between a common input aliquot and a particular lymph node is equally likely to be greater or lesser for gRNA-UMIs targeting gene1 vs. gRNA-UMIs targeting control loci. To do this, set input_type to ‘single’, controls to ["control1","control2"], inputs to ["input1","input2"], and outputs to ["ln1","ln2"]. FAUST will pool the counts from all the input aliquot measurements. (We test \(H_0\) using a Mann-Whitney U test, which depends only on the rank order of the \(F_e\) values, so whether we sum or average the input counts does not affect the final result.) FAUST will then evaluate \(H_0\) separately for ln1 and ln2.
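The claim that summing and averaging the inputs give the same test result can be checked with a small pure-Python sketch. The u_statistic helper below is a hypothetical, direct pair-counting version of the Mann-Whitney U statistic written for illustration; it is not FAUST's implementation. The counts are the gene1 and control rows of the example table, which are far too few for a meaningful test but enough to show the invariance.

```python
def u_statistic(x, y):
    """Mann-Whitney U for sample x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mean(values):
    return sum(values) / len(values)


def expansion(rows, pool):
    # F_e for ln1: output count divided by the pooled input count.
    return [ln1 / pool([in1, in2]) for in1, in2, ln1 in rows]


# (input1, input2, ln1) counts from the example table.
gene1_rows = [(532, 501, 300)]
control_rows = [(310, 290, 177), (310, 290, 177)]

u_sum = u_statistic(expansion(gene1_rows, sum), expansion(control_rows, sum))
u_mean = u_statistic(expansion(gene1_rows, mean), expansion(control_rows, mean))

# Averaging divides every pooled input by the same constant, so the ordering
# of the F_e values, and therefore U, is unchanged.
print(u_sum == u_mean)
```

Because U depends only on which \(F_e\) values are larger than which, any positive rescaling applied to all pooled inputs leaves it unchanged.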

Let’s now suppose we want to test the null hypothesis \(H_0\) that the factor of expansion \(F_e\) between a particular lymph node and a matched tumor is equally likely to be greater or lesser for gRNA-UMIs targeting gene1 vs. gRNA-UMIs targeting control loci. To do this, set input_type to ‘matched’, controls to ["control1","control2"], inputs to ["ln1","ln2"], and outputs to ["tumor1","tumor2"]. FAUST will then evaluate \(H_0\) between ln1 and tumor1, then between ln2 and tumor2. That is, the ordering of inputs and outputs matters, and they should be lists of the same length. Running with input_type ‘matched’ and inputs and outputs of unequal lengths will raise an exception.
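The positional pairing and the length requirement can be sketched as follows. The pair_matched helper is hypothetical and only illustrates the behaviour described above; in particular, the choice of ValueError is an assumption, since the text says only that an exception is raised.

```python
def pair_matched(inputs, outputs):
    """Pair each input column with the output column at the same position."""
    if len(inputs) != len(outputs):
        # ValueError is an assumption; the text only says an exception is raised.
        raise ValueError(
            f"'matched' needs equal-length lists, got {len(inputs)} inputs "
            f"and {len(outputs)} outputs"
        )
    return list(zip(inputs, outputs))


# ln1 is compared with tumor1, and ln2 with tumor2.
pairs = pair_matched(["ln1", "ln2"], ["tumor1", "tumor2"])
print(pairs)

# Unequal lengths are rejected rather than silently truncated.
try:
    pair_matched(["ln1", "ln2"], ["tumor1"])
except ValueError as err:
    print(f"rejected: {err}")
```

Checking the lengths explicitly matters because zip alone would silently drop the unmatched column.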