API

Utilities

faust.utilities.apply_cross_sample_filter_to_grna_umi_counts(counts_matrix, subjects, threshold_factor=1e-05, subject_blacklist=[])

Parameters:

counts_matrix

subjects

threshold_factor – (Default value = 1e-5)

subject_blacklist – (Default value = [])

faust.utilities.apply_threshold_to_grna_umi_counts(counts, barcodes=None, fixed_threshold=0, per_sample_read_fraction=1e-06, per_sample_fixed_threshold=None, number_of_cells_recovered=None, per_sample_per_barcode_of_expected_threshold=0.001, faust_probabilistic_model_error_rate=1e-06, per_recovered_cell_read_fraction=0.001, mode='fixed')

Parameters:

counts

barcodes – (Default value = None)

fixed_threshold – (Default value = 0)

per_sample_read_fraction – (Default value = 1e-6)

per_sample_fixed_threshold – (Default value = None)

number_of_cells_recovered – (Default value = None)

per_sample_per_barcode_of_expected_threshold – (Default value = 1e-3)

faust_probabilistic_model_error_rate – (Default value = 1e-6)

per_recovered_cell_read_fraction – (Default value = 1e-3)

mode – (Default value = ‘fixed’)

faust.utilities.count_gpp_output(sgRNA_input, barcode_input, prefix, valid_constructs, valid_umis, conditions, output, quality_output=None, approximate_construct_matching=False, min_mean_read_quality_score=0, min_min_read_quality_score=0, read_quality_start=32, read_quality_end=38, verbose=True)

Parameters:

sgRNA_input

barcode_input

prefix

valid_constructs

valid_umis

conditions

output

quality_output – (Default value = None)

approximate_construct_matching – (Default value = False)

min_mean_read_quality_score – (Default value = 0)

min_min_read_quality_score – (Default value = 0)

verbose – (Default value = True)

read_quality_start – (Default value = 32)

read_quality_end – (Default value = 38)

faust.utilities.estimate_cell_input(df, sample, target_gene, count_threshold, target_gene_col='Target Gene')

Parameters:

df

sample

target_gene

count_threshold

target_gene_col – (Default value = ‘Target Gene’)

faust.utilities.estimate_read_error_singlets(n_observed_unique_grna_umis, n_observed_zeros, log2_counts_sum)

Parameters:

n_observed_unique_grna_umis

n_observed_zeros

log2_counts_sum

faust.utilities.get_engraftment(counts, mode, inputnumber, recoverednumber, barcodes, fixed_threshold=1, faust_probabilistic_model_error_rate=1e-06)

Parameters:

counts (list of gRNA-UMI counts)

mode (one of 'nothreshold', 'fixed', 'per_sample', 'per_sample_cell_recovery_adjusted', 'per_sample_per_barcode', 'faust_probabilistic_model', 'combined')

inputnumber (number of cells in the input aliquot, i.e. the pre-engraftment cell count)

recoverednumber (number of cells recovered in the output, as counted by flow cytometry)

barcodes (list of barcode (gRNA) names, identical in length to the list of counts)

fixed_threshold – (Default value = 1)

faust_probabilistic_model_error_rate – (Default value = 1e-6)

faust.utilities.get_mageck_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', append_UMI=True, output=None)

Parameters:

df

sgRNA_col – (Default value = ‘Construct Barcode’)

gene_col – (Default value = ‘Construct IDs’)

UMI_col – (Default value = ‘UMI’)

append_UMI – (Default value = True)

output – (Default value = None)

faust.utilities.get_mageck_ibar_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', output=None)

Parameters:

df

sgRNA_col – (Default value = ‘Construct Barcode’)

gene_col – (Default value = ‘Construct IDs’)

UMI_col – (Default value = ‘UMI’)

output – (Default value = None)

faust.utilities.get_replicate_aggregated_statistics(summary_df, aggregation_column=None, inplace=False, pvalue_col_name='MannWhitneyP')

Parameters:

summary_df

aggregation_column – (Default value = None)

inplace – (Default value = False)

pvalue_col_name – (Default value = ‘MannWhitneyP’)

faust.utilities.get_riger_compatible_df(df, sgRNA_col='Construct Barcode', UMI_col='UMI', append_UMI=True, gene_col='Construct IDs', score_col=None, rank_col=None)

Parameters:

df

sgRNA_col – (Default value = ‘Construct Barcode’)

UMI_col – (Default value = ‘UMI’)

append_UMI – (Default value = True)

gene_col – (Default value = ‘Construct IDs’)

score_col – (Default value = None)

rank_col – (Default value = None)

faust.utilities.get_summary_df(df, controls, inputs, outputs, input_type='single', verbose=True, count_threshold=0, estimate_cells=True, gene_col='Target Gene', alternative='two-sided', downsample_control=False, custom_test=None, custom_effect_size=None, progress_logger=None, input_aggregation_function='sum', include_controls_in_summary=False)

Parameters:

df

controls

inputs

outputs

input_type – (Default value = ‘single’)

verbose – (Default value = True)

count_threshold – (Default value = 0)

estimate_cells – (Default value = True)

gene_col – (Default value = ‘Target Gene’)

alternative – (Default value = ‘two-sided’)

downsample_control – (Default value = False)

custom_test – (Default value = None)

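custom_effect_size – (Default value = None)

progress_logger – (Default value = None)

input_aggregation_function – (Default value = ‘sum’)

include_controls_in_summary – (Default value = False)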

faust.utilities.get_zfc_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', ctrl_col=None, exp_col=None, output=None)

Parameters:

df

sgRNA_col – (Default value = ‘Construct Barcode’)

gene_col – (Default value = ‘Construct IDs’)

UMI_col – (Default value = ‘UMI’)

ctrl_col – (Default value = None)

exp_col – (Default value = None)

output – (Default value = None)

faust.utilities.import_check(package, statement_upon_failure, standard_prefix=True)

Parameters:

package

statement_upon_failure

standard_prefix – (Default value = True)

faust.utilities.morisita(counts1, counts2)

Parameters:

counts1

counts2
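The source does not say which variant of the Morisita overlap index this function computes. As a rough illustration only, the sketch below implements the Morisita-Horn overlap between two count vectors of equal length; the function name morisita_horn_sketch is hypothetical and is not part of faust.

```python
def morisita_horn_sketch(counts1, counts2):
    """Morisita-Horn overlap between two equally long count vectors.

    Assumption: this is one common variant of the index; faust's
    morisita() may use a different formula.
    """
    total1, total2 = sum(counts1), sum(counts2)
    # Sum of products of paired counts (shared abundance).
    cross = sum(a * b for a, b in zip(counts1, counts2))
    # Simpson-style dominance terms for each sample.
    d1 = sum(a * a for a in counts1) / total1 ** 2
    d2 = sum(b * b for b in counts2) / total2 ** 2
    return 2 * cross / ((d1 + d2) * total1 * total2)


identical = morisita_horn_sketch([1, 2, 3], [1, 2, 3])
disjoint = morisita_horn_sketch([1, 0], [0, 1])
```

In this variant, samples with the same relative composition score 1 and samples that share no barcodes score 0.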

faust.utilities.nan_fdrcorrection_q(pvalues)

Parameters:

pvalues
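The function name suggests a false discovery rate correction that tolerates missing p-values. The sketch below shows one way this could work, assuming a Benjamini-Hochberg procedure in which NaN entries are skipped and returned as NaN. The name bh_qvalues_ignoring_nan is hypothetical, and the source does not confirm that faust uses this procedure.

```python
import math


def bh_qvalues_ignoring_nan(pvalues):
    """Benjamini-Hochberg q-values; NaN inputs stay NaN and are not counted."""
    # Keep (p, original index) pairs for the non-missing p-values only.
    valid = sorted((p, i) for i, p in enumerate(pvalues) if not math.isnan(p))
    m = len(valid)
    qvalues = [math.nan] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        p, i = valid[rank - 1]
        running_min = min(running_min, p * m / rank)
        qvalues[i] = running_min
    return qvalues


q = bh_qvalues_ignoring_nan([0.01, math.nan, 0.04, 0.03])
```

Because the NaN entry is excluded, the correction here is based on three tests rather than four.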

faust.utilities.predicted_input(n_possible_grna_umi, n_detected_grna_umi)

Parameters:

n_possible_grna_umi

n_detected_grna_umi
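The source gives no formula for this function. Judging from the parameter names alone, one plausible reading is an occupancy estimate: if each input cell carries one of N equally likely gRNA-UMI combinations, the expected number of distinct combinations detected after n cells is N(1 - (1 - 1/N)^n), which can be inverted for n. The sketch below implements that inversion as an assumption; predicted_input_sketch is a hypothetical name and may not match what faust computes.

```python
import math


def predicted_input_sketch(n_possible, n_detected):
    """Invert the expected-occupancy formula to estimate the number of input cells.

    Assumption: uniform, independent sampling of gRNA-UMI combinations.
    """
    if n_detected >= n_possible:
        # Every combination was seen, so the estimate is unbounded.
        return math.inf
    # n = ln(1 - d/N) / ln(1 - 1/N), written with log1p for accuracy.
    return math.log1p(-n_detected / n_possible) / math.log1p(-1 / n_possible)


estimate = predicted_input_sketch(1000, 10)
```

Under this model the estimate is slightly larger than the number of detected combinations, because some cells are expected to share a combination.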

faust.utilities.read_gpp_output(folders, chipfile=None, chipfile_gene_symbol_colname='Gene Symbol', barcode2gene_dict=None, indices=['Construct Barcode', 'Construct IDs', 'UMI', 'Target Gene'], dropcols=[], umi_col='UMI', collapse_umis=False)

Helper function for reading output generated by the Broad Gene Perturbation Platform (GPP). The output takes the form of multiple folders containing counts matrices generated with PoolQ3. The chip file is used to map each barcode sequence to a gene (using the “Barcode Sequence” and “Gene Symbol” columns of the chip file). If no chip file is available, pass a dictionary with this mapping as the optional argument “barcode2gene_dict”. The function then loops through all files in the specified folders (ignoring those with “MATCH” in the filename, which hold counts that could not be matched to a specific UMI), records the UMI of each file, and concatenates them. Filenames should have the form “<something>-XXXXXX.txt”, where “XXXXXX” is the UMI.

Parameters:

folders (str, path to folders containing .txt files generated by the GPP)

chipfile (str, path to chip file (usually with the suffix '.chip')) – (Default value = None)

barcode2gene_dict (dict, mapping between "Barcode Sequence" and "Gene Symbol") – (Default value = None)

indices (list of str, columns in the .txt files to use as the index (relevant when summing the counts from different plates together)) – (Default value = [‘Construct Barcode’, ‘Construct IDs’, ‘UMI’, ‘Target Gene’])

dropcols (list, columns in the .txt files to drop) – (Default value = [])

chipfile_gene_symbol_colname – (Default value = ‘Gene Symbol’)

umi_col – (Default value = ‘UMI’)

collapse_umis – (Default value = False)
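The file-selection step described for read_gpp_output can be sketched as follows. This is a minimal illustration of the stated filename convention (“<something>-XXXXXX.txt”, skipping names that contain “MATCH”), not the library's own code; select_umi_files and UMI_PATTERN are hypothetical names, and the example paths are made up.

```python
import re
from pathlib import PurePath

# Assumed filename shape from the description: "<something>-XXXXXX.txt".
UMI_PATTERN = re.compile(r"-([^-]+)\.txt$")


def select_umi_files(filenames):
    """Map each usable filename to the UMI encoded just before '.txt'."""
    selected = {}
    for name in filenames:
        base = PurePath(name).name
        # Files with "MATCH" in the name hold counts without a specific UMI.
        if "MATCH" in base:
            continue
        match = UMI_PATTERN.search(base)
        if match:
            selected[name] = match.group(1)
    return selected


files = [
    "plate1/counts-ACGTAC.txt",
    "plate1/counts-UNMATCHED.txt",
    "plate1/notes.md",
]
umi_by_file = select_umi_files(files)
```

Only the first path survives the filter; each selected file could then be read, tagged with its UMI, and concatenated with the others.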

faust.utilities.run_alternative_test(df, test=None, exp_col=None, ctrl_col=None, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', output='')

Parameters:

df

test – (Default value = None)

exp_col – (Default value = None)

ctrl_col – (Default value = None)

sgRNA_col – (Default value = ‘Construct Barcode’)

gene_col – (Default value = ‘Construct IDs’)

UMI_col – (Default value = ‘UMI’)

output – (Default value = ‘’)

faust.utilities.simpson(x, with_replacement=False)

Computes the Simpson index directly from counts (or from frequencies, if with_replacement=True).

Parameters:

x

with_replacement – (Default value = False)
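To make the two modes concrete, the sketch below computes the Simpson index as the probability that two draws belong to the same category: without replacement from raw counts, or with replacement from relative frequencies. It assumes this "sum of squared proportions" form rather than its complement; simpson_sketch is a hypothetical name, and the source does not state which convention faust returns.

```python
def simpson_sketch(x, with_replacement=False):
    """Probability that two draws from x fall in the same category."""
    total = sum(x)
    if with_replacement:
        # Frequencies: sum of squared proportions.
        return sum((v / total) ** 2 for v in x)
    # Counts: pairs drawn without replacement.
    return sum(v * (v - 1) for v in x) / (total * (total - 1))


without = simpson_sketch([2, 2])
with_repl = simpson_sketch([2, 2], with_replacement=True)
```

For two categories with two observations each, the without-replacement value is 4/12 and the with-replacement value is 0.5, which shows how the two conventions diverge on small samples.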

Visualization

faust.visualization.plot_top_hits(summary_df, y='CommonLanguageEffectSize', ascending=False, nhits=10, figsize=(9, 5), fig=None, ax=None, sort_criterion='CommonLanguageEffectSize', hue=None, swarmviolin_kwargs={}, selected_genes=None, both_sides=False)

Parameters:

summary_df

y – (Default value = ‘CommonLanguageEffectSize’)

ascending – (Default value = False)

nhits – (Default value = 10)

figsize – (Default value = (9, 5))

fig – (Default value = None)

ax – (Default value = None)

sort_criterion – (Default value = ‘CommonLanguageEffectSize’)

hue – (Default value = None)

swarmviolin_kwargs – (Default value = {})

selected_genes – (Default value = None)

both_sides – (Default value = False)