API

Utilities

faust.utilities.apply_cross_sample_filter_to_grna_umi_counts(counts_matrix, subjects, threshold_factor=1e-05, subject_blacklist=[])
Parameters:
  • counts_matrix

  • subjects

  • threshold_factor – (Default value = 1e-5)

  • subject_blacklist – (Default value = [])

faust.utilities.apply_threshold_to_grna_umi_counts(counts, barcodes=None, fixed_threshold=0, per_sample_read_fraction=1e-06, per_sample_fixed_threshold=None, number_of_cells_recovered=None, per_sample_per_barcode_of_expected_threshold=0.001, faust_probabilistic_model_error_rate=1e-06, per_recovered_cell_read_fraction=0.001, mode='fixed')
Parameters:
  • counts

  • barcodes – (Default value = None)

  • fixed_threshold – (Default value = 0)

  • per_sample_read_fraction – (Default value = 1e-6)

  • per_sample_fixed_threshold – (Default value = None)

  • number_of_cells_recovered – (Default value = None)

  • per_sample_per_barcode_of_expected_threshold – (Default value = 1e-3)

  • faust_probabilistic_model_error_rate – (Default value = 1e-6)

  • per_recovered_cell_read_fraction – (Default value = 1e-3)

  • mode – (Default value = ‘fixed’)

faust.utilities.count_gpp_output(sgRNA_input, barcode_input, prefix, valid_constructs, valid_umis, conditions, output, quality_output=None, approximate_construct_matching=False, min_mean_read_quality_score=0, min_min_read_quality_score=0, read_quality_start=32, read_quality_end=38, verbose=True)
Parameters:
  • sgRNA_input

  • barcode_input

  • prefix

  • valid_constructs

  • valid_umis

  • conditions

  • output

  • quality_output – (Default value = None)

  • approximate_construct_matching – (Default value = False)

  • min_mean_read_quality_score – (Default value = 0)

  • min_min_read_quality_score – (Default value = 0)

  • read_quality_start – (Default value = 32)

  • read_quality_end – (Default value = 38)

  • verbose – (Default value = True)

faust.utilities.estimate_cell_input(df, sample, target_gene, count_threshold, target_gene_col='Target Gene')
Parameters:
  • df

  • sample

  • target_gene

  • count_threshold

  • target_gene_col – (Default value = ‘Target Gene’)

faust.utilities.estimate_read_error_singlets(n_observed_unique_grna_umis, n_observed_zeros, log2_counts_sum)
Parameters:
  • n_observed_unique_grna_umis

  • n_observed_zeros

  • log2_counts_sum

faust.utilities.get_engraftment(counts, mode, inputnumber, recoverednumber, barcodes, fixed_threshold=1, faust_probabilistic_model_error_rate=1e-06)
Parameters:
  • counts (list of gRNA-UMI counts)

  • mode (one of 'nothreshold', 'fixed', 'per_sample', 'per_sample_cell_recovery_adjusted', 'per_sample_per_barcode', 'faust_probabilistic_model', 'combined')

  • inputnumber (number of cells in the input aliquot (pre-engraftment cell count))

  • recoverednumber (number of cells recovered by output, as counted by flow)

  • barcodes (list of barcode (gRNA) names, identical in length to the list of counts)

  • fixed_threshold – (Default value = 1)

  • faust_probabilistic_model_error_rate – (Default value = 1e-6)

faust.utilities.get_mageck_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', append_UMI=True, output=None)
Parameters:
  • df

  • sgRNA_col – (Default value = ‘Construct Barcode’)

  • gene_col – (Default value = ‘Construct IDs’)

  • UMI_col – (Default value = ‘UMI’)

  • append_UMI – (Default value = True)

  • output – (Default value = None)

faust.utilities.get_mageck_ibar_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', output=None)
Parameters:
  • df

  • sgRNA_col – (Default value = ‘Construct Barcode’)

  • gene_col – (Default value = ‘Construct IDs’)

  • UMI_col – (Default value = ‘UMI’)

  • output – (Default value = None)

faust.utilities.get_replicate_aggregated_statistics(summary_df, aggregation_column=None, inplace=False, pvalue_col_name='MannWhitneyP')
Parameters:
  • summary_df

  • aggregation_column – (Default value = None)

  • inplace – (Default value = False)

  • pvalue_col_name – (Default value = ‘MannWhitneyP’)

faust.utilities.get_riger_compatible_df(df, sgRNA_col='Construct Barcode', UMI_col='UMI', append_UMI=True, gene_col='Construct IDs', score_col=None, rank_col=None)
Parameters:
  • df

  • sgRNA_col – (Default value = ‘Construct Barcode’)

  • UMI_col – (Default value = ‘UMI’)

  • append_UMI – (Default value = True)

  • gene_col – (Default value = ‘Construct IDs’)

  • score_col – (Default value = None)

  • rank_col – (Default value = None)

faust.utilities.get_summary_df(df, controls, inputs, outputs, input_type='single', verbose=True, count_threshold=0, estimate_cells=True, gene_col='Target Gene', alternative='two-sided', downsample_control=False, custom_test=None, custom_effect_size=None, progress_logger=None, input_aggregation_function='sum', include_controls_in_summary=False)
Parameters:
  • df

  • controls

  • inputs

  • outputs

  • input_type – (Default value = ‘single’)

  • verbose – (Default value = True)

  • count_threshold – (Default value = 0)

  • estimate_cells – (Default value = True)

  • gene_col – (Default value = ‘Target Gene’)

  • alternative – (Default value = ‘two-sided’)

  • downsample_control – (Default value = False)

  • custom_test – (Default value = None)

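  • custom_effect_size – (Default value = None)

  • progress_logger – (Default value = None)

  • input_aggregation_function – (Default value = ‘sum’)

  • include_controls_in_summary – (Default value = False)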

faust.utilities.get_zfc_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', ctrl_col=None, exp_col=None, output=None)
Parameters:
  • df

  • sgRNA_col – (Default value = ‘Construct Barcode’)

  • gene_col – (Default value = ‘Construct IDs’)

  • UMI_col – (Default value = ‘UMI’)

  • ctrl_col – (Default value = None)

  • exp_col – (Default value = None)

  • output – (Default value = None)

faust.utilities.import_check(package, statement_upon_failure, standard_prefix=True)
Parameters:
  • package

  • statement_upon_failure

  • standard_prefix – (Default value = True)

faust.utilities.morisita(counts1, counts2)
Parameters:
  • counts1

  • counts2

faust.utilities.nan_fdrcorrection_q(pvalues)
Parameters:
  • pvalues

faust.utilities.predicted_input(n_possible_grna_umi, n_detected_grna_umi)
Parameters:
  • n_possible_grna_umi

  • n_detected_grna_umi

faust.utilities.read_gpp_output(folders, chipfile=None, chipfile_gene_symbol_colname='Gene Symbol', barcode2gene_dict=None, indices=['Construct Barcode', 'Construct IDs', 'UMI', 'Target Gene'], dropcols=[], umi_col='UMI', collapse_umis=False)

Helper function for reading output generated by the Broad Gene Perturbation Platform (GPP). The output consists of multiple folders containing counts matrices generated with PoolQ3. The chip file is used to map each barcode sequence to a gene, using its “Barcode Sequence” and “Gene Symbol” columns. If no chip file is available, pass a dictionary with this mapping as the optional argument “barcode2gene_dict” instead. The function then loops through all files in the specified folders, ignoring those with “MATCH” in the filename (these hold counts that could not be matched to a specific UMI), records the UMI of each file, and concatenates the results. Filenames should have the form “<something>-XXXXXX.txt”, where “XXXXXX” is the UMI.

Parameters:
  • folders (str, path to folders containing .txt files generated by the GPP)

  • chipfile (str, path to chip file (usually with the suffix '.chip')) – (Default value = None)

  • chipfile_gene_symbol_colname – (Default value = ‘Gene Symbol’)

  • barcode2gene_dict (dict, mapping between "Barcode Sequence" and "Gene Symbol") – (Default value = None)

  • indices (list of str, columns in the .txt files to use as the index (relevant when summing the counts from different plates together)) – (Default value = [‘Construct Barcode’, ‘Construct IDs’, ‘UMI’, ‘Target Gene’])

  • dropcols (list, list of columns in the .txt files to drop) – (Default value = [])

  • umi_col – (Default value = ‘UMI’)

  • collapse_umis – (Default value = False)
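
The filename convention described for read_gpp_output can be illustrated with a small standalone sketch. It is not faust's implementation: the helper name umi_from_filename, the regular expression, and the example filenames are hypothetical, and the sketch assumes the UMI is whatever follows the last hyphen before ".txt".

```python
import re
from pathlib import Path

# Hypothetical helper, not part of faust: it only illustrates the
# "<something>-XXXXXX.txt" convention described above.
# Assumption: the UMI is the text between the last hyphen and ".txt".
UMI_PATTERN = re.compile(r"-([^-]+)\.txt$")


def umi_from_filename(filename):
    """Return the UMI encoded in the filename, or None if the file is skipped."""
    name = Path(filename).name
    # Files with "MATCH" in the name hold counts that could not be
    # matched to a specific UMI, so they are ignored.
    if "MATCH" in name:
        return None
    match = UMI_PATTERN.search(name)
    return match.group(1) if match else None


# Made-up filenames for illustration only.
files = ["plate1-ACGTAC.txt", "plate1-UNMATCHED.txt", "plate2-TTGACA.txt", "notes.csv"]
umis = {f: umi_from_filename(f) for f in files}
for f, umi in umis.items():
    print(f, "->", umi)
```

Files for which the helper returns None would simply be left out before the remaining counts tables are concatenated.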

faust.utilities.run_alternative_test(df, test=None, exp_col=None, ctrl_col=None, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', output='')
Parameters:
  • df

  • test – (Default value = None)

  • exp_col – (Default value = None)

  • ctrl_col – (Default value = None)

  • sgRNA_col – (Default value = ‘Construct Barcode’)

  • gene_col – (Default value = ‘Construct IDs’)

  • UMI_col – (Default value = ‘UMI’)

  • output – (Default value = ‘’)

faust.utilities.simpson(x, with_replacement=False)

Computes the Simpson index directly from counts (or from frequencies, if with_replacement=True).

Parameters:
  • x

  • with_replacement – (Default value = False)
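
As a rough illustration of the two variants, the sketch below uses the textbook definitions of the Simpson index: the sum of n_i(n_i - 1) divided by N(N - 1) when sampling without replacement, and the sum of squared proportions when sampling with replacement. The function name simpson_sketch is hypothetical, and it is an assumption that faust.utilities.simpson follows exactly these formulas.

```python
def simpson_sketch(x, with_replacement=False):
    """Textbook Simpson index; assumed, not confirmed, to match faust.utilities.simpson."""
    total = sum(x)
    if with_replacement:
        # Probability that two draws with replacement hit the same category.
        # Normalising by the total lets x be counts or frequencies.
        if total <= 0:
            raise ValueError("x must have a positive sum")
        return sum((v / total) ** 2 for v in x)
    # Probability that two draws without replacement hit the same category.
    # This form needs integer counts and at least two observations.
    if total < 2:
        raise ValueError("need at least two observations")
    return sum(v * (v - 1) for v in x) / (total * (total - 1))


counts = [2, 2]
print(simpson_sketch(counts))                          # without replacement
print(simpson_sketch([0.5, 0.5], with_replacement=True))  # from frequencies
```

The without-replacement form is only defined for counts, which is consistent with the note above that frequencies are accepted when with_replacement=True.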

Visualization

faust.visualization.plot_top_hits(summary_df, y='CommonLanguageEffectSize', ascending=False, nhits=10, figsize=(9, 5), fig=None, ax=None, sort_criterion='CommonLanguageEffectSize', hue=None, swarmviolin_kwargs={}, selected_genes=None, both_sides=False)
Parameters:
  • summary_df

  • y – (Default value = ‘CommonLanguageEffectSize’)

  • ascending – (Default value = False)

  • nhits – (Default value = 10)

  • figsize – (Default value = (9, 5))

  • fig – (Default value = None)

  • ax – (Default value = None)

  • sort_criterion – (Default value = ‘CommonLanguageEffectSize’)

  • hue – (Default value = None)

  • swarmviolin_kwargs – (Default value = {})

  • selected_genes – (Default value = None)

  • both_sides – (Default value = False)