API
Utilities
- faust.utilities.apply_cross_sample_filter_to_grna_umi_counts(counts_matrix, subjects, threshold_factor=1e-05, subject_blacklist=[])
- Parameters:
counts_matrix
subjects
threshold_factor – (Default value = 1e-5)
subject_blacklist – (Default value = [])
- faust.utilities.apply_threshold_to_grna_umi_counts(counts, barcodes=None, fixed_threshold=0, per_sample_read_fraction=1e-06, per_sample_fixed_threshold=None, number_of_cells_recovered=None, per_sample_per_barcode_of_expected_threshold=0.001, faust_probabilistic_model_error_rate=1e-06, per_recovered_cell_read_fraction=0.001, mode='fixed')
- Parameters:
counts
barcodes – (Default value = None)
fixed_threshold – (Default value = 0)
per_sample_read_fraction – (Default value = 1e-6)
per_sample_fixed_threshold – (Default value = None)
number_of_cells_recovered – (Default value = None)
per_sample_per_barcode_of_expected_threshold – (Default value = 1e-3)
faust_probabilistic_model_error_rate – (Default value = 1e-6)
per_recovered_cell_read_fraction – (Default value = 1e-3)
mode – (Default value = ‘fixed’)
- faust.utilities.count_gpp_output(sgRNA_input, barcode_input, prefix, valid_constructs, valid_umis, conditions, output, quality_output=None, approximate_construct_matching=False, min_mean_read_quality_score=0, min_min_read_quality_score=0, read_quality_start=32, read_quality_end=38, verbose=True)
- Parameters:
sgRNA_input
barcode_input
prefix
valid_constructs
valid_umis
conditions
output
quality_output – (Default value = None)
approximate_construct_matching – (Default value = False)
min_mean_read_quality_score – (Default value = 0)
min_min_read_quality_score – (Default value = 0)
verbose – (Default value = True)
read_quality_start – (Default value = 32)
read_quality_end – (Default value = 38)
- faust.utilities.estimate_cell_input(df, sample, target_gene, count_threshold, target_gene_col='Target Gene')
- Parameters:
df
sample
target_gene
count_threshold
target_gene_col – (Default value = ‘Target Gene’)
- faust.utilities.estimate_read_error_singlets(n_observed_unique_grna_umis, n_observed_zeros, log2_counts_sum)
- Parameters:
n_observed_unique_grna_umis
n_observed_zeros
log2_counts_sum
- faust.utilities.get_engraftment(counts, mode, inputnumber, recoverednumber, barcodes, fixed_threshold=1, faust_probabilistic_model_error_rate=1e-06)
- Parameters:
counts (list of gRNA-UMI counts)
mode (one of 'nothreshold', 'fixed', 'per_sample', 'per_sample_cell_recovery_adjusted', 'per_sample_per_barcode', 'faust_probabilistic_model', 'combined')
inputnumber (number of cells in the input aliquot, i.e. the pre-engraftment cell count)
recoverednumber (number of cells recovered in the output, as counted by flow)
barcodes (list of barcode (gRNA) names, identical in length to the list of counts)
fixed_threshold – (Default value = 1)
faust_probabilistic_model_error_rate – (Default value = 1e-6)
- faust.utilities.get_mageck_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', append_UMI=True, output=None)
- Parameters:
df
sgRNA_col – (Default value = ‘Construct Barcode’)
gene_col – (Default value = ‘Construct IDs’)
UMI_col – (Default value = ‘UMI’)
append_UMI – (Default value = True)
output – (Default value = None)
- faust.utilities.get_mageck_ibar_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', output=None)
- Parameters:
df
sgRNA_col – (Default value = ‘Construct Barcode’)
gene_col – (Default value = ‘Construct IDs’)
UMI_col – (Default value = ‘UMI’)
output – (Default value = None)
- faust.utilities.get_replicate_aggregated_statistics(summary_df, aggregation_column=None, inplace=False, pvalue_col_name='MannWhitneyP')
- Parameters:
summary_df
aggregation_column – (Default value = None)
inplace – (Default value = False)
pvalue_col_name – (Default value = ‘MannWhitneyP’)
- faust.utilities.get_riger_compatible_df(df, sgRNA_col='Construct Barcode', UMI_col='UMI', append_UMI=True, gene_col='Construct IDs', score_col=None, rank_col=None)
- Parameters:
df
sgRNA_col – (Default value = ‘Construct Barcode’)
UMI_col – (Default value = ‘UMI’)
append_UMI – (Default value = True)
gene_col – (Default value = ‘Construct IDs’)
score_col – (Default value = None)
rank_col – (Default value = None)
- faust.utilities.get_summary_df(df, controls, inputs, outputs, input_type='single', verbose=True, count_threshold=0, estimate_cells=True, gene_col='Target Gene', alternative='two-sided', downsample_control=False, custom_test=None, custom_effect_size=None, progress_logger=None, input_aggregation_function='sum', include_controls_in_summary=False)
- Parameters:
df
controls
inputs
outputs
input_type – (Default value = ‘single’)
verbose – (Default value = True)
count_threshold – (Default value = 0)
estimate_cells – (Default value = True)
gene_col – (Default value = ‘Target Gene’)
alternative – (Default value = ‘two-sided’)
downsample_control – (Default value = False)
custom_test – (Default value = None)
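custom_effect_size – (Default value = None)
progress_logger – (Default value = None)
input_aggregation_function – (Default value = ‘sum’)
include_controls_in_summary – (Default value = False)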
- faust.utilities.get_zfc_compatible_df(df, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', ctrl_col=None, exp_col=None, output=None)
- Parameters:
df
sgRNA_col – (Default value = ‘Construct Barcode’)
gene_col – (Default value = ‘Construct IDs’)
UMI_col – (Default value = ‘UMI’)
ctrl_col – (Default value = None)
exp_col – (Default value = None)
output – (Default value = None)
- faust.utilities.import_check(package, statement_upon_failure, standard_prefix=True)
- Parameters:
package
statement_upon_failure
standard_prefix – (Default value = True)
- faust.utilities.morisita(counts1, counts2)
- Parameters:
counts1
counts2
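The entry above gives no formula, so the following is only a sketch of the classical Morisita overlap index between two count vectors, written with NumPy. Whether faust implements this exact variant (rather than, say, the Morisita-Horn form) is an assumption, and the name morisita_overlap_sketch is hypothetical.

```python
import numpy as np


def morisita_overlap_sketch(counts1, counts2):
    """Classical Morisita overlap index for two count vectors (illustrative only)."""
    x = np.asarray(counts1, dtype=float)
    y = np.asarray(counts2, dtype=float)
    total_x, total_y = x.sum(), y.sum()
    # Simpson-type dominance terms, computed without replacement.
    d_x = np.sum(x * (x - 1)) / (total_x * (total_x - 1))
    d_y = np.sum(y * (y - 1)) / (total_y * (total_y - 1))
    # Overlap: 0 when the two samples share no barcodes.
    return float(2 * np.sum(x * y) / ((d_x + d_y) * total_x * total_y))
```

With this formula, two samples that share no barcodes score 0, while samples with similar composition score close to 1.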
- faust.utilities.nan_fdrcorrection_q(pvalues)
- Parameters:
pvalues
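Judging by its name, this function applies an FDR correction while tolerating NaN p-values. As an illustration of that idea only, the sketch below runs a Benjamini-Hochberg adjustment over the non-NaN entries and leaves NaN in place. The helper name bh_qvalues_ignoring_nan is hypothetical, and the actual faust implementation may differ.

```python
import numpy as np


def bh_qvalues_ignoring_nan(pvalues):
    """Benjamini-Hochberg q-values for a 1-D input; NaN entries stay NaN (illustrative only)."""
    p = np.asarray(pvalues, dtype=float)
    q = np.full(p.shape, np.nan)
    valid = ~np.isnan(p)
    pv = p[valid]
    m = pv.size
    if m == 0:
        return q
    order = np.argsort(pv)
    # Scale each sorted p-value by m / rank.
    scaled = pv[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    q[valid] = adjusted
    return q
```

Only the valid p-values count toward the number of tests, so NaN entries do not make the correction more conservative.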
- faust.utilities.predicted_input(n_possible_grna_umi, n_detected_grna_umi)
- Parameters:
n_possible_grna_umi
n_detected_grna_umi
- faust.utilities.read_gpp_output(folders, chipfile=None, chipfile_gene_symbol_colname='Gene Symbol', barcode2gene_dict=None, indices=['Construct Barcode', 'Construct IDs', 'UMI', 'Target Gene'], dropcols=[], umi_col='UMI', collapse_umis=False)
Helper function for reading output generated at the Broad Gene Perturbation Platform (GPP). The output takes the form of multiple folders containing counts matrices generated via PoolQ3. The chip file is used to map each barcode sequence to a gene (using the columns “Barcode Sequence” and “Gene Symbol” within the chip file). If no chip file is available, a dictionary with this mapping can be passed instead as the optional argument “barcode2gene_dict”. The function then loops through all files in the specified folders (ignoring those with “MATCH” in the filename, which indicates counts that could not be matched to a specific UMI), records the UMI of each file, and concatenates them. Filenames should have the form “<something>-XXXXXX.txt”, where “XXXXXX” is the UMI.
- Parameters:
folders (str, path to folders containing .txt files generated by the GPP)
chipfile (str, path to chip file (usually with the suffix '.chip')) – (Default value = None)
barcode2gene_dict (dict, mapping between "Barcode Sequence" and "Gene Symbol") – (Default value = None)
indices (list, columns in the .txt files to use as the index (relevant when summing the counts from different plates together)) – (Default value = [‘Construct Barcode’, ‘Construct IDs’, ‘UMI’, ‘Target Gene’])
dropcols (list, list of columns in txt file to drop) – (Default value = [])
chipfile_gene_symbol_colname – (Default value = ‘Gene Symbol’)
umi_col – (Default value = ‘UMI’)
collapse_umis – (Default value = False)
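The filename convention described for read_gpp_output can be illustrated with a small, standalone sketch. It is not the library's code: the helper name umi_from_filename is hypothetical, and it only shows one way to skip files with “MATCH” in the name and take the UMI from the text after the last hyphen.

```python
from pathlib import Path


def umi_from_filename(path):
    """Return the UMI encoded in '<something>-XXXXXX.txt', or None if the file is skipped."""
    name = Path(path).name
    # Files with "MATCH" in the name hold counts not assigned to a specific UMI.
    if "MATCH" in name:
        return None
    stem = Path(name).stem
    if "-" not in stem:
        return None
    # The UMI is whatever follows the last hyphen in the stem.
    return stem.rsplit("-", 1)[-1]
```

A caller could use a helper like this to label each counts table with its UMI before concatenating the tables.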
- faust.utilities.run_alternative_test(df, test=None, exp_col=None, ctrl_col=None, sgRNA_col='Construct Barcode', gene_col='Construct IDs', UMI_col='UMI', output='')
- Parameters:
df
test – (Default value = None)
exp_col – (Default value = None)
ctrl_col – (Default value = None)
sgRNA_col – (Default value = ‘Construct Barcode’)
gene_col – (Default value = ‘Construct IDs’)
UMI_col – (Default value = ‘UMI’)
output – (Default value = ‘’)
- faust.utilities.simpson(x, with_replacement=False)
Computes the Simpson index directly from counts (or from frequencies, if with_replacement=True).
- Parameters:
x
with_replacement – (Default value = False)
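As a rough illustration of the two modes described for simpson, the sketch below uses the standard textbook formulas: sampling without replacement on raw counts, and the sum of squared proportions otherwise. The function name simpson_sketch is hypothetical, and it is an assumption that faust returns the index in this form rather than, for example, its complement.

```python
import numpy as np


def simpson_sketch(x, with_replacement=False):
    """Simpson index from counts, or from frequencies when with_replacement=True (illustrative only)."""
    x = np.asarray(x, dtype=float)
    total = x.sum()
    if with_replacement:
        # Probability that two draws with replacement hit the same category.
        p = x / total
        return float(np.sum(p ** 2))
    # Probability that two draws without replacement hit the same category.
    return float(np.sum(x * (x - 1)) / (total * (total - 1)))
```

The without-replacement form needs integer counts with a total of at least 2, which is why frequencies are only meaningful with with_replacement=True.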
Visualization
- faust.visualization.plot_top_hits(summary_df, y='CommonLanguageEffectSize', ascending=False, nhits=10, figsize=(9, 5), fig=None, ax=None, sort_criterion='CommonLanguageEffectSize', hue=None, swarmviolin_kwargs={}, selected_genes=None, both_sides=False)
- Parameters:
summary_df
y – (Default value = ‘CommonLanguageEffectSize’)
ascending – (Default value = False)
nhits – (Default value = 10)
figsize – (Default value = (9, 5))
fig – (Default value = None)
ax – (Default value = None)
sort_criterion – (Default value = ‘CommonLanguageEffectSize’)
hue – (Default value = None)
swarmviolin_kwargs – (Default value = {})
selected_genes – (Default value = None)
both_sides – (Default value = False)