Module explosig_connect.connection

Classes

class Connection

Represents a connection to an ExploSig session, with functions for transforming and sending data.

Subclasses

ConfigConnection
EmptyConnection

Methods

def send_sample_metadata(self, df)

Send a dataframe containing sample metadata values to ExploSig.

Parameters

df : pandas.DataFrame
Dataframe with index of sample IDs. Columns are metadata variables. The following are recognized column names: {Study, Donor}.
def send_mutation_type_counts(self, df)

Send a dataframe containing mutation count values by mutation type to ExploSig.

Parameters

df : pandas.DataFrame
Dataframe with index of sample IDs. Columns are mutation types (SBS, DBS, INDEL).
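As a rough illustration of the expected layout, the sketch below builds a small counts table with pandas. The sample IDs and count values are invented, and `conn` stands for an already-created connection object that is not constructed here.

```python
import pandas as pd

# Hypothetical sample IDs and counts, for illustration only.
counts_df = pd.DataFrame(
    {"SBS": [120, 45], "DBS": [3, 0], "INDEL": [10, 7]},
    index=["SA001", "SA002"],  # the index holds the sample IDs
)

# With an existing connection object (assumed here to be named `conn`):
# conn.send_mutation_type_counts(counts_df)
print(counts_df)
```

Each row is one sample and each column is one mutation type, which matches the orientation described for `df` above.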
def send_signatures(self, mut_type, df, prob_max=None)

Send a dataframe containing signatures to ExploSig.

Parameters

mut_type : str
The mutation type corresponding to this set of signatures (SBS, DBS, INDEL).
df : pandas.DataFrame
Dataframe with index of signature names. Columns are mutation categories (A[C>A]A, etc.).
prob_max : None or 'auto', optional
How to compute the maximum y-value of signature plots. If None, the maximum is 0.2. If 'auto', it is set to the maximum value in the matrix. Defaults to None.
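The `prob_max` rule can be sketched in plain Python as follows. This is only an illustration of the documented behavior, not the library's code: `resolve_prob_max` is a hypothetical helper, nested lists stand in for the signatures dataframe, and raising `ValueError` for other values is an assumption.

```python
def resolve_prob_max(matrix, prob_max=None):
    """Return the y-axis maximum for signature plots (illustrative sketch).

    `matrix` is a list of rows, one row of probabilities per signature.
    """
    if prob_max is None:
        return 0.2  # documented fallback when prob_max is None
    if prob_max == "auto":
        # Largest value found anywhere in the matrix.
        return max(max(row) for row in matrix)
    # Assumption: how the library treats other values is not documented.
    raise ValueError("prob_max must be None or 'auto'")


# Hypothetical signature probabilities over three mutation categories.
signatures = [
    [0.05, 0.30, 0.65],
    [0.50, 0.25, 0.25],
]
print(resolve_prob_max(signatures))
print(resolve_prob_max(signatures, "auto"))
```

With 'auto', a signature that has one dominant category stretches the y-axis to fit it, whereas the fixed 0.2 default keeps plots comparable across signature sets.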
def send_exposures(self, mut_type, df, send_sigs=False)

Send a dataframe containing exposures to ExploSig.

Parameters

mut_type : str
The mutation type corresponding to this set of exposures (SBS, DBS, INDEL).
df : pandas.DataFrame
Dataframe with index of sample IDs. Columns are signature names.
send_sigs : bool, optional
Whether to also send the signature names along with the exposures. Useful if you do not intend to call send_signatures(). Defaults to False.
def send_clinical_data(self, df, types={}, scales={})

Send a dataframe containing clinical data.

Parameters

df : pandas.DataFrame
Dataframe with index of sample IDs. Columns are clinical variables.
types : dict, optional
A dict mapping column names to data types ('continuous' or 'categorical'). Defaults to {}. If a column name is not found in the dict, numeric columns are assumed to be continuous and string columns are assumed to be categorical.
scales : dict, optional
A dict mapping column names to scale domains. Defaults to {}. If a column name is not found in the dict, the scale of a categorical column is assumed to be the list of its unique elements, and the scale of a continuous column is assumed to be [min, max].
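The fallback rules for `types` and `scales` can be sketched as below. This is a minimal illustration under stated assumptions, not the library's implementation: `infer_type`, `infer_scale`, and `resolve` are hypothetical helpers, a plain dict of lists stands in for the dataframe columns, and returning unique elements in first-seen order is an assumption (the documentation does not specify an order).

```python
from numbers import Number


def infer_type(values):
    """Treat all-numeric columns as continuous, anything else as categorical."""
    if all(isinstance(v, Number) for v in values):
        return "continuous"
    return "categorical"


def infer_scale(values, var_type):
    """Continuous: [min, max]. Categorical: list of unique elements."""
    if var_type == "continuous":
        return [min(values), max(values)]
    # Unique elements in first-seen order (the ordering is an assumption).
    return list(dict.fromkeys(values))


def resolve(columns, types=None, scales=None):
    """Fill in types and scales for columns the caller did not specify."""
    types = dict(types or {})
    scales = dict(scales or {})
    for name, values in columns.items():
        types.setdefault(name, infer_type(values))
        scales.setdefault(name, infer_scale(values, types[name]))
    return types, scales


# Hypothetical clinical variables for three samples.
columns = {"Age": [61, 47, 55], "Sex": ["F", "M", "F"], "Stage": [1, 2, 2]}
types, scales = resolve(columns, types={"Stage": "categorical"})
print(types)
print(scales)
```

Here "Stage" is numeric but is passed explicitly as categorical, which is the kind of case the `types` argument exists for: without the override it would be inferred as continuous.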
def send_gene_mutation_data(self, df)

Send a dataframe containing gene mutation data.

Parameters

df : pandas.DataFrame
Dataframe with index of sample IDs. Columns are gene IDs.
def send_gene_expression_data(self, df)

Send a dataframe containing gene expression data.

Parameters

df : pandas.DataFrame
Dataframe with index of sample IDs. Columns are gene IDs.
def send_copy_number_data(self, df)

Send a dataframe containing copy number data.

Parameters

df : pandas.DataFrame
Dataframe with index of sample IDs. Columns are gene IDs.
class ConfigConnection (session_id, token, server_hostname, client_hostname)

Represents a connection to a previously-configured ExploSig session.

Ancestors

Connection

Methods

def get_config(self)

Get the current data configuration as a dict.

Returns

dict
A dictionary containing the selected samples, signatures, clinical variables, and genes.

def get_mutation_type_counts(self)

Get the counts by mutation type dataframe associated with the current config.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and mutation types (SBS, DBS, INDEL) on the columns. Values are counts.

def get_mutation_category_counts(self, mut_type)

Get the counts by mutation category dataframe (for a particular mutation type) associated with the current config.

Parameters

mut_type : str
One of {'SBS', 'DBS', 'INDEL'}.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and mutation categories on the columns. Values are counts.

def get_clinical_data(self)

Get the clinical data dataframe associated with the current config.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and clinical variables on the columns.

def get_gene_mutation_data(self)

Get a dataframe containing mutation classes associated with the current config.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and genes on the columns. Values are mutation classes.

def get_gene_expression_data(self)

Get a dataframe containing gene expression values associated with the current config.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and genes on the columns. Values are gene expression classes.

def get_copy_number_data(self)

Get a dataframe containing copy number values associated with the current config.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and genes on the columns. Values are copy number classes.

def get_exposures(self, mut_type, tricounts_method=None)

Get the sample by signature exposures dataframe (for a particular mutation type) associated with the current config.

Parameters

mut_type : str
One of {'SBS', 'DBS', 'INDEL'}.
tricounts_method : str, optional
One of {'By Study', 'None'}. Whether or not to normalize trinucleotides by frequency (based on sequencing strategy of each selected cohort). By default, 'None'.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and signature names on the columns. Values are counts (exposures).

Inherited members

class EmptyConnection (session_id, token, server_hostname, client_hostname)

Represents a connection to an "empty" ExploSig session.

Ancestors

Connection

Methods

def open(self, how='auto')

Attempts to open the session URL in a browser. Calls webbrowser.open if how == 'browser'. Outputs JavaScript if how == 'nb_js'. Outputs HTML if how == 'nb_link'. Otherwise, simply prints the URL.

Parameters

how : str, optional
One of {'auto', 'nb_js', 'nb_link', 'browser'}, by default 'auto'
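The dispatch on `how` described above can be sketched as a side-effect-free function that reports which action would be taken. This is an illustration only: `plan_open` is a hypothetical helper, the URL is a placeholder, and the JavaScript and HTML strings are assumed stand-ins for whatever the library actually emits. The sketch follows the text literally, so any other value, including 'auto', falls through to printing the URL; the library may handle 'auto' differently.

```python
import html
import json


def plan_open(url, how="auto"):
    """Return an (action, payload) pair describing what open() would do."""
    if how == "browser":
        # The documented behavior is a call to webbrowser.open(url).
        return ("webbrowser.open", url)
    if how == "nb_js":
        # Assumed JavaScript snippet; json.dumps quotes the URL safely.
        return ("javascript", "window.open(" + json.dumps(url) + ");")
    if how == "nb_link":
        # Assumed HTML link; html.escape guards the attribute and the text.
        safe = html.escape(url, quote=True)
        return ("html", '<a href="' + safe + '" target="_blank">' + safe + "</a>")
    # Otherwise, the URL is simply printed.
    return ("print", url)


session_url = "https://example.org/session"  # placeholder URL
for mode in ("browser", "nb_js", "nb_link", "auto"):
    print(mode, plan_open(session_url, mode))
```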
def get_mutation_type_counts(self, projects)

Get the counts by mutation type dataframe (for a particular set of sequencing projects).

Parameters

projects : list of str
A list of sample cohort IDs.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and mutation types (SBS, DBS, INDEL) on the columns. Values are counts.

def get_mutation_category_counts(self, mut_type, projects)

Get a mutation count dataframe (for a particular mutation type and set of sequencing projects).

Parameters

mut_type : str
One of {'SBS', 'DBS', 'INDEL'}.
projects : list of str
A list of sample cohort IDs.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and mutation categories on the columns. Values are counts.

def get_clinical_data(self, projects)

Get a clinical data dataframe (for a particular set of sequencing projects).

Parameters

projects : list of str
A list of sample cohort IDs.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and clinical variables on the columns.

def get_gene_mutation_data(self, genes, projects)

Get a dataframe containing mutation classes (for a particular set of genes and set of sequencing projects).

Parameters

genes : list of str
A list of gene IDs.
projects : list of str
A list of sample cohort IDs.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and genes on the columns. Values are mutation classes.

def get_gene_expression_data(self, genes, projects)

Get a dataframe containing gene expression values (for a particular set of genes and set of sequencing projects).

Parameters

genes : list of str
A list of gene IDs.
projects : list of str
A list of sample cohort IDs.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and genes on the columns. Values are gene expression classes.

def get_copy_number_data(self, genes, projects)

Get a dataframe containing copy number values (for a particular set of genes and set of sequencing projects).

Parameters

genes : list of str
A list of gene IDs.
projects : list of str
A list of sample cohort IDs.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and genes on the columns. Values are copy number classes.

def get_exposures(self, projects, signatures, mut_type, tricounts_method=None)

Get the sample by signature exposures dataframe (for a particular mutation type, set of signatures, and set of sequencing projects).

Parameters

projects : list of str
A list of sample cohort IDs.
signatures : list of str
A list of signature names.
mut_type : str
One of {'SBS', 'DBS', 'INDEL'}.
tricounts_method : str, optional
One of {'By Study', 'None'}. Whether or not to normalize trinucleotides by frequency (based on sequencing strategy of each selected cohort). By default, 'None'.

Returns

pandas.DataFrame
A dataframe with sample IDs on the index and signature names on the columns. Values are counts (exposures).

Inherited members