This function takes a query signature (e.g., log fold-change values of gene expression from a perturbation experiment) and calculates iPAS scores for each KEGG pathway. Multiple query signatures may be submitted at once. The input must be a matrix in which each column is a signature and each row is a gene. Row names must be Entrez IDs. Signature names may be supplied as column names.
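As an illustration of the expected input layout, the following sketch builds a small query matrix by hand. The signature names (drug_A, drug_B) and the fold-change values are made up for this example, and five genes is far fewer than a real signature would contain.

```r
# Entrez IDs used as row names (stored as character strings).
entrez_ids <- c("7157", "1956", "672", "5290", "207")

# Hypothetical log fold-change values: one row per gene, one column per signature.
query <- matrix(
  c( 1.2, -0.4,
    -0.8,  0.9,
     0.3,  0.1,
     2.1, -1.5,
    -0.6,  0.7),
  nrow = length(entrez_ids),
  byrow = TRUE,
  dimnames = list(entrez_ids, c("drug_A", "drug_B"))
)

# Subsetting one column with drop = FALSE keeps the matrix structure,
# which matters because the input is described as a matrix.
single <- query[, "drug_A", drop = FALSE]

dim(query)
dim(single)
```

Note that `query[, "drug_A"]` without `drop = FALSE` would return a plain named vector rather than a one-column matrix.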

iPAS_enrich(
  query,
  similarity = c("Pearson", "cosine", "dot_product"),
  gene_type = "entrez",
  category = c("Disease", "Other", "Signaling", "Cancer"),
  perm = 1000,
  testing = FALSE,
  return_individual_cl = TRUE,
  return_null_dist = FALSE,
  overlap_min = 10,
  ncores = 1,
  seed = NULL,
  print_updates = FALSE
)

Arguments

query

a matrix of input (query) signatures on which to perform pathway analysis. Row names should be gene identifiers (Entrez IDs) and column names should be the names of the samples/signatures.

similarity

the type of similarity measure to use for iPAS. Choices are "Pearson" for Pearson correlation, "cosine" for cosine similarity, and "dot_product" for the dot product.
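For reference, the three measures can be written in base R as below. This is a generic sketch on two made-up vectors, not the package's internal code; the helper name cosine_sim is an assumption introduced here.

```r
# Two hypothetical signatures over the same four genes.
x <- c(1.0, -0.5, 2.0, 0.0)
y <- c(0.8, -0.2, 1.5, 0.3)

# Dot product: sum of element-wise products (sensitive to magnitude).
dot_product <- sum(x * y)

# Cosine similarity: dot product of the vectors after scaling to unit length.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}
cosine <- cosine_sim(x, y)

# Pearson correlation: equivalent to cosine similarity of mean-centered vectors.
pearson <- cor(x, y, method = "pearson")
pearson_check <- cosine_sim(x - mean(x), y - mean(y))

c(dot_product = dot_product, cosine = cosine, pearson = pearson)
```

The dot product is unbounded, whereas cosine similarity and Pearson correlation both lie between -1 and 1.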

gene_type

the type of identifier used for the gene names. Currently only "entrez" (Entrez IDs) is supported.

category

the categories of KEGG pathways for which to calculate iPAS scores. We categorize pathways as "Disease", "Other", "Signaling", or "Cancer". By default all categories/pathways are included.

perm

the number of permutations to perform (1000 by default).

testing

T/F value or integer. If TRUE, only calculate iPAS scores for the first 5 pathways. If an integer n, calculate iPAS scores for the first n pathways. Used in testing.
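The rule described above can be sketched as a small helper. The function n_pathways_to_run and its n_total argument are hypothetical and exist only to illustrate the documented behavior; they are not part of the package.

```r
# Hypothetical helper: how many pathways would be scored for a given
# value of `testing`, out of n_total available pathways.
n_pathways_to_run <- function(testing, n_total) {
  if (isTRUE(testing)) {
    min(5L, n_total)                    # TRUE: first 5 pathways
  } else if (is.numeric(testing)) {
    min(as.integer(testing), n_total)   # integer n: first n pathways
  } else {
    n_total                             # FALSE: all pathways
  }
}

n_pathways_to_run(FALSE, 300)
n_pathways_to_run(TRUE, 300)
n_pathways_to_run(20, 300)
```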

return_individual_cl

T/F value, whether to return similarity scores for each individual cell line. TRUE by default.

return_null_dist

T/F value, whether to return the null distribution of iPAS scores for each pathway (permutation scores). TRUE is required for the iPAS_density and iPAS_density_facet functions, which graph the density of the null distribution. FALSE by default.
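To show what a permutation null distribution looks like in general, here is a minimal sketch using simulated data. It assumes a null built by shuffling the query values and a two-sided empirical p-value with a +1 correction; these are common conventions and may differ from how the package builds its null distribution.

```r
set.seed(1)

# Simulated query and pathway signatures over the same 50 genes.
n_genes <- 50
query_sig <- rnorm(n_genes)
pathway_sig <- rnorm(n_genes)
perm <- 1000

# Observed similarity (Pearson correlation in this sketch).
observed <- cor(query_sig, pathway_sig)

# Null distribution: recompute the similarity after shuffling the query values.
null_scores <- replicate(perm, cor(sample(query_sig), pathway_sig))

# Two-sided empirical p-value; the +1 terms keep it strictly above zero.
p_value <- (1 + sum(abs(null_scores) >= abs(observed))) / (1 + perm)

# Density of the null scores, the kind of curve a density plot would show.
null_density <- density(null_scores)
p_value
```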

overlap_min

the minimum number of overlapping genes required between the input signature and a pathway signature to calculate similarity (default is 10).
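The overlap check can be pictured as follows. The gene lists are invented for illustration, and what the function does with a pathway that falls below the threshold (for example, skipping it or returning a missing value) is not stated here, so the sketch only computes the comparison itself.

```r
# Hypothetical gene sets (Entrez IDs as character strings).
query_genes   <- c("7157", "1956", "672", "5290", "207", "3845")
pathway_genes <- c("1956", "5290", "207", "3845", "5594", "5595")
overlap_min <- 10

# Genes present in both the query signature and the pathway signature.
shared <- intersect(query_genes, pathway_genes)

# Similarity would only be calculated when enough genes are shared.
enough_overlap <- length(shared) >= overlap_min

length(shared)
enough_overlap
```

In this toy case only four genes are shared, so the pair would not meet the default minimum of 10.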

ncores

number of cores to use for the calculation (via the parallel package). Default is 1.
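As a rough picture of per-pathway parallelism with the parallel package, the sketch below uses mclapply. Which parallel function the package actually calls is not stated, so this choice is an assumption, and the pathway list and scoring function are placeholders.

```r
library(parallel)

ncores <- 1

# Placeholder "pathways": each element stands in for one pathway's data.
pathways <- list(p1 = 1:5, p2 = 6:10, p3 = 11:15)

# Apply a stand-in scoring function to each pathway, using ncores workers.
# mclapply relies on forking, so mc.cores > 1 is not supported on Windows.
scores <- mclapply(pathways, function(idx) mean(idx), mc.cores = ncores)

unlist(scores)
```

`parallel::detectCores()` reports how many cores are available on the machine, which can help when choosing a value larger than 1.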

seed

a seed (integer value) to use for random number generation, passed to set.seed. This can be used to make the analysis reproducible, since it involves random permutation.
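The effect of a fixed seed on random permutation can be seen in base R. The helper permute_once is a hypothetical name used only for this demonstration.

```r
# Hypothetical helper: set the seed, then draw one random permutation of 1..10.
permute_once <- function(seed) {
  set.seed(seed)
  sample(1:10)
}

a <- permute_once(42)
b <- permute_once(42)

# Same seed, same permutation.
identical(a, b)
```

Because set.seed changes the global random number state, any random draws made later in the same session are affected as well.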

print_updates

T/F value, whether to print updates while calculating, FALSE by default.

Value

a data frame (tibble) with correlation and p-value results for the input signature compared with each PAS signature
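A call might look like the sketch below, which uses only the arguments listed in the usage section. The query matrix is simulated and its row names are placeholder IDs, so it may not overlap real pathway genes well enough to give meaningful results. The call is guarded with exists() so the snippet also runs when the package is not loaded.

```r
set.seed(123)

# Simulated query: 200 genes by 2 signatures, with placeholder IDs as row names.
n_genes <- 200
query <- matrix(
  rnorm(n_genes * 2),
  nrow = n_genes,
  dimnames = list(as.character(seq_len(n_genes)), c("sig_1", "sig_2"))
)

# Only call the function if it is available in the current session.
if (exists("iPAS_enrich")) {
  res <- iPAS_enrich(
    query,
    similarity = "Pearson",
    perm = 100,       # fewer permutations for a quick trial run
    testing = TRUE,   # restrict to the first 5 pathways
    seed = 123
  )
  print(head(res))
}
```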