- fixed an issue in get_kegg_gsets() where an empty result was returned for some organisms due to an error in parsing (#72)
- added repel = TRUE in term_gene_graph() and combined_results_graph() for better visualization of labels
- fixed minor issue in enrichment_chart() (#75)
- fixed minor issue in visualize_term_interactions()
- fixed issue in get_biogrid_pin() where the download method was set to wget (now set to auto, per #83)
- updated to using tab3 format for get_biogrid_pin() (if tab3 is available for the chosen release, otherwise tab2 format is used)
- updated the default version of PIN obtained by get_biogrid_pin() to '4.4.200'
- in get_kegg_gsets(), improved parsing of KEGG term descriptions so that no description is duplicated (#87)
- in score_terms(), if using descriptions, the ID is now appended for (any) duplicated term descriptions (#87)
- in obtain_colored_url(), swapped bg_color with fg_color due to an issue with KEGGREST
- added legend to term_gene_heatmap() (#95)
- in get_biogrid_pin(), the "download.file.method" from global options is used
- combined_results_graph() raises an error if there are no common terms in the combined data frame
- In run_pathfindR(), the default iterations was set back to 10 (the default for all other v1.x)
- In run_pathfindR(), as "GR" (the default active subnetwork search method) provides nearly identical results in each iteration, the default iterations is set to 1
- added the column 'support' (the proportion of active subnetworks leading to enrichment over all subnetworks) in the output
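
The 'support' proportion described above can be illustrated with a minimal sketch. This is not the package's implementation: the input list `enriched_by_snw` and the helper `term_support()` are hypothetical names, and the term IDs are made-up example values.

```r
# Hypothetical input: enriched term IDs found for each active subnetwork
enriched_by_snw <- list(
  snw1 = c("hsa04110", "hsa04115"),
  snw2 = c("hsa04110"),
  snw3 = character(0),
  snw4 = c("hsa04110", "hsa03030")
)

# Support = (number of subnetworks in which the term is enriched) /
#           (total number of subnetworks)
term_support <- function(enriched_by_snw) {
  n_snw <- length(enriched_by_snw)
  # count each term at most once per subnetwork
  all_terms <- unlist(lapply(enriched_by_snw, unique), use.names = FALSE)
  counts <- table(all_terms)
  support <- as.numeric(counts) / n_snw
  names(support) <- names(counts)
  support
}

support <- term_support(enriched_by_snw)
print(support)
```

A term enriched in 3 of the 4 subnetworks gets a support of 0.75 in this sketch.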
- updated the download URL in get_biogrid_pin() as BioGRID updated the URL for download
- changed old argument in the "Step-by-Step Execution of the pathfindR Enrichment Workflow" vignette
- fixed an issue in visualize_term_interactions() where an overly long file name caused an error on Windows. File names are now limited to 100 characters (#58)
- Fixed issue in check_java_version() where java version 14 could not be parsed (#49)
- Fixed issue in combined_results_graph() where gene nodes were not colored correctly (#55)
- created separate package pathfindR.data for storing pathfindR data
- added the function visualize_active_subnetworks() for visualizing graphs of active subnetworks
- added the new vignette "Comparing Two pathfindR Results" that briefly describes how different pathfindR results can be compared
- added the functions combine_pathfindR_results() and combined_results_graph() for comparison of 2 pathfindR results and term-gene graph of the combined results, respectively
- added the function get_pin_file() for obtaining organism-specific PIN data (only from BioGRID for now)
- added the function get_gene_sets_list() for obtaining organism-specific gene sets list from KEGG, Reactome and MSigDB
- added the function term_gene_heatmap() to create heatmap visualizations of enriched terms and the involved input genes. Rows are enriched terms and columns are involved input genes. If genes_df is provided, colors of the tiles indicate the change values
- added the function UpSet_plot() to create UpSet plots of enriched terms
- added the human cell markers gene sets data cell_markers_gsets and cell_markers_descriptions
- fixed an issue regarding parallel::makeCluster() in run_pathfindR() (#45)
- fixed save-related issue in download_kegg_png() (#37, @rix133)
- added the output data RA_comparison_output of pathfindR results on another RA-related dataset (GSE84074)
- in visualize_hsa_KEGG(), fixed the issue where >1 entrez ids were returned for a gene symbol (the first one is kept)
- in visualize_hsa_KEGG(), implemented a tryCatch to avoid any issues when KEGGREST::color.pathway.by.objects() might fail (#28)
- in visualize_hsa_KEGG(), now limiting the number of genes passed on to KEGGREST::color.pathway.by.objects() to < 60 (because the KEGG API now limits the number?)
- changed default visualization in term_gene_heatmap() (i.e. when genes_df is not provided) to binary colored heatmap (by default, "green" and "red", controlled by low and high) by up-/down-regulation status
- updated the vignette "pathfindR Analysis for non-Homo-sapiens organisms" to reflect new data generation functions get_pin_file() and get_gene_sets_list() and fixed a minor issue in the vignette (#46)
- Fixed corner case in create_kappa_matrix(): when chance is 1, the metric is turned into 0
- Fixed misused class(.) == * in cluster_graph_vis()
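
Both fixes above can be sketched in a few lines of base R. The snippet assumes the usual definition of Cohen's kappa, (observed - chance) / (1 - chance); `kappa_stat()` and the S3 class `my_graph` are hypothetical names used only for illustration, not the package's code.

```r
# Cohen's kappa from observed and chance agreement.
# Assumption: when chance agreement is 1 the ratio would be 0/0, so 0 is
# returned, mirroring the behaviour described in the changelog entry.
kappa_stat <- function(observed, chance) {
  if (isTRUE(all.equal(chance, 1))) {
    return(0)
  }
  (observed - chance) / (1 - chance)
}

k_regular <- kappa_stat(observed = 0.9, chance = 0.5)  # ordinary case
k_corner <- kappa_stat(observed = 1, chance = 1)       # no division by zero

# Comparing class() with == breaks when an object has more than one class,
# because the comparison returns one logical value per class.
x <- structure(list(), class = c("my_graph", "list"))  # hypothetical S3 object
cmp <- class(x) == "my_graph"     # logical vector of length 2, unsafe in if()
ok <- inherits(x, "my_graph")     # single logical value, safe in if()
```

Using inherits() (or is()) keeps the condition a single logical value regardless of how many classes the object carries.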
- Fixed error in DESCRIPTION: the Java version in SystemRequirements was corrected to "Java (>= 8.0)"
- The Java version is now checked
- Fixed behavior: when no input genes are present in the enriched hsa KEGG pathway, visualization of the pathway is now skipped
- Added the argument max_to_plot to visualize_hsa_KEGG() and to run_pathfindR(). This argument controls the number of pathways to be visualized (default is NULL, i.e. no filter). This was implemented not to slow down the runtime of run_pathfindR() as downloading the png files is slow.
- Fixed links to visualizations in enriched_terms.Rmd
- Replaced most occurrences of "pathway" with "term". This was adopted because "term" reflects the utility of the package better. The enrichment and clustering approaches work with any kind of gene set data (be it pathway gene sets, gene ontology gene sets, motif gene sets etc.). Accordingly:
  - DESCRIPTION was updated
  - The functions annotate_pathway_DEGs(), calculate_pw_scores(), cluster_pathways(), fuzzy_pw_clustering(), hierarchical_pw_clustering(), visualize_pw_interactions() and visualize_pws() were renamed to annotate_term_DEGs(), score_terms(), cluster_enriched_terms(), fuzzy_term_clustering(), hierarchical_term_clustering(), visualize_term_interactions() and visualize_terms(), respectively
  - The Rmd template file for the report enriched_pathways.Rmd was renamed to enriched_terms.Rmd
  - All the Rmd template files for the report were updated
  - Documentation of each function was updated accordingly
- Added the visualization function term_gene_graph(), which creates a graph of enriched terms - involved genes
- Made changes in enrichment() and enrichment_analyses() to get enrichment results faster
- Added the function fetch_gene_set() for obtaining gene set data more easily
- Terms in gene sets can now be filtered according to the number of genes a term contains (controlled by min_gset_size, max_gset_size in fetch_gene_set() and run_pathfindR())
- Added the argument gaCrossover during active subnetwork search which controls the probability of a crossover in GA (default = 1, i.e. always perform crossover)
- Added unit tests using testthat
- Updated all gene sets data
- Updated all RA example data
- The vignettes were updated
- Updated all PIN data
- Improved speed of kappa matrix calculation (create_kappa_matrix())
- Added vignette for non-Homo-sapiens organisms
- Added Mus musculus (mmu) data:
  - mmu_kegg_genes & mmu_kegg_descriptions: mmu KEGG gene sets data
  - mmu STRING PIN
  - myeloma_input & myeloma_output: example mmu input and output data
- Added the STRING PIN (combined score >= 400)
- The argument sig_gene_thr in subnetwork filtering via filterActiveSnws() now serves as the threshold proportion of significant genes in the active subnetwork. e.g., if there are 100 significant genes and sig_gene_thr = 0.03, subnetworks that contain at least 3 (100 x 0.03) significant genes will be accepted for further analysis
- Removed pathview dependency by implementing colored pathway diagram visualization function using KEGGREST and KEGGgraph
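
The proportion-based sig_gene_thr filter described above can be sketched as follows. This is an illustration under assumptions, not the code of filterActiveSnws(): the gene names, the `subnetworks` list and the variable names are all hypothetical.

```r
# Hypothetical inputs: significant genes and candidate active subnetworks
sig_genes <- paste0("G", 1:100)
subnetworks <- list(
  snw_a = c("G1", "G2", "G3", "X1"),   # 3 significant genes
  snw_b = c("G4", "X2", "X3"),         # 1 significant gene
  snw_c = c("G5", "G6", "G7", "G8")    # 4 significant genes
)
sig_gene_thr <- 0.03

# Minimum number of significant genes a subnetwork must contain
# (100 significant genes x 0.03 = 3)
min_sig <- sig_gene_thr * length(sig_genes)

# Count significant genes per subnetwork and keep those at or above the minimum
n_sig <- vapply(subnetworks, function(g) sum(g %in% sig_genes), integer(1))
kept <- subnetworks[n_sig >= min_sig]
print(names(kept))
```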
- In hierarchical_term_clustering(), redefined the distance measure as 1 - kappa statistic
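
A minimal sketch of clustering with a 1 - kappa distance, using only base R (stats). The kappa matrix is made-up example data and the choice of average linkage is an assumption; hierarchical_term_clustering() itself may differ in its details.

```r
# Hypothetical kappa matrix for four enriched terms (symmetric, diagonal 1)
kappa_mat <- matrix(
  c(1.0, 0.8, 0.1, 0.0,
    0.8, 1.0, 0.2, 0.1,
    0.1, 0.2, 1.0, 0.7,
    0.0, 0.1, 0.7, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("term", 1:4), paste0("term", 1:4))
)

# Distance: terms with a high kappa (similar gene content) end up close
dist_mat <- as.dist(1 - kappa_mat)

# Agglomerative clustering on the distance (average linkage is an assumption)
hc <- hclust(dist_mat, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

With this distance, identical terms (kappa = 1) are at distance 0 and terms with no agreement beyond chance (kappa = 0) are at distance 1.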
- Fixed minor issue in cluster_graph_vis() (during the calculations for additional node colors)
- Removed title from graph visualization of hierarchical clustering in cluster_graph_vis()
- In active_snw_search(), unnecessary warnings during active subnetwork search were removed
- Fixed minor issue in enrichment_chart(), supplying fuzzy clustered results no longer raises an error
- Added new checks in input_testing() and input_processing() to ensure that both the initial input data frame and the processed input data frame for active subnetwork search contain at least 2 genes (to fix the corner case encountered in issue #17)
- Fixed minor issue in enrichment_chart(), ensuring that bubble sizes displayed in the legend (proportional to # of DEGs) are integers
- In enrichment_chart(), added the arguments num_bubbles (default is 4) to control number of bubbles displayed in the legend and even_breaks (default is TRUE) to indicate if even increments of breaks are required
- Updated the logo
- Minor fix in term_gene_graph() (create the igraph object as an undirected graph for better auto layout)
- Minor fix in visualize_term_interactions(). The legend no longer displays "Non-input Active Snw. Genes" if they were not provided
- The argument human_genes in run_pathfindR() and input_processing() was renamed as convert2alias
- The gene symbols in the input data frame, the PIN and the gene sets are now turned into uppercase (for obtaining the best overlap)
- Added the argument top_terms to enrichment_chart(), controlling the number of top enriched terms to plot (default is 10)
- Other minor bug/error fixes
- Separated the steps of the function run_pathfindR into individual functions: active_snw_search, enrichment_analyses, summarize_enrichment_results, annotate_pathway_DEGs, visualize_pws.
- renamed the function pathmap as visualize_hsa_KEGG, updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the input_processing function assigns a change value of 100 to all).
- Created the new visualization function visualize_pw_interactions, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways.
- Added new vignette, describing the step-by-step execution of the pathfindR workflow
- Changed clustering metric to kappa statistic, created the new clustering related functions create_kappa_matrix, hierarchical_pw_clustering, fuzzy_pw_clustering and cluster_pathways.
- Implemented the new function cluster_graph_vis for visualizing graph diagrams of clustering results.
- Fixed the bug where the arguments score_quan_thr and sig_gene_thr for run_pathfindR were not being utilized.
- in run_pathfindR, added message at the end of run, reporting the number of enriched pathways.
- the function run_pathfindR now creates a variable org_dir that is the "path/to/original/working/directory". org_dir is used in multiple functions to return to the original working directory if anything fails. This changes the previous behavior where if a function stopped with an error the directory was changed to "..", i.e. the parent directory. This change was adopted so that the user is returned to the original working directory if they supply a recursive output folder (output_dir, e.g. "./ALL_RESULTS/RESULT_A").
- in input_processing, added the argument human_genes to only perform alias symbol conversion when human gene symbols are provided.
- Updated the Rmd files used to create the report HTML files
- Added the data for GO-All, all annotations in the GO database (BP+MF+CC)
- Updated the vignette "pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks" to reflect the new functionalities.
- in the function plot_scores, added the argument label_cases to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument case_control_titles which allows the user to change the default "Case" and "Control" headers. Also added the arguments low and high used to change the low and high end colors of the scoring color gradient.
- in the function plot_scores, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values)
- minor change in parseActiveSnwSearch, replaced score_thr by score_quan_thr. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.
- minor change in parseActiveSnwSearch, increased sig_gene_thr from 2 to 10 as we observed that, in most cases, this resulted in faster runs with comparable results.
- in choose_clusters, added the argument p_val_threshold to be used as p value threshold for filtering the enriched pathways prior to clustering.
- fixed issue related to the package pathview.
- in the function choose_clusters, added option to use pathway names instead of pathway ids when visualizing the clustering dendrogram and heatmap.
- Added the option to specify a custom gene set when using run_pathfindR. For this, the gene_sets argument should be set to "Custom" and custom_genes and custom_pathways should be provided.
- fixed minor bug in calculate_pw_scores where if there was one DEG, subsetting the experiment matrix failed
- added if condition to check if there were DEGs in calculate_pw_scores. If there is none, the pathway is skipped.
- in calculate_pw_scores, if cases are provided, the pathways are reordered before plotting the heat map and returning the matrix according to their activity in cases. This way, "up" pathways are grouped together, same for "down" pathways.
- in calculate_pwd, if a pathway has perfect overlap with other pathways, set the correlation value to 1 instead of NA.
- in choose_clusters, if result_df has fewer than 3 pathways, do not perform clustering.
- run_pathfindR checks whether the output directory (output_dir) already exists and if it exists, now appends "(1)" to output_dir and displays a warning message. This was implemented to prevent writing over existing results.
- in run_pathfindR, recursive creation for the output directory (output_dir) is now supported.
- in run_pathfindR, if no pathways are found, the function returns an empty data frame instead of raising an error.
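
The output directory behavior described above (append "(1)" with a warning if the directory exists, create nested folders recursively) can be sketched in base R. The helper `safe_output_dir()` is hypothetical and only illustrates the idea; it is not the code used inside run_pathfindR.

```r
# Hypothetical helper: pick an output directory without overwriting results
safe_output_dir <- function(output_dir) {
  if (dir.exists(output_dir)) {
    new_dir <- paste0(output_dir, "(1)")
    warning("'", output_dir, "' already exists, writing to '", new_dir, "'")
    output_dir <- new_dir
  }
  # recursive = TRUE also creates any missing parent folders
  dir.create(output_dir, recursive = TRUE)
  output_dir
}

# Nested output folder under a fresh temporary location
base <- file.path(tempfile("pathfindR_demo_"), "ALL_RESULTS", "RESULT_A")
first <- safe_output_dir(base)                      # created as requested
second <- suppressWarnings(safe_output_dir(base))   # "(1)" is appended
```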
- Implemented the (per subject) pathway scoring function calculate_pw_scores and the function to plot the heatmap of pathway scores per subject plot_scores.
- Added the auto parameter to choose_clusters. When auto == TRUE (default), the function chooses the optimal number of clusters k automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.
- Added the Fold_Enrichment column to the resulting data frame of enrichment, and as a corollary to the resulting data frame of run_pathfindR.
- Added the option bubble to plot a bubble chart displaying the enrichment results in run_pathfindR using the helper function enrichment_chart. To plot the bubble chart set bubble = TRUE in run_pathfindR or use enrichment_chart(your_result_df).
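
Choosing k by maximizing the average silhouette width, as mentioned for choose_clusters above, can be sketched as follows. The snippet assumes the cluster package is installed (it is a recommended package normally shipped with R) and uses made-up point data with average linkage. The actual function works on enriched pathways and may be implemented differently.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Hypothetical data: two well-separated groups of 20 points each
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)
d <- dist(x)
hc <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters
k_candidates <- 2:6
avg_sil <- vapply(k_candidates, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
}, numeric(1))

# The chosen k is the candidate with the largest average silhouette width
best_k <- k_candidates[which.max(avg_sil)]
print(best_k)
```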
- Added the parameter silent_option to run_pathfindR. When silent_option == TRUE (default), the console outputs during active subnetwork search are printed to a file named "console_out.txt". If silent_option == FALSE, the output is printed on the screen. Default was set to TRUE because multiple console outputs are simultaneously printed when running in parallel.
- Added the list_active_snw_genes parameter to run_pathfindR. When list_active_snw_genes == TRUE, the function adds the column non_DEG_Active_Snw_Genes, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value.
- Added the data RA_clustered, which is the example output of the clustering workflow.
- In the function run_pathfindR, added the option to specify the argument output_dir which specifies the directory to be created under the current working directory for storing the result HTML files. output_dir is "pathfindR_Results" by default.
- run_pathfindR now checks whether the output directory (output_dir) already exists and if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.
- genes_table.html now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.
- Added the gene_sets option in run_pathfindR to choose between different gene sets. Available gene sets are KEGG, Reactome, BioCarta and Gene Ontology gene sets (GO-BP, GO-CC and GO-MF)
- cluster_pathways automatically recognizes the ID type and chooses the gene sets accordingly
- Fixed issue regarding p values < 1e-13. No active subnetworks were found when there were p values < 1e-13. These are now changed to 1e-13 in the function input_processing
- In input_processing, genes for which no interactions are found in the PIN are now removed before active subnetwork search
- Duplicated gene symbols no longer raise an error. If there are duplicated symbols, the lowest p value is chosen for each gene symbol in the function input_processing
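
The input handling described in the entries above (flooring p values at 1e-13, dropping genes absent from the PIN, keeping the lowest p value for duplicated symbols) can be sketched in base R. The column names, gene lists and the order of the steps are assumptions made for illustration, not the actual body of input_processing.

```r
# Hypothetical input: gene symbols with p values
input <- data.frame(
  GENE = c("TP53", "TP53", "BRCA1", "MYC", "FOO1"),
  P_VALUE = c(1e-20, 0.01, 0.03, 1e-15, 0.02),
  stringsAsFactors = FALSE
)
pin_genes <- c("TP53", "BRCA1", "MYC")  # hypothetical genes present in the PIN

# 1. Raise very small p values to the floor of 1e-13
input$P_VALUE <- pmax(input$P_VALUE, 1e-13)

# 2. Drop genes with no interactions in the PIN
input <- input[input$GENE %in% pin_genes, ]

# 3. For duplicated symbols, keep the row with the lowest p value
input <- input[order(input$P_VALUE), ]
input <- input[!duplicated(input$GENE), ]
print(input)
```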
- To prevent the formation of nested folders, by default and on errors, the function run_pathfindR returns to the user's working directory.
- Citation information is now provided for our BioRxiv pre-print