MetChem is an R package used to perform structural and functional analysis of metabolites using a simple pipeline.
The R package MetChem (current version 0.2) is part of the
Comprehensive R Archive Network (CRAN)1. The simplest way to
install the package is to enter the following command into your R
session: install.packages("MetChem")
. We suggest installing
the following R packages: pheatmap
and
RColorBrewer
to enable data visualization in heatmaps,
readxl
for the data reading of Excel files, and
impute
for the imputation of missing data.
# To install the pheatmap package
install.packages("pheatmap")
# To install the RColorBrewer package
install.packages("RColorBrewer")
# To install the readxl package
install.packages("readxl")
# To install the impute package
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("impute")
The package can be manually installed from source by opening the package’s page in CRAN and then proceeding as follows:
R CMD INSTALL MetChem.tar.gz
to install the package. Note
that this may require additional software on some platforms. Windows
requires Rtools2 to be installed and to be available in the
default search path (environment variable PATH). MAC OS X requires
installation of Xcode developers and command line tools.The package downloadable from CRAN was built using R version, R.4.2.1. The package should work without major issues on R versions > 3.5.0 and KODAMA package >= 2.3.
The R package MetChem depends by the R package rcdk, which is partially implemented in Java. In Windows environment, we reported an issue related to the uploading of rcdk package. It can be solved running the following code before loading the package MetChem.
replacement <- function(category = "LC_ALL") {
if (identical(category, "LC_MESSAGES"))
return("")
category <- match(category, .LC.categories)
if (is.na(category))
stop("invalid 'category' argument")
.Internal(Sys.getlocale(category))
}
base <- asNamespace("base")
environment(replacement) <- base
unlockBinding("Sys.getlocale", base)
assign("Sys.getlocale", replacement, envir = base)
lockBinding("Sys.getlocale", base)
To load the package, enter the following command in your R session:
library("MetChem")
If this command terminates without any error messages, the package is installed successfully. The MetChem package is now ready for use.
The package includes both a user manual (this document) and a
reference manual (help pages for each function). To view the user
manual, enter vignette("MetChem")
. Help pages can be viewed
using the command help(package="MetChem")
.
Here, we introduce an example for the analysis of metabolic
structural information using MetChem package. For this, we used a data
set of mass spectrometry dataset obtained from murine prostate tissue
samples reported by Labbé and Zadra (2019) (Supplementary Data 2). The
metabolic data are obtained from ventral prostate tissues of mice that
overexpress a human c-MYC transgene (MYC) in the prostate epithelium and
wild-type littermates (WT). Mice were fed either a high fat diet (HFD;
60% kcal from fat; lard—rich in saturated fat) or a control diet (CTD;
10% kcal from fat). The data set includes six replicates for each group
(, WT_CTD, MYC_CTD, WT_HFD, and MYC_HFD). To begin, download the data
from the Labbé and Zadra (2019) study. Download it and save it to your
hard disk. Metabolomic data is extracted using the instructions below.
Data is then imputed using a k-nearest neighbour (kNN) algorithm using
the function impute
as described in the publication.
require("readxl")
require("impute")
d=as.data.frame(read_excel("41467_2019_12298_MOESM5_ESM.xlsx",skip = 3))
d=d[1:414,]
rownames(d)=d[,"Metabolite"]
met=d[,4:27]
label=rep(c("WT_CTD","MYC_CTD","WT_HFD","MYC_HFD"),each=6)
label_MYC=rep(c("WT","MYC","WT","MYC"),each=6)
colnames(met)=paste(label,1:6)
met=data.matrix(met)
met=impute.knn(met,k=5)$data
Heatmap visualization is generated using the function
pheatmap
. Metabolites are hierarchically clustered
according to their relative concentration The hierarchical clustering is
performed using the distance matrix based on the KODAMA dimensions.
KODAMA is a learning algorithm for unsupervised feature extraction
specifically designed for analyzing noisy and high-dimensional data sets
(Cacciatore , 2014), implemented in the R package KODAMA
(Cacciatore , 2017). Additional information can be found in the review
of Zinga , 2023.
require("pheatmap")
require("RColorBrewer")
my_colour1 = list(genotype=c(MYC="#000000ff",WT="#eeeeeeff"),
group=c(MYC_CTD="#373898ff",MYC_HFD="#c11630ff",
WT_CTD="#00a4cfff",WT_HFD="#e40a81ff"))
set.seed(1)
kk1=KODAMA.matrix(t(met))
col=KODAMA.visualization(kk1)
hcol=hclust(dist(col),method="ward.D")
kk2=KODAMA.matrix(scale(met))
row=KODAMA.visualization(kk2)
hrow=hclust(dist(row),method="ward.D")
my_sample_col <- data.frame(group = label,genotype=label_MYC)
row.names(my_sample_col) <- colnames(met)
pheatmap(met,
cluster_cols = hcol,
cluster_rows = hrow,
labels_row = rep("",nrow(met)),
annotation_col = my_sample_col,
annotation_colors = my_colour1,
color = colorRampPalette(rev(brewer.pal(n = 11, name ="RdBu")))(100))
To analyze the chemical similarities among metabolites, we need the
Simplified Molecular-Input Line-Entry System (SMILES) of each metabolite
is obtained. The SMILES of the previous data set is stored in the list
HFD
that can be loaded using the function
data(HFD)
. MetChem
package includes the
modules.detection
function based on KODAMA analysis. This
function repeats the follwing steps (10 times as default): i)
transformation of the chemical structure dissimilarity matrix in a
multidimensional space (with 50 dimensions as defaults) using
multidimensional scaling; ii) KODAMA features extraction; iii)
hierarchical clustering based on the KODAMA output; iv) Calculation of
the silhoutte index from different number of clusters (from 2 to 30 as
default). The average of the siloutte index is calculated for each
cluster numbers to identity the optimal cluster number.
data(HFD)
met=met[rownames(HFD),]
SMILES=HFD$SMILES
names(SMILES)=rownames(HFD)
clu=modules.detection(SMILES)
plot(clu$min_nc:clu$max_nc,clu$silhouette,type="l",
ylab="Rousseeuw's Silhouette index",xlab="Number of clusters")
abline(v=5*(1:6),lty=2)
print(clu$main_cluster)
Based on the average Silhouette indeces, we have the optimal number of cluster for this data set is 8. Below is shown a graphical visualization of the final output of KODAMA. Each cluster is represented by a different color code. Each dot represents a different metabolite. Metabolites that are located near to each other share a similar chemical structure.
plot(clu$visualization,pch=21,bg=rainbow(8,alpha = 0.7)[clu$clusters[,"Clusters 8"]],cex=2)
legend(-30, 20, legend=paste("Cluster", unique(clu$clusters[,"Clusters 8"])),
col= rainbow(8,alpha = 0.7), pch= 16, cex=1)
The following line code can be used to identify the points on the scatter plot. The cluster belonging and chemical name of the selected points will be displayed. ESC key to terminate the command.
data.frame(metabolite=rownames(met),
cluster=clu$clusters[,"Clusters 8"])[identify(clu$visualization),]
A new heatmap is generated where metabolites are clustered according to their chemical similarity.
my_colour2 = list(cluster = c("1"=rainbow(8,alpha = 1)[1],
"2"=rainbow(8,alpha = 1)[2],
"3"=rainbow(8,alpha = 1)[3],
"4"=rainbow(8,alpha = 1)[4],
"5"=rainbow(8,alpha = 1)[5],
"6"=rainbow(8,alpha = 1)[6],
"7"=rainbow(8,alpha = 1)[7],
"8"=rainbow(8,alpha = 1)[8]),
genotype=c(MYC="#000000ff",WT="#eeeeeeff"),
group=c(MYC_CTD="#373898ff",MYC_HFD="#c11630ff",
WT_CTD="#00a4cfff",WT_HFD="#e40a81ff"))
clusters8=clu$clusters[,"Clusters 8"]
my_sample_row <- data.frame(cluster = as.character(clusters8))
row.names(my_sample_row) <- rownames(met)
set.seed(1)
met=met[rownames(HFD),]
kk1=KODAMA.matrix(t(met))
col=KODAMA.visualization(kk1)
hcol=hclust(dist(col),method="ward.D")
hrow=clu$hclust
my_sample_col <- data.frame(group = label,genotype=label_MYC)
row.names(my_sample_col) <- colnames(met)
pheatmap(met,
cluster_cols = hcol,
cluster_rows = hrow,
labels_row = rep("",nrow(met)),
annotation_colors = my_colour2,
annotation_col = my_sample_col,
annotation_row = my_sample_row,
cutree_rows = 8,
color = colorRampPalette(rev(brewer.pal(n = 11, name ="RdBu")))(100))
We next build a heatmap of the metabolite belonging to the cluster 4.
sel=clu$clusters[,"Clusters 8"]==4
met.sel=met[sel,]
my_sample_col <- data.frame(group = label,genotype=label_MYC)
row.names(my_sample_col) <- colnames(met)
oo=order(row.names(my_sample_col))
my_sample_col=my_sample_col[oo,]
met.sel=met.sel[,oo]
hrow.sel=hclust(dist(clu$visualization[sel,]),method="ward.D")
pheatmap(met.sel,fontsize = 7,
cluster_cols = FALSE,
cluster_rows = hrow.sel,
annotation_col = my_sample_col,
annotation_colors = my_colour1,
color = colorRampPalette(rev(brewer.pal(n = 11, name ="RdBu")))(100))
In the next step, we apply the Weighted Metabolite Chemical
Similarity Analysis (WMCSA) on all branches of the hierarchical
clustering performed on the cluster 4. WMCSA is implemented in the
function WMCSA
. This function summarizes the relative
concentration of metabolites within each module. Each module is defined
according to the chemical similarity.
set.seed(1)
my_sample_col <- data.frame(group = label,genotype=label_MYC)
row.names(my_sample_col) <- colnames(met)
cl=allbranches(hrow.sel)
ww=WMCSA(met.sel,cl)
hrow=hclust(dist(ww),method="ward.D")
pheatmap(ww,
cluster_cols = FALSE,
cluster_rows = hrow,
annotation_col = my_sample_col,
annotation_colors = my_colour1,
color = colorRampPalette(rev(brewer.pal(n = 11, name ="RdBu")))(100))
Differential analysis of the relevant modules can be performed using
the function multi_analysis
present in the R package
KODAMA
. In the example below, we perform a differential
analysis between MYC transgenic mice fed with high-fat diet or control
diet named as MYC_HFD and MYC_CTD, respectively.
label=rep(c("MYC","WT"),each=12)
multi_analysis(t(ww),label)
Feature | MYC | WT | p-value | FDR |
---|---|---|---|---|
Mod1, median [IQR] | -2.919 [-3.833 -1.536] | 2.938 [2.195 3.563] | 3.66e-05 | 7.84e-05 |
Mod2, median [IQR] | -2.17 [-2.638 -1.706] | 2.237 [1.98 2.478] | 3.66e-05 | 7.84e-05 |
Mod3, median [IQR] | -1.798 [-2.033 -1.215] | 1.409 [1.015 2.228] | 3.66e-05 | 7.84e-05 |
Mod4, median [IQR] | -2.45 [-3.037 -1.381] | 2.139 [1.776 2.827] | 3.66e-05 | 7.84e-05 |
Mod5, median [IQR] | -1.902 [-2.171 -1.425] | 1.688 [1.563 1.941] | 3.66e-05 | 7.84e-05 |
Mod6, median [IQR] | -1.772 [-3.022 -1.119] | 1.701 [0.742 2.204] | 2.46e-04 | 4.10e-04 |
Mod7, median [IQR] | 0.886 [-1.181 2.231] | -0.329 [-1.224 0.379] | 4.03e-01 | 4.64e-01 |
Mod8, median [IQR] | -1.674 [-2.485 -0.656] | 1.248 [0.609 1.995] | 3.84e-04 | 5.76e-04 |
Mod9, median [IQR] | 1.609 [0.961 2.06] | -1.412 [-1.706 -1.203] | 4.69e-05 | 8.80e-05 |
Mod10, median [IQR] | -0.84 [-1.956 0.087] | 0.8 [-0.148 1.161] | 4.64e-02 | 6.33e-02 |
Mod11, median [IQR] | -0.023 [-0.639 1.15] | -0.583 [-1.229 0.027] | 1.75e-01 | 2.19e-01 |
Mod12, median [IQR] | -1.347 [-1.928 -0.681] | 1.305 [0.711 1.736] | 3.66e-05 | 7.84e-05 |
Mod13, median [IQR] | 0.757 [-0.717 1.532] | -0.004 [-1.003 0.6] | 5.07e-01 | 5.43e-01 |
Mod14, median [IQR] | 1.691 [0.59 1.952] | -1.389 [-1.662 -0.889] | 3.66e-05 | 7.84e-05 |
Mod15, median [IQR] | 0.196 [-1.074 0.842] | 0.388 [-0.853 0.638] | 7.95e-01 | 7.95e-01 |
The function readMet
connects R to the HMDB database 3. to
retrieve chemical and functional information of each metabolite. This
can be summarized using different functions:
substituentsMet
, diseaseMet
,
enzymeMet
, pathwaysMet
,
taxonomyMet
. The function features
associates
the most prominent features to each module.
In this example, we characterized the modules by functional group.
HMDB=HFD$HMDB
names(HMDB)=rownames(HFD)
doc=readMet(HMDB)
cla1=substituentsMet(doc)
substituents=features(doc,cla1,cl)
cla2=enzymesMet(doc)
enzymes=features(doc,cla2,cl)
To visualize the metabolite of , for example, module 12, we can type:
cl[[12]]
Fisher’s test was used to rank the association of each module to the metabolite information. Below are reported the p-value for associations of substituents with the module 12.
substituents[[12]]
Substituents | p-value |
---|---|
Histidine or derivatives | 1.44e-06 |
Imidazolyl carboxylic acid derivative | 1.21e-05 |
Hybrid peptide | 4.97e-05 |
N-acyl-alpha amino acid or derivatives | 1.21e-03 |
Imidazole | 1.31e-03 |
Azole | 1.53e-03 |
Aromatic heteromonocyclic compound | 1.77e-03 |
N-acyl-alpha-amino acid | 2.33e-03 |
Beta amino acid or derivatives | 8.62e-03 |
Organoheterocyclic compound | 1.18e-02 |
Heteroaromatic compound | 1.47e-02 |
Carboximidic acid | 1.91e-02 |
Carboximidic acid derivative | 2.04e-02 |
Azacycle | 2.05e-02 |
Organic 1,3-dipolar compound | 2.32e-02 |
Propargyl-type 1,3-dipolar organic compound | 2.32e-02 |
Alpha-amino acid or derivatives | 3.95e-02 |
Amino acid or derivatives | 4.13e-02 |
Here are reported the p-value for associations of metabolite-related enzymes with the module 12.
enzymes[[12]]
Substituents | p-value |
---|---|
CNDP1 | 2.97e-04 |
CARNS1 | 1.38e-03 |
VIM | 8.62e-03 |
HSPA1A | 8.62e-03 |
SLC15A2 | 8.62e-03 |
MPO | 2.57e-02 |
Ebtesam Abdel-Shafy, Tadele Melak, David A. MacIntyre, Giorgia Zadra, Luiz F. Zerbini, Silvano Piazza, and Stefano Cacciatore Publication in submission
To obtain BibTex entries of the two references, you can enter the
following into your R session to Bibtex
citation("MetChem")
.
Cacciatore S, Luchinat C, Tenori L. Knowledge discovery by accuracy maximization. 2014; 111: 5117-22.
Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA (2017) KODAMA: an R package for knowledge discovery and data mining. 2017; 33(4): 621-623.
Labbé DP, Zadra G, Yang M, Reyes JM, Lin CY, Cacciatore S, Ebot EM, Creech AL, Giunchi F, Fiorentino M, Elfandy H, Syamala S, Karoly ED, Alshalalfa M, Erho N, Ross A, Schaeffer EM, Gibb EA, Takhar M, Den RB, Lehrer J, Karnes RJ, Freedland SJ, Davicioni E, Spratt DE, Ellis L, Jaffe JD, D’Amico AV, Kantoff PW, Bradner JE, Mucci LA, Chavarro JE, Loda M, Brown M. High-fat diet fuels prostate cancer progression by rewiring the metabolome and amplifying the MYC program. 2019; 10: 4358.
Zinga MM, Abdel-Shafy E, Melak T, Vignoli A, Piazza S, Zerbini LF, Tenori L, Cacciatore S. KODAMA exploratory analysis in metabolic phenotyping. 2023; 9: 1436.