| Title: | Clinical Metadata Exploration and Feature Matching |
|---|---|
| Description: | A collection of tools for analyzing clinical data, including functions for correlation analysis and statistical testing. The package facilitates the integration of clinical metadata with other omics layers, enabling exploration of quantitative variables. It also includes a utility for frequency matching of samples across a dataset based on patient variables. |
| Authors: | Stefano Cacciatore [aut, trl, cre] |
| Maintainer: | Stefano Cacciatore <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1 |
| Built: | 2026-05-07 07:51:45 UTC |
| Source: | https://github.com/tkcaccia/clinical |
This function appends the results of an analysis to a clinical object.
add_analysis(ma, name, x)
| ma | A clinical object to which results will be attached. |
|---|---|
| name | The name (string) of the analysis. |
| x | The result object (e.g., data frame, matrix) to attach. |
Returns the updated clinical object with the new analysis appended.
Stefano Cacciatore
data(prostate)
ma=initialization(prostate[,"Hospital"])
ma=add_analysis(ma,"Gender",prostate[,"Gender"])
ma=add_analysis(ma,"Gleason score",prostate[,"Gleason score"])
ma=add_analysis(ma,"BMI",prostate[,"BMI"])
ma=add_analysis(ma,"Age",prostate[,"Age"])
ma
Converts a matrix of character or factor entries to a numeric matrix, preserving row and column names.
as.data.matrix(x)
| x | A matrix containing character or factor values that represent numeric entries. |
|---|---|
This function is useful when a matrix has been read as character or factor and needs to be transformed into numeric format for analysis. It preserves both the row and column names of the input matrix.
A numeric matrix of the same dimensions as x, with the same row and column names.
Stefano Cacciatore
# Create a character matrix
ma <- matrix(c("1", "2", "3", "4"), ncol = 2)
# Convert to numeric matrix
ma_numeric <- as.data.matrix(ma)
# Print result
ma_numeric
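The conversion described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `to_numeric_matrix` is a hypothetical helper name, and the example matrix carries row and column names only to show that they survive the conversion.

```r
# Hypothetical helper (not part of the package): convert a character
# matrix to numeric while keeping its dimensions and dimnames.
to_numeric_matrix <- function(x) {
  # as.character() also covers factor input; as.numeric() does the parsing
  out <- matrix(as.numeric(as.character(x)), nrow = nrow(x), ncol = ncol(x))
  dimnames(out) <- dimnames(x)
  out
}

m <- matrix(c("1", "2", "3", "4"), ncol = 2,
            dimnames = list(c("r1", "r2"), c("a", "b")))
num <- to_numeric_matrix(m)
print(num)
```

Entries that cannot be parsed as numbers become NA with a warning, which is standard `as.numeric()` behavior.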
Summarization of the categorical information.
categorical.test(name, x, y, total.column = FALSE, ...)
| name | the name of the feature. |
|---|---|
| x | the information to summarize. |
| y | the classification of the cohort. |
| total.column | option to visualize the total (by default = " |
| ... | further arguments to be passed to the function. |
The function returns a table with the summarized information and the p-value computed using Fisher's exact test.
Stefano Cacciatore
correlation.test, continuous.test, txtsummary
data(prostate)
hosp=prostate[,"Hospital"]
gender=prostate[,"Gender"]
GS=prostate[,"Gleason score"]
BMI=prostate[,"BMI"]
age=prostate[,"Age"]
A=categorical.test("Gender",gender,hosp)
B=categorical.test("Gleason score",GS,hosp)
C=continuous.test("BMI",BMI,hosp,digits=2)
D=continuous.test("Age",age,hosp,digits=1)
rbind(A,B,C,D)
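As a rough illustration of the p-value mentioned above, Fisher's exact test can be run in base R on the contingency table of a categorical variable against the cohort classification. The toy vectors below are invented for illustration and are not taken from the prostate data set; how categorical.test lays out its summary table is not reproduced here.

```r
# Invented toy data: a categorical feature and a two-level cohort label
feature <- factor(c("Low", "Low", "High", "High", "Low", "High", "High", "High"))
cohort  <- factor(c("A", "A", "A", "A", "B", "B", "B", "B"))

# Cross-tabulate the feature against the cohort
tab <- table(feature, cohort)
print(tab)

# Fisher's exact test on the contingency table (base R, stats package)
p_value <- fisher.test(tab)$p.value
print(p_value)
```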
Summarization of the continuous information.
continuous.test(name, x, y, center = c("median", "mean"), digits = 3, scientific = FALSE, range = c("IQR", "95%CI", "range", "sd"), logchange = FALSE, pos = 2, method = c("non-parametric", "parametric"), total.column = FALSE, ...)
| name | the name of the feature. |
|---|---|
| x | the information to summarize. |
| y | the classification of the cohort. |
| center | A character string specifying the measure of central tendency to report: either |
| digits | how many significant digits are to be used. |
| scientific | a logical specifying whether the result should be encoded in scientific format. |
| range | the range to be visualized. |
| logchange | a logical specifying whether the log2 of the fold change should be visualized. |
| pos | a value indicating the position of the range to be visualized. 1 for column, 2 for row. |
| method | a character string indicating which test method is to be computed. "non-parametric" (default), or "parametric". |
| total.column | option to visualize the total (by default = " |
| ... | further arguments to be passed to or from methods. |
The function returns a table with the summarized information and the corresponding p-value. For the non-parametric method, the p-value is computed using the Wilcoxon rank-sum test if there are two groups, and the Kruskal-Wallis test otherwise. For the parametric method, the p-value is computed using Student's t-test if there are two groups, and one-way ANOVA otherwise.
Stefano Cacciatore
correlation.test, categorical.test, txtsummary
data(prostate)
hosp=prostate[,"Hospital"]
gender=prostate[,"Gender"]
GS=prostate[,"Gleason score"]
BMI=prostate[,"BMI"]
age=prostate[,"Age"]
A=categorical.test("Gender",gender,hosp)
B=categorical.test("Gleason score",GS,hosp)
C=continuous.test("BMI",BMI,hosp,digits=2)
D=continuous.test("Age",age,hosp,digits=1)
rbind(A,B,C,D)
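The test-selection rule described for continuous.test can be sketched with base R functions. This is a sketch under stated assumptions, not the package's code: `pick_p_value` is a hypothetical helper, `var.equal = TRUE` is assumed for the Student's t-test and the one-way ANOVA, and the data are invented.

```r
# Hypothetical helper: choose the test from the method and the number of groups
pick_p_value <- function(x, y, method = c("non-parametric", "parametric")) {
  method <- match.arg(method)
  y <- factor(y)
  two_groups <- nlevels(y) == 2
  if (method == "non-parametric") {
    if (two_groups) {
      wilcox.test(x ~ y)$p.value        # Wilcoxon rank-sum test
    } else {
      kruskal.test(x ~ y)$p.value       # Kruskal-Wallis test
    }
  } else {
    if (two_groups) {
      # Student's t-test (equal variances assumed)
      t.test(x ~ y, var.equal = TRUE)$p.value
    } else {
      # Classical one-way ANOVA (equal variances assumed)
      oneway.test(x ~ y, var.equal = TRUE)$p.value
    }
  }
}

# Invented data without ties
x  <- c(1.1, 2.3, 3.2, 4.8, 5.5, 6.1, 7.9, 8.4, 9.6)
y2 <- rep(c("A", "B"), times = c(4, 5))   # two groups
y3 <- rep(c("A", "B", "C"), each = 3)     # three groups

p_np_two   <- pick_p_value(x, y2)                        # Wilcoxon
p_np_three <- pick_p_value(x, y3)                        # Kruskal-Wallis
p_p_two    <- pick_p_value(x, y2, method = "parametric") # t-test
p_p_three  <- pick_p_value(x, y3, method = "parametric") # ANOVA
print(c(p_np_two, p_np_three, p_p_two, p_p_three))
```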
Summarization of the correlation between two numeric vectors.
correlation.test(x, y, method = c("pearson", "spearman", "MINE"), name = NA, perm = 100, ...)
| x | a numeric vector. |
|---|---|
| y | a numeric vector. |
| method | a character string indicating which correlation method is to be computed. "pearson" (default), "spearman", or "MINE". |
| name | the name of the feature. |
| perm | number of permutations used to estimate the p-value with MINE correlation. |
| ... | further arguments to be passed to or from methods. |
The function returns a table with the summarized information.
Stefano Cacciatore
categorical.test, continuous.test, txtsummary
data(prostate)
correlation.test(prostate[,"Age"],prostate[,"BMI"],name="correlation between Age and BMI")
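For orientation, the Pearson and Spearman options correspond to what base R's `cor.test` computes, and a permutation p-value of the kind the `perm` argument suggests can be sketched generically. Everything below is an assumption for illustration: the data are invented, `perm_p_value` is a hypothetical helper, and the absolute Spearman correlation stands in for the MINE statistic, which is not available in base R.

```r
# Invented data without ties
age <- c(61, 54, 70, 66, 58, 73, 49, 64)
bmi <- c(27.1, 24.3, 29.8, 26.0, 25.2, 30.5, 23.9, 28.4)

# Coefficient and p-value from base R
pearson  <- cor.test(age, bmi, method = "pearson")
spearman <- cor.test(age, bmi, method = "spearman")
print(c(pearson$estimate, pearson$p.value))
print(c(spearman$estimate, spearman$p.value))

# Hypothetical helper: permutation p-value for any association statistic
perm_p_value <- function(x, y, stat, perm = 100) {
  observed <- stat(x, y)
  # Shuffle y to break the association and recompute the statistic
  null <- replicate(perm, stat(x, sample(y)))
  (sum(null >= observed) + 1) / (perm + 1)
}

# Stand-in statistic (assumption): absolute Spearman correlation
abs_spearman <- function(a, b) abs(cor(a, b, method = "spearman"))

set.seed(1234)
p_perm <- perm_p_value(age, bmi, abs_spearman, perm = 100)
print(p_perm)
```

With `perm = 100` the smallest p-value this estimator can return is 1/101, so a larger number of permutations is needed to resolve small p-values.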
A method to select unbalanced groups in a cohort.
frequency_matching(data, label, times = 5, seed = 1234)
| data | a data.frame of data. |
|---|---|
| label | a classification of the groups. |
| times | The ratio between the two groups. |
| seed | a single number for random number generation. |
The function returns a list with the following items:
| data | the data after the frequency matching. |
|---|---|
| label | the label after the frequency matching. |
| selection | the rows selected for the frequency matching. |
Stefano Cacciatore
data(prostate)
hosp=prostate[,"Hospital"]
gender=prostate[,"Gender"]
GS=prostate[,"Gleason score"]
BMI=prostate[,"BMI"]
age=prostate[,"Age"]
A=categorical.test("Gender",gender,hosp)
B=categorical.test("Gleason score",GS,hosp)
C=continuous.test("BMI",BMI,hosp,digits=2)
D=continuous.test("Age",age,hosp,digits=1)
# Analysis without matching
rbind(A,B,C,D)
# The order is important. Right is more important than left in the vector
# So, Gleason score will be more important than Age
var=c("Age","BMI","Gleason score")
data.categorized=prostate[,var]
# Extract the Age vector
x <- data.categorized[["Age"]]
# Compute quantiles (0%, 25%, 50%, 75%, 100%) with NA handling
breaks <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
# Apply the cut and update the Age column with labeled bins
data.categorized[["Age"]] <- cut(x, breaks = breaks, include.lowest = TRUE)
# Extract the BMI vector
x <- data.categorized[["BMI"]]
# Compute quantiles (0%, 25%, 50%, 75%, 100%) with NA handling
breaks <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
# Apply the cut and update the BMI column with labeled bins
data.categorized[["BMI"]] <- cut(x, breaks = breaks, include.lowest = TRUE)
times=c(1,1)
names(times)=c("Hospital A","Hospital B")
t=frequency_matching(data.categorized,prostate[,"Hospital"],times=times)
newdata=prostate[t$selection,]
hosp.new=newdata[,"Hospital"]
gender.new=newdata[,"Gender"]
GS.new=newdata[,"Gleason score"]
BMI.new=newdata[,"BMI"]
age.new=newdata[,"Age"]
A=categorical.test("Gender",gender.new,hosp.new)
B=categorical.test("Gleason score",GS.new,hosp.new)
C=continuous.test("BMI",BMI.new,hosp.new,digits=2)
D=continuous.test("Age",age.new,hosp.new,digits=1)
# Analysis with matching
rbind(A,B,C,D)
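The general idea of frequency matching can be sketched in base R as stratified subsampling. The sketch below is a simplified stand-in, not the algorithm used by frequency_matching: `frequency_match_1to1` is a hypothetical helper that handles a single categorical stratum variable, two groups, and a fixed 1:1 ratio, and it ignores the variable priority and the `times` argument described above. The data are invented.

```r
# Hypothetical helper: within each stratum, keep the same number of
# samples from both groups (1:1), chosen at random.
frequency_match_1to1 <- function(strata, label, seed = 1234) {
  set.seed(seed)
  label <- factor(label)
  stopifnot(nlevels(label) == 2)
  selected <- integer(0)
  for (s in unique(strata)) {
    in_a <- which(strata == s & label == levels(label)[1])
    in_b <- which(strata == s & label == levels(label)[2])
    n <- min(length(in_a), length(in_b))
    if (n > 0) {
      # sample.int() avoids the sample() pitfall with length-one vectors
      selected <- c(selected,
                    in_a[sample.int(length(in_a), n)],
                    in_b[sample.int(length(in_b), n)])
    }
  }
  sort(selected)
}

# Invented cohort: Hospital B has more patients and a different age mix
hospital <- c(rep("Hospital A", 6), rep("Hospital B", 10))
age_bin  <- c("young", "young", "old", "old", "old", "young",
              "young", "young", rep("old", 8))

sel <- frequency_match_1to1(age_bin, hospital)

# After matching, both hospitals have the same age-bin counts
print(table(age_bin[sel], hospital[sel]))
```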
Initializes a new object of class "clinical.table", setting up the grouping variable and a flag for whether to include a total column.
initialization(y, total.column = FALSE)
| y | A numeric or factor vector representing the outcome or grouping variable. |
|---|---|
| total.column | Logical; whether to include a total column in the results (default is |
Returns an S4 object of class "clinical.table".
y <- factor(c("A", "B", "A", "B"))
ma <- initialization(y)
Returns the intersection of all vectors provided as arguments.
intersect(x, y, ...)
| x | A vector. |
|---|---|
| y | A vector. |
| ... | Additional vectors to include in the intersection. |
This function computes the intersection of two or more vectors by identifying elements that are present in all of them. Unlike the base R intersect, which only compares two vectors, this version can take multiple vectors as input.
A character vector (or NULL if no intersection exists) containing the elements that are common to all input vectors.
Stefano Cacciatore
x <- c("a", "b", "c")
y <- c("b", "c", "d")
z <- c("c", "b", "e")
intersect(x, y, z) # Returns "b" and "c"
intersect(x, y, c("f", "g")) # Returns NULL
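A multi-vector intersection of this kind can be sketched in base R by folding the two-argument `base::intersect` over a list of vectors. `intersect_all` is a hypothetical name used here to avoid masking the package function, and this is not necessarily how the package implements it. One visible difference: with no common elements, this sketch returns an empty character vector rather than NULL.

```r
# Hypothetical helper: intersection of any number of vectors
intersect_all <- function(...) {
  Reduce(base::intersect, list(...))
}

x <- c("a", "b", "c")
y <- c("b", "c", "d")
z <- c("c", "b", "e")

common <- intersect_all(x, y, z)
print(common)

# No shared elements: the result has length zero
none <- intersect_all(x, y, c("f", "g"))
print(length(none))
```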
Summarization of the continuous information for multiple variables.
multi_analysis(data, y, FUN = c("continuous.test", "correlation.test"), ...)
| data | the matrix containing the continuous values. Each row corresponds to a different sample. Each column corresponds to a different variable. |
|---|---|
| y | the classification of the cohort. |
| FUN | function to be considered. Choices are " |
| ... | further arguments to be passed to or from methods. |
The function returns a table with the summarized information. The p-value is computed using the Wilcoxon rank-sum test if there are two groups, and the Kruskal-Wallis test otherwise.
Stefano Cacciatore
categorical.test, continuous.test, correlation.test, txtsummary
data(prostate)
multi_analysis(prostate[,c("BMI","Age")],prostate[,"Hospital"],FUN="continuous.test")
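The column-by-column pattern behind this kind of analysis can be sketched in base R with `sapply`. The sketch only computes a Wilcoxon rank-sum p-value per variable for a two-group cohort; the data frame and group labels are invented, and the summary table that multi_analysis returns is not reproduced.

```r
# Invented data: rows are samples, columns are variables (no ties)
values <- data.frame(
  BMI = c(24.1, 27.3, 22.8, 30.2, 26.5, 28.9, 23.4, 31.0),
  Age = c(61, 54, 70, 66, 58, 73, 49, 64)
)
group <- factor(rep(c("Hospital A", "Hospital B"), each = 4))

# Apply the same two-group test to every column
p_values <- sapply(values, function(column) {
  first  <- column[group == levels(group)[1]]
  second <- column[group == levels(group)[2]]
  wilcox.test(first, second)$p.value
})
print(p_values)
```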
The data belong to a cohort of 35 patients with prostate cancer from two different hospitals.
data(prostate)
The data.frame "prostate" with the following elements: "Hospital", "Gender", "Gleason score", "BMI", and "Age".
data(prostate)
head(prostate)
Summarization of a numeric vector.
txtsummary(x, f = c("median", "mean"), digits = 0, scientific = FALSE, range = c("IQR", "95%CI", "range", "sd"))
| x | a numeric vector. |
|---|---|
| f | the measure of central tendency to report ("median" or "mean"). |
| digits | how many significant digits are to be used. |
| scientific | a logical specifying whether the result should be encoded in scientific format. |
| range | the range to be visualized. |
The function returns the median and the range (interquartile range or 95% confidence interval) of a numeric vector.
Stefano Cacciatore
categorical.test, continuous.test, correlation.test, txtsummary
data(prostate)
txtsummary(prostate[,"BMI"])
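A median-with-interquartile-range summary of this kind can be sketched in base R with `quantile`. `summarize_median_iqr` is a hypothetical helper, the "median [Q1, Q3]" text layout is an assumption rather than the format txtsummary actually prints, and `digits` is treated here as decimal places via `round()`.

```r
# Hypothetical helper: "median [Q1, Q3]" as a single string
summarize_median_iqr <- function(x, digits = 0) {
  q <- unname(quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
  q <- round(q, digits)  # decimal places, an assumption of this sketch
  sprintf("%s [%s, %s]", q[2], q[1], q[3])
}

bmi <- c(20, 22, 25, 27, 31)  # invented values
summary_text <- summarize_median_iqr(bmi)
print(summary_text)  # "25 [22, 27]"
```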