Biological Rediscovery Results

Beyond metric-based evaluation, we re-implemented the analytical workflows described in the original clinical studies to determine whether established biomarkers and biological insights could be independently rediscovered from synthetic data. This page presents the biological rediscovery findings organized by cancer cohort, following the narrative of each original study.

ccRCC

Cohort: 311 patients with advanced clear cell renal cell carcinoma from CheckMate 009, 010, and 025 trials, treated with nivolumab (anti-PD-1). Original study by Braun et al. (Nature Medicine, 2020).

Braun et al. performed an integrated genetic, transcriptomic, and immunopathologic analysis of advanced ccRCC tumors, uncovering how somatic alterations—particularly PBRM1 mutations and 9p21.3 deletions—interact with immune infiltration patterns to modulate response to PD-1 blockade.

Differential Gene Expression

In the original study, Braun et al. identified that PBRM1 alterations are associated with increased angiogenesis gene expression. As illustrated in Figure 1, we re-applied this analysis to synthetic datasets: both Avatars (K5/K10) and Gaussian Copula successfully reproduced this clinical signal (Wilcoxon rank-sum test, P < 0.01). In contrast, CTGAN and Synthpop failed to capture the PBRM1–angiogenesis association.

Figure 1: PBRM1 mutation-associated angiogenesis signal in ccRCC. Angiogenesis scores stratified by PBRM1 mutation status are shown. Statistical significance was assessed using a two-sided Wilcoxon rank-sum test.
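
The PBRM1 comparison in Figure 1 reduces to a two-sample rank test on per-tumor angiogenesis scores. Below is a minimal standard-library sketch of the two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with tie and continuity corrections. The page does not say whether exact or asymptotic P values were used, so that choice is an assumption, and the function names are hypothetical; in practice a library routine such as `scipy.stats.mannwhitneyu` would be used.

```python
import math
from typing import Sequence


def rank_with_ties(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float],
                      continuity: bool = True) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test; returns (U statistic for x, P value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    pooled = list(x) + list(y)
    u = sum(rank_with_ties(pooled)[:n1]) - n1 * (n1 + 1) / 2
    # Variance of U under H0, reduced by a correction term for tied values.
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * (n + 1 - tie_term))
    if sigma == 0:
        return u, 1.0  # every value tied: no evidence of a shift
    diff = abs(u - n1 * n2 / 2)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    z = diff / sigma
    return u, math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
```

For example, `wilcoxon_rank_sum(scores_mutant, scores_wildtype)` would return U and a two-sided P value for two hypothetical lists of angiogenesis scores, one per PBRM1 group.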

Gene Set Enrichment Analysis

The original study revealed that 9p21.3 deletion was associated with a distinct pathway signature: increased EMT, mTORC1 signaling, angiogenesis, hypoxia, and glycolysis, alongside decreased oxidative phosphorylation, fatty acid metabolism, and TGF-β signaling. When we re-ran GSEA on the synthetic datasets, Avatars (K5/K10) and Gaussian Copula consistently recapitulated the downregulated pathways, in particular fatty acid metabolism and oxidative phosphorylation. Upregulated pathways were harder to recover: EMT was the only upregulated pathway consistently detected across methods. Gaussian Copula showed the best cross-replicate stability, with one replicate nearly reproducing the full original pathway pattern.

Figure 2: GSEA result of 9p21.3 deletion in ccRCC cohort. Bubble plots show NES values for Hallmark pathways in the original and synthetic datasets. Color indicates statistical significance.
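
The NES values in Figure 2 derive from the GSEA running-sum statistic. As a rough sketch, assuming genes are preranked by a signed statistic for 9p21.3-deleted versus intact tumors, the code below computes the weighted enrichment score (weight exponent p = 1, the classic GSEA weighting) and normalizes it by the mean same-signed score from random gene sets of equal size. Standard GSEA permutes phenotype labels by default; the gene-set permutation used here (as in preranked GSEA) and the function names are simplifying assumptions, not the study's pipeline.

```python
import random
from typing import Sequence


def enrichment_score(ranked_genes: Sequence[str], ranked_stats: Sequence[float],
                     gene_set: set[str], p: float = 1.0) -> float:
    """Weighted running-sum enrichment score (ES).

    ranked_genes must be ordered by ranked_stats, most positive first.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    hit_norm = sum(abs(s) ** p for s, hit in zip(ranked_stats, in_set) if hit)
    if n_hit == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running, es = 0.0, 0.0
    for s, hit in zip(ranked_stats, in_set):
        # Hits step up in proportion to |stat|^p; misses step down uniformly.
        running += abs(s) ** p / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero, with its sign
    return es


def normalized_enrichment_score(ranked_genes: Sequence[str], ranked_stats: Sequence[float],
                                gene_set: set[str], p: float = 1.0,
                                n_perm: int = 1000, seed: int = 0) -> float:
    """Divide ES by the mean |ES| of same-signed scores from random gene sets."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, ranked_stats, gene_set, p)
    size = len(gene_set & set(ranked_genes))
    null = [enrichment_score(ranked_genes, ranked_stats,
                             set(rng.sample(list(ranked_genes), size)), p)
            for _ in range(n_perm)]
    same_sign = [abs(e) for e in null if (e >= 0) == (observed >= 0)]
    if not same_sign:
        return float("nan")
    return observed / (sum(same_sign) / len(same_sign))
```

A positive NES indicates that the pathway's genes concentrate at the top of the ranking (upregulated with the deletion), a negative NES that they concentrate at the bottom.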

Single-sample GSEA

At the individual-sample level, Braun et al. showed that loss-of-function PBRM1 mutations were associated with reduced IL6-JAK-STAT3 signaling (Wilcoxon rank-sum test, P = 0.01). Gaussian Copula robustly reproduced this signal across multiple synthetic replicates. It also recovered the direction and statistical significance of other pathways, including estrogen response, apoptosis, allograft rejection, and UV response, whereas the other SDG methods showed pronounced inter-replicate variability.

Figure 3: Association between ssGSEA signals and PBRM1 mutation status in the ccRCC dataset. Bubble plots compare differential NES between PBRM1-mutant and PBRM1 wild-type patients in the original and synthetic datasets. Bubble color denotes P-value thresholds (Wilcoxon rank-sum test). PBRM1 mutations were associated with reduced IL6-JAK-STAT3 signaling in the original cohort.
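
Unlike GSEA, ssGSEA scores each sample on its own, which is what makes the per-patient PBRM1 comparison in Figure 3 possible. The sketch below follows the formulation used by the GSVA package's ssGSEA method: genes are ranked by expression within one sample, gene-set members are weighted by their rank raised to alpha = 0.25, and the score sums the gap between the weighted hit and unweighted miss cumulative distributions. It omits GSVA's optional score normalization and tie handling, and `differential_score` is a hypothetical stand-in for the differential NES plotted in Figure 3; significance would then be assessed with a rank-sum test on the per-sample scores.

```python
from typing import Mapping, Sequence


def ssgsea_score(expression: Mapping[str, float], gene_set: set[str],
                 alpha: float = 0.25) -> float:
    """Single-sample enrichment score for one sample's gene -> expression map."""
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    # Highest-expressed gene gets rank n, the lowest gets rank 1.
    weight = {g: (n - i) ** alpha for i, g in enumerate(genes)}
    hits = [g for g in genes if g in gene_set]
    n_miss = n - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must partially overlap the sample's genes")
    hit_norm = sum(weight[g] for g in hits)
    p_hit = p_miss = score = 0.0
    for g in genes:
        if g in gene_set:
            p_hit += weight[g] / hit_norm
        else:
            p_miss += 1.0 / n_miss
        score += p_hit - p_miss  # accumulate the gap between the two CDFs
    return score


def differential_score(samples: Sequence[Mapping[str, float]],
                       is_mutant: Sequence[bool], gene_set: set[str]) -> float:
    """Mean ssGSEA score of mutant samples minus that of wild-type samples."""
    mutant = [ssgsea_score(s, gene_set) for s, m in zip(samples, is_mutant) if m]
    wild = [ssgsea_score(s, gene_set) for s, m in zip(samples, is_mutant) if not m]
    return sum(mutant) / len(mutant) - sum(wild) / len(wild)
```

A negative differential score for the IL6-JAK-STAT3 gene set would correspond to the reduced signaling reported for PBRM1-mutant tumors.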

Cell Type Deconvolution

Using CIBERSORTx with the LM22 reference signature, differential analysis between immune-infiltrated and immune-excluded/desert tumors in the original ccRCC cohort identified enrichment of CD8+ T cells, follicular helper T cells, activated CD4+ memory T cells, and M1 macrophages in infiltrated tumors, whereas excluded/desert tumors exhibited higher proportions of M2/M0 macrophages, resting CD4+ memory T cells, resting NK cells, and eosinophils.

Among the SDG methods, only Avatars and Gaussian Copula recovered immune contrasts at a near-significance level (Wilcoxon rank-sum test, FDR Q < 0.25). Importantly, these contrasts were reproducible across synthetic replicates only for cell types that were strongly significant in the original data (FDR Q < 0.05), such as CD8+ T cells and resting CD4+ memory T cells. For weaker or non-significant effects, the synthetic results varied substantially between replicates.

Figure 4: Differential LM22 immune cell proportions in ccRCC. Dot plots compare mean differences in immune cell fractions inferred by CIBERSORTx between immune-infiltrated and immune-excluded/desert tumors across the original and synthetic datasets generated by each SDG method. Colors indicate statistical significance based on a two-sided Wilcoxon rank-sum test with FDR adjustment.
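
The FDR thresholds used above (Q < 0.05 and Q < 0.25) refer to P values adjusted across the LM22 cell types. Assuming the standard Benjamini-Hochberg procedure (the page does not name the adjustment method), the adjustment and the significance tiers used in the text can be sketched as follows. `significance_tier` is a hypothetical helper; in practice `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` or R's `p.adjust(..., method = "BH")` would do the same job.

```python
from typing import Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values (Q values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping Q values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significance_tier(q: float) -> str:
    """Map a Q value onto the thresholds used in this section."""
    if q < 0.05:
        return "significant (Q < 0.05)"
    if q < 0.25:
        return "near-significant (Q < 0.25)"
    return "not significant"
```

One P value per cell type (22 for LM22) would be adjusted together, so a cell type's tier depends on the whole set of tests, not only on its own P value.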

Survival Analysis

Braun et al. demonstrated that patients harboring PBRM1 alterations had improved overall survival (OS) compared with wild-type cases. This genotype–outcome association was faithfully recapitulated by Avatars K10 and Gaussian Copula (log-rank test, P < 0.05), indicating effective preservation of mutation-linked survival dependencies. More broadly, Avatars (K5/K10) and Gaussian Copula preserved significant differences between responders and non-responders for both OS and progression-free survival (PFS) (log-rank test, P < 0.01). CTGAN retained significance for OS only, whereas Synthpop retained it for PFS only, indicating partial recovery of survival signals. TVAE was excluded from this cohort because it generated only a single treatment label across replicates.

Figure 5: PBRM1-associated overall survival in ccRCC. Kaplan-Meier curves compare overall survival between PBRM1-mutant (blue) and wild-type (orange) groups in the original dataset and across synthetic cohorts generated by each SDG method. The survival advantage of PBRM1-mutant patients seen in the original data is reproduced in selected synthetic datasets; log-rank P values and C-indices are reported in each panel.
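
The survival comparisons in this section rest on Kaplan-Meier estimation and the log-rank test, and Figure 5 also reports C-indices. The following is a simplified standard-library sketch of all three, assuming right-censored data where `event` is 1 for an observed death (or progression) and 0 for censoring. The C-index shown is Harrell's estimator with tied event times skipped; treating a binary PBRM1 indicator as the risk score is an assumption about how the panels were computed, and the function names are hypothetical. Libraries such as lifelines or the R survival package would normally be used instead.

```python
import math
from typing import Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n = sum(1 for u, _, _ in data if u >= t)
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the earlier event has the higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    return concordant / comparable if comparable else float("nan")
```

With a binary risk score, many pairs share the same score and contribute 0.5 each, so C-indices for a two-group comparison tend to stay closer to 0.5 than those of a continuous risk model.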

Melanoma

Cohort: 121 patients with metastatic melanoma treated with anti-PD-1 immune checkpoint blockade. Original study by Liu et al. (Nature Medicine, 2019).

Liu et al. performed an integrative molecular analysis, finding that MHC class II-associated gene expression, interferon response pathways, and immune cell composition were key determinants of anti-PD-1 response, and developed parsimonious predictive models integrating clinical, genomic, and transcriptomic features.