The ovarian cancer dataset is based on gene expression data of oncological patients and is constituted by 517 observations over 12042 covariates. This data was obtained from The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/).
The dataset is publicly available (https://gdc-portal.nci.nih.gov/) and it was normalized and aggregated by the TCGA consortium allowing for the analysis to be reproducible with the original dataset. We only cleaned corrupted case records from the data, where correlated attributes had invalid information.
The clinical data was cleaned using “Days to last follow-up” and “Days to death” attributes to detect inconsistencies between them. Only the cases where the number of days matched were included in the analysis. The same process was performed for the attributes “Days to death” and “Vital status”, where some cases had as status “deceased”, but a missing “Days to death”.
This dataset was analysed in three different ways.
Lets read each dataset. Do not forget to change the filepath for the correct directory of your computer.
The covariate matrix has to be defined. The data is stored in a txt file.
data.survival18<- read.table("./tcga18.txt",dec = ",", header = TRUE)
head(data.survival18)
## LPL IGF1 EDNRA MFAP5 LOX INHBA THBS2 ADIPOQ
## 1 5.610701 7.511564 5.251405 4.348473 3.753541 4.789777 8.197449 4.852636
## 2 3.927762 6.238349 5.916283 6.667097 4.025553 4.925367 8.380464 3.204983
## 3 6.667391 4.218046 4.300340 3.395904 3.872636 3.835449 5.995941 3.164932
## 4 3.737681 6.878165 3.823494 2.997130 3.349100 3.585837 4.163529 3.209809
## 5 4.649516 7.648314 6.883327 7.845051 4.645337 5.202604 9.693533 3.193201
## 6 3.693657 6.110123 4.855396 6.772189 4.604459 6.536206 10.045702 3.260278
## NPY CCL11 VCAN DCN TIMP3 CRYAB CXCL12
## 1 3.468236 4.035410 6.582996 7.641697 6.138341 5.342155 6.256299
## 2 3.401826 4.431276 7.088650 10.527365 6.665464 7.184504 7.493833
## 3 3.538999 4.054605 3.874755 7.404490 4.243858 4.089568 4.210808
## 4 3.499236 3.793138 3.569516 3.744800 3.656709 6.956911 6.939087
## 5 3.251680 5.470451 8.267006 9.763219 6.290778 10.038258 5.326372
## 6 3.445472 5.797378 8.239962 9.495431 7.657357 6.589208 6.701060
## SPARC CNN1 FBN1
## 1 9.882551 5.000291 5.514547
## 2 10.482700 6.432753 7.220278
## 3 7.042135 4.435957 4.330573
## 4 7.010820 4.459158 3.850500
## 5 11.077949 9.074373 7.972447
## 6 10.894212 6.020242 6.567972
And the matrix for the time and status is given by:
data.time.status<-read.table("./timestatus.txt",dec = ",", header = TRUE)
head(data.time.status)
## time status
## 1 1336 1
## 2 1247 1
## 3 55 1
## 4 1495 0
## 5 61 1
## 6 1418 0
The Cox’s proportional hazard was performed in all variables.
Let’s start the analysis for the Cox regression model.
library(survival)
Fit the Cox proportional hazards to the dataset.
fit.cox.all18<-coxph(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival18)
fit.cox.all18
## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~
## ., data = data.survival18)
##
## coef exp(coef) se(coef) z p
## LPL 0.1263 1.1346 0.0751 1.68 0.0924
## IGF1 0.0210 1.0212 0.0600 0.35 0.7266
## EDNRA 0.0224 1.0227 0.1227 0.18 0.8549
## MFAP5 0.0165 1.0166 0.0482 0.34 0.7327
## LOX 0.1918 1.2114 0.1251 1.53 0.1254
## INHBA -0.1432 0.8666 0.1786 -0.80 0.4227
## THBS2 0.0639 1.0660 0.0902 0.71 0.4787
## ADIPOQ -0.1256 0.8819 0.0910 -1.38 0.1676
## NPY 0.0552 1.0567 0.0496 1.11 0.2655
## CCL11 -0.1296 0.8785 0.0960 -1.35 0.1771
## VCAN 0.0578 1.0595 0.1009 0.57 0.5664
## DCN 0.0729 1.0757 0.0892 0.82 0.4133
## TIMP3 0.0719 1.0745 0.0835 0.86 0.3891
## CRYAB 0.1092 1.1154 0.0424 2.58 0.0100
## CXCL12 0.0204 1.0206 0.0818 0.25 0.8030
## SPARC -0.3811 0.6831 0.1402 -2.72 0.0066
## CNN1 0.0863 1.0901 0.1141 0.76 0.4493
## FBN1 0.1135 1.1202 0.1690 0.67 0.5018
##
## Likelihood ratio test=26.8 on 18 df, p=0.0837
## n= 517, number of events= 284
Next, see if the hypothesis of proportional hazards is not violated. By the results obtained, the proportional hazards hypothesis is not violated.
pph18<-cox.zph(fit.cox.all18, transform="km", global=TRUE)
pph18
## rho chisq p
## LPL 0.0584 1.09e+00 0.2970
## IGF1 -0.0378 4.75e-01 0.4909
## EDNRA -0.0256 3.00e-01 0.5839
## MFAP5 0.0674 1.52e+00 0.2184
## LOX 0.0179 1.09e-01 0.7418
## INHBA 0.0118 4.31e-02 0.8355
## THBS2 -0.0379 4.56e-01 0.4997
## ADIPOQ -0.1246 4.87e+00 0.0273
## NPY -0.0310 3.28e-01 0.5669
## CCL11 0.0665 1.36e+00 0.2440
## VCAN 0.0240 1.81e-01 0.6705
## DCN -0.0100 3.12e-02 0.8598
## TIMP3 -0.0202 1.27e-01 0.7216
## CRYAB -0.0544 9.81e-01 0.3219
## CXCL12 -0.0014 6.83e-04 0.9791
## SPARC 0.0577 9.08e-01 0.3406
## CNN1 -0.0850 2.82e+00 0.0928
## FBN1 -0.0218 1.91e-01 0.6618
## GLOBAL NA 1.92e+01 0.3795
In order to see if the results obtained before are consistent, a robust version of the Cox regression model is going to be performed. First call the following package.
library(coxrobust)
To see the results obtained by fitting the robust Cox using exponential weights with a 5% level of trimming, do the following:
fit.coxrobust18<-coxr(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival18,trunc=0.95, f.weight="exponential",singular.ok=T,model=F)
fit.coxrobust18
##
## Call:
## coxr(formula = Surv(data.time.status$time, data.time.status$status) ~ ., data = data.survival18, trunc = 0.95, f.weight = "exponential", singular.ok = T, model = F)
##
## Partial likelihood estimator
## coef exp(coef) se(coef) p
## LPL 0.1263 1.135 0.0751 0.09257
## IGF1 0.0207 1.021 0.0601 0.73040
## EDNRA 0.0223 1.023 0.1227 0.85570
## MFAP5 0.0166 1.017 0.0482 0.73068
## LOX 0.1922 1.212 0.1251 0.12450
## INHBA -0.1431 0.867 0.1786 0.42301
## THBS2 0.0640 1.066 0.0902 0.47786
## ADIPOQ -0.1254 0.882 0.0910 0.16841
## NPY 0.0552 1.057 0.0496 0.26528
## CCL11 -0.1296 0.878 0.0960 0.17703
## VCAN 0.0576 1.059 0.1009 0.56776
## DCN 0.0728 1.076 0.0892 0.41396
## TIMP3 0.0719 1.074 0.0835 0.38923
## CRYAB 0.1092 1.115 0.0424 0.00999
## CXCL12 0.0204 1.021 0.0818 0.80330
## SPARC -0.3818 0.683 0.1403 0.00649
## CNN1 0.0866 1.091 0.1141 0.44763
## FBN1 0.1141 1.121 0.1690 0.49972
##
## Wald test=27 on 18 df, p=0.079
##
## Robust estimator
## coef exp(coef) se(coef) p
## LPL 0.10107 1.106 0.0856 0.2375
## IGF1 0.03407 1.035 0.0705 0.6289
## EDNRA 0.06185 1.064 0.2119 0.7704
## MFAP5 0.00888 1.009 0.0622 0.8865
## LOX 0.16876 1.184 0.1499 0.2604
## INHBA -0.15555 0.856 0.1895 0.4118
## THBS2 0.08631 1.090 0.1072 0.4205
## ADIPOQ -0.07269 0.930 0.1047 0.4875
## NPY 0.06251 1.065 0.0710 0.3785
## CCL11 -0.15784 0.854 0.1212 0.1927
## VCAN 0.02857 1.029 0.1419 0.8404
## DCN 0.07907 1.082 0.0993 0.4257
## TIMP3 0.07745 1.081 0.0906 0.3925
## CRYAB 0.11793 1.125 0.0544 0.0302
## CXCL12 0.01292 1.013 0.0962 0.8932
## SPARC -0.39783 0.672 0.2020 0.0489
## CNN1 0.13126 1.140 0.1395 0.3468
## FBN1 0.11223 1.119 0.2234 0.6154
##
## Extended Wald test=27.8 on 18 df, p=0.0647
Another proposal of the robust Cox is given by Heritier (2009), and is going to be performed next.
filepath="../functions/"
source(paste(filepath,"Chapter7_functions.r",sep=""))
The fitted model using exponential weights with 5% level of trimming is given bellow.
fit.coxrobust.H18=rcoxph(data.time.status$time,data.time.status$status,data.survival18,wt.type="exponential",quant=.95)
fit.coxrobust.H18$coefficients
## estimate SE z p.value
## LPL 0.101118182 0.07169238 1.4104454 0.15840
## IGF1 0.034040440 0.06700056 0.5080620 0.61140
## EDNRA 0.062118582 0.14823520 0.4190542 0.67517
## MFAP5 0.008912579 0.05164610 0.1725702 0.86298
## LOX 0.168964994 0.12809588 1.3190510 0.18715
## INHBA -0.155638383 0.18406256 -0.8455733 0.39779
## THBS2 0.086216158 0.09076783 0.9498537 0.34218
## ADIPOQ -0.072837762 0.10007700 -0.7278172 0.46672
## NPY 0.062470225 0.05534324 1.1287778 0.25899
## CCL11 -0.157642473 0.10130146 -1.5561718 0.11966
## VCAN 0.028569797 0.09559983 0.2988478 0.76505
## DCN 0.079087093 0.09756531 0.8106067 0.41759
## TIMP3 0.077490703 0.08806427 0.8799335 0.37889
## CRYAB 0.117990400 0.04369162 2.7005269 0.00692
## CXCL12 0.012983156 0.08792611 0.1476598 0.88261
## SPARC -0.397470586 0.13323024 -2.9833361 0.00285
## CNN1 0.131279965 0.13408693 0.9790660 0.32754
## FBN1 0.111613834 0.18058542 0.6180667 0.53653
The only genes statistically significant were: CRYAB and SPARC, for Cox’s and Cox’s robust.
The next figure shows that observations 113 and 219 are identified as influential observations in the sense that the weight given for each one is the lowest among the others.
In the next section it is going to be evaluated if the observations identified before are or not influential.
In order to identify which of the observations might be outliers the martingale residual is going to be performed.
The results regarding the martingale residuals are shown bellow. Observation 219 presented the lowest values for the martingale residual.
res.mart.18<- resid(fit.cox.all18,type="martingale")
head(sort(res.mart.18))
## 219 455 221 113 211 372
## -3.562662 -2.871125 -2.700999 -2.610279 -2.498537 -2.355086
NOTE: I can not find a way to identify in R markdown a way of identify the observations.
data.survival22<- read.table("./tcga22.txt",dec = ",", header = TRUE)
head(data.survival22)
## AKT1 BARD1 BRCA1 BRCA2 BRIP1 CDH1 CHEK2 CTNNB1
## 1 6.258296 7.988497 4.642252 3.593121 3.745706 5.988638 4.437772 9.551465
## 2 6.123882 6.139685 3.777406 3.592756 3.256215 6.551716 5.827722 9.299819
## 3 6.353982 6.700977 3.987946 3.522306 3.262751 5.714169 4.986823 8.766935
## 4 6.488616 5.662163 4.066489 3.440468 3.041098 5.845066 4.386200 7.753195
## 5 5.723935 5.675966 3.791599 3.385041 3.252254 6.202242 4.227166 8.777190
## 6 6.590594 5.910909 4.062616 3.499677 2.943837 6.766820 4.191829 8.896559
## MLH1 MRE11A MSH2 MSH6 NBN OPCML PALB2 PARK2
## 1 7.796957 4.532044 6.615308 5.563443 6.347600 3.119211 5.924211 3.270832
## 2 7.351675 4.777423 6.704939 5.320612 6.819968 3.360535 5.412581 3.523069
## 3 6.744995 4.038520 4.456795 4.682963 6.859582 3.511675 5.599444 3.255477
## 4 6.138424 4.834517 6.355851 5.410165 7.769844 3.416089 6.006152 3.395512
## 5 7.998900 4.384351 5.285149 4.660124 6.230915 3.276513 5.568247 3.235833
## 6 6.866562 4.831684 5.663074 4.876299 7.590006 3.394371 5.319687 3.360755
## PIK3CA PMS2 RAD50 RAD51C STK11 TP53
## 1 5.656110 5.249860 4.456818 6.461107 4.218988 3.523659
## 2 5.430994 4.380777 5.609112 5.433464 4.215096 5.423081
## 3 4.841365 5.168418 4.609403 4.817056 4.541523 5.123093
## 4 6.204039 5.301112 5.443193 5.576623 4.251991 5.041598
## 5 5.279189 6.034872 5.798886 5.775185 4.405784 4.780075
## 6 5.975738 4.745667 4.623963 4.661468 4.678426 5.243353
And the matrix for the time and status is given by:
data.time.status<-read.table("./timestatus.txt",dec = ",", header = TRUE)
head(data.time.status)
## time status
## 1 1336 1
## 2 1247 1
## 3 55 1
## 4 1495 0
## 5 61 1
## 6 1418 0
The Cox’s proportional hazard was performed in all variables.
Let’s start the analysis for the Cox regression model.
library(survival)
Fit the Cox proportional hazards to the dataset.
fit.cox.all22<-coxph(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival22)
fit.cox.all22
## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~
## ., data = data.survival22)
##
## coef exp(coef) se(coef) z p
## AKT1 -0.19913 0.81944 0.10276 -1.94 0.0526
## BARD1 -0.03631 0.96434 0.11452 -0.32 0.7512
## BRCA1 0.09836 1.10337 0.15951 0.62 0.5375
## BRCA2 0.49402 1.63890 0.21141 2.34 0.0194
## BRIP1 -0.22114 0.80160 0.23950 -0.92 0.3558
## CDH1 0.03771 1.03843 0.14218 0.27 0.7908
## CHEK2 -0.12778 0.88005 0.10071 -1.27 0.2045
## CTNNB1 0.19857 1.21966 0.17019 1.17 0.2433
## MLH1 0.06619 1.06843 0.14428 0.46 0.6464
## MRE11A -0.16245 0.85006 0.20968 -0.77 0.4385
## MSH2 0.04115 1.04201 0.13400 0.31 0.7588
## MSH6 0.04407 1.04506 0.21013 0.21 0.8339
## NBN 0.19078 1.21019 0.11486 1.66 0.0967
## OPCML 0.33666 1.40026 0.31939 1.05 0.2919
## PALB2 -0.42376 0.65458 0.13850 -3.06 0.0022
## PARK2 0.74684 2.11032 0.50070 1.49 0.1358
## PIK3CA 0.00855 1.00859 0.10115 0.08 0.9326
## PMS2 0.12670 1.13508 0.12100 1.05 0.2951
## RAD50 0.14258 1.15325 0.13167 1.08 0.2789
## RAD51C -0.09550 0.90892 0.11626 -0.82 0.4114
## STK11 0.06164 1.06358 0.34490 0.18 0.8582
## TP53 -0.04847 0.95269 0.06237 -0.78 0.4371
##
## Likelihood ratio test=28.6 on 22 df, p=0.157
## n= 517, number of events= 284
Next, see if the hypothesis of proportional hazards is not violated. By the results obtained, the proportional hazards hypothesis is not violated.
pph22<-cox.zph(fit.cox.all22, transform="km", global=TRUE)
pph22
## rho chisq p
## AKT1 0.00724 0.0171 0.8960
## BARD1 0.00683 0.0145 0.9043
## BRCA1 -0.09332 2.6854 0.1013
## BRCA2 0.07766 1.6607 0.1975
## BRIP1 -0.08000 1.5860 0.2079
## CDH1 0.04793 1.0132 0.3141
## CHEK2 -0.08389 2.1027 0.1470
## CTNNB1 0.06718 1.2913 0.2558
## MLH1 0.06660 1.3205 0.2505
## MRE11A 0.11347 3.4451 0.0634
## MSH2 -0.09146 2.5408 0.1109
## MSH6 0.10872 3.3348 0.0678
## NBN 0.01563 0.0829 0.7734
## OPCML 0.00652 0.0132 0.9084
## PALB2 -0.07804 1.9425 0.1634
## PARK2 0.02826 0.2431 0.6220
## PIK3CA -0.09990 2.9543 0.0857
## PMS2 0.02823 0.2454 0.6204
## RAD50 -0.04988 0.8128 0.3673
## RAD51C 0.00993 0.0352 0.8511
## STK11 -0.02991 0.2859 0.5928
## TP53 0.01613 0.0855 0.7700
## GLOBAL NA 23.2662 0.3868
In order to see if the results obtained before are consistent, a robust version of the Cox regression model is going to be performed. First call the following package.
library(coxrobust)
To see the results obtained by fitting the robust Cox using exponential weights with a 5% level of trimming, do the following:
fit.coxrobust22<-coxr(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival22,trunc=0.95, f.weight="exponential",singular.ok=T,model=F)
fit.coxrobust22
##
## Call:
## coxr(formula = Surv(data.time.status$time, data.time.status$status) ~ ., data = data.survival22, trunc = 0.95, f.weight = "exponential", singular.ok = T, model = F)
##
## Partial likelihood estimator
## coef exp(coef) se(coef) p
## AKT1 -0.19926 0.819 0.1028 0.05249
## BARD1 -0.03648 0.964 0.1145 0.75007
## BRCA1 0.09807 1.103 0.1595 0.53867
## BRCA2 0.49326 1.638 0.2114 0.01964
## BRIP1 -0.22078 0.802 0.2395 0.35661
## CDH1 0.03712 1.038 0.1422 0.79406
## CHEK2 -0.12806 0.880 0.1007 0.20356
## CTNNB1 0.19797 1.219 0.1703 0.24492
## MLH1 0.06658 1.069 0.1443 0.64448
## MRE11A -0.16265 0.850 0.2097 0.43793
## MSH2 0.04153 1.042 0.1340 0.75662
## MSH6 0.04362 1.045 0.2101 0.83557
## NBN 0.19075 1.210 0.1149 0.09679
## OPCML 0.33620 1.400 0.3194 0.29257
## PALB2 -0.42386 0.655 0.1385 0.00221
## PARK2 0.74706 2.111 0.5007 0.13565
## PIK3CA 0.00846 1.008 0.1011 0.93337
## PMS2 0.12680 1.135 0.1210 0.29467
## RAD50 0.14232 1.153 0.1317 0.27974
## RAD51C -0.09518 0.909 0.1163 0.41294
## STK11 0.06369 1.066 0.3449 0.85350
## TP53 -0.04862 0.953 0.0624 0.43569
##
## Wald test=28.7 on 22 df, p=0.153
##
## Robust estimator
## coef exp(coef) se(coef) p
## AKT1 -0.179299 0.836 0.1714 0.2954
## BARD1 -0.047121 0.954 0.1227 0.7010
## BRCA1 0.146716 1.158 0.2017 0.4669
## BRCA2 0.409153 1.506 0.2403 0.0886
## BRIP1 -0.144673 0.865 0.2869 0.6141
## CDH1 -0.013347 0.987 0.1903 0.9441
## CHEK2 -0.087715 0.916 0.1118 0.4325
## CTNNB1 0.155470 1.168 0.2419 0.5204
## MLH1 0.000376 1.000 0.1541 0.9981
## MRE11A -0.257807 0.773 0.3052 0.3983
## MSH2 0.108100 1.114 0.2364 0.6475
## MSH6 -0.029767 0.971 0.3432 0.9309
## NBN 0.179014 1.196 0.1530 0.2420
## OPCML 0.362036 1.436 0.3162 0.2522
## PALB2 -0.388593 0.678 0.2140 0.0694
## PARK2 0.696033 2.006 0.6044 0.2495
## PIK3CA 0.042643 1.044 0.1171 0.7157
## PMS2 0.107714 1.114 0.1561 0.4901
## RAD50 0.179381 1.196 0.1527 0.2402
## RAD51C -0.084398 0.919 0.1383 0.5418
## STK11 0.141996 1.153 0.3867 0.7134
## TP53 -0.052114 0.949 0.0908 0.5659
##
## Extended Wald test=21.3 on 22 df, p=0.502
Another proposal of the robust Cox is given by Heritier (2009), and is going to be performed next.
filepath="../functions/"
source(paste(filepath,"Chapter7_functions.r",sep=""))
The fitted model using exponential weights with 5% level of trimming is given bellow.
fit.coxrobust.H22=rcoxph(data.time.status$time,data.time.status$status,data.survival22,wt.type="exponential",quant=.95)
fit.coxrobust.H22$coefficients
## estimate SE z p.value
## AKT1 -0.1793829787 0.10541514 -1.70168136 0.08881
## BARD1 -0.0472931507 0.11184613 -0.42284120 0.67241
## BRCA1 0.1462191499 0.16571660 0.88234460 0.37759
## BRCA2 0.4093275515 0.21955109 1.86438410 0.06226
## BRIP1 -0.1445577094 0.25405582 -0.56899978 0.56935
## CDH1 -0.0134656022 0.17903523 -0.07521202 0.94004
## CHEK2 -0.0875453467 0.10429829 -0.83937467 0.40125
## CTNNB1 0.1554211725 0.16732977 0.92883158 0.35297
## MLH1 0.0004336543 0.15300874 0.00283418 0.99773
## MRE11A -0.2577233623 0.21334812 -1.20799455 0.22704
## MSH2 0.1082866920 0.13309187 0.81362365 0.41586
## MSH6 -0.0297638431 0.19530129 -0.15239962 0.87887
## NBN 0.1789929451 0.12562204 1.42485305 0.15419
## OPCML 0.3616498551 0.23661473 1.52843340 0.12640
## PALB2 -0.3884494783 0.15216721 -2.55278036 0.01068
## PARK2 0.6957356921 0.50590250 1.37523672 0.16905
## PIK3CA 0.0426598556 0.10668713 0.39985942 0.68926
## PMS2 0.1078097744 0.12648730 0.85233674 0.39402
## RAD50 0.1792425255 0.14388234 1.24575766 0.21285
## RAD51C -0.0843687276 0.12102781 -0.69710199 0.48573
## STK11 0.1422488093 0.36413834 0.39064497 0.69605
## TP53 -0.0520303860 0.06648857 -0.78254635 0.43389
The only genes statistically significant were: BRCA2 and PALB2 for the Cox’s and PALB2 for the robust version of Heritier.
The next figure shows that observations 114 and 211 are identified as influential observations in the sense that the weight given for each one is the lowest among the others.
## [1] 517
## 95%
## 1.708681
## 114 211 26 279 297 113
## 4.857660237 3.147022286 2.859966157 2.850864321 2.659806309 2.589187266
## 44 254 155 69 55 354
## 2.408387318 2.275224159 2.223210431 2.180841738 2.121971972 2.097512594
## 371 372 516 301 452 377
## 2.091148221 2.020831974 2.019849748 1.990466902 1.979131080 1.962350742
## 455 269 278 20 221 60
## 1.892881225 1.865787388 1.848765974 1.844500575 1.813631357 1.768411944
## 368 48 268 220 263 232
## 1.757595769 1.709457002 1.708486612 1.688189344 1.687841798 1.679534527
## 117 317 115 295 14 233
## 1.647032728 1.616395160 1.604698977 1.596106350 1.554440978 1.519331861
## 28 184 45 88 338 251
## 1.514266176 1.496526529 1.486739857 1.480507878 1.469047057 1.462671705
## 116 210 228 158 192 29
## 1.461296165 1.460678091 1.449018610 1.441962914 1.407642773 1.406111596
## 203 405 513 219 350 426
## 1.377870098 1.372725223 1.353851299 1.353623961 1.344349584 1.328444133
## 222 22 381 152 15 266
## 1.319010590 1.316517451 1.312148992 1.309659441 1.299641493 1.293569544
## 119 303 273 125 179 322
## 1.286982979 1.277137370 1.275485136 1.273654764 1.258013873 1.236653265
## 151 469 112 453 159 407
## 1.223303335 1.219913785 1.214600326 1.210520103 1.199949774 1.197797347
## 120 327 260 207 161 330
## 1.194253940 1.180419642 1.179366141 1.171988224 1.167047889 1.159614319
## 382 508 32 253 231 506
## 1.157261888 1.152999463 1.139996206 1.133809494 1.129817031 1.110084759
## 62 290 75 153 18 124
## 1.107385472 1.106501026 1.103525009 1.096050780 1.070926149 1.069508831
## 363 510 306 487 122 21
## 1.045947548 1.042491559 1.041004830 1.023966469 1.017302533 1.011569475
## 425 507 67 10 123 265
## 1.001606940 0.993636290 0.991737797 0.963630197 0.960649003 0.946600648
## 130 131 388 16 416 47
## 0.937591893 0.930755990 0.926765295 0.920331655 0.917739662 0.905607200
## 12 267 314 165 61 6
## 0.900319622 0.898331460 0.898191977 0.893883839 0.889819703 0.885453858
## 422 326 483 321 191 348
## 0.876920581 0.866374051 0.861508499 0.855175749 0.854266181 0.848003043
## 11 118 420 63 312 331
## 0.836842140 0.834173564 0.825139595 0.822273193 0.811117833 0.801795227
## 261 406 390 272 23 36
## 0.796623016 0.792034017 0.791293403 0.790541662 0.788853112 0.788550517
## 4 76 408 391 509 485
## 0.786626265 0.786496725 0.784164555 0.763928357 0.762321452 0.759191170
## 193 511 99 502 163 39
## 0.744589356 0.744479648 0.742891671 0.741373621 0.740056987 0.736346968
## 412 49 356 81 505 347
## 0.735309950 0.734333777 0.726889142 0.722982257 0.719779911 0.719755244
## 411 486 384 484 274 515
## 0.711062466 0.706939270 0.706908090 0.704770061 0.700530582 0.697874097
## 389 369 2 160 264 128
## 0.695516617 0.681742908 0.681102939 0.675882773 0.672488842 0.667420773
## 299 127 186 46 206 501
## 0.662163224 0.661476737 0.657963801 0.652885031 0.649787588 0.644608126
## 374 25 410 307 280 43
## 0.629169297 0.624263159 0.623604998 0.619869519 0.619178078 0.617704074
## 325 204 370 82 298 98
## 0.617422497 0.613082319 0.605822276 0.604837671 0.604274776 0.603222598
## 337 358 35 223 121 329
## 0.588457654 0.579149630 0.570378067 0.558589689 0.551709754 0.551306199
## 166 457 418 431 216 197
## 0.550791775 0.549311921 0.544015201 0.540171352 0.538950527 0.537394767
## 315 491 481 436 341 108
## 0.526105520 0.517787059 0.516387345 0.508714930 0.505086451 0.504965675
## 162 308 378 214 209 110
## 0.502217528 0.500865188 0.499893761 0.498732225 0.495632153 0.492861788
## 345 109 474 498 489 1
## 0.490402427 0.490127981 0.486189431 0.481207122 0.477641693 0.476503551
## 404 439 132 181 373 463
## 0.475996548 0.475931245 0.467306590 0.466758211 0.461398754 0.460375714
## 427 375 442 361 71 300
## 0.450120627 0.449968613 0.448709755 0.448501990 0.448307777 0.443438873
## 313 37 178 471 249 335
## 0.438237425 0.435059980 0.431965917 0.430916578 0.429304926 0.428696288
## 349 392 95 270 434 215
## 0.427788203 0.420997322 0.420929444 0.417045810 0.413581078 0.411123872
## 336 470 85 465 309 473
## 0.407111961 0.406263379 0.404495588 0.401456059 0.400992363 0.399202486
## 31 129 417 218 332 34
## 0.396558456 0.392681403 0.389825469 0.389279527 0.387970291 0.386753468
## 101 19 78 281 367 195
## 0.381272513 0.375829377 0.371320205 0.366388346 0.366078677 0.362366374
## 320 385 376 92 467 167
## 0.360462762 0.358833254 0.358785026 0.354012160 0.353578502 0.350812828
## 13 449 33 199 275 318
## 0.349934742 0.349377433 0.345084710 0.343405005 0.340592434 0.337938897
## 423 72 183 293 154 305
## 0.334543512 0.330085520 0.328717720 0.328626794 0.326356968 0.325412574
## 134 468 182 366 258 353
## 0.324125148 0.319131433 0.300456533 0.294311167 0.293595622 0.293594410
## 169 424 441 200 497 462
## 0.292051150 0.291421050 0.290030487 0.286233822 0.284213401 0.282162848
## 156 230 58 27 454 17
## 0.279008179 0.275577379 0.275396260 0.269053886 0.262833346 0.258765965
## 246 514 435 446 259 482
## 0.256110995 0.255329505 0.254690441 0.250933881 0.250587068 0.249707311
## 68 429 448 433 227 196
## 0.249529925 0.246915161 0.246088475 0.243240321 0.242500295 0.239823556
## 451 50 226 395 324 344
## 0.239394389 0.238925023 0.233934228 0.233034278 0.232281996 0.231980777
## 225 41 185 133 437 387
## 0.231179073 0.229307096 0.226705691 0.226593291 0.225053814 0.223839411
## 365 450 440 394 247 499
## 0.222597866 0.220250997 0.218026660 0.217656998 0.217012208 0.215547510
## 304 70 352 428 386 100
## 0.210538618 0.209184169 0.208659356 0.208537765 0.205175288 0.202814019
## 362 443 393 271 96 24
## 0.199417880 0.198228710 0.197846118 0.197308724 0.196959330 0.196739490
## 157 8 111 340 444 342
## 0.193341062 0.191575635 0.189682696 0.186793014 0.179167179 0.178056523
## 168 445 30 93 339 126
## 0.176411276 0.175358479 0.174837388 0.172039634 0.171865386 0.170627075
## 517 248 187 136 73 292
## 0.170588652 0.167872661 0.167557245 0.166033931 0.165316527 0.164664670
## 334 135 102 80 379 194
## 0.164255192 0.163971612 0.161644750 0.157615872 0.154137297 0.153229084
## 51 319 488 351 91 414
## 0.146095408 0.143487930 0.143481038 0.143267647 0.141269539 0.139677855
## 77 464 399 494 355 383
## 0.137451098 0.135462649 0.135151243 0.132278822 0.132133481 0.127482987
## 380 262 87 40 66 9
## 0.126939729 0.123735833 0.123701165 0.121809585 0.121525486 0.121504874
## 252 65 64 495 245 400
## 0.120859655 0.119737989 0.117384648 0.111200578 0.111011671 0.110373869
## 188 397 396 490 343 276
## 0.110124955 0.107675835 0.106077640 0.105794063 0.104930779 0.104907423
## 94 460 59 74 42 84
## 0.104644080 0.104080242 0.103782709 0.103290423 0.100051004 0.099762279
## 143 224 170 277 480 409
## 0.099017374 0.098041669 0.097619823 0.096424674 0.095364975 0.094256870
## 250 234 229 302 287 208
## 0.093091651 0.091141512 0.088577328 0.088542755 0.088055843 0.087679819
## 86 137 401 142 360 479
## 0.087594519 0.086000375 0.085551550 0.084392487 0.082835326 0.082785427
## 90 438 54 148 56 147
## 0.082383364 0.080732874 0.080135266 0.075555734 0.072555894 0.071983531
## 53 256 145 139 213 430
## 0.069983102 0.069279592 0.068849947 0.068519440 0.068350955 0.067287297
## 413 402 478 57 235 177
## 0.067006035 0.066230089 0.065582600 0.064321945 0.064282238 0.063515915
## 237 333 144 398 291 244
## 0.063185104 0.063074663 0.062638158 0.061663743 0.061618879 0.059857779
## 198 138 164 106 286 140
## 0.059787672 0.059584236 0.059056954 0.058018642 0.057864577 0.057550616
## 243 89 172 174 38 461
## 0.057504714 0.056817126 0.055781157 0.055343480 0.055253209 0.055155700
## 504 493 105 146 180 282
## 0.054996328 0.054949169 0.054278790 0.054212595 0.053990886 0.053582822
## 476 236 240 419 212 201
## 0.052819191 0.051795226 0.051667396 0.051644248 0.051593355 0.051026963
## 173 289 283 103 242 205
## 0.050439178 0.050060420 0.050002647 0.049658752 0.048539194 0.048314271
## 141 79 241 294 364 104
## 0.048111285 0.047760088 0.047324084 0.047176495 0.046942400 0.046864115
## 149 403 190 255 466 97
## 0.045147091 0.045111110 0.044492682 0.043816608 0.043281539 0.043106304
## 296 52 503 107 496 459
## 0.042587845 0.042450354 0.042433189 0.042055875 0.041927275 0.041184846
## 458 189 150 238 171 512
## 0.041016502 0.040638307 0.039284758 0.039190731 0.037491406 0.036921411
## 83 288 456 176 284 447
## 0.034321675 0.034229978 0.034188066 0.033328301 0.032924555 0.032868112
## 257 328 323 477 5 285
## 0.032849253 0.029642417 0.029105603 0.028805872 0.027082114 0.024675979
## 359 492 7 500 3 175
## 0.023714575 0.023265783 0.023034617 0.021863474 0.020858088 0.020297475
## 432 421 472 357 239 475
## 0.020240701 0.016662130 0.015868770 0.014485308 0.013028390 0.012796552
## 310 316 311 217 346 202
## 0.010116631 0.009420008 0.008437502 0.005756461 0.005584571 0.002492063
## 415
## 0.002129644
In the next section it is going to be evaluated if the observations identified before are or not influential.
In order to identify which of the observations might be outliers the martingale residual is going to be performed.
The results regarding the martingale residuals are shown bellow. Observations 114 and 211 presented the lowest values for the martingale residual.
res.mart.22<- resid(fit.cox.all22,type="martingale")
head(sort(res.mart.22))
## 114 211 55 155 26 516
## -3.949405 -3.167509 -2.269112 -2.254516 -2.080776 -2.021388
data.survival63<- read.table("./tcga63.txt",dec = ",", header = TRUE)
head(data.survival63)
## HPCA UBE2J1 RPS6KA2 SDF2L1 GRB7 PTGFR ABCD2 FLJ20323
## 1 3.379032 5.896456 5.357038 4.874745 5.016913 3.217106 2.542321 6.321591
## 2 3.598830 5.665333 5.723389 5.944378 4.641386 3.263849 2.679477 7.036717
## 3 3.546836 5.506559 5.600255 5.824784 4.549980 3.320506 2.907267 5.951632
## 4 3.837105 5.812184 5.101305 5.460481 5.336149 3.198962 2.628976 5.466801
## 5 3.475349 5.652375 5.715203 6.546454 4.958317 3.126072 2.591892 6.716352
## 6 3.464720 5.958244 5.066772 6.311690 5.894845 3.085201 2.700265 6.396568
## WDR76 NDUFA3 FJX1 GAPDHS RAB40B PRR16 CLTCL1 PPM2C
## 1 3.754587 9.535576 6.309770 3.317805 6.496108 2.899054 4.261355 6.611384
## 2 3.542326 9.333357 7.611736 3.209725 6.313544 2.928660 3.591383 6.975410
## 3 4.149125 9.086639 5.903247 3.899254 5.721326 2.838115 3.749531 5.458948
## 4 4.188862 8.992090 6.821217 3.444856 7.460255 2.706408 3.801350 6.004740
## 5 3.274037 9.124981 6.440648 3.448312 6.237777 2.979348 4.570501 6.185189
## 6 3.738367 9.948365 6.372649 3.195631 6.562669 2.786102 3.664497 5.691560
## FOXE3 CHIT1 PI3 BNC1 D4S234E SAPS2 CSNK1G1 MLL2
## 1 3.458641 3.198617 3.532951 3.145852 5.396823 4.924117 3.759848 3.521559
## 2 3.186995 3.164931 3.977135 3.326847 7.106902 4.729032 3.684128 3.492482
## 3 3.402760 3.315234 6.607681 2.981269 4.184155 5.110301 3.661495 3.913990
## 4 3.086911 3.377248 3.848075 3.128189 3.868022 4.569071 3.768946 3.525928
## 5 3.081613 3.335361 7.669747 3.272936 3.678554 4.961995 3.715381 3.399159
## 6 3.329947 3.272312 3.343027 4.940456 5.598197 5.010243 3.502163 3.465175
## HSPB7 SLC37A4 WTAP SSTR1 IDUA PSG3 SLC9A2 PAPOLG
## 1 3.452125 4.606796 4.828085 3.044503 3.656593 3.106292 3.581540 3.301122
## 2 3.510539 4.967137 5.583108 3.150559 3.610913 3.181027 3.333928 3.126235
## 3 3.415771 4.492828 4.376314 3.057642 3.950750 3.246748 3.749377 3.431068
## 4 3.508134 5.532416 5.091276 3.140188 3.681205 3.186806 3.560720 3.573857
## 5 4.872602 4.721920 5.544560 3.060770 3.872118 3.089372 3.528873 3.354763
## 6 3.823164 5.112001 5.381004 3.055014 3.600098 3.055226 4.111485 3.075872
## GAS1 ELA3A KIF26B GBP2 POPDC2 OPN1SW DAP SRY
## 1 6.734406 3.264758 4.134659 5.868016 3.523604 3.161536 6.463228 3.071553
## 2 6.610936 3.137067 3.371108 6.572171 3.571035 3.234470 7.130915 2.811296
## 3 5.898160 3.177108 3.763395 6.617245 3.626677 3.371202 5.897651 3.065847
## 4 6.546472 3.135440 3.625850 8.128674 3.815497 3.460386 6.283933 3.225709
## 5 7.333761 3.226604 4.205577 6.396803 3.975049 3.113610 7.021223 2.891586
## 6 7.588338 3.106817 4.166533 7.966391 3.437208 3.220461 6.365899 2.915808
## UTP20 HOXD11 HSPA1L PPP3CA PAX2 FZD10 TREML2 CCR7
## 1 6.123138 3.578445 4.035664 6.629685 3.571113 7.959389 3.600410 3.779032
## 2 5.441584 3.509853 4.167603 7.337564 3.859440 8.408041 3.732745 3.666788
## 3 5.106732 3.608324 4.100663 6.832305 3.892738 5.875592 3.673938 3.408387
## 4 5.022666 3.414346 4.096152 6.501051 3.704142 4.673326 3.915832 3.411735
## 5 5.712868 3.910163 4.471145 7.732060 3.589429 4.385718 3.745363 3.427102
## 6 5.259647 3.452583 4.209783 8.355183 8.173988 9.485458 3.640250 3.909681
## MPZ MGAT4C EHMT1 ALG8 KCNN2 ESR2 TGM2 LBP
## 1 3.347113 3.262756 3.332088 6.711518 3.427246 3.033150 4.445288 3.416204
## 2 3.159607 3.151684 3.515331 7.444677 3.156577 3.292847 4.931456 3.333844
## 3 3.550646 3.245773 3.129334 7.272591 3.279558 3.173689 4.736734 3.374328
## 4 3.444858 3.258214 3.111921 8.033669 3.225018 3.266501 4.160638 3.424529
## 5 3.603284 3.324066 3.611812 7.236755 3.572769 3.134702 7.856202 3.224660
## 6 3.183013 3.079606 3.206733 7.168507 3.130631 3.158874 4.431019 3.198542
## SRPK3 FBXO40 ANGPT2 IRF5 ANXA4 DENND2D SGEF
## 1 3.540970 3.445100 3.570261 4.063981 7.435274 6.749716 3.464290
## 2 3.566586 3.487669 4.707569 3.861587 8.507906 5.462768 3.347701
## 3 4.332254 3.986397 3.722016 4.501815 6.217415 6.341208 3.404718
## 4 4.359996 3.629891 4.121320 4.103739 7.890721 7.577159 3.334543
## 5 3.798010 3.375658 4.775442 3.837081 8.370533 6.645085 3.440538
## 6 3.683951 3.462665 3.712975 4.081843 7.568210 7.670845 3.410655
And the matrix for the time and status is given by:
data.time.status<-read.table("./timestatus.txt",dec = ",", header = TRUE)
head(data.time.status)
## time status
## 1 1336 1
## 2 1247 1
## 3 55 1
## 4 1495 0
## 5 61 1
## 6 1418 0
The Cox’s proportional hazard was performed in all variables.
Let’s start the analysis for the Cox regression model.
library(survival)
Fit the Cox proportional hazards to the dataset.
fit.cox.all63<-coxph(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival63)
fit.cox.all63
## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~
## ., data = data.survival63)
##
## coef exp(coef) se(coef) z p
## HPCA -1.1893 0.3044 0.3560 -3.34 0.00083
## UBE2J1 -0.2160 0.8057 0.1475 -1.46 0.14309
## RPS6KA2 0.2972 1.3461 0.1124 2.64 0.00818
## SDF2L1 -0.2025 0.8167 0.1024 -1.98 0.04803
## GRB7 0.3360 1.3994 0.0965 3.48 0.00050
## PTGFR 1.1771 3.2449 0.4891 2.41 0.01610
## ABCD2 2.1329 8.4396 0.7532 2.83 0.00463
## FLJ20323 0.2936 1.3412 0.1322 2.22 0.02638
## WDR76 1.1471 3.1489 0.3040 3.77 0.00016
## NDUFA3 0.3454 1.4125 0.1352 2.56 0.01061
## FJX1 -0.1945 0.8232 0.0987 -1.97 0.04875
## GAPDHS 0.8798 2.4105 0.5092 1.73 0.08404
## RAB40B -0.1852 0.8310 0.0833 -2.22 0.02625
## PRR16 -0.4071 0.6656 0.1887 -2.16 0.03099
## CLTCL1 0.3730 1.4521 0.2601 1.43 0.15146
## PPM2C 0.3999 1.4917 0.1005 3.98 7.0e-05
## FOXE3 -0.8118 0.4441 0.5080 -1.60 0.11004
## CHIT1 -0.9427 0.3896 0.2741 -3.44 0.00058
## PI3 0.2450 1.2776 0.0466 5.26 1.5e-07
## BNC1 0.1648 1.1791 0.0693 2.38 0.01737
## D4S234E -0.1471 0.8632 0.0606 -2.43 0.01528
## SAPS2 0.8055 2.2379 0.2158 3.73 0.00019
## CSNK1G1 0.8805 2.4122 0.3858 2.28 0.02245
## MLL2 1.0106 2.7471 0.4972 2.03 0.04210
## HSPB7 0.6657 1.9459 0.3540 1.88 0.06003
## SLC37A4 -0.2538 0.7759 0.1635 -1.55 0.12051
## WTAP 0.5562 1.7440 0.1590 3.50 0.00047
## SSTR1 -1.7443 0.1748 0.6359 -2.74 0.00609
## IDUA 1.4248 4.1569 0.4480 3.18 0.00147
## PSG3 -2.1008 0.1224 0.7371 -2.85 0.00437
## SLC9A2 0.3374 1.4014 0.1267 2.66 0.00772
## PAPOLG 1.8006 6.0530 0.4837 3.72 0.00020
## GAS1 0.2589 1.2954 0.0861 3.01 0.00265
## ELA3A -0.4516 0.6366 0.2360 -1.91 0.05572
## KIF26B 0.9000 2.4597 0.2329 3.86 0.00011
## GBP2 -0.3527 0.7028 0.0935 -3.77 0.00016
## POPDC2 -3.0285 0.0484 0.4894 -6.19 6.1e-10
## OPN1SW 2.3693 10.6894 0.5099 4.65 3.4e-06
## DAP -0.7017 0.4957 0.1333 -5.26 1.4e-07
## SRY -2.3810 0.0925 0.7835 -3.04 0.00238
## UTP20 0.3955 1.4851 0.1553 2.55 0.01085
## HOXD11 0.8313 2.2963 0.2268 3.67 0.00025
## HSPA1L 0.3765 1.4572 0.1828 2.06 0.03946
## PPP3CA 0.3213 1.3790 0.1113 2.89 0.00390
## PAX2 -0.2296 0.7948 0.0899 -2.56 0.01061
## FZD10 -0.0994 0.9054 0.0553 -1.80 0.07201
## TREML2 -0.6339 0.5305 0.4228 -1.50 0.13385
## CCR7 -0.6175 0.5393 0.2637 -2.34 0.01920
## MPZ 0.8243 2.2802 0.2329 3.54 0.00040
## MGAT4C 1.1627 3.1987 0.6331 1.84 0.06628
## EHMT1 1.8125 6.1258 0.4705 3.85 0.00012
## ALG8 -0.2209 0.8018 0.1067 -2.07 0.03852
## KCNN2 -1.1298 0.3231 0.3040 -3.72 0.00020
## ESR2 -2.6987 0.0673 1.0408 -2.59 0.00951
## TGM2 -0.2265 0.7973 0.1370 -1.65 0.09818
## LBP 1.0330 2.8095 0.2216 4.66 3.1e-06
## SRPK3 -0.7770 0.4598 0.2074 -3.75 0.00018
## FBXO40 1.4431 4.2336 0.5331 2.71 0.00679
## ANGPT2 -0.3112 0.7326 0.1571 -1.98 0.04768
## IRF5 -0.8805 0.4146 0.3143 -2.80 0.00508
## ANXA4 0.2854 1.3303 0.1191 2.40 0.01655
## DENND2D -0.2540 0.7757 0.1053 -2.41 0.01588
## SGEF -1.4599 0.2323 0.6064 -2.41 0.01606
##
## Likelihood ratio test=533 on 63 df, p=0
## n= 517, number of events= 284
Next, see if the hypothesis of proportional hazards is not violated. By the results obtained, the proportional hazards hypothesis is not violated.
pph63<-cox.zph(fit.cox.all63, transform="km", global=TRUE)
pph63
## rho chisq p
## HPCA 0.071903 1.70e+00 0.19170
## UBE2J1 0.022667 1.59e-01 0.69005
## RPS6KA2 -0.044125 7.34e-01 0.39169
## SDF2L1 -0.085277 2.58e+00 0.10829
## GRB7 -0.000351 4.23e-05 0.99481
## PTGFR 0.032263 3.74e-01 0.54084
## ABCD2 0.009111 2.79e-02 0.86729
## FLJ20323 -0.036833 4.49e-01 0.50291
## WDR76 -0.062116 1.61e+00 0.20436
## NDUFA3 -0.019034 1.29e-01 0.71991
## FJX1 0.089980 2.55e+00 0.10997
## GAPDHS 0.040006 6.19e-01 0.43145
## RAB40B 0.006540 1.46e-02 0.90382
## PRR16 -0.012644 4.73e-02 0.82784
## CLTCL1 -0.017146 1.06e-01 0.74504
## PPM2C 0.041147 5.89e-01 0.44263
## FOXE3 -0.083166 2.73e+00 0.09843
## CHIT1 -0.042271 7.69e-01 0.38063
## PI3 0.077881 2.07e+00 0.15005
## BNC1 -0.101654 3.97e+00 0.04632
## D4S234E 0.052414 9.98e-01 0.31787
## SAPS2 0.080843 2.29e+00 0.13028
## CSNK1G1 -0.073134 1.95e+00 0.16207
## MLL2 -0.007162 1.81e-02 0.89284
## HSPB7 0.064591 1.59e+00 0.20729
## SLC37A4 0.086254 2.50e+00 0.11406
## WTAP 0.089494 3.03e+00 0.08163
## SSTR1 -0.040066 6.38e-01 0.42461
## IDUA 0.001408 8.58e-04 0.97663
## PSG3 -0.012891 5.52e-02 0.81431
## SLC9A2 0.033349 4.40e-01 0.50720
## PAPOLG 0.023907 1.98e-01 0.65631
## GAS1 -0.018127 1.04e-01 0.74728
## ELA3A -0.195070 7.62e+00 0.00577
## KIF26B 0.061480 1.23e+00 0.26745
## GBP2 0.007471 2.06e-02 0.88587
## POPDC2 -0.008589 3.10e-02 0.86033
## OPN1SW 0.155674 8.57e+00 0.00342
## DAP 0.016182 1.04e-01 0.74701
## SRY 0.128577 6.47e+00 0.01100
## UTP20 0.083617 2.61e+00 0.10602
## HOXD11 -0.011061 4.92e-02 0.82444
## HSPA1L -0.115232 5.22e+00 0.02233
## PPP3CA -0.017258 9.30e-02 0.76043
## PAX2 -0.040967 6.22e-01 0.43014
## FZD10 -0.065230 1.60e+00 0.20657
## TREML2 -0.114341 5.30e+00 0.02131
## CCR7 -0.086059 2.35e+00 0.12530
## MPZ -0.042250 7.29e-01 0.39334
## MGAT4C -0.031590 3.38e-01 0.56114
## EHMT1 0.044085 7.31e-01 0.39253
## ALG8 -0.029398 3.14e-01 0.57529
## KCNN2 -0.084013 2.11e+00 0.14598
## ESR2 -0.005835 1.41e-02 0.90564
## TGM2 0.057214 1.60e+00 0.20635
## LBP 0.032140 3.21e-01 0.57107
## SRPK3 -0.040415 5.20e-01 0.47087
## FBXO40 0.023502 2.07e-01 0.64916
## ANGPT2 0.010703 4.12e-02 0.83922
## IRF5 -0.012002 5.13e-02 0.82085
## ANXA4 -0.094494 4.44e+00 0.03504
## DENND2D -0.005022 9.23e-03 0.92345
## SGEF -0.120777 5.37e+00 0.02051
## GLOBAL NA 7.25e+01 0.19319
Genes not significant: TGM2, MGAT4C, TREML2, FZD10, ELA3A, SLC37A4, HSPB7, FOXE3, CLTCL1, GAPDHS, UBE2J1.
In order to see if the results obtained before are consistent, a robust version of the Cox regression model is going to be performed. First call the following package.
library(coxrobust)
To see the results obtained by fitting the robust Cox using exponential weights with a 5% level of trimming, do the following:
fit.coxrobust63<-coxr(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival63,trunc=0.95, f.weight="exponential",singular.ok=T,model=F)
fit.coxrobust63
##
## Call:
## coxr(formula = Surv(data.time.status$time, data.time.status$status) ~ ., data = data.survival63, trunc = 0.95, f.weight = "exponential", singular.ok = T, model = F)
##
## Partial likelihood estimator
## coef exp(coef) se(coef) p
## HPCA -1.1909 0.3040 0.3561 8.24e-04
## UBE2J1 -0.2152 0.8064 0.1475 1.45e-01
## RPS6KA2 0.2968 1.3455 0.1125 8.34e-03
## SDF2L1 -0.2022 0.8170 0.1024 4.85e-02
## GRB7 0.3354 1.3985 0.0966 5.15e-04
## PTGFR 1.1720 3.2283 0.4893 1.66e-02
## ABCD2 2.1278 8.3967 0.7533 4.73e-03
## FLJ20323 0.2938 1.3416 0.1322 2.62e-02
## WDR76 1.1474 3.1499 0.3041 1.61e-04
## NDUFA3 0.3464 1.4139 0.1352 1.04e-02
## FJX1 -0.1942 0.8235 0.0987 4.91e-02
## GAPDHS 0.8798 2.4105 0.5093 8.41e-02
## RAB40B -0.1850 0.8311 0.0833 2.64e-02
## PRR16 -0.4090 0.6643 0.1889 3.03e-02
## CLTCL1 0.3740 1.4535 0.2600 1.50e-01
## PPM2C 0.3991 1.4905 0.1005 7.17e-05
## FOXE3 -0.8103 0.4447 0.5083 1.11e-01
## CHIT1 -0.9426 0.3896 0.2742 5.86e-04
## PI3 0.2451 1.2777 0.0466 1.48e-07
## BNC1 0.1649 1.1792 0.0693 1.73e-02
## D4S234E -0.1470 0.8633 0.0606 1.54e-02
## SAPS2 0.8059 2.2388 0.2158 1.88e-04
## CSNK1G1 0.8846 2.4221 0.3858 2.19e-02
## MLL2 1.0076 2.7390 0.4972 4.27e-02
## HSPB7 0.6667 1.9478 0.3537 5.95e-02
## SLC37A4 -0.2540 0.7757 0.1635 1.20e-01
## WTAP 0.5547 1.7414 0.1590 4.85e-04
## SSTR1 -1.7466 0.1744 0.6363 6.05e-03
## IDUA 1.4232 4.1505 0.4479 1.49e-03
## PSG3 -2.1078 0.1215 0.7373 4.25e-03
## SLC9A2 0.3381 1.4023 0.1267 7.59e-03
## PAPOLG 1.8011 6.0564 0.4837 1.96e-04
## GAS1 0.2591 1.2958 0.0861 2.63e-03
## ELA3A -0.4500 0.6376 0.2361 5.66e-02
## KIF26B 0.8986 2.4561 0.2330 1.15e-04
## GBP2 -0.3523 0.7030 0.0935 1.64e-04
## POPDC2 -3.0265 0.0485 0.4894 6.26e-10
## OPN1SW 2.3692 10.6889 0.5101 3.40e-06
## DAP -0.7013 0.4959 0.1333 1.43e-07
## SRY -2.3796 0.0926 0.7837 2.39e-03
## UTP20 0.3940 1.4829 0.1553 1.12e-02
## HOXD11 0.8369 2.3092 0.2273 2.32e-04
## HSPA1L 0.3766 1.4574 0.1828 3.94e-02
## PPP3CA 0.3217 1.3795 0.1113 3.86e-03
## PAX2 -0.2302 0.7944 0.0899 1.04e-02
## FZD10 -0.0994 0.9054 0.0552 7.20e-02
## TREML2 -0.6305 0.5323 0.4228 1.36e-01
## CCR7 -0.6191 0.5384 0.2638 1.89e-02
## MPZ 0.8266 2.2855 0.2329 3.86e-04
## MGAT4C 1.1617 3.1955 0.6332 6.66e-02
## EHMT1 1.8092 6.1055 0.4706 1.21e-04
## ALG8 -0.2219 0.8010 0.1068 3.77e-02
## KCNN2 -1.1281 0.3237 0.3042 2.08e-04
## ESR2 -2.7090 0.0666 1.0408 9.25e-03
## TGM2 -0.2265 0.7973 0.1370 9.82e-02
## LBP 1.0327 2.8086 0.2216 3.15e-06
## SRPK3 -0.7757 0.4604 0.2074 1.85e-04
## FBXO40 1.4477 4.2534 0.5332 6.63e-03
## ANGPT2 -0.3122 0.7319 0.1571 4.69e-02
## IRF5 -0.8836 0.4133 0.3143 4.93e-03
## ANXA4 0.2858 1.3308 0.1191 1.64e-02
## DENND2D -0.2547 0.7752 0.1053 1.56e-02
## SGEF -1.4636 0.2314 0.6065 1.58e-02
##
## Wald test=379 on 63 df, p=0
##
## Robust estimator
## coef exp(coef) se(coef) p
## HPCA -1.1803 0.3072 0.5877 0.044598
## UBE2J1 -0.2221 0.8008 0.2676 0.406408
## RPS6KA2 0.3892 1.4758 0.1408 0.005719
## SDF2L1 -0.2003 0.8185 0.1203 0.095856
## GRB7 0.3268 1.3865 0.1115 0.003370
## PTGFR 1.0255 2.7885 0.6001 0.087466
## ABCD2 2.3397 10.3778 1.1928 0.049827
## FLJ20323 0.2696 1.3095 0.1480 0.068506
## WDR76 1.1701 3.2224 0.5071 0.021036
## NDUFA3 0.4128 1.5111 0.1633 0.011454
## FJX1 -0.2867 0.7507 0.1616 0.075969
## GAPDHS 0.9733 2.6466 0.6198 0.116338
## RAB40B -0.2219 0.8010 0.1404 0.114028
## PRR16 -0.3362 0.7145 0.2740 0.219788
## CLTCL1 0.4470 1.5637 0.3452 0.195349
## PPM2C 0.4173 1.5179 0.2192 0.056928
## FOXE3 -0.5162 0.5968 0.6139 0.400476
## CHIT1 -0.9042 0.4049 0.4674 0.053073
## PI3 0.2305 1.2593 0.1083 0.033262
## BNC1 0.1830 1.2008 0.0847 0.030715
## D4S234E -0.1645 0.8484 0.0767 0.031932
## SAPS2 0.8342 2.3030 0.6100 0.171411
## CSNK1G1 1.0782 2.9393 0.4489 0.016327
## MLL2 1.3137 3.7200 0.8978 0.143397
## HSPB7 0.5092 1.6640 0.4368 0.243702
## SLC37A4 -0.3065 0.7360 0.2269 0.176835
## WTAP 0.5607 1.7519 0.3265 0.085950
## SSTR1 -1.7979 0.1657 0.7908 0.023001
## IDUA 1.4354 4.2013 0.8810 0.103237
## PSG3 -2.3029 0.1000 0.8579 0.007265
## SLC9A2 0.3185 1.3751 0.1677 0.057516
## PAPOLG 1.7430 5.7142 0.9548 0.067923
## GAS1 0.2756 1.3173 0.1380 0.045798
## ELA3A -0.4692 0.6255 1.1530 0.684023
## KIF26B 0.8508 2.3416 0.4996 0.088586
## GBP2 -0.3718 0.6895 0.1924 0.053247
## POPDC2 -2.7792 0.0621 1.2267 0.023478
## OPN1SW 2.1049 8.2061 1.0821 0.051762
## DAP -0.6959 0.4986 0.2120 0.001031
## SRY -2.4342 0.0877 1.0015 0.015077
## UTP20 0.4170 1.5174 0.2133 0.050553
## HOXD11 0.7056 2.0250 0.2897 0.014866
## HSPA1L 0.4634 1.5895 0.2344 0.048014
## PPP3CA 0.3294 1.3901 0.1262 0.009068
## PAX2 -0.2373 0.7888 0.2193 0.279162
## FZD10 -0.0801 0.9230 0.0748 0.284130
## TREML2 -0.6043 0.5464 0.5415 0.264440
## CCR7 -0.5713 0.5648 0.4291 0.183019
## MPZ 0.7611 2.1406 0.3173 0.016444
## MGAT4C 1.0216 2.7777 0.6915 0.139567
## EHMT1 1.5360 4.6458 1.0943 0.160442
## ALG8 -0.1276 0.8802 0.1482 0.389379
## KCNN2 -1.1903 0.3041 1.0630 0.262830
## ESR2 -2.4160 0.0893 1.7091 0.157479
## TGM2 -0.1904 0.8266 0.2393 0.426215
## LBP 0.9934 2.7004 0.2712 0.000249
## SRPK3 -0.8033 0.4479 0.4268 0.059855
## FBXO40 1.3587 3.8911 0.7145 0.057218
## ANGPT2 -0.3140 0.7305 0.1849 0.089417
## IRF5 -0.8175 0.4415 0.5146 0.112125
## ANXA4 0.2839 1.3283 0.1674 0.089973
## DENND2D -0.2419 0.7851 0.1388 0.081284
## SGEF -1.4272 0.2400 0.8081 0.077366
##
## Extended Wald test=142 on 63 df, p=4.91e-08
Another proposal of the robust Cox is given by Heritier (2009), and is going to be performed next.
filepath="../functions/"
source(paste(filepath,"Chapter7_functions.r",sep=""))
The fitted model using exponential weights with 5% level of trimming is given bellow.
fit.coxrobust.H63=rcoxph(data.time.status$time,data.time.status$status,data.survival63,wt.type="exponential",quant=.95)
fit.coxrobust.H63$coefficients
## estimate SE z p.value
## HPCA -1.16622632 0.33872023 -3.443037 0.00057
## UBE2J1 -0.22197365 0.13635663 -1.627890 0.10354
## RPS6KA2 0.39800725 0.12013505 3.312998 0.00092
## SDF2L1 -0.19786824 0.10167639 -1.946059 0.05164
## GRB7 0.32724594 0.08732137 3.747604 0.00017
## PTGFR 1.01309469 0.48986801 2.068097 0.03863
## ABCD2 2.35641347 0.78601672 2.997918 0.00271
## FLJ20323 0.26542245 0.12505088 2.122516 0.03379
## WDR76 1.16954351 0.33871131 3.452921 0.00055
## NDUFA3 0.41302078 0.12892646 3.203538 0.00135
## FJX1 -0.29338240 0.10227710 -2.868505 0.00412
## GAPDHS 0.99289587 0.55173576 1.799586 0.07192
## RAB40B -0.22318385 0.08377256 -2.664164 0.00771
## PRR16 -0.33674991 0.18630099 -1.807558 0.07067
## CLTCL1 0.43537045 0.28174096 1.545286 0.12227
## PPM2C 0.41604251 0.10265212 4.052936 0.00005
## FOXE3 -0.51292497 0.47058113 -1.089982 0.27572
## CHIT1 -0.91018751 0.35841391 -2.539487 0.01110
## PI3 0.23101345 0.04431142 5.213407 0.00000
## BNC1 0.18372409 0.07309222 2.513593 0.01195
## D4S234E -0.16641324 0.06357115 -2.617748 0.00885
## SAPS2 0.83454099 0.21327881 3.912911 0.00009
## CSNK1G1 1.08739509 0.39012329 2.787311 0.00531
## MLL2 1.32554186 0.51691291 2.564343 0.01033
## HSPB7 0.50042064 0.35262795 1.419118 0.15586
## SLC37A4 -0.31415363 0.16527507 -1.900793 0.05732
## WTAP 0.55986797 0.15632539 3.581427 0.00034
## SSTR1 -1.80387371 0.67103818 -2.688183 0.00718
## IDUA 1.44465756 0.47135596 3.064897 0.00217
## PSG3 -2.29976082 0.76730695 -2.997185 0.00272
## SLC9A2 0.31791252 0.13112309 2.424535 0.01532
## PAPOLG 1.74448080 0.46227442 3.773691 0.00016
## GAS1 0.27853812 0.08540026 3.261561 0.00110
## ELA3A -0.47146611 0.22662022 -2.080424 0.03748
## KIF26B 0.85017969 0.22985926 3.698697 0.00021
## GBP2 -0.37493182 0.09593794 -3.908066 0.00009
## POPDC2 -2.76747370 0.52140100 -5.307764 0.00000
## OPN1SW 2.11401127 0.50873426 4.155433 0.00003
## DAP -0.69571634 0.13071918 -5.322221 0.00000
## SRY -2.43819437 0.74968869 -3.252276 0.00114
## UTP20 0.41854495 0.15890859 2.633872 0.00844
## HOXD11 0.70474597 0.21470671 3.282366 0.00102
## HSPA1L 0.46453504 0.22070414 2.104786 0.03530
## PPP3CA 0.33156717 0.10189158 3.254118 0.00113
## PAX2 -0.23747698 0.08694116 -2.731468 0.00630
## FZD10 -0.08073319 0.05633465 -1.433100 0.15182
## TREML2 -0.61429369 0.46653054 -1.316728 0.18792
## CCR7 -0.56924372 0.23491797 -2.423160 0.01538
## MPZ 0.76255827 0.20973432 3.635830 0.00027
## MGAT4C 1.01771036 0.53744394 1.893612 0.05827
## EHMT1 1.52203784 0.49780094 3.057523 0.00223
## ALG8 -0.11883017 0.11347863 -1.047159 0.29502
## KCNN2 -1.19094243 0.29159222 -4.084274 0.00004
## ESR2 -2.44468742 1.13877596 -2.146768 0.03181
## TGM2 -0.19071028 0.16670681 -1.143986 0.25262
## LBP 0.99193289 0.24916038 3.981102 0.00006
## SRPK3 -0.80684707 0.19266924 -4.187732 0.00002
## FBXO40 1.35172810 0.55188615 2.449288 0.01431
## ANGPT2 -0.31514272 0.13928297 -2.262608 0.02365
## IRF5 -0.81755457 0.30970174 -2.639813 0.00829
## ANXA4 0.28518507 0.13498489 2.112719 0.03462
## DENND2D -0.24162579 0.09567723 -2.525426 0.01155
## SGEF -1.42639779 0.64339524 -2.216985 0.02662
Genes not significant: TGM2, ALG8, MGAT4, TREML2, FZD10, SLC37A4, HSPB7, FOXE3, CLTCL1, PRR16, GAPDHS, SDF2L1, and UBE2J1.
The next figure shows that observations 39 and 350 are identified as influential observations in the sense that the weight given for each one is the lowest among the others.
## [1] 517
## 95%
## 2.02691
## 39 350 339 43 272
## 1.289785e+01 5.941177e+00 4.566690e+00 4.474078e+00 4.345782e+00
## 273 295 341 69 365
## 4.256226e+00 4.250419e+00 3.566459e+00 3.431834e+00 3.404045e+00
## 248 279 108 369 268
## 3.357645e+00 3.183520e+00 3.011988e+00 2.923692e+00 2.683356e+00
## 250 44 378 290 473
## 2.592238e+00 2.536464e+00 2.427850e+00 2.328131e+00 2.273566e+00
## 337 405 24 265 278
## 2.191573e+00 2.095721e+00 2.092198e+00 2.088154e+00 2.082080e+00
## 78 233 16 317 14
## 2.066957e+00 2.016899e+00 2.013385e+00 2.004518e+00 1.900525e+00
## 209 232 153 115 358
## 1.895517e+00 1.851810e+00 1.836481e+00 1.805710e+00 1.796177e+00
## 297 41 206 377 300
## 1.789163e+00 1.714074e+00 1.693940e+00 1.674309e+00 1.627113e+00
## 45 86 366 349 19
## 1.615097e+00 1.592528e+00 1.581628e+00 1.570653e+00 1.539535e+00
## 314 331 355 345 152
## 1.531594e+00 1.506935e+00 1.465830e+00 1.443678e+00 1.432534e+00
## 321 409 381 160 80
## 1.425099e+00 1.391088e+00 1.357887e+00 1.318417e+00 1.312022e+00
## 110 332 412 474 303
## 1.310220e+00 1.293831e+00 1.290824e+00 1.289304e+00 1.284000e+00
## 417 313 36 26 81
## 1.279865e+00 1.270800e+00 1.247087e+00 1.237506e+00 1.230970e+00
## 251 380 457 253 319
## 1.219642e+00 1.219610e+00 1.218025e+00 1.208556e+00 1.207990e+00
## 87 88 301 25 348
## 1.199178e+00 1.197348e+00 1.194478e+00 1.171263e+00 1.166377e+00
## 222 260 114 123 27
## 1.145530e+00 1.145043e+00 1.127157e+00 1.126968e+00 1.124387e+00
## 422 192 83 453 12
## 1.109546e+00 1.108285e+00 1.106750e+00 1.106037e+00 1.097258e+00
## 338 298 40 385 223
## 1.093671e+00 1.078016e+00 1.077149e+00 1.048582e+00 1.047658e+00
## 416 122 267 127 95
## 1.041059e+00 1.037407e+00 1.035272e+00 1.016661e+00 9.993749e-01
## 164 360 469 264 151
## 9.970202e-01 9.930402e-01 9.771552e-01 9.631812e-01 9.527151e-01
## 71 388 203 335 67
## 9.361898e-01 9.327829e-01 9.232406e-01 9.174330e-01 9.067274e-01
## 305 343 55 191 261
## 8.835470e-01 8.724182e-01 8.496363e-01 8.413358e-01 8.379930e-01
## 304 46 484 64 113
## 8.317245e-01 8.252278e-01 7.997662e-01 7.879106e-01 7.845275e-01
## 327 418 374 270 512
## 7.798869e-01 7.738846e-01 7.736948e-01 7.696560e-01 7.678123e-01
## 215 216 42 266 411
## 7.657394e-01 7.622131e-01 7.610296e-01 7.411642e-01 7.314879e-01
## 76 356 29 306 47
## 7.272586e-01 7.091227e-01 7.088392e-01 7.019830e-01 6.966211e-01
## 307 254 515 420 271
## 6.964469e-01 6.915177e-01 6.909191e-01 6.859249e-01 6.801948e-01
## 370 368 407 511 275
## 6.801030e-01 6.735568e-01 6.720299e-01 6.699517e-01 6.676514e-01
## 34 323 336 158 8
## 6.584570e-01 6.478567e-01 6.475325e-01 6.460302e-01 6.457108e-01
## 101 121 37 184 361
## 6.456865e-01 6.380556e-01 6.299886e-01 6.287476e-01 6.221448e-01
## 116 33 353 414 362
## 6.155729e-01 6.108671e-01 6.043993e-01 5.937781e-01 5.870668e-01
## 347 35 410 219 322
## 5.811617e-01 5.797209e-01 5.774448e-01 5.703734e-01 5.694484e-01
## 513 488 269 159 451
## 5.635761e-01 5.474834e-01 5.437007e-01 5.431879e-01 5.415333e-01
## 423 210 329 166 383
## 5.371701e-01 5.329456e-01 5.308469e-01 5.273212e-01 5.246655e-01
## 28 1 299 330 382
## 5.212948e-01 5.195410e-01 5.156996e-01 5.148836e-01 5.130955e-01
## 31 354 258 13 318
## 5.115056e-01 5.105101e-01 5.103129e-01 5.094924e-01 5.035039e-01
## 249 274 391 434 517
## 5.028831e-01 4.962681e-01 4.926995e-01 4.923046e-01 4.918393e-01
## 311 352 59 479 157
## 4.901071e-01 4.857532e-01 4.828659e-01 4.782264e-01 4.721636e-01
## 9 467 178 462 90
## 4.712663e-01 4.660491e-01 4.633622e-01 4.629928e-01 4.629761e-01
## 292 277 30 85 471
## 4.501858e-01 4.467523e-01 4.466268e-01 4.442123e-01 4.434989e-01
## 32 455 371 425 363
## 4.388804e-01 4.296665e-01 4.292088e-01 4.208570e-01 4.172407e-01
## 470 75 281 486 117
## 4.152663e-01 4.092794e-01 4.064204e-01 4.042868e-01 4.039281e-01
## 444 454 169 326 156
## 3.955210e-01 3.873290e-01 3.865282e-01 3.865215e-01 3.836272e-01
## 58 79 436 442 218
## 3.834558e-01 3.759043e-01 3.655086e-01 3.638546e-01 3.615239e-01
## 68 489 263 247 282
## 3.594681e-01 3.567044e-01 3.462396e-01 3.430895e-01 3.396468e-01
## 450 197 196 424 344
## 3.382341e-01 3.373904e-01 3.369387e-01 3.353895e-01 3.322561e-01
## 302 98 231 426 419
## 3.243362e-01 3.233093e-01 3.221761e-01 3.195252e-01 3.155706e-01
## 220 20 211 504 18
## 3.123591e-01 3.118382e-01 3.069046e-01 3.052329e-01 3.016448e-01
## 375 333 320 186 445
## 3.003678e-01 3.001493e-01 2.974490e-01 2.969349e-01 2.940763e-01
## 340 246 392 351 239
## 2.874401e-01 2.872407e-01 2.849648e-01 2.846389e-01 2.805424e-01
## 91 92 112 293 21
## 2.741814e-01 2.738649e-01 2.729842e-01 2.701214e-01 2.676243e-01
## 181 130 259 433 324
## 2.660249e-01 2.658974e-01 2.658515e-01 2.652509e-01 2.648319e-01
## 395 38 200 309 214
## 2.633741e-01 2.631171e-01 2.626749e-01 2.612198e-01 2.610601e-01
## 379 342 84 93 207
## 2.571706e-01 2.532388e-01 2.510352e-01 2.472676e-01 2.356501e-01
## 373 325 308 449 11
## 2.283608e-01 2.266869e-01 2.255553e-01 2.219878e-01 2.200913e-01
## 481 77 125 404 440
## 2.147775e-01 2.127055e-01 2.115922e-01 2.111357e-01 2.086404e-01
## 22 111 74 102 225
## 1.998104e-01 1.965756e-01 1.917386e-01 1.896574e-01 1.823087e-01
## 276 201 463 431 294
## 1.774073e-01 1.762347e-01 1.728273e-01 1.727096e-01 1.677055e-01
## 165 452 188 137 109
## 1.621734e-01 1.620221e-01 1.600438e-01 1.567682e-01 1.547583e-01
## 393 359 376 514 134
## 1.513106e-01 1.511921e-01 1.490104e-01 1.477868e-01 1.446182e-01
## 406 439 170 10 61
## 1.443063e-01 1.389633e-01 1.378339e-01 1.374404e-01 1.345527e-01
## 510 66 390 389 195
## 1.341700e-01 1.315199e-01 1.313973e-01 1.287273e-01 1.235663e-01
## 161 124 62 118 387
## 1.200770e-01 1.156176e-01 1.144157e-01 1.130424e-01 1.127118e-01
## 384 396 466 448 252
## 1.117937e-01 1.115386e-01 1.110704e-01 1.089667e-01 1.085759e-01
## 168 23 372 227 491
## 1.063278e-01 1.057388e-01 1.055172e-01 1.046080e-01 1.037351e-01
## 516 226 133 244 242
## 1.035697e-01 1.030335e-01 1.027054e-01 1.009223e-01 9.999116e-02
## 465 154 119 280 120
## 9.678823e-02 9.404635e-02 9.390136e-02 9.262623e-02 9.157983e-02
## 367 485 142 82 51
## 9.006548e-02 8.972990e-02 8.887358e-02 8.794243e-02 8.524741e-02
## 15 386 429 483 402
## 8.463694e-02 8.377308e-02 8.258865e-02 8.121282e-02 8.060956e-02
## 447 126 104 506 6
## 7.954023e-02 7.654304e-02 7.524561e-02 7.282096e-02 6.968982e-02
## 131 235 100 441 221
## 6.658392e-02 6.629441e-02 6.490592e-02 6.469469e-02 6.345269e-02
## 48 490 179 193 312
## 6.305217e-02 6.290791e-02 6.230511e-02 6.050063e-02 5.848645e-02
## 475 472 427 229 291
## 5.690375e-02 5.584321e-02 5.560061e-02 5.296074e-02 5.291804e-02
## 497 194 394 146 148
## 5.208037e-02 5.113587e-02 5.098414e-02 4.995719e-02 4.954843e-02
## 328 400 144 70 228
## 4.936462e-02 4.796318e-02 4.748894e-02 4.542258e-02 4.263673e-02
## 204 63 185 364 2
## 4.203220e-02 4.166999e-02 4.142942e-02 4.128594e-02 4.055834e-02
## 507 73 190 437 132
## 4.033229e-02 3.986836e-02 3.875513e-02 3.759946e-02 3.731803e-02
## 482 509 128 99 49
## 3.657675e-02 3.501363e-02 3.496439e-02 3.453570e-02 3.300389e-02
## 224 502 487 230 107
## 3.291838e-02 3.268272e-02 3.236456e-02 3.232057e-02 3.226762e-02
## 189 155 288 17 399
## 3.059393e-02 2.949912e-02 2.945011e-02 2.943371e-02 2.848512e-02
## 60 499 172 413 476
## 2.807344e-02 2.719192e-02 2.623792e-02 2.616413e-02 2.614029e-02
## 438 397 495 262 501
## 2.610027e-02 2.604893e-02 2.542500e-02 2.498286e-02 2.384125e-02
## 256 428 149 435 237
## 2.306278e-02 2.265758e-02 2.256926e-02 2.255534e-02 2.243448e-02
## 171 286 284 208 94
## 2.235890e-02 2.213896e-02 2.210648e-02 2.203477e-02 2.176704e-02
## 315 162 498 468 52
## 2.161883e-02 2.067231e-02 2.020311e-02 1.882185e-02 1.699050e-02
## 334 176 140 163 182
## 1.684138e-02 1.572363e-02 1.511245e-02 1.439591e-02 1.424378e-02
## 458 4 187 205 217
## 1.412105e-02 1.403704e-02 1.401697e-02 1.397553e-02 1.396335e-02
## 199 198 3 430 56
## 1.359281e-02 1.332709e-02 1.286647e-02 1.249954e-02 1.237091e-02
## 143 289 508 141 446
## 1.227706e-02 1.203028e-02 1.167822e-02 1.131676e-02 1.091005e-02
## 398 496 5 213 54
## 1.087910e-02 9.805449e-03 9.260861e-03 9.179401e-03 7.723550e-03
## 316 103 456 129 283
## 7.364599e-03 7.300803e-03 7.245051e-03 6.949101e-03 6.426964e-03
## 346 7 106 296 310
## 6.297105e-03 6.227302e-03 6.170367e-03 5.828357e-03 5.783101e-03
## 408 212 236 500 202
## 5.551398e-03 5.527310e-03 5.450012e-03 5.148595e-03 5.122765e-03
## 89 147 97 150 138
## 5.061371e-03 5.004474e-03 4.924429e-03 4.694662e-03 4.658639e-03
## 505 494 53 464 50
## 4.513295e-03 4.441896e-03 4.441021e-03 4.244607e-03 4.205156e-03
## 357 177 460 167 240
## 4.114011e-03 3.882350e-03 3.852471e-03 3.615102e-03 3.538284e-03
## 443 135 72 421 238
## 3.389723e-03 3.307002e-03 3.261432e-03 3.140838e-03 3.042604e-03
## 245 234 415 478 480
## 2.862560e-03 2.846375e-03 2.823940e-03 2.755498e-03 2.705472e-03
## 96 257 105 174 255
## 2.651881e-03 2.577245e-03 2.473354e-03 2.034680e-03 1.930886e-03
## 492 173 287 461 503
## 1.917332e-03 1.874464e-03 1.783888e-03 1.552289e-03 1.468761e-03
## 403 57 65 285 243
## 1.437597e-03 1.078866e-03 1.029256e-03 9.111686e-04 8.727815e-04
## 139 477 241 401 145
## 7.829774e-04 7.783785e-04 7.523448e-04 7.317293e-04 7.300649e-04
## 432 180 175 136 183
## 7.251017e-04 7.066629e-04 6.837928e-04 5.030147e-04 4.460208e-04
## 459 493
## 3.514624e-04 2.667121e-04
In the next section it is going to be evaluated if the observations identified before are or not influential.
In order to identify which of the observations might be outliers the martingale residual is going to be performed.
The results regarding the martingale residuals are shown bellow. Observations 39 and 350 presented the lowest values for the martingale residual.
res.mart.63<- resid(fit.cox.all63,type="martingale")
head(sort(res.mart.63))
## 39 350 295 339 273 69
## -4.284005 -3.153217 -2.672307 -2.643038 -2.585822 -2.481997
In order to obtain a global result concerning each one of the techniques performed in the previous section, a rank product test is going to be used. First a rank matrix is going to be obtained based on the ranks of each outliers detection method.
res.mart.vec18<-as.vector(res.mart.18)
rank_martingale18<-rank(res.mart.vec18,ties.method = "first")
res.mart.vec22<-as.vector(res.mart.22)
rank_martingale22<-rank(res.mart.vec22,ties.method = "first")
res.mart.vec63<-as.vector(res.mart.63)
rank_martingale63<-rank(res.mart.vec63,ties.method = "first")
Now let us obtain based on the ranks above the rank product matrix. First let’s consider a rank matrix, with all the ranks obtained by each method.
rankMat<-cbind(rank_martingale18,rank_martingale22,rank_martingale63)
In second obtain the rank product, which is obtained based on the input of the rank matrix.
RankProduct = function(rankMat)
{
return(apply(rankMat, 1, prod))
}
rankproduct<-RankProduct(rankMat)
In order to obtain the p-values the algorithm proposed by Heskes (2014) was used. Notice that the p-values are based on the geometric mean of upper and lower bounds, defined recursively.
filepath="../functions/"
source(paste(filepath,"Heskes_pvalues.R",sep=""))
The input to obtain the p-values is the following:
rho=rankproduct
n<-dim(data.survival18)[1]
k<-dim(rankMat)[2]
The p-values are obtained by the following, where Delta option is the geometric mean.
pvalues<-as.vector(rankprodbounds(rho,n,k,Delta ='geometric'))
In order to obtain the observations with the lowest p-values we can do the lines:
pvalues.matrix<-cbind(pvalues)
which(pvalues.matrix<0.01)## pvalues < to 1%
## [1] 26 44 55 60 69 113 114 115 155 159 210 211 219 220 221 279 295
## [18] 297 314 372 377 407 452 455 516
which(pvalues.matrix<0.05) ## pvalues < to 5%
## [1] 10 11 15 20 21 22 26 32 39 44 47 48 55 60 61 69 113
## [18] 114 115 117 119 120 123 125 155 159 179 184 210 211 219 220 221 268
## [35] 273 278 279 295 297 301 314 317 350 372 377 382 405 407 426 452 455
## [52] 506 508 513 516
Another key issue when performing Rank product tests is related with the multiple testing problem. In fact, since many observations are tested, type-I errors (false positives) will increase.
The FDR, which is the expected proportion of false positives among all tests that are significant, sorts in an ascendant order the p-values and divides them by their percentile rank. The measure used to determine the FDR is the q-value. For the p-value: 0.05 implies that 5% of all tests will result in false positives, instead, for the q-value: 0.05 implies that 5% of significant tests will result in false positives. In this sense the q-value is able to control the number of false discoveries in those tests. For this reason it has the ability of finding truly significant results.
The q-value package must be installed.
library(qvalue)
Here the only input that is need is the p-values obtained before for each observation.
qobj<-qvalue(pvalues)
qvalues<-qobj$qvalues
which(qvalues<0.01)
## integer(0)
which(qvalues<0.05)
## [1] 55 114 211 219 455
In order to resume all the information, let us get an id vector and a data frame that combines the ranks the pvalues and the qvalues.
id<-as.vector(seq(1:n))# vector with the id
tcga.outliers<-as.data.frame(cbind(id,rank_martingale18,rank_martingale22,rank_martingale63,pvalues,qvalues))
Next we can obtain the top 15 outlier observations, with the qvalues sorted in increasing order.
sort.tcga.outliers <- tcga.outliers[order(qvalues) , ]
sort.tcga.outliers[1:15,]
## id rank_martingale18 rank_martingale22 rank_martingale63 pvalues
## 114 114 11 1 25 4.313384e-05
## 55 55 8 3 29 1.389582e-04
## 211 211 5 2 90 1.879408e-04
## 219 219 1 32 54 3.955799e-04
## 455 455 2 13 79 4.793825e-04
## 115 115 14 21 14 1.018179e-03
## 279 279 21 9 19 8.803098e-04
## 377 377 38 10 15 1.434347e-03
## 452 452 7 7 113 1.391558e-03
## 155 155 9 4 232 2.127598e-03
## 221 221 3 16 188 2.302167e-03
## 372 372 6 8 155 1.889920e-03
## 516 516 10 6 147 2.249225e-03
## 26 26 35 5 58 2.593542e-03
## 69 69 73 29 6 3.248457e-03
## qvalues
## 114 0.02230019
## 55 0.03238847
## 211 0.03238847
## 219 0.04956815
## 455 0.04956815
## 115 0.07519983
## 279 0.07519983
## 377 0.08239527
## 452 0.08239527
## 155 0.09155539
## 221 0.09155539
## 372 0.09155539
## 516 0.09155539
## 26 0.09577580
## 69 0.11196350
Notice the observations the with the lowest qvalues lead us to believe that they are influential observations and should taken for further study.
tcga_outliers_time<-cbind(id,data.time.status$time,data.time.status$status,rank_martingale18,rank_martingale22,rank_martingale63,pvalues,qvalues)
sort.tcga.outliers_time <- tcga_outliers_time[order(qvalues) , ]
sort.tcga.outliers_time[1:15,]
## id rank_martingale18 rank_martingale22 rank_martingale63
## [1,] 114 2780 0 11 1 25
## [2,] 55 2967 0 8 3 29
## [3,] 211 3953 0 5 2 90
## [4,] 219 3525 0 1 32 54
## [5,] 455 3532 0 2 13 79
## [6,] 115 2259 0 14 21 14
## [7,] 279 2688 1 21 9 19
## [8,] 377 2078 0 38 10 15
## [9,] 452 5481 0 7 7 113
## [10,] 155 2982 0 9 4 232
## [11,] 221 2788 0 3 16 188
## [12,] 372 3096 0 6 8 155
## [13,] 516 3825 0 10 6 147
## [14,] 26 3622 1 35 5 58
## [15,] 69 2490 1 73 29 6
## pvalues qvalues
## [1,] 4.313384e-05 0.02230019
## [2,] 1.389582e-04 0.03238847
## [3,] 1.879408e-04 0.03238847
## [4,] 3.955799e-04 0.04956815
## [5,] 4.793825e-04 0.04956815
## [6,] 1.018179e-03 0.07519983
## [7,] 8.803098e-04 0.07519983
## [8,] 1.434347e-03 0.08239527
## [9,] 1.391558e-03 0.08239527
## [10,] 2.127598e-03 0.09155539
## [11,] 2.302167e-03 0.09155539
## [12,] 1.889920e-03 0.09155539
## [13,] 2.249225e-03 0.09155539
## [14,] 2.593542e-03 0.09577580
## [15,] 3.248457e-03 0.11196350