THIS R MARKDOWN DOCUMENT HAS BEEN PRODUCED IN THE FRAME OF THE MAS OF SPATIAL ANALYSIS FOR PUBLIC HEALTH AT JHU.

The unique purpose of this document is to permit reproducing the analysis. Code should be streamlined and commented for clarity.

A. Setup

Data sources

Selected Projection for project: NAD_1983_California_Teale_Albers (WKID: 3310 Authority: EPSG)

File name	Source	Overview	URL
california_landcover	USGS	Land cover (especially developed land)	URL
california_dem	ESRI	Digital elevation model	URL
california_wildfires	CalFire	Fire location and surfaces burnt	URL
california_coastline	Natural Earth	Coastline	URL
acs_poverty	US Census Bureau	Population in poverty	URL
acs_characteristics	US Census Bureau	Population characteristics	URL
acs_population	US Census Bureau	Population	URL
area_counties	US Census Bureau	County polygons	URL
area_censusTracts	US Census Bureau	Census tract polygons	URL
point_pm25	EPA	PM2.5 monitors	URL

The pre-processed files as well as the source code to run this analysis can be downloaded at: https://github.com/BRaimbault/2020-JHU-IA

Dependencies

Loading libraries:

{library(rgdal); library(geoR); library(ggplot2); library(dplyr); library(splancs); library(summarytools); library(grid);}

B. Data loading

State boundaries

Loading California state boundaries shapefile:

shp_california <- readOGR("Projected/area_state_single.shp");
plot(shp_california)
title("Data loading: Californa State boundaries")
footnote('Figure 1')

Monitors data

Loading PM2.5 monitors shapefile:

shp_monitors <- readOGR("Monitors/point_pm25_final.shp")
plot(shp_california)
points(shp_monitors,col='red')
title("Data loading: Monitors location")
footnote('Figure 2')

Convert shapefile to dataframe and to geodata:

frm_monitors <- data.frame(
  easting=shp_monitors@data$POINT_X, # Coordinates
  northing=shp_monitors@data$POINT_Y, # Coordinates 
  pm25=shp_monitors@data$meanPM, # PM2.5 concentration
  p_elev=shp_monitors@data$ELEV, # Elevation in m
  p_coast=shp_monitors@data$DISTCOAST, # Distance form the coast in m
  p_wfire=shp_monitors@data$WILDFIRE_p, # % of the county burned due to wildfires
  lc_100=as.character(shp_monitors@data$RS_LC_100), # Landcover majoritarian category in 100mx100m areas  
  lc_1000=as.character(shp_monitors@data$RS_LC_1000), # Landcover majoritarian category in 1000mx1000m areas
  c_popden=shp_monitors@data$COU_POPDEN, # From county level: Population density
  c_poppov=shp_monitors@data$COU_POPPOV, # From county level: % of population in poverty
  c_popblack=shp_monitors@data$COU_POPBLA, # From county level: % of population black or african american
  c_pophisp=shp_monitors@data$COU_POPHIS, # From county level: % of popylation of hispanic or latino origins
  t_popden=shp_monitors@data$TRA_POPDEN, # From census tract level: Population density
  t_poppov=shp_monitors@data$TRA_POPPOV, # From census tract level: % of population in poverty 
  t_popblack=shp_monitors@data$TRA_POPBLA, # From census tract level: % of population black or african american
  t_pophisp=shp_monitors@data$TRA_POPHIS # From census tract level: % of popylation of hispanic or latino origin
)
geo_monitors_pm25 <- as.geodata(frm_monitors,coords.col = 1:2, covar.col=4:16)

Check for duplicate monitor locations:

dup.coords(geo_monitors_pm25)

## NULL

Summary table:

st_css()

print(dfSummary(frm_monitors, plain.ascii = FALSE, graph.magnif = 0.85), max.distinct.values = 5, max.string.width=15, method = "render")

Data Frame Summary

frm_monitors

Dimensions: 100 x 16
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Graph

Valid

Missing

easting [numeric]

Mean (sd) : 6273.1 (191006.7) min < med < max: -354967 < -40110.9 < 424024.6 IQR (CV) : 324360.2 (30.4)

100 distinct values

100 (100%)

0 (0%)

northing [numeric]

Mean (sd) : -179415.7 (249237.4) min < med < max: -593579.9 < -144393.2 < 415195.2 IQR (CV) : 421266.6 (-1.4)

100 distinct values

100 (100%)

0 (0%)

pm25 [numeric]

Mean (sd) : 8.6 (2.9) min < med < max: 3 < 8.1 < 15.6 IQR (CV) : 4.1 (0.3)

100 distinct values

100 (100%)

0 (0%)

p_elev [numeric]

Mean (sd) : 239.8 (398.5) min < med < max: -32 < 90.5 < 2063 IQR (CV) : 211.2 (1.7)

87 distinct values

100 (100%)

0 (0%)

p_coast [numeric]

Mean (sd) : 67109.5 (64424.4) min < med < max: 1128.4 < 45195.5 < 283641.9 IQR (CV) : 98210.9 (1)

100 distinct values

100 (100%)

0 (0%)

p_wfire [numeric]

Mean (sd) : 0.5 (1.1) min < med < max: 0 < 0.1 < 5.7 IQR (CV) : 0.4 (2.2)

38 distinct values

100 (100%)

0 (0%)

lc_100 [character]

1. 0 2. 1 3. 2

20	(	20.0%	)
29	(	29.0%	)
51	(	51.0%	)

100 (100%)

0 (0%)

lc_1000 [character]

1. 0 2. 1 3. 2

20	(	20.0%	)
33	(	33.0%	)
47	(	47.0%	)

100 (100%)

0 (0%)

c_popden [numeric]

Mean (sd) : 257.3 (334.2) min < med < max: 0.7 < 95.2 < 1415.7 IQR (CV) : 387.6 (1.3)

44 distinct values

100 (100%)

0 (0%)

c_poppov [numeric]

Mean (sd) : 16.5 (5) min < med < max: 7.7 < 16.5 < 28.3 IQR (CV) : 6.7 (0.3)

39 distinct values

100 (100%)

0 (0%)

c_popblack [numeric]

Mean (sd) : 4.5 (3.3) min < med < max: 0.1 < 2.7 < 14.1 IQR (CV) : 5.5 (0.7)

30 distinct values

100 (100%)

0 (0%)

c_pophisp [numeric]

Mean (sd) : 37.5 (17.2) min < med < max: 8.4 < 41.2 < 82.7 IQR (CV) : 29 (0.5)

42 distinct values

100 (100%)

0 (0%)

t_popden [numeric]

Mean (sd) : 1778.7 (1807.6) min < med < max: 0.2 < 1344.4 < 8226.6 IQR (CV) : 2564.4 (1)

99 distinct values

100 (100%)

0 (0%)

t_poppov [numeric]

Mean (sd) : 20 (12.3) min < med < max: 1.9 < 17.6 < 56.4 IQR (CV) : 19 (0.6)

85 distinct values

100 (100%)

0 (0%)

t_popblack [numeric]

Mean (sd) : 5.1 (8.3) min < med < max: 0 < 1.4 < 57.6 IQR (CV) : 5.7 (1.6)

58 distinct values

100 (100%)

0 (0%)

t_pophisp [numeric]

Mean (sd) : 41.6 (25.5) min < med < max: 5.4 < 39.6 < 94.7 IQR (CV) : 41.1 (0.6)

93 distinct values

100 (100%)

0 (0%)

Generated by summarytools 0.9.6 (R version 4.0.0)
2020-05-15

Predictions data

Loading prediction grid shapefile:

shp_predictions <- readOGR("Point_Grid/point_grid_final.shp")
plot(shp_california)
points(shp_predictions,pch='.',col='blue')
title("Data loading: Predictors location")
footnote('Figure 4')

Convert shapefile to dataframe:

frm_predictions <- data.frame(
  easting=shp_predictions@data$POINT_X, # Coordinates
  northing=shp_predictions@data$POINT_Y, # Coordinates 
  p_elev=as.numeric(as.character(shp_predictions@data$ELEV)), # Elevation in m
  p_coast=shp_predictions@data$DISTCOAST, # Distance form the coast in m
  lc_1000=as.character(shp_predictions@data$RS_LC_1000), # Landcover majoritarian category in 1000mx1000m areas
  c_popden=shp_predictions@data$COU_POPDEN, # From county level: Population density
  c_poppov=shp_predictions@data$COU_POPPOV, # From county level: % of population in poverty
  c_popblack=shp_predictions@data$COU_POPBLA, # From county level: % of population black or african american
  c_pophisp=shp_predictions@data$COU_POPHIS # From county level: % of popylation of hispanic or latino origin
)

Filter locations where elevation is not available:

frm_predictions_filtered <- frm_predictions[frm_predictions$p_elev != '-9999',]

Convert dataframe to geodata:

geo_predictions <- as.geodata(frm_predictions_filtered,coords.col = 1:2, covar.col=3:9)

Check for duplicate prediction locations:

dup.coords(geo_predictions)

## NULL

Summary table on a sample of 9,999 values:

st_css()

frm_predictions_sample <- frm_predictions_filtered[sample(nrow(frm_predictions_filtered), 999), ]
print(dfSummary(frm_predictions_sample, plain.ascii = FALSE, graph.magnif = 0.85), max.distinct.values = 5, max.string.width=15, method = "render")

Data Frame Summary

frm_predictions_sample

Dimensions: 999 x 9
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Graph

Valid

Missing

easting [numeric]

Mean (sd) : 46460.5 (220837.6) min < med < max: -360101.1 < 4898.9 < 524898.9 IQR (CV) : 357500 (4.8)

178 distinct values

999 (100%)

0 (0%)

northing [numeric]

Mean (sd) : -92429.2 (278658.1) min < med < max: -595327 < -115327 < 444673 IQR (CV) : 452500 (-3)

210 distinct values

999 (100%)

0 (0%)

p_elev [numeric]

Mean (sd) : 856 (775) min < med < max: -71 < 679 < 3905 IQR (CV) : 1038.5 (0.9)

770 distinct values

999 (100%)

0 (0%)

p_coast [numeric]

Mean (sd) : 140226.4 (90193.4) min < med < max: 148.4 < 130574.7 < 359574.3 IQR (CV) : 145888.9 (0.6)

999 distinct values

999 (100%)

0 (0%)

lc_1000 [character]

1. 0 2. 1 3. 2

942	(	94.3%	)
40	(	4.0%	)
17	(	1.7%	)

999 (100%)

0 (0%)

c_popden [numeric]

Mean (sd) : 85.1 (174.5) min < med < max: 0.6 < 40.5 < 1415.7 IQR (CV) : 53.7 (2)

58 distinct values

999 (100%)

0 (0%)

c_poppov [numeric]

Mean (sd) : 17.7 (5.1) min < med < max: 7.7 < 17.8 < 28.3 IQR (CV) : 7.5 (0.3)

50 distinct values

999 (100%)

0 (0%)

c_popblack [numeric]

Mean (sd) : 3.7 (3) min < med < max: 0.1 < 2.5 < 14.1 IQR (CV) : 4.8 (0.8)

37 distinct values

999 (100%)

0 (0%)

c_pophisp [numeric]

Mean (sd) : 35.2 (19.2) min < med < max: 7.2 < 33.1 < 82.7 IQR (CV) : 33.3 (0.5)

56 distinct values

999 (100%)

0 (0%)

Generated by summarytools 0.9.6 (R version 4.0.0)
2020-05-15

C. Exploratory analysis

Large-scale variations

Ploting monitors values:

area_california <- shp_california@polygons[[1]]@Polygons[[1]]@coords
colors <- c(rgb(252,146,114,maxColorValue=255),rgb(251,106,74,maxColorValue=255),rgb(222,45,38,maxColorValue=255),rgb(165,15,21,maxColorValue=255))
par(oma=c(0,0,2,0))
plot(geo_monitors_pm25, borders=area_california, lowess = TRUE,qt.col = colors)
mtext("PM2.5 Monitors", outer=TRUE,  cex=1.2, line=1)
footnote('Figure 6')

Short-scale variations

Estimate the semivariogram form the PM2.5 monitors data and then adjust a model:

# Estimate semivariogram from data
var_monitors_pm25.dot <- variog(geo_monitors_pm25)
names(var_monitors_pm25.dot)
max <- var_monitors_pm25.dot$max.dist
var_monitors_pm25.dot<-variog(geo_monitors_pm25,max.dist=max/2) #restricting to half the actual maximum distance

plot(var_monitors_pm25.dot,xlab="Distance (meters)",ylab="Semivariogram",pty="m", pch=16, ylim=c(0,11))
title("Semivariogram of PM2.5 Data\n(restricted < 1/2 max distance)")
footnote('Figure 7')

D. Modelling

Ordinary Kriging (OK)

Ordinary Kriging - Semivariogram adjustments (WLS & MLE)

Adjust a model to the semivariogram form the PM2.5 monitors data:

plot(var_monitors_pm25.dot,xlab="Distance (meters)",ylab="Semivariogram",pty="m", pch=16, ylim=c(0,11))
title("Semivariogram of PM2.5 Data\n(restricted < 1/2 max distance)")
footnote('Figure 8')

# Adjust semivariogram model
model<-"exponential" #"exponential" #"spherical" #"gaussian"
rangs <- c() #c(0.7e+05,0.8e+05,0.9e+05) #c(1.0e+05,1.1e+05,1.2e+05,1.3e+05,1.4e+05,1.5e+05)
sills <- c() #c(5,10,15) #c(9.0,9.5,10,10.5,11)
nugs <- c() #c(0,1,2) #c(0,0.2,0.4,0.6,0.8,1)
df.MLE<-data.frame(9999,9999,9999,9999,9999)
names(df.MLE)<-c("rang","sill","nug","AIC","BIC")
for (rang in rangs) { for (sill in sills) { for (nug in nugs) {
  psill<-sill-nug
  var_monitors_pm25.fit.MLE<-likfit(geo_monitors_pm25,ini.cov.pars=c(psill,rang),nugget=nug)
  temp <- data.frame(rang,sill,nug,var_monitors_pm25.fit.MLE$AIC,var_monitors_pm25.fit.MLE$BIC)
  names(temp)<-c("rang","sill","nug","AIC","BIC")
  df.MLE <- rbind(df.MLE, temp)
}}}

model<-"exponential" #"exponential" #"spherical" #"gaussian"
rangs <- c() #c(0.7e+05,0.8e+05,0.9e+05) #c(1.0e+05,1.1e+05,1.2e+05,1.3e+05,1.4e+05,1.5e+05)
sills <- c() #c(5,10,15) #c(9.0,9.5,10,10.5,11)
nugs <- c() #c(0,1,2) #c(0,0.2,0.4,0.6,0.8,1)
df.WLS<-data.frame(9999,9999,9999,9999)
names(df.WLS)<-c("rang","sill","nug","value")
for (rang in rangs) { for (sill in sills) { for (nug in nugs) {
  psill<-sill-nug
  var_monitors_pm25.fit.WLS<-variofit(var_monitors_pm25.dot,ini.cov.pars=c(psill,rang),cov.model=model,nugget=nug)
  temp <- data.frame(rang,sill,nug,var_monitors_pm25.fit.WLS$value)
  names(temp)<-c("rang","sill","nug","value")
  df.WLS <- rbind(df.WLS, temp)
}}}

model<-"exponential" #"exponential" #"spherical" #"gaussian"
rang.MLE <- 0.8e+05 #c(1.0e+05,1.1e+05,1.2e+05,1.3e+05,1.4e+05,1.5e+05)
sill.MLE <- 10 #c(9.0,9.5,10,10.5,11)
nug.MLE <- 0 #c(0,0.2,0.4,0.6,0.8,1)
psill.MLE<-sill.MLE-nug.MLE
var_monitors_pm25.fit.MLE<-likfit(geo_monitors_pm25,ini.cov.pars=c(psill.MLE,rang.MLE),nugget=nug.MLE)
rang.WLS <- 0.8e+05 #c(1.0e+05,1.1e+05,1.2e+05,1.3e+05,1.4e+05,1.5e+05)
sill.WLS <- 10 #c(9.0,9.5,10,10.5,11)
nug.WLS <- 0 #c(0,0.2,0.4,0.6,0.8,1)
psill.WLS<-sill.WLS-nug.WLS
var_monitors_pm25.fit.WLS<-variofit(var_monitors_pm25.dot,ini.cov.pars=c(psill.WLS,rang.WLS),cov.model=model,nugget=nug.WLS)

lines(var_monitors_pm25.fit.MLE,col="blue",lty=2)
lines(var_monitors_pm25.fit.WLS,col="red",lty=4)

legend(4.5e5,3,legend=c("MLE","WLS"),lty=2:4, col=c("blue","red"))

We tested a number of starting parameters to fit the variogram model and ended up with the value:

model: “exponential”
start values: rang: 0.8e+05 - sill: 10.0 - nug: 0.0
values adjusted by MLE: rang: 0.8000e+05 - sill: 8.5138 - nug: 0.3328
values adjusted by WLS: rang: 0.7861e+05 - sill: 9.3534 - nug: 0.0000

The MLE adjustement is more heavily affected by the values at larger distance (>500km) than the WLS adjustment.

Universal Kriging (UK)

Universal Kriging - Linear Models

Covariates selection:

Models generation:

linear_model_lc1000 <-lm(pm25~lc_1000,data=frm_monitors)
linear_model_lc100 <-lm(pm25~lc_100,data=frm_monitors)
linear_model_lvlcounty <-lm(pm25~c_popden+c_poppov+c_popblack+c_pophisp,data=frm_monitors)
linear_model_lvltract <-lm(pm25~t_popden+t_poppov+t_popblack+t_pophisp,data=frm_monitors)
linear_model_p_wfire <-lm(pm25~p_wfire,data=frm_monitors)
linear_model_p_coast <-lm(pm25~p_coast,data=frm_monitors)

Landcover resolution: 1,000m2 vs 100m2

{library(jtools); library(huxtable);}
model_names <- c("Model LC 1,000m2", "Model LC 100m2")
stats <- c(N = "nobs", R2 = "r.squared", AIC = "AIC", BIC = "BIC")
export_summs(linear_model_lc1000,linear_model_lc100, error_format = "[{conf.low}, {conf.high}]", model.names = model_names, statistics = stats)

	Model LC 1,000m2	Model LC 100m2
(Intercept)	7.26 ***	7.38 ***
	[5.99, 8.52]	[6.10, 8.66]
lc_10001	1.95 *
	[0.35, 3.56]
lc_10002	1.45
	[-0.06, 2.96]
lc_1001		1.42
		[-0.24, 3.08]
lc_1002		1.56 *
		[0.05, 3.06]
N	100	100
R2	0.06	0.04
AIC	498.55	500.09
BIC	508.97	510.51
* p < 0.001; p < 0.01; * p < 0.05.

Administrative level for population data (density, poverty, black, hispanic): County level vs Census Tract level

{library(jtools); library(huxtable);}
model_names <- c("Model lvl County", "Model lvl CTract")
export_summs(linear_model_lvlcounty,linear_model_lvltract, error_format = "[{conf.low}, {conf.high}]", model.names = model_names, statistics = stats)

	Model lvl County	Model lvl CTract
(Intercept)	3.15 ***	5.85 ***
	[1.32, 4.98]	[4.65, 7.04]
c_popden	0.00
	[-0.00, 0.00]
c_poppov	0.13 *
	[0.02, 0.25]
c_popblack	0.16
	[-0.01, 0.33]
c_pophisp	0.06 ***
	[0.03, 0.10]
t_popden		-0.00
		[-0.00, 0.00]
t_poppov		0.04
		[-0.01, 0.08]
t_popblack		0.06
		[-0.01, 0.14]
t_pophisp		0.04 ***
		[0.02, 0.07]
N	100	100
R2	0.36	0.24
AIC	463.99	481.38
BIC	479.62	497.01
* p < 0.001; p < 0.01; * p < 0.05.

Wildfire and Coast Distance covariates

{library(jtools); library(huxtable);}
model_names <- c("Wildfire", "Coast Distance")
export_summs(linear_model_p_wfire,linear_model_p_coast, error_format = "[{conf.low}, {conf.high}]", model.names = model_names, statistics = stats)

	Wildfire	Coast Distance
(Intercept)	8.77 ***	8.03 ***
	[8.14, 9.40]	[7.20, 8.85]
p_wfire	-0.38
	[-0.92, 0.16]
p_coast		0.00
		[-0.00, 0.00]
N	100	100
R2	0.02	0.03
AIC	500.59	499.13
BIC	508.40	506.95
* p < 0.001; p < 0.01; * p < 0.05.

Main models selection:

Models generation:

linear_model_1 <-lm(pm25~p_elev+p_coast+p_wfire+lc_1000+c_popden+c_poppov+c_popblack+c_pophisp,data=frm_monitors)
linear_model_2 <-lm(pm25~p_elev+p_coast+lc_1000+c_popden+c_poppov,data=frm_monitors)
linear_model_3 <-lm(pm25~p_elev+p_coast+lc_1000+c_popblack+c_pophisp,data=frm_monitors)

Visualization of coefficients and confidence intervals:

{library(jtools); library(huxtable);}
model_names <- c("Model 1 (All)", "Model 2 (Pop Poverty)", "Model 3 (Pop Characteristics)")
export_summs(linear_model_1,linear_model_2,linear_model_3, error_format = "[{conf.low}, {conf.high}]", model.names = model_names, statistics = stats)

	Model 1 (All)	Model 2 (Pop Poverty)	Model 3 (Pop Characteristics)
(Intercept)	3.11 **	3.42 **	3.61 ***
	[1.11, 5.12]	[1.24, 5.60]	[2.07, 5.15]
p_elev	-0.00 *	-0.00 *	-0.00 **
	[-0.00, -0.00]	[-0.00, -0.00]	[-0.00, -0.00]
p_coast	0.00 **	0.00 *	0.00 ***
	[0.00, 0.00]	[0.00, 0.00]	[0.00, 0.00]
p_wfire	-0.27
	[-0.75, 0.20]
lc_10001	1.44 *	1.80 *	1.64 *
	[0.14, 2.74]	[0.39, 3.22]	[0.37, 2.91]
lc_10002	0.49	0.73	0.59
	[-0.81, 1.80]	[-0.71, 2.16]	[-0.68, 1.86]
c_popden	0.00	0.00 *
	[-0.00, 0.00]	[0.00, 0.00]
c_poppov	0.05	0.20 ***
	[-0.07, 0.17]	[0.09, 0.32]
c_popblack	0.19 *		0.25 **
	[0.02, 0.36]		[0.10, 0.40]
c_pophisp	0.06 ***		0.06 ***
	[0.03, 0.10]		[0.04, 0.09]
N	100	100	100
R2	0.47	0.32	0.45
AIC	455.46	474.20	452.53
BIC	484.12	495.04	473.37
* p < 0.001; p < 0.01; * p < 0.05.

model_names <- c("Model 1", "Model 2", "Model 3")
plot_coefs(linear_model_1,linear_model_2,linear_model_3, model.names = model_names, inner_ci_level = .9)
#, coefs = c("Elevation"="elevation", "Landcover"="landcover", "Density"="density", "Poverty"="poverty", "Black Pop"="popblack", "Dist to Coast"="coast")
grid.text(label= 'Non-scaled' , x = unit(0.5,"npc"), y= unit(1, "npc"), just=c("center", "top"), gp=gpar(cex= 1.2, col='black'))
plot_summs(linear_model_1,linear_model_2,linear_model_3,model.names = model_names, scale = TRUE, inner_ci_level = .9)
grid.text(label= 'Scaled' , x = unit(0.5,"npc"), y= unit(1, "npc"), just=c("center", "top"), gp=gpar(cex= 1.2, col='black'))

Additional elements of model evaluation (more information):

#plot(linear_model_2)

#plot(linear_model_3)

Universal Kriging - Residuals

#Store and convert residuals to geodata:
resid_2<-linear_model_2$residuals
geo_monitors_resid_2<-as.geodata(cbind(geo_monitors_pm25$coords,resid_2))
resid_3<-linear_model_3$residuals
geo_monitors_resid_3<-as.geodata(cbind(geo_monitors_pm25$coords,resid_3))

# Estimate semivariograms from data
var_monitors_resid_2.dot <- variog(geo_monitors_resid_2)
max_2 <- var_monitors_resid_2.dot$max.dist
var_monitors_resid_2.dot<-variog(geo_monitors_resid_2,max.dist=max_2/2) #restricting to half the actual maximum distance
var_monitors_resid_3.dot <- variog(geo_monitors_resid_3)
max_3 <- var_monitors_resid_3.dot$max.dist
var_monitors_resid_3.dot<-variog(geo_monitors_resid_3,max.dist=max_3/2) #restricting to half the actual maximum distance

# Plot semivariogram of residuals
plot(var_monitors_resid_2.dot,xlab="Distance (meters)",ylab="Semivariogram",pty="m", pch=4, ylim=c(0,11), col = 'blue')
points(var_monitors_resid_3.dot$u,var_monitors_resid_3.dot$v, pch=1, ylim=c(0,11), col = 'red')
# Original semivariogram for reference
points(var_monitors_pm25.dot$u,var_monitors_pm25.dot$v,xlab="Distance (meters)",ylab="Semivariogram",pty="m", pch=16, ylim=c(0,11), col = 'black')
title("Residual semivariograms for PM2.5 Models 2 & 3\n(restricted < 1/2 max distance)")
footnote('Figure 16')
legend(3.85e5,3,legend=c("PM2.5 data","Model2 residuals","Model3 residuals"), pch=c(16,4,1), col=c("black","blue","red"))

Universal Kriging - Semivariogram adjustments (WLS & MLE)

WLS - Estimate the semivariogram form the PM2.5 monitors results and then adjust a model:

plot(var_monitors_resid_2.dot,xlab="Distance (meters)",ylab="Semivariogram",pty="m", pch=4, ylim=c(0,11), col = 'blue')
title("Residual semivariograms for PM2.5 Models 2 & 3 - WLS\n(restricted < 1/2 max distance)")
footnote('Figure 17')

# Adjust semivariogram models
rang <-6e+04; sill<- 5; nug  <-4; psill<-sill-nug
model<-"exponential" #"exponential" #"spherical" #"gaussian"
var_monitors_resid_2.fit.WLS<-variofit(var_monitors_resid_2.dot,ini.cov.pars=c(psill,rang),cov.model=model,nugget=nug,weights="cressie")
lines(var_monitors_resid_2.fit.WLS, col = 'blue', lty = 2)

rang <-6e+04; sill<- 5; nug  <-4; psill<-sill-nug
model<-"exponential" #"exponential" #"spherical" #"gaussian"
var_monitors_resid_3.fit.WLS<-variofit(var_monitors_resid_3.dot,ini.cov.pars=c(psill,rang),cov.model=model,nugget=nug)

points(var_monitors_resid_3.dot$u,var_monitors_resid_3.dot$v, pch=1, ylim=c(0,11), col = 'red')
lines(var_monitors_resid_3.fit.WLS, col = 'red', lty = 4)

# Original semivariogram for reference
points(var_monitors_pm25.dot$u,var_monitors_pm25.dot$v,xlab="Distance (meters)",ylab="Semivariogram",pty="m", pch=16, ylim=c(0,11), col = 'black')
lines(var_monitors_pm25.fit.WLS, col = 'black')

legend(3.85e5,3,legend=c("PM2.5 data","Model2 residuals","Model3 residuals"),lty="1":"2":"4", col=c("black","blue","red"))

## Warning in "1":"2":"4": numerical expression has 2 elements: only the first used

MLE adjustments:

rang <-6e+04; sill<- 5; nug  <-4; psill<-sill-nug
model<-"exponential" #"exponential" #"spherical" #"gaussian"
var_monitors_2.fit.MLE<-likfit(geo_monitors_pm25,ini.cov.pars=c(psill,rang),nugget=nug,trend=trend.spatial(~p_elev+p_coast+lc_1000+c_popden+c_poppov,geo_monitors_pm25))
var_monitors_3.fit.MLE<-likfit(geo_monitors_pm25,ini.cov.pars=c(psill,rang),nugget=nug,trend=trend.spatial(~p_elev+p_coast+lc_1000+c_popblack+c_pophisp,geo_monitors_pm25))

plot(var_monitors_pm25.dot$u,var_monitors_pm25.dot$v,xlab="Distance (meters)",ylab="Semivariogram",pty="m", pch=16, ylim=c(0,11), col = 'black')
footnote('Figure 18')
points(var_monitors_resid_2.dot$u,var_monitors_resid_2.dot$v, pch=4, ylim=c(0,11), col = 'blue')
lines(var_monitors_2.fit.MLE, col = 'blue', lty=2)
points(var_monitors_resid_3.dot$u,var_monitors_resid_3.dot$v, pch=1, ylim=c(0,11), col = 'red')
lines(var_monitors_3.fit.MLE, col = 'red', lty=4)
lines(var_monitors_pm25.fit.MLE, col = 'black')

title("Residual semivariograms for PM2.5 Models 2 & 3 - MLE\n(restricted < 1/2 max distance)")
legend(3.85e5,3,legend=c("PM2.5 data","Model2 residuals","Model3 residuals"),lty="1":"2":"4", col=c("black","blue","red"))

Models performance comparison

# Output all the RMSE values and compare them 
RMSE <- data.frame(c("OK.WLS", "OK.MLE", "UK.Model2.WLS", "UK.Model2.MLE", "UK.Model3.WLS", "UK.Model3.MLE"), 
                   c(CV.ok.wls, CV.ok.mle, CV.uk.m2.wls, CV.uk.m2.mle, CV.uk.m3.wls, CV.uk.m3.mle))
names(RMSE) <- c("Model", "RMSE")
RMSE

Model	RMSE
OK.WLS	1.77
OK.MLE	1.76
UK.Model2.WLS	1.8
UK.Model2.MLE	1.75
UK.Model3.WLS	1.78
UK.Model3.MLE	1.72

gls2<-cbind(Estimate=round(var_monitors_2.fit.MLE$beta, digits=3), SE = round(sqrt(diag(var_monitors_2.fit.MLE$beta.var)), digits=3), CI= 
             paste("[",round(var_monitors_2.fit.MLE$beta-1.96*sqrt(diag(var_monitors_2.fit.MLE$beta.var)), digits=3),";",round(var_monitors_2.fit.MLE$beta+1.96*sqrt(diag(var_monitors_2.fit.MLE$beta.var)), digits=3),"]"))
row.names(gls2)<-c("Intercept","p_elev","p_coast","lc_1000_1","lc_1000_2","c_popden","c_poppov")
gls2

##           Estimate SE      CI                   
## Intercept "5.636"  "1.291" "[ 3.104 ; 8.167 ]"  
## p_elev    "-0.002" "0.001" "[ -0.004 ; -0.001 ]"
## p_coast   "0"      "0"     "[ 0 ; 0 ]"          
## lc_1000_1 "0.757"  "0.475" "[ -0.175 ; 1.688 ]" 
## lc_1000_2 "0.395"  "0.502" "[ -0.59 ; 1.38 ]"   
## c_popden  "-0.001" "0.001" "[ -0.002 ; 0.001 ]" 
## c_poppov  "0.104"  "0.059" "[ -0.012 ; 0.22 ]"

gls3<-cbind(Estimate=round(var_monitors_3.fit.MLE$beta, digits=3), SE = round(sqrt(diag(var_monitors_3.fit.MLE$beta.var)), digits=3), CI= 
             paste("[",round(var_monitors_3.fit.MLE$beta-1.96*sqrt(diag(var_monitors_3.fit.MLE$beta.var)), digits=3),";",round(var_monitors_3.fit.MLE$beta+1.96*sqrt(diag(var_monitors_3.fit.MLE$beta.var)), digits=3),"]"))
row.names(gls3)<-c("Intercept","p_elev","p_coast","lc_1000_1","lc_1000_2","c_popblack","c_pophisp")
gls3

##            Estimate SE      CI                   
## Intercept  "5.095"  "1.065" "[ 3.008 ; 7.183 ]"  
## p_elev     "-0.002" "0.001" "[ -0.004 ; -0.001 ]"
## p_coast    "0"      "0"     "[ 0 ; 0 ]"          
## lc_1000_1  "0.682"  "0.465" "[ -0.228 ; 1.593 ]" 
## lc_1000_2  "0.257"  "0.486" "[ -0.696 ; 1.21 ]"  
## c_popblack "0.092"  "0.073" "[ -0.051 ; 0.236 ]" 
## c_pophisp  "0.051"  "0.02"  "[ 0.012 ; 0.089 ]"

PM(2.5) Predictions

Preparation

Find out the maximum and minium extents of our data to create a grid of locations to predict at

x_min <- min(area_california[,1])
x_max <- max(area_california[,1])
y_min <- min(area_california[,2])
y_max <- max(area_california[,2])

b_grid1<-expand.grid(easting=seq(x_min-1000,x_max+1000,len=100),northing=seq(y_min-10000,y_max+1000,len=100))
b_grid2<-pip(b_grid1,area_california)
frm_predictions_grid<- as.data.frame(b_grid2)

Plot to make sure that the layers all overlay nicely:

plot(shp_california)
points(frm_predictions_grid, col="blue", pch='.')
points(shp_monitors, col = 'red', pch=1)
title("Layers check 1")
footnote('Figure 20')

plot(shp_california)
points(frm_predictions_filtered, col="blue", pch='.')
points(shp_monitors, col = 'red', pch=1)
title("Layers check 2")
footnote('Figure 21')

Ordniary Kriging perdictions

Predict values for WLS-adjusted model:

predictions.OK.WLS<-krige.conv(geo_monitors_pm25,locations=frm_predictions_grid,krige=krige.control(obj.model=var_monitors_pm25.fit.WLS))

Predict values for MLE-adjusted models:

predictions.OK.MLE<-krige.conv(geo_monitors_pm25,locations=frm_predictions_grid,krige=krige.control(obj.model=var_monitors_pm25.fit.MLE))

Universal Kriging predictions

Grid for prediction:

n_min <- min(frm_predictions_filtered$northing)
e_min <- min(frm_predictions_filtered$easting)
temp <- frm_predictions_filtered %>% filter((((northing - n_min + 5000)/10000)%%1==0 ))
frm_predictions_reduced <- temp %>% filter(((easting - e_min + 5000)/10000)%%1==0 |((easting - 4898.9343)/10000)%%1==0)
grid <- data.frame(frm_predictions_reduced$easting,frm_predictions_reduced$northing)
names(grid) <- c("easting","northing")

Predict values for WLS-adjusted models:

predictions.UK.M2.WLS<-krige.conv(geo_monitors_pm25,locations=grid, krige=krige.control(obj.model=var_monitors_resid_2.fit.WLS, trend.d=trend.spatial(~p_elev+p_coast+lc_1000+c_popden+c_poppov,geo_monitors_pm25), trend.l=trend.spatial(~p_elev+p_coast+lc_1000+c_popden+c_poppov,geo_monitors_pm25)))
predictions.UK.M3.WLS<-krige.conv(geo_monitors_pm25,locations=grid, krige=krige.control(obj.model=var_monitors_resid_3.fit.WLS, trend.d=trend.spatial(~p_elev+p_coast+lc_1000+c_popblack+c_pophisp,geo_monitors_pm25), trend.l=trend.spatial(~p_elev+p_coast+lc_1000+c_popblack+c_pophisp,geo_monitors_pm25)))

Predict values for MLE-adjusted models:

krige.M2 <- krige.control(obj.model=var_monitors_2.fit.MLE, 
                      trend.d = trend.spatial(~p_elev+p_coast+lc_1000+c_popden+c_poppov, geo_monitors_pm25), 
                      trend.l = trend.spatial(~p_elev+p_coast+lc_1000+c_popden+c_poppov, frm_predictions_reduced)
                      )
krige.M3 <- krige.control(obj.model=var_monitors_3.fit.MLE, 
                      trend.d = trend.spatial(~p_elev+p_coast+lc_1000+c_popblack+c_pophisp, geo_monitors_pm25), 
                      trend.l = trend.spatial(~p_elev+p_coast+lc_1000+c_popblack+c_pophisp, frm_predictions_reduced)
                      )

predictions.UK.M2.MLE<-krige.conv(geo_monitors_pm25,locations=grid, krige=krige.M2)

## krige.conv: model with covariates matrix provided by the user
## krige.conv: Kriging performed using global neighbourhood

predictions.UK.M3.MLE<-krige.conv(geo_monitors_pm25,locations=grid, krige=krige.M3)

## krige.conv: model with covariates matrix provided by the user
## krige.conv: Kriging performed using global neighbourhood

WLS-adjusted models visualizations

Visualize the Universal Kriging and compare with Ordinary Kriging - WLS adjustments

# OK - WLS - Predictions
ggplot(frm_predictions_grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=predictions.OK.WLS$predict)) +   # fill the grid with the predictions.OK$predict values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="red", limits=c(3,16)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Ordinary Kriged Predictions - WLS') + labs(fill = "PM2.5\n(μg/m3)") + # add a title
  theme(plot.title = element_text(hjust = 0.5))
footnote('Figure 20')
# OK - WLS - Erros
ggplot(frm_predictions_grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=sqrt(predictions.OK.WLS$krige.var))) +   # fill the grid with the krige.ok$krige.var values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="orange", limits=c(0,3.5)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Ordinary Kriged Standard Errors - WLS') + # add a title
  theme(plot.title = element_text(hjust = 0.5)) + labs(fill = "SE")
footnote('Figure 21')

# UK - Model2 - WLS - Predictions
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=predictions.UK.M2.WLS$predict)) +   # fill the grid with the predictions.UK.M2$predict values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="red", limits=c(3,16)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Predictions - Model 2 - WLS') + # add a title
  theme(plot.title = element_text(hjust = 0.5)) + labs(fill = "PM2.5\n(μg/m3)") # centres the title
footnote('Figure 22')
# UK - Model2 - WLS - Erros
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=sqrt(predictions.UK.M2.WLS$krige.var))) +   # fill the grid with the krige.ok$krige.var values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="orange", limits=c(0,3.5)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Standard Errors - Model 2 - WLS') + # add a title
  theme(plot.title = element_text(hjust = 0.5)) + labs(fill = "SE") # centres the title
footnote('Figure 23')

# UK - Model3 - WLS - Predictions
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=predictions.UK.M3.WLS$predict)) +   # fill the grid with the predictions.UK.M3$predict values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="red", limits=c(3,16)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Predictions - Model 3 - WLS') + # add a title
  theme(plot.title = element_text(hjust = 0.5)) + labs(fill = "PM2.5\n(μg/m3)") # centres the title
footnote('Figure 24')
# UK - Model3 - WLS - Erros
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=sqrt(predictions.UK.M3.WLS$krige.var))) +   # fill the grid with the krige.ok$krige.var values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="orange", limits=c(0,3.5)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Standard Errors - Model 3 - WLS') + # add a title
  theme(plot.title = element_text(hjust = 0.5)) + labs(fill = "SE") # centres the title
footnote('Figure 25')

MLE-adjusted models visualizations

Visualize the Universal Kriging and compare with Ordinary Kriging - MLE adjustments

# OK - MLE - Predictions
ggplot(frm_predictions_grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=predictions.OK.MLE$predict)) +   # fill the grid with the predictions.OK$predict values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="red", limits=c(1,18)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Ordinary Kriged Predictions - MLE') + # add a title
  theme(plot.title = element_text(hjust = 0.5)) + labs(fill = "PM2.5\n(μg/m3)")
footnote('Figure 26')
# OK - MLE - Erros
ggplot(frm_predictions_grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=sqrt(predictions.OK.MLE$krige.var))) +   # fill the grid with the krige.ok$krige.var values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="orange", limits=c(0,3.5)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Ordinary Kriged Standard Errors - MLE') + # add a title
  theme(plot.title = element_text(hjust = 0.5)) + labs(fill = "SE")
footnote('Figure 27')

# UK - Model2 - MLE - Predictions
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=predictions.UK.M2.MLE$predict)) +   # fill the grid with the predictions.UK.M2$predict values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="red", limits=c(1,18)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Predictions - Model 2 - MLE') + # add a title
  theme(plot.title = element_text(hjust = 0.5))  + labs(fill = "PM2.5\n(μg/m3)") # centres the title
footnote('Figure 28')
# UK - Model2 - MLE - Erros
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=sqrt(predictions.UK.M2.MLE$krige.var))) +   # fill the grid with the krige.ok$krige.var values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="orange", limits=c(0,3.5)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Standard Errors - Model 2 - MLE') + # add a title
  theme(plot.title = element_text(hjust = 0.5))  + labs(fill = "SE") # centres the title
footnote('Figure 29')

# UK - Model3 - MLE - Predictions
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=predictions.UK.M3.MLE$predict)) +   # fill the grid with the predictions.UK.M3$predict values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="red", limits=c(1,18)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Predictions - Model 3 - MLE') + # add a title
  theme(plot.title = element_text(hjust = 0.5))  + labs(fill = "PM2.5\n(μg/m3)") # centres the title
footnote('Figure 30')
# UK - Model3 - MLE - Erros
ggplot(grid, aes(x=easting, y=northing)) +  # signify that you are predicting for the entire grid
  geom_tile(aes(fill=sqrt(predictions.UK.M3.MLE$krige.var))) +   # fill the grid with the krige.ok$krige.var values
  coord_equal() + # indicate that the X and Y coordinates you are plotting are on the same scale
  scale_fill_gradient(low = "white", high="orange", limits=c(0,3.5)) + # set the color scale to range from red to yellow
  geom_point(data = shp_monitors@data, aes(x = POINT_X, y = POINT_Y), pch=1) + # add the PM2.5 Monitor locations to the plot
  geom_polygon(data = fortify(shp_california), aes(x = shp_california@polygons[[1]]@Polygons[[1]]@coords[,1], y = shp_california@polygons[[1]]@Polygons[[1]]@coords[,2], group = group), fill = NA, colour = 'black') + # add the states
  ggtitle('Universally Kriged Standard Errors - Model 3 - MLE') + # add a title
  theme(plot.title = element_text(hjust = 0.5))  + labs(fill = "SE") # centres the title
footnote('Figure 31')
output<- cbind(grid, predictions.UK.M2.MLE$predict, sqrt(predictions.UK.M2.MLE$krige.var))
write.csv(output, 'output.csv')

Export results

output<- cbind(frm_predictions_grid, predictions.OK.MLE$predict, sqrt(predictions.OK.MLE$krige.var))
write.csv(output, 'output_OK.csv')
output<- cbind(grid, predictions.UK.M2.MLE$predict, sqrt(predictions.UK.M2.MLE$krige.var))
write.csv(output, 'output_UK_M2.csv')
output<- cbind(grid, predictions.UK.M3.MLE$predict, sqrt(predictions.UK.M3.MLE$krige.var))
write.csv(output, 'output_UK_M3.csv')

Characterizing the spatial distribution of annual particulate matter across California

Bruno Raimbault - raimbault.bruno@gmail.conm

May 14, 2020

A. Setup

Data sources

Dependencies

B. Data loading

State boundaries

Monitors data

Data Frame Summary

frm_monitors

Predictions data

Data Frame Summary

frm_predictions_sample

C. Exploratory analysis

Large-scale variations

Short-scale variations

D. Modelling

Ordinary Kriging (OK)

Ordinary Kriging - Semivariogram adjustments (WLS & MLE)

Universal Kriging (UK)

Universal Kriging - Linear Models

Covariates selection:

Main models selection:

Universal Kriging - Residuals

Universal Kriging - Semivariogram adjustments (WLS & MLE)

Models performance comparison

PM(2.5) Predictions

Preparation

Ordniary Kriging perdictions

Universal Kriging predictions

WLS-adjusted models visualizations

MLE-adjusted models visualizations

Export results