In today’s world, understanding and predicting mortality rates in different countries is of paramount importance for public health officials, policymakers, and researchers. Mortality rates are not only indicators of a country’s health status but also reflect the effectiveness of its healthcare systems, socio-economic conditions, and various other factors.
The global COVID-19 pandemic has underscored the critical need for accurate mortality rate predictions to effectively allocate resources, implement preventive measures, and devise policies aimed at minimizing the impact of future health crises. However, mortality rates are influenced by a multitude of complex factors, including demographics, healthcare infrastructure, lifestyle choices, and environmental conditions.
In this context, the development of regression models to predict mortality rates in countries becomes a significant endeavor. By leveraging historical data and incorporating relevant predictors, such as population demographics, economic indicators, and healthcare expenditures, we can strive to create robust regression models capable of providing valuable insights into mortality trends and facilitating proactive interventions.
The ultimate goal of this research is to build a regression model that accurately predicts mortality rates in different countries. Such a model could empower decision-makers with actionable insights to enhance public health policies, allocate resources efficiently, and ultimately improve the well-being and longevity of populations worldwide.
Dataset Overview: Adult Mortality Rate (2019-2021) The dataset provides comprehensive information on adult mortality rates across different countries from 2019 to 2021. Below is a summary table detailing key variables and their descriptions:
Adult mortality rate (2019-2021).csv
Variable | Description |
---|---|
Countries | Country of study. |
Continent | Continent location of the country. |
Average_Population | Average population of the country under study for 2019-2021 in thousands. |
Average_GDP | Average GDP of the country under study for 2019-2021 in millions of dollars. |
Average_GDP_per_capita | Average GDP per capita of the country under study for 2019-2021 in dollars. |
Average_HEXP | Health Expenditure Per Capita in the country under study in dollars. |
Development_level | Level of development of the state under study (calculated by GDP per capita of the country). Please note that in this dataset we calculate this indicator only by calculating GDP per capita! Despite the fact that the United Nations (UN) does not have an unambiguous classification of countries into developed, developing and backward based on only one indicator, such as the amount of GDP per capita. It uses a wider range of economic, social and quality indicators to determine the level of development of countries. |
AMR_female | Average mortality of adult women in the country under study (per 1000 adult women per year) for 2019-2023. |
AMR_male | Average mortality of adult men in the country under study (per 1000 adult men per year) for 2019-2023. |
Average_CDR | Average crude mortality rate for 2019–2021 in the country under study. |
`Adult_mortality` <- read.csv("C:/Users/gabri/Proyectos R/Adult Mortality Rate/Adult mortality rate (2019-2021).csv", stringsAsFactors=TRUE)
#View of Structure
glimpse(Adult_mortality)
## Rows: 156
## Columns: 10
## $ Countries <fct> Afghanistan, Albania, Algeria, Ango…
## $ Continent <fct> Asia, Europe, Africa, Africa, North…
## $ Average_Pop.thousands.people. <dbl> 38947.06, 2834.57, 43445.00, 33428.…
## $ Average_GDP.M.. <dbl> 17995.64, 16263.16, 160325.54, 7214…
## $ Average_GDP_per_capita... <dbl> 462.05, 5737.44, 3690.31, 2158.28, …
## $ Average_HEXP... <dbl> 80.53, 420.17, 215.53, 61.29, 868.6…
## $ Development_level <fct> Short, Short, Short, Short, Average…
## $ AMR_female.per_1000_female_adults. <dbl> 204.85, 53.54, 70.85, 218.03, 58.76…
## $ AMR_male.per_1000_male_adults. <dbl> 322.17, 103.55, 102.21, 316.81, 103…
## $ Average_CDR <dbl> 7.08, 10.20, 4.78, 7.88, 6.06, 8.35…
Our dataset comprises a total of 156 observations
and
10 columns
, featuring variables of only
two types
:
Fct
(Factor) : Comprising 3 variablesDbl
(Double) : Comprising 7 variables#Check Missing Values
colSums(is.na(Adult_mortality))
## Countries Continent
## 0 0
## Average_Pop.thousands.people. Average_GDP.M..
## 0 0
## Average_GDP_per_capita... Average_HEXP...
## 0 0
## Development_level AMR_female.per_1000_female_adults.
## 0 0
## AMR_male.per_1000_male_adults. Average_CDR
## 0 0
Great! We have 0 missing values
in this Dataset
#Quick view of Stadistical infomration about numerics variables
describe(
Adult_mortality %>%
select_if(is.numeric)
)
# Boxplots for better visualization of statistical information
ggplot(Adult_mortality, aes(x = "", y = Average_Pop.thousands.people.)) +
geom_boxplot() +
labs(title = "Boxplot of Average Population (thousands of people)",
x = NULL,
y = "Average Population (thousands)")
ggplot(Adult_mortality, aes(x = "", y = Average_GDP.M..)) +
geom_boxplot() +
labs(title = "Boxplot of Average GDP (in millions of dollars)",
x = NULL,
y = "Average GDP (M$)")
ggplot(Adult_mortality, aes(x = "", y = Average_GDP_per_capita...)) +
geom_boxplot() +
labs(title = "Boxplot of Average GDP per Capita",
x = NULL,
y = "Average GDP per Capita")
ggplot(Adult_mortality, aes(x = "", y = Average_HEXP...)) +
geom_boxplot() +
labs(title = "Boxplot of Average Health Expenditure Per Capita",
x = NULL,
y = "Average Health Expenditure Per Capita ($)")
ggplot(Adult_mortality, aes(x = "", y = AMR_female.per_1000_female_adults.)) +
geom_boxplot() +
labs(title = "Boxplot of AMR (per 1000 Female Adults)",
x = NULL,
y = "AMR (per 1000 Female Adults)")
ggplot(Adult_mortality, aes(x = "", y = AMR_male.per_1000_male_adults.)) +
geom_boxplot() +
labs(title = "Boxplot of AMR (per 1000 Male Adults)",
x = NULL,
y = "AMR (per 1000 Male Adults)")
ggplot(Adult_mortality, aes(x = "", y = Average_CDR)) +
geom_boxplot() +
labs(title = "Boxplot of Average Crude Mortality Rate",
x = NULL,
y = "Average Crude Mortality Rate")
In most cases, we observe two observations with very high values
across all variables. These observations correspond to
China
and India
, which, due to their immense
population, tend to skew the data. However, it’s essential to retain
these countries for our analysis and linear regression model, as their
data is relevant and representative of the global population
landscape.
Observemos algunas visualizaciones :
# Bar plot for Average GDP per capita grouped by Continent
ggplot(Adult_mortality, aes(x = reorder(Continent, Average_GDP_per_capita...),
y = Average_GDP_per_capita...,
fill = Continent)) +
geom_bar(stat = "identity", show.legend = FALSE) +
coord_flip() +
labs(title = "Average GDP per Capita by Continent",
x = "Continent",
y = "Average GDP per Capita")
# Bar plot for Average Population grouped by Continent
ggplot(Adult_mortality, aes(x = reorder(Continent, Average_Pop.thousands.people.),
y = Average_Pop.thousands.people.,
fill = Continent)) +
geom_bar(stat = "identity", show.legend = FALSE) +
coord_flip() +
labs(title = "Average Population by Continent",
x = "Continent",
y = "Average Population (thousands)")
# Density plot for Health Expenditure Per Capita grouped by Development Level
ggplot(Adult_mortality, aes(x = Average_HEXP..., fill = Development_level)) +
geom_density(alpha = 0.6) +
labs(title = "Health Expenditure Per Capita Density by Development Level",
x = "Health Expenditure Per Capita",
fill = "Development Level")
# Density plot for Adult Mortality Rate (Female) grouped by Development Level
ggplot(Adult_mortality, aes(x = AMR_female.per_1000_female_adults., fill = Development_level)) +
geom_density(alpha = 0.6) +
labs(title = "Adult Mortality Rate (Female) Density by Development Level",
x = "AMR Female (per 1000 Female Adults)",
fill = "Development Level")
# Density plot for Adult Mortality Rate (Male) grouped by Development Level
ggplot(Adult_mortality, aes(x = AMR_male.per_1000_male_adults., fill = Development_level)) +
geom_density(alpha = 0.6) +
labs(title = "Adult Mortality Rate (Male) Density by Development Level",
x = "AMR Male (per 1000 Male Adults)",
fill = "Development Level")
# Boxplot of Avg Crude Mortality by Continent
ggplot(Adult_mortality, aes(reorder(Continent, Average_CDR), Average_CDR, color = Continent)) +
geom_boxplot(show.legend = FALSE) +
coord_flip() +
labs(title = "Boxplot of Average Crude Mortality Rate by Continent",
x = "Continent",
y = "Average Crude Mortality Rate")
# Boxplot of Avg Crude Mortality by Continent
ggplot(Adult_mortality, aes(reorder(Development_level, Average_CDR), Average_CDR, color = Development_level)) +
geom_boxplot(show.legend = FALSE) +
coord_flip() +
labs(title = "Boxplot of Average Crude Mortality Rate by Development_level",
x = "Development_level",
y = "Average Crude Mortality Rate")
#Correlation between Variables
# Calculate correlation matrix
correlation <- cor(
Adult_mortality %>%
select_if(is.numeric)
)
# Melt correlation matrix for plotting
correlation_data <- melt(correlation)
# Plot correlation heatmap
ggplot(correlation_data, aes(Var1, Var2, fill = value))+
geom_tile()+
scale_fill_gradient2(low = "lightblue", high = "#F94C10", mid = "#F8DE22",
midpoint = 0, limit = c(-1,1),
name = "Correlation") +
theme_minimal() +
geom_text(aes(label = round(value, 2)))+
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Correlation Heatmap", x = "", y = "")
# Create linear regression model
Regresion_model <- lm(Average_CDR ~ Average_GDP.M.. + AMR_male.per_1000_male_adults.+ Development_level+ Continent, data = Adult_mortality)
# Display model summary
summary(Regresion_model)
##
## Call:
## lm(formula = Average_CDR ~ Average_GDP.M.. + AMR_male.per_1000_male_adults. +
## Development_level + Continent, data = Adult_mortality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0593 -1.0887 -0.2009 1.0670 6.6111
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.652e+00 8.025e-01 5.796 4.02e-08 ***
## Average_GDP.M.. 2.883e-07 1.188e-07 2.426 0.0165 *
## AMR_male.per_1000_male_adults. 1.306e-02 2.431e-03 5.371 3.03e-07 ***
## Development_levelHigh -1.464e+00 5.963e-01 -2.455 0.0153 *
## Development_levelShort -4.363e-01 4.830e-01 -0.903 0.3679
## ContinentAsia -7.679e-01 5.569e-01 -1.379 0.1701
## ContinentAustralia -3.240e-01 6.573e-01 -0.493 0.6229
## ContinentEurope 5.459e+00 6.050e-01 9.022 9.75e-16 ***
## ContinentNorth America 5.286e-01 6.076e-01 0.870 0.3857
## ContinentSouth America 7.510e-01 7.364e-01 1.020 0.3095
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.961 on 146 degrees of freedom
## Multiple R-squared: 0.5847, Adjusted R-squared: 0.5591
## F-statistic: 22.84 on 9 and 146 DF, p-value: < 2.2e-16
Now let’s check the validity of our model:
# Add predictions and residuals to the data
Ajust <- Adult_mortality %>%
add_predictions(Regresion_model) %>%
add_residuals(Regresion_model)
# Plot diagnostic plots
par(mfrow = c(2,2))
plot(Regresion_model)
Checking the normality of the model:
# Jarque-Bera test for normality
jarque.bera.test(Ajust$resid)
##
## Jarque Bera Test
##
## data: Ajust$resid
## X-squared = 15.171, df = 2, p-value = 0.0005078
# Summary of residuals
summary(Ajust$resid)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.0593 -1.0887 -0.2009 0.0000 1.0670 6.6111
Checking the Heteroscedasticity of the model :
# Breusch-Pagan test for heteroscedasticity
bptest(Regresion_model)
##
## studentized Breusch-Pagan test
##
## data: Regresion_model
## BP = 15.796, df = 9, p-value = 0.07127
# Heteroscedasticity-robust coefficient test
coeftest(Regresion_model, vcov = vcovHC(Regresion_model, type = "HC2"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6515e+00 7.7820e-01 5.9773 1.663e-08 ***
## Average_GDP.M.. 2.8833e-07 3.4669e-07 0.8316 0.40697
## AMR_male.per_1000_male_adults. 1.3058e-02 2.5455e-03 5.1297 9.086e-07 ***
## Development_levelHigh -1.4640e+00 7.7829e-01 -1.8810 0.06196 .
## Development_levelShort -4.3627e-01 5.8916e-01 -0.7405 0.46019
## ContinentAsia -7.6787e-01 4.4690e-01 -1.7182 0.08788 .
## ContinentAustralia -3.2395e-01 5.4732e-01 -0.5919 0.55484
## ContinentEurope 5.4589e+00 7.7925e-01 7.0053 8.383e-11 ***
## ContinentNorth America 5.2863e-01 5.5418e-01 0.9539 0.34171
## ContinentSouth America 7.5097e-01 5.2250e-01 1.4373 0.15278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1