DATA150_Serena

Literature Review

Spread of infectious disease in Africa

Word count: 2230

Introduction

At the beginning of the 21st century, infectious diseases caused at least 10 million deaths each year, accounting for nearly a quarter of deaths worldwide. Generally recognized factors in the emergence of infectious diseases include microbial adaptation, human susceptibility to infection, human demographics and behavior, climate and weather, changing ecosystems, international travel and commerce, breakdown of public health measures, poverty and social inequality, lack of political will, and intent to harm (Fenollar, F., & Mediannikov, O. 2018). Most of these factors are present in Africa, which helps explain why many infectious diseases originate there. In this literature review, I will focus on the spread of indirect-contact diseases in Africa. Vector-borne diseases are one typical type of infectious disease: they are caused by parasites, bacteria or viruses and are transmitted by vectors such as insects. They account for more than 17% of all infectious diseases and cause more than 700,000 deaths annually. The burden of these diseases is highest in tropical and subtropical regions, and they particularly affect the poorest populations, where hygiene issues and insect problems both contribute to the spread of infection.

According to Amartya Sen, development means expanding people’s freedoms and removing sources of unfreedom from society. If problems such as the lack of free access to clean water and uncontrolled parasites cannot be solved, not only will the well-being of African people continue to suffer, but the diseases, once they spread to other parts of the world, will also threaten people everywhere. Scientists have applied methods such as the gravity model and the radiation model to model human mobility and disease transmission and so better understand these infections. Some authors also use machine learning algorithms such as XGBoost, K-NN, Decision Tree, Random Forest, ExtraTree, and AdaBoost.

Spread of infectious diseases due to climate change

Transmission of vector-borne infectious diseases has three determinants: the pathogen, the vector and the transmission environment. It follows that a favorable environment is essential for pathogens to survive, reproduce and eventually be transmitted. However, it is hard to determine the exact relationship between climate change and infectious diseases, given the variety of climate variables, disease types and environmental settings, in addition to spatial segregation and time-lag effects. Geographically, areas with abnormally high temperatures have received more research attention. Unfortunately, there is less research on the regions that suffer most from climate variability and extreme events. In response to this problem, Lu Liang and Peng Gong carried out their study and categorized their data based on Webber’s infectious disease classification system. The effect of temperature variability is visualized in a spatial map, and a climate-disease-method-scale thematic network is built from niche modelling. Failure to invest in health adaptation or in mitigating climate change may leave communities and countries less prepared, thereby increasing the likelihood of serious adverse consequences. Only by comprehensively understanding the association between infectious disease and climate change, and by identifying the knowledge gaps from a spatial-temporal perspective, can people’s anxiety about infectious disease be alleviated.

Case study of Namibia where human mobility data are available

Apart from climate, human mobility is another major factor in the spread of disease. In some African countries where mobile phones are used by nearly all citizens, call detail records (CDRs) can serve as a fairly reliable indicator of migration at multiple temporal and spatial scales. For example, Namibia, in southern Africa, is a country where one major network operator holds more than 76% of the market share and provides network coverage to 75% of the population. Once the mobile phone data are shown to be useful, predicting human movement in Namibia becomes much easier. To this end, researchers used three non-spatial regression approaches: CDR-based linear models, gravity-type spatial interaction models (GTSIMs), and GTSIMs that include CDR-derived variables. CDR-based linear models combine CDR-derived data with the covariates used in the gravity model (including potential migration-related demographic, socioeconomic, environmental, and geographic variables such as labor force participation, marital status, and educated population by region). Pearson correlation coefficients were initially used to assess how well the CDRs represent the census migration data. Then, in the simple gravity-type spatial interaction model, the flow of population between two places is taken to be proportional to their total populations and inversely related to the distance between them.
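
The two ideas in the paragraph above, checking CDR-derived flows against census flows with a Pearson correlation and relating flow to population and distance in a gravity form, can be illustrated with a minimal sketch. The flow values, populations, distances, exponents and function names below are all hypothetical and are not taken from the cited study, whose exact model specification may differ.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def gravity_flow(pop_origin, pop_dest, distance_km,
                 k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Basic gravity form: flow grows with both populations and
    shrinks with distance. k, alpha, beta, gamma are assumed values."""
    return k * (pop_origin ** alpha) * (pop_dest ** beta) / (distance_km ** gamma)

# Hypothetical migration counts for five region pairs.
cdr_flows = [120, 300, 80, 450, 210]
census_flows = [100, 280, 90, 500, 190]
r = pearson_r(cdr_flows, census_flows)

# Same pair of regions, placed 100 km and 200 km apart.
near = gravity_flow(50_000, 200_000, 100)
far = gravity_flow(50_000, 200_000, 200)

print(f"Pearson r between CDR and census flows: {r:.3f}")
print(f"Flow ratio near/far: {near / far:.1f}")
```

In practice the exponents would be estimated from data rather than fixed, usually by taking logarithms and fitting a regression.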

Given Namibia’s small population, and to prevent overfitting, the researchers only tested their models by replacing the total population variable with the percentage of the population living in urban areas and the amount of precipitation at the places of origin and destination. The researchers fitted the GTSIM as a logistic regression function, and other CDR variables were tested to evaluate how CDR-derived migration data can improve the performance of the gravity model (Lai, S., Erbach-Schoenberg, E.z., 2019). After each model was fitted to the census statistics, the dataset was split to calculate indicators such as RMSE and R-squared. The model with the smallest RMSE value is the one with the best performance. The results confirmed that CDRs are a good indicator for estimating migration. Once human movement can be predicted, the transmission of pathogens can be better monitored and even prevented in advance. Before the epidemic season arrives, crowded places, especially confined spaces, can be ventilated and disinfected. If a disease has unfortunately already started to spread, health bureau officers can quickly locate its source and control access to places where the epidemic is severe, so as to block further spread.
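
The two evaluation indicators named above can be computed as in the following sketch. The observed and predicted migration counts and the model names are invented for illustration; they are not results from the cited study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out census flows and predictions from two candidate models.
observed = [100, 280, 90, 500, 190]
predictions = {
    "model_a": [110, 270, 100, 480, 200],
    "model_b": [150, 200, 150, 400, 250],
}

scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # smallest RMSE wins

for name, pred in predictions.items():
    print(name, f"RMSE={scores[name]:.2f}", f"R2={r_squared(observed, pred):.3f}")
print("Best model by RMSE:", best)
```

Because RMSE is the square root of MSE, ranking models by either quantity gives the same order.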

Migration prediction in places where human mobility data are not available

For some countries in West Africa, however, human mobility data are not readily available, so scientists have to build models that estimate migration patterns by analyzing the general fluxes between districts. The first model used is the gravity model. In this model, researchers assume a log-linear relationship between the relative flow of regional population and the distance between regions, so the model emphasizes the attractiveness of large population centers. The radiation model also considers the sizes of the origin and destination populations and the distance between them, but it additionally accounts for the pull of other populations within the same radius. Because it assumes that areas compete with each other for travelers, the radiation model reflects likely commuting patterns. The adjacency network encodes the number of regional boundaries that an individual needs to cross when moving from one area to another, so this indicator reflects the influence of national and subnational borders on regional movements. Each of these models has proven useful, depending on local conditions, and can infer daily commuting patterns, long-term movements, and overall population diffusion processes. The researchers used these three indicators and their interactions to capture possible unexpected effects that the indicators alone cannot describe, and included them as terms in the disease transmission models. There are many other possible movement models, such as Markov models, but they usually require high-resolution information about individuals’ trajectories (Kraemer, M.U.G., Golding, N., Bisanzio, D. et al. 2019).
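
To make the difference from the gravity model concrete, here is a small sketch of the commonly published form of the radiation model, in which the flow from an origin with population m to a destination with population n is reduced by the population s living closer to the origin than the destination. It is an assumption that the cited paper uses exactly this form; the coordinates, populations, trip total and function name are hypothetical.

```python
import math

def radiation_flow(origin, dest, locations, trips_from_origin):
    """Expected trips origin -> dest under the radiation model.

    locations maps a name to (x, y, population). s is the population
    within the origin-destination distance of the origin, excluding
    the origin and the destination themselves.
    """
    ox, oy, m = locations[origin]
    dx, dy, n = locations[dest]
    radius = math.hypot(dx - ox, dy - oy)
    s = sum(
        pop
        for name, (x, y, pop) in locations.items()
        if name not in (origin, dest) and math.hypot(x - ox, y - oy) <= radius
    )
    return trips_from_origin * (m * n) / ((m + s) * (m + n + s))

# Hypothetical districts on a line; C lies between A and B.
locations = {
    "A": (0.0, 0.0, 1000),
    "B": (10.0, 0.0, 2000),
    "C": (5.0, 0.0, 3000),
}

flow_ab = radiation_flow("A", "B", locations, trips_from_origin=100)
flow_ac = radiation_flow("A", "C", locations, trips_from_origin=100)
print(f"A->B: {flow_ab:.2f} trips, A->C: {flow_ac:.2f} trips")
```

In this toy layout the intervening district C absorbs most of the travelers leaving A, which is the competition effect described above; a gravity model with the same inputs would depend only on the two endpoint populations and their distance.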

Additionally, a two-stage model is used to simulate the impact of human movement on the geographic distribution and propagation speed of Ebola virus disease (EVD) within and between the three core countries. The first stage is the invasion model, a logistic regression model that estimates the probability that one or more cases will appear in a previously disease-free region. To test the predictive power of this model in real-life scenarios, the model is refitted for each week of the epidemic using only the data available from the previous weeks. The second stage is a disease transmission model built for all districts that report one or more cases within a two-week period. To choose covariates for the transmission models, model selection is carried out in R using the Akaike information criterion (AIC) and likelihood-ratio tests (LRT). The performance of the model is evaluated by comparing the number of cases predicted two weeks earlier (in sample) with the number of cases observed. Mobile phone data from neighboring countries are used to test the entire model. The analysis includes re-evaluating the best country-specific mixing coefficients and fitting a country- and region-based transmission model. AIC and R-squared are used to evaluate model performance (Kraemer, M.U.G., Golding, N., Bisanzio, D. et al. 2019).
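
As an illustration of AIC-based selection for a model like the invasion model, the sketch below scores two sets of predicted invasion probabilities against observed outcomes. The study itself worked in R; this Python version, the outcome vector, the probabilities and the parameter counts are all hypothetical and only show how the criterion trades fit against complexity.

```python
import math

def bernoulli_log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes given predicted probabilities,
    as produced by a fitted logistic regression."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# 1 = district reported its first case that week, 0 = stayed disease-free.
outcomes = [0, 0, 1, 1, 0, 1]

# Hypothetical fitted probabilities from a 2-parameter and a 4-parameter model.
p_small = [0.20, 0.30, 0.70, 0.80, 0.40, 0.60]
p_large = [0.20, 0.25, 0.70, 0.80, 0.35, 0.65]

aic_small = aic(bernoulli_log_likelihood(outcomes, p_small), n_params=2)
aic_large = aic(bernoulli_log_likelihood(outcomes, p_large), n_params=4)

preferred = "small" if aic_small < aic_large else "large"
print(f"AIC small={aic_small:.2f}, large={aic_large:.2f}, preferred={preferred}")
```

Here the larger model fits slightly better, but not by enough to justify its two extra parameters, so the smaller model is preferred.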

Response to epidemic outbreaks in places where relevant data are complete

Human society has already gone through several major outbreaks of infectious disease, for example the influenza pandemic in 2009, the Middle East Respiratory Syndrome coronavirus, the Ebola virus outbreak in West Africa, and the ongoing COVID-19 pandemic. An important feature of the modern response to epidemics is the increasing emphasis on using all available data to inform the response in real time and allow evidence-based decision making (Polonsky Jonathan A., Kamvar Zhian N., et al. 2019). In the article “Outbreak analytics: a developing data science for informing the response to emerging pathogens”, researchers divide “outbreak data” into case data, background data and intervention data. Case data include the description of reported cases, and intervention data refer to actions taken or decisions made to intervene in a disease outbreak. Background data, which I focus on in this part of the literature review because they are more closely related to data science and human development, include demographic information, movement data, epidemiological data, etc. In measuring transmissibility, one factor is the growth rate, which is estimated using a log-linear model. Other mathematical and statistical models are used to forecast future incidence for advocacy and planning purposes, while mechanistic or simulation models include a clearer representation of the different factors that may affect transmission. In addition to maps, censuses, serological surveys and genetic databases, natural history data from past epidemics (such as key delay distributions and transmissibility) can also stand in for real-time estimates, especially in the early stages. More effort is needed to organize open data sources, evaluate their quality, and make them widely useful to the community.
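
The log-linear growth-rate estimate mentioned above amounts to regressing the logarithm of incidence on time; the slope is the growth rate and ln(2) divided by it is the doubling time. The sketch below uses synthetic, exactly exponential case counts, so the numbers are illustrative assumptions rather than outbreak data, and the function name is hypothetical.

```python
import math

def fit_log_linear(days, incidence):
    """Ordinary least squares fit of ln(incidence) = intercept + rate * day.
    Requires every incidence value to be strictly positive."""
    logs = [math.log(c) for c in incidence]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    rate = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
        / sum((t - mean_t) ** 2 for t in days)
    )
    intercept = mean_y - rate * mean_t
    return intercept, rate

# Synthetic incidence growing at 0.2 per day from 2 initial cases.
days = [0, 1, 2, 3, 4, 5]
incidence = [2 * math.exp(0.2 * t) for t in days]

intercept, growth_rate = fit_log_linear(days, incidence)
doubling_time = math.log(2) / growth_rate

print(f"growth rate r = {growth_rate:.3f} per day")
print(f"doubling time = {doubling_time:.2f} days")
```

With real data, days with zero reported cases need separate handling, since their logarithm is undefined.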

Response to disease outbreaks in places where the dataset is not complete

In other cases, when data are not complete (for example in the cholera dataset from Tanzania), machine learning models are applied to learn a useful function for predicting values. Since no single algorithm works best for every problem, owing to factors such as the size or structure of the dataset, the researchers selected several algorithms (i.e. XGBoost, K-Nearest Neighbors, Decision Tree, Random Forest, ExtraTree, AdaBoost, and Linear Discriminant Analysis) to process the data. Given the data imbalance problem, the researchers performed oversampling using the Adaptive Synthetic Sampling Approach, which restores the sampling balance and reduces the bias introduced by the original imbalanced data distribution. Principal component analysis was also carried out to reduce the high dimensionality of the original dataset. Among all these models, XGBoost and K-NN performed best in terms of their sensitivity, specificity, and balanced-accuracy metrics (Judith Leo, Edith Luhanga, 2019). In the example of cholera infection in Tanzania, the data analysis shows that the number of cholera patients increased in August, September and April compared with other months. In addition, temperatures between 22°C and 32°C, rainfall above 50 mm, and humidity above 75% are conducive to the occurrence of cholera.
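
The three metrics used to compare the classifiers can be computed from a confusion matrix as in the sketch below. The labels and predictions are made up (1 = cholera case, 0 = no case) and only illustrate why plain accuracy can look good on imbalanced data while balanced accuracy does not; the function name is hypothetical.

```python
def imbalance_metrics(y_true, y_pred):
    """Sensitivity, specificity, balanced accuracy and plain accuracy
    for binary labels where 1 is the positive (minority) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # share of true cases that were detected
    specificity = tn / (tn + fp)   # share of non-cases correctly rejected
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "accuracy": (tp + tn) / len(y_true),
    }

# Hypothetical imbalanced sample: 2 cases among 10 records.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = imbalance_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The classifier in this toy example misses half of the true cases yet still reaches 80% accuracy, which is why the study reports sensitivity, specificity and balanced accuracy instead.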

The study demonstrates the value of the XGBoost algorithm: it is an implementation of gradient-boosted decision trees designed for efficiency, flexibility and portability, and it can improve both execution speed and model performance. It is also well suited to detecting anomalies in settings where data are often highly imbalanced, such as DNA sequencing, credit card transactions, and network security. This research has improved our understanding of how Tanzania’s healthcare system and policies could be strengthened, and it shows researchers how to deal with imbalanced datasets that reduce the predictive performance of a model, as well as the role of oversampling and machine learning strategies in healthcare data. Deploying high-quality data collection and machine learning technology will significantly help to manage the complexity of practical problems such as data-driven analysis, cholera prediction and eradication methods, and large-scale epidemics.

Reflection

Of all the human development goals, health and well-being is the one most closely tied to human survival. The spread of infectious diseases threatens not just one person but whole communities, and even people around the world. Since scientists have identified human mobility or migration as one of the most significant causes of the spread of infectious diseases, they have started to build models and make other attempts to look deeper into this problem. These infectious diseases affect health indicators, economic indicators, and education, as well as global economic and human development. Considering the enormous cost of treating these diseases, poor countries, especially those with difficult access to clean water and food, would fall into even deeper poverty, creating a vicious cycle. Even though researchers carry out studies that show how to monitor human migration effectively, making responsive policies is still the responsibility of local health bureaus. Moreover, although the WHO calls for prevention, treatment, care and long-term support, adequate therapeutic care for patients is still far from being achieved. The reason is that the most urgent need in underdeveloped areas is still clean water and basic supplies. A one-dollar investment in sanitation can generate a return of three to thirty-four dollars. If health conditions can be improved, underdeveloped regions will save 7 billion U.S. dollars in treatment costs for various diseases each year, and tens of thousands of deaths could be avoided. However, all of this requires effort from local governments. Based on this literature review, I would set my research question as “In Africa, what benefits do human mobility datasets bring to the health-care industry regarding the control of infectious disease?”

References

  1. Lu Liang, Peng Gong, Climate change and human infectious diseases: A synthesis of research findings from global and spatio-temporal perspectives, Environment International, Volume 103, 2017, Pages 99-108, ISSN 0160-4120, https://doi.org/10.1016/j.envint.2017.03.011.
  2. Sorichetta, A., Bird, T., Ruktanonchai, N. et al. Mapping internal connectivity through human migration in malaria endemic countries. Sci Data 3, 160066 (2016). https://doi.org/10.1038/sdata.2016.66.
  3. Boutayeb A. (2010) The Impact of Infectious Diseases on the Development of Africa. In: Preedy V.R., Watson R.R. (eds) Handbook of Disease Burdens and Quality of Life Measures. Springer, New York, NY. https://doi.org/10.1007/978-0-387-78665-0_66
  4. Kraemer, M.U.G., Golding, N., Bisanzio, D. et al. Utilizing general human movement models to predict the spread of emerging infectious diseases in resource poor settings. Sci Rep 9, 5151 (2019). https://doi.org/10.1038/s41598-019-41192-3
  5. Polonsky Jonathan A., Baidjoe Amrish, Kamvar Zhian N., ... Whitworth Jimmy and Jombart Thibaut. Outbreak analytics: a developing data science for informing the response to emerging pathogens. Phil. Trans. R. Soc. B 374, 20180276 (20 May 2019). https://doi.org/10.1098/rstb.2018.0276
  6. Fenollar, F., & Mediannikov, O. (2018). Emerging infectious diseases in Africa in the 21st century. New microbes and new infections, 26, S10–S18. https://doi.org/10.1016/j.nmni.2018.09.004.
  7. Judith Leo, Edith Luhanga, Kisangiri Michael, “Machine Learning Model for Imbalanced Cholera Dataset in Tanzania”, The Scientific World Journal, vol. 2019, Article ID 9397578, 12 pages, 2019. https://doi.org/10.1155/2019/9397578.