DATA150_Serena

The aim of this research proposal is to determine how scientific methods can be applied to stop or reduce the transmission of vector-borne infectious diseases in the non-tropical and non-subtropical regions of Africa. Previous studies of this health problem have focused on tropical and subtropical areas, so some of the poorest places in Africa, which are also vulnerable to infection, have received insufficient attention. Because people move between the various regions of Africa, controlling the spread of infectious diseases at a macroscopic scale requires that every African region be considered. Once the transmission of vector-borne infectious diseases is understood and managed, hundreds of thousands of deaths could be avoided annually. Local residents could also be freed from the fear of infection, and their well-being could improve as a result.

Research Plan

To map the distribution of infections and conduct further research, the researchers need suitable scientific models. Only after building these models and interpreting all the data can we offer useful suggestions to health agencies on plausible responses to infection transmission. We will carry out the research in the following steps.

1. Data Collection

We should start by collecting the necessary data for non-tropical and non-subtropical areas. Sources for inferring human mobility patterns include census surveys, mobile phone data and other mobile technologies. Call detail records (CDRs), which can be obtained from local network operators, are commonly used for predicting human mobility. Disease-related data, such as infection records for a particular disease (for example cholera), can be collected from local health agencies. For areas where the CDR data are incomplete, simply ignoring the null values is unwise, since this could discard important information. To handle these missing data, we may use an imputer from scikit-learn (for example SimpleImputer) to replace the null values with representative values such as the mean, median or mode.

We also need to collect data for the predictors. In different parts of Africa, the transmission of vector-borne infections may be driven most strongly by different environmental factors, so we also need to use computer models to identify the factors that matter in each major African district. Common environmental factors include water contamination, temperature, humidity and other seasonal changes. Most previous studies have paid little attention to the density of wetlands and swamps. Since many vector-borne diseases are transmitted by mosquitoes, which live in humid habitats, factors such as water source density or distance to wetlands and swamps may also be useful predictors. These data can be derived from satellite maps or collected manually by local people.

Since these data may not be evenly distributed, we should also consider the problem of class imbalance. We may need to resample our dataset, either by oversampling the minority class or by undersampling the majority class. These data will feed the analysis and prediction models described below.
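The imputation and resampling ideas above can be sketched as follows. This is only a minimal illustration under stated assumptions: the predictor columns, the toy values and the outbreak labels are invented for the example and do not come from any real dataset, and median imputation with simple random oversampling is just one of several reasonable choices.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.utils import resample

# Hypothetical toy data: each row is a district, and the columns are
# temperature (deg C), relative humidity (%) and distance to wetland (km).
X = np.array([
    [28.0, 80.0, 1.2],
    [np.nan, 75.0, 0.8],
    [31.0, np.nan, 5.0],
    [26.0, 60.0, np.nan],
    [30.0, 85.0, 0.5],
    [27.0, 65.0, 7.5],
])
# Hypothetical labels: 1 = outbreak recorded, 0 = no outbreak recorded.
y = np.array([1, 0, 0, 0, 1, 0])

# Replace each missing value with the median of its column.
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)

# Oversample the minority class (sampling with replacement) until it
# has as many rows as the majority class.
minority = y == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_filled[minority],
    y[minority],
    replace=True,
    n_samples=n_majority,
    random_state=0,
)
X_balanced = np.vstack([X_filled[~minority], X_min_up])
y_balanced = np.concatenate([y[~minority], y_min_up])
```

In a real analysis, the imputer should be fitted and the resampling applied on the training portion of the data only, so that information from the testing data does not leak into the model.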

2. Model Selection and Construction

The second step is to build models to interpret the data. From the CDR data, we can infer the population change in a specific district over a chosen time period. Based on this information, we can build a gravity-type spatial interaction model to estimate the migration flows between administrative units. We then extend the basic gravity-type model by including additional geographic and socioeconomic factors. Since CDRs represent only a sample of the whole population, we fit a regression model relating the CDR-based estimates to census data, allowing for a non-linear relationship. After verifying homoscedasticity and testing for overdispersion with a chi-squared test, we use the coefficient of determination (R²) to measure the variance explained. The aim of this process is to map how the affected areas are connected internally through human migration.

To identify the environmental determinants, we first review the literature on infectious diseases in other regions, which lets us narrow down the algorithms to try and so reduce the number of trials. The programming languages most frequently used for processing geospatial data are Python and R, and according to the previous literature the most promising algorithms include XGBoost, K-Nearest Neighbors (K-NN), Decision Tree, Random Forest, ExtraTree, AdaBoost, and Linear Discriminant Analysis (LDA). We first divide the data into equal folds, using each fold in turn as testing data and the remaining folds as training data. We then compare the training and testing scores of each algorithm to see which one shows the best predictive performance. The result of this step would tell us which environmental factors are most closely related to outbreaks of infectious disease transmission.
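The fold-based comparison of algorithms could look roughly like the sketch below. It is an illustration rather than the final pipeline: the data are synthetic (generated with make_classification as a stand-in for real environmental predictors and outbreak labels), the hyperparameters are scikit-learn defaults, and XGBoost is left out only because it lives in a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for environmental predictors (X) and outbreak labels (y).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)

# Candidate algorithms named in the text (default settings, an assumption).
models = {
    "K-NN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "ExtraTrees": ExtraTreesClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}

# Five equal folds; each fold serves once as the testing data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, return_train_score=True)
    # Store (mean training accuracy, mean testing accuracy).
    results[name] = (scores["train_score"].mean(), scores["test_score"].mean())

# Pick the model with the highest mean testing accuracy.
best = max(results, key=lambda name: results[name][1])

for name, (train_acc, test_acc) in results.items():
    print(f"{name:15s} train={train_acc:.3f} test={test_acc:.3f}")
print("Best on held-out folds:", best)
```

A large gap between the training and testing score of a model suggests overfitting, so the testing score is the one to compare across algorithms. For tree-based models, the fitted estimator's feature_importances_ attribute is one possible way to rank the environmental factors afterwards.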

3. Response Measures and Anticipated Outcomes

Based on the research results, health officials can enact policies to reduce infection transmission. For example, suppose the results show that annual human mobility peaks in July, which is a wet season in tropical regions. Given that a hot and humid environment favors the transmission of malaria, health agencies should manage the local vegetation, lakes and ponds before the weather becomes hot and wet, to reduce the habitat suitable for mosquito breeding. In addition, the government should promote up-to-date vaccination and make repellents widely available and affordable for the poor. Once someone is diagnosed, the CDR data can be used to track his or her movements. The patient should receive immediate treatment, and others should be prevented from coming into contact with the patient's blood or other secretions. Health agencies should also ensure strict hygiene control of food and advise against non-pasteurized dairy products. The anticipated outcome is that most people in high-risk areas are vaccinated, so that at least herd immunity is reached, and that residents, especially those who live near wetlands, are supplied with sufficient repellents, pesticides and clean food.

Argument For This Proposal and Budget Allocation

The design of this research is supported by similar studies carried out in other African regions in previous years, so the research plan is practical. If we do not pay attention to the places where vector-borne infections originate, these diseases may spread to all parts of the world and threaten the health of many more people. We can stop this from happening by monitoring human mobility and bringing the transmission patterns of these diseases under control, so this research is necessary. As for the $100,000, I propose allocating this one-year research budget in the following way. About 20%-30% will be used to cover on-site living expenses for the researchers, including the cost of travelling between different areas in Africa to conduct research. Researchers need a good working environment equipped with computers and other hardware for modelling and data analysis; this would take roughly $20,000. About $10,000 would be paid to organizations such as local network providers for information (such as the CDRs), and we would also need approximately $10,000 to hire reliable workers to collect data on wetlands and swamps in the relevant African areas, which might not be directly obtainable from satellite data. Another 10% of the budget would be spent on organizing and publishing the study outcomes, and then on communicating with local officials to formulate and implement countermeasures.
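As a quick check of the figures above, the allocation can be added up in a few lines. This is only a worked sum of the amounts listed in this section; the category names are labels chosen for the example, and the 20%-30% range for living expenses is kept as a low and a high estimate.

```python
TOTAL = 100_000  # one-year research budget in US dollars

# Living and travel expenses: 20%-30% of the total (integer arithmetic).
living_low = TOTAL * 20 // 100
living_high = TOTAL * 30 // 100

# Remaining items as listed in the text; category labels are illustrative.
fixed_items = {
    "hardware and equipment": 20_000,
    "CDR data from network providers": 10_000,
    "field workers for wetland data": 10_000,
    "publishing and outreach (10%)": TOTAL * 10 // 100,
}
fixed_total = sum(fixed_items.values())

# Total allocated under the low and high living-expense estimates.
allocated_low = living_low + fixed_total
allocated_high = living_high + fixed_total

# Amount not yet assigned to any item.
unallocated_min = TOTAL - allocated_high
unallocated_max = TOTAL - allocated_low

print(f"Allocated: ${allocated_low:,} to ${allocated_high:,}")
print(f"Unallocated: ${unallocated_min:,} to ${unallocated_max:,}")
```

Read this way, the listed items account for 70%-80% of the budget, so between $20,000 and $30,000 would remain available as a reserve or for items not yet specified.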

Possible Objections

Firstly, call detail record data are often proprietary, expensive and time-consuming to collect and process. Collecting all the necessary CDR data for resource-poor places that have previously received little attention could waste both time and funds. However, according to the study conducted by Kraemer, Golding, Bisanzio et al., general estimates of human movement may provide insightful predictions of disease invasion in place of CDR data, especially for areas where mobility data are unavailable.

Secondly, the reliability of call detail records can also be called into question. In sub-Saharan Africa, only limited areas are digitized, so mobile phones are not yet used widely enough for CDRs to represent the whole picture of human activity. In other words, the amount of human movement recorded would be far less than the actual amount. This would lead to an underestimate of overall human mobility, which might lead officials to overlook the risk posed by person-to-person transmission. This may have been a major concern in previous years; however, since 2018 the digital device market in sub-Saharan Africa has been growing rapidly. According to a report issued by the GSM Association, an industry organization that represents the interests of mobile network operators worldwide, the number of mobile users in sub-Saharan Africa is predicted to reach 500 million in 2020, which is nearly half of the sub-Saharan African population. Given that the adoption of electronic products will continue to increase in the coming years, it will not be long before mobile phone ownership in Africa reaches a level that is representative of the whole sub-Saharan population.

Another possible objection concerns whether call detail record data can accurately reflect people's migration patterns. According to a survey conducted by mobile wireless providers, 88% of millennials prefer texting over calling, and 60% of Generation Z'ers say they hate calling people. These statistics may not be as rigorous as a scientific study, but we can still infer that a certain portion of the population does not rely heavily on phone calls for communication. In other words, even when these people do move from one place to another, they may not leave the phone call records that are taken to indicate their movements. The good news, however, is that this avoidance of phone calls seems to be prevalent only among young people: anxiety about making phone calls drops dramatically after the age of 35. I assume that in most African regions, most young people do not own a mobile phone until they enter adolescence, and that during adolescence they tend to stay in one area to go to school or help with family work, so they are unlikely to move around frequently. In contrast, people entering their 30s are very likely to travel from one location to another, either to search for job opportunities or to relocate their families. In addition, since those over 35 do not seem to mind calling, there is hope that millennials will soon grow out of this anxiety about phone calls.

Evaluation

This research proposal is still a rough one: the allocation of the research budget needs to be studied more rigorously, and the details of the plan were drafted on the basis of idealized conditions. To implement these plans in real life, a more precise investigation of the actual situation will be indispensable. Moreover, this research needs more financial support, especially considering that the whole project might last more than a single year, so the researchers should consider cooperating with more institutions to obtain more funding. This process would be smoother if the researchers could get support from the government or other official channels, so I suggest using the media or other open channels to publicize the importance of this research and win support from society.