Bias Reduction Using Propensity Score Matching In Observational Data

ABSTRACT

In observational studies, "case-control groups" often exhibit imbalance on covariates, and this imbalance is confounded with the treatment. Because the covariates are also believed to influence the response, it is difficult to attribute differences in responses to the "treatment". Propensity score matching attempts to reduce the confounding effects of covariates, so that differences in responses can be attributed to differences in treatments. In addition, the values of the propensity scores can serve as a diagnostic tool for evaluating the comparability of the groups quantitatively. When two groups are being compared, the propensity score can be calculated as the predicted probability of group membership from a logistic regression. It represents the 'tendency' of an observation to be in one group or the other. By adjusting for the value of the propensity score in a linear model, one effectively adjusts for any group differences attributable to the variables used to create the propensity score. Here we present a study in which propensity scores were used to adjust for differences between a case and a control group (a treatment group and a non-randomized control group). Propensity scores were created using the Binary Logistic Regression procedure of SPSS Version 16 on a Windows Vista platform. A linear model was also estimated using the same software. Groups were compared using independent-samples t-tests and chi-square tests as appropriate. Standardized differences were calculated and matching was done with Microsoft Excel Version 2007 on a Windows Vista platform. The results showed that propensity score matching was successful in reducing the bias on the covariates.

DEDICATION
DECLARATION
CERTIFICATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF APPENDICES
CHAPTER 1 Introduction
1.1 Statement of Problem
1.2 Aim and Objectives of the Study
1.3 Justification for the Study
1.4 Significance of the Study
1.5 Definition of Terms
CHAPTER 2 Literature Review
2.1 Observational Studies
2.2 Case-Control Studies
2.2.1 Bias in Observational Case-Control Studies
2.2.2 Traditional Methods for Bias Reduction
2.3 Propensity Scoring
2.3.1 Propensity Score Definition
2.4 The Use of Propensity Score Technique
2.4.1 Matching with Propensity Scores
2.4.2 Stratification (Subclassification) with Propensity Scores
2.4.3 Regression (Covariance) Adjustments with Propensity Scores
2.5 Logistic Regression
2.5.1 Introduction
2.5.2 The Model
2.5.3 Uses of Logistic Regression
2.6 Preterm Birth
2.6.1 Consequences of Preterm Births
CHAPTER 3 Methodology
3.0 Introduction
3.1 Preliminary Statistical Analysis
3.1.1 Correlation Coefficients
3.2 Propensity Scores
3.2.1 Propensity Score Variable Identification
3.2.2 Variable Selection for Propensity Score Model
3.2.3 Propensity Score Modelling
3.3 Logistic Regression and Propensity Score
3.3.1 Elements of the Propensity Score
3.4 Study Population
3.5 Analysis Selection
3.5.1 Matching
3.5.2 Matching Metric
3.5.3 Matching Variable
3.5.4 Matching Algorithm
3.5.5 Matching Structure
3.5.6 Replacement of Comparison Subjects
3.5.7 Nearest Neighbour Matching on Estimated Propensity Score
3.6 Assessing the Matched Data
3.6.1 The Standardized Bias
3.6.2 Bias Reduction
3.6.3 Test on Means and Proportions
3.7 Analysis
3.7.1 Software
CHAPTER 4 Data Analysis and Interpretation
4.1 Correlations with Baby Status
4.1.1 Correlation of Status with Covariates
4.2 Logistic Regression on Baby's Status
4.3 Group Comparisons
CHAPTER 5 Summary, Conclusions and Recommendations
5.1 Summary
5.2 Conclusions
5.3 Recommendations
5.4 Contribution to Knowledge
5.5 Suggested Areas for Further Research
REFERENCES
Appendix I: List of Potential Matches for the 13 Cases
Appendix II: Figures of Propensity and Logit Distributions
Appendix III: Dataset used for the Study

CHAPTER ONE

INTRODUCTION
In order to make group comparisons, the generally accepted pattern in research consists of the following steps:
1. Forming treatment and control groups, sometimes with a single group serving as its own control.
2. Assigning treatments to the groups.
3. Analysing group differences.
4. Generalising findings based on groups to tendencies among future individuals.
Defining groups is a crucial first step and, once they are defined, one would want their composition to be identical. Statistical adjustments, often in the form of blocking variables or covariate analysis, can be used to adjust for pre-treatment group differences. Random assignment of treatments to groups before comparison is often resorted to because, in theory, it assures that the groups are identical. This, however, is not always practical and does not necessarily result in groups that are equivalent on all the important covariates: it is only the expected values of the covariates, over numerous replications, that are equal.
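The point about expected values can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the covariate (labelled age here), the sample size of 20 and the 2000 replications are hypothetical and are not taken from the study data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical covariate (e.g. age) for 20 subjects; the values are simulated.
ages = [random.gauss(50, 10) for _ in range(20)]

def mean_gap_after_random_split(values):
    """Randomly assign half of the subjects to treatment and return the
    difference in covariate means (treatment minus control)."""
    shuffled = random.sample(values, len(values))
    half = len(values) // 2
    return statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])

# Repeat the random assignment many times on the same 20 subjects.
gaps = [mean_gap_after_random_split(ages) for _ in range(2000)]

average_gap = statistics.mean(gaps)      # averages out across replications
largest_gap = max(abs(g) for g in gaps)  # worst imbalance in a single assignment

print(f"average gap over 2000 randomizations: {average_gap:.2f}")
print(f"largest gap in a single randomization: {largest_gap:.2f}")
```

Averaged over the replications the gap is close to zero, while an individual randomization can still leave the two groups noticeably different on the covariate.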
An alternative to random assignment is a matched-pairs design, in which each member of the first group is matched with a member of the second group on all the factors the researcher considers relevant and important. According to Rudner & Peyton (2006), in a well-matched pair, it is as if we are using the same individual twice. When matching is adequate, the variables used for matching that might cause confounding problems are controlled. Matching on many covariates is, however, difficult, especially if one is trying to obtain an exact match when some of the covariates are continuous.
The assignment of treatments to subjects in observational studies is selective rather than random, which is why observational studies frequently provide biased estimates of treatment effects and show imbalance on covariates. In other words, treatment groups are often compared to non-randomized control groups, so that inference of causality in such studies can be biased. One can expect that treatment effects are confounded with the factors that determine selection into the treatment (case) and control groups. In this work, the propensity score matching (PSM) procedure is used. This procedure allows researchers to pair treatment and control individuals based on their propensity scores. It is particularly useful for 'cases of causal inference and sample selection bias in non-experimental settings in which: (i) few units in the non-experimental comparison group are comparable to the treatment units; and (ii) selecting a subset of comparison units similar to the treatment unit is difficult because units must be compared across a high dimensional set of pre-treatment characteristics.'
Propensity score matching is a refined approach to a matched-pairs design (Rosenbaum & Rubin, 1985b; Rubin, 1997; Joffe & Rosenbaum, 1999). Covariates are combined to produce a propensity score, and individuals in the treatment group are matched to individuals in the control group on the basis of that score. In this way, one weights the variables by their relative importance and matches on the best possible combination, rather than on equally weighted individual variables. Rubin (1997) has shown that when one matches on the composite propensity score, the group means and standard deviations on the covariates will also be equivalent.
1.1 STATEMENT OF THE PROBLEM
In observational studies, investigators have no control over the treatment assignment. The treated (case) and non-treated (control) groups may have large differences on their observed covariates, and these differences can lead to biased estimates of treatment effects (D'Agostino, 1998). This difficulty may to some extent be avoided if information on measured covariates is incorporated into the design of a study (e.g. through matched sampling) or into the estimation of the treatment effect (e.g. through stratification or covariance adjustment). These long-established methods of adjustment (matching, stratification and covariance adjustment) may be insufficient for eliminating bias, as they can only use a limited number of covariates. Propensity scores, which provide a scalar summary of the covariate information, do not have this limitation. In the statistical analysis of observational data, propensity score analysis is a methodology that attempts to provide unbiased estimation of treatment effects (D'Agostino, 1998).
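The idea of reducing several covariates to one scalar and then matching on it can be sketched in a few lines of Python. This is an illustration only, not the procedure used in this study (which relied on SPSS and Microsoft Excel): the data are simulated, the two covariates are hypothetical, the logistic regression is fitted by plain gradient ascent, and the function names `fit_logistic`, `propensity` and `greedy_match` are invented for the example.

```python
import math
import random

def fit_logistic(X, z, lr=0.5, epochs=500):
    """Fit P(z = 1 | x) by batch gradient ascent on the log-likelihood.
    Returns the weights, with w[0] as the intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = zi - 1.0 / (1.0 + math.exp(-eta))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, x):
    """Predicted probability of treatment: the estimated propensity score."""
    eta = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))

def greedy_match(scores, z):
    """1:1 nearest-neighbour matching on the score, without replacement."""
    treated = [i for i, zi in enumerate(z) if zi == 1]
    controls = [i for i, zi in enumerate(z) if zi == 0]
    pairs = []
    for t in treated:
        if not controls:
            break
        c = min(controls, key=lambda i: abs(scores[i] - scores[t]))
        pairs.append((t, c))
        controls.remove(c)  # each control is used at most once
    return pairs

# Simulated data: two hypothetical covariates that influence treatment selection.
random.seed(0)
X, z = [], []
for _ in range(200):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    p_treat = 1 / (1 + math.exp(-(-1.5 + 0.8 * x1 + 0.5 * x2)))
    X.append([x1, x2])
    z.append(1 if random.random() < p_treat else 0)

weights = fit_logistic(X, z)
scores = [propensity(weights, x) for x in X]  # one scalar per subject
pairs = greedy_match(scores, z)               # (treated index, control index)
print(f"{len(pairs)} matched pairs from {sum(z)} treated subjects")
```

However many covariates enter the model, each subject ends up with a single number between 0 and 1, which is what makes matching practical when exact matching on all covariates is not.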
The propensity score of an individual is formally defined as the conditional probability of being treated given his or her measured covariates. Consequently, it measures the likelihood that a person would have been treated, using only that person's covariate values. It can be used to balance the covariates in the two groups and, as a result, reduce bias. Rosenbaum and Rubin (1983) showed that "it is a balancing score and can be used in observational studies to reduce bias through the adjustment methods mentioned above. In order to estimate the propensity score, one must model the distribution of the treatment indicator variable given the observed covariates. Once estimated the propensity score can be used to reduce bias through matching, stratification (subclassification), regression adjustment, or some combination of all three." In this work the propensity score matching method is used to reduce bias in the comparison of a case and control group.
1.2 AIM AND OBJECTIVES OF THE STUDY
The aim of the study is to reduce the impact of bias often found in observational data through the following objectives:
1. To create covariate balance between a case and a control group, thereby rendering them comparable.
2. To generate a list of potential matches for each of the cases.
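Both objectives can be made concrete with a short sketch. It rests on assumptions that are not taken from this study: the standardized difference is computed with a commonly used formula, 100 × (mean of cases − mean of controls) / sqrt((variance of cases + variance of controls) / 2); potential matches are defined as controls whose propensity score lies within a hypothetical caliper of 0.05 of the case's score; and all ages, scores and identifiers are invented.

```python
import math
import statistics

def standardized_difference(cases, controls):
    """Standardized difference in percent: gap in means over the pooled SD."""
    pooled_sd = math.sqrt(
        (statistics.variance(cases) + statistics.variance(controls)) / 2
    )
    return 100 * (statistics.mean(cases) - statistics.mean(controls)) / pooled_sd

def potential_matches(case_scores, control_scores, caliper):
    """For each case, list the controls whose propensity score lies within
    the caliper, nearest first."""
    result = {}
    for case_id, ps in case_scores.items():
        near = [
            (abs(ps - cs), control_id)
            for control_id, cs in control_scores.items()
            if abs(ps - cs) <= caliper
        ]
        result[case_id] = [control_id for _, control_id in sorted(near)]
    return result

# Objective 1: covariate balance before and after matching (invented ages).
case_age = [30, 32, 34]
all_control_age = [24, 26, 28]      # controls before matching
matched_control_age = [29, 31, 33]  # controls retained after matching

before = standardized_difference(case_age, all_control_age)
after = standardized_difference(case_age, matched_control_age)
reduction = 100 * (1 - abs(after) / abs(before))  # percent bias reduction

# Objective 2: potential matches for each case (invented propensity scores).
case_scores = {"case1": 0.62, "case2": 0.35}
control_scores = {"c1": 0.60, "c2": 0.66, "c3": 0.33, "c4": 0.10}
matches = potential_matches(case_scores, control_scores, caliper=0.05)

print(before, after, round(reduction, 1))
print(matches)
```

A standardized difference closer to zero after matching indicates better balance on that covariate, and a case with an empty list has no comparable control under the chosen caliper.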
1.3 JUSTIFICATION FOR THE STUDY
Comparing treatment effects between case and control groups is difficult because there are almost always baseline differences between the groups. Since observational studies are not experiments (as randomised controlled trials are), it is hard to control many external variables. Propensity score methods are increasingly being used in other countries to reduce the impact of treatment-selection bias when estimating treatment effects from observational data, but to date no work on propensity scores has been published in Nigeria. Moreover, although propensity scores are popular in cohort studies, little attention has been paid to their use in case-control and case-cohort studies (Mansson et al., 2007).
1.4 SIGNIFICANCE OF THE STUDY
The significance of this study lies in the following:
1. It shows the possibility of comparing like with like under some nonrandomized conditions.
2. From this study, future researchers will gain a theoretical and practical understanding of propensity score matching and its relevance to observational studies, and will be motivated to develop the skills to apply propensity scores to adjust for selection bias.
3. It provides a way to summarize covariate information about treatment selection into a scalar value.
4. It demonstrates the usefulness of propensity scores in adjusting for group differences through study design.
5. It has significant implications for researchers, especially in the medical field, who can draw on it when considering the design of their studies.
1.5 DEFINITION OF TERMS
Bias: Any systematic error in the design or conduct of a study that results in a mistaken estimate of a treatment's effect on outcome.
Case-Control Study: Observational study that first identifies a group of subjects with a certain condition (treatment) and a control group without the condition, and then looks back in time (e.g. chart review) to find exposure to risk factors for the condition. This type of study is well suited to rare conditions.
Cohort Study: Observational study in which subjects with an exposure of interest (e.g. hypertension) and subjects without the exposure are identified and then followed forward in time to determine outcomes (e.g. stroke).
Confounding: This occurs when an investigator falsely concludes that a particular exposure is causally related to a condition without adjusting for other factors that are known risk factors for the condition and are associated with the exposure.
Confounding variable: A variable, other than the one being studied, that is associated with both the exposure and the outcome of interest.
Control group: A group of items/units without the condition of interest, or unexposed to or not treated with the agent of interest.
Covariates: Often used simply as an alternative name for explanatory variables, but more specifically used to refer to variables that are not of primary interest in an investigation but are measured because they are believed likely to affect the response variable, and consequently need to be included in analyses and model building. A variable, such as age or gender, that is measured prior to the start of treatment, and hence is unaffected by the treatment, will be called a covariate.
Cross-Sectional Study: Observational study that examines the presence or absence of a condition, or the presence or absence of an exposure, at a particular time. Since exposure and outcome are ascertained at the same time, it is often unclear whether the exposure preceded the outcome.
Observational study: A study in which no intervention is made (in contrast with an experimental study). Such studies provide estimates and examine associations of events in their natural settings without recourse to experimental intervention.
Randomized Controlled Trial: A special type of clinical trial in which assignment to an exposure is determined purely by chance.
Selection Bias: Bias introduced by the way in which participants are chosen for a study. For example, in a case-control study, using different criteria to select cases (e.g. sick, hospitalized population) versus controls (e.g. young, healthy outpatients), other than the presence of the condition, can lead the investigator to a false conclusion about an exposure.
Total T4: A measure of the total amount of circulating thyroxine in the blood. A high value can indicate hyperthyroidism; a low value can indicate hypothyroidism. Total T4 levels can be elevated due to pregnancy.
TPO: Thyroid Peroxidase (TPO) antibodies are also known as Antithyroid Peroxidase Antibodies. These antibodies work against thyroid peroxidase, an enzyme that plays a part in the T4-to-T3 conversion and synthesis process. TPO antibodies can be evidence of tissue destruction, as in Hashimoto's disease or, less commonly, in other forms of thyroiditis such as post-partum thyroiditis.
UIC: Urinary iodine concentration, the concentration of iodine in the urine. It is included in surveillance systems for iodine deficiency and has been accepted to be the most sensitive indicator of iodine nutrition status.