Multiple logistic regression

Statistics: Logistic regression

Logistic regression is used when the outcome variable is binary, and the input variables can be either binary or continuous.

In the simplest case, with a single binary input variable, it gives the same result as a chi-squared test on the corresponding 2×2 table.

The outcome variable is binary, with either an event (e.g. death, or cure) or no event (e.g. survival, or not cured).
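The points that follow refer to coefficients b_i and input variables X_i, but the model itself is not written out here. Presumably the standard multiple logistic regression form is intended. As a sketch, with p denoting the probability of an event and k the number of input variables (both symbols are assumptions, as is the intercept b_0), it can be written as:

```latex
\log\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k
```

The left-hand side is the log odds of an event, which is why each coefficient is read as a log odds ratio.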

Points:

  • Each coefficient b_i is the log odds ratio of an event for a one-unit increase in X_i, with the other input variables held fixed. Thus if X_i is binary, b_i is the log odds ratio for X_i=1 relative to X_i=0.
  • The model is usually fitted using maximum likelihood.
  • When the event is rare, the odds ratios approximate the relative risks of an event.
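The three points above can be illustrated with a minimal sketch in Python. It assumes a single binary input and grouped 2×2 counts, fits the model by Newton-Raphson (one common way of maximising the likelihood, not necessarily what any given package does), and compares the odds ratio with the relative risk. The function names and all counts are hypothetical, chosen only for illustration.

```python
import math


def fit_logistic_2x2(a, b, c, d, iterations=25):
    """Fit logit(p) = b0 + b1*X by maximum likelihood (Newton-Raphson).

    a: events with X=1,  b: non-events with X=1,
    c: events with X=0,  d: non-events with X=0.
    """
    groups = [(1.0, a, a + b), (0.0, c, c + d)]  # (x, events, total)
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log likelihood
        h00 = h01 = h11 = 0.0    # information matrix entries
        for x, events, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts with a common event: 30/100 versus 15/100.
common = (30, 70, 15, 85)
b0, b1 = fit_logistic_2x2(*common)
print(math.exp(b1), odds_ratio(*common), relative_risk(*common))

# Hypothetical counts with a rare event: 20/10000 versus 10/10000.
rare = (20, 9980, 10, 9990)
print(odds_ratio(*rare), relative_risk(*rare))
```

In both hypothetical tables the relative risk is 2, but only in the rare-event table is the odds ratio close to it. The fitted b1 equals the log of the cross-product odds ratio because, with one binary input, the model reproduces the observed proportions exactly.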

Example of Logistic regression

Lavie et al. (BMJ, 2000) surveyed 2677 adults referred to a sleep clinic with suspected sleep apnoea. They developed an apnoea severity index, and related this to the presence or absence of hypertension.

They wished to answer two questions:

i. Is the apnoea index predictive of hypertension, allowing for age, sex and body mass index?
ii. Is sex a predictor of hypertension, allowing for the other covariates?

The results are given in Table 1.

Table 1 Risk factors for hypertension

The coefficient associated with the dummy variable Sex is 0.161, so the odds of having hypertension for a man are exp(0.161)=1.17 times those of a woman in this study. On the odds ratio scale the 95% confidence interval is exp(-0.061) to exp(0.383), that is, 0.94 to 1.47. Note that this includes one (as we would expect, since the confidence interval for the regression coefficient includes zero) and so we cannot say that sex is a significant predictor of hypertension in this study.

We interpret the age coefficient by saying that, if we had two people of the same sex, with the same BMI and apnoea index, but one subject was 10 years older than the other, then we would predict the older subject's odds of having hypertension to be 2.24 times those of the younger subject. The 10-year difference is used because that is how age was scaled.

Note that factors that are additive on the log odds scale are multiplicative on the odds scale. Thus a man who is ten years older than a woman is predicted to have odds of hypertension 2.24×1.17=2.62 times hers. The model therefore assumes that age and sex act independently on hypertension, and so the odds ratios multiply.
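The arithmetic above can be checked with a short Python sketch. It uses only the figures quoted in the text (the Sex coefficient, its confidence limits and the odds ratio of 2.24 for age); the variable names are illustrative, and the age coefficient itself is not assumed, since Table 1 is not reproduced here.

```python
import math

# Figures quoted in the text
sex_coef = 0.161                  # log odds ratio, men relative to women
sex_coef_ci = (-0.061, 0.383)     # 95% CI on the log odds scale
age_or_per_10_years = 2.24        # odds ratio per 10 years of age

# Exponentiate to move from the log odds scale to the odds ratio scale
sex_or = math.exp(sex_coef)
sex_or_ci = tuple(math.exp(limit) for limit in sex_coef_ci)
print(round(sex_or, 2), [round(limit, 2) for limit in sex_or_ci])

# Adding on the log scale is the same as multiplying on the odds scale
combined_from_logs = math.exp(math.log(age_or_per_10_years) + sex_coef)
combined_from_odds = age_or_per_10_years * sex_or
print(combined_from_logs, combined_from_odds)

# The text multiplies the rounded values
print(round(age_or_per_10_years * 1.17, 2))
```

Multiplying the unrounded odds ratio for sex gives a value slightly above the 2.62 quoted, which comes from using the rounded figure 1.17.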

Reference

  • Campbell MJ. Statistics at Square Two. 2nd ed. Blackwell BMJ Books, 2006.
  • © MJ Campbell 2006