Simple logistic regression - Handbook of Biological Statistics. Summary. Use simple logistic regression when you have one nominal variable and one measurement variable, and you want to know whether variation in the measurement variable causes variation in the nominal variable. When to use it. Use simple logistic regression when you have one nominal variable with two values (male/female, dead/alive, etc.) and one measurement variable. The nominal variable is the dependent variable, and the measurement variable is the independent variable. I'm separating simple logistic regression, with only one independent variable, from multiple logistic regression, which has more than one independent variable. Many people lump all logistic regression together, but I think it's useful to treat simple logistic regression separately, because it's simpler. Simple logistic regression is analogous to linear regression, except that the dependent variable is nominal, not a measurement. One goal is to see whether the probability of getting a particular value of the nominal variable is associated with the measurement variable; the other goal is to predict the probability of getting a particular value of the nominal variable, given the measurement variable. Grain size(mm)Spiders. As an example of simple logistic regression, Suzuki et al. Sand grain size is a measurement variable, and spider presence or absence is a nominal variable. Spider presence or absence is the dependent variable; if there is a relationship between the two variables, it would be sand grain size affecting spiders, not the presence of spiders affecting the sand. One goal of this study would be to determine whether there was a relationship between sand grain size and the presence or absence of the species, in hopes of understanding more about the biology of the spiders. Because this species is endangered, another goal would be to find an equation that would predict the probability of a wolf spider population surviving on a beach with a particular sand grain size, to help determine which beaches to reintroduce the spider to. This tutorial will help you set up and interpret a Logistic Regression in Excel using the XLSTAT software. Not sure this is the modeling feature you a. The Logistic Regression procedure is suitable for estimating Linear Regression models when the dependent variable is a binary (or dichotomous) variable, that is, it consists of two values such as. Logistic Regression in Excel with UNISTAT. The UNISTAT statistics add-in extends Excel with Logistic Regression capabilities. For further information visit UNISTAT 6.5 User's Guide section 7.2.6. Logistic regression, as shown in Graph B, fits the relationship between X and Y with a special S-shaped curve that is mathematically constrained to remain within the range of 0.0 to 1.0 on the Y axis. Logistic regression: theory summary, its use in MedCalc, and interpretation of results. You can also analyze data with one nominal and one measurement variable using a one- way anova or a Student's t–test, and the distinction can be subtle. One clue is that logistic regression allows you to predict the probability of the nominal variable. For example, imagine that you had measured the cholesterol level in the blood of a large number of 5. You could do a two- sample t–test, comparing the cholesterol levels of the women who did have heart attacks vs. However, if you wanted to predict the probability that a 5. For example, let's say you are studying the effect of incubation temperature on sex determination in Komodo dragons. You raise 1. 0 eggs at 3. It would be silly to compare the mean incubation temperatures between male and female hatchlings, and test the difference using an anova or t–test, because the incubation temperature does not depend on the sex of the offspring; you've set the incubation temperature, and if there is a relationship, it's that the sex of the offspring depends on the temperature. When there are multiple observations of the nominal variable for each value of the measurement variable, as in the Komodo dragon example, you'll often sees the data analyzed using linear regression, with the proportions treated as a second measurement variable. Often the proportions are arc- sine transformed, because that makes the distributions of proportions more normal. This is not horrible, but it's not strictly correct. One problem is that linear regression treats all of the proportions equally, even if they are based on much different sample sizes. If 6 out of 1. 0 Komodo dragon eggs raised at 3. Logistic regression analyzes each observation (in this example, the sex of each Komodo dragon) separately, so the 3. I'm not going to cover it here at all. Sorry. You can also do simple logistic regression with nominal variables for both the independent and dependent variables, but to be honest, I don't understand the advantage of this over a chi- squared or G–test of independence. Null hypothesis. The statistical null hypothesis is that the probability of a particular value of the nominal variable is not associated with the value of the measurement variable; in other words, the line describing the relationship between the measurement variable and the probability of the nominal variable has a slope of zero. How the test works. Simple logistic regression finds the equation that best predicts the value of the Y variable for each value of the X variable. What makes logistic regression different from linear regression is that you do not measure the Y variable directly; it is instead the probability of obtaining a particular value of a nominal variable. For the spider example, the values of the nominal variable are . This probability could take values from 0 to 1. The limited range of this probability would present problems if used directly in a regression, so the odds, Y/(1- Y), is used instead. In gambling terms, this would be expressed as . Maximum likelihood is a computer- intensive technique; the basic idea is that it finds the values of the parameters under which you would be most likely to get the observed results. For the spider example, the equation is. So if you went to a beach and wanted to predict the probability that spiders would live there, you could measure the sand grain size, plug it into the equation, and get an estimate of Y, the probability of spiders being on the beach. There are several different ways of estimating the P value. The Wald chi- square is fairly popular, but it may yield inaccurate results with small sample sizes. The likelihood ratio method may be better. It uses the difference between the probability of obtaining the observed results under the logistic model and the probability of obtaining the observed results in a model with no relationship between the independent and dependent variables. I recommend you use the likelihood- ratio method; be sure to specify which method you've used when you report your results. For the spider example, the P value using the likelihood ratio method is 0. The P value for the Wald method is 0. Assumptions. Simple logistic regression assumes that the observations are independent; in other words, that one observation does not affect another. In the Komodo dragon example, if all the eggs at 3. If you design your experiment well, you won't have a problem with this assumption. Simple logistic regression assumes that the relationship between the natural log of the odds ratio and the measurement variable is linear. You might be able to fix this with a transformation of your measurement variable, but if the relationship looks like a U or upside- down U, a transformation won't work. For example, Suzuki et al. In that case you couldn't do simple logistic regression; you'd probably want to do multiple logistic regression with an equation including both X and X2 terms, instead. Simple logistic regression does not assume that the measurement variable is normally distributed. Examples. An amphipod crustacean, Megalorchestia californiana. Mc. Donald (1. 98. Mpi) locus in the amphipod crustacean Megalorchestia californiana, which lives on sandy beaches of the Pacific coast of North America. There were two common alleles, Mpi. Mpi. 10. 0. The latitude of each collection location, the count of each of the alleles, and the proportion of the Mpi. Location. Latitude. Mpi. 90. Mpi. 10. Mpi. 10. 0Port Townsend, WA4. Neskowin, OR4. 5. Siuslaw R., OR4. 41. Umpqua R., OR4. 3. Coos Bay, OR4. 3. San Francisco, CA3. Carmel, CA3. 6. 6. Santa Barbara, CA3. Allele (Mpi. 90 or Mpi. If the biological question were . Doing a logistic regression, the result is chi. P=7. The equation of the relationship is ln(Y/(1. Solving this for Y gives. Y=e. All logistic regression equations have an S- shape, although it may not be obvious if you look over a narrow range of values. Mpi allele frequencies vs. Error bars are 9. Graphing the results. If you have multiple observations for each value of the measurement variable, as in the amphipod example above, you can plot a scattergraph with the measurement variable on the X axis and the proportions on the Y axis. You might want to put 9. There's no automatic way in spreadsheets to add the logistic regression line. Here's how I got it onto the graph of the amphipod data. First, I put the latitudes in column A and the proportions in column B. Then, using the Fill: Series command, I added numbers 3. In column C I entered the equation for the logistic regression line; in Excel format, it's=exp(- 7. A1. 0))/(1+exp(- 7. A1. 0)))for row 1. I copied this into cells C1. C2. 10. Then when I drew a graph of the numbers in columns A, B, and C, I gave the numbers in column B symbols but no line, and the numbers in column C got a line but no symbols. Central stoneroller, Campostoma anomalum. If you only have one observation of the nominal variable for each value of the measurement variable, as in the spider example, it would be silly to draw a scattergraph, as each point on the graph would be at either 0 or 1 on the Y axis. If you have lots of data points, you can divide the measurement values into intervals and plot the proportion for each interval on a bar graph. Here is data from the Maryland Biological Stream Survey on 2. Maryland streams. The measurement variable is dissolved oxygen concentration, and the nominal variable is the presence or absence of the central stoneroller, Campostoma anomalum. If you use a bar graph to illustrate a logistic regression, you should explain that the grouping was for heuristic purposes only, and the logistic regression was done on the raw, ungrouped data. Proportion of streams with central stonerollers vs. Dissolved oxygen intervals were set to have roughly equal numbers of stream sites. The thick black line is the logistic regression line; it is based on the raw data, not the data grouped into intervals.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. Archives
January 2017
Categories |