TY - JOUR
T1 - A goodness-of-fit measure for logistic regression under separation
AU - Kotani, Naoki
AU - Kurosawa, Takeshi
AU - Eshima, Nobuoki
N1 - Publisher Copyright: © 2024 Taylor & Francis Group, LLC.
PY - 2024
Y1 - 2024
N2 - Logistic regression models have a severe problem called separation. The maximum likelihood estimator does not exist in logistic regression models for data structures under separation. Under separation, the forcibly estimated maximum likelihood estimate may have an extremely large value. Separation often occurs when the size of the dataset is small. Consequently, goodness-of-fit measures based on the likelihood ratio and those based on covariance functions using the maximum likelihood estimate indicate that the model is excessively good regardless of the cause of the separation. The Firth and exact logistic regression methods are valid estimation methods for separation problems. Therefore, we propose methods that use these estimation methods to reasonably evaluate the goodness-of-fit measures of statistical models under separation for datasets with a small sample size. The goodness-of-fit measures based on covariance functions, which generalize the multiple correlation coefficient and are referred to as the regression correlation coefficient and the entropy coefficient of determination, are then used in combination with these estimation methods for data under separation. In addition, we conducted a data analysis using the definition of the non-separation ratio based on the regression depth.
AB - Logistic regression models have a severe problem called separation. The maximum likelihood estimator does not exist in logistic regression models for data structures under separation. Under separation, the forcibly estimated maximum likelihood estimate may have an extremely large value. Separation often occurs when the size of the dataset is small. Consequently, goodness-of-fit measures based on the likelihood ratio and those based on covariance functions using the maximum likelihood estimate indicate that the model is excessively good regardless of the cause of the separation. The Firth and exact logistic regression methods are valid estimation methods for separation problems. Therefore, we propose methods that use these estimation methods to reasonably evaluate the goodness-of-fit measures of statistical models under separation for datasets with a small sample size. The goodness-of-fit measures based on covariance functions, which generalize the multiple correlation coefficient and are referred to as the regression correlation coefficient and the entropy coefficient of determination, are then used in combination with these estimation methods for data under separation. In addition, we conducted a data analysis using the definition of the non-separation ratio based on the regression depth.
KW - entropy coefficient of determination
KW - goodness-of-fit measures
KW - logistic regression models
KW - regression correlation coefficient
KW - regression depth
KW - separation problem
UR - https://www.scopus.com/pages/publications/85209678836
U2 - 10.1080/03610926.2024.2413845
DO - 10.1080/03610926.2024.2413845
M3 - Article
AN - SCOPUS:85209678836
SN - 0361-0926
JO - Communications in Statistics - Theory and Methods
JF - Communications in Statistics - Theory and Methods
ER -