TY - GEN
T1 - A Density-Weighted Information Gain Tree for Clustering Mixed-Type Data
AU - Zhao, Yu
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Clustering mixed-type data, which includes both continuous and categorical features, presents significant challenges due to the distinct nature of these data types. Many traditional distance-based and density-based methods struggle with mixed-type data because they are not designed to handle continuous and categorical features simultaneously. To address these limitations, we propose the Density-Weighted Information Gain (DWIG) Tree algorithm, which effectively manages mixed datasets by integrating continuous and categorical features through a recursive partitioning strategy. The DWIG Tree maximizes information gain while accounting for local density variations, resulting in more accurate and interpretable clustering outcomes. Experiments on both synthetic and real-world datasets demonstrate that the DWIG Tree outperforms K-Prototypes, highlighting its superior capability to handle mixed-type data and capture natural groupings more accurately.
KW - Clustering
KW - Density-Weighted Information Gain
KW - Mixed-type data
KW - Tree-based method
UR - http://www.scopus.com/inward/record.url?scp=86000242226&partnerID=8YFLogxK
U2 - 10.1109/DSIT61374.2024.10882131
DO - 10.1109/DSIT61374.2024.10882131
M3 - Conference contribution
AN - SCOPUS:86000242226
T3 - Proceedings - 2024 7th International Conference on Data Science and Information Technology, DSIT 2024
BT - Proceedings - 2024 7th International Conference on Data Science and Information Technology, DSIT 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Data Science and Information Technology, DSIT 2024
Y2 - 20 December 2024 through 22 December 2024
ER -