A Density-Weighted Information Gain Tree for Clustering Mixed-Type Data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Clustering mixed-type data, which includes both continuous and categorical features, presents significant challenges due to the distinct nature of these data types. Many traditional distance-based and density-based methods struggle with mixed-type data because they are not designed to handle continuous and categorical features simultaneously. To address these limitations, we propose the Density-Weighted Information Gain (DWIG) Tree algorithm, which effectively manages mixed datasets by integrating continuous and categorical features through a recursive partitioning strategy. The DWIG Tree maximizes information gain while accounting for local density variations, resulting in more accurate and interpretable clustering outcomes. Experiments on both synthetic and real-world datasets demonstrate that the DWIG Tree outperforms K-Prototypes, highlighting its superior capability to handle mixed-type data and capture natural groupings more accurately.
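
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the mixed-type dissimilarity commonly associated with K-Prototypes: squared Euclidean distance on the continuous features plus a weighted count of mismatches on the categorical features. It is not the DWIG Tree, whose splitting criterion is not specified in this abstract. The function name, argument names, and the default weight `gamma=1.0` are illustrative assumptions.

```python
from typing import Hashable, Sequence


def k_prototypes_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[Hashable],
    proto_num: Sequence[float],
    proto_cat: Sequence[Hashable],
    gamma: float = 1.0,  # assumed weight balancing the two feature types
) -> float:
    """Dissimilarity between one record and one cluster prototype."""
    if len(x_num) != len(proto_num) or len(x_cat) != len(proto_cat):
        raise ValueError("record and prototype must have matching feature counts")
    # Continuous part: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: number of features whose categories differ.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


# Hypothetical record and prototype, for illustration only.
d = k_prototypes_dissimilarity([1.0, 2.0], ["red", "small"],
                               [0.0, 2.0], ["red", "large"], gamma=0.5)
print(d)
```

The single weight `gamma` is what lets one distance cover both feature types; it is applied uniformly across the whole dataset in this sketch.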

Original language: English
Title of host publication: Proceedings - 2024 7th International Conference on Data Science and Information Technology, DSIT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350384093
DOIs
Publication status: Published - 2024
Event: 7th International Conference on Data Science and Information Technology, DSIT 2024 - Nanjing, China
Duration: 20 Dec 2024 to 22 Dec 2024

Publication series

Name: Proceedings - 2024 7th International Conference on Data Science and Information Technology, DSIT 2024

Conference

Conference: 7th International Conference on Data Science and Information Technology, DSIT 2024
Country/Territory: China
City: Nanjing
Period: 20/12/24 to 22/12/24

Keywords

  • Clustering
  • Density-Weighted Information Gain
  • Mixed-type data
  • Tree-based method
