TY - GEN
T1 - Linux System Call Level Dynamic Analysis Towards Programming Language Translation
AU - Yoneda, Narumi
AU - Hatano, Ryo
AU - Nishiyama, Hiroyuki
N1  - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - In this study, we propose a methodology that utilizes data from system call level dynamic analysis (DA) to select better code translation candidates. For the DA data, we recorded the history of system call invocations to understand the program’s actions during execution, providing insights independent of the programming language. We implemented and released a DA system that enables fully automated analysis. Our method generates multiple translation candidates using TransCoder. We then performed DA on all the generated candidates as well as the original code. To select the optimal candidate, we compared the DA data of the original code with that of the generated candidates and calculated their similarities. We employed natural language processing techniques to normalize the sequence length of the DA data for comparison. Additionally, we explored direct comparisons of variable-length system call sequences. We found that DA data for the same code can exhibit significant variation in sequence length and that the initialization process for the modules can significantly influence the DA data. To address these issues, we developed an extended version of our DA system. We also present methods to reduce the variation in sequence length and obtain only the system call information invoked by specific lines of the program under analysis.
KW - Dynamic analysis
KW - Natural language processing
KW - Programming language translation
KW - System call sequence
UR - https://www.scopus.com/pages/publications/105008410365
U2 - 10.1007/978-3-031-87327-0_16
DO - 10.1007/978-3-031-87327-0_16
M3 - Conference contribution
AN - SCOPUS:105008410365
SN - 9783031873263
T3 - Lecture Notes in Computer Science
SP - 332
EP - 352
BT - Agents and Artificial Intelligence - 16th International Conference, ICAART 2024
A2 - Rocha, Ana Paula
A2 - Steels, Luc
A2 - van den Herik, Jaap
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th International Conference on Agents and Artificial Intelligence, ICAART 2024
Y2 - 24 February 2024 through 26 February 2024
ER -