Once the manual analyzer is more or less working, we can consider developing a machine learning-based tool. This might be something like an encoder-decoder network.
Ideally, the ML tool would handle a wide range of languages with minimal changes (unlike the manual tool, which needs a lexer for each language). We should also evaluate its robustness to source code transformations, its efficiency, and so on. The ideal model would handle an arbitrary language without being trained on it: the user just submits text (assembly, Chinese, whatever) and the model figures out how to lex it on the spot.
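One way the robustness evaluation could look is sketched below. This is only a rough illustration, not a design: `analyze` stands in for whichever tool is under test (manual or ML), and `pad_lines`, `double_blank_lines`, and `robustness` are hypothetical names invented here. The two transformations are layout-only edits that are assumed to be harmless for the samples used; real ones (identifier renaming, comment stripping, reordering) would need per-language care.

```python
from typing import Callable, Hashable, Sequence

# A transformation takes source text and returns modified source text.
Transform = Callable[[str], str]


def pad_lines(source: str) -> str:
    """Append two trailing spaces to every line (assumed meaning-preserving)."""
    return "\n".join(line + "  " for line in source.split("\n"))


def double_blank_lines(source: str) -> str:
    """Turn every newline into two newlines (assumed meaning-preserving)."""
    return source.replace("\n", "\n\n")


def robustness(
    analyze: Callable[[str], Hashable],
    samples: Sequence[str],
    transforms: Sequence[Transform],
) -> float:
    """Fraction of (sample, transform) pairs whose analysis result is unchanged."""
    total = 0
    stable = 0
    for sample in samples:
        baseline = analyze(sample)
        for transform in transforms:
            total += 1
            if analyze(transform(sample)) == baseline:
                stable += 1
    # With nothing to compare, report perfect stability by convention.
    return stable / total if total else 1.0


# Toy stand-ins for the analyzer: one ignores layout, one does not.
def token_count(source: str) -> int:
    return len(source.split())


def char_count(source: str) -> int:
    return len(source)


samples = ["int x = 1;\nreturn x;", "a b\nc"]
transforms = [pad_lines, double_blank_lines]
stable_score = robustness(token_count, samples, transforms)
fragile_score = robustness(char_count, samples, transforms)
```

The same harness could be pointed at the manual analyzer and the ML tool so that both get a comparable score on the same samples and transformations.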