
clj-cdt

A Clojure library for parsing C/C++ code

Contains facilities for:

  • Parsing C/C++ with Eclipse CDT library
  • Writing ASTs back to source
  • Identifying Operator Expressions
  • Modifying ASTs (incomplete)

This project grew out of the work described in our paper at the Mining Software Repositories 2018 conference: Prevalence of Confusing Code in Software Projects - Atoms of Confusion in the Wild.

Namespaces

All of the interesting functions in this project are located in the following files in the src/clj_cdt/ directory:

  • clj-cdt - Generic tools for parsing C/C++ and reading/analyzing ASTs
  • writer-util - Serialize ASTs to source code, or tree view, or debugging output
  • expr-operator - CDT groups all unary operators into one expression type, all binary operators into another, and all ternary operators into a third. This module is a simple interface for exploring which operators are inside those expressions.
  • modify-ast - (incomplete) Provides functions to update the AST in memory.

Using the framework to parse code

The first thing you might want to do is parse some C code. There are three main functions for doing this: parse-file, parse-source and parse-frag. All three take a String as an argument and return an IASTNode. parse-file and parse-source both require whole programs, the former accepting a filename as its argument and the latter a string containing the full code. parse-frag, on the other hand, can take any (read "many") partial program. For example:

(parse-file "gcc/testsuite/c-c++-common/wdate-time.c")  ;; => CPPASTTranslationUnit
(parse-source "int main() { 1 + 1; }")                  ;; => CPPASTTranslationUnit
(parse-frag "1 + 1")                                    ;; => CPPASTBinaryExpression

After you've parsed some code, you might reasonably want to see what it looks like:

(->> "gcc/testsuite/c-c++-common/wdate-time.c"
      parse-file
      (get-in-tree [2])
      print-tree)

Which should output:

[]  <SimpleDeclaration>                                      {:line 6, :off 238, :len 39}
[0]  <SimpleDeclSpecifier>                                   {:line 6, :off 238, :len 10}
[1]  <ArrayDeclarator>                                       {:line 6, :off 249, :len 27}
[1 0]  <Name>                                                {:line 6, :off 249, :len 9}
[1 1]  <ArrayModifier>                                       {:line 6, :off 258, :len 2}
[1 2]  <EqualsInitializer>                                   {:line 6, :off 261, :len 15}
[1 2 0]  <IdExpression>                                      {:line 6, :off 263, :len 13}
[1 2 0 0]  <Name>                                            {:line 6, :off 263, :len 13}

Some other useful functions are:

print-tree     -> Prints a debug view of the tree structure of an AST plus metadata
write-tree     -> Takes an AST and returns the code that generated it (inverse parsing)
get-in-tree    -> Digs down into an AST to get at nested children
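
As a rough sketch of how these functions fit together, the snippet below parses a fragment, writes it back to source, and then digs out one child. The namespace names in the require form are assumptions inferred from the file names listed under Namespaces (parsing and get-in-tree in clj-cdt, the writers in writer-util); adjust them if the functions live elsewhere.

```clojure
;; Assumed namespace layout, inferred from the files in src/clj_cdt/.
(require '[clj-cdt.clj-cdt :refer [parse-frag get-in-tree]]
         '[clj-cdt.writer-util :refer [write-tree print-tree]])

;; Parse a fragment, then serialize the AST back to source text.
(def expr (parse-frag "1 + 1"))
(def round-tripped (write-tree expr))
(println round-tripped)

;; The path [0] selects the first child of the root node,
;; which for a binary expression is its first operand.
(def left-operand (get-in-tree [0] expr))
(print-tree left-operand)
```

The path vector works the same way as the bracketed indices in the print-tree output above: each number selects a child at the next level down.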
