
CodeQuest: Source Code Querying with Datalog

Even well-structured object-oriented code is often too large for developers to easily grasp all of the relations and dependencies within it. Source code querying and browsing tools are designed to aid that understanding.

Code queries are important for checking coding styles (such as naming conventions), fault detection (to discover bugs at development time), refactoring (to detect code smells and to optimise and improve code design), metrics (to measure the complexity of the code) and aspect weaving (to identify join point shadows of interest).

The main goal of the CodeQuest project is to develop a powerful, generic and optimised software querying tool that can be used across these areas of software analysis and engineering.

The CodeQuest approach

At the moment CodeQuest primarily combines two previous proposals: the use of logic programming and the use of a database system. Source code is parsed, and a subset of its AST and control-flow graph information is stored in a relational database. Queries expressed as logic rules are compiled to SQL and evaluated by the database system. The database is updated incrementally as the source code changes.
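
As a rough illustration of this pipeline, and not CodeQuest's actual schema or generated SQL, the sketch below stores a hypothetical `extends(sub, sup)` relation in SQLite (used here only as a stand-in for a relational database with recursive queries) and evaluates the two Datalog rules quoted in the comment with a hand-written recursive common table expression. The table, column and function names are assumptions made for the example.

```python
import sqlite3

# Hypothetical Datalog rules that the SQL below corresponds to:
#   subtype(X, Y) :- extends(X, Y).
#   subtype(X, Z) :- extends(X, Y), subtype(Y, Z).


def build_example_db():
    """Create an in-memory database holding a few example `extends` facts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE extends (sub TEXT, sup TEXT)")
    conn.executemany(
        "INSERT INTO extends VALUES (?, ?)",
        [
            ("ArrayList", "AbstractList"),
            ("AbstractList", "AbstractCollection"),
            ("HashSet", "AbstractSet"),
            ("AbstractSet", "AbstractCollection"),
        ],
    )
    return conn


def subtypes_of(conn, ancestor):
    """Return all direct and indirect subtypes of `ancestor`, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtype(sub, sup) AS (
            SELECT sub, sup FROM extends            -- base rule
            UNION                                   -- UNION removes duplicates
            SELECT e.sub, s.sup                     -- recursive rule
            FROM extends AS e JOIN subtype AS s ON e.sup = s.sub
        )
        SELECT sub FROM subtype WHERE sup = ? ORDER BY sub
        """,
        (ancestor,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `subtypes_of(build_example_db(), "AbstractCollection")` returns the two direct subclasses together with the two classes that extend them.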

Relying on well-established database systems should speed up query execution over large code bases and greatly reduce memory use. Database systems are, after all, designed for environments where queries must run over very large data sets with limited memory.

As a query language we use a logic language called Datalog. Its syntax is a subset of Prolog's, containing only the constructs that are necessary for database queries. Datalog is concise, self-explanatory and expressive enough for our task. Query evaluation is guaranteed to terminate and does not require any extra-logical annotations. A list of CodeQuest predefined predicates can be found here.
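
To see why termination can be guaranteed, note that Datalog rules cannot construct new values, so they can only derive facts over the constants already present in the database, and there are finitely many such facts. The sketch below is a minimal in-memory illustration of naive bottom-up evaluation, not the way CodeQuest evaluates queries. The predicates `calls` and `reaches` and the function name are hypothetical.

```python
def naive_reaches(calls):
    """Naive bottom-up evaluation of the hypothetical rules

         reaches(X, Y) :- calls(X, Y).
         reaches(X, Z) :- calls(X, Y), reaches(Y, Z).

    `calls` is an iterable of (caller, callee) pairs.
    """
    calls = set(calls)
    reaches = set(calls)  # first rule: every call is a reaches fact
    while True:
        # Second rule, applied to everything derived so far.
        derived = {
            (x, z)
            for (x, y) in calls
            for (y2, z) in reaches
            if y == y2
        }
        new_facts = derived - reaches
        if not new_facts:
            # Fixpoint: no rule yields anything new, so evaluation stops.
            return reaches
        reaches |= new_facts
```

Even on cyclic input such as `{("a", "b"), ("b", "c"), ("c", "a")}` the loop stops, because with three constants at most nine `reaches` facts can ever be derived.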

While relying on the database query optimiser for efficient evaluation, we also define and implement our own optimisations at the level of Datalog programs. Such optimisations are especially important for recursive queries, which are computationally intensive and can become expensive if computed entirely in the database.
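
One standard optimisation for recursive Datalog rules is semi-naive evaluation, in which each round joins only against the facts discovered in the previous round instead of against everything derived so far. We show it here purely as an example of the kind of optimisation meant, without claiming that it is the one CodeQuest implements. The function and variable names are hypothetical.

```python
def semi_naive_closure(edges):
    """Transitive closure of `edges` (an iterable of (x, y) pairs),
    computed with semi-naive evaluation of

         reaches(X, Y) :- edge(X, Y).
         reaches(X, Z) :- edge(X, Y), reaches(Y, Z).
    """
    edges = set(edges)

    # Index the edge relation by target: predecessors[y] = {x | edge(x, y)}.
    predecessors = {}
    for x, y in edges:
        predecessors.setdefault(y, set()).add(x)

    closure = set(edges)
    delta = set(edges)  # facts that are new since the previous round
    while delta:
        # Join the edge relation with the new facts only.
        derived = {
            (x, z)
            for (y, z) in delta
            for x in predecessors.get(y, ())
        }
        delta = derived - closure  # keep only genuinely new facts
        closure |= delta
    return closure
```

The result is the same set that naive evaluation would produce, but facts that are already known are not fed into the join again on later rounds.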

A querying tool based on Datalog and an RDBMS should strike the right balance between expressiveness, efficiency and scalability.

Main Challenges

The primary challenges we are facing include:

  • The construction of an optimising compiler from Datalog to SQL, targeting both built-in recursive queries and a custom implementation of recursion via stored procedures.
  • Compilation of the Datalog rules to database views to allow the incremental maintenance of the computed results.
  • Aggressive optimisations, in particular to improve evaluation of expensive recursive queries such as transitive closure.
  • Effectively merging query evaluation mechanisms based on BDDs or specialised algorithms with the current architecture of CodeQuest, which is based solely on relational database systems.
  • Improving the incremental update of the ground facts in the database, for example by adjusting the level of granularity.
  • Use of various caching techniques to eliminate unnecessary computations.
  • Full flexibility in choosing which source code information should be stored in the database and which should be inferred, by defining the appropriate logic rules.
  • Integration with JunGL as a querying mechanism to fetch or infer certain semantic properties of software.
  • Integration with the abc compiler as a matching engine for AspectJ pointcuts with a more generalised syntax.
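
To make the point about update granularity concrete, here is a minimal sketch of incremental maintenance of ground facts at the granularity of a source file: when a file changes, only the facts extracted from that file are replaced. The `method_decl` table, its columns and the function names are hypothetical, and SQLite is used only as a convenient stand-in database. A finer granularity (per class or per method) would key the facts on a smaller unit in the same way.

```python
import sqlite3


def create_fact_store():
    """Create an in-memory store for hypothetical method declaration facts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE method_decl (file TEXT, class_name TEXT, method_name TEXT)"
    )
    # Index on the unit of granularity, so replacing one file's facts is cheap.
    conn.execute("CREATE INDEX method_decl_by_file ON method_decl (file)")
    return conn


def refresh_file(conn, file, methods):
    """Replace all facts extracted from `file`.

    `methods` is an iterable of (class_name, method_name) pairs obtained
    from re-parsing the changed file; facts from other files are untouched.
    """
    with conn:  # delete and re-insert in a single transaction
        conn.execute("DELETE FROM method_decl WHERE file = ?", (file,))
        conn.executemany(
            "INSERT INTO method_decl VALUES (?, ?, ?)",
            [(file, cls, name) for cls, name in methods],
        )
```

Coarser units make each update simpler but re-insert more unchanged facts, while finer units reduce rewritten rows at the cost of tracking more precisely what changed.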

Resources

  • A paper titled CodeQuest: Scalable Source Code Queries with Datalog is published in the proceedings of the ECOOP 2006 conference.
  • Performance benchmarks were demonstrated at the poster session of the AOSD 2006 conference.
  • An extended abstract describing the CodeQuest aims and methodology is published in the OOPSLA 2005 Conference Companion.
  • A summary of the first experimental results is presented on this poster.
  • CodeQuest was inspired by JQuery, a source code browsing tool for Eclipse.

Contacts

Elnar Hajiyev, Mathieu Verbaere and Oege de Moor.

