Open Projects
This page lists several projects that would boost the analyzer's usability and power. Most of the projects listed here are infrastructure-related, so this list complements the potential checkers list. If you are interested in tackling one of these, please send an email to the cfe-dev mailing list to notify other members of the community.
- Core Analyzer Infrastructure
- Explicitly model standard library functions with BodyFarm.
BodyFarm allows the analyzer to explicitly model functions whose definitions are not available during analysis. Modeling more of the widely used functions (such as the members of std::string) will improve the precision of the analysis. (Difficulty: Easy, ongoing)
- Handle floating-point values.
Currently, the analyzer treats all floating-point values as unknown. However, we already have most of the infrastructure we need to handle floats: RangeConstraintManager. This would involve adding a new SVal kind for constant floats, generalizing the constraint manager to handle floats and integers equally, and auditing existing code to make sure it doesn't make untoward assumptions. (Difficulty: Medium)
- Implement generalized loop execution modeling.
Currently, the analyzer simply unrolls each loop N times. This means that it will not execute any code after the loop if the loop is guaranteed to execute more than N times. This results in lost basic block coverage. We could continue exploring the path if we could model a generic i-th iteration of a loop. (Difficulty: Hard)
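The coverage loss described above can be illustrated with a small sketch. The function name `scaled_sum` and the iteration count of 100 are hypothetical, and the comments assume that 100 exceeds the unroll limit N.

```cpp
// Hypothetical example: the loop always runs exactly 100 times, which is
// assumed here to be more than the analyzer's unroll limit N.
int scaled_sum(const int *data, const int *scale) {
    int total = 0;
    for (int i = 0; i < 100; ++i) {
        total += data[i];
    }
    // Code after the loop. Per the description above, paths stop once the
    // loop has been unrolled N times, so this part is never explored and a
    // defect here (for example, a missing null check) would go unreported.
    if (scale == nullptr) {
        return total;
    }
    return total * *scale;
}
```

Modeling a generic i-th iteration would let the analyzer continue past the loop and examine the trailing code as well.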
- Enhance CFG to model C++ temporaries properly.
There is an existing implementation of this, but it's not complete and is disabled in the analyzer. (Difficulty: Medium; current contact: Alex McCarthy)
- Enhance CFG to model exception-handling properly.
Currently exceptions are treated as "black holes", and exception-handling control structures are poorly modeled (to be conservative). This could be much improved for both C++ and Objective-C exceptions. (Difficulty: Medium)
- Enhance CFG to model C++ new more precisely.
The current representation of new does not provide an easy way for the analyzer to model the call to a memory allocation function (operator new), then initialize the result with a constructor call. The problem is discussed at length in PR12014. (Difficulty: Easy; current contact: Karthik Bhat)
- Enhance CFG to model C++ delete more precisely.
Similarly, the representation of delete does not include the call to the destructor, followed by the call to the deallocation function (operator delete). One particular issue (noreturn destructors) is discussed in PR15599. (Difficulty: Easy; current contact: Karthik Bhat)
- Implement a BitwiseConstraintManager to handle PR3098.
Constraints on the bits of an integer are not easily representable as ranges. A bitwise constraint manager would model constraints such as "bit 32 is known to be 1". This would help when analyzing code that makes use of bitmasks. (Difficulty: Medium)
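One possible shape for such per-bit constraints is a pair of masks that record which bits are known to be 0 and which are known to be 1. The sketch below is only an illustration: `KnownBits`, `BitState`, and `assumeMaskedEquals` are hypothetical names, not existing analyzer classes, and values are assumed to be 64-bit unsigned integers.

```cpp
#include <cstdint>

// Hypothetical tri-state answer for a single bit.
enum class BitState { Unknown, Zero, One };

// Hypothetical sketch, not an existing analyzer class: tracks which bits of
// a 64-bit symbolic value are known to be 0 or known to be 1.
struct KnownBits {
    std::uint64_t zeros = 0; // a set bit means "this bit of the value is 0"
    std::uint64_t ones = 0;  // a set bit means "this bit of the value is 1"

    // Record the assumption "(value & mask) == expected" on the current path.
    // Returns false and leaves the state unchanged if the assumption
    // contradicts what is already known (the path would be infeasible).
    bool assumeMaskedEquals(std::uint64_t mask, std::uint64_t expected) {
        if (expected & ~mask) {
            return false; // expected has bits outside the mask: never true
        }
        const std::uint64_t newOnes = expected;
        const std::uint64_t newZeros = mask & ~expected;
        if ((newOnes & zeros) || (newZeros & ones)) {
            return false; // conflicts with an earlier constraint
        }
        ones |= newOnes;
        zeros |= newZeros;
        return true;
    }

    // Query one bit; index must be less than 64.
    BitState bit(unsigned index) const {
        const std::uint64_t probe = std::uint64_t{1} << index;
        if (ones & probe) return BitState::One;
        if (zeros & probe) return BitState::Zero;
        return BitState::Unknown;
    }
};
```

With this representation, "bit 32 is known to be 1" is simply `ones` having bit 32 set, which a range-based constraint cannot express compactly.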
- Track type info through casts more precisely.
The DynamicTypePropagation checker is in charge of inferring a region's dynamic type based on what operations the code is performing. Casts are a rich source of type information that the analyzer currently ignores. They are tricky to get right, but might have very useful consequences. (Difficulty: Medium)
- Design and implement alpha-renaming.
Implement unification of two symbolic values along a path once a comparison determines that they are equal. This would allow us to reduce the number of false positives and would be a stepping stone toward more advanced analyses, such as summary-based interprocedural and cross-translation-unit analysis. (Difficulty: Hard)
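One common way to merge values that are proven equal is a union-find (disjoint-set) structure. The sketch below shows that idea only; `SymbolUnifier` is a hypothetical name, and representing symbols as plain integer ids is an assumption made for illustration rather than how the analyzer stores symbols.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: symbols are plain integer ids in [0, symbolCount).
class SymbolUnifier {
public:
    explicit SymbolUnifier(std::size_t symbolCount) : parent_(symbolCount) {
        // Initially every symbol is its own representative.
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    // Representative of the equivalence class that contains `sym`.
    std::size_t find(std::size_t sym) {
        while (parent_[sym] != sym) {
            parent_[sym] = parent_[parent_[sym]]; // path halving
            sym = parent_[sym];
        }
        return sym;
    }

    // Called when a comparison on the current path proves a == b.
    void unify(std::size_t a, std::size_t b) {
        const std::size_t ra = find(a);
        const std::size_t rb = find(b);
        if (ra != rb) {
            parent_[rb] = ra;
        }
    }

    // True if both symbols are known to denote the same value.
    bool sameSymbol(std::size_t a, std::size_t b) {
        return find(a) == find(b);
    }

private:
    std::vector<std::size_t> parent_;
};
```

This sketch mutates a single shared table, so it ignores path sensitivity. A real design would presumably need one equivalence relation per path, which is part of what makes the project hard.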
- Bug Reporting
- Add support for displaying cross-file diagnostic paths in HTML output (used by scan-build).
Currently scan-build output does not display reports that span multiple files. The main problem is that we do not have a good format to display such paths in HTML output. (Difficulty: Medium)
- Refactor path diagnostic generation in BugReporter.cpp.
It would be great to have more code reuse between the "Minimal" and "Extensive" PathDiagnostic generation algorithms. One idea is to create an IR for representing path diagnostics, which would later be used to generate minimal or extensive report output. (Difficulty: Medium)
- Other Infrastructure
- Rewrite scan-build (in Python).
(Difficulty: Easy)
- Do a better job interposing on a compilation.
Currently, scan-build just sets the CC and CXX environment variables to its wrapper scripts, which then call into an underlying platform compiler. This is problematic for any project that doesn't exclusively use CC and CXX to control its compilers.
(Difficulty: Medium-Hard)
- Create an analyzer_annotate attribute for the analyzer annotations.
We would like to put all analyzer attributes behind a fence so that we could add/remove them without worrying that compiler (not analyzer) users depend on them. Design and implement such a generic analyzer attribute in the compiler. (Difficulty: Medium)
- Enhanced Checks
- Implement a production-ready StreamChecker.
A SimpleStreamChecker was presented in the Building a Checker in 24 Hours talk (slides, video). We need to implement a production version of the checker with a richer set of APIs and evaluate it by running it on real codebases. (Difficulty: Easy)
- Extend Malloc checker with reasoning about custom allocator, deallocator, and ownership-transfer functions.
This would require extending the MallocPessimistic checker to reason about annotated functions. Ideally, the implementation would rely on the analyzer_annotate attribute, as described above. (Difficulty: Easy)
- Implement a BitwiseMaskingChecker to handle PR16615.
Symbolic expressions of the form $sym & CONSTANT can range from 0 to CONSTANT if CONSTANT is 2^n-1, e.g. 0xFF (0b11111111), 0x7F (0b01111111), 0x3 (0b0011), 0xFFFF, etc. Even without handling general bitwise operations on symbols, we can at least bound the value of the resulting expression. Bonus points for handling masks followed by shifts, e.g. ($sym & 0b1100) >> 2. (Difficulty: Easy)
- Implement an iterator invalidation checker.
(Difficulty: Easy)
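For the BitwiseMaskingChecker project above, the bounding idea can be sketched with a few helper functions. All names here are hypothetical, and values are assumed to be unsigned 64-bit integers. The sketch relies only on the fact that a masked value can never have bits set outside the mask.

```cpp
#include <cstdint>

// Hypothetical helper: inclusive upper bound of (sym & mask) for any
// unsigned sym. The result can only contain bits that are set in mask.
constexpr std::uint64_t maxAfterMask(std::uint64_t mask) {
    return mask;
}

// Inclusive upper bound of ((sym & mask) >> shift); shift must be below 64.
constexpr std::uint64_t maxAfterMaskAndShift(std::uint64_t mask,
                                             unsigned shift) {
    return mask >> shift;
}

// True if mask has the form 2^n - 1 (only contiguous low bits set). In that
// case every value from 0 up to mask is reachable, so the range is tight.
constexpr bool isLowBitMask(std::uint64_t mask) {
    return (mask & (mask + 1)) == 0;
}
```

For example, under these assumptions `maxAfterMaskAndShift(0xC, 2)` bounds `($sym & 0b1100) >> 2` by 3.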
- Write checkers which catch Copy and Paste errors.
Take a look at the CP-Miner paper for inspiration. (Difficulty: Medium-Hard; current contacts: Daniel Marjamäki and Daniel Fahlgren)