6/30/2025
These rules aim to ensure high - quality LLVM code. It covers coding standards, language usage, source formatting, and performance. Adhere to LLVM's style, use modern C++ and Python, manage memory efficiently, and follow best practices in testing, security, and tooling. Avoid common pitfalls and contribute to a consistent codebase.
## Languages, Libraries, and Standards
- Use modern, standard-conforming, and portable C++ code (C++17 or later, as supported by major toolchains).
- Prefer LLVM's own libraries (e.g., `llvm::DenseMap`, `llvm::SmallVector`) over the C++ standard library when appropriate for performance or integration benefits. Consult the LLVM Programmer's Manual for details.
- Use Python (with the minimum version specified in the Getting Started documentation) for automation, build systems, and utility scripts.
- Format Python code with `black` and `darker` to adhere to PEP 8 standards.
## Mechanical Source Issues
- Write clear diagnostic messages using succinct and correct English.
- Follow the recommended `#include` style: Main module header first, followed by LLVM project headers (most specific to least specific), then system headers.
- Limit source code width to 80 columns.
- Prefer spaces to tabs in source files.
- Avoid trailing whitespace.
- Format multi-line lambdas like blocks of code.
- Use braced initializer lists as if the braces were parentheses in a function call.
## Language and Compiler Issues
- Treat compiler warnings as errors.
- Write portable code, encapsulating non-portable code behind well-defined interfaces.
- Do not use RTTI or exceptions.
- Prefer C++-style casts (`static_cast`, `reinterpret_cast`, `const_cast`) over C-style casts, except when casting to `void` to suppress unused variable warnings or between integral types.
- Do not use static constructors (global variables with constructors or destructors).
- Use `struct` when all members are public, and `class` otherwise.
- Avoid using braced initializer lists to call constructors with non-trivial logic.
- Use `auto` type deduction to make code more readable, but be mindful of unnecessary copies.
- Beware of non-determinism due to the ordering of pointers in unordered containers.
- Beware of non-deterministic sorting order of equal elements; use `llvm::sort` instead of `std::sort`.
## Style Issues
- Layer libraries correctly to avoid circular dependencies. Use Unix linker principles to enforce this.
- `#include` as little as possible, especially in header files. Use forward declarations when possible.
- Use namespace qualifiers to implement previously declared functions in source files; avoid opening namespace blocks.
- Use early exits and `continue` statements to simplify code and reduce indentation.
- Don't use `else` or `else if` after control flow interrupting statements like `return`, `break`, `continue`, or `goto`.
- Turn predicate loops into predicate functions.
## The Low-Level Issues
- Name types, functions, variables, and enumerators properly, using descriptive and consistent names.
- Assert liberally to check preconditions and assumptions. Include informative error messages in assertions. Use `llvm_unreachable` to mark code that should never be reached. Avoid `assert(false)`. When assertions produce “unused value” warnings, move the call into the assert or cast the value to void.
- Do not use `using namespace std`. Explicitly qualify identifiers from the standard namespace.
- Don't use default labels in fully covered switches over enumerations to catch new enumeration values. Use `llvm_unreachable` after the switch to suppress GCC warnings about reaching the end of a non-void function.
- Use range-based for loops wherever possible.
- Don’t evaluate `end()` every time through a loop; evaluate it once before the loop starts.
- `#include <iostream>` is forbidden. Use `raw_ostream` instead.
- Avoid `std::endl`; use `\n` instead.
- Don’t use `inline` when defining a function in a class definition; it's implicit.
## Microscopic Details
- Put a space before an open parenthesis only in control flow statements, but not in normal function call expressions and function-like macros.
- Prefer preincrement (`++X`) over postincrement (`X++`).
- Don't indent namespaces; add comments indicating which namespace is being closed when helpful.
## Code Organization and Structure
- **Directory Structure:** Follow the existing LLVM directory structure conventions. New components should typically reside in a subdirectory within an existing high-level category (e.g., `lib/Transforms`, `include/llvm/Analysis`). Consider the intended audience (end-user tool vs. internal library) when choosing a location.
- **File Naming:** Use descriptive names for files, matching the class or functionality they contain. Header files should have a `.h` or `.hpp` extension, and implementation files should have a `.cpp`, `.cc`, or `.c` extension (depending on whether they are C or C++). MLIR files should have `.mlir` extension.
- **Module Organization:** Break down large components into smaller, more manageable modules. Each module should have a well-defined purpose and interface. Minimize dependencies between modules.
- **Component Architecture:** Design components with clear separation of concerns. Use interfaces and abstract classes to promote loose coupling and enable polymorphism. Follow the principle of least privilege.
- **Code Splitting:** Decompose large functions into smaller, well-named helper functions. Use namespaces to group related functions and classes.
## Common Patterns and Anti-patterns
- **Visitor Pattern:** LLVM frequently uses the Visitor pattern for traversing and manipulating data structures (e.g., the LLVM IR). Implement visitor classes for custom analysis or transformations.
- **Pass Infrastructure:** Leverage LLVM's pass infrastructure for implementing optimization and analysis passes. Create new pass types (e.g., analysis pass, transformation pass) as needed.
- **Factory Pattern:** Use the Factory pattern to create instances of classes based on runtime parameters or configuration settings.
- **Singleton Pattern (Avoid):** Minimize the use of the Singleton pattern, as it can lead to tight coupling and make testing difficult. Consider dependency injection instead.
- **Global Variables (Avoid):** Avoid global variables whenever possible, as they can introduce unexpected side effects and make code harder to reason about. Use configuration objects or dependency injection instead.
## Performance Considerations
- **Inline Hints:** Use `inline` judiciously to guide the compiler to inline frequently called functions. Avoid inlining large or complex functions.
- **Profile-Guided Optimization (PGO):** Use PGO to optimize code based on runtime profiles. This can significantly improve performance for frequently executed code paths.
- **LTO (Link-Time Optimization):** Enable LTO to allow the compiler to optimize code across module boundaries.
- **Memory Management:** Use LLVM's memory management facilities (e.g., `BumpPtrAllocator`, `ScopedHashTable`) for efficient memory allocation. Avoid excessive memory allocation and deallocation.
- **Avoid Unnecessary Copies:** Pass objects by reference or pointer to avoid unnecessary copying.
- **Use `SmallVector`:** Use `llvm::SmallVector` for small, fixed-size vectors to avoid dynamic memory allocation.
- **Cache Locality:** Consider cache locality when designing data structures and algorithms.
## Security Best Practices
- **Input Validation:** Validate all external inputs to prevent vulnerabilities such as buffer overflows and code injection. Use LLVM's API for parsing various file formats.
- **Sanitizers:** Use AddressSanitizer (ASan), MemorySanitizer (MSan), and UndefinedBehaviorSanitizer (UBSan) during development and testing to detect memory errors and undefined behavior.
- **Avoid Code Injection:** Do not construct code by concatenating strings or using other techniques that could lead to code injection vulnerabilities.
- **Data Validation:** Make sure to validate any data being retrieved from memory or other storage locations. Use `llvm::isSafeToLoadUnconditionally` to check if data is safe to load.
## Testing Approaches
- **Unit Tests:** Write unit tests for individual components and functions. Use the LLVM testing framework (e.g., `lit`) to automate test execution.
- **Integration Tests:** Write integration tests to verify the interaction between different components. Create realistic test cases that exercise common use scenarios.
- **Regression Tests:** Add regression tests for any bug fixes to prevent regressions in future versions.
- **Fuzzing:** Use fuzzing techniques to identify corner cases and potential vulnerabilities. Integrate fuzzing into the CI/CD pipeline.
- **Code Coverage:** Measure code coverage to ensure that tests exercise all important code paths. Use `llvm-cov` to generate code coverage reports.
## Common Pitfalls and Gotchas
- **Incorrect Use of APIs:** Carefully read the documentation for LLVM APIs and ensure they are used correctly. Pay attention to preconditions and postconditions.
- **Memory Leaks:** Ensure that all allocated memory is properly deallocated. Use memory leak detection tools to identify memory leaks.
- **Dangling Pointers:** Avoid dangling pointers by ensuring that objects are not deleted while they are still being referenced.
- **Undefined Behavior:** Avoid undefined behavior, as it can lead to unpredictable results. Use sanitizers to detect undefined behavior.
- **Version Compatibility:** Be aware of version compatibility issues when using LLVM with other libraries or tools. Test your code with different versions of LLVM.
## Tooling and Environment
- **Clang:** Use Clang as the primary compiler for LLVM projects. Clang provides excellent support for C++ standards and LLVM extensions.
- **CMake:** Use CMake to manage the build process for LLVM projects. CMake is a cross-platform build system generator that is well-supported by LLVM.
- **LLVM Tools:** Utilize the LLVM tools (e.g., `llvm-as`, `llvm-dis`, `opt`, `lli`) for various tasks such as assembling and disassembling LLVM IR, optimizing code, and executing LLVM bitcode.
- **Git:** Use Git for version control. Follow the LLVM Git commit message conventions.
- **CI/CD:** Integrate CI/CD pipelines into LLVM development. Use test suites as part of your development cycle.
## Additional Notes
- When extending or modifying existing code, adhere to the style already in use.
- Document your code clearly and concisely. Provide comments explaining the purpose of functions, classes, and variables.
- Keep pull requests small and focused. This makes it easier for reviewers to understand and approve your changes.
- Be responsive to feedback from reviewers and address any issues promptly.
- Engage with the LLVM community to ask questions, share knowledge, and contribute to the project.
- Refer to existing documentation and examples in the LLVM codebase for guidance on how to implement new features and functionality.
By following these guidelines, you can write high-quality LLVM code that is efficient, maintainable, and robust. This ensures consistent code quality and facilitates collaboration within the LLVM community.