
Chapter 7: Docs from Code

Learning Objectives

By the end of this chapter you will be able to:

  1. Explain why API documentation should be generated from code rather than written manually.
  2. Use Python's ast module to extract API signatures, docstrings, and type annotations.
  3. Build a CodeAnalyzer that maps FCC's source tree into a structured API model.
  4. Generate Markdown API documentation from the structured model.
  5. Integrate documentation generation into CI/CD for automatic freshness checks.

The figure below shows the docs-from-code pipeline: AST parsing feeds a CodeAnalyzer that produces module, class, and function info, which a reference generator renders through Jinja2 into Markdown or HTML, with an incremental path driven by git changes.

flowchart LR
    PY[Source .py Files] --> AST["ast.parse()"]
    AST --> CA[CodeAnalyzer]
    CA --> MI[ModuleInfo]
    CA --> CI[ClassInfo]
    CA --> FI[FunctionInfo]
    MI --> ARG[APIReferenceGenerator]
    CI --> ARG
    FI --> ARG
    ARG --> J2[Jinja2 Templates]
    J2 --> OUT[Markdown / HTML Docs]

    GIT[Git Changes] --> INC{Incremental?}
    INC -->|Changed files only| CA
    INC -->|No changes| SKIP[Skip Regeneration]

    style CA fill:#2196F3,color:#fff
    style OUT fill:#4CAF50,color:#fff

Because the CodeAnalyzer never imports the target code, documentation builds remain reliable when optional dependencies are absent — a non-trivial property for a framework with as many optional extras as FCC.

The Documentation Freshness Problem

Documentation that is written separately from code has a half-life. The moment it is written, it starts drifting from the actual implementation. A function's signature changes, a new parameter is added, a class is renamed -- and the documentation is wrong. In a framework like FCC with 6,179 tests and 99%+ coverage, the code is highly reliable. The documentation should derive its reliability from the code, not from a human's memory of the last change.

The solution is documentation intelligence: generating API documentation directly from the source code, using the same techniques that tools like Sphinx and pdoc use, but tailored to FCC's specific documentation needs and style.

The AST Approach

FCC uses Python's built-in ast (Abstract Syntax Tree) module for code analysis (see ADR-002: Docs from Code AST). The AST approach has several advantages over alternative strategies:

Why AST Over Runtime Inspection?

Runtime inspection (the inspect module) requires importing every module, which means:

  • All dependencies must be installed (including optional ones).
  • Import side effects execute.
  • Circular imports can cause failures.

AST analysis works on source files directly, without importing. It reads the .py file, parses it into a syntax tree, and extracts information structurally. No imports, no side effects, no dependency requirements.
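To make this concrete, here is a minimal standalone sketch (not FCC's CodeAnalyzer) that extracts a module docstring, a function signature, and a return annotation from source text that could never be imported, because it depends on a package that is not installed:

```python
import ast
import textwrap

# A module that would fail to import (missing optional dependency),
# yet parses fine as plain source text.
source = textwrap.dedent('''
    """Persona registry utilities."""
    import some_optional_dependency  # not installed anywhere

    def register(name: str, spec: dict) -> bool:
        """Register a persona by name."""
        return True
''')

tree = ast.parse(source)                 # no import, no side effects
module_doc = ast.get_docstring(tree)

for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        params = [arg.arg for arg in node.args.args]
        returns = ast.unparse(node.returns) if node.returns else None
        print(node.name, params, returns, ast.get_docstring(node))
# register ['name', 'spec'] bool Register a persona by name.
```

`ast.unparse` (Python 3.9+) turns annotation nodes back into source text, which is how complex annotations survive extraction verbatim.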

Why AST Over Regex?

Regular expressions can extract simple patterns (function names, class names), but they break on:

  • Multi-line signatures
  • Nested structures
  • Complex type annotations
  • Decorators

The AST handles all of these correctly because it understands Python syntax, not just text patterns.
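A quick illustration: the signature below combines a decorator, a multi-line parameter list, and a nested generic annotation, all of which defeat line-oriented regexes but fall out of the AST directly (the names `traced` and `PersonaSpec` are only illustrative):

```python
import ast
import textwrap

# A signature shape that line-oriented regexes routinely mangle.
source = textwrap.dedent('''
    @traced(level="debug")
    def resolve(
        name: str,
        overrides: dict[str, list[int]] | None = None,
    ) -> "PersonaSpec":
        ...
''')

fn = ast.parse(source).body[0]
decorators = [ast.unparse(d) for d in fn.decorator_list]
signature = f"{fn.name}({', '.join(a.arg for a in fn.args.args)})"
print(decorators, signature, ast.unparse(fn.returns))
```

The parser sees one FunctionDef node regardless of how many lines the signature spans, so the extraction code never has to reason about line breaks.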

Why Not Sphinx?

Sphinx is excellent for standalone documentation projects, but FCC needs:

  • Programmatic access to the API model (for embedding in knowledge graphs).
  • Custom output formats beyond RST/HTML (Markdown, JSON-LD, YAML).
  • Integration with the FCC template system (Jinja2).
  • Selective generation (generate docs for one persona's module, not the entire codebase).

The AST-based CodeAnalyzer provides these capabilities as a library, not a standalone tool.

The CodeAnalyzer

The CodeAnalyzer is the core class for extracting API information from FCC source files:

from fcc.docs.code_analyzer import CodeAnalyzer

analyzer = CodeAnalyzer()

# Analyze a single module
module_info = analyzer.analyze_file("src/fcc/personas/registry.py")

print(f"Module: {module_info.name}")
print(f"Docstring: {module_info.docstring[:100]}...")
print(f"Classes: {len(module_info.classes)}")
print(f"Functions: {len(module_info.functions)}")

for cls in module_info.classes:
    print(f"\n  Class: {cls.name}")
    print(f"  Bases: {cls.bases}")
    print(f"  Methods: {len(cls.methods)}")
    for method in cls.methods:
        print(f"    {method.name}({', '.join(method.parameters)})")
        print(f"    -> {method.return_type}")

What the Analyzer Extracts

For each Python source file, the analyzer produces:

Module Level:

  • Module docstring
  • Module-level constants and their types
  • __all__ exports list
  • Import statements (for dependency tracking)

Class Level:

  • Class name, bases, and metaclass
  • Class docstring
  • Decorators (including @dataclass, @frozen, @traced)
  • Class attributes with type annotations
  • Methods with full signatures

Function/Method Level:

  • Name and decorators
  • Parameters with types and defaults
  • Return type annotation
  • Docstring (parsed into summary, parameters, returns, raises sections)
  • Whether the function is a property, classmethod, or staticmethod

Type Annotations:

  • Simple types (str, int, bool)
  • Generic types (list[str], dict[str, int], Optional[PersonaSpec])
  • Union types (str | None)
  • Protocol references
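The annotation categories above all reduce to the same extraction step: unparse the annotation node. A minimal sketch of that step on its own (independent of the CodeAnalyzer):

```python
import ast

# Three annotated assignments spanning the annotation categories:
# a simple type, a generic, and a union. PersonaSpec is never imported;
# the parser only needs the name, not the object.
source = (
    "count: int = 0\n"
    "names: list[str] = []\n"
    "spec: PersonaSpec | None = None\n"
)

annotations = {}
for node in ast.parse(source).body:
    if isinstance(node, ast.AnnAssign):
        # ast.unparse turns the annotation node back into source text,
        # so generics and unions come through verbatim.
        annotations[node.target.id] = ast.unparse(node.annotation)

print(annotations)
# {'count': 'int', 'names': 'list[str]', 'spec': 'PersonaSpec | None'}
```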

Analyzing the Full Source Tree

# Analyze the entire FCC package
api_model = analyzer.analyze_package("src/fcc/")

print(f"Total modules: {len(api_model.modules)}")
print(f"Total classes: {sum(len(m.classes) for m in api_model.modules)}")
print(f"Total functions: {sum(len(m.functions) for m in api_model.modules)}")

# Find all public APIs
public_apis = api_model.public_apis()
for api in public_apis:
    print(f"  {api.qualified_name}: {api.kind}")
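Under the hood, analyzing a package is essentially a walk over .py files, parsing each one. A self-contained sketch of that core loop (not FCC's analyze_package, just the idea), demonstrated on a throwaway one-module package:

```python
import ast
import tempfile
from pathlib import Path

def count_apis(package_dir: str) -> tuple[int, int, int]:
    """Tally modules, classes, and functions purely from the AST --
    no imports, so missing optional dependencies don't matter."""
    modules = classes = functions = 0
    for path in sorted(Path(package_dir).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        modules += 1
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                classes += 1
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                functions += 1
    return modules, classes, functions

# Demo: one module containing one class (with one method) and one function.
with tempfile.TemporaryDirectory() as d:
    Path(d, "mod.py").write_text(
        "class A:\n    def m(self): ...\n\ndef f(): ...\n"
    )
    result = count_apis(d)
print(result)  # (1, 1, 2) -- the method counts as a function node
```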

Generating Documentation

The API model feeds into Jinja2 templates to produce documentation:

from fcc.docs.api_generator import APIDocGenerator

generator = APIDocGenerator(
    api_model=api_model,
    template_dir="src/fcc/templates/docs/",
    output_dir="generated_docs/api/",
)

# Generate docs for the entire package
generator.generate_all()

# Generate docs for a specific module
generator.generate_module("fcc.personas.registry")

Generated Output Structure

generated_docs/api/
├── index.md                    # Package overview with module listing
├── fcc/
│   ├── personas/
│   │   ├── index.md            # Subpackage overview
│   │   ├── registry.md         # PersonaRegistry API docs
│   │   ├── models.md           # PersonaSpec, RISCEARSpec docs
│   │   ├── dimensions.md       # DimensionAttribute, DimensionRegistry docs
│   │   └── cross_reference.md  # CrossReferenceMatrix docs
│   ├── workflow/
│   │   ├── actions.md          # WorkflowAction, ActionRegistry docs
│   │   └── action_engine.md    # ActionEngine docs
│   ├── simulation/
│   │   └── engine.md           # SimulationEngine docs
│   └── ...

Template Customization

The default templates produce Markdown suitable for MkDocs or GitHub Pages. You can customize the templates:

# Custom template for a class
generator.set_template("class", "my_templates/class.md.j2")

A class template receives the ClassInfo object and can access all extracted information:

# {{ cls.name }}

{{ cls.docstring }}

{% if cls.bases %}
**Inherits from:** {{ cls.bases | join(", ") }}
{% endif %}

## Constructor

```python
{{ cls.name }}({{ cls.constructor.signature }})
```

{{ cls.constructor.docstring }}

## Methods

{% for method in cls.methods if not method.name.startswith("_") %}

### {{ method.name }}

```python
{{ method.signature }}
```

{{ method.docstring }}

{% endfor %}
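To see the model-to-Markdown transformation end to end without FCC's generator, here is a stand-in renderer using plain f-strings over simplified, hypothetical ClassInfo/MethodInfo records (Jinja2 adds looping, filters, and template inheritance on top of the same idea):

```python
from dataclasses import dataclass, field

@dataclass
class MethodInfo:          # simplified stand-in for the analyzer's model
    name: str
    signature: str
    docstring: str

@dataclass
class ClassInfo:
    name: str
    bases: list[str]
    docstring: str
    methods: list[MethodInfo] = field(default_factory=list)

def render_class(cls: ClassInfo) -> str:
    """Render a ClassInfo record to Markdown, skipping private methods."""
    lines = [f"# {cls.name}", "", cls.docstring, ""]
    if cls.bases:
        lines += [f"**Inherits from:** {', '.join(cls.bases)}", ""]
    lines.append("## Methods")
    for m in cls.methods:
        if not m.name.startswith("_"):
            lines += ["", f"### {m.name}", "", m.signature, "", m.docstring]
    return "\n".join(lines)

doc = render_class(ClassInfo(
    name="PersonaRegistry",
    bases=["Mapping"],
    docstring="Registry of persona specifications.",
    methods=[MethodInfo("get", "get(name: str) -> PersonaSpec",
                        "Look up a persona.")],
))
print(doc)
```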

Cross-Reference Generation

The API documentation generator produces cross-references that link API docs to:

  • Source code: Each documented element includes a link to its source file and line number.
  • Tests: If a test file mirrors the source structure, a link to the corresponding tests is included.
  • Guidebook chapters: Based on module-to-chapter mapping, cross-references to relevant guidebook chapters are included.
  • Notebooks: If a notebook exercises the documented module, a link is included.

generator.set_cross_references({
    "fcc.personas.registry": {
        "guidebook": "guidebook/ch03_riscear_specification.md",
        "notebook": "notebooks/02_persona_explorer.ipynb",
    },
    "fcc.workflow.action_engine": {
        "guidebook": "guidebook/ch05_workflow_system.md",
        "notebook": "notebooks/04_action_engine.ipynb",
    },
})

CI/CD Integration

Integrate documentation generation into your CI pipeline to catch documentation drift:

- name: Generate API Docs
  run: |
    python -c "
    from fcc.docs.code_analyzer import CodeAnalyzer
    from fcc.docs.api_generator import APIDocGenerator

    analyzer = CodeAnalyzer()
    api_model = analyzer.analyze_package('src/fcc/')
    generator = APIDocGenerator(api_model=api_model, output_dir='generated_docs/api/')
    generator.generate_all()

    # Verify no undocumented public APIs
    undocumented = [api for api in api_model.public_apis() if not api.docstring]
    if undocumented:
        for api in undocumented:
            print(f'UNDOCUMENTED: {api.qualified_name}')
        exit(1)
    print('All public APIs are documented')
    "

This step:

  1. Parses the entire source tree.
  2. Generates documentation.
  3. Checks that every public API has a docstring.
  4. Fails the build if any public API is undocumented.
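A complementary drift check regenerates the docs into a scratch directory and compares the result against the committed tree. One way to make that comparison cheap is a directory digest; the helper below is a hypothetical sketch, not an FCC API:

```python
import hashlib
import tempfile
from pathlib import Path

def tree_digest(root: str) -> str:
    """Hash every file under root (relative path + bytes) so two
    generated doc trees compare with a single string equality check."""
    h = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            h.update(path.relative_to(root).as_posix().encode())
            h.update(path.read_bytes())
    return h.hexdigest()

# Demo: identical trees match; a stale file changes the digest.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    Path(a, "registry.md").write_text("# PersonaRegistry\n")
    Path(b, "registry.md").write_text("# PersonaRegistry\n")
    in_sync = tree_digest(a) == tree_digest(b)   # committed == regenerated
    Path(b, "registry.md").write_text("# Stale\n")
    drifted = tree_digest(a) != tree_digest(b)   # regeneration changed output
print(in_sync, drifted)  # True True
```

In CI, a digest mismatch between `generated_docs/` and a fresh regeneration means someone changed code without regenerating docs.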

Feeding Documentation into the Knowledge Graph

Generated API documentation can be indexed in the knowledge graph (Chapter 2) for semantic search:

for module in api_model.modules:
    ontology.add_triple(
        subject=f"fcc:module:{module.name}",
        predicate="fcc:hasDocumentation",
        object=f"fcc:doc:{module.name}",
    )
    for cls in module.classes:
        ontology.add_triple(
            subject=f"fcc:class:{cls.qualified_name}",
            predicate="fcc:definedIn",
            object=f"fcc:module:{module.name}",
        )

This enables queries like "which module defines the PersonaRegistry class?" or "which classes have methods that return ActionResult?"

Key Takeaways

  • AST-based analysis extracts API information without importing modules, avoiding side effects and dependencies.
  • The CodeAnalyzer produces a structured API model covering modules, classes, functions, and type annotations.
  • Jinja2 templates transform the API model into Markdown documentation.
  • Cross-references link API docs to source code, tests, guidebook chapters, and notebooks.
  • CI/CD integration catches undocumented public APIs and documentation drift.
  • API documentation feeds into the knowledge graph for semantic search and provenance queries.

Cross-References


← Chapter 6: Cross-Project Orchestration | Next: Chapter 8 -- Scaling to Enterprise →