Chapter 2: Knowledge Graphs

Learning Objectives

By the end of this chapter you will be able to:

Explain why FCC needs structured knowledge representation beyond flat artifacts.
Describe the FCC ontology design, which uses OWL for the full ontology and SKOS for the taxonomy.
Construct knowledge graph triples from FCC personas, workflows, and artifacts.
Serialize knowledge graphs using pure-Python serializers (no external RDF libraries required).
Query the knowledge graph to answer questions about persona relationships and artifact provenance.

The ER diagram below shows the KnowledgeGraph schema — nodes and edges typed by enums — that underpins FCC's ontology, artifact provenance, and federated queries.

erDiagram
    KnowledgeGraph ||--o{ KnowledgeNode : contains
    KnowledgeGraph ||--o{ KnowledgeEdge : contains
    KnowledgeNode {
        string id PK
        string label
        NodeType node_type
        dict properties
    }
    KnowledgeEdge {
        string source FK
        string target FK
        EdgeType edge_type
        float weight
    }
    KnowledgeNode }|--|| NodeType : has_type
    KnowledgeEdge }|--|| EdgeType : has_type
    NodeType {
        enum PERSONA
        enum WORKFLOW
        enum ARTIFACT
        enum CONCEPT
        enum PROJECT
    }
    EdgeType {
        enum COLLABORATES
        enum PRODUCES
        enum CONSUMES
        enum BELONGS_TO
        enum RELATED
    }

The closed set of NodeType and EdgeType values keeps the graph tractable for serialization and reasoning; custom taxonomies plug in via OWL extensions rather than by widening the core enum.
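As a rough illustration, the schema in the diagram could be expressed with Python enums and dataclasses. This is a minimal sketch, not FCC's actual code: the enum string values, the default edge weight of 1.0, and the `add_node`/`add_edge` helper names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    PERSONA = "persona"
    WORKFLOW = "workflow"
    ARTIFACT = "artifact"
    CONCEPT = "concept"
    PROJECT = "project"


class EdgeType(Enum):
    COLLABORATES = "collaborates"
    PRODUCES = "produces"
    CONSUMES = "consumes"
    BELONGS_TO = "belongs_to"
    RELATED = "related"


@dataclass
class KnowledgeNode:
    id: str
    label: str
    node_type: NodeType
    properties: dict = field(default_factory=dict)


@dataclass
class KnowledgeEdge:
    source: str  # id of the source node
    target: str  # id of the target node
    edge_type: EdgeType
    weight: float = 1.0  # assumed default, not taken from the diagram


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node id -> KnowledgeNode
    edges: list = field(default_factory=list)  # list of KnowledgeEdge

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        # Mirror the FK constraints in the diagram: both endpoints must exist.
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node id: {endpoint}")
        self.edges.append(edge)


graph = KnowledgeGraph()
graph.add_node(KnowledgeNode("software_architect", "Software Architect", NodeType.PERSONA))
graph.add_node(KnowledgeNode("artifact_001", "Design Document", NodeType.ARTIFACT))
graph.add_edge(KnowledgeEdge("software_architect", "artifact_001", EdgeType.PRODUCES))
```

Because `node_type` and `edge_type` are enum members rather than free-form strings, a misspelled type fails at construction time instead of silently producing an unserializable graph.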

Beyond Flat Artifacts

Chapter 1 introduced semantic search -- the ability to find artifacts by meaning. But search is only half the picture. Search answers "which artifacts are relevant?" It does not answer "how do these artifacts relate to each other?" or "what is the provenance chain from requirement to deliverable?"

Knowledge graphs fill this gap. A knowledge graph represents entities (personas, artifacts, sessions, quality gates) and relationships (created-by, reviewed-by, depends-on, satisfies) as a network of typed, directed edges. This structure enables queries that flat artifact storage cannot support:

"Which persona created the artifact that failed the security gate?"
"What is the complete provenance chain for this deliverable?"
"Which requirements are not covered by any Create-phase artifact?"
"How many feedback cycles did it take to pass the governance review?"

These queries are essential for audit, compliance, and continuous improvement.

Ontology Design

The FCC knowledge graph uses a two-layer ontology design (see ADR-003: Knowledge Graph Export):

Layer 1: OWL for Full Ontology

The Web Ontology Language (OWL) defines the complete class hierarchy, property definitions, and logical constraints for the FCC domain. The FCCOntology includes:

Classes:
- fcc:Persona -- an agent identity with R.I.S.C.E.A.R. specification
- fcc:Workflow -- a directed graph of persona activations
- fcc:WorkflowNode -- a single activation point in a workflow
- fcc:Artifact -- any output produced by a persona
- fcc:Session -- a collaboration session
- fcc:QualityGate -- a named quality checkpoint
- fcc:Constitution -- a set of governance rules
- fcc:Scenario -- a problem definition with configuration

Properties (Object):
- fcc:createdBy -- links Artifact to Persona
- fcc:reviewedBy -- links Artifact to Persona (critique phase)
- fcc:hasNode -- links Workflow to WorkflowNode
- fcc:activatesPersona -- links WorkflowNode to Persona
- fcc:producedIn -- links Artifact to Session
- fcc:satisfies -- links Artifact to QualityGate
- fcc:governedBy -- links Persona to Constitution

Properties (Data):
- fcc:hasPhase -- the FCC phase (find/create/critique)
- fcc:hasCategory -- the persona category
- fcc:hasScore -- a numeric quality score
- fcc:hasTimestamp -- an ISO 8601 timestamp
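To make the object-property definitions concrete, the sketch below records each property's domain and range (taken from the "links X to Y" descriptions above), emits the corresponding schema triples, and checks an instance triple against them. The names `OBJECT_PROPERTIES`, `schema_triples`, and `is_valid` are hypothetical and not part of FCCOntology. Strictly speaking, OWL domain and range axioms license inference rather than validation; the closed-world check here is only a convenience for illustration.

```python
# Domain and range per object property, as described in the list above.
OBJECT_PROPERTIES = {
    "fcc:createdBy": ("fcc:Artifact", "fcc:Persona"),
    "fcc:reviewedBy": ("fcc:Artifact", "fcc:Persona"),
    "fcc:hasNode": ("fcc:Workflow", "fcc:WorkflowNode"),
    "fcc:activatesPersona": ("fcc:WorkflowNode", "fcc:Persona"),
    "fcc:producedIn": ("fcc:Artifact", "fcc:Session"),
    "fcc:satisfies": ("fcc:Artifact", "fcc:QualityGate"),
    "fcc:governedBy": ("fcc:Persona", "fcc:Constitution"),
}


def schema_triples():
    """Emit the OWL/RDFS declarations for every object property."""
    triples = []
    for prop, (domain, range_) in OBJECT_PROPERTIES.items():
        triples.append((prop, "rdf:type", "owl:ObjectProperty"))
        triples.append((prop, "rdfs:domain", domain))
        triples.append((prop, "rdfs:range", range_))
    return triples


def is_valid(triple, types):
    """Closed-world check: subject and object must have the declared classes.

    `types` maps an individual's name to its class name.
    """
    subject, predicate, obj = triple
    if predicate not in OBJECT_PROPERTIES:
        return False
    domain, range_ = OBJECT_PROPERTIES[predicate]
    return types.get(subject) == domain and types.get(obj) == range_


types = {
    "fcc:artifact_001": "fcc:Artifact",
    "fcc:software_architect": "fcc:Persona",
}
good = ("fcc:artifact_001", "fcc:createdBy", "fcc:software_architect")
bad = ("fcc:software_architect", "fcc:createdBy", "fcc:artifact_001")
print(is_valid(good, types), is_valid(bad, types))
```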

Layer 2: SKOS for Taxonomy

The Simple Knowledge Organization System (SKOS) defines the classification vocabularies used by FCC: persona categories, action types, FCC phases, tag hierarchies, and archetype vocabularies. SKOS is lighter than OWL and is specifically designed for taxonomies and controlled vocabularies.

fcc:PersonaCategory
  ├── fcc:Core
  ├── fcc:Integration
  ├── fcc:Governance
  ├── fcc:Stakeholder
  ├── fcc:Champion
  ├── fcc:DataEngineering
  ├── fcc:MLLifecycle
  ├── fcc:MLModels
  ├── fcc:DevOps
  └── fcc:AppDevelopment

fcc:FCCPhase
  ├── fcc:Find
  ├── fcc:Create
  └── fcc:Critique

fcc:ActionType
  ├── fcc:Scaffold
  ├── fcc:Refactor
  ├── fcc:Debug
  ├── fcc:Test
  ├── fcc:Compare
  └── fcc:Document
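One plausible way to encode these trees in SKOS is to treat each root as a `skos:ConceptScheme` and each leaf as a `skos:Concept` linked by `skos:inScheme`. That modeling choice is an assumption for this sketch (the chapter does not say whether FCC uses `skos:inScheme`, `skos:broader`, or both), and the helper names are hypothetical.

```python
TAXONOMY = {
    "fcc:PersonaCategory": [
        "fcc:Core", "fcc:Integration", "fcc:Governance", "fcc:Stakeholder",
        "fcc:Champion", "fcc:DataEngineering", "fcc:MLLifecycle",
        "fcc:MLModels", "fcc:DevOps", "fcc:AppDevelopment",
    ],
    "fcc:FCCPhase": ["fcc:Find", "fcc:Create", "fcc:Critique"],
    "fcc:ActionType": [
        "fcc:Scaffold", "fcc:Refactor", "fcc:Debug",
        "fcc:Test", "fcc:Compare", "fcc:Document",
    ],
}


def skos_triples(taxonomy):
    """Turn a {scheme: [concepts]} mapping into SKOS triples."""
    triples = []
    for scheme, concepts in taxonomy.items():
        triples.append((scheme, "rdf:type", "skos:ConceptScheme"))
        for concept in concepts:
            triples.append((concept, "rdf:type", "skos:Concept"))
            triples.append((concept, "skos:inScheme", scheme))
    return triples


def concepts_in(triples, scheme):
    """List the concepts that belong to a scheme, in insertion order."""
    return [s for s, p, o in triples if p == "skos:inScheme" and o == scheme]


triples = skos_triples(TAXONOMY)
print(concepts_in(triples, "fcc:FCCPhase"))
```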

Why Two Layers?

OWL is expressive but heavyweight. SKOS is lightweight but limited to taxonomies. The two-layer design lets FCC match the formalism to the task:

Use OWL when you need logical inference ("if an artifact was created by a persona governed by constitution C, and C prohibits personal data, then the artifact must not contain personal data").
Use SKOS when you need simple classification and navigation ("show me all personas in the DataEngineering category").

Most queries use SKOS. OWL is reserved for compliance reasoning and advanced provenance analysis.
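The inference example above can be approximated by a hand-written join over triples. This sketch is a stand-in for what an OWL reasoner would derive (an OWL 2 property chain could express the same pattern); it is not FCC's reasoning code. The predicates `fcc:prohibits` and `fcc:mustNotContain` and the individual `fcc:PersonalData` are hypothetical names introduced for the example.

```python
def infer_restrictions(triples):
    """Derive (artifact, mustNotContain, thing) facts.

    Rule: artifact createdBy persona, persona governedBy constitution,
    constitution prohibits thing  =>  artifact mustNotContain thing.
    """
    facts = set(triples)
    created_by = [(s, o) for s, p, o in facts if p == "fcc:createdBy"]
    governed_by = [(s, o) for s, p, o in facts if p == "fcc:governedBy"]
    prohibits = [(s, o) for s, p, o in facts if p == "fcc:prohibits"]

    inferred = set()
    for artifact, persona in created_by:
        for governed, constitution in governed_by:
            if governed != persona:
                continue
            for source, thing in prohibits:
                if source == constitution:
                    inferred.add((artifact, "fcc:mustNotContain", thing))
    return inferred


triples = [
    ("fcc:artifact_001", "fcc:createdBy", "fcc:research_analyst"),
    ("fcc:research_analyst", "fcc:governedBy", "fcc:constitution_core"),
    ("fcc:constitution_core", "fcc:prohibits", "fcc:PersonalData"),
    # This artifact's creator has no governing constitution in the data,
    # so nothing is inferred for it.
    ("fcc:artifact_002", "fcc:createdBy", "fcc:software_architect"),
]
print(infer_restrictions(triples))
```

The SKOS side needs no rule at all: a category lookup is a single filter on the category predicate, which is why classification queries stay cheap.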

Constructing the Knowledge Graph

The FCCOntology class builds the knowledge graph from FCC's runtime data:

from fcc.knowledge.ontology import FCCOntology
from fcc.personas.registry import PersonaRegistry

registry = PersonaRegistry()
registry.load_all()

ontology = FCCOntology()

# Add all personas
for persona in registry.all():
    ontology.add_persona(persona)

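# Add a simulation trace
# (simulation_result is assumed to come from an earlier simulation run)
ontology.add_trace(simulation_result.trace)

# Add a collaboration session
# (session is assumed to be an existing collaboration session object)
ontology.add_session(session)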

Each add_* method generates the appropriate triples. For a persona, this includes:

fcc:research_analyst rdf:type fcc:Persona .
fcc:research_analyst fcc:hasPhase fcc:Find .
fcc:research_analyst fcc:hasCategory fcc:Core .
fcc:research_analyst fcc:governedBy fcc:constitution_core .

For a trace step:

fcc:artifact_001 rdf:type fcc:Artifact .
fcc:artifact_001 fcc:createdBy fcc:software_architect .
fcc:artifact_001 fcc:producedIn fcc:session_42 .
fcc:artifact_001 fcc:satisfies fcc:test_coverage_minimum .
fcc:artifact_001 fcc:hasScore "0.95"^^xsd:decimal .
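To show how an `add_*` method might arrive at such triples, here is a minimal sketch that maps a persona record to the four statements listed above. The `Persona` dataclass and its field names are hypothetical stand-ins for FCC's real persona model, and `persona_triples` is not the library's method.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    # Hypothetical fields; the real FCC persona model may differ.
    id: str
    phase: str
    category: str
    constitution: str


def persona_triples(persona):
    """Build (subject, predicate, object) triples for one persona."""
    subject = f"fcc:{persona.id}"
    return [
        (subject, "rdf:type", "fcc:Persona"),
        (subject, "fcc:hasPhase", f"fcc:{persona.phase}"),
        (subject, "fcc:hasCategory", f"fcc:{persona.category}"),
        (subject, "fcc:governedBy", f"fcc:{persona.constitution}"),
    ]


def to_statement(triple):
    """Render a triple in the one-statement-per-line form used above."""
    return " ".join(triple) + " ."


analyst = Persona("research_analyst", "Find", "Core", "constitution_core")
statements = [to_statement(t) for t in persona_triples(analyst)]
print("\n".join(statements))
```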

Pure-Python Serialization

A key design decision is that FCC's knowledge graph uses pure-Python serializers -- no dependency on rdflib or other external RDF libraries (see ADR-003). This keeps the dependency footprint minimal and avoids the installation and version-compatibility issues that extra third-party packages can introduce.

The serializers support three formats:

Turtle (TTL)

Human-readable format, good for debugging and documentation:

ttl = ontology.serialize(format="turtle")
print(ttl)
# @prefix fcc: <https://fcc.example.org/ontology/> .
# fcc:research_analyst a fcc:Persona ;
#     fcc:hasPhase fcc:Find ;
#     fcc:hasCategory fcc:Core .
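A pure-Python Turtle writer can be quite small. The sketch below groups triples by subject, abbreviates `rdf:type` to `a`, and joins predicate-object pairs with `;`. It assumes every term is already a valid prefixed name or Turtle literal, and the function name `to_turtle` is hypothetical rather than FCC's serializer.

```python
def to_turtle(triples, prefixes):
    """Serialize (s, p, o) triples of prefixed names to Turtle text."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")

    # Group predicate-object pairs by subject, preserving insertion order.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    for subject, pairs in by_subject.items():
        parts = []
        for p, o in pairs:
            predicate = "a" if p == "rdf:type" else p  # Turtle shorthand
            parts.append(f"{predicate} {o}")
        lines.append(f"{subject} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines) + "\n"


prefixes = {"fcc": "https://fcc.example.org/ontology/"}
triples = [
    ("fcc:research_analyst", "rdf:type", "fcc:Persona"),
    ("fcc:research_analyst", "fcc:hasPhase", "fcc:Find"),
    ("fcc:research_analyst", "fcc:hasCategory", "fcc:Core"),
]
ttl = to_turtle(triples, prefixes)
print(ttl)
```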

N-Triples

Line-oriented format, good for streaming and batch processing:

nt = ontology.serialize(format="ntriples")
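N-Triples has no prefix mechanism, so a serializer has to expand every prefixed name to a full IRI and write one statement per line. The following is a minimal sketch under that reading; the `Literal` class, the helper names, and the prefix table are assumptions for the example, and only typed literals with basic escaping are handled.

```python
from dataclasses import dataclass

PREFIXES = {
    "fcc": "https://fcc.example.org/ontology/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str  # prefixed name such as "xsd:decimal"


def expand(name):
    """Expand a prefixed name like fcc:Artifact to <full IRI>."""
    prefix, local = name.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def term(t):
    """Render one term: a typed literal or an IRI."""
    if isinstance(t, Literal):
        escaped = (
            t.value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"^^{expand(t.datatype)}'
    return expand(t)


def to_ntriples(triples):
    """One statement per line, each terminated by ' .'."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


triples = [
    ("fcc:artifact_001", "rdf:type", "fcc:Artifact"),
    ("fcc:artifact_001", "fcc:hasScore", Literal("0.95", "xsd:decimal")),
]
nt_text = to_ntriples(triples)
print(nt_text)
```

Because each line is a complete statement, the output can be split, concatenated, or processed line by line without parsing state, which is what makes the format convenient for streaming.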

JSON-LD

JSON-compatible format, good for web APIs and interoperability:

jsonld = ontology.serialize(format="jsonld")
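A JSON-LD document can be produced with nothing more than the standard `json` module. This sketch groups triples into one node object per subject under `@graph` and declares the prefix in `@context`. It assumes every object is an IRI (literals are not handled), and `to_jsonld` is a hypothetical name, not FCC's serializer.

```python
import json


def to_jsonld(triples, prefixes):
    """Serialize (s, p, o) triples of prefixed names to a JSON-LD string."""
    nodes = {}
    for s, p, o in triples:
        node = nodes.setdefault(s, {"@id": s})
        if p == "rdf:type":
            node.setdefault("@type", []).append(o)
        else:
            # Assumption: every object is an IRI, so emit a node reference.
            node.setdefault(p, []).append({"@id": o})
    document = {"@context": dict(prefixes), "@graph": list(nodes.values())}
    return json.dumps(document, indent=2)


prefixes = {"fcc": "https://fcc.example.org/ontology/"}
triples = [
    ("fcc:research_analyst", "rdf:type", "fcc:Persona"),
    ("fcc:research_analyst", "fcc:hasPhase", "fcc:Find"),
    ("fcc:research_analyst", "fcc:hasCategory", "fcc:Core"),
]
jsonld_text = to_jsonld(triples, prefixes)
print(jsonld_text)
```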

Querying the Knowledge Graph

The ontology supports SPARQL-like queries through a Python API:

# Find all artifacts created by a specific persona
artifacts = ontology.query(
    subject_type="Artifact",
    predicate="createdBy",
    object_value="software_architect",
)

# Find the provenance chain for an artifact
chain = ontology.provenance_chain("artifact_001")
for step in chain:
    print(f"{step.artifact} <- {step.persona} in {step.session}")

# Find uncovered requirements (scenario is assumed to be a previously loaded scenario)
uncovered = ontology.uncovered_requirements(scenario)
for req in uncovered:
    print(f"Requirement {req.id} has no satisfying artifact")
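Under the hood, a SPARQL-like query of this kind reduces to triple-pattern matching. The sketch below shows one way a `query(subject_type, predicate, object_value)` call could be answered over an in-memory triple list; the `TripleStore` class and its `match` method are hypothetical, and the real FCCOntology may work differently.

```python
class TripleStore:
    """A tiny in-memory store with wildcard pattern matching."""

    def __init__(self, triples):
        self.triples = list(triples)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]

    def query(self, subject_type, predicate, object_value):
        """Subjects of the given type that have predicate -> object_value."""
        typed = {s for s, _, _ in self.match(p="rdf:type", o=f"fcc:{subject_type}")}
        hits = self.match(p=f"fcc:{predicate}", o=f"fcc:{object_value}")
        return [s for s, _, _ in hits if s in typed]


store = TripleStore([
    ("fcc:artifact_001", "rdf:type", "fcc:Artifact"),
    ("fcc:artifact_001", "fcc:createdBy", "fcc:software_architect"),
    ("fcc:artifact_002", "rdf:type", "fcc:Artifact"),
    ("fcc:artifact_002", "fcc:createdBy", "fcc:research_analyst"),
    ("fcc:software_architect", "rdf:type", "fcc:Persona"),
])
artifacts = store.query(
    subject_type="Artifact",
    predicate="createdBy",
    object_value="software_architect",
)
print(artifacts)
```

A linear scan is adequate for small graphs; a larger store would typically index triples by predicate or by subject to avoid rescanning the full list on every pattern.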

Graph Traversal

For complex queries, traverse the graph directly:

# Starting from a failed quality gate, find the responsible persona
gate_node = ontology.get_node("gate:test_coverage_minimum")
failing_artifacts = gate_node.incoming("satisfies", passed=False)
for artifact in failing_artifacts:
    creator = artifact.get("createdBy")
    print(f"Artifact {artifact.id} failed gate, created by {creator}")
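The traversal above relies on being able to walk edges backwards and filter on edge attributes such as `passed`. A minimal sketch of that idea, using adjacency lists indexed in both directions, follows. The `Graph` class and its method signatures are assumptions for illustration and differ from the node-object API shown above.

```python
from collections import defaultdict


class Graph:
    """Directed, labeled edges with optional attributes, indexed both ways."""

    def __init__(self):
        self._out = defaultdict(list)  # source -> [(label, target, attrs)]
        self._in = defaultdict(list)   # target -> [(label, source, attrs)]

    def add_edge(self, source, label, target, **attrs):
        self._out[source].append((label, target, attrs))
        self._in[target].append((label, source, attrs))

    def incoming(self, target, label, **filters):
        """Sources of edges into `target` whose attributes match `filters`."""
        return [
            source
            for edge_label, source, attrs in self._in[target]
            if edge_label == label
            and all(attrs.get(key) == value for key, value in filters.items())
        ]

    def get(self, source, label):
        """Target of the first outgoing edge with this label, or None."""
        for edge_label, target, _ in self._out[source]:
            if edge_label == label:
                return target
        return None


graph = Graph()
graph.add_edge("artifact_001", "createdBy", "software_architect")
graph.add_edge("artifact_001", "satisfies", "gate:test_coverage_minimum", passed=False)
graph.add_edge("artifact_002", "createdBy", "research_analyst")
graph.add_edge("artifact_002", "satisfies", "gate:test_coverage_minimum", passed=True)

failing = graph.incoming("gate:test_coverage_minimum", "satisfies", passed=False)
responsible = {artifact: graph.get(artifact, "createdBy") for artifact in failing}
print(responsible)
```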

Integration with CONSTEL

When the FCC knowledge graph is shared across the ecosystem, CONSTEL indexes the graph's metadata and makes it queryable by other projects. The integration point is the serialized output:

# Export the knowledge graph for CONSTEL
ontology.export("output/fcc_knowledge_graph.ttl", format="turtle")

# CONSTEL indexes the export
# Other projects can now query FCC's knowledge graph
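Since the integration point is just a serialized file, the export step itself can be sketched in a few lines. The helper below is a hypothetical stand-in for `ontology.export`: it takes already-serialized text, creates the output directory if needed, and writes UTF-8. How CONSTEL picks up the file is outside the scope of this sketch; a temporary directory is used so the example leaves nothing behind.

```python
import tempfile
from pathlib import Path


def export(serialized, path):
    """Write serialized graph text to `path`, creating parent directories."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(serialized, encoding="utf-8")
    return target


ttl = (
    "@prefix fcc: <https://fcc.example.org/ontology/> .\n"
    "\n"
    "fcc:research_analyst a fcc:Persona .\n"
)

with tempfile.TemporaryDirectory() as tmp:
    written = export(ttl, Path(tmp) / "output" / "fcc_knowledge_graph.ttl")
    file_name = written.name
    round_trip = written.read_text(encoding="utf-8")

print(file_name)
```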

This cross-project integration is covered in depth in Chapter 4 (Federated Knowledge).

Key Takeaways

Knowledge graphs represent entities and relationships, enabling queries that flat search cannot support.
FCC uses a two-layer ontology: OWL for full semantics, SKOS for lightweight taxonomy.
The FCCOntology class constructs triples from personas, traces, and sessions.
Pure-Python serializers (Turtle, N-Triples, JSON-LD) avoid external RDF library dependencies.
Graph queries support provenance analysis, coverage checking, and compliance reasoning.
CONSTEL integration enables cross-project knowledge sharing.

Cross-References

Chapter 3: RAG Pipelines -- knowledge graph as retrieval source
Chapter 4: Federated Knowledge -- cross-project knowledge graphs
ADR-003: Knowledge Graph Export -- design rationale
FCC Guidebook, Chapter 17 -- federation reference
See Notebook 15 for hands-on knowledge graph construction and querying
