ADR-003: Knowledge Graph Export

Date: 2026-03-29 Status: Accepted

Context

FCC workflows produce structured artifacts with rich metadata: which persona created the artifact, in which session, at which workflow node, with which quality gate results. To enable provenance queries, compliance auditing, and cross-project knowledge sharing, we need a knowledge representation that captures entities and their relationships as a queryable graph.

We evaluated four approaches:

  1. rdflib (Python RDF library). A full-featured RDF library that supports OWL, RDFS, SPARQL, and multiple serialization formats.
  2. NetworkX. A general-purpose graph library. Not RDF-specific but widely used in Python.
  3. Neo4j / property graphs. A graph database with its own query language (Cypher).
  4. Pure-Python serializers. Custom, lightweight serializers that output standard RDF formats (Turtle, N-Triples, JSON-LD) without depending on rdflib.

Key requirements:

  • Must represent the FCC domain model faithfully (personas, workflows, artifacts, sessions, quality gates).
  • Must support both full ontology (OWL) and lightweight taxonomy (SKOS) representations.
  • Must serialize to standard formats for interoperability with other tools and projects.
  • Must not add heavy dependencies (rdflib requires C extensions on some platforms).
  • Must be testable with the existing mock infrastructure.

Decision

We will use OWL for the full ontology, SKOS for the lightweight taxonomy, and pure-Python serializers for knowledge graph export.

The implementation consists of:

  1. FCCOntology -- a Python class that builds a knowledge graph from FCC runtime data (personas, traces, sessions). Internally, it represents triples as a list of (subject, predicate, object) tuples.
  2. Two-layer design:
     • OWL layer: defines the class hierarchy (Persona, Workflow, Artifact, Session, QualityGate, Constitution), object properties (createdBy, reviewedBy, satisfies, governedBy), and data properties (hasPhase, hasCategory, hasScore, hasTimestamp).
     • SKOS layer: defines controlled vocabularies for persona categories, FCC phases, action types, tag hierarchies, and archetype vocabularies.
  3. Pure-Python serializers: custom serializers for the Turtle (.ttl), N-Triples (.nt), and JSON-LD (.jsonld) formats. These implement the subset of the RDF serialization specifications needed for FCC's ontology, without depending on rdflib.
  4. Query API: a Python interface that supports subject/predicate/object pattern matching, provenance chain traversal, and coverage analysis.
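The triple-list representation and pattern-matching query can be sketched as follows. This is a minimal illustration, not the actual FCC interface: the method names, the namespace IRI, and the example identifiers are all assumptions.

```python
FCC = "https://fcc.example.org/ontology#"  # illustrative namespace, not the real IRI


class FCCOntology:
    """Sketch: triples held as a plain list of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = []

    def add(self, s, p, o):
        self.triples.append((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


onto = FCCOntology()
onto.add(FCC + "artifact-42", FCC + "createdBy", FCC + "persona-architect")
onto.add(FCC + "artifact-42", FCC + "governedBy", FCC + "constitution-1")

# All facts about artifact-42:
assert len(onto.query(s=FCC + "artifact-42")) == 2
# Who created it:
assert onto.query(p=FCC + "createdBy")[0][2] == FCC + "persona-architect"
```

Because every query is a linear scan with wildcards, the same primitive composes into provenance traversal and coverage analysis without a dedicated query engine.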

Consequences

Positive

  • Standard formats. Turtle, N-Triples, and JSON-LD are W3C-standard formats that any RDF tool can import. The knowledge graph is interoperable with Protege, Apache Jena, Stardog, and other RDF ecosystems.
  • Two-layer flexibility. OWL handles complex reasoning (compliance, provenance inference). SKOS handles lightweight classification (persona browsing, tag navigation). Users choose the layer appropriate for their query.
  • Minimal dependencies. Pure-Python serializers avoid rdflib's C extension compilation issues. The knowledge graph module has zero additional dependencies beyond Python's standard library.
  • Testability. The mock infrastructure can generate test triples without external services. All serialization formats can be round-tripped (serialize then parse) for validation.
  • Federation-ready. Standard RDF formats with namespaced IRIs are the natural foundation for the federated knowledge graphs in ADR-005.
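The round-trip property mentioned above can be illustrated with N-Triples, the simplest of the three formats. This sketch assumes all terms are IRIs (literals and blank nodes are outside the supported subset); the function names are hypothetical.

```python
def to_ntriples(triples):
    """Serialize (s, p, o) IRI triples as N-Triples lines: '<s> <p> <o> .'"""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples) + "\n"


def from_ntriples(text):
    """Parse IRI-only N-Triples back into (s, p, o) tuples."""
    triples = []
    for line in text.strip().splitlines():
        s, p, o = line.rstrip(" .").split(" ", 2)
        triples.append(tuple(term.strip("<>") for term in (s, p, o)))
    return triples


facts = [("https://fcc.example.org/artifact-1",
          "https://fcc.example.org/createdBy",
          "https://fcc.example.org/persona-architect")]

# Round-trip validation: serialize then parse must reproduce the input.
assert from_ntriples(to_ntriples(facts)) == facts
```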

Negative

  • Limited serializer coverage. The pure-Python serializers handle the subset of RDF needed for FCC, not the full specification. RDF constructs outside that subset (blank nodes, reification, named graphs) are not supported.
  • No SPARQL engine. The query API uses pattern matching, not SPARQL. Users who need full SPARQL must export the graph and load it into an external triple store.
  • Maintenance burden. Maintaining custom serializers means fixing format compliance issues ourselves rather than relying on rdflib's well-tested implementations.

Mitigations

  • The limited serializer coverage is documented, and users who need full RDF support can export to Turtle and load into rdflib or an external triple store.
  • The query API covers the most common use cases (provenance chain, coverage analysis, compliance checking). SPARQL is available via export.
  • Serializer compliance is verified by round-trip tests and validation against W3C test suites for the supported subset.
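Provenance chain traversal, one of the common use cases the query API covers, reduces to following a predicate link until it runs out. A hedged sketch over plain (s, p, o) tuples; the predicate name and identifiers are assumptions, not the actual FCC property IRIs.

```python
def provenance_chain(triples, start, predicate="derivedFrom"):
    """Follow `predicate` links from `start` until the chain ends."""
    index = {(s, p): o for s, p, o in triples}  # assumes one link per subject
    chain = [start]
    while (chain[-1], predicate) in index:
        chain.append(index[(chain[-1], predicate)])
    return chain


facts = [
    ("artifact-3", "derivedFrom", "artifact-2"),
    ("artifact-2", "derivedFrom", "artifact-1"),
    ("artifact-1", "createdBy", "persona-architect"),
]

assert provenance_chain(facts, "artifact-3") == [
    "artifact-3", "artifact-2", "artifact-1",
]
```

Anything beyond this, such as joins or optional patterns, is exactly where exporting to Turtle and loading an external SPARQL-capable triple store takes over.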