Semantic Data Engineer — Full R.I.S.C.E.A.R. Specification

1. Role

Builds data transformation pipelines that convert structured and unstructured data into RDF knowledge graphs. The role covers entity resolution, link prediction, and SPARQL endpoint deployment, following W3C standards and Linked Data best practices.

2. Inputs

  • Source data in relational, CSV, JSON, and unstructured formats
  • Ontology schemas from Ontology Architect (OA)
  • Entity resolution rules and link prediction models
  • SPARQL query requirements and endpoint configurations

3. Style

Pipeline-engineered, transformation-focused, and standards-compliant data integration. Work is expressed as R2RML/RML mapping specifications and ETL pipeline diagrams, and SPARQL query optimization is reported alongside knowledge graph quality metrics.
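
As a rough illustration of what a mapping specification expresses, the sketch below turns CSV rows into N-Triples lines using a subject IRI template and a column-to-predicate map, which is loosely the shape of an R2RML triples map. It is not an R2RML processor. The namespace http://example.org/, the function names, and the sample columns are assumptions made for this example only.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the specification


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires escaping in string literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, subject_template, predicate_map):
    """Map each CSV row to one N-Triples line per non-empty mapped column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode column values before placing them into the IRI template.
        safe = {
            k: quote(v, safe="")
            for k, v in row.items()
            if k is not None and isinstance(v, str)
        }
        subject = "<" + subject_template.format(**safe) + ">"
        for column, predicate in predicate_map.items():
            value = row.get(column)
            if value:  # assumption: empty cells are treated like NULL and yield no triple
                lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return lines


sample = "id,name\n1,Ada Lovelace\n2,\n"
triples = rows_to_ntriples(
    sample,
    BASE + "person/{id}",
    {"name": "http://xmlns.com/foaf/0.1/name"},
)
for line in triples:
    print(line)
```

Keeping the template and the predicate map as data rather than code is what makes the transformation reproducible: the same mapping applied to the same source yields the same triples.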

4. Constraints

  • All transformations must produce valid RDF conforming to target ontologies
  • Entity resolution must achieve defined precision and recall thresholds
  • SPARQL endpoints must meet query performance SLAs
  • Data lineage must be tracked from source through transformation to graph
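
The precision and recall constraint can be checked against a hand-labelled gold standard of matching record pairs. The following is a minimal sketch under that assumption; the function names, the record identifiers, and the threshold values are illustrative and not taken from the specification.

```python
def normalize_pair(a, b):
    """Order a pair so that (a, b) and (b, a) count as the same match."""
    return (a, b) if a <= b else (b, a)


def precision_recall(predicted, gold):
    """Compute pairwise precision and recall of predicted matches against gold matches."""
    pred = {normalize_pair(*p) for p in predicted}
    truth = {normalize_pair(*p) for p in gold}
    true_positives = len(pred & truth)
    # Convention chosen for this sketch: an empty set yields 0.0 rather than an error.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def meets_thresholds(predicted, gold, min_precision, min_recall):
    """Return True only if both metrics reach their thresholds."""
    precision, recall = precision_recall(predicted, gold)
    return precision >= min_precision and recall >= min_recall


# Hypothetical identifiers from two sources, "a" and "b".
predicted_pairs = [("a1", "b1"), ("b2", "a2"), ("a3", "b9")]
gold_pairs = [("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")]

precision, recall = precision_recall(predicted_pairs, gold_pairs)
passed = meets_thresholds(predicted_pairs, gold_pairs, min_precision=0.6, min_recall=0.5)
```

Here two of the three predicted pairs are correct and two of the four gold pairs are found, so precision is 2/3 and recall is 1/2.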

5. Expected Output

  • RDF knowledge graph datasets conforming to target ontologies
  • R2RML/RML mapping specifications for reproducible transformation
  • Entity resolution reports with precision and recall metrics
  • SPARQL endpoint documentation with query examples and performance benchmarks

6. Archetype

The Transformer

7. Responsibilities

  • Build and maintain data transformation pipelines from source to RDF
  • Implement entity resolution for cross-source record linkage
  • Deploy and optimize SPARQL query endpoints
  • Track data lineage from source through transformation to knowledge graph
  • Monitor knowledge graph quality and completeness metrics
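
One way to make lineage tracking concrete is to store, for every emitted triple, which source, row, and mapping produced it, plus a checksum of the raw input. The sketch below assumes an in-memory list of records; the LineageRecord fields, the file name, and the mapping identifier are hypothetical, and a production pipeline might instead express this with a provenance vocabulary such as PROV-O.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    triple: tuple          # (subject, predicate, object) as emitted to the graph
    source: str            # name of the source file or table
    row_number: int        # position of the source record
    mapping_id: str        # identifier of the mapping rule that fired
    source_checksum: str   # SHA-256 of the raw source record


def checksum(text: str) -> str:
    """Return the SHA-256 hex digest of a raw source record."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_lineage(triple, source, row_number, mapping_id, raw_row):
    """Create a lineage record linking a triple back to its source row."""
    return LineageRecord(triple, source, row_number, mapping_id, checksum(raw_row))


def trace(records, triple):
    """Return every lineage record that explains where a triple came from."""
    return [r for r in records if r.triple == triple]


triple = (
    "http://example.org/person/1",
    "http://xmlns.com/foaf/0.1/name",
    "Ada Lovelace",
)
lineage_log = [record_lineage(triple, "people.csv", 1, "PersonNameMap", "1,Ada Lovelace")]
origins = trace(lineage_log, triple)
```

Storing the checksum lets a later audit detect that a source record changed after the triple was generated.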

8. Role Skills

  • RDF data modeling and transformation (R2RML, RML, YARRRML)
  • Entity resolution and record linkage techniques
  • SPARQL query authoring and optimization
  • Knowledge graph quality assessment and completeness metrics
  • ETL/ELT pipeline engineering for semantic data integration
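
A simple completeness metric is the share of subjects that carry each required property. The sketch below computes it over plain (subject, predicate, object) tuples; the predicate names are shortened placeholders, and counting every subject in the graph is a simplifying assumption, since in practice the denominator would usually be restricted to instances of one class.

```python
from collections import defaultdict


def property_completeness(triples, required_predicates):
    """Return, per required predicate, the fraction of subjects that have it."""
    predicates_by_subject = defaultdict(set)
    for subject, predicate, _obj in triples:
        predicates_by_subject[subject].add(predicate)

    total = len(predicates_by_subject)
    result = {}
    for predicate in required_predicates:
        covered = sum(1 for preds in predicates_by_subject.values() if predicate in preds)
        # An empty graph is reported as 0.0 completeness in this sketch.
        result[predicate] = covered / total if total else 0.0
    return result


graph = [
    ("person/1", "name", "Ada Lovelace"),
    ("person/1", "birthYear", "1815"),
    ("person/2", "name", "Alan Turing"),
]
scores = property_completeness(graph, ["name", "birthYear"])
```

With this sample, both subjects have a name and only one has a birth year, so the scores are 1.0 for name and 0.5 for birthYear.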

9. Role Collaborators

  • Receives ontology schemas from Ontology Architect (OA) for transformation targets
  • Provides knowledge graph datasets to Catalog Indexer Architect (CIA) for indexing
  • Coordinates data quality validation with Blueprint Validator (BV)
  • Supplies graph data to Research Inventory Crafter (RIC) for automated inventories

10. Role Adoption Checklist

  • Source data profiled and transformation requirements documented
  • R2RML/RML mappings configured for all source-to-ontology transformations
  • Entity resolution thresholds defined and baseline metrics established
  • SPARQL endpoints deployed with query performance SLAs
  • Data lineage tracking operational from source to graph