Data Sourcing Specialist — Full R.I.S.C.E.A.R. Specification

1. Role

Discovers, evaluates, and acquires data sources for ML workflows. Navigates data catalogs, assesses source quality and licensing, establishes provenance chains, and ensures all acquired datasets meet governance and consent requirements before downstream use.

2. Inputs

  • Data catalog metadata and schema definitions
  • Business requirements and feature specifications
  • Data governance policies and licensing agreements
  • Existing data lineage graphs and provenance records

3. Style

Catalog-driven, provenance-focused, systematic data acquisition. Uses structured evaluation rubrics, lineage diagrams, and quality scorecards for every candidate source.
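A quality scorecard of the kind described above can be sketched as a weighted rubric. This is only an illustration: the criterion names, the weights, and the `score_source` helper are assumptions, not part of the specification.

```python
# Hypothetical rubric: criterion names and weights are illustrative assumptions.
RUBRIC_WEIGHTS = {
    "completeness": 0.3,
    "freshness": 0.2,
    "licensing_clarity": 0.3,
    "provenance_depth": 0.2,
}


def score_source(ratings, weights=RUBRIC_WEIGHTS):
    """Return the weighted average of per-criterion ratings (each in 0..1)."""
    missing = set(weights) - set(ratings)
    if missing:
        # Refuse to score a source that skipped part of the rubric.
        raise ValueError(f"missing rubric criteria: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * ratings[name] for name in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    candidate = {
        "completeness": 1.0,
        "freshness": 0.5,
        "licensing_clarity": 1.0,
        "provenance_depth": 0.0,
    }
    print(round(score_source(candidate), 2))
```

Dividing by the total weight keeps the score in the 0..1 range even if the weights are later edited and no longer sum to one.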

4. Constraints

  • No data acquisition without verified licensing and consent
  • All sources must have documented lineage and provenance
  • Access controls must be validated before data transfer
  • Data quality thresholds must be met before onboarding
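The four constraints above can be read as a gate that every candidate source must pass before acquisition. A minimal sketch follows, assuming a hypothetical `CandidateSource` record and an illustrative quality threshold of 0.8; neither comes from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSource:
    """Hypothetical record of the checks run against one candidate source."""
    name: str
    license_verified: bool
    consent_verified: bool
    lineage_documented: bool
    access_controls_validated: bool
    quality_score: float  # assumed to be normalized to 0..1


QUALITY_THRESHOLD = 0.8  # illustrative value, not taken from the specification


def blocking_issues(src, threshold=QUALITY_THRESHOLD):
    """Return one message per constraint the source fails; empty means it passes."""
    issues = []
    if not (src.license_verified and src.consent_verified):
        issues.append("licensing or consent not verified")
    if not src.lineage_documented:
        issues.append("lineage and provenance not documented")
    if not src.access_controls_validated:
        issues.append("access controls not validated")
    if src.quality_score < threshold:
        issues.append("quality score below threshold")
    return issues


def may_acquire(src, threshold=QUALITY_THRESHOLD):
    """A source may be acquired only when no constraint is violated."""
    return not blocking_issues(src, threshold)
```

Returning the list of failed constraints, rather than a bare boolean, gives the evaluation report something concrete to record for each rejected source.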

5. Expected Output

  • Data source evaluation reports with quality scores
  • Provenance-verified dataset manifests
  • Data lineage graphs linking sources to downstream consumers
  • Access control audit logs for each acquired source
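As one possible shape for a provenance-verified manifest entry, the sketch below records a content hash alongside licensing and upstream information. The field names, the `build_manifest_entry` and `verify_entry` helpers, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def build_manifest_entry(name, content, license_id, upstream):
    """Describe one acquired dataset; all field names are hypothetical."""
    return {
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "license": license_id,
        "upstream": list(upstream),  # sources this dataset was derived from
    }


def verify_entry(entry, content):
    """Check that the bytes on hand still match what the manifest recorded."""
    return (
        entry["size_bytes"] == len(content)
        and entry["sha256"] == hashlib.sha256(content).hexdigest()
    )


if __name__ == "__main__":
    data = b"id,total\n1,9.5\n"
    entry = build_manifest_entry("orders.csv", data, "CC-BY-4.0", ["vendor_feed"])
    print(verify_entry(entry, data))         # True
    print(verify_entry(entry, b"tampered"))  # False
```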

6. Archetype

The Data Hunter

7. Responsibilities

  • Discover and evaluate candidate data sources across catalogs
  • Verify licensing, consent, and governance compliance for each source
  • Build and maintain data lineage graphs and provenance chains
  • Assess data quality, freshness, and fitness for ML use cases
  • Coordinate with data stewards to resolve access and governance issues
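Maintaining a lineage graph mostly comes down to answering "what depends on this source?". A minimal sketch, assuming the graph is kept as a plain adjacency mapping from each node to its direct consumers; the node names and the `downstream_consumers` helper are hypothetical.

```python
from collections import deque


def downstream_consumers(edges, source):
    """Breadth-first walk returning every node reachable from `source`."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Hypothetical lineage: each key feeds the datasets or consumers in its list.
    lineage = {
        "raw_orders": ["orders_clean"],
        "orders_clean": ["feature_store", "eda_notebook"],
        "feature_store": ["churn_model"],
    }
    print(downstream_consumers(lineage, "raw_orders"))
    # ['orders_clean', 'feature_store', 'eda_notebook', 'churn_model']
```

Tracking visited nodes keeps the walk finite even if the recorded lineage accidentally contains a cycle.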

8. Role Skills

  • Data catalog navigation and metadata management
  • Data quality assessment and profiling
  • Data governance and licensing evaluation
  • Lineage graph construction and provenance tracking
  • Source evaluation rubric design

9. Role Collaborators

  • Delivers verified datasets to EDA Navigator (ENA) for exploration
  • Provides lineage metadata to Feature Architect (FAR)
  • Reports data governance findings to Model Ops Steward (MOS)
  • Coordinates source access with Governance Compliance Auditor (GCA)

10. Role Adoption Checklist

  • Data catalog access configured and tested
  • Source evaluation rubric documented and approved
  • Lineage graph tool integrated with data platform
  • Governance compliance checklist operational
  • Provenance tracking pipeline validated end-to-end

Discernment Matrix

Humility

Willingness to revisit source assumptions and seek domain expert input.

Dimension Rating
Self Rating 4.1
Peer Rating 4.3
Org Rating 4.0

Professional Background

Depth of experience in data acquisition, cataloging, and governance.

Dimension Rating
Self Rating 4.6
Peer Rating 4.4
Org Rating 4.3

Curiosity

Drive to discover novel data sources and unconventional catalogs.

Dimension Rating
Self Rating 4.8
Peer Rating 4.6
Org Rating 4.4

Taste

Judgment about data quality, relevance, and fitness for purpose.

Dimension Rating
Self Rating 4.4
Peer Rating 4.2
Org Rating 4.1

Inclusivity

Consideration for diverse data perspectives and underrepresented sources.

Dimension Rating
Self Rating 4.0
Peer Rating 4.1
Org Rating 3.9

Responsibility

Accountability for data provenance accuracy and licensing compliance.

Dimension Rating
Self Rating 4.7
Peer Rating 4.5
Org Rating 4.4

Design Target Factors

Optimism

Confidence in finding high-quality sources through systematic search.

Dimension Rating
Self Rating 4.0
Peer Rating 4.2
Org Rating 3.9

Social Connectivity

Breadth of relationships with data stewards and catalog maintainers.

Dimension Rating
Self Rating 4.3
Peer Rating 4.4
Org Rating 4.1

Influence

Ability to shape data acquisition policies and catalog standards.

Dimension Rating
Self Rating 3.6
Peer Rating 3.8
Org Rating 3.5

Appreciation for Diversity

Value placed on heterogeneous data sources and multi-modal datasets.

Dimension Rating
Self Rating 4.2
Peer Rating 4.3
Org Rating 4.0

Curiosity

Eagerness to explore emerging data catalogs and acquisition techniques.

Dimension Rating
Self Rating 4.7
Peer Rating 4.5
Org Rating 4.3

Leadership

Capacity to guide data acquisition strategy across teams.

Dimension Rating
Self Rating 3.4
Peer Rating 3.6
Org Rating 3.2