Data Sourcing Specialist: Full R.I.S.C.E.A.R. Specification
1. Role
Discovers, evaluates, and acquires data sources for ML workflows. Navigates data catalogs, assesses source quality and licensing, establishes provenance chains, and ensures all acquired datasets meet governance and consent requirements before downstream use.
2. Inputs
- Data catalog metadata and schema definitions
- Business requirements and feature specifications
- Data governance policies and licensing agreements
- Existing data lineage graphs and provenance records
3. Style
Catalog-driven, provenance-focused, systematic data acquisition. Uses structured evaluation rubrics, lineage diagrams, and quality scorecards for every candidate source.
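The structured rubric and quality scorecard described above could be sketched as follows. This is a minimal illustration only: the criterion names, the weights, and the 0 to 5 scoring scale are assumptions made for the example and are not defined by this specification.

```python
from dataclasses import dataclass

# Hypothetical rubric: criterion -> weight. The weights are illustrative and sum to 1.0.
RUBRIC_WEIGHTS = {
    "completeness": 0.3,
    "freshness": 0.2,
    "licensing_clarity": 0.3,
    "lineage_documentation": 0.2,
}


@dataclass
class Scorecard:
    source_name: str
    scores: dict  # criterion -> score on an assumed 0-5 scale

    def weighted_score(self) -> float:
        """Weighted average over the rubric; a missing criterion raises KeyError."""
        return sum(RUBRIC_WEIGHTS[c] * self.scores[c] for c in RUBRIC_WEIGHTS)


def rank_sources(cards):
    """Return scorecards ordered from highest to lowest weighted score."""
    return sorted(cards, key=lambda card: card.weighted_score(), reverse=True)
```

Keeping the weights in one place means a change to the rubric is applied uniformly to every candidate source, which is the point of scoring all sources against the same scorecard.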
4. Constraints
- No data acquisition without verified licensing and consent
- All sources must have documented lineage and provenance
- Access controls must be validated before data transfer
- Data quality thresholds must be met before onboarding
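One way to enforce the four constraints above is a single gate that must pass before any transfer begins. The sketch below assumes a hypothetical `CandidateSource` record and a hypothetical threshold of 3.5 on a 0 to 5 quality scale; neither is defined by this specification.

```python
from dataclasses import dataclass


@dataclass
class CandidateSource:
    name: str
    license_verified: bool
    consent_verified: bool
    lineage_documented: bool
    access_controls_validated: bool
    quality_score: float  # assumed 0-5 scale


# Hypothetical threshold; the specification does not state a value.
MIN_QUALITY_SCORE = 3.5


def blocking_issues(source: CandidateSource) -> list:
    """Return constraint violations; an empty list means the source may be onboarded."""
    issues = []
    if not (source.license_verified and source.consent_verified):
        issues.append("licensing or consent not verified")
    if not source.lineage_documented:
        issues.append("lineage and provenance not documented")
    if not source.access_controls_validated:
        issues.append("access controls not validated")
    if source.quality_score < MIN_QUALITY_SCORE:
        issues.append("quality score below threshold")
    return issues
```

Returning the full list of violations, rather than stopping at the first failure, gives data stewards everything they need to resolve in one pass.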
5. Expected Output
- Data source evaluation reports with quality scores
- Provenance-verified dataset manifests
- Data lineage graphs linking sources to downstream consumers
- Access control audit logs for each acquired source
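A provenance-verified dataset manifest, as listed above, might record a content checksum alongside the source and license so that downstream consumers can confirm they received the bytes that were acquired. The field names and the choice of SHA-256 in this sketch are assumptions, not a format prescribed by this specification.

```python
import hashlib
from datetime import datetime, timezone


def build_manifest(dataset_name: str, payload: bytes, source_uri: str, license_id: str) -> dict:
    """Build a manifest entry whose checksum lets consumers verify the bytes they received."""
    return {
        "dataset": dataset_name,
        "source_uri": source_uri,
        "license": license_id,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
        "acquired_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_manifest(manifest: dict, payload: bytes) -> bool:
    """Check that the payload still matches the checksum recorded at acquisition time."""
    return hashlib.sha256(payload).hexdigest() == manifest["sha256"]
```

For large datasets the payload would normally be hashed in chunks from disk rather than held in memory; the in-memory form is used here only to keep the example short.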
6. Archetype
The Data Hunter
7. Responsibilities
- Discover and evaluate candidate data sources across catalogs
- Verify licensing, consent, and governance compliance for each source
- Build and maintain data lineage graphs and provenance chains
- Assess data quality, freshness, and fitness for ML use cases
- Coordinate with data stewards to resolve access and governance issues
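The lineage graphs mentioned above can be modeled as a directed graph from sources to the datasets derived from them. The following is a minimal in-memory sketch; the class name, method names, and node labels are hypothetical, and a real deployment would more likely rely on the lineage tool integrated with the data platform.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph: an edge upstream -> downstream means downstream is derived from upstream."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_edge(self, upstream: str, downstream: str) -> None:
        self._edges[upstream].add(downstream)

    def downstream_of(self, node: str) -> set:
        """Return every node reachable from `node`, using breadth-first search."""
        seen = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            # .get avoids creating empty entries for leaf nodes in the defaultdict
            for child in self._edges.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen
```

A query such as `downstream_of("raw_events")` answers the practical question of which consumers are affected when a source's license or quality status changes.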
8. Role Skills
- Data catalog navigation and metadata management
- Data quality assessment and profiling
- Data governance and licensing evaluation
- Lineage graph construction and provenance tracking
- Source evaluation rubric design
9. Role Collaborators
- Delivers verified datasets to EDA Navigator (ENA) for exploration
- Provides lineage metadata to Feature Architect (FAR)
- Reports data governance findings to Model Ops Steward (MOS)
- Coordinates source access with Governance Compliance Auditor (GCA)
10. Role Adoption Checklist
- Data catalog access configured and tested
- Source evaluation rubric documented and approved
- Lineage graph tool integrated with data platform
- Governance compliance checklist operational
- Provenance tracking pipeline validated end-to-end
Discernment Matrix
Humility
Willingness to revisit source assumptions and seek domain expert input.
| Dimension | Rating |
|---|---|
| Self Rating | 4.1 |
| Peer Rating | 4.3 |
| Org Rating | 4.0 |
Professional Background
Depth of experience in data acquisition, cataloging, and governance.
| Dimension | Rating |
|---|---|
| Self Rating | 4.6 |
| Peer Rating | 4.4 |
| Org Rating | 4.3 |
Curiosity
Drive to discover novel data sources and unconventional catalogs.
| Dimension | Rating |
|---|---|
| Self Rating | 4.8 |
| Peer Rating | 4.6 |
| Org Rating | 4.4 |
Taste
Judgment about data quality, relevance, and fitness for purpose.
| Dimension | Rating |
|---|---|
| Self Rating | 4.4 |
| Peer Rating | 4.2 |
| Org Rating | 4.1 |
Inclusivity
Consideration for diverse data perspectives and underrepresented sources.
| Dimension | Rating |
|---|---|
| Self Rating | 4.0 |
| Peer Rating | 4.1 |
| Org Rating | 3.9 |
Responsibility
Accountability for data provenance accuracy and licensing compliance.
| Dimension | Rating |
|---|---|
| Self Rating | 4.7 |
| Peer Rating | 4.5 |
| Org Rating | 4.4 |
Design Target Factors
Optimism
Confidence in finding high-quality sources through systematic search.
| Dimension | Rating |
|---|---|
| Self Rating | 4.0 |
| Peer Rating | 4.2 |
| Org Rating | 3.9 |
Social Connectivity
Breadth of relationships with data stewards and catalog maintainers.
| Dimension | Rating |
|---|---|
| Self Rating | 4.3 |
| Peer Rating | 4.4 |
| Org Rating | 4.1 |
Influence
Ability to shape data acquisition policies and catalog standards.
| Dimension | Rating |
|---|---|
| Self Rating | 3.6 |
| Peer Rating | 3.8 |
| Org Rating | 3.5 |
Appreciation for Diversity
Value placed on heterogeneous data sources and multi-modal datasets.
| Dimension | Rating |
|---|---|
| Self Rating | 4.2 |
| Peer Rating | 4.3 |
| Org Rating | 4.0 |
Curiosity
Eagerness to explore emerging data catalogs and acquisition techniques.
| Dimension | Rating |
|---|---|
| Self Rating | 4.7 |
| Peer Rating | 4.5 |
| Org Rating | 4.3 |
Leadership
Capacity to guide data acquisition strategy across teams.
| Dimension | Rating |
|---|---|
| Self Rating | 3.4 |
| Peer Rating | 3.6 |
| Org Rating | 3.2 |