Open Science Publication

Duration: 60 minutes
Difficulty: Intermediate
Pattern: Sequential Chain

This scenario demonstrates a FAIR data curation workflow that culminates in open access publication, using open science personas to apply research data management best practices at each step.

Scenario Overview

Problem: A research team has produced a dataset and analysis results that need to be curated for FAIR compliance, documented with proper citations, and published through an open access repository.

Goal: Execute a four-persona open science workflow that produces FAIR-compliant data packages, proper citations, and an open access publication plan.

Persona Team

Persona | ID | Role | Category
FAIR Data Steward | FDS | Ensures FAIR (Findable, Accessible, Interoperable, Reusable) compliance | open_science
Research Software Narrator | RSN | Documents research software and computational methods | open_science
Citation Standards Liaison | CSL | Manages citation standards and data references | open_science
Open Access Advocate | OAA | Promotes open access and advises on licensing | open_science

Setup

from fcc.personas.registry import PersonaRegistry
from fcc.simulation.engine import SimulationEngine
from fcc.simulation.messages import SimulationMessage
from fcc.messaging.bus import EventBus
from fcc.messaging.events import Event, EventType

registry = PersonaRegistry.from_yaml_directory("src/fcc/data/personas")
bus = EventBus()
engine = SimulationEngine(registry=registry, mode="deterministic")

research_project = {
    "title": "Semantic Analysis of Agent Collaboration Patterns",
    "dataset": "agent_interaction_traces_2025.csv",
    "software": "fcc-analysis-toolkit v1.2.0",
    "authors": ["A. Researcher", "B. Scientist"],
    "institution": "Information Collective, LLC",
}
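
A quick sanity check before running the workflow can catch incomplete metadata early; the required field list below is illustrative, not a FAIR requirement.

required_fields = ["title", "dataset", "software", "authors", "institution"]
missing = [field for field in required_fields if not research_project.get(field)]
if missing:
    raise ValueError(f"Research project metadata is incomplete: {missing}")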

Phase 1: FAIR Compliance Assessment

The FAIR Data Steward audits the research outputs for FAIR compliance:

fds_message = SimulationMessage(
    sender="orchestrator",
    receiver="FDS",
    content=(
        f"Assess FAIR compliance for the research project:\n"
        f"Title: {research_project['title']}\n"
        f"Dataset: {research_project['dataset']}\n"
        f"Software: {research_project['software']}\n\n"
        "Evaluate each FAIR principle:\n"
        "- Findable: persistent identifiers, rich metadata, indexed\n"
        "- Accessible: retrievable by ID, open protocol, metadata persist\n"
        "- Interoperable: formal language, FAIR vocabularies, references\n"
        "- Reusable: rich description, clear license, provenance, standards\n\n"
        "Produce a FAIR assessment report with scores (0-1) per principle "
        "and specific remediation steps for any gaps."
    ),
    phase="find",
)

fair_report = engine.step(fds_message)
print(f"FAIR Report: {len(fair_report.content)} chars")

Phase 2: Research Software Documentation

The Research Software Narrator documents computational methods:

rsn_message = SimulationMessage(
    sender="FDS",
    receiver="RSN",
    content=(
        f"Document the research software and computational methods:\n\n"
        f"Software: {research_project['software']}\n"
        f"FAIR context:\n{fair_report.content[:400]}\n\n"
        "Produce:\n"
        "- Software metadata (CodeMeta/CITATION.cff format)\n"
        "- Computational environment specification (dependencies, versions)\n"
        "- Reproducibility guide (steps to recreate results)\n"
        "- Data processing pipeline documentation\n"
        "- Input/output format specifications\n"
        "Follow FORCE11 Software Citation Principles."
    ),
    phase="create",
)

software_docs = engine.step(rsn_message)
print(f"Software Documentation: {len(software_docs.content)} chars")

Phase 3: Citation Management

The Citation Standards Liaison establishes proper citations:

csl_message = SimulationMessage(
    sender="RSN",
    receiver="CSL",
    content=(
        f"Establish citation standards for the publication:\n\n"
        f"Project: {research_project['title']}\n"
        f"Authors: {', '.join(research_project['authors'])}\n"
        f"Software docs:\n{software_docs.content[:400]}\n\n"
        "Produce:\n"
        "- Dataset citation (DataCite format)\n"
        "- Software citation (CITATION.cff)\n"
        "- BibTeX entries for all cited works\n"
        "- ORCID integration for authors\n"
        "- DOI reservation strategy\n"
        "Ensure compliance with DataCite Metadata Schema 4.4 "
        "and FORCE11 Data Citation Principles."
    ),
    phase="create",
)

citation_package = engine.step(csl_message)
print(f"Citation Package: {len(citation_package.content)} chars")

Phase 4: Open Access Publication Plan

The Open Access Advocate creates the publication strategy:

oaa_message = SimulationMessage(
    sender="CSL",
    receiver="OAA",
    content=(
        f"Create an open access publication plan:\n\n"
        f"Project: {research_project['title']}\n"
        f"Institution: {research_project['institution']}\n"
        f"FAIR report:\n{fair_report.content[:300]}\n"
        f"Citations:\n{citation_package.content[:300]}\n\n"
        "Produce:\n"
        "- Repository selection (Zenodo, Figshare, institutional)\n"
        "- License recommendation (CC-BY, CC0, MIT for software)\n"
        "- Embargo and pre-print strategy\n"
        "- Metadata record template\n"
        "- Dissemination plan (preprint servers, social media, conferences)\n"
        "- Long-term preservation strategy\n"
        "Align with Plan S and institutional OA policies."
    ),
    phase="create",
)

publication_plan = engine.step(oaa_message)
print(f"Publication Plan: {len(publication_plan.content)} chars")

Research Data Management Plan

Compile all outputs into a data management plan:

from fcc.collaboration.scoring import ScoringEngine

scorer = ScoringEngine()

scores = {
    "fair_compliance": scorer.score_text(fair_report.content),
    "software_docs": scorer.score_text(software_docs.content),
    "citations": scorer.score_text(citation_package.content),
    "publication_plan": scorer.score_text(publication_plan.content),
}

overall = sum(scores.values()) / len(scores)

print("\nResearch Data Management Plan")
print("=" * 40)
print(f"Project: {research_project['title']}")
print(f"\nQuality Scores:")
for area, score in scores.items():
    print(f"  {area}: {score:.2f}")
print(f"  Overall: {overall:.2f}")

import json
rdm_plan = {
    "project": research_project,
    "workflow": ["FDS", "RSN", "CSL", "OAA"],
    "scores": scores,
    "overall_score": overall,
    "artifacts": {
        "fair_report": len(fair_report.content),
        "software_docs": len(software_docs.content),
        "citation_package": len(citation_package.content),
        "publication_plan": len(publication_plan.content),
    },
}
print(f"\n{json.dumps(rdm_plan, indent=2)}")

Exercises

  1. Knowledge graph: Build a knowledge graph linking the research project, dataset, software, and publication repository as nodes.
  2. FAIR scoring: Implement a detailed FAIR scoring rubric with sub-scores for each of the 15 FAIR sub-principles.
  3. Multi-dataset: Extend the workflow to handle multiple datasets with different FAIR compliance levels.
  4. Community review: Add a feedback loop where the Open Access Advocate reviews the FAIR report and suggests improvements before publication.

Summary

In this scenario you executed an open science publication workflow:

  • FDS assessed FAIR compliance across all four principles
  • RSN documented research software with reproducibility guides
  • CSL established citations in DataCite and CITATION.cff formats
  • OAA created an open access publication plan with licensing and dissemination
  • All outputs were compiled into a research data management plan

Next Steps