Open Science Publication¶
Duration: 60 minutes | Difficulty: Intermediate | Pattern: Sequential Chain
This scenario demonstrates a FAIR data curation workflow that leads to open access publication, using open science personas to apply research data management best practices.
Scenario Overview¶
Problem: A research team has produced a dataset and analysis results that need to be curated for FAIR compliance, documented with proper citations, and published through an open access repository.
Goal: Execute a four-persona open science workflow that produces FAIR-compliant data packages, proper citations, and an open access publication plan.
Persona Team¶
| Persona | ID | Role | Category |
|---|---|---|---|
| FAIR Data Steward | FDS | Ensures FAIR (Findable, Accessible, Interoperable, Reusable) compliance | open_science |
| Research Software Narrator | RSN | Documents research software and computational methods | open_science |
| Citation Standards Liaison | CSL | Manages citation standards and data references | open_science |
| Open Access Advocate | OAA | Promotes open access and advises on licensing | open_science |
Setup¶
from fcc.personas.registry import PersonaRegistry
from fcc.simulation.engine import SimulationEngine
from fcc.simulation.messages import SimulationMessage
# Load the open science personas and run the engine deterministically.
registry = PersonaRegistry.from_yaml_directory("src/fcc/data/personas")
engine = SimulationEngine(registry=registry, mode="deterministic")
research_project = {
"title": "Semantic Analysis of Agent Collaboration Patterns",
"dataset": "agent_interaction_traces_2025.csv",
"software": "fcc-analysis-toolkit v1.2.0",
"authors": ["A. Researcher", "B. Scientist"],
"institution": "Information Collective, LLC",
}
Phase 1: FAIR Compliance Assessment¶
The FAIR Data Steward audits the research outputs for FAIR compliance:
fds_message = SimulationMessage(
sender="orchestrator",
receiver="FDS",
content=(
f"Assess FAIR compliance for the research project:\n"
f"Title: {research_project['title']}\n"
f"Dataset: {research_project['dataset']}\n"
f"Software: {research_project['software']}\n\n"
"Evaluate each FAIR principle:\n"
"- Findable: persistent identifiers, rich metadata, indexed\n"
        "- Accessible: retrievable by ID, open protocol, metadata persist even if data are removed\n"
"- Interoperable: formal language, FAIR vocabularies, references\n"
"- Reusable: rich description, clear license, provenance, standards\n\n"
"Produce a FAIR assessment report with scores (0-1) per principle "
"and specific remediation steps for any gaps."
),
phase="find",
)
fair_report = engine.step(fds_message)
print(f"FAIR Report: {len(fair_report.content)} chars")
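The steward's report comes back as free text; downstream phases are easier to automate if the per-principle scores live in a small structure. A minimal sketch of one way to hold them (the FAIRAssessment class and its field names are illustrative, not part of the fcc API):

```python
from dataclasses import dataclass, field

@dataclass
class FAIRAssessment:
    """Per-principle scores (0-1) plus remediation notes, as requested of FDS."""
    findable: float
    accessible: float
    interoperable: float
    reusable: float
    remediation: dict[str, list[str]] = field(default_factory=dict)

    def overall(self) -> float:
        # Unweighted mean of the four principle scores.
        return (self.findable + self.accessible + self.interoperable + self.reusable) / 4

    def gaps(self, threshold: float = 0.8) -> list[str]:
        # Principles scoring below the threshold need remediation before publication.
        scores = {
            "findable": self.findable,
            "accessible": self.accessible,
            "interoperable": self.interoperable,
            "reusable": self.reusable,
        }
        return [name for name, score in scores.items() if score < threshold]

# Example values only; real scores would be parsed from fair_report.content.
assessment = FAIRAssessment(
    findable=0.9, accessible=0.85, interoperable=0.6, reusable=0.7,
    remediation={"interoperable": ["Adopt a FAIR vocabulary for column names"]},
)
print(assessment.overall())  # 0.7625
print(assessment.gaps())     # ['interoperable', 'reusable']
```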
Phase 2: Research Software Documentation¶
The Research Software Narrator documents computational methods:
rsn_message = SimulationMessage(
sender="FDS",
receiver="RSN",
content=(
f"Document the research software and computational methods:\n\n"
f"Software: {research_project['software']}\n"
f"FAIR context:\n{fair_report.content[:400]}\n\n"
"Produce:\n"
"- Software metadata (CodeMeta/CITATION.cff format)\n"
"- Computational environment specification (dependencies, versions)\n"
"- Reproducibility guide (steps to recreate results)\n"
"- Data processing pipeline documentation\n"
"- Input/output format specifications\n"
"Follow FORCE11 Software Citation Principles."
),
phase="create",
)
software_docs = engine.step(rsn_message)
print(f"Software Documentation: {len(software_docs.content)} chars")
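Phase 2 asks for software metadata in CITATION.cff format. For reference, a minimal example of what RSN might produce for the toolkit, populated from the `research_project` fields above (the name split and license choice are illustrative assumptions):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "fcc-analysis-toolkit"
version: "1.2.0"
license: "MIT"  # assumed; matches the OAA prompt's software-license suggestion
authors:
  - family-names: "Researcher"
    given-names: "A."
  - family-names: "Scientist"
    given-names: "B."
```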
Phase 3: Citation Management¶
The Citation Standards Liaison establishes proper citations:
csl_message = SimulationMessage(
sender="RSN",
receiver="CSL",
content=(
f"Establish citation standards for the publication:\n\n"
f"Project: {research_project['title']}\n"
f"Authors: {', '.join(research_project['authors'])}\n"
f"Software docs:\n{software_docs.content[:400]}\n\n"
"Produce:\n"
"- Dataset citation (DataCite format)\n"
"- Software citation (CITATION.cff)\n"
"- BibTeX entries for all cited works\n"
"- ORCID integration for authors\n"
"- DOI reservation strategy\n"
"Ensure compliance with DataCite Metadata Schema 4.4 "
"and FORCE11 Data Citation Principles."
),
phase="create",
)
citation_package = engine.step(csl_message)
print(f"Citation Package: {len(citation_package.content)} chars")
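For reference, a dataset citation in the BibTeX form CSL is asked to produce might look like the following. The DOI is a placeholder, not a real identifier; a real one would come from the DOI reservation step in the prompt above:

```bibtex
@misc{researcher2025traces,
  author    = {Researcher, A. and Scientist, B.},
  title     = {Agent Interaction Traces 2025 [Data set]},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.XXXXXXX},
  note      = {Dataset supporting "Semantic Analysis of Agent Collaboration Patterns"}
}
```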
Phase 4: Open Access Publication Plan¶
The Open Access Advocate creates the publication strategy:
oaa_message = SimulationMessage(
sender="CSL",
receiver="OAA",
content=(
f"Create an open access publication plan:\n\n"
f"Project: {research_project['title']}\n"
f"Institution: {research_project['institution']}\n"
f"FAIR report:\n{fair_report.content[:300]}\n"
f"Citations:\n{citation_package.content[:300]}\n\n"
"Produce:\n"
"- Repository selection (Zenodo, Figshare, institutional)\n"
"- License recommendation (CC-BY, CC0, MIT for software)\n"
"- Embargo and pre-print strategy\n"
"- Metadata record template\n"
"- Dissemination plan (preprint servers, social media, conferences)\n"
"- Long-term preservation strategy\n"
"Align with Plan S and institutional OA policies."
),
phase="create",
)
publication_plan = engine.step(oaa_message)
print(f"Publication Plan: {len(publication_plan.content)} chars")
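The license recommendation in the plan can be made deterministic for common artifact types. A small lookup sketched along the lines OAA is prompted with (the mapping and defaults are assumptions, not fcc behaviour):

```python
# Common open-licensing defaults per artifact type, following the prompt's
# suggestions (CC-BY/CC0 for data and metadata, MIT for software).
LICENSE_DEFAULTS = {
    "dataset": "CC-BY-4.0",   # attribution required, maximal reuse
    "metadata": "CC0-1.0",    # waive rights so records can be indexed freely
    "software": "MIT",        # permissive, pairs with the CITATION.cff entry
    "paper": "CC-BY-4.0",     # Plan S-compliant open access default
}

def recommend_license(artifact_type: str) -> str:
    """Return a default open license for an artifact, falling back to CC-BY."""
    return LICENSE_DEFAULTS.get(artifact_type, "CC-BY-4.0")

print(recommend_license("software"))  # MIT
print(recommend_license("dataset"))   # CC-BY-4.0
```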
Research Data Management Plan¶
Compile all outputs into a data management plan:
from fcc.collaboration.scoring import ScoringEngine
scorer = ScoringEngine()
scores = {
"fair_compliance": scorer.score_text(fair_report.content),
"software_docs": scorer.score_text(software_docs.content),
"citations": scorer.score_text(citation_package.content),
"publication_plan": scorer.score_text(publication_plan.content),
}
overall = sum(scores.values()) / len(scores)
print("\nResearch Data Management Plan")
print("=" * 40)
print(f"Project: {research_project['title']}")
print("\nQuality Scores:")
for area, score in scores.items():
print(f" {area}: {score:.2f}")
print(f" Overall: {overall:.2f}")
import json
rdm_plan = {
"project": research_project,
"workflow": ["FDS", "RSN", "CSL", "OAA"],
"scores": scores,
"overall_score": overall,
"artifacts": {
"fair_report": len(fair_report.content),
"software_docs": len(software_docs.content),
"citation_package": len(citation_package.content),
"publication_plan": len(publication_plan.content),
},
}
print(f"\n{json.dumps(rdm_plan, indent=2)}")
Exercises¶
- Knowledge graph: Build a knowledge graph linking the research project, dataset, software, and publication repository as nodes.
- FAIR scoring: Implement a detailed FAIR scoring rubric with sub-scores for each of the 15 FAIR sub-principles.
- Multi-dataset: Extend the workflow to handle multiple datasets with different FAIR compliance levels.
- Community review: Add a feedback loop where the Open Access Advocate reviews the FAIR report and suggests improvements before publication.
Summary¶
In this scenario you executed an open science publication workflow:
- FDS assessed FAIR compliance across all four principles
- RSN documented research software with reproducibility guides
- CSL established citations in DataCite and CITATION.cff formats
- OAA created an open access publication plan with licensing and dissemination
- All outputs were compiled into a research data management plan
Next Steps¶
- Cross-Project Federation -- Share open science outputs across projects
- Docs-as-Code Pipeline -- Documentation automation