Guideline for Analyzing Competency Questions

This guideline outlines the process for working with Competency Questions.

1- Pre-processing Phase

In this phase, we establish the foundation for handling Competency Questions (CQs) by ensuring data privacy, clarity, and organization, so that the questions can be analyzed and used effectively.

1-1- Anonymizing and Preserving Privacy

To maintain data privacy and confidentiality, we anonymize personal identifiers of each stakeholder. This involves replacing actual names with labels such as 'User 1' or 'Stakeholder 1.' However, we will keep a separate record of the original identities for potential future reference.
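The replacement of names by labels could be sketched as follows. This is a minimal illustration, not a prescribed tool: the function name `pseudonymize`, the default prefix, and the example names are assumptions made for this sketch.

```python
def pseudonymize(names, prefix="Stakeholder"):
    """Map each distinct name to a label such as 'Stakeholder 1'.

    The returned mapping is the separate record of original identities
    and should be stored apart from the anonymized questions.
    """
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = f"{prefix} {len(mapping) + 1}"
    return mapping


# Hypothetical names, for illustration only.
submissions = ["Alice Example", "Bob Sample", "Alice Example"]
key_table = pseudonymize(submissions)
labels = [key_table[name] for name in submissions]
print(labels)
```

The same person always receives the same label, so several questions from one stakeholder stay linked after anonymization.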

1-2- Data Protection Regulations

Following data protection regulations, we include a statement in the header of each stakeholder's Competency Question pad. This informs participants that their information will be processed anonymously and emphasizes compliance with regulations such as GDPR. If necessary, we can provide a data privacy statement for participants to acknowledge and agree to.

1-3- Assigning Unique IDs

For effective tracking and management, each Competency Question is assigned a unique ID. These IDs, such as CQ1, CQ2, etc., facilitate organization and future reference. All the gathered CQs are combined into one file to make it easier to find and use them.
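One possible way to number the questions while merging them into a single collection is sketched below. The record layout (`id`, `stakeholder`, `text`) and the placeholder question texts are assumptions for illustration; the guideline only fixes the ID pattern CQ1, CQ2, and so on.

```python
def assign_ids(questions_by_stakeholder):
    """Combine all CQs into one list and number them CQ1, CQ2, ..."""
    combined = []
    for stakeholder, questions in questions_by_stakeholder.items():
        for text in questions:
            combined.append({
                "id": f"CQ{len(combined) + 1}",
                "stakeholder": stakeholder,
                "text": text,
            })
    return combined


# Placeholder questions, for illustration only.
collected = {
    "Stakeholder 1": ["Example question A?"],
    "Stakeholder 2": ["Example question B?", "Example question C?"],
}
cq_file = assign_ids(collected)
for cq in cq_file:
    print(cq["id"], cq["stakeholder"], cq["text"])
```

Once assigned, an ID should stay fixed even if a question is later reworded or dropped, so that earlier references remain valid.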

1-4- Masking Sensitive Data

Where applicable, we mask or alter specific sensitive details within CQs, such as numbers, dates, or locations, while ensuring that the main meaning and purpose of the questions remain intact.
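For numbers and dates, masking can be sketched with regular expressions. The patterns below are assumptions (ISO-style dates and plain digit runs) and the placeholder tokens `[DATE]` and `[NUMBER]` are illustrative; locations usually need a curated list or manual review and are not covered here.

```python
import re

# Assumed patterns: ISO-style dates (YYYY-MM-DD) and remaining numbers.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
NUMBER_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\b")


def mask_sensitive(text):
    """Replace dates first, then any remaining numbers, with placeholders."""
    text = DATE_PATTERN.sub("[DATE]", text)
    text = NUMBER_PATTERN.sub("[NUMBER]", text)
    return text


# Hypothetical question, for illustration only.
masked = mask_sensitive("How many of the 120 records were created after 2021-03-15?")
print(masked)
```

Dates are replaced before numbers so that a date is not broken into three separate number placeholders. The structure of the question survives, which keeps its meaning and purpose readable.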

1-5- Ensuring CQ Understandability

To ensure the questions are clear and understandable for our working group, we rephrase domain-specific terms and abbreviations and make necessary adjustments. This step enhances clarity and readability, aligning the questions with the group's expertise.

1-6- Quality Assurance of Initial CQs

Before proceeding, we review the collected Competency Questions to ensure they are correctly framed and aligned with the project's objectives.

1-7- Documenting Modifications

For transparency and accountability, we keep a detailed log of all modifications made during the anonymization and masking processes. This documentation acts as a point of reference and helps maintain the accuracy of the CQs.
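A modification log can be as simple as a list of before/after records. The sketch below assumes a hypothetical `Modification` record and `record` helper; neither is part of an existing tool.

```python
from dataclasses import dataclass


@dataclass
class Modification:
    cq_id: str
    step: str    # e.g. "anonymization" or "masking"
    before: str
    after: str


log = []


def record(cq_id, step, before, after):
    """Append a log entry only when the text actually changed."""
    if before != after:
        log.append(Modification(cq_id, step, before, after))
    return after


# Hypothetical entries, for illustration only.
record("CQ1", "masking", "Budget of 5000?", "Budget of [NUMBER]?")
record("CQ2", "masking", "Unchanged question?", "Unchanged question?")
print(len(log), log[0].cq_id)
```

Because the `before` field holds the unmasked text, such a log would need the same access restrictions as the record of original identities.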



2- Organizing and Prioritizing Phase

In this phase, we focus on organizing and prioritizing Competency Questions.

2-1- Checking for Overlap or Redundancy

We examine the collected CQs to identify any redundancy or overlap among them.
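A first pass over the collection can flag candidate duplicates automatically before a manual review. The sketch below uses character-level similarity from Python's standard `difflib` module; the threshold of 0.8 and the example questions are assumptions. Questions that are worded differently but mean the same thing will not be caught this way.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_overlaps(cqs, threshold=0.8):
    """Return pairs of CQ IDs whose texts are at least `threshold` similar."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(cqs.items(), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 2)))
    return pairs


# Hypothetical questions, for illustration only.
questions = {
    "CQ1": "Who is the author of a document?",
    "CQ2": "who is the author of a document?",
    "CQ3": "When was it published?",
}
overlaps = find_overlaps(questions)
print(overlaps)
```

Flagged pairs are only candidates; whether two questions are truly redundant remains a decision for the working group.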

2-2- Prioritizing Based on Relevance and Importance

We prioritize the CQs based on their relevance to the project's goals and their overall importance.
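If relevance and importance are rated numerically, the ranking can be computed as a weighted score. The 1 to 5 rating scale, the weights, and the field names below are assumptions for this sketch; the guideline does not prescribe a scoring scheme.

```python
def prioritize(cqs, relevance_weight=0.6, importance_weight=0.4):
    """Sort CQs by a weighted sum of their ratings, highest first."""
    def score(cq):
        return (relevance_weight * cq["relevance"]
                + importance_weight * cq["importance"])
    return sorted(cqs, key=score, reverse=True)


# Hypothetical ratings on an assumed 1-5 scale.
rated = [
    {"id": "CQ1", "relevance": 2, "importance": 5},
    {"id": "CQ2", "relevance": 5, "importance": 3},
    {"id": "CQ3", "relevance": 1, "importance": 1},
]
ranking = [cq["id"] for cq in prioritize(rated)]
print(ranking)
```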

2-3- Creating Consolidated Questions

We write new questions based on the original ones, splitting, unifying, or otherwise reworking them according to our overall understanding of the information needs.

2-4- Analyzing Gaps and Coverage

Our analysis will focus on whether the collected CQs cover a comprehensive range of topics or if there are significant gaps in coverage. If we choose an iterative process for gathering and analyzing CQs, addressing this step may not be straightforward, especially with respect to missing stakeholder responses (e.g. due to them not being available or not being approached so far).
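A simple coverage check can be sketched by comparing the collected CQs against a list of expected topics and a list of approached stakeholders. Both lists, the `topic` field, and the function name `coverage_report` are assumptions for illustration; in practice the topics would come from whatever categorization the group agrees on.

```python
from collections import Counter


def coverage_report(cqs, expected_topics, approached_stakeholders):
    """List topics without any CQ and stakeholders without any response."""
    topic_counts = Counter(cq["topic"] for cq in cqs)
    uncovered = [t for t in expected_topics if topic_counts[t] == 0]
    responded = {cq["stakeholder"] for cq in cqs}
    missing = sorted(set(approached_stakeholders) - responded)
    return {"uncovered_topics": uncovered, "missing_stakeholders": missing}


# Hypothetical data, for illustration only.
cqs = [
    {"id": "CQ1", "stakeholder": "Stakeholder 1", "topic": "Topic A"},
    {"id": "CQ2", "stakeholder": "Stakeholder 1", "topic": "Topic B"},
]
report = coverage_report(
    cqs,
    expected_topics=["Topic A", "Topic B", "Topic C"],
    approached_stakeholders=["Stakeholder 1", "Stakeholder 2"],
)
print(report)
```

In an iterative process, such a report can be regenerated after each round to see which gaps have closed and which stakeholders are still outstanding.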

2-5- Documenting the Process

We ensure that all processes conducted in this phase are documented.

2-6- Defining Required Categories

We decide on the necessary categories, considering options such as categories discussed during the second openDVA Congress or categories based on the GerPS ontology. Additionally, we explore grouping CQs based on their similarity and identifying common topics among stakeholders. If possible, subcategories will be defined within each category.



3- Reflection and Usage of CQs in KG

In this phase, we reflect on and use the Competency Questions in the context of the Knowledge Graph (KG), that is, we link the CQs with the Knowledge Graph.

3-1- Identifying Classes and Overall Hierarchy

From the CQs, identify relevant classes and an initial hierarchy.

3-2- Identifying Data Sources for Reuse within the KG

Identify terminologies, ontologies, and other data sources to be reused in the final graph. These are based on the important topics derived from the CQs. This step will likely lead us to revise some of the decisions made in 3-1.

3-3- Identifying Key Attributes and Properties

Our focus will be on identifying the attributes or properties within each class of the ontology/KG that play an important role in addressing the CQs.

3-4- Preparing for Integration into KG

We detail the process of integrating the identified attributes and properties of CQs into the Knowledge Graph, ensuring effective linkage and representation.



4- Ontology Development and Knowledge Graph Modeling Phase

In this phase, we use the insights gained from the Competency Questions to design and build a Knowledge Graph (KG) that effectively represents the domain's interests.

4-1- Defining Classes and Relationships

We identify the entities (classes) and relationships that will form the foundation of our Knowledge Graph. We map these elements to the categories and concepts derived from the Competency Questions. Wherever possible, existing vocabularies, terminologies, and ontologies will be reused.

4-2- Mapping Attributes and Properties

We map the attributes and properties identified from the Competency Questions to the corresponding entities in the KG. This ensures alignment between the collected information and the KG's structure.

4-3- Ontology Population

We populate our KG with real instances to enrich its content.

4-4- KG Validation and Testing

We validate and test the Knowledge Graph against a set of quality criteria to verify that it accurately reflects the insights obtained from the Competency Questions. Example answers to the CQs will be provided as A-Box instances. Accordingly, SPARQL queries will be created for each CQ to verify that the KG can answer the questions posed to it.
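One way to keep track of this is a registry with one SPARQL query per CQ, plus a check that no CQ is left without a query. The sketch below is an assumption-laden illustration: the `ex:` namespace, `ex:Document`, and `ex:hasAuthor` are hypothetical terms, and executing the queries against the actual KG is left to whichever SPARQL engine the project uses.

```python
# Hypothetical namespace and terms (ex:Document, ex:hasAuthor); replace
# them with the classes and properties of the actual ontology.
CQ_QUERIES = {
    "CQ1": """
        PREFIX ex: <http://example.org/kg#>
        SELECT ?author WHERE {
            ?doc a ex:Document ;
                 ex:hasAuthor ?author .
        }
    """,
}


def cqs_without_query(cq_ids, queries):
    """Return the IDs of CQs that have no non-empty SPARQL query yet."""
    return [cq_id for cq_id in cq_ids if not queries.get(cq_id, "").strip()]


untested = cqs_without_query(["CQ1", "CQ2"], CQ_QUERIES)
print(untested)
```

A query that returns the expected A-Box example answers indicates that the KG can answer the corresponding CQ; an empty result points to a modeling or population gap.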

4-5- Iterative Refinement

We consider an iterative approach to refine the KG based on feedback and further analysis.