
Affiliations

Joana Maia edited this page Jun 5, 2025 · 2 revisions


Principles behind affiliation

Affiliation links individuals to the organizations they represent while contributing to a project. It shows that their activities were carried out as part of their role in the organization, rather than as an independent contributor.

Affiliations are key to the Linux Foundation:

  • Trust: They help build trust among peers.
  • Security: They are essential for preventing supply chain attacks.
  • Company engagement: They help track how a company engages in open source.
  • Expertise and credibility: They show the individual's professional background and commitment to quality.

Affiliation goal

The Community Data Platform (CDP) is a critical data source in LFX. It gathers information from multiple sources and enhances it through deduplication and enrichment, providing deep insights into open source communities and contributions.

Activities are the starting point of the CDP system. From activities, we can identify:

  • The people involved.
  • The organizations they are associated with.

People, activities, and organizations are all vital data points, offering visibility into who contributes, when, and where.

CDP Affiliations aim to ensure that contributions and activities are always linked to a relevant organization, based on work history or specific projects.

Affiliation concepts

Work history / Organizations

CM can link organizations to individuals based on their work history. Work history entries:

  • Are mainly sourced via enrichment (from third-party data vendors) but can also be added manually.
  • Include an organization, a job title, and a time period.

The time period is especially important as it helps automatically match contributions to the correct organization based on the date.
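To illustrate date-based matching, here is a minimal sketch. The `WorkExperience` shape and the `experiencesActiveAt` function are hypothetical, and treating an unknown start date as open-ended is an assumption rather than documented CDP behavior.

```typescript
// Hypothetical work-history entry; field names are illustrative, not the CDP schema.
interface WorkExperience {
  organizationId: string;
  title: string | null;
  dateStart: Date | null; // null: start date unknown
  dateEnd: Date | null; // null: ongoing (current role)
}

// Returns the work experiences whose period contains the activity timestamp.
// Assumption: an unknown start counts as open-ended in the past, and a
// missing end date counts as "still ongoing".
function experiencesActiveAt(
  history: WorkExperience[],
  activityDate: Date,
): WorkExperience[] {
  const t = activityDate.getTime();
  return history.filter((exp) => {
    const startedBefore = exp.dateStart === null || exp.dateStart.getTime() <= t;
    const endedAfter = exp.dateEnd === null || t <= exp.dateEnd.getTime();
    return startedBefore && endedAfter;
  });
}

// Usage: an activity from March 2020 falls inside the first role only.
const workHistory: WorkExperience[] = [
  { organizationId: "org-a", title: "Engineer", dateStart: new Date("2018-01-01"), dateEnd: new Date("2021-06-30") },
  { organizationId: "org-b", title: "Senior Engineer", dateStart: new Date("2021-07-01"), dateEnd: null },
];
const matches = experiencesActiveAt(workHistory, new Date("2020-03-15"));
// matches contains only the org-a experience
```

When more than one experience matches the same date, the primary-organization and conflicting-period rules described below decide which one is used.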

Contributions during a specific time can also be marked as Individual contributions (not affiliated with an organization).

Project Affiliations

By default, contributions are linked to organizations based on work history and time period.

However, a person's contributions to a specific project can also be manually linked to an organization (a project affiliation). Once set, these manual affiliations are not overridden by updates to work history.

Contributions during a specific period can also be marked as Individual contributions (not affiliated with an organization). This is useful if a person contributes in their free time and not on behalf of their employer.
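The precedence rule can be sketched as follows, assuming the manual project affiliation is looked up before work history every time an activity's organization is resolved. `ProjectAffiliation`, `resolveWithManualPrecedence`, and the `INDIVIDUAL_ORG_ID` value standing in for the "Individual - No Account" organization are hypothetical names, not the CDP schema.

```typescript
// Hypothetical id standing in for the "Individual - No Account" organization.
const INDIVIDUAL_ORG_ID = "individual-no-account";

// Hypothetical manual project affiliation: within this project and period,
// the member's activities belong to organizationId.
interface ProjectAffiliation {
  projectId: string;
  organizationId: string;
  dateStart: Date | null; // null: no lower bound
  dateEnd: Date | null; // null: no upper bound
}

interface Activity {
  id: string;
  projectId: string;
  timestamp: Date;
}

function isWithin(start: Date | null, end: Date | null, at: Date): boolean {
  const t = at.getTime();
  return (start === null || start.getTime() <= t) && (end === null || t <= end.getTime());
}

// Manual project affiliations are consulted first; the work-history result is
// only a fallback, so changing work history cannot override a manual choice.
function resolveWithManualPrecedence(
  activity: Activity,
  manual: ProjectAffiliation[],
  workHistoryOrgId: string | null,
): string | null {
  const hit = manual.find(
    (a) => a.projectId === activity.projectId && isWithin(a.dateStart, a.dateEnd, activity.timestamp),
  );
  return hit ? hit.organizationId : workHistoryOrgId;
}

// Usage: contributions to proj-1 during 2023 are marked as individual.
const manualAffiliations: ProjectAffiliation[] = [
  { projectId: "proj-1", organizationId: INDIVIDUAL_ORG_ID, dateStart: new Date("2023-01-01"), dateEnd: new Date("2023-12-31") },
];
const sampleActivity: Activity = { id: "act-1", projectId: "proj-1", timestamp: new Date("2023-05-01") };
const withOrgA = resolveWithManualPrecedence(sampleActivity, manualAffiliations, "org-a");
const withOrgB = resolveWithManualPrecedence(sampleActivity, manualAffiliations, "org-b");
// Both are INDIVIDUAL_ORG_ID: the work-history organization is ignored here.
```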

Primary/Affiliated organizations

Each profile can have multiple work experiences reflecting the individual's employment history. CM uses the employment period to determine the active or current employer.

A primary organization represents:

  • The organization where the person worked.
  • The organization on whose behalf they contributed.

One concern is that an algorithm that determines the primary organization purely from dates isn't always reliable. For example, some profiles include volunteer roles or other non-professional affiliations.

CDP should have a well-defined algorithm that automatically determines which organization is the primary one and which organizations should be considered for affiliations. It should also always be possible to update affiliations and work history manually, so that data can be corrected when needed.

"Individual - No Account" organization

CDP uses a placeholder organization, "Individual - No Account", to represent unaffiliated contributions.

  • This allows contributions made outside any organization to still be tracked.
  • The correct time period can be set manually for these contributions, whether linked to work experiences or projects.

Affiliation process

Automatic affiliation - New algorithm

Scenarios where affiliations need to be updated

  1. Merge profiles
  2. Unmerge profiles
  3. Update work history
  4. Update project affiliations

Categorizing organizations for affiliations

Each organization in a member's work history is evaluated against two criteria:

  • It is eligible for affiliation
    • Goal: Exclude specific work history organizations from being considered for affiliations
    • Possible values:
      • Yes: it will always be considered in the affiliations algorithm
      • No: it will never be considered in the affiliations algorithm
  • It is a primary organization
    • Goal: Prioritize specific work history organizations for affiliations, and identify the person's present or past employers.
    • Possible values:
      • Yes: it will always take priority in the affiliations algorithm
      • No: it won't take priority in the affiliations algorithm
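A minimal sketch of how the two criteria might shape the list of candidate organizations, assuming they are stored as booleans on each work-history entry (`MemberOrganization`, `eligibleForAffiliation`, `isPrimary`, and `affiliationCandidates` are hypothetical names):

```typescript
// Hypothetical flags on a member's work-history organization.
interface MemberOrganization {
  organizationId: string;
  eligibleForAffiliation: boolean; // false: never considered by the algorithm
  isPrimary: boolean; // true: takes priority over non-primary organizations
}

// Drops ineligible organizations, then moves primary ones to the front.
// Array.prototype.sort is stable (ES2019+), so the original order is kept
// within the primary group and within the non-primary group.
function affiliationCandidates(orgs: MemberOrganization[]): MemberOrganization[] {
  return orgs
    .filter((o) => o.eligibleForAffiliation)
    .sort((a, b) => Number(b.isPrimary) - Number(a.isPrimary));
}

// Usage: a volunteering role is excluded and the primary employer comes first.
const candidates = affiliationCandidates([
  { organizationId: "charity", eligibleForAffiliation: false, isPrimary: false },
  { organizationId: "org-x", eligibleForAffiliation: true, isPrimary: false },
  { organizationId: "org-y", eligibleForAffiliation: true, isPrimary: true },
]);
// candidates: org-y, then org-x
```

In this sketch eligibility is checked first, so an organization marked as primary but not eligible is still excluded.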

Conflicting periods

When it is ambiguous which organization should be marked as primary (for example, because periods overlap or are unknown), apply the following criteria in order:

  1. Open Source Activity: Prioritize the organization with more active contributors, reflecting deeper involvement in the open source community.
  2. Period Duration (if period is defined): Select the organization associated with the longer time period.
  3. Technical Relevance (if role is defined): Favor the organization where the role is directly related to technical contributions or software development. Use a blacklist of non-technical roles as needed for comparison.
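The criteria above can be expressed as a comparator applied in order. This is a sketch under several assumptions: `activeContributors` is taken to be a per-organization count that is already available, the keyword blocklist is invented for illustration, and criteria 2 and 3 are compared only when both candidates have the relevant data. All names are hypothetical.

```typescript
// Hypothetical candidate for the primary organization.
interface PrimaryCandidate {
  organizationId: string;
  activeContributors: number; // assumed per-organization count
  dateStart: Date | null; // null: unknown
  dateEnd: Date | null; // null: ongoing
  title: string | null; // null: role not defined
}

// Hypothetical blocklist of non-technical role keywords.
const NON_TECHNICAL_ROLE_KEYWORDS = ["board member", "advisor", "volunteer", "sales", "marketing"];

// Period length in milliseconds, or null when the start is unknown.
function periodLength(c: PrimaryCandidate, now: Date): number | null {
  if (c.dateStart === null) return null;
  return (c.dateEnd ?? now).getTime() - c.dateStart.getTime();
}

// true/false for a known title, null when the role is not defined.
function isTechnicalRole(title: string | null): boolean | null {
  if (title === null) return null;
  const lower = title.toLowerCase();
  return !NON_TECHNICAL_ROLE_KEYWORDS.some((k) => lower.includes(k));
}

// Negative result: a is preferred over b.
function compareCandidates(a: PrimaryCandidate, b: PrimaryCandidate, now: Date): number {
  // 1. Open source activity: more active contributors wins.
  if (a.activeContributors !== b.activeContributors) {
    return b.activeContributors - a.activeContributors;
  }
  // 2. Period duration, only if both periods are defined: longer wins.
  const pa = periodLength(a, now);
  const pb = periodLength(b, now);
  if (pa !== null && pb !== null && pa !== pb) return pb - pa;
  // 3. Technical relevance, only if both roles are defined: technical wins.
  const ta = isTechnicalRole(a.title);
  const tb = isTechnicalRole(b.title);
  if (ta !== null && tb !== null && ta !== tb) return ta ? -1 : 1;
  return 0; // still tied
}

// Keeps the current best unless the next candidate is strictly preferred.
function pickPrimary(cands: PrimaryCandidate[], now: Date): PrimaryCandidate | undefined {
  return cands.reduce<PrimaryCandidate | undefined>(
    (best, c) => (best === undefined || compareCandidates(c, best, now) < 0 ? c : best),
    undefined,
  );
}
```

A single `reduce` pass is used instead of sorting because the "only if defined" checks can make the comparator non-transitive, which `sort` does not handle predictably.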

New algorithm to validate and update affiliations

For each activity, determine the correct organization based on a set of conditions and priorities. The new algorithm flow for a given profile should be as follows:

TLDR Diagram

(Diagram: Affiliations algorithm flow)
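The flow diagram is the authoritative description; the following is only an approximation pieced together from the concepts on this page: manual project affiliations first, then eligible work experiences active at the activity date, with primary organizations prioritized. Returning `null` when nothing matches is an assumption, and all type and function names are hypothetical.

```typescript
// Hypothetical, simplified shapes; not the CDP schema.
interface Period {
  dateStart: Date | null; // null: unknown start
  dateEnd: Date | null; // null: ongoing
}
interface ProjectAffiliation extends Period {
  projectId: string;
  organizationId: string;
}
interface WorkExperience extends Period {
  organizationId: string;
  eligibleForAffiliation: boolean;
  isPrimary: boolean;
}
interface Activity {
  id: string;
  projectId: string;
  timestamp: Date;
}

function covers(p: Period, at: Date): boolean {
  const t = at.getTime();
  return (p.dateStart === null || p.dateStart.getTime() <= t) && (p.dateEnd === null || t <= p.dateEnd.getTime());
}

// Assumed resolution order for one activity; it may differ from the actual flow.
function resolveOrganization(
  activity: Activity,
  projectAffiliations: ProjectAffiliation[],
  workHistory: WorkExperience[],
): string | null {
  // 1. A manual project affiliation covering this project and date wins.
  const manual = projectAffiliations.find(
    (a) => a.projectId === activity.projectId && covers(a, activity.timestamp),
  );
  if (manual) return manual.organizationId;

  // 2. Otherwise, only eligible work experiences active at the activity date count.
  const active = workHistory.filter((w) => w.eligibleForAffiliation && covers(w, activity.timestamp));
  if (active.length === 0) return null; // assumption: leave the activity unaffiliated

  // 3. Primary organizations take priority. Remaining ties would go through the
  //    "Conflicting periods" criteria; this sketch simply keeps the first entry.
  const primaries = active.filter((w) => w.isPrimary);
  return (primaries.length > 0 ? primaries : active)[0].organizationId;
}
```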

Technical implementation

  • The affiliation update process is handled by the profiles-worker service.
  • The affiliation update process consists of two main steps:
    1. The prepareMemberAffiliationsUpdate method prepares the data for the update.
    2. The runMemberAffiliationsUpdate method executes the update operations.
  • Updates are triggered automatically through the memberUpdate workflow, which is orchestrated by Temporal.
  • The main goal of the update process is to keep the relationships between activities, their memberId, and their organizationId correct.
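The two-step split can be illustrated with a plan-then-apply sketch. `prepareAffiliationChanges` and `applyAffiliationChanges` are hypothetical stand-ins that mirror the roles of `prepareMemberAffiliationsUpdate` and `runMemberAffiliationsUpdate`; they do not reproduce those methods' actual signatures or behavior.

```typescript
// Hypothetical row shape; real tables have many more columns.
interface ActivityRow {
  id: string;
  memberId: string;
  organizationId: string | null;
  timestamp: Date;
}

// A planned change to one activity's organizationId.
interface AffiliationChange {
  activityId: string;
  from: string | null;
  to: string | null;
}

// Step 1 (the "prepare" role): compute the changes without writing anything.
function prepareAffiliationChanges(
  rows: ActivityRow[],
  resolve: (row: ActivityRow) => string | null,
): AffiliationChange[] {
  const changes: AffiliationChange[] = [];
  for (const row of rows) {
    const to = resolve(row);
    if (to !== row.organizationId) {
      changes.push({ activityId: row.id, from: row.organizationId, to });
    }
  }
  return changes;
}

// Step 2 (the "run" role): apply the prepared changes to an in-memory store.
// Returns how many rows were updated.
function applyAffiliationChanges(store: Map<string, ActivityRow>, changes: AffiliationChange[]): number {
  let applied = 0;
  for (const change of changes) {
    const row = store.get(change.activityId);
    if (row !== undefined) {
      row.organizationId = change.to;
      applied += 1;
    }
  }
  return applied;
}
```

Separating the computation from the writes means the write step can be retried (for example, by a workflow engine such as Temporal) without recomputing the plan; whether this is the reason for the split in profiles-worker is an assumption.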

Activity Updates

The updateActivities method handles all activity-related updates in the application. It performs two main operations:

  1. Kafka Queue Processing

    • Inserts activities into the activities Kafka topic
    • Distributes updates to multiple databases (Snowflake, QuestDB, and Tinybird) via Kafka connectors
    • More details on this architecture can be found here
  2. Database Updates

    • Updates the activityRelations table to maintain relationship data.
      • Note: Currently, relationship data exists in both activities and activityRelations tables
    • The activityRelations table was introduced for the Insights project
    • Future plan: Remove all relationship data from activities to reduce complexity.
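A sketch of the two operations for a batch of updates, with the Kafka producer and the activityRelations table hidden behind hypothetical interfaces (`TopicPublisher`, `ActivityRelationsStore`) rather than any real client API. The page does not say in which order the two writes happen or whether they are atomic; the sequential order here is an assumption, and `updateActivitiesSketch` is not the actual `updateActivities` method.

```typescript
// Hypothetical payload for one activity update.
interface ActivityUpdate {
  activityId: string;
  memberId: string;
  organizationId: string | null;
}

// Hypothetical abstraction over a Kafka producer; no real client API is assumed.
interface TopicPublisher {
  publish(topic: string, key: string, value: string): Promise<void>;
}

// Hypothetical abstraction over the activityRelations table.
interface ActivityRelationsStore {
  upsert(update: ActivityUpdate): Promise<void>;
}

async function updateActivitiesSketch(
  updates: ActivityUpdate[],
  publisher: TopicPublisher,
  relations: ActivityRelationsStore,
): Promise<void> {
  for (const update of updates) {
    // 1. Publish to the "activities" topic; Kafka connectors then propagate
    //    the change to Snowflake, QuestDB, and Tinybird.
    await publisher.publish("activities", update.activityId, JSON.stringify(update));
    // 2. Keep the activityRelations table in sync.
    await relations.upsert(update);
  }
}
```

In tests, both interfaces can be backed by in-memory fakes (an array of published keys and a `Map` of relation rows), which keeps the sketch independent of any Kafka or database client.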