Crafting Data Platform Personas: A Deep Dive

Sthiven Pinilla
9 min read · Oct 16, 2024


Illustrations by Storyset

Plenty of blogs and articles discuss the importance of defining personas or archetypes for a Product, highlighting their similarities and differences, and how they can be used.

For this blog, I’m going to use personas and archetypes interchangeably, since both convey the same type of information. They both summarise research data about users and depict a particular audience. They consider the overall behaviour, attitudes, motivations, and challenges of a user group.

The purpose of this blog is not to provide yet another view of these concepts but rather to showcase the creation of high-level “personas” (or archetypes) when building a Data Platform for internal users in an organisation.

Whichever term you prefer, personas and archetypes are essential tools for understanding, categorising and clustering the various behaviours, actions, attitudes, motivations, pain points, and goals of users.

In my view, at least for a Data Platform built and offered to internal users in an organisation, this segmentation should not be determined by demographics such as age, country, or personality traits. Instead, it should be created by identifying areas of overlap in key characteristics, which can help a Data Platform Team better understand the unique differences and similarities between different types of users.

A Data Platform team can use personas to inform decision-making across a broad spectrum: at a strategic level, by identifying a particular segment a Data Platform service offering could focus on, or at an operational level, by tailoring documentation according to a specific segment attribute or action.

The personas described in this blog focus on users’ attributes and actions in relation to data. These personas do not describe users’ main responsibilities in their respective roles; they capture affinity rather than competence. Therefore, the first step is to define the relevant Affinities for your organisation.

All information used to create the segmentation should come from user research. This can include user interviews, user observations, comments from issues/tickets/requests raised with your Data Platform team, or simply informal coffee chats with people in your organisation.
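Once research notes are tagged with affinity levels, the clustering the segmentation relies on can be as simple as grouping users who share the same combination of levels. The sketch below is illustrative only: the user names, the two affinity dimensions, and the tagged levels are made-up examples, not data from the article.

```python
from collections import defaultdict

# Hypothetical research notes: each interviewee tagged with the affinity
# levels observed during user research (names and levels are made up).
observations = [
    {"user": "A", "technical_involvement": "Use", "data_fluency": "Literacy"},
    {"user": "B", "technical_involvement": "Use", "data_fluency": "Literacy"},
    {"user": "C", "technical_involvement": "Design", "data_fluency": "Fluency"},
]

# Group users who share the same combination of affinity levels;
# each resulting cluster is a candidate persona.
clusters = defaultdict(list)
for obs in observations:
    key = (obs["technical_involvement"], obs["data_fluency"])
    clusters[key].append(obs["user"])

# clusters now maps ("Use", "Literacy") -> ["A", "B"] and
# ("Design", "Fluency") -> ["C"].
```

In practice, you would rarely get exact matches across all dimensions, so treating “adjacent” levels as overlapping (or eyeballing the groupings on a whiteboard) is usually the pragmatic route.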

Affinity Definitions

The following is an extract of the Affinities we defined in an organisation where I worked. They will likely vary depending on your organisation’s maturity level, needs and your discovery journey. You might find others or decide to break them down differently as you discover and talk to data practitioners across the organisation.

Technical Involvement

  • None: Individuals who don’t make use of Data Platform services. The reasons could range from not needing them, to not knowing they exist, to finding the barrier to entry too high.
  • Use: Individuals who leverage Data Platform services to achieve specific goals. They could range from individuals who are vaguely familiar with the Data Platform services to proficient users who know how to work with them (or a subset) inside out.
  • Operation: Individuals developing, testing, deploying, monitoring, supporting and maintaining a technology environment or solution to achieve service-level targets.
  • Design: Individuals involved in defining the collection of “hardware” and software components and their interfaces to create a cohesive technical solution for a particular problem or opportunity. This might include various areas such as software (backend) architecture, user interface (frontend) architecture, and information architecture.

Data Fluency

Based on three components of data fluency (Reading data, Working with data, and Communicating with data), different levels of data literacy can be identified.

  • Conversational: Basic understanding of data concepts, analytics and use cases. An individual who “gets it” but cannot explain it to others.
  • Literacy: Ability to speak, write and engage in data and analytics solutions and use cases.
  • Competency: Competent in designing, developing and applying data and analytic solutions.
  • Fluency: Fluent across the three components of data fluency within one or more domains.

Data Access

  • Read Only: Cannot alter or modify Dev, QA, SBOX or Production data. Only read access to data within one or more domains.
  • Authorise: Approves data sharing and access requests for data within their data domain.
  • Write: Can alter or modify data in Dev, QA, SBOX or playground production through services, adhering to change control processes.
  • Execute: Can deploy and execute/run services that interact with data, adhering to change control processes.

Decision Making

  • Operational: Refers to decisions individuals make each day to keep the organisation running. (e.g. How should I balance my work demands? How often should I communicate with coworkers?)
  • Tactical: Decisions about how things will get done. (e.g. How should we market the new product?)
  • Strategic: Set the course of a plan that increases the likelihood of achieving organisational goals, with a long-term horizon. (e.g. What set of technologies should we adopt to develop the new product?)
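Because each affinity is a progression of levels, one convenient way to encode them (for scoring personas or tagging research notes) is as ordered enumerations. This is a sketch under my own assumptions: the class and member names mirror the definitions above, but the numeric ordering is an interpretation on my part, and Data Access in particular is not strictly linear (Authorise and Write are arguably orthogonal).

```python
from enum import IntEnum

# Assumed ordinal encoding of the four affinity dimensions defined above.

class TechnicalInvolvement(IntEnum):
    NONE = 0
    USE = 1
    OPERATION = 2
    DESIGN = 3

class DataFluency(IntEnum):
    CONVERSATIONAL = 0
    LITERACY = 1
    COMPETENCY = 2
    FLUENCY = 3

class DataAccess(IntEnum):
    READ_ONLY = 0
    AUTHORISE = 1  # note: not truly "greater than" read-only; ordering is a simplification
    WRITE = 2
    EXECUTE = 3

class DecisionMaking(IntEnum):
    OPERATIONAL = 0
    TACTICAL = 1
    STRATEGIC = 2

# IntEnum makes levels comparable, e.g.:
# TechnicalInvolvement.USE < TechnicalInvolvement.DESIGN  -> True
```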

In our case, we created five high-level personas, leveraging the previously defined affinities and the knowledge gained during user research.

Each persona contains the following information:

  • Short description of the persona.
  • Scoring level depicting each persona’s affinities.
  • Label showing the type of data practitioner (producer or consumer).
  • List of possible roles/job titles that could be part of the persona.
  • List of key attributes.
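The structure above can be sketched as a simple record. Everything in this example is illustrative: the field names, the numeric affinity scores for The Data Maverick, and the roles listed are my own assumptions for demonstration, not taken verbatim from the persona descriptions that follow.

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """One high-level persona card, mirroring the five bullet points above."""
    name: str
    description: str
    affinity_scores: dict[str, int]   # affinity name -> scoring level (assumed 0-3 scale)
    practitioner_type: str            # "producer" or "consumer"
    possible_roles: list[str] = field(default_factory=list)
    key_attributes: list[str] = field(default_factory=list)

# Hypothetical instantiation; the scores and roles are guesses for illustration.
data_maverick = Persona(
    name="The Data Maverick",
    description="A data virtuoso, proficient in SQL and various technologies.",
    affinity_scores={
        "technical_involvement": 3,  # assumed: Design
        "data_fluency": 3,           # assumed: Fluency
        "data_access": 3,            # assumed: Execute
        "decision_making": 1,        # assumed: Tactical
    },
    practitioner_type="producer",
    possible_roles=["Analytics Engineer", "Data Engineer"],  # hypothetical
    key_attributes=["Strong SQL abilities for querying and transforming data"],
)
```

Keeping personas in a structured form like this (even just a YAML file) makes it easy to render persona cards consistently and to revisit the scores as the organisation evolves.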

Personas

🎩 The Data Maverick

I’m a data virtuoso, proficient in SQL and various programming languages and technologies. I am the guardian of data quality, constantly vigilant for any degradation or incidents that might affect my data assets. I have direct access to data and a suite of tools to conduct experiments, build measurement frameworks, and contribute collaboratively to dashboards, reports, and data documentation. I play a pivotal role in shaping data-informed decisions and maintaining data integrity within my domain.

Key attributes:

  • I have strong SQL abilities for both querying and transforming data.
  • I might also have knowledge and experience with other programming languages (e.g. Python) and technologies (e.g. Spark, Flink, Databricks, etc.)
  • I have excellent data fluency and in-depth knowledge of data within one or more data domains.
  • I define, implement and maintain data quality rules.
  • I am alerted and take action for data quality degradation/incidents of the data assets in my domain(s).
  • I have direct access to data (Data Lake or Data Warehouse) and frequently use tools like dbt and Airflow as part of my job.
  • I have experience conducting experiments, building measurement frameworks and validating the results with relevant quantitative methods.
  • I’m a contributor/collaborator in the creation of dashboards/reports, data documentation, and analysis.

🥷 The Tech Ninja

I am adept at programming in various languages and have a knack for designing scalable systems. While I am not primarily focused on data tools, I maintain a level of data proficiency and ensure data quality. I contribute to data pipelines, occasionally in data documentation and dashboards, and add an extra layer of technical insight to the data-driven ecosystem. Overall, I am instrumental in the technical foundation of systems/services within my domain.

Key attributes:

  • My mother tongue is C#, C++, Go, Java, Node.js, etc.
  • I have experience designing and managing large, scalable, high throughput, highly available and fault-tolerant systems — I might be familiar working with Kafka, Kinesis or similar Message Brokers.
  • I know my way around SQL, and actually, I might have mid-level to strong SQL abilities for querying and transforming data.
  • I define, implement and maintain data quality rules.
  • I am alerted and take action for data quality degradation/incidents of the data assets in my domain.
  • I have decent data fluency and in-depth knowledge of data within a data domain.
  • I have direct access to data (Data Lake or Data Warehouse) and can use tools like dbt and Airflow, but they are not my forte.
  • I create/author data pipelines using Fivetran, Kafka Connect or similar data replication tools — I might need some jump-start, but I’m sure I’ll be fine after.
  • I might be a contributor/collaborator in dashboards creation and/or data documentation.

⛵ The Explorer

I can extract insights from data using SQL and have a basic to mid-level understanding of data within my domain. While I primarily consume data, I may also contribute to defining data quality rules. I leverage data for reporting, analysis, and decision-making in my respective domain. My activities revolve around utilising data to enhance my work and contribute to the organisation’s success.

Key attributes:

  • I have mid-level to strong SQL abilities for querying data.
  • I have basic to mid-level data fluency and knowledge of data within one or more data domains.
  • I might contribute to data quality rules definition.
  • I have direct access to some data in the Data Lake or Data Warehouse, but I’m neither familiar with nor use dbt and Airflow for my day-to-day activities.
  • When I need access to new data assets, I contact my team members or the Data Platform for guidance or to sort things out.
  • I leverage data by contributing/collaborating in dashboards/reports creation and documentation, or by exporting it to Excel to explore, slice and dice it.

🎼 The Conductor

I’m a key player in ensuring data quality, governance, and effective utilisation within my data domain(s). While I may have limited SQL abilities, I possess a solid understanding of the data I oversee. My role is pivotal in maintaining data quality, supporting data governance, ensuring the right people have access to the data in my domain(s), and making data-driven decisions that align with the goals of my team(s) and/or the entire organisation.

Key attributes:

  • I have limited SQL abilities or limited time to query and “play” with data myself.
  • I have basic to decent data fluency and knowledge of data within one or more data domains.
  • I’m informed about the data quality status of the data assets in my domain(s).
  • I might be the escalation point for data quality issues and/or allocate resources for data quality remediation within my domain(s).
  • I might rely on direct reports for dashboard creation and data documentation. I use dashboards to gain insights or have more visibility about the performance of my domain(s).
  • I might be accountable for data governance and support implementing and adhering to the organisational Data Governance standards/policies/framework, including removing blockers in my data domain(s).
  • I might approve access to data tools and/or assets within my data domain(s).

🔭 The Observer

While I’m not a data expert, I excel in my specific domain due to my in-depth knowledge. I may have limited data fluency and access to data, but I recognise the importance of data quality. I primarily use dashboards and reports to make informed decisions within my domain and collaborate with more data-fluent team members or the Central Analytics team when additional data or analysis is required. My value lies in my domain expertise and my ability to translate data insights into meaningful actions within my area of influence.

Key attributes:

  • I have limited to no SQL abilities or, seriously, no time or interest in querying or “playing” with data.
  • I might have limited data fluency. However, I have in-depth domain knowledge — I might well be The Maverick of my domain.
  • I might contribute to the definition of data quality rules.
  • I might be informed about the data quality status of the data assets in my domain.
  • I have limited access to data — I don’t have access to the Data Lake or Data Warehouse — I might have heard of them, but I’m not sure I know what they are.
  • I have access to dashboards/reports (e.g. Looker, Apache Superset, etc.), and I might use them to perform activities in my domain.
  • A more data-fluent person, on my team or in the Central Analytics team, helps me when I need to access additional data, create dashboards or analyse data for decision-making.

I recommend you have conversations with your team to nail down the tone and depth needed to create those personas. I hope this blog serves as inspiration for you to craft your personas, define the affinities relevant to your organisation and leverage the data gathered during your user research.

Crafting personas is not an exact science; rather, it is a journey of trial-and-error where each discovery might bring you closer to the right fit. Personas are not set in stone — they are dynamic and ever-evolving. It is worth revisiting and tweaking them periodically, according to your organisation’s growth and changes.

Feel free to reach out if you have any questions or need to bounce ideas to build personas for your Data Platform.
