The language of data science
Why marketers struggle with it - and the "starter for 10" terms that are useful to know
Hi dream team,
This week:
[Feature] The language of data science
[Video] MLs like teen spirit podcast - Riaz explains the concept of the "dark funnel," a term used to describe marketing activities like word-of-mouth and online discussions on platforms like Reddit or YouTube.
The language of data science
I was inspired to write this after being in a meeting watching a marketer use the word “cluster” to describe a group of customers. The analysts were puzzled, since to them “cluster” implies clustering, an ML technique for finding structure in data.
What they were really discussing was probably a “segment” in marketing land - a subset of customers defined by rules.
So let’s get into it - if you’ve ever sat in a meeting where someone dropped “regression” or “DAX” and watched eyes glaze over, or indeed, felt anxious cos you didn’t follow - then this is for you!
Marketers are expected to use data science outputs - segmentation, scoring, predictions - but are rarely taught the language data scientists use to describe them.
Let’s look at why this language barrier exists and then get you fluent in the 10 terms that probably matter most.
Why it feels hard: a reality check
Even on data-science-focused forums, people say this stuff is confusing.
I found this quote from a Reddit user who confessed:
“I am very confused on the meaning of some terms in the data space … some seem to be synonyms, some seem different, but the definitions are unclear.” (Reddit)
This is what I have seen too. Marketers working with data science/scientists describe the transition as a challenge sometimes - not because they lack domain knowledge, but because the language is entirely different. I guess marketing has its own jargon, and data science adds another layer on top of it. Without translation, the two speak past each other.
The “Language of Data Science”: 10 terms marketers need to master
These are the concepts that show up most in data science work in marketing.
1. Model / modelling
Simply put, a model is a mathematical representation of a process. In marketing, this could be a predictive model (e.g., “likelihood to churn”) or an attribution model (how credit is distributed across customer touchpoints). You’ll probably hear the term “Predictive AI” too. Some of the algorithms are 200 years old so don’t be too impressed!
See #2 and #4 to extend this.
2. Regression
A way to measure relationships between multiple variables. The target variable is what we call “continuous” - so on a scale, normally from 0 (sometimes negative) to any number. There are a bunch of other variables that are used to try and explain lumps and bumps in the target.
If you want to know how price affects conversion rate, or how media spend affects sales, you’re probably talking about regression.
One use-case for regression that a lot of marketers might have heard of is “econometrics”. This uses regression analysis to understand how media impacts sales, and uses special features (see #3) that have lags and transformations applied. For example, when I advertise today, it takes a while for consumers to recognise my brand enough to consider us, so the extra sales arrive several weeks or months down the line.
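If it helps to see the idea in code, here is a minimal Python sketch of a one-variable regression of weekly sales on media spend. The numbers are made up for illustration and `fit_line` is a hypothetical helper, not something from a particular library. Real econometrics would add lags, transformations and many more variables.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one explanatory variable: y ~ intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = how x and y move together, divided by how much x varies
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up weekly data: media spend and sales, both in $k
media_spend = [10, 20, 30, 40, 50]
sales = [120, 140, 160, 180, 200]

intercept, slope = fit_line(media_spend, sales)
print(f"Baseline sales: {intercept:.0f}k, extra sales per $1k of spend: {slope:.1f}k")
```

The slope is the number marketers usually care about: how much the target moves when the explanatory variable moves by one unit.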
Enjoy this video explainer if you are new to regression:
3. Feature
Not a product feature - this is a variable used in modelling. Some examples: age, recency of purchase, number of website visits, etc.
Some features are a column used directly (e.g. the age example, perhaps put into 5-year bands). Some features are derived (e.g. rolling open rate % over the last 5 emails received). Some features are transformed (e.g. number of website visits passed through a mathematical function so that more recent visits count for more, because we see they have a bigger impact on our target variable).
A dataset used for building models could have hundreds of features. See #10.
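As a rough illustration of the three kinds of feature described above, here is a small Python sketch. The function names, the 7-day half-life and the customer history are all assumptions made up for this example.

```python
def rolling_open_rate(opened_flags, window=5):
    """Derived feature: share of the last `window` emails that were opened."""
    recent = opened_flags[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)

def recency_weighted_visits(days_since_each_visit, half_life_days=7.0):
    """Transformed feature: each visit counts less the longer ago it happened."""
    # Exponential decay: a visit `half_life_days` ago counts as half a visit
    return sum(0.5 ** (d / half_life_days) for d in days_since_each_visit)

# Made-up customer history: 1 = opened, 0 = not opened, oldest email first
opens = [0, 1, 0, 1, 1, 0, 1]
visits_days_ago = [0, 7, 14]

features = {
    "age_band": "35-39",  # a column used more or less directly
    "open_rate_last_5": rolling_open_rate(opens),
    "recency_weighted_visits": recency_weighted_visits(visits_days_ago),
}
print(features)
```

Each key in `features` would become one column in the dataset a model is trained on.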
4. Propensity / propensity score
A type of predictive model output that tells you how likely something is to happen - like a customer buying a product, or churning. At the data level, the thing we want to model is labelled “Yes” / “No” (or 1/0).
In marketing, this is often a targeting score used to prioritise audiences (e.g., “top 20% segment”).
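Here is a minimal Python sketch of how a “top 20%” audience might be picked once a model has produced scores. The scores and customer IDs are invented, and `top_fraction` is a hypothetical helper. It assumes the propensity model already exists and only shows the targeting step.

```python
def top_fraction(scores, fraction=0.2):
    """Return the customer IDs with the highest propensity scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]

# Made-up model output: customer ID -> probability of buying (0 to 1)
propensity = {
    "c01": 0.91, "c02": 0.15, "c03": 0.62, "c04": 0.08, "c05": 0.77,
    "c06": 0.33, "c07": 0.49, "c08": 0.05, "c09": 0.84, "c10": 0.27,
}

target_audience = top_fraction(propensity, fraction=0.2)
print(target_audience)
```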
5. Lift / uplift
Not the same as “increase”. Increase is when some quantity is bigger compared to a prior period. Lift measures the incremental impact attributable to an action - e.g., how much more likely someone is to convert because they received an email compared to if they hadn’t.
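A quick worked example in Python, using made-up campaign numbers and assuming the control group was held out at random:

```python
def conversion_rate(conversions, customers):
    return conversions / customers

group_size = 10_000

# Made-up results: one group received the email, the control group did not
emailed_rate = conversion_rate(conversions=600, customers=group_size)
control_rate = conversion_rate(conversions=500, customers=group_size)

absolute_lift = emailed_rate - control_rate        # difference in conversion rate
relative_lift = absolute_lift / control_rate       # the same difference, relative to control
incremental_conversions = absolute_lift * group_size  # conversions attributable to the email

print(f"Absolute lift: {absolute_lift:.2%}")
print(f"Relative lift: {relative_lift:.0%}")
print(f"Incremental conversions: {incremental_conversions:.0f}")
```

Note that the emailed group converting at 6% is not the lift. The lift is only the part above what the control group did anyway.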
Another bonus video for if you’d like to understand this more:
6. Cohort
A group of users defined by a shared trait - like signup month or first purchase quarter. Cohort analysis lets you see behaviour over time compared to peers.
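A small Python sketch of the idea, using signup month as the shared trait. The records are invented and `cohort_retention` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up records: (customer ID, signup month, months in which they purchased)
customers = [
    ("c1", "2024-01", {"2024-01", "2024-02", "2024-03"}),
    ("c2", "2024-01", {"2024-01"}),
    ("c3", "2024-02", {"2024-02", "2024-03"}),
    ("c4", "2024-02", {"2024-02", "2024-03"}),
]

def cohort_retention(records, month):
    """Share of each signup cohort that purchased in the given month."""
    sizes = defaultdict(int)
    active = defaultdict(int)
    for _, signup_month, purchase_months in records:
        sizes[signup_month] += 1
        if month in purchase_months:
            active[signup_month] += 1
    return {cohort: active[cohort] / sizes[cohort] for cohort in sizes}

retention = cohort_retention(customers, "2024-03")
print(retention)
```

Running the same function for each month in turn gives the familiar cohort table: one row per signup month, one column per month since.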
7. A/B test
A controlled experiment comparing two versions - “A” vs “B” - to see which performs better.
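“Performs better” usually needs a statistical check, so that a difference caused by chance is not mistaken for a winner. Below is a minimal Python sketch of one common approach, a two-proportion z-test. The numbers are made up and the function name is hypothetical; your data science team may well prefer a different test.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate: the conversion rate if A and B were really the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a gap at least this large if there were no real difference
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up test: version A converts 500 of 10,000, version B converts 600 of 10,000
z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=600, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the gap between A and B is unlikely to be down to chance alone.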
8. Bias
Not a moral label here! It’s anything that systematically skews results. Marketers often underestimate how selection bias or confirmation bias can derail data analysis. I’ll let you look into those.
9. Confidence interval
This expresses the uncertainty around a metric or model output. Instead of saying “we expect $500k lift,” a confidence interval might say “we’re 95% confident the lift is between $400k and $600k.”
That helps stakeholders make better judgement calls, based on how certain you are, given the data you have accumulated.
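To show how the amount of data changes the interval, here is a short Python sketch that puts a 95% confidence interval around a conversion rate. It uses the simple normal approximation, which is one of several methods, and the numbers and function name are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def conversion_rate_ci(conversions, customers, confidence=0.95):
    """Normal-approximation confidence interval for a conversion rate."""
    rate = conversions / customers
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sqrt(rate * (1 - rate) / customers)
    return rate - margin, rate + margin

# Same 6% conversion rate, measured on a large and a small sample
low_big, high_big = conversion_rate_ci(conversions=600, customers=10_000)
low_small, high_small = conversion_rate_ci(conversions=60, customers=1_000)

print(f"10,000 customers: {low_big:.2%} to {high_big:.2%}")
print(f" 1,000 customers: {low_small:.2%} to {high_small:.2%}")
```

The headline rate is the same in both cases, but the smaller sample gives a wider interval, which is the point to get across to stakeholders.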
10. Dimensionality
A fancy way of saying “number of variables.” High dimensionality (lots of features) can improve models - but it also increases complexity and the risk of overfitting, unless handled properly. Overfitting is when a model performs noticeably worse on unseen data than it did on the training dataset, because it has learned the noise in that data rather than the underlying pattern.
Why this language matters
Marketers make decisions based on:
What the model means
When it’s reliable
How confidently we can act on it
Interpreting outputs without shared language can lead to:
Misapplied insights
Incorrect decisions
Potentially wasted spend
Miscommunication between teams
It could even include marketers approaching their data science team less for support in future
How marketers can learn the language
Here are two practical strategies:
1. Build analogies, not just definitions
Translate terms into marketing equivalents. If you’re a marketer - request a plain English, relatable explanation.
For example:
Regression → “Attribution of cause vs correlation”
Cohort → “Behaviour-based customer grouping, and we then study the groups over time”
This helps teams think in marketing logic, not just in data science terms.
2. Pull key terms into reporting
Include short definitions right inside dashboards or reports. Imagine hovering over “lift” and seeing: “Incremental difference between exposed and control group” - suddenly it’s less intimidating.
Final thought: shared language = shared action
Learning the language of data science isn’t about memorising terminology.
When the marketer, analyst, scientist, and product owner all understand what a model’s output really means - and what it doesn’t - something actionable is way more likely to happen: You stop debating semantics and start debating strategy, tactics or “what next”. And that’s where the conversation goes from being interesting to impactful.
Video: MLs like teen spirit poddy
In this episode, Riaz explains the concept of the “dark funnel,” a term used to describe marketing activities like word-of-mouth and online discussions on platforms like Reddit or YouTube. He highlights how the digital age has made it possible to track these interactions, but still difficult to measure their true impact on sales, as they don’t occur within a company’s own channels. The discussion touches on how B2B marketing has evolved, with potential customers engaging much later in the sales cycle, often through thought leadership content or community discussions. Riaz emphasizes the importance of understanding and tracking indirect channels to gauge brand affinity and market share of voice, and how marketers must get comfortable with uncertainty, relying on triangulation of data from multiple sources, including community engagement, intent data, and traditional sales processes.
Thank you for the support folks!
Cheers,
John
PS - There is another way I might be able to help you:
MLs like teen spirit is our sister podcast show - I cover all manner of Data Science topics here, often with a marketing orientation.


