The Dot Product Is the Whole Game: The One Bit of Linear Algebra ML Actually Runs On

I'll let you in on something that took me embarrassingly long to figure out, even after a master's degree and years of building models for a living.

Almost everything in machine learning — recommendations, search, image generation, the "attention" inside large language models — is the same tiny operation, repeated billions of times. It has a boring name and a one-line formula. It's the dot product. And once it clicks, a huge amount of the field stops looking like magic and starts looking like arithmetic you could do on a napkin.

So let's make it click. No prerequisites, no proofs you'll never use. Just the one idea, and then we'll watch it do something.

A vector is just a list of numbers wearing a costume

A vector is a list of numbers. That's it. [5, 1, 0] is a vector. The costume part is that we agree to interpret those numbers as a direction in space — an arrow pointing from the origin to a point.

Here's the move that unlocks everything: the numbers can mean anything you choose. A movie can be a vector [action, romance, comedy]. A user can be a vector of how much they like each of those things. A word, a face, a product, a sentence — anything you can describe with traits becomes an arrow living in some high-dimensional space.

Mad Max is roughly [9, 1, 2] — lots of action, barely any romance. A rom-com is [1, 9, 4]. You and I are arrows in that same space too. And the instant everything is an arrow, we can ask the only question that matters in ML: do these two arrows point the same way?

The dot product: a similarity score you can compute in your head

The dot product takes two vectors, multiplies them position-by-position, and adds up the result. One number out.

You ([8, 2, 3] — you love action) versus Mad Max ([9, 1, 2]):

8×9 + 2×1 + 3×2 = 72 + 2 + 6 = 80

You versus the rom-com ([1, 9, 4]):

8×1 + 2×9 + 3×4 = 8 + 18 + 12 = 38

Eighty beats thirty-eight, so we recommend Mad Max. That's a recommendation engine. Not a toy one, either: it has the shape of the real thing. A production system such as Netflix's would use far more dimensions than three, and it would learn the numbers rather than have me make them up, but the operation at the center is this same sum.

The reason it works is geometric and genuinely beautiful: the dot product measures alignment. Arrows pointing the same way give a big positive number. Arrows at right angles give zero (they share nothing; they're "perpendicular," or "orthogonal," which is the maths word for "unrelated"). Arrows pointing opposite ways give a negative number. One sum quietly encodes both how long each arrow is and how much the two agree on direction.
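For readers who like to see the geometry written down, the standard identity behind that paragraph is sketched below. The symbols are my own notation rather than anything used elsewhere in this post: a and b are the two vectors, n is the number of traits, and θ is the angle between the arrows.

```latex
\mathbf{a} \cdot \mathbf{b}
  \;=\; \sum_{i=1}^{n} a_i b_i
  \;=\; \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert \cos\theta
```

The cosine term is where the sign comes from: it is positive when the arrows lean the same way, zero at right angles, and negative when they point apart. The two lengths only scale the result.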

Let's make it real with NumPy:

import numpy as np

you      = np.array([8, 2, 3])
mad_max  = np.array([9, 1, 2])
rom_com  = np.array([1, 9, 4])

print(you @ mad_max)   # 80  ->  strong match
print(you @ rom_com)   # 38  ->  meh

That @ is Python's matrix-multiplication operator, and for two one-dimensional NumPy arrays like these it computes the dot product. Keep that symbol in mind, because it's about to scale up.
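If you'd like to see what @ is doing under the hood, here is a minimal sketch that spells the sum out by hand and then checks the three geometric cases from earlier. The helper name dot_by_hand and the arrows east, north and west are made up for this illustration; they are not part of NumPy.

```python
import numpy as np

def dot_by_hand(a, b):
    """Multiply position-by-position, then add up the products."""
    return sum(x * y for x, y in zip(a, b))

you = np.array([8, 2, 3])
mad_max = np.array([9, 1, 2])

# The hand-rolled version agrees with NumPy's @ operator.
print(dot_by_hand(you, mad_max), you @ mad_max)   # 80 80

# Three illustrative 2-D arrows (values chosen only for this demo).
east = np.array([1, 0])
north = np.array([0, 1])
west = np.array([-1, 0])

print(east @ east)    # 1   -> same direction, positive
print(east @ north)   # 0   -> right angles, zero
print(east @ west)    # -1  -> opposite directions, negative
```

In real code you would reach for @ (or np.dot) rather than a Python loop; the hand-written version is only there to show there is nothing hidden inside.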

Cosine similarity: the dot product, but fair

There's one wrinkle. Big arrows get big dot products just for being big, even if their direction isn't a great match. A blockbuster everybody rates highly will score high with everyone. So we often want to strip out length and keep only direction.

Divide the dot product by the product of the two vectors' lengths and you get cosine similarity, which is literally the cosine of the angle between the arrows. It lives between -1 (opposite directions) and +1 (identical direction), with 0 meaning unrelated. Same idea, now scale-proof.

def cosine(a, b):
    # Dot product divided by the product of the two lengths (Euclidean norms).
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(cosine(you, mad_max), 3))  # 0.983  -> nearly identical taste
print(round(cosine(you, rom_com), 3))  # 0.437  -> lukewarm
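To see the "scale-proof" claim in action, here is a small sketch that stretches one arrow and compares the two measures. The vector loud_mad_max is hypothetical: it is Mad Max pointing in exactly the same direction, just ten times longer.

```python
import numpy as np

def cosine(a, b):
    # Dot product divided by the product of the two lengths.
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

you = np.array([8, 2, 3])
mad_max = np.array([9, 1, 2])
loud_mad_max = 10 * mad_max   # hypothetical: same direction, ten times the length

# The raw dot product grows with the length of the arrow...
print(you @ mad_max, you @ loud_mad_max)   # 80 800

# ...while cosine similarity stays put, because only direction counts.
print(np.isclose(cosine(you, mad_max), cosine(you, loud_mad_max)))   # True
```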

This isn't academic trivia. It's the beating heart of modern AI search. When you query a vector database (the thing powering "chat with your PDF" and most RAG systems), your text is converted into an arrow and the database finds the stored arrows that point most nearly the same way. And here's a lovely practical secret: many systems first normalize every vector to length 1. Once every arrow has length 1, the denominator in cosine similarity is just 1 × 1, so cosine collapses back into a plain dot product: identical answers, cheaper to compute. That's why production vector databases so often lean on the dot product directly.
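Here is a rough sketch of that normalize-then-dot trick, reusing the movie numbers from earlier as stand-ins for stored embeddings. The normalize helper and the tiny names list are assumptions for illustration only; a real vector database gets its vectors from an embedding model and uses an index rather than scoring every row.

```python
import numpy as np

def normalize(v):
    """Scale a vector (or each row of a matrix) to length 1."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical "stored" vectors, one row per item.
names = ["mad_max", "rom_com"]
stored = normalize(np.array([[9, 1, 2], [1, 9, 4]], dtype=float))
query = normalize(np.array([8, 2, 3], dtype=float))

# With unit-length vectors, a plain dot product is the cosine similarity.
scores = stored @ query
best = names[int(np.argmax(scores))]

print(np.round(scores, 3))   # [0.983 0.437]
print(best)                  # mad_max
```

The scores match the cosine values computed above, but the division by the lengths happened once, up front, instead of on every query.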

A matrix is just a stack of dot products in a trench coat

Now the part that scares people: matrix multiplication. It's the same operation, batched.

A matrix is a stack of vectors. When you multiply a matrix by a vector, you are simply taking the dot product of each row with that vector. That's the entire definition. If you can do one dot product, you can do a million — that's all a matrix multiply is.

Say you have three users and you want all their scores against Mad Max at once:

users = np.array([
    [8, 2, 3],   # action lover (you)
    [2, 8, 1],   # romance lover
    [5, 5, 5],   # likes everything
])

scores = users @ mad_max
print(scores)   # [80 28 60]

One line, three dot products, three scores ready to rank. Stack more arrows on either side and you compute every user against every movie in a single shot, which is a big part of why GPUs (which are absurdly good at parallel multiply-and-add) made deep learning take off. Deep learning didn't need fancier maths. It needed this one humble operation, run fast, run often.
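To make "stack more arrows on either side" concrete, here is a minimal sketch that first rebuilds the matrix-vector result one row at a time, then scores every user against every movie at once. The movies array is just the two films from earlier stacked into a matrix; the variable names are mine.

```python
import numpy as np

users = np.array([
    [8, 2, 3],   # action lover (you)
    [2, 8, 1],   # romance lover
    [5, 5, 5],   # likes everything
])
movies = np.array([
    [9, 1, 2],   # Mad Max
    [1, 9, 4],   # rom-com
])

# One dot product per row: the same numbers as users @ movies[0].
by_hand = np.array([row @ movies[0] for row in users])
print(by_hand)   # [80 28 60]

# users is 3x3 and movies.T is 3x2, so the result is 3x2:
# one row per user, one column per movie.
all_scores = users @ movies.T
print(all_scores)
# [[80 38]
#  [28 78]
#  [60 70]]

# Index of the top-scoring movie for each user.
print(all_scores.argmax(axis=1))   # [0 1 1]
```

The transpose is only there to line the shapes up, so that each movie becomes a column that every user row gets dotted with.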

A neural network layer? At its core, output = weights @ input: a pile of dot products, usually plus a bias and a squiggle to add nonlinearity. The "attention" mechanism inside an LLM, the thing that lets it decide which earlier words matter for the next one? It scores relevance by taking dot products between vectors computed from the words (the "queries" and "keys"). Same operation. Same napkin arithmetic. All the way down.
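As a rough illustration of both claims, here is a sketch of one dense layer and one round of scaled dot-product attention scores. Everything in it is a stand-in: the weights, queries and keys are random numbers rather than learned values, ReLU is just one possible "squiggle", the bias is left out, and the masking and learned projections a real model would use are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# A dense layer: each output is one row of weights dotted with the input.
weights = rng.normal(size=(4, 3))         # 4 outputs, 3 inputs (random stand-ins)
x = np.array([8.0, 2.0, 3.0])
hidden = np.maximum(0.0, weights @ x)     # ReLU as the nonlinearity
print(hidden.shape)                       # (4,)

# Attention scores: every query dotted with every key.
queries = rng.normal(size=(5, 8))         # 5 tokens, 8 dimensions each
keys = rng.normal(size=(5, 8))
scores = queries @ keys.T / np.sqrt(keys.shape[1])   # scaled dot products

# Softmax over each row turns scores into weights that sum to 1.
scores = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
attn = np.exp(scores)
attn = attn / attn.sum(axis=1, keepdims=True)

print(attn.shape)                            # (5, 5)
print(np.allclose(attn.sum(axis=1), 1.0))    # True
```

Each row of attn says how much one token "nods along" with every other token, and those numbers all came from dot products.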

The analogy to keep

Think of every vector as a person at a party facing some direction. The dot product is how much two people are nodding along with each other. Facing the same way, deep agreement — big number. Turned away, talking past each other — negative. Standing at right angles, total strangers — zero. Recommendations, semantic search, attention: every one of them is the machine quietly checking who's nodding along with whom, then betting on the strongest nods.

What to actually take away

A vector is a list of traits; the dot product scores how aligned two of them are. Multiply position-by-position, add it up. That's the whole operation.
Cosine similarity is the dot product with length divided out — use it when you care about direction, not magnitude. It's the engine under vector search and RAG.
A matrix multiply is just many dot products at once. Neural net layers and LLM attention are this, repeated and made fast.
You do not need the 600-page textbook to start. You need this, understood deeply, and the rest grows from it.

Spend an hour playing with the snippets above — change the numbers, watch the rankings flip, build a wrong intuition and fix it. That hour will pay you back across your entire ML career, because you'll never again look at a model and see a black box. You'll see arrows, leaning toward each other.

If you'd like a real human to walk you from "lists of numbers" all the way to your own working model — at exactly your pace, with the gaps in your maths gently filled in — that's literally my favourite thing to do. Book a 1:1 session and let's make this stuff feel obvious together.

Written by Ali Jabbary

Linear Algebra

Machine Learning

Maths
