This article is the first of three parts. Each part stands on its own, so you don’t need to read the others to understand it.
The dot product is one of the most important operations in machine learning – but it’s hard to understand without the right geometric foundations. In this first part, we build those foundations:
· Unit vectors
· Scalar projection
· Vector projection
Whether you are a student learning Linear Algebra for the first time, or want to refresh these concepts, I recommend you read this article.
We also introduce and explain the dot product itself in this article.
The vector projection section is included as an optional bonus: helpful, but not necessary for understanding the dot product.
The next part explores the dot product in greater depth: its geometric meaning, its relationship to cosine similarity, and why the difference matters.
The final part connects these ideas to two major applications: recommendation systems and NLP.
A vector 𝐯→\large \mathbf{\vec{v}} is called a unit vector if its magnitude is 1:
|𝐯→|=1\LARGE |\mathbf{\vec{v}}| = 1
To remove the magnitude of a non-zero vector while keeping its direction, we can normalize it. Normalization scales the vector by the factor:
1|𝐯→|\LARGE \frac{1}{|\mathbf{\vec{v}}|}
The normalized vector 𝐯^\large \mathbf{\hat{v}} is the unit vector in the direction of 𝐯→\large \mathbf{\vec{v}}:
𝐯^=𝐯→|𝐯→|\LARGE \begin{array}{|c|} \hline \mathbf{\hat{v}} = \frac{\mathbf{\vec{v}}}{|\mathbf{\vec{v}}|} \\ \hline \end{array}
Notation 1. From now on, whenever we normalize a vector 𝐯→\large \mathbf{\vec{v}}, or write 𝐯^\large \mathbf{\hat{v}}, we assume that 𝐯→≠0\large \mathbf{\vec{v}} \neq 0. This notation, along with the ones that follow, is also relevant to the following articles.
This operation naturally separates a vector into its magnitude and its direction:
𝐯→=|𝐯→|⏟magnitude⋅𝐯^⏟direction\LARGE \begin{array}{|c|} \hline \mathbf{\vec{v}} = \underbrace{|\mathbf{\vec{v}}|}_{\text{magnitude}} \cdot \underbrace{\mathbf{\hat{v}}}_{\text{direction}} \\ \hline \end{array}
Figure 1 illustrates this idea: 𝐯→\large \mathbf{\vec{v}} and 𝐯^\large \mathbf{\hat{v}} point in the same direction, but have different magnitudes.
Figure 1: Separating “How Much” from “Which Way”. Any vector can be written as the product of its magnitude and its unit vector, which preserves direction but has length 1. Image by Author (created using Claude).
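To make the magnitude/direction split concrete, here is a minimal Python sketch. The helper names `magnitude` and `normalize` are illustrative choices for this article, not part of any library.

```python
import math

def magnitude(v):
    """Euclidean length of a vector given as a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Return the unit vector in the direction of v (v must be non-zero)."""
    length = magnitude(v)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / length for x in v]

v = [3.0, 4.0]
length = magnitude(v)                   # 5.0
v_hat = normalize(v)                    # [0.6, 0.8]
rebuilt = [length * x for x in v_hat]   # magnitude times direction gives v back
print(length, v_hat, rebuilt)
```

The zero-vector check mirrors Notation 1: normalization is only defined for non-zero vectors.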
Similarity of unit vectors
In two dimensions, all unit vectors lie on the unit circle (radius 1, centered at the origin). A unit vector that forms an angle θ with the x-axis has coordinates (cos θ, sin θ).
This means the angle between two unit vectors encodes a natural similarity score - as we will show shortly, this score is exactly cos θ: equal to 1 when they point the same way, 0 when perpendicular, and −1 when opposite.
Notation 2. Throughout this article, θ denotes the smallest angle between the two vectors, so 0°≤θ≤180°0° \leq \theta \leq 180° .
In practice, we don’t know θ directly – we know the vectors’ coordinates.
We can show why the dot product of two unit vectors, a^\large\hat{a} and b^\large\hat{b} (the sum of the products of their corresponding components), equals cos θ using a geometric argument in three steps:
1. Rotate the coordinate system until b^\large\hat{b} lies along the x-axis. Rotation doesn’t change angles or magnitudes.
2. Read off the new coordinates. After rotation, b^\large\hat{b} has coordinates (1, 0). Since a^\large\hat{a} is a unit vector at angle θ from the x-axis, the unit circle definition gives its coordinates as (cos θ, sin θ).
3. Multiply corresponding components and sum:
a^⋅b^=ax⋅bx+ay⋅by=cosθ⋅1+sinθ⋅0=cosθ\Large \begin{aligned} \hat{a} \cdot \hat{b} = a_x \cdot b_x + a_y \cdot b_y = \\ \cos\theta \cdot 1 + \sin\theta \cdot 0 = \cos\theta \end{aligned}
This sum of component-wise products is called the dot product:
a→⋅b→=a1⋅b1+a2⋅b2+⋯+an⋅bn\Large \boxed{ \begin{aligned} \vec{a} \cdot \vec{b} = a_1 \cdot b_1 + a_2 \cdot b_2 \\ + \cdots + a_n \cdot b_n \end{aligned} }
See the illustration of these three steps in Figure 2 below:
Figure 2: By rotating our perspective to align with the x-axis, the coordinate math simplifies beautifully to reveal why the two unit vectors’ dot product is equal to cos(θ). Image by Author (created using Claude).
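The result can be checked numerically. The sketch below builds two unit vectors from chosen angles and compares their dot product with the cosine of the angle between them. The helpers `dot` and `unit_from_angle` are hypothetical names used only for illustration.

```python
import math

def dot(a, b):
    """Sum of component-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def unit_from_angle(degrees):
    """Unit vector on the unit circle at the given angle from the x-axis."""
    t = math.radians(degrees)
    return [math.cos(t), math.sin(t)]

a_hat = unit_from_angle(75.0)
b_hat = unit_from_angle(15.0)   # the angle between the two vectors is 60 degrees

# Both values are approximately 0.5 (up to floating-point rounding).
print(dot(a_hat, b_hat), math.cos(math.radians(60.0)))
```

Neither vector lies on the x-axis here, yet the dot product still matches cos θ, which is what the rotation argument predicts.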
Everything above was shown in 2D, but the same result holds in any number of dimensions. Any two vectors, no matter how many dimensions they live in, lie in a common flat plane through the origin (the plane they span). We can rotate that plane to align with the xy-plane, and from there the 2D proof applies exactly.
Notation 3. In the diagrams that follow, we often draw one of the vectors (typically b→\large\vec{b}) along the horizontal axis. When b→\large\vec{b} is not already aligned with the x-axis, we can always rotate our coordinate system as we did above (the “rotation trick”). Since rotation preserves all lengths, angles, and dot products, every formula derived in this orientation holds for any direction of b→\large\vec{b}.
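The claim that rotation leaves the dot product unchanged can also be verified with a short sketch. It assumes 2D vectors and a standard counterclockwise rotation; the function names are illustrative.

```python
import math

def dot(a, b):
    """Sum of component-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rotate(v, degrees):
    """Rotate a 2D vector counterclockwise by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    x, y = v
    return [c * x - s * y, s * x + c * y]

a = [2.0, 3.0]
b = [4.0, 1.0]

# Angle of b from the x-axis; rotating by its negative lays b along the x-axis.
angle_b = math.degrees(math.atan2(b[1], b[0]))
a_rot = rotate(a, -angle_b)
b_rot = rotate(b, -angle_b)   # approximately (|b|, 0)

# The two dot products agree up to floating-point rounding.
print(dot(a, b), dot(a_rot, b_rot))
```

Both vectors must be rotated by the same angle; rotating only one of them would change the angle between them and therefore the dot product.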
A vector can contribute in many directions at once, but often we care about only one direction.
Scalar projection answers the question: How much of 𝒂→\large \boldsymbol{\vec{a}} lies along the direction of 𝒃→\large \boldsymbol{\vec{b}}?
This value is negative if the projection points in the opposite direction of b→\large\vec{b}.
The Shadow Analogy
The most intuitive way to think about scalar projection is as the length of a shadow. Imagine you hold a stick (vector a→\large \vec{a}) at an angle above the ground (the direction of b→\large\vec{b}), and a light source shines straight down from above.
The length of the shadow that the stick casts on the ground is the scalar projection.
The animated figure below illustrates this idea:
Figure 3: Scalar projection as a shadow.
The scalar projection measures how much of vector a lies in the direction of b.
It equals the length of the shadow that a casts onto b (Woo, 2023). The GIF was created by Claude.
Calculation
Imagine a light source shining straight down onto the line PS (the direction of b→\large\vec{b}). The “shadow” that a→\large\vec{a} (the arrow from P to Q) casts onto that line is exactly the segment PR. You can see this in Figure 4.
Figure 4: Measuring Directional Alignment. The scalar projection (segment PR) visually answers the core question: “How much of vector a lies in the exact direction of vector b.” Image by Author (created using Claude).
Deriving the formula
Now look at the triangle PQR\large\ PQR: the perpendicular drop from Q\large\ Q creates a right triangle, and its sides are:
- PQ=|a→|\large\ PQ = |\vec{a}| (the hypotenuse).
- PR\large\ PR (the adjacent side – the shadow).
- QR\large\ QR (the opposite side – the perpendicular component).
From this triangle:
- The angle between a→\large\vec{a} and b→\large\vec{b} is θ.
- cos(θ)=PR|a→|\large \cos(\theta) = \frac{PR}{|\vec{a}|} (the most basic definition of cosine).
- Multiply both sides by |a→|\large|\vec{a}| :
PR=|a→|cos(θ)\LARGE \begin{array}{|c|} \hline PR = |\vec{a}| \cos(\theta) \\ \hline \end{array}
The segment 𝑷𝑹\boldsymbol{PR} is the shadow length – the scalar projection of 𝒂→\large \boldsymbol{\vec{a}} on 𝒃→\large \boldsymbol{\vec{b}}.
When θ > 90°, cos θ is negative, so the scalar projection becomes negative too. Think of the shadow as flipping to the opposite side.
How is the unit vector related?
The shadow’s length (PR) doesn’t depend on how long b→\large\vec{b} is. It depends only on |a→|\large|\vec{a}| and on θ.
When you compute a→⋅b^\large\vec{a} \cdot \hat{b}, you are asking: how much of a→\large\vec{a} lies along the direction of b→\large\vec{b}? This is the shadow length.
The unit vector acts like a direction filter: taking the dot product of a→\large\vec{a} with it extracts the component of a→\large\vec{a} along that direction.
Let’s see it using the rotation trick. We place b̂ along the x-axis:
a→=(|a→|cosθ, |a→|sin(θ))\Large \vec{a} = (|\vec{a}|\cos\theta,\ |\vec{a}|\sin(\theta))
and:
b^=(1,0)\Large \hat{b} = (1, 0)
Then:
a→⋅b^=|a→|cosθ⋅1+|a→|sin(θ)⋅0=|a→|cosθ\Large \begin{aligned} \vec{a} \cdot \hat{b} = |\vec{a}|\cos\theta \cdot 1 \\ + |\vec{a}|\sin(\theta) \cdot 0 = |\vec{a}|\cos\theta \end{aligned}
The scalar projection of 𝒂→\large \boldsymbol{\vec{a}} in the direction of 𝒃→\large \boldsymbol{\vec{b}} is:
|a→|cosθ=a→⋅b^=a→⋅b→|b→|\LARGE \renewcommand{\arraystretch}{2} \begin{array}{|c|} \hline \begin{aligned} |\vec{a}|\cos\theta &= \vec{a} \cdot \hat{b} \\ &= \frac{\vec{a} \cdot \vec{b}}{|\vec{b}|} \end{aligned} \\ \hline \end{array}
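As a quick check of the boxed formula, the following sketch computes the scalar projection as (a · b) / |b|. The function name `scalar_projection` is an assumption made for this example.

```python
import math

def dot(a, b):
    """Sum of component-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def scalar_projection(a, b):
    """Signed shadow length of a along the direction of b: (a . b) / |b|."""
    b_len = math.sqrt(dot(b, b))
    if b_len == 0:
        raise ValueError("b must be a non-zero vector")
    return dot(a, b) / b_len

a = [3.0, 4.0]
print(scalar_projection(a, [1.0, 0.0]))    # 3.0
print(scalar_projection(a, [10.0, 0.0]))   # 3.0, the length of b does not matter
print(scalar_projection(a, [-2.0, 0.0]))   # -3.0, b points the other way
```

Dividing by |b| is what removes the influence of the length of b, leaving only its direction.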
We apply the same rotation trick one more time, now with two general vectors: a→\large\vec{a} and b→\large\vec{b}.
After rotation:
a→=(|a→|cosθ, |a→|sinθ)\Large \vec{a} = (|\vec{a}|\cos\theta,\ |\vec{a}|\sin\theta) ,
b→=(|b→|, 0)\Large \vec{b} = (|\vec{b}|,\ 0)
so:
a→⋅b→=|a→|cosθ⋅|b→|+|a→|sinθ⋅0=|a→||b→|cosθ\Large \begin{aligned} \vec{a} \cdot \vec{b} = |\vec{a}|\cos\theta \cdot |\vec{b}| \\ + |\vec{a}|\sin\theta \cdot 0 = |\vec{a}||\vec{b}|\cos\theta \end{aligned}
The dot product of 𝒂→\large \boldsymbol{\vec{a}} and 𝒃→\large \boldsymbol{\vec{b}} is:
a→⋅b→=a1b1+⋯+anbn=∑i=1naibi=|a→||b→|cosθ\Large \renewcommand{\arraystretch}{2} \begin{array}{|l|} \hline \vec{a} \cdot \vec{b} = a_1 b_1+ \dots + a_n b_n \\ = \sum_{i=1}^{n} a_i b_i = |\vec{a}||\vec{b}|\cos\theta \\ \hline \end{array}
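Because the two forms of the dot product are equal, the angle can be recovered from coordinates alone. The sketch below does this for 3D vectors; `norm` and `angle_between` are illustrative helper names, and the clamp is a precaution against floating-point rounding that the derivation itself does not need.

```python
import math

def dot(a, b):
    """Sum of component-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    """Magnitude of v, computed as the square root of v . v."""
    return math.sqrt(dot(v, v))

def angle_between(a, b):
    """Angle in degrees (0 to 180) between two non-zero vectors."""
    denom = norm(a) * norm(b)
    if denom == 0:
        raise ValueError("both vectors must be non-zero")
    cos_theta = dot(a, b) / denom
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    return math.degrees(math.acos(cos_theta))

a = [1.0, 0.0, 0.0]
b = [1.0, 1.0, 0.0]
theta = angle_between(a, b)   # approximately 45 degrees

# Component form and geometric form agree up to rounding.
print(dot(a, b), norm(a) * norm(b) * math.cos(math.radians(theta)))
```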
Vector projection extracts the portion of vector 𝒂→\large \boldsymbol{\vec{a}} that points along the direction of vector 𝒃→\large \boldsymbol{\vec{b}}.
The Trail Analogy
Imagine two trails starting from the same point (the origin):
- Trail A leads to a whale-watching spot.
- Trail B leads along the coast in a different direction.
Here’s the question projection answers:
You’re only allowed to walk along Trail B. How far should you walk so that you end up as close as possible to the endpoint of Trail A?
You walk along B, and at some point, you stop. From where you stopped, you look toward the end of Trail A, and the line connecting you to it forms a perfect 90° angle with Trail B. That’s the key geometric fact – the closest point is always where you’d make a right-angle turn.
The spot where you stop on Trail B is the projection of A onto B. It represents “the part of A that goes in B’s direction.”
The remaining gap, from your stopping point to the actual end of Trail A, is everything about A that has nothing to do with B’s direction. Figure 5 below illustrates this example: the vector that starts at the origin, points along Trail B, and ends at the closest point is the vector projection of a→\large\vec{a} onto b→\large\vec{b}.
Figure 5: Vector projection as the closest point to a direction.
Walking along trail B, the closest point to the endpoint of A occurs where the connecting segment forms a right angle with B. This point is the projection of A onto B. Image by Author (created using Claude).
Scalar projection answers: “How far did you walk?”
That’s just a distance, a single number.
Vector projection answers: “Where exactly are you?”
More precisely: “What is the actual movement along Trail B that gets you to that closest point?”
Now “1.5 kilometers” isn’t enough; you need to say “1.5 kilometers east along the coast.” That’s a distance plus a direction: an arrow, not just a number. The arrow starts at the origin, points along Trail B, and ends at the closest point.
The distance you walked is the scalar projection value. The magnitude of the vector projection equals the absolute value of the scalar projection.
The unit vector answers: “Which direction does Trail B go?”
It is exactly what b^\large\hat{b} represents. It’s Trail B stripped of any length information - just the pure direction of the coast.
vector projection=(how far you walk)⏟scalar projection×(B direction)⏟b^\begin{aligned} &\text{vector projection} = \\ &\underbrace{(\text{how far you walk})}_{\text{scalar projection}} \times \underbrace{(\text{B direction})}_{\hat{b}} \end{aligned}
I know the whale analogy is very specific; it was inspired by this good explanation (Michael.P, 2014).
Figure 6 below shows the same shadow diagram as in Figure 4, with PR drawn as an arrow, because the vector projection is a vector (with both length and direction), not just a number.
Figure 6: Vector projection as a directional shadow.
Unlike scalar projection (a length), the vector projection is an arrow along vector b. Image by Author (created using Claude).
Since the projection must lie along b→\large\vec{b} , we need two things for PR→\large\vec{PR} :
- Its signed length is the scalar projection: |a→|cosθ\large|\vec{a}|\cos\theta (negative when θ > 90°, in which case the arrow points opposite to b→\large\vec{b})
- Its direction is: b^\large\hat{b} (the direction of b→\large\vec{b})
Any vector equals its magnitude times its direction (as we saw in the Unit Vector section), so:
PR→=|a→|cosθ⏟scalar projection⋅b^⏟direction of b→\large \begin{array}{|c|} \hline \hspace{10pt} \vec{PR} = \underbrace{|\vec{a}| \cos \theta}_{\text{scalar projection}} \cdot \underbrace{\hat{b}}_{\text{direction of } \vec{b}} \hspace{20pt} \\ \hline \end{array}
This is already the vector projection formula. We can rewrite it by substituting b^=b→|b→|\large\hat{b} = \frac{\vec{b}}{|\vec{b}|}, multiplying the numerator and denominator by |b→|\large|\vec{b}|, and recognizing that |a→||b→|cosθ=a→⋅b→\large|\vec{a}||\vec{b}|\cos\theta = \vec{a} \cdot \vec{b}.
The vector projection of 𝒂→\large \boldsymbol{\vec{a}} in the direction of 𝒃→\large \boldsymbol{\vec{b}} is:
projb→(a→)=(|a→|cosθ)b^=(a→⋅b→|b→|2)b→=(a→⋅b^)b^\Large \renewcommand{\arraystretch}{1.5} \begin{array}{|c|} \hline \begin{aligned} \text{proj}_{\vec{b}}(\vec{a}) &= (|\vec{a}|\cos\theta)\hat{b} \\ &= \left(\frac{\vec{a} \cdot \vec{b}}{|\vec{b}|^2}\right)\vec{b} \\ &= (\vec{a} \cdot \hat{b})\hat{b} \end{aligned} \\ \hline \end{array}
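The middle form of the formula is convenient in code because it avoids computing a square root. The sketch below uses it and then checks the key fact from the trail analogy: the remaining gap is perpendicular to b. The name `vector_projection` is an illustrative assumption.

```python
import math

def dot(a, b):
    """Sum of component-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def vector_projection(a, b):
    """Projection of a onto the direction of b: ((a . b) / |b|^2) * b."""
    bb = dot(b, b)   # |b|^2, no square root needed
    if bb == 0:
        raise ValueError("b must be a non-zero vector")
    scale = dot(a, b) / bb
    return [scale * x for x in b]

a = [3.0, 4.0]
b = [2.0, 0.0]
p = vector_projection(a, b)             # [3.0, 0.0]
gap = [x - y for x, y in zip(a, p)]     # [0.0, 4.0], the part of a unrelated to b

# The gap is perpendicular to b, so its dot product with b is 0.0.
print(p, gap, dot(gap, b))
```

The length of `p` equals the absolute value of the scalar projection of a onto b, which is 3 in this example.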
- A unit vector isolates a vector’s direction by stripping away its magnitude.
𝐯^=𝐯→|𝐯→|\LARGE \begin{array}{|c|} \hline \mathbf{\hat{v}} = \frac{\mathbf{\vec{v}}}{|\mathbf{\vec{v}}|} \\ \hline \end{array}
- The dot product multiplies corresponding components and sums them. It is also equal to the product of the magnitudes of the two vectors multiplied by the cosine of the angle between them.
a→⋅b→=a1b1+⋯+anbn=∑i=1naibi=|a→||b→|cosθ\ \renewcommand{\arraystretch}{2} \begin{array}{|l|} \hline \vec{a} \cdot \vec{b} = a_1 b_1+ \dots + a_n b_n \\ = \sum_{i=1}^{n} a_i b_i = |\vec{a}||\vec{b}|\cos\theta \\ \hline \end{array}
- Scalar projection uses the dot product to measure how far one vector reaches along another’s direction: a single number, like the length of a shadow.
|a→|cosθ=a→⋅b^=a→⋅b→|b→|\Large \begin{array}{|c|} \hline |\vec{a}|\cos\theta = \vec{a} \cdot \hat{b} = \frac{\vec{a} \cdot \vec{b}}{|\vec{b}|} \\ \hline \end{array}
- Vector projection goes one step further, returning an actual arrow along that direction: the scalar projection times the unit vector.
(|a→|cosθ)b^=(a→⋅b^)b^\Large \renewcommand{\arraystretch}{2} \begin{array}{|l|} \hline (|\vec{a}|\cos\theta)\hat{b} = (\vec{a} \cdot \hat{b})\hat{b} \\ \hline \end{array}
In the next part, we will use the tools we learned in this article to truly understand the dot product.

