
Commit d6502ad

docs: Improve clarity of explanations and update code example outputs for accuracy across various math and tensor topics.
1 parent 7605036 commit d6502ad

8 files changed, 56 insertions(+), 70 deletions(-)

public/content/learn/activation-functions/softmax/softmax-content.md
Lines changed: 6 additions & 6 deletions

@@ -166,7 +166,7 @@ def softmax(x):
 logits = torch.tensor([2.0, 1.0, 0.5])
 output = softmax(logits)
 print(output)
-# tensor([0.6364, 0.2341, 0.1295])
+# tensor([0.6285, 0.2312, 0.1402])
 print(output.sum())
 # tensor(1.0000) ← Sums to 1!
 ```

@@ -230,17 +230,17 @@ logits = torch.tensor([2.0, 1.0, 0.5])
 # Normal softmax (temperature = 1)
 probs_normal = torch.softmax(logits, dim=0)
 print(probs_normal)
-# tensor([0.6364, 0.2341, 0.1295])
+# tensor([0.6285, 0.2312, 0.1402])

 # Low temperature (sharper, more confident)
 probs_sharp = torch.softmax(logits / 0.5, dim=0)
 print(probs_sharp)
-# tensor([0.8360, 0.1131, 0.0508])
+# tensor([0.8360, 0.1131, 0.0509])

 # High temperature (softer, less confident)
 probs_soft = torch.softmax(logits / 2.0, dim=0)
 print(probs_soft)
-# tensor([0.4750, 0.3107, 0.2143])
+# tensor([0.4705, 0.3060, 0.2235])
 ```

 **Effect of temperature:**

@@ -316,8 +316,8 @@ logits = torch.tensor([[2.0, 1.0, 0.5],
 # Softmax across classes (dim=1)
 probs = torch.softmax(logits, dim=1)
 print(probs)
-# tensor([[0.6364, 0.2341, 0.1295],
-#         [0.1899, 0.6841, 0.1260]])
+# tensor([[0.6285, 0.2312, 0.1402],
+#         [0.1583, 0.5806, 0.2611]])

 print(probs.sum(dim=1)) # tensor([1., 1.])
 # Each row sums to 1!
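The updated tensor values in the first hunk can be sanity-checked without PyTorch. A minimal sketch of the same computation using only the standard library (the function name `softmax` mirrors the tutorial's; the math is the standard definition):

```python
import math

def softmax(xs):
    # subtract the max for numerical stability, then normalize the exponentials
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.5])
print([round(p, 4) for p in probs])  # → [0.6285, 0.2312, 0.1402]
```

This matches the corrected `+` lines, confirming the new values rather than the old ones.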

public/content/learn/attention-mechanism/what-is-attention/what-is-attention-content.md
Lines changed: 5 additions & 5 deletions

@@ -239,7 +239,7 @@ Higher score = more relevant!
 ```python
 weights = F.softmax(scores, dim=0)
 print("Weights:", weights)
-# tensor([0.5308, 0.0874, 0.3818])
+# tensor([0.5118, 0.0693, 0.4190])
 ```

 Softmax normalizes scores into probabilities:

@@ -256,14 +256,14 @@ for i, weight in enumerate(weights):
     output += weight * values[i]

 print("Output:", output)
-# tensor([28.1820, 38.1820])
+# tensor([28.1447, 38.1447])
 ```

 **Manual calculation:**
 ```
-output = 0.5308×[10, 20] + 0.0874×[30, 40] + 0.3818×[50, 60]
-       = [5.31, 10.62] + [2.62, 3.50] + [19.09, 22.91]
-       = [27.02, 37.03]
+output = 0.5118×[10, 20] + 0.0693×[30, 40] + 0.4190×[50, 60]
+       = [5.118, 10.236] + [2.079, 2.772] + [20.950, 25.140]
+       = [28.147, 38.148]
 ```

 The output is **mostly from position 0** (weight 53%) because it matched the query best!
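The corrected manual calculation can be replayed in plain Python from the rounded weights printed in the hunk (so the result matches the hand-worked `[28.147, 38.148]` rather than the exact tensor output):

```python
weights = [0.5118, 0.0693, 0.4190]
values = [[10, 20], [30, 40], [50, 60]]

# weighted sum of the value vectors
output = [0.0, 0.0]
for w, v in zip(weights, values):
    output[0] += w * v[0]
    output[1] += w * v[1]

print([round(x, 3) for x in output])  # → [28.147, 38.148]
```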

public/content/learn/math/derivatives/derivatives-content.md
Lines changed: 2 additions & 2 deletions

@@ -61,7 +61,7 @@ The derivative takes the average rate of change (slope) between two points, then

 Here we have linearly growing function.

-Derivative is always 3 for any `x` value, which means that in the original function, growth of `y` at any point is 3x (if you increase `x` by 1, `y` will increase by 3, check it).
+Derivative is always 3 for any `x` value, which means that in the original function, the rate of growth of `y` is 3 (if you increase `x` by 1, `y` will increase by 3, check it).

 ![Linear Function Derivative](/content/learn/math/derivatives/linear-function-derivative.png)

@@ -71,7 +71,7 @@ Derivative shows this accelerating growth, you can notice that derivative is inc

 ![Quadratic Function Derivative](/content/learn/math/derivatives/quadratic-function-derivative.png)

-In previous example derivative was always 3, which meant that function is always consistantly growing by 3 times `x`.
+In previous example derivative was always 3, which meant that function is always consistantly growing by 3.

 Here, on the other hand, the growth is growing.
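Both corrected claims in this file (the linear function's derivative is the constant 3; the quadratic's derivative keeps growing) can be checked numerically. A small finite-difference sketch, assuming nothing beyond the standard definition of the derivative:

```python
def derivative(f, x, h=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

linear = lambda x: 3 * x
quadratic = lambda x: x ** 2

# Linear: the derivative is 3 at every x
print([round(derivative(linear, x), 3) for x in (-2.0, 0.0, 5.0)])

# Quadratic: the derivative 2x grows as x grows
print([round(derivative(quadratic, x), 3) for x in (1.0, 2.0, 3.0)])
```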

public/content/learn/math/gradients/gradients-content.md
Lines changed: 6 additions & 6 deletions

@@ -15,17 +15,17 @@ Welcome! This guide will walk you through the concept of gradients. We'll start

 ## Step 1: From Line Slope (Derivative) To Surface Slope (Gradient)

-Let's start with what you know. For a simple function like `f(x) = x²`, the derivative `f'(x) = 2x` gives you the slope of the curve at any point `x`. So for `x=3`, derivative is `2*3=6`. That means as you increase `x` but a tiny bit, `f(x) = x²` will increase by 6.
+Let's start with what you know. For a simple function like `f(x) = x²`, the derivative `f'(x) = 2x` gives you the slope of the curve at any point `x`. So for `x=3`, derivative is `2 * 3 = 6`. That means as you increase `x` by a tiny bit, `f(x) = x²` will increase by 6.

-At `x=4`, derivative is `2*4=8`, so at that point `f(x) = x²` is increasing by 8x.
+At `x=4`, derivative is `2 * 4 = 8`, so at that point `f(x) = x²` is increasing by 8.

 Notice that I say "if you increase x by a bit, `f(x) = x²` will increase by 6" and I don't say "if you increase x by 1", because increasing x by 1 (from 3 to 4 in this case) is a lot and by that point derivative (rate of change) will go from 6 to 8.

-On this image you can see that the red slope at `x=3` is smaller than thes green slope at `x=4`.
+On this image you can see that the red slope at `x=3` is smaller than the green slope at `x=4`.

 ![Derivatives with Tangent Lines](/content/learn/math/gradients/derivatives-tangent-lines.png)

@@ -45,7 +45,7 @@ There isn't just one. There's a slope if you take a step in the x-direction, a d

 To handle this, we use **partial derivatives**.

-- **Partial Derivative with respect to x (∂f/∂x):** This is the slope if you only move in the x-direction. You treat y as a constant. For `f(x, y) = x² + y²`, the partial derivative `∂f/∂x = 2x` - remember the rule for a constant that stands alone, constants become 0 in the derivative, and since we treat y as a constant, `+ y²` will ecome `+ 0`.
+- **Partial Derivative with respect to x (∂f/∂x):** This is the slope if you only move in the x-direction. You treat y as a constant. For `f(x, y) = x² + y²`, the partial derivative `∂f/∂x = 2x` - remember the rule for a constant that stands alone, constants become 0 in the derivative, and since we treat y as a constant, `+ y²` will become `+ 0`.

 - **Partial Derivative with respect to y (∂f/∂y):** This is the slope if you only move in the y-direction. You treat x as a constant. For `f(x, y) = x² + y²`, the partial derivative `∂f/∂y = 2y`.

@@ -96,15 +96,15 @@ Let's go back to our bowl function, `f(x, y) = x² + y²`, and its gradient, `
 Let's calculate the gradient at a specific point, say `(3, 1)`.

 ```
-∇f(3, 1) = [ 2 3, 2 1 ] = [6, 2]
+∇f(3, 1) = [ 2 * 3, 2 * 1 ] = [6, 2]
 ```

 This vector `[6, 2]` is an arrow that points "6 units in the x-direction and 2 units in the y-direction." This is an arrow pointing up and to the right, away from the minimum at `(0, 0)`. This makes perfect sense! From the point `(3, 1)`, the steepest way up the bowl is away from the bottom.

 What about the point `(-2, -2)`?

 ```
-∇f(-2, -2) = [ 2 -2, 2 -2 ] = [-4, -4]
+∇f(-2, -2) = [ 2 * -2, 2 * -2 ] = [-4, -4]
 ```

 This vector points down and to the left, again, away from the bottom of the bowl at `(0, 0)`.
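The corrected gradient values `[6, 2]` and `[-4, -4]` can be verified numerically; this sketch approximates the bowl function's partial derivatives with central differences:

```python
def grad(f, x, y, h=1e-6):
    # numerical partial derivatives: nudge one variable, hold the other fixed
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return [dfdx, dfdy]

bowl = lambda x, y: x**2 + y**2

print([round(g, 3) for g in grad(bowl, 3, 1)])    # → [6.0, 2.0]
print([round(g, 3) for g in grad(bowl, -2, -2)])  # → [-4.0, -4.0]
```

Both arrows point away from the minimum at `(0, 0)`, as the text argues.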

public/content/learn/math/matrices/matrices-content.md
Lines changed: 1 addition & 1 deletion

@@ -59,7 +59,7 @@ Multiply each element by the scalar. For `(2A)`:

 ### 3.3 Matrix multiplication

-You do a dot product of a row of th first matrix with the column of the second matrix and write result at the position where that row and column intercept.
+You do a dot product of a row of the first matrix with the column of the second matrix and write the result at the position where that row and column intersect.

 If `(A)` is `(m x p)` and `(B)` is `(p x n)`, then `(AB)` is `(m x n)`. Multiply rows of `(A)` by columns of `(B)` and sum.
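The corrected row-times-column rule reads concisely in code. A minimal plain-Python sketch with a 2×2 example (the values are illustrative):

```python
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

# entry (i, j) of A @ B is the dot product of row i of A with column j of B
result = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
print(result)  # → [[19, 22], [43, 50]]
```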

public/content/learn/math/vectors/vectors-content.md
Lines changed: 3 additions & 3 deletions

@@ -106,7 +106,7 @@ Multiplying a vector by a regular number (a **scalar**) changes its magnitude bu

 If `k` is a scalar and `v = [x, y]`, then:
 ```
-k v = [kx, k*y]
+k * v = [k * x, k * y]
 ```

@@ -142,14 +142,14 @@ The **dot product** is a way of multiplying two vectors that results in a single
 If `u = [x₁, y₁]` and `v = [x₂, y₂]`, the dot product `u · v` is:

 ```
-u · v = (x₁ x₂) + (y₁ y₂)
+u · v = (x₁ * x₂) + (y₁ * y₂)
 ```

 ### Geometric Meaning & Finding Angles
 The dot product also has a powerful geometric definition:

 ```
-u · v = ||u|| ||v|| cos(θ)
+u · v = ||u|| * ||v|| * cos(θ)
 ```

 where `θ` (theta) is the angle between the two vectors. We can rearrange this formula to find the angle between any two vectors!
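The two corrected formulas combine exactly as the file suggests: compute `u · v` component-wise, then rearrange the geometric form to recover the angle. A small sketch with illustrative vectors:

```python
import math

u = [3.0, 0.0]
v = [3.0, 3.0]

# algebraic form: u · v = (x1 * x2) + (y1 * y2)
dot = u[0] * v[0] + u[1] * v[1]

norm = lambda w: math.sqrt(w[0] ** 2 + w[1] ** 2)

# geometric form rearranged: theta = arccos(u · v / (||u|| * ||v||))
theta = math.acos(dot / (norm(u) * norm(v)))
print(round(math.degrees(theta), 1))  # → 45.0
```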

public/content/learn/tensors/matrix-multiplication/matrix-multiplication-content.md
Lines changed: 2 additions & 2 deletions

@@ -268,8 +268,8 @@ weights = torch.tensor([[0.1, 0.2, 0.3, 0.4],
 outputs = inputs @ weights # Shape: (2, 4)

 print(outputs)
-# tensor([[3.2000, 3.8000, 4.4000, 5.0000],
-#         [7.7000, 9.2000, 10.7000, 12.2000]])
+# tensor([[ 3.8000, 4.4000, 5.0000, 5.6000],
+#         [ 8.3000, 9.8000, 11.3000, 12.8000]])
 ```

 **What happened:**
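The hunk truncates the original `inputs` and `weights` tensors, but assuming they continue the obvious pattern (`inputs = [[1, 2, 3], [4, 5, 6]]` and `weights` filled with 0.1 through 1.2 — an assumption, not shown in the diff), the corrected output reproduces exactly in plain Python:

```python
# Assumed reconstruction of the truncated tensors from the diff
inputs = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]            # shape (2, 3)
weights = [[0.1, 0.2, 0.3, 0.4],
           [0.5, 0.6, 0.7, 0.8],
           [0.9, 1.0, 1.1, 1.2]]      # shape (3, 4)

# plain-Python matrix multiply: (2, 3) @ (3, 4) -> (2, 4)
outputs = [[sum(inputs[i][k] * weights[k][j] for k in range(3)) for j in range(4)]
           for i in range(2)]
print([[round(x, 4) for x in row] for row in outputs])
# → [[3.8, 4.4, 5.0, 5.6], [8.3, 9.8, 11.3, 12.8]]
```

That this matches the new `+` lines (and not the old ones) supports the correction.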

public/content/learn/tensors/transposing-tensors/transposing-tensors-content.md
Lines changed: 31 additions & 45 deletions

@@ -15,47 +15,51 @@ Transposing is like **flipping** a tensor - rows become columns, and columns bec

 Think of it like rotating a table 90 degrees. The first row becomes the first column, the second row becomes the second column, and so on.

-## Vector Transpose
-
-When you transpose a vector, you change it from horizontal to vertical (or vice versa):
-
-![Vector Transpose](/content/learn/tensors/transposing-tensors/vector-transpose.png)
+When you transpose a 1D tensor (a vector), it actually stays exactly the same in PyTorch! This is a common point of confusion.

 **Example:**

 ```python
 import torch

-# Horizontal vector (row)
+# 1D vector
 v = torch.tensor([1, 2, 3, 4])
 print(v.shape) # torch.Size([4])

-# Transpose to vertical (column)
+# Transpose
 v_t = v.T
-print(v_t)
-# tensor([[1],
-#         [2],
-#         [3],
-#         [4]])
-print(v_t.shape) # torch.Size([4, 1])
+print(v_t.shape) # torch.Size([4]) - Still the same!
+print(torch.equal(v, v_t)) # True
+```
+
+To actually turn a 1D vector into a column vector (2D), you need to reshape it:
+
+```python
+# Change to column vector (4 rows, 1 column)
+v_col = v.reshape(4, 1)
+print(v_col.shape) # torch.Size([4, 1])
+
+# Now transposing works as expected
+v_row = v_col.T
+print(v_row.shape) # torch.Size([1, 4])
 ```

 **Manual visualization:**

 ```yaml
-Before: [1, 2, 3, 4] → Shape: (4,)
+1D Vector: [1, 2, 3, 4] → Shape: (4,)
+
+Column Vector (2D): [[1],
+                     [2],
+                     [3],
+                     [4]] → Shape: (4, 1)

-After: [[1],
-       [2],
-       [3],
-       [4]] → Shape: (4, 1)
+Row Vector (2D): [[1, 2, 3, 4]] → Shape: (1, 4)
 ```

 ## Matrix Transpose

-This is where transpose really shines! Rows become columns, columns become rows:
-
-![Matrix Transpose](/content/learn/tensors/transposing-tensors/matrix-transpose.png)
+This is where transpose really shines! Rows become columns, and columns become rows:

 **Example:**

@@ -96,8 +100,6 @@ Transpose (3×2):

 Here's exactly what happens to each element during transpose:

-![Transpose Detailed](/content/learn/tensors/transposing-tensors/transpose-detailed.png)
-
 **The pattern:** Position `[i, j]` → Position `[j, i]`

 **Example tracking specific elements:**

@@ -119,8 +121,6 @@ Original position → Transposed position

 Square matrices (same number of rows and columns) have a special property:

-![Square Transpose](/content/learn/tensors/transposing-tensors/square-transpose.png)
-
 **Example:**

 ```python

@@ -184,36 +184,22 @@ transposed_shape = (4, 4) # Still square!

 The most common reason: **making shapes compatible for matrix multiplication!**

-![Why Transpose](/content/learn/tensors/transposing-tensors/why-transpose.png)
-
 **Example:**

 ```python
 import torch

 A = torch.randn(2, 3) # Shape: (2, 3)
-B = torch.randn(2, 4) # Shape: (2, 4)
+B = torch.randn(4, 3) # Shape: (4, 3)

-# This WON'T work - shapes incompatible
-# result = A @ B # Error! 3 ≠ 2
+# This WON'T work - shapes (2,3) and (4,3) are incompatible
+# result = A @ B # Error! 3 != 4

 # Transpose B to make it work!
-B_T = B.T # Shape: (4, 2)
-
-# Now this works!
-result = A @ B_T # (2, 3) @ (4, 2)? Wait, still wrong!
-
-# Actually, we need different dimensions
-# Let's try a real example:
-A = torch.randn(2, 3)
-B = torch.randn(4, 3) # Same inner dimension as A's columns
-
-# Without transpose - doesn't work
-# result = A @ B # Error! (2,3) @ (4,3) - 3 ≠ 4
-
-# With transpose - works!
-result = A @ B.T # (2,3) @ (3,4) = (2,4) ✓
+B_T = B.T # Shape: (3, 4)

+# Now it works!
+result = A @ B_T # (2, 3) @ (3, 4) -> (2, 4)
 print(result.shape) # torch.Size([2, 4])
 ```
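The `[i, j] → [j, i]` pattern this file teaches can be sketched without PyTorch, using `zip` to read off the columns of a nested list:

```python
M = [[1, 2, 3],
     [4, 5, 6]]          # shape (2, 3)

# zip(*M) groups the i-th elements of every row, i.e. the columns of M,
# so the element at [i][j] ends up at [j][i]
M_T = [list(col) for col in zip(*M)]
print(M_T)  # → [[1, 4], [2, 5], [3, 6]]
```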
