
Commit d6502ad

docs: Improve clarity of explanations and update code example outputs for accuracy across various math and tensor topics.
1 parent 7605036 commit d6502ad

8 files changed, 56 insertions(+), 70 deletions(-)

public/content/learn/activation-functions/softmax/softmax-content.md
Lines changed: 6 additions & 6 deletions

@@ -166,7 +166,7 @@ def softmax(x):
 logits = torch.tensor([2.0, 1.0, 0.5])
 output = softmax(logits)
 print(output)
-# tensor([0.6364, 0.2341, 0.1295])
+# tensor([0.6285, 0.2312, 0.1402])
 print(output.sum())
 # tensor(1.0000) ← Sums to 1!
 ```

@@ -230,17 +230,17 @@ logits = torch.tensor([2.0, 1.0, 0.5])
 # Normal softmax (temperature = 1)
 probs_normal = torch.softmax(logits, dim=0)
 print(probs_normal)
-# tensor([0.6364, 0.2341, 0.1295])
+# tensor([0.6285, 0.2312, 0.1402])

 # Low temperature (sharper, more confident)
 probs_sharp = torch.softmax(logits / 0.5, dim=0)
 print(probs_sharp)
-# tensor([0.8360, 0.1131, 0.0508])
+# tensor([0.8360, 0.1131, 0.0509])

 # High temperature (softer, less confident)
 probs_soft = torch.softmax(logits / 2.0, dim=0)
 print(probs_soft)
-# tensor([0.4750, 0.3107, 0.2143])
+# tensor([0.4705, 0.3060, 0.2235])
 ```

 **Effect of temperature:**

@@ -316,8 +316,8 @@ logits = torch.tensor([[2.0, 1.0, 0.5],
 # Softmax across classes (dim=1)
 probs = torch.softmax(logits, dim=1)
 print(probs)
-# tensor([[0.6364, 0.2341, 0.1295],
-#         [0.1899, 0.6841, 0.1260]])
+# tensor([[0.6285, 0.2312, 0.1402],
+#         [0.1583, 0.5806, 0.2611]])

 print(probs.sum(dim=1)) # tensor([1., 1.])
 # Each row sums to 1!
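The updated tensor values in the first hunk can be sanity-checked without PyTorch. A minimal sketch of the same computation using only the standard library (the function name `softmax` mirrors the tutorial's; the math is the standard definition):

```python
import math

def softmax(xs):
    # subtract the max for numerical stability, then normalize the exponentials
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.5])
print([round(p, 4) for p in probs])  # → [0.6285, 0.2312, 0.1402]
```

This matches the corrected `+` lines, confirming the new values rather than the old ones.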

public/content/learn/attention-mechanism/what-is-attention/what-is-attention-content.md
Lines changed: 5 additions & 5 deletions

@@ -239,7 +239,7 @@ Higher score = more relevant!
 ```python
 weights = F.softmax(scores, dim=0)
 print("Weights:", weights)
-# tensor([0.5308, 0.0874, 0.3818])
+# tensor([0.5118, 0.0693, 0.4190])
 ```

 Softmax normalizes scores into probabilities:

@@ -256,14 +256,14 @@ for i, weight in enumerate(weights):
     output += weight * values[i]

 print("Output:", output)
-# tensor([28.1820, 38.1820])
+# tensor([28.1447, 38.1447])
 ```

 **Manual calculation:**
 ```
-output = 0.5308×[10, 20] + 0.0874×[30, 40] + 0.3818×[50, 60]
-       = [5.31, 10.62] + [2.62, 3.50] + [19.09, 22.91]
-       = [27.02, 37.03]
+output = 0.5118×[10, 20] + 0.0693×[30, 40] + 0.4190×[50, 60]
+       = [5.118, 10.236] + [2.079, 2.772] + [20.950, 25.140]
+       = [28.147, 38.148]
 ```

 The output is **mostly from position 0** (weight 53%) because it matched the query best!
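The corrected manual calculation can be replayed in plain Python from the rounded weights printed in the hunk (so the result matches the hand-worked `[28.147, 38.148]` rather than the exact tensor output):

```python
weights = [0.5118, 0.0693, 0.4190]
values = [[10, 20], [30, 40], [50, 60]]

# weighted sum of the value vectors
output = [0.0, 0.0]
for w, v in zip(weights, values):
    output[0] += w * v[0]
    output[1] += w * v[1]

print([round(x, 3) for x in output])  # → [28.147, 38.148]
```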

public/content/learn/math/derivatives/derivatives-content.md
Lines changed: 2 additions & 2 deletions

@@ -61,7 +61,7 @@ The derivative takes the average rate of change (slope) between two points, then

 Here we have linearly growing function.

-Derivative is always 3 for any `x` value, which means that in the original function, growth of `y` at any point is 3x (if you increase `x` by 1, `y` will increase by 3, check it).
+Derivative is always 3 for any `x` value, which means that in the original function, the rate of growth of `y` is 3 (if you increase `x` by 1, `y` will increase by 3, check it).

 ![Linear Function Derivative](/content/learn/math/derivatives/linear-function-derivative.png)

@@ -71,7 +71,7 @@ Derivative shows this accelerating growth, you can notice that derivative is inc

 ![Quadratic Function Derivative](/content/learn/math/derivatives/quadratic-function-derivative.png)

-In previous example derivative was always 3, which meant that function is always consistantly growing by 3 times `x`.
+In previous example derivative was always 3, which meant that function is always consistantly growing by 3.

 Here, on the other hand, the growth is growing.
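Both corrected claims in this file (the linear function's derivative is the constant 3; the quadratic's derivative keeps growing) can be checked numerically. A small finite-difference sketch, assuming nothing beyond the standard definition of the derivative:

```python
def derivative(f, x, h=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

linear = lambda x: 3 * x
quadratic = lambda x: x ** 2

# Linear: the derivative is 3 at every x
print([round(derivative(linear, x), 3) for x in (-2.0, 0.0, 5.0)])

# Quadratic: the derivative 2x grows as x grows
print([round(derivative(quadratic, x), 3) for x in (1.0, 2.0, 3.0)])
```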

public/content/learn/math/gradients/gradients-content.md
Lines changed: 6 additions & 6 deletions

@@ -15,17 +15,17 @@ Welcome! This guide will walk you through the concept of gradients. We'll start

 ## Step 1: From Line Slope (Derivative) To Surface Slope (Gradient)

-Let's start with what you know. For a simple function like `f(x) = x²`, the derivative `f'(x) = 2x` gives you the slope of the curve at any point `x`. So for `x=3`, derivative is `2*3=6`. That means as you increase `x` but a tiny bit, `f(x) = x²` will increase by 6.
+Let's start with what you know. For a simple function like `f(x) = x²`, the derivative `f'(x) = 2x` gives you the slope of the curve at any point `x`. So for `x=3`, derivative is `2 * 3 = 6`. That means as you increase `x` by a tiny bit, `f(x) = x²` will increase by 6.

-At `x=4`, derivative is `2*4=8`, so at that point `f(x) = x²` is increasing by 8x.
+At `x=4`, derivative is `2 * 4 = 8`, so at that point `f(x) = x²` is increasing by 8.

 Notice that I say "if you increase x by a bit, `f(x) = x²` will increase by 6" and I don't say "if you increase x by 1", because increasing x by 1 (from 3 to 4 in this case) is a lot and by that point derivative (rate of change) will go from 6 to 8.

-On this image you can see that the red slope at `x=3` is smaller than thes green slope at `x=4`.
+On this image you can see that the red slope at `x=3` is smaller than the green slope at `x=4`.

 ![Derivatives with Tangent Lines](/content/learn/math/gradients/derivatives-tangent-lines.png)

@@ -45,7 +45,7 @@ There isn't just one. There's a slope if you take a step in the x-direction, a d

 To handle this, we use **partial derivatives**.

-- **Partial Derivative with respect to x (∂f/∂x):** This is the slope if you only move in the x-direction. You treat y as a constant. For `f(x, y) = x² + y²`, the partial derivative `∂f/∂x = 2x` - remember the rule for a constant that stands alone, constants become 0 in the derivative, and since we treat y as a constant, `+ y²` will ecome `+ 0`.
+- **Partial Derivative with respect to x (∂f/∂x):** This is the slope if you only move in the x-direction. You treat y as a constant. For `f(x, y) = x² + y²`, the partial derivative `∂f/∂x = 2x` - remember the rule for a constant that stands alone, constants become 0 in the derivative, and since we treat y as a constant, `+ y²` will become `+ 0`.

 - **Partial Derivative with respect to y (∂f/∂y):** This is the slope if you only move in the y-direction. You treat x as a constant. For `f(x, y) = x² + y²`, the partial derivative `∂f/∂y = 2y`.

@@ -96,15 +96,15 @@ Let's go back to our bowl function, `f(x, y) = x² + y²`, and its gradient, `
 Let's calculate the gradient at a specific point, say `(3, 1)`.

 ```
-∇f(3, 1) = [ 2 3, 2 1 ] = [6, 2]
+∇f(3, 1) = [ 2 * 3, 2 * 1 ] = [6, 2]
 ```

 This vector `[6, 2]` is an arrow that points "6 units in the x-direction and 2 units in the y-direction." This is an arrow pointing up and to the right, away from the minimum at `(0, 0)`. This makes perfect sense! From the point `(3, 1)`, the steepest way up the bowl is away from the bottom.

 What about the point `(-2, -2)`?

 ```
-∇f(-2, -2) = [ 2 -2, 2 -2 ] = [-4, -4]
+∇f(-2, -2) = [ 2 * -2, 2 * -2 ] = [-4, -4]
 ```

 This vector points down and to the left, again, away from the bottom of the bowl at `(0, 0)`.
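The corrected gradient values `[6, 2]` and `[-4, -4]` can be verified numerically; this sketch approximates the bowl function's partial derivatives with central differences:

```python
def grad(f, x, y, h=1e-6):
    # numerical partial derivatives: nudge one variable, hold the other fixed
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return [dfdx, dfdy]

bowl = lambda x, y: x**2 + y**2

print([round(g, 3) for g in grad(bowl, 3, 1)])    # → [6.0, 2.0]
print([round(g, 3) for g in grad(bowl, -2, -2)])  # → [-4.0, -4.0]
```

Both arrows point away from the minimum at `(0, 0)`, as the text argues.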

public/content/learn/math/matrices/matrices-content.md
Lines changed: 1 addition & 1 deletion

@@ -59,7 +59,7 @@ Multiply each element by the scalar. For `(2A)`:

 ### 3.3 Matrix multiplication

-You do a dot product of a row of th first matrix with the column of the second matrix and write result at the position where that row and column intercept.
+You do a dot product of a row of the first matrix with the column of the second matrix and write the result at the position where that row and column intersect.

 If `(A)` is `(m x p)` and `(B)` is `(p x n)`, then `(AB)` is `(m x n)`. Multiply rows of `(A)` by columns of `(B)` and sum.
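The corrected row-times-column rule reads concisely in code. A minimal plain-Python sketch with a 2×2 example (the values are illustrative):

```python
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

# entry (i, j) of A @ B is the dot product of row i of A with column j of B
result = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
print(result)  # → [[19, 22], [43, 50]]
```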

public/content/learn/math/vectors/vectors-content.md
Lines changed: 3 additions & 3 deletions

@@ -106,7 +106,7 @@ Multiplying a vector by a regular number (a **scalar**) changes its magnitude bu

 If `k` is a scalar and `v = [x, y]`, then:
 ```
-k v = [kx, k*y]
+k * v = [k * x, k * y]
 ```

@@ -142,14 +142,14 @@ The **dot product** is a way of multiplying two vectors that results in a single
 If `u = [x₁, y₁]` and `v = [x₂, y₂]`, the dot product `u · v` is:

 ```
-u · v = (x₁ x₂) + (y₁ y₂)
+u · v = (x₁ * x₂) + (y₁ * y₂)
 ```

 ### Geometric Meaning & Finding Angles
 The dot product also has a powerful geometric definition:

 ```
-u · v = ||u|| ||v|| cos(θ)
+u · v = ||u|| * ||v|| * cos(θ)
 ```

 where `θ` (theta) is the angle between the two vectors. We can rearrange this formula to find the angle between any two vectors!
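The two corrected formulas combine exactly as the file suggests: compute `u · v` component-wise, then rearrange the geometric form to recover the angle. A small sketch with illustrative vectors:

```python
import math

u = [3.0, 0.0]
v = [3.0, 3.0]

# algebraic form: u · v = (x1 * x2) + (y1 * y2)
dot = u[0] * v[0] + u[1] * v[1]

norm = lambda w: math.sqrt(w[0] ** 2 + w[1] ** 2)

# geometric form rearranged: theta = arccos(u · v / (||u|| * ||v||))
theta = math.acos(dot / (norm(u) * norm(v)))
print(round(math.degrees(theta), 1))  # → 45.0
```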

public/content/learn/tensors/matrix-multiplication/matrix-multiplication-content.md
Lines changed: 2 additions & 2 deletions

@@ -268,8 +268,8 @@ weights = torch.tensor([[0.1, 0.2, 0.3, 0.4],
 outputs = inputs @ weights # Shape: (2, 4)

 print(outputs)
-# tensor([[3.2000, 3.8000, 4.4000, 5.0000],
-#         [7.7000, 9.2000, 10.7000, 12.2000]])
+# tensor([[ 3.8000, 4.4000, 5.0000, 5.6000],
+#         [ 8.3000, 9.8000, 11.3000, 12.8000]])
 ```

 **What happened:**
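The hunk truncates the original `inputs` and `weights` tensors, but assuming they continue the obvious pattern (`inputs = [[1, 2, 3], [4, 5, 6]]` and `weights` filled with 0.1 through 1.2 — an assumption, not shown in the diff), the corrected output reproduces exactly in plain Python:

```python
# Assumed reconstruction of the truncated tensors from the diff
inputs = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]            # shape (2, 3)
weights = [[0.1, 0.2, 0.3, 0.4],
           [0.5, 0.6, 0.7, 0.8],
           [0.9, 1.0, 1.1, 1.2]]      # shape (3, 4)

# plain-Python matrix multiply: (2, 3) @ (3, 4) -> (2, 4)
outputs = [[sum(inputs[i][k] * weights[k][j] for k in range(3)) for j in range(4)]
           for i in range(2)]
print([[round(x, 4) for x in row] for row in outputs])
# → [[3.8, 4.4, 5.0, 5.6], [8.3, 9.8, 11.3, 12.8]]
```

That this matches the new `+` lines (and not the old ones) supports the correction.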

public/content/learn/tensors/transposing-tensors/transposing-tensors-content.md
Lines changed: 31 additions & 45 deletions

@@ -15,47 +15,51 @@ Transposing is like **flipping** a tensor - rows become columns, and columns bec

 Think of it like rotating a table 90 degrees. The first row becomes the first column, the second row becomes the second column, and so on.

-## Vector Transpose
-
-When you transpose a vector, you change it from horizontal to vertical (or vice versa):
-
-![Vector Transpose](/content/learn/tensors/transposing-tensors/vector-transpose.png)
+When you transpose a 1D tensor (a vector), it actually stays exactly the same in PyTorch! This is a common point of confusion.

 **Example:**

 ```python
 import torch

-# Horizontal vector (row)
+# 1D vector
 v = torch.tensor([1, 2, 3, 4])
 print(v.shape) # torch.Size([4])

-# Transpose to vertical (column)
+# Transpose
 v_t = v.T
-print(v_t)
-# tensor([[1],
-#         [2],
-#         [3],
-#         [4]])
-print(v_t.shape) # torch.Size([4, 1])
+print(v_t.shape) # torch.Size([4]) - Still the same!
+print(torch.equal(v, v_t)) # True
+```
+
+To actually turn a 1D vector into a column vector (2D), you need to reshape it:
+
+```python
+# Change to column vector (4 rows, 1 column)
+v_col = v.reshape(4, 1)
+print(v_col.shape) # torch.Size([4, 1])
+
+# Now transposing works as expected
+v_row = v_col.T
+print(v_row.shape) # torch.Size([1, 4])
 ```

 **Manual visualization:**

 ```yaml
-Before: [1, 2, 3, 4] → Shape: (4,)
+1D Vector: [1, 2, 3, 4] → Shape: (4,)
+
+Column Vector (2D): [[1],
+                     [2],
+                     [3],
+                     [4]] → Shape: (4, 1)

-After: [[1],
-       [2],
-       [3],
-       [4]] → Shape: (4, 1)
+Row Vector (2D): [[1, 2, 3, 4]] → Shape: (1, 4)
 ```

 ## Matrix Transpose

-This is where transpose really shines! Rows become columns, columns become rows:
-
-![Matrix Transpose](/content/learn/tensors/transposing-tensors/matrix-transpose.png)
+This is where transpose really shines! Rows become columns, and columns become rows:

 **Example:**

@@ -96,8 +100,6 @@ Transpose (3×2):

 Here's exactly what happens to each element during transpose:

-![Transpose Detailed](/content/learn/tensors/transposing-tensors/transpose-detailed.png)
-
 **The pattern:** Position `[i, j]` → Position `[j, i]`

 **Example tracking specific elements:**

@@ -119,8 +121,6 @@ Original position → Transposed position

 Square matrices (same number of rows and columns) have a special property:

-![Square Transpose](/content/learn/tensors/transposing-tensors/square-transpose.png)
-
 **Example:**

 ```python

@@ -184,36 +184,22 @@ transposed_shape = (4, 4) # Still square!

 The most common reason: **making shapes compatible for matrix multiplication!**

-![Why Transpose](/content/learn/tensors/transposing-tensors/why-transpose.png)
-
 **Example:**

 ```python
 import torch

 A = torch.randn(2, 3) # Shape: (2, 3)
-B = torch.randn(2, 4) # Shape: (2, 4)
+B = torch.randn(4, 3) # Shape: (4, 3)

-# This WON'T work - shapes incompatible
-# result = A @ B # Error! 3 ≠ 2
+# This WON'T work - shapes (2,3) and (4,3) are incompatible
+# result = A @ B # Error! 3 != 4

 # Transpose B to make it work!
-B_T = B.T # Shape: (4, 2)
-
-# Now this works!
-result = A @ B_T # (2, 3) @ (4, 2)? Wait, still wrong!
-
-# Actually, we need different dimensions
-# Let's try a real example:
-A = torch.randn(2, 3)
-B = torch.randn(4, 3) # Same inner dimension as A's columns
-
-# Without transpose - doesn't work
-# result = A @ B # Error! (2,3) @ (4,3) - 3 ≠ 4
-
-# With transpose - works!
-result = A @ B.T # (2,3) @ (3,4) = (2,4) ✓
+B_T = B.T # Shape: (3, 4)

+# Now it works!
+result = A @ B_T # (2, 3) @ (3, 4) -> (2, 4)
 print(result.shape) # torch.Size([2, 4])
 ```
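The `[i, j] → [j, i]` pattern this file teaches can be sketched without PyTorch, using `zip` to read off the columns of a nested list:

```python
M = [[1, 2, 3],
     [4, 5, 6]]          # shape (2, 3)

# zip(*M) groups the i-th elements of every row, i.e. the columns of M,
# so the element at [i][j] ends up at [j][i]
M_T = [list(col) for col in zip(*M)]
print(M_T)  # → [[1, 4], [2, 5], [3, 6]]
```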
