-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathAlgos.txt
More file actions
1684 lines (1257 loc) · 44.7 KB
/
Algos.txt
File metadata and controls
1684 lines (1257 loc) · 44.7 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
ALGORITHMS PART 1 (PRINCETON)
=============================
Dynamic Programming Problems:
1 Sequence Alignment
2 Tower of Hannoi Puzzle
3 Egg dropping puzzle
4 Matrix chain multiplication
5
References:
[1] http://en.wikipedia.org/wiki/Dynamic_programming
[2] http://www.quora.com/Programming-Interviews/What-are-the-top-10-most-popular-dynamic-programming-problems-among-interviewers
- Dynamic Connectivity Problem (How many connected components exist)
- Quick Find
// algo
id[component] = component
union(p, q):
id[p] = id[q]
each element that has id[p] has to be changed to id[q]
(this is O(N) step)
connected(p, q):
return (id[p] == id[q])
Algo Initialize Union Find
-------------------------------------------------
Quick-Find N N 1
- Problem: Union is expensive. N^2 time for N union's
- Too slow algo.
- Quick Union: Lazy approach to above problem.
Data Structure
- Integer array id[] of size N
- Interpretation: id[i] is parent of i
- Root of i is id[id[...id[i] ...]]] - keep going until there is no
change. Algo ensures no cycles.
Find():
- Check if p and q have same root?
Union():
- Merge components containing p and q, set the id of p's root to
id of q's root. Only one value changes.
// Algo
private int root(int i)
{
while(i != id[i])
i = id[i];
return i;
}
public bool connected(int p, int q)
{
return root(p) == root(q);
}
public void union(int p, int q)
{
int pr = root(p);
int qr = root(q);
id[pr] = qr;
}
Algo Initialize Union Find
-------------------------------------------------
Quick-Find N N 1
Quick-Union N N* 1
* - includes cost of finding roots
- Quick Union with Weighting (Improvement #1)
- Modify QU to avoid tall trees
- Keep track of size of each tree (# of objects)
- Balance by linking root of smaller tree to root of larger tree.
// Algo
- Maintain sz[i] = size of tree rooted at i
Find()/Connected(p, q):
return root(p) == root(q);
QUwithWeight(p, q)
{
int pr = root(p);
int qr = root(q);
if(pr == qr) return; // same tree
if(sz[pr] < sz[qr])
id[pr] = qr;
sz[qr] += sz[pr];
else
id[qr] = pr;
sz[pr] += sz[qr];
}
- Depth of each node is atmost lg N (base 2 log)
Algo Initialize Union Find/Connected
-------------------------------------------------
Quick-Find N N 1
Quick-Union N N* 1
Weighted QU N lg N lg N
- Weighted QU (Improvement #2): Path compression. Flatten the tree.
- After computing root of p, set id of each examined node to point to
that root.
- Starting from empty data structure, any sequence of M union-find ops
on N objects makes <= c (N + M lg* N) array accesses.
lg* N = 5 for 2^65536, which is pretty much constant.
- Therefore, almost linear time algo.
Algo worst-cast time
-------------------------------------------------
Quick-Find MN
Quick-Union MN
Weighted QU N + Mlog N
QU + Path Compression N + Mlog N
Weighted QU + Path Comp N + Mlg *N
M union-find ops on set of N objects
Example: 10^9 union-finds with 10^9 objects
- WQUPC reduces time from 30 years to 6 seconds.
Applications:
- Percolation
- Games (Go, Hex)
Percolation Problem Modeling
- How to check whether an NxN system percolates?
1 Create an object for each site and name them 0 to N^2-1
2 Sites are in same component if connected by open sites.
3 Percolates iff any site on bottom row is connected to site on top row.
This can be brute force and N^2 calls to connected() routine.
4. Instead, create a virtual site at top & bottom that is connected to
all nodes at top row and bottom row respectively.
Stacks & Queues
--------
Stacks: LIFO principle
-------
- push(), pop(), is_empty(), size() are main APIs
- Can be implemented using Linked Lists, Arrays.
- Arrays are very fast implementation
* resizing problem is fundamental to Arrays
- How to grow an array?
* "Repeated Doubling" - new array of double size. Copy elements.
- How to shrink the array?
* Half the array when 1/4 full. Otherwise can run in to "Thrashing"
Queue: FIFO principle
-------
- enqueue(), dequeue(), is_empty(), size() etc
- Linked Lists, Arrays (though tricky to implement)
SUM in an Array Algos:
-------
2SUM/3SUM - Given N distinct integers, how many triples (or doubles) sum
to zero (or a particular number).
// Algos
- Brute Force - N^3
BinarySearch of an Array
--------
// Code - from lecture (non-recursive)
public static int binarySearch(int[] a, int key)
{
int lo = 0, hi = a.length-1;
while(lo <= hi)
{
int mid = lo + (hi-lo)/2;
if (key < a[mid]) hi = mid-1;
else if (key > a[mid]) lo = mid+1;
else return mid;
}
return -1;
}
- An (N^2)log N algo for 3-SUM
- Sort N distinct number (Nlog N)
- For each pair of numbers in a[i] and a[j], binary search for
negative (a[i]+a[j]) - this is (N^2)log N step.
- Total: (N^2)log N + (Nlog N)
Memory Usage for Primitive Data Types in Java
---------
Type Bytes
--------------------
boolean 1
byte 1
char 2
int 4
float 4
long 8
double 8
char[] 2N + 24
int[] 4N + 24
double[] 8N + 24
- Object overhead in Java = 16 bytes
- Reference overhead = 8 bytes
- Memory alignment = Each object uses multiples of 8 (on 64 bit machine)
SORTING
========
Selection Sort:
-------
- In iteration i, find index of smallest remaining entry (call it `min`)
- Swap a[i] and a[min].
Invariants:
-------
- No entry to right of i is smaller than elements less than i.
// Code
public static void selectionSort(Comparable[] a)
{
int N = a.length;
for (int i=0; i<N; i++)
{
int min = i;
for (int j=i+1; j<N; j++)
{
if(less(a[j], a[min])
min = j;
}
exch(a, i, min);
}
}
- ~(N^2)/2 compares and N exchanges.
- Quadratic time input, even if input is sorted.
Insertion Sort:
-------
- In iteration i, swap a[i] with each larger entry to its left.
- All entries to left of i is sorted.
// Code
public static void insertionSort(Comparable[] a)
{
int N = a.length;
for (int i=0; i<N; i++)
for (int j=i; j>0; j--)
if (less(a[j], a[j-1]))
exch(a, j, j-1)
else
break;
}
- On average, ~1/4(N^2) compares and ~1/4(N^2) exchanges.
- Best case: If array is in sorted (say, ascending order), only N-1
compares and 0 exchanges.
- Worst case: If array is in descending order, ~1/2(N^2) compares and
~1/2(N^2) exchanges.
Insertion Sort: Partially-Sorted Array
-------
Def: An inversion is a pair of keys that are out of order.
A E E L M O T R X P S
There are 6 inversions in above array.
T-R T-P T-S R-P X-P X-S
Def: An array is partially sorted if the number of inversions is <= cN
(some constant times N).
- For partially sorted arrays, insertion sort runs in linear time because
number of exchanges = number of inversions.
number of compares = number of exchanges + (N-1)
Shell Sort:
--------
Idea: Move entries more than one position at a time by h-sorting array.
An h-sorted array is h interleaved sorted subsequence.
- How to h-sort an array? Insertion sort, with stride length of h
- Why insertion sort?
- Big increments = small subarrays.
- Small increments = nearly in order (or partially sorted).
- Worst case: number of compares is O(N^(3/2)) for 3x+1 increments.
- Why are we interested in shellsort?
- Useful in practice.
- Fast unless array size is huge.
- Tiny, fixed footprint of code (used in embedded sys).
- Hardware sort prototype.
Shuffling: How do you shuffle a deck of cards?
-------
- Generate a random number for each card and sort them. But there is a
cost of sorting the deck.
- Can we do better? Yes! Knuth Shuffle.
Knuth Shuffle:
-------
- In iteration i, pick integer r between 0 and i uniformly at random.
- Swap a[i] and a[r].
- This is linear time shuffling algo, making use of random number.
// Code
public void knuthShuffle(Object[] a)
{
int N = a.length;
for(int i=0; i<N; i++)
{
int r = StdRandom.uniform(i+1);
exch(a, i, r);
}
}
- Bug: Choosing to swap a random number between 0 and N-1. It is not
uniformly random. However, choosing between i and N-1 will work.
Convex Hull:
-------
Smallest perimeter fence enclosing all points in a plane.
Mechanical Algo: Hammer nails on points and stretch elastic rubber band around
points.
Convex Hull Applications:
-----------
Robot Motion Planning: Path from point s to t and an obstacle in between.
- If there's no direct, straight path between s and t, then it's along the
perimeter of Convex Hull including points s and t.
Farthest Pair Problem: Given N points in plane, find a pair of points with
largest Euclidean distance between them.
- Those points are on Convex Hull.
Graham Scan: To determine Convex Hull
------
- Choose point p with smallest y-coordinate
- Sort points by polar angle with p
- Consider points in order; discard unless it create a ccw turn.
- There's lot of implementation challenges to Graham Scan
- How to find point p with smallest y-coordinate?
- How to sort points by polar angle w.r.t p?
- How to determine p1->p2->p3 is CCW direction?
- How to sort efficiently?
Implementing CCW
------
Given 3 points a, b and c, is a->b->c a CCW turn?
- Determinant (or cross product) gives 2x signed area of planar triangle
| ax ay 1 |
2 x Area(a,b,c) = | bx by 1 | = (bx-ax)(cy-ay) - (by-ay)(cx-ax)
| cx cy 1 |
- If signed area > 0, then a->b->c is CCW
- If signed area < 0, then a->b->c is CW
- If signed area = 0, then a->b->c is collinear (straight line)
Mergesort:
Split an input array in to two halves recursively, sort each one and merge
Runtime: O(N lgN)
Memory: O(N) as it uses auxiliary array
- A sorting algo is considered in-place if it uses <= c(lgN) extra memory, for
a constant c.
Ex: Insertion, Shell, Selection sorts.
Practical Improvements to Mergesort:
------
- Too much overhead for small arrays (or subarrays within a large array).
- Cutoff to insertion sort is ~7 items
- Biggest item in first half <= smallest item in second half. Stop.
Bottom-Up Mergesort:
------
- Pass through array, merging subarrays of size 1.
- Repeat for subarrays of size 2, 4, 8, 16, ...
- Uses O(N) auxiliary space
- Runs O(N lgN) time
- Overhead is much smaller. Easier to code.
void merge(int arr, int lo, int mid, int hi) {/* standard */}
void sort(int arr)
for (int sz=1; sz<N; sz = sz+sz)
for (int lo=0; lo=N-sz; lo+=sz+sz)
merge(arr, lo, lo+sz-1, min(lo+sz+sz-1, N-1))
Stability of Sorting Algos:
------
Stable sort is one that preserves relative position of items with equal keys.
- Insertion and Mergesort are stable. Shell and Selection are not.
Quicksort:
------
- Shuffle the array
- Partition so that, for some j
* entry a[j] is in place
* no larger entry to the left of j
* no smaller entry to the right of j
- Sort each piece recursively
Phase 1: Repeat until i and j pointer cross.
- Scan i from left to right so long as a[i] < a[lo]
- Scan j from right to left so long as a[j] > a[lo]
- Exchange a[i] with a[j]
Phase 2: When pointers cross
- Exchane a[lo] with a[j]
********* array is partitioned at this point *******
Partition Code:
------
int partition(int a[], int lo, int hi)
{
int i = lo, j = hi+1; /* 1st elem is partitioning elem */
while (true)
{
while (less(a[++i], a[lo])) /* find item on left */
if (i == hi) break;
while (less(a[lo], a[--j])) /* find item on right */
if (j == lo) break;
if (i >= j) break; /* check if pointers cross */
exch(a, i, j); /* swap */
}
exch(a, lo, j);
return j; /* index of partition item which is in place */
}
void Quicksort(int a[], int lo ,int hi)
{
/* shuffle the array for performance gaurantee */
if (hi <= lo) return;
int j = partition(a, lo, hi); /* j elem is in place now */
Quicksort(a, lo, j-1);
Quicksort(a, j+1, hi);
}
Worst Case: O(N^2)
Average Case: ~1.39(N lgN) --> O(N lgN)
Selection Problem: Given an array of N items, find the Kth largest.
------
Partition array so that:
- Entry a[j] is in place
- No larger entry to left of j
- No smaller entry to the right of j
- Repeat in one subarray, depending on j
- Done when j == k
void select(int a[], int k)
{
/* shuffle the array */
int lo = 0, hi = a.length - 1;
while (hi > lo)
{
int j = partition(a, lo, hi);
if (j < k) lo = j + 1;
else if (j > k) hi = j - 1;
else return a[k];
}
return a[k];
}
Average: Linear time
Worst Case: Quadratic time, but with random shuffle, very rare
Dijkstra 3-Way Partitioning. (AKA 3-way Quicksort)
-----
Goal: Partition an array in to 3 parts so that:
+----------------------------------------------+
| < V | = V | > V |
+----------------------------------------------+
where V is the partitioning element.
Priority Queues:
---------------
Queues with elements having priorities. Usually implemented using Heaps.
Challenge: Find largest M items in a stream of N items.
- Many applications
Constraint: Not enough memory to store N items.
Solution:
- Maintain a Min PQ, maintains M lowest number on front of PQ.
while (input)
PQ.insert(input);
if (PQ.size() > M) // PQ contains largest M items
PQ.delMin(); // Min in the PQ is front of PQ
Basic PQ implementations:
------
Algo Time Space
---------------------------------
sort N lgN N
elementary PQ M N M
Binary Heap N lgM M
Best N M
PQ implementations
---
- Unordered Array: Keep inserting as elems come in. Search when required.
- Ordered Array: Insert elems as they come in. Shift elems around to maintain
order.
- Linked List:
Goal of PQ implementation: lg N insert, lg N max, lg N delete
Practical PQ Implementation: Complete binary tree
------
Array Representation:
- Keys are elems of array
- Parent's key no smaller than children's keys
- Indices start at 1
- Take nodes in level order
- No explicit links needed
a[] = T S R P N O A E I H G
T
S R
P N O A
E I H G
- Largest key is a[1], which is root of binary tree
- Parent of node at k is at k/2
- Children of node at k are at 2k and 2k+1
Scenario1: Child's key becomes larger than it's parent's key
- Exchange key in child with parent
- Repeat until heap order is restored
void swim(int k)
{
while (k>1 && less(k/2, k))
{
exch(k, k/2);
k = k/2;
}
}
Insertion: Add node at end and swim up
Cost: At most 1+lg N compares
void insert(Key x)
{
pq[++N] = x;
swim(N);
}
Scenario2: Parent's key becomes smaller than one or both of it's children.
- Exchange key in parent with key in larger child
- Repeat until heap order is restored
void sink(int k)
{
while (2*k <= N)
{
int j = 2*k;
if (j<N && less(j, j+1)) j++;
if (!less(k, j)) break;
exch(k, j);
k = j;
}
}
Delete Max: Exchange root with node at end, then sink it down
Cost: At most 2*lgN compares
Key delMax()
{
Key max = pq[1];
exch(1, N--);
sink(1);
pq[N+1] = null;
return max;
}
Heapsort
--------
- Create max-heap with all N keys
- Repeatedly remove the max key
- In-place sorting algo with N lgN worst case
Mergesort: takes linear extra space
Quicksort: quadratic worst case
Heapsort: gauranteed N lgN worst case
Disadvantages of Heapsort
-----
- Makes poor use of cache memory. In modern systems, caching is very
important. Lot of references in array.
- Not stable.
Summary of Sorting Algorithms
=============================
inplace? stable? worst average best remarks
Selection Yes N^2/2 N^2/2 N^2/2 N exchanges
Insertion Yes Yes N^2/2 N^2/4 N use for small or
partially ordered array
Shell Yes ?? ?? N tight code
Quick Yes N^2/2 2N lgN N lgN fastest in practice
3-way Quick Yes N^2/2 2N lgN N improves Quicksort
when duplicate keys are
present
Merge Yes N lgN N lgN N lgN N lgN gaurantee, stable
Heap Yes 2N lgN 2N lgN N lgN N lgN gaurantee, in-place
???? Yes Yes N lgN N lgN N lgN N lgN gaurantee, in-place
N body simulation
------
- Application of PQ
- Uses Hard disc model
+ Moving particles interact via elastic collisions with each other and
walls.
+ Each particle is a disc with known position, velocity, mass and radius.
+ No other forces.
Event-driven Simulation
-------
Problem: To find when two bodies collide using brute force is hard problem
(Quadratic time).
Solution: Event-Driven Simulation
- Change state only when something happens
- Between collisions, particles move in straight-line trajectories
- Focus only on (predicted) times when collisions occur
- Maintain PQ of collision events, prioritize by time
- Remove the min, get next collision
Collision Prediction: Given position, velocity and radius of particle, when
will it collide next with a wall or another particle?
Collision Resolution: If collision occurs, update colliding particle(s)
according to laws of elastic collisions.
Todo: Watch video: Event Driven simulation
Symbol Tables
-------------
Key-Value pair abstraction, with two main operations
- Insert a value with specified key
- Given a key, search for the corresponding value
Eg: DNS lookup, given URL, provide IP and vice versa
Data Struct: Unordered Linked List of Key-Value pairs.
Search: Scan through all keys until you find a match.
Insert: Search. If no match, add to front.
Data Struct: Maintain an ordered array of KV pairs. Use binary search to find
keys.
Problem: Insert - need to shift all greater keys over.
Binary Search Trees (BSTs)
---------------
- Binary tree in symmetric order.
+ Symmetric means: Every node's key is larger than all keys in it's left
subtree and smaller than all keys in its right subtree.
- There's 1-1 correspondance between BSTs and Quicksort partitioning
How to find following in BSTs
------
Min: Smallest key in table
Max: Largest key in table
Floor: Largest key <= to a given key
Ceiling: Smallest key >= to a given key
// floor and ceiling isn't straightforward
Count: Store # of nodes in the subtree rooted at that node. To
implement size(), return count at the root
Rank: How many keys < k?
//Easy recursive algo
Inorder Traversal: Keys traversed in ascending order
+ Traverse left subtree
+ Enqueue key
+ Traverse right subtree
Deletion in BST: Hibbard Deletion
Deleting a node in BSTs
Case 1: Deleting node N has no subtrees.
Case 2: Deleting node N has 1 subtree.
parent(child(N)) = parent(N)
Case 3: Deleting node N has 2 subtrees
+ find successor node X to N (smallest node > N)
+ Replace N with X and free N
+ X is minimum node to right subtree of N
- Why is successor node X chosen and not predecessor? No real reason.
Disadvantages:
- Tree becomes unbalanced after repeated deletion/addition. Choosing
predecessor and successor randomly doesn't solve the problem
- The height of the tree becomes sqrt(N) instead of lgN
- Long standing open problem
Red-Black BST: Gaurantees logarithmic performance
--------
2-3 Trees
---------
- Allow 1 or 2 keys per node
- 2-node: 1 key and 2 children
- 3-node: 2 keys and 3 children
Perfect Balance: Every path from root to null link has same length
Symmetric Order: Inorder traversal yields keys in ascending order
Insert:
Case 1: Insert in to a 2-node at bottom
Case 2: Insert in to a 3-node at bottom. More work.
+ Add new key to 3-node to create temporary 4 node
+ Move middle key in 4 node in to parent, along with any child
// See video for clarity
Tree height:
------
Worst Case: lgN //all 2-nodes
Best Case: lg_3N ~ 0.631 lgN // all 3-nodes. Base 3 log
Between 12 and 20 for a million nodes
Between 18 and 30 for a billion nodes
- Gauranteed logarithmic performnace for search and insert
//TODO: Watch this video again. Very detailed implementation of RB BST
Red-Black Trees (left-leaning RB tree)
---------------
- RB Trees can be represent using 2-3 tree
- Take the largest elem in a 3-node and make it root, with smaller elem on
left subtree (hence left-leaning RB BSTs)
- Call this link as Red link
- Black links are those pointing to a 2-node
- Vice Versa is true. Given a BST, there's a direct correspondance to RB tree.
A BST such that:
- No node has 2 red links connected to it
- Every path from root to null link has same number of black links
- Red links lean left
- Search in RB BST: Same as BST. Most other operations are same.
- Change w.r.t BST, have a 'color' field to indicate left link (RED) and right
link (BLACK)
Applications:
----
- java.util.TreeMap, TreeSet use RB trees
- C++ STIL: map, multimap, multiset
- Linux kernel: CFS, libux/rbtree.h
// TODO: Watch video
B-Trees
-------
- Generalized version of 2-3 trees
- Variants: B+ tree, B* tree, B# tree ...
Applications:
----
- Windows NTFS
- Mac: HFS, HFS+
- Linux: ReiserFS, XFS, Ext3FS, JFS
- Databases: Oracle, DB2, INGRES, SQL, PostgreSQL
Geometric Applications of Binary Search Trees (BST)
---------
1d Range Search
---------------
Problems:
[1] How many points in a plane lie within a rectangle
[2] If we have set of rectangles, how many are intersecting or how many of
them are intersecting
- Geometric objects intersections.
- Finding points in a 2d range
- Intersections of orthogonal rectangles.
- Many applications.
Solution: Binary Search Trees (and extensions)
Operations:
- Insert KV pairs in to Symbol Table
- Search for key k
- Delete key k
- Range search: find all keys between k1 and k2
- Range count: number of keys between k1 and k2
Application: Database queries
Geometric Interpretation:
- Keys are points on a line
- Find/count points in a given interval in 1-dimension (1d).
** * * ***** * * * ** * ** * *
^ ^
S E
- How many points/keys between S and E?
Implementation Costs
----
DS insert range count range search
Unordered Array 1 N N
Ordered Array N log N R + log N
Goal log N log N R + log N
N = number of keys
R = number of keys that match
Implementation
----
- Keep a rank(node) function that counts # of keys less than that node in
a BST.
- Range(low, high) = rank(high) - rank(low) [+1 depending on high/low are
part of the query]
S 6
/ \
/ X 7
E 2
/ \
0 A \
\ \
1 C R 5
/
H 3
\
M 4
public int size(Key lo, Key hi) {
if (contains(hi)) return rank(hi) - rank(lo) + 1;
else return rank(hi) - rank(lo);
}
- Num of keys between E and S?
// Algo: Find all keys between lo and hi
- Recursively find all keys in left subtree.
- Check key in current node.
- Recursively find all keys in right subtree.
- Run time is R + logN
Line Segment Intersection Search
-------
// Algo
- Sweep vertical line from left to right
- x-coordinates define events. Put x-coordinates in a PQ (or sort) - N log N
- h-line (left endpoint): insert y-coord in to BST - N log N
- h-line (right endpoint): remove y-coord from BST - N log N
- v-line: do a 1d-range search in BST for y-range - R + N log N
Bottom Line: Sweep line reduces 2d orthogonal line segment intersection search
to 1D range search.
2D Orthogonal Range Search
----
- Insert 2D key
- Delete a 2D key
- Search for a 2D key
Problem:
---
- Range Search: Find all keys that lie in a 2D range
- Range Count: Number of keys that lie in a 2D range
Line Sweep Algos
-------
Applications:
- Closest Pair of points.
- Line segment intersections.
- Area of union of rectangles.
- Convex hull.
Kd-Trees
--------
Extension of 1d-range
- Operations: Insert a 2d key; Delete a 2d key; Search for a 2d key
- Range search: find all keys that lie in a 2d range.
- Range count: # of keys that lie in a 2d range.
Geometric Interpretation:
- Keys are point in a plane.
- Find/count points in a given h-v rectangle.
Applications:
- Networking, Circuit Design, Databases.
Grid Implementation
--------
- Divide space in to MxN squares/grids.
- Create a list of points contained in each square.
- Use 2d array to directly index relevant square.
- Insert: add(x,y) to list for corresponding square.
- Range Search: Examine only squares that intersect 2d range query.
Space-Time Tradeoff:
---------
- Space: M^2 + N
- Time: 1 + N/(M^2) per square examined, on average
Choose grid square size to tune performance
---------
- Too small: wastes space.
- Too large: too many points per square.
- Rule of thumb: sqrt(N) by sqrt(N) grid.