<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[euccas.github.io]]></title>
<link href="http://euccas.github.io/atom.xml" rel="self"/>
<link href="http://euccas.github.io/"/>
<updated>2023-01-07T23:19:24-08:00</updated>
<id>http://euccas.github.io/</id>
<author>
<name><![CDATA[euccas]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[Tmux Cheatsheet]]></title>
<link href="http://euccas.github.io/blog/20230107/tmux-cheatsheet.html"/>
<updated>2023-01-07T20:37:11-08:00</updated>
<id>http://euccas.github.io/blog/20230107/tmux-cheatsheet</id>
<content type="html"><![CDATA[<p><em>Note: This post is written with the help of <a href="https://chat.openai.com/chat">ChatGPT</a> (Dec 15 version).</em></p>
<h1 id="what-is-tmux">What is Tmux</h1>
<p><a href="https://github.com/tmux/tmux/wiki">Tmux</a> is a terminal multiplexer for Unix-like systems. Similar to <a href="https://linux.die.net/man/1/screen">Linux Screen</a>, Tmux allows you to create, manage, and easily switch between multiple terminal sessions within a single terminal window or console. It also has features such as the ability to detach and reattach sessions, split terminal windows into panes, and more. It is useful for managing multiple terminal sessions and for running long-running commands in the background while you do other work in the same terminal window.</p>
<h1 id="tmux-vs-screen">Tmux vs. Screen</h1>
<p>One difference between Tmux and Screen is that Tmux is more modern and has a more user-friendly interface, with support for mouse operations and better window and pane management. Screen, on the other hand, is an older tool that is more lightweight and simple, and does not have as many features as Tmux.</p>
<p>Another difference is that Tmux is more configurable and extensible, with support for custom scripts and plugins, while Screen is more bare-bones and does not have as much support for customization.</p>
<p>Ultimately, the choice between Tmux and Screen depends on your personal preferences and needs. Both are powerful tools that can be useful in different situations.</p>
<p>To learn more about the usage of Screen, please read another post in this blog: <a href="https://euccas.github.io/blog/20140531/the-elements-of-linux-screen.html">The Elements of Linux Screen</a>.</p>
<h1 id="tmux-common-usages">Tmux common usages</h1>
<h3 id="install-tmux-on-mac">Install tmux on Mac</h3>
<ul>
<li><code>brew install tmux</code></li>
</ul>
<h3 id="start-a-new-session">Start a new session</h3>
<ul>
<li><code>tmux new</code>: start a session; the session gets an automatically generated name.</li>
<li><code>tmux new -s &lt;session name&gt;</code>: start a session with the specified name.</li>
</ul>
<h3 id="list-tmux-sessions">List tmux sessions</h3>
<ul>
<li><code>tmux ls</code></li>
</ul>
<h3 id="kill-a-tmux-session">Kill a tmux session</h3>
<ul>
<li><code>tmux kill-session</code>: kill the last session.</li>
<li><code>tmux kill-session -t &lt;session name&gt;</code>: kill the session with the specified name.</li>
</ul>
<h3 id="attach-to-session">Attach to session</h3>
<ul>
<li><code>tmux attach-session</code>: attach to the last session.</li>
<li><code>tmux attach-session -t &lt;session name&gt;</code>: attach to the session with the specified name.</li>
<li><code>tmux a</code>: shortcut for <code>tmux attach-session</code>.</li>
</ul>
<h3 id="working-with-tmux-windows">Working with tmux windows</h3>
<p>When you start a new Tmux session, by default, it creates a single window with a shell in it. You can create more windows and switch between them.</p>
<ul>
<li><code>Ctrl+b c</code>: Create a new window</li>
<li><code>Ctrl+b w</code>: Show window list and choose from it</li>
<li><code>Ctrl+b n</code>: Move to the next window</li>
<li><code>Ctrl+b p</code>: Move to the previous window</li>
<li><code>Ctrl+b 0</code>: Switch to window 0</li>
<li><code>Ctrl+b ,</code>: Rename the current window</li>
<li><code>Ctrl+b &amp;</code>: Kill the current window</li>
</ul>
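<p>The commands above assume tmux defaults. As an illustration (not part of the original cheatsheet), a minimal <code>~/.tmux.conf</code> can tweak a few common settings; the option names below are standard tmux options:</p>

```
# ~/.tmux.conf: a minimal example configuration
set -g mouse on             # enable mouse support for selecting and resizing panes
set -g base-index 1         # start window numbering at 1 instead of 0
set -g renumber-windows on  # keep window numbers gapless when a window closes
set -g history-limit 10000  # increase the scrollback buffer
```

<p>Reload the configuration from inside tmux with <code>tmux source-file ~/.tmux.conf</code>.</p>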
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Read Inspired: How to Create Tech Products Customers Love]]></title>
<link href="http://euccas.github.io/blog/20220529/read-inspired-how-to-create-tech-products-customers-love.html"/>
<updated>2022-05-29T20:45:56-07:00</updated>
<id>http://euccas.github.io/blog/20220529/read-inspired-how-to-create-tech-products-customers-love</id>
<content type="html"><![CDATA[<p>Last month I read this book: <a href="https://www.svpg.com/books/inspired-how-to-create-tech-products-customers-love-2nd-edition/">Inspired - How to Create Tech Products Customers Love</a>, by Marty Cagan. Before reading it, I thought this book was a collection of product case studies, but it’s not. The book covers the principles, processes, and techniques needed for creating a successful tech product. I find it to be a pragmatic guide, and it can be most useful to people who already have some experience designing and developing products in the tech world. This is a book that I’ll keep with me and read again.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2022/20220521-book-inspired.jpeg" width="360" /></p>
<p>The book contains five parts. Part 1 is a strong opening: the author discusses the key concepts that form the foundation of modern product work, the core principles behind great products, and possible causes of failed product efforts. Part 2 covers product teams, the different roles that a successful product team needs, and how each role should work in order to lead to product success. Part 3 covers the principles and techniques of product roadmaps, product vision, and objectives; I find this part especially valuable. Part 4 describes the right process from product discovery to delivery. In the last part, the author shares his view on the culture that great products rely on.</p>
<!--more-->
<p>Here I’ll jot down five notes covering a few topics discussed in this book.</p>
<h2 id="one">One</h2>
<ul>
<li>
<p>Behind every great product there is someone - usually someone behind the scenes, working tirelessly - who led the product team to combine technology and design to solve real customer problems in a way that met the needs of the business.</p>
</li>
<li>
<p>In a startup, the product manager role is usually covered by one of the co-founders. Typically, there are fewer than 25 engineers, covering a range from one product team up to maybe four or five.</p>
</li>
</ul>
<h2 id="two">Two</h2>
<p>Two inconvenient truths about product:</p>
<ul>
<li>The first truth is that at least half of our ideas are just not going to work. At least half the ideas on your roadmap are not going to deliver what you hope.</li>
<li>The second inconvenient truth is that even with the ideas that do prove to have potential, it typically takes several iterations to get the implementation of this idea to the point where it delivers the necessary business value. We call that <em>time to money</em>.</li>
</ul>
<p>One of the most important things about product that I’ve learned is that there is simply no escaping these inconvenient truths, no matter how smart you might be.</p>
<h2 id="three">Three</h2>
<p>Three overarching principles at work:</p>
<ul>
<li>Risks are tackled up front, rather than at the end. In modern teams, we tackle these risks prior to deciding to build anything.</li>
<li>Products are defined and designed collaboratively, rather than sequentially. In strong teams, product, design, and engineering work side by side, in a give-and-take way, to come up with technology-powered solutions that our customers love and that work for our business.</li>
<li>Finally, it’s all about solving problems, not implementing features. Conventional product roadmaps are all about output. Strong teams know it’s not only about implementing a solution. They must ensure that solution solves the underlying problem. It’s about business results.</li>
</ul>
<h2 id="four">Four</h2>
<p>The purpose of product discovery is to address these four critical risks:</p>
<ul>
<li>Will the customer buy this, or choose to use it? (Value risk)</li>
<li>Can the user figure out how to use it? (Usability risk)</li>
<li>Can we build it? (Feasibility risk)</li>
<li>Does this solution work for our business? (Business viability risk)
<ul>
<li>Whether this solution also works for the various aspects of our business – sales, marketing, finance, legal, etc.</li>
</ul>
</li>
</ul>
<h2 id="five">Five</h2>
<p>Good product strategies have these five principles in common:</p>
<ul>
<li>Focus on one target market persona at a time.</li>
<li>Product strategy needs to be aligned with business strategy.</li>
<li>Product strategy needs to be aligned with sales and go-to-market strategy.</li>
<li>Obsess over customers, not over competitors.</li>
<li>Communicate the strategy across the organization.</li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Abstract: The Art of Design]]></title>
<link href="http://euccas.github.io/blog/20220403/abstract-the-art-of-design.html"/>
<updated>2022-04-03T23:48:22-07:00</updated>
<id>http://euccas.github.io/blog/20220403/abstract-the-art-of-design</id>
<content type="html"><![CDATA[<p>A few years ago, I watched the documentary series “<a href="https://www.netflix.com/title/80057883">Abstract: the Art of Design</a>” when it was originally released on Netflix. This documentary is composed of stories about designers in a variety of disciplines: their work, how they create it, and how they think about design. The series is very well produced; I enjoyed it and still find it inspiring even after years. So here I’d like to write down some notes about it for myself, and for anyone else who might be interested in this documentary.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2022/20220404-netflix-abstract.png" width="800" /></p>
<!--more-->
<h2 id="what-this-documentary-is-about">What this documentary is about</h2>
<p>This documentary has two seasons. In total it has fourteen episodes, and each one introduces a designer in a different field, for example graphic design, footwear design, architecture, automotive design, and digital product design. I feel each episode is made in a way that is customized to the designer: it tells their story, shows their work, and speaks their mind in a particular way they choose. You can watch the <a href="https://www.youtube.com/watch?v=DYaq2sWTWAA">official trailer</a> to get a feel for it, and on <a href="https://en.wikipedia.org/wiki/Abstract:_The_Art_of_Design">this wiki</a> you can find more information about the designers highlighted in each episode.</p>
<h2 id="what-i-like-about-it">What I like about it</h2>
<p>There are many things I like about it; to describe them in a few short words: creative, insightful, thoughtful, and encouraging.</p>
<h2 id="where-you-can-watch-it">Where you can watch it</h2>
<p>Netflix has put this series <a href="https://www.youtube.com/watch?v=q_k8fVNzbGU&list=PLuctemCzX-m4svPpBctWUp0oG__Lhglq9">on YouTube</a>, so you can watch it for free now.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Read Continuous Delivery With Spinnaker]]></title>
<link href="http://euccas.github.io/blog/20220305/read-continuous-delivery-with-spinnaker.html"/>
<updated>2022-03-05T22:30:58-08:00</updated>
<id>http://euccas.github.io/blog/20220305/read-continuous-delivery-with-spinnaker</id>
<content type="html"><![CDATA[<p><a href="https://spinnaker.io/">Spinnaker</a> is an open-source, multi-cloud continuous delivery platform originally developed by Netflix. Today, Spinnaker has built a community, and many companies have adopted it to power their continuous delivery. Pinterest also uses Spinnaker to deploy some of its core services, including web and API. Recently I read this eBook: <a href="https://spinnaker.io/docs/concepts/ebook/">Continuous Delivery with Spinnaker</a>. What I like about this short eBook is that it explains the key design considerations of Spinnaker, and those are, as I think of them, what really matter when designing a good cloud CD platform. In this post, I will share a few topics mentioned in this eBook, together with my thoughts after reading.</p>
<h2 id="cloud-deployment-considerations">Cloud Deployment Considerations</h2>
<p>Important things to consider:</p>
<ul>
<li>Credentials management</li>
<li>Regional isolation</li>
<li>Autoscaling</li>
<li>Immutable infrastructure and data persistence</li>
<li>Service discovery</li>
<li>Using multiple clouds</li>
<li>Abstracting cloud operations from users</li>
</ul>
<!--more-->
<p>When you work on designing a CD platform, pay attention to where you focus. When I joined Pinterest back in 2019, I was initially on the Continuous Delivery Platform team, and I started designing Pinterest’s new CD platform. For that project, the team and I put a lot of focus on the developer experience and making the new system easy to use. While I think that was good, I also think we should have put more thought, in the design phase, into other areas such as credentials management, autoscaling, deploy policy support, and different ways of triggering a deployment.</p>
<h2 id="structuring-deployments-as-pipelines">Structuring Deployments as Pipelines</h2>
<p>Benefits of flexible user-defined pipelines: allowing each team to build and maintain their own deployment pipeline from the building blocks the platform provides lets engineers experiment freely according to their needs.</p>
<p>Encapsulate the built-in features as platform-defined pipeline stages:</p>
<ul>
<li>Infrastructure stages: operate on the underlying cloud infrastructure by creating, updating, or deleting resources.</li>
<li>External systems integration stages: examples are integrations with CI (Jenkins/Travis CI), run job, webhook.</li>
<li>Testing stages: Chaos automation platform, Canary + ACA</li>
<li>Flow-control stages: allow you to control the flow of pipelines, whether that is authorization, timing, or branching logic.</li>
<li>Trigger stages: control how a pipeline is started, e.g. time-based triggers or event-based triggers.</li>
</ul>
<p>Continuous delivery is a complex process. I think using <em>pipeline</em> and <em>stage</em> as the two core concepts in Spinnaker’s design is an awesome idea. It abstracts away the complexity of various types of deployments, and allows enough flexibility and extensibility by providing both <em>managed stages</em> and <em>customized stages</em>. On a side note, <a href="https://airflow.apache.org/">Apache Airflow</a>, the data pipeline orchestration system, uses a similar principle in its design by providing <em>operators</em>. I may delve into the design of Airflow in another post later.</p>
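<p>To make the <em>pipeline</em> and <em>stage</em> abstraction concrete, here is a simplified sketch of a pipeline definition assembled from such stages. This is an illustrative fragment I wrote, not an exact Spinnaker schema (though Spinnaker pipelines are JSON, and stages do reference upstream stages by id to form a graph):</p>

```json
{
  "name": "deploy-web",
  "triggers": [
    { "type": "jenkins", "job": "web-build", "enabled": true }
  ],
  "stages": [
    { "refId": "1", "type": "bake",           "requisiteStageRefIds": [] },
    { "refId": "2", "type": "deploy",         "requisiteStageRefIds": ["1"] },
    { "refId": "3", "type": "manualJudgment", "requisiteStageRefIds": ["2"] }
  ]
}
```

<p>The <code>requisiteStageRefIds</code> edges are what let users compose arbitrary graphs of managed and customized stages.</p>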
<h2 id="working-with-cloud-vms-and-kubernetes">Working with Cloud VMs and Kubernetes</h2>
<p>For continuous deployment into Amazon’s EC2 virtual machine–based cloud, Spinnaker models a well-known set of operations as pipeline stages. Other VMbased cloud providers have similar functionality. Those operations mainly include:</p>
<ul>
<li>Baking AMIs</li>
<li>Tagging AMIs</li>
<li>Deploying in EC2</li>
<li>Availability zones</li>
<li>Health checks</li>
<li>Autoscaling</li>
</ul>
<p>Kubernetes makes deployment to the cloud much easier because of some of its advantages compared to VM-based cloud platforms:</p>
<ul>
<li>Faster: provisioning resources in Kubernetes takes seconds, while provisioning a VM can take minutes.</li>
<li>Declarative: Kubernetes uses manifest files (YAML) to provide a declarative description of your infrastructure, and this is central to how Kubernetes works.</li>
<li>Multi-cloud: whether Kubernetes is running in Google’s cloud or Amazon’s, in your on-premise datacenter or on your laptop, it exposes the same interface and behavior for running your workloads. This makes it trivial to deploy the same application to multiple clouds and regions, when you can treat each as being identical.</li>
<li>Native deployment orchestration: when a change is submitted to a running Kubernetes workload, it orchestrates a rollout of your change according to policies you specify. In some cases, this becomes the only deployment orchestration that you need.</li>
</ul>
<p>Pinterest has an internal deployment system, <a href="https://github.com/pinterest/teletraan">Teletraan</a>, which was designed and used for deploying services to VMs (Amazon EC2). After joining Pinterest, I learned that a major user pain point of Teletraan was the complicated configuration needed to set up a deployment environment for a user’s services. For example, in Teletraan users need to configure AMIs, AZ placement, etc. I agree that Teletraan’s UI can be improved to make the experience better, but I think this problem is a result of the complexity of VM deployment. To solve it, we could either choose to move to Kubernetes, or redesign some parts of Teletraan to have a better abstraction model, similar to what Spinnaker did. In short, reducing the complexity in the developer experience cannot be done just by fixing the UI.</p>
<p>Right now I’m no longer working on the continuous delivery platform, but I still like to think about the problems and solutions in this area: I used to work on it, built a new system from scratch when I didn’t really have much knowledge of the cloud deployment space, had many questions, and learned lessons (and got some wins too). If you happen to read this and want to continue the discussion with me on a specific topic, please feel free to drop me a line.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Read D3.js in Action]]></title>
<link href="http://euccas.github.io/blog/20210425/read-d3-dot-js-in-action.html"/>
<updated>2021-04-25T17:56:43-07:00</updated>
<id>http://euccas.github.io/blog/20210425/read-d3-dot-js-in-action</id>
<content type="html"><![CDATA[<p>Recently at work, I have been developing a few features on top of <a href="https://airflow.apache.org/">Apache Airflow</a>. Some of the features are UI heavy, and require a fair amount of data visualization using <a href="https://d3js.org/">D3.js</a>. While working on those features, I thought it could be a good chance to spend some time learning D3.js in depth, so I chose to read the book <a href="https://www.manning.com/books/d3js-in-action-second-edition"><strong>D3.js in Action</strong></a> on manning.com. Here comes a summary of this book, and some notes I took while reading.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2021/20210425-d3js-book.png" width="360" /></p>
<p>Overall, I find this book easy to read as long as you have some knowledge of JavaScript. The book clearly lays out a few key concepts in D3.js, and the examples cover a good set of common usages and tactics you need to know for building data visualization features using D3.js.</p>
<!--more-->
<p>This book has 11 chapters. Chapters 1, 2, and 3 introduce D3.js, the high-level flow and common operations of using D3.js for information visualization, and how to structure a data visualization project with D3.js.</p>
<p>In the first three chapters, alongside the basic concepts, a few tactics I find worth learning from the very beginning are:</p>
<ul>
<li>
<p><strong>Integrate scales in data binding</strong>: D3.js provides handy scale functions to normalize data values for better display. Example built-in scale functions include <code>d3.scaleLinear()</code>, <code>d3.scaleSequential()</code>, <code>d3.scaleQuantize()</code>, and so on. A D3 scale has two primary functions: <code>.domain()</code> and <code>.range()</code>, both of which expect arrays, and the arrays must have the same length to get the right results. The array in <code>.domain()</code> indicates the series of values being mapped to <code>.range()</code>.</p>
</li>
<li>
<p><strong>Enter, update, merge, and exit to update DOM elements</strong>: Understanding how to create, change, and move elements using <code>enter()</code>, <code>exit()</code>, and selections is the basis for all the complex D3 functionality. One note here is that D3 doesn’t follow the convention that when the data changes, the corresponding display is updated; you need to build that functionality yourself.</p>
</li>
<li>
<p><strong>Getting access to the actual DOM element</strong> in the selection can be accomplished in one of two ways:</p>
<ul>
<li>Using <code>this</code> in the inline functions (cannot be used with arrow functions)</li>
<li>Using the <code>.node()</code> function</li>
</ul>
</li>
</ul>
<p>Using <code>this</code></p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
</pre></td><td class="code"><pre><code class=""><span class="line">d3.select("circle").each(function(d,i) {
</span><span class="line"> console.log(d);console.log(i);console.log(this);
</span><span class="line">})</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>Using <code>.node()</code> function</p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class=""><span class="line">d3.select("circle").node();</span></code></pre></td></tr></table></div></figure></notextile></div>
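<p>To show what the <code>.domain()</code>/<code>.range()</code> mapping of a scale actually computes, here is a minimal plain-JavaScript sketch of a linear scale. This is my own illustration, not D3’s implementation; the real <code>d3.scaleLinear()</code> also supports clamping, inversion, multi-segment domains, and more:</p>

```javascript
// Minimal sketch of a linear scale for a two-element domain and range.
// Illustration only; not how D3 implements d3.scaleLinear().
function scaleLinear(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  // Map a value's relative position within the domain onto the range.
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Example: map data values 0..100 onto pixel positions 0..500.
const x = scaleLinear([0, 100], [0, 500]);
console.log(x(50)); // 250
console.log(x(25)); // 125
```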
<p>Chapters 4 through 8 introduce the methods and details of building specific types of visualization for specific types of data: chart components, layouts, complex hierarchical data visualization, network visualization, and visualizing geospatial information.</p>
<ul>
<li>One note about <strong>Layouts</strong>: D3 contains a variety of functions, referred to as layouts, that help you format your data so that it can be presented using a popular charting method. D3 layouts don’t result in charts; they result in the settings necessary for charts. Example D3 built-in layouts: <code>d3.layout.histogram()</code>, <code>d3.layout.pie()</code>, <code>d3.layout.tree()</code> etc.</li>
</ul>
<p>Chapter 9 covers how to use D3 with React. The challenge of integrating D3 with React is that React and D3 both want to control the DOM. The entire select/enter/exit/update pattern in D3 is in direct conflict with React and its virtual DOM. The way most people use D3 with React is to use React to build the structure of the application and render traditional HTML elements, and then, when it comes to the data visualization section, pass a DOM container (typically an <code>&lt;svg&gt;</code>) over to D3 and use D3 to create, destroy, and update elements.</p>
<p>Chapters 10 and 11 cover advanced usage: customizing layouts and components, and mixed-mode rendering with HTML canvas.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Learn Golang - a Mind Map]]></title>
<link href="http://euccas.github.io/blog/20181227/golang-mindmap.html"/>
<updated>2018-12-27T21:29:12-08:00</updated>
<id>http://euccas.github.io/blog/20181227/golang-mindmap</id>
<content type="html"><![CDATA[<p>Earlier this year I started to learn Golang. There were three good reasons why I wanted to learn this programming language.</p>
<ol>
<li>For work: Some of the projects my team has worked on, or plans to work on, use Golang to improve application performance.</li>
<li>For better understanding cloud infrastructure: Some of the key cloud infrastructure open-source projects, including Kubernetes and Docker, are written in Golang.</li>
<li>For easily creating multithreaded and concurrent programs.</li>
</ol>
<!--more-->
<p>One thing I did when learning Golang was creating a mind-mapping diagram. The mind map helps me organize the different topics of Golang that I need to learn, and dig into each part without getting lost in too many details. It also makes it much easier to remember information.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20181227-golang.png" width="600" /></p>
<p>If you are also learning Golang, you can take a look at the Golang mind map <a href="https://github.com/euccas/gogocode/tree/master/doc">here on my GitHub</a>. It mainly covers Golang syntax, flow control, data structures, methods, functions, interfaces, and basic concurrency. One thing it doesn’t have yet is Go Modules, which was added in Go 1.11 (released in August 2018). As the Go dev team announced, current module support is preliminary; in Go 1.12, scheduled for February 2019, they will refine module support. I will update this mind map to add Go Modules then.</p>
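<p>As a small taste of the “basic concurrency” branch of the mind map, here is a minimal goroutines-and-channels example. It is my own illustration (not taken from the mind map), adapted from the classic pattern of summing a slice in two halves concurrently:</p>

```go
package main

import "fmt"

// sum computes the total of nums and sends it to channel ch.
func sum(nums []int, ch chan int) {
	total := 0
	for _, n := range nums {
		total += n
	}
	ch <- total
}

func main() {
	nums := []int{1, 2, 3, 4, 5, 6}
	ch := make(chan int)
	// Sum the two halves of the slice in separate goroutines.
	go sum(nums[:len(nums)/2], ch)
	go sum(nums[len(nums)/2:], ch)
	a, b := <-ch, <-ch // receive both partial sums (arrival order is not guaranteed)
	fmt.Println(a + b) // 21
}
```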
<p>Lastly, 2019 is around the corner. Happy New Year!</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Reading Notes - Designing Distributed Systems]]></title>
<link href="http://euccas.github.io/blog/20180513/read-designing-distributed-systems.html"/>
<updated>2018-05-13T18:22:12-07:00</updated>
<id>http://euccas.github.io/blog/20180513/read-designing-distributed-systems</id>
<content type="html"><![CDATA[<p>Recently I read a book <a href="http://shop.oreilly.com/product/0636920072768.do">Designing Distributed Systems</a>, which is written by <a href="https://twitter.com/brendandburns">Brendan Burns</a>, and published by O’Reilly earlier this year. This book introduces the patterns and components used in the development of distributed systems.</p>
<p>Before I started to read this book, I had three questions in mind, and tried to find the answers in the book. Those three questions are:</p>
<ol>
<li>What’s the most important difference between designing distributed systems and single machine systems?</li>
<li>Why is container technology, such as Docker and Kubernetes, so popular? How could it be helpful?</li>
<li>What are the common patterns used in distributed systems design, and when shall I use them?</li>
</ol>
<p>This book does give me the answers, at least partial ones. I put my reading notes into a Google Slides, and <a href="https://docs.google.com/presentation/d/1srX9hRS9tbtrEx7T1abxbHiD1gASkvllf1_W2SjuadA/edit?usp=sharing">you can find it here to read the details</a>. A PDF version in light background color is <a href="https://github.com/euccas/euccas.github.io/blob/source/data/read-2018-design_distributed_systems_lightver.pdf">available here</a>.</p>
<p>The short answers to my questions are as in the following:</p>
<!--more-->
<h2 id="whats-the-most-important-difference-between-designing-distributed-systems-and-single-machine-systems">What’s the most important difference between designing distributed systems and single machine systems?</h2>
<ul>
<li>Distributed systems can be significantly more complicated to design, build, and debug correctly.</li>
<li>Designing distributed systems requires much more effort in designing for scalability and reliability.</li>
<li>In a distributed system, tasks and data are spread across multiple workers. This requires techniques like containers and load balancing to utilize parallelization.</li>
</ul>
<h2 id="why-container-technology-docker-kubernetes-is-so-popular-how-could-they-be-helpful">Why is container technology (Docker, Kubernetes) so popular? How could it be helpful?</h2>
<ul>
<li>Containers are not only useful for applications which have components running on multiple machines, but also for single machine applications.</li>
<li>The goal of containerization is to <strong>establish boundaries</strong> around specific resources, team ownership, separation of concerns.</li>
<li>The benefits include <strong>resource isolation</strong>, <strong>scaling teams</strong>, <strong>reusing components and modules</strong>, and <strong>breaking big problems</strong> into smaller ones (small, focused applications are easier to understand, test, update, and deploy).</li>
</ul>
<h2 id="what-are-the-common-patterns-used-in-distributed-systems-design-and-when-shall-i-use-them">What are the common patterns used in distributed systems design, and when shall I use them?</h2>
<p>The book describes three types of patterns.</p>
<ul>
<li><strong>Single node</strong> patterns: sidecar, ambassadors, adapters</li>
<li><strong>Serving</strong> patterns: sharded services, scatter/gather, FaaS, etc.</li>
<li><strong>Batch computational</strong> patterns: work queue, event-driven batch processing, coordinated batch processing, etc.</li>
</ul>
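<p>As a toy illustration of the work queue pattern (my own minimal sketch, not from the book): several workers drain a shared queue of tasks in parallel, which is the basic shape of many batch computational systems.</p>

```python
import queue
import threading

# A minimal work-queue sketch: one shared queue, several worker
# threads pulling tasks until the queue is empty.
tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return  # queue drained, worker exits
        with lock:
            results.append(item * item)  # the "work" is just squaring
        tasks.task_done()

for n in range(10):
    tasks.put(n)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

In a real distributed system, the queue would be an external service and the workers separate containers, but the division of labor is the same.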
<p>You can find a more detailed description of each design pattern in my reading notes.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Performance Profiling Tools on Windows]]></title>
<link href="http://euccas.github.io/blog/20180213/performance-profiling-tools-on-windows.html"/>
<updated>2018-02-13T17:21:11-08:00</updated>
<id>http://euccas.github.io/blog/20180213/performance-profiling-tools-on-windows</id>
<content type="html"><![CDATA[<p>Last year, I wrote a blog post about <a href="http://euccas.github.io/blog/20170827/cpu-profiling-tools-on-linux.html">CPU Profiling and the tools on Linux</a>. Today I’m going to write about a few performance profiling tools on the Windows platform. Last week I was working on profiling and analyzing a project’s build process on Windows, and along the way I experimented with a few different tools.</p>
<h1 id="performance-monitor-perfmon">Performance Monitor (perfmon)</h1>
<p><strong>Performance Monitor</strong> is a small utility provided by the Windows OS; you can start it by running the command <code>perfmon</code>. With perfmon, you can monitor real-time system performance and record performance data to files for later analysis. This tool provides some extremely useful interfaces in its GUI.</p>
<h2 id="real-time-performance">Real-time Performance</h2>
<p>To view current performance activity, you just need to click the <strong>Performance Monitor</strong> button in the left panel:</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-perfmon_0.png" width="600" /></p>
<!--more-->
<p>By default, this view has only one performance counter: <code>% CPU Processor Time</code>. You can add more counters as needed, such as the processor’s idle time, cache performance, network performance, and a lot more.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-perfmon_1.png" width="600" /></p>
<h2 id="performance-recording">Performance Recording</h2>
<p>When analyzing an application’s performance, we often need to record all the performance data and generate various reports to aid analysis. We can do this in perfmon by adding <strong>User Defined Data Collector Sets</strong> (from the menu: Action -> New -> Data Collector Set).</p>
<p>Perfmon lets you choose a template to start with, and specify the location where the performance data will be saved. The process is quite straightforward as presented in the GUI. There is only one thing you need to pay attention to: the <strong>Stop Condition</strong>. By default, a newly created Data Collector Set has its stop condition set to <strong>“Overall duration: 1 minute”</strong>, which means the performance recording will stop 1 minute after starting. If the process you are monitoring takes longer than 1 minute to finish, you definitely want to increase this “Overall duration” to a longer time.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-perfmon_2.png" width="600" /></p>
<p>With the added Data Collector Set, you can start recording before running your application, and stop recording any time you want. The recorded data will be shown in the <strong>Reports</strong> section in the left panel. The reports can also be viewed as graphs in the Performance Monitor.</p>
<p>The following is one example of a performance report displayed as a <strong>Stacked Area</strong> graph. The other graph types you can choose are <strong>Line</strong>, <strong>Histogram bar</strong>, and <strong>Area</strong>.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-perfmon_3.png" width="600" /></p>
<h1 id="windows-performance-recorder-wpr">Windows Performance Recorder (WPR)</h1>
<p><strong>Windows Performance Recorder (WPR)</strong> is a performance recording tool that is based on Event Tracing for Windows. It is available for Windows 8 or later. It records system events that you can then analyze by using Windows Performance Analyzer (WPA). This tool is included in the Windows Assessment and Deployment Kit (Windows ADK), and you can download it <a href="https://insider.windows.com/">here</a>.</p>
<h2 id="recording-with-wpr">Recording with WPR</h2>
<p>When WPR starts, it will guide you through choosing a few configurations: <strong>profiles</strong>, <strong>scenario</strong>, <strong>details level</strong>, and <strong>logging mode</strong>. You can follow the instructions <a href="https://docs.microsoft.com/en-us/windows-hardware/test/wpt/wpr-quick-start">here on Microsoft Docs</a> to decide which options fit your needs.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-wpr_0.png" width="600" /></p>
<p>Then you can start recording performance by clicking the <strong>“Start”</strong> button. The recording ends when you click the <strong>“Save”</strong> button or <strong>“Cancel”</strong> button. If “Save” is clicked, the performance data will be saved to files, and Windows Performance Analyzer (WPA) will automatically launch to show the performance reports.</p>
<h2 id="reporting-in-wpa">Reporting in WPA</h2>
<p>WPA provides detailed performance analysis data in its rich user interface. In the left <strong>“Graph Explorer”</strong>, you can choose to view performance graphs for <strong>System Activities</strong>, <strong>Computation</strong>, <strong>Storage</strong>, <strong>Memory</strong>, and <strong>Power</strong>. To see the graphs, just drag the graph to the <strong>“Analysis”</strong> tab on the right side.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-wpa_0.png" width="600" /></p>
<p>Compared to Performance Monitor (perfmon), WPA reports give you more detail and more flexibility to explore the data.</p>
<p>This graph is a process lifetime graph generated by WPA.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-wpa_1.png" width="600" /></p>
<h2 id="load-symbols-in-wpa">Load symbols in WPA</h2>
<p>WPA supports <strong>loading symbols</strong> so you can see more details of each process or command. The paths to symbols can be added either from the UI, or by setting the environment variable <code>_NT_SYMBOL_PATH</code>. Read <a href="https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-8.1-and-8/hh448137(v%3dwin.10)">this instruction</a> if you need to understand how to load symbols or configure symbol paths in WPA.</p>
<h1 id="xperf">Xperf</h1>
<p><strong>Xperf</strong> is a command-line tool for performance recording on Windows. It is also included in the Windows Assessment and Deployment Kit (Windows ADK). Starting from Windows 8, WPR became the recommended tool for performance recording, though Xperf is still supported.</p>
<p>Xperf works in a similar way to WPR. It doesn’t have a GUI, but provides about ten command-line options for controlling performance recording. The most commonly used ones are probably just <code>start</code> and <code>stop</code>.</p>
<p>You can simply start Xperf performance recording using this command:</p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class=""><span class="line">xperf -on PROC_THREAD+LOADER+Base -BufferSize 1024</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>When recording is done, the generated <code>*.etl</code> file can be opened and viewed in WPA.</p>
<h1 id="process-explorer">Process Explorer</h1>
<p>Lastly, I’d like to introduce a lightweight tool, <strong>Process Explorer</strong>, a.k.a. procexp. Process Explorer is included in Windows’ <a href="https://docs.microsoft.com/en-us/sysinternals/downloads/process-utilities"><strong>Sysinternals Process Utilities</strong></a>.</p>
<p>Process Explorer provides a CPU performance monitor. Compared to the CPU monitor in Task Manager, this one has enhanced features for monitoring the CPU utilization of each core and each thread. You can view a graph for each CPU.</p>
<p><img class="center" src="http://euccas.github.io/images/post_images/2018/20180213-procexp_0.png" width="600" /></p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Effective Jenkins Plugins (1)]]></title>
<link href="http://euccas.github.io/blog/20171203/effective-jenkins-plugins-1.html"/>
<updated>2017-12-03T16:18:31-08:00</updated>
<id>http://euccas.github.io/blog/20171203/effective-jenkins-plugins-1</id>
<content type="html"><![CDATA[<p>(<em>Update 2018-05-13</em>: Here is a translation of this post in <a href="https://dealsdaddy.co.uk/translations/jenkinsplugins/">Hindi</a>, provided by Nikol @ DealsDaddy.)</p>
<p><a href="https://jenkins.io/">Jenkins</a>, originally founded in 2006 as “Hudson”, is one of the leading automation applications supporting building, deploying, and automating software projects. One great advantage of Jenkins is that there are hundreds of plugins available, which enable various kinds of extended features needed in the Continuous Integration and Continuous Delivery process. As I just checked on the <a href="https://plugins.jenkins.io/">Jenkins Plugins page</a>, there are 873 plugins that fall into five categories: Platforms, User interface, Administration, Source code management, and Build management.</p>
<p>Effectively using Jenkins plugins makes your experience with Jenkins more productive. I’m going to occasionally write about Jenkins plugins that I have used or learned about. This first post covers some of the plugins I used when I worked on building a Continuous Delivery system last year (from 2015 to 2016).</p>
<h1 id="job-configuration-history">Job Configuration History</h1>
<p><a href="https://plugins.jenkins.io/jobConfigHistory"><em>JobConfigHistory Plugin</em></a></p>
<p>This plugin saves every change made to a job. It lets you see the history of job configurations, compare configuration differences, and restore a particular version of the config. You can also see which user made a particular change if you have configured a security policy.</p>
<!--more-->
<p>The configuration changes are saved as copies of a job’s configuration file (config.xml in the Jenkins home directory).</p>
<h1 id="dependency-graph-view">Dependency Graph View</h1>
<p><a href="https://plugins.jenkins.io/depgraph-view"><em>Dependency Graph View Plugin</em></a></p>
<p>This plugin visualizes the dependencies of multiple jobs by generating graphs via <a href="https://graphviz.gitlab.io/">graphviz</a>. You can choose to show the dependencies of the jobs in a view. To generate the graphs, graphviz must be installed on the Jenkins server.</p>
<p>This plugin is very useful when you have many jobs with dependency relationships. Visualizing the dependencies helps you easily spot possible mistakes in the dependency settings.</p>
<h1 id="build-timeout">Build Timeout</h1>
<p><a href="https://plugins.jenkins.io/build-timeout"><em>Build-timeout Plugin</em></a></p>
<p>This plugin allows you to set a runtime limit for jobs, and automatically abort a build if it takes longer than expected. In my experience, this plugin was extremely useful, as it solved the problem of builds getting stuck and not releasing Jenkins slave slots.</p>
<p>Note that this plugin isn’t applicable to pipelines.</p>
<h1 id="perforce">Perforce</h1>
<p><a href="https://wiki.jenkins.io/display/JENKINS/P4+Plugin"><em>P4 Plugin</em></a></p>
<p>This plugin manages Perforce workspaces, synchronising code, and polling/triggering builds. It also supports a few common Perforce operations such as credential authentication, changelist browsing, and labeling builds.</p>
<h1 id="jira">JIRA</h1>
<p><a href="https://plugins.jenkins.io/jira"><em>JIRA Plugin</em></a></p>
<p>This plugin integrates <a href="https://www.atlassian.com/software/jira">JIRA</a> to Jenkins. It uses JIRA REST API, and allows you to display Jenkins builds inside JIRA.</p>
<h1 id="parameterized-trigger">Parameterized Trigger</h1>
<p><a href="https://plugins.jenkins.io/parameterized-trigger"><em>Parameterized Trigger Plugin</em></a></p>
<p>This plugin lets you trigger new builds with various ways of specifying parameters for them. The parameters could be a set of predefined properties, or be based on information or results from the upstream builds.</p>
<p>As an example, you can tell a build job where to find packages it needs to install.</p>
<h1 id="log-parser">Log Parser</h1>
<p><a href="https://plugins.jenkins.io/log-parser"><em>Log Parser Plugin</em></a></p>
<p>This plugin parses the console log generated by a Jenkins build. It can highlight lines of interest in the log, like lines with <code>errors</code>, <code>warnings</code>, or <code>information</code>. It divides a log into sections, such as an <em>errors section</em>, a <em>warnings section</em>, etc. The numbers of errors and warnings are also displayed. This is useful for triaging errors in long build logs.</p>
<h1 id="email-extension">Email Extension</h1>
<p><a href="https://plugins.jenkins.io/email-ext"><em>Email Extension Plugin</em></a></p>
<p>This plugin extends the email notification functionality that Jenkins provides. You can customize when an email is sent, who should receive it, and the content of the email.</p>
<h1 id="disk-usage">Disk Usage</h1>
<p><a href="https://plugins.jenkins.io/disk-usage"><em>Disk-Usage Plugin</em></a></p>
<p>This plugin calculates the disk usage of projects and builds, and shows the disk usage information on a page. It also displays a trend chart of disk usage. It makes Jenkins job and workspace maintenance easier.</p>
<h1 id="thinbackup">ThinBackup</h1>
<p><a href="https://plugins.jenkins.io/thinBackup"><em>ThinBackup Plugin</em></a></p>
<p>This plugin backs up the global and job-specific configurations. You can see the backup history, and choose to restore a particular backup. The plugin provides options for the backup schedule, backup directory, maximum number of backup sets, etc.</p>
<h1 id="scriptlermanaged-scripts">Scriptler/Managed Scripts</h1>
<p><a href="https://wiki.jenkins.io/display/JENKINS/Scriptler+Plugin"><em>Scriptler Plugin</em></a></p>
<p>This plugin allows you to edit, store, and reuse Groovy scripts, and to execute a script on any of the slaves or nodes. However, since 2016 the distribution of this plugin has been suspended due to security issues. <em>The current version of this plugin may not be safe to use</em>.</p>
<p>An alternative choice is the <a href="https://plugins.jenkins.io/managed-scripts"><em>Managed Scripts Plugin</em></a>.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Python: Why Decorators Are Useful]]></title>
<link href="http://euccas.github.io/blog/20171129/python-why-decorators-are-useful.html"/>
<updated>2017-11-29T22:41:51-08:00</updated>
<id>http://euccas.github.io/blog/20171129/python-why-decorators-are-useful</id>
<content type="html"><![CDATA[<p>In Python, by definition <em>decorators</em> are <em>functions</em> that <em>accept a function</em> as an argument and <em>return a new function</em> as their return value. The reason decorators exist in Python, but not in other similar languages such as Ruby, is that <em>functions</em> are <em>objects</em> in Python. Functions can be assigned to variables and passed around like any other object in Python. For example, a list can have functions as its elements, and functions can take other functions as arguments. Functions can also be defined inside other functions, which forms closures.</p>
<h1 id="when-to-use-decorators">When to use decorators?</h1>
<p>It’s easy to understand what decorators are, but the real questions you may have are: Why are decorators useful? When should I use decorators in my Python program?</p>
<p>In short, I find decorator functions useful whenever you need to process or extend the inputs or outputs of a given function (or, more often, multiple functions) in some way. Here I list three usages of decorators that I can think of:</p>
<!--more-->
<h2 id="extend-the-functionality-of-your-functions">1. Extend the functionality of your functions</h2>
<p>Usually the extended functionalities are for some kind of enhancement, format changing, or temporary usage. In other words, you are adding some functionalities without touching the core logic of the original functions. A few common use cases:</p>
<ul>
<li>Convert the output of your functions to another format, like JSON, YAML, etc.</li>
<li>Add logging to your functions, or formatting the logging output</li>
<li>Measure timing of your functions</li>
<li>Count the number of times your functions are called</li>
</ul>
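<p>As a minimal sketch of the first use case (the <code>to_json</code> and <code>get_user</code> names are my own illustration, not a standard API), a decorator can convert a function’s return value into a JSON string without touching the function’s core logic:</p>

```python
import functools
import json

def to_json(func):
    """Wrap func so that its return value is serialized to a JSON string."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return json.dumps(func(*args, **kwargs))
    return wrapper

@to_json
def get_user(user_id):
    # The core logic still works with plain Python objects.
    return {"id": user_id, "name": "alice"}

print(get_user(1))  # {"id": 1, "name": "alice"}
```

The same shape works for logging, timing, or call counting: the wrapper runs before and after the original function without changing it.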
<h2 id="add-caching-process-to-your-functions-to-make-them-faster">2. Add caching process to your functions, to make them faster</h2>
<p>When you have functions that may be called many times with the same input, you can write a decorator function that keeps a cache of the inputs and outputs of a given function. This way, the function doesn’t need to re-compute everything each time, which makes repeated runs faster. This is related to the <a href="https://en.wikipedia.org/wiki/Memoization"><strong>memoization technique</strong></a>.</p>
<h2 id="handle-exceptions">3. Handle exceptions</h2>
<p>You can use decorator functions to process exceptions. One example is suppressing particular types of exceptions raised by the target function. Another is catching all exceptions raised by a function and prompting the user for how the program should proceed.</p>
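<p>A small sketch of the exception-handling use case (the <code>suppress</code> name and its behavior here are my own illustration): a decorator factory that swallows the given exception types and returns <code>None</code> instead of propagating them.</p>

```python
import functools

def suppress(*exceptions):
    """Build a decorator that swallows the given exception types,
    returning None instead of propagating them."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions:
                return None
        return wrapper
    return decorator

@suppress(ZeroDivisionError)
def divide(a, b):
    return a / b

print(divide(6, 2))  # 3.0
print(divide(1, 0))  # None (ZeroDivisionError suppressed)
```

Note that <code>suppress</code> takes arguments and returns a decorator, which is why it needs one extra level of nesting compared to a plain decorator.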
<h1 id="two-examples">Two examples</h1>
<p>Now let me use two examples to describe the <strong>syntax</strong> of decorators.</p>
<h2 id="measure-timing">Measure timing</h2>
<p>This example comes from a good answer on <a href="https://stackoverflow.com/a/490228/3109254">Stack Overflow, by user RSabet</a>.</p>
<p>The decorator function <code>time_dec</code> tells you how long it takes a function to finish.
Python has a shortened syntax for applying decorators, which allows us to wrap a function in a decorator right where we define it. This shortened syntax is the syntactic sugar <code>@decorator_function</code>.</p>
<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
<span class="line-number">4</span>
<span class="line-number">5</span>
<span class="line-number">6</span>
<span class="line-number">7</span>
<span class="line-number">8</span>
<span class="line-number">9</span>
<span class="line-number">10</span>
<span class="line-number">11</span>
<span class="line-number">12</span>
<span class="line-number">13</span>
</pre></td><td class="code"><pre><code class="python"><span class="line"><span class="k">def</span> <span class="nf">time_dec</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
</span><span class="line"> <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">arg</span><span class="p">):</span>
</span><span class="line">        <span class="n">t</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
</span><span class="line">        <span class="n">res</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">arg</span><span class="p">)</span>
</span><span class="line">        <span class="k">print</span><span class="p">(</span><span class="n">func</span><span class="o">.</span><span class="n">__name__</span><span class="p">,</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span><span class="o">-</span><span class="n">t</span><span class="p">)</span>
</span><span class="line"> <span class="k">return</span> <span class="n">res</span>
</span><span class="line">
</span><span class="line"> <span class="k">return</span> <span class="n">wrapper</span>
</span><span class="line">
</span><span class="line">
</span><span class="line"><span class="nd">@time_dec</span>
</span><span class="line"><span class="k">def</span> <span class="nf">myFunction</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
</span><span class="line"> <span class="o">...</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>Note the syntactic sugar <code>@time_dec</code> was used. It causes Python to rebind the function name <code>myFunction</code> as:</p>
<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class="python"><span class="line"><span class="n">myFunction</span> <span class="o">=</span> <span class="n">time_dec</span><span class="p">(</span><span class="n">myFunction</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>
<h2 id="memoization">Memoization</h2>
<p>This example shows how we can add caching to the calculation of prime numbers.</p>
<p>A decorator function <code>memoize</code> is used to cache the inputs and outputs of the original function <code>is_prime</code>.
The second time you call <code>is_prime</code> with the same input number, it runs much faster than the first time.</p>
<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
<span class="line-number">4</span>
<span class="line-number">5</span>
<span class="line-number">6</span>
<span class="line-number">7</span>
<span class="line-number">8</span>
<span class="line-number">9</span>
<span class="line-number">10</span>
<span class="line-number">11</span>
<span class="line-number">12</span>
<span class="line-number">13</span>
<span class="line-number">14</span>
</pre></td><td class="code"><pre><code class="python"><span class="line"><span class="k">def</span> <span class="nf">memoize</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
</span><span class="line"> <span class="n">cache</span> <span class="o">=</span> <span class="p">{}</span>
</span><span class="line"> <span class="k">def</span> <span class="nf">new_func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
</span><span class="line"> <span class="k">if</span> <span class="n">args</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">cache</span><span class="p">:</span>
</span><span class="line"> <span class="n">cache</span><span class="p">[</span><span class="n">args</span><span class="p">]</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
</span><span class="line"> <span class="k">return</span> <span class="n">cache</span><span class="p">[</span><span class="n">args</span><span class="p">]</span>
</span><span class="line"> <span class="k">return</span> <span class="n">new_func</span>
</span><span class="line">
</span><span class="line"><span class="nd">@memoize</span>
</span><span class="line"><span class="k">def</span> <span class="nf">is_prime</span><span class="p">(</span><span class="n">number</span><span class="p">):</span>
</span><span class="line"> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">number</span><span class="p">):</span>
</span><span class="line"> <span class="k">if</span> <span class="n">number</span> <span class="o">%</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span><span class="line"> <span class="k">return</span> <span class="bp">False</span>
</span><span class="line"> <span class="k">return</span> <span class="bp">True</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>
<h1 id="built-in-decorators">Built-in decorators</h1>
<p>Python also has several built-in decorators, and you might see them before you know the term decorator. The built-in decorators are mainly used to annotate methods of a class: <code>@property</code>, <code>@classmethod</code>, <code>@staticmethod</code>.</p>
<p><strong>@property</strong>: transforms a method function into a descriptor. When applied to a method, it creates extra property objects: <code>getter</code>, <code>setter</code>, and <code>deleter</code>. By using <code>@property</code>, we can access a method as if it were an attribute.</p>
<p><strong>@classmethod</strong>: transforms a method function into a class-level function.</p>
<p><strong>@staticmethod</strong>: transforms a method function into a class-level function, and neither the object instance nor the class is implicitly passed as the first argument.</p>
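<p>A short sketch showing all three built-in decorators on one class (the <code>Circle</code> class is just an illustration):</p>

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        # Accessed like an attribute: c.radius, not c.radius()
        return self._radius

    @classmethod
    def unit(cls):
        # Receives the class as the implicit first argument;
        # often used as an alternative constructor.
        return cls(1)

    @staticmethod
    def area(radius):
        # Receives neither the instance nor the class implicitly.
        return 3.14159 * radius ** 2

c = Circle.unit()
print(c.radius)               # 1
print(Circle.area(c.radius))  # 3.14159
```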
<p>As decorators are just ordinary functions and the decorator syntax is just syntactic sugar, you can easily turn any Python <a href="https://docs.python.org/3/library/functions.html">built-in function</a> into a decorator if it makes sense to use it that way.</p>
<h1 id="one-more-thing">One more thing</h1>
<p>One more thing: you may want to take a look at <a href="https://wiki.python.org/moin/PythonDecoratorLibrary">this PythonDecoratorLibrary page</a>. It collects a number of decorator examples and code snippets.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[CPU Profiling Tools on Linux]]></title>
<link href="http://euccas.github.io/blog/20170827/cpu-profiling-tools-on-linux.html"/>
<updated>2017-08-27T14:23:21-07:00</updated>
<id>http://euccas.github.io/blog/20170827/cpu-profiling-tools-on-linux</id>
<content type="html"><![CDATA[<p>Profiling is an effective method for measuring the performance of software applications. With profiling, you get fine-grained information about the components of an application, such as how often a function is called, how long a routine takes to execute, and how much time is spent in different spots in the code. With this information, you can identify the performance bottlenecks and poorly implemented parts of a software application, and find effective ways to improve them.</p>
<p>In this post I’ll write a brief summary of two profiling methods: <strong>Instrumentation</strong> and <strong>Sampling</strong>, and four CPU profiling tools on Linux: <strong>perf</strong>, <strong>gprof</strong>, <strong>Valgrind</strong> and Google’s <strong>gperftools</strong>.</p>
<h1 id="profiling-methods">Profiling Methods</h1>
<p>Different profiling methods use different ways to measure the performance of an application when it is executed. <strong>Instrumentation</strong> and <strong>Sampling</strong> are the two categories that profiling methods fall into.</p>
<!--more-->
<h2 id="instrumentation">Instrumentation</h2>
<p>The instrumentation method inserts special code at the beginning and end of each routine to record when the routine starts and when it ends. The time spent calling other routines within a routine may also be recorded. The profiling result shows the actual time taken by the routine on each call.</p>
<p>There are two types of instrumenting profiler tools: <strong>source-code modifying</strong> profilers and <strong>binary profilers</strong>. Source-code modifying profilers insert the instrumenting code in the source code, while the binary profilers insert instrumentation into an application’s executable code once it is loaded in memory.</p>
<p>The good thing about the instrumentation method is that it gives you actual times. However, the inserted instrumentation code (timer calls) takes some time itself. To reduce its impact, profilers measure the overhead incurred by the instrumentation at the start of each run, and later subtract this overhead from the measurement results. But the instrumentation can still significantly affect an application’s performance in some cases, for example when a routine is very short and frequently called, as the inserted code disturbs the way the routine executes on the CPU.</p>
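<p>To make the idea concrete, here is a toy instrumenting “profiler” in Python (my own sketch, not how real source-modifying or binary profilers are implemented): a wrapper inserts timer calls at the entry and exit of a routine, and accumulates per-routine call counts and actual elapsed time.</p>

```python
import functools
import time

PROFILE = {}  # routine name -> (call count, total elapsed seconds)

def instrument(func):
    """Insert timer calls around func, the way an instrumenting
    profiler records routine entry and exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            calls, total = PROFILE.get(func.__name__, (0, 0.0))
            PROFILE[func.__name__] = (calls + 1, total + elapsed)
    return wrapper

@instrument
def busy():
    return sum(range(10000))

for _ in range(5):
    busy()

calls, total = PROFILE["busy"]
print(calls)  # 5
```

The wrapper itself costs time on every call, which is exactly the overhead problem described above for very short, frequently called routines.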
<h2 id="sampling">Sampling</h2>
<p>The sampling method measures an application without modifying it. Sampling profilers record the executing instruction when the operating system interrupts the CPU at regular intervals to perform process switches, and correlate the recorded execution points with routines and source code during the linking process. The profiling result shows the frequency with which each routine and source line is executing during the application’s run.</p>
<p>Sampling profilers cause little overhead to the running application, and they work well on small and often-called routines. One drawback is that the estimates of time spent are statistical approximations rather than actual times. Also, sampling can only tell which routine is currently executing, not where it was called from. As a result, sampling profilers can’t report an application’s call traces.</p>
<h1 id="cpu-profiling-tools-on-linux">CPU Profiling Tools on Linux</h1>
<h2 id="perf">1. perf</h2>
<p>The <a href="https://perf.wiki.kernel.org/index.php/Main_Page"><strong>perf</strong></a> tool is provided by the Linux kernel (2.6+) for profiling CPU and software events. You can get the tool installed as follows:</p>
<ul>
<li>Ubuntu: install <em>linux-tools-common</em></li>
<li>Debian: install <em>linux-base</em></li>
<li>Arch: install <em>perf-utils</em></li>
<li>Fedora: install <em>perf</em></li>
</ul>
<p><code>perf</code> is based on the perf_events subsystem, which uses event-based sampling and CPU performance counters to profile the application. It can instrument hardware counters, static tracepoints, and dynamic tracepoints. It also provides per-task, per-CPU, and per-workload counters, sampling on top of these, and source code event annotation. It does <em>not</em> instrument the code, so it runs very fast and generates precise results.</p>
<p>You can use <code>perf</code> to profile with <code>perf record</code> and <code>perf report</code> commands:</p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
</pre></td><td class="code"><pre><code class=""><span class="line">perf record -g <app> <options>
</span><span class="line">perf report</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>The <code>perf record</code> command collects samples and generates an output file called <code>perf.data</code>. This file can then be analyzed using the <code>perf report</code> and <code>perf annotate</code> commands. The sampling frequency can be specified with the <code>-F</code> option; for example, <code>perf record -F 1000</code> means 1000 samples per second.</p>
<h2 id="gprof">2. gprof</h2>
<p>The GNU profiler <a href="https://sourceware.org/binutils/docs/gprof/"><strong>gprof</strong></a> uses a hybrid of instrumentation and sampling: instrumentation is used to collect function call information, and sampling is used to gather runtime profiling information.</p>
<p>Using <code>gprof</code> to profile your applications requires the following steps:</p>
<ol>
<li>Compile and link the application with <code>-pg</code> option</li>
<li>Execute the application to generate a profile data file (named <code>gmon.out</code> by default)</li>
<li>Run <code>gprof</code> command to analyze the profile data</li>
</ol>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
</pre></td><td class="code"><pre><code class=""><span class="line">g++ -pg myapp.cpp -o myapp
</span><span class="line">./myapp
</span><span class="line">gprof myapp gmon.out</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>The <code>gprof</code> command prints a flat profile and a call graph on standard output. The flat profile shows how much time was spent executing directly in each function. The call graph shows which functions called which others, and how much time each function used when its subroutine calls are included. You can use the supported options <a href="https://ftp.gnu.org/old-gnu/Manuals/gprof-2.9.1/html_mono/gprof.html#SEC4">listed here</a> to control <code>gprof</code> output styles, such as enabling line-by-line analysis and annotated source.</p>
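The relationship between the two reports can be stated precisely: a function’s inclusive time (call graph) is its self time (flat profile) plus the inclusive time of its callees, ignoring recursion. A sketch with hypothetical numbers:

```python
# Self time spent directly in each function, in seconds (hypothetical numbers)
self_time = {"main": 0.1, "parse": 0.4, "compute": 1.5}
# Call tree: main calls parse and compute
callees = {"main": ["parse", "compute"], "parse": [], "compute": []}

def inclusive_time(fn):
    # Inclusive time = self time + inclusive time of every callee
    return self_time[fn] + sum(inclusive_time(c) for c in callees[fn])

print(inclusive_time("main"))  # 2.0
```

In the flat profile, `compute` dominates with 1.5 s of self time; in the call graph, `main` shows 2.0 s because it includes its subroutines.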
<h2 id="valgrind-callgrind">3. Valgrind Callgrind</h2>
<p><a href="http://www.valgrind.org/"><strong>Valgrind</strong></a> is an instrumentation framework for building dynamic analysis tools. Valgrind distribution includes six production-quality tools that can detect memory issues and profile programs. <strong>Callgrind</strong>, built as an extension to <strong>Cachegrind</strong>, provides function call call-graph. A separated visualisation tool <a href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex"><strong>KCachegrind</strong></a> could also be used to visualize Callgrind’s output.</p>
<p>Valgrind is essentially a CPU emulator. The technology behind it is dynamic binary instrumentation (DBI): analysis code is added to the original code of the client program at run time. The profiling tool Callgrind is simulation based and uses Valgrind as its runtime instrumentation framework. The following two papers explain how Valgrind and Callgrind work in detail.</p>
<ul>
<li><a href="http://www.valgrind.org/docs/valgrind2007.pdf">Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation (<em>Nicholas Nethercote and Julian Seward</em>)</a></li>
<li><a href="http://www.valgrind.org/docs/callgrind2004.pdf">A Tool Suite for Simulation Based Analysis of Memory Access Behavior (<em>Josef Weidendorfer, Markus Kowarschik and Carsten Trinitis</em>)</a></li>
</ul>
<p>Profiling a program with <code>valgrind</code> takes the following steps:</p>
<ol>
<li>Build your program as usual; no special compiler or linker flags are required, though compiling with <code>-g</code> lets the annotation tools show source lines</li>
<li>Execute the program under the callgrind tool to generate a profile data file (named <code>callgrind.out.<pid></code> by default)</li>
<li>View the generated profile data with <code>callgrind_annotate</code> or <code>kcachegrind</code> tool</li>
</ol>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
</pre></td><td class="code"><pre><code class=""><span class="line">g++ -g myapp.cpp -o myapp
</span><span class="line">valgrind --tool=callgrind ./myapp
</span><span class="line">callgrind_annotate callgrind.out.<pid></span></code></pre></td></tr></table></div></figure></notextile></div>
<h2 id="gperftools">4. gperftools</h2>
<p><a href="https://github.com/gperftools/gperftools"><strong>gperftools</strong></a>, originally “Google Performance Tools”, is a collection of tools for analyzing and improving performance of multi-threaded applications. It offers a fast malloc, a thread-friendly heap-checker, a heap-profiler, and a cpu-profiler. gperftools was developed and tested on x86 Linux systems, and it works in its full generality only on those systems. Some of the libraries and functionality have been ported to other Unix systems and Windows.</p>
<p>To use the CPU profiler in gperftools, you need:</p>
<ol>
<li>Install the gperftools, following the instructions <a href="https://github.com/gperftools/gperftools">here</a></li>
<li>Include gperftools header file in your application’s source files, and compile the application</li>
<li>Link the library into an application with <code>-lprofiler</code></li>
<li>Set the environment variable <code>CPUPROFILE</code>, then run the application</li>
<li>Analyze the output with <code>pprof</code> commands</li>
</ol>
<p>Include the gperftools profiler header in your source file (after installation, it is available as <code>gperftools/profiler.h</code> on the include path):</p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class=""><span class="line">#include <gperftools/profiler.h></span></code></pre></td></tr></table></div></figure></notextile></div>
<p>Link with <code>-lprofiler</code>; the profiler library is in the <code>gperftools</code> installation directory (add the appropriate <code>-L</code> path if it is not on the default library search path):</p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class=""><span class="line">g++ -DWITHGPERFTOOLS -lprofiler -g myapp.cpp -o myapp.o</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>Set CPUPROFILE environment variable, which controls the location of profiler output data file:</p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class=""><span class="line">export CPUPROFILE=./prof.out</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>Run <code>pprof</code> commands to analyze the profiling result:</p>
<div class="bogus-wrapper"><notextile><figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
</pre></td><td class="code"><pre><code class=""><span class="line">pprof --text <app> ./prof.out # text output
</span><span class="line">pprof --gv <app> ./prof.out # graphical output, requires gv installed</span></code></pre></td></tr></table></div></figure></notextile></div>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Facebook Infrastructure: Streaming Video Engine (SVE)]]></title>
<link href="http://euccas.github.io/blog/20170627/facebook-infrastructure-streaming-video-engine-sve.html"/>
<updated>2017-06-27T22:16:02-07:00</updated>
<id>http://euccas.github.io/blog/20170627/facebook-infrastructure-streaming-video-engine-sve</id>
<content type="html"><![CDATA[<p>In last year’s <a href="https://developers.facebook.com/videos/?category=f8_2016"><strong>Facebook F8 conference</strong></a>, Sachin Kulkarni, who worked on Facebook’s video infrastructure, gave a talk (<a href="https://developers.facebook.com/videos/f8-2016/inside-look-at-facebook-media-infrastructure/">watch it here</a>) introducing the design of Facebook’s <strong>Streaming Video Engine (SVE)</strong>. I found this talk particularly interesting because it showed, in a well-structured, concise, yet informative way, how Facebook’s infrastructure team reviewed the end-to-end process to build a video system that addresses user frustrations, and how that design meets the goals of being <strong>fast</strong>, <strong>flexible</strong>, <strong>scalable</strong>, and <strong>efficient</strong>. After watching the presentation a few times, I thought it would be helpful to write down some notes here, for my own future review and for people who might be interested in Facebook’s media infrastructure.</p>
<p>Sharing on Facebook started out largely as text and quickly shifted to largely photos. Since 2014, more and more videos have been posted and shared among users. The challenge is that building a video processing system is much harder than building a text or image processing system. Videos are greedy; they will consume all your resources: CPU, memory, disk, network, and anything else.</p>
<p>Before building the Streaming Video Engine, the team started by reviewing Facebook’s existing video uploading and processing pipeline, which was slow and not scalable. They found several problems that needed change or improvement:</p>
<!--more-->
<ul>
<li>No unified clients</li>
<li>Several disk reads and writes in the critical path</li>
<li>Was doing serial processing throughout</li>
<li>Read a video as one single big file, instead of splitting it into chunks</li>
</ul>
<p>The new Streaming Video Engine (SVE) was expected to solve the aforementioned problems and to meet four design goals:</p>
<ul>
<li>Fast: make users upload their videos super fast</li>
<li>Flexible: usable for different Facebook products</li>
<li>Scalable: everything at Facebook has to scale</li>
<li>Efficient: storage efficiency, processing efficiency, and, more importantly, consuming fewer bytes of users’ data plans</li>
</ul>
<p>These four design goals, in my opinion, are also the most common goals applicable to most engineering infrastructure systems.</p>
<p>Let’s take a deep dive to see how SVE was designed to meet these goals.</p>
<h1 id="fast">Fast</h1>
<ul>
<li>The first step is to build a common library (for video uploading) that can be used by clients across platforms (web, mobile, etc.). With the common library, optimizations to video uploading apply to all platforms.</li>
<li>The uploading library has functions to split a video along GOP (Group of Pictures; a GOP is roughly one scene in the video) boundaries. So any given video can be split into segments, each of which can contain multiple GOPs.</li>
<li>The uploading process starts as soon as the client splits a video into segments. The <strong>client</strong> uploads one segment at a time to the <strong>web server</strong>.</li>
<li>Web server sends out segments to the <strong>preprocessor</strong>, which is a write-through cache.</li>
<li>The preprocessor handles:
<ul>
<li>Normalize the video segment if needed</li>
<li>Notify the <strong>scheduler</strong> that there are video segments available to be encoded</li>
<li>Write the video (segment) to the <strong>original storage</strong></li>
<li>Further split the video segment into GOPs</li>
</ul>
</li>
<li>The scheduler will find workers to encode videos. Multiple workers can be utilized, and each worker will process one or more GOPs.</li>
<li>Overlapped upload and encoding: while the preprocessor, scheduler, and workers are busy, the uploading process is still ongoing. Clients continue splitting videos into segments and uploading them to the web server.</li>
</ul>
<p><img class="center" src="http://euccas.github.io/images/post_images/2017/20170627-fb_00.png" width="600" /></p>
<p>With this design, the speedup reached 2.3x (small videos, < 3MB) to 9.3x (large videos, > 1GB).</p>
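The overlap between uploading and encoding described above is classic producer-consumer pipelining. A minimal sketch, where all segment names and delays are invented for illustration:

```python
import queue
import threading
import time

# Hypothetical GOP-aligned segments produced by the client-side splitter
segments = [f"segment-{i}" for i in range(5)]
encode_queue = queue.Queue()
encoded = []

def upload():
    # The client uploads one segment at a time; each segment becomes
    # available for encoding as soon as it arrives, instead of waiting
    # for the whole video file to finish uploading.
    for seg in segments:
        time.sleep(0.01)        # simulated network transfer
        encode_queue.put(seg)
    encode_queue.put(None)      # end-of-stream marker

def encode_worker():
    # Encoding runs concurrently with the still-ongoing upload
    while True:
        seg = encode_queue.get()
        if seg is None:
            break
        time.sleep(0.01)        # simulated encoding work
        encoded.append(seg + ".enc")

uploader = threading.Thread(target=upload)
worker = threading.Thread(target=encode_worker)
uploader.start(); worker.start()
uploader.join(); worker.join()
print(encoded)
```

With one worker the total wall time is roughly the upload time plus one segment’s encoding, rather than upload time plus the full encode time; SVE spreads the encoding step over many workers.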
<h1 id="flexible">Flexible</h1>
<ul>
<li>The key insight that allows SVE to be flexible is that all video processing pipelines can be represented as DAGs (Directed Acyclic Graphs).</li>
<li>Arbitrary dependencies can be added between the tasks in the video processing pipeline, and the added tasks can be executed in parallel while the main pipeline tasks are running.</li>
<li>SVE provides very simple API functions for the video pipeline (ideally, you can add a video processing pipeline to your product in fewer than 10 lines of code).</li>
</ul>
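As an illustration of the DAG idea (this is not SVE’s actual API, which is internal to Facebook), a pipeline’s tasks can be stored as a dependency graph and scheduled in topological order; tasks with no ordering between them, like `encode` and `thumbnail` below, could run in parallel:

```python
from collections import defaultdict, deque

# Hypothetical task graph for one video pipeline; edges point from a task
# to the tasks that depend on its output.
pipeline = {
    "split": ["encode", "thumbnail"],
    "encode": ["store"],
    "thumbnail": ["store"],
    "store": [],
}

def topological_order(graph):
    # Kahn's algorithm: repeatedly schedule tasks whose dependencies are done
    indegree = defaultdict(int)
    for downstream in graph.values():
        for task in downstream:
            indegree[task] += 1
    ready = deque(t for t in graph if indegree[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in graph[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

print(topological_order(pipeline))  # ['split', 'encode', 'thumbnail', 'store']
```

Adding a new processing step is then just adding a node and its edges, which is what makes the DAG representation flexible.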
<p><img class="center" src="http://euccas.github.io/images/post_images/2017/20170627-fb_01.png" width="600" /></p>
<h1 id="scalable">Scalable</h1>
<ul>
<li>SVE was designed to prepare for overloads, such as handling the worldwide uploading “spike” on New Year’s Eve (could be 3x video uploads).</li>
<li>Building a scalable system is relevant only when the system is <strong>robust</strong>. When the system gets overloaded, it must <strong>gracefully degrade</strong>. It cannot crash and burn.</li>
<li>Prepare for overload along two dimensions: at the pipeline level, and the task level.</li>
<li>Pipeline level, when uploads overwhelm the system:
<ul>
<li>Do not cache original videos on upload: the preprocessor stops caching original videos, so workers need to fetch videos from the original storage, not from the preprocessor. The cost here is that disk latency is added to the critical path.</li>
<li>Delay pipeline generation for incoming videos: distinguish the critical video pipeline requests from the non-critical ones, then delay the non-critical ones.</li>
<li>Reroute traffic to a different (less busy) region (Asia, Europe, US west, etc.)</li>
</ul>
</li>
<li>Task level (the tasks executed by <strong>workers</strong> in the pipeline), when too many tasks are running:
<ul>
<li>Push back non-latency-sensitive jobs</li>
<li>Turn off A/B tests, which try to figure out the best encoding for the given video</li>
<li>Add more workers (this requires making it easy to add capacity to SVE)</li>
</ul>
</li>
</ul>
<h1 id="efficient">Efficient</h1>
<ul>
<li>The high-level problem statement here is: if we could use 100% CPU, how can we make the encoded video as small as possible?</li>
<li>Find the optimal encoding settings to get the best balance between encoded video file size and time spent on encoding. The difficult part is that modern encoders can have hundreds of settings for one video, so the chance of picking the optimal combination is extremely low.</li>
<li>The adopted solution is:
<ul>
<li>Categorize each scene such as “minimal motion”, “rapid movement”, and “complex crowded scene”.</li>
<li>Build a Neural Network Model and a large training data set to train the network.</li>
<li>In SVE, video scene segments are sent to a Fingerprint generator, which generates fingerprints and sends them to the Neural Network Model.</li>
<li>The neural network figures out optimal encoding settings (could be multiple) for each scene, and sends the encoding settings to encoders.</li>
<li>The encoder takes the settings and encodes the video scenes in multiple ways, then discards the encoded videos that fall below the quality bar.</li>
</ul>
</li>
</ul>
<p><img class="center" src="http://euccas.github.io/images/post_images/2017/20170627-fb_02.png" width="600" /></p>
<p>SVE achieved 20% smaller video file sizes. This is a huge saving on users’ data plans.</p>
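The selection step at the end of that pipeline can be pictured as: encode a scene with several candidate settings, discard results below the quality bar, and keep the smallest file that remains. A sketch with made-up numbers:

```python
# (size, quality) results for one scene encoded with different candidate
# settings; all names and numbers are invented for illustration.
candidates = [
    {"settings": "A", "size_mb": 4.1, "quality": 0.95},
    {"settings": "B", "size_mb": 3.2, "quality": 0.91},
    {"settings": "C", "size_mb": 2.5, "quality": 0.78},
]
QUALITY_BAR = 0.90

# Discard encodings below the quality bar, then keep the smallest file
acceptable = [c for c in candidates if c["quality"] >= QUALITY_BAR]
best = min(acceptable, key=lambda c: c["size_mb"])
print(best["settings"])  # B
```

The hard part, as the talk explains, is choosing good candidate settings in the first place, which is what the neural network model does.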
<p>This Streaming Video Engine was designed, coded and tested in roughly 9 months. The most important learnings are:</p>
<ul>
<li>E2E view: to find an optimal solution, we need to look at the flow end to end</li>
<li>Multi-dimensional flexibility is a key for making the system most useful</li>
<li>Parallel and shadow mode testing to find correctness and scalability issues before production</li>
<li>Design the ability to handle extreme products such as 360 videos</li>
<li>Track direct measures (latency, reliability, etc.) and indirect measures (number of videos uploaded, watch times, etc.). Mapping indirect measures to direct measures can give you a good view of what you could do better next.</li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[How Instagram Moved to Python 3]]></title>
<link href="http://euccas.github.io/blog/20170616/how-instagram-moved-to-python-3.html"/>
<updated>2017-06-16T17:52:34-07:00</updated>
<id>http://euccas.github.io/blog/20170616/how-instagram-moved-to-python-3</id>
<content type="html"><![CDATA[<p>Instagram, the famous brunch sharing app, presented in <a href="https://us.pycon.org/2017/">PyCon 2017</a> and gave a talk in the keynote session on “How Instagram moved to Python 3”. If you have 15 minutes, read the interview with the speakers, Hui Ding and Lisa Guo from the Instagram Infrastructure team, <a href="https://thenewstack.io/instagram-makes-smooth-move-python-3/"><strong>here</strong></a>. If you have 45 minutes, watch their PyCon talk video, <a href="https://www.youtube.com/watch?v=66XoCk79kjM"><strong>here</strong></a>. If you have only 5 minutes, continue reading, <strong>right here</strong>.</p>
<p>Instagram’s backend, which serves over 400 million active users every day, is built on a Python/Django stack. The decision on whether to move from Python 2 to Python 3 was really a choice between investing in a version of the language that was mature but going nowhere (Python 2 is scheduled to retire in 2020) and the next version of the language, which had great and growing community support. The major motivations behind Instagram’s migration to Python 3 were:</p>
<ul>
<li><strong>Typing support</strong> for dev velocity</li>
<li>Better <strong>performance</strong> than Python 2</li>
<li><strong>Community</strong> continues to make Python 3 better and faster</li>
</ul>
<p>The whole migration process took about 10 months, in roughly 3 stages.</p>
<!--more-->
<p><img class="center" src="http://euccas.github.io/images/post_images/2017/20170616-instagram_python3_00.png" width="520" /></p>
<ul>
<li>First off, the migration was done directly on the master branch, which means developers were adding new features to the code while the migration was ongoing. So at the beginning of the migration process, the infrastructure team added Python 3 support on the master branch so the code could run in both Python 2 and Python 3 environments.</li>
<li>Massive code modification for 3 months, with the help of the Python package <a href="https://pypi.python.org/pypi/modernize"><strong>“modernize”</strong></a>. Meanwhile, third-party packages were upgraded to Python 3 (working rule: <em>no Python 3, no new package</em>), and unused, incompatible packages were deleted.</li>
<li>Intensive unit testing for 2 months. One limitation is that data compatibility issues typically do not show up in unit tests.</li>
<li>Production rollout for another 4 months (push Python 3 to every developer’s sandbox)</li>
</ul>
<p>In the talk, Lisa shared the challenges the team faced in the migration process and how they solved those problems.</p>
<ul>
<li>Differences in <strong>unicode</strong>, <strong>str</strong>, <strong>bytes</strong>. Solved by using helper functions.</li>
<li><strong>Pickle memcache data format incompatibility</strong> in Python 2 and Python 3. Solved by isolating memcaches for Python 2 and Python 3.</li>
<li><strong>Iterator</strong> differences, such as <code>map</code> returning an iterator instead of a list. Solved by converting all <code>map</code> results to lists in Python 3.</li>
<li><strong>Dictionary order</strong> differs across Python versions, which caused differences in the dumped JSON data. Solved by forcing <code>sort_keys=True</code> in the <code>json.dump</code> function.</li>
<li>With Python 3, while CPU instructions per request decreased by 12%, max requests per second (capacity) showed a 0% increase! The root cause was in the code that checks the memory configuration: a memory optimization condition was never met as <code>True</code> in Python 3 because of a unicode issue. Solved by adding a magical character <strong>“b”</strong>, just like this:</li>