<!DOCTYPE html>
<html class="writer-html5" lang="en" data-content_root="./">
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Users Guide — SKADA: Scikit Adaptation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=b86133f3" />
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=e59714d7" />
<link rel="stylesheet" type="text/css" href="_static/graphviz.css?v=4ae1632d" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery.css?v=d2d258e8" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-binder.css?v=f4aeca0c" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-dataframe.css?v=2082cf3c" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-rendered-html.css?v=1277b6f3" />
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=5e5be09d"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="API and modules" href="all.html" />
<link rel="prev" title="How to use SKADA" href="auto_examples/plot_how_to_use_skada.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home">
SKADA
<img src="_static/skada_logo_full_white.svg" class="logo" alt="Logo"/>
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="index.html">SKADA: SciKit Adaptation</a></li>
<li class="toctree-l1"><a class="reference internal" href="auto_examples/plot_how_to_use_skada.html">How to use SKADA</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Users Guide</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#sample-domain">Sample Domain</a></li>
<li class="toctree-l2"><a class="reference internal" href="#dataset">Dataset</a></li>
<li class="toctree-l2"><a class="reference internal" href="#adapters-and-estimators">Adapters and Estimators</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#adapter">Adapter</a></li>
<li class="toctree-l3"><a class="reference internal" href="#pipeline">Pipeline</a></li>
<li class="toctree-l3"><a class="reference internal" href="#selector">Selector</a></li>
<li class="toctree-l3"><a class="reference internal" href="#test-time-domain-adaptation">Test-time Domain Adaptation</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#model-selection">Model Selection</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#scoring">Scoring</a></li>
<li class="toctree-l3"><a class="reference internal" href="#splitters">Splitters</a></li>
<li class="toctree-l3"><a class="reference internal" href="#metrics-for-da">Metrics for DA</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="all.html">API and modules</a></li>
<li class="toctree-l1"><a class="reference internal" href="auto_examples/index.html">Examples gallery</a></li>
<li class="toctree-l1"><a class="reference internal" href="releases.html">Release of SKADA</a></li>
<li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing to SKADA</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">SKADA</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active">Users Guide</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/quickstart.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="users-guide">
<h1>Users Guide<a class="headerlink" href="#users-guide" title="Link to this heading"></a></h1>
<p>The core concepts introduced with this API are the following:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> labels</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">DomainAwareDataset</span></code> API</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">Adapter</span></code> interface</p></li>
<li><p>Pipeline, <code class="docutils literal notranslate"><span class="pre">make_da_pipeline</span></code> and selectors</p></li>
<li><p>Model selection (model scoring, splitters)</p></li>
</ul>
<p>There are a few test suites available that serve as examples, specifically:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">tests/test_mapping.py</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tests/test_reweight.py</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tests/test_subspace.py</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tests/test_pipeline.py</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tests/test_scorer.py</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tests/test_cv.py</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">datasets/tests/test_samples_generator.py</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">datasets/tests/test_office.py</span></code></p></li>
</ul>
<p>To run all tests, simply execute</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pytest<span class="w"> </span>skada/<span class="w"> </span>--ignore<span class="o">=</span>skada/deep
</pre></div>
</div>
<p>A test suite for the new datasets API is on its way. A separate test suite for the new <code class="docutils literal notranslate"><span class="pre">Office31</span></code> dataset is already included (note that it takes a bit longer to run than the other tests because it has to fetch the datasets first).</p>
<section id="sample-domain">
<h2>Sample Domain<a class="headerlink" href="#sample-domain" title="Link to this heading"></a></h2>
<p>Typically, in supervised learning we deal with samples (<code class="docutils literal notranslate"><span class="pre">X</span></code>) and labels (<code class="docutils literal notranslate"><span class="pre">y</span></code>), like this:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">score</span><span class="p">(</span><span class="n">X_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
</pre></div>
</div>
<p>With domain adaptation it's a bit more complicated, as we have multiple <code class="docutils literal notranslate"><span class="pre">(X,</span> <span class="pre">y)</span></code> pairs originating from different domains. A core theme of the new API is the explicit labeling of the domain of each sample: all methods (like <code class="docutils literal notranslate"><span class="pre">fit</span></code>, <code class="docutils literal notranslate"><span class="pre">predict</span></code>, <code class="docutils literal notranslate"><span class="pre">score</span></code>, <code class="docutils literal notranslate"><span class="pre">adapt</span></code> and others) take an additional argument <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code>. Each domain is assigned an integer label. When passed in for processing, source domains are marked with positive labels and target domains with negative ones. A number of helpers are available to make working with domain labels simple and straightforward. A common use case looks like this:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">DomainAwareEstimator</span><span class="p">(</span><span class="n">CORALAdapter</span><span class="p">(),</span> <span class="n">LogisticRegression</span><span class="p">())</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain_train</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">score</span><span class="p">(</span><span class="n">X_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">,</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain_test</span><span class="p">)</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> can be omitted if a) there is a single source and a single target domain, and b) target labels are masked with the special value <code class="docutils literal notranslate"><span class="pre">-1</span></code>. In that case, <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> is derived automatically. In all other scenarios, <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> is required.</p>
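<p>The automatic derivation described above can be pictured with a small sketch. This is not skada code: the function name <code class="docutils literal notranslate"><span class="pre">derive_sample_domain</span></code> and the default labels <code class="docutils literal notranslate"><span class="pre">1</span></code> and <code class="docutils literal notranslate"><span class="pre">-1</span></code> are assumptions made for illustration, and the labels the library actually picks may differ. Only the sign convention (positive for source, negative for target) and the mask value <code class="docutils literal notranslate"><span class="pre">-1</span></code> come from the text.</p>

```python
def derive_sample_domain(y, source_label=1, target_label=-1, mask_value=-1):
    """Label masked samples as target and all other samples as source.

    The default labels are hypothetical; the only conventions taken from
    the guide are "source is positive, target is negative" and the mask
    value -1 for hidden target labels.
    """
    return [target_label if label == mask_value else source_label for label in y]


# Three labeled source samples followed by two masked target samples.
y_train = [0, 1, 1, -1, -1]
sample_domain = derive_sample_domain(y_train)
print(sample_domain)  # [1, 1, 1, -1, -1]
```

<p>With more than one source or target domain, the masked labels alone cannot tell the domains apart, which is why <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> has to be passed explicitly in those cases.</p>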
</section>
<section id="dataset">
<h2>Dataset<a class="headerlink" href="#dataset" title="Link to this heading"></a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">skada.datasets.DomainAwareDataset</span></code> class acts as a dataset container for all domains. Its API is built around two main methods: <code class="docutils literal notranslate"><span class="pre">add_domain</span></code> and <code class="docutils literal notranslate"><span class="pre">pack</span></code>.</p>
<p>The class is initially empty. Data and labels of a new domain can be added to the dataset with <code class="docutils literal notranslate"><span class="pre">add_domain</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">datasets</span> <span class="o">=</span> <span class="n">DomainAwareDataset</span><span class="p">()</span>
<span class="n">datasets</span><span class="o">.</span><span class="n">add_domain</span><span class="p">(</span><span class="n">X_subj1</span><span class="p">,</span> <span class="n">y_subj1</span><span class="p">,</span> <span class="n">domain_name</span><span class="o">=</span><span class="s2">"subj_1"</span><span class="p">)</span>
<span class="n">datasets</span><span class="o">.</span><span class="n">add_domain</span><span class="p">(</span><span class="n">X_subj3</span><span class="p">,</span> <span class="n">y_subj3</span><span class="p">,</span> <span class="n">domain_name</span><span class="o">=</span><span class="s2">"subj_3"</span><span class="p">)</span>
<span class="n">datasets</span><span class="o">.</span><span class="n">add_domain</span><span class="p">(</span><span class="n">X_subj12</span><span class="p">,</span> <span class="n">y_subj12</span><span class="p">,</span> <span class="n">domain_name</span><span class="o">=</span><span class="s2">"subj_12"</span><span class="p">)</span>
</pre></div>
</div>
<p>A domain label (int) is assigned to each domain in the order they were provided. For example, here the domain "subj_1" will have the domain label 1 and "subj_12" will have the domain label 3.</p>
<p>Once all the desired domains have been included in the dataset, the <code class="docutils literal notranslate"><span class="pre">pack</span></code> method is used to aggregate the selected domains and create the associated <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code>, depending on whether each domain is designated as <code class="docutils literal notranslate"><span class="pre">source</span></code> or as <code class="docutils literal notranslate"><span class="pre">target</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_domain</span> <span class="o">=</span> <span class="n">datasets</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_sources</span><span class="o">=</span><span class="p">[</span><span class="s1">'subj_12'</span><span class="p">,</span> <span class="s1">'subj_1'</span><span class="p">],</span> <span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'subj_3'</span><span class="p">],</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> values are generated by taking the domain labels and setting their sign, according to the convention that a source always gets a positive integer (1, 2, ...) and a target always gets a negative one (-1, -2, ...). In the previous example, the sample domain values for 'subj_12' and 'subj_3' will be 3 and -2, respectively.</p>
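<p>The sign convention can be reproduced with plain Python. The sketch below is illustrative only and does not call skada: the helper <code class="docutils literal notranslate"><span class="pre">signed_sample_domain</span></code>, the per-domain sample counts, and the ordering of samples in the output are assumptions, while the domain labels follow the insertion order of the <code class="docutils literal notranslate"><span class="pre">add_domain</span></code> example.</p>

```python
def signed_sample_domain(domain_labels, sizes, as_sources, as_targets):
    """Build a per-sample domain vector: +label for sources, -label for targets."""
    sample_domain = []
    for name in as_sources:
        sample_domain.extend([domain_labels[name]] * sizes[name])
    for name in as_targets:
        sample_domain.extend([-domain_labels[name]] * sizes[name])
    return sample_domain


# Labels follow insertion order: subj_1, then subj_3, then subj_12.
domain_labels = {"subj_1": 1, "subj_3": 2, "subj_12": 3}
# Hypothetical number of samples per domain, for illustration only.
sizes = {"subj_1": 2, "subj_3": 3, "subj_12": 1}

sample_domain = signed_sample_domain(
    domain_labels, sizes, as_sources=["subj_12", "subj_1"], as_targets=["subj_3"]
)
print(sample_domain)  # [3, 1, 1, -2, -2, -2]
```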
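<p><code class="docutils literal notranslate"><span class="pre">mask_target_labels</span></code> is a mandatory parameter of the <code class="docutils literal notranslate"><span class="pre">pack</span></code> method.
With <code class="docutils literal notranslate"><span class="pre">mask_target_labels</span></code> set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, the labels <code class="docutils literal notranslate"><span class="pre">y</span></code> of the target domains are masked (set to -1 for classification and to nan for regression), which enables unsupervised domain adaptation.
With <code class="docutils literal notranslate"><span class="pre">mask_target_labels</span></code> set to <code class="docutils literal notranslate"><span class="pre">False</span></code>, labels are returned for all domains, which is useful for supervised evaluation or analysis.</p>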
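<p>As a rough picture of what masking does, the following sketch hides the labels of every sample whose <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> value is negative. The helper <code class="docutils literal notranslate"><span class="pre">mask_labels</span></code> and its <code class="docutils literal notranslate"><span class="pre">task</span></code> argument are hypothetical and not part of skada; only the mask values (-1 and nan) are taken from the description above.</p>

```python
import math


def mask_labels(y, sample_domain, task="classification"):
    """Hide the labels of target samples (negative sample_domain values)."""
    mask_value = -1 if task == "classification" else math.nan
    return [
        mask_value if 0 > domain else label
        for label, domain in zip(y, sample_domain)
    ]


sample_domain = [1, 1, -2, -2]

# Classification: target labels become -1.
y_class = mask_labels([0, 1, 2, 1], sample_domain)
print(y_class)  # [0, 1, -1, -1]

# Regression: target labels become nan.
y_reg = mask_labels([0.5, 1.5, 2.5, 3.5], sample_domain, task="regression")
```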
<p>Working with an estimator under the new API looks like the following:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">office31</span> <span class="o">=</span> <span class="n">fetch_office31_surf_all</span><span class="p">()</span>
<span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">sample_domain</span> <span class="o">=</span> <span class="n">office31</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_sources</span><span class="o">=</span><span class="p">[</span><span class="s1">'amazon'</span><span class="p">,</span> <span class="s1">'dslr'</span><span class="p">],</span> <span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'webcam'</span><span class="p">],</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">estimator</span> <span class="o">=</span> <span class="n">make_da_pipeline</span><span class="p">(</span><span class="n">CORALAdapter</span><span class="p">(),</span><span class="n">LogisticRegression</span><span class="p">())</span>
<span class="n">estimator</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain</span><span class="p">)</span>
<span class="c1"># predict and score on target domain</span>
<span class="n">X_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">,</span> <span class="n">sample_domain</span> <span class="o">=</span> <span class="n">office31</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'webcam'</span><span class="p">],</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="n">webcam_idx</span> <span class="o">=</span> <span class="n">office31</span><span class="o">.</span><span class="n">select_domain</span><span class="p">(</span><span class="n">sample_domain</span><span class="p">,</span> <span class="s1">'webcam'</span><span class="p">)</span>
<span class="n">y_target</span> <span class="o">=</span> <span class="n">estimator</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X_test</span><span class="p">[</span><span class="n">webcam_idx</span><span class="p">],</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain</span><span class="p">[</span><span class="n">webcam_idx</span><span class="p">])</span>
<span class="n">score</span> <span class="o">=</span> <span class="n">estimator</span><span class="o">.</span><span class="n">score</span><span class="p">(</span><span class="n">X_test</span><span class="p">[</span><span class="n">webcam_idx</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">y_test</span><span class="p">[</span><span class="n">webcam_idx</span><span class="p">],</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain</span><span class="p">[</span><span class="n">webcam_idx</span><span class="p">])</span>
<span class="c1"># pick multiple domains</span>
<span class="n">source_idx</span> <span class="o">=</span> <span class="n">office31</span><span class="o">.</span><span class="n">select_domain</span><span class="p">(</span><span class="n">sample_domain</span><span class="p">,</span> <span class="p">(</span><span class="s1">'amazon'</span><span class="p">,</span> <span class="s1">'dslr'</span><span class="p">))</span>
<span class="c1"># or using markers from `DomainAware*` API (see description below)</span>
<span class="n">target_idx</span> <span class="o">=</span> <span class="n">office31</span><span class="o">.</span><span class="n">select_domain</span><span class="p">(</span><span class="n">sample_domain</span><span class="p">,</span> <span class="n">DomainAwareEstimator</span><span class="o">.</span><span class="n">INCLUDE_ALL_TARGETS</span><span class="p">)</span>
<span class="c1"># generic helper to simplify flow when the dataset is created "on the fly"</span>
<span class="kn">from</span><span class="w"> </span><span class="nn">skada.datasets</span><span class="w"> </span><span class="kn">import</span> <span class="n">select_domain</span>
<span class="n">source_idx</span> <span class="o">=</span> <span class="n">select_domain</span><span class="p">(</span><span class="n">office31</span><span class="o">.</span><span class="n">domain_names</span><span class="p">,</span> <span class="n">sample_domain</span><span class="p">,</span> <span class="p">(</span><span class="s1">'amazon'</span><span class="p">,</span> <span class="s1">'dslr'</span><span class="p">))</span>
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">pack</span></code> method is also compatible with fetchers, like:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">office31</span> <span class="o">=</span> <span class="n">fetch_office31_surf_all</span><span class="p">()</span>
<span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_domain</span> <span class="o">=</span> <span class="n">office31</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_sources</span><span class="o">=</span><span class="p">[</span><span class="s1">'amazon'</span><span class="p">,</span> <span class="s1">'dslr'</span><span class="p">],</span> <span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'webcam'</span><span class="p">],</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">pack</span></code> has an optional <code class="docutils literal notranslate"><span class="pre">return_X_y</span></code> argument (defaults to <code class="docutils literal notranslate"><span class="pre">True</span></code>). When this argument is set to <code class="docutils literal notranslate"><span class="pre">False</span></code>, the method returns a <code class="docutils literal notranslate"><span class="pre">Bunch</span></code> object with the following set of keys:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">office31</span> <span class="o">=</span> <span class="n">fetch_office31_surf_all</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="n">office31</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_sources</span><span class="o">=</span><span class="p">[</span><span class="s1">'amazon'</span><span class="p">,</span> <span class="s1">'dslr'</span><span class="p">],</span> <span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'webcam'</span><span class="p">],</span> <span class="n">return_X_y</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">data</span><span class="o">.</span><span class="n">keys</span><span class="p">()</span>
<span class="go">dict_keys(['X', 'y', 'sample_domain', 'domain_names'])</span>
</pre></div>
</div>
<p>This mostly covers use cases where you need access to the <code class="docutils literal notranslate"><span class="pre">'domain_names'</span></code> labels. Since labels are assigned in the order in which domains are provided, it is straightforward to reconstruct them even when working with the tuple output (without access to the <code class="docutils literal notranslate"><span class="pre">Bunch</span></code> object). The absolute value of the label is fixed for a given domain name: for example, if the "amazon" domain gets index 2, it appears in <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> as 2 when included as a source and as -2 when included as a target. This convention keeps domain labels stable; without it, a multi-estimator API would not be possible.</p>
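<p>Because the absolute value identifies the domain and the sign identifies its role, a <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> vector can be decoded back into names. The sketch below assumes that <code class="docutils literal notranslate"><span class="pre">domain_names</span></code> is a mapping from domain name to integer label; both that shape and the helper <code class="docutils literal notranslate"><span class="pre">describe_sample_domain</span></code> are assumptions for illustration, and the label values are made up.</p>

```python
def describe_sample_domain(sample_domain, domain_names):
    """Map each signed label back to a (domain name, role) pair."""
    label_to_name = {label: name for name, label in domain_names.items()}
    return [
        (label_to_name[abs(value)], "source" if value > 0 else "target")
        for value in sample_domain
    ]


# Hypothetical mapping; real labels depend on the order domains were added.
domain_names = {"amazon": 1, "dslr": 2, "webcam": 3}
decoded = describe_sample_domain([1, 2, -3], domain_names)
print(decoded)
# [('amazon', 'source'), ('dslr', 'source'), ('webcam', 'target')]
```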
</section>
<section id="adapters-and-estimators">
<h2>Adapters and Estimators<a class="headerlink" href="#adapters-and-estimators" title="Link to this heading"></a></h2>
<section id="adapter">
<h3>Adapter<a class="headerlink" href="#adapter" title="Link to this heading"></a></h3>
<p>The next building block of the domain adaptation API is the "Adapter" (see <code class="docutils literal notranslate"><span class="pre">skada.base.BaseAdapter</span></code> for details). The job of an adapter is to transform source and target samples (and, possibly, labels or weights) into the output space in which the estimator is going to be defined. An adapter is defined by providing <code class="docutils literal notranslate"><span class="pre">fit</span></code> and <code class="docutils literal notranslate"><span class="pre">adapt</span></code> methods; the closest analogy is <code class="docutils literal notranslate"><span class="pre">sklearn</span></code> transformers, and the typical workflow is similar.</p>
<p>The following adapters have been moved to the new API:</p>
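<p>To make the <code class="docutils literal notranslate"><span class="pre">fit</span></code> / <code class="docutils literal notranslate"><span class="pre">adapt</span></code> workflow concrete, here is a toy adapter written in plain Python. <code class="docutils literal notranslate"><span class="pre">MeanAlignAdapter</span></code> is a hypothetical class: it does not derive from <code class="docutils literal notranslate"><span class="pre">BaseAdapter</span></code>, works on one-dimensional data only, and simply shifts source samples so that their mean matches the target mean. The method signatures are an assumption modeled on the calls shown in this guide.</p>

```python
class MeanAlignAdapter:
    """Toy adapter: shift source samples so their mean matches the target mean.

    Not a skada class and not derived from BaseAdapter; it only mirrors the
    fit/adapt workflow on one-dimensional data.
    """

    def fit(self, X, y=None, sample_domain=None):
        # Positive labels mark source samples, negative labels mark targets.
        source = [x for x, d in zip(X, sample_domain) if d > 0]
        target = [x for x, d in zip(X, sample_domain) if 0 > d]
        self.shift_ = sum(target) / len(target) - sum(source) / len(source)
        return self

    def adapt(self, X, sample_domain=None):
        # Only source samples are moved; target samples are left untouched.
        return [x + self.shift_ if d > 0 else x for x, d in zip(X, sample_domain)]


X = [1.0, 3.0, 10.0, 12.0]
sample_domain = [1, 1, -2, -2]

adapter = MeanAlignAdapter().fit(X, sample_domain=sample_domain)
X_adapted = adapter.adapt(X, sample_domain=sample_domain)
print(X_adapted)  # [10.0, 12.0, 10.0, 12.0]
```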
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">ClassRegularizerOTMappingAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">CORALAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">EntropicOTMappingAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">LinearOTMappingAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">OTMappingAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">DiscriminatorReweightAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">GaussianReweightAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">KLIEPReweightAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">DensityReweightAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">SubspaceAlignmentAdapter</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">TransferComponentAnalysisAdapter</span></code></p></li>
</ul>
</section>
<section id="pipeline">
<h3>Pipeline<a class="headerlink" href="#pipeline" title="Link to this heading"></a></h3>
<p>You can create a domain-aware estimator as a pipeline that combines an adapter of your choice (to perform the transformation) with an estimator:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">skada</span><span class="w"> </span><span class="kn">import</span> <span class="n">make_da_pipeline</span>
<span class="n">estimator</span> <span class="o">=</span> <span class="n">make_da_pipeline</span><span class="p">(</span>
<span class="n">CORALAdapter</span><span class="p">(),</span>
<span class="n">LogisticRegression</span><span class="p">()</span>
<span class="p">)</span>
<span class="n">estimator</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain</span><span class="p">)</span>
</pre></div>
</div>
<p>The helper function <code class="docutils literal notranslate"><span class="pre">make_da_pipeline</span></code> creates a built-in <code class="docutils literal notranslate"><span class="pre">sklearn.pipeline.Pipeline</span></code> meta-estimator, which exposes all estimator-related calls (like <code class="docutils literal notranslate"><span class="pre">fit</span></code> and <code class="docutils literal notranslate"><span class="pre">predict</span></code>). It also defines additional methods based on the functionality provided by the base estimator (like <code class="docutils literal notranslate"><span class="pre">predict_proba</span></code> or <code class="docutils literal notranslate"><span class="pre">score</span></code>), and has a special method <code class="docutils literal notranslate"><span class="pre">adapt</span></code> that performs the transformation without passing the result into the estimator.</p>
<p>Feel free to stack more transformers as necessary:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">estimator</span> <span class="o">=</span> <span class="n">make_da_pipeline</span><span class="p">(</span>
<span class="n">StandardScaler</span><span class="p">(),</span>
<span class="n">PCA</span><span class="p">(),</span>
<span class="n">CORALAdapter</span><span class="p">(),</span>
<span class="n">LogisticRegression</span><span class="p">()</span>
<span class="p">)</span>
<span class="n">estimator</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain</span><span class="p">)</span>
</pre></div>
</div>
</section>
<section id="selector">
<h3>Selector<a class="headerlink" href="#selector" title="Link to this heading"></a></h3>
<p><code class="docutils literal notranslate"><span class="pre">Shared</span></code> is the simplest selector: it always returns the same entity (note that <code class="docutils literal notranslate"><span class="pre">BaseAdapter</span></code> is also an sklearn estimator, for additional convenience). Also note that a single adapter and/or estimator still works on multiple domains by concatenating them. Other selectors available:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">PerDomainSelector</span></code> (single base adapter/estimator, cloned and fitted for each domain)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">SourceTargetSelector</span></code> (one adapter/estimator for all sources, one for all targets)</p></li>
</ul>
<p>As of now, there are no adapters that would be reasonable to split per domain. When they are ready, the usage would look as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">estimator</span> <span class="o">=</span> <span class="n">make_da_pipeline</span><span class="p">(</span>
<span class="n">OTMappingAdapter</span><span class="p">(),</span>
<span class="n">PerDomain</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
<span class="p">)</span>
<span class="n">estimator</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">sample_domain</span><span class="o">=</span><span class="n">sample_domain</span><span class="p">)</span>
</pre></div>
</div>
<p>If your scenario fits neither, the low-level API is available (see the section below).</p>
</section>
<section id="test-time-domain-adaptation">
<h3>Test-time Domain Adaptation<a class="headerlink" href="#test-time-domain-adaptation" title="Link to this heading"></a></h3>
<p>When working with multiple domains, <code class="docutils literal notranslate"><span class="pre">predict</span></code> only respects domains that were seen during fitting. For test-time domain adaptation (when a new adapter or estimator is fit at test time), the <code class="docutils literal notranslate"><span class="pre">update</span></code> and <code class="docutils literal notranslate"><span class="pre">update_predict</span></code> methods are available. They work the same way as <code class="docutils literal notranslate"><span class="pre">fit</span></code> and <code class="docutils literal notranslate"><span class="pre">fit_predict</span></code>, the only difference being that they accept new, previously unseen domains.</p>
</section>
</section>
<section id="model-selection">
<h2>Model Selection<a class="headerlink" href="#model-selection" title="Link to this heading"></a></h2>
<p>The implementation is largely compatible with scikit-learn's model selection tools, such as <code class="docutils literal notranslate"><span class="pre">cross_validate</span></code> and <code class="docutils literal notranslate"><span class="pre">GridSearchCV</span></code>. When using these tools, pass <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> in the <code class="docutils literal notranslate"><span class="pre">params</span></code> dictionary given to the respective method. For practical usage examples, refer to the tests in <code class="docutils literal notranslate"><span class="pre">skada/tests/test_cv.py</span></code>, which show how to integrate the library's splitters with scikit-learn's model selection framework.</p>
<section id="scoring">
<h3>Scoring<a class="headerlink" href="#scoring" title="Link to this heading"></a></h3>
<p>The library ships a few scorers for domain adaptation models. The following scorers are plug-and-play compatible:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">ImportanceWeightedScorer</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">PredictionEntropyScorer</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">SoftNeighborhoodDensity</span></code></p></li>
</ul>
<p>See API usage examples in <code class="docutils literal notranslate"><span class="pre">skada/tests/test_scorer.py</span></code>.</p>
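<p>To give an intuition for what an unsupervised scorer such as <code class="docutils literal notranslate"><span class="pre">PredictionEntropyScorer</span></code> can measure, here is a minimal pure-Python sketch of the underlying quantity: the average Shannon entropy of predicted class probabilities on unlabeled target samples. The function name and the probability values are hypothetical, and the scorer shipped with the library may differ in reduction and sign convention.</p>

```python
import math


def mean_prediction_entropy(probabilities):
    """Average Shannon entropy (in nats) of per-sample class probabilities."""
    total = 0.0
    for row in probabilities:
        # Skip zero entries: p * log(p) tends to 0 as p -> 0.
        total += -sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(probabilities)


# Hypothetical predicted probabilities on unlabeled target samples.
confident = [[0.99, 0.01], [0.02, 0.98], [1.0, 0.0]]
uncertain = [[0.5, 0.5], [0.6, 0.4], [0.45, 0.55]]

confident_entropy = mean_prediction_entropy(confident)
uncertain_entropy = mean_prediction_entropy(uncertain)

# Lower entropy means the model is more decisive on the target samples.
print(confident_entropy < uncertain_entropy)
```

<p>No target labels are needed for this quantity, which is why such scorers can be used when target labels are masked.</p>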
<p>The <code class="docutils literal notranslate"><span class="pre">SupervisedScorer</span></code> requires special consideration. Because it needs access to target labels, which are masked during the dataset packing process for training, an additional key, <code class="docutils literal notranslate"><span class="pre">target_labels</span></code>, must be passed within the <code class="docutils literal notranslate"><span class="pre">params</span></code>. The usage is as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_domain</span> <span class="o">=</span> <span class="n">da_dataset</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_sources</span><span class="o">=</span><span class="p">[</span><span class="s1">'s'</span><span class="p">],</span> <span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'t'</span><span class="p">],</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">estimator</span> <span class="o">=</span> <span class="n">make_da_pipeline</span><span class="p">(</span>
<span class="n">DensityReweightAdapter</span><span class="p">(),</span>
<span class="n">LogisticRegression</span><span class="p">()</span><span class="o">.</span><span class="n">set_score_request</span><span class="p">(</span><span class="n">sample_weight</span><span class="o">=</span><span class="kc">True</span><span class="p">),</span>
<span class="p">)</span>
<span class="n">cv</span> <span class="o">=</span> <span class="n">ShuffleSplit</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">_</span><span class="p">,</span> <span class="n">target_labels</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">da_dataset</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_sources</span><span class="o">=</span><span class="p">[</span><span class="s1">'s'</span><span class="p">],</span> <span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'t'</span><span class="p">],</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="n">scoring</span> <span class="o">=</span> <span class="n">SupervisedScorer</span><span class="p">()</span>
<span class="n">scores</span> <span class="o">=</span> <span class="n">cross_validate</span><span class="p">(</span>
<span class="n">estimator</span><span class="p">,</span>
<span class="n">X</span><span class="p">,</span>
<span class="n">y</span><span class="p">,</span>
<span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="p">{</span><span class="s1">'sample_domain'</span><span class="p">:</span> <span class="n">sample_domain</span><span class="p">,</span> <span class="s1">'target_labels'</span><span class="p">:</span> <span class="n">target_labels</span><span class="p">},</span>
<span class="n">scoring</span><span class="o">=</span><span class="n">scoring</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
</div>
<p>The code fails if the validation uses <code class="docutils literal notranslate"><span class="pre">SupervisedScorer</span></code> but <code class="docutils literal notranslate"><span class="pre">target_labels</span></code> are not provided.</p>
</section>
<section id="splitters">
<h3>Splitters<a class="headerlink" href="#splitters" title="Link to this heading"></a></h3>
<p>The library includes a range of splitters designed specifically for domain adaptation scenarios.</p>
<p><code class="docutils literal notranslate"><span class="pre">skada.model_selection.SourceTargetShuffleSplit</span></code>: This splitter functions similarly to the standard <code class="docutils literal notranslate"><span class="pre">ShuffleSplit</span></code> but takes into account the distinct separation between source and target domains. It follows the standard API structure:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_domain</span> <span class="o">=</span> <span class="n">da_dataset</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="n">as_sources</span><span class="o">=</span><span class="p">[</span><span class="s1">'s'</span><span class="p">,</span> <span class="s1">'s2'</span><span class="p">],</span> <span class="n">as_targets</span><span class="o">=</span><span class="p">[</span><span class="s1">'t'</span><span class="p">,</span> <span class="s1">'t2'</span><span class="p">],</span> <span class="n">mask_target_labels</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">pipe</span> <span class="o">=</span> <span class="n">make_da_pipeline</span><span class="p">(</span>
<span class="n">SubspaceAlignmentAdapter</span><span class="p">(</span><span class="n">n_components</span><span class="o">=</span><span class="mi">2</span><span class="p">),</span>
<span class="n">LogisticRegression</span><span class="p">(),</span>
<span class="p">)</span>
<span class="n">n_splits</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">cv</span> <span class="o">=</span> <span class="n">SourceTargetShuffleSplit</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="n">n_splits</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">scores</span> <span class="o">=</span> <span class="n">cross_validate</span><span class="p">(</span>
<span class="n">pipe</span><span class="p">,</span>
<span class="n">X</span><span class="p">,</span>
<span class="n">y</span><span class="p">,</span>
<span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="p">{</span><span class="s1">'sample_domain'</span><span class="p">:</span> <span class="n">sample_domain</span><span class="p">},</span>
<span class="n">scoring</span><span class="o">=</span><span class="n">PredictionEntropyScorer</span><span class="p">(),</span>
<span class="p">)</span>
</pre></div>
</div>
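<p>The idea of shuffling sources and targets separately can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the library's implementation: the function name is hypothetical, and the sketch assumes that positive <code class="docutils literal notranslate"><span class="pre">sample_domain</span></code> labels mark source samples and negative labels mark target samples.</p>

```python
import random


def source_target_shuffle_split(sample_domain, n_splits=2, test_size=0.3, seed=0):
    """Yield (train, test) index lists, shuffling sources and targets separately."""
    rng = random.Random(seed)
    # Assumed convention: positive labels are sources, negative labels are targets.
    source_idx = [i for i, d in enumerate(sample_domain) if d > 0]
    target_idx = [i for i, d in enumerate(sample_domain) if d < 0]
    for _ in range(n_splits):
        train, test = [], []
        for group in (source_idx, target_idx):
            shuffled = group[:]
            rng.shuffle(shuffled)
            # Hold out the same fraction of each group.
            n_test = max(1, round(test_size * len(shuffled)))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


# Two source domains (10 samples in total) and one target domain (10 samples).
sample_domain = [1] * 6 + [2] * 4 + [-3] * 10
splits = list(source_target_shuffle_split(sample_domain, n_splits=2))
for train, test in splits:
    print(len(train), len(test))
```

<p>Because each group is split on its own, every train and test fold contains both source and target samples in the requested proportion.</p>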
<p><code class="docutils literal notranslate"><span class="pre">skada.model_selection.LeaveOneDomainOut</span></code> is a cross-validator that, in each iteration, randomly selects a single domain to serve as the target. After this selection, the train/test split is performed using the <code class="docutils literal notranslate"><span class="pre">ShuffleSplit</span></code> algorithm. The <code class="docutils literal notranslate"><span class="pre">max_n_splits</span></code> parameter limits the number of splits; in its absence, each domain is used as a target exactly once.</p>
<p>This splitter requires the dataset to be specially prepared so that each domain is represented as both a source and a target simultaneously. This preparation can be achieved using the <code class="docutils literal notranslate"><span class="pre">pack_lodo</span></code> method. An example is provided below for clarity:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_domain</span> <span class="o">=</span> <span class="n">da_dataset</span><span class="o">.</span><span class="n">pack_lodo</span><span class="p">()</span>
<span class="n">pipe</span> <span class="o">=</span> <span class="n">make_da_pipeline</span><span class="p">(</span>
<span class="n">SubspaceAlignmentAdapter</span><span class="p">(</span><span class="n">n_components</span><span class="o">=</span><span class="mi">2</span><span class="p">),</span>
<span class="n">LogisticRegression</span><span class="p">(),</span>
<span class="p">)</span>
<span class="n">max_n_splits</span> <span class="o">=</span> <span class="mi">4</span>  <span class="c1"># illustrative value</span>
<span class="n">cv</span> <span class="o">=</span> <span class="n">LeaveOneDomainOut</span><span class="p">(</span><span class="n">max_n_splits</span><span class="o">=</span><span class="n">max_n_splits</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">scores</span> <span class="o">=</span> <span class="n">cross_validate</span><span class="p">(</span>
<span class="n">pipe</span><span class="p">,</span>
<span class="n">X</span><span class="p">,</span>
<span class="n">y</span><span class="p">,</span>
<span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="p">{</span><span class="s1">'sample_domain'</span><span class="p">:</span> <span class="n">sample_domain</span><span class="p">},</span>
<span class="n">scoring</span><span class="o">=</span><span class="n">PredictionEntropyScorer</span><span class="p">(),</span>
<span class="p">)</span>
</pre></div>
</div>
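<p>The leave-one-domain-out scheme described above can be sketched in pure Python. This is a simplified illustration, not the library's code: the function name is hypothetical, and the sketch assumes that the packed data lists every sample twice, with label <code class="docutils literal notranslate"><span class="pre">+d</span></code> when domain <code class="docutils literal notranslate"><span class="pre">d</span></code> acts as a source and <code class="docutils literal notranslate"><span class="pre">-d</span></code> when it acts as a target.</p>

```python
import random


def leave_one_domain_out(sample_domain, max_n_splits=None, test_size=0.3, seed=0):
    """Yield (target, train, test) for a simplified leave-one-domain-out scheme."""
    rng = random.Random(seed)
    # Assumed encoding: +d marks domain d used as a source, -d as a target.
    domains = sorted({abs(d) for d in sample_domain})
    rng.shuffle(domains)  # visit candidate target domains in random order
    if max_n_splits is not None:
        domains = domains[:max_n_splits]
    for target in domains:
        # Keep the source copies of the other domains and the target copy of this one.
        selected = [
            i for i, d in enumerate(sample_domain)
            if (d > 0 and d != target) or d == -target
        ]
        # Shuffle-split the selected samples into train and test indices.
        rng.shuffle(selected)
        n_test = max(1, round(test_size * len(selected)))
        yield target, sorted(selected[n_test:]), sorted(selected[:n_test])


# Three domains of four samples each, every sample listed as source and as target.
sample_domain = [1] * 4 + [2] * 4 + [3] * 4 + [-1] * 4 + [-2] * 4 + [-3] * 4
lodo_splits = list(leave_one_domain_out(sample_domain))
for target, train, test in lodo_splits:
    print(target, len(train), len(test))
```

<p>Without <code class="docutils literal notranslate"><span class="pre">max_n_splits</span></code>, the sketch uses each domain as the target exactly once, and the source copy of the held-out domain never appears in that split.</p>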
<p>More examples demonstrating the usage of splitters and scorers can be found in
the <code class="docutils literal notranslate"><span class="pre">skada/tests/test_cv.py</span></code> test suite.</p>
</section>
<section id="metrics-for-da">
<h3>Metrics for DA<a class="headerlink" href="#metrics-for-da" title="Link to this heading"></a></h3>
<p>To evaluate an estimator or to select the best parameters for it, it is
necessary to define a score. In <a class="reference external" href="https://scikit-learn.org/">sklearn</a>,
several functions and objects can make use of the scoring API like
<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score">cross_val_score</a>
or
<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV">GridSearchCV</a>.
To avoid overfitting, these methods split the initial data into a training
set and a test set. The training set is used to fit the estimator and the
test set is used to compute the score.</p>
<p>In domain adaptation (DA) problems, source data and target data have a
shift in their distributions.</p>
<p>Let's load a DA dataset:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>>>> from skada.datasets import make_shifted_datasets
>>> from skada import EntropicOTmapping
>>> from skada.metrics import TargetAccuracyScorer
>>> RANDOM_SEED = 0
>>> X, y, X_target, y_target = make_shifted_datasets(
... n_samples_source=30,
... n_samples_target=20,
... shift="covariate_shift",
... label="binary",
... noise=0.4,
... random_state=RANDOM_SEED,
... )
</pre></div>
</div>
<p>Now let's define a DA estimator to evaluate on this data:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>>>> from skada import DensityReweight
>>> from sklearn.linear_model import LogisticRegression
>>> base_estimator = LogisticRegression()
>>> estimator = DensityReweight(base_estimator=base_estimator)
</pre></div>
</div>
<p>Because the distributions of the two domains differ, a score computed
only on source samples (as illustrated in the image below) is likely not
to reflect the score on the target.</p>
<p><img alt="Source Only Scorer" src="_images/source_only_scorer.png" width="400" height="240" /></p>
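<p>A toy calculation makes the gap concrete. The sketch below uses made-up data and a deliberately imperfect classifier (all names are hypothetical): the labeling rule is identical in both domains, only the feature distribution changes. It also shows the idea behind importance weighting, where source samples are reweighted by the ratio of target to source frequencies; this illustrates the principle only and is not the library's <code class="docutils literal notranslate"><span class="pre">ImportanceWeightedScorer</span></code>.</p>

```python
from collections import Counter


def true_label(x):
    return int(x >= 1)


def predict(x):
    # Deliberately imperfect classifier: wrong whenever x == 1.
    return int(x >= 2)


def accuracy(xs, weights=None):
    """Optionally weighted fraction of correctly classified samples."""
    weights = weights or [1.0] * len(xs)
    correct = sum(w for x, w in zip(xs, weights) if predict(x) == true_label(x))
    return correct / sum(weights)


# Same labeling rule in both domains, different feature distributions.
source_x = [0] * 45 + [1] * 10 + [2] * 45
target_x = [0] * 20 + [1] * 60 + [2] * 20

source_acc = accuracy(source_x)
target_acc = accuracy(target_x)

# Importance weights w(x) = p_target(x) / p_source(x), from empirical frequencies.
p_source, p_target = Counter(source_x), Counter(target_x)
weights = [
    (p_target[x] / len(target_x)) / (p_source[x] / len(source_x)) for x in source_x
]
weighted_source_acc = accuracy(source_x, weights)

print(source_acc, target_acc, round(weighted_source_acc, 3))
```

<p>The source-only score is far more optimistic than the target score because the classifier's errors fall in a region that is rare in the source and common in the target. Reweighting the source samples recovers the target score in this idealized case, where the density ratio is known exactly.</p>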
<p>To evaluate the estimator on the source data, one can use:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.model_selection import ShuffleSplit
>>> cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
>>> cross_val_score(
... estimator,
... X,
... y,
... cv=cv,
... fit_params={'X_target': X_target},
... scoring=None,
... )
array([0.72222222, 0.83333333, 0.81944444])
</pre></div>
</div>
<p>skada offers a way to run the evaluation on the target data while
reusing the scikit-learn methods and scoring API.</p>
<p>Several scorers are available. To start, we use
<code class="docutils literal notranslate"><span class="pre">skada.metrics.SupervisedScorer</span></code>, which computes the score on the
target domain:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>>>> from skada.metrics import SupervisedScorer
>>> cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
>>> cross_val_score(
... estimator,
... X,
... y,
... cv=cv,
... fit_params={'X_target': X_target},
... scoring=SupervisedScorer(X_target, y_target),
... )
array([0.975 , 0.95625, 0.95625])
</pre></div>
</div>
</section>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="auto_examples/plot_how_to_use_skada.html" class="btn btn-neutral float-left" title="How to use SKADA" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="all.html" class="btn btn-neutral float-right" title="API and modules" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright 2023, The SKADA team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" data-toggle="rst-versions" role="note"
aria-label="versions">
<!-- add shift_up to the class for force viewing ,
data-toggle="rst-current-version" -->
<span class="rst-current-version" style="margin-bottom:1mm;">
<span class="fa fa-book"> SKADA: SciKit-ADAptation</span> 0.5.0
<hr style="margin-bottom:1.5mm;margin-top:5mm;">
<!-- versions
<span class="fa fa-caret-down"></span>-->
<span class="rst-current-version" style="display: inline-block;padding:
0px;color:#fcfcfcab;float:left;font-size: 100%;">
Versions:
<a href="https://scikit-adaptation.github.io/"
style="padding: 3px;color:#fcfcfc;font-size: 100%;">Release</a>
<a href="https://scikit-adaptation.github.io/dev"
style="padding: 3px;color:#fcfcfc;font-size: 100%;">Development</a>
<a href="https://github.com/scikit-adaptation/skada"
style="padding: 3px;color:#fcfcfc;font-size: 100%;">Code</a>
</span>
</span>
</div><script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>