Figure 4 of the paper states that SCOPE was run in supervised mode, using known population allele frequencies from the TGP superpopulations to infer continental ancestry for all individuals in the UK Biobank.
Would it be possible to share this code? It would help to see how you ensured that the SNP ordering matches between the frequency file (from TGP) and the UKB dataset. Did you have to filter the UKB PLINK files (post-QC and LD-pruned) to contain only the SNPs that overlap with those in TGP? Also, does the frequency file passed to the -freq flag need to contain columns named CLST and MAF (as in the frequency file provided in #1), labelled exactly as such? That is, does SCOPE look for the CLST and MAF columns in supervised mode?
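To make the question concrete, here is a minimal sketch of the kind of alignment check I have in mind. It is not taken from SCOPE. It assumes a PLINK .bim file (whitespace-separated, SNP ID in the second column) and a frequency file with a header row that includes a SNP column, as in PLINK's stratified frequency output; whether SCOPE actually relies on those column names is exactly what I am unsure about. All function names are hypothetical.

```python
def read_bim_snps(lines):
    """Return SNP IDs in file order from PLINK .bim lines (ID is the 2nd column)."""
    return [ln.split()[1] for ln in lines if ln.strip()]


def read_freq_snps(lines):
    """Return SNP IDs in first-seen order from a frequency file.

    Assumption: the first non-empty line is a header containing a 'SNP' column,
    and each SNP may appear once per cluster (one row per CLST value).
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    header, body = rows[0], rows[1:]
    snp_col = header.index("SNP")
    seen, ordered = set(), []
    for row in body:
        snp = row[snp_col]
        if snp not in seen:
            seen.add(snp)
            ordered.append(snp)
    return ordered


def compare_snp_order(bim_snps, freq_snps):
    """Report SNPs missing from either file and whether the shared SNPs
    appear in the same relative order in both."""
    shared = set(bim_snps) & set(freq_snps)
    bim_shared = [s for s in bim_snps if s in shared]
    freq_shared = [s for s in freq_snps if s in shared]
    return {
        "only_in_bim": [s for s in bim_snps if s not in shared],
        "only_in_freq": [s for s in freq_snps if s not in shared],
        "same_order": bim_shared == freq_shared,
    }
```

With real data, the two readers would be fed the opened .bim and frequency files. A non-empty only_in_bim list would mean the UKB files need filtering to the overlapping SNPs before running SCOPE, which is the step I would like to confirm.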
Thanks for your consideration.