Add data field for "taxa is present in Australia" #31

This is probably a Pipelines process that compares the taxon of each record against a list of all taxa known to occur in Australia.

The list of known Australian taxa should be derived from the ALA Biocache data, using a [filter for `country:Australia`](https://biocache-ws.ala.org.au/ws/occurrences/search?q=country%3AAustralia&qualityProfile=ALA) (uses AUS EEC layer).

CSV download:

~https://biocache.ala.org.au/occurrences/facets/download?q=*%3A*&qualityProfile=ALA&facets=taxon_name~
~https://biocache-ws.ala.org.au/ws/occurrences/facets/download?q=*%3A*&qualityProfile=ALA&facets=taxon_name~
~https://biocache-ws.ala.org.au/ws/occurrences/facets/download?q=country:Australia&qualityProfile=AVH&facets=taxonConceptID&count=true&file=AU_all_taxa_tc_counts.csv~

[https://biocache-ws.ala.org.au/ws/occurrences/facets/download?q=country:Australia&fq=taxonRankID:[6000 TO 7000]&qualityProfile=AVH&facets=scientificName,taxonConceptID&count=true&file=AU_all_taxa_counts](https://biocache-ws.ala.org.au/ws/occurrences/facets/download?q=country:Australia&fq=taxonRankID:%5B6000%20TO%207000%5D&qualityProfile=AVH&facets=scientificName,taxonConceptID&count=true&file=AU_all_taxa_counts)
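As a rough illustration of how the downloaded CSV could be turned into a lookup, here is a minimal Python sketch. It assumes the file has a header row with `scientificName`, `taxonConceptID` and `count` columns, which has not been confirmed against the actual download. The function name `load_australian_taxa` and the sample rows are made up for the example.

```python
import csv
import io


def load_australian_taxa(csv_text: str) -> dict[str, int]:
    """Map taxonConceptID -> occurrence count from a facet-download CSV.

    Assumption: the CSV has a header row with 'taxonConceptID' and 'count'
    columns. Adjust the column names if the real download differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    taxa: dict[str, int] = {}
    for row in reader:
        taxon_id = (row.get("taxonConceptID") or "").strip()
        if not taxon_id:
            continue  # skip rows that carry no taxon identifier
        count_text = (row.get("count") or "").strip()
        taxa[taxon_id] = int(count_text or 0)
    return taxa


# Made-up sample rows standing in for the real download.
sample = (
    "scientificName,taxonConceptID,count\n"
    "Example species A,https://id.example.org/taxon/1,120\n"
    "Example species B,https://id.example.org/taxon/2,3\n"
    "Unmatched name,,7\n"
)
australian_taxa = load_australian_taxa(sample)
```

Keying on `taxonConceptID` rather than the name avoids ambiguity between homonyms, but that is a design choice for the sketch, not something decided in this issue.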

Generating a list of taxa for that query using SOLR or biocache-service is difficult: the result set is huge and the API times out before it completes.

~~One option is to use SOLR with deep pagination using `cursors`. 
Another is to run the query on Pipelines via Spark and save the result in S3. This seems to be the safest and most reliable option.~~ Use the CSV download (above) to get the data into Pipelines. The existing `species-list` pipeline would be a good starting point in the code: it accesses the ALA list API to pull down key-value (KV) data and populates Avro files using the taxon as the primary key.

The new data needs a field name, something like `presentInCountry:Australia`. There may be an existing term for this, so it needs some research.
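To make the intended outcome concrete, the following Python sketch flags records whose taxon appears in the Australian list. It is only an illustration: the real implementation would live in Pipelines, the record shape (a dict with a `taxonConceptID` key) is assumed, and `presentInCountry` is the placeholder field name suggested above, not an agreed term.

```python
from typing import Iterable, Iterator

# Placeholder field name taken from the suggestion above; not an agreed term.
FIELD_NAME = "presentInCountry"


def flag_records(
    records: Iterable[dict], australian_taxa: set[str]
) -> Iterator[dict]:
    """Yield copies of records, adding the field when the taxon is listed.

    Assumption: each record is a dict with an optional 'taxonConceptID' key.
    Records whose taxon is not in the list are passed through unchanged.
    """
    for record in records:
        flagged = dict(record)  # copy so the input is not mutated
        if record.get("taxonConceptID") in australian_taxa:
            flagged[FIELD_NAME] = "Australia"
        yield flagged


# Made-up identifiers for the example.
known_taxa = {"taxon-1", "taxon-2"}
records = [
    {"id": "r1", "taxonConceptID": "taxon-1"},
    {"id": "r2", "taxonConceptID": "taxon-9"},
    {"id": "r3"},
]
flagged_records = list(flag_records(records, known_taxa))
```

Storing the country name as the value, rather than a boolean, mirrors the `presentInCountry:Australia` form and would leave room for other countries later. A boolean field would work equally well if only Australia is ever needed.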

