Some extraction functions (starter, namer, etc.) might need access to the page data. Most use the naive approach and just call `self.getPage(url)`, which triggers another HTTP request and parses the page a second time. This is not ideal.
Check which is the better design:
- Cache everything `getPage` does for some time? (Maybe just the current page and be done with it?)
- Rework the methods that need this data so they receive it as a parameter?
- Other?
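
To make the "just the current page" variant of the first option concrete, here is a minimal sketch. It is not the project's actual code: the `Scraper` class, the `fetch` callable (a stand-in for the real HTTP request plus parsing), the dict-shaped page, and the `namer` signature are all assumptions made for illustration.

```python
class Scraper:
    """Hypothetical scraper that remembers only the most recently fetched page."""

    def __init__(self, fetch):
        # fetch is an assumed stand-in for "HTTP request + parse"
        self._fetch = fetch
        self._cached_url = None
        self._cached_page = None

    def getPage(self, url):
        # Only hit the network when a different URL is requested
        if url != self._cached_url:
            self._cached_page = self._fetch(url)
            self._cached_url = url
        return self._cached_page

    def namer(self, image_url, page_url):
        # Extraction functions keep calling getPage, but no longer pay for it twice
        page = self.getPage(page_url)
        return "%s-%s" % (page["title"], image_url.rsplit("/", 1)[-1])


calls = []  # records every simulated fetch


def fake_fetch(url):
    # Simulated fetch: no network access, returns a dict as the "parsed page"
    calls.append(url)
    return {"title": "comic", "url": url}


scraper = Scraper(fake_fetch)
scraper.getPage("http://example.com/1")
name = scraper.namer("http://example.com/img/001.png", "http://example.com/1")
```

With this shape, existing callers of `getPage` would not need to change, which is the main argument for caching over reworking signatures. The trade-off is hidden state: a single-entry cache only helps while all extraction functions work on the same page.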