I am trying to learn web scraping with the Apify SDK. Its basic example loads two modules and then runs a short scraping script:
module 1) apify
module 2) request-promise
The Module Require Debugger succeeds in loading these modules via the Global Leaks pattern:
Apify = require('apify@0.17.0/build/index.js').catch(() => window["_events"])
requestPromise = require('request-promise@4.2.5/lib/rp.js').catch(() => window["Bluebird"])
However, when I plug in the basic example scraping script, it fails with the error: TypeError: Cannot read property 'main' of undefined.
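One possible reading of this error is that Apify itself is undefined by the time the script calls Apify.main. The sketch below reproduces that situation under stated assumptions: it assumes the require promise rejected and the .catch fallback returned a global that was never set. fakeRequire, fakeWindow, loadWithFallback and describeLoaded are hypothetical stand-ins written for illustration; they are not part of Observable or Apify.

```javascript
// Hypothetical stand-in for a require that fails, e.g. because the bundle's
// nested require() calls cannot be resolved in the browser.
const fakeRequire = (name) =>
  Promise.reject(new Error(`unable to load ${name}`));

// Hypothetical stand-in for window, with no leaked "_events" global on it.
const fakeWindow = {};

// Mirrors the pattern used above: try require, fall back to a global.
function loadWithFallback(name, globalName) {
  return fakeRequire(name).catch(() => fakeWindow[globalName]);
}

// Small diagnostic: report what the load actually resolved to.
function describeLoaded(label, mod) {
  if (mod === undefined) {
    return `${label}: undefined (the fallback global was never set)`;
  }
  return `${label}: loaded, typeof main = ${typeof mod.main}`;
}

async function demo() {
  const Apify = await loadWithFallback('apify@0.17.0/build/index.js', '_events');
  console.log(describeLoaded('Apify', Apify));
  try {
    // Throws if Apify is undefined, which is the case in this sketch.
    Apify.main(async () => {});
    return 'ok';
  } catch (err) {
    return err.name;
  }
}

demo().then((result) => console.log(result));
```

If a check like describeLoaded reports undefined in the real notebook, the fallback global is probably not the right one (or was never created), and the failure is in loading rather than in the scraping script.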
The console shows that the apify module attempts further require calls of its own, but that they fail, presumably because they are written for Node + npm and need an alternative require pattern in Observable?
Specifically, one of the two places where the console reports a failure is while loading this index.js (apparently from within the apify module loaded via npm):
"use strict";
var _events = _interopRequireDefault(require("events"));
var _log = _interopRequireDefault(require("apify-shared/log"));
var _consts = require("apify-shared/consts");
var _actor = require("./actor");
var _autoscaled_pool = _interopRequireDefault(require("./autoscaling/autoscaled_pool"));
var _basic_crawler = _interopRequireDefault(require("./crawlers/basic_crawler"));
var _cheerio_crawler = _interopRequireDefault(require("./crawlers/cheerio_crawler"));
var _dataset = require("./dataset");
var _events2 = _interopRequireWildcard(require("./events"));
var _key_value_store = require("./key_value_store");
var _puppeteer = require("./puppeteer");
var _puppeteer_crawler = _interopRequireDefault(require("./crawlers/puppeteer_crawler"));
var _puppeteer_pool = _interopRequireDefault(require("./puppeteer_pool"));
var _request = _interopRequireDefault(require("./request"));
var _request_list = require("./request_list");
var _request_queue = require("./request_queue");
var _settings_rotator = _interopRequireDefault(require("./settings_rotator"));
var _utils = require("./utils");
var _puppeteer_utils = require("./puppeteer_utils");
var _utils_social = require("./utils_social");
var _enqueue_links = require("./enqueue_links/enqueue_links");
var _pseudo_url = _interopRequireDefault(require("./pseudo_url"));
var _live_view_server = _interopRequireDefault(require("./live_view/live_view_server"));
var _utils_request = require("./utils_request");
var _session_pool = require("./session_pool/session_pool");
var _session = require("./session_pool/session");
... [I cut the rest for space considerations]
So what to do? Is it possible to get around this by manually requiring each of the failed nested modules? Or is this not advisable?
I’d greatly appreciate any insights.
Here’s my attempt at reproducing the example:
Thank you!
Other References Consulted: