I’m in the process of deploying a scraper on a DigitalOcean instance. The scraper uses RSelenium with the PhantomJS browser. It worked flawlessly on my local machine, but on the remote instance it broke with the following error:
Selenium message:Java heap space
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.OutOfMemoryError
Further Details: run errorDetails method
Execution halted
Clearly a Java memory issue.
Since the Selenium server is launched from within R, I did not have direct access to the java command line options. However, setting an environment variable that raises the maximum heap size (here to 1 GB) resolved the problem.
$ export _JAVA_OPTIONS="-Xmx1g"
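If you want to check that the setting actually reaches the processes launched from the shell, something like the following rough sketch might help. It assumes a POSIX shell and that R is started from the same session; the `inherited` variable is just an illustrative name, and the `java` check only runs if a JVM happens to be on the PATH.

```shell
# Set the option for this shell session and everything launched from it
# (including the R process that starts the Selenium server).
export _JAVA_OPTIONS="-Xmx1g"

# Confirm that a child process inherits the variable.
inherited=$(sh -c 'printf "%s" "$_JAVA_OPTIONS"')
echo "child process sees: $inherited"

# If Java is available, the JVM normally reports the options it picked up on stderr.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | grep "Picked up _JAVA_OPTIONS" || echo "no pickup message found"
fi
```

Note that `export` only lasts for the current session, so the variable has to be set again (or placed in a shell startup file) before the scraper is launched from a fresh login.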
The scraper is now chugging along happily and I’m moving on with my day.