One of the hurdles to quick development in Apache Spark is having to set up a working cluster to test on. And even if you do have a working cluster, how do you test your code when you’re on a train with an intermittent internet connection?
You could install Spark locally, along with Java, Python and all the other bits, hopefully with the right versions, and hopefully without clashing with the versions you already have. Hang on, this is getting messy.
If only there was a quick way to spin up a local cluster that didn’t interfere with the stuff already on your machine. Let’s do this in Docker.
Docker lets you run lightweight, isolated containers on your desktop. The container images already have the software you need installed, so all you need to do is tell Docker which network ports to open and how to connect the containers together.
The following docker-compose.yml file sets up all the networking and mounts folders from your local drive into the containers, so the cluster can read your data and scripts while you keep editing them locally.
```yaml
version: "2"
services:
  spark-master:
    image: timvw74/spark
    # Start the standalone master, bound to the hostname "spark-master"
    command: bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
    hostname: spark-master
    environment:
      MASTER: spark://spark-master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: 127.0.0.1
    expose:
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7006
      - 7077
      - 6066
    ports:
      - "4040:4040"   # application (driver) UI
      - "6066:6066"   # REST submission server
      - "7077:7077"   # master RPC endpoint
      - "8080:8080"   # master web UI
    volumes:
      - ./conf/spark-master:/conf
      - ./data:/tmp/data   # local ./data folder, shared with the worker

  spark-worker-1:
    image: timvw74/spark
    # Start a worker and register it with the master
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    hostname: spark-worker-1
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: 127.0.0.1
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
    links:
      - spark-master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 7016
      - 8881
    ports:
      - "8081:8081"   # worker web UI
    volumes:
      - ./conf/spark-worker-1:/conf
      - ./data:/tmp/data
```
Just copy the docker-compose.yml file into a folder and launch the containers by running ‘docker-compose up’ (or ‘docker compose up’ with newer versions of Docker). This downloads the images if they haven’t been downloaded yet, sets up the network and the folder mounts, then starts the containers.
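Once the containers are starting, it can help to wait until the worker has actually registered with the master before submitting anything. The sketch below polls the master's web UI, which the compose file publishes on localhost:8080. It assumes the master serves its cluster state as JSON at /json/ and that each entry in the "workers" list carries a "state" field set to "ALIVE"; these field names are an assumption about the Spark version inside the image, and `wait_for_cluster` and `alive_workers` are hypothetical helper names.

```python
import json
import time
import urllib.request

# Assumption: host port 8080 maps to the master web UI, which serves JSON at /json/
MASTER_JSON_URL = "http://localhost:8080/json/"


def alive_workers(state: dict) -> int:
    """Count the workers the master reports as ALIVE (assumed JSON layout)."""
    return sum(1 for w in state.get("workers", []) if w.get("state") == "ALIVE")


def wait_for_cluster(expected_workers: int = 1, timeout: float = 60.0,
                     url: str = MASTER_JSON_URL) -> bool:
    """Poll the master until `expected_workers` are ALIVE.

    Returns True as soon as enough workers are registered, or False if
    `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                state = json.load(resp)
            if alive_workers(state) >= expected_workers:
                return True
        except (OSError, ValueError):
            # Master not reachable yet, or the response was not valid JSON
            pass
        time.sleep(2)
    return False
```

With the containers running, calling `wait_for_cluster()` should return True once spark-worker-1 has registered. You can also check by eye: the master UI at http://localhost:8080 lists the registered workers.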