Titanic

The classic Titanic data science project using Spark and AWS EMR.

Please refer to ./titanic.ipynb for the code and the wiki for the report.

Instructions: Set Up Local Development Environment

Instructions: AWS EMR

  1. Set up SSH Tunnel
  2. Configure SOCKS Proxy in Browser (using FoxyProxy)
  3. Test the proxy: http://master-public-dns-name/
  4. Access the Web Interfaces:
    • Zeppelin (Notebook): 8890
    • Spark (Cluster log): 18080
    • Ganglia (Monitoring): */ganglia/
    • Hadoop (MapReduce): 8088/cluster
  • Install Git on AWS EMR: sudo yum install git-all
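
The tunnel and web interface steps above can be sketched in Python as a small helper that builds the SSH command and the interface URLs from the master node's public DNS name. This is only an illustration under stated assumptions: the `hadoop` login user, the local SOCKS port `8157`, and the key file name `mykeypair.pem` are not given in this README, the function names are hypothetical, and Ganglia is assumed to be served as a path on the default HTTP port.

```python
import shlex

# Ports and paths as listed in the steps above.
# Assumption: Ganglia is a path on the default HTTP port of the master node.
WEB_INTERFACES = {
    "Zeppelin": ":8890/",
    "Spark": ":18080/",
    "Ganglia": "/ganglia/",
    "Hadoop": ":8088/cluster",
}


def ssh_tunnel_command(key_path, master_dns, local_port=8157, user="hadoop"):
    """Build an ssh command that opens a local SOCKS proxy to the master node.

    Assumptions (not stated in this README): the login user is "hadoop"
    and the local SOCKS port is 8157.
    """
    # -N: do not run a remote command; -D: dynamic (SOCKS) port forwarding.
    return ["ssh", "-i", key_path, "-N", "-D", str(local_port),
            f"{user}@{master_dns}"]


def web_interface_urls(master_dns):
    """Map each web interface name to its URL on the master node."""
    return {name: f"http://{master_dns}{suffix}"
            for name, suffix in WEB_INTERFACES.items()}


if __name__ == "__main__":
    # Placeholder values; replace with your own key file and DNS name.
    dns = "master-public-dns-name"
    print(shlex.join(ssh_tunnel_command("mykeypair.pem", dns)))
    for name, url in web_interface_urls(dns).items():
        print(f"{name}: {url}")
```

Running the script prints the SSH command to start the tunnel, followed by one URL per interface. The URLs only resolve in a browser once the tunnel is open and the SOCKS proxy is configured.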

Reference

Troubleshooting