  • Beginner's help!

    Greetings everyone. I'm a computer/electrical engineer with a physics Ph.D. background. I'm new to the Hadoop scene; on Fedora 17 Linux I have Apache Hadoop, Hive, Pig, Sqoop, and Spring installed. Every time I execute any of these programs from the terminal, I get the following output:
    Code:
    [eddie_nygma@localhost ~]$ hadoop
    +======================================================================+
    |      Error: JAVA_HOME is not set and Java could not be found         |
    +----------------------------------------------------------------------+
    | Please download the latest Sun JDK from the Sun Java web site        |
    |       > http://java.sun.com/javase/downloads/ <                      |
    |                                                                      |
    | Hadoop requires Java 1.6 or later.                                   |
    | NOTE: This script will find Sun Java whether you install using the   |
    |       binary or the RPM based installer.                             |
    +======================================================================+
    So after researching the problem, it turns out I need to edit my /etc/hadoop-0.20/conf/hadoop-env.sh file. I checked the config file, and here are the default settings:
    Code:
    # Set Hadoop-specific environment variables here.
    
    # The only required environment variable is JAVA_HOME.  All others are
    # optional.  When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.
    
    # The java implementation to use.  Required.
    # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    
    # Extra Java CLASSPATH elements.  Optional.
    # export HADOOP_CLASSPATH=
    
    # The maximum amount of heap to use, in MB. Default is 1000.
    # export HADOOP_HEAPSIZE=2000
    
    # Extra Java runtime options.  Empty by default.
    # export HADOOP_OPTS=-server
    
    # Command specific options appended to HADOOP_OPTS when specified
    export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
    export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
    export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
    export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
    # export HADOOP_TASKTRACKER_OPTS=
    # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
    # export HADOOP_CLIENT_OPTS
    
    # Extra ssh options.  Empty by default.
    # export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
    
    # Where log files are stored.  $HADOOP_HOME/logs by default.
    # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    
    # File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
    # export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
    
    # host:path where hadoop code should be rsync'd from.  Unset by default.
    # export HADOOP_MASTER=master:/home/$USER/src/hadoop
    
    # Seconds to sleep between slave commands.  Unset by default.  This
    # can be useful in large clusters, where, e.g., slave rsyncs can
    # otherwise arrive faster than the master can service them.
    # export HADOOP_SLAVE_SLEEP=0.1
    
    # The directory where pid files are stored. /tmp by default.
    # export HADOOP_PID_DIR=/var/hadoop/pids
    
    # A string representing this instance of hadoop. $USER by default.
    # export HADOOP_IDENT_STRING=$USER
    
    # The scheduling priority for daemon processes.  See 'man nice'.
    # export HADOOP_NICENESS=10
    So my question is, since I've never edited a config file before (I'm a newbie to Linux), which lines in this file do I need to edit, and what lines do I need to put in their place? That is, how should the default settings look in order to get Hadoop running?
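    From my reading so far, I gather the fix is to uncomment the JAVA_HOME line and point it at a real JDK. My guess is something like the lines below, but I'm assuming an OpenJDK location here (I don't have the Sun JDK installed) and I'm not sure the directory name is right for my machine, so please correct me:
    Code:
    # The java implementation to use.  Required.
    # NOTE: assumed OpenJDK location on Fedora -- the directory name on
    # your system may differ, so check what's under /usr/lib/jvm/ first.
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64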

  • #2
    Hi Riddler,

    Since this is a Spring Hadoop specific forum, you'll typically find answers here about Spring and Hadoop. Since you're looking for help getting started with Hadoop and setting up a cluster, I recommend checking out the Hadoop site/mailing lists.
    There are distros out there that make it somewhat easier to install Hadoop than the vanilla install (such as Greenplum, Cloudera, or Hortonworks), but the basics are the same.
    It will take a while to get started, as there are a lot of moving pieces, and for those unfamiliar with them it's not clear what's going on.
    As an alternative, you could try running against an already installed cluster - there are various virtual images floating around the internet, or you could try something like Amazon EMR.
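    For the JAVA_HOME error specifically, the usual first step is to confirm where Java actually lives and export that path in your shell before touching the rest of hadoop-env.sh. A rough sketch (the OpenJDK version string printed on your box will differ, so treat the example output as a placeholder):
    Code:
    # Resolve the java symlink chain to find the real JDK root.
    readlink -f $(which java)
    # prints something like /usr/lib/jvm/java-1.7.0-openjdk-<version>/jre/bin/java;
    # JAVA_HOME is the directory above jre/bin, which you can strip off directly:
    export JAVA_HOME=$(readlink -f $(which java) | sed 's|/jre/bin/java$||')

    # Sanity check that hadoop now finds Java:
    echo $JAVA_HOME
    hadoop version
    Once that works in your shell, put the same export JAVA_HOME=... line into /etc/hadoop-0.20/conf/hadoop-env.sh (replacing the commented-out example) so the Hadoop daemons see it as well.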

    Hope this helps,
