To install Hadoop:
Edit conf/hadoop-env.sh and define at least JAVA_HOME to be the root of your Java installation:
$ vim conf/hadoop-env.sh
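For example, on an Ubuntu machine with OpenJDK the line might look like the following (the exact path is an assumption; point it at your own Java installation):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk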
$ bin/hadoop    # run it with no arguments to check that it prints its usage

Run the standalone example:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop*examples*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Install ssh and rsync:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Set up passphraseless ssh
Check that you can ssh to localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
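If ssh still prompts for a passphrase after this, overly permissive key-file permissions are a likely cause; on many systems tightening them is enough:
$ chmod 0600 ~/.ssh/authorized_keys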
Format a new distributed filesystem:
$ bin/hadoop namenode -format
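By default in this Hadoop generation the filesystem data and working files live under /tmp/hadoop-<username> (controlled by the hadoop.tmp.dir property), which is where the /tmp/hadoop-morteza path mentioned below comes from.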
Use the following conf/hadoop-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
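Newer releases in this Hadoop line split hadoop-site.xml into core-site.xml, hdfs-site.xml, and mapred-site.xml; if your conf/ directory has those files instead, the same property/value pairs apply (fs.default.name in core-site.xml, dfs.replication in hdfs-site.xml, mapred.job.tracker in mapred-site.xml). A minimal core-site.xml, for example, would contain:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
</configuration>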
Start the Hadoop daemons:
$ bin/start-all.sh
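Once the daemons are running, the web interfaces should be reachable on the default ports for this Hadoop generation (assumed here; check your release): the NameNode at http://localhost:50070/ and the JobTracker at http://localhost:50030/.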
When you're done, stop the daemons with:
$ bin/stop-all.sh
Run example
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Initially HDFS contains only /tmp/hadoop-morteza/mapred/system/jobtracker.info; the command above creates /user/morteza/input and copies everything from the local conf directory into it.
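A quick way to check what ended up in HDFS is to list your HDFS home directory and the new input directory:
$ bin/hadoop fs -ls
$ bin/hadoop fs -ls input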
Run some of the examples provided:
$ bin/hadoop jar hadoop*examples*.jar grep input output 'dfs[a-z.]+'
Examine the output files: copy them from the distributed filesystem to the local filesystem and look at them there:
$ bin/hadoop fs -get output output
Or view the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
Some commands
$ bin/hadoop dfs -ls /
$ bin/hadoop dfsadmin -report    # report HDFS capacity, used/free space, and datanode status
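If you rerun the grep example, note that the job will refuse to start while the output directory already exists in HDFS; remove it first with the recursive delete available in this Hadoop generation:
$ bin/hadoop fs -rmr output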