Monday, January 31, 2011

Hadoop Hive in EC2 example

While researching Hadoop and Hive (for data warehouse purposes), I found that:
- Cloudera has some AMIs for EC2 that include Hadoop and Hive. I was able to use one of them easily and run some basic tests (Hadoop was already installed on the instance, and all I had to do was run "apt-get install hive"). The AMI was for an x86 server, with AMI ID ami-ed59bf84.
- Sqoop is a tool for transferring data to and from an RDBMS (MySQL or Postgres, for example).
- Here is a project for tracking trends (using data from Wikipedia) that uses Hadoop, Hive and EC2. I will give it a try soon.
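
To make the Sqoop item above a bit more concrete, here is a minimal sketch of how a Sqoop import from MySQL into Hive could be put together. It is not something I ran on the AMI: the helper name build_sqoop_import, the host db.example.com, the database sales, the user etl_user and the table orders are all made up for illustration. The script only builds and prints the command line, it does not execute Sqoop.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, hive_import=True):
    """Return the argv list for a `sqoop import` of a single table.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,    # JDBC URL of the source database
        "--username", username,
        "-P",                     # ask for the password interactively
        "--table", table,
    ]
    if hive_import:
        # Load the imported data into a Hive table as well
        argv.append("--hive-import")
    return argv


if __name__ == "__main__":
    # All connection details below are placeholders (assumptions)
    cmd = build_sqoop_import(
        "jdbc:mysql://db.example.com/sales", "etl_user", "orders"
    )
    print(shlex.join(cmd))
```

On an instance where Sqoop and the MySQL JDBC driver are installed, the printed line could be pasted into a shell. Whether it works unchanged depends on the Sqoop version that ships with the AMI.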

I'm trying to run some tests with these technologies. It's not easy, because documentation is hard to find. If you are doing the same, please comment on this post and let me know how you are progressing.