hadoop: MapReduce Programming Based on Hadoop (Part 1)

Published: Thursday, January 8, 2009

This is a translation of a foreign article about Hadoop MapReduce. The article is fairly long, so this post covers the first part.

Translator: pconlin900

Blog: http://pconline900.javaeye.com

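Hadoop is an open-source map-reduce framework from Apache. MapReduce is a parallel computing model for processing massive data sets. The idea behind the model comes from Jeffrey Dean and Sanjay Ghemawat at Google, and it consists of two main functions: map and reduce.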

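This is a very simple MapReduce application example, similar to what Hadoop does. It applies the basic idea of MapReduce and can help in understanding Hadoop's processing approach and techniques, but note that it does not use the Hadoop framework.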

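The example creates some strings, counts the characters in each string, and finally adds the counts together to get the total number of characters.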
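
As a compact illustration of that computation, here is a minimal sketch using Java streams. It assumes Java 8 or later, and the class and method names (CharacterCountSketch, makeValues, totalCharacters) are hypothetical; they are not part of the original article, whose listings use explicit threads instead.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class and method names, used only for this sketch.
class CharacterCountSketch {
    // Build n test strings: the blog URL followed by the numbers 1..n.
    static List<String> makeValues(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(i -> "http://pconline900.javaeye.com" + i)
                .collect(Collectors.toList());
    }

    static int totalCharacters(List<String> values) {
        return values.stream()
                .map(String::length)       // map: one character count per string
                .reduce(0, Integer::sum);  // reduce: add the counts together
    }

    public static void main(String[] args) {
        System.out.println(totalCharacters(makeValues(30))); // prints 951
    }
}
```

The base URL is 30 characters long, so the 9 one-digit strings contribute 31 characters each and the 21 two-digit strings contribute 32 each, which gives 951.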

Listing 1. Main program

public class Main
{
    public static void main(String[] args)
    {
        MyMapReduce my = new MyMapReduce();
        my.init();
    }
}
Listing 2. MyMapReduce.java
import java.util.*;

public class MyMapReduce
{
    List<List<String>> buckets = new ArrayList<List<String>>();
    List<String> intermediateresults = new ArrayList<String>();
    List<String> values = new ArrayList<String>();

    public void init()
    {
        // Create 30 test strings; a real application would process far more data.
        for (int i = 1; i <= 30; i++)
        {
            values.add("http://pconline900.javaeye.com" + Integer.toString(i));
        }

        System.out.println("**STEP 1 START**-> Running Conversion into Buckets**");
        System.out.println();
        List<List<String>> b = step1ConvertIntoBuckets(values, 5);
        System.out.println("************STEP 1 COMPLETE*************");
        System.out.println();
        System.out.println();

        System.out.println("**STEP 2 START**->Running **Map Function** concurrently for all Buckets");
        System.out.println();
        List<String> res = step2RunMapFunctionForAllBuckets(b);
        System.out.println("************STEP 2 COMPLETE*************");
        System.out.println();
        System.out.println();

        System.out.println("**STEP 3 START**->Running **Reduce Function** for collating Intermediate Results and Printing Results");
        System.out.println();
        step3RunReduceFunctionForAllBuckets(res);
        System.out.println("************STEP 3 COMPLETE*************");
        System.out.println("************Translated by pconline900*************");
        System.out.println("***********Blog: http://pconline900.javaeye.com*************");
    }

    public List<List<String>> step1ConvertIntoBuckets(List<String> list, int numberofbuckets)
    {
        int n = list.size();
        int m = n / numberofbuckets;
        int rem = n % numberofbuckets;
        int count = 0;
        System.out.println("BUCKETS");
        for (int j = 1; j <= numberofbuckets; j++)
        {
            List<String> temp = new ArrayList<String>();
            for (int i = 1; i <= m; i++)
            {
                temp.add(list.get(count));
                count++;
            }
            buckets.add(temp);
        }
        // Any leftover elements go into one extra bucket.
        if (rem != 0)
        {
            List<String> temp = new ArrayList<String>();
            for (int i = 1; i <= rem; i++)
            {
                temp.add(list.get(count));
                count++;
            }
            buckets.add(temp);
        }
        System.out.println();
        System.out.println(buckets);
        System.out.println();
        return buckets;
    }

    public List<String> step2RunMapFunctionForAllBuckets(List<List<String>> list)
    {
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < list.size(); i++)
        {
            Thread t = new StartThread(list.get(i));
            threads.add(t);
            t.start();
        }
        try
        {
            // Wait for every map thread to finish.
            // (The original slept for 1000 ms, which does not guarantee completion.)
            for (Thread t : threads)
            {
                t.join();
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return intermediateresults;
    }

    public void step3RunReduceFunctionForAllBuckets(List<String> list)
    {
        int sum = 0;
        for (int i = 0; i < list.size(); i++)
        {
            //you can do some processing here, like finding max of all results etc
            int t = Integer.parseInt(list.get(i));
            sum += t;
        }
        System.out.println();
        System.out.println("Total Count is " + sum);
        System.out.println();
    }

    class StartThread extends Thread
    {
        private List<String> tempList = new ArrayList<String>();

        public StartThread(List<String> list)
        {
            tempList = list;
        }

        public void run()
        {
            for (int i = 0; i < tempList.size(); i++)
            {
                String str = tempList.get(i);
                // Lock the shared list; synchronized(this) would lock only this
                // thread object and would not protect the list from other threads.
                synchronized (intermediateresults)
                {
                    intermediateresults.add(Integer.toString(str.length()));
                }
            }
        }
    }
}

The init method creates some test data. A real application would be processing massive amounts of data.

The step1ConvertIntoBuckets method splits the test data into 5 buckets. Each bucket is an ArrayList (containing 6 String items). A bucket can be kept in memory, on disk, or on other nodes in a cluster.
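
The bucketing step can be sketched on its own as follows. This is a minimal illustration, not the article's code: the class name BucketSketch and the method toBuckets are hypothetical, and the sketch keeps every bucket in memory. It follows the same rule as the listing: equal-sized buckets, plus one extra bucket for any leftover items.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, shown only to illustrate the bucketing rule.
class BucketSketch {
    static <T> List<List<T>> toBuckets(List<T> items, int numberOfBuckets) {
        List<List<T>> buckets = new ArrayList<>();
        int size = items.size() / numberOfBuckets; // items per regular bucket
        int index = 0;
        for (int b = 0; b < numberOfBuckets; b++) {
            // Copy the slice so each bucket is an independent list.
            buckets.add(new ArrayList<>(items.subList(index, index + size)));
            index += size;
        }
        if (index < items.size()) {
            // Leftover items (when the division is not exact) form one extra bucket.
            buckets.add(new ArrayList<>(items.subList(index, items.size())));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 30; i++) {
            items.add(i);
        }
        List<List<Integer>> buckets = toBuckets(items, 5);
        System.out.println(buckets.size() + " buckets, first has " + buckets.get(0).size() + " items");
    }
}
```

With 30 items and 5 buckets the division is exact, so there are 5 buckets of 6 items and no extra bucket; with 32 items there would be a sixth bucket holding the last 2.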

The step2RunMapFunctionForAllBuckets method creates 5 threads (one per bucket). Each StartThread thread processes one bucket and puts its results into the intermediateresults list.
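
One alternative way to run the map step, shown here only as a sketch, is a thread pool from java.util.concurrent in place of the raw Thread subclass in the listing. Each task returns its own partial result, so no shared list needs locking. The names MapStepSketch and mapAllBuckets are hypothetical, and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class, shown only to illustrate a concurrent map step.
class MapStepSketch {
    static List<Integer> mapAllBuckets(List<List<String>> buckets)
            throws InterruptedException, ExecutionException {
        // One worker per bucket, as in the article (at least one to keep the pool valid).
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, buckets.size()));
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (final List<String> bucket : buckets) {
                Callable<List<Integer>> task = () -> {
                    List<Integer> lengths = new ArrayList<>();
                    for (String s : bucket) {
                        lengths.add(s.length()); // map: string -> its length
                    }
                    return lengths;
                };
                futures.add(pool.submit(task));
            }
            // get() blocks until the task is done, so no sleep is needed.
            List<Integer> intermediate = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                intermediate.addAll(f.get());
            }
            return intermediate;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> buckets = Arrays.asList(
                Arrays.asList("a", "bb"),
                Arrays.asList("ccc"));
        System.out.println(mapAllBuckets(buckets)); // prints [1, 2, 3]
    }
}
```

Because the futures are read back in submission order, the intermediate results come out in bucket order regardless of which thread finishes first.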

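If the buckets are assigned to different nodes for processing, there must be a master node that monitors the computation on each node and collects each node's results. If a node fails, the master must be able to reassign its computing task to another node.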
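
The reassignment idea can be illustrated with a toy, single-process sketch. Everything in it is an assumption made for illustration: a "node" is modeled as a plain function, a failure as a RuntimeException, and the names MasterSketch and runWithReassignment are hypothetical. It does not reflect how Hadoop itself schedules or monitors tasks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model only: real masters monitor remote nodes over a network.
class MasterSketch {
    // Try the bucket on each node in turn until one of them succeeds.
    static int runWithReassignment(List<String> bucket,
                                   List<Function<List<String>, Integer>> nodes) {
        for (Function<List<String>, Integer> node : nodes) {
            try {
                return node.apply(bucket);
            } catch (RuntimeException failure) {
                // This node failed: hand the same bucket to the next node.
            }
        }
        throw new IllegalStateException("all nodes failed");
    }

    public static void main(String[] args) {
        Function<List<String>, Integer> failingNode = bucket -> {
            throw new RuntimeException("simulated node failure");
        };
        Function<List<String>, Integer> healthyNode = bucket -> {
            int total = 0;
            for (String s : bucket) {
                total += s.length();
            }
            return total;
        };
        int result = runWithReassignment(
                Arrays.asList("ab", "cde"),
                Arrays.asList(failingNode, healthyNode));
        System.out.println(result); // prints 5
    }
}
```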

The step3RunReduceFunctionForAllBuckets method loads the intermediate results from intermediateresults and aggregates them to produce the final result.
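
The reduce step can be sketched in isolation as follows. The intermediate results are strings holding numbers, as in the listing. The sum matches what the article computes, while the max method only illustrates the other kind of processing that the listing's comment mentions. The names ReduceStepSketch, sum, and max are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class, shown only to illustrate the reduce step.
class ReduceStepSketch {
    // Add up all intermediate counts.
    static int sum(List<String> intermediate) {
        int total = 0;
        for (String s : intermediate) {
            total += Integer.parseInt(s);
        }
        return total;
    }

    // A different reduction over the same data: the largest count.
    // Assumes the list is not empty.
    static int max(List<String> intermediate) {
        int best = Integer.parseInt(intermediate.get(0));
        for (String s : intermediate) {
            best = Math.max(best, Integer.parseInt(s));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> intermediate = Arrays.asList("31", "32", "31");
        System.out.println("Total Count is " + sum(intermediate)); // prints Total Count is 94
        System.out.println("Max is " + max(intermediate));         // prints Max is 32
    }
}
```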
