
Massive data: do you know how to manage it?

Source: http://mndoci.com/2010/06/30/massive-data/

Facebook
36 PB of uncompressed data
2250 machines
23,000 cores
32 GB of RAM per machine
Processing 80-90 TB/day

Yahoo
70 PB of data in HDFS
170 PB spread across the globe
34000 servers
Processing 3 PB per day
120 TB flow through Hadoop every day

Twitter
7 TB/day into HDFS

LinkedIn
120 billion relationships
82 Hadoop jobs daily (IIRC)
16 TB of intermediate data
2 engineers

How to use Hadoop and MapReduce inside Oracle

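Here are links to two Oracle documents that describe how to use Hadoop and MapReduce inside Oracle. When it comes to interoperating with external systems, Oracle has done quite well.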

Exadata's earlier incorporation of columnar storage and, now, the integration of Hadoop and MapReduce into Oracle both demonstrate this.

How to use parallel pipelined table functions to integrate MapReduce into Oracle:

In-Database Map-Reduce (PDF)
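To make the idea concrete, here is a minimal conceptual sketch in Python, not Oracle code and not the method from the PDF. It assumes a word-count job: the map step is a generator that streams (key, value) pairs one at a time, loosely analogous to a pipelined table function emitting rows with PIPE ROW; the pairs are then hash-partitioned by key (analogous to partitioning the input of a parallel-enabled function), and each partition is reduced independently. All function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_words(rows: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map step: stream (word, 1) pairs lazily, one row at a time."""
    for row in rows:
        for word in row.lower().split():
            yield (word, 1)


def partition_by_key(pairs: Iterable[tuple[str, int]], n_partitions: int) -> list[list[tuple[str, int]]]:
    """Shuffle step: hash-partition so every value for a given key lands in the same partition."""
    parts: list[list[tuple[str, int]]] = [[] for _ in range(n_partitions)]
    for key, value in pairs:
        parts[hash(key) % n_partitions].append((key, value))
    return parts


def reduce_counts(pairs: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    """Reduce step: sum the values for each key within one partition."""
    totals: dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    yield from totals.items()


def map_reduce(rows: Iterable[str], n_partitions: int = 4) -> dict[str, int]:
    parts = partition_by_key(map_words(rows), n_partitions)
    # Each partition could be handed to a separate parallel worker;
    # this sketch reduces them sequentially. Keys never span partitions,
    # so merging the per-partition results cannot collide.
    return dict(chain.from_iterable(reduce_counts(p) for p in parts))


print(sorted(map_reduce(["the cat", "The dog the"]).items()))
# → [('cat', 1), ('dog', 1), ('the', 3)]
```

The point of partitioning by key before reducing is that each reducer then sees all values for its keys, which is what lets the reduce phase run in parallel without coordination.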

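How to access data stored on a Hadoop cluster from within an Oracle database. Although this paper focuses on HDFS, the same approach can also be applied to other distributed file systems.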

Integrating Hadoop Data with Oracle Parallel Processing (PDF)
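The general pattern of scanning files on a distributed file system in parallel can be sketched as follows. This is a hedged illustration in Python, not the procedure from the Oracle paper: it assumes the data is split into tab-separated part files, assigns each parallel reader a disjoint subset of files, and has each reader parse its files into rows. The `open_file` callable is a hypothetical stand-in for an HDFS (or other distributed file system) client; here an in-memory dict with hypothetical paths plays that role so the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def split_files(paths: list[str], n_workers: int) -> list[list[str]]:
    """Round-robin assignment: each reader gets a disjoint subset of files."""
    return [paths[i::n_workers] for i in range(n_workers)]


def read_rows(paths: list[str], open_file: Callable[[str], Iterable[str]]) -> list[tuple[str, ...]]:
    """Parse every line of the assigned files into a tuple of tab-separated fields."""
    rows = []
    for path in paths:
        for line in open_file(path):
            rows.append(tuple(line.rstrip("\n").split("\t")))
    return rows


def parallel_scan(paths: list[str], open_file: Callable[[str], Iterable[str]], n_workers: int = 3) -> list[tuple[str, ...]]:
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the order of the submitted file subsets.
        chunks = pool.map(lambda ps: read_rows(ps, open_file), split_files(paths, n_workers))
        return [row for chunk in chunks for row in chunk]


# Hypothetical in-memory "file system" standing in for a real HDFS client.
fake_fs = {
    "/data/part-0": ["1\talice\n", "2\tbob\n"],
    "/data/part-1": ["3\tcarol\n"],
}
print(parallel_scan(sorted(fake_fs), fake_fs.__getitem__))
# → [('1', 'alice'), ('2', 'bob'), ('3', 'carol')]
```

Splitting by whole files keeps each reader independent, which is the property a database's parallel query slaves need in order to consume external data without coordinating with one another.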