Mounting HDFS with FUSE-DFS

23 October 2012

The Hadoop source tree ships with the contrib/fuse-dfs module, which uses libhdfs and FUSE to mount HDFS onto the local filesystem of a *nix host.

Build environment

Building libhdfs

chmod +x $HADOOP_SRC_HOME/src/c++/pipes/configure
chmod +x $HADOOP_SRC_HOME/src/c++/utils/configure

On a 64-bit machine, edit the libhdfs Makefile and switch GCC's output mode from 32-bit (-m32) to 64-bit (-m64):

CC = gcc
LD = gcc
CFLAGS =  -g -Wall -O2 -fPIC
# -m64 here (was -m32)
LDFLAGS = -L$(JAVA_HOME)/jre/lib/$(OS_ARCH)/server -ljvm -shared -m64 -Wl,-x
PLATFORM = $(shell echo $$OS_NAME | tr [A-Z] [a-z])
# and -m64 here as well (was -m32)
CPPFLAGS = -m64 -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/$(PLATFORM)
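The -m32 to -m64 change can also be scripted rather than made by hand. The following is a minimal sketch, not part of Hadoop: the function name switch_to_m64 is made up for illustration, and the Makefile path is whatever your source tree uses. It keeps a .orig backup so the change can be reverted.

```shell
#!/bin/sh
# Sketch: replace every -m32 flag with -m64 in the given Makefile.
# switch_to_m64 is a hypothetical helper name, not a Hadoop tool.
switch_to_m64() {
    makefile="$1"
    [ -f "$makefile" ] || { echo "no such file: $makefile" >&2; return 1; }
    # Keep the untouched file next to the edited one.
    cp "$makefile" "$makefile.orig" || return 1
    sed 's/-m32/-m64/g' "$makefile.orig" > "$makefile"
}

# Usage (the path is a placeholder for your libhdfs Makefile):
#   switch_to_m64 path/to/libhdfs/Makefile
```

Comparing the result with `diff Makefile.orig Makefile` before building is a cheap way to confirm that only the LDFLAGS and CPPFLAGS lines changed.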

Run the following in the $HADOOP_HOME directory:

ant compile -Dcompile.c++=true -Dlibhdfs=true

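On Hadoop 1.0.0 through 1.0.4 the build fails with the error below. A web search turns up a bug report and patch at https://issues.apache.org/jira/browse/MAPREDUCE-2127 , but on Hadoop 1.0.4 the actual cause was a missing openssl-devel package.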

     [exec] checking for pthread_create in -lpthread... yes
     [exec] configure: error: Cannot find libssl.so
     [exec] checking for HMAC_Init in -lssl... no
     [exec] /home/admin/tmp/hadoop-1.0.4/src/c++/pipes/configure: line 5234: exit: please: numeric argument required
     [exec] /home/admin/tmp/hadoop-1.0.4/src/c++/pipes/configure: line 5234: exit: please: numeric argument required

BUILD FAILED
/home/admin/tmp/hadoop-1.0.4/build.xml:2141: exec returned: 255
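Since configure stops on "Cannot find libssl.so", it can save a build cycle to look for that file before running ant. Below is a small sketch; the helper name find_libssl and the directory list in the usage comment are assumptions for illustration, and the directories configure actually searches may differ on your system.

```shell
#!/bin/sh
# Sketch: print the first directory that contains libssl.so.
# Returns 1 if none of the given directories has it.
find_libssl() {
    for dir in "$@"; do
        if [ -e "$dir/libssl.so" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Usage (directory list is an assumption about a typical Linux layout):
#   find_libssl /usr/lib64 /usr/lib /lib64 /lib \
#       || echo "libssl.so not found; install openssl-devel first" >&2
```

On RPM-based distributions the unversioned libssl.so symlink normally comes from the openssl-devel package, which matches the fix described above.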

Building fuse-dfs

Using fuse_dfs

Host environment

Other

Mounting HDFS

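mkdir /tmp/dfs  # create an empty directory to use as the mount point
# Mount HDFS
# -d enables debug mode; once everything works, drop the -d flag
# Needs root; otherwise it fails with: fuse: failed to exec fusermount: Permission denied
# 172.16.33.151:9000 is the NameNode's IP address and port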
/usr/local/fuse_dfs/fuse_dfs_wrapper.sh dfs://172.16.33.151:9000 /tmp/dfs/ -d
port=9000,server=172.16.33.151
fuse-dfs didn't recognize /tmp/dfs/,-2
fuse-dfs ignoring option -d
FUSE library version: 2.8.3
nullpath_ok: 0
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.13
flags=0x0000307b
max_readahead=0x00020000   
   INIT: 7.12
   flags=0x00000011
   max_readahead=0x00020000
   max_write=0x00020000
   unique: 1, success, outsize: 40

unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 2, success, outsize: 120
...
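The first line of the debug output shows the dfs:// URI split into a server and a port. As a pre-flight check, the same split can be done in shell before calling the wrapper, which catches a malformed URI early. This is only an illustrative sketch: parse_dfs_uri is a made-up helper and does not reproduce how fuse-dfs itself parses its arguments.

```shell
#!/bin/sh
# Sketch: split dfs://HOST:PORT into the "port=...,server=..." form
# that appears at the top of the fuse_dfs debug output.
parse_dfs_uri() {
    uri="$1"
    case "$uri" in
        dfs://*:*) ;;
        *) echo "expected dfs://HOST:PORT, got: $uri" >&2; return 1 ;;
    esac
    hostport="${uri#dfs://}"      # strip the scheme
    hostport="${hostport%%/*}"    # drop any trailing path
    server="${hostport%:*}"       # everything before the last colon
    port="${hostport##*:}"        # everything after the last colon
    echo "port=$port,server=$server"
}

# Usage:
#   parse_dfs_uri dfs://172.16.33.151:9000 || exit 1
```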

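Other

Paths and permissions

The current version cannot mount a Hadoop subdirectory directly; support is said to be under consideration for a later release.

Some mount options can be used to protect specific paths: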

Fuse DFS takes the following mount options (i.e., on the command line or in the comma-separated list of options in /etc/fstab, in which case drop the -o prefixes):

-oserver=%s  (optional place to specify the server but in fstab use the format above)
-oport=%d (optional port see comment on server option)
-oentry_timeout=%d (how long directory entries are cached by fuse in seconds - see fuse docs)
-oattribute_timeout=%d (how long attributes are cached by fuse in seconds - see fuse docs)
-oprotected=%s (a colon separated list of directories that fuse-dfs should not allow to be deleted or moved - e.g., /user:/tmp)
-oprivate (not often used but means only the person who does the mount can use the filesystem - aka ! allow_others in fuse speak)
-ordbuffer=%d (in KBs how large a buffer should fuse-dfs use when doing hdfs reads)
ro
rw
-ousetrash (should fuse dfs throw things in /Trash when deleting them)
-onotrash (opposite of usetrash)
-odebug (do not daemonize - aka -d in fuse speak)
-obig_writes (use fuse big_writes option so as to allow better performance of writes on kernels >= 2.6.26)

The defaults are:

entry,attribute_timeouts = 60 seconds
rdbuffer = 10 MB
protected = null
debug = 0
notrash
private = 0
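As an illustration of the protected option described above, the sketch below joins a list of directories into the colon-separated form it expects. The helper name protected_opt is hypothetical, and passing the result after the mount point (as -d is passed in the mount command earlier) is an assumption about argument placement.

```shell
#!/bin/sh
# Sketch: build "-oprotected=/dir1:/dir2" from a list of absolute paths.
protected_opt() {
    [ "$#" -gt 0 ] || { echo "need at least one directory" >&2; return 1; }
    list=""
    for dir in "$@"; do
        case "$dir" in
            /*) ;;
            *) echo "not an absolute path: $dir" >&2; return 1 ;;
        esac
        # Prepend a colon separator only when the list is non-empty.
        list="${list:+$list:}$dir"
    done
    echo "-oprotected=$list"
}

# Usage (same wrapper, NameNode and mount point as in the mount step):
#   /usr/local/fuse_dfs/fuse_dfs_wrapper.sh dfs://172.16.33.151:9000 /tmp/dfs \
#       "$(protected_opt /user /tmp)" -ousetrash
```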

Read/write test

# Test directory inside the HDFS mount: /tmp/dfs/tmp/
rm -f /tmp/dfs/tmp/1Gb.file;dd if=/dev/zero bs=1024 count=1000000 of=/tmp/dfs/tmp/1Gb.file
dd if=/tmp/dfs/tmp/1Gb.file bs=64k of=/dev/null
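The two dd commands measure throughput but do not check that the data read back has the expected size. A small wrapper, sketched below, writes a file of zeros, reads it back, and compares byte counts. The function name rw_test and the temporary file name are made up for illustration; the directory argument can be any writable path, including one under the mount.

```shell
#!/bin/sh
# Sketch: write KIB kibibytes of zeros into DIR, read them back,
# and succeed only if the byte count matches.
rw_test() {
    dir="$1"
    kib="$2"
    file="$dir/rw_test.$$"
    # Write phase, 1 KiB blocks as in the test above.
    dd if=/dev/zero of="$file" bs=1024 count="$kib" 2>/dev/null || return 1
    # Read phase, 64 KiB blocks; count the bytes that come back.
    bytes=$(dd if="$file" bs=64k 2>/dev/null | wc -c | tr -d ' ')
    rm -f "$file"
    [ "$bytes" -eq $((kib * 1024)) ]
}

# Usage:
#   rw_test /tmp/dfs/tmp 1000000 && echo "read/write OK"
```

Running it first with a small count (for example 1024) gives a quick sanity check before committing to the full-size transfer.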