Is there an API that returns absolute HDFS paths? So something that would
turn "/user/myname/top/sub/.." into "/user/myname/top" and "top/sub" into
"/user/myname/top/sub"? I've been looking for this functionality where I
think it would be (e.g. FileSystem or Path) but can't find it.
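
To make the two requested conversions concrete, here is a minimal sketch in plain Java, not Hadoop code. It uses `java.nio.file` on a Unix-like system purely to illustrate the semantics being asked for. The class name `AbsolutePathSketch`, the method `toAbsolute`, and the fixed working directory `/user/myname` are assumptions made for illustration. Hadoop's own `FileSystem.makeQualified(Path)` and `FileSystem.getWorkingDirectory()` look like the natural places to get the same behavior against HDFS, but that is not verified here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AbsolutePathSketch {
    // Hypothetical stand-in for the user's HDFS working (home) directory.
    static final Path WORKING_DIR = Paths.get("/user/myname");

    // Resolve a possibly relative path against WORKING_DIR, then collapse
    // "." and ".." segments so the result is absolute and normalized.
    static String toAbsolute(String raw) {
        Path p = Paths.get(raw);
        Path absolute = p.isAbsolute() ? p : WORKING_DIR.resolve(p);
        return absolute.normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("/user/myname/top/sub/.."));
        System.out.println(toAbsolute("top/sub"));
    }
}
```

The sketch does two separate things: it resolves relative input against a working directory, and it removes ".." segments. An HDFS equivalent would need to cover both.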


I have an algorithm that runs multiple iterations of a Hadoop job. Each
iteration produces two kinds of output: stuff that is "done" and gets
written out to the side and stuff that is "not-done" and gets fed back into
the next iteration. The reducer makes this distinction. The algorithm
completes when an iteration has no "not-done" output.

Basically, what I need is two different output channels for my reducer. What
is currently the best way to do this in Hadoop? I know the old API had a
MultipleOutputs class, but I think that's deprecated now. I have been
creating and populating the "done" sequence files directly, but I would rather
have the Hadoop framework do this for me, to save work and to avoid name
collisions that I haven't anticipated.


You don't need any virtualization. Mac OS X is a Unix-like system (not Linux) and runs Hadoop as is.