Basic Hadoop Operations from Java: Example Code

Updated: 2017-04-25 09:28:37  Submitted by: lqh

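This article presents example code for basic Hadoop operations from Java: uploading a local file to HDFS, downloading job output, deleting files in HDFS, and running a MapReduce program.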

Uploading a local file to HDFS

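// Requires: java.io.IOException, java.net.URI, org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path
public static void uploadInputFile(String localFile) throws IOException {
    Configuration conf = new Configuration();
    String hdfsPath = "hdfs://localhost:9000/";
    String hdfsInput = "hdfs://localhost:9000/user/hadoop/input";
    // try-with-resources closes the FileSystem even if the copy fails
    try (FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf)) {
        fs.copyFromLocalFile(new Path(localFile), new Path(hdfsInput));
    }
    System.out.println("Uploaded the file to the input directory");
}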
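The snippets in this article repeat the literal "hdfs://localhost:9000/" in every method. As a minimal sketch, assuming you would rather keep the NameNode address in one place, the helper below derives the other locations from a single base URI using only the JDK. The class name HdfsPaths and its resolve method are hypothetical and not part of Hadoop or of the original code.

```java
import java.net.URI;

// Hypothetical helper (an assumption for illustration, not part of Hadoop):
// keeps the NameNode address in one place and derives HDFS locations from it.
final class HdfsPaths {
    // Default address taken from the article's examples.
    static final URI DEFAULT_BASE = URI.create("hdfs://localhost:9000/");

    private HdfsPaths() {}

    /** Resolves a path such as "user/hadoop/input" against the base, treating the base as a directory. */
    static URI resolve(URI base, String relativePath) {
        // java.net.URI resolves correctly only when the base path ends with '/'.
        String baseText = base.toString();
        URI root = baseText.endsWith("/") ? base : URI.create(baseText + "/");
        // Strip leading slashes so "/user/..." and "user/..." give the same result.
        String rel = relativePath.replaceFirst("^/+", "");
        return root.resolve(rel);
    }

    public static void main(String[] args) {
        // An optional first argument overrides the NameNode address.
        URI base = args.length > 0 ? URI.create(args[0]) : DEFAULT_BASE;
        System.out.println(resolve(base, "user/hadoop/input"));
        System.out.println(resolve(base, "user/hadoop/output/part-r-00000"));
    }
}
```

The resulting URI could then be handed to Hadoop, for example as new Path(uri) or FileSystem.get(base, conf), instead of the hard-coded strings.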

Downloading the output file to the local filesystem

public static void getOutput(String outputfile) throws IOException {
    // part-r-00000 is the output file written by the first reducer
    String remoteFile = "hdfs://localhost:9000/user/hadoop/output/part-r-00000";
    Path path = new Path(remoteFile);
    Configuration conf = new Configuration();
    String hdfsPath = "hdfs://localhost:9000/";
    // try-with-resources closes the FileSystem even if the copy fails
    try (FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf)) {
        fs.copyToLocalFile(path, new Path(outputfile));
    }
    System.out.println("Saved the output file to the local filesystem");
}
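Once part-r-00000 is on the local disk, it usually needs to be read back. The job shown later in this article does not set an output format, so it should use Hadoop's default text output, which writes one key, a tab, and the value per line. The following is a rough sketch of parsing such lines with the JDK only. The class name OutputParser and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (an assumption, not from the original article):
// parses "key<TAB>count" lines as written by the default text output format.
final class OutputParser {
    private OutputParser() {}

    /** Returns the counts in file order; blank or malformed lines are skipped. */
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Split on the last tab so keys that contain tabs stay intact.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            String key = line.substring(0, tab);
            int value = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(key, value);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample content standing in for a downloaded part-r-00000 file.
        Map<String, Integer> counts = parse(Arrays.asList("female\t12", "male\t30", ""));
        System.out.println(counts); // prints {female=12, male=30}
    }
}
```

For a real file, the lines could come from java.nio.file.Files.readAllLines applied to the path that was passed to getOutput.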

Deleting a file in HDFS

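public static void deleteOutput() throws IOException {
    Configuration conf = new Configuration();
    String hdfsOutput = "hdfs://localhost:9000/user/hadoop/output";
    String hdfsPath = "hdfs://localhost:9000/";
    Path path = new Path(hdfsOutput);
    try (FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf)) {
        // Recursively delete the output directory right away
        // (deleteOnExit would only defer the deletion until the FileSystem is closed)
        boolean deleted = fs.delete(path, true);
        System.out.println(deleted ? "Deleted the output directory" : "Nothing was deleted");
    }
}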

Running a MapReduce program

Creating the Mapper and Reducer classes

public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    // Captures the value that follows the "性别：" (gender) label in the page markup;
    // compiled once instead of on every call to map()
    private static final Pattern GENDER = Pattern.compile(
        "性别：</span><span class=\"pt_detail\">(.*?)</span>");
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Strip backslashes so that escaped markup such as <\/span> matches the pattern
        String line = value.toString().replace("\\", "");
        Matcher matcher = GENDER.matcher(line);
        while (matcher.find()) {
            // Emit (term, 1) for every match on the line
            word.set(matcher.group(1));
            context.write(word, one);
        }
    }
}

public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all counts emitted for this key
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
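The map and reduce logic can be checked without a cluster. The sketch below reproduces it in plain Java: the same regular expression extracts the terms (the map step), and a sorted map sums one count per term (the reduce step). The class name LocalGenderCount is hypothetical, and the sample lines are made up; they assume the input is HTML with backslash-escaped quotes and slashes, which is what the backslash stripping in TokenizerMapper suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local stand-in for TokenizerMapper + IntSumReducer (no Hadoop needed).
final class LocalGenderCount {
    // Same regular expression as the article's mapper.
    private static final Pattern GENDER = Pattern.compile(
            "性别：</span><span class=\"pt_detail\">(.*?)</span>");

    private LocalGenderCount() {}

    /** Map step: returns every captured term found in one input line. */
    static List<String> extractTerms(String line) {
        // Drop backslashes first, as the mapper does.
        String cleaned = line.replace("\\", "");
        List<String> terms = new ArrayList<>();
        Matcher matcher = GENDER.matcher(cleaned);
        while (matcher.find()) {
            terms.add(matcher.group(1));
        }
        return terms;
    }

    /** Reduce step: adds 1 per emitted term and returns totals sorted by key. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            for (String term : extractTerms(line)) {
                totals.merge(term, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample records with escaped markup (an assumption about the input shape).
        List<String> lines = Arrays.asList(
                "性别：<\\/span><span class=\\\"pt_detail\\\">男<\\/span>",
                "性别：<\\/span><span class=\\\"pt_detail\\\">女<\\/span>",
                "性别：<\\/span><span class=\\\"pt_detail\\\">男<\\/span>",
                "a line without the label");
        System.out.println(count(lines));
    }
}
```

Because addition is associative and commutative, summing partial counts gives the same totals as summing everything at once, which is why a reducer of this kind can also be used as a combiner.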

Running the MapReduce job

public static void runMapReduce(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    // Job.getInstance replaces the deprecated Job(Configuration, String) constructor
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The reducer only sums counts, so it can also run as the combiner
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    // Block until the job finishes, then report the actual outcome
    boolean success = job.waitForCompletion(true);
    System.out.println(success ? "MapReduce job completed" : "MapReduce job failed");
    System.exit(success ? 0 : 1);
}

Thanks for reading; I hope this helps.
