Dmitriyev, Viktor and Kruse, Felix and Precht, Hauke and Becker, Simon and Solsbach, Andreas and Marx Gómez, Jorge Carlos (2017) Building a big data analytical pipeline with Hadoop for processing enterprise XML data. The 11th Mediterranean Conference on Information Systems (MCIS 2017).
Full text not available from this repository.

Abstract
This paper presents an end-to-end approach to processing XML files in the Hadoop ecosystem. The work demonstrates how to handle the problems faced when analyzing large amounts of XML files. The paper presents a complete Extract, Load and Transform (ELT) cycle built on the open source Apache Hadoop software stack, which has become a standard for processing huge amounts of data. The work shows that applying open source solutions to a particular set of problems may not be enough: most open source big data processing tools were implemented to address only a limited number of use cases. The paper explains why specific use cases may require significant extension with multiple self-developed software components. The use case described here deals with huge amounts of semi-structured XML files that must be persisted and processed daily.
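The paper's own components are not published here, but the transform step it describes can be illustrated with a minimal, hypothetical sketch: flattening a daily batch of semi-structured XML files into newline-delimited JSON, a line-oriented format that Hadoop-ecosystem tools such as Hive or Spark can ingest and split cleanly. The element name `record`, the directory `raw_xml`, and the output file `records.jsonl` are illustrative assumptions, not names from the paper.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def transform(xml_path: Path):
    """Parse one XML file and yield one flat dict per record element."""
    tree = ET.parse(xml_path)                  # fails fast on malformed XML
    for rec in tree.getroot().iter("record"):  # "record" is an assumed row tag
        yield {child.tag: (child.text or "").strip() for child in rec}


def run_batch(src_dir: str = "raw_xml", dst_file: str = "records.jsonl") -> None:
    """Extract every XML file in src_dir into newline-delimited JSON."""
    with open(dst_file, "w", encoding="utf-8") as out:
        for xml_path in sorted(Path(src_dir).glob("*.xml")):
            for record in transform(xml_path):
                out.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    run_batch()
```

In a production pipeline of the kind the abstract describes, a step like this would run once per daily batch, distributed across the cluster (for example via Hadoop Streaming or a Spark job) rather than on a single node; newline-delimited output is chosen here because each record occupies exactly one line, so the file can be split across mappers without a record straddling a block boundary.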
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Big Data, Hadoop, ETL, ELT, XML, Data Analytical Pipeline |
| Divisions: | School of Computing Science, Business Administration, Economics and Law > Department of Computing Science |
| Date Deposited: | 12 Sep 2018 11:51 |
| Last Modified: | 10 May 2019 15:39 |
| URI: | https://oops.uni-oldenburg.de/id/eprint/3506 |
| URN: | urn:nbn:de:gbv:715-oops-35873 |
| DOI: | http://elibrary.aisnet.org/Default.aspx?url=https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1056&context=mcis2017 |
| Usage License: | |