- /oops/ - Oldenburger Online-Publikations-Server

Dmitriyev, Viktor and Kruse, Felix and Precht, Hauke and Becker, Simon and Solsbach, Andreas and Marx Gómez, Jorge Carlos (2017) Building a big data analytical pipeline with Hadoop for processing enterprise XML data. The 11th Mediterranean Conference on Information Systems (MCIS 2017).

Full text not available from this repository.

Official URL: http://elibrary.aisnet.org/Default.aspx?url=https:...

Abstract

This paper presents an end-to-end approach to processing XML files in the Hadoop ecosystem and demonstrates how to handle the problems that arise when analyzing large numbers of XML files. It presents a complete Extract, Load and Transform (ELT) cycle based on the open source software stack Apache Hadoop, which has become a standard for processing huge amounts of data. The work shows that applying open source solutions to a particular set of problems may not be enough: most open source big data processing tools were implemented to address only a limited number of use cases. The paper explains and shows why specific use cases may require significant extension with multiple self-developed software components. The use case described deals with huge amounts of semi-structured XML files, which have to be persisted and processed daily.

Item Type:	Article
Uncontrolled Keywords:	Big Data, Hadoop, ETL, ELT, XML, Data Analytical Pipeline
Divisions:	School of Computing Science, Business Administration, Economics and Law > Department of Computing Science
Date Deposited:	12 Sep 2018 11:51
Last Modified:	10 May 2019 15:39
URI:	https://oops.uni-oldenburg.de/id/eprint/3506
URN:	urn:nbn:de:gbv:715-oops-35873
DOI:	http://elibrary.aisnet.org/Default.aspx?url=https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1056&context=mcis2017
