Tabular Data Extraction

Abstract

We propose a novel algorithm for extracting data from images of tabular documents that share a specific structure. The method preserves the original table format and structure, and is more efficient than existing methods because of its parallel, highly scalable architecture. These findings should speed up data extraction from image-based tables and ease the digitization of tabular records.
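As a rough illustration of how parallel extraction can still preserve table layout, the sketch below processes every cell of an already segmented table concurrently and then reassembles the rows in their original order. It is not the algorithm from the paper: the function names `recognize_cell` and `extract_table` are hypothetical, the cells are plain strings standing in for cropped cell images, and the table is assumed to be a rectangular grid.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_cell(cell):
    """Hypothetical stand-in for running OCR on one cropped cell image."""
    # A real pipeline would call an OCR engine here; this sketch only
    # trims whitespace so the example stays self-contained.
    return cell.strip()


def extract_table(cells, max_workers=4):
    """Recognize all cells in parallel and return rows in the original layout.

    `cells` is assumed to be a rectangular list of rows, each row a list of
    cell payloads (strings here, cropped images in a real system).
    """
    if not cells or not cells[0]:
        return [[] for _ in cells]

    n_cols = len(cells[0])
    # Flatten row by row so each cell can be handled independently.
    flat = [cell for row in cells for cell in row]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so the row/column
        # position of every cell survives the parallel step.
        texts = list(pool.map(recognize_cell, flat))

    # Regroup the flat results into rows of n_cols cells each.
    return [texts[i:i + n_cols] for i in range(0, len(texts), n_cols)]


if __name__ == "__main__":
    print(extract_table([[" Name ", " Age "], [" Ada ", " 36 "]]))
```

Because each cell is independent, the work scales with the number of workers, while the order-preserving map keeps the output aligned with the source table. For CPU-bound recognition a process pool would likely be a better fit than threads.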


Behera, V.N.J., Ranjan, A. and Reza, M., 2020. An Innovative Image-Based Tabular Data Extraction Parallel Algorithm. In Progress in Computing, Analytics and Networking (pp. 95-104). Springer, Singapore.

Ranjan, A., Behera, V.N.J. and Reza, M., 2020. OCR Using Computer Vision and Machine Learning. In Machine Learning Algorithms for Industrial Applications (pp. 83-105). Springer, Cham.