Big Data Testing
From Ingestion Through MapReduce and Output
Big data systems process enormous amounts of data, and testing that data, on top of the product's features, requires specific skills and knowledge. Functional and performance testing are the two most common aspects of big data testing, but the security of the data often requires testing as well. Data testing is considered back-end testing and requires database and query knowledge.
From a functional perspective, testing needs to be conducted in all three phases: ingestion, MapReduce, and output. All the characteristics of the data, such as consistency, conformity, accuracy, validity, duplication, and completeness, need to be validated.
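These characteristics can be exercised with simple record-level checks. The sketch below is a minimal illustration, not a full framework; the field names ("id", "email") and the rules attached to them are hypothetical placeholders for whatever your data contract specifies.

```python
def check_quality(records):
    """Return counts of data quality issues across a batch of records."""
    issues = {"incomplete": 0, "invalid": 0, "duplicate": 0}
    seen_ids = set()
    for rec in records:
        # Completeness: every expected attribute must be present and non-empty.
        if not rec.get("id") or not rec.get("email"):
            issues["incomplete"] += 1
            continue
        # Conformity/validity: a crude format rule for the email attribute.
        if "@" not in rec["email"]:
            issues["invalid"] += 1
        # Duplication: the same id must not appear twice in the batch.
        if rec["id"] in seen_ids:
            issues["duplicate"] += 1
        seen_ids.add(rec["id"])
    return issues
```

In a real pipeline the same checks would typically run as a distributed job over a sample or over the full dataset, but the rules themselves stay this simple.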
During ingestion, we validate that data is collected from all sources correctly and in a timely manner, and that it conforms to expectations in terms of attributes and quality. We also test things like timestamping, immutability, and redundancy of the ingestion mechanism. Data pushed to HDFS is compared to the source to ensure ingestion was successful with no data corruption.
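The source-to-HDFS comparison can be sketched as a reconciliation of row counts and checksums. In practice the HDFS side would be read back with something like `hdfs dfs -cat`; here both sides are plain lists of lines so the sketch stays self-contained, and the XOR-of-hashes fingerprint is one assumed design choice that makes the comparison order-independent.

```python
import hashlib

def fingerprint(rows):
    """Order-independent digest: hash each row, XOR the digests together."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(row.encode("utf-8")).digest()
        acc ^= int.from_bytes(digest, "big")
    return len(rows), acc

def reconcile(source_rows, hdfs_rows):
    """True only if counts match and no row was corrupted or dropped."""
    return fingerprint(source_rows) == fingerprint(hdfs_rows)
```

Including the row count alongside the checksum catches dropped or doubled rows that an XOR alone could miss.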
MapReduce testing is all about the business logic. Here we validate that the data is correctly mapped and reduced across all nodes of the Hadoop cluster. We also validate duplicate handling, source immutability, and job interruption and re-processing.
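One common approach is to run the mapper and reducer pair in-process over a small fixture and compare the result against an independently computed expectation. The word-count job below is a stand-in for real business logic, and the in-memory shuffle is a simplification of what the framework does:

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every token, as a word-count mapper would.
    for word in line.split():
        yield word, 1

def reducer(key, values):
    # Sum all counts emitted for one key.
    return key, sum(values)

def run_job(lines):
    # Shuffle phase: group mapper output by key, as the framework would.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())
```

Running the same fixture through the job twice and asserting identical output is also a cheap way to cover the interruption/re-processing case: a correct job must be deterministic over an immutable source.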
At the output stage, we validate the data against business rules and confirm that it arrives successfully at the designated warehouse, typically Cassandra, MongoDB, or a similar store.
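Business-rule validation at this stage often amounts to asserting invariants over the final rows before or after the warehouse load. A minimal sketch, with hypothetical rules (non-negative totals, totals equal the sum of their parts) standing in for real ones:

```python
def validate_output(rows):
    """Return a list of (row_index, rule) violations; empty means pass."""
    violations = []
    for i, row in enumerate(rows):
        # Hypothetical rule 1: aggregated totals can never be negative.
        if row["total"] < 0:
            violations.append((i, "total must be non-negative"))
        # Hypothetical rule 2: the total must reconcile with its components.
        if row["total"] != row["online"] + row["in_store"]:
            violations.append((i, "total must equal online + in_store"))
    return violations
```

Reporting violations with row indices, rather than failing on the first one, makes it easier to triage a bad batch.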
Performance testing covers job duration, memory utilization, parallel processing, node redundancy and data throughput.
Security testing, especially of granular data access across different types of users or organizations, may be required. Whether you are using Kerberos or another solution, QA engineers in this area need to be technically experienced.
Our professionals have gained solid experience in big data testing and have collectively learned a great deal from each other. We can bring immediate value to your organization and help test your big data project.