Questions Related to Test Data
What is test data?
Test data is the data a tester uses to test an application. It can be entered manually or read from files (e.g., Excel or XML) or from a database. In other words, test data is the input given to the software system, or application under test (AUT). Test data can be real, historical, or hypothetical.
What are the sources of test data?
Test data can be generated with automated tools or created manually. In some cases, production data can be used for testing. Data can also be generated randomly. Test data is crucial to the success of testing: it should be produced in the required quantities, and its overall quality should be maintained.
What are the characteristics of different sources of data?
Production data is the most accurate data for the AUT. However, production data is often restricted: for example, it may be confidential and cannot be shared with every tester on the team. There are also situations where changing or tailoring production data is not allowed.
Generating data requires a tool or utility. Generated data is often only as good as the tester who created it and the tool he or she used.
Testers also create data manually. This is time-consuming, but I believe it is the best way to create the unique data that some extreme test cases require.
Testers often generate random data for testing types such as stress and load testing. Random data, however, may not be realistic.
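As a minimal sketch, random data for a load test can be generated with Python's standard `random` module. The customer schema (name, age, balance) and the function names are hypothetical; the fixed seed is there so that a failing run can be reproduced exactly:

```python
import random
import string


def random_customer(rng: random.Random) -> dict:
    """Build one random record for a hypothetical customer table."""
    name = "".join(rng.choices(string.ascii_uppercase, k=rng.randint(3, 12)))
    age = rng.randint(0, 120)                              # inclusive bounds
    balance = round(rng.uniform(-1_000.0, 1_000_000.0), 2)
    return {"name": name, "age": age, "balance": balance}


def random_load(n: int, seed: int = 42) -> list:
    """Generate n records; the fixed seed makes the same load reproducible."""
    rng = random.Random(seed)
    return [random_customer(rng) for _ in range(n)]


if __name__ == "__main__":
    batch = random_load(10_000)
    print(len(batch), batch[0])
```

Names made of random capital letters are enough to fill a table for a load test, but they also show the realism problem: they do not resemble real customer names.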
What are the techniques for generating test data?
Value analysis
Value analysis generates test data based on the data values. Range constraint analysis, or boundary analysis, suggests test data that represent extreme values such as upper bounds, lower bounds, and other exceptional values (e.g., a negative number or zero). Typically, both in-range and out-of-range values are included. Format constraint analysis focuses on data type; for example, a zero or a numeric digit might be placed in an alphabetic field, non-digits might be inserted into a numeric field, or a value other than F or M might be recorded in a single-character sex or gender field. Length constraint analysis generates test data with too many or too few characters or digits; this technique is useful for testing fixed-length fields such as a Social Security number or a telephone number.
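The three constraint analyses can be sketched as small Python helpers. The function names, the field kinds ("alpha", "numeric", "gender"), and the specific invalid values are illustrative assumptions, not part of the technique as described:

```python
def range_values(lo: int, hi: int) -> list:
    """Range constraint (boundary) analysis: values at, just inside and just
    outside the range, plus zero and a negative number as exceptional values."""
    candidates = [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1, 0, -1]
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(candidates))


def format_values(kind: str) -> list:
    """Format constraint analysis: values of the wrong type for a field."""
    if kind == "alpha":
        return ["0", "7", "A1B"]   # digits in an alphabetic field
    if kind == "numeric":
        return ["12a", "x", " "]   # non-digits in a numeric field
    if kind == "gender":
        return ["X", "0", ""]      # anything other than F or M
    raise ValueError(f"unknown field kind: {kind}")


def length_values(exact_len: int, filler: str = "9") -> list:
    """Length constraint analysis for a fixed-length field:
    one character too few, exactly right, and one too many."""
    return [filler * (exact_len - 1), filler * exact_len, filler * (exact_len + 1)]


if __name__ == "__main__":
    print(range_values(1, 100))
    print(length_values(9))
```

For example, `range_values(1, 100)` returns `[0, 1, 2, 99, 100, 101, -1]`, and `length_values(9)` yields 8-, 9-, and 10-digit strings for a nine-digit Social Security number field.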
Path analysis
Path testing, also called branch analysis or loop analysis, is used to check the flow of logic through a program. The idea is to trace the program listing, identify the branch points, and include test data to force the program to follow each path. Generally, this technique relies on data values near the branch values to verify the program logic.
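As a sketch, consider a hypothetical `shipping_fee` function (not from the source) with a free-shipping threshold at 50.00, an express surcharge, and an error path for negative totals. Path analysis picks test data that drives execution down every branch, using values close to the branch value. Loops are not shown here; the same idea applies to zero, one, and many iterations:

```python
def shipping_fee(order_total: float, express: bool) -> float:
    """Hypothetical function under test with three branch points."""
    if order_total < 0:            # branch point: invalid input
        raise ValueError("order total cannot be negative")
    if order_total >= 50.0:        # branch point: free-shipping threshold
        fee = 0.0
    else:
        fee = 5.0
    if express:                    # branch point: express surcharge
        fee += 10.0
    return fee


# Test data chosen so that every path is followed at least once,
# using values just below, at, and just above the 50.0 branch value.
PATH_CASES = [
    # (order_total, express, expected fee)
    (49.99, False, 5.0),
    (50.00, False, 0.0),
    (50.01, False, 0.0),
    (49.99, True, 15.0),
    (50.00, True, 10.0),
]


def run_path_cases() -> None:
    for total, express, expected in PATH_CASES:
        assert shipping_fee(total, express) == expected, (total, express)
    # The error path: a negative total must be rejected.
    try:
        shipping_fee(-0.01, False)
    except ValueError:
        pass
    else:
        raise AssertionError("negative total was accepted")


if __name__ == "__main__":
    run_path_cases()
    print("all paths exercised")
```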
Volume analysis
Volume analysis, or control analysis, is intended to check the system’s behavior. For example, control totals might be checked by processing a set of test data, generating the totals, and then shuffling the transaction order and reprocessing the transactions to see if the same control total is generated.
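A minimal sketch of the control-total check, assuming a hypothetical list of (account, amount) transactions. It processes the test data once, then reprocesses shuffled copies and compares the totals:

```python
import random
from decimal import Decimal


def control_totals(transactions: list) -> dict:
    """Sum transaction amounts per account (the control totals)."""
    totals = {}
    for account, amount in transactions:
        totals[account] = totals.get(account, Decimal("0")) + amount
    return totals


def order_independent(transactions: list, rounds: int = 5, seed: int = 7) -> bool:
    """Reprocess shuffled copies and check that the control totals never change."""
    expected = control_totals(transactions)
    rng = random.Random(seed)
    for _ in range(rounds):
        shuffled = transactions[:]
        rng.shuffle(shuffled)
        if control_totals(shuffled) != expected:
            return False
    return True


# Hypothetical test transactions, including debits (negative amounts).
TEST_TRANSACTIONS = [
    ("A-100", Decimal("250.00")),
    ("A-100", Decimal("-75.25")),
    ("B-200", Decimal("1000.10")),
    ("B-200", Decimal("-0.10")),
    ("C-300", Decimal("19.99")),
]

if __name__ == "__main__":
    print(control_totals(TEST_TRANSACTIONS))
    print("order independent:", order_independent(TEST_TRANSACTIONS))
```

`Decimal` is used so that the comparison is exact; with binary floating point, reordering the additions alone can change the last digit of a sum and make a correct system look faulty.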
Compatibility analysis
Some applications are designed to access data from multiple versions of a file or a database. For example, imagine a set of old data files developed using the COBOL delimited file format and a new database designed for SQL access. Occasionally, the system might be asked to convert the old data file structure to support a query or to generate a report, and some new transactions might trigger updates to the original file. Test data are needed to force the program to obtain input from and send output to both files.
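The sketch below assumes a hypothetical system that reads account balances from a new SQL table (an in-memory SQLite database) and falls back to a pipe-delimited legacy file, and that writes updates to both. All names and the file layout are assumptions. The test data is deliberately split so that one record exists only in the legacy file, one only in SQL, and one in both, which forces each input and output path to run:

```python
import csv
import io
import sqlite3

# Hypothetical legacy store: a pipe-delimited flat file, held in memory here.
# Test data: A-100 exists only in the legacy file, B-200 in both stores.
LEGACY_TEXT = "A-100|250.00\nB-200|75.50\n"


def read_legacy(text: str) -> dict:
    """Parse the delimited legacy file into {account: balance}."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text), delimiter="|")}


def write_legacy(records: dict) -> str:
    """Serialize {account: balance} back to the delimited format."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for account, balance in records.items():
        writer.writerow([account, balance])
    return out.getvalue()


def make_sql_store() -> sqlite3.Connection:
    """Hypothetical new database; C-300 exists only here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("B-200", "75.50"), ("C-300", "10.00")])
    return conn


def get_balance(account: str, legacy_text: str, conn: sqlite3.Connection):
    """System logic under test: prefer SQL, fall back to the legacy file."""
    row = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                       (account,)).fetchone()
    if row is not None:
        return row[0]
    return read_legacy(legacy_text).get(account)


def set_balance(account: str, balance: str, legacy_text: str,
                conn: sqlite3.Connection) -> str:
    """Write an update to both stores; return the new legacy file text."""
    conn.execute("INSERT OR REPLACE INTO accounts VALUES (?, ?)", (account, balance))
    records = read_legacy(legacy_text)
    records[account] = balance
    return write_legacy(records)


if __name__ == "__main__":
    conn = make_sql_store()
    print(get_balance("A-100", LEGACY_TEXT, conn))   # read from the legacy file
    print(get_balance("C-300", LEGACY_TEXT, conn))   # read from SQL
```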
Partition analysis
Partition analysis focuses on aggregate values. The reliability of a database is a function of correctness and completeness. The correctness of each individual transaction can be verified using data analysis techniques with discrete values. Aggregate data are developed to test completeness. For example, a type of aggregate value testing called existence testing might be used to check a database record by simply checking its record number or verifying that the record is referenced in the index.
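A sketch of existence testing against a hypothetical SQLite database in which a records table is paired with a separate index table (table and column names are assumptions). The test data deliberately includes one unindexed record and one dangling index entry, so both checks have something to find:

```python
import sqlite3


def build_test_db() -> sqlite3.Connection:
    """Hypothetical database: a records table plus a separate index table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE records (record_no INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE name_index (name TEXT, record_no INTEGER);
        INSERT INTO records VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        -- Seeded defects: record 3 is not indexed,
        -- and the index points at a record 4 that does not exist.
        INSERT INTO name_index VALUES ('Ada', 1), ('Grace', 2), ('Ken', 4);
    """)
    return conn


def unindexed_records(conn: sqlite3.Connection) -> list:
    """Record numbers that exist but are not referenced in the index."""
    rows = conn.execute("""
        SELECT r.record_no FROM records r
        LEFT JOIN name_index i ON i.record_no = r.record_no
        WHERE i.record_no IS NULL ORDER BY r.record_no
    """).fetchall()
    return [row[0] for row in rows]


def dangling_index_entries(conn: sqlite3.Connection) -> list:
    """Index entries whose record number has no matching record."""
    rows = conn.execute("""
        SELECT i.record_no FROM name_index i
        LEFT JOIN records r ON r.record_no = i.record_no
        WHERE r.record_no IS NULL ORDER BY i.record_no
    """).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_test_db()
    print("not indexed:", unindexed_records(conn))
    print("dangling index entries:", dangling_index_entries(conn))
```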
System-dependent test data
Different types of systems call for special test data to test system-specific parameters. For example, symbolic data are essential for testing expert systems, real-time systems require time-varying and environment-dependent data, data communication systems require data to test transmission errors, and so on.
[Source: The Information Systems: Analysis and Design]
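For the data communication case above, a sketch of generated transmission-error test data: each valid message carries a CRC-32 checksum (computed with Python's standard `zlib.crc32`), and corrupted copies are produced by flipping single bits. The message format and helper names are hypothetical:

```python
import random
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum (4 bytes, big-endian) to a message."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_intact(message: bytes) -> bool:
    """Recompute the checksum and compare it with the transmitted one."""
    payload, checksum = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == checksum


def corrupt(message: bytes, rng: random.Random) -> bytes:
    """Simulate a transmission error by flipping one random bit."""
    data = bytearray(message)
    pos = rng.randrange(len(data) * 8)
    data[pos // 8] ^= 1 << (pos % 8)
    return bytes(data)


if __name__ == "__main__":
    rng = random.Random(2024)
    good = frame(b"TRANSFER 100.00 TO A-100")
    bad_messages = [corrupt(good, rng) for _ in range(1000)]
    print(is_intact(good), sum(is_intact(m) for m in bad_messages))
```

CRC-32 detects every single-bit error, so each corrupted copy should be rejected; multi-bit or burst corruption could be generated the same way to probe the receiver further.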
What are test data generators?
Test data generators are tools that produce test data for the AUT or any other program under test.
Some examples of test data generators:
1. tedagen
2. generatedata
3. fakenamegenerator
4. gedis-studio
5. dbmonster
6. datagen
7. databene-benerator
8. dgmaster
9. spawner
10. forsql
11. mobilefish
What should you consider when working with test data?
Proper test data helps not only to find bugs but also to debug them.
Make sure your data is fresh: revise or update it every time you start testing a new build.
You need some know-how to create or gather data from different sources, such as SQL databases and data generator tools.
Always prepare test data to achieve maximum coverage: cover valid and invalid data sets, boundary conditions, volume testing, and so on.
You can use real (production), historical, or hypothetical data, or a combination of all three.
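One way to aim for maximum coverage is to combine valid and invalid values, including boundary values, across fields. A minimal sketch, assuming a hypothetical registration form with an age field (valid range 18 to 120), an email field, and a ZIP code field; the representative values are illustrative:

```python
from itertools import product

# Hypothetical form: representative values per field, tagged valid/invalid.
FIELD_CLASSES = {
    "age":   [("18", True), ("120", True), ("17", False), ("abc", False)],
    "email": [("a@b.com", True), ("no-at-sign", False)],
    "zip":   [("12345", True), ("1234", False)],
}


def all_combinations(classes: dict) -> list:
    """Cartesian product of every field's values; a case is expected to be
    accepted only when every field in it is valid."""
    names = list(classes)
    cases = []
    for combo in product(*(classes[n] for n in names)):
        record = {n: value for n, (value, _) in zip(names, combo)}
        expected_valid = all(ok for _, ok in combo)
        cases.append((record, expected_valid))
    return cases


if __name__ == "__main__":
    cases = all_combinations(FIELD_CLASSES)
    print(len(cases), "cases,", sum(ok for _, ok in cases), "expected to be accepted")
```

The full product grows multiplicatively with the number of fields, so for larger forms a smaller selection (for example, pairwise combinations) is usually more practical; the exhaustive version is shown only because it is easy to check.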