For example, say you're testing an ETL job/process/whatever with a minimal set
of records (define that however you like), but the 'real' ETL
job/process/whatever is going to be expected to handle a significantly larger
set of data. In that case, you may want to think about generating additional
data to get to the size of what you will be seeing in your production
environment.
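
One way to bulk up a test file is to generate synthetic records. The sketch
below is only an illustration: the column names (id, customer, amount), the
row count, and the helper names generate_records and write_csv are all
assumptions, not something taken from a real job. Swap in whatever your actual
file layout looks like.

```python
import csv
import io
import random


def generate_records(n, seed=42):
    """Yield n synthetic rows. The column layout here is hypothetical."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    for i in range(n):
        yield {
            "id": i,
            "customer": f"customer_{rng.randrange(1000)}",
            "amount": round(rng.uniform(1, 500), 2),
        }


def write_csv(rows, out):
    """Write rows to a file-like object and return how many were written."""
    writer = csv.DictWriter(out, fieldnames=["id", "customer", "amount"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Assumed target size; use the row count you expect in production.
buffer = io.StringIO()
written = write_csv(generate_records(10_000), buffer)
```

An in-memory buffer is used here just to keep the example self-contained; in
practice you would pass an open file instead.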
Additionally, you may want to think about how much data will already exist in
your production environment, and have a comparable volume in place in your
test environment when you run the above-mentioned test.
We're looking at two different overall tests here:
1. How long will it take to process a production-size file?
2. A variation of 1: how long will it take to process a production-size file
against a production-size database?
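
The two tests above can be sketched as a pair of timed runs. This is a rough
illustration, not a real benchmark: SQLite in memory stands in for the target
database, the facts table and the make_db and timed_load helpers are
hypothetical, and the row counts are placeholders you would replace with your
own production numbers.

```python
import sqlite3
import time


def make_db(preexisting_rows):
    """Create a throwaway database preloaded with the given number of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO facts (id, amount) VALUES (?, ?)",
        ((i, float(i)) for i in range(preexisting_rows)),
    )
    conn.commit()
    return conn


def timed_load(conn, start_id, n):
    """Stand-in for the ETL load step; returns elapsed seconds."""
    started = time.perf_counter()
    conn.executemany(
        "INSERT INTO facts (id, amount) VALUES (?, ?)",
        ((start_id + i, 1.0) for i in range(n)),
    )
    conn.commit()
    return time.perf_counter() - started


FILE_ROWS = 20_000       # assumed "production-size file"
EXISTING_ROWS = 100_000  # assumed "production-size database"

empty_db = make_db(0)
full_db = make_db(EXISTING_ROWS)

t_empty = timed_load(empty_db, 0, FILE_ROWS)             # test 1
t_full = timed_load(full_db, EXISTING_ROWS, FILE_ROWS)   # test 2

print(f"empty database: {t_empty:.3f}s, preloaded database: {t_full:.3f}s")
```

With a toy load like this the two timings may come out close together. The
point is the shape of the comparison: same file, two different starting states
for the database.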
Part of what you will want to think about is how long the ETL job/process takes
to complete. The reason for asking is that a long-running job may negatively
impact the user experience (that is, if you have some sort of user application
that uses the data that you just loaded). If you have an ETL job that runs for
hours and gobbles up resources, that's not good.