The importance of following a process for data cleaning

When students work on a project that involves a lot of numbers, they often have to handle large volumes of data and produce statistical analyses in the form of charts, diagrams, and tables. Surveys conducted in the field usually yield plenty of data, not all of which is valuable, and students often find that analysing unimportant material is a waste of time. Worse, analysing incorrect or irrelevant records can produce misleading statistics that pull students away from the focus of their project. It therefore becomes essential to carry out a task known as data cleaning: identifying and removing or correcting records that are inaccurate, duplicated, or irrelevant. So what steps should this process follow?

  • Decide in advance on fixed formulas or rules for determining which records are important and which are not. Applying the same rules to every record makes separating them much easier.
  • Follow a well-documented procedure when cleaning the data. Without one, students who later look back at their work have no way of checking whether the records they eliminated were essential.
  • The university review board will also ask researchers which methods they used for data cleaning and how they identified and removed incorrect records. Students will have to justify their answers, which makes a clear procedure and good documentation all the more important.
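  • If Excel is being used, sorting the data is a quick way to bring duplicate records next to each other and to push records that vary too much from the average to the top or bottom of a column, where they are easy to review and delete. Exact duplicates can also be deleted directly with the Remove Duplicates command on the Data tab. Decide in advance how far from the average a value must be before it counts as too far, and record that threshold.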
  • Do trial runs of the cleaning process on a sample of the data before starting on the full dataset, so that any problems can be ironed out before the main process.
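Using fixed rules, following a documented procedure, and being able to justify every removal to a review board fit together naturally in a short script. What follows is a minimal Python sketch that assumes survey records are held as dictionaries. The field names (`id`, `age`, `answer`), the two rules, and the 18 to 99 age range are all hypothetical examples, not recommended criteria. Each removed record is logged together with the rule that removed it, so the log can be kept as part of the project documentation.

```python
from dataclasses import dataclass
from typing import Callable

# A record is a plain dict of field name -> value (hypothetical survey layout).
Record = dict
# A rule returns True when the record should be removed.
Rule = Callable[[Record], bool]


@dataclass
class RemovalLogEntry:
    record: Record
    rule_name: str


def apply_rules(records: list, rules: dict) -> tuple:
    """Apply the named rules in order; log the first rule that rejects each record."""
    kept = []
    log = []
    for record in records:
        for name, rule in rules.items():
            if rule(record):
                log.append(RemovalLogEntry(record, name))
                break
        else:
            # No rule rejected the record, so it stays in the dataset.
            kept.append(record)
    return kept, log


# Hypothetical rules for a hypothetical survey; adapt them to the real project.
RULES = {
    "missing answer": lambda r: r.get("answer") in (None, ""),
    "age out of range": lambda r: not isinstance(r.get("age"), (int, float))
    or not 18 <= r["age"] <= 99,
}

if __name__ == "__main__":
    survey = [
        {"id": 1, "age": 25, "answer": "yes"},
        {"id": 2, "age": 31, "answer": ""},
        {"id": 3, "age": 150, "answer": "no"},
        {"id": 4, "age": 42, "answer": "no"},
    ]
    kept, log = apply_rules(survey, RULES)
    print("kept:", [r["id"] for r in kept])
    # The removal log doubles as documentation of what was removed and why.
    for entry in log:
        print(f"removed id={entry.record['id']} reason={entry.rule_name}")
```

Keeping the rules in one named dictionary means the same criteria are applied to every record, and the rule names in the log give a ready answer to "why was this record removed?"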
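For students who prefer a script to Excel, the duplicate and outlier checks and a trial run on a sample might look like the Python sketch that follows. It assumes records are dictionaries with a numeric field, treats two records as duplicates only when every field matches, and uses a hypothetical cut-off of two standard deviations from the mean to decide what "varies too much from the average" means. Both choices are assumptions that should be decided and documented for each project. Note that the mean and standard deviation are computed with the suspect values included, so a few extreme values can inflate the standard deviation and hide one another.

```python
import random
import statistics


def remove_exact_duplicates(records):
    """Keep the first occurrence of each record (all fields equal), preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def remove_outliers(records, field, max_z=2.0):
    """Drop records whose `field` lies more than max_z standard deviations from the mean."""
    values = [r[field] for r in records]
    if len(values) < 2:
        return list(records)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All values identical: nothing can be an outlier.
        return list(records)
    return [r for r in records if abs(r[field] - mean) / sd <= max_z]


def trial_run(records, field, sample_size, seed=0):
    """Run both cleaning steps on a random sample and report what each step removes."""
    rng = random.Random(seed)  # fixed seed so the trial can be repeated exactly
    sample = rng.sample(records, min(sample_size, len(records)))
    deduped = remove_exact_duplicates(sample)
    cleaned = remove_outliers(deduped, field)
    return {
        "sampled": len(sample),
        "duplicates removed": len(sample) - len(deduped),
        "outliers removed": len(deduped) - len(cleaned),
    }


if __name__ == "__main__":
    scores = [10, 11, 12, 10, 11, 12, 10, 11, 100]
    data = [{"id": i + 1, "score": s} for i, s in enumerate(scores)]
    data.append({"id": 8, "score": 11})  # an exact duplicate of record 8
    print(trial_run(data, "score", sample_size=100))
```

Because `trial_run` reports how many records each step would remove from a sample, surprising counts (for example, half the sample flagged as outliers) can be investigated and the rules adjusted before the full dataset is touched.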