Is it dishonest to remove outliers from a sample of data?

This is the first time I have ever tried to write a blog, so please bear with me and feel free to comment 🙂

An outlier is an observation that lies numerically distant from the other values in a sample. That is the standard definition, although researchers differ on what counts as a ‘normal’ observation in the first place. The rule of thumb for removing outliers from a piece of work is a simple one: if there is a high chance of the outlier being repeated in the future, it should not be removed.
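To make “numerically distant” concrete, here is a minimal Python sketch of one common convention, Tukey’s 1.5 × IQR fences. The choice of rule, the function name, and the sample numbers are all illustrative assumptions on my part, not something the definition above prescribes:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

readings = [4.1, 4.3, 3.9, 4.0, 4.2, 9.7]  # 9.7 sits far from the rest
print(iqr_outliers(readings))  # [9.7]
```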

I personally believe that it is not dishonest to remove outliers from a sample of data because, in most cases, an outlier is simply the fault of the equipment used to record the data, or the result of a data-entry mistake. Of course, if outliers appear frequently, extra investigation should be carried out to find their cause and to see whether they can be avoided in future research.

In a large sample of data, outliers are expected, and the final result should take them into account. Each one should be examined and investigated before being discarded, as an outlier caused by faulty equipment could seriously distort a person’s findings and results. Likewise, a legitimate outlier can make it harder to support a hypothesis, but a genuine outlier should never be removed just to help someone prove their theory.
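To show why each outlier deserves a look before any decision is made, here is a quick illustration (reusing the same made-up readings as above) of how much a single point can move a summary statistic; reporting the result both with and without the suspect value is one honest way to handle it:

```python
import statistics

readings = [4.1, 4.3, 3.9, 4.0, 4.2, 9.7]
suspect = 9.7  # the value flagged by the IQR check above

with_outlier = statistics.mean(readings)
without_outlier = statistics.mean(v for v in readings if v != suspect)
print(f"mean with outlier:    {with_outlier:.2f}")     # 5.03
print(f"mean without outlier: {without_outlier:.2f}")  # 4.10
```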
