Does Data Have Ethics? Data Ethic Issues and Machine Learning

Level 12

Hello THWACKers, long time no chat! Welcome to part one in a five-part series on machine learning and artificial intelligence. I figured what better place to start than in the highly contested world of ethics? You can stop reading now because we’re talking about ethics, and that’s the last thing that anyone ever wants to talk about. But before you go, know this isn’t your standard Governance, Risk, and Compliance (GRC) talk where everything is driven and modeled by a policy that can be easily policed, defined, dictated, and followed. Why isn’t it? Because if that were true, we wouldn’t need any discussion of ethics; it would merely be a discussion of policy, and who doesn’t love policy?

Let me start by asking you an often overlooked but important question. Does data have ethics? On its own, the simple answer is no. As an example, we have Credit Reporting Agencies (CRAs) who collect our information, like names, birthdays, payment history, and other obscure pieces of information. Independently, that information is data, which doesn’t hold, construe, or leverage ethics in any way. If I had a database loaded with all this information, it would be a largely boring dataset, at least on the surface.

Now let’s take the information the CRAs have and say I go to get a loan to buy a house, get car insurance, or rent an apartment. If I pass the credit check and I get the loan, the data is great. Everybody wins. But if I’m ranked low in their scoring system and I don’t get to rent an apartment, for example, the data is bad and unethical. OK, the information itself may not be unethical per se, but it can be used unethically. Sometimes (read: often) a person’s credit, name, age, gender, or ethnicity will be fed into models that label them as “more creditworthy” or “less creditworthy” when they apply for loans, mortgages, rentals, and so on.

That doesn’t mean the data or the information in the table or model is ethical or unethical, but certainly claims can be made that biases (often human biases) have influenced how that information has been used.
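To make that concrete, here is a minimal sketch of how the same record can produce different labels depending on which fields a model is allowed to use. Everything in it is an assumption for illustration: the `Applicant` fields, the threshold of 80, and the age penalty are hypothetical and do not reflect how any real credit agency scores people.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Applicant:
    on_time_payments: int  # bills paid on time
    total_payments: int    # all bills that came due
    age: int               # personal attribute unrelated to payment behavior


APPROVAL_THRESHOLD = 80  # hypothetical cutoff, chosen only for this example


def score_payments_only(a: Applicant) -> int:
    """Score built from payment history alone (percentage paid on time)."""
    return round(100 * a.on_time_payments / a.total_payments)


def score_with_age(a: Applicant) -> int:
    """Same score, plus a hypothetical penalty for being under 25."""
    penalty = 15 if a.age < 25 else 0
    return score_payments_only(a) - penalty


def label(score: int) -> str:
    if score >= APPROVAL_THRESHOLD:
        return "more creditworthy"
    return "less creditworthy"


# Two applicants with identical payment history; only age differs.
older = Applicant(on_time_payments=18, total_payments=20, age=40)
younger = replace(older, age=22)

for name, applicant in (("older", older), ("younger", younger)):
    print(name, "|", label(score_payments_only(applicant)),
          "|", label(score_with_age(applicant)))
```

The rows of data are identical in both scoring functions. The only thing that changes the outcome for the younger applicant is a choice a person made about how to use the age column.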

This is a deep subject: how can we make sure our information can’t be used inappropriately or for evil? You’re in luck. I have a simple answer to that question: You can’t. I tried this once. I used to sell Ginsu knives and I never had to worry about them being used for evil because I put a handy disclaimer on them. Problem solved.


Seems like a straightforward plan, right? That’s what happens when policy, governance, and other aspects of GRC enter into the relationship of “data.” “We can label things so people can’t use them for harm.” Well, we can label them all we want, but unless we enact censorship, we can’t STOP people from using them unethically.

So, what do we do about it? The hard, fast, and easy solution for anyone new to machine learning or wanting to work with artificial intelligence is: use your powers for good and not evil. I use my powers for good, yet I know that a rock can be used to break a window or hurt someone (evil), and it can also be used to build roads and buildings (good). We’re not going to ban all rocks because they could possibly be used wrongly, just as we’re not going to ban everyone’s names, birthdays, and payment history because they could be misused.

We have to make a concerted effort to realize the impacts of our actions and find ways to better the world around us through them. There’s still so much more to discuss on this topic, but approaching it with an open mind and realizing there is so much good we can do in the world will leave you feeling a lot happier than dwelling on the darkness and worry surrounding things you cannot control.

Was this too deep? Probably too deep a subject for the first in this series, but it was timely and pertinent to a Lightning Talk I was forced (yes, I said forced) to give on machine learning and ethics at the recent ML4ALL Machine Learning Conference.

Feel free to enjoy the talk here, and if you found this useful, terrifying, or awkward, let’s talk about it. I find ethics a difficult topic to discuss, mainly because people want to enforce policy on things they cannot control, especially when the bulk of the information is “public.” But the depth of classifying and changing the classification of data is best saved for another day.

Level 11

interesting concept

Level 14

I must admit that I had to reread parts of this thinking "is this really a thing?" but it's interesting to say the least. My first thought is that data simply "is". There is no deeper context possible other than that. I'd prefer to use different terminologies when talking about machine learning, government oversight, and ethics. I'm not sure that I'd group them all together as the same topic.

Thanks for the article!

Level 13

Interesting read.  Looking forward to the rest of the series.  Thanks.

Level 13

Interesting.  Never thought of that before.

Level 12

Thanks. Yes, there are times (and they come around regularly) when I have to ask, "Whoa, is this a thing, wait, how is this a thing?! WHY IS THIS A THING?!" But it makes press and is highly controversial. It was just a little more than a month ago that Google caught massive backlash and within a week canceled their own ethics board (best found by searching: "google cancels ai ethics board"). There are so many ad-laden places to find it that I don't want to pick one link over another for you (half of those I tried complained about my ad-blocker).

I'll dig deeper into the next in the series where it will all tie together, even though it truly is such a monstrous topic!

Thanks for reading!

Level 14

Machine Learning will be developed by the big data corporations and we all know about their ethics      

Level 12

Interesting discussion. Unfortunately my conclusion is that there's nothing that can be done about it. Companies and governments will use data to their advantage without concern for the desires or needs of customers/citizens/subjects until those people demand accountability. And I doubt people will.

After the well-publicized incident where Target's data mining identified that a teenage girl was pregnant before her parents knew, Target discovered that a lot of women would react very negatively when they felt spied on. So what did Target do? They didn't stop. They continued sending personalized coupon booklets to pregnant women, but they added in things that they knew a pregnant woman wasn't likely to buy. A woman would see this, not think that she was being spied on, assume everybody got the same coupons, and use the needed coupons for diapers, formula, etc. Where's the motivation to stop spying on customers like this? There is no motivation.

Google had a motto of "don't be evil" for a long time, and dropping it tells us all that we need to know. They will continue using data to profile us and give us directed advertisements. Look up a concert in Las Vegas or a beachfront hotel in Florida and soon you'll get a bunch of advertisements for similar hotels and entertainment.

Nobody I know likes this. But none of these people are willing to stop feeding the beast. Until a large enough fraction of the population says enough is enough, we can expect unethical use of our data to increase.

Level 12

You hit the nail on the head: there is nothing that can be done about it.

I mean, short of destroying all of the data and the ability to collect it...

Level 14

Great piece... Looking forward to the entire series.

Machine learning is now a commodity with AWS... you can buy cycles.

Level 16

Metadata and machine learning have been in use for quite some time now. If you read the fine print when you connect to 'free' wifi you will sometimes see they are collecting virtually everything about you, your surfing habits, etc.

Level 12

Also for the direct link to the video vs the massively long-split version, it was finally posted here!

Ethics and Machine Learning - Christopher - Lightning Talk - ML4ALL 2019 - YouTube

Thanks! ❤️

Level 20

everything is ML and AI now... it's more show than go from what I've seen.

Level 12

Wait until my next post... I'll show you some more of the Go

Data cannot have ethics.   It cannot be ethical or unethical because ethics are based on moral principles, while data is simply one or more facts.  Facts are not moral principles that vary by culture or environment, education or wealth, opportunity or need.

What is ethical in one society today is anathema to another.  That statement is not ethical or unethical; it's just a fact--a piece of data to store and consider.

Data may reveal information and trends that groups might use to display or reinforce the (temporary) validity of their moral principles (ethics), but data is not able to "have" ethics, or "be" ethical or unethical.

People have ethics.  They determine if their principles and behavior are moral in their own eyes.  Societies evaluate behavior from a higher viewpoint and determine (seemingly arbitrarily) what behavior and principles coincide with that society's morals and definitions of ethics at that time.

Analysis of behavior and principles changes, while data does not.  Interpretation of data may change, but the data has no ethics and data is data--it is empirical and unchanging.

  • It was 50 degrees F at location X at time Y.
  • There were five more votes for a candidate than against.
  • This item's mass is 5 kg.

These facts have no ethics.  They simply exist in a vacuum.

They just are.


So let's take this to the currently popular ethical dilemma of smart cars supposedly having to deal with ethics. The classic example: a smart car may be programmed to cause your life to end in situations where your loss is deemed less impactful than the loss of multiple lives.

It's like ST:TOS saying "The needs of the many outweigh the needs of the one."  In this case Society (or the auto manufacturer/programmer) has weighed the options of one loss versus other "greater" losses and acted accordingly.

Is the data "ethical"?  Is the car ethical?  No.

Is the programmer or is Society acting ethically?  Yes.

Are ethics immutable and unchangeable by time and circumstance?  No.

Shall we spend more time trying to decide if data has ethics?



Data is neither ethical nor unethical, but how it is used can be. Machine learning can make things more efficient, but it can also be used unethically if allowed to "learn" and decide entirely on its own. For example, a credit score is often used as an indicator of creditworthiness. Without understanding the person behind that credit score and their circumstances, you will never know the whole picture.

Take a married couple that shares finances. One person is totally irresponsible financially; the other is extremely responsible, so much so that they both have great credit scores. The couple breaks up and they go their own ways. Based on credit score, the irresponsible person can take out big loans and proceed to leave the banks holding the bag. Now take that same situation and assume that during their relationship they both had poor credit scores because of the irresponsible person. After breaking up, the responsible person is hindered by a credit score caused by the other person.

So, in the first scenario, is it ethical to loan to the irresponsible person?

In the second scenario, is it ethical to penalize the responsible person?

The point is that if only the machine-calculated numbers are considered, wrong decisions can be made. Even given all of the possible variables, neither man nor machine can always get it right, but only a human can truly have ethics.
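The two scenarios can be written out as a small sketch. It assumes a made-up score (the percentage of joint bills paid on time), a made-up approval threshold of 80, and two hypothetical people, Alex (responsible) and Sam (irresponsible). None of this is a real scoring method; it only shows how a number shared by two people can be wrong about one of them in each scenario.

```python
def shared_score(on_time: int, total: int) -> int:
    """Hypothetical joint score: percentage of shared bills paid on time."""
    return round(100 * on_time / total)


def decide(score: int, threshold: int = 80) -> str:
    """Decision based on the number alone, as an unattended model would make it."""
    return "approve" if score >= threshold else "decline"


# Ground truth that the shared score never sees (hypothetical people).
is_responsible = {"Alex": True, "Sam": False}


def wrong_decisions(score: int) -> list[str]:
    """Names of the people for whom the score-only decision is the wrong call."""
    approved = decide(score) == "approve"
    return [name for name, responsible in is_responsible.items()
            if approved != responsible]


# Scenario 1: the responsible partner covers every bill, so both look great.
scenario_1 = shared_score(on_time=24, total=24)
# Scenario 2: the irresponsible partner's missed bills drag both scores down.
scenario_2 = shared_score(on_time=12, total=24)

print("Scenario 1:", decide(scenario_1), "wrong for", wrong_decisions(scenario_1))
print("Scenario 2:", decide(scenario_2), "wrong for", wrong_decisions(scenario_2))
```

In the first scenario the score approves Sam, and in the second it declines Alex. The arithmetic is correct both times; what is missing is the context a person would ask about.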

Level 13

Thanks for the article

Level 11

Thanks, looking forward to the rest of the series.

About the Author
Founder at Remedy8 Security, Technology Evangelist, vExpert, EMC Elect, BDA, CISSP, MCT, Cloud, Ninja, Vegan, Father, Cat, Humorist, Author