You Look Like A Thing And I Love You – Janelle Shane

Review

The author has a delightful and whimsical approach to explaining some of the most important AI concepts. I really appreciated that a sensible proportion of the book is devoted to bias in AI systems (something many machine learning books skip).

I found the author’s approach to learning by doing and hacking inspirational. It seems like a great way to learn!

If you’re interested in AI but not technical, this book is a good place to start.

If you’re already well versed in AI, you might find it a frustrating read.

Key Takeaways

The 20% that gave me 80% of the value.

  • Promising headlines are common – but progress is slow. AI is already everywhere, but it isn’t flawless. Its quirks are causing serious consequences (e.g. YouTube recommending polarizing content).
  • 5 Principles of AI weirdness
    1. The danger of AI is not that it’s too smart, it’s that it’s not smart enough
    2. AI has the approximate brain power of a worm
    3. AI doesn’t understand the problem you want to solve
    4. AI will do exactly what you tell it to – or try its best
    5. AI will take the path of least resistance
  • Training AI is closer to teaching a child than programming a computer. Here’s the data; you try to figure out how to copy it.
  • AI is better when:
    • you don’t know the rules
    • there are many subtle rules
    • it’s possible new rules could be discovered
  • Healthcare researchers were shocked to find their cancer detection algorithm had learnt to detect rulers instead – as many of the tumors in their training data had been photographed next to rulers for scale.
  • AI can be biased, if the human dataset is biased. We need to learn to anticipate problems before they occur.
  • Worrying about an AI takeover is like worrying about overcrowding on Mars – it’s not today’s problem
  • The difference between successful AI problem solving and failure has a lot to do with the suitability of the task for AI learning. Four ways AI goes wrong…
    • The problem is too hard
    • The problem is not what we thought it was
    • If there are sneaky shortcuts the AI will find them
    • An AI that tries to learn from flawed data will be flawed
  • AI might be useful – even when a human can do something better (robot vacuum).
  • Be careful if mistakes have big consequences.
  • The narrower the task, the smarter the AI seems. As tasks become broader → AI tends to struggle. Therefore it makes sense to specialize.
  • AIs are slow learners – requiring thousands of images to learn a new object.
  • AIs are such slow learners that we use simulation to train them faster than real time.
  • AIs are bad at remembering things and bad at forward planning – they can’t see far into the future.
    • Text generation → gets harder the longer the text, as you need to remember what came before and plan ahead.
  • Is AI really the simplest way of solving the problem? If you can use common sense or simple rules then do that instead.
  • If a neural net trains itself – it’s hard to understand what it’s reacting to or why.
    • There are two methods:
      • Look at the cells that activate when they see particular things
      • Tweak the input image and see which changes make the cells activate most strongly (see the occlusion sketch after this list)
    • OpenAI trained an ANN on Amazon review data to predict the next letter in a sequence.
      • They discovered that one of the cells had learnt to fire based on the sentiment of the review → finding it a useful predictor of the next letter
    • Google found one of their ImageNet recognition algorithms was looking for floppy vs pointy ears – to help it distinguish dogs from cats
  • Class imbalance is a real problem → if the thing you’re looking for is really rare (FRAUD), algorithms can achieve great accuracy just by predicting the other class (NOT FRAUD). See the class-imbalance sketch after this list.
  • Markov Chains tackle jobs similar to RNNs (recurrent neural networks) – such as predicting the next word in a sentence. They are more lightweight and quicker to train than most neural networks – but they can’t predict far into the future (see the Markov chain sketch after this list).
  • Random Forests are made up of decision trees → individual flow charts that lead to an outcome based on the information we have. Decision trees can become deep and complex. ML can build a forest of decision trees using trial and error. Each tree likely learns a different attribute and uses that to cast a vote – and the votes are pooled to make a prediction. Each individual tree acts on only a small piece of the data – but together they combine their guesses into something more powerful (see the random-forest sketch after this list).
  • Evolutionary algorithms – each potential solution is like an organism. Each generation, the most successful solutions survive to reproduce, mutating or mating with other solutions to produce different children (see the evolutionary-algorithm sketch after this list).
  • Hyperparameters are the rules we set to govern the training process
  • Combining ML algorithms makes sense – because they’re better when working in a narrow domain. Deciding how to break your problem into tasks for sub-algorithms is a key way to achieve success with ML.
  • When using AI doesn’t work well:
    • the problem is too broad
    • not enough data
    • data confuses it
    • trained for a task that was much simpler than the actual problem
    • training situation didn’t represent the real world
  • More data is usually better when it comes to training AI.
  • How to get more data? Crowdsourcing, Mechanical Turk, or data augmentation (see the augmentation sketch after this list).
  • Cleaning up messy input data is a good way to boost performance
  • AIs insert giraffes into too many generated images, often in random scenes, because giraffes are over-represented in training data.
  • Unintentional Memorization → when ML memorizes something from an original dataset and exposes it to users (often PII that wasn’t expected to be in it)
  • AI can succeed in what you ask – but often what you ask isn’t actually what you wanted them to do.
    • It is helpful to imagine that it’s deliberately misinterpreting your reward function
  • Simulations have to be the map, not the territory. AIs don’t have any obligation to obey laws of physics that you didn’t tell them about.
  • If data comes from humans – it will likely have bias in it
  • Using movie review data means you’ll train on ‘review bombs’, a horrible internet phenomenon where people give movies negative reviews because they have Black or women stars. Your algorithm will learn that bias. You can adjust word vectors to break those associations (see the word-vector sketch after this list). You’re now playing god – and it’s not perfect – but it’s better than letting the worst of the internet decide.
  • Mathwashing or bias laundering = explaining away bias because a computer made the decision, not a person
  • Some researchers believe dreams are a kind of low stakes simulation training. A lower fidelity energy efficient way to learn about important things and experiment
  • When class imbalance interacts with biased datasets → it often results in even more bias
  • Adversarially attack your CV: you can add ‘Oxford’ or ‘Cambridge’ in invisible white text to your CV to get through automated filters
  • Questions to ask when evaluating AI claims
    1. How broad is the problem?
    2. Where did the training data come from?
    3. Does the problem require a lot of memory?
    4. Is it just copying human biases?
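
A rough occlusion sketch for the second interpretability method above (tweak the input and see what the cells react to). The `model` function here is just a stand-in I made up for illustration, not anything from the book: any function that maps an image to a score would do.

```python
import numpy as np

# Stand-in "model" (an assumption for illustration): any image -> score function works.
def model(image):
    return image[2:4, 2:4].mean()  # pretend the network only cares about this patch

image = np.random.default_rng(0).random((8, 8))
baseline = model(image)

# Blank out one pixel at a time and record how much the score drops.
sensitivity = np.zeros_like(image)
for r in range(8):
    for c in range(8):
        occluded = image.copy()
        occluded[r, c] = 0.0
        sensitivity[r, c] = baseline - model(occluded)

# Large values mark the pixels the "network" reacts to most strongly.
print(np.round(sensitivity, 3))
```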
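
A minimal class-imbalance sketch (with made-up numbers, not from the book): a "model" that always predicts NOT FRAUD scores 99% accuracy while catching zero fraud.

```python
# Made-up dataset: 1% fraud (1), 99% legitimate (0).
labels = [1] * 10 + [0] * 990

# A "model" that ignores its input and always predicts NOT FRAUD.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / sum(labels)

print(f"accuracy: {accuracy:.1%}")    # 99.0%, looks great
print(f"fraud recall: {recall:.1%}")  # 0.0%, catches no fraud at all
```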
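
A toy order-1 Markov chain for next-word prediction (the corpus is made up). The next word depends only on the current word, which is exactly why these models are quick to train but can't plan far ahead.

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()  # toy corpus

# Count which word follows which: order-1, so only the current word matters.
transitions = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    transitions[current].append(following)

def generate(start, length=8):
    word, output = start, [start]
    for _ in range(length):
        options = transitions.get(word)
        if not options:
            break
        word = random.choice(options)  # sample the next word from observed followers
        output.append(word)
    return " ".join(output)

print(generate("the"))
```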
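
A random-forest sketch: many shallow trees, each trained on a random slice of the data, pooling their votes. It assumes scikit-learn is installed; in practice you would just use its `RandomForestClassifier`, this version only spells out the voting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    idx = rng.choice(len(X), size=200, replace=True)  # a random slice of the data
    trees.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

# Each tree votes; the majority wins.
votes = np.array([tree.predict(X) for tree in trees])
forest_prediction = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble accuracy:", (forest_prediction == y).mean())
```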
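
A toy evolutionary algorithm in the spirit of the bullet above: each solution is an "organism", the fittest survive each generation, and they reproduce with mutation. The target and fitness function are made up stand-ins for a real problem.

```python
import random

TARGET = 42                            # made-up goal: the genes should sum to 42
POP_SIZE, GENES, GENERATIONS = 50, 8, 100

def fitness(solution):
    return -abs(sum(solution) - TARGET)  # closer to the target = fitter

def mutate(solution):
    child = solution[:]
    child[random.randrange(len(child))] += random.choice([-1, 1])
    return child

population = [[random.randint(0, 10) for _ in range(GENES)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)   # the most successful survive...
    survivors = population[: POP_SIZE // 2]
    # ...and reproduce (here by mutation only) to fill the next generation.
    population = survivors + [mutate(random.choice(survivors)) for _ in survivors]

best = max(population, key=fitness)
print("best solution:", best, "sum:", sum(best))
```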
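
A minimal data-augmentation sketch: turn one training image into several by flipping, rotating and adding noise. The 4x4 "image" is a placeholder array.

```python
import numpy as np

image = np.arange(16).reshape(4, 4)  # placeholder 4x4 "image"

augmented = [
    np.fliplr(image),                                              # mirror left-right
    np.flipud(image),                                              # mirror top-bottom
    np.rot90(image),                                               # rotate 90 degrees
    image + np.random.default_rng(0).normal(0, 0.1, image.shape),  # add a little noise
]
print("training examples from one image:", len(augmented) + 1)
```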
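
A word-vector sketch of breaking an unwanted association (the "playing god" step). The 4-dimensional vectors are made up for illustration; real embeddings have hundreds of dimensions and real debiasing is more involved than this single projection.

```python
import numpy as np

vectors = {
    "doctor": np.array([0.9, 0.1, 0.3, 0.4]),
    "he":     np.array([0.1, 0.9, 0.0, 0.05]),
    "she":    np.array([0.1, -0.9, 0.0, 0.05]),
}

# Estimate the direction the unwanted association runs along...
bias_direction = vectors["he"] - vectors["she"]
bias_direction /= np.linalg.norm(bias_direction)

# ...and remove that component from a word that shouldn't carry it.
doctor = vectors["doctor"]
debiased = doctor - np.dot(doctor, bias_direction) * bias_direction

print("association before:", np.dot(doctor, bias_direction))    # nonzero
print("association after: ", np.dot(debiased, bias_direction))  # ~0
```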

As AI becomes ever more capable, it still won’t know what we want; it will only try to do what we tell it. There will always be a potential disconnect between what we want AI to do and what we tell it to do.

  • How do we work with AI going forwards?
    • We have to understand it
    • Choose the right problems for it to solve
    • Anticipate how it will misunderstand us
    • Prevent it from copying the worst of what it finds in data