Just Enough Math

Paco Nathan (Liber 118)
Business | Computational Thinking
E145/146
Tutorial. Please note: to attend, your registration must include Tutorials.

THIS TUTORIAL HAS REQUIREMENTS AND INSTRUCTIONS LISTED BELOW

This tutorial provides a hands-on programming intro to advanced math for business people — showing “just enough math” to take advantage of some popular open source frameworks.

The premise is that many people take university-level math, up until the “killing fields” of calculus. Most did not continue beyond that, but still have an interest. Meanwhile, math programs in many universities cling tenaciously to Cold War-era priorities, intent on weeding out people who would not pass requirements as engineers to build missiles, etc.

With the commercial successes of Machine Learning, Cloud Computing, etc., there are very good business cases for having “just enough math” to leverage new kinds of open source tools. These days people in business need to understand more about complex graphs, sparse matrices, Bayesian priors, optimization solvers, etc. These topics are not hard to learn, but most curricula place them far beyond calculus.

As a case in point: in preparation for their recent IPO, Twitter overhauled their revenue apps to emphasize applying semigroups, monoids, rings, algebraic graph theory, etc., to leverage functional programming for efficient parallel processing at scale. Those topics may sound obscure, but 100 lines of Python illustrate the math clearly.

The formula applied in the tutorial is simple: a series of sections build on each other. Each section introduces a few clear math concepts, discusses their history and typical uses, and presents a sample business use case that illustrates how to leverage that math. Brief code examples in Python then show how to solve for the use case. Finally, we look at open source projects that are used in production for similar kinds of work.

Part 1: Show Me The Monoid

Abstract algebra is a basis for parallel processing and functional programming, which ultimately leads to very good software engineering. Monoids, semigroups, rings, etc., are how we build Enterprise data workflows based on OSS frameworks such as Summingbird (Scala) and MBrace (F#).
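
To make the link between monoids and parallel processing concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial materials, and the `Monoid` helper class and `chunked` function are hypothetical names introduced only for illustration. A monoid is an identity element plus an associative combine operation; because the operation is associative, partial results computed on separate chunks of data can be merged in any grouping and still give the same answer.

```python
from collections import Counter
from functools import reduce


class Monoid(object):
    """Hypothetical helper: an identity element plus an associative combine."""

    def __init__(self, identity, combine):
        self.identity = identity  # zero-argument callable returning a fresh identity
        self.combine = combine    # associative binary operation

    def fold(self, items):
        # Combine all items, starting from the identity element.
        return reduce(self.combine, items, self.identity())


# Integers under addition, with 0 as the identity.
sum_monoid = Monoid(lambda: 0, lambda a, b: a + b)

# Word counts under Counter addition, with the empty Counter as the identity.
count_monoid = Monoid(Counter, lambda a, b: a + b)


def chunked(items, size):
    """Split a list into consecutive chunks, standing in for cluster partitions."""
    return [items[i:i + size] for i in range(0, len(items), size)]


words = "to be or not to be".split()

# Each chunk could be counted on a different machine...
partials = [count_monoid.fold(Counter([w]) for w in chunk)
            for chunk in chunked(words, 2)]

# ...and associativity lets the partial counts be merged afterwards.
total = count_monoid.fold(partials)
```

Here `total` equals `Counter(words)`, the same result a single sequential pass would give. That equivalence is the property that aggregation frameworks rely on when they split work across a cluster.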

Part 2: Lies, Damn Lies, Statistics, and Bayesian Statistics

A look at Bayesian statistics, point estimates, and other ways of making decisions when the business criteria keep changing over time. Not the statistics that you learned in college, but what gets used in commerce, especially in the context of “Internet of Things” and sensor arrays.
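
As an illustration of updating a belief while data keeps arriving, the following sketch uses a Beta prior with binomial observations, a standard conjugate pair. The scenario (batches of sensor readings labeled pass or fail), the numbers, and the function names `update_beta` and `beta_mean` are assumptions made up for this example, not content from the tutorial.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Posterior mean of a Beta distribution, usable as a point estimate."""
    return alpha / float(alpha + beta)


# Uniform prior: no opinion yet about the pass rate.
posterior = (1.0, 1.0)

# Hypothetical batches of (passes, failures) arriving over time.
batches = [(3, 7), (6, 4)]

estimates = []
for passes, failures in batches:
    posterior = update_beta(posterior[0], posterior[1], passes, failures)
    estimates.append(beta_mean(*posterior))

# After both batches the posterior is Beta(10, 12), with mean 10/22.
```

Each batch shifts the estimate, so the decision criterion can be re-evaluated as conditions change rather than fixed once from a single sample.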

Part 3: The Red Pill

Building on the above to explore some of the more useful parts of linear algebra and its relation to algebraic graph theory, graph queries, eigenvalues, factorization, etc. These techniques currently do the heavy lifting in high-ROI apps at scale.
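
One way to see how eigenvalues connect to graphs is eigenvector centrality: the dominant eigenvector of a graph's adjacency matrix scores each node by how well connected its neighbors are. The sketch below uses power iteration in plain Python on a small made-up graph. The graph and the helper names are assumptions for illustration; a production system would use a linear algebra library instead.

```python
def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_iteration(matrix, steps=100):
    """Approximate the dominant eigenvalue and eigenvector of a matrix."""
    vector = [1.0] * len(matrix)
    for _ in range(steps):
        product = mat_vec(matrix, vector)
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]  # keep the vector at unit length
    # Rayleigh quotient v . (A v) for a unit vector v.
    eigenvalue = sum(x * y for x, y in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


# Undirected graph with edges 0-1, 0-2, 1-2, 2-3 (hypothetical example).
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

eigenvalue, centrality = power_iteration(adjacency)
most_central = centrality.index(max(centrality))  # node 2 touches every other node
```

The graph contains a triangle, so it is not bipartite and the iteration settles on a single dominant eigenvector. On a bipartite graph this simple version can oscillate and would need a shift or damping term.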

Part 4: Caterpillar Won’t Be Building A Social Network

Leveraging linear algebra to introduce optimization theory, convex spaces, and operations research. The focus is a pricing model example of linear programming using PyGLPK, followed by Genetic Programming for business use cases in which calculus does not apply.
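
The sketch below illustrates the idea behind linear programming without using PyGLPK, whose API is not shown here. It relies on a basic fact: when a linear program has a bounded feasible region, an optimum lies at a vertex of that region. For two variables the vertices can be enumerated by intersecting pairs of constraint lines. The product-mix numbers are a common textbook example rather than the tutorial's pricing model, and `solve_2d_lp` is a hypothetical name.

```python
from itertools import combinations


def solve_2d_lp(objective, constraints, eps=1e-9):
    """Maximize objective[0]*x + objective[1]*y subject to constraints,
    each given as (a, b, c) meaning a*x + b*y <= c.
    Assumes the feasible region is non-empty and bounded."""
    best_value, best_point = None, None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / float(det)
        y = (a1 * c2 - a2 * c1) / float(det)
        # Keep the intersection only if it satisfies every constraint.
        if all(a * x + b * y <= c + eps for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best_value is None or value > best_value:
                best_value, best_point = value, (x, y)
    return best_value, best_point


# Maximize profit 3x + 5y for two products sharing limited plant capacity.
constraints = [
    (1, 0, 4),    # x <= 4
    (0, 2, 12),   # 2y <= 12
    (3, 2, 18),   # 3x + 2y <= 18
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

best_value, best_point = solve_2d_lp((3, 5), constraints)
# The optimum is at x = 2, y = 6 with objective value 36.
```

Enumerating vertices grows combinatorially with the number of variables and constraints, which is why real workloads hand the problem to a solver such as GLPK instead.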

Part 5: A Winning Approach

A simple conceptual framework to represent real-world problems as graphs, then solve the graphs as sparse matrices, implemented in clusters based on cloud computing.
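
A minimal sketch of that framework, under assumptions of my own: a small supply chain (made-up node names) is stored as a sparse adjacency matrix in a dictionary of dictionaries, and a graph question ("how many distinct routes reach each node in three hops?") is answered by repeated sparse vector-matrix multiplication. The function names are hypothetical. On a cluster the same multiplication would be partitioned by row.

```python
def to_sparse(edges):
    """Store a directed graph as a sparse matrix: {row: {column: weight}}.
    Only the non-zero entries (actual edges) take up space."""
    sparse = {}
    for src, dst in edges:
        sparse.setdefault(src, {})[dst] = 1
    return sparse


def sparse_vec_mat(vector, sparse):
    """Multiply a sparse row vector by a sparse matrix."""
    result = {}
    for src, weight in vector.items():
        for dst, value in sparse.get(src, {}).items():
            result[dst] = result.get(dst, 0) + weight * value
    return result


edges = [
    ("supplier", "factory"),
    ("factory", "warehouse"),
    ("factory", "outlet"),
    ("warehouse", "store"),
    ("outlet", "store"),
]
graph = to_sparse(edges)

# Start at the supplier and take one step per multiplication.
one_hop = sparse_vec_mat({"supplier": 1}, graph)
two_hops = sparse_vec_mat(one_hop, graph)
three_hops = sparse_vec_mat(two_hops, graph)
# three_hops == {"store": 2}: two distinct three-step routes reach the store.
```

Each entry of the result counts paths, so the same multiplication answers reachability and route-counting questions at once.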

In summary, the first four parts introduce different advanced math topics that are practical (and arguably essential) to use in business. Then the fifth part ties these pieces together into a conceptual whole, as a new way to approach computational thinking.

TUTORIAL REQUIREMENTS AND INSTRUCTIONS FOR ATTENDEES

* High School Algebra 2 or beyond in math background
* Some experience with Python will help; however, one can cut and paste all the code examples
* Laptop with Python 2.7 installed
* The Anaconda free download is recommended: http://continuum.io/downloads
* Alternatively, the instructor will provide other environments: a cloud server or a Vagrant VM on a USB stick

Paco Nathan

Liber 118

O’Reilly author (Enterprise Data Workflows with Cascading) and a “player/coach” who has led innovative Data teams building large-scale apps for 10+ years. Expert in machine learning, cluster computing, and Enterprise use cases for Big Data. Interests: Mesos, PMML, Open Data, Cascalog, Scalding, Python for analytics, NLP.
