Machine Learning companies in SF/Bay Area

*** If you want to be kept up to date about new job opportunities, data related events, and interviews with Data Scientists/Engineers, then Subscribe to our mailing list ***

I (Tony from SFBayML) am personally curating this, so I won't send you stuff that I wouldn't personally find valuable.

Company Url City Description
0xdata Mountain View 0xdata makes H2O, a platform for data analysis. H2O provides in-memory distributed processing and deep learning algorithms.
247 Inc Campbell [24]7 is changing the world of customer experience by leveraging data, data sciences and prediction technologies in real time. We are a Sequoia-backed, profitable pre-IPO company with an impressive board and a proven leadership team. We have some of the largest brands and companies in the world as our customers - American Express, AT&T, Argos, Best Buy, Capital One, Optus, Target and United Airlines.
Adobe San Francisco
Airbnb San Francisco
Akuda Labs San Jose Akuda Labs collects over a billion documents and images daily, from social networks, blogs, forums, and news sites. We classify, index, store, and analyze them in real time. Our classification system makes use of lexical, semantic, and statistical classifiers. These filters continuously adapt and evolve through the use of robust machine learning techniques. Simultaneously, we extract author and publisher information. We harvest and augment demographic information using sophisticated text and object-recognition image analysis algorithms.
AliveCor Inc. San Francisco AliveCor Heart Monitor is a personalized, easy-to-use device that records accurate ECGs using a smartphone. We develop machine-learned algorithms to detect heart conditions automatically. Our large patient base is helping us discover ways to diagnose disease that have not been possible until now.
Alpine Data Labs San Francisco Alpine Data Labs is a machine learning company that is focused on helping other companies analyze their data more effectively. Alpine Chorus is a big data collaborative platform for code-less machine learning.
Amazon A9 Palo Alto
Apple Cupertino Oakland Work on various projects in the Search and NLP Domain to solve user intent identification challenges using machine learning.
Argyle Data San Mateo
Baidu Sunnyvale
BandPage San Francisco BandPage empowers musicians to build their fan bases, increase their revenue, and make a living at their craft. BandPage is used by over half a million musicians, including nearly all popular artists. Meanwhile, BandPage is integrated with most of the popular consumer music services, including Spotify, Rdio, Rhapsody, Shazam, LiveNation, XBox Music, Vevo, etc.
Bridge San Francisco We're creating a reasoning engine to make smart devices smarter and hopefully have a real impact on the user experience of the Internet of Things. We're using graph databases, rules-based heuristics, knowledge representation, and machine learning.
Captricity Oakland Captricity provides customer data as a service. We provide a single platform that can capture customer data across all input channels, starting with the hardest: turning unconstrained handwriting into structured data. Captricity uses crowd-guided machine learning and computer vision at scale to deliver data at 99+% accuracy, and at the speed and cost of a software service. Captricity allows customers to significantly increase customer engagement, cut costs, and “go digital,” all while complying with strict security requirements.

We are a Python shop using Django for our web stack. Here are some problems where we make heavy use of machine learning:
- ICR, OCR, OMR: Handwriting, printed text, and mark (multiple choice) recognition
- Document Recognition and Classification: Form template identification
- Page stream segmentation: Segmentation of stream of documents into their form constituents
- Document Definition: Form structure identification and extraction
CellScope, Inc. San Francisco Using computer vision and machine learning to help interpret images and videos acquired with our iPhone otoscope (for viewing the eardrum) and future devices.
Cerebellum Capital San Francisco
Clari Mountain View Clari brings mobile, design and data science into sales. Our Sales Productivity Platform works with your existing enterprise systems to dramatically improve sales team productivity and increase sales forecast accuracy.

Examples of machine learning problems:
1. Estimate the risk of an opportunity based on its current status and historical records.
2. Forecast the close time and close amount of deals.
3. Aggregate forecasts of sales performance for a team/division/company.
4. Differentiate high-performing sales reps from low-performing ones.
5. Recommend immediate actions to save risky opportunities.
6. Churn analysis for products.

Some tools being used:
Mahout, Spark, Scala

ClearStory Data Menlo Park
CrowdFlower San Francisco CrowdFlower is the leading data enrichment platform for data scientists. Our quality-control technology is the most accurate and fastest way to collect, label, and clean data from an on-demand workforce. We call this people-powered data enrichment.

The CrowdFlower platform automates the management of the online workforce to tackle tasks that require human intelligence like:
Search relevance tuning
Data categorization
Image annotation
Metadata creation
Sentiment analysis

By accessing over 5 million on-demand contributors, CrowdFlower is the smarter and better way to tackle large quantities of repetitive work. When compared to alternatives like outsourcing or managing interns, CrowdFlower eliminates the lead time and overhead of traditional hiring, managing staff or outsourcing to save OPEX and avoid lost revenue.

Our primary goal is to ensure results are of the highest quality. Quality assurance can be time-consuming and can delay final results. CrowdFlower has the industry's best quality control mechanisms and we track everything from contributor response velocity to answer distribution. We assign every unit of completed work a confidence score to ensure the work has only been done by trusted and proven contributors.
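One common way to attach a confidence score to a unit of crowd work is trust-weighted agreement: each contributor's answer counts in proportion to that contributor's trust score. The sketch below illustrates the general idea only. It is not CrowdFlower's actual scoring method, and the function name, the trust values, and the sample answers are all hypothetical.

```python
from collections import defaultdict


def aggregate(judgments):
    """Combine (answer, contributor_trust) pairs for one unit of work.

    Returns the answer with the highest total trust and a confidence
    score: that answer's share of all trust cast on the unit.
    """
    if not judgments:
        raise ValueError("judgments must be non-empty")
    weight = defaultdict(float)
    for answer, trust in judgments:
        weight[answer] += trust
    total = sum(weight.values())
    if total == 0:
        raise ValueError("total trust must be positive")
    best = max(weight, key=weight.get)
    return best, weight[best] / total


# Hypothetical sentiment judgments from three contributors.
answer, confidence = aggregate(
    [("positive", 0.9), ("positive", 0.8), ("negative", 0.3)]
)
```

With these made-up inputs, the two trusted contributors who agree outweigh the less trusted dissenter, so the unit resolves to "positive" with a confidence of about 0.85.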

Companies of all sizes have seen impressive results with CrowdFlower, including LinkedIn, Intuit, Delectable, CellScope, The Home Depot, Flickr, Edelman and eBay.

Founded in 2009, CrowdFlower is backed by Trinity Ventures, Bessemer Venture Partners and Canvas Venture Fund.
Databricks, Inc. Berkeley Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache Spark. We've been working for the past six years on cutting-edge systems to extract value from Big Data. We believe that Big Data is a huge opportunity that is still largely untapped, and we're working to revolutionize what you can do with it.
DataHero San Francisco Our goal is to enable anyone to combine data from multiple SaaS providers (Salesforce, Pardot, Braintree, Eventbrite, etc) and gain insight by leveraging smart visualizations!

We're a full-stack JavaScript shop with Node.js on the backend right now, but we've got plans for more. There are opportunities for both supervised learning where users are telling us whether our classifications have been correct or not, as well as unsupervised learning and recommender systems based on similarity of datasets and what other users have done. Other projects include leveraging D3 for better transition and data exploration, and creating a read-only architecture that can independently scale our embeddable charts & dashboards, separate from our read/write architecture used for data importing and chart updates.
San Francisco focuses on realtime monitoring, measurement, and reporting of proximity driven customer engagement via mobile devices.

Drawbridge San Mateo Drawbridge is the industry's leading cross-device technology company that enables brands to have seamless conversations with consumers across their connected devices, including desktops, smartphones, and tablets. By leveraging its Connected Consumer Graph, which includes 1.1 billion consumers across 2.6 billion devices to date, the company is able to gain insights and a much deeper understanding of consumer behavior to drive better ROI for advertisers - from creating brand awareness to driving incremental sales. The company is headquartered in Silicon Valley and is backed by Sequoia Capital, Kleiner Perkins Caufield and Byers, and Northgate Capital.
Dropbox San Francisco
Enlitic San Francisco Enlitic uses recent advances in machine learning to make medical diagnostics faster, more accurate, and more accessible. The company's mission is to provide the tools that allow physicians to fully utilize the vast stores of medical data collected today, regardless of what form they are in - such as medical images, doctors' notes, and structured lab tests. To realize this vision, we are building on state-of-the-art deep learning algorithms and partnering with top research hospitals and medical device manufacturers.
Ersatz Labs Pacifica We make a platform for machine learning, with a particular emphasis on deep learning. From a single web-based interface, you can wrangle data, experiment with different machine learning models, and deploy your models to your app via our API. We're seeing a wide variety of use cases with a particular focus on finance (RNNs for time series prediction) and medical imaging (CNNs for object detection in medical images, 3d or otherwise).
Facebook Menlo Park
Glassdoor Mill Valley
Google Mountain View
Gracenote Emeryville
Grammarly San Francisco Grammarly aims to perfect written English by continuously advancing the world's most accurate automated proofreading technology. The Grammarly engine checks for more than 250 types of spelling, grammar and punctuation errors, enhances vocabulary usage and suggests citations. More than 3 million English writers worldwide trust Grammarly's products, which are also licensed by leading universities and corporations.
HeartFlow Redwood City Machine learning / computer vision for predicting structured data, namely, the myocardium, the coronary arteries, and coronary plaque, on computed tomography angiography.
Humtap San Francisco Music information retrieval, including signal processing and machine learning, is a core component of Humtap.
Infer Palo Alto We use machine learning to directly help businesses win more customers.

We have an amazing team from MIT, Berkeley, CMU, MSR, Google, FB, YC, Palantir, Jane Street, and more.

Our stack is mostly Python (on the ML side: numpy, scipy, sklearn, IPyNB, pandas, etc.), Postgresql, EC2.
iRhythm Technologies San Francisco iRhythm Technologies, Inc. is a medical device startup company combining novel diagnostic device and data analysis concepts to create a new approach to cardiac rhythm monitoring. Our data science team is applying data-driven machine learning technologies to very large data sets from wearable medical sensors.
Lendable Oakland At Lendable, we use machine learning to expand access to fair microcredit in the developing world. We use machine learning to help microfinance institutions and product finance companies in the developing world to understand their customers and products better.

We build tools to automate machine learning tasks, including time-series feature extraction, anomaly detection, classification, and systems to understand the implications of our results when used by real organizations to make decisions in the field. First and foremost, we care about how machine learning and data science can make measurable impacts in the real world.
LinkedIn Mountain View Professional Networking company. Lots of problems related to machine learning, information retrieval, data mining and optimization.
Lithium San Francisco
LiveCareer San Francisco LiveCareer helps people build careers by offering resume and cover letter building tools. We have one of the largest resume warehouses in the world. We employ natural language processing and machine learning at scale to gain insights into our users and offer new products. For example, we have an automated resume checker. Our machine learning stack is primarily Python (scikit-learn and nltk) and Hadoop (Mahout).
Machine Zone Palo Alto Machine Zone is a leader in massively multiplayer online free-to-play games. We have millions of players actively engaged in top-rated titles on mobile ecosystems like iOS and Android.

We have a data platform which collects and analyzes data streams in realtime. Large scale data analysis, Machine Learning and Data Science applications help us understand player behavior and enable us to enhance the immersive gaming experience for players. We analyze and do realtime NLP on language data that flows through the game.
Microsoft Bing
Open Door San Francisco
Orbital Insight Mountain View Orbital Insight is a Geospatial Big Data company leveraging the rapidly growing availability of satellite, UAV, and other geospatial data sources, to understand and characterize socio-economic trends at global, regional, and hyper-local scales. In one recent project we used convolutional neural networks running on GPUs to identify and count 700 million cars across more than one million parking lots and roads. Other projects have used various technologies to predict corn yield from agricultural images of the US corn belt, count shadow pixels to estimate construction rates for all of China, and analyze floating lids of oil tanks to estimate worldwide oil inventory.
Numenta Redwood City Numenta's mission is to lead a new era of machine intelligence. Our technology, Hierarchical Temporal Memory (HTM), is a detailed computational framework based on principles of the brain along with an extensive suite of software operating on HTM principles. Our HTM learning algorithms are available through the NuPIC open source community and are embedded in our commercial applications.
Netflix Los Gatos
Prismatic San Francisco At the core of Prismatic are a number of ranking and relevance systems that connect people to content that is always engaging. We analyze millions of web pages every day, classify them into tens of thousands of topics, and personalize the experience using a ranking algorithm that takes into account hundreds of features related to the content, the user, and interactions between the two. Our backend is written entirely in Clojure, a JVM-based LISP.
QED Berkeley QED builds customized technology solutions in the areas of machine learning, data visualization, and scientific computing. We have developed and deployed solutions across a wide gamut of application areas, including agriculture, telecommunications, bioinformatics, finance, and computer vision.
Quantcast San Francisco Uses big data and machine learning to provide audience insights and targeted advertising to publishers/marketers.
Quantellia Mountain View Quantellia integrates machine learning into a complex systems decision model to solve high-value problems. Its current offerings include optimizing the decisions made by telecom companies as they build the next generation of the internet: on poles, under the ground, and in your homes and businesses. It also offers decision simulation and visualization middleware for financial institutions, along with solutions in a number of other arenas.
Quantifind Menlo Park
Rally Health San Francisco We are making healthcare-related recommendations using both lifestyle and clinical data. We are partially owned by UnitedHealth and also work with other payers (insurers, employers, and exchanges). We use Scala, Python, Spark, and Hive on a CDH cluster.
Reflektion, Inc. San Mateo Putting customers at the heart of your commerce systems can yield revolutionary results. Reflektion offers a quick and easy way for retailers and brands to achieve better business performance. We do the number crunching, learning and delivering applications from the cloud. Our patented algorithms and technology reduce computational overhead by hundreds of times, enabling modeling of individual users every day. Reflektion detects trends and predicts in real-time what each customer is most likely to buy next, enabling dramatically higher sales and customer engagement. These technologies are packaged in a set of easily deployable solutions for personalized ecommerce, customer analytics, and personalized marketing.

We use supervised and unsupervised machine learning algorithms to understand and predict user behavior and intent, as well as algorithms to predict future trends in businesses. Our Big Data architecture is built to handle large datasets in real time (Hadoop, Kafka, Storm, etc.).
Set Media Inc. San Francisco SET Media is a video technology company that connects brand advertisers with their audiences through high quality, targeted, and brand safe campaigns at scale. We classify and filter online videos using object, facial, and motion recognition algorithms, enabling advertisers to run ads on the most relevant content to their brand.
Sift Science San Francisco We apply online, large-scale machine learning to fight fraud for e-commerce businesses. We provide an API-based, cloud solution that can be used to instrument buy flows and user events to model different behaviors and surface correlations against good and bad behaviors.

Our technology is mostly Java-based and built on Hadoop (HDFS, HBase). We receive thousands of API calls per second and return scores with a 99th-percentile latency in the hundreds of milliseconds.
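For readers unfamiliar with the term, a 99th-percentile ("99p") latency is the response time that 99% of requests meet or beat. The sketch below shows one way to compute such a figure from recorded latencies, using the nearest-rank method. The function name and the sample values are hypothetical and are not taken from Sift Science's systems.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample value such that
    at least pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds; one slow outlier.
latencies_ms = [12, 15, 11, 240, 18, 14, 13, 16, 17, 19]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

In this made-up sample the median is 15 ms while the 99th percentile is 240 ms, which is why tail percentiles rather than averages are usually quoted for API latency.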
Staples SparX San Mateo SparX is a small team executing on the firm belief that the future of eCommerce is driven by technical innovation and operational excellence.

We operate in skunk works mode within Staples, with an exceptional degree of autonomy, tasked to design and build the products that will shape Staples' future.

We do not wait for technical innovation to bake before we start leveraging it. Quite the contrary, we recognize trends early and push the technology forward. We have been leveraging Big Data, Machine Learning, Predictive Modeling, Real-time Distributed Systems, and Clojure since as early as 2010. We draw all this together to build actual products that run in production and directly impact Staples' bottom line.

The great thing is that Staples is the world's 3rd largest eCommerce player, so no problem is trivial and the sky is the limit. If you want a worthy challenge: scale, a firehose of data, the chance to target and impact 50 million users... we got it!
San Francisco Startup.ML is a machine learning accelerator for startups. We take care of the math and coding, so the founders can focus on building great companies. Our world-class data scientists and machine learning engineers actively engage with startups for a year to ensure that machine learning techniques are successfully embedded into their products and a smooth transition is made to an in-house team. We then take on the role of a trusted advisor and provide counsel to founders to set them up for continued success.
Stripe San Francisco
Sumo Logic Redwood City
Symantec Mountain View Security analytics: Internet of Things, mobile, IT
Tagged San Francisco
The Climate Corporation San Francisco
Thomson Reuters San Francisco Thomson Reuters is the world's leading source of intelligent information for businesses and professionals.
We combine industry expertise with innovative technology to deliver critical information to leading decision makers in the financial and risk, legal, tax and accounting, intellectual property and science and media markets, powered by the world's most trusted news organization.
Thumbtack San Francisco
Tower Research Capital San Francisco Tower Research Capital LLC specializes in quantitative trading and investment strategies.

Founded in 1998, Tower develops proprietary trading algorithms by using rigorous statistical methodology to identify non-random patterns in the behavior of markets. Our portfolio managers use these algorithms to earn exceptional returns while mitigating risk.

In the course of developing its current trading strategies, Tower Research Capital LLC has built a powerful set of analytical tools and an automated trade execution infrastructure, which it is leveraging to pursue new trading opportunities. The company is made up of a rare combination of highly proficient individuals with backgrounds in a variety of fields: mathematics, computer science, statistics, physics, economics, engineering, and finance.
Uber Technologies San Francisco
URX San Francisco URX powers a deeplink search engine for mobile developers.

The data science team leverages techniques in ML, information retrieval and large scale data processing to ensure relevant content is delivered via their search engine.
Whistle San Francisco Whistle's first product is the Whistle Activity Monitor which is an accelerometer based health tracker for your dog. It attaches to any collar and measures your dog's activities, giving you a new perspective on day-to-day behavior and long-term trends.

Whistle quantifies your dog's activity level throughout each day, automatically detects key events, and uses machine learning methodologies to classify those events as walking, running, or playing. In the case that the event is misclassified, you can change the classification and that correction will be automatically used to improve the classification of future events for your dog.
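The correction loop described above, where a relabeled event improves future classifications, can be illustrated with a toy nearest-centroid classifier that treats each owner correction as a new labeled example. This is only a sketch of the general idea, not Whistle's actual algorithm. The class name, the two-element feature vectors, and the numeric values are hypothetical.

```python
from collections import defaultdict


class CentroidClassifier:
    """Toy nearest-centroid classifier over fixed-length feature vectors."""

    def __init__(self):
        self._sums = {}                   # label -> per-feature running sums
        self._counts = defaultdict(int)   # label -> number of examples seen

    def add_example(self, features, label):
        """Fold one labeled example (or a user correction) into the model."""
        if label not in self._sums:
            self._sums[label] = [0.0] * len(features)
        self._sums[label] = [s + f for s, f in zip(self._sums[label], features)]
        self._counts[label] += 1

    def predict(self, features):
        """Return the label whose centroid is closest to the features."""
        if not self._sums:
            raise ValueError("no examples have been added yet")

        def squared_distance(label):
            n = self._counts[label]
            return sum((s / n - f) ** 2
                       for s, f in zip(self._sums[label], features))

        return min(self._sums, key=squared_distance)


clf = CentroidClassifier()
clf.add_example([0.2, 0.1], "walking")   # hypothetical activity features
clf.add_example([0.9, 0.8], "running")

event = [0.6, 0.5]
before = clf.predict(event)              # closest centroid is "running"

# The owner relabels the event; the correction becomes training data.
clf.add_example(event, "playing")
after = clf.predict(event)               # now classified as "playing"
```

A production system would use richer features and a stronger model, but the feedback path is the same: the corrected label is stored and used the next time a similar event is classified.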
WibiData San Francisco WibiData provides the real-time machine-learning and analytics capabilities that enterprises need to deliver personalized interactions across channels. Our software empowers the marketing, data science, and engineering departments across organizations to collaborate on a unified personalization strategy.
Berkeley, Inc. sells cloud-based, machine learning applications to solve targeted business problems spanning the entire customer lifecycle. Our team of PhD statisticians & computer scientists innovates cutting-edge machine learning software which builds the foundation of our suite of analytics applications that help our clients optimize how they acquire, monetize, and retain their customers.
Yelp San Francisco
Yummly San Francisco Yummly is a food technology company currently focussed on recipe discovery. We develop machine learning and natural language processing methods to extract recipes from unstructured text, parse ingredients and recipe instructions and map them onto food ontologies, infer recipe attributes (e.g. cuisine), parse user queries, rank search results, recommend recipes, and more generally solve problems that help us understand food data and users' tastes.