Top 10 Projects in Data Science

Machine Learning Bootcamp

Data Science | North America

Digital transformation is do-or-die in today’s tech-fueled business world, but rather than just investing in the latest whiz-bang tech, Levi Strauss & Co. is also betting big on its people. In 2021, the fashion giant partnered with artificial intelligence consultancy Launchpad.ai to create an eight-week Machine Learning Bootcamp, considered the first of its kind in the industry. The program is open to employees across the enterprise—from those working the sales floor to those designing next season’s styles. Participants cover the foundations of data science, coding, machine learning and agile, and dive into use cases specific to the apparel industry, such as predicting manufacturing defects or optimizing the inventory mix at a store. “Every organization today is a digital, data and AI one—whether it realizes it or not,” says senior VP and chief strategy and AI officer Katia Walsh.

6th Most Influential Project of 2022

Mental Health Calls Data Analysis

Data Science

In the earliest days of the pandemic, as nations around the globe instituted mandatory lockdowns, many feared that enforced quarantines would seriously harm mental health by forcing domestic violence victims to stay home with their abusers, driving the vulnerable to thoughts of suicide and exacerbating economic insecurity. To see whether those fears were borne out, researchers at Switzerland's University of Lausanne launched a project to analyze 8 million helpline calls made from 19 countries during the pandemic. In November 2021, they revealed their findings: Six weeks into the outbreak, there were 35 percent more calls than before the pandemic, but most concerned COVID-19 and loneliness. Calls about suicidal ideation, however, did rise when subsequent lockdowns were mandated, and they fell when financial support, such as economic impact payments, was offered or extended. Such data-driven insights offer more than a snapshot of pandemic behavior: They could also influence public policy and shape how governments respond to future crises.

Earth Mineral Catalog

Data Science

There are more than 10,000 varieties of minerals, from the deep night-sky blue of lapis lazuli to the honeyed hue of citrine. Yet for centuries, mineralogists have cataloged them solely by species, not by how or when they formed. It's a limiting approach, considering that how a mineral came into existence can reveal an important story about what was happening on Earth at the time. So mineralogists Robert Hazen and Shaunna Morrison of the Carnegie Institution created a first-of-its-kind compendium of mineral varieties. By testing minerals for trace elements from materials that might have surrounded each specimen as it formed, or by measuring their radiation levels, the two built a catalog that tracks not only what the minerals are but also how and when they came to be. The research took more than 15 years to complete and nearly doubles the inventory of known mineral varieties.

Firstleaf

Data Science

Wine subscription company Firstleaf made its name by using an algorithm to help customers zero in on the vino they never knew they were missing: their fermented-grape soulmate, if you will. Then in late 2021, the company unveiled a data-driven update designed to guide customers not toward a handful of potential outcomes but along any one of 410,000 distinct paths to their next favorite tipple. The platform analyzes more than 2,000 attributes for each wine Firstleaf offers, quantifying everything from minerality to mouthfeel. But that's just the first round of data analysis. Once customers review the wines they've tried, machine learning uses those reviews to further refine each customer's individual profile. The end result? A system that brings customers ever closer to the wine of their dreams, one data point at a time. Cheers to that.
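
Firstleaf has not published how its model works, so what follows is only a minimal sketch of the general two-stage idea described above: wines are scored on fixed attributes, and each customer review nudges that customer's profile. It assumes a simple linear taste profile, a 1-to-5 rating scale and three hypothetical attributes (`minerality`, `mouthfeel`, `acidity`) standing in for the 2,000-plus the company tracks; the catalog, wine names, class and function names, and learning rate are all illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical attribute names; a real profile would span far more dimensions.
ATTRIBUTES = ["minerality", "mouthfeel", "acidity"]


@dataclass
class TasteProfile:
    """A hypothetical linear taste profile: one learned weight per wine attribute."""

    weights: list[float] = field(default_factory=lambda: [0.0] * len(ATTRIBUTES))

    def update(self, wine: list[float], rating: int, learning_rate: float = 0.5) -> None:
        """Nudge weights toward wines rated above 3 and away from wines rated below 3."""
        signal = (rating - 3) / 2  # maps ratings 1..5 onto -1.0..+1.0
        self.weights = [w + learning_rate * signal * x for w, x in zip(self.weights, wine)]

    def score(self, wine: list[float]) -> float:
        """Predicted affinity: dot product of profile weights and wine attributes."""
        return sum(w * x for w, x in zip(self.weights, wine))


def recommend(profile: TasteProfile, catalog: dict[str, list[float]], tried: set[str]) -> str:
    """Return the highest-scoring wine the customer has not tried yet."""
    untried = {name: attrs for name, attrs in catalog.items() if name not in tried}
    return max(untried, key=lambda name: profile.score(untried[name]))


# Hypothetical catalog: attribute scores on a 0..1 scale, in ATTRIBUTES order.
catalog = {
    "crisp_white": [0.9, 0.2, 0.8],
    "light_red": [0.3, 0.4, 0.6],
    "bold_red": [0.1, 0.9, 0.3],
}

profile = TasteProfile()
profile.update(catalog["crisp_white"], rating=5)  # customer loved the crisp white
print(recommend(profile, catalog, tried={"crisp_white"}))
```

The linear update is chosen for readability; it is not meant to reflect Firstleaf's actual model, which the article describes only as machine learning.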

Data Science for Health Discovery and Innovation in Africa

Data Science

Rapid advances in data science have the potential to transform health outcomes across Africa, which currently shoulders an outsize share of the global burden of disease. To accelerate impact, the University of Cape Town (UCT) in South Africa and the National Institutes of Health in the United States have joined forces on a five-year, US$75 million project to establish a data science research and training network across the continent. The Harnessing Data Science for Health Discovery and Innovation in Africa program focuses on everything from using data to boost pandemic preparedness in Nigeria to identifying women at risk for poor pregnancy outcomes in Kenya to improving medical diagnostic accuracy in Uganda. UCT will also oversee an open data science platform and coordinating center, in partnership with the Human Heredity and Health in Africa consortium, aimed at facilitating team research and collaboration. A second phase is already in the works, with the goal of expanding the project’s reach.

Delta Sharing

Data Science

AI innovation can't happen without data, but anyone tasked with managing a data deluge knows how tough governance can be. And as companies increasingly ramp up their use of third-party and vendor data, that task becomes even more complex. Enter Databricks, a San Francisco firm whose data management platform has drawn hundreds of high-profile clients. The platform stores raw data in repositories known as "data lakes" and allows analytical queries to be run against that information even before it has been formally organized into a database. In May 2021, Databricks added an open-source data collaboration feature called Delta Sharing to its increasingly ubiquitous platform, letting organizations share data with one another in real time without sacrificing security or running afoul of compliance strictures. The game-changing innovation quickly won rave reviews from such tech pillars as AWS, Microsoft and Google, and in June the company extended its utility even further with a marketplace that allows anyone using a Delta Sharing-compatible client to buy, sell and share data, along with machine-learning models and dashboards.

Speaking in Color

Data Science

Imagine an architect searching for just the right color, the one that conjures up a bustling market in Morocco or a long-ago picnic on a beach in Japan. Arriving at that perfect hue could mean a long, circuitous search. To make the process more intuitive, U.S. paint and coatings maker Sherwin-Williams teamed up with creative agency Wunderman Thompson to create a voice-activated AI app called Speaking in Color. The program, which launched in June, takes the user's prompt and sorts through the many thousands of possible hues in its image database, then voilà: a whole palette of tailored choices. Fine-tuning happens by voice as well: "Make it pop," say, or "Add in a touch of gray." The tool is aimed at the B2B crowd for now, but the result could be one of the largest color data sets in the world, opening up a wealth of insights into global, cultural and geographical influences and preferences.

Bloom

Data Science

Large language models (LLMs), machine-learning algorithms trained on titanic amounts of data to recognize and generate human language, are getting bigger, smarter and more ubiquitous all the time. But developing and training one of these deep neural networks is costly, so such initiatives have largely been left to tech giants like Meta, Alphabet and Microsoft. That changed with Bloom, a project to create the world's largest multilingual LLM trained in complete transparency by a global volunteer team of more than 1,000 researchers from more than 250 institutions. The yearlong, €3 million effort centered on transparency and inclusivity from the start: The team developed LLM-specific data governance structures that make clear where training data originates, sourced data sets from around the world to minimize bias and actively recruited researchers from diverse backgrounds. With funding from U.S. AI startup Hugging Face and French research agencies Genci and IDRIS, the initiative culminated in 117 days of training on a supercomputer in Paris before Bloom was released to the world in July.

Justice Navigator

Data Science

Few topics in the United States are as divisive as police use of force and its relationship to race. Complicating the debate has been a scarcity of data, at least until the Center for Policing Equity launched its Justice Navigator interactive app. Released in September 2021, it gives users access to information on exactly how participating police departments operate. Through extensive data analysis, the app breaks down where, why and on whom police officers use force. For example, one analysis states plainly: "Taking into account the influence of neighborhood crime rates, poverty and share of Black residents, Black people were subjected to force 4.4 times as often as white people." The app is a case study in how data science can help society navigate fraught topics and keep the focus on solutions.

A map of Paris, color-coded by the noise intensity of each area

Noisy City

Data Science

A co-founder of data science firm Jetpack.AI, Brussels data scientist Karim Douïeb, PhD, is among the lucky few in his profession who've seen their work go viral. In 2019, his myth-busting map depicting U.S. political affiliations by region exploded across the internet and introduced the world to Douïeb's innovative, instantly graspable data visualizations. In June, he turned his attention to noise pollution, which negatively affects 60 million people living in Europe, per a 2022 study by the Barcelona Institute for Global Health. With support from climate action nonprofit Possible, Douïeb launched Noisy City, an audible data visualization, or "data sonification experiment," as he calls it. As users move a computer mouse around the neighborhoods of cities such as London and New York City, traffic sounds increase or decrease in intensity, adding a visceral audio layer to an already impactful design.
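
The article does not say how Noisy City turns noise measurements into sound, so the following is only a minimal sketch of one plausible approach to data sonification. It assumes a hypothetical grid of decibel readings (`NOISE_DB`) covering a map, looks up the cell under the cursor, and converts that reading into a volume multiplier for a traffic-sound loop. The grid values, the function names and the 80 dB ceiling are illustrative, not taken from Douïeb's project.

```python
# Hypothetical noise map: average decibel readings for a 3x3 grid of city cells.
NOISE_DB = [
    [45.0, 55.0, 70.0],
    [50.0, 65.0, 75.0],
    [40.0, 60.0, 80.0],
]


def noise_at(x: float, y: float, width: float, height: float, grid: list[list[float]]) -> float:
    """Look up the decibel reading for the grid cell under the cursor at pixel (x, y)."""
    rows, cols = len(grid), len(grid[0])
    col = min(int(x / width * cols), cols - 1)   # clamp so the right edge stays in range
    row = min(int(y / height * rows), rows - 1)  # clamp so the bottom edge stays in range
    return grid[row][col]


def playback_gain(db: float, loudest_db: float = 80.0) -> float:
    """Convert a decibel reading into a 0..1 amplitude multiplier.

    Every 20 dB below loudest_db divides the amplitude by 10, following the
    logarithmic decibel scale rather than mapping decibels linearly to volume.
    """
    return min(1.0, 10 ** ((db - loudest_db) / 20))


# Simulate a cursor sweeping diagonally across a 300x300-pixel map.
for x, y in [(10, 10), (150, 150), (290, 290)]:
    db = noise_at(x, y, width=300, height=300, grid=NOISE_DB)
    print(f"cursor=({x}, {y}) noise={db} dB gain={playback_gain(db):.2f}")
```

Converting decibels with 10^(ΔdB/20) instead of a straight linear ramp means a 20 dB gap between two neighborhoods produces a tenfold change in playback amplitude, which keeps the quiet-versus-loud contrast tied to the underlying measurements.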
