Meet a Data Scientist: Dr. Elle O'Brien

Data Circles is excited to present the next entry in our new series, “Meet A Data Scientist!”

“Meet a Data Scientist” is dedicated to recognizing the amazing women powering the Puget Sound area’s data science community, spotlighting their journey into the field, their incredible accomplishments, and the weighty challenges that they faced along the way. This lies at the heart of Data Circles’ mission of inspiring women to enter the data science field by showcasing its many incredible role models.

Do you know any marvelous women in data science? Send us a tip here!

Dr. Elle O’Brien is a lecturer and research investigator at the University of Michigan. As a data scientist with an eclectic background ranging from academia to tech startups, Dr. O’Brien is intimately familiar with the dynamics and priorities of both worlds, as well as the diverse people and perspectives that the field needs and attracts today. Armed with that experience and a deep appreciation for the fundamentals, Dr. O’Brien is in exactly the right place at the right time, forging the next generation of well-rounded data scientists. Here, she shares her story and philosophy behind teaching all things data.



ON TEACHING DATA SCIENCE

Data science as a field is simultaneously as old as statistics and as new as computer science--it really depends on who you ask. Pathways into the field also span a wide range: some arrived directly after a STEM-focused college education, while others transitioned from other roles with the help of MOOCs, bootcamps, or the organic demands of our jobs.

It’s a fact of the field that Dr. Elle O’Brien is mindful of. 

"Since data science is so new, there's not a standard way to get in, and that’s a good thing,” she says.

Dr. O’Brien, recently of Iterative.ai, is about to embark on a new role as a lecturer and research investigator at the University of Michigan’s Data Science school. As a data scientist with a unique and storied origin of her own, she has an appreciation for the kinds of eclectic people that the field attracts. Given the opportunity to shape future data scientists and their education at the University of Michigan, it’s an aspect that she very much wants to preserve and facilitate while, most importantly, still imparting the fundamentals of the discipline.

"My personal belief is that data scientists need to have the boring stuff down; I really love the boring stuff, and I love convincing people that they should care about it,” she says, citing boring-but-integral things like version control, code-writing best practices, and mathematical rigor. “I know not everyone is going to turn into a mathematician or an engineer, but that's not the goal.”

"Part of the challenge is that you have people coming from very different backgrounds. Some will be engineers, some will be scientists, and others will be coming from something completely different,” she explains. “I think that's a strength of data science, but the lack of a standardized background also makes it harder to have a single explanation for concepts that will suffice for all parties.”

This insight has inevitably shaped her approach to teaching data science.

“When describing something like DVC pipelines,” she explains, using a product example from her time at Iterative, “I’ll maybe introduce it for math people as, 'okay, this is a directed acyclic graph'. For software engineers, I'll say, 'okay, this is a Makefile.' For people unfamiliar with either field, I’ll explain it as ‘a recipe with steps that you want to follow every time.' You're always trying to find the right reference because, 'I know you have this kind of background, so this will mean something to you'.”
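For readers who think in code, the three explanations can be seen side by side in a few lines of Python. This is only an illustrative sketch, not DVC's actual API or file format: the stage names and the `run_order` helper are hypothetical, and the ordering comes from `graphlib`, a module in Python's standard library (3.9 and later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each stage maps to the set of stages it depends on.
# Math view: a directed acyclic graph. Engineering view: Makefile targets
# and their prerequisites. Everyday view: a recipe whose steps must be
# followed in a sensible order every time.
pipeline = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
    "evaluate": {"train", "prepare"},
}


def run_order(stages):
    """Return the stages in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # which is the "acyclic" part of "directed acyclic graph".
    return list(TopologicalSorter(stages).static_order())


order = run_order(pipeline)
print(order)
```

Whichever analogy lands, the idea is the same: no step runs before the steps it depends on.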

The goal is to reach as many people as she can, but she concedes that success isn’t guaranteed, and that this isn’t as much of a problem as it sounds. She cites as an influence the late Bob Ross, the ‘80s public television painter who, thanks to the internet, continues to inspire viewers with a gentle and quiet confidence that they too can pick up the paintbrush and become artists.

“Being willing to say, 'it's okay if we don't all get this right now. I'm going to explain this, and if you don't understand what that means now, that's fine,'” she elaborates, mirroring Ross’ patience and persistence, “because the field is simply too big to expect everybody to know everything, and that's a good thing--that we can all specialize, that there's going to be many different niches.”


LEARNING TO LOVE MATH AND SCIENCE

Dr. O’Brien’s approach is no doubt informed by her own journey in discovering the joys of science and code, which appears less intentional than a series of happy accidents. 

In high school, she enjoyed distinctly non-STEM subjects like art and writing, and briefly entertained the notion of enrolling in culinary school. During her undergraduate education at Agnes Scott College in Georgia, however, she developed an affinity for mathematics, finding comfort in the stability and constancy of calculus, where there was always “a right answer.”

In searching for a part-time job that was anything but scooping ice cream--a summer job she held once and vowed never again to do--she determined that work at one of the research labs in her area would provide a better means of supporting herself as a student in Atlanta. She eventually ended up at the computational neuroscience lab. It was well-paying work, but what she hadn’t counted on was that the work would also prove interesting and complementary to her newfound appreciation for math.

"I was never drawn to how my science classes were in high school, where it was about memorizing facts, like, 'look at how much I know',” she recalls.  “Research was nothing like that. It's all about understanding what we don't know and asking 'can we measure this? What can't we measure?' I'm much more interested in those kinds of questions. I love getting to be a skeptic."

It was also at the computational neuroscience lab that Dr. O’Brien was first exposed to code, where she learned--perhaps unluckily--in trial by fire.

“I just kinda showed up and everyone in the lab knew how to code. There was an expectation that you would just kind of teach yourself. I don't think that worked terribly well for most of us. I think we learned out of fear of public humiliation, but I did learn because you just had to," she says.

Despite these early growing pains, the experience served her well, ostensibly setting her up for graduate life in the sciences. After Agnes Scott College, she briefly entertained entering rabbinical school, but ultimately enrolled at the University of Washington’s neuroscience school for a master of science, which in turn was followed by a PhD in speech and hearing science, also at UW.

With the prospects of culinary school and rabbinical school in the rearview mirror, it seemed that Dr. O’Brien was destined to settle into a life of academic pursuit. It wouldn’t be so simple, however, as her familiarity with the academic science world eventually led her to question its culture and structure, resulting in a disillusionment that has plagued many a career academic before her.

Dr. O’Brien thus began setting her sights on horizons beyond the walls of academia.


STARTUP LIFE

Dr. O’Brien credits academics with typically having strong mathematical backgrounds and a deep understanding of their field of research--attributes in which those in industry often aren’t as strong. On the other hand, academics tend to falter in terms of engineering process and efficiency when compared to industry. It’s a shortcoming that is not lost on academia, she concedes, but one that results from the unique structure and operation of research labs.

"I think it's very hard given how labs are funded per project, which are carried out by very temporary employees--graduate students and post-docs--who are there for only a few years and often less. There's very little in the way of permanent staff that can create those conventions,” she explains.

For example, labs often host technically talented students with the ability to build technical solutions and infrastructure to the level of sophistication and efficiency found in industry, but their brief tenures essentially ensure an inability to rally support and resources for building technologies and the processes to maintain them--never mind training everyone in the lab in their use and upkeep, if implemented.

“It doesn't work without a top-down incentive and funds, and a lot of labs just don't have either; it's not really their fault,” she concludes.

Given these obstacles, Dr. O’Brien sought experience in industry. Though she hadn’t had opportunities to employ deep learning models in her research, she played around with them in her free time, eventually creating a model that would output nonsensically comedic romance novel titles. Serendipitously, the model caught the attention of the AVClub blog, which in turn caught the attention of Botnik, an Amazon Techstars accelerator startup that had similarly been exploring the comedic applications of AI.

Dr. O’Brien’s romance novel AI gaining mainstream coverage and winning the Internet’s heart

As if the comedic genius of her romance novel side project weren’t enough, Botnik itself was located mere blocks from Dr. O’Brien. Given her interest in gaining industry experience, the offer from Botnik’s CEO to collaborate seemed meant to be. Before long, she came aboard as their Chief Scientist, building real-world experience in creating deep learning and language models, literally for laughs.

“Our CEO had a bunch of comedy writer friends, and so we would come together in a writer's room and we'd play with a predictive text app that we made,” she says of her work there. “You can kinda ‘season it’ with a book or another source of text to get suggestions in that style. We wrote a chapter of Harry Potter and that did well on the internet. We just did a lot of cultural, fun writing projects with it."

After graduating with her PhD, Dr. O’Brien was faced with the decision of joining a large and time-tested institution in IBM, or continuing to work in startups. Valuing the impact and influence she could have as an individual contributor, she doubled down on startup life, this time heading to Iterative.ai, where she served as a full-time data scientist.

“I liked that it was open source software, because I had some hope that it might be useful to scientists," she adds, noting the then-distant prospect of a return to academia.

While at Iterative, she contributed to their goal of building out the data science toolkit, and in particular, assisted in building technologies making Git and related version control applications easier and more accessible to the DS community.

"We have a philosophy that tools a lot of software engineers use and have well-defined processes around, like Git, don't work super well with data science because of data science-specific necessities like big data sets, big models,” says Dr. O’Brien of Iterative’s vision. “So we made things like DVC, which helps you extend Git version control. Our other project, CML, which I helped work on a bit, is all about adapting GitHub Actions. What we do is we take a tool that's already there, and we try to extend it towards data science use."

 
 

Building tools to better integrate into the data science workflow was only part of her mission, however; Dr. O’Brien also invested a lot of time in educating people about these “boring-but-integral” tools and advocating for their use. To aid in that effort, Dr. O’Brien did what many savvy technical people do these days: she made a series of YouTube videos to explain everything Iterative was creating for the DS community.

"I was very pleasantly surprised by how many people were interested in learning about this stuff because it can be kind of dry. I was often saying, ‘Git is fun!’ You need it, but it's not that fun, and it's not that easy, either,” she admits, betraying both modesty and amazement over her audience’s enthusiastic reception. “The number of people that wanted to learn was really cool. That gave me hope that you can make people care about ‘bread and butter’ topics like that."

It was an important revelation, considering where she would end up next.

BEING THE BRIDGE

Having just completed her stint at Iterative, Dr. O’Brien is returning to academia, not so much out of a burning desire to return to something familiar as because the opportunity finds her in the right place at the right time: a data scientist with extensive, wide-ranging experience and a unique set of skills.

Throughout her journey, she’s trained as a mathematician and scientist, organically finding her way to code and the interdisciplinary field of data science. It’s a path that has given her the time and space to develop a love for the field’s fundamentals, while also witnessing its applications and limitations. Coming into her lecturer role at the University of Michigan at this point in her career, it’s that passion for data science and its foundations that makes her an ideal communicator and teacher on its behalf; no small thing given the field’s ongoing need for effective and enthusiastic communicators.

"I know not everything in the world is literacy,” she concedes. “I don't think if everyone understood data, we wouldn’t still see a lot of big issues with discrimination and inequality, but I do think that there's a lot of illiteracy about what data means or implies. People tend to want science to back up a lot of their decisions and often, it doesn't. I feel like it's an interesting place to be negotiating some of those conversations, about 'how are we going to talk about data?'"

Dr. O’Brien touches on how these tendencies have led to the misuse of statistics and related scientific research in the news, policy, and even everyday conversation both on- and offline. Data science too, it seems, is not immune to this phenomenon, and it behooves us as data scientists to improve here and better communicate our understanding.

"I now feel hyper-aware of just how much misinformation we're being bombarded with every day,” she says. “There’s a statistical literacy of what a single study can tell you, and it's so frequently misrepresented.” 

By way of example, she notes the clickbaity articles extolling the latest fad diet or life-extending supplement that seem to populate every major online news outlet or aggregator, and how their strident pop science enthusiasm veers into a kind of subtle lifestyle coercion, guilting readers into adopting unreasonable behaviors or views for fear of not caring enough about their own health.

“The big determinants of health outcomes in this country are whether you have access to healthcare, race, and socioeconomic status; not how many probiotics you consume," she says, acknowledging that many of these sensationalist headlines and much of this shoddy reportage are driven by the perverse incentives of attention-maximizing algorithms--themselves very particular products of data science.

"I think there's a sense that to tell a story that will catch people's attention, you have to say something that's really outrageous or overstate how powerful data science is, and I don't think that's true. I think you can be a good storyteller while also respecting the complexity of the things you work on,” says Dr. O’Brien, acknowledging the complexity of life and the avenues for discovery it opens. “You can tell more interesting stories if you're willing to live with that ambiguity. Like, 'I don't know if I can design a system that can do this thing, but let's talk about why.' That's often more interesting than overpromising or talking about how 'the great, big future when bots know everything about what song you're going to like.' It's more interesting talking about why that's hard to do."

Returning to first principles, learning to love and be true to the fundamentals is ultimately what gives the practice rigor and makes for good data science work, she argues. These cardinal values are the lessons she looks forward to imparting to the next generation of data scientists. Having been in both academia and industry, and bringing with her perspectives and experiences from inside and out of science and math, Dr. O’Brien is the perfect bridge between worlds.

Given her love of teaching and promoting the discipline and an appreciation for her own unique journey, there’s no one better placed to be that bridge, helping new generations of data scientists from diverse backgrounds enter the field and equipping them with the tools and philosophies that will allow their work to be robust and impactful.

"A lot of really important issues come down to cultural and societal issues,” she says. “I just feel like giving people some of the language to see what science is, and what you can really get out of data, has just become more important to me.”

Tony Loiseleur