Episode 8

The Language of Digital Biology

with Daphne Koller, Ph.D.

December 14, 2021

Daphne Koller, Ph.D.
Founder and CEO, insitro
Daphne Koller is CEO and Founder of insitro, a machine-learning enabled drug discovery company. Daphne is also co-founder of Engageli, was the Rajeev Motwani Professor of Computer Science at Stanford University, where she served on the faculty for 18 years, the co-CEO and President of Coursera, and the Chief Computing Officer of Calico, an Alphabet company in the healthcare space. She is the author of over 200 refereed publications appearing in venues such as Science, Cell, and Nature Genetics. Daphne was recognized as one of TIME Magazine’s 100 most influential people in 2012. She received the MacArthur Foundation Fellowship in 2004 and the ACM Prize in Computing in 2008. She was inducted into the National Academy of Engineering in 2011 and elected a fellow of the American Association for Artificial Intelligence in 2004, the American Academy of Arts and Sciences in 2014, and the International Society of Computational Biology in 2017.

Science is a team sport. We need to create an incentive structure where there is a reward for building infrastructure, even if it's not the most glamorous piece of the puzzle, and to do that, you need to release your own ego.

Transcript

Suchi Saria, Ph.D. 0:28
Hi, I’m Suchi Saria. I’m the founder and CEO of Bayesian Health. I also hold an endowed chair at Johns Hopkins, where I’m also the director of the Machine Learning and Healthcare Lab. And today, I have this great honor and pleasure to invite Daphne Koller to the meeting. I am a Day Zero advisory council member, through which we get to invite inspiring founders who’ve done exceptional work. Daphne’s had an amazing career, first as a professor, where she did foundational work in machine learning and AI, then as the founder of Coursera, an educational platform that has made it possible for people to educate themselves online at scale. And then third, a company called insitro, which we’ll learn more about, that she’s leading currently as the CEO. So Daphne, welcome. I think it’s rare to see people who’ve had three phenomenal careers. Can you give our listeners a brief tour of your journey getting here?

Daphne Koller, Ph.D. 1:23
Thank you, Suchi. It’s a real pleasure to be here and I’m happy to talk through some of those chapters and what inspired the transitions between them. I started my career as a computer science graduate student at Stanford and was one of the early people into the field of machine learning back in the early 90s, as it was really becoming a field, with the very first machine learning conferences. I came back to Stanford as the first machine learning faculty hire in 1995, where I had the great fortune to be able to build an incredible group of students, of whom you are one. So it’s a real, wonderful pleasure for me to see how far some of my former trainees have come. I became interested in machine learning applied to biology in the late 90s, as the first large datasets in biomedical areas were coming into play. Initially, it was just because I found them more interesting technically than some of the datasets that were available to machine learning at the time. Most of them had to do with, like, labeling spam and non-spam emails and it wasn’t all that exciting. But over time, I became interested in biology in its own right and ended up actually having a bifurcated existence at Stanford as a faculty member where half my lab worked on core machine learning problems published in computer science venues and the other half worked on biomedical datasets published in journals such as Nature and Cell. And it was interesting because many of my computer science friends didn’t even realize I did biology and many of my biology friends didn’t realize I was in a computer science department. So it was kind of a weird existence. And that lasted for about a decade. But over time, I began to have an increasing sense of urgency to make an impact in the world in a more immediate way. 
And that was partly inspired by the fact that some of the papers that I had written that were among the most impactful, including one with you Suchi where it was obvious to me that there was a real world applicability to the technology and the ideas, it was really hard for me within academia to get someone to actually adopt and use this in real clinical care for that particular paper. And I wanted to be able to make that impact. And so it was with that increasing sense of frustration that, when some of the work that I’d been doing at Stanford that had nothing to do with my research, it had to do with making education better with technology that emerged from some of the work I’ve been doing on flipped classroom efforts at Stanford, really created that MOOC movement, the movement of the massive open online courses, that sort of blew into the world, if you will, in late 2011 with these courses from Stanford that each had an enrollment of about 100,000 people or more. And that was kind of like, oh my God, this is just, I can’t believe this is happening. It wasn’t only the number of people that was striking, but also the fact that they were from every age group, every country, in every walk of life, and we were reaching people that really had no other opportunity for education.

Suchi Saria, Ph.D. 4:25
I remember the early days, as you were going through the transition at Stanford, and how much it sort of completely ruffled professors and administration, not only at Stanford, but universities across the country, where they were thinking, what is the future of education now? And you know, is it all going to be all based on MOOCs and, in particular, how much excitement there was from the student body, for machine learning AI, and we had literally, like, students, like, people just signing up, rooms exploding with people wanting to hop in. So I definitely remember those early days and how rapid the movement was too.

Daphne Koller, Ph.D. 5:00
It was incredible. So when I made the decision to leave Stanford on what was supposed to be, at the time, a two-year leave of absence to go found Coursera, I was preparing my TED Talk, and every three days I would have to update the number of learners on the slide, because the most significant digit had changed in terms of the number of students who had enrolled in the platform. And I have to tell you that it grew way faster, partly at this point because of the pandemic, than I would ever have anticipated. I mean, we were so excited when we hit a million learners. And now there’s 92 million learners on Coursera from all over the world. It’s just like, the number is mind blowing, to me, and the impact that we’ve been able to have on the lives of students everywhere. To me, that was actually one of the things that I tried to measure. As you know, I’m a data person, Suchi. So it wasn’t just like, okay, there’s a lot of learners on the platform, great, but are they actually benefiting? And so one of the things that we did relatively early was we put in place some metrics for what impact taking these courses has on your life. And we were just so gratified to see that, you know, of people who came onto the platform, which is well over 50%, who wanted to make a change in their career due to the access to education, something like in the high 80s were benefiting and more than a third were benefiting in tangible, material ways, as in, I was unemployed, and now I am employed, or I got access. I was a barista before. Now I’m a data scientist. And the fact that we were able to open these doors to opportunity to so many people is just, to me, incredibly gratifying. And just to add one note, I saw that just, I think earlier this week, where some of the numbers from the recent Coursera classes show that the people who had found employment through the platform during the pandemic were disproportionately from minority backgrounds. 
And I found that to be an incredibly inspiring thing that we had been able to do that.

Suchi Saria, Ph.D. 6:57
That’s really lovely. And Coursera is now public. Is that right?

Daphne Koller, Ph.D. 7:01
Yeah, we went public earlier this year, in the spring, on the New York Stock Exchange.

Suchi Saria, Ph.D. 7:05
Perfect. And so, I think some people know you as the co-founder, and co-CEO of Coursera. And they don’t realize that you had a whole phenomenal career being one of the most foundational, exciting researchers in machine learning AI prior to that. And one would hope, okay great, so now Coursera has gone public, but you are already on to the next thing. So tell us a little bit about that.

Daphne Koller, Ph.D. 7:27
So the transition there began around 2016, when I’d been involved in Coursera for about five years, and I saw that the company was on a good trajectory, doing well. But it was primarily a content centric company, whereas I’m a scientist by training and there wasn’t really a lot of call for the skills that I had, certainly not on the biomedical side, but I would say not even on the machine learning side within what Coursera was trying to build, at least at the time. And I really wanted to go back and make a difference. And specifically, I was looking at the world around us and, you know, the machine learning revolution really began in 2012, as I was leaving Stanford to go to Coursera. And so I picked my head up above the trenches and I said, goodness, machine learning is transforming the world in pretty much every area, but not very much in the life sciences. And to me, one of the reasons why that is so difficult is because there just aren’t that many people who are bilingual, who really understand enough about the biomedical side of things, and at the same time, understand enough about the machine learning and can really bring them together in creative ways. And I felt like I’d been fortunate to have had that dual experience and I wanted to really make that happen. And so there was a little bit of a journey between the time that I left Coursera and 2016. But really what we’re trying to do at insitro, which was founded in mid 2018, is to bring those two worlds together in a unique and important way, and create a culture in which life scientists and computational scientists work together, not at the point where they’re solving things, which is already unusual, but actually at the point in which they’re coming up together with the problems that need to be solved. 
Because what we find is that, when you put people together where one group really understands the problems and the capabilities of what the data can provide, and the other has this ability to understand what kinds of insights one can extract from data in novel ways, you actually come up with a completely different set of questions that you can tackle that are totally unobvious if you were working separately.

Suchi Saria, Ph.D. 9:33
I think we’re seeing this in healthcare too. I know back in 2007, if you remember when we started that work with electronic health records, right, and looking at data from premature babies, I remember at the time when we used to go give talks people were like, what are you talking about? And if you remember, our colleagues in computer science and machine learning AI were like, medicine mumbo jumbo, that seems so hard. And our colleagues over in medicine, if you remember, even in 2009, when we were writing those papers, how hard it was because people had no idea what we were talking about. But fast forward. Now, this is all the rage. Now one of the mistakes I’ve seen people make over and over is they think practitioners, in this case say healthcare practitioners or physicians, will tell us what to work on or what to implement. And then the data scientists or the engineers will figure out how to implement it. And over and over again, I’m seeing this mistake being made by large companies that are coming to the fore or consulting groups where people are going in and saying, you tell us what problems to solve, we’re mechanics and we can solve them. And fast forward six, seven years, and we just haven’t seen as many results out of those kinds of efforts. And I know you understand this really well. So I totally resonate with this point you’re making. Tell us a little bit about, I think one thing is, there’s been a number of companies in this AI for drug discovery space. And it’s gone from what people thought was really hype-y, and like, is it real, is there anything there, to now, there are multiple very large investments in the space. There are very credible founders, like yourself, leading large companies in the space. Tell us just a little bit of history of the space. Where is the real benefit to be had? What are we doing differently with AI in the mix? And then what is insitro doing and how is insitro different from some of the other companies in the space?

Daphne Koller, Ph.D. 11:21
Yeah, so there’s a lot wrapped up in that question. I would say that there are a few companies that really moved early into this space. And some of them, I think, did a better job than others in creating a durable competitive advantage. I think there’s a lot of companies out there that fall into the trap that you alluded to, Suchi, which is, give me a problem, and I’ll solve it for you. And I don’t think that’s a recipe for success. I think the ones that really are successful are the ones that brought together these two groups of people into something that has sort of the ability to identify and simultaneously solve some really hard problems. The other place where I think we, together with a handful of other companies, are relatively differentiated is that, from the many years that I’ve been doing machine learning, I’ve come to the realization that, garbage in garbage out. And you and I know, Suchi, from our early work on electronic health records, just how much of the data that’s out there, both on the patient side and on the biology side. Some of it is just really just bad. And some of it is not bad if applied to the purpose for which it was created, but if you use it to train machine learning models, it’s just really not fit for purpose. And so one of the things that I think characterizes insitro, along with a handful of other companies that I hold in high regard, is that we pay a lot of attention to where we collect our data and, in fact, where we create our data. And so, insitro, which, if you look at the name is the combination of in silico and in vitro, we’re over half wet scientists who basically are building a data factory that creates very large amounts of both biological and, most recently, chemical data that is used not for scientific investigation, but really for feeding into machine learning models that can then make much better predictions about what is the likely effect of a target if you intervene at it in a human population? 
What are the characteristics of molecules if you put them into a human? All of these are predictive models and they require a different type of data with a different experimental design than what you get from data that’s used for a scientific investigation. So companies that are making that investment, I think, are pretty differentiated in this space.

Suchi Saria, Ph.D. 13:36
One thing you brought up was this idea of fit-for-purpose methods. I’m noticing a lot of, like, in the last decade when, you know, people got excited about AI, suddenly, you know, their understanding of AI was a small number of technologies that are off the shelf that you can download and start running. And the challenge was not understanding what data sets or what problems are these technologies good for. And as a result, one of the failures that I see happened because they’re sort of doing lip service, I have a tool and I have this thing, and I know how to hit it. I have no idea if this is the right tool to hit it with. But everybody else tells me and the press is telling me this is a thing. And I think people don’t really realize when they talk machine learning and AI, it’s a land of 100,000 tools. It’s not a land of one tool. And so when they’re applying the same one tool to every problem, they’re running into a lot of issues. Like with electronic health records, for instance, you know, they are really messy. There are all sorts of ways to learn and I know you and I’ve written some papers early in the space in this field, that those ways to learn, like memorize what’s in the data, that’s garbage. And if you don’t actually think about applying right tools, but you apply some magical powerful tool that you think is ultra powerful, like deep GANs or whatever, what you end up seeing is basically it’s just learning, like, watermarks and things that have no actual predictive capacity or no causal underpinning, and you’re basically getting garbage tools. And then unfortunately, it’s leading to this idea of, like, my anxiety about, like, unfortunately, what might be yet another winter because people are just glibly writing things off because they’re thinking, all AI is equal, not realizing not all AI is equal. You know, there’s well done, and there’s poorly done. 
So can you tell me a little bit about, like, what do you think, as you’ve sort of navigated this field for the last 20 or so years, or more actually, how do you help people who are on the receiving end of it, who are trying to figure out, you know, what’s good from bad? They want to make progress, but how do they cut through the hype? How do you tell good from bad?

Daphne Koller, Ph.D. 15:50
Yeah, so first of all, I have a deep antipathy to hype in general, I think hype is problematic, even when it’s somewhat justified, because I think it’s better to demonstrate the capability and then talk about it, and then the glide of the successes, rather than make promises that may or may not be achievable. And I just try very hard to present a balanced perspective that says, look, there’s a lot of promise in these tools, these tools are incredibly powerful. But the problem that we’re tackling in biology and healthcare is also really, really hard. It’s one of the hardest problems out there. Partly, it’s because of the availability of the right kind of data and partly because human biology is one of the most complex systems that we have to deal with. And so I think it’s important to say, look, this is important, it’s a worthwhile goal, we have a new set of tools, it’s good to try and deploy them. But let’s not count success until we’ve actually achieved success. And when people ask me, for example, when do you know that insitro will have achieved success, my response is, when we put a drug in a patient and that patient gets better. That is my definition of success, when I’m able to look someone in the eye who’s better because of something that I’ve helped create. That, to me, is the definition of success. So as to how you judge in the interim because, for example, we all know that biology and medicine are a slow process and you don’t have that proof point, oftentimes until years down the line, I think you have to make a bet on the team that is working towards the goal. And do they have the right skill set on the machine learning side? And do they have the right skill set on the biology and medicine side? Have they created an integrated team where you have real experts on the different capabilities that you’re looking to bring together? And are they capable of working together as a single team? 
That to me is what’s going to be the main differentiator between companies that make it and the ones that do not.

Suchi Saria, Ph.D. 17:56
Let’s imagine, fast forward, tell me in insitro terms, as insitro succeeds, what are we going to see practically in the field? For our listeners, how are they going to know it’s working? What do the signs of its working look like? And what are some practical achievements in the field right now?

Daphne Koller, Ph.D. 18:13
So right now, I think that we are starting, we as a community, not just at insitro, are starting to develop proof points for pieces of capability that are potentially value creating. And we’ve seen in the last year, I think, a pretty remarkable achievement by DeepMind with AlphaFold, which took a problem that, objectively, people had competed to solve, some very smart people, which was that of folding a protein, and it had pretty much plateaued at a given level. And a new machine learning based approach really cracked that ceiling and achieved a performance that was much, much higher. Now, I think it remains an open question of to what extent this will actually empower drug discovery. But it is a remarkable scientific achievement. I think we’re starting to see medicines that have been created with the help of AI tools, ones that were, in some cases, these are repurposed molecules. There’s one example of that. I would say that we’re even starting to see medicines that have been successful, where at least some lighter weight machine learning has been deployed in the design of those molecules. Like I would say, even some of the work that’s been done by Moderna on the one side of the vaccine, by AbCellera with the antibodies used, maybe not the most sophisticated machine learning models available, but some pretty decent data science models to help design those compounds. I think we’ll see a lot more of that in the next three years where more and more medicines that come to market will have had help from AI at different steps in the value chain. One of the things that we’re excited about at insitro is both thinking about the early discovery where the machine really identifies as de novo a therapeutic hypothesis, a target and a patient population, where a person would not have been able to sort of nail down that this is an important place to intervene. But there is an equally large opportunity to help with stages that are further down the value chain. 
Like, for example, some of the work that we did with Gilead really allowed us to considerably increase the power of endpoints in a clinical trial by doing better analysis on the histopathology and other biomarker data that had been collected from patients. And so maybe that’s an opportunity to decrease the length of a clinical trial, to decrease the number of patients required in order to see a statistically significant signal. And I think that’s a different place for AI to intervene, in places that are maybe less glamorous than uncovering a whole new target, but where a lot of the costs are going is in those downstream clinical trials. So I think there are opportunities along all of those stages.

Suchi Saria, Ph.D. 21:01
I’m excited to see some of the progress you’re going to make. One thing that’s very interesting and unique about you is this idea of a professor CEO. And I’m also a professor CEO. We know there are a handful of professor CEOs, which is where we had deep research careers, and then you transition to also take on translating some of the research to the field. I’ll tell you something that I found really exciting as I was doing this was by being, in a way, when we’re doing scientific research, and you know, we’re making progress, there’s deep, deep, deep emphasis on the results. Because in science, there’s just really no opportunity. You can’t just spin things you know, you have to write papers, you have to give presentations to people who are experts in the field and your whole reputation depends on being able to show data and show results that actually work. Moving from academia to industry, to me, was like sort of an interesting, jarring, different experience where sort of the emphasis on results was very secondary and it feels like the speed with which maybe we need results, but there’s also a notion of like enormous speed with which people want to move forward. In a way, we’re at the frontier. We’re trying to push the frontier. We want to have great results. But we also have this great need for moving at a speed that is breathtaking, I’d say, in startup land. So how have you thought about that?

Daphne Koller, Ph.D. 22:17
I mean, I think you’re certainly right in the sense that there’s clearly a need to move quickly in industry, I would say that it’s maybe a little bit different in the kind of R&D space that we’re at because, if you sacrifice quality for the sake of speed, there is no, I mean, ultimately, the truth of the pudding is you put this drug in a patient in a randomized clinical trial. And if it fails, you have failed. And so you can’t cut corners. You can try and move with urgency, and obviously we do, but cutting corners will come back to bite you in really bad ways. So that is one difference between academia and industry. I would like to highlight, actually, a very different one, which I think is one that I found most dominant, which is that in academia, it’s really all about you as the PI. Think about what PI stands for, principal investigator. We used to have the Koller Lab, or we call it Daphne’s Lab, right? And we all think about that. And it’s all about you as the leader of, you set the vision and it’s all about you. And these sorts of structures work this way. You get rewarded by how many times you’re a first author when you’re more junior, and then how many times you’re last author when you’re more senior, whereas in industry, it’s really about the success of the organization. And the most difficult transition that I have seen in moving from academia to industry is people who are not able to release their ego and really think about science as a team sport, and company building as a team sport, and create a whole that is larger than the sum of the parts, and align people towards a different type of incentive structure where there’s a reward for building infrastructure, for creating something that supports the growth of the organization, even if it’s not the most glamorous piece of the puzzle. And that building of a puzzle, of an organization, that has to come from the top. 
People have to release their own egos in order to get the people around them to release their egos. I think that is a really critical transition point, in general, in building a successful organization, and especially for someone coming from academia, it’s a hard transition.

Suchi Saria, Ph.D. 24:33
That makes a ton of sense. That’s actually one of the things I despise the most about being in academia. I don’t know if you remember, I almost didn’t want to be a professor because of that. And it’s so funny. I’ve actually loved the idea of how, here, a whole village comes together to make something happen. And it’s really so much more about the thing you’re making happen as opposed to who made it happen, which was just super awesome. You’ve gone through these, like, incredibly hard adventures. What are some challenges you’ve faced as a founder? What are some of the hardest challenges you’ve faced?

Daphne Koller, Ph.D. 25:03
I think one of the hardest challenges that I faced, certainly as a first time founder, but in a different way, even as a second time, is building the right team around you. And at Coursera, we really made the mistake of not realizing that you need to have in your team people who’ve been there done that, who bring that industry experience. We had that hubris of first time founders of, we’ll figure it all out. And eventually we kind of did, but it took a lot longer and we made a lot more mistakes than we would have if we’d brought in someone who had more of that industry experience. I think it’s really important to bring those people in early and to pick them really, really carefully because who is on your executive leadership team, the people that you spend all of your time kind of brainstorming with and shaping the strategy and thinking about how to build the company, those people are absolutely critical to the success of the company. And picking them in the right way is really hard and really important. So that, to me, is one of the most important lessons learned. The other one, I think, is coming back to the point I made a moment ago, which is, how does one create an organization that’s really aligned around the overall goals of the company, and creating that sense of, we’re all in this together, while still giving each person ownership over their own part, because otherwise, it ends up being sort of intertwined and it becomes much slower because, you know, you kind of have to coordinate with so many people. So how do you create that sense of shared journey, working towards a common goal, while still letting people move quickly?

Suchi Saria, Ph.D. 25:06
I found over the course of the pandemic, as we’ve been building the team, it’s been really hard to maintain cohesiveness when you’re in a very distributed environment. And that’s almost maybe easier to do when you’re maybe divided in a way that are small projects. But if you’re trying to tackle something really hard that is interdisciplinary, you need 7, 8, 9 disciplines to work together, there’s a lot of mechanics to how do you get everybody to work fast when you have many disciplines in the mix, and still maintain a sense of ownership. Are there takeaways you have for other founders who are trying to do that?

Daphne Koller, Ph.D. 27:16
So what we’ve done is we have made the project team the core execution unit in the company. So you define the broader strategy, you break it down into, you know, goals and sub goals. And then when you have a sub goal, you say, this is a manageable unit of work, to which we assign a cross disciplinary project team, and they report in different places. So if you have a machine learning scientist, they report into a machine learning person, you have a stem cell biologist, they report into, you know, developmental scientists, but they work together on that project team for, you know, six, nine months, however long the project takes. And they together have a sense of shared ownership of that sub goal and achieving this. And it does require a lot more thought because you have to be architecting the goals of the company in a way that they break down into these units of work that make sense. That’s one source of difficulty. A different source of difficulty is creating a culture of people who are able to talk across disciplinary boundaries and really have respect and openness to the perspectives of people who are very different from themselves. And both of these are things that you have to be really deliberate about in how you build the company. And so to me, that is one important lesson learned for new founders: be really deliberate about your culture, and be really deliberate about your processes for how you create goals and create alignment. And we’ve been, at insitro, so far better on the culture side. I think we’re really, really good at the culture side. The processes are still a work in progress. I think we’re getting better, but it’s a journey.

Suchi Saria, Ph.D. 29:02
How big is insitro now?

Daphne Koller, Ph.D. 29:04
We’re just about 150 people. It’s a weird number, Dunbar’s number is hitting us, so we need to really be thinking about this.

Suchi Saria, Ph.D. 29:14
Daphne, thanks so much for spending time with me. Any one last takeaway for everybody? And in particular, one last message about, what’s your hope for where AI is headed and practical results in the next five years or three to five years?

Daphne Koller, Ph.D. 29:25
So the place I would actually like to end is with a message to people urging them to consider digital biology as a field that they would potentially want to pursue. To me, this is one of the most exciting fields that we’ve been in. I think, if we look historically, there’s been periods in history where certain disciplines have really kind of made an incredible amount of progress in a short amount of time because of new technologies or new perspective. So, you know, it was in the beginning of the 1900s, it was physics where we understood the connection between matter and energy, and between space and time. And then in the mid to late 90s, it was actually two disciplines. It was the discipline of data science, which emerged from computing, but also involved elements of statistics and optimization and neuroscience, and what I call quantitative biomedicine, in which we moved away from being purely descriptive and really started to get at mechanistic systems level understanding of biological systems. The discipline of 2020, I think, is going to be this integration of those last two, where you have the ability to measure biology at unprecedented fidelity and scale, interpret what we’re measuring using the tools of data science, and then intervening in those biological systems to get them to do something that they wouldn’t otherwise do. I think that is an incredible opportunity for medicine, but also in the environment, in energy, in biomaterials, in agriculture. And it’s just an incredible time to be in this space and create a brand new discipline, which I think is just an incredible thing for people to do.
