Could AI End Human QA? - Transcript

Episode Transcript

The original title for this episode was "AI Killed the QA Star," which would make more sense if you knew the eighties lore of the very first music video played on MTV. That's Music Television, a TV channel. Back in the day, that's how you watched videos before the internet. That was in 1981, and the video was called "Video Killed the Radio Star." But I decided that a deep-cut title was too obscure for this conversation.

Yet the question still remains: could the increased velocity of shipping AI-generated code cause businesses to leave human-based QA behind? Presumably because we're not going to hire any more of them, and we don't want to grow those QA and operations teams just because we created AI code. And would we start relying more on production observability to detect code issues that affect user experience?

That's the theory of today's guest, Andrew Tunall, the President and Chief Product Officer at Embrace. They're a mobile observability platform company that I first met at KubeCon London this year. Their pitch was that mobile apps were ready for the full observability stack, and that we now have SDKs to let mobile dev teams integrate with the same tools that we platform engineers, DevOps people, and operators have been building and enjoying for years now.
I can tell you that we don't yet have a full picture of how AI will affect those roles, but I can tell you that business management is being told that, similar to software development, they can expect gains from using AI to assist or replace operators, testers, build engineers, QA, and DevOps. That's not true, or at least not yet. But it seems to be an expectation, and that's not going to stop us from trying to integrate LLMs into our jobs more. So I wanted to hear from observability experts on how they think this is all going to shake out. I hope you enjoy this conversation with Andrew of Embrace.

Hello.

Hey, Bret. How are you doing?

Andrew Tunall is the President and Chief Product Officer of Embrace. And if you've not heard of Embrace, I think your claim to me the first time we talked was that you're the first mobile-focused, or mobile-only, observability company in the Cloud Native Computing Foundation. Is that a correct statement?

I don't know; probably in CNCF, right? We were certainly the first company that went all in on OpenTelemetry as the means by which we published instrumentation.
Obviously CNCF has a bunch of observability vendors, some of which do mobile and web RUM, but completely focused on that? Yeah, I would say we're the first.

Yeah. I mean, we can say that until it's not true. The internet will judge us harshly for whatever claims we make today.

Yeah.

Well, that just means people are listening, Andrew. Okay, I told people on the internet that we were going to talk about the idea that you gave me: that QA is at risk of falling behind. If we're producing more code, if we're shipping more code because of AI... even with the pessimistic stuff, you know, we've seen some studies in the last quarter around effective use of AI showing anywhere between negative 20 percent and positive 30 percent in productivity, depending on the team and the organization. So let's assume for a minute it's on the positive side, and the AI has helped the team produce more code and ship more releases. Even in the perfect world that I see, with teams automating everything, there's still usually, almost always, in fact in my experience 100 percent of cases, a human at some point during the deployment process. Whether that's QA.
Whether that's a PR reviewer, whether that's someone who spun up the test instances to run it in a staging environment. There's always something. So tell me a little bit about where this idea came from and what you think might be a solution to that.

Yeah. I'll start with the belief that AI is going to fundamentally alter the productivity of software engineering organizations. I mean, the CTOs I talk to out there are making a pretty big bet that it will. There's today, and then think about even just the pace that AI has evolved at in the past year, and what it'll look like given the investment that's flowing into it. But if you start with that, the claim that AI is going to kill QA is more about the fact that we built our software development lifecycle under the assumption that software was slow to build and relatively expensive to do so. And if those start to change, a lot of the systems and processes we put around our software development lifecycle probably need to change too. Because ultimately, say, okay, we had 10 engineers, and formerly we were going to have to double that number to go build the number of features we want.
And suddenly those engineers become more productive at building the features and capabilities you want inside your apps. I find it really hard to believe that those organizations are going to make the same investment in QA organizations and automated testing, etc., to keep pace with that level of development. The underlying hypothesis was that productivity allows them to build more software cheaper, right? And cheaper rarely correlates with adding more humans to the loop.

Yeah, the promise. I make this joke, and it's probably not true in most cases because it's a little old, but I used to say things like, yeah, CIO magazine told them to deploy Kubernetes. Or whatever executive at the VP or C level, CIO, CTO. And you're at that chief level, so I'm making a joke about you.

Funny.

But that's when you're not in the engineering ranks, and you maybe have multiple levels of management, and you get that sort of overview where you're reading and discussing things with other suits. You're reading the suits' magazines, the CIO magazines, the IT pro magazines, and all that stuff. The point I'm making here is that people are being told that AI is going to save the business money.
I think I've said this on several podcasts already, but you weren't here, so I'm telling you: I was at a media event at KubeCon in London. They give me a media pass for some reason, because I make a podcast, so they act like I'm a journalist, and I stand around all the real journalists and pretend. But I was there, and I watched an analyst who, from my understanding (I don't actually know the company, but they sound like the Gartner of Europe or something like that), said, at an infrastructure conference, the words came out of their mouth: my clients are looking for humanless ops. And I think I visibly chuckled in the room, because I thought, well, that's great. That's rich: you're at a 12,000-person ops conference telling us that your client companies want none of this. These are all humans here doing this work.

The premise of your argument about QA matches my exact same thoughts: nobody is budgeting for more QA or more operations personnel or more DevOps personnel just because the engineers in the app teams and the feature teams are able to produce more code with AI. And so we've all got to do better, and we've got to figure out where AI can help us, because if we don't...
They're just going to hire the person that says they can, even though maybe it's a little bit of a shit show.

My belief is that software organizations need to change the way they work for an AI future. That might be cultural changes, it might be role changes. Words like "human in the loop" get tossed around a lot when it's about engineers interacting with AI, and the question is, okay, what does that actually look like? Are we just reviewing AI's PRs and kind of blindly saying, yep, they wrote the unit tests for something that works? Or are we actually doing critical thinking, critical systems thinking, unique thinking that allows us, as owners of the business and of our users' success, to design and build better software with AI as a tool? And it's not just QA; it's all along the software development lifecycle. How do we put the right practices in place, and how do we build an organization that actually allows us, in this new AI-driven future, whether it's agents doing work on our behalf or just us with AI-assisted coding, to build better software? And yeah, I'm interested in what that looks like over the next couple of years.

That's kind of the premise of my new Agentic DevOps podcast.
And also, as I'm building out this new GitHub Actions course, I'm realizing that I'm having to make up the best practices. I'm having to figure out what is risky and what's not, because no one has really figured this out yet in any great detail. In fact, at KubeCon London in April, which feels like a lifetime ago, there was only one talk about using AI anywhere in the DevOps and operations path to benefit us. Everything else was about how to run infrastructure for AI. And granted, KubeCon is an infrastructure conference for platform engineering builders and all that, so it makes sense. But there was really only one talk, and it was from a team at Cisco, which I don't think of as the bleeding-edge AI company. They were simply trying to get a workflow, or maybe you would call it an agentic workflow, for PR review. In that case, I'm presuming that humans are still writing their code and AI is reviewing the code.

Actually, just yesterday I was on GitHub trying to figure out if there was a way to make branch rules, or some sort of automation rule, so that if the AI wrote the PR, the AI doesn't get to review the PR.

Yeah, yeah, right.

We've got this coming at us from both angles. We've got AI in our IDEs.
We've got multiple companies: Devin, and GitHub themselves. They now have the GitHub Copilot coding agent (I have to make sure I get that term right), which will write the PR and the code to solve your issue. And then they have a Copilot code review agent that will review the code. It's the same models with different context, but it feels like that's not a human in the loop. So we're going to need these guardrails and these checks to make sure that code didn't end up in production with literally no human eyeballs ever in the path of that code being created, reviewed, tested, and shipped. Because we can do that now.

Yeah, totally. And I mean, you can easily perceive the mistakes that could happen too, right? Before I took this role at Embrace, I was at New Relic for four and a half years, and before that I was at AWS. So obviously I've spent a lot of time around CloudFormation templates, Terraform, etc.
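The guardrail described here, making sure an agent-authored PR gets at least one human approval, can be sketched as a small merge-gate check. This is a hypothetical sketch, not a built-in GitHub feature, and the bot account names are assumptions for illustration; in a real pipeline the author and approver logins would come from the GitHub pull request API.

```python
# Merge-gate sketch: an AI-authored PR must have at least one human approval.
# Bot logins below are illustrative; adjust to whatever agents your org runs.
AI_BOT_LOGINS = {"copilot-swe-agent[bot]", "devin-ai[bot]"}

def is_bot(login: str) -> bool:
    """Treat known agent accounts and any '[bot]' login as non-human."""
    return login in AI_BOT_LOGINS or login.endswith("[bot]")

def human_approval_required(author: str, approvers: list[str]) -> bool:
    """Return True if this PR still needs a human approval before merge.

    Human-authored PRs fall through to normal branch protection rules;
    bot-authored PRs are blocked until a non-bot account approves.
    """
    if not is_bot(author):
        return False
    return not any(not is_bot(a) for a in approvers)
```

A CI job could run this against the PR metadata and fail the required status check when it returns True, which is one way to express "the AI that wrote the PR doesn't get to be its only reviewer."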
You can see a world where AI builds your CloudFormation template for you and selects an EC2 instance type because, from the information it has about your workload, that instance type is optimal. But in the region you're running in, that instance type isn't freely available for you to autoscale to. And pretty soon you go try to provision more instances, and poof, you hit your cap, because that instance type just doesn't have availability in Singapore. As humans and operators, we learn a lot about our operating environments, we learn about our workloads, we learn about their character, the peculiarities of them that don't make sense to a computer but are based on reality. Over time, maybe AI gets really good at those things, right? But the question is, how do we build the culture to guide our army of assistants to build software that really works for our users, instead of just trusting it to go do the right thing because we view everything as having a true and pure result, which I don't think is true. A lot of the tech we build is for people who build consumer mobile apps and websites. That is what the tech we build is for.
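The regional-availability failure described above is checkable before deploy. A minimal sketch: validate the template's chosen instance type against the types actually offered in the target region, with a fallback list. In real use the offerings would come from EC2's `DescribeInstanceTypeOfferings` API; the data below is made up for illustration, including the hypothetical gap in ap-southeast-1.

```python
# Pre-deploy check: does the target region actually offer the chosen
# instance type? In practice, populate `offerings` from the EC2
# DescribeInstanceTypeOfferings API; these values are illustrative only.
def pick_instance_type(
    preferred: str,
    fallbacks: list[str],
    region: str,
    offerings: dict[str, set[str]],
) -> str:
    """Return the preferred type if the region offers it, else the first
    offered fallback. Raise if no candidate is available in the region."""
    available = offerings.get(region, set())
    for itype in [preferred, *fallbacks]:
        if itype in available:
            return itype
    raise ValueError(f"no candidate instance type offered in {region}")

offerings = {
    "us-west-2": {"m7g.large", "m6i.large", "c7g.xlarge"},
    "ap-southeast-1": {"m6i.large", "c6g.xlarge"},  # hypothetical: no m7g here
}
```

Running this as a linting step over generated templates would catch the "optimal type that Singapore doesn't stock" case before autoscaling hits the wall, rather than after.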
And you can easily see... some of our engineers have been playing around with using AI-assisted coding to implement our SDK in a consumer mobile app, and it works quite well, right? But you can see situations where an engineer gets asked by somebody, a product manager, a leader: hey, we got a note from marketing, they want to implement this new attribution SDK that's going to build up a profile of our users and help us build more customer-friendly experiences. You have the bots go do it, it tests the code, everything works just fine. And then, for some reason, that SDK makes an uncached request out to us-west-2 for users globally, and for your users in Southeast Asia that adds an additional six and a half seconds to app startup, because physics. And what do those users do? If you start an app and it sits there hanging for four, five, six seconds, and you have to use the app because you're waiting for your boarding pass to come up and you're about to get on the plane, you probably perceive it as broken and abandon it. To me, that's a reliability problem that requires systems thinking, and cultural design around how you're engineering your organization, to avoid.
And one that I don't think is immediately solved with AI.

Right. Observability is only getting more complex. It's a cat-and-mouse game in all regards, and everything we do has a yin and yang. I'm watching organizations that are full steam ahead, aggressively using AI, and I'd love to see some stats. I don't know if you have empirical evidence or sort of anecdotal stuff from your clients, but do you see them accelerate with AI and then almost have a pulling-back effect, because they realize how easy it is for them to ship more bugs? And sure, we could have the AI writing tests, but it can also write really horrible tests, or it can just delete tests because it doesn't like them, because they keep failing. It's done that to me multiple times. In fact, I think... oh, what's his name? It wasn't Gene Kim. There's a famous... it might have been the guy who created Extreme Programming. I think I heard him on a podcast talking about how he wished he could make all of his test files read-only, because he writes the tests, and then he expects the AI to make them pass, and the AI will eventually give up and then want to rewrite his tests.
And he doesn't seem to be able to stop the AI from doing that, other than just denying: deny, or cancel, cancel, cancel. And there's a scenario where that can easily happen in the automation of CI and CD, where suddenly it decides that the tests failing are okay, and then it's going to push to production anyway, or whatever craziness ensues. Do you see stuff happening on the ground there?

Did you read that article about the AI-driven fake store purveyor that Anthropic created, named Claudius? When it got pushback from the Anthropic employees about what it was stocking its fake fridge, its fake store, with, it called security on the employees to try to displace them. I think what you're pointing to is that the moral compass of AI is... that's a very complex thing to solve, right? Is this the right thing to do or not? Because it's trying to solve the problem however it can, with whatever tools it has, not asking whether the problem is the right one to solve. And "right" is... obviously, even humans struggle with this one.

Right. I've just been labeling it all as taste.
Like, the AI lacks taste.

Yeah, that's...

Like, in a certain team culture, if you have a bad team culture, deleting tests might be acceptable. I used to work with a team that would ignore linting. I would implement linting, I'd give them the tools to customize linting, and then they would ignore it and accept the PR anyway. And they did not care. They basically followed the existing linting style of each file. Now, they were in a monolithic repo with thousands and thousands of files in a 10-year-old Ruby app. But they would just ignore the linting. And I always saw that as a culture problem for them, one that I, as the consultant, can't solve. I would always tell the boss I was working for there: I can't solve your culture problems; I'm just a consultant here trying to help you with your DevOps. You need a full-time person leading teams to tell the rest of the team that this is important and that these things matter, over the course of years, as you change out engineers. You can't just go with, well, this file is tabs, this file is spaces, and this one follows... especially in places like Ruby and Python, where there are multiple style guides and whatnot like that.
I just call that a taste issue, or a culture issue. AI doesn't have culture or taste. It just follows the rules we give it. And we're not really good yet, as an industry, at defining rules. That's actually part of the course I'm creating: figuring out all the different places where we can put AI rules. We've seen them put in repos. We've seen them put in your UI and your IDEs. Now I'm trying to figure out how to put that into CI, and even into operations. If you're going to put AIs somewhere where they're going to help with troubleshooting or visibility or discovery of issues, they also need to have taste. Which is why I want to change all my rules files to taste files. I guess that's the platform I'm standing on.

Yeah. Because it...

Yeah, I mean, at the same time, you probably don't want to be reviewing hundreds of PRs from an AI robot that's just changing the casing of legacy code to meet your style. Change obviously introduces the opportunity for failure, bugs, etc. And it's somewhat arbitrary, simply because you've given it a new rule that this is what it has to do.
I mean, I had a former co-worker many years ago, like 15, 20 years ago, who was famous for periodically going through and making nothing but casing code changes, because that's what he preferred. And it was just this endless stream of trivial changes on code we hadn't touched in months or years, which would inevitably lead to some sort of problem, right?

The toil of that. I'm just like, yeah, you're giving me heartburn just by telling me that story.

Yeah, well, I've gotten over the PTSD of that. It was a long time ago.

So, okay. What are you seeing on the ground? Do you have some examples of how this is manifesting in apps? You've already given me a couple of things, but I'm just curious if you've got some more.

I kind of alluded to it at the very beginning.
Obviously, in my role, I talk to a lot of senior executives about directionally where they're going with their engineering organizations. The software we build does a bunch of tactical things, but in a broader sense, it allows people to measure reliability as expressed through your customers staying engaged with your experience, by virtue of the fact that, otherwise liking your software, they're having a better technical experience with the apps you build on the front end. Ultimately, measuring that, and then thinking through all of the underlying root causes, is a cultural change in these organizations and in how they think about reliability. It's no longer just: are the API endpoints at the edge of our data center delivering their payload, responding in a timely manner and error-free, and then the user experience is well tested, and we ship it to prod, and that's enough. It really is a shift in how people think about it. And when I talk to them, a lot of CTOs really are taking a bet that AI is going to be the way they get productivity gains. I won't say make their business more efficient, but it does allow them to do more with less, right?
You know, like it or not, especially over the past few years, I mean, in B2B, we're in the B2B SaaS business, there's been, you know, times have definitely been tougher than they were in the early 2020s, for kind of everyone. I think there's a lot of pressure in consumer tech, with tariffs and everything, to do things more cost effectively. And first and foremost, on the ground, this is a change we are seeing, whether we like it or not. And, you know, we can argue about whether it's going to work and how long it'll take, but the fact is that, like you said, you mentioned the CIO magazines, leadership is taking a bet this is going to happen. And I think as we start to talk about that with these executives, the question is, is the existing set of tools I have to cope with that reality good enough?

And, yeah, I guess my underlying hypothesis is it probably isn't for most companies, right? If you think about the world of, you know, the web as an example, a lot of companies that are consumer facing tech companies will measure their Core Web Vitals, in large part because it has SEO impact. And then they'll put an exception handler in there, like Sentry, that grabs, you know, kind of JavaScript errors and tries to give you some level of impact
around, you know, how many are impacting your users and whether it's high severity, and then you kind of have to sort through them to figure out which ones really matter for you to solve.

So take existing user frustration with the human efficiency of delivering code, and the pace, the existing pace. People are already frustrated that Core Web Vitals are hard to quantify, what that really means in terms of user impact and whether a user decides to stay on the site or not. And they're overwhelmed with the number of JavaScript errors that could be out there. Because, I mean, you go to any site, go to developer tools, look at the number of JavaScript errors you see, and then, you know, take your human, experienced idea of how you're interacting with the site, and chances are most of those don't impact you, right? They're just like, it's an analytics pixel that failed to load, or it's a library that's just barfing some error but is otherwise working fine.

So take that and put it on steroids now, where you have a bunch of AI assisted developers doubling or tripling the number of things they're doing, or just driving more experimentation, right, which I think a lot of businesses have always wanted to do.
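Andrew's point about noisy JavaScript errors suggests a simple triage: rank errors by how many sessions they actually broke, not by raw volume. None of this code is from the episode; it is a minimal Python sketch with entirely invented event fields (`error`, `session_id`, `flow_failed`):

```python
from collections import defaultdict

def rank_errors_by_impact(events):
    """Group error events and rank them by how many sessions the error
    coincided with a real flow failure in, so noisy but harmless errors
    (failed analytics pixels, chatty libraries) sink to the bottom."""
    stats = defaultdict(lambda: {"sessions": set(), "broken": set()})
    for e in events:
        s = stats[e["error"]]
        s["sessions"].add(e["session_id"])
        if e["flow_failed"]:
            s["broken"].add(e["session_id"])
    ranked = [
        (err, len(s["broken"]), len(s["sessions"]))
        for err, s in stats.items()
    ]
    # Sort by the count of sessions where the error lined up with a failure.
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked

events = [
    {"error": "PixelLoadError", "session_id": 1, "flow_failed": False},
    {"error": "PixelLoadError", "session_id": 2, "flow_failed": False},
    {"error": "CheckoutTypeError", "session_id": 3, "flow_failed": True},
]
print(rank_errors_by_impact(events))
```

Under a ranking like this, a failed pixel that fires in thousands of sessions but never blocks anyone sinks below a rarer error that actually stops checkouts.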
But again, software has been kind of slow and expensive to build. And so if it's slow and expensive to build, my thirst for delivering an experiment across three customer cohorts when I can only deliver one, given my budget, just means that I only have to test for one variation. Well, now triple that, or quadruple it, and, you know, multiply that by a number of arbitrary user factors. It just gets more challenging, and I think we need to think about how we measure things differently.

Once the team has the tools in place to manage multiple experiments at the same time, that just ramps up exponentially until the team can't handle it anymore. But if AI is giving them a chance to go further, then yeah, they're just going to do it. I mean,

Yeah, I mean, you're going to get overwhelmed with support tickets and bad app reviews and whatever it is, which I think most people are. Most business leaders would be pretty upset if that's how they are responding to reliability issues.

I was just gonna say, we've already had decades now of pressure at all levels of engineering to reduce personnel.
You need to justify every new hire pretty significantly, unless you're funded and you're just, you know, in an early stage startup and they're just growing to the point that they can burn all their cash. Having been around 30 years in tech, I've watched operations get, you know, merged into DevOps. I've watched DevOps teams, which we didn't traditionally call them that, we might have called them sysadmins or automation or build engineers or CI engineers, get merged into the teams themselves, and the teams have to take on that responsibility. I mean, we've got this weird culture, we go to this Kubernetes conference all the time, and one of the biggest complaints is devs who don't want to be operators, but it's saddled on them, because somehow the industry got the word DevOps confused with something, and we all thought, oh, that means the developers can do ops. That's not what the word meant. But we've gotten to this world where I'm getting hired as a consultant to help teams deal with just the sheer amount of ridiculous expectations a single engineer is supposed to have, not just the knowledge you're supposed to have, but the systems you're supposed to be able to run while making features. And it's already, I feel like a decade ago it felt unsustainable.
So now here we are, having to give some of that work to AI, when it's still doing random hallucinations on a daily basis, even in the best models. I think I was just ranting yesterday on a podcast about SWE-bench, which is like an engineering benchmark website for AI models and how well they solve GitHub issues, essentially. And the best models in the world can barely get two thirds of them correct. And that's if you're paying the premium bucks and you've got the premium foundational models and you are on the bleeding edge stuff, which most teams are not, because they have rules or limitations on which model they can use, or they can only use the ones in house, or they can only use a particular one. And it's just one of those things where I feel like we're being pushed at all sides. At some point, it's amazing that any of this even works. It's amazing that apps actually load on phones. It just feels like garbage on top of garbage, on top of garbage, turtles all the way down, whatever you want to call it. So where are you coming in here to help solve some of these problems?

Yeah, that's fair, yeah.
I'll even add to that. All of that's even discounting the fact that new tools are coming that make it even simpler to push out software that, like, barely works, without even the guiding hands of a software engineer who has any professional experience writing that code. Some of the vibe coding tools, like our product management team uses, largely for rapid prototyping. And I can write OK Python. I used to write OK C#. I have never been particularly good at writing JavaScript. I can read it OK, but when it comes to fixing a particular problem, I quickly get out of my depth. That's not to say I couldn't be capable of doing it. It's just not what I do every day, nor do I particularly have the energy when I'm, you know, done with a full workday to go teach myself JavaScript. And I'll build an app with one of the vibe coding tools as a means of communicating how I expect something to work. And I'm like, ah, it's better to just delete the project and start all over again.
And, you know, if you can make it work, that doesn't necessarily mean it'll work at scale. It doesn't necessarily mean that there aren't, like, a myriad of use cases you haven't tested for as you click through the app in the simulator. And so, you know, I think the question is, okay, given the fact that we have to accept more software is going to make its way into human beings' hands, because we build software for human beings, it's going to get to more human beings. How do we build a reliability paradigm where we can measure whether or not it's working? And I think that stops focusing on, I guess to go back to the intentionally inflammatory title of today's discussion, it stops focusing on a zero bug paradigm, where I test things, I test every possible pathway for my users, I, you know, have a set of requirements, again, around trivialized performance and stuff, and I try to put up these kinds of barriers to getting code into human hands. Instead, I just accept the fact that more code is going to get into human hands faster, at a pace I can't possibly control. And so, therefore, I have to put measurements in the real world.
So, I use a lot of different tools around my app so that I can be as responsive as possible to resolving those issues when I find them. Which is, I guess, my... you know, Charity Majors, who co-founded Honeycomb, she and I were talking a few months ago, and she's a big fan of stickers. She shipped me an entire envelope of stickers, and they're all like, you know, "ship fast and break things," right? "I test in production," stuff like that. And somehow I feel like, in our world, because we build observability for front end and mobile experiences, like web and mobile experiences, that message just hadn't gotten through historically. Part of it's because release cycles on mobile were really slow. You had to wait days for an app to get out there. Part of it was software is expensive to build and slow to build, and so getting feature flags out there where you can operate in production was hard. Part of it was just the observability paradigm hadn't shifted, right? The paradigm of measure everything and then find root cause had not made its way to the front end. It was more like, measure the known things you look for, like web exceptions, like Core Web Vitals, or on mobile, look at crashes, and that's about it.
And the notion of, okay, measure whether users are starting the app successfully, and when you see some unknown outcomes start to occur, or users start to abandon, how can you then sift through the data to find the root cause? That hadn't really migrated its way over. And that's what we're trying to do. We're trying to bring that paradigm of, how do you define, for lack of a better term, the APIs, the things that your humans interact with, with your apps, the things they do. You don't build an API in your app for human beings. You build a login screen, you build a checkout screen, a cart experience, a product catalog. How do we take those things and measure the success of them, and then try to attribute them to underlying technical causes, where your teams can better have those socio-technical conversations? So that they can understand and then resolve them, probably using AI, right, as we grow. But it allows better system knowledge, and the interplay between real human activities and the telemetry we're gathering.

Yeah. Have you been, I'm just curious, have you been playing with having AI look at observability data, whether it's logs or metrics? Have you had any experience with that?
I'm asking that simply as a generic question, because from the conversations I've had in the last few months, it sounds like AI is much better at reading logs than it is at reading metrics or dashboards or anything that sort of lacks context. You know, it's not like we're putting in alt image messages for every single dashboard graph. And that's probably all coming from the providers, because if they're expecting AI to look at stuff, they're gonna have to give more context. But it sounds like it's not as easy as just giving AI access to all those systems and saying, yeah, go read the website for my dashboard, Grafana, and figure out what's the problem.

I've seen it deployed in two ways, one of which I find really interesting and something we're actively working on, because I think it just has a high degree of utility, and, you know, given the kind of state of LLMs, I think it's probably something that's relatively easy to get right. Which is the notion of, when you come into a product like ours, or a Grafana, or, you know, a New Relic, you probably have an objective in mind. Maybe you have a question you're trying to ask: what's the health of this service? Or, I got paged on a particular issue.
I need to build a chart that shows me the interplay between latency for this particular service and the success rate of, you know, some other type of thing, like database calls or something like that. Today, people broadly have to manually create those queries, and it requires a lot of human knowledge around the query language or the schema of your data. And I think there's a ton of opportunity for us to simply ask a human question of, you know, show me a query of all active sessions on this mobile app for the latest iOS version, and the number of traces, like startup traces, that took greater than a second and a half. And have it just simply pull up your dashboard and query language and build the chart for you quite rapidly, which is a massive time savings, right? And it also just makes our tech, which, you're right, can get quite complex, more approachable by your average engineer. Which, you know, I'm a big believer that if every engineer in your organization understands how your systems work, and the data around it, you're going to build a lot better software, especially as they use AI, right? Because now they understand how things work, and they can better provide instructions to the robots.
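As a toy illustration of the kind of structured filter such an assistant could emit for that example question, here is a sketch in Python. The field names (`app_version`, `startup_ms`) are invented for illustration, not Embrace's actual schema:

```python
def query_sessions(sessions, app_version=None, min_startup_ms=None):
    """Tiny illustration of a filter an assistant might generate for
    'sessions on the latest version whose startup trace took more than
    1.5 seconds'. Both filters are optional and combine with AND."""
    out = []
    for s in sessions:
        if app_version and s["app_version"] != app_version:
            continue
        if min_startup_ms and s["startup_ms"] <= min_startup_ms:
            continue
        out.append(s)
    return out

sessions = [
    {"id": "a", "app_version": "3.2.0", "startup_ms": 2100},
    {"id": "b", "app_version": "3.2.0", "startup_ms": 600},
    {"id": "c", "app_version": "3.1.9", "startup_ms": 4000},
]
slow = query_sessions(sessions, app_version="3.2.0", min_startup_ms=1500)
print([s["id"] for s in slow])  # only session "a" matches both filters
```

The value of the assistant is less in the filter itself than in translating the plain-English question into the right fields and thresholds without the engineer knowing the query language.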
So I think that's a really useful, interesting way, and we've seen people start to roll that type of assistant functionality out.

The second way I've seen it deployed, I see mixed results, which is: I have an incident, go look at every potential signal that I can see related to this incident and try to tell me what's going on, and get to the root cause. And more often than not, I find it's just a summarization of stuff that you, as an experienced user, probably would come to the exact conclusions on. I think there's utility there, certainly. It gets you a written summary quickly of what you see. But I do also worry that it doesn't apply a high degree of critical thinking. And, you know, an example of where, lacking context, it wouldn't be very smart, right: you've probably seen it, every traffic chart around service traffic, depending upon how it runs, tends to be pretty lumpy with time of day. Because most companies don't have an equivalent distribution of traffic across the globe. Not every country across the globe has an equivalent population. And so you tend to see these spikes where, you know, you have a number of service requests spiking during daylight hours, or the Monday of every week, because people come into the office and suddenly start e-commerce shopping.
And you see it taper throughout the week, or taper into the evening. I think that's normal. You understand that as an operator of your services, because it's unique to your business. I think the AI would struggle, lacking context around your business, to understand somewhat normal fluctuations. Or the fact that, you know, marketing dropped a campaign, where there's no data inside your observability system to tell you that that campaign dropped. It's not a release.

See, your AI is lacking context. It's lacking the historical context that the humans already have implicitly.

Yeah, and I mean, that context might be in a Slack channel where marketing said, we just dropped an email, so expect an increased number of requests to this endpoint as, you know, people retrieve their special offer token or whatever that will allow them to use it in our checkout flow. Today, if we provided that scope to an AI model within our system, we would unlock that type of context.

Yeah. I'm not creating any of these apps, and I'm sure they've already thought of all of this, but the first thing that comes to mind is, well, the hack for me would be: give it access to our ops Slack room.

Right.
We're probably all having those conversations of, oh, what's going on here? And someone who happened to get reached out to from marketing was like, well, you know, yesterday we did send out a, you know, a new coupon sale or whatever. So yeah, having it read all that stuff might be necessary for it to understand. Because you're right, it's not like we have a dashboard in Grafana that's the number of marketing emails sent per day, or the level of sale we're expecting based on, in America, it's the 4th of July sale, or, you know, some holiday in a certain region of the world.

Or a social media influencer dropping, like, some sort of link to your product. That suddenly, you know, it's a new green doll that people attach to their designer handbags. I don't know, like, anything about what the kids are into these days, but it seems kind of arbitrary, and I would struggle to predict that. Let me put it that way.

They're into everything old. So it's into everything that I'm into.

Yes. All right.
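The lumpy-traffic problem Andrew described a moment ago is why a detector that compares the current hour against the same hour in prior weeks, rather than against a flat threshold, is one rough way to encode that seasonal context. A minimal sketch, with made-up numbers:

```python
from statistics import median

def is_anomalous(current, history, tolerance=0.5):
    """Compare the current hour's request count against the median of
    the same hour in prior weeks, and flag only if it deviates by more
    than `tolerance` (50% here) from that seasonal baseline."""
    baseline = median(history)
    if baseline == 0:
        return current > 0
    return abs(current - baseline) / baseline > tolerance

# Monday 9am request counts from the previous four weeks.
history = [980, 1020, 1005, 995]
assert not is_anomalous(1100, history)  # within the normal Monday-morning lump
assert is_anomalous(2400, history)      # e.g. a marketing email just dropped
```

Even this still fails on exactly the cases from the conversation, a campaign or an influencer spike, because that context never reaches the observability system at all; no baseline math substitutes for it.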
So if we're talking about this at a high level, we talked a little bit before the show around how Embrace is thinking about observability, particularly on mobile, but, you know, anything front end there.

The tooling ecosystem for engineers on web and mobile is pretty rich, but the tools all tend to be just, like, hammers for a particular nail, right? It's, you know, how do we give you a better crash reporter, a better exception handler, how do we go measure X or Y? Some of the stuff that we're thinking about is really how we define the objective of observability for the coming digital age, right? Which is, you know, as creators of user experiences, my opinion is that we shouldn't just be measuring, like, crash rate on an app. We should be measuring, are users staying engaged with our experience? And when we see they are not... sometimes, with crashes, I mean, the answer is obviously they can't, right? Because the app explodes. But, I think, you know, I was talking to a senior executive at a massive food delivery app. And it's, listen, we know, anecdotally, there's more than just crashes that make our users throw their phone at the wall.
You're trying to order lunch at noon and something's really slow, or you keep running into a, just a validation error, because we shipped you an experiment thinking it worked, and you can't order the item you want on the two for one promotion. You're enraged because you really want the, you know, the spicy dry fried chicken.

Hangry.

And you want two of them, because I want to eat the other one tonight. And you've already suckered me into that offer, you've convinced me I want it, and now I'm having trouble completing my objective.

And broadly speaking, the observability ecosystem on the front end really hasn't measured that, right? We've used all sorts of proxy measurements out of the data center, because the reliability story has been really well told and has evolved over the past 10 to 15 years in the data center world, but it just really hasn't materially evolved on the front end. And so a lot of that is shifting the objective from, how do I just measure counts of things I already know are bad, to measuring what users' engagement looks like, and whether I can attribute that to a change in my software or defects I've introduced. So, that's kind of the take.
Just about anybody who has ever built a consumer mobile app has Firebase Crashlytics in the app, which is a free service provided by Google. Crashlytics was a company a long time ago that got bought by Twitter and then got acquired by Google. It basically gives you rough cut performance metrics and crash reporting, right? I would consider this, like, the foundational requirement of any level of app quality. But to call this by itself app quality, I think, you know, our opinion is that would be a misnomer. So we're going to kind of go through what this looks like, right? It's giving you things you would expect to see: a number of events that are crashes, et cetera. And you can do what you would expect here. Crashes are bad, so I need to solve a crash, so I'm going to go into a crash, view stack traces, get the information I need to actually be able to resolve it. And, you know, I think we see a lot of customers, before we talk to them, who are just like, well, I have crash reporting and I have QA, that's enough.

There's a lot of products that have other features, like Core Web Vital measurements on a page level. This is Sentry. It's a lot of data, but I don't really know what to do with this, beyond, okay, it probably has some SEO impact.
There's a, you know, a bad Core Web Vital, or a slow something for a render on this page. How do I actually go figure out root cause? But again, right, this is a single signal. So you don't know whether or not the P75 Core Web Vital here that Google considers scored badly is actually causing your users to bounce.

And I think that's important, because I was reading this article the other day on this notion of a performance plateau. There's empirical science proving that faster Core Web Vitals, especially like contentful paint and interaction to next paint, et cetera, improve bounce rate materially. People are less likely to bounce if the page loads really fast. But at some point, if it's long enough, there's this massive long tail of people who just have a rotten experience, and you kind of have to figure out, I can't make everyone globally have a great experience. Where's this plateau, where I know that I'm improving the experience for people who I'm likely to retain, and improving their bounce rate, versus I'm just, you know, going to live with this?
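One rough way to look for that plateau, assuming you have a load time and a bounced/not-bounced flag per pageview (the data below is invented), is to bucket loads by load time and compare bounce rates per bucket, then eyeball where faster stops paying off:

```python
def bounce_rate_by_bucket(loads, bucket_ms=1000):
    """Bucket page loads by load time and compute the bounce rate per
    bucket. `loads` is a list of (load_ms, bounced) pairs."""
    buckets = {}
    for load_ms, bounced in loads:
        b = load_ms // bucket_ms  # e.g. bucket 0 = under 1s, bucket 1 = 1-2s
        total, bounces = buckets.get(b, (0, 0))
        buckets[b] = (total + 1, bounces + (1 if bounced else 0))
    return {b: bounces / total
            for b, (total, bounces) in sorted(buckets.items())}

loads = [(500, False), (700, False), (1500, False), (1800, True),
         (3200, True), (3500, True), (3900, False)]
print(bounce_rate_by_bucket(loads))
```

Where the per-bucket bounce rate stops rising with load time is roughly where further optimization buys you little, which is the plateau argument in miniature.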
708 00:38:41,973 --> 00:38:44,703 And so we kind of have a different take, which is like we wanted to 709 00:38:44,703 --> 00:38:48,753 center our experience less on just individual signals, and more on 710 00:38:48,753 --> 00:38:53,423 like these flows, these tasks that users are performing in your app. 711 00:38:53,923 --> 00:38:58,593 So if you think about the key flows, like I'm breaking these down into 712 00:38:58,593 --> 00:39:01,873 the types of activities that I actually built for my end users. 713 00:39:02,373 --> 00:39:05,613 And I want to say, okay, how many of them were successful versus 714 00:39:05,613 --> 00:39:09,403 how many ended in an error, like something went truly bad, right? 715 00:39:09,403 --> 00:39:10,763 You just could not proceed. 716 00:39:11,073 --> 00:39:14,853 Versus how many abandoned, and when they abandoned, why, right? 717 00:39:14,863 --> 00:39:18,833 Did they abandon because they, they clicked on a product catalog screen, they 718 00:39:18,833 --> 00:39:20,403 saw some stuff that they didn't like? 719 00:39:20,903 --> 00:39:25,983 Or did they abandon because the product catalog was so slow to load, and images 720 00:39:25,983 --> 00:39:29,233 slow, like slow, so slow to hydrate? 721 00:39:29,653 --> 00:39:33,073 That they perceived it as broken, lost interest in the 722 00:39:33,073 --> 00:39:34,563 experience and ended up leaving. 723 00:39:34,623 --> 00:39:38,113 And so, the way you do that is you basically take the telemetry we're 724 00:39:38,163 --> 00:39:41,843 emitting from the app, the exhaust we collect by default, and you create 725 00:39:41,873 --> 00:39:46,333 these start and end events that allow you to then, we post process the data. 726 00:39:46,333 --> 00:39:49,213 We go through all of these sessions we're collecting, which is basically 727 00:39:49,213 --> 00:39:52,973 a play by play of like linear events that users went through. 
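The start/end-event approach described above can be sketched as a simple classifier: given a session's linear event stream, decide whether a flow completed, errored, or was abandoned. The event names here (`checkout_start` and so on) are hypothetical stand-ins, not Embrace's actual schema.

```python
def classify_flow(events, start="checkout_start", end="checkout_end",
                  error="checkout_error"):
    """Classify one session's flow: 'completed', 'error', 'abandoned', or 'not_started'."""
    if start not in events:
        return "not_started"
    tail = events[events.index(start) + 1:]  # everything after the flow began
    if error in tail:
        return "error"    # something went truly bad; the user could not proceed
    if end in tail:
        return "completed"
    return "abandoned"    # started, but neither finished nor failed

# Hypothetical sessions: a play-by-play of linear events per user.
sessions = [
    ["app_open", "checkout_start", "checkout_end"],
    ["app_open", "checkout_start", "checkout_error"],
    ["app_open", "checkout_start", "product_view"],  # user drifted away
    ["app_open", "product_view"],
]
counts = {}
for s in sessions:
    outcome = classify_flow(s)
    counts[outcome] = counts.get(outcome, 0) + 1
print(counts)
```

Post-processing all collected sessions this way is what yields the completion and abandonment rates discussed next; distinguishing "abandoned because slow" from "abandoned because uninterested" then requires joining in the timing telemetry.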
728 00:39:53,368 --> 00:39:56,228 And we hydrate the flow to tell you where people are dropping off. 729 00:39:56,728 --> 00:40:00,478 And so you can see like their actual completion rates over time. You know, 730 00:40:00,478 --> 00:40:04,118 obviously it's a test app, so there's not a ton of data there, but what gets 731 00:40:04,118 --> 00:40:09,688 really cool is we start to, we start to build out this notion of, once you see 732 00:40:10,188 --> 00:40:15,458 the issues happen, well, how can I now go look at all of the various attributes 733 00:40:15,458 --> 00:40:21,378 of those populations under the hood to try to specify which of the things 734 00:40:21,748 --> 00:40:25,558 are most likely to be attributed to the population suffering the issue? 735 00:40:25,948 --> 00:40:28,008 So that could be an experiment. 736 00:40:28,388 --> 00:40:32,778 It could be a particular mobile app version they're on. 737 00:40:32,818 --> 00:40:34,748 It could be an OS version, right? 738 00:40:34,748 --> 00:40:38,658 You just shipped an experiment that isn't supported in older OSs, and those 739 00:40:38,848 --> 00:40:40,458 users start having a bad experience. 740 00:40:40,958 --> 00:40:45,168 And then each of those gets you down to what we call this user play by play 741 00:40:45,198 --> 00:40:50,548 session timeline, where you basically get a full recreation of every part of 742 00:40:50,588 --> 00:40:54,278 the exhaust stream that we're gathering from you interacting with the app or 743 00:40:54,278 --> 00:40:56,838 website, just for reproduction purposes. 744 00:40:56,898 --> 00:40:59,428 Once you've distilled here, you can say, okay, now let me 745 00:40:59,428 --> 00:41:00,778 look at that cohort of users. 746 00:41:01,268 --> 00:41:03,578 And so I can do pattern recognition, which I think is pretty 747 00:41:03,729 --> 00:41:04,099 Hmm.
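The cohort comparison described above can be sketched as a simple over-representation ranking: for each attribute value (experiment, app version, OS version), compare its prevalence among sessions suffering the issue against healthy sessions. The attribute names and values below are invented for illustration.

```python
from collections import Counter

# Hypothetical session attributes; in practice these would come from the
# telemetry attached to each session (experiment flags, versions, OS, ...).
affected = [
    {"experiment": "new_checkout", "os": "iOS 15", "app_version": "3.2"},
    {"experiment": "new_checkout", "os": "iOS 15", "app_version": "3.1"},
    {"experiment": "new_checkout", "os": "iOS 16", "app_version": "3.2"},
]
healthy = [
    {"experiment": "control", "os": "iOS 15", "app_version": "3.2"},
    {"experiment": "new_checkout", "os": "iOS 17", "app_version": "3.2"},
    {"experiment": "control", "os": "iOS 16", "app_version": "3.1"},
]

def prevalence(sessions):
    """Share of sessions carrying each (attribute, value) pair."""
    counts = Counter((k, v) for s in sessions for k, v in s.items())
    return {kv: n / len(sessions) for kv, n in counts.items()}

aff, hea = prevalence(affected), prevalence(healthy)
# Rank attribute values by how over-represented they are in the affected cohort.
suspects = sorted(aff, key=lambda kv: aff[kv] - hea.get(kv, 0.0), reverse=True)
for kv in suspects[:3]:
    print(kv, f"affected {aff[kv]:.0%} vs healthy {hea.get(kv, 0.0):.0%}")
```

In this toy data the `new_checkout` experiment surfaces as the top suspect, mirroring the "you just shipped an experiment that isn't supported in older OSs" scenario from the conversation.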
748 00:41:04,618 --> 00:41:08,698 So for the audio audience that didn't, that didn't get to watch the video, 749 00:41:09,038 --> 00:41:14,858 what are some of the key sort of, if someone's in a mobile and front end 750 00:41:14,858 --> 00:41:18,618 team, and this is actually going back to a conversation I had with one of 751 00:41:18,618 --> 00:41:25,128 your Embrace team members at KubeCon in London, what are some of the key changes 752 00:41:25,138 --> 00:41:26,998 or things that they need to be doing? 753 00:41:27,498 --> 00:41:32,688 I guess if I back up and say the premise here is that 754 00:41:32,708 --> 00:41:34,038 there's almost like two archetypes. 755 00:41:34,148 --> 00:41:34,908 What am I trying to say here? 756 00:41:35,108 --> 00:41:36,468 There's two archetypes that I'm thinking about. 757 00:41:36,468 --> 00:41:41,368 I'm thinking about me, the DevOps slash observability system maintainer. 758 00:41:41,758 --> 00:41:43,018 I've probably set up 759 00:41:43,518 --> 00:41:47,318 ELK or, you know, I've got the key names, the Lokis, the 760 00:41:47,338 --> 00:41:49,368 Prometheuses, the Grafanas. 761 00:41:49,368 --> 00:41:51,178 I've got all these things that I've implemented. 762 00:41:51,208 --> 00:41:53,218 I've brought my engineering teams on board. 763 00:41:53,218 --> 00:41:54,328 They like these tools. 764 00:41:54,828 --> 00:41:58,268 They tend to have, especially for mobile, they tend to have other 765 00:41:58,268 --> 00:42:00,408 tools that I don't deal with. 766 00:42:00,778 --> 00:42:02,568 They might have platform, like the, the 767 00:42:02,672 --> 00:42:03,402 Traditionally, right? 768 00:42:03,452 --> 00:42:05,012 We're obviously trying to change that. 769 00:42:05,012 --> 00:42:07,592 But yeah, traditionally, they have five or six other tools that 770 00:42:07,602 --> 00:42:11,402 don't play into the observability ecosystem you have set up. 771 00:42:11,808 --> 00:42:12,298 Yeah.
772 00:42:12,298 --> 00:42:16,798 So, so we're on this journey to try to centralize, bring them 773 00:42:16,798 --> 00:42:18,258 into the observability world. 774 00:42:18,308 --> 00:42:21,858 You know, traditional mobile app developers might not even 775 00:42:22,358 --> 00:42:26,908 be aware of what's going on in the cloud native observability space, and 776 00:42:27,218 --> 00:42:28,348 we're bringing them on board here. 777 00:42:28,368 --> 00:42:34,808 Now suddenly they get even more code coming at them that's slightly less 778 00:42:34,828 --> 00:42:40,918 reliable or maybe presents some unusual problems that we didn't anticipate. 779 00:42:40,918 --> 00:42:45,258 So now, you know, we're in a world where suddenly what we have 780 00:42:45,258 --> 00:42:46,668 in observability isn't enough. 781 00:42:47,262 --> 00:42:47,593 Yeah, 782 00:42:47,643 --> 00:42:49,653 you're a potential solution. 783 00:42:49,653 --> 00:42:53,373 What are you looking at for behaviors that they need to change, things 784 00:42:53,373 --> 00:42:54,613 that people can take home with them? 785 00:42:54,653 --> 00:42:55,253 And 786 00:42:55,362 --> 00:42:58,142 I mean, I guess the way I think about it is, right, the reason 787 00:42:58,142 --> 00:43:04,832 observability became so widely adopted in server side products was because, in 788 00:43:04,832 --> 00:43:10,002 an effort to more easily maintain our software and to avoid widespread defects 789 00:43:10,002 --> 00:43:14,372 of high blast radius, we shifted from a paradigm of like monoliths deployed on 790 00:43:14,372 --> 00:43:19,782 bare metal to virtualization, to various container schemes, which 791 00:43:19,782 --> 00:43:25,322 right now has most widely settled around Kubernetes and microservices, 792 00:43:25,352 --> 00:43:28,612 because you could scale them independently and you could deploy them independently. 793 00:43:28,612 --> 00:43:28,962 Right.
794 00:43:29,462 --> 00:43:34,902 And that complexity of the deployment scheme and the different apps and services 795 00:43:34,902 --> 00:43:39,682 interplaying with each other necessitated an x ray vision into your entire system 796 00:43:39,682 --> 00:43:44,452 where you could understand system wide impacts to the end of your world. 797 00:43:44,482 --> 00:43:48,232 And the end of your world, for the most part, became your API surface 798 00:43:48,232 --> 00:43:51,932 layer, the things that served your web and mobile experiences. 799 00:43:51,972 --> 00:43:55,452 And, you know, there are businesses that just serve APIs. 800 00:43:55,802 --> 00:43:59,842 Right, but broadly speaking, the brands we interact with as human beings serve us 801 00:43:59,892 --> 00:44:02,302 visual experiences that we interact with. 802 00:44:02,802 --> 00:44:03,172 Right. 803 00:44:03,292 --> 00:44:06,812 It's the server team managing the server analytics, not so much 804 00:44:06,812 --> 00:44:10,862 the client device analytics. 805 00:44:11,021 --> 00:44:11,351 Right. 806 00:44:11,831 --> 00:44:15,121 The world has gotten a lot more complicated in what the front 807 00:44:15,121 --> 00:44:16,571 end experience looks like. 808 00:44:16,591 --> 00:44:21,901 And you could have a service that consistently responds and has a nominal 809 00:44:21,901 --> 00:44:27,171 increase in latency and is well within your alert thresholds, but where the 810 00:44:27,201 --> 00:44:32,831 SDK or library designed for your front end experience suddenly starts retrying 811 00:44:32,871 --> 00:44:37,711 a lot more frequently, delivering perceived latency to your end user. 812 00:44:38,211 --> 00:44:42,441 And, and so I think the question is, could you uncover that incident?
813 00:44:42,451 --> 00:44:46,961 Because if users suffer perceived latency and therefore abandon, what metrics do 814 00:44:46,961 --> 00:44:51,441 you have to go measure whether or not users are performing the actions you 815 00:44:51,471 --> 00:44:54,551 care about them performing, whether that's attributable to system change? 816 00:44:55,051 --> 00:44:58,181 In most instances, I don't think most observability systems have that. 817 00:44:58,681 --> 00:45:01,671 and then the second question is, right, so, and by the way, Bret, that's the 818 00:45:01,671 --> 00:45:05,471 underlying supposition that in a real observability scheme, mean time to detect 819 00:45:05,471 --> 00:45:09,481 is important, is as important, if not more so, than mean time to resolve. 820 00:45:09,981 --> 00:45:13,441 the existing tooling ecosystem for frontend and mobile has been set up to 821 00:45:13,461 --> 00:45:17,391 optimize mean time to resolve for known problems where I can basically just 822 00:45:17,391 --> 00:45:19,301 count the instance and then alert you. 823 00:45:19,311 --> 00:45:22,591 So, And, you know, the lack of desire to be on call, like I've 824 00:45:22,591 --> 00:45:26,551 heard this stupid saying that there's no such thing as a front end 825 00:45:26,551 --> 00:45:28,701 emergency, which is like ridiculous. 826 00:45:29,001 --> 00:45:33,781 If I'm, you know, if I'm a major travel website and I run a thousand different 827 00:45:33,791 --> 00:45:39,241 experiments and a team in, Eastern Europe drops an experiment that affects 828 00:45:39,241 --> 00:45:42,511 users globally, 1 percent of users globally in the middle of my night. 829 00:45:42,816 --> 00:45:47,496 That, makes the calendar control broken and that some segment of that 830 00:45:47,496 --> 00:45:49,066 population can't book their flights. 831 00:45:49,346 --> 00:45:51,596 That sounds a lot like a production emergency to me. 
832 00:45:51,866 --> 00:45:55,226 I, that has material business impact in terms of revenue. 833 00:45:55,726 --> 00:45:59,266 Or the font color changes and the font's not readable 834 00:45:59,456 --> 00:46:04,496 yeah, I guess I am imploring the world to shift to a paradigm where 835 00:46:04,496 --> 00:46:09,386 they view like users willingness and ability to interact with your 836 00:46:09,386 --> 00:46:12,006 experiences as a reliability signal. 837 00:46:12,506 --> 00:46:17,056 And I think the underlying supposition is that this only becomes more acute of a 838 00:46:17,076 --> 00:46:20,066 problem as the number of features we ship. 839 00:46:20,126 --> 00:46:26,676 I guess I'm starting with the belief that from what I hear, people 840 00:46:26,676 --> 00:46:28,646 are doubling down on this, right? 841 00:46:28,646 --> 00:46:33,236 They're saying we need to make software cheaper to build and faster to build. 842 00:46:33,736 --> 00:46:37,106 Because it is a competitive environment and if we don't do it, somebody else will. 843 00:46:37,606 --> 00:46:40,976 and as that world starts to expand, the acuity of the 844 00:46:40,976 --> 00:46:42,746 problem space only increases. 845 00:46:43,246 --> 00:46:43,596 Yeah. 846 00:46:43,606 --> 00:46:49,316 I think your theory here, matches are well with some other like we're 847 00:46:49,316 --> 00:46:52,916 all kind of in this little bit of we've got a little bit of evidence. 848 00:46:52,916 --> 00:46:54,746 We hear some things and we've got theories. 849 00:46:55,136 --> 00:46:58,396 We don't have years of facts of exactly how AI is affecting a lot of 850 00:46:58,406 --> 00:47:03,036 these things, but other compatible theories I've heard recently on this, 851 00:47:03,266 --> 00:47:04,846 actually with three guests on the show. 852 00:47:05,181 --> 00:47:06,661 that I'm just coming to mind. 
853 00:47:06,661 --> 00:47:10,701 One of them is, because of the velocity change and because of the, it is 854 00:47:10,701 --> 00:47:14,831 this increase in velocity, it only is going to increase the desire for 855 00:47:14,831 --> 00:47:20,351 standardization in CI and deployment, which those of us in, if you've been 856 00:47:20,391 --> 00:47:23,161 living in this Kubernetes world, we've all been trying to approach that. 857 00:47:23,161 --> 00:47:27,491 Like we're all leaning into Argo CD as one of the, as the number one way 858 00:47:27,491 --> 00:47:29,021 on Kubernetes to deploy software. 859 00:47:29,271 --> 00:47:29,881 you know, we've got. 860 00:47:30,246 --> 00:47:35,876 This GitOps idea of how to standardize change as we deliver in CI. 861 00:47:35,886 --> 00:47:37,826 It's still completely wild, wild west. 862 00:47:37,826 --> 00:47:41,746 You've got a thousand vendors all in a thousand ways you 863 00:47:41,746 --> 00:47:42,896 can create your pipelines. 864 00:47:43,286 --> 00:47:46,226 And hence the reason I need to make courses on it for people, because 865 00:47:46,226 --> 00:47:47,956 there's a lot of art still to it. 866 00:47:48,126 --> 00:47:50,746 We don't have a checkbox of this is exactly how we do it. 867 00:47:51,146 --> 00:47:54,326 And in that world, the theory is. 
868 00:47:54,616 --> 00:47:57,526 Right now that maybe AI is going to allow us to act where it's going to force us 869 00:47:57,526 --> 00:48:01,326 to standardize because we can't have a thousand different workflows or pipelines 870 00:48:01,616 --> 00:48:05,726 that are all slightly different for different parts of our software stack, 871 00:48:06,036 --> 00:48:10,256 because then when we get to production and we have started having problems, or if the 872 00:48:10,256 --> 00:48:13,886 AI is starting to take more control, it's just going to get worse because the AI 873 00:48:13,990 --> 00:48:17,540 Yeah, and at the end of the day, right, standardization is more 874 00:48:17,540 --> 00:48:18,970 for the mean than the outliers. 875 00:48:18,970 --> 00:48:22,450 And I think a lot of people assume they're like, oh, we can do it right because we 876 00:48:22,450 --> 00:48:26,250 have hundreds of engineers working on like our, you know, our automation and stuff. 877 00:48:26,250 --> 00:48:27,990 It's do you know how many companies there are out there 878 00:48:27,990 --> 00:48:29,550 that are not technology companies? 879 00:48:29,590 --> 00:48:32,330 They're a warehouse company with technology. 880 00:48:32,330 --> 00:48:32,345 Right. 881 00:48:32,895 --> 00:48:37,435 And like standardization allows them to more often than not build good 882 00:48:37,585 --> 00:48:42,635 technology, to measure correctly, to deploy things the right way, like we're 883 00:48:42,635 --> 00:48:46,995 all going to interact with it in some way, right, and as the pressure for more 884 00:48:46,995 --> 00:48:51,690 velocity and them building technology speeds up, the need for you know, a 885 00:48:51,700 --> 00:48:55,640 base layer of doing things, where it's consistently right, but only increases. 886 00:48:55,640 --> 00:48:57,910 And I think that's, you know, for the world it's a challenge, 887 00:48:57,910 --> 00:48:59,260 for us it's an opportunity. 888 00:48:59,760 --> 00:49:00,090 Yeah. 
889 00:49:00,590 --> 00:49:01,060 Awesome. 890 00:49:01,560 --> 00:49:03,890 I think that's a perfect place to wrap it up. 891 00:49:04,020 --> 00:49:05,280 We've been going for a while now. 892 00:49:05,790 --> 00:49:09,745 But, Andrew, embrace.io, right? 893 00:49:09,835 --> 00:49:10,115 That's, 894 00:49:10,329 --> 00:49:10,619 Yeah. 895 00:49:10,649 --> 00:49:12,779 www.embrace.io. 898 00:49:13,115 --> 00:49:13,925 Let's just bring those up. 899 00:49:13,925 --> 00:49:16,455 We didn't show a lot of that stuff, but, the, 900 00:49:16,505 --> 00:49:20,075 in case people didn't know, I made a short a few months ago about Embrace, or with one 901 00:49:20,075 --> 00:49:21,815 of the Embrace team members at KubeCon, 902 00:49:21,815 --> 00:49:24,975 which we had touched a little bit on this show, but we talked about that. 903 00:49:25,475 --> 00:49:28,475 Observability is here for your mobile apps now and your front end apps, and 904 00:49:28,475 --> 00:49:32,575 those people, those developers, can now join the rest of us in this 905 00:49:32,675 --> 00:49:38,515 world of modern metrics collection and consolidation of logging tools, 906 00:49:38,515 --> 00:49:43,175 bringing it all together into, ideally, one single pane of glass, 907 00:49:43,225 --> 00:49:46,325 if you're advanced enough to figure all that out. The tools are making 908 00:49:46,325 --> 00:49:48,475 it a little bit easier nowadays, but I still think there's a lot of 909 00:49:48,975 --> 00:49:52,155 effort in terms of implementation engineering to get all 910 00:49:52,155 --> 00:49:53,615 this stuff to work the way we hope. 911 00:49:53,645 --> 00:49:56,615 But it sounds like you all are making it easier for those 912 00:49:56,615 --> 00:49:58,085 people with your platform. 913 00:49:58,585 --> 00:49:59,305 We're definitely trying. 914 00:49:59,305 --> 00:49:59,525 Yeah.
915 00:49:59,525 --> 00:50:03,985 I think a future state that would be pretty cool would be the ability for 916 00:50:03,985 --> 00:50:09,075 operators to look at a Grafana dashboard or a, you know, a dashboard in Chronosphere 917 00:50:09,075 --> 00:50:10,675 or New Relic or whatever it is, 918 00:50:11,175 --> 00:50:18,335 and see, you know, an engagement decrease on login on a web 919 00:50:18,335 --> 00:50:23,575 property, and immediately open an incident where they page in the front end team and 920 00:50:23,575 --> 00:50:30,995 the core team servicing the auth APIs in a company, and have them operating on data 921 00:50:31,025 --> 00:50:32,375 where the front end team can be like, 922 00:50:32,875 --> 00:50:38,525 we're seeing a number of retries happening after we updated to the 923 00:50:38,525 --> 00:50:42,265 new version of the API that you serve for logging in credentials. 924 00:50:42,765 --> 00:50:46,345 Even though they're all 200s, they're all successful requests, what's going on? 925 00:50:46,435 --> 00:50:50,295 And that backend team says, well, it looks like, you know, we were slow 926 00:50:50,295 --> 00:50:55,845 rolling it out, and the P90 latency is actually 250 milliseconds longer. 927 00:50:56,345 --> 00:50:57,715 Why would that impact you? 928 00:50:57,745 --> 00:51:01,985 And they say, well, the SDK retries after 500 milliseconds, and our P50 929 00:51:01,985 --> 00:51:03,565 latency before this was 300 milliseconds. 930 00:51:03,575 --> 00:51:06,885 So 10 percent of our users or something are starting to retry, 931 00:51:06,895 --> 00:51:07,985 and that's why we're seeing this. 932 00:51:07,985 --> 00:51:10,455 You know, the answer here is to increase 933 00:51:10,955 --> 00:51:14,435 the resource provisioning for the auth service to get latency back 934 00:51:14,435 --> 00:51:19,775 down, and or change our SDK to have a more permissive retry policy.
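The arithmetic in this scenario (a 500 ms client-side retry timeout against a latency distribution whose P50 was around 300 ms) can be sketched with a quick simulation: shift the distribution's tail right and watch the share of requests that outlive the retry threshold jump. The log-normal shape and the regression factor are assumptions for illustration, not the guest's actual numbers.

```python
import random

random.seed(7)

RETRY_AFTER_MS = 500  # the hypothetical SDK's client-side retry timeout

def share_retrying(latencies_ms):
    """Fraction of requests that outlive the retry timeout and so get retried."""
    return sum(l > RETRY_AFTER_MS for l in latencies_ms) / len(latencies_ms)

# Assumed log-normal latency samples: a 'before' service with a ~300 ms median,
# then the same service after a regression pushes the tail past 500 ms.
before = [random.lognormvariate(5.7, 0.35) for _ in range(10_000)]
after = [l * 1.6 for l in before]  # illustrative regression factor

print(f"before: {share_retrying(before):.1%} of requests retried")
print(f"after:  {share_retrying(after):.1%} of requests retried")
```

The point mirrors the conversation: every server-side request still succeeds, so server metrics look nominal, but the share of users who cross the client's retry threshold, and therefore feel the latency, grows disproportionately.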
935 00:51:20,275 --> 00:51:24,425 And, you know, have teams be able to collaborate around the right design 936 00:51:24,425 --> 00:51:27,625 of software for their end users and understand the problem from both 937 00:51:27,645 --> 00:51:31,465 perspectives, but be able to kick off that incident because they saw 938 00:51:31,825 --> 00:51:36,535 real people failing and disengaging, not just some server side metrics, 939 00:51:36,595 --> 00:51:37,805 which I think would be pretty neat. 940 00:51:38,305 --> 00:51:38,695 Yeah. 941 00:51:39,195 --> 00:51:42,265 And I should mention, you all, if I remember correctly, in cloud 942 00:51:42,275 --> 00:51:49,145 native, you're lead maintainers on the mobile observability SDK. 943 00:51:49,145 --> 00:51:50,285 Is that, am I getting that right? 944 00:51:50,335 --> 00:51:50,555 I'm trying 945 00:51:50,569 --> 00:51:54,319 We have engineers who are approvers on Android and iOS. 946 00:51:54,479 --> 00:51:59,059 We have, from what I'm aware of, the only production 947 00:51:59,059 --> 00:52:01,829 React Native OpenTelemetry SDK. 948 00:52:01,989 --> 00:52:07,009 We are also participants in a new browser SIG, which is a, a subset 949 00:52:07,019 --> 00:52:08,699 of the former JavaScript SDK. 950 00:52:08,699 --> 00:52:11,439 So our OpenTelemetry SDK for web properties is 951 00:52:11,839 --> 00:52:15,759 basically a very slimmed down chunk of instrumentation that's only relevant 952 00:52:15,759 --> 00:52:17,729 for browser React implementations. 953 00:52:18,229 --> 00:52:22,929 So yeah, working to advance the standards community in the cloud native 954 00:52:22,939 --> 00:52:27,659 environment for instrumenting in real world runtimes where Swift, 955 00:52:27,699 --> 00:52:31,299 Kotlin, and JavaScript are executed. 956 00:52:31,799 --> 00:52:32,339 Nice. 957 00:52:32,839 --> 00:52:34,369 Andrew, thanks so much for being here. 958 00:52:34,379 --> 00:52:35,639 So we'll see you soon.
959 00:52:35,709 --> 00:52:36,259 Bye everybody.
