Episode Transcript
Happy Friday, I guess.
How's it going, Joe?
It's... we're back monthly.
I think we're doing this now, which is a lot of fun.
It's like the old days.
It's like the old days.
We're kind of doing the Friday thing now. Better than Mondays.
Yeah, it's a lot more chill, especially since we're recording at noon now. By then you've gotten into your day, you've had several cups of coffee, you've already dealt with several annoyances, and you're just kind of cantankerous.
That's about, I mean, that's how I wake up too.
I mean, how much caffeine do I typically take in a day?
Two cups of coffee to start, and then a 220-milligram caffeine hit at about 10:00 AM, and then I have another cup of coffee, and I keep mainlining caffeine until about 2:00 PM, at which time...
You cut it off.
Because the half-life of caffeine is about 5 hours, right?
So yeah.
And it supposedly varies quite a bit.
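For the curious, the arithmetic behind that 2:00 PM cutoff is simple exponential decay. A rough sketch in Python, assuming a 5-hour half-life and a 200 mg dose (both just illustrative numbers, since metabolism varies a lot from person to person):

```python
# Rough sketch: caffeine remaining after t hours, assuming a ~5 hour half-life.
def caffeine_remaining(dose_mg: float, hours_elapsed: float,
                       half_life_hours: float = 5.0) -> float:
    """Exponential decay: remaining = dose * 2^(-t / half_life)."""
    return dose_mg * 2 ** (-hours_elapsed / half_life_hours)

# 200 mg at 2:00 PM, bedtime at midnight: 10 hours is two half-lives.
print(caffeine_remaining(200, 10))  # 50.0 mg still on board
```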
But anyway, we're getting off topic.
No, this is very much the topic of the day.
We're just going to talk about caffeine ingestion the entire time, actually.
Speaking of which, I guess, how much coffee do you drink a day?
I'm usually, like, a two-cup-a-day kind of guy, but sometimes, OK, I get into three.
That's pretty typical for me.
Start out with one and then you know, kind of see how it goes into the afternoon.
So yeah.
Yeah, it sounds like beers.
Yeah.
Yeah, exactly.
My 15th one.
It's not even 3:00 yet.
That's right.
That's right.
You've got to, like, balance the beers and the caffeine.
Yeah, exactly.
What's been on your mind?
So you wrote an article recently called The Pedantic Layer.
I liked what you had to say.
I read it when it dropped in my inbox.
Maybe you can tell us a little bit about it and I can I can tell you my thoughts and response.
Well, so I mean, it kind of got spurred last night when I was reading our mutual friend Malcolm Hawker's LinkedIn post.
I guess don't read LinkedIn posts past, like, 8:00 PM.
It's like not taking caffeine past 2:00 PM.
It's a bad idea, depending on what you read.
But I woke up this morning and I thought, OK, I'll just pound out a quick article on it.
So his post was about just the nitpicking around the term unstructured data.
Right.
Right.
And how this has basically been a thing for decades now, right?
Not just because of the AI hype cycle. It has been, right.
I mean, the data governance and data management crowd in particular, I make it sound like a consortium, but the field has a tendency to be pedantic, to nitpick terms and so forth, which I guess to some degree is great, and to other degrees...
I I just look at it and I say, OK, great.
Is there anything else we could be talking about right now?
And I made a comment that this is the sort of behavior where we keep going back to the past expecting different results.
Well, first, that's the definition of insanity.
But two, I feel like this sort of behavior holds the industry back, and the data management and governance space in particular really hasn't seen much progression, in my view, since the 2000s or late 90s.
I mean, of course you had GDPR and other regulations, which make data governance more front and center, but they're still dealing with the same stuff, saying the same stuff.
And yeah, as I typically do when I write a rant, I just kind of lose my shit and write an article.
Yeah, that's my that's my form of acting out, I guess.
But people liked it. I got a ton of texts.
Let me see here.
Yeah, yeah, let's read the reader mail.
Let me see, Carly wrote me this morning:
"The Pedantic Layer made me spit up my coffee. Oh, you're fucking hilarious."
And other people like that, right?
So there's several others.
And so I thought, yeah, OK, cool.
Anyway, that's the notion of it.
So the notion is, look, things like unstructured data, things that we've been nitpicking about.
This is the article now, right?
Feel free to nitpick on it.
But the thing is, people are getting utility out of these things, and they have been for decades.
Right.
Text, images, whatever, right?
It doesn't matter.
Exactly.
Yeah.
And I think there are some things where you can say, OK, this is probably structured data, and this is probably semi-structured over here. But trying to have rigid definitions has actually gotten harder and harder over time.
And that's because any time you come up with a definition, it's usually in a particular technology context.
And it turns out, one thing your article didn't get to is that even databases themselves have changed.
Even things like Postgres have changed a lot, and analytics databases too.
And that makes it even harder to define this.
So it used to be that in old-school databases you had fixed-length types, so if you had a text field, you had to say it's, like, 20 characters in length.
Well, in Postgres that's not really true anymore.
Everyone uses VARCHAR now, which means, I don't know what the limit is, but in a lot of systems you can put like a megabyte of text inside a field. That's getting into what people used to call unstructured data.
And so even data that's in a database, just a standard relational database, can...
Be nested inside tables.
Nested inside tables now, all this stuff, yeah.
Vector types, pgvector, you've got, yeah, I mean...
Variant types can be arbitrarily long.
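As a concrete illustration of how far flexible types have come, here's a minimal sketch in Python using DuckDB (chosen because it runs in-process; Postgres behaves similarly with TEXT and VARCHAR). The table and column names are made up:

```python
import duckdb

con = duckdb.connect()  # in-memory database

# An ordinary "structured" table whose VARCHAR column has no declared
# length limit: it happily holds a megabyte of raw text.
con.execute("CREATE TABLE docs (id INTEGER, body VARCHAR)")
big_text = "blah " * 200_000  # ~1 MB of what we'd once call unstructured data
con.execute("INSERT INTO docs VALUES (1, ?)", [big_text])

print(con.execute("SELECT id, length(body) FROM docs").fetchall())
# [(1, 1000000)]
```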
And so it's like, why fight over this stuff, you know?
And then the arguments are really about, well, what is unstructured data?
And I'm kind of like, I don't know.
And I don't think a lot of people in the industry have time for this stuff, right?
You're out there building with the very things you're still nitpicking the definitions of, right?
So it's sort of like arguing... I mean, when I was out on a hike, I was trying to think of analogies.
And some of them are pretty horrible or just lame.
But you know, if you see a house on fire, you're not going to sit there and debate, well, what is fire exactly?
Right, right.
Or what is water that's putting it out.
Like, you know, I mean, I would be thoroughly annoyed with that person.
I'd probably punch him.
Yeah, well, and it's like, there's a lot of classic Monty Python jokes along these lines.
And I think the whole troupe came out of that sort of Oxbridge world of very academic types and yet obviously sort of rebelled against it.
But a lot of, you know...
Yeah.
Laden or unladen swallow and the like, yeah, you know, trying to define exactly what it is and apply specifications. And we're kind of doing the same stuff here.
One of my other favorite examples, which hopefully won't offend anyone, is data products.
Oh, what the fuck is that? Seriously. I'm kidding.
OK, I have friends who I respect who have tried to define this, and I'm like, I understand why you're trying to do it, but it's just not going to happen.
And in that case, there are some specific reasons why you're never going to succeed.
You've taken two of the most popular words in the tech industry, product and data, and then tried to mash them together and claim that as your own.
And a lot of these definitions are just about, like, ownership, right?
I'm gonna claim this as my own.
It's like good luck.
Like everyone is claiming that it's their own.
And all the definitions are slightly different, or in some cases dramatically different.
And so you could have your version of it, but you're never actually going to be able to claim that term completely.
I think it's definitely jumped the shark a bit on that one.
I mean, even when you get into things like... OK, so the data product discussion is always interesting, because people then go back to the notion of, well, in a factory, when we make products, there's parts and there's bills of material and all this stuff, right?
And we assemble it and it's a product that's used by a customer.
But what about data consumers?
Geez, now we got to talk about that too.
And producers. Then it starts getting into this whole assembly line analogy, which falls apart pretty quickly when you're talking about data, right?
So, you know...
But yeah, anyway.
I also get really bored with these kind of conversations.
I just think there are other things to be solving.
There are other things to be solving. And the thing is, if you come up with an entirely new term out of the blue... like, I think Zhamak was the first one to say data mesh.
Maybe someone else said it before.
But, like, Monty Python came up with it.
Probably. Yeah, that's probably true.
Some Oxbridge type defined it in a seminar and they heard it.
But in that case, you know, you can kind of claim it.
Like you write a book, you put a stake in the ground, it's your term.
Yes, people will mess with that term.
Vendors will try to claim it, but you could say, well, here's my book.
This is how I defined it originally.
I mean, you know, have fun.
You can debate over it, but that's what it is.
But yeah, anytime you try to define a term that's used by the industry, it's just not gonna happen.
Even in our book, we define data engineering.
But I see all kinds of alternative definitions and like, well, that's the reality.
I don't really care.
The term was around before we wrote the book.
What can you do?
Yeah.
Yeah.
I mean, yeah, I think we packaged it neatly in a way that was more consumable for people.
I think that was our contribution to it.
But yeah, we didn't come up with the term data engineering, and I don't mind either if people want to come up with alternative definitions.
It's not like I'm sitting there being the definition police.
I mean, I will to some extent if it gets wildly off the mark.
Like, if you said data engineering is like cattle ranching or something, I would say, well, you just need to go down aisle 13 where they have the cough medicine and make yourself better.
Well, and I think the thing we pushed back against, even before we wrote the book, is that a lot of people would try to claim, you know, Spark and Hadoop are data engineering.
That's what serious data people use.
And then later on, as analytics engineering came along, people were like, oh, data engineering is just using SQL with orchestration.
And I'm like, those are all too narrow.
I mean, yes, whatever definition you have has to encompass those things, but can't just be those things themselves.
No, and we've talked about this before too, but it bears repeating that your field is not the set of tools in the field.
Or the latest trends, right?
It's like this has been around for decades.
You can't just say that it's this one thing that everyone is very excited about right now, right?
That'd be... yeah.
But yeah, I don't know.
And the other place I'll get a bit nitpicky sometimes is, you know, when people say data warehousing is just a database, or some type of database, right?
I get a bit pissed about that, because Bill Inmon is a friend of mine, and he defined it.
The definition's been there for decades.
It's there.
And also, you know, I'm writing an article right now, and I think it kind of straddles the fine line between being pedantic and just being practical.
But hey, it's going on Practical Data Modeling, so therefore it must be practical.
But this article is going to be about why medallion architecture is not data modeling.
I see people getting confused about this now, where the confusion is, well, are you sure that this is a gold table, i.e., does it fit a star schema or something?
You know, because to me, the gold layer or whatever, it's very similar to what the intention of the data mart was back in the day, which is serving data to end customers, curated data for busy, non-technical users.
So there's a spectrum from technical to non-technical, and that's kind of how I look at it.
But anyway, I'm going to draw the line there, because I think people are getting confused saying, well, I'm modeling in a medallion way.
I'm like, well, it's a lifecycle stage.
But that's not a data modeling approach per se, right? Right.
So that's just like preventing people from using a lawnmower to trim their hedges.
It's not quite arguing about what is unstructured data.
I think this is a very well-trodden thing.
Right.
Well-trodden to me, but maybe not to you, I don't know.
Well, I mean, the problem is, and this is true more generally, but it's certainly true in the tech industry: every term is overloaded, right?
Every term actually has a lot of different definitions, and you can try to claim it in your book, but people use it in the field every day to mean a lot of different things, and you just have to accept that.
So one of the classics is transactional database, and of course the purest definition would be to say that it's a database that actually supports transactions.
But in practice this gets applied to a lot of databases that don't support transactions per se and are eventually consistent.
What do you mean by a transaction?
Exactly.
Yeah, there you go.
So we just have to deal with this problem, and that's why I appreciated your article.
It's like, yeah, why spend all this energy trying to create a definition that most people won't use, right?
Or if you're going to be pedantic, do it in a way that's going to push the field forward, right?
Right.
And that's the thing, like you want thoughts, not definitions.
I mean you want ideas, not just definitions.
Yeah.
And for the audience, by pushing the field forward, I mean helping push the contemporary field forward.
Like, I don't care about arguing about the nature of unstructured data at this point.
It's sort of like arguing about, you know, things that Shakespeare wrote about.
Right.
Right.
It's like it's there, it's history.
Nobody gives a shit except Shakespeareans and Oxbridge folks.
But yeah, there's plenty of other problems in this world to be solving right now.
So yeah, exactly right.
Moving on.
Like, what's been on your mind?
This is just a funny, specific technical thing, but, like, DuckDB. I've been using it for a class lately, and I mean, I know it's cool, but it continues to surprise me with how good it is.
And I have my students using it, and I have them benchmarking queries: OK, go query this gigabyte of CSV files, now ingest it into the native format.
Show me the performance difference.
They're like sending me messages.
It's 1000 times faster.
And I'm like, yeah, that's, you know, data structures.
They're actually very useful if they're done correctly.
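A hedged sketch of the kind of benchmark being described, in Python with DuckDB; the file glob and the amount column are placeholders for whatever data you actually have:

```python
import time
import duckdb

con = duckdb.connect()

# Query the raw CSV files directly (re-parsed on every run)...
t0 = time.time()
con.execute("SELECT count(*), avg(amount) FROM 'events_*.csv'").fetchall()
csv_seconds = time.time() - t0

# ...then ingest once into DuckDB's native columnar format and re-run.
con.execute("CREATE TABLE events AS SELECT * FROM 'events_*.csv'")
t0 = time.time()
con.execute("SELECT count(*), avg(amount) FROM events").fetchall()
native_seconds = time.time() - t0

print(f"CSV: {csv_seconds:.3f}s, native: {native_seconds:.3f}s")
```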
And so I think the field has legitimately moved forward.
I mean, at one of my first data jobs, we were using Postgres, but just row-oriented Postgres.
Not columnar-oriented Postgres, you know, plain transactional Postgres.
And we were using it for analytics.
And even then I'm like, this doesn't quite make sense to use the database this way.
And I think since then the tools have gotten much, much better where you have these amazing analytics databases that you can just run locally or on a single node and do a lot of stuff that you used to do, but just like so much faster.
And we've gotten away from this, like worrying about what's big data and what's not, I hope.
So I mean, DuckDB... the guy who created it, he was telling me you can query a TB of data on your laptop.
Pretty easily, if you have the storage space. Or just on an EC2 instance, which I think is going to be a more common use case, sure.
Yeah, and it just spills to disk, so it does.
So you're not constrained by memory.
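That spill-to-disk behavior is tunable, for what it's worth; a quick sketch (the limit and path here are arbitrary choices, not recommendations):

```python
import duckdb

con = duckdb.connect("analytics.db")

# Cap memory and point larger-than-memory operators at scratch space;
# queries that exceed the limit spill to disk instead of failing.
con.execute("SET memory_limit = '8GB'")
con.execute("SET temp_directory = '/tmp/duckdb_spill'")
```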
Yeah, it's a badass database for sure.
That's super cool you're using it for classes.
Yeah, I mean, 'cause for teaching data engineering these days, I always want the students to get hands-on with the actual tools that you would use in production.
But you know, you don't want them spending all their time in Redshift or Snowflake or any of these things cuz it gets expensive.
It's just like go mess around with it.
But then let's just do some stuff on a local node.
Let's orchestrate some DuckDB. Because in practice, a lot of data engineering is, I mean, as we said earlier, a whole spectrum of things, but it's not all big data.
A lot of it is just like, give me, you know, a business events table or some kind of financial transactions table that isn't that large, but is like very consistent and high quality.
And I can get statistical results very quickly.
And these tools let you do that, and the students can get hands-on with them, but also play around with the more big-data stuff.
I use DuckDB when I teach too.
It's a lot of fun, and the stuff I do is hook Streamlit up to it.
That's how you make data-driven apps. Streamlit is dope for that kind of stuff.
You have to mess around with it, yeah.
Maybe, yeah. It's just fun because it's all...
Well, the cool thing with that is you can change literally a couple parts of your code and you have a completely different visualization, a completely different way of interacting with your data.
And then the other thing I like to do is hook up a large language model to DuckDB and Streamlit, and then you can just talk to your data.
So that's part of the workshop that I've been doing; I'm revamping it for next year.
But, like, that's fun, because it just shows people, OK, you can actually take all these components and build, and it's super easy.
Yeah, like there's no mystery to it.
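For anyone who wants to try it, a minimal sketch of the Streamlit-plus-DuckDB pattern being described; the database file, table, and columns are hypothetical stand-ins:

```python
# app.py -- run with: streamlit run app.py
import duckdb
import streamlit as st

st.title("Data app sketch")

con = duckdb.connect("analytics.db")  # assumes a populated DuckDB file
df = con.execute(
    "SELECT category, sum(amount) AS total FROM sales GROUP BY category"
).df()

# Swapping this one line (bar_chart -> line_chart, etc.) gives you a
# completely different way of looking at the same data.
st.bar_chart(df, x="category", y="total")
```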
Right.
And presumably, I don't know if you've done this, but as I was saying, databases like DuckDB can handle basically what we would call unstructured data, you know, text messages and such.
And potentially you could do additional analysis on those using a tool like an LLM, after storing them in DuckDB.
So, good to mess around with that too, yeah?
You could do that.
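A sketch of that idea: pull stored text out of DuckDB and hand it to an LLM for analysis. This assumes the OpenAI Python client and a hypothetical messages table; any model and schema would do:

```python
import duckdb
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

con = duckdb.connect("analytics.db")
rows = con.execute(
    "SELECT body FROM messages LIMIT 20"  # hypothetical table of raw text
).fetchall()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model works here
    messages=[{
        "role": "user",
        "content": "Summarize the recurring themes in these messages:\n"
                   + "\n---\n".join(r[0] for r in rows),
    }],
)
print(response.choices[0].message.content)
```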
You could hook it up to Postgres. I think there's a driver, pg_duckdb, or there's a new one recently, and that one's kind of dope.
So then you could, yeah...
I mean, there's just so much flexibility.
I mean, back when we were teaching SQL together a few years ago, we were using BigQuery, and now I would have the students use DuckDB.
Yeah, I would.
Probably for some of it.
Right, for some of it. Yeah, exactly, do a mix. But yeah, I agree.
Yeah, and even, I think, a lot of the other vendors, you know, I'm not going to name names.
They've also made their free tiers, or at least their access a bit easier now.
So, you know, it's worth entertaining others.
But I think at the time we used BigQuery just because anything under a TB of querying a month was free.
Yeah, right.
And everyone has a Gmail account.
Therefore it's pretty easy to set up BigQuery.
And there was a lot of modern functionality in there that still is freaking awesome.
So yeah.
Yeah.
Well, we had that conversation with Jordan Tigani a long time ago, where he said BigQuery is a legit, like, big...
His name is Jordan. He's actually doing DuckDB stuff now, right? So he also created one of the...
He created BigQuery too.
Yeah, exactly.
But yeah, we had him on, and he was saying that, you know, they created it as a legit big data database, based on this internal Google Dremel data tool.
But at the same time, most users weren't using it that way; they were just using it for pretty simple, small data.
So that's how we use it in class, right?
I mean, you were talking gigabytes of queries that you can run locally pretty easily.
Yeah, you could. You could for sure.
But Google has this benefit... and in order to plug Google here: send us a gift card or something. I want mine.
But the thing I like about BigQuery, and still like about it, is that the public sample datasets are really good.
Yeah, because they have... the ones we were using were, I think, ads and then Google Analytics.
And Google Analytics really shows you how to work with one big table, but also with, like, extremely nested data.
And so that's one of the things that we would teach a lot: how to parse that and do a lot of advanced analytics.
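As an example of the nested-data parsing he means, a sketch against the public Google Analytics sample dataset, using the BigQuery Python client (this assumes credentials are already configured, and the exact fields you pull are up to you):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Each GA session row stores its hits as a nested, repeated field;
# UNNEST flattens them so you can aggregate per page.
sql = """
SELECT h.page.pagePath AS page, COUNT(*) AS pageviews
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`,
     UNNEST(hits) AS h
GROUP BY page
ORDER BY pageviews DESC
LIMIT 10
"""
for row in client.query(sql).result():
    print(row.page, row.pageviews)
```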
So anyway, yeah, fun times.
How are you liking teaching?
It's fun.
I mean, I'm just helping to develop a course, so I'm not the primary teacher on it, but I'm developing the materials, the lectures, the whole thing.
So yeah, it's been a lot of fun going back to that.
So yeah.
Oh man, yeah, you were a long-time academic, so it's kind of...
I was.
I was not quite Oxbridge, but, you know, a similar pedantic world.
Well, you came from math, I mean.
Exactly.
And I guess I was going to say, when I was reading your article, one thing I thought about is that in math we are very particular about definitions. But also, pure math, the world I came from, doesn't apply very well to the real world for exactly that reason.
It's very hard to apply pure math concepts in the real world without some flexibility, right?
Without kind of bending the rules a bit.
It just doesn't.
You don't... in the land of proofs, everything has to be exact.
Well, you do.
You can't prove something in generality without like, setting up very clear hypotheses about what you're assuming, axioms and such.
Yeah, I mean, I remember my professors in math courses would dock me.
I would get docked points if I missed a comma.
That's how exact it was.
But, you know, that's... you shouldn't miss the comma.
Well, if it messes with the structure of the sentence, where it's unclear, it's like, you have to have the comma.
Not because you're pedantic about punctuation... well, you are... but because it needs to be very precise what your clause is and what you're referring to and everything else.
Right.
But when you're talking about axioms, you know, theorems, whatever you're trying to prove...
I mean, the stakes are high in that sense, where it has to be correct.
Exactly right.
Yeah, forwards, and sometimes backwards, if you're...
Yeah, yeah.
But I mean, look at the number of attempts that we've had to write general rules about stock options and the stock market.
They all have limits, because the real stock market is driven by human beings and automated systems and all these different things going on.
And it doesn't behave like any particular mathematical system.
You have good models that are useful.
But they all have their limits.
They'll all break eventually, right?
Yep, and I think that's a very good differentiator. Because what I'm seeing, especially now with, you know, semantics, philosophy, meaning, all these things becoming more injected into the data world...
I think people are trying to impose, in a lot of cases, philosophical approaches that have been around for thousands of years onto something that, I don't know that we're going to get any closer to solving. Because now people are trying to philosophize about LLMs and what is the nature of the data that we're giving these LLMs, right?
And it's like, OK... it's Reddit.
Yeah.
Pictures.
You know, I don't know. What do you want to talk about?
Like Twitter?
Yeah, do you remember Dave Chappelle, back when he had a TV show?
That was very popular. Before he quit the show.
Yep, he had this skit.
It might have even been one of, like, the extra skits that they published or broadcast after he left.
But it was like, what if the Internet were a real place? And it turned out the Internet was just this awful place, you know, to go hang out.
That's what LLMs are trained on, right?
The Internet has really not gotten a lot better since he did that.
That's exactly the point, right?
It's... what you're talking about, you're nitpicking over a trash heap.
Right, exactly.
Who gives a shit?
You know what?
What is the nature of Reddit data?
Well, right, it's, yeah...
Yeah.
Well, that's where I get a bit... maybe this sounds kind of pedantic, but I've said this in a number of talks, and that is the philosophy...
Well, we talked about lazy scaling last time, right?
But the philosophy that you always hear out of the, like, base model developers, foundation model developers, is that they have to put all the data in because the LLM needs to know about everything.
And it's like, you're just excusing lazy behaviour.
I mean, yes, I get it, you need to train on some of this stuff.
But I'd guess that the vast majority of what we're training on is just junk.
Some level of NLP preprocessing, I think, is coming just for the sake of cost management.
Feeding in so much garbage and spending so much money training on garbage will be viewed as absurd from the perspective of two or maybe five years from now.
Well, yeah. I mean, think about what you're doing to your data.
It's actually worse, because now you have so much redundant, similar data. If you were to come up with a similarity score between two vectors, it's like, well, what's the shade of difference between this crappy piece of data over here and this slightly different one over there?
You lose a lot of signal that way.
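A toy sketch of the signal-loss point: embed documents, compare cosine similarity, and drop near-duplicates before training. The vectors here are made up, and the embedding step is assumed to happen elsewhere:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dedupe(vectors: list, threshold: float = 0.98) -> list:
    """Keep indices of vectors that aren't near-duplicates of one already
    kept. O(n^2): a toy illustration, not a production pipeline."""
    kept = []
    for i, v in enumerate(vectors):
        if all(cosine(v, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Two nearly identical "documents" and one distinct one:
vecs = [np.array([1.0, 0.0, 0.2]),
        np.array([1.0, 0.01, 0.21]),  # a slightly different near-duplicate
        np.array([0.0, 1.0, 0.0])]
print(dedupe(vecs))  # [0, 2]: the near-duplicate is dropped
```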
Exactly right.
Yeah.
And so at some point... and I view that as more of a data engineering task, right?
I mean, we've always argued that ML engineering and data engineering have a lot of overlap, and I think that's just going to continue to be more true in the future.
I think it's going to be the same thing, yeah, basically.
I mean, minus... you know, and this is me sort of getting pedantic as well. I'm just opening a huge can of worms for myself now.
Yeah, yeah.
Everyone's like, well, Joe, but what do you mean by this? I didn't mean anything.
Leave me alone.
Yeah.
What else has been going on?
Anything else in your mind?
I was thinking about what I would call fraudulent product promotion.
I have an example from back in the day, but I'm not going to name the vendor.
I think you'll probably know who I'm talking about. This particular vendor, in the early days, was very involved in the Silicon Valley meetup scene, like, just spent tons of money sponsoring meetups.
Great, right?
Very, very intensive community involvement.
They were one of the first ones to do evangelism that way.
But, like, they were one of the bellwethers of "this is how you promote a product."
And yet at the same time, they promoted their product in an almost fraudulent way, where some of the benchmarks they published were like, oh see, the more data you write, the faster it gets.
Except that the default config of the database was to start discarding writes once it got overwhelmed, which is why the speed appeared to improve as you hit it with more requests.
And so they're selling this as a database where you can store durable information, financial transactions, any number of things, and yet the default config was completely unsuitable for that.
At this point, this database is a legit database. It supports transactions, it does a lot of nice things.
But my thinking was, well, what products from the current era will we look back on as fraudulent in a similar way, even if they become something legitimate in the future?
I mean, it's a good question, right? It's...
Well, these days, I think there are so many varying expectations of, say, what an AI product is supposed to be.
So if you take any of the big vendors, right? So GPT-5 is the latest punching bag, as of a few weeks ago.
So when that came out... you know, if you rewind to this time last year, GPT-5 was touted to be Project Orion.
I think that's what it was called internally, the scuttlebutt was, and this release was supposed to be, basically, the coming of AGI.
And instead you got basically a nice, you know, model router.
And it's interesting, because I wouldn't say it's fraud necessarily, but what I see happening, and I've been writing about this a bit, is people are starting to walk back the claims that AGI is going to be the result of large language models, that ASI is coming, you know, by 2027.
I think people are walking back those claims.
Funnily enough, actually, a friend of mine, Roman Yampolskiy, has been making the rounds on, like, the Joe Rogan show recently, and others.
So he's like a big AI doomer.
He was just on a diary of a CEO talking about this.
And you know, if you listen to him, obviously in a few years, well, he said in five years, 99% of humanity will be out of work, you know, So if you take that counter argument, perhaps I, I mean, he, I've actually, I know him, I talk to him.
I think his arguments are actually pretty sound, but I think if you take the counter arguments that LLMS are, are not what we're not what's going to bring us to ASI or AGII think that more than anything is going to be, I guess the quote, fraudulent claim or maybe the, the claim that's not going to stand well in history, assuming that people like Roman aren't wrong either.
But, you know, now it's funny, 'cause now there's more attention being placed on other, older AI approaches, like symbolic AI and so forth.
So.
Which I think is a classic model, yeah.
Right.
So the pendulum is swinging back again.
So I think, more than anything, it was just the hype. I mean, 'cause you probably saw this going to conferences too over the past few years, but there was...
So the expectations were so high, right?
Right.
So high.
And yet I almost got the impression that a lot of people promoting these products knew that they were overhyping them, especially the technical people.
Every data person or ML person I talked to would quote that classic line: the market can stay irrational longer than you can stay solvent.
And yet publicly they'd have to go out and say, oh, our new form of agents or whatever is going to solve all your problems.
And it was like, you have this amazing tech, legitimately amazing technology that could do cool stuff.
And yet there was such pressure from investors to claim that it could do even more than it could, and to embed it in every product, whether or not it was appropriate.
And that was the hype cycle: to say, hey, we have to claim that we're doing this in our product or we can't get investment, our stock value is going to collapse, or we can't raise the next round, right?
So we have to go out and make these claims and put it in there somehow.
So there's some truth to the claims, even if it's not appropriate.
Oh, absolutely.
I mean, I have a...
Windows laptop upstairs?
In other news, I have a Windows laptop, but it has one of the AI buttons.
Yep, Yep.
Right.
I didn't ask for this.
I don't think anyone asked for it.
I don't use it.
It sucks, you know. No, actually, apologies to Microsoft.
They're the ones that put the button on, not me.
But yeah, I mean, you know, I'm the guy who spends good money each month on my AIs, because I find a lot of utility for myself with them.
But I'm not... I mean, to me, they're just another tool.
And I'm the kind of guy who will hit the rate limits on these things every single day.
I get throttled every day using them.
But that's not to say... I mean, I've found how they work into my life.
But I'm not superimposing my productivity on everybody else.
That's something I wrote about, 'cause I wrote about how the AI bubble is probably dying down.
This is a great thing, actually.
I mean, I guess they're building shit instead of just hyping it, you know, to the stratosphere or the manosphere or whatever.
But just because you're getting personal success, and personal anecdotes that say you're successful, doesn't mean more broadly that there hasn't been an over-investment in this stuff, which I think there has been, right?
Yeah, I think specifically one of the things that will be viewed as fraudulent in the future is AI coding tools.
And that's not to say they're not useful.
They're extremely useful.
What's fraudulent is the financial structure. I don't know if you saw this follow-up news, but there's the whole Windsurf debacle, where OpenAI was going to buy them.
And then basically the founders, a couple of key people, bailed and got paid.
Acqui-hired, basically, except without the acquisition, by Google, for like a billion dollars, right?
Everyone was like, this is completely insane, this is unethical, which I agree with. But the reason, what came out like a week or so later but didn't get much press, was that Windsurf was on the verge of bankruptcy because of changes to their cost per token on APIs.
They could not keep running.
And so basically, once OpenAI realized that... OpenAI does not need any more capital burn. They're burning capital fast enough, thank you very much.
And Google's like, OK, we'll take the talent.
We're actually good at FinOps.
We're good at managing costs.
We can do something with these people. But no one would buy you if you're on the verge of bankruptcy.
Like, I guess Facebook could buy you, or Meta could buy you or something.
But even they were probably like, we don't have that kind of money to burn, right?
Yeah.
I mean.
It is one of those things, right?
I mean, I wrote about this on LinkedIn.
I think I wrote, now is the time to build with AI, and then a few paragraphs in, because you're supposed to put that part much lower, below the fold...
I just wrote, like, yeah, you should be building stuff off of the subsidized tokens right now.
This isn't going to last.
This is artificially low.
You know, I'm spending... I decided to keep the ChatGPT Pro $200-a-month plan, and I also use the Claude Max $200-a-month plan.
Why?
Because I think that I'm going to get as much utility out of those as I can.
The price is going to go up.
Yeah, or the capabilities will go down.
Those are the two things we've seen again and again, right.
You're either... yeah, you have to.
And we've seen that again and again with AI coding tools where they raise the prices and they say, oh, by the way, we're now only going to allow you this many tokens and then you have to pay for more or something.
You know, it's no longer just an all you can eat plan.
Yeah, now you're like a child in a Charles Dickens novel, trying to get, like, porridge or something.
Just like... right?
Right.
It reminds me, in the early days of the Atoms for Peace program in the 1950s, they used to say it was too cheap to meter, right?
Like, eventually we'd have so much nuclear power that they wouldn't even charge you for it, and you could use as much electricity as you wanted.
And it's like, that doesn't seem to pass the smell test.
People can consume a lot of resources.
And that's what's been going on during this period: everyone treated it as too cheap to meter, even though we were burning, like, hundreds of billions of venture capital money to get there.
Oh yeah, look at the numbers.
But yeah, oh yeah.
It's been a lot, and it does remind me a lot...
There are definitely parallels to Uber and food delivery back in the day, where you'd get, like, $5 trips to the airport and free food delivery.
I mean, those were awesome times.
And that's why I'm trying to maximize as much as I can with the AIs now.
Like, you know, back then, if I'd known what I know now, I'd have been taking rides everywhere.
I'd be, like, 500 pounds, DoorDashing every second.
It'd be awesome.
But the question is, how do you transition, right? Like, what's your plan for when things get way more expensive? Or do you go way more local models?
OK, OK.
Yeah, I mean, those are getting good too.
I mean, but again, people fail to realize those are also subsidized, right?
Because they just magically appear.
Yeah, Google has that new, what, 270-million-parameter Gemma model that came out a couple weeks ago, and I think it rocks for code. But the problem is, yeah, Google paid for that too. You think they just said, oh, we'll just give this away for the benefit of humanity? Right, right.
But I mean, Google is way smarter about how they train their models. They've had TPUs for years; basically, they developed TPUs because they didn't want to pay the NVIDIA tax, before the NVIDIA tax was even as high as it is now.
They wanted their own inference systems, and they have not stopped working on that. In fact, their latest models were trained not on traditional GPUs, but on TPUs.
And so, I don't know, Google feels like the sleeping giant in this situation, which is kind of scary, frankly.
And OpenAI, they're starting to roll out their own chips too.
Yeah, it doesn't surprise me at all.
But like, I would expect Google to be ahead because they've just been doing it for so long.
They've been doing it for ages.
Yeah, I mean, they've got really good teams. But cool, I guess we've probably...
BS'd enough for one day?
Well, we'll do it again in about a month or so.
Or maybe we'll just make this a more regular show, who knows.
But anyway, it's good to chat with you again.
We'll have to hang out sometime, so yeah, let's do it.
Maybe that's just happy hour on Wednesday... Thursday.
Thursday.
Sorry.
Next Thursday.
OK.
Low key Salt Lake City.
OK, yeah, in other news...
Audience, people do drink beers in Salt Lake City. Or that's...
Whatever.
Yeah, Yeah.
Matt and I, namely.
That's great.
The only two.
All right.
Well, have a good weekend, man.
See you later.
Bye.