Ian Landsman (00:00)
Welcome to Token Town. Today is Thursday, May 14th. I am your host, Ian Landsman.
Aaron (00:08)
And I am your other host, Aaron Francis, and today we got a lot of Claude drama to talk about. On the first episode of Token Town, we have a lot of Claude to talk about.
Ian Landsman (00:19)
They really hooked us up by doing a lot of drama-filled things. So it's going to be a great topic. Perfect, perfect topic. But before we get to that, this show literally would not happen without the amazing sponsors, including a long time friend and sponsor of ours across many properties, Bento of bentonow.com, the email marketing and CRM platform built for the AI era. Send your product marketing and transactional emails with Bento.
Thanks so much to Bento for supporting us in this new endeavor. And yeah, we're streaming. So you'll be able to catch this live, which is amazing. ⁓ And we're gonna have it on all the podcast platforms and YouTube and all the things. Okay, let's do it. Let's jump in right to the topics.
Aaron (01:04)
Let's do it. You
give us a quick overview, I'll give you a quick overview and then we'll dive in. So what's the deal here?
Ian Landsman (01:11)
All right,
so this is, we're gonna go right with the big topic, which is Claude clarified and changed their pricing, right? So that now you can't, it's so complicated, you can't even describe it. They clarified it so that on your subscription plans, that is for interactive usage in their clients.
Aaron (01:18)
You
Yeah, clarified, clarified, yeah.
Ian Landsman (01:36)
And then any other usage that's programmatic, whether you're using claude -p to call out on the command line or using their Agent SDK, which I don't know a ton about, but I assume is for embedding it in other things, then that's going to be billed by the token, except you get a credit for the equivalent amount of your subscription. If you have a $200 subscription, you get $200 in programmatic credits. And that...
You will use it for any programmatic access, and then when that runs out, you can choose to either, obviously, have it stopped or to have overages and pay by the token.
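A rough Python sketch of the billing rule as Ian describes it here. The function name, the field names, and the idea of measuring usage in dollars at API rates are assumptions made for illustration; this is not Anthropic's actual billing logic.

```python
from dataclasses import dataclass


@dataclass
class ProgrammaticBill:
    covered_by_credit: float  # usage absorbed by the monthly credit
    overage_charge: float     # extra amount billed at per-token rates
    blocked_usage: float      # usage that would not run with overages off


def bill_programmatic_usage(usage_usd: float, credit_usd: float,
                            allow_overages: bool) -> ProgrammaticBill:
    """Split programmatic usage into credit-covered, billed, and blocked parts.

    Hypothetical model: the credit equals the subscription price, and
    anything beyond it is either billed or stopped, per the user's choice.
    """
    covered = min(usage_usd, credit_usd)
    excess = usage_usd - covered
    if allow_overages:
        return ProgrammaticBill(covered, excess, 0.0)
    return ProgrammaticBill(covered, 0.0, excess)


# Example: $350 of programmatic usage against a $200 credit.
with_overages = bill_programmatic_usage(350.0, 200.0, allow_overages=True)
without_overages = bill_programmatic_usage(350.0, 200.0, allow_overages=False)
```

With overages on, the extra $150 is billed; with overages off, that same $150 of work simply stops.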
Aaron (02:12)
This is a furtherance of the spiritual OpenClaw restrictions. You'll maybe remember that they have basically applied a one-off patch to restrict OpenClaw usage in the past. That's old news. What's new news, which I guess is just news, is that they have further clarified or tried to clarify that
Ian Landsman (02:20)
Yes.
Yeah.
Aaron (02:41)
all non-interactive Claude usage is now billed at the API rates. So that feels like a huge punch in the face, but then they also gave us a little bandaid of, but we're giving you a few extra tokens to fix the punch in the face we just gave you. So yes, I think you set it up correctly that claude -p, which is the headless, kind of like one-shot execution mode on the CLI.
Ian Landsman (02:55)
Hmm
Aaron (03:09)
claude -p and the Agent SDK are no longer covered under your subscription plan, which is a huge blow to tools like Conductor, Polyscope, T3 Code. Anybody that puts a pretty UI or GUI on top of a Claude session was probably using the Agent SDK, maybe in an outside case using claude -p, and those are now hosed.
That's the setup. What do we, what the hell do we think is going on here? Yeah.
Ian Landsman (03:41)
Mm-hmm. What are the hot takes? Well,
I don't know. I think we have different takes on this. This is a great first topic. I don't feel like it's... I understand the idea that it's a kick in the face. I understand people being mad, because anytime you take something away from someone they're gonna be mad, and that's like human interaction, right? But the reality is, a month ago everybody was yelling and screaming that these things aren't subsidized. They're not subsidized. You're just paying for the tokens you're using, blah blah.
But here they're giving you $200 in value and then they're giving you another $200 in credits. So you're getting $400 of tokens for $200. Like that's the deal currently. And everybody's freaking out that their usage is obviously more than that, which implies their usage is more than $400 in usage on their $200 plan. So I don't know, there's a lot to get into with it, but what's your initial instinct? What was your initial reaction to it?
Aaron (04:19)
Maybe.
My initial reaction is, welcome to capitalism. That's my initial reaction. It's like, I'm the company, I do whatever the hell I want. And in that regard, I respect it fully. This is not a public good. This is not free speech. This is not gas, water, power, air, lights, whatever. This is a private company making their own decisions. Now, I think the more interesting question is, is this
Ian Landsman (04:41)
Yeah.
Mm-hmm.
Mm-hmm.
Aaron (05:05)
Is this a wise decision? We don't have enough information to answer that, but my read on it is there are gonna be, and are, a lot of people on Twitter that are mad, which, as I'm sure you will tell me, Twitter does not a market make. Twitter is the worst. Yeah, that's your bit. That's your bit. So whether that's representative of the broader Claude user base, customer base, I can't speak to, but I think
Ian Landsman (05:17)
Mm-hmm.
This is my shtick. Yes. Yeah.
Aaron (05:36)
I still think it's somewhat unclear. They've tried to draw the line between interactive and non-interactive. I think if they don't, I don't know. I think they're trying to push people to use Claude Desktop. That's my renewed take on it: they let a thousand desktop clients bloom, Conductor, Polyscope, et cetera, on subsidized usage, which is their right,
Ian Landsman (05:52)
Mm-hmm.
Aaron (06:04)
And now they're pulling that back and saying, oh, if you want to use their desktop clients, you're going to have to pay through the nose. But if you want to use ours, it's closer to free than not. And I'll take a little bit. I'll push back a little bit on the $400 of usage. I think if you're on the $200 max plan, you're getting like $4,000 of usage. And then that's the problem, right? You've been using, you've been...
Ian Landsman (06:15)
Mm-hmm.
Mmm
Mm-hmm.
⁓ I agree.
Right, but you're only paying 200.
Aaron (06:34)
I agree, I agree, but that's why people feel like this is a rug pull because
Ian Landsman (06:34)
Okay, well, that's the thing. Yeah. Right. Right.
Aaron (06:40)
you've been in Conductor burning $4,000 of value while paying 200 and now if you want to burn $4,000 of value, you're gonna have to pay 4,000 but hey, here's $200 just to smooth it over a little bit. So that's kind of my read on
Ian Landsman (06:42)
Mm-hmm. Yes. Yeah.
I think what's really interesting about this is if you look at the big sort of business picture on it, it's really interesting. What's given them the power in some ways to do this is, like, one of these companies that tracks this stuff just had a graph out showing how Anthropic use has now passed OpenAI in business for the first time. And we know Anthropic has, like, 80x'd their revenue in the past eight months or whatever. And so.
Aaron (07:16)
Mm-hmm.
Ian Landsman (07:23)
You know, that gives them a lot of power to be like, Hey, people are loving this. People are using it. That is not all Twitter devs using $200 a month plans, right? This is big companies spending hundreds of thousands of dollars a month in tokens. And so I think that part of it is that, in the sense of, like, we want to get all these companies that are right now, there's still a lot of companies that are just like, Hey, I gave everybody, use your business card, buy $200 a month plans, individual plans, right? Let's get those people.
Aaron (07:48)
Totally.
Ian Landsman (07:52)
Let's get them into team plans, into enterprise plans. Enough of this. Like, everybody's just out there being individual buyers, because they're not, they're working on a team with 30 other people. They're all buying Claude individually. Let's get them in teams. Maybe for big companies doing this, we're going to give you some discounts. It's now a negotiating thing, right? If you're really using a ton of tokens. And the businesses aren't really going to care. Like, the businesses are like, yeah, if you're more productive with this, I don't care if this costs me more tokens now. So.
You know more cost in those tokens. I mean they care a little bit They don't care that much if you're actually being more productive with it. So I think there's a whole angle here about getting people off of those individual plans which are bad plans bad customers basically for them and Getting it more business oriented
Aaron (08:29)
Yeah, I think.
Yes.
Yeah, I think two things can be true at the same time. ⁓ One is this is gonna make a lot of people really mad and the other is that this is exactly what Anthropic wanted to do. Like this is accomplishing what I have to imagine is the intended purpose, which is get the whales off so that the enterprises will pay more money. That's what I have to, and even like,
Ian Landsman (08:43)
Mm-hmm.
That's true.
Yeah.
Right.
Aaron (09:06)
Even if somebody says enterprises are already using APIs. Like you said, I don't know if that's true. I think there are a lot of companies, maybe not enterprises, but there are a lot of SMBs that do the thing where it's like, yeah, use your company card, get the $200 a month plan and let her rip in Cursor or Conductor or whatever. And I think Anthropic is trying to reel that part in, which is like, Hey, it's not the first party surface. So Anthropic's not getting the telemetry or whatever.
Ian Landsman (09:14)
Yeah, I don't think so. Right.
Aaron (09:35)
and they're using $4,000 of costs at $200 of payment. So that's my take on it. We did have another one here where they increased their weekly limits. So what the hell are they doing? So give us the rundown on this one.
Ian Landsman (09:46)
Yes!
Well, I mean, this was weird because it happened right after the other thing. And it was like, is this, I guess this is a make-nice type of thing, but it's basically, you know, Claude Code announced that weekly limits are 50% higher through July 13th. You know, this is this whole game both of them seem to be playing a lot of, right? Like, I feel like every week, both OpenAI and Anthropic are like, Hey, free tokens, more tokens, limited time tokens. It's a long weekend, more tokens. Like the
Aaron (10:07)
Mm-hmm.
It's token town,
baby.
Ian Landsman (10:17)
And so, that product manager guy whose name I can't remember for OpenAI is always like, Hey, I reset the tokens. Yeah. Yo, I reset the tokens for everybody. Go have at it. And so they seem to have found this carrot, right? As a way of, like, when things are a little rocky or when there's an opportunity to pull from the other, to go, Hey, we're throwing some free tokens out there. But, I mean, I don't know. These things are so hard to know.
Aaron (10:21)
It's Tibo.
Ian Landsman (10:44)
I assume that's effective for them because they seem to be doing it all the time, but I don't know. I don't know what my take on is beyond that.
Aaron (10:52)
Yeah, I think this was pure damage control, frankly, and it was obviously planned ahead of time. So they put out the tweet that's like, hey, the deal has fundamentally changed. Everybody went ham on that one. And then they were like, hey, by the way, weekly limits are 50% higher. And I think it was just trying to split the media coverage on yesterday's announcement, frankly. I will say the...
Ian Landsman (10:56)
Mm.
Yeah.
Aaron (11:21)
the 50% weekly limits, that goes through July 13th. So that's two months. The upcoming subscription interactive/non-interactive change goes into effect one month from now. And then this $200, sorry-I-punched-you-in-the-face thing, that has to be specifically opted into. So it's like, hey, you know,
Ian Landsman (11:25)
Mm-hmm.
Mm-hmm.
Okay.
That part, yeah, we didn't talk about that part. That's an important piece too.
Aaron (11:49)
We're cutting you off, but we're giving you $200 if you go opt into it. It's not automatic. And that's like, guys, like, come on. What could it possibly cost you to just save that PR disaster and opt everybody into that $200 automatically?
Ian Landsman (12:03)
Yeah,
I have to assume it's because they just think there are so many claws out there running that people aren't even using anymore, and things like that, and they're like, well, we can opt them out of it and they'll turn off after they hit their limits, I guess. I don't know. It's very weird. This is like the second or third time they've done this. They did it with some other thing too, where they were giving out the free credits or something. They're like, but you have to opt in. And it's like, well, you sent me an email about it, just put the credits in my account. But that is not how they do it. I do think this is
Aaron (12:16)
I guess.
Go to your dashboard, yeah.
All right, we're gonna talk about our, well,
we're gonna talk about our weekly stacks, but I wanna get the final synopsis on the Claude moves this week. So I'll give you my final, my final Anthropic synopsis here. Well within their rights to do this. Anybody that's saying this is some violation of some open source, whatever, whatever, sorry, this is business and it sucks. Well within their rights to do it. Should they have done it?
Ian Landsman (12:35)
Mmm.
Alright.
Mm-hmm
Right? Yeah.
Aaron (13:01)
I feel like it was probably, it probably makes a ton of business sense and is incredibly bad PR. So should they have done it? I don't know, but I think they're doing exactly, the intended effect is exactly what they wanted, which is getting a lot of whales to move off of it. Like for example, this is not a tight synopsis. For example, Josh Pigford tweeted that like he's used like 15,000, yeah, $17,000 worth of,
Ian Landsman (13:14)
Alright.
Josh Pigford. 17 billion. 17 billion tokens last week.
Aaron (13:30)
of tokens and he pays $200. That is
Ian Landsman (13:31)
Yeah. All right.
Aaron (13:33)
in conductor. That is what they're trying to remove. So I think it sucks. ⁓ And I think it's probably a wise business decision.
Ian Landsman (13:36)
Yeah.
Yeah, I think it mostly comes down to that. We see this in many areas in AI right now, which is, like, Anthropic is becoming more corporate. Companies are picking Anthropic more. They're leaning heavy into, like, you have Cowork, all that type of thing; they're very business oriented, and ChatGPT is very consumer oriented. Now, they want to be more business oriented, and lots of businesses do use them, but Anthropic seems to be winning on the business front. And so I feel like this is then leaning into that. It's like, Hey,
Aaron (13:52)
Mm-hmm.
Ian Landsman (14:12)
random one-off people, that's great. If you want to pay to party, great. But if not, this is for businesses. Businesses should be on team and enterprise plans. Come talk to us, we'll work it all out with you. And yeah, let's shed some of the, we need the capacity. So people just running their calendars and making notes in their kitchen, just subsidizing that forever doesn't make a lot of sense. And so I think that's what it comes down to. But it'll be interesting to see if OpenAI, cause they're going to have these same
Unless people just don't use OpenAI. But I think presumably OpenAI will have the same limitations at some point. Like, they're not just going to let everybody free ride forever. I don't think so.
Aaron (14:54)
Yeah,
the road goes on forever, but the party does end. I think at some point the reckoning will be reckoned. So what is in your AI stack this week? What are you rocking?
Ian Landsman (14:58)
or at some point.
You have to think so.
All right, so this is our first
bit. I ⁓ am rocking Solo Term by Aaron Francis is my orchestrator of choice or my harness of harnesses, if you will. ⁓ My meta harness. you should actually trademark that. Yes, copyright. Yeah, you gotta trademark it. I gotta get you business minded. That's for the other show, but you gotta trademark that. ⁓ Yes, the meta harness is Solo Term.
Aaron (15:18)
Meta harness, I'm gonna own that phrase, meta harness. I am. Oh, that's a good idea.
Okay, that's for the other show. That's for the other show.
Ian Landsman (15:35)
And I'm still rocking the Claude, man. I'm a Claude guy. Even Josh Pigford also said he misses the Claude already after 10 hours being away from it. I feel like the actual software of Claude Code itself, as well as the way the model reacts to things, I find superior in Claude. I do acknowledge that Codex is very good and technically I like a lot of the stuff it can do. I don't like using it, and Solo Term
Aaron (15:41)
He does. He misses the Claude.
Ian Landsman (16:00)
solves that problem for me because I do stuff in Claude. I write specs, I build software, I write code. And then if it's anything of significance, I say, spin up a Codex client and take everything we just did, feed it to Codex, get its feedback and integrate its feedback, blah, blah, blah. And then it does that and it works great. Like it finds bugs, it has different ideas. We get them working together. And so I get the good side of Codex without having to actually use Codex.
And it's great. So I have a small Codex plan and a big Claude.
Aaron (16:32)
So you
use Claude CLI because you like the vibes of the actual software. And then if you need more of the rigid German engineering, you have Claude call off to Codex via Solo. Is that right? Yes.
Ian Landsman (16:38)
more vibes. Yes.
That's right. Yes. He, he's my German reviewer guy. Right. Exactly.
Like, I mean, I'm fine with Claude. I think it does a good job of specs and the code. I'm actually pretty fine with everything with Claude, but I do think there are advantages to just the different models, how they see things, what they choose to check. And yeah, Codex at times right now, I feel like, can come off a little more thorough. One of my issues with that is sometimes it sounds very thorough. And in fact, yeah. And in fact, it's just gone crazy.
Aaron (17:11)
Sometimes it's too thorough.
Ian Landsman (17:15)
Uh, so that's the part, one of the things I don't like about Codex, but it doesn't seem to do that too much in these scenarios. If you say, Hey, look at these commits, review the code changes, blah, blah, blah. It seems to stay on task pretty well with that. And at least so far, which I've been doing this for a while now, it holds up well. So that's pretty much my AI stack in terms of what I'm actually, uh, coding. I'm not doing anything really in anything else. I'm in solo. I got code, you know, Claude going, I'll have a Codex if I need it.
Aaron (17:24)
Mm.
Ian Landsman (17:44)
but mostly I use it through Solo's MCP of Codex and that's kind of the stack.
Aaron (17:50)
You've been doing a lot of security work, but we're gonna get to that. We got a full segment on security. My stack right now is Codex, baby. Codex 5.5, extra high. Somebody asked me ⁓ why use extra high, and I said because it's there. If they had an extra extra high, I would use extra extra high. ⁓ So ⁓ I also got a question about why do I use fast mode? I don't always use fast mode.
Ian Landsman (17:52)
Mmm.
Yeah, Codex Man.
Mm.
That's how I use it as well.
Aaron (18:18)
⁓ and fast mode is in fact the same model just faster, which is different than Claude Codes like Spark model. So I use GPT-5.5 extra high ⁓ and I also use a product called Solo at SoloTerm.com, it's very good. ⁓ And I will often put GPT in the orchestrator mode and say, hey, I'm gonna talk to you. I only wanna talk to you.
Ian Landsman (18:18)
Mm.
Aaron (18:46)
and then I want you to spawn other Codexes to do the actual work. But my ingress point is I talk to one agent and then it manages the sequential or parallel write lanes depending on conflicts. The other thing that is invaluable is the Porsche of coding harnesses, AMP. So if you go to ampcode.com, talking about spendy, AMP is and has always been
Ian Landsman (18:46)
Mm-hmm.
Alright.
yeah, AMP.
Mmm.
Aaron (19:15)
pay per token, which right now is looking pretty smart, because they didn't have to change a freaking thing. ⁓ AMP is always and forever my, ooh, this feels risky, let's ask AMP about it. Or, or, this feels tricky, let's ask AMP's Oracle. Or, finally, there's some good open source prior art, let's ask AMP's Librarian, because AMP, ⁓
Ian Landsman (19:17)
All right. All right.
Mmm.
You
Mmm.
Aaron (19:44)
Because of the way that AMP is structured, they can do stuff outside of just their own models. So they pick the models. I never know what models they're using. But if you ask it to use the librarian, it uses this incredible GitHub search tool that is way beyond just, you know, curl and then clone it down and look through it. So I often will do that when I'm like, well, F, I got this really tricky WebGL Atlas cache issue. Hey, AMP.
Ian Landsman (19:53)
Mm-hmm.
Aaron (20:13)
Go find an open source repo that has done this. Figure it out and let me know. So Codex 5.5 with AMP as my super smart, super expensive professor, librarian, reviewer.
Ian Landsman (20:27)
I should say, by the way, the other piece, the main thing I'm using that's related to that, kind of, is the Superpowers library, which I love and is sort of my version of that. Almost every feature goes through Superpowers' step-by-step process of, like, questioning me, building a spec, building a plan. Usually that's where I involve Codex. Also at that point, I'll be like, review this spec or review the implementation plan, depending on what it is. Do it, Claude does it. And then, you know, I'll do a final check with Codex, but,
Aaron (20:34)
Mm.
Ian Landsman (20:57)
Superpowers, amazing. Love Superpowers. I've got to get Superpowers... Right now it writes to markdown files. This is a thing I've got to figure out with Solo Term, to use Solo Term scratch pads; I would like to have it write to that. Cause I don't tend to say, review these specs in the future. Anyway, that's the other big piece of my stuff, but yeah, I did play with AMP for a while. I should bring it back in the fold. Cause I like this idea of using it for those special purposes.
Aaron (21:22)
It's so good.
It's unbelievable. Review and research, that's where AMP shines.
Ian Landsman (21:30)
Yeah, the whole, I think you only told me about that not that long ago. I don't think I realized that they had this librarian feature that could dig around in GitHub, but that seems incredibly useful, especially for your stuff, even more now that you're in desktop app land. Like, web app stuff, I feel like it's kind of straightforward. There's tons of stuff out there about it, but desktop app stuff, especially performance, desktop app needs.
Aaron (21:41)
⁓ dude.
It'll come
back and be like, here's what xterm.js is doing, here's what VS Code is doing, here's what WezTerm, here's what Warp, here's what Alacritty are all doing, here are the best parts of them, here's what you're not doing. And it's like, okay, let's freakin' do that. It is spendy. All right, do we have producer Dave, you wanna throw up a question or two? Do we have anything you wanna throw on the screen here?
Ian Landsman (21:55)
Right.
That's pretty awesome. Yeah.
All right.
Let's see.
Dave (22:15)
I was going to jump on and just read them if you want.
Ian Landsman (22:17)
Yeah,
⁓ yeah, let's get some FaceTime. Let's do it.
Aaron (22:18)
Let's hear it, Producer Dave hit us with a question.
Dave (22:21)
All right, so
Mark asks what is, you know, I'll put it on the screen at the same time. What is the end game for the Anthropic situation? Can't people just use custom terminals or tmux? Is it just an arms race forever?
Ian Landsman (22:26)
Mm.
Aaron (22:37)
⁓ I do think it's a whack-a-mole. I think if you're going to give people Claude, Claude that can run in an environment that you don't control, i.e. my desktop. I think, yeah, I think it's, I think it's going to be whack-a-mole. ⁓ I don't know what their plan is. Somebody suggested, I think it was actually Mark on Twitter suggested they're going to start introducing, ⁓ telemetry and heuristics to see if the code or the prompt was actually typed or was programmatically inserted, which sounds like
Ian Landsman (22:46)
Yeah. ⁓
Mmm.
Aaron (23:06)
a nightmare and a bridge too far. So I don't know, but they are, ⁓ they're going to be playing Whack-a-Mole for a long time.
Dave (23:08)
Yeah.
Ian Landsman (23:14)
I do think there's always this level too. It's like in software, in my own product, HelpSpot, like, we have licenses and you can put a certain amount of effort into preventing people from hacking the license. But at some point, like, some people just do it and that's fine. So they're going to have the same thing. Like, if it ends up being a low number of people who jump through whatever hoops are necessary to actually, like, make interactive Claude work for Claude or whatever. And if it's a pain in the butt to do it, like, I think they'll just leave it and it'll be fine. But if it becomes really easy to do and everybody does it.
and it's costing them a ton of money, yeah, they'll probably then there's the next layer of how do we prove a human's at the terminal typing these strokes and all that stuff. yeah, this is what's bleeding edge here, you know, we're on the bleeding edge.
Aaron (23:56)
Alright, we got another.
Dave (23:56)
All right,
yeah, Wafalapagos had to leave, he said, but he just decided to choose violence. Ian, if Codex fixes the slop, why not just use Codex?
Ian Landsman (24:04)
Mmm.
Yeah, well, I don't think it fixes the slop. That's a simple take on a complicated process. But I have many times tried to just use Codex, and every single time it ends up, it's like, look, the most recent time I gave it a bug. It went off, it built a literal cathedral. It was thousands of lines of code, all this stuff. And I was like, this can't be right. This is like, so I gave it to Claude and Claude literally fixed it in a single line. So it's like, I like,
Aaron (24:07)
Yeah.
it's beautiful.
Ian Landsman (24:37)
Claude's mindset is not to over-architect, and I generally prefer that. But when it comes to, like, looking for bugs or, like, being thorough and little edgy stuff, I like the spot check of Codex. So that's where, to me, they fit together. And just the actual interactive nature of Claude is far superior to Codex. So when I'm in there with my pair coder, you know, I like Claude as my wingman.
Aaron (24:55)
I
All right, we got one more, or do we have one more?
Dave (25:06)
We had a couple more, but I thought this was an interesting one. As model usage pricing increases, are either of you all using, whether it's a specific model or a router that directs tasks to the correct sort of model slash effort in order to decrease the cost, sort of to optimize costs, I'm guessing. The right model for the right question. Right, right.
Ian Landsman (25:24)
Right. Like using Sonnet instead of Opus for something simple. Yeah.
Aaron (25:28)
So I am not, but I heard today a good report of somebody doing that inside of Solo, SoloTerm.com, it's very good. And what they were doing is they had set up, like, maybe three copies of Claude, not like three subscriptions, but like Claude with specific flags so that they could launch Claude Opus 4.7, you know, extra high thinking, or they could launch
Ian Landsman (25:50)
Hmm.
Aaron (25:55)
Claude Sonnet medium thinking, and they would put their orchestrator in like medium thinking mode and then tell their orchestrator to launch, like depending on the task, you could launch extra high, high, medium, whatever. And so I have heard of some people even within the same lab saying like, Hey, if you're just going to be like schlepping tasks around, let's just use medium thinking. But if you're going to be doing something hard, I'm going to set up a different configuration for that. But I personally just use extra high all day.
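A small Python sketch of the routing setup Aaron describes: an orchestrator picks a launch profile based on how hard the task is. The enum, the profile table, and the model and thinking labels are hypothetical and only descriptive; they are not real CLI flags or Solo configuration.

```python
from enum import Enum


class Difficulty(Enum):
    ROUTINE = 1   # schlepping tasks around
    MODERATE = 2
    HARD = 3      # genuinely tricky work


# Hypothetical launch profiles. The labels are descriptive only and do not
# correspond to actual command-line flags or product settings.
PROFILES = {
    Difficulty.ROUTINE: {"model": "sonnet", "thinking": "medium"},
    Difficulty.MODERATE: {"model": "opus", "thinking": "high"},
    Difficulty.HARD: {"model": "opus", "thinking": "extra-high"},
}


def pick_profile(difficulty: Difficulty) -> dict:
    """Return the launch profile the orchestrator should use for a task."""
    return PROFILES[difficulty]


# The orchestrator itself runs on the cheap profile and only escalates
# when it hands off something hard.
orchestrator_profile = pick_profile(Difficulty.ROUTINE)
hard_task_profile = pick_profile(Difficulty.HARD)
```

The trade-off is the one discussed in the conversation: cheaper profiles for routine work, with the expensive configuration reserved for the hard tasks.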
Ian Landsman (26:26)
So we recently moved to Claude teams, which so there's like more restrictions. You don't get the 20x plan. So this was this was weighing on me a little bit. So I've been thinking about this more partially It's like yes right now. I'm not I haven't made any adjustments yet. I'm like, whatever if we go into overages It's not that big a deal. We're actually paying less right now than we were with everybody on $200 plans So if we go over once a while, it's not a big deal, but
The, I do think, like, Superpowers, one of the big things it's focused on is that it actually builds and writes these very specific plans so that, specifically, it launches stupider models to do them. I always override that and say, no, use Opus, but I am considering not doing that now and being like, Hey, maybe Sonnet is just good enough. This is what the labs want you to do. Right. And there's probably a good reason to do it. And if we're happy with what Opus and 5.5 does right now for the most part,
Dave (27:04)
you
Ian Landsman (27:20)
then in three months, six months when the next ones are out, right? If the dumb one is basically Opus level or 5.5 or 5.4 level, then you're probably not going to be losing that much. I do think there is a world soon where, like, the really smart one plans things and the lesser ones execute them. And it will probably be a smart way to do it, to save the tokens. But I, right. Yeah, I don't do it, but it's coming. I think, two nos.
Aaron (27:45)
That's two nos. That's two nos from us up front. Sorry about that, fam.
Ian Landsman (27:49)
Uhhh...
Dave (27:50)
So
before I go, just so you know, we have over 70 people watching on YouTube and we have five, I'm sorry, 630, 640 people on Twitter. And we are almost at time. So I think you all have to pick one topic if we're gonna keep this. We're almost at the 30.
Ian Landsman (27:58)
Yeah, the people.
Thanks for coming out.
Did we play by the rules? Nah, we gotta finish our topics.
Aaron (28:11)
Nah, let's do it a little long. Let's do it a little long.
All right, we're gonna keep rolling. Y'all watch the live show because a bunch of people are doing it. So if you're listening on RSS, join us live. But up next, we've got Mythos and 5.5 Cyber represent a big leap in hacking ability. Ian, you have entered cybersecurity psychosis. Tell us what the hell is going on out there.
Ian Landsman (28:16)
Thanks, Dave.
Yes.
Man, it's scary out there. I think if you're not spending a lot of time thinking about your security posture, you are making a tremendous mistake and pretty soon you're going to pay very badly for that. So you got to be locked in on the security stuff. We've been doing all security stuff for the past month or two. Yeah. And we have these, I mean, the current models are already great. Like if you really send a model thoroughly through your code base, it's going to come up with like five, six huge issues. And it's going to come up with like
200 other secondary, but not inconsequential, type issues, most likely things where if thing A goes bad, then thing B will be bad for you. So yeah, and we just had the report out how, you know, not that we ever see Mythos, but the people who have access to Mythos anyway, on the latest previews, like, it's, you know, beat some evaluations for the first time that in the past only humans could do. This one specifically:
There's a 32-step, like, corporate network attack that they estimated takes 20 hours for a human to complete all the steps. And Mythos was able to do it in six out of 10 attempts. And then also the GPT-5.5 Cyber version also has, you know, advanced the capability well above what 5.5 on its own does; not quite, apparently, to Mythos level, according to this data, but still very good, still far exceeding
everybody listening to this, your capability. And even Opus 4.7 and 5.5, you should have them in there. Yeah, I'm a little irked that they have these tools and they're not letting us poor little guys... like, only Cisco Systems gets to use it and not poor UserScape and Try Hard, but hopefully soon we will have access to help defend ourselves.
Aaron (30:22)
So this is a report from a Twitter account that I am not vouching for, but it sounds very official. It says the AI Security Institute, and it's like, it's an institute, that's amazing. And it does have Mythos Preview as the top cyber hacker, followed pretty closely by GPT Cyber, which is GPT 5.5 Cyber, which is different than GPT 5.5. Ian, do you have access to GPT 5.5 Cyber? Is that where you had to like...
Ian Landsman (30:27)
It does. Yeah.
Mm-hmm.
Aaron (30:50)
scan your eyeball and pledge your firstborn to Sam Altman.
Ian Landsman (30:53)
Well, Sam Altman took all my data, but he did not actually give me access to 5.5 Cyber. So I do not have access to it yet, no. They gave me access to a security tool they have, which is different than these other security tools they're all working on, but yes, not yet.
Aaron (31:00)
We got bamboozled again.
So what's the recommendation here? Because Next.js, may it rest in peace, has had a new RCE, a new vulnerability, like maybe once or twice a week. TanStack got pwned through no fault of their own from a GitHub cache poisoning thing. We've seen targeted attacks on maintainers to get worms inside that. So what are we supposed to do? Are we just supposed to turn our tokens to, like, internal hacking? What's the plan here?
Ian Landsman (31:15)
Mm.
Mm-hmm.
Man.
Aaron (31:39)
Give me 30 seconds on what the hell we're supposed to do.
Ian Landsman (31:39)
I mean, some of this stuff is, like, way beyond your control, right? Like, I mean, are you gonna never pull in something from npm again? Like, it's probably not possible for you to do that. Obviously companies are trying to, like, build in-between layers and also add delays and all that stuff, but there's workarounds for all that. I think we're just in for a rough year or two. Like, I think that's just reality, but yes, you should be turning the AI on yourself and having it hack you. Because to me, especially in smaller
situations, where you're a smaller company, the biggest risk is more from AI bots just crawling around that people are gonna send out there, trying to find obvious exploits. Basically the same things the humans used to do, but now you can do it 10,000 times faster and more effectively. So you've got to make sure, like, the really sort of low-hanging fruit stuff is taken care of. The rest of it, I don't have any good answers on.
Like, I don't have any good answers for when Next.js has a hack, but your whole app is Next.js. Like, I don't know. Like, are you never supposed to update? You need to update to get the security patches. So what are you going to do? You kind of have to keep an eye on it.
Aaron (32:33)
Yeah.
So my narrow recommendation would be, like this chatter just said, there are ways to pin versions in your dependency files and there are ways to enforce a minimum age. So you can't install something, or rather you disallow your package manager from installing something, that was published in the past seven days or whatever. Because, like, the TanStack thing, security researchers found it in like 12 hours or whatever.
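[Editor's sketch: the minimum-age rule described here can be illustrated in a few lines of Python. This is a minimal sketch, not how any particular package manager implements it. The function name `eligible_versions`, the input shape, the made-up release history and the seven-day default are all assumptions for illustration.]

```python
from datetime import datetime, timedelta, timezone


def eligible_versions(published, now, min_age_days=7):
    """Return the versions published at least `min_age_days` before `now`.

    `published` maps a version string to a timezone-aware publish time.
    This input shape is hypothetical; real registries expose it differently.
    """
    cutoff = now - timedelta(days=min_age_days)
    # Sorted lexically for stable output; this is not semver ordering.
    return sorted(v for v, ts in published.items() if ts <= cutoff)


# Made-up release history for an imaginary package (arbitrary date).
now = datetime(2030, 1, 15, tzinfo=timezone.utc)
published = {
    "1.4.0": now - timedelta(days=30),
    "1.4.1": now - timedelta(days=8),
    "1.4.2": now - timedelta(hours=12),  # too fresh, so it is held back
}

print(eligible_versions(published, now))  # prints ['1.4.0', '1.4.1']
```

[The idea is that a release only 12 hours old is skipped, which leaves time for a compromised version to be caught and pulled before you ever install it.]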
Ian Landsman (32:50)
Mm-hmm.
Alright.
Aaron (33:10)
Keep your dependencies pinned as specifically as possible, but if not, enforce a minimum age of five, seven days or something.
We've heard a lot about slop forks, but this one is a slop rewrite. So slop forks is a term of art, a loving term of art. It's not meant to be, it's a denigration.
Ian Landsman (33:28)
What?
A lot of new terms.
Aaron (33:38)
But Cloudflare has slop forked several, WordPress and Next.js. And here, up next, we've got a slop rewrite. Bun uses Claude to convert to Rust from Zig in a one-million-line pull request. What are we thinking about this, Mr. Landsman?
Ian Landsman (33:58)
That's a big old pull request, a million additions, only 4,000 subtractions. So I'm not sure what's going on there. I assume they just left the other code mostly in place in the repo. Yeah. Yeah. So a million lines of code. I got the impression, I think he said, I couldn't find the tweet quickly, but I think he only worked on it for like a couple of weeks or something like that, where it was like him and Claude. Jarred Sumner works at Claude now.
Aaron (34:07)
I think it's a canary release, so I think they rolled them both in.
Mm-hmm. Yep.
Ian Landsman (34:26)
So he obviously has access to a lot of resources and as many tokens as he needs, presumably, but he got in there and rebuilt, converted the entire thing from Zig to Rust. It sounds like mostly for the compiler tooling for memory leaks and things, and that the speed is maybe a little bit faster, but it's not primarily about speed. But this is fascinating. As somebody who runs an old app, the idea of like, hey, let's just convert this whole bad boy to something new in a week.
Interesting.
Aaron (34:56)
You've been running HelpSpot for 20 years? You could rewrite it all in a week.
Ian Landsman (34:59)
Yeah, maybe. I don't think I have the test coverage. He has some amazing test coverage, it sounds like, where it's like perfect test coverage.
Aaron (35:04)
That, yeah, I think that's my big takeaway on this: he said all the tests still pass. And that immediately opens the question of, what were the tests written in? Which leads me to believe, and it has been confirmed by Jarred later, that the tests are basically like testing from the outside in. And so he's able to rewrite all of the core and then run the tests to make sure that it still passes.
Ian Landsman (35:10)
Yeah.
Right?
Mmm.
Aaron (35:34)
I think, yes, his stated goal, it got a little bit smaller, I think like three to eight to 12 megabytes smaller in terms of distribution size, but his stated goal is to paper over, or rather eliminate, some of the frustrations he was having with Zig. Things that are beyond my level of programming expertise, but stuff like memory safety, borrow checkers, all that kind of stuff. So he switched it all over to Rust. I mean, I think the value of
Ian Landsman (35:42)
Mm-hmm.
Aaron (36:03)
black box testing and end-to-end testing is going through the roof right now. Because as long as, and this is a big asterisk, as long as your app still works, it doesn't matter how often you rewrite it. I think you gotta be pretty careful to prove that your app still works. Like, that's not trivial. But in this case, Jarred is saying my app, which is Bun, my app still works.
Ian Landsman (36:19)
Yep. Yeah, that's the key part.
Aaron (36:30)
I just happened to rewrite it all underneath, but every public surface of Bun still behaves as it did before. All right, well, what are we complaining about? I guess, I don't know. So I think for us, for the little people that aren't gonna do a million-line rewrite, I think end-to-end testing, integration testing, outside-in testing, black box testing, whatever you wanna call it, test the surface area
Ian Landsman (36:44)
It's.
Aaron (36:58)
and make sure that your thing still works and then you can kind of fiddle under the surface there.
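[Editor's sketch: outside-in testing of the kind described here can be illustrated in Python. The test only runs the program as an external process and compares what comes out, so it does not care what language the internals are written in. This is a minimal sketch and not Bun's actual test suite. The helper names `run_cli` and `check_public_behavior`, the stand-in command and the sample cases are all assumptions for illustration.]

```python
import subprocess
import sys


def run_cli(command, stdin_text=""):
    """Run a command as an external process and capture what a user would see."""
    result = subprocess.run(
        command, input=stdin_text, capture_output=True, text=True, timeout=30
    )
    return result.returncode, result.stdout


# Stand-in for the program under test: a one-line adder run by the Python
# interpreter. In practice this would be the path to your own binary.
COMMAND = [sys.executable, "-c", "print(sum(int(x) for x in input().split()))"]

# Each case is (stdin, expected stdout). Nothing here depends on internals.
CASES = [("1 2 3", "6\n"), ("10 -4", "6\n")]


def check_public_behavior(command):
    """Return a list of failing cases; an empty list means behavior is unchanged."""
    failures = []
    for stdin_text, expected in CASES:
        code, out = run_cli(command, stdin_text)
        if code != 0 or out != expected:
            failures.append((stdin_text, expected, code, out))
    return failures


print(check_public_behavior(COMMAND))  # prints []
```

[Swapping `COMMAND` for a rewritten build and getting the same empty list back is the whole check: same inputs, same observable outputs.]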
Ian Landsman (37:04)
Yeah, I mean, obviously, like, the AI helping you write so many tests nowadays is great. Like, Outro probably has more tests than all the tests in every app I've ever had before combined, times two. But, and obviously we're all still working on, like, are these tests always good tests and all those kinds of things. Bun being open source, it probably had a lot of just human, well-written tests from two, three years ago or whatever. But,
yeah, just really cool to see what's possible. Like, in the old days, people would do things where they'd write software in, like, an abstracted language and, like, compile it out to other languages and things like that. And now you could just be like, hey, maybe we could just take a week and convert the whole thing and it just, just, just works? Question mark.
Aaron (37:49)
It just works, question mark. That's the motto of the show. It just works, question mark. You have been listening to Token Town. You can find us at tocontown.com. We live stream every Thursday at 2 p.m. Eastern, 1 p.m. Central on YouTube and Twitter, but you can subscribe at tocontown.com. You can also subscribe on the Token Town YouTube. So if you wanna hear more, mostly
Ian Landsman (37:54)
There you go.
Aaron (38:18)
tight 30-minute episodes, no promises. Please do follow along. Hopefully you enjoyed this first one. Ian, anything else to say?
Ian Landsman (38:20)
Close enough.
No, thanks to Bento for sponsoring. And if you're interested in sponsoring, let us know. And we will definitely see you next week.
Aaron (38:35)
See ya.