> I think it's as close to the perfect benchmark as you can get.
Depends what you are benchmarking for... If you are benchmarking the ability of the solution to solve LeetCode challenges, that is different from the ability of GPT4 to help everyday programmers knock out business logic or diagnose bugs.
My experience of GPT4 is that it's significantly better at the latter than GPT3.5.
Additionally, the real test for me is "Can an average programmer using GPT4 as a tool solve Advent of Code faster than an equally-skilled programmer without an LLM?".
I have done the same thing. “I am seeing the following behavior, but I expected <X>. What am I missing?” That line has often quickly solved some bugs that would have otherwise taken some time to debug. Very handy!
Can you share some examples of this? I haven’t had much luck with ChatGPT correctly identifying issues because (in my case, at least) they stem from other parts of a large codebase, and (last time I checked) I couldn’t paste more than a few kilobytes of code into ChatGPT.
One example is bugs caused by precondition violations, which ChatGPT can’t diagnose without also being given the code for all of the incoming call-sites, which means you end up solving the problem yourself before you’ve even explained the issue to ChatGPT - so (to me, at least) my use of ChatGPT is more akin to rubber-duck-debugging[1] than anything else.
> Can an average programmer using GPT4 as a tool solve Advent of Code faster than an equally-skilled programmer without an LLM?
I mean, yes, obviously? Any tool, even if it had near-zero marginal utility must necessarily improve performance, as you can simply elect not to use it.
The disagreement is not in the direction, it's in the magnitude. Therefore the real test is: "Can an average programmer solve AoC faster with GPT4 and without syntax coloring or with syntax coloring and without GPT4?", or "GPT4 in C versus no-GPT4 in Python", or "GPT4 on a crappy laptop vs no-GPT4 on a high-end workstation" and so on.
> Any tool, even if it had near-zero marginal utility must necessarily improve performance, as you can simply elect not to use it.
That might be true for any tool with consistent, predictable output that you thoroughly understand. Someone could easily lose a few hours dickering around with ChatGPT only to realize they’re better off just starting from scratch.
> Any tool, even if it had near-zero marginal utility must necessarily improve performance, as you can simply elect not to use it.
No opinion on the specific tradeoff here, but in general it's not obvious to me that your statement is true. Using tools involves opportunity cost, so sometimes "not electing to use it" can on average be a net win, no? There is a cognitive and time load to using it at all, right?
A simpler test would be keeping the development environment the same and then adding GPT4 to see if there is a statistically significant and meaningful speed increase (of a decent magnitude).
I'm not looking for a 1% speed improvement, I'm looking for a >50% speed improvement. Maybe I should have stated 'significant' speed improvement in the initial post.
It's not meant to be a scientifically rigorous test, but the idea is this.
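To make "statistically significant and meaningful" concrete, here is a toy sketch of how such a comparison could be scored. The timing numbers are entirely made up for illustration; only the arithmetic is real:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical solve times in minutes; purely invented for illustration.
with_gpt4 = [22, 31, 18, 27, 25, 20, 29, 24]
without = [41, 38, 52, 35, 47, 44, 39, 50]

# Relative speed improvement of the GPT4 group.
speedup = 1 - mean(with_gpt4) / mean(without)

# Welch's t-statistic: how many standard errors apart the two means are.
se = sqrt(stdev(with_gpt4) ** 2 / len(with_gpt4)
          + stdev(without) ** 2 / len(without))
t = (mean(without) - mean(with_gpt4)) / se

print(f"speedup: {speedup:.0%}, t = {t:.1f}")  # → speedup: 43%, t = 7.1
```

A t-statistic that large would comfortably clear any conventional significance threshold; in a real study you would also compute a proper p-value and, more importantly, check whether the effect size clears the >50% bar mentioned above.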
It seems like you're accepting that (at least in its current state, barring unknowable future discoveries) this is a technology that complements developers, making them more productive.
If this is the case, then the question is how relevant this improvement is in quantitative terms. The cumulative improvements in developer productivity since the days of punching cards have easily been several hundred percent.
A thought experiment to figure out how relatively impactful this is would be to compare it to other technologies and see which gives the greatest boost.
My prior belief: somewhere around Intellisense level of useful, but not significantly more than that.
Yes, I think the current state of technology (just starting) is a technology that complements developers.
Although my personal belief is this is more likely going to be an improvement similar to Assembly -> C (even if the LLM component doesn't improve, but assuming that the tooling does improve).
Personally I think there is going to be a new 'higher, higher level' of programming paradigms that are about to be invented that are supported by the ability of LLM's to write code - still augmenting humans, but making them many times more productive.
I think the opposite - LLMs will cause programming language development to become stagnant, because the language that complements an LLM the best is the language that has the most code out there in the wild.
So the "LLM language" already exists - it's JavaScript, Java, C#, etc.
This reminds me of a study where they compared the time taken to solve a problem comparable to a two-star AoC problem in difficulty across different dev environments, such as languages and IDEs.
There are a lot of anecdotes about how "it's the programmer, not the language" or "any language can be productive", etc...
The actual results were that many popular languages are more than an order of magnitude slower to achieve time-to-correct-solution than others. From memory, the fastest was F# with about 20 minutes, then Python and C# with about 40 minutes for both, and then C/C++/Java were hours to even days!
> ChatGPT never did this: its debugging skills are completely non-existent. If it encounters an error it will simply rewrite entire functions, or more often the entire program, from scratch.
Well, it's going to need to rewrite functions to add debug output due to not having edit capabilities, but I tried this and it absolutely added debug info which it then used to debug issues:
I don't know quite what's happening but I feel like people constantly say it can't do something and the very first thing I try (just asking it to do the thing) usually works.
I gave it the hands in the problem statement and the expected result, and the explanation as to why (copy pasted). It ran the code, looked at the debug output, identified the problem and rewrote the function. I'm not saying it immediately solved the problem, but it easily added debug information, ran the code, looked at the output and interpreted it.
Same. I have pre-defined “comments”, “error handling”, “unit tests” and more, and I get it all. Plus if I target an area and ask for a revision, I get just that... Tech people both misunderstand “AI” and think they have mastered it by default.
taking HN comments that claim ChatGPT can't do a thing, pasting them into ChatGPT, and then linking the reply where it does the thing, usually doesn't get me any points, in case someone wants to build a bot that does that for points.
Of the few I attempted this year I actually used ChatGPT to help me decipher just what the hell the waffling, rambling, windy, unclear, and ultimately superfluous challenge requirements were.
I didn’t have the solutions generated by ChatGPT, to be clear; I used a prompt along the lines of “take this text and extract its requirements and generate bullet points, also make the example inputs and outputs clear”.
I did the same for the previous year too, when ChatGPT hadn’t been out for long.
I find that a lot more enjoyable and less tedious.
While not all people appear to appreciate your comment, I can relate to it. Not everyone enjoys this kind of story and world building; some would like to get to the point already.
People (including me) can sometimes solve the tasks quicker than others can read them. Just read the sentence above the example, then skim the explanation below and go for it. It's a gamble sometimes, but it's entirely feasible to skip all the story (it often does contain hints, though).
What strikes me about ChatGPT is the blatantly wrong answers it can give. I asked ChatGPT to solve an augmented matrix using Gaussian elimination, and it failed at this straightforward task spectacularly.
Perfect example of "confident sounding hallucinations": I was just googling (a moment ago) why olive oil causes a burning sensation. It turns out there's a substance called oleocanthal[1], and there are receptors for it mainly in the throat. But while googling, I saw on Quora (which made it into the previews on Google) an "Assistant - Bot" response that is completely wrong: "Drinking olive oil can cause a burning sensation in the back of your throat due to its high fat content[...]"[2]
It wouldn't be notable if someone had specifically asked ChatGPT, knowing its limitations, but using it to automatically populate Quora and Google is pretty bad. People are using LLMs to fill the web with BS.
One is dumb as a brick, the other isn't. If you don't specify which, then your comment should be dismissed out of hand.
Also, it's a well-known limitation of all current LLMs that they're terrible at basic algebra. Instead of trying to replace BLAS or Maple with it, ask it to write the Python code or produce the Mathematica expression.
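For example, instead of asking the model to row-reduce the matrix itself, ask it to emit code like the following (a plain-Python sketch of Gauss-Jordan elimination with partial pivoting; the function name and example system are mine):

```python
def solve_augmented(rows):
    """Solve a linear system given as an augmented matrix [A | b]
    using Gauss-Jordan elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    n = len(m)
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

# 2x + y = 5 and x + 3y = 10 have the solution x = 1, y = 3.
print(solve_augmented([[2, 1, 5], [1, 3, 10]]))  # → [1.0, 3.0]
```

The point is that the generated code does the arithmetic deterministically, sidestepping the LLM's weakness at carrying out row operations token by token.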
This creator has some excellent videos of chat GPT attempting advent of code.
He uses it in a more generous format where he is often giving it multiple attempts and trying to coax the correct answer out of it. He definitely has more success with it than the article, but it is hard to tell how much of that success is due generous assistance and prompting, and so it is hard to know how much the model has actually improved year over year.
I won't spoil which one it eventually gets choked up on, but it does. But that's beside the point. If I had to sit down and do the Advent of Code challenges with and without ChatGPT's assistance, I know which one I'd choose. Programming's been forever changed, for better or worse, and pretending like it hasn't does no one any favors. I'm not doing LLM-unassisted programming anymore, because it's a waste of my, and everyone else's, time, and the industry needs to move accordingly.
I have an observation not related to ChatGPT but to the debugging skills mentioned in the article: indeed, I've always felt that most teaching is done on perfect working code. I've never seen an exercise for developing debugging skills. For example: "This Dijkstra implementation is finding the wrong path." and work with students to pinpoint the off-by-one error. I think it would reveal so much more about the implementation details than just explaining how it works. It could be its own topic to explore race conditions, edge case studies and so on.
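A minimal sketch of what such an exercise could look like (hypothetical, not from any real course): hand students a heap-based Dijkstra in which the instructor has planted a subtle bug, such as pushing the edge weight instead of the accumulated distance, and have them work out why paths come out wrong. The reference version below is correct, with the likely bug site commented:

```python
import heapq

def dijkstra(graph, src):
    """Shortest distances from src in a weighted digraph given as
    {node: [(neighbor, weight), ...]}."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry: a shorter path was already found
        for nbr, w in graph.get(node, []):
            nd = d + w  # classic planted bug: using just `w` here
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(g, "a"))  # → {'a': 0, 'b': 1, 'c': 3}
```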
Debugging is by far the most difficult thing when working with AI algorithms, because the output is rarely deterministic and that makes it hard to tell the difference between actual bugs and a bad selection of hyperparameters. You're also typically working with huge arrays of numbers, so unless you have some way to visualise what's happening it's also difficult to pinpoint where the algorithm is doing something it shouldn't be.
My main criticism of learning about algorithms using a framework is that they handhold so much with optimisation techniques and accessible APIs that you can know very little about what's actually happening and build something that works.
When writing AI algorithms, just knowing how to implement them in theory is rarely ever enough. The difficulty is always in the debugging and optimisation techniques.
Are these problems unique enough that they couldn't have circulated before the AoC where they appear now?
I notice that for "novel" code that CoPilot hasn't seen before, it's mostly useless, but when writing a hobby project which is a path tracer (of which there are probably 1000 implementations on GitHub) it's excellent. Which isn't surprising. It has seen the exact same function I'm writing, written 100 times in every language imaginable. There are books on the topic etc.
I’m curious…if you’re writing a path tracer for fun, does using Copilot not take the fun out of it? If it’s already been done a thousand times, it feels like the point has got to be to write your version, otherwise why not just copy-paste or pull in a library? If you use Copilot are you not giving up the joy of building it with your own hands? What’s left at that point?
Good question. There is quite a lot of boilerplate in graphics, which is basically copy-paste/textbook lookup, and ChatGPT is fantastic for this sort of thing. I wouldn't be able to deduce this; it would be a copy-paste from github/stackoverflow/physics book/whatever. ChatGPT, though, is like a magic copy-paste that takes the formula from github or the pbrt book and pastes it with my variable names. So you don't paste and then update the names.
// convert color from linear to sRGB
let srgb = [some formula]
// calculate wavelength dependent index of refraction
let nw = [something]
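For reference, the first of those two comments corresponds to the standard piecewise sRGB transfer function (IEC 61966-2-1), which is exactly the kind of look-it-up formula being described:

```python
def linear_to_srgb(c):
    """Convert a linear-light channel value in [0, 1] to sRGB using the
    standard piecewise transfer function."""
    if c <= 0.0031308:
        return 12.92 * c  # linear segment near black
    return 1.055 * c ** (1 / 2.4) - 0.055  # gamma segment

print(linear_to_srgb(0.5))  # ≈ 0.7354
```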
But yes, obviously somewhere between this and just saying "Create all the code I need" all the fun would disappear too. For the most part, that wouldn't work either. It would shit out 200 lines of code that doesn't work, and now instead of taking the boring job of googling a formula from an optics reference and replacing the variable names, it's taking the role of shitty developer and making me the code reviewer. But see, that's my day job and it's not what anyone wants.
I'm not sure a nail gun is the best analogy; it's a very simple tool that does one thing exactly as "instructed," and can't do any of the "thinking" for you.
Fair enough. Let's go with the analogy of a compiler, allowing you to write in python instead of assembly. Certainly this is doing a great deal of "thinking" for you. But hasn't it unlocked the more fun aspect of what you were doing? Not that this is true for everyone; some folks do still like to write assembly by hand, for fun or profit. But a compiler was still a remarkable advancement and worth celebrating.
Kind of. The precise, exact problem might be new, but it's similar enough to things that are already out there for many people to have helper functions at the ready and solve them with absurd speed.
Seeing the other comments, it still leaves me with the question of whether this is a limit of scale or a limit of technology. How much more can LLMs scale up over the next decade?
This in itself proves nothing. Consider how much resources have been poured in making rockets better over the last few decades and yet we still can barely make it to Mars.
It is also quite possible that there are fundamental limits to scaling LLMs that we have not discovered yet. You won't find a way to let airplanes fly to the moon no matter how many resources you pour into it. To make it there, you need better methods (ie rockets in this case) that look nothing like the first few steps to get off the ground.
LLMs fairly quickly got to consuming a significant portion of all available data. I can't see them improving significantly just by throwing more data at the problem.
What would be really interesting would be to repeatedly attempt to get ChatGPT to solve the problems.
By that I mean try to solve the Day 1 problem at the point of release, then try to solve it in a fresh ChatGPT session the next day, and then get it to solve it again the day after that, and so for the next couple of months.
Do that for each day that it runs and then see what patterns emerge. Part of me expects it to get better at solving each problem as the rest of us write our solutions and then make them available on the web in some form for it to harvest - but it would be interesting to see if that is the case.
No, there's no point: the model doesn't change day by day.
Kind of like saying "let's watch the same youtube video every day and see if it changes" - it won't, that's not how youtube works, there is no point trying.
Most people have that suspicion, myself included. But the API models stay verifiably the same, and OpenAI would have little reason to lie about not updating ChatGPT.
OTOH, I can easily imagine the helper/censorship models and chat’s system prompt being updated. The system prompt doesn’t change capabilities much, though, and the censor model doesn’t change the output, just cuts it off when it detects copyright or other violations (the case of chat not being able to recite the Dune poem, DALL-E not being able to produce certain images, etc.)
It would be interesting to compare the solutions from day 1 to the solutions a year from now (maybe less, but a year would give time for most people that cared to publish their solution and for them to be picked up in re-training).
Of course, an LLM will use the solutions that are in its training set. I asked an LLM to explain a pretty generic solution to one of last year's AoC questions, and even with no (or very little) reference in the code to the fact that it was an AoC solution, the LLM used terminology from the question.
It wasn't ChatGPT; I think it was Claude.ai. ChatGPT gave an explanation you would expect from just seeing the code, without the context that it was related to AoC.
Seems pretty clear most of the fantastic results from before were due to overfitting. Like every other case. It's amazing to me how these models can be caught red-handed being overfitted again and again and again, and people don't get the memo.
It’s hard to blame people when OpenAI’s whole shtick is that they’re mere years (at most) away from summoning a god machine that will usher in a techno-utopia, or destroy all of humanity. And to make matters worse, they have selected for this belief when hiring. So it’s pervasive throughout their work.
The data is out there to objectively assess these tools, but it’s not good for business.
What we need is standards bodies creating tests in private and running distributed, blind tests using only publicly accessible APIs or open-source models. There's no reason the ACM or a similar body couldn't do this for computer science evals, for example.
When that "AI-generated Angry Birds clone" was doing numbers, didn't it turn out that most of the code came straight from tutorials for writing Angry Birds clones?
Yep, because that's all AI is; a parrot. For well-defined problems it works great... BECAUSE THEIR WELL DEFINED.
Generative AI is only as good as the dataset that you give it, so for problems that exist in a heavily contrived and parameterized space (like leetcode-style problems) it works really well. But give it a novel problem with intertwining libraries and external dependencies along with custom type and structure definitions, it's going to fall flat.
Humans do something AI can't, and that's draw from experience to apply a solution on a novel problem. This is why I'm not terribly worried about AI coming for engineer jobs anywhere in the near future. I use ChatGPT all the time to write me small functions, generate regular expressions, etc. Basically all the drudgery.
I'd argue we've hit peak AI at this point because, from here on, all datasets are going to be colored by AI-generated results. Generative AI is now on a trajectory where it'll simply regress to the mean of knowledge.
How can you spell such a mistake in ALL CAPS and not see it? ;)
I don't know about the rest of your comment, albeit I consider myself more in the non-Chomsky camp. This emergence thing seems a little more than an elaborate hoax to me.
Well, maybe they've just trained GPT-4 to wiggle my balls, when I ask it to analyse a poem that I wrote 15 years ago.
The local llm scene, regardless of this debate, is nevertheless the hottest topic IMHO atm.
Several finetunes have repeatedly blown my mind even in the past 2 weeks. On a Raspberry Pi.
AI can interface: Natural Language Processing. This is the first time in history humanity has had such knowledge and technology.
You make it sound as if (software) engineer jobs constantly involve cutting-edge innovation, and perhaps as if all engineers are capable of it. That doesn't match my experience. Most of it is a more or less smart combination of standard concepts: something GPTs are already pretty good at, and I'm sure they will improve further.
But we are, at least I am, almost always integrating or interacting with new libraries, APIs, new features, new behaviours. I might not be personally writing that code, but I definitely need to be up to date and adapting all the time.
I'm not a developer, mind you, but devops. And while I reuse code regularly as well, when new projects come up I'll ramp up with said boilerplate. Though that's typically only required if it's something new.
If you think we've hit peak you're grossly underestimating the sheer volume of copyrighted books, manuscripts, screenplays, podcasts, movies, documents, history, and research papers that ChatGPT hasn't been trained on. There's a LOT more juice to squeeze still.
This is actually incorrect; there's not that much data left to train on. I remember reading an article about it, might have been one of Gwern's or something about Chinchilla scaling, but to produce an order-of-magnitude increase we need an order of magnitude more data, and there just isn't that amount available.
It might have, however a ton of gamedev-related code (like adding physics to an object, and having it explode on contact with something else) is either formulaic, or can be glued together from pieces of other code, a feat ChatGPT is definitely capable of. It doesn't require it to develop a sophisticated model of the problem.
And the above applies to a lot of prod code as well.
When GPT4's training data was only up to Sep 2021, I fed it a Rust solution to one of the 2022 AoC problems I had lying around on my PC. I asked it to annotate it with "formal doc comments".
Now keep in mind that it had never seen this problem, and it was written in a very terse "competition solving" style with zero comments, one-letter identifiers, and generic function names like "day1" and "parse".
It figured it all out. It worked out that it was for solving a maze -- even though the word "maze" was not used anywhere in the code!
It worked out that a constant "(-1,0)" in one isolated bit of code was "orienting the character west", even though to figure that out it had to trace the logic through 4 or 5 layers of indirection. It connected a single 'w' character in the parser to a vector somewhere else!
Etc...
PS: It wrote a more useful, more accurate, more coherent, and more grammatically correct comment than I had seen in any codebase I had worked on in something like two years.
> Problems start very easy on day 1 (sometimes as easy as just asking for a program that sums all numbers in the input) and progress towards more difficult ones, but they never get very hard: a CS graduate should be able to solve all problems, except maybe 1 or 2, in a couple of hours each.
I think this wildly overestimates the programming skills of the average CS graduate. My estimate of the fraction of CS graduates able to do that is closer to 1%.
Based on the leaderboard screenshot, only 7 people even solved up to day 12. The later days tend to be significantly harder than the early days. Based on top 100 leaderboard times, ~2x as hard for silver star.
Yeah, this… the conclusion I would have drawn is not that the gold star took between 2-5 on average, but rather that 85% of people who self-select as enjoying a programming challenge were not able to finish more than half the tasks. Not that that's bad — seems slightly above par for the leaderboards I was involved in — just that the conclusion is really misleading.
Nothing will change the fact that the people participating in this challenge are not a representative sample of computer science graduates, hence an evaluation of their performance will not really tell much about what kind of performance one should expect from the average computer science graduate.
I will say that one of the guys who tied to win the competition is a 2021 Computer Science grad from Clemson. He's not a brand new grad, but that's the closest sample that I have.
The claim was that all computer science graduates should be able to solve those problems in a few hours each; the counter-claim was that only one percent of them are capable of doing so. The existence of one computer science graduate capable of winning the contest does not really tell us anything, and I think nobody doubted that there are computer science graduates who are easily capable of solving such problems. But the question was, or is: are they the exception or the norm?
The problems are essentially an elaborate IQ test (follow instructions, pay attention to the actual input, do some googling to pick up the occasional polygon or graph algorithm/library). No deep knowledge is required.
You don't need a CS degree for that.
~5% of those who solved the first 2 problems solved all 49 of them. Given that it is a significant time commitment to spend several hours per day every day for 25 days straight, more people could have done it, given the time. These are not some elite self-selected geniuses. A quarter of those who solved the 1st problem haven't solved the 2nd one despite its simplicity.
https://adventofcode.com/2023/stats
This would also have been my opinion: computer science graduates should definitely be able to solve those problems in a few hours. The longest time to first solution this year was something like 15 minutes, an hour or two for 100 correct solutions, so they are certainly not really hard problems.
But personal opinion aside, nobody provided any real evidence that this is indeed the case, and that is all I wanted to point out in my comment.
There's still a huge selection effect. He could have been solving programming puzzles as a hobby for the past fifteen years for all we know, and his degree might have little to do with his AoC skills.
Everybody who participated seemed to enjoy solving the problems. Many of them have said they plan to solve the rest over the next couple of months, just not on the contest timeline.
Unlike Project Euler etc., where one really needs to be good at algorithms/math to make the most efficient code (else it will just run for days brute-forcing), most of Advent of Code can be solved with a terrible algo, as the input data is very small.
I think most CS grad students can solve Advent of Code. Some people probably don't finish it not because it is hard, but because they lose interest.
In my experience solving several years of Advent of Code, this is only true for part 1 of most days. A lot of part 2 solutions rely on heuristics to be solved in reasonable amounts of time.
There are some from 2023 that aren't. Day 12 part 2 at least for me had one input that ended up being 95% or more of all permutations and would have taken at least weeks.
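The usual way around that blow-up is to count arrangements with memoized recursion rather than enumerating them. A sketch of the idea for that day's pattern/run-length matching (the function and variable names are mine; the worked example is the small one from the puzzle statement):

```python
from functools import lru_cache

def count_arrangements(pattern, runs):
    """Count ways to replace '?' with '#'/'.' in pattern so that the
    lengths of the '#'-runs are exactly `runs`."""
    @lru_cache(maxsize=None)
    def go(i, j):
        if i >= len(pattern):                  # consumed the whole pattern:
            return 1 if j == len(runs) else 0  # valid iff all runs placed
        total = 0
        if pattern[i] in ".?":                 # treat this cell as operational
            total += go(i + 1, j)
        if j < len(runs):                      # try to start run j here
            n = runs[j]
            block = pattern[i:i + n]
            if (len(block) == n and "." not in block
                    and (i + n == len(pattern) or pattern[i + n] != "#")):
                total += go(i + n + 1, j + 1)  # run plus one separator cell
        return total
    return go(0, 0)

print(count_arrangements("???.###", (1, 1, 3)))  # → 1
```

Because the cache is keyed only on (position, run index), the state space stays tiny even when the naive enumeration would cover billions of permutations.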
> I don't pay for ChatGPT Plus, I only have a paid API key so I used instead a command line client, chatgpt-cli and manually ran the output programs.
But this is much different, and likely much worse, than paid ChatGPT. Code Interpreter does its own debugging-and-revising loop.
It's bizarre to me that this author wouldn't pay $20 one time to evaluate the higher quality product, the one most people would use if they cared about code quality, am I missing something?
Then somebody could maybe just copy and paste two or three of the failed problems into ChatGPT Plus and ask for some code solving the problems? Should only take a handful of minutes.
I stopped my ChatGPT Plus subscription and replaced it with pure GPT-4 API calls because the "product" built around the API just got dumber and dumber over time.
- ChatGPT pros:
Has some bells and whistles like code interpreter (which I can easily get via Open Interpreter).
Has plugins (although I found web browsing to be inferior compared to Poe/Perplexity).
- Pure GPT-4 API pros:
Is not dumbed down or forced to "forget" things or be lazy in coding.
I use the API either programmatically or through Poe.
Thank you. I can't find any clear comparison between ChatGPT Plus and GPT-4 (through the API) in that article though; it seems to be focused on how to improve results with GPT-3.5? I was expecting some benchmarks.
It's the same line of thinking that results in 90% of stuff on Indie Hackers and by solo devs being for themselves and other solo devs: tech tools that are extremely niche, for a market of people not willing to pay for them.
An extreme blindspot.
Every introverted dev should go outside and meet other people, check out the IRL marketplaces with one coffee for $10; that's the best investment they can make.
You can’t sign up for ChatGPT Plus anymore; they aren’t accepting new subscriptions. You just get added to a waitlist. At least that’s what happened a few months ago when I tried, and I haven’t heard anything from them yet.
That might be wrong; I just did a paid subscription for my dad at Xmas time, pretty sure you can still get paid subscriptions for ChatGPT Plus. We used his Google account for this.
AoC questions this year were _deliberately written_ to be confusing to LLMs; it's not failing because it's worse, it's failing because the questions were written to make it harder for models :]
Edit: apparently not, the author is just really good at coming up with ai adverse puzzles. When testing ChatGPT did much better on last year’s puzzles.
> I don't have a ChatGPT or Bard or whatever account, and I've never even used an LLM to write code or solve a puzzle, so I'm not sure what kinds of puzzles would be good or bad if that were my goal. Fortunately, it's not my goal - my goal is to help people become better programmers, not to create some kind of wacky LLM obstacle course. I'd rather have puzzles that are good for humans than puzzles that are both bad for humans and also somehow make the speed contest LLM-resistant.
> I did the same thing this year that I do every year: I picked 25 puzzle ideas that sounded interesting to me, wrote them up, and then calibrated them based on betatester feedback. If you found a given puzzle easier or harder than you expected, please remember that difficulty is subjective and writing puzzles is tricky.
He may not have consciously written them to confuse LLMs but even without using any he probably knows they get more confused on reasoning tasks stated in not very clear text. At least for a few problem descriptions I couldn't help feeling that the statement was a lot more complex than it could have been and not only because of the story woven around it. Of course it could have been there to confuse the humans :)
Some people have even speculated that the problems this year were deliberately formulated to foil ChatGPT, but Eric actually denied that this is the case.
Quoting the author of AoC:
Here are things LLMs didn't influence: The story. The puzzles. The inputs.
I did the same thing this year that I do every year: I picked 25 puzzle ideas that sounded interesting to me, wrote them up, and then calibrated them based on betatester feedback.
I think you should maybe be less confident in your statements if you actually don't really know. Highlighting _deliberately written_ when it was anything but.. well, that's confident bullshitting, LLM style, ironically.
I thought they were as well, because there were nuances which made them harder for LLMs. When I tried on 1 Dec, GPT-4 couldn't solve day 1 part 2. And I tripped over exactly the same thing; when I parsed it correctly and then explained it to the LLM, it did solve it. But nowhere near as fast as me with my bag of horrible hacky AoC libs...
It's interesting that it turns out this year was not written with GPT in mind.
The article actually links to a Reddit post[1] where the creator of Advent of Code says that 2023's puzzles weren't engineered to be more difficult for LLMs.
It will get incrementally better at being a word calculator and looking up text, but LLMs aren't going to magically gain critical thinking skills or impactful intelligence capabilities.
Then again, I think the opposite is (at this moment) equally unfalsifiable. Humans have on average 86 billion neurons, sperm whales have bigger brains and their average neuron count is 200 billion. African elephants have 257 billion neurons. There are some reasons to believe we are kind of special.
While I also don't think autoregressive LLMs will ever gain sentience, I also don't subscribe to the other side of the argument that it's all a dead end and true(tm) intelligence can never be achieved. Fact of the matter is, we don't know. We don't know why we are sentient, so how can we argue about what may or may not lead to sentience when we have such a limited understanding of the inner workings of current-gen LLMs? A little more humility about what we don't know is in order, from all sides.
My guess is we won't know when we've developed a sentient AI until we're well past the point that it's undeniable, and then in hindsight we might be able to identify where it began to transition from non-sentient to sentient.
Sentience isn't even well-defined and I'm not sure we can even point to any unique quality of any individual human as indicative of sentience. We certainly can't agree on what point an individual human becomes sentient, as evidenced by the abortion debate. At best we seem to have some statistical evidence based on what we've achieved as a species being superior to what oak trees have achieved, but at an individual level, I don't know how to prove to anyone that I'm not just a very advanced LLM.
There is another sense where sentience is just being used to mean "that which is human", and by definition, nothing aside from humans will ever qualify as that.
Sentience is a property of animal life. Electrons, software, rocks, etc. don't possess sentience, so the notion that AI, which is software, may be sentient is nonsense.
This is just extending the "sentience is that which is human" definition to animals.
What about animals makes them sentient? Until we can answer that, people are just going to be talking past each other. Even if you forget about AI, whether animals are sentient, and, if so, which animals are sentient, is a big argument in biology/ethics/law that's been going on for at least half a century. The UK recently passed a law that declares animals to be sentient, but not all invertebrates are considered sentient in that law. Their justification is that they don't contain a central nervous system, but what special property about a CNS confers sentience?
The "LLMs will attain intelligence/consciousness if we just throw enough neurons at the problem" crowd really seems to be engaging in cargo cult thinking to me. Not that I think it would be impossible to make a conscious or intelligent mind using simulated neural networks — I don't think that there's some kind of "special mystical property" about humans like a soul or something — it's just that the structure and training of LLMs is fundamentally wrong for producing intelligence, that's all. It's an easy, simple route that gives you startling results for the first 30% but it's a dead end in the long run for achieving what they want, yet they're insisting that if they just keep pantomiming hard enough real intelligence will come.
Humans weren't trained to imitate a pre-existing body of text. We were trained to survive & reproduce in a competitive, limited-resource environment. This pressured us to develop critical thinking. I see no reason why we should expect the same results from a radically different process.
Correct - there are no examples of evolution in non-living things. There are evolving ways of using things, but things themselves don't evolve, don't think, and don't learn. Uneducated people do assign such attributes to things: when people first saw tractors, planes, or robots, they thought they were the devil coming to grab them. AI doomers, and those who support the notion that these systems are somehow intelligent, do much the same. People used to think computer viruses were alive and had minds of their own - again, similar to thinking tractors are the devil. The same old lack of understanding of technology.
There have been rumours that the current ChatGPT loses money even at $20/month, and that to economise on running costs they've changed to a less capable model.
And if the modern LLMs have already mined everything there is to get out of pirated ebooks and the common crawl dataset - who knows how long it'll take for them to make the next big step forward?
Oh, I absolutely agree that the field isn't stationary. It's moving at a breakneck pace, in fact!
But synthetic training data has its problems. Oh, it's great if you want a limitless supply of templated high school math problems. In other fields, though? If GPT-whatever is a bit unclear on whether Magnus Hirschfeld was a Nazi or a victim of the Nazis, and you use it to generate synthetic training data - you can't expect the student model to know better than the teacher model does.
> you can't expect the student model to know better than the teacher model does.
Why not? People outgrow their teachers all the time, even in fields where there is no new external data, like maths. Often improved understanding comes just from reflecting on an issue from new perspectives, which synthetic data can provide. Perhaps a teacher/student model helps AI develop those new insights, just as it does for us.
If (when taken in aggregate) the training data says Gerald Ford was the 38th President of the United States, and the model disagrees, is that not a deficiency of the model?
Probably, and sometimes teachers are wrong. Also, I don't know who Gerald Ford is but I'll take your word that he's the 38th president of the United States as nobody is here to tell me otherwise ;). It's likely though that some otherwise intelligent and sentient people may disagree on who is the current president of the United States!
I don't believe anybody is suggesting that we exclusively use synthetic data, but rather that synthetic data can augment other types of training. The other thing to consider is that less sophisticated models can be prone to hallucinating nonsense, but the hallucinations are usually inconsistent, whereas truthful responses tend towards consistency in various directions: between each response, internal consistency, and consistency with reality.
It's conceivable that a more sophisticated model would be able to learn a sense of certainty based on the consistency of its training data, much as we do. If you consider your education, I'm guessing you probably had lots of people tell you incompatible nonsense over the years. In my case, I've had teachers give me a ton of inconsistent explanations of how electrons "know" which path to take in a circuit - probably one of my biggest questions since childhood. Only one explanation turned out to be internally consistent and demonstrable with experiments I've seen on YouTube. The result is that I now have, I think, a pretty decent understanding of how electric current forms a path within a circuit, or at least one that can be used to make valid predictions, despite being told a vast amount of inconsistent and wrong information over my life.
There's no way to know if they switched models unless they publicly admit it, because the nature of the tech makes it extremely hard to know why it does any specific thing at all, let alone to debug it to find out why it didn't do the thing one wanted.
ChatGPT discussion here has been totally dominated by this problem from the beginning, but usually it's people raising the bar on what you have to do to get good results from it.
It shows that people fundamentally do not understand the tools they are using.
Given a book of numbers, here are two tasks:
1) copy out the entire book, but replace every prime number with 7.
2) write down the list of prime numbers in the book.
Which one is easier?
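To make the asymmetry concrete, here's a toy version of both tasks (plain Python, nothing LLM-specific): task 1's output must faithfully reproduce every number in the book, while task 2's output only has to contain the primes.

```python
def is_prime(n: int) -> bool:
    # Trial division is plenty for a toy example.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

book = [4, 7, 10, 13, 20, 29, 30]

# Task 1: copy the whole book, replacing every prime with 7.
task1 = [7 if is_prime(n) else n for n in book]

# Task 2: just write down the list of primes.
task2 = [n for n in book if is_prime(n)]
```

Task 1's output is exactly as long as the input, and a single mistranscribed non-prime corrupts the copy; task 2 carries no such copying burden. That's the position an LLM is in when asked to echo back code with a few small fixes.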
LLMs have to generate tokens one at a time, and it's very difficult for them to reproduce a long stream of input tokens perfectly while changing only a select few.
Since the sampler is almost certainly randomising token selection to some degree (that's what temperature controls), you're also asking for deterministic output from a random process.
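For reference, a minimal sketch of what temperature does during sampling (a toy softmax sampler, not any real model's API): logits are divided by the temperature before sampling, so T = 0 degenerates to greedy, deterministic decoding, while larger T injects more randomness.

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    if temperature == 0:
        # Greedy decoding: always pick the highest-logit token.
        return max(logits, key=logits.get)
    # Scale logits by 1/T, then apply a numerically stable softmax.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())
    weights = {tok: math.exp(l - m) for tok, l in scaled.items()}
    # Sample one token proportionally to its softmax weight.
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # guard against floating-point rounding
```

The point being: at any temperature above zero, the exact same prompt can yield different tokens on different runs, which is exactly what you don't want when asking for a verbatim copy.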
TLDR: ask LLMs what is wrong with the code.
Ask for a diff.
Don’t ask for an LLM to refactor, bug fix or annotate code…
That’s extremely naive usage.
Back to my stupid analogy: “please copy out this book, but fix the numbers which are ‘wrong’”
I can hardly complain when I get terrible results can I?
If you don't understand that an LLM generates output token by token, and that randomizing the token output probabilities means you cannot expect an error-free copy of the input, then you've invested so little time in understanding the tool as to be farcical.
There's 'wow, these are complicated and I don't fully understand them'
...and there's, 'What this. It shiny. Not worky. Make some random change to prompt and pray to LLM gods'
I think you're being unfairly downvoted. People generally are terrible at using tools, especially tools that they perceive as "black boxes".
I've seen an IT professional type this into Google: "Why did my PC crash?"
I couldn't believe that after two decades of using web search technology, he still hadn't figured out how to extract value from a text index using specific and relevant key words.
Similarly, very few developers know what a database index does or whether they need one or not. My pet theory is that NoSQL databases became popular because many of them automatically index every column, making it feel like a magic black box instead of an evil black box.
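(As a concrete aside: in SQLite, for instance, you can watch the query plan flip from a full table scan to an index search the moment you add an index. A sketch, with made-up table and index names:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}@example.com") for i in range(1000)])

query = "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?"
# Fourth column of the plan row is the human-readable detail string.
before = conn.execute(query, ("user500@example.com",)).fetchone()[3]

conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = conn.execute(query, ("user500@example.com",)).fetchone()[3]

print(before)  # a full table scan of users
print(after)   # a search using idx_users_email
```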
LLMs are not only black boxes, but they're soooo fundamentally different and new that a lot of people are really struggling to wrap their brains around it.
Just in this thread, today, there are people that are complaining about the direct equivalent of "my poorly thought out Google search didn't work, so Google is bad."
Here's a re-wording of our conversation without that; you decide how you want to take it.
me: I am sad because people are clearly using these tools without understanding them; here is a specific example and reason of why what they're trying to do doesn't work.
you: these tools cannot be understood.
me: not only is that literally false, it's obviously and self-evidently false, and I just gave you a specific example of how; I can't take anything you say as being in good faith when you believe that anything to do with AI is literally unfathomable, and you can't even be bothered responding to what I actually wrote.
Maybe, looking into it, you would find that it's not nearly as complicated or difficult as you imagine.
My impression of the 2023 AoC was that Wastl has spent considerable effort on making it less LLM-friendly this year. Some days, this seems to have been done by adding extra conditions and complications which make the task more difficult for LLMs to parse. Other tasks required studying the input data which is difficult to achieve with an unsupervised LLM. Finally, the first couple of days seemed a lot more difficult this year than previous years, possibly to deter chatgpt users from filling up the leaderboard right away (though this year December started with a weekend which could also be a contributing factor).
I also felt there were more problems than usual this year that could not easily be solved without looking at the input for special cases not alluded to in the problem descriptions. (As someone who has solved all 25 for the past 3 years).
An extreme example was this year's day 20 circuit-simulating problem, which was made far easier by having the given circuit split up into a few independent chunks that are only connected at the start + end. (I suspect it might be NP-complete without this feature)
It's a slightly different kind of problem solving to think "what makes this particular input easier than the general version of this problem", and one that I'd naively assume LLMs are less skilled at.
There have been quite a few of these in the past. Some of the 2018 problems (e.g. day 21 [1]) required quite a lot of reverse engineering of programs in a custom instruction set.
> considerable effort on making it less LLM-friendly this year
Wastl himself denied that this is the case[1]. This is a lie.
> Other tasks required studying the input data
That has always been the case for AoC; how is it different from other years?
We have a clear example of a set of tasks that state-of-the-art LLMs cannot perform. We are doing science, for once. Why do we need to get into full conspiracy mode?
I apologize. It was simply my subjective impression of the AoC this year, not a statement of objective fact. I didn't intend to insinuate that Wastl is a liar.