As a few people have asked for screenshots, I spun it up. Here's a video of the basic gameplay: https://peterc.org/misc/fpscob.mp4 .. it's clunky, but it does play.
I'm not getting anywhere near the speeds advertised on my 3090 Ti, alas, but it's fun watching it "fill out" its answers. I did Simon's "SVG pelican on a bicycle" test on it and the result was quite minimalistic but fit the brief: https://gist.github.com/peterc/7672e74ec1437945e5fca5ce2c1c9... -- this was on the Q4 quant running on patched llama.cpp. I will be interested to see if Simon's looks much different.
Yeah, the patched llama.cpp. The reason is I saw that using the Q4 quant on vLLM is discouraged and the int8 won't fit on my 3090 Ti, but I could certainly give it a go. I also skipped Transformers as it needs to download the full weights and quantize them locally and I didn't fancy waiting for a 50GB download.
Stuttering John used to do this back on Howard Stern by asking celebrities questions that were far out of the expected gamut at red carpet events. This was all for shock/comedy value, but "who are you and what makes you famous" type questions can really throw celebs off script: https://www.youtube.com/watch?v=8P0hENpnMXk
I'm not the OP and I imagine all cases are different, but my dad was a software developer who had early cognitive decline in his 60s (he died of vascular dementia recently) and he used to talk about it a lot. He said it was like his tolerance for complexity kept closing in.
Where he could once hold an entire system and its details in his head (almost an essential skill in the 80s/90s), he could only instead focus on smaller pieces at a time. Any new tooling or approaches that came along, he was fascinated to hear about them, but no longer felt able to pick them up. He could still solve algorithmic problems and debug "in the small", but it was like he had to do math on a Post-it note where once he had a huge sheet of paper.
Its image processing is terrible. I ran several tests comparing it against Qwen 3.5 0.8b (yes, 7% the size) and Qwen beat it every time, with Gemma often getting things entirely wrong. I even gave it a plain image saying "This is a test" and it thought for 6 minutes trying to analyze it and failed. Qwen 3.5 0.8b confidently got it in under a second.
It may be that the Q6 quant I got is borked (or my LM Studio is), but either way, the 0.8b's performance is mind boggling in comparison.
For Qwen 3.5 0.8B presumably you're running it unquantized, because it's so small. Get at least the Q8 of Gemma 4 12B with the F32 mmproj and use an f16 kv cache.
Then run it with the latest llama.cpp that contains the Gemma 4 12B unified bug fixes, using --image-min-tokens 560 --image-max-tokens 2240 --batch-size 4096 --ubatch-size 4096 --temp 1.0 --top-p 0.95 --top-k 64 --jinja
It's understanding far more complex things for me and can reliably handle tiny text, so it should be easily understanding an image that only contains the text "This is a test".
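For anyone who wants to script that, here is a rough sketch that assembles the flags above into a single command line. The sampling, batch and image-token flags are copied verbatim from the comment; the `llama-server` binary name, the `-m` / `--mmproj` / `--cache-type-k` / `--cache-type-v` options and both file names are my assumptions about how you would wire up the Q8 model, the F32 mmproj and the f16 kv cache, so adjust them to your own build and downloads.

```python
import shlex

# Hypothetical file names: substitute the GGUF files you actually downloaded.
MODEL_PATH = "gemma-4-12b-Q8_0.gguf"
MMPROJ_PATH = "mmproj-F32.gguf"


def build_command(model=MODEL_PATH, mmproj=MMPROJ_PATH, binary="llama-server"):
    """Return an argv list combining the settings quoted in the comment above."""
    return [
        binary,                      # assumed binary name
        "-m", model,                 # assumed: path to the Q8 model
        "--mmproj", mmproj,          # assumed: path to the F32 mmproj
        "--cache-type-k", "f16",     # assumed spelling of the f16 kv cache setting
        "--cache-type-v", "f16",
        # Everything below is taken verbatim from the comment.
        "--image-min-tokens", "560",
        "--image-max-tokens", "2240",
        "--batch-size", "4096",
        "--ubatch-size", "4096",
        "--temp", "1.0",
        "--top-p", "0.95",
        "--top-k", "64",
        "--jinja",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line you can paste into a terminal.
    print(shlex.join(build_command()))
```

Keeping the arguments in a list rather than one long string makes it easy to swap a single value (say, the image token limits) when comparing runs.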
I guess Google implements more / stronger guard rails than Alibaba and thus confuses these small models. At least this was my impression with Gemma3 models where it often said that the image contains some nudity / sex scenes and therefore it cannot give a description of the image. Never understood the point of this behavior....
The biggest problem with all the Google models has always been RLHF, particularly safety training. They take a good, smart model and make it behave like a corporate person who has been to far too many forced anti-{sexism, racism...} seminars, so that it is now living in fear of saying something that could be construed as wrong by some moral standard.
The Qwen series adopted vision wayyy earlier than anyone else. No idea why the other labs were sleeping on it but they had about 2 years of experimentation without any competition.
Can't speak for browser demos, but I just got the ternary model working on my M5 generating images. The 1 bit didn't work, as it has a known bug with Xcode 24.5 and I wasn't in the mood for installing 24.4 alongside.
This year seems to be turning a bit of a corner. Of the top box office movies so far this year there's Michael, Project Hail Mary, Hoppers, Wuthering Heights, GOAT.. with Obsession and Backrooms rapidly rising.
Last year it was basically F1 and Minecraft (and while not sequels, both are arguably well known "franchises" outside of movies - but I guess MJ and Wuthering Heights are too ;-)).
Wuthering Heights was a remake, and Hail Mary was also a safeish bet since it's a novel by the same guy as The Martian.
Not to say that it isn't an improvement, but we're still pretty far from seeing American cinema catching up to the world stage in originality, let alone to the golden Hollywood era.
I remember reading Project Hail Mary years before the movie was announced, and thinking "this is written, if not exactly as a screenplay, in such a way as to make it SO easy to adapt to a screenplay that, given this is from the Martian author, there is no way this will not be made as a movie"
Yeah, a lot of authors nowadays just write screenplays, either thinking about licensing or just by being influenced by TV. Sanderson and Abercrombie come to mind as other authors who basically have action scenes and movie cuts baked into the books.
At least Hail Mary was an original IP with no built-in sequel opportunity. These days, I'd be happy if more major studio, big budget releases were adapted from original IP books.
Sadly, I heard that the studio is apparently trying to figure out how to make a Hail Mary sequel (sigh).
Agreed, I've already been to the movies twice this year (PHM and Backrooms); usually it's maybe once every other year for some one-day anime movie airing. Really enjoyed both of them. Just as with PHM, I think Backrooms is best viewed on a big screen.
There are some older studies that showed nicotine (not via smoking, which makes sleep apnea worse) had a positive effect on sleep apnea as well due to the stimulation increasing muscular activity in the airways. Obviously lots of downsides too so it never caught on, but this seems to have a similar mechanism.