razorbeamz 1 day ago [-]
LLMs are not the ideal tool for this job, because LLMs cannot do math or count.
bijowo1676 3 hours ago [-]
LLMs beat humans at generating code (and fixing broken code) and letting the CPU execute it.
vitally3643 14 hours ago [-]
Most human programmers are also fantastically bad at math.
razorbeamz 2 hours ago [-]
True but irrelevant.
This 8-track duplication puzzle is a problem of math.
in-silico 9 hours ago [-]
> LLMs cannot do math
This is plainly not true anymore.
razorbeamz 2 hours ago [-]
No, they fundamentally cannot do math. They are next token predictors, not calculators.
ithkuil 21 hours ago [-]
But an LLM can write code that can do math and count. Tool use, more broadly, has proven to be a very powerful way to let LLMs do what they're good at (handling the fuzzy and imprecise nuances of natural language, which includes scooping up a lot of context) and delegate the things they're not good at to external tools, some of which they can write on the spot.
If you think about it, we humans do that all the time too.
I'm crap at 4-digit multiplication in my head, but I have no problem doing it with pencil and paper.
razorbeamz 2 hours ago [-]
> But LLM can write code that can do math and count.
They cannot, however, execute that code. They can feed that code into an external program they've been given access to, but they can't execute it themselves.
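To make the split concrete, here is a rough sketch of what that external step might look like. It is only an illustration: `generated_code` is a hypothetical stand-in for text a model might emit (no model is called), and `run_generated` is a made-up helper name, not any vendor's tool API.

```python
import subprocess
import sys

# Hypothetical stand-in for model output; nothing here calls an actual LLM.
generated_code = "print(1234 * 5678)"

def run_generated(code: str, timeout: float = 10.0) -> str:
    """Run a code string in a separate Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # the interpreter does the arithmetic
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,                    # raise if the child process fails
    )
    return result.stdout.strip()

answer = run_generated(generated_code)
print(answer)  # -> 7006652
```

The model's contribution ends at the string; the number comes from the interpreter process.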
camkego 21 hours ago [-]
Thanks to this article, I learned that the '70s Sticky Finger jeans brand was named after a Rolling Stones album. Don't know how I missed that all this time.
bijowo1676 3 hours ago [-]
There is no secret to how LLMs did it.
I asked an LLM, and what it did was try a DP solution, fail to find one, then launch a brute-force solution. Everything in Python. So what the LLM really did was apply SoTA human knowledge, including solving the problem with code.
Here are the LLM's "thoughts" on the example in the article:
________
I need to distribute 11 tracks across four programs on an 8-track cartridge. Every program must play for the exact same length of tape. Shorter programs waste the remaining silence. The goal is to minimize the longest program length.
________
The search found a best maximum program length of 613 seconds.
The dynamic programming solver claimed feasibility for 601, yet the search returned zero solutions.
A discrepancy appeared between the binary search result and the partition check.
The binary search lower bound was set to the longest track instead of the average length. This caused the search to miss the true minimum of 613.
Reconstruction failed — running exhaustive search for the true optimum.
________
The optimal assignment has been confirmed.
Program 3 is the longest at 613 seconds, which sets the tape length. Program 4 is the shortest at 573 seconds, leaving 40 seconds of wasted silence. The other two programs are 607 and 610 seconds. A perfect 601-second maximum is impossible because the tracks cannot be split.
This was the code used by the LLM:
python3 << 'EOF'
# Track number -> duration in seconds
tracks = {1:174,2:238,3:195,4:194,5:220,6:203,7:258,8:135,9:213,10:271,11:302}
def fmt(s): return f"{s//60}:{s%60:02d}"
items = list(tracks.items())
all_best = []
def search(i, bins, progs, best_max):
    # All tracks placed: keep the assignment if it hits the target maximum
    if i == len(items):
        if max(bins) == best_max:
            all_best.append([sorted(p) for p in progs])
        return
    # Prune: some program is already longer than the target
    if max(bins) > best_max:
        return
    tid, dur = items[i]
    tried = set()
    for pi in range(4):
        # Skip programs with a load already tried at this depth (symmetry)
        if bins[pi] in tried: continue
        tried.add(bins[pi])
        bins[pi] += dur
        progs[pi].append(tid)
        search(i+1, bins, progs, best_max)
        progs[pi].pop(); bins[pi] -= dur
search(0, [0,0,0,0], [[],[],[],[]], 613)
# dedupe
seen = set()
unique = []
for sol in all_best:
    key = tuple(sorted(tuple(p) for p in sol))
    if key not in seen:
        seen.add(key)
        unique.append(sol)
print(f"All {len(unique)} distinct optimal solutions at 10:13:")
for sol in unique[:5]:
    sums = [sum(tracks[t] for t in p) for p in sol]
    print(f" {sol} -> {[fmt(s) for s in sums]}")
EOF
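The numbers in those "thoughts" can be sanity-checked without any search. A minimal sketch, assuming the durations in the script above are correct: the lower bound is the larger of the longest single track and the rounded-up average per program. The `candidate` assignment is a hand-built one that happens to match the quoted program lengths; it is not necessarily what the LLM printed, and checking it does not by itself prove that 613 is optimal.

```python
from math import ceil

# Track durations in seconds, copied from the script above.
tracks = {1: 174, 2: 238, 3: 195, 4: 194, 5: 220, 6: 203,
          7: 258, 8: 135, 9: 213, 10: 271, 11: 302}
PROGRAMS = 4

total = sum(tracks.values())
# The longest program must hold the longest track, and it cannot be
# shorter than the average program length, so take the max of the two.
lower_bound = max(max(tracks.values()), ceil(total / PROGRAMS))

def program_lengths(assignment):
    """Check every track is used exactly once, then return per-program sums."""
    used = sorted(t for prog in assignment for t in prog)
    assert used == sorted(tracks), "each track must appear exactly once"
    return [sum(tracks[t] for t in prog) for prog in assignment]

# Hypothetical assignment, built by hand for illustration only.
candidate = [[1, 2, 3], [4, 6, 9], [5, 7, 8], [10, 11]]
lengths = program_lengths(candidate)

print(total, lower_bound)     # -> 2403 601
print(lengths, max(lengths))  # -> [607, 610, 613, 573] 613
```

So 601 is only a floor from the averages; whether anything between 601 and 612 is reachable is what the exhaustive search has to settle.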