Hacker News | markbergz's comments

For anyone interested in these LLM pairwise sorting problems, check out this paper: https://arxiv.org/abs/2306.17563

The authors discuss the person 1 / doc 1 bias and the need to always evaluate each pair of items twice.
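To make the "evaluate each pair twice" idea concrete, here is a minimal sketch. The `judge` callable is a hypothetical stand-in for an LLM call that returns True when it prefers whichever item is shown first; it is not the paper's code or the llm-sort API. A verdict only counts if it survives swapping the positions.

```python
from typing import Callable

# Hypothetical judge: returns True if the item shown FIRST is preferred.
# In practice this would wrap an LLM call; here it is just a callable.
Judge = Callable[[str, str], bool]

def compare_both_orders(a: str, b: str, judge: Judge) -> int:
    """Return 1 if a wins, -1 if b wins, 0 if the verdict flips with position."""
    a_wins_when_first = judge(a, b)        # a shown in position 1
    a_wins_when_second = not judge(b, a)   # a shown in position 2
    if a_wins_when_first and a_wins_when_second:
        return 1
    if not a_wins_when_first and not a_wins_when_second:
        return -1
    return 0  # the two orderings disagree: treat as a tie

def always_first(x: str, y: str) -> bool:
    """A maximally position-biased judge, for illustration only."""
    return True

def longer_wins(x: str, y: str) -> bool:
    """A position-independent toy judge, for illustration only."""
    return len(x) > len(y)
```

With `always_first`, every comparison comes back as a tie, which is the point: a purely positional preference cancels out instead of deciding the order.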

If you want to play around with this method there is a nice python tool here: https://github.com/vagos/llm-sort


The paper basically boils down to suggesting (and analyzing) these options:

* Comparing all possible pair permutations eliminates any bias since all pairs are compared both ways, but is exceedingly computationally expensive.
* Using a sorting algorithm such as Quicksort and Heapsort is more computationally efficient, and in practice doesn't seem to suffer much from bias.
* Sliding window sorting has the lowest computation requirement, but is mildly biased.
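A rough sketch of the three options, to show where the cost difference comes from. This is my own simplification, not the paper's exact procedure: `judge(first, second)` is a hypothetical callable that returns True when the item shown first is preferred, the sort-based variant uses Python's built-in sort (Timsort) rather than Quicksort or Heapsort, and the sliding window is reduced to a single bubble-style pass. Items are assumed to be unique and hashable.

```python
from functools import cmp_to_key
from itertools import permutations

def rank_all_pairs(items, judge):
    """n*(n-1) judge calls: every ordered pair is judged, so each pair is seen both ways."""
    wins = {item: 0 for item in items}
    for first, second in permutations(items, 2):
        if judge(first, second):
            wins[first] += 1
        else:
            wins[second] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)

def rank_by_sorting(items, judge):
    """Roughly n*log(n) judge calls: the judge acts as the comparator of a standard sort."""
    def cmp(a, b):
        # Negative means "a goes before b", i.e. a is preferred.
        return -1 if judge(a, b) else 1
    return sorted(items, key=cmp_to_key(cmp))

def sliding_window_pass(items, judge):
    """n-1 judge calls: one back-to-front pass that floats the best item to the front."""
    ranked = list(items)
    for i in range(len(ranked) - 1, 0, -1):
        if judge(ranked[i], ranked[i - 1]):
            ranked[i], ranked[i - 1] = ranked[i - 1], ranked[i]
    return ranked

def higher_is_better(first, second):
    """Toy judge for illustration: prefers the larger number."""
    return first > second
```

Only the first function sees each pair in both orders. The other two judge each pair in whatever order the algorithm happens to present it, which is where the residual position bias comes from.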

The paper doesn't seem to do any exploration of the prompt and whether it has any impact on the input ordering bias. I think that would be nice to know. Maybe assigning the options random names instead of ordinals would reduce the bias. That said, I doubt there's some magic prompt that will reduce the bias to 0. So we're definitely stuck with the options above until the LLM itself gets debiased correctly.
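For what it's worth, the random-names idea could be tried with something like the sketch below. Everything here is an assumption for illustration: the prompt wording, the `build_prompt` name, and the four-letter labels are made up and untested against any model. It only removes the ordinal labels; one passage still has to appear first in the text.

```python
import random
import string

def build_prompt(query, passage_a, passage_b, rng=None):
    """Build a pairwise prompt that names the passages with random labels, not ordinals.

    Returns the prompt text and a mapping from label back to passage.
    """
    rng = rng or random.Random()
    # Sample 8 distinct letters so the two 4-letter labels can never collide.
    letters = rng.sample(string.ascii_uppercase, 8)
    label_a, label_b = "".join(letters[:4]), "".join(letters[4:])
    prompt = (
        f"Query: {query}\n\n"
        f"Passage {label_a}: {passage_a}\n\n"
        f"Passage {label_b}: {passage_b}\n\n"
        f"Which passage is more relevant to the query? "
        f"Answer with {label_a} or {label_b}."
    )
    return prompt, {label_a: passage_a, label_b: passage_b}
```

The returned mapping lets you translate the model's answer back to the original passage.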


Thank you for writing this up, I'm glad I'm not the only one with the Android issues. Being able to put any software on it is nice, but it comes with a cost. The default Android jank along with the custom home screen makes the product difficult to use. I only wanted to read ebooks on the device, but even that experience is not great, and the reader app that comes with the device seems to be limited to PDFs. Additionally, there is zero hand-holding during setup. The packaging is top notch, but the QR code with instructions was hidden at the bottom under the device. I did not notice it until days later.

That being said, the screen technology is amazing and I hope they're able to continue the business. Unfortunately the bar for products is very high now but I think they have something here.

