
Could someone who can parse (reverse engineer?) this write up an explanation? (if you exist)


I'll give this a shot. I'll try to explain what's in my mind as I read it as well.

First, get out the reference manual: http://kparc.com/k.txt and we'll do the first couple lines.

The sequence that goes f x applies f to x. This f is unary.

The sequence that goes x f y applies f to x and y. This f is binary (and just labelled verb).

Some things (adverbs) go f a x and apply f in some special way to x.

Last hint: You read this code from left to right (like English). Do not scan it.

Now let's dive in:

    c::a$"\n"
This says: c is a view of where a is a newline. That is, if "a" contains a file, then c is the offsets of each newline.

A view is a concept in k that is unusual in other languages: When "a" gets updated, then "c" will automatically contain the new values. This is similar to how a cell in Excel can refer to other cells and reflect whatever the value of those cells happens to be.

    b::0,1+c
This says: b is a view of zero join one plus c. That is, where "c" contains the offsets of each newline, 1+c would contain the beginning of each new line. We join zero to the front because, of course, the beginning of the file is also the beginning of a line.

    d::(#c),|/-':b
This says: d is a view of the (count of c) joined with the max reduce of the each-prior differences of b. That sounds like a lot, but the each-prior difference of b (where b is the position of all the line starts) is also the length of each line, and the max reduce of that is the longest line. You might say that "d" is the dimensions of the file.
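To make those three views concrete, here is a one-shot Ruby rendering. Plain variables, so unlike k views they will not update when a changes, and the sample string is mine:

```ruby
a = "hello\nworld!\nk\n"

# c::a$"\n"  -- offsets of each newline in a
c = (0...a.size).select { |i| a[i] == "\n" }    # => [5, 12, 14]

# b::0,1+c   -- offset of the beginning of each line
b = [0] + c.map { |i| i + 1 }                   # => [0, 6, 13, 15]

# d::(#c),|/-':b -- line count joined with the max difference
# between consecutive line starts
d = [c.size, b.each_cons(2).map { |p, q| q - p }.max]   # => [3, 7]
```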

    i::x,j-b x:b'j
This says: i is a view of x (?) joined with j (?) minus b (the offset of the beginning of each line) applied to each x which is defined as the bin of j in b.

j hasn't been defined yet, but you can see that x is defined later in the definition, and used to the left. This is because K (like APL) executes from right to left, just like other programming languages: you write a(b(c(d()))) in C and you're executing the rightmost code first (d), then applying that leftwards. We can do this with anything, including definitions.

The other strange thing is that we know that b is the offset of the beginning of each line, and yet we're applying something to it. This is because k does not distinguish between function application and array indexing. Think of all the times you write in JavaScript x.map(function(a) {return a[whatever] }) when you'd really just like to write x[whatever] -- k lets you do this. It's a very powerful concept.

On that subject of binning: b'j is going to find the index of the last value of b that is less than or equal to j. Since we remember that b is the offset of the beginning of each line, then if j is an offset in the file, this will tell us which line it is on(!)
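A sketch of that bin lookup in Ruby (bin is my name for it, not anything from the k source):

```ruby
# b'j -- index of the last element of b that is <= j; with b holding
# line-start offsets, this maps a file offset to its line number.
def bin(b, j)
  b.rindex { |start| start <= j }
end

b = [0, 6, 13, 15]   # line starts for "hello\nworld!\nk\n"
bin(b, 0)    # => 0  (offset 0 is on the first line)
bin(b, 8)    # => 1  (offset 8 is inside "world!")
bin(b, 14)   # => 2
```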

But we don't understand what j is yet; it's the next definition:

    j::*|k
This says: j is a view of the last (first reverse) of k. We don't know what k is yet.

    f:""
This says: f is defined as an empty string.

    g::a$f
This says: g is a view of the offsets of f (an empty string) in a. Initially this will be null, but other code will assign to f later making g a value.

Next line.

    s::i&s|i-w-2
This is very straightforward; on numbers, & is min (and) and | is max (or). While Excel doesn't let us use a cell in its own definition, k does: it means the old value of s. So this is literally: the new view of s is i min (the old s max (i minus w (?) minus 2)).

We don't know what w is yet.

    S:{s::0|x&d-w}
This is a function (lambda). x is the first argument. If we called S[5] it would be the same as setting s to the max of zero and the min of 5 and d (dimensions) minus w. Double-colon changes meaning here; it no longer means view, but set-global.

    px:{S s+w*x}
This requires some knowledge of g/z: http://kparc.com/z.txt

Note that px is defining a callback for when the pageup/pagedown routines are called. x will be 1 if we page down and -1 if we page up. It may now become clear that S is a setter of s that checks it somehow. When we understand what w is (later in the program) it will be absolutely clear, but pageup/pagedown are changing s by negative w when pageup, and positive w when pagedown.

    wx:{S s+4*x}
Consulting the g/z documentation, we can see this has to do with wheels. Note we modify s again relative to 4 times x; x is -1 when wheelup and 1 when wheeldown. It becomes clear that s is modified by the pageup/pagedown and the mouse wheel.
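Putting S, px, and wx together, here is a hedged Ruby reading. Treating s as the scroll offset, d as the line count, and w as the window height is my interpretation; the k source never says so explicitly:

```ruby
# S:{s::0|x&d-w} -- clamp the requested offset into the range [0, d-w]
def clamp_scroll(x, d, w)
  [0, [x, d - w].min].max
end

d = 100   # total lines (illustrative values)
w = 20    # window height
s = 10    # current scroll offset

s = clamp_scroll(s + w * 1, d, w)    # px with x=1: page down by a window
s = clamp_scroll(s + 4 * -1, d, w)   # wx with x=-1: wheel up by 4 lines
```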

    cc:{9'*k_a}
Again: in the documentation, 9' stashes something in the clipboard. cc is a function that takes the first slice of offsets k (we still don't understand) in a (the file). The g/z documentation says that cc is the callback for control-C. This is expected as control-C traditionally copies things to the clipboard. Since the slice of offsets k in a are being saved in the clipboard, we may guess at this point that k will contain the current selection.

This process is time consuming, but it is to be expected: Learning english took a while at first, and often required consulting various dictionaries. Eventually you got better at it and could read somewhat quickly.

I don't know if you want to try the next few lines yourself to see if you can get a feel for it, or if you want to try one of the less dense examples:

* http://www.kparc.com/$/view.k
* http://www.kparc.com/$/edit.k

... or if you want me to keep going like this, or if you want to ask a few questions. What are your thoughts?


I have one question. I learned some J some time ago, but never really talked about it with anyone, and so my programs - a few lines' scripts, really - were always written with long, meaningful variable names. I read your explanation and every time you wrote "we don't know what it is yet" I wondered "why the heck isn't it just appropriately named?". I mean, why is 'c' better than something like 'nl_pos' for example? I get it that reading J, K or APL programs requires some serious work and I'm ok with that, but why would I need to burden my short term memory with one- or two-letter identifiers on top of that?

This is an honest question and I feel like there is some upside to those names that I just keep missing. As I said, I'm not fluent in J, but while learning it I wrote and read quite a bit of it, and I only made it through some longer (like, longer than half a line!) examples thanks to a sheet of paper and sheer determination. I was often going through a fairly complicated expression and starting to see what it was about, only to be stopped by an 'x' or 'c': I then had to go back a couple of lines, read the definition of 'x' again, and retry parsing that line from the beginning, hoping that I would remember what 'x' was this time. I started taking notes for this reason (it worked quite well I think).

Anyway, you seem to have no problems reading such code, so I figured I'd ask you: why and what is this style of naming good for, and what one needs to do to master it?


j is abstract. It seems like it should be easier if it is concrete, but I find that this tends to lead to assumptions and encourages scanning instead of reading.

When I sit down to a program, I have an idea about what I want to do. I don't have home/end keys on my keyboard and I note that pressing Fn-Left and Fn-Right is annoying because I don't usually push them (I use emacs sometimes). So when I sit down to edit I want to add emacs keys.

I can see at http://kparc.com/z.txt that hx handles home/end, and that cX is control-X, so I think it should be something like:

    ca:{hx 0 -1};ce:{hx 0 1}
Now I look at the characters spent: Is it possible I could say this simpler? I don't think so, but I'd love someone to tell me.

How about delete? I think what I want to do is select the next character and remove it. So where is the cursor? Well, I remembered that hx moves the cursor, so I read its definition:

    hx:{L i+d*x}
Okay, so I remember d is "dimensions" and "i" is something, and L I haven't seen yet. So I go look at L:

    L:{K@B/0|x&d-1}
Now I have two new words: K and B to look up:

    B:{c[x]&y+b x}
    K:{J(;|\(*k),)[H]x}
Okay, B is straightforward: b is indexed by line, so b x returns the offset of line x. We add this to y (which is the second value of |x&d-1) and take the min of it and the offset for c. I might think at this point B converts the x/y coordinate into a linear offset. But what about K?
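Under that reading, B can be sketched in Ruby like this (to_offset is my name; b and c are the line-start and newline offsets in the same shape as before):

```ruby
# B:{c[x]&y+b x} -- convert a (line, column) pair to a linear file
# offset, clamped so the column cannot run past that line's newline.
def to_offset(b, c, line, col)
  [c[line], col + b[line]].min
end

b = [0, 6, 13, 15]   # line starts for "hello\nworld!\nk\n"
c = [5, 12, 14]      # newline offsets
to_offset(b, c, 1, 3)    # => 9, the "l" in "world!"
to_offset(b, c, 1, 99)   # => 12, clamped to that line's newline
```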

K is using J and H (but it turns out we only need J):

    J:{k::2#0|x&-1+#a}
And here we see the definition of k: x&-1+#a is the min of x and the last valid index into the file (a), and the max of that and zero (0|). This reads like "bounds check x to the shape of a". Taking 2 from that simply takes two values and assigns them to k, so if x is only one value, k will still have two.

k is the cursor selection.

From here, I can make an attempt at implementing delete. My first attempt looked something like:

    cd:{a::((0,k)_a),(((k+1),(#a)-(k+1)))_a}
Oh! But I've seen a lot of these terms before! Maybe I can do better. I note that kx is documented as the callback for keystrokes http://kparc.com/z.txt and can come up with:

    cd:{K j+!2;kx""}
This seems about as simple as I can make it. I'd like to see someone do better, but knowing that K is a setter for the cursor selection and j is the offset of the cursor, then j+!2 simply returns offset and offset+1; K will select it and kx"" will remove it.
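In plain Ruby, the same delete-by-selection idea looks roughly like this (a and j stand in for the k globals of the same names; the string surgery stands in for what K and kx do inside the editor):

```ruby
a = "hello\nworld\n"
j = 4                        # cursor offset, sitting on the "o"

sel = [j, j + 1]             # K j+!2 -- select the range [j, j+1)
a = a[0...sel[0]] + a[sel[1]..-1]   # kx"" -- "type" nothing over it
```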

Now maybe if k were called cursorSelection I might have gotten my first cd faster, but I would've missed the opportunity to see how these functions and variables were interconnected and I might not have written the second one.

I noticed, however, that not scrolling really helped this exercise. I just moved my eyes around, and jotted a few symbols down on my notepad. I feel like if I had to scroll or switch windows I would not have been able to do this.

As for your last question: What does one need to master it? For now, I would say practice. Try writing a program in as dense a manner as possible. Remove as much redundancy as you can. Read it and re-read it until you feel like it can be no more abstract.

I am working on a much better answer, but I hope that one will do for now.


My guess is that the language's "day job" uses many similar-but-ever-so-slightly-different intermediate variables which are difficult to descriptively name in any manner that facilitates understanding faster than just re-reading the definition. Long descriptive names have a concrete cost: they impair your ability to recognize visual patterns / turn common compositions into "pictographs" and they don't refine well: instead of gaining an index and adding a bit to the definition the author has to come up with a bundle of new similar-but-slightly-and-meaningfully-different names from which someone else (including their future self) will be able to reverse-engineer the definition.

The shortcut of using descriptive names just doesn't have the same ROI in all kinds of code. The first time I was forced to abandon my descriptive-naming ways was when I started writing finite element solvers. I don't think it's much of a stretch to believe that (some areas of) finance have the same cost/benefit profile. Since this is a desktop programming example it's relatively easy to come up with good descriptive names, so the short names are almost certainly a holdover.

Or it's just a macho thing. There's enough pixie dust floating around this press release that I wouldn't be surprised.


> why and what is this style of naming good for, and what one needs to do to master it?

It's bullshit. The letters are not letters, they're symbols. Replace them with JPGs of pokemon if you want. It's just as well if they were ancient hieroglyphs. If you forgot what they all meant, you have to read the code and keep track of what variable name contains what. Or make a note on some paper.

This is why one-letter variable names are an antipattern: when there's no rhyme or reason to what value is associated with what symbol (why is `c` the index of the newline and not `n`?), your brain isn't going to remember it, and you've instantly forgotten what that code does. And everyone after you needs to do the same thing: you need to re-remember how each line of code works each time you work with it. And more importantly, looking at one dense line of code provides no context as to what the code around it does, compared to C or Python where one function can give you a decent idea of what its use is.

It's even more of a nonsense issue because the actual name of the variable is just a number: every one-letter variable name maps to an integer. Why not support proper identifiers and replace them with unique integers at runtime? The same goes for whitespace: it doesn't need to be kept in the program at runtime, but makes the application infinitely more readable because units of logic can be grouped on their own line.

So to answer your question: no, this is not good for anything and no, you shouldn't try to master it because there are much more useful things that you could be doing with your life.


One reason for one-character names is the habit of these languages of sidestepping the naming problem. Naming things is one of the hardest problems in software. APL programs are ideally read as "variable X is ...", not "do this, then this, then this" - so appropriate names of variables would be something like "max of reduce by difference..." - hence the hardness of coming up with good names.

Another reason is the same as in math. When you write equations on a whiteboard - the long-sought golden standard of expressiveness - you don't use long names - at best you encode names with subscripts, indexes etc. But names themselves are usually pretty short. APL languages use the same rationale.

Just like you need to read carefully every line of a math equation to understand what's going on, you have to read carefully each symbol of programs in APL-family languages. It's unusual for programmers who are used to more help along the way - but the vocabulary of all such languages is quite short and doesn't extend that often. Another reason why APL could be used for teaching math.

I don't agree that this style of naming is not good for anything. Somehow the majority of programmers in these languages seem to agree with me. Regarding much more useful things -

"If you are interested in programming solutions to challenging data processing problems, then the time you invest in learning J will be well spent." (http://jsoftware.com/)


That was awesome, thanks for explaining that for us. It makes a lot of sense the way you explain it, and I quickly got the idea that you can make some powerful expressions this way.

The smooth creation of lists is, I think, one of the most important language features higher-level languages have over lower-level languages like C.

Just this thing:

    c::a$"\n"
That's all I needed to be convinced that modern languages should have similar view constructs. In Ruby it'd be:

  c = a.map.with_index{|ch,i| ch == "\n" ? i : nil }.reject{|i| i.nil? }
Quite a mouthful, mainly because Ruby lacks a neat way to do '$'. But doing the same thing in C would really be awkward, and likely not as efficient unless you have some fancy code for building enumerators in C.


The view automatically gets updated whenever a gets updated. Every time you change a (directly or indirectly), then c will automatically get updated.

Doing this generally in Ruby I think is impossible, but you might be able to get close if all your objects are based on ActiveModel::Dirty


> Doing this generally in Ruby I think is impossible

Not so much. At least with views over most Enumerables, it's quite possible in Ruby -- that's the whole reason that Enumerable::Lazy exists.

The existing File class doesn't quite support it because of the way its iterators are implemented (particularly, they are one way) but the class is easily extended to allow it, e.g.:

  class RewindFile < File
    def each
      rewind
      each_char {|x| yield x}
    end
  end
 
Then you can create a synced view that has the character positions of the newlines in a file like this:

  a = RewindFile.new "myfile.txt"
  c = a.lazy.each
            .with_index
            .find_all {|x,_| x=="\n"}
            .map {|_,y| y}


If I go `a=whatever` or `a[42]=whatever`, then `c` will not get updated if I have already consumed those values; I still need to "reset" c every time I update a.


That seems overly complicated for Ruby if a is a file.

    c = []; a.each_line { c << a.pos }
As far as $, you may know it from regexes as the end-of-line anchor.


a is a mapped string. You could do:

    c=[];n=0;a.each_line{|x|c<<(n+=x.size)-1}
but $"\n" wasn't special, and this allocates tons of memory. tinco's implementation is much closer to what k is actually doing.


Certainly you could wrap up the $ operator into some Ruby method?


Yes, Ruby is Turing complete, but that's missing the point.

The value K provides is the collection of operators like `$` that implement a high level language for the sorts of problems K programmers face.

If you went through and implemented all of those operators in Ruby and only used those instead of things like loops, your code would be "unreadable" to the standard Ruby programmer; essentially you'd be programming in a different language.


Sure, simply do:

    module Enumerable
      def dollar(c)
        map.with_index{|a,i| a == c ? i : nil }.reject{|i| i.nil? }
      end
    end
And then you could do:

    c = a.dollar("\n")
There is of course a reason this dollar method is not a part of the standard library. Its name makes no sense and it's oddly specific, how often would you want the indexes of matches to a character? Most modern languages don't like to work with indexes a lot, and in my day to day work I don't need indexes very frequently either. This I guess is just a thing that these finance/apl people do more often, so they have a specialized standard library.

(So when I said 'a neat way to do $' I meant it has no find_all_with_index method, which would make my implementation much cleaner)


Regarding find_all_with_index, you can do:

    .find_all.with_index { |item, index| ... }


except .find_all.with_index passes the index to the predicate block, but still only returns a view of the matching elements from the list.

What I think is being sought is more like:

  module Enumerable
    def find_indices
      each.with_index
          .find_all {|x,_| yield x}
          .map {|_,y| y}
    end
  end


Wow, that was fascinating. K looks utterly mind-expanding, thanks for breaking this down.

You obviously have some experience working with K, and it sounds like at least Javascript, too? K is so foreign I expect it has a lot of interesting thoughts locked up in there that maybe don't get the attention they deserve.

Would you say there are any "killer features" of the language / environment that you miss when working with more traditional languages?


And now with less snark :).

K/Q/kdb+ deploys a single executable and some additional k code (Q is written in k) in < 600KB uncompressed. Admin is very simple. Multicore use is trivial. Hot code updates/upgrades (the interpreter uses the new definition on the next execution). Code works the same (in most instances) for:

* a single file/database
* multiple files/tables
* column files (for each table)
* files on multiple machines (like MapReduce/Hadoop)

with no changes.




Killer feature:

Not having to write:

    for(int i=0; i<n; i++) { ... }

which generally obscures the fact that I'm trying to do a map or fold (e.g. reduce) operation :).
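The contrast, written out in Ruby for instance:

```ruby
xs = [1, 2, 3, 4]

# index loop: the map and the fold are buried in bookkeeping
squares = []
total = 0
for i in 0...xs.size
  squares << xs[i] * xs[i]
  total += xs[i]
end

# map and fold named directly
squares2 = xs.map { |x| x * x }
total2   = xs.reduce(0, :+)
```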


ES5


> c::a$"\n"

The code is full of these things, which look conceptually a lot like Perl one-liners (which I don't mean in a negative way). But the trick to coding with these short idioms is to make sure your input is strict in a very specific way that fits your particular problem.

A nontrivial codebase could not be written in this way. A real-life editor would, for example, need the concept of active parts of files to edit files larger than fit in memory, the ability to parse different encodings and line endings without changing the on-disk format unless asked to, and many other things (and this just to implement the bare bones of an editor from the 80s).

This looks a lot like a macro language, with its many ready made functions for accessing ctrl-key sequences, cursor movements etc. Would it really hold up outside its problem domain?


> A real-life editor would, for example, need the concept of active parts of files to edit files larger than fit in memory

Why? Address space is cheap, and a is simply mmap'd to the file.

I don't have any text files bigger than a 64-bit address space. Do you?

> the ability parse different encodings and line endings

vi doesn't do this. acme doesn't do this. I would agree it's a popular feature, but I still think it's a mistake: If every program on my computer has to continuously parse and deparse bytes into text, and be aware of all the different things every other operating system calls text, then it seems like a waste; it seems redundant. It's a waste of code, and of memory, and I think any mere editor that "needs" this lacks sufficient composition.


> A view is a concept in k that is unusual in other languages: When "a" gets updated, then "c" will automatically contain the new values.

It's actually not at all an unusual concept; it's well known from SQL, for instance, but Scala also has them (Scala views aren't views in the sense that you use the term, but the lazily-implemented transformers they implement produce views of the type under discussion), as does Ruby (via Enumerable::Lazy). In fact, while the name "views" isn't exactly commonly used for them outside of SQL (though, as noted, Scala uses the term for the construct through which one obtains them), they are pretty ubiquitous in modern languages.


A view can be thought of as a dependency expression. In version 3 of K there was a data-driven GUI. The dependencies allowed one to do Functional Reactive Programming (FRP) quite easily.

Unlike SQL, views in k are not limited to queries as dependency expressions. The expressions can be arbitrary.
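A toy version of the dependency idea in Ruby (illustrative only -- k's views are built into the interpreter, and this sketch recomputes lazily on read rather than eagerly on write):

```ruby
# A cell tracks a version number; a view recomputes its expression
# whenever the cell has been written since the view's last read.
class Cell
  attr_reader :version

  def initialize(value)
    @value = value
    @version = 0
  end

  def value
    @value
  end

  def value=(v)
    @value = v
    @version += 1
  end
end

class View
  def initialize(source, &expr)
    @source = source
    @expr = expr
    @seen = -1
  end

  def value
    if @seen != @source.version
      @cached = @expr.call(@source.value)
      @seen = @source.version
    end
    @cached
  end
end

# c::a$"\n" as a dependency: newline offsets that track writes to a
a = Cell.new("one\ntwo\n")
c = View.new(a) { |s| (0...s.size).select { |i| s[i] == "\n" } }
c.value            # => [3, 7]
a.value = "x\n"
c.value            # recomputed automatically: => [1]
```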



