| Ruby again, part two. |
[Jul. 17th, 2008|07:58 am] |
This is a continuation of my previous post, based on new information.
I shall quote chalain directly:
<Chalain> It's a combination of two things.
<Chalain> 1. Ruby allows you to drop the semicolon if a statement parses to a complete expression. So the 3+2 gets parsed as an expression.
<Chalain> 2. "Everything returns" in Ruby. Like most languages, you can accumulate return values on the stack, and they get ignored.
<Chalain> So
<Chalain> (3+2; +4) is a legal expression. It returns 4.
<Chalain> A legitimate use of this would be, for example, (puts "It broke!"; logger.error 'It broke') if x>0
So, it's deliberate behaviour rather than a parser snafu. I am still of the opinion that it is broken, I have just changed where I'm pointing. I don't think parentheses should be used in this way. There is a perfectly good multi-expression conditional syntax, and it puts the if at the beginning rather than the end. Parentheses should enclose a single expression only rather than trying to be braces or begin/end as well.
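For reference, the behaviour chalain describes can be sketched as a plain Ruby script. This is only an illustration; the variable names are made up for the example and do not come from the conversation.

```ruby
# A trailing operator tells the parser the expression continues on the next line.
joined = (1 + 2 +
          4)

# Without it, the newline ends `1 + 2`, and `+ 4` becomes a separate
# expression (unary plus applied to 4). Parentheses return the last expression.
split = (1 + 2
         + 4)

# The same thing on one line, with an explicit semicolon.
sequence = (3 + 2; +4)

puts joined    # 7
puts split     # 4
puts sequence  # 4
```

Ending the first line with the operator is the usual way to make a multi-line sum mean what it looks like it means.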
Ruby is full of little things like this, where small wins and "it would be nice" features litter the language with booby traps. |
|
|
| Ruby again. This time the parser. |
[Jul. 16th, 2008|05:16 pm] |
Disclaimer: The following was written while I was furious and frustrated. Please bear this in mind if anything offends you.
Until I came to my senses, I had actually fired up an editor to update my CV so I could find a job that wouldn't require me to fight with the festering pile of yuckiness that is Ruby. Consider the following irb session:
>> (1 + 2 +
?> 4)
=> 7
>> (1 + 2
>> + 4)
=> 4
As far as I can tell, if the linebreak results in two valid expressions, the first is thrown away and the second is used. Why this is the case, I have no idea. All I can surmise is that whoever wrote that bit of the parser was smoking crack. The fact that this hasn't been picked up and fixed just demonstrates that the Ruby community completely fails to care about quality. I can't even find reference to it on the internet.
Ruby has long been way down on my list of languages to use. It has just dropped into the same class as PHP and Visual Basic. Congratulations, Ruby. I didn't think you could get worse. You did.
Edit: This saga continues in a new post. |
|
|
| Don't bring thin skin to the internet |
[Jul. 2nd, 2008|07:14 pm] |
I had an all-too-frequent argument this evening. This resulted (as it occasionally does) in the other person storming off in a huff, all offended. The conversation usually goes something like this:
Him: I like $broken_technology.
Me: $broken_technology is very seldom a good idea. Why do you like it?
Him: Because $common_misconception.
Me: Actually, that's a common misconception. $correction.
Him: But are there any $not_broken_technology things that do $common_misconception_thing?
Me: Yes, plenty. For example, $thing.
Him: Um, but $other_misconception.
Me: Actually, $other_correction.
Him: You people are a pack of technobigots! *storms off in a huff*.
Part of the problem is that I suffer from SIWOTI Syndrome. I find it incredibly difficult to just let people sabotage themselves because they believe something that is not true. Part of the problem is that I want to teach people to think rationally about their behaviours and motivations, because it's the only way to avoid muddling around in a fug of lies and excuses. Part of the problem is that I get too emotionally invested in trying to better the lives of strangers when I can see exactly where they are making their mistakes and have the data to back it up.
Also, to be perfectly honest, I like to be right. I like to dispense wisdom. I enjoy being the expert people come to when they have a programming problem. Being the authority is nice. It makes me all warm and fuzzy when people do something better because of advice I gave them.
The real problem is that people don't like to be wrong. Thinking is hard. The misconceptions are comfortable old friends. The work they have done under those misconceptions represents effort that they may need to throw away. This makes people defend their misconceptions. It also makes them see an attack on a misconception as an attack on them personally.
This is where the subject comes in. As soon as you take anything on the internet personally, you're opening yourself up to pain. It hurts me when my laboriously collected wisdom is written off as bigotry. It hurts you a lot more when you take a request for data backing up your assertions as "getting ganged by zealots".
I don't really have a good answer. I could probably be a bit more diplomatic, but it's difficult to say "you're not making sense" in a way that doesn't offend those with delicate sensibilities. I could just ignore people being wrong, but occasionally I do enlighten some poor soul who has merely been led astray by the propaganda machines and is capable of becoming a useful and productive member of the community. (That's not to say the others aren't, just that they need to shed some ego first.)
Another side effect of my SIWOTI is that I often come across as arrogant. Sometimes this is me misjudging the level at which to pitch my explanations and coming across as patronising (if I pitch too low) or elitist (if I pitch too high). Then if I ask a few questions to judge background, I'm interrogating instead of helping. Sometimes it's because I tell people they're wrong. If I kept quiet or only answered the questions asked, even when they pointed to deeper misunderstandings or flaws, I could avoid these issues. But then people would keep being wrong. And we can't have that. |
|
|
| This one isn't really Ruby's fault |
[May. 15th, 2008|07:04 pm] |
Consider the following regular expression, designed to validate a tab-separated string and extract data from it:
>> t = "a\tx\tb"
=> "a\tx\tb"
>> re = Regexp.new("a\t\S\tb")
=> /a	S	b/
>> re.match(t)
=> nil
>> re = Regexp.new("a\t[^\t]\tb")
=> /a	[^	]	b/
>> re.match(t)
=> #<MatchData:0xb7e57c90>
This, at first, had me glaring at the docs for Ruby's regular expressions to make sure \S was implemented as I expected. The second version was more restrictive, yet it matched where the first version failed. After a few minutes' fuming, I got the inklings of an idea and tried the following:
>> re = Regexp.new('a\t\S\tb')
=> /a\t\S\tb/
>> re.match(t)
=> #<MatchData:0xb7b411dc>
>> re = /a\t\S\tb/
=> /a\t\S\tb/
>> re.match(t)
=> #<MatchData:0xb7ae8c1c>
Then it hit me. See, Ruby double-quoted strings interpret escape chars, so the \S never reached the regex engine: the string parser dropped the backslash and left a plain S. The reason \t worked above was pure coincidence -- it evaluates to a tab literal, which is treated the same as \t in a regex. Thus, in my original case (turning a bunch of strings into regexen) I could use single-quoted strings if I didn't have to interpolate values into them. Unfortunately I do. For reasons unrelated to this post, I can't use regex literal syntax, which allows interpolation but does not evaluate escape characters.
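One way around this, sketched here under the assumption that the patterns are assembled from double-quoted strings, is to double the backslashes so they survive string parsing. The `separator` value and the `Regexp.escape` example are hypothetical additions for illustration, not part of the original code.

```ruby
text = "a\tx\tb"

# Double quotes: "\t" becomes a real tab, but "\S" is not a string escape,
# so the backslash is dropped and the pattern looks for a literal "S".
broken = Regexp.new("a\t\S\tb")

# Doubling the backslashes keeps them intact for the regex engine
# while still allowing string interpolation.
separator = "\\t" # hypothetical interpolated value
fixed = Regexp.new("a#{separator}\\S#{separator}b")

# Regexp.escape protects interpolated data that should match literally.
literal = Regexp.escape("a.b")
dotted = Regexp.new("^#{literal}\\t\\S+$")

p broken.match(text)            # nil
p fixed.match(text) ? true : false
p dotted.match("a.b\tfoo") ? true : false
p dotted.match("axb\tfoo")      # nil
```

Doubling backslashes is noisy, but it keeps interpolation available without relying on regex literal syntax.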
Once again, Ruby foils me. This time, however, it managed to foil a righteously indignant rant about regex escape sequence handling. Go figure. |
|
|
| On Ruby |
[Apr. 22nd, 2008|10:00 am] |
A programmer I admire greatly twittered the following today:
Dear Ruby haters: I used to be afraid of the table saw until I learned how to use it safely. This did not involve nerfing the table saw.
I have great respect for this man, but we have a fundamental disagreement about a language he uses by choice and I use by necessity. I have a bunch of disagreements with the language, but my real problem is with the community.
To continue with the power tool analogy, Ruby is a table saw with all sorts of dangerous fittings on it. It is powerful, and in skilled hands it can be exceedingly useful. However, part of the power is that it is easy to replace bits of the drive train or add extra blades as required. Now, this is a non-issue in skilled hands. The problem is that many of the extra fittings and modifications are not built by skilled hands, including the manufacturer-supplied brand-stamped officially-approved ones. These dodgy fittings usually work fine, but can occasionally unleash a maelstrom of whirling metallic doom.
Back to reality from the analogy. The Ruby culture seems to value "clever" code and metaprogramming even when there are better alternatives. Individual programmers or competent teams can overcome this bias, but eventually they are going to want to use third-party code or some of the darker corners of the standard library. This is where the pain really starts. |
|
|
| More Ruby sadness |
[Apr. 8th, 2008|02:13 pm] |
Some more hard-to-debug behaviour in Ruby, this time coming from a simple tyop. Consider the following function, noting the misspelling of elsif:
>> def testfoo(foo)
>> if foo.nil?
>> puts "foo is nil"
>> elseif foo.empty?
>> puts "foo is empty"
>> else
>> puts "foo is #{foo}"
>> end
>> end
=> nil
Now, consider calling this function:
>> testfoo("foo")
foo is foo
=> nil
In this case, all is good.
>> testfoo("")
foo is
=> nil
In this case, we get no error but the wrong thing happens.
>> testfoo(nil)
foo is nil
NoMethodError: undefined method `empty?' for nil:NilClass
from (irb):4:in `testfoo'
from (irb):14
from :0
>>
Only in this case do we see a problem, and it's not a problem that makes sense, because we're testing for nil above.
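The issue here is that elseif is not a keyword, so Ruby parses elseif foo.empty? as a call to a method named elseif sitting inside the first branch, with foo.empty? as its argument. That line only runs when foo is nil, which is why the error complains about empty? rather than about elseif. Ruby fails to check whether the functions you're calling actually exist. Of course, in the world of method_missing, this is impossible. It could, however, at least check if method_missing is defined. The really dangerous thing is that nil might be an edge case and thus almost never occur, which just goes to show that proper unit test coverage is critical. |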
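A minimal sketch of what the parser does with the typo, next to the corrected function. Both method names are hypothetical, and the corrected version returns strings instead of printing them so the results are easy to check.

```ruby
# Hypothetical helper: runs a branch containing the misspelling and reports
# which method name Ruby tried (and failed) to call.
def misspelled_branch
  if true
    elseif false # parsed as the method call elseif(false)
    :unreachable
  else
    :else_branch
  end
rescue NoMethodError => e
  e.name
end

# Corrected version of the function, using the real keyword.
def describe(foo)
  if foo.nil?
    "foo is nil"
  elsif foo.empty?
    "foo is empty"
  else
    "foo is #{foo}"
  end
end

p misspelled_branch  # :elseif
p describe(nil)      # "foo is nil"
p describe("")       # "foo is empty"
p describe("foo")    # "foo is foo"
```

A unit test that exercises each branch, including nil, would catch the typo the first time it runs.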
|
|
| New term: user-coddlement |
[Apr. 1st, 2008|03:39 pm] |
I came up with this on the spur of the moment and it seems that nobody else has used it within the reach of your favourite search engine.
User-coddlement (n): An annoying modification made to a piece of software specifically to assist users who really should know better.
Examples:
- "Are you sure?" confirmations on non-destructive operations.
- Randomly chosen defaults to "protect" people from having to make decisions.
- Adding verbosity to everything in case the user hasn't looked at the documentation.
|
|
|
| Ruby and Documentation: a rant |
[Mar. 31st, 2008|09:58 am] |
Why is it that people seem to think that autogenerated "API references" count as documentation?
If I am looking at a new library, I want a tutorial and an overview of the major features. I'm happy to dive into code and API references once I know more or less what I need to look at but please, for the sake of all the little children and their puppies, give me somewhere to bloody start!
Now that I have that out of my system, it's time to dive into this mess and see if I can figure out what I need. |
|
|
| Open note to devs |
[Jan. 30th, 2008|11:02 am] |
When debugging server-side errors, it helps to be watching the logs on the server you're actually talking to. |
|
|
| Ruby rant: Timeout::Error |
[Jan. 24th, 2008|05:44 pm] |
I am currently filled with hate and loathing. I'm going to need a few moments to cool off before I carry on working.
I was trying to make an HTTP call from an irb session full of state I had carefully collected. Most of the code I used is in a script, but I had been doing a substantial amount of exploration over data from a few different places. While waiting for a large dataset from a busy server on the other end of a high-latency link, I got a timeout in the HTTP receive stuff. This in itself would not be a problem, except it crashed my irb session.
The problem is with the exception that a timeout raises:
module Timeout
  class Error < Interrupt
  end
end
The initiated will know that (a subset of) the Ruby exception hierarchy looks something like this:
Exception
- StandardError
  - RuntimeError
  - ZeroDivisionError
- ScriptError
  - SyntaxError
- SystemExit
- SignalException
  - Interrupt
The important bit there is that all the stuff you can reasonably expect to recover from is under StandardError. Because of this, a default rescue block will not catch anything that isn't a StandardError. The observant reader will notice that I helpfully showed Interrupt's position in the hierarchy. The observant reader will also notice that it is not a subclass of StandardError. This means that you need to catch it explicitly, or it will cause a crash.
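The rescue behaviour can be sketched with a stand-in class. `FakeTimeoutError` is hypothetical: it is defined here to mirror the class definition quoted above, so the example does not depend on how any particular Ruby version defines the real Timeout::Error.

```ruby
# Hypothetical stand-in that mirrors the definition quoted in the post.
class FakeTimeoutError < Interrupt; end

# A bare rescue only handles StandardError and its subclasses.
def bare_rescue
  raise FakeTimeoutError, "timed out"
rescue
  :caught_by_bare_rescue
end

# Naming the class explicitly does catch it.
def explicit_rescue
  raise FakeTimeoutError, "timed out"
rescue FakeTimeoutError
  :caught_explicitly
end

# The exception sails straight through bare_rescue and lands here.
outcome = begin
  bare_rescue
rescue Interrupt
  :escaped_bare_rescue
end

p outcome          # :escaped_bare_rescue
p explicit_rescue  # :caught_explicitly
```

In an interactive session there is no outer handler like the one above, which is how an uncaught exception of this kind ends the session.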
Now, my theory is that Timeout was written by someone who misinterpreted "Interrupt" as meaning "interrupt what I'm doing", not "interrupt my application". This would be understandable, although still not acceptable, in a third-party library. Timeout is a core standard library module, however.
To summarise: If you're using the standard Ruby timeout mechanism, which you are if you use the HTTP libraries and almost certainly a whole host of others, you need to explicitly catch Timeout::Error or have a dodgy network bring your entire application crashing down around you. |
|
|
| Houses and hacking |
[Oct. 7th, 2007|04:11 pm] |
As of this afternoon, I have the keys to my new (temporary) residence in Observatory. It has two spare bedrooms, so if you want to visit me in Cape Town, do it before the end of January. :-)
I have been playing with Project Euler quite a bit recently. It's a fun little project where they give you a bunch of short mathematical programming problems. Most of the problems are less than an hour's work to solve, although there are a few that take longer. I've been using them to get some Erlang experience, and you can find my solutions in a darcs repository at http://darcs.jerith.za.net/projecteuler/ or you can grab projecteuler.erl directly. |
|
|
| Happy hacking |
[Aug. 19th, 2007|06:22 pm] |
I wrote a satisfying amount of Python this weekend. I am currently in the throes of rewriting bits of my website backend, and one of the things I'm replacing is the syntax highlighting code. The present system shells out to vim and munges the HTML it gets back into a useful form. The new plan is to use pygments, a python syntax highlighting library. The problem with pygments is that it doesn't handle all the languages I want to display.
One of these languages is Erlang, which I have a tendency to wax enthusiastic about. Since there are unlikely to be too many other people who want to highlight it in a Python library (although Haskell's there, so maybe not) I decided it would be worth the effort of writing a lexer for it. This was easier said than done, however. The vim and emacs syntax files were both slightly broken (I'll be looking at them next) so I ended up doing a lot of the work by referring to the docs and experimenting with the interpreter.
Along the way, I ran into an issue with the Java lexer. Some badly formatted code tickled a pathological case in one of the regular expressions and took exponential time in the length of an exception class name. Anything longer than about 24 characters (which includes a lot of Java's standard exceptions) and the lexer would never come back. This was resolved with the help of the people on #pocoo (pocoo is pygments' parent project) and ended up in a simplification of and fix to the Java and C# lexers.
Anyways, I've submitted the patch for the Erlang lexer and I'll see if it gets into trunk. It was a fun experience, combining two of my favourite languages. |
|
|
| Higher order functions in bash |
[Jul. 13th, 2007|10:07 am] |
I recently had occasion to refactor a bunch of bash scripts, and one of the problems was that certain loop constructs were cropping up a lot. A quick search didn't turn up a way to do higher order functions in bash, so I (re)invented one:
higher_order() {
    # some code
    eval "$1"
    # some more code
}
The eval evaluates the code passed as a parameter, although for anything nontrivial this should probably be a function name.
Here's a slightly more concrete example:
do_twice()
{
    for i in `seq 2`; do
        eval "$1"
    done
}
hello() {
    echo "Hello $1"
}
do_twice 'hello world'
Update:
I wrote up a more complete version over at my website and have just added a bunch more toys to it. |
|
|
| Python one-liner |
[May. 23rd, 2007|10:21 am] |
Today, I needed to get some information out of my Amazon AWS usage report. (Bandwidth usage, in this case.) I wanted pretty output in gigs (usage is measured and reported in bytes) and I didn't want any of the extraneous stuff. Since AWS helpfully provides a CSV, I wrote a cunning little Python one-liner to extract it:
python -c 'print "\n".join([str(int(line.split(",")[-1].strip())/(1024.0**3)) for line in file("/tmp/report.csv").readlines()[1:]])'
You need to read this from the inside out, since it's a list comprehension. First, it reads all the lines from the csv file and throws out the first (because I don't want the header). Then, it splits each line at the commas and only looks at the last. This is converted to an integer (stripping off whitespace first), divided by the number of bytes in a gig and turned back into a string. These processed lines are collected by the list comprehension, have newlines stuck in between them to make one long string and printed.
0.472050191835
1.29025535937
2.04500017408
2.04324277025
1.82879307121
2.03375558369
0.681352665648
Yay Python! (The Perl people could probably do it in fewer characters, but it would look more like line-noise.) |
|
|
| Concept overloading madness |
[May. 9th, 2007|02:30 pm] |
I recently had cause to go delving into the documentation for Ruby's open() method. While I found the information I needed fairly quickly, I was appalled to discover that open() will happily open a pipe to a subprocess (if the path parameter starts with "|") and even fork your app (if the path parameter is "|-").
Perhaps I'm old fashioned and don't belong in the world of modern (and not-so-modern -- apparently Perl does this too) interpreted languages. The way I see it, open() should open a file. popen() lets you fork a subprocess. fork() should be there to let you fork. If I'm opening a user-provided filename, I don't want to have to worry about checking the filename for a bunch of magic characters that completely change the behaviour. If you require 'open-uri' you also get to open arbitrary URIs as if they were files. (This isn't as bad -- at least you need to explicitly require it. Unless one of the libraries you use does that behind the scenes.)
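As a hedged illustration of the "open() should open a file" position: File.open treats its argument purely as a filename, so a leading pipe character has no special meaning there. The helper name `read_plain_file` is hypothetical, and the sketch assumes no file with that odd name exists in the working directory.

```ruby
# Hypothetical helper: reads a user-supplied path strictly as a file.
# File.open never interprets a leading "|" as a command to run.
def read_plain_file(path)
  File.open(path, "r") { |f| f.read }
end

# A path that a pipe-aware open would treat as a subprocess
# is just a missing file here.
outcome = begin
  read_plain_file("|echo pwned")
rescue Errno::ENOENT
  :no_such_file
end

p outcome  # :no_such_file
```

Using the File class directly for user-provided names sidesteps the need to scan for magic characters first.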
This "open anything that could possibly return a file-like object" idea is also a very leaky abstraction. Consider error conditions. Opening a local file will never die with "connection reset by peer" or "DNS lookup failed". However, the programmer now has to either deal with these or ensure that he never lets a non-local path slip through to open(). The alternative is to mask all these errors and return something generic. This is worse, since now the programmer can't behave differently on timeouts and missing files.
I can see the utility in having a method that will open anything you can throw at it. However, this should be explicit, such as open_anything() and perhaps a little more configurable. Perhaps it could check the path against a series of regexen and dispatch to lower level open_something() methods. Perhaps it could take a list of exception handlers to apply by default. At any rate, it certainly shouldn't fork. |
|
|
| Excluding a string in a regex |
[Apr. 19th, 2007|11:24 am] |
In the spirit of Karnaugh's recent blog post about RTFM and GIYF, I am posting the solution to a problem I had at work today.
In any system that generates more than the most trivial volume of logs (especially if said logs contain debug info and other cruft that is only useful in very specific situations), you need some kind of automated parsing. This usually means matching each line against a set of regular expressions.
Consider the following log format: <timestamp> <loglevel> <message>
In our toy example, loglevel could be DEBUG, INFO or ERROR. We want to match everything at ERROR level (because we are using third party libraries that may log errors we don't yet know about) but we want to ignore a particular spurious message caused by a bug that has not yet been fixed.
The regex to match all error lines is easy: /ERROR/
The regex to match all error lines that don't contain the substring nasty false alarm is a little more difficult, and requires a PCRE extended pattern. Specifically, a negative look-ahead assertion. The syntax for this is (?!<pattern>).
Thus, our final regex becomes: /ERROR(?!.*?nasty false alarm)/
The .*? is a non-greedy catchall that allows the substring we want to exclude to occur anywhere in the rest of the line and not necessarily directly after the ERROR. |
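A quick sketch of the final pattern against some sample input. Ruby's regex engine supports the same look-ahead syntax; the log lines and the libfoo name below are invented for the example.

```ruby
ERROR_FILTER = /ERROR(?!.*?nasty false alarm)/

# Invented sample lines in the <timestamp> <loglevel> <message> format.
lines = [
  "2007-04-19T11:00:01 DEBUG starting up",
  "2007-04-19T11:00:02 ERROR disk full",
  "2007-04-19T11:00:03 ERROR nasty false alarm from libfoo",
  "2007-04-19T11:00:04 ERROR libfoo: nasty false alarm"
]

# Keep only the lines the filter matches.
interesting = lines.grep(ERROR_FILTER)

puts interesting
```

Only the "disk full" line survives: the DEBUG line never matches ERROR, and both false alarms are rejected wherever the excluded substring appears after ERROR.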
|
|
| Bogochat and Erlang hacking |
[Mar. 10th, 2007|08:32 pm] |
I spent a large part of today sitting in bed with the laptop playing with Erlang. The result can be seen on my website.
It took a few iterations to get this far, and the experience was incredibly enlightening. When I have time and my brain isn't so tired (tomorrow, perhaps) I'll write up a proper tutorial running through the same steps my vaguely directed hacking took. |
|
|
| Set operations in bash |
[Mar. 9th, 2007|02:01 pm] |
This post is some Linux geekery inspired by a problem a coworker solved today.
The problem was that he had a file full of stuff (one item per line) and another file full of partially overlapping stuff and wanted a list of the stuff that appeared in the first file but not the second, which is essentially set difference. This was to be done in bash by preference, as it was part of a longer script and perl/ruby one-liners look ugly in scripts. You may want to take a few seconds to try figure this one out before you look at the explanation below.
cat foo bar bar | sort | uniq -u
Starting at the end, uniq -u outputs only lines that appear exactly once in the input, with no adjacent duplicate. Since duplicates are only detected when they are consecutive, the input needs to be sorted. The cat portion is the trick. We cat bar twice to make sure that it will never contribute a line to the output. Combined with foo, this will give us one copy of any line that appears only in foo, two copies of any line that appears only in bar and three copies of any line that appears in both. There is only one real requirement, and that is that foo contains no duplicates to begin with. This is fairly trivial to arrange and is left as an exercise for the reader.
This is not the only set operation possible, however. You can also do the following:
- Intersection (requires no duplicates in either file): cat foo bar | sort | uniq -d
(Have a look at man uniq for details on the flags it takes.)
- Union (no input restrictions): cat foo bar | sort | uniq
(The sort and uniq are not strictly required here, but they keep the output format the same. Also, the sort | uniq can be replaced with sort -u for a small efficiency gain.)
Complements don't really make much sense since you can use difference to filter out the set you don't want from pretty much everything. Shell scripts seldom need to deal with infinite sets and they'd probably take too long to run anyway...
This post brought to you courtesy of caffeine, day-job problems and mithrandi. |
|
|
| Erlang and online games |
[Jan. 11th, 2007|06:04 pm] |
Taken verbatim from an email I sent to a programmer's mailing list I'm on:
I have decided that the only way for me to finally get around to
learning Erlang is to write a nontrivial project in it. To this end, I
have decided to implement a multiplayer online game of some kind.
I'm only really interested in the Erlang backend, so I'm thinking a
text-based MUD of some kind that players can telnet to, but if someone
wants to write a graphical client or something that's also good. A
frontend can be in any language/platform combination as long as I can
run it in Linux for testing.
I could probably do the whole backend thing myself given enough time,
but collaborators make things easier and more fun, especially to bounce
ideas off and discuss bits of code. This is a learning project, so I'm
not expecting anyone to know Erlang at all before they join.
Also, people interested in the game design and worldbuilding aspect are
welcome to help out in just those areas without having to write any
code.
The offer stands here as well. Tell your friends, the more the merrier! |
|
|