
Columnized text in Minecraft Chat February 4, 2015 at 8:11 pm

Minecraft renders chat output with a variable-width font during gameplay. For plugins that benefit from displaying tables of data this causes issues: it isn’t possible to just use spaces to align columns. Any text that contains characters outside the ‘normal’ width will push columns to one side or the other.

Most characters render at the equivalent of six pixels in width. The ones that cause problems are the ones that don’t. The biggest troublemaker is ‘space’, which renders at the equivalent of four pixels wide. You could almost ‘fix’ the problem by filtering out the troublesome characters, but the output would be unpleasant.

While searching for a solution I ran into an implementation that came very close to fixing the issue, but it still had minor misalignments that were highly visible and distracting to me. Not being happy with the improperly aligned text, I started tinkering and testing various Unicode characters for width. I stumbled onto an unobtrusive character that seems to be available on most platforms AND (more importantly) renders at the equivalent of one pixel wide.

In addition to fixing the problem with misaligned text I was able to add the ability to ‘properly’ align numbers to the right (for those of us using left-to-right fonts).
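The core of the trick is just arithmetic: measure the cell’s text in pixels, then make up the difference to the column width with four-pixel spaces plus however many one-pixel fillers are needed. A rough sketch of that padding math (purely illustrative, not the plugin code itself; one_px_filler stands in for whatever character you settle on):

#include <string>

// Illustrative sketch of the padding arithmetic only.
// Assumes the caller already knows the rendered pixel width of `text`
// (the sum of per-glyph widths: mostly 6px, 4px for a space, and so on).
std::string pad_to_pixels( std::string text, int text_px, int column_px,
                           const std::string& one_px_filler )
{
    int gap = column_px - text_px;      // pixels of padding still needed
    if( gap <= 0 )
        return text;                    // cell already fills the column

    text.append( gap / 4, ' ' );        // each space covers 4 pixels
    for( int i = 0; i < gap % 4; ++i )  // fillers cover the 1-3 pixel remainder
        text += one_px_filler;

    return text;
}

Right alignment is the same idea with the padding prepended instead of appended.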

Tables of output for the SimbeyMod teleport feature:
[screenshot: tp1]

The implementation allows left, right, and center alignment for each column. The table is kept only as wide as the widest entry requires, and ‘less important’ columns are collapsed when space runs out.

Dynamic sizing of the ‘Name’ Column:
[screenshot: tp4]

Dynamic removal of ‘less important’ Column:
[screenshot: tp5]

I ended up with the following solution:

Usage:

Ernie

Interesting (To me) July 6, 2014 at 2:10 pm

Looking through recently written code, I noticed that I was using std::for_each where the new range-based for syntax would have sufficed.

Given how it was used, I don’t understand why I didn’t use the new syntax…

std::vector < int > vec;

for( int c : vec )
{
    //...
}

vs…

std::vector < int > vec;

std::for_each( vec.begin(), vec.end(), [&]( int c )
{
    //...
} );

The new range-based for is much easier to understand in this case. As far as code generation goes, both are close to identical in an optimized build. If a lambda expression is required, either construct can be used (‘just’ call the lambda inside the loop). The only real place std::for_each ~should~ still be used is likely when the range is a subrange. I have many times wondered why for_each didn’t have an overload like so:

std::for_each( vec, [&]( int c ) 
{
    //...    
} );

or…
using string_map = std::map < std::string, int >;

string_map items;
//..

auto es = items.equal_range( "E" );

std::for_each( es, [&]( string_map::value_type& v )
{
    //...
} );

Without aliases (or auto), try typing in the second expression. The pair<...> returned by equal_range, as a raw type, is exceptionally painful to type, let alone read.
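Neither overload exists in the standard library, but a minimal sketch of the two I keep wishing for is only a few lines (names and placement are purely illustrative):

#include <algorithm>
#include <iterator>
#include <utility>

// Sketch only: forward any range with begin()/end() to the three-argument std::for_each.
template < typename Range, typename Fn >
Fn for_each( Range&& r, Fn fn )
{
    using std::begin;
    using std::end;
    return std::for_each( begin( r ), end( r ), std::move( fn ) );
}

// Sketch only: the same idea for the pair of iterators returned by equal_range.
template < typename It, typename Fn >
Fn for_each( std::pair< It, It > r, Fn fn )
{
    return std::for_each( r.first, r.second, std::move( fn ) );
}

Overload resolution picks the pair version when handed the result of equal_range, and everything else falls through to the generic range version.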

In general I prefer simpler looking code over more complex. (Ever looked through the std:: headers? Yipes!) Often while reviewing code a few days after writing it, I will notice that I could have done things a little better. When I am in control of the project I am not afraid to refactor.

Ernie

I need to dig up my copy of the interview mentioned… June 6, 2014 at 6:52 pm

Reading these quotes from Bjarne Stroustrup today made me very happy.

From http://www.stroustrup.com/bs_faq.html#really-say-that 

“There are more useful systems developed in languages deemed awful than in languages praised for being beautiful–many more”. Yes, in MIT Technology Review interview and elsewhere. There, I [Stroustrup] also said:

“I think we should look for elegance in the applications built, rather than in the languages themselves”. I [Stroustrup] should have said “more than” rather than “rather than”.

“To use C++ well, you have to understand design and programming technique”.

“C++ is designed to allow you to express ideas, but if you don’t have ideas or don’t have any clue about how to express them, C++ doesn’t offer much help”.

Stroustrup a hero? Naw… But I probably agree with more of his ideas than disagree. 

Ernie

HTTP 2.0: It’s about time… May 25, 2014 at 6:22 pm

Time is money.

I’ve spent a lot of time thinking about protocols old and new in the last few years. Granted, I am not an expert in this field, but I’ve been there and done that a number of times in my career. One of my most recent forays into the fray included using ZeroMQ as a foundation for a much higher efficiency transport of data between the “ultimate” client and the final server. (Blah, blah, blah.) I know that the fundamentals of this project were sound. A huge problem was momentum. REST (in peace, please!) and the REST-style API are hugely inefficient, but “the world” is hyper enamoured with this style of programming at the moment because it helps developers “enjoy” a form of scale without too much thinking. (I dig Amazon’s Kinesis even if it is “extra inefficient” about the way it transfers data.)

I recently spent some time at a company operating a retail web API. I wasn’t around long enough to fully ingest their system, but what I saw wasn’t hideous. In the grand scheme of things the exceptionally inefficient way of doing things hadn’t made a huge impact (in most cases) on the perceived performance. (Let me be clear on this. I was impressed with the general way the API was constructed AND the APIs were generally successful, if not “perfect.” What system is?) It would be hubris to say that I could have done better. The key was that the API(s) WAS mostly effective in helping to generate (highly likely) many hundreds of millions of dollars in revenue. Given that kind of metric, who really gives a flying fig about ultimate performance? If the performance is good enough to drive revenue for a Fortune 500 company and not piss off too many customers, does it really matter? Funny thing is: yeah, ~I~ do think it matters.

Why am I even bothering to write this? I think the world is just about ready to go full circle again. This is my conclusion based on a number of things and my own “prejudice” towards (what I consider more) efficiency.

Back before “the world as we know it now” there were these “things” called DCOM and CORBA. Both kinda suck(ed), but I also think that both were reasonable attempts to accomplish what ~I~ see REST-style APIs doing. ~I~ think that the biggest issue with both was that ~most~ people couldn’t see past solving the problem at hand. Some form of metadata-driven programming has to exist for the interop. What I keep seeing is the push and pull between “pure” meta-style programming and getting it done. I can cite a number of examples of this that I have seen. In all of the cases, moving declarations out of the implementation has been the hardest “thing” for many of the developers I have worked with to understand. I believe that this is one of the reasons why I continue to hear developers “hate on” DCOM and CORBA. Know what I think is funny? WCF and SCA are “just” variations on the same theme. (This is my opinion; hate on me all you want.)

So, “in the beginning” there was binary and slow. As the computer industry continuously created faster and faster computers with more and more memory, the idea of trading efficiency of transfer for ease of understanding gained a ton of momentum. (Trust me, I am going to finally get to a point about HTTP.) I mean, the idea of transferring data as text is neat-o when it only takes a few microseconds. We have come a ~long~ way from the days of needing to work with 300 baud transfer rates. Um, yeah, that was bad, and the amount of data we transfer now JUST in handshakes and establishing protocols would bring something that slow to its knees. XML is/was awesome and XML sucks. Adding structure to unstructured data could be really cool. But… the same (fundamental type of) issues that occur with binary data transfer occur with XML. The basic idea of a DTD or XSD is to describe the very contract that the unstructured nature of XML was supposed to make unnecessary. (I am glossing over some of the other ideas that made XML a cool thing, like that browsers could ignore tags and that some form of structure could be added to what was effectively unstructured data transferred via HTML.) A well-structured XML file is ~far~ from the “human readable” ideal behind one of the core tenets of XML. Yeah, my opinion too. Hate all you want. I LIKE many aspects of XML encodings (XAML is both awesome and shite), but they can be exceptionally difficult to read and the DOM (for XML, not XAML) is… yuck.

Let’s pretend that 50 (gazillion) other examples of encoding data as text don’t exist and say that JSON “is the bomb.” (Ok, I am going to welch on that and say I like Lua encoding a ~little~ better, but JSON has more momentum in the web world from my point of view.)

So one of the base ideas is that JSON is a serialized form of a living and breathing JavaScript object. JSON also “pretends” to be human readable and seriously fails (IMO) for more than trivial encodings.

So all these systems get designed with the basic idea that everything on the wire is text. Oh, and while we are at it, it is 7-bit text encoding as well. Yeah, let’s make something that is already “bad” use only 7/8 of the bandwidth. It doesn’t matter… What is a few microseconds between friends? Oh, and if THAT wasn’t bad enough… when you need to transfer anything that is REALLY binary in nature then you likely must encode THAT in a subset using only 6 bits of the already hobbled encoding (base64 turns every 3 bytes into 4 characters, a 33% expansion before you even count framing). Gack. base64 makes me want to puke (but I accept that it is required, and my gag reflex has long been suppressed even if I whine about it).

You say it doesn’t matter? You probably don’t work at a company that is potentially transferring yottabytes of data between computers during the lifetimes of its systems. (I swear that when 32-bit computing made 4GB of memory addressable, it was hard-ish to imagine needing that much memory when typical hard drives of the time had not reached that milestone.) I’d say it matters enough that some companies KNOW the benefits of reducing the bandwidth required. Google’s Protocol Buffers are a real manifestation of this (IMO).

Now I am going to finally mention why the ideas in HTTP/2 make me all aflutter: “compressed” headers, long-term connections, internal multiplexing of the IP stream, “magically” compressed frames, and binary framing. If you can read between the lines of this jumble of a blog post, you might make the connection that I think HTTP/2 is going to help fix some of these issues. The flow control idea would be a massive help in some of the systems I have had the opportunity to work with. Think embedded systems that have HTTP servers “built in” but still use alternate, non-HTTP connections for the bulk of the data transfer. (You know who you are!) About the only thing I loathe (because of every OS/CPU ~I~ have worked on) is the network byte ordering of the protocol. I suppose I just HAD to not like at least one thing.

I fully suspect that a number of folks who think binary is evil are likely going to hate on HTTP/2. I mean, just look at the Huffman encoding and the use of UTF-8-esque length prefixes instead of plain string literals for name/value pairs. What sacrilege.

However, at the end of the day I also suspect that the VAST majority of developers won’t directly encounter this, as the tools used will largely hide the details. The benefits to the RESTful APIs of the world could be massive. Finally making the concession that a server might have information that a client “wants” at some time in the future “breaks” the “HTTP is stateless” mantra. I think we passed that Rubicon many, many years ago.

It’s hard to imagine that Chrome (my current browser of choice) and IE wouldn’t support HTTP/2 nearly instantly (SPDY anyone?). I don’t really follow the “browser wars” anymore. I can’t imagine any “serious” shop not making HTTP/2 a priority for servers. The changes aren’t (IMO) as aggressive as what is required to support IPv6 (why aren’t we there yet!? — rhetorical: ‘people’ suck and it won’t truly happen until it is too painful to ignore anymore). It is entirely plausible that HTTP/2 ~could~ be something that just happens completely between the browser and server without much ado. Then again, nothing comes for 100% free.

It’s about time.

Ernie

Lua base64 __again__ D’oh! January 16, 2014 at 10:09 am

I am at it again. Sigh.

Optimization is trial and error. I have an encoding project that was taking a good 60+ seconds using BASH when a “huge” number of files uncovered the inherent inefficiency. Moving “invariant encoding” out of the main loop helped. The time was down to around 30 seconds. However, I ~knew~ this was a poor implementation. The problem was all of the sub-processes and string movement.

In comes Lua to offload the issue into a “real” language. The entire process dropped to around 6-7 seconds using the existing encoding schemes. This still seemed a bit much to me, and I went digging. The existing routines used “cool tricky Lua” built on string.gsub. This ~is~ cool, but it can take a large amount of memory. It was at this point I opted into “I can do it better” mode. You can read about that below. Today’s “rant” is about shaving even more time off encoding. (Encoding is one of the “points” of the project in question.)

So first the results, based on the “brute force” test of sending an 820K file through the routines:

base64 (GNU 8.21)              0.055s
base64.lua (github/ErnieE5)    0.245s
base64.lua (github/paulmoore)  0.916s
Lua wiki                       2.586s

And then the delta from my latest optimization in the “real world” of the app.

Before: ~0.984s. After: ~0.877s. That is roughly a tenth of a second saved for this app’s “real world” usage of encoding around 110,000 smaller (30-80 byte) strings into base64. A tenth of a second is just enough to perceive the difference.

If I ~really~ need the app to scream, I ~know~ that well under 100ms is plausible if I move to a compiled version. For this application BASH scripts used to be good enough. Lua will likely suffice for now. (Until the number of encodings hits over a million!) I’d likely try a Java approach before I resorted to having 3 different binaries.

Lua isn’t a self-optimizing language, which can be fun. (Or at least a challenge to me.)
Ernie

Seriously? January 15, 2014 at 12:45 am

Sometimes I take things “to extremes.” Given a “coding challenge” and way too much extra time on my hands…

I needed a natural Lua base64 encoding and wasn’t happy with the ones I found.

Using the same base file (818K) as input (on the same machine), I started at around 1.3 seconds and was able to tune that version to just over a full second. After a good night’s sleep and a little competition (in the form of finding another implementation performing slightly better), I tuned the code again, eliminating about 50% of the processing overhead. This is already crazy, but I decided that I really needed to improve the performance even further. In “natural Lua” I am able to encode the same file in about 350ms. I am not certain I can find any more areas to improve, but my thinking was already in that direction before my last “inspiration.”

That is not “incredible” performance in the “real world.” However, within the constraints of the system I am working in, the performance increase is pretty cool (IMO). I can’t say my implementation is “the best ever.” There are a crapload of implementations floating around out there, and there could be another that performs considerably better. I am wondering if a generated table could remove all of the math in the main loop. The problem is diminishing returns. At the moment I have (more than) good enough performance for my immediate needs.

base64 encoding in Lua, part deux January 13, 2014 at 10:54 am

Sleep. Helps the brain. After getting a decent night’s sleep, I reconsidered the algorithm I used (which was considerably faster than anything I had previously found) and, after finding another version on github, figured it could be better still. Re-thinking my main conversion routine doubled the performance. Natural Lua code is still an order of magnitude slower than optimized, compiled “native” code, but these routines are “usably fast enough” for the purposes I had in mind.

base64 (GNU 8.21)              0.055s
base64.lua (github/ErnieE5)    0.555s
base64.lua (github/paulmoore)  0.916s
Lua wiki                       2.586s

When I optimize code I like to figure out what is redundant. (A good optimizing C/C++ compiler can make you lazy at times.) Any optimization is about understanding the platform and the usage. With natural Lua (i.e., not LuaJIT or a binary C library) there are a few main factors for optimization:

  • function calls
  • variables
  • branching

These optimizations apply to very tight loops. Most of the time, I prefer to have “lots” of functions and keep code as reusable as possible. (The functional programming aspects of Lua are very handy at times.)

Every function call incurs the overhead of stack management. If the routine is fairly small, that management overhead can add significant time. This applies to all programming languages, but with a “scripting” language whose compiler doesn’t do deep analysis (there is no inlining) you need to keep it in mind. It adds up when millions of iterations call a method.

Tables and strings in Lua are heap-allocated, garbage-collected objects, and they can have a significant impact on runtime performance if a loop creates and discards them on every iteration. (This is one of the reasons table.concat is much better for assembling large strings than s = s .. "more" in a loop: every concatenation builds a brand new string.)

Branching can have significant performance implications in tight loops. A specific case that will slow Lua down (in comparison!) is ‘if not end then … else … end’ logic inside the main loop. If the loop tests that condition on every iteration even though the “end” branch only fires on the very last one, pulling the check out can shave many cycles off the routine. If possible, have the loop end ~before~ the “special” condition and then handle the special condition afterwards. (e.g., in base64 encoding the main loop can consume full 3-byte groups, and the padded tail, for inputs not evenly divisible by three, can be handled after the loop.)
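The shape is the same in Lua or anything else; here is the idea sketched in C++ (purely illustrative, not the Lua routine from my library): only full 3-byte groups go through the branch-free main loop, and the 1- or 2-byte tail with its ‘=’ padding is dealt with exactly once afterwards.

#include <cstddef>
#include <cstdint>
#include <string>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

std::string encode_base64( const unsigned char* data, std::size_t len )
{
    std::string out;
    out.reserve( ( ( len + 2 ) / 3 ) * 4 );

    std::size_t i = 0;
    for( ; i + 3 <= len; i += 3 )            // full groups: no end-of-input check here
    {
        uint32_t n = ( data[i] << 16 ) | ( data[i + 1] << 8 ) | data[i + 2];
        out += B64[ ( n >> 18 ) & 63 ];
        out += B64[ ( n >> 12 ) & 63 ];
        out += B64[ ( n >>  6 ) & 63 ];
        out += B64[   n         & 63 ];
    }

    if( i < len )                            // the "special" tail, handled once
    {
        uint32_t n = data[i] << 16;
        if( i + 1 < len ) n |= data[i + 1] << 8;
        out += B64[ ( n >> 18 ) & 63 ];
        out += B64[ ( n >> 12 ) & 63 ];
        out += ( i + 1 < len ) ? B64[ ( n >> 6 ) & 63 ] : '=';
        out += '=';
    }
    return out;
}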

The moral to this story? No matter HOW good you ~think~ you are, in most cases you can do better. (Also, no matter how much time you spend searching for something “in the cloud” you are going to miss something.)

Ernie

base64 encoding at 2:13 am

I needed a “natural” Lua base64 encoding routine for a fun project I was working on so I went looking for something in the cloud. I found a “ton” of examples and quickly became disappointed. Man they reeked of poorly performing code.

Part of the cruft was that many of the routines had been written prior to Lua 5.2. In Lua 5.2 the bit32 library was added. This allows some basic bitwise manipulations. My goal was to have the routines in “pure Lua” because I didn’t want to have to build a C library for each of the three platforms that this code will run on. I also wanted to stick with Lua because a good portion of the project is already in Lua.

The best (fastest) routines I found after a search were the first ones here (http://lua-users.org/wiki/BaseSixtyFour). Even these just didn’t “feel” right to me. The base64 command line tool from GNU can convert an 800K file in a tiny fraction of a second (0.055s). The fastest natural Lua routines I found came out at about 2.5 seconds (2.586s). This just seemed way too slow to me and would add a significant amount of “user” annoyance with the larger encodings needed.

The thought of having to have three different io.popen() routines just seemed like a bad idea.

The above-mentioned routines also had the side effect of generating 6.5M of intermediate data during the conversion. Yipes! The resulting output is only 1.1M! The 6.5M comes from an encoded string that is used during the conversion. I thought the bit32 library might come to the rescue.

My first step was to break the routines out and see what was going on. Just replacing the binary string with bit32 wasn’t enough. The best timings I got only improved the performance by about 200ms. Sigh.

Lua is not an optimizing language. By this, I mean that the “compiler” doesn’t sift through the code to remove “redundant” temporaries, and it doesn’t have good loop optimizations. My feeling was that these issues were major contributing factors. I was right.

The net result? The same 800K file:

base64 (GNU)        0.055s
base64.lua (E.E.)   1.093s
Lua wiki version    2.586s

I can’t make natural Lua work as well as native code, but I think I made an improvement. Not all of the improvement came from the bit32 library; a general restructuring of how the encoding and decoding work was required.

Interested?

I have published the library on github. (https://github.com/ErnieE5/lua_base64.git)

Ernie

Technical Interviewing December 10, 2010 at 7:40 am

I’ve been thinking a lot recently about some of the questions I have been asked in past technical interviews, and I have come to the conclusion that they mostly suck and that I have no idea how to make them 100% better.

It is not that the questions aren’t relevant, but that they don’t (to me anyway) generally do a good job of showing what a candidate offers. For someone fresh out of school, one of the generic algorithm questions is probably a good way to see what a candidate will offer. It will give you insight into what they have likely been exposed to in an academic setting, and it is likely a good way to engage the candidate because it will be similar to what they were exposed to.

For a well-seasoned developer, I am convinced that they are a complete waste of time (and, at least in my case, can make me seem like an idiot). Unless you are constantly exposed to the types of scenarios that require the raw algorithms, it is likely that they will be extremely rusty. The following is a highly contrived example of one of these types of questions.

“Write a function to count the number of spaces in a string.”

In and of itself, this function is fairly simple at first glance. A naive whiteboard version (sketched here purely for illustration; the exact signature doesn’t matter) might be:
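#include <cstddef>

// Purely illustrative: the naive whiteboard loop.
std::size_t count_spaces( const char* s )
{
    std::size_t count = 0;
    while( *s != '\0' )
    {
        if( *s == ' ' )
            ++count;
        ++s;
    }
    return count;
}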

The problem with this type of question is that no one should be writing code like this. When writing on a white board, simple is better, but simple only gives verification that the general process can be followed. A better (if still contrived) answer, again sketched here only to illustrate the point, adds the parameter validation most interviewers want to see:
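#include <cstddef>

// Purely illustrative: the same loop with parameter validation bolted on.
std::size_t count_spaces( const char* s )
{
    if( s == NULL )
        return 0;   // a programming error quietly reported as "no spaces"

    std::size_t count = 0;
    for( ; *s != '\0'; ++s )
    {
        if( *s == ' ' )
            ++count;
    }
    return count;
}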

However, I would contend that even the above shouldn’t be in most production code. If the parameter validation fails then zero is returned. Programming errors are masked. Not to mention that raw string handling is rarely a good thing. Are you really hiring the candidate to re-invent the wheel?

At the end of the interview, what have you really learned about the candidate that tells you how they would function in a complex system? Nothing. (IMO)

But how do you determine if a developer is able to follow complex ideas?  Hell if I know.  I wish I did. The closest question I have seen that was a good screen was something like this:

“Take this RFC and implement this subset transformation.  Use your language of choice and write highly efficient bullet proof code.”

It gives the candidate a chance to think about the problem and present a good polished solution.  The drawback is that unless you sit them in a room with a camera on them you can’t really tell if it was that person doing the work or not.

I think the best way to really get to know if a developer is good for your team is “try it before you buy it.”  Contract to hire is likely the best bet.

I know that I have done poorly on interviews for jobs that I am fairly sure I would have been a perfect fit for. The problem was the technical screen and the stumble-bum way I came across. I have a hard time “disengaging” my brain and writing what I consider to be substandard code. However, that “substandard code” is what goes over well on a white board, because you are trying to illustrate an idea.