Achieving enlightenment

sudo apt install enlightenment

🙂

Posted in Linux

On credit, attribution, and ownership

Over the past few months, I’ve found myself questioning more and more of the beliefs I’ve held for so long. This is due in part to taking Software Development with Matthias Felleisen, but also partly just because I’ve hit the point in my life where everything starts to get real. One of the major topics I’ve thought about is my opinion on attaching my name to everything I’ve created.

When I was younger, I used to think attribution was everything. I always insisted on being credited for every little contribution I would ever make. I vividly remember making sprites for a friend’s game and demanding I get put in the credits, despite making maybe 1 or 2 things. I also remember making one of my friends put the entire text of the GNU GPL 3.0 license in his CSS style sheet because I wrote it for him. I was also very strict on supposed “copyright infringement.” I recall berating one of my friends because he had the gall to so much as take inspiration from my choice of colors on my homepage.

Thanks to Software Development, I was introduced to the concept of egoless programming, which immediately struck a chord with me. Egoless programming effectively separates the programmer from the program. For example, egoless programming would ask “why does the code do this?” instead of “why did you do this?” This idea made me reconsider why I write code. I have no motivations of fame or wealth or even adoption; I just make something that I think is cool and want to share it with anyone else who thinks it might be cool. If this is my sole reason for creating something, why should I attach my name to it? So people later down the road can go “Oh, this Collin guy made this cool thing”? Not exactly something I care about. I genuinely do not care about any sort of fame or recognition I would get from being the guy that creates the new cool thing; just creating the new cool thing is reward enough for me.

Another thing pushing me away from the idea of attaching my name to everything I’ve done is just looking at the other people (including my old self) who do it. I have seen internet wars erupt over “trademark disputes” or people attaching credit onto the stupidest of things. Over two years ago, I helped one of my friends make a quick poster for his band. I used to work as a graphic designer, and tend to do work like that as a hobby, so I graciously agreed. The design itself was super basic, just 5 basic Keith-Haring-esque people standing in front of a basic sun and sky with two massive speakers to each side. I really liked the design, and was super stoked that they decided to use it. Flash forward to now, and they use it for everything. The credit to me has also been lost in translation. At some point, the band members started crediting my friend with having made it, and that credit continues to this day. I realized this when viewing their Instagram page and seeing that my friend was credited in the description with having made it. I was going to send a message and go “hey… I actually made this”, but in that moment, I stopped and thought about what a simple shout-out in the description really means. Maybe one or two people would drop by my unrelated Instagram page. Maybe one person would go “hmm… I’ll remember that name”, but ultimately, nothing in the grand scheme of things would happen. It’s a single image on a social media platform I don’t even like to use. The fact though that they are still using the design I made for them so long ago is humbling to say the least, even if I’m the only one who knows it.

Going forward, I am not sure what I will do. I may begin releasing some of my works into the public domain. I may begin to isolate, and develop things only for me, not caring about the wider world. I am not sure. At the end of the day, a name on a page is merely a grain of sand in the vast desert that is our timeline. To realize this is to be freed from the burden of trying to inform everyone that it is there.

This entire post was inspired by a name being credited for a website which is designed to act as a collective for a group. I will not name names, but those who know will know.

Posted in Misc, Programming

JavaScript’s “history” methods are cool

Recently, I was writing a paper which required me to do a little historical background research. Through this I ended up on the website for Encyclopedia Britannica. As I was scrolling the page, I saw something that I hadn’t seen before. As I scrolled down the infinitely scrolling content page, the URL bar at the top of the browser updated to reflect which section I was in. All of this was done seamlessly without refreshing the browser or anything.

Now, I had seen similar things like this using full page transitions. I had always assumed it was a function of caching or otherwise, but it turns out it’s a built-in JavaScript method. One that is pretty damn cool.

The methods in question are history.replaceState and history.pushState. history.replaceState, like the name implies, replaces the whole URL with a new one and does not generate a “back” state for you to return to when you hit the back button. If a user was scrolling a page, a website could use this to update a section link as the user scrolled. history.pushState is a little more interesting: it creates a new history entry and replaces the URL with a new one. This means the back button still functions, and will return you to the previous URL, which also won’t reload the page. This means you could have a whole website that only really loads an entire page once, and just fetches and renders content, and the user would have the same experience as a traditional website.

If you want to see this in action, try the following code snippets on any website:

history.replaceState(null, "Title", "/fakeURL");
history.pushState(null, "Other Title", "/hitTheBackButton");

If this is combined with a listener for the Navigation API’s “navigate” event, a full website’s navigation could happen without a reload at all, which is very neat. An example of that could be as follows:

window.navigation.addEventListener("navigate", (e) => {
    // fetch content here
})
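
To make the scrolling use case a bit more concrete, here is a minimal sketch of how a page might swap the URL as sections scroll by. The helper names (sectionForOffset, trackSections) and the section data are made up for illustration, and this is only a guess at the general approach, not how Britannica actually does it. The browser wiring only runs when a window object exists.

```javascript
// Hypothetical helper: given sections sorted by their top offset,
// return the last section whose top is at or above the scroll position.
function sectionForOffset(sections, scrollY) {
  let current = sections[0];
  for (const section of sections) {
    if (section.top <= scrollY) {
      current = section;
    }
  }
  return current;
}

// Hypothetical wiring: update the URL as the user scrolls, without
// adding a new history entry for every section they pass.
function trackSections(sections) {
  let lastId = null;
  window.addEventListener("scroll", () => {
    const section = sectionForOffset(sections, window.scrollY);
    if (section.id !== lastId) {
      lastId = section.id;
      history.replaceState(null, "", "#" + section.id);
    }
  });
}

// Example data (made up): section ids and their pixel offsets.
const exampleSections = [
  { id: "intro", top: 0 },
  { id: "early-history", top: 800 },
  { id: "legacy", top: 2400 },
];

// Only hook into the page when running in a browser.
if (typeof window !== "undefined") {
  trackSections(exampleSections);
}
```

Using replaceState here rather than pushState keeps the back button from filling up with one entry per section, which matches the behavior described above.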

I may try to screw around with this in the future, but for now, it can live here. If anyone implements a website using something like this, or knows one that does, please email it to me or drop a comment, I’d be very interested to see it in the wild.

Posted in Programming

The strange case of the Gopher “i” type

I finally feel like making another blog post! This marks the first for 2024! Crazy!

If you haven’t heard of Gopher before, you aren’t alone. Gopher competed against FTP and HTTP during the race for the web, and as you can guess, lost. Gopher is an interesting hybrid between the two. Like FTP, it is designed primarily to share files and organize itself like a directory, but unlike HTTP, it offers no markup language to make descriptive pages. Gopher instead sends a single character before the name of each item, with this character denoting the type. The types range from 0 as “text file”, to 7 as “request for text query”, to more modern additions like I for “image”, and items are always treated as links. There is one item type that stands out though, and that is the i item type.

i is a nonstandard item type which was added to Gopher’s common vernacular at some point between its inception and the current year. It is not specified in the RFC, but is ubiquitous enough to get a spot on the Wikipedia page. It has become one of the most widely used item types and appears on most modern Gopher sites, including my own. It stands out from the other item types, however, as it’s not meant to be displayed as a link, but rather as a line of standard text. This makes it very confusing as to where it fits into Gopher as a whole.
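
To show what that looks like on the wire, here is a rough sketch of splitting a Gopher menu line into its parts. A menu line in RFC 1436 is the type character and display text, then the selector, host, and port, all separated by tabs. The function name parseMenuLine, the example host, and the selectors are made up, and the placeholder fields on the i line just follow a convention I have seen, not anything in the RFC.

```javascript
// Split one Gopher menu line into its parts. A line looks like:
// <type char><display text> TAB <selector> TAB <host> TAB <port>
function parseMenuLine(line) {
  const [head = "", selector = "", host = "", port = ""] = line.split("\t");
  const type = head.charAt(0);
  return {
    type,
    display: head.slice(1),
    selector,
    host,
    port: Number.parseInt(port, 10),
    // "i" lines are plain informational text; everything else is a link.
    isLink: type !== "i",
  };
}

// Example menu (made up host and selectors).
const exampleMenu = [
  "iWelcome to my Gopher hole\tfake\t(NULL)\t0",
  "0About this server\t/about.txt\tgopher.example.org\t70",
  "1Blog posts\t/blog\tgopher.example.org\t70",
];

// Render links with their type and target, and "i" lines as bare text.
for (const line of exampleMenu) {
  const item = parseMenuLine(line);
  if (item.isLink) {
    console.log(`[${item.type}] ${item.display} -> ${item.selector}`);
  } else {
    console.log(`    ${item.display}`);
  }
}
```

A client that has never heard of the i type has no reason to take the second branch, which is why those lines tend to show up as broken links on older browsers.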

HTTP is a protocol about sharing files and documents, however somewhere along the way, it became a platform in its own right. The days of serving static documents over the wire have given way to fully fledged applications running over what amounts to a network connected virtual machine. A program capable of downloading and running code from anywhere, and doing so in a (mostly) platform agnostic manner. With its meteoric rise and ever expanding feature set, many people have decided to go back and give Gopher another shot. These people, however, intentionally or not, are bringing HTTP habits with them.

If you look at any Gopher site created in the early days of the protocol, you’ll notice something. It effectively acts as just a public FTP service. You are simply presented with a file index and a list of files and directories. Despite this, like on FTP, people still hosted informational content, programs for download, and a facsimile of what would become blog posts. To view these files, one would simply download them to their computer, and open them with a dedicated viewer.

If you juxtapose that with a modern Gopher site, created within the last 10 years, one thing will stand out almost immediately: the mixing of content with directory listings. Just like HTTP has evolved in its time, Gopher has also evolved and changed, and this change has fundamentally altered how Gopher sites look and act. If one navigates to a Gopher site now, they are able to read information and follow links directly under or above it. They are able to see a message of the day, or a brief introduction to what is stored in a directory. Gone are the days of “README.txt”; presenting information to the user is now easier and more commonplace. This is partially the reason for the i item type’s widespread popularity: it is a really useful tool.

The i item type’s use, however, varies from site to site. Recently, I’ve noticed an uptick in sites using the i item type in place of a traditional text file for relaying information. This is an interesting change for multiple reasons. Traditionally, Gopher relied on the client just for getting files, not viewing them. When HTTP became the universal protocol, most browsers implemented some way to view various files in the browser, and suddenly it became a requirement to read text files in a browser. When many went from HTTP to Gopher, they seemingly brought this sentiment with them, and would rather have the client display their information than rely on the user to download and read it.

Another interesting idea becoming commonplace in Gopher is adding links to text content pages. Gopher has no concept of a “page”. Instead of a page, when you view a Gopher site, you are viewing a “Directory”, functionally no different than typing “ls” in FTP. Any other item type is simply a file or redirection to another service (or in the case of item type 7, a request to a directory). The introduction of the i item type allowed directories to suddenly start acting as files, but with more features. In a directory, you can link somewhere; in a text file, you can’t. Some modern Gopher blogs I’ve seen completely forgo the humble text file altogether, opting instead to use directories for posts since they can curate a collection of clickable hyperlinks, or even just to have the option to do so. Now what would traditionally be a directory listing has turned into a pseudo-document-markup-language, an interesting shift.

This of course raises the question “is this a good idea?” The only major downside of doing this is that older Gopher clients won’t be able to render your page. Anything from Internet Explorer 6 to NCSA Mosaic will think your page is full of broken directory links, since they predated the i item type. However, modern Gopher browsers like lynx or even Firefox 3.5 can display i types just fine. The most interesting dilemma is simply “is this intended?” On one side, the i type has become the second most used item type, therefore it has cemented itself in the protocol. On the other, Gopher is a protocol for sharing files; writing blog posts in it is not intended behavior. If we stick to intention to dictate what we do with a protocol, however, most of modern HTTP would need to be thrown out, since HTTP was designed to serve documents, not applications.

In my humble opinion, the Gopher “i” type is alright. I like and use it actively. In my eyes, it acts more like an FTP server’s MOTD, or Message Of The Day, than a markup language, but it is very helpful for demarcating sections of files, or introducing a common theme they have. However, I also see the limitations of this item type. I do sometimes use older browsers on older systems, and Gopher makes a really nice transportation layer between older and newer systems, since it doesn’t use any form of encryption whatsoever. These older browsers do not typically handle the i type correctly, and usually display the entire page as a mass of links. While the content is still fully readable, it suddenly becomes harder to distinguish what a real link is and what a broken i type is. For blog posts and longer form text content, I feel the simple text file is still the best option. In this way, you don’t break your content on older browsers and it remains accessible for everyone. Ultimately, I don’t care what anyone does with their Gopher site. The vast majority of people use modern Gopher browsers that can handle anything. Most modern Gopher users use it as a refuge from the modern web, but still want the mixed media features that come with it. Who’s to say they can’t use Gopher as their own mini-HTTP?

In conclusion, Gopher has lived a pretty interesting life, and with that life have come many changes. While the i type was not specified in its original RFC, it has cemented itself as one of the most widely used types. The use of this type, however, has brought some fundamental changes to how Gopher sites were intended to be designed. While it may not be supported historically, it has transformed Gopher from a file transfer protocol to one with a de-facto document language. A change that says more about how our use of the internet has changed than anything else.

Posted in Misc, Server

Micro: The Crow rewrite

I am currently rewriting the core of the Crow Lisp language I made. I wrote the language originally over a year ago as an attempt to create a hybrid between Lisp and JavaScript. As time went on, however, I slowly realized that this hybrid approach was just doing both languages badly. I have recently decided to realign Crow with Scheme, the Lisp dialect I prefer, while taking aspects of Common Lisp that I prefer to Scheme. This new core is not ready to be built or used, but it is gradually being worked on over at the Crow GitHub under the “core-rewrite” branch. If you were wondering why my site has been so quiet over the past few months… this is why…

Posted in Micro Posts

Software I Use (And Recommend)

This started as a page for my website, but gradually grew in scope to the point where I think this deserves a blog post dedicated to it. I may also make a page for it on my site, but much more condensed with the true explanations here.

Here is a list of programs I use on a daily basis. I am mostly providing this because I have been asked questions before like “What browser is the best?” or “What IDE would you recommend for X language”, so let’s compile it all into a list, shall we?

Before we begin, when it comes to software, I have changed what I look for over the years. Back in 2016, I used to be a software minimalist, going so far as to use almost every application from the terminal if possible. Over time I found that this wasn’t sustainable for me (or really anyone who gets work done) and have gradually moved to more and more normal looking applications. Now, I generally try to use FOSS software, however I have even been slipping on that. Generally I like software that is native to my platform of choice, so I will be splitting this up by platform. I also look for software that is lightweight, or at least isn’t extremely bloated for what it is, and also for software that does the job I need it to do.

Operating System

Generally my advice to other people here is to use whatever feels best to you, but here is what I use and recommend if you can’t make up your mind. Do note that I come from a perspective of someone who programs for a living, and not just that, but works on the lowest level of an operating system, so my opinions on systems will be very much biased towards features that most users probably won’t care about.

For Linux distributions, I generally recommend Debian and Ubuntu. They are both remarkably stable and I have used them for years, having started on Ubuntu in 2011 and moved to Debian for my first full-time foray into Linux in around 2015. Both have served me well for the time that I have used them. As for Linux in general, I highly recommend it for everyone, regardless of computer skill. I think Linux has become easy enough to use for the layperson, and offers a much better alternative to Microsoft Windows.

As for the OS I am currently using, I have been using MacOS for almost a year now. Since the M1 Macs have bad support for Linux as of writing this, I have been stuck on MacOS, and to be completely honest, it’s kinda nice. MacOS is definitely UNIX-like, and pretty faithful at that, being based on BSD. I have found absolutely no issue porting my workflow over to MacOS other than the issue with some packages not being available for ARM64. I’m looking at you, Valgrind. To be completely honest though, I see MacOS get a lot of hate from people in the programming community and I struggle to see why. MacOS is basically just a Linux (or BSD more like, but still) distro from a corporation at the end of the day. It works fine for what I need and does the job swiftly.

I generally don’t advise using Windows since I personally have grown to hate the Win32 API, as well as Microsoft’s developer tools. In my opinion, Windows is a product of a bygone era, one which still clings to life thanks to business and enterprise. Also in my opinion, there has never been a ‘good’ release of Windows, maybe bar the first few which were impressive for their times. While I have used Windows for a majority of my life, and do have to admit I have nostalgia for it, I can’t recommend it for use in programming. I eventually will write a blog post about how bad the Win32 API is, and especially how bad the Windows POSIX layer is.

Web Browser

Let’s move from Operating System to Operating System (but this time for the web).

Nowadays we have very little choice in the web browser market, basically just three engines: Blink, WebKit, and Gecko. I do actively use all three of these browser engines, mostly just to see how different they all are, but even still I do have a few favorites (and a few you should avoid at all costs).

For the browser I use daily, Firefox. I have used Firefox pretty much daily since around 2015, if not earlier. I have always liked Firefox since, for a while, it was the only browser that Linux could truly call its own. While Chromium did exist on the platform, dubious open source violations, as well as a confusing relation with the system theming, caused it to look completely out of place among the other free software on the system. Firefox has been and will probably always be the FOSS browser of choice, despite having its own FOSS violations. Apart from that, despite taking a hit with the XULpocalypse, Firefox addons continue to be more powerful than Chrome extensions in my experience, and Firefox seems to have a much better, albeit smaller, community.

As for Blink based browsers, I have to give the recommendation to Chromium and/or Brave. I barely use Blink based browsers, other than to load pages which Firefox cannot, or to log into Google apps. Chromium is the open source version of Chrome, and it does the job wonderfully for browsing the web (I generally recommend Chromium with the Ungoogled Chromium patch set applied, however stock Chromium is still pretty good). Brave is a little weird. While I want to like it, I can’t actually push myself to use it. It is simply a Chromium reskin, with a lot of Crypto integration sprinkled on top. I do not trade in crypto, nor do I use it in any capacity, so most of what makes Brave unique is completely lost on me. Likewise, Brave is a little annoying with how bloated it is becoming, with the integration of many extensions into the browser core itself. Every time I boot up Brave, which is generally not often, I am barraged with popups asking me if I want to enable or disable this new feature, along with sometimes multiple new icons appearing on the toolbar. However, despite all of this, Brave is an amazing browser in terms of speed and privacy. I highly recommend it as probably one of the better Chromium based browsers if you don’t want to use Firefox. It is also the browser I use and recommend for iOS.

As for WebKit, there aren’t many options, and in general I don’t recommend them at all other than for testing. While Safari is cool, many modern websites won’t work with it, and likewise it is completely proprietary, although the web engine that powers it is not. Epiphany (gnome-web) is also nice, but suffers from a horrible rendering pipeline which causes the simple act of scrolling to be incredibly laggy, despite the browser actually loading and rendering pages extremely fast compared to other browsers. If you absolutely have to use WebKit, just use whichever browser is most compatible on your system. Oh, and do NOT use QtWebKit, it has been abandoned for years.

As an honorable mention, Pale Moon, which is based on a fork of Gecko called Goanna. Pale Moon is an awesome browser if you need something lightweight and powerful. Pale Moon was forked from Firefox before the XULpocalypse, so it retains the classic XUL extensions and platform, giving much more freedom to the user. Sadly, however, Pale Moon has struggled to keep up with the ever changing web landscape, meaning many sites are completely broken in it. Despite this, development continues on, and just recently, full support for Google WebComponents was added, making it much more compatible with the web as a whole. I’d say check it out, but you probably won’t do much with it.

Text Editor / IDE

Emacs. Done.

Well seriously, I mostly only use Emacs for its Lisp interpreter, but it is still fun to say that’s all I use.

As for the Text Editor / IDE debate, I generally say use an IDE where it makes things more convenient, for example when working with Java or C, but a text editor when nothing really matters, for example with HTML or JavaScript.

For IDEs, I have to recommend Xcode if you use MacOS. Xcode is really lightweight and fast in my experience. It works well with LLVM too. That’s about all I can say however. I haven’t used it for much more than compiler development.

For text editors, I recommend Geany on Linux. Geany is amazing if you use Gnome or a GTK desktop. Geany works with many languages, and even integrates nicely with GNU Make. Geany however is terrible on MacOS and Windows just because GTK is not great on other platforms, and better alternatives exist.

For Windows, I recommend Notepad++. While I haven’t written code in Notepad++ in well over 5 years, I remember it being nice.

Otherwise for cross platform, use VSCodium. It’s a fork of VSCode that removes some telemetry and other proprietary components. It’s pretty good all things considered.

Compilers and Programming Languages

For C, gcc or clang. They are fairly similar and I use them interchangeably. They are both amazing.

For Java, OpenJDK.

For Lisp, Scheme, particularly Guile. Racket is nice too.

Generally I program mostly in C, but for scripts and other small projects, I’m known to work in Ruby and Lisp. Most of my web apps are written in JavaScript as well, although I haven’t written one of those in a while.

Instant Messengers / Chat

For IRC, LimeChat on MacOS, HexChat on Windows and Linux.

For Telegram, just use the official client.

For XMPP, Dino.

Generally I recommend IRC and XMPP for people, but I have found myself unable to leave Telegram just because of history with the platform. Telegram is horrible and people should not use it. Signal is also a nice alternative, although a bit more limited.

Media Player

VLC or MPV, depends on the type of media.

Games

I don’t really play games anymore. I just put this here to say avoid Steam like the plague. Use GOG.com wherever you can.

To be continued…

If you can’t tell by how short these paragraphs are becoming, I really don’t have much to say about much other software besides the ones I talked about so far. Expect this list to be updated when I find other software I need to talk about, or maybe another blog post made about it.

Posted in Misc

Micro: Just an update

Holy crap, this has been a busy year, and we aren’t even halfway through it. I have been here even though I haven’t made a single post since New Year’s (sorry). So where has the time gone? Let’s talk about it…

First of all, Crow is progressing well. In fact Crow is now much more fully featured than when I last talked about it, and steadily improving. Many of the hack-jobs like the variable system have steadily been replaced by much more robust systems like the closure system. These systems have opened the way for a multitude of features like lambdas. The next big project to tackle is overhauling the horrible garbage collector, which has pretty much remained unchanged from the beginning. Documentation is also steadily being produced. Hopefully by the end of the month, Crow will be fully documented, and have a fully featured tutorial!

On a side note, I have also been working with Linux kernel development in my spare time, mostly to create a project I’m calling CrowOS, which would basically be an entirely different kernel, just based on Linux for driver support. In doing so I have learned many things about how the kernel works, how to build a truly minimal one, and even how to develop and hack on kernel modules. I want to make a kernel module development series on this blog since kmod development isn’t very well documented in tutorial form, probably for good reason, but I want to change that :).

Anyway, keep a close watch on both this blog and the Crow project, I will hopefully be back to posting more regularly soon.

Posted in Micro Posts

Micro: First!

So uh… This is the first post of the year I guess. I’m making it my goal to write a lot more this year, and probably add more interesting stuff to my website. Already I got my website working with a generator I wrote, so adding to my site is now easier than ever, and updating it is just as easy. As well as that, I have backed up the entire html directory and will begin pruning old services nobody uses in order to not only save space, but make everything more manageable.

Basically this is the year of cleaning up and streamlining.

Posted in Meta Posts, Micro Posts

Making a parser generator doesn’t have to be hard.

I’m inspired to write this because of Russ Cox’s regular expression blog post, which follows a similar format, although I’m not about to compare my implementation to others.

Recently, I decided to create a project called CCC, or Collin’s Compiler Creator, mostly because I want to use parser generators, but would feel disingenuous if I were to use an existing one like Bison or YACC. I have the same problem as John Carmack: if I didn’t write all of the code, I don’t feel like I wrote any of the code. But aside from that, I embarked on making a parser generator… and finished in the same night, and in this post we will make another.

Before we begin, the complete code can be downloaded here.

Step 0: How does a parser generator work?

A parser generator simply takes in a list of tokens to parse for, and outputs code to do the actual parsing. The generator itself only outputs the code to be used, so effectively no matter what language you choose to write it in, it is an ahead of time operation. Why would this be useful? Well, I invite you to check out the source code for Blæst, which I wrote my own parser for without the use of a parser generator. To put it lightly, that parser is a mess of spaghetti code wrapped in duct tape and glue. It is horribly inefficient, and filled with bugs. And aside from all of that, it is horrible to look at, and even worse to add new features to. Now compare that to a parser generator: I could’ve generated those almost 1,000 lines by just feeding it an array of tokens to look for, and when it finds them, just have it return a number corresponding to the token it found. And apart from that, rather than have each token use its own string compare (something that is slow and inefficient), it could combine them, since it knows what strings to compare ahead of time. This approach makes the code much easier to maintain, and also makes everything easier to upgrade and even port to different languages.

Step 1: Lets get a list of tokens

For this tutorial I will use JavaScript. I personally like JavaScript because it is fairly similar to languages like C++ or Java, which most people will be familiar with, and even if you aren’t, JavaScript itself is probably familiar to you. I have rarely met someone who can’t at least read JavaScript. If you have a problem with this, check out the GitHub for CCC, it’s written entirely in… well… C.

The first thing we need to do is get a list of tokens. For this I’ll just create an array that is already populated. You can get these values however you wish, maybe making inputs on a website or via the command line or whatever. But the array should look something like this once you’re done.

// Our list of tokens
var tokens = ["Hello", "Hi", "Goodbye", "Good", "World", "Wordlist"];

I have chosen these specific words because when we construct trees for making the branching parser, they are going to come in handy for demonstrating how we can reuse branches.

Step 2: Create the trees

Now we need to create our actual parser tree. A parser tree is the sequence of letters needed to make the word. We call it a tree because it can branch. For example, both “Hello” and “Hi” start with “H”, so our “H” branch then branches off into an “e” and an “i” branch. This allows our parser to effectively string compare anything that starts with “H” together. For reference, standard string compares go through entire strings in one pass. A standard compare would check our input against “Hello” by comparing each letter of our input to “Hello”, which takes time. Then it would do the same with “Hi”. Both “Hello” and “Hi” start with an “H”, so if our input doesn’t start with an “H”, we can be assured it isn’t either “Hello” or “Hi”, so we don’t waste our time.
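
For comparison, here is a rough sketch of the one-token-at-a-time approach the tree is meant to replace. The names naiveMatch and exampleTokens are made up for this illustration, and returning 0 for “no match” is my own assumption. It counts every letter comparison it performs so the wasted work is visible.

```javascript
// Same word list as the tutorial, under a different name.
const exampleTokens = ["Hello", "Hi", "Goodbye", "Good", "World", "Wordlist"];

// Compare the input against every token in turn, one letter at a time.
// Returns the 1-based token number (0 if nothing matched) along with
// how many single-letter comparisons it took to get there.
function naiveMatch(input, tokens) {
  let comparisons = 0;
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    let j = 0;
    while (j < token.length && j < input.length) {
      comparisons++;
      if (token[j] !== input[j]) {
        break;
      }
      j++;
    }
    // A match means we consumed both strings completely.
    if (j === token.length && j === input.length) {
      return { token: i + 1, comparisons };
    }
  }
  return { token: 0, comparisons };
}

console.log(naiveMatch("Wordlist", exampleTokens));
```

Notice that “W”, “o”, and “r” get checked once for “World” and then all over again for “Wordlist”. The tree lets those shared letters be checked a single time.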

Creating these branches is actually quite easy. We simply walk through our word, creating branches for points we don’t already have. First we need to go through every token, then create a list of “next letters” for each current letter. If that “next letter” is the next letter of our word, just go to it and repeat. If not we add it and go from there.

// We need to create a 'state' which holds the possible next letters, and a state for those as well
var initialState = {
  id: 0,
  next: []
};

// Keep track of how many states we have made for code generation later
var stateCounter = 0;

// Loop through our tokens
for(let i = 0; i < tokens.length; i++){
  
  // Our current token we are checking
  let currentToken = tokens[i];
  
  console.log("Current token: " + currentToken);
  
  // Reset to the initial state since we are on a new token
  let currentState = initialState;
  
  // Now loop through every letter in our token
  letterLoop: for(let j = 0; j < currentToken.length; j++){
    
    // Our current token letter
    let currentLetter = currentToken.at(j);
    
    // Go through every possible next letter
    for(let k = 0; k < currentState.next.length; k++){
      
      let possibleNextLetter = currentState.next[k];
      // If the possible next letter is our current letter, we follow it
      if(possibleNextLetter.letter == currentLetter){
        
        console.log("Found branch for: " + currentLetter);
        
        // Set the new current state to the branch we are following
        currentState = possibleNextLetter.state;
        
        // Go back to the letter loop
        continue letterLoop;
      }
    }
    
    console.log("Adding " + currentLetter);

    // If we get down here, it's because we couldn't find a branch, so we need to add one
    currentState.next.push({
      letter: currentLetter,
      state: {
        id: ++stateCounter,
        next: []
      }
    });
    // Now set our state to the new state we made
    currentState = currentState.next.at(-1).state;
    console.log("ID: " + currentState.id);
  }
  
  console.log("Adding end of word marker");
  
  // Add our end of word marker, this contains the token number we return
  currentState.next.push({
    letter: 0,
    state: {
        id: ++stateCounter,
        next: []
    },
    token: i + 1
  });
}

Now we should have a tree of every possible letter combination. If it were drawn out, it should look something like this:

   [Start]
   /  |  \
  H   G   W
 /|   |   |
i e   o   o
| |   |   |
$ l   o   r
  |   |   |\
  l   d   l d
  |  /|   | |
  o $ b   d l
  |   |   | |
  $   y   $ i
      |     |
      e     s
      |     |
      $     t
            |
            $

Where $ is an end of token marker.
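Before generating any code, it can help to see what the tree looks like as data and to check that walking it by hand gives the right answer. The sketch below writes out the tree for a tiny, made-up token list ["Hi", "He"] as an object literal, using the same shape as the states in Step 2 (id, next, letter, state, token). The names exampleTree and lookup are hypothetical and only exist for this illustration.

```javascript
// Hand-written tree for the hypothetical token list ["Hi", "He"].
// A letter of 0 is the end of word marker and carries the token number.
const exampleTree = {
  id: 0,
  next: [{
    letter: "H",
    state: {
      id: 1,
      next: [
        { letter: "i", state: { id: 2, next: [
          { letter: 0, state: { id: 3, next: [] }, token: 1 }
        ] } },
        { letter: "e", state: { id: 4, next: [
          { letter: 0, state: { id: 5, next: [] }, token: 2 }
        ] } }
      ]
    }
  }]
};

// Hypothetical helper: follow one branch per input letter, then look for
// an end of word marker on the state we ended up in.
function lookup(root, input) {
  let state = root;
  for (let i = 0; i < input.length; i++) {
    const branch = state.next.find((n) => n.letter === input[i]);
    if (!branch) return 0; // no branch for this letter, so no match
    state = branch.state;
  }
  const end = state.next.find((n) => n.letter === 0);
  return end ? end.token : 0;
}

console.log(lookup(exampleTree, "Hi"));  // → 1
console.log(lookup(exampleTree, "He"));  // → 2
console.log(lookup(exampleTree, "H"));   // → 0
console.log(lookup(exampleTree, "Hit")); // → 0
```

This is just an interpreter over the tree. The generated parser in the next step does the same walk, only with the branches baked into switch/case statements.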

Step 3: Code generation

This is easily the hardest part, because turning our states into code takes the most thinking. Almost every parser generator I’ve seen uses a state machine for parsing, and lucky for us, they are very easy to implement. A basic state machine looks like this:

var state = 0;
switch(state){
  case 0:
    state = 1;
    break;
  case 1:
    state = 0;
    break;
  default:
    state = 1;
    break;
}

We just need to make our state machine slightly more useful than this one.
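As a rough idea of what “slightly more useful” means, here is a hand-written sketch of a state machine that accepts exactly one word, “Hi”. It wraps the switch in a loop so the machine keeps stepping, and each state checks one letter of the input. The function name matchHi is made up for this example.

```javascript
// Hand-written state machine that only accepts the word "Hi".
// matchHi is a hypothetical name used for illustration.
function matchHi(input) {
  let i = 0;     // position in the input
  let state = 0; // current state of the machine

  while (true) {
    switch (state) {
      case 0: // expecting 'H'
        if (input[i] !== "H") return false;
        i++;
        state = 1;
        break;
      case 1: // expecting 'i'
        if (input[i] !== "i") return false;
        i++;
        state = 2;
        break;
      case 2: // expecting the end of the input
        return i === input.length;
    }
  }
}

console.log(matchHi("Hi"));  // → true
console.log(matchHi("Hit")); // → false
console.log(matchHi("H"));   // → false
```

Every state either moves forward one letter or returns, so the loop always finishes.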

First we need to come up with our states… wait… we made those already. The state numbers in the case are just the state IDs we just made. So then to set the next state we just need to check for the letters… wait… we did that too. So we literally just loop through every state we generated and output the information. How easy is that!?

The generator may look complicated, but that’s just because it’s not very clean looking. Just read through it and you’ll see it’s simply outputting switch/case statements.

console.log("BELOW IS THE GENERATED PARSER CODE");

// Print the head to our function

// function parse(token){
//   let i = 0;
//   let state = 0;
//   while(true){
//     switch(state){
console.log("function parse(token){");
console.log("\tlet i = 0;");
console.log("\tlet state = 0;");
console.log("\twhile(true){");
console.log("\t\tswitch(state){");

// Now go through each state and print out the important bits
// To do this I'll simply create a recursive function.

function printState(state){
 
  // Print our case
  console.log("\t\t\tcase " + state.id + ":");

  // Make our case for the next letter
  console.log("\t\t\t\tswitch(token.at(i)){");
  
  // And our conditions
  for(let i = 0; i < state.next.length; i++){

    let condition = state.next[i];

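    // If our letter is 0, this is the end of word marker. In JavaScript, string.at returns undefined past the end of the string, so we check for that.
    // Use === so a token letter like "0" or " " is not coerced to 0 and mistaken for the marker
    if(condition.letter === 0){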
     // If we are actually at the end of the string, return our token number
     console.log("\t\t\t\t\tcase undefined: return " + condition.token + ";");
    }
    else{
      // Otherwise make it so we go to the next state

      // case 'nextLetter':
      //   i++;
      //   state = nextStateId;
      //   break;
      console.log("\t\t\t\t\tcase '" + condition.letter + "':");
      console.log("\t\t\t\t\t\ti++;");
      console.log("\t\t\t\t\t\tstate = " + condition.state.id + ";");
      console.log("\t\t\t\t\t\tbreak;");
    }
  }
  // Make sure if we can't branch anymore, we return 0 so we don't hit an infinite loop
  
  // default: return 0;
  console.log("\t\t\t\t\tdefault: return 0;");

  // And finally just close up the switch/case and this current case for the state machine.
  console.log("\t\t\t\t}");
  console.log("\t\t\tbreak;");
  
  // Now go through every next state and print it out again
  for(let i = 0; i < state.next.length; i++){
    // recursion is fun
    // recursion is fun
    // recursion is fun
    // recursion is fun
    printState(state.next[i].state);
  }
}

// Call the function on the initial state and it should go through everything 
printState(initialState);

// Now finish up the function
console.log("\t\t}");
console.log("\t}");
console.log("}");

// Now everything is done

And with that, our parser generator is complete. Now, this generator is not the best. If the token doesn’t start at the beginning of the input, as in “gHello”, it will not match. It also only matches full words, so “Helloworld” will not match “Hello”. It’s also case sensitive.

Running the program will output a function ‘parse’ in the console. The basic usage is parse(string to parse). It returns 0 if the input did not match, or a number n from 1 to … if it did, where n is the array position of the token + 1. In this example, “Hello” returns 1 because it is the first element in the array.
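If you want to try the parser without copy-pasting it out of the console, one option is to collect the generated lines into a string and hand them to the Function constructor. The sketch below is a compact rewrite of Steps 2 and 3 for that purpose, not the exact code from this post. The names buildTree, generateBody and demoTokens are made up, and it assumes the tokens contain no quotes or backslashes, since letters are pasted straight into the generated source.

```javascript
// Compact version of Step 2: build the tree of states
function buildTree(tokens) {
  const root = { id: 0, next: [] };
  let stateCounter = 0;
  for (let i = 0; i < tokens.length; i++) {
    let current = root;
    for (let j = 0; j < tokens[i].length; j++) {
      const letter = tokens[i][j];
      let branch = current.next.find((n) => n.letter === letter);
      if (!branch) {
        branch = { letter: letter, state: { id: ++stateCounter, next: [] } };
        current.next.push(branch);
      }
      current = branch.state;
    }
    // End of word marker carrying the token number
    current.next.push({ letter: 0, state: { id: ++stateCounter, next: [] }, token: i + 1 });
  }
  return root;
}

// Compact version of Step 3: collect the function body instead of printing it
function generateBody(root) {
  const lines = ["let i = 0;", "let state = 0;", "while(true){", "switch(state){"];
  (function emit(state) {
    lines.push("case " + state.id + ":", "switch(token.at(i)){");
    for (const condition of state.next) {
      if (condition.letter === 0) {
        lines.push("case undefined: return " + condition.token + ";");
      } else {
        lines.push("case '" + condition.letter + "': i++; state = " + condition.state.id + "; break;");
      }
    }
    lines.push("default: return 0;", "}", "break;");
    for (const condition of state.next) emit(condition.state);
  })(root);
  lines.push("}", "}");
  return lines.join("\n");
}

const demoTokens = ["Hello", "Hi", "Goodbye", "Good", "World", "Wordlist"];
// new Function evaluates a string as code, so only use it on source you generated yourself
const parse = new Function("token", generateBody(buildTree(demoTokens)));

console.log(parse("Hello"));      // → 1
console.log(parse("Good"));       // → 4
console.log(parse("Wordlist"));   // → 6
console.log(parse("gHello"));     // → 0
console.log(parse("Helloworld")); // → 0
console.log(parse("hello"));      // → 0
```

The last three calls show the limitations mentioned above: a leading extra letter, a longer word, and a different case all return 0.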

The output of the example run here should look something like this:

Current token: Hello
Adding H
ID: 1
Adding e
ID: 2
Adding l
ID: 3
Adding l
ID: 4
Adding o
ID: 5
Adding end of word marker
Current token: Hi
Found branch for: H
Adding i
ID: 7
Adding end of word marker
Current token: Goodbye
Adding G
ID: 9
Adding o
ID: 10
Adding o
ID: 11
Adding d
ID: 12
Adding b
ID: 13
Adding y
ID: 14
Adding e
ID: 15
Adding end of word marker
Current token: Good
Found branch for: G
Found branch for: o
Found branch for: o
Found branch for: d
Adding end of word marker
Current token: World
Adding W
ID: 18
Adding o
ID: 19
Adding r
ID: 20
Adding l
ID: 21
Adding d
ID: 22
Adding end of word marker
Current token: Wordlist
Found branch for: W
Found branch for: o
Found branch for: r
Adding d
ID: 24
Adding l
ID: 25
Adding i
ID: 26
Adding s
ID: 27
Adding t
ID: 28
Adding end of word marker
BELOW IS THE GENERATED PARSER CODE
function parse(token){
	let i = 0;
	let state = 0;
	while(true){
		switch(state){
			case 0:
				switch(token.at(i)){
					case 'H':
						i++;
						state = 1;
						break;
					case 'G':
						i++;
						state = 9;
						break;
					case 'W':
						i++;
						state = 18;
						break;
					default: return 0;
				}
			break;
			case 1:
				switch(token.at(i)){
					case 'e':
						i++;
						state = 2;
						break;
					case 'i':
						i++;
						state = 7;
						break;
					default: return 0;
				}
			break;
			case 2:
				switch(token.at(i)){
					case 'l':
						i++;
						state = 3;
						break;
					default: return 0;
				}
			break;
			case 3:
				switch(token.at(i)){
					case 'l':
						i++;
						state = 4;
						break;
					default: return 0;
				}
			break;
			case 4:
				switch(token.at(i)){
					case 'o':
						i++;
						state = 5;
						break;
					default: return 0;
				}
			break;
			case 5:
				switch(token.at(i)){
					case undefined: return 1;
					default: return 0;
				}
			break;
			case 6:
				switch(token.at(i)){
					default: return 0;
				}
			break;
			case 7:
				switch(token.at(i)){
					case undefined: return 2;
					default: return 0;
				}
			break;
			case 8:
				switch(token.at(i)){
					default: return 0;
				}
			break;
			case 9:
				switch(token.at(i)){
					case 'o':
						i++;
						state = 10;
						break;
					default: return 0;
				}
			break;
			case 10:
				switch(token.at(i)){
					case 'o':
						i++;
						state = 11;
						break;
					default: return 0;
				}
			break;
			case 11:
				switch(token.at(i)){
					case 'd':
						i++;
						state = 12;
						break;
					default: return 0;
				}
			break;
			case 12:
				switch(token.at(i)){
					case 'b':
						i++;
						state = 13;
						break;
					case undefined: return 4;
					default: return 0;
				}
			break;
			case 13:
				switch(token.at(i)){
					case 'y':
						i++;
						state = 14;
						break;
					default: return 0;
				}
			break;
			case 14:
				switch(token.at(i)){
					case 'e':
						i++;
						state = 15;
						break;
					default: return 0;
				}
			break;
			case 15:
				switch(token.at(i)){
					case undefined: return 3;
					default: return 0;
				}
			break;
			case 16:
				switch(token.at(i)){
					default: return 0;
				}
			break;
			case 17:
				switch(token.at(i)){
					default: return 0;
				}
			break;
			case 18:
				switch(token.at(i)){
					case 'o':
						i++;
						state = 19;
						break;
					default: return 0;
				}
			break;
			case 19:
				switch(token.at(i)){
					case 'r':
						i++;
						state = 20;
						break;
					default: return 0;
				}
			break;
			case 20:
				switch(token.at(i)){
					case 'l':
						i++;
						state = 21;
						break;
					case 'd':
						i++;
						state = 24;
						break;
					default: return 0;
				}
			break;
			case 21:
				switch(token.at(i)){
					case 'd':
						i++;
						state = 22;
						break;
					default: return 0;
				}
			break;
			case 22:
				switch(token.at(i)){
					case undefined: return 5;
					default: return 0;
				}
			break;
			case 23:
				switch(token.at(i)){
					default: return 0;
				}
			break;
			case 24:
				switch(token.at(i)){
					case 'l':
						i++;
						state = 25;
						break;
					default: return 0;
				}
			break;
			case 25:
				switch(token.at(i)){
					case 'i':
						i++;
						state = 26;
						break;
					default: return 0;
				}
			break;
			case 26:
				switch(token.at(i)){
					case 's':
						i++;
						state = 27;
						break;
					default: return 0;
				}
			break;
			case 27:
				switch(token.at(i)){
					case 't':
						i++;
						state = 28;
						break;
					default: return 0;
				}
			break;
			case 28:
				switch(token.at(i)){
					case undefined: return 6;
					default: return 0;
				}
			break;
			case 29:
				switch(token.at(i)){
					default: return 0;
				}
			break;
		}
	}
}

With all that, I hope you enjoyed learning the joy of creating a parser generator. Overall this only took me a few hours to write, and it was quite fun to do so, so I really recommend doing it. I have uploaded a gist of the complete code for this parser, so use that for whatever you want.

Posted in Programming

Micro: Brave or Ungoogled Chromium

Recently I was thinking to myself: should I be recommending people Brave or Ungoogled Chromium? Myself, I use Firefox, however I acknowledge that most people probably want to stick with Chromium based browsers just because they are familiar, and if they are moving from Chrome, the experience should be more or less the same. That being said, which Chromium based browser is the one to recommend? Personally, I have recommended and still recommend Brave, just because it is a more complete out of the box experience. While I don’t (and probably never will) fully trust Brave, I can’t deny that their browser is leagues more private than Google’s Chrome. Brave also has out of the box support for extensions and Widevine DRM, so you can effectively use Brave as a drop in replacement for Chrome. On the other hand there is Ungoogled Chromium. Ungoogled Chromium is literally the most private browser possible. It is Chromium with all the Google forcefully removed, but because of this, it lacks many features like the web store or Widevine. While Brave lets you disable these features, they flat out come disabled on Ungoogled Chromium, and enabling them is a process. As well, the main killer for me in recommending Ungoogled Chromium is its lack of a search engine by default. I have no idea why they chose to make it “No Search” by default, but the majority of people will just load up a browser and expect it to be able to search out of the box. And how would they be able to search to figure out how to set the search engine? To be honest, Ungoogled Chromium is still probably the better browser in terms of privacy, but Brave is what I have to recommend because of its out of the box usability. Expect a blog post in the future about me grappling with being a Firefox user and explaining why Chromium is better in every way…

Posted in Micro Posts