Adam Canady's blog

A Carleton College senior.

Video Degradation Through Compression

Lossy encodings of media have tons of advantages - smaller file sizes, more portability, decreased bandwidth usage, faster buffering, etc. But one often-overlooked downside is that videos lose a ton of quality each time they're encoded with a lossy codec.

To demonstrate this, here are a couple of videos that show a source file undergoing compression over multiple iterations. The degradation in quality is easy to see even after the first few iterations:

Uploading to YouTube 1000 times

Dubbing over a VHS tape 25 times
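
If you'd like to reproduce the effect yourself, here's a minimal sketch of the idea, assuming ffmpeg is installed; the file name, CRF value, and iteration count are arbitrary placeholders:

import subprocess

SRC = "source.mp4"   # hypothetical starting clip
ITERATIONS = 10      # more passes = more degradation

current = SRC
for i in range(1, ITERATIONS + 1):
    nxt = "gen_%03d.mp4" % i
    # Re-encode the previous generation with a lossy codec; each pass
    # throws away more information, compounding the quality loss.
    subprocess.run(
        ["ffmpeg", "-y", "-i", current,
         "-c:v", "libx264", "-crf", "28", "-c:a", "aac", nxt],
        check=True,
    )
    current = nxt

Comparing the first and last generations side by side makes the compounding loss obvious.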

How to Migrate Your WordPress Blog to Ghost

It's very straightforward to migrate your WordPress blog to Ghost. We'll start by exporting everything out of WP, then set up Ghost, import everything, and finally tweak the setup so our old URLs still work.

Get data out of WordPress

  1. Install the Disqus plugin and export your comments to Disqus.
  2. Install the Ghost plugin and obtain a .json file containing all of your post content and metadata.

Set up Ghost

  1. Provision a Ghost installation. Since Ghost is an early-stage project, it’s easiest to start from a pre-setup image, like the one offered by DigitalOcean.
  2. Log into Ghost by navigating to http://your-ip/ghost and creating an account. More info on this process here.

Move everything over

  1. Import your .json export from WordPress into Ghost by navigating to http://your-ip/ghost/debug and clicking import. Keep in mind, this will overwrite your login information with the root user used in WordPress.
  2. Add Disqus to your theme by adding <div id="disqus_thread"></div> to your theme's post.hbs anywhere after the {{content}} helper. This will vary by theme.

Customize previous URLs

This is probably the trickiest part of the migration, since Ghost is still in very early development. We'll need to add a redirect that follows our previous URL scheme. For me, that meant adding the following code to core/server.js at line 371, just after the block containing the // ### Frontend routes comment:

// Redirect old WordPress-style /2013/... URLs to their new root-level slugs
server.get(/^\/(2013\/?).*/,
    function redirect(req, res) {
        // Drop the leading "/2013" (five characters) and redirect permanently
        res.redirect(301, req.url.substr(5));
    }
);

Your mileage may vary - mine was easier because all of my WordPress post URLs were of the form /2013/blog-permalink.

Shoutout to the posts that helped me transfer my blog.

Why Does It Seem Like All the Threading Projects in Node Are Abandoned?

I was recently looking for a threading solution in Node and, to my surprise, many of the projects have been abandoned! Take Threads a go go and webworker-threads: for both, the last commit was at least six months ago. It seems like there's no demand for a solution for running long-running or blocking tasks in Node (aside from fibers or child_process).

I can’t even get Threads a go go to install with the latest version of Node (and haven’t been able to do so for over 4 months).

Is it really the case that these tools are not necessary? If so, what solutions have people found for handling long processes?

Update: Even fibers seems to have gone a long time without an update.

Update 2: Welcome HN folks! Discussion here.

Opening CSV or Excel Files in Python

A recent project of mine has involved being very flexible with input data, transforming it to a standardized format, and putting it into a database. This is commonly referred to as ETL, or Extract, Transform, Load.

In doing this project, I've gotten familiar with an awesome Python data analysis framework called Pandas. To install Pandas, simply run this in the ol' command line:

sudo pip install pandas

Now you’re ready to use the pandas library! The basic functionality takes some kind of data input, whether it be CSV, JSON, Excel, or a number of different formats, and converts it into a DataFrame object.

This DataFrame object is great because it can be indexed and you can perform all kinds of operations on the data. For my purposes, I only wanted to get all the data into this standardized format, then get it out in dictionary-like objects for each row. I was only concerned with using Excel and CSV files (as this post title indicates), so here’s the function I used:

import pandas as pd

def open_file(filepath):
    # Load the input into a DataFrame based on its file extension.
    if ".csv" in filepath:
        doc = pd.read_csv(filepath)
    elif ".xls" in filepath:  # also matches .xlsx
        doc = pd.read_excel(filepath, 0, index_col=None, na_values=['NA'])
    else:
        raise ValueError("Unsupported file type: " + filepath)

    # Build one dict per row, keyed by the column headers.
    return [dict((colname, row[i]) for i, colname in enumerate(doc.columns))
            for row in doc.values]

You may be thinking “what could that possibly do!?!” Well, it does quite a few things! If you give it a file path that contains .csv or .xls (which includes .xlsx), it’ll open it and put it in a DataFrame. Then it takes the DataFrame and constructs a list of dictionaries, each one containing the headers from the first line of the file and the values from each successive row.
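
For instance, assuming a hypothetical contacts.csv whose header row contains name and email columns, usage would look like this:

# open_file() as defined above; the file and column names are made up
for row in open_file("contacts.csv"):
    print(row["name"], row["email"])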

Why Does the Development Flow Suck?

After writing about my idea for an Integrated Design-Development Environment, it occurred to me that there’s a lot of unnecessary hassle in all stages of development and deployment of web applications. It takes hours to get a development environment set up before you can even start working on a specific piece of code. Then, when launch-time arrives, it takes at least as long to get everything in line to launch the thing. While it’s not exactly a fair comparison, I could go to one of several dozen hosting companies and launch a WordPress instance for about $5/month. Then, the only remaining step is to actually log in and write content.

Many people are "full-stack" developers, meaning they handle everything from launching a VM, configuring a web server, and installing dependencies to putting together the application's backend and frontend. Not to mention some marketing and content creation on top, including SEO, social media, and email campaigns.

All that is great, and there’s something to be said for knowing everything about your application from top to bottom (or bottom to top, in this case), but it just doesn’t have to be that complicated 99% of the time. For the select few who absolutely need to create custom directives in nginx to save nanoseconds on requests - this article is not targeted at you. This is for the people who want to use a well-known stack to rapidly prototype an idea they had for a web app.

We already have awesome tools like Docker and Vagrant to ease the transition between development and deployment. Just today, I read about Jumpstarter, a company that is trying to make a difference in this space. While their efforts are promising, I have yet to see their services in motion. Moreover, they're currently only focused on PHP-based webapps like Joomla, WordPress, and Drupal.

Even with these tools, there’s still undoubtedly a lot of room for improvement in the field of “the flow” (as Jumpstarter refers to it). For a new “full-service” technology to succeed in this area, I think these steps are crucial:

  1. Focus on a particular flow that is popular enough that it is used by many, but new/unpopular enough that not many people have optimized it yet.
  2. Develop a set of web-based tools that makes it easy to collaboratively:
    • Design the application - wireframes, drawings, etc.
    • Develop and test - possibly even a hosted test that would allow you to send functional prototypes to testers, clients, colleagues, etc.
    • Deploy. One click to a live VM (either the user's own via SSH, or through a partner service like Linode, AWS, or DigitalOcean)
    • Manage versions - integrate with issue tracking
  3. Promote, and iterate on the tools based on feedback
    • Maybe they want a desktop-based environment or mobile admin tools
  4. Create educational tools that would literally walk someone through creating an application using your flow
  5. Add the ability to import current (already deployed) projects into 'the flow' - or even just parts of applications, like development and version control, so people could take advantage of the tools in a modular fashion.

That last point is crucial. If you can show people that they can import their current, super messy projects into your environment, that opens up a whole new world of customers.

Integrated Design-Development Environment (IDDE)

Ever since my freshman-year Data Structures class, when I was introduced to interfaces in Java and the Eclipse IDE, I could see the power behind abstracting how code works behind the scenes. You could develop now with the belief that certain functions did what they said they would, then actually implement them later or pull them from a library. This is the kind of forward thinking that the IDDE would take one step further.
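
For illustration, here's the same idea sketched in Python rather than Java (all names here are made up): code is written against an interface today, and a concrete implementation shows up later.

from abc import ABC, abstractmethod

class Storage(ABC):
    # The contract: callers rely on what save/load promise to do,
    # not on how any particular implementation works.
    @abstractmethod
    def save(self, key: str, value: str) -> None: ...

    @abstractmethod
    def load(self, key: str) -> str: ...

def archive_note(note: str, store: Storage) -> None:
    # Written before any concrete Storage exists.
    store.save("latest", note)

class InMemoryStorage(Storage):
    # Implemented later - or swapped out for a library's version.
    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data[key]

store = InMemoryStorage()
archive_note("hello", store)
print(store.load("latest"))  # hello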

As I was laying out my most recent project, I was using a whiteboard; however, I wondered whether that was the most efficient way to organize my thoughts. It would be at least a two-medium process - I'd have to scrawl everything onto the whiteboard, then translate the visual thoughts into pieces of code that actually make the application tick.

What if, instead of the two-medium process, there was a web app that enabled real-time collaboration for wireframing applications? Not wireframing in the traditional sense - it could start off as a purely visual process where a designer lays out every view of the application that will be part of the final product. Then developers could come in and decide how data would be inserted into the views, where it would be stored behind the scenes, how APIs would hook up to sync external data, and how routing between the different views would be handled.

It could even go so far as to allow devs to lay out the functions they’ll need to write, what parameters they’ll take and what they’ll return, and split everything up into appropriate files, organizing the codebase as development occurred.

The wireframe could evolve with the application - allowing designers to move buttons around, change the color scheme, and add new visual features while devs could implement the features as they were added.

Perhaps the IDDE would benefit one-person ventures and massive development companies alike.

If, in a company, the design team were separated from the development team, the designers would be able to design what they want, as they are the experts in art and user experience. Similarly, the developers have experience in implementation and deployment, so they would handle the technical aspects. It's not hard to see an IDDE massively speeding up update and bug-fix turnaround, since one party would no longer have to wait for the other to complete whole chunks of changes and then be overwhelmed by the final product.

Conversely, if a company had integrated designers and devs, or a sole proprietor were to brave a project alone, she could have all of the work in one place, giving her a clear picture of what needs to be completed before launch to achieve the desired functionality.

Since the IDDE would run as a web app, it might even be possible to export Vagrant-style VMs for easy and surefire deployment. Batteries included, no assembly required.

Finally, the IDDE could be very modular. Perhaps it would suit a person who would like to use only the design module and share portions of it with hired development contractors.

A Developer's Interlude Into Science

This post is a work in progress

Science is great. It’s awesome to see the ways people use math to describe the world and how clever abstractions help us understand nature better.

This term, I'm in a couple of physics courses and a chemistry course. As someone whose courseload has been dominated by computer science, I find it hard not to relate everything back to CS.

A few notable examples below.

Chemistry

Chemical Naming

Ever since I learned how to name molecules according to some basic IUPAC rules, I've wanted to write a program that represents molecules as data and figures out how to name them according to those rules. This is one of those things with a well-defined written algorithm that may or may not have been implemented yet. My professor indicated that a program called ChemDraw can do this, but sometimes/often gets the name wrong.
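
As a toy illustration of what "molecules as data" could look like, here's a sketch of my own (the representation is made up, and real IUPAC naming would also need bond orders, rings, functional groups, and much more). It finds the longest carbon chain, which drives the parent name:

# Ethanol as an adjacency list: a list of atoms plus (i, j) bond pairs.
ethanol = {
    "atoms": ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)],
}

def longest_carbon_chain(mol):
    # Build a carbon-only adjacency map, then DFS for the longest path.
    carbons = {i for i, a in enumerate(mol["atoms"]) if a == "C"}
    adj = {i: set() for i in carbons}
    for a, b in mol["bonds"]:
        if a in carbons and b in carbons:
            adj[a].add(b)
            adj[b].add(a)

    def dfs(node, seen):
        return 1 + max((dfs(n, seen | {n}) for n in adj[node] - seen), default=0)

    return max((dfs(c, {c}) for c in carbons), default=0)

print(longest_carbon_chain(ethanol))  # 2, hence the "eth-" prefix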

Chemical Structure Analysis

Using NMR and IR spectroscopy, we can determine a lot about a molecule's structure. Each time I sit down with a problem set or an exam question, it's easy to start applying the algorithm I've developed.

I have some thoughts on how this could be automated. Perhaps with a sufficiently large training set, it would be possible to machine-learn structures for compounds.

Physics

Visualization of equations

One of the things I really detest about the way physics is currently taught is that the student is just spoon-fed a few equations, then told to solve some problems. I feel like that's a doable way to learn, but it's suboptimal because the student isn't really invested in the problem. An alternative teaching style would be to give students sufficient instrumentation that they could come up with a description of a phenomenon like simple harmonic oscillation on their own, but in practice this isn't really feasible due to time constraints. That also seems like a suboptimal solution.

I wonder if there's some middle ground, where computer simulations (D3 + worrydream.com's work) could let a student manipulate a system in real time and build intuition about what's going on behind the scenes.
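
As a tiny sketch of what I mean, a few lines of Python (with arbitrary parameters) can already step a harmonic oscillator forward in time; hook the same loop up to sliders and a live plot, and the student could feel how the spring constant and mass change the motion:

# Euler-Cromer integration of a simple harmonic oscillator: a = -(k/m) * x
k, m = 4.0, 1.0    # spring constant and mass (arbitrary values)
x, v = 1.0, 0.0    # initial displacement and velocity
dt = 0.01          # time step

for step in range(501):
    a = -(k / m) * x
    v += a * dt
    x += v * dt
    if step % 100 == 0:
        print("t=%.2f  x=%+.3f  v=%+.3f" % (step * dt, x, v))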

Side note: one thing I've found more and more is that people get really caught up in notation, and it's likely that better notation would lead to better understanding of a system. In my opinion, computer scientists are better at naming variables (even though, according to XKCD, naming things is the hardest problem in comp sci). When I see an omega on the board, it doesn't symbolize angular velocity nearly as well as the variable name angular_velocity would. Doing math in programming languages is easier and clearer, despite the extra activation energy required to type out the variable names.

Whiteboards

Note: this post started off as a contrast between real whiteboards and digital whiteboards, then I got distracted by a specific idea for code organization. I split this post up into two separate ones since the conversations were distinct enough. You can find the code organization piece here. Hope you don’t mind!

Whiteboards are awesome because a bunch of people can get together and connect their thoughts in an extremely visual manner. The only downside is that they require folks to be in the same room as one another to communicate clearly.

After having used a few online whiteboards that are supposed to serve as a substitute for the real thing, I wasn't really impressed. There still seemed to be a disconnect in the input devices. On a real board, it's instantaneous to draw precisely with a marker and to erase bits by wiping them away with the palm or forearm (or a cloth, if you're that guy); the mouse feels far inferior by comparison.

Perhaps it's because I was taught how to write with a pen-like device instead of a mouse-like device when I was in elementary school? To avoid another argument - it just isn't as quick to switch between the keyboard and the mouse to write labels and draw different shapes.

Reverse eBay

I recently saw this idea on r/CrazyIdeas:

A reverse-ebay: Buyers post something they are interested in buying, and vendors compete to offer the lowest bid. Internet shopping would suddenly get so much easier--instead of searching different websites and online markets for the lowest price on an item, you could make a single post and have the lowest price come to you. Additionally, it could be easier to find obscure items. No doubt it could be linked to ebay, amazon, etc. to connect potential buyers and sellers.

An interesting idea, so I thought about how it could be implemented.

Essentially, this is almost like creating a service that would organize group buys. Here are some quick thoughts:

  • The service would need to demonstrate a significant level of demand. Otherwise, it's tough for suppliers to offer the same type of discount, since they're only incentivized by one sale.
  • Suppliers may want different prices and quantities depending on how large the auction becomes. Some may be able to supply far more than others, while smaller suppliers might bid lower. In this case, suppliers must be able to bid both their maximum quantity offered and their minimum price (see the sketch after this list).
  • Auctions would have to be limited by time. Perhaps there would be a period of time where buyers could sign up, then a period where suppliers could bid.
  • Suppliers would have to be confident that demanders would pay up. If 500 people signed up to buy garden tools, you'd better be sure a high percentage of them will actually buy. Perhaps buyers would have to put their money down ahead of time at some predetermined market value, then get back the difference between the market value and their 'wholesale' final price. If they decided to jump ship, there could be a flat 5% fee they'd get tagged with.
  • Distribution would be difficult. The supplier would have to ship to, say, 500 different people, which would be a hassle. If the site hosting these transactions got big enough, it's reasonable that they could act as a distribution center. The auction-winning, lowest-bidding supplier could send their goods to one location, and the distribution center could ensure it's sent out to demanders.
  • This service may be an inefficient intermediate step towards something more continuous. Demand is not guaranteed and comes in big chunks, which represents an inefficiency as some suppliers might like to ship continuously. Some suppliers would probably like to say that they can supply 500 units/week and have that order fulfilled week over week.
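
To make the mechanics concrete, here's a minimal sketch of a clearing rule under the assumptions above (one unit per buyer; every supplier name, price, and quantity is hypothetical): suppliers bid a maximum quantity and a minimum price, and demand is filled from the cheapest bids up.

from dataclasses import dataclass

@dataclass
class Bid:
    supplier: str
    max_qty: int
    min_price: float

def clear_auction(num_buyers, bids):
    # Fill demand from the cheapest bids first.
    allocations, remaining = [], num_buyers
    for bid in sorted(bids, key=lambda b: b.min_price):
        if remaining == 0:
            break
        qty = min(bid.max_qty, remaining)
        allocations.append((bid.supplier, qty, bid.min_price))
        remaining -= qty
    return allocations

print(clear_auction(500, [Bid("A", 300, 9.50),
                          Bid("B", 400, 10.25),
                          Bid("C", 200, 11.00)]))
# [('A', 300, 9.5), ('B', 200, 10.25)]

A real version would still need the escrow, fee, and distribution pieces from the list above, and probably a uniform clearing price rather than pay-as-bid.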

How I Use Nginx and Node.js

As I mentioned in the piece on my web stack, I currently use Nginx to manage requests on a server and set up different hosts. Here’s how I do it.

Nginx is great because it’s extremely fast and very configurable. Here, I’ll share a basic configuration that works well for me to host a few sites on the same VPS.

Basically the idea is to figure out where the request is coming from, then direct it to the correct application (I run Node apps on different ports).

Defining the server group

First off, we have to tell nginx where our Node application is running.

upstream subdomain.example.com {
    server 127.0.0.1:3000;
}

Defining the server itself

Now we define the server, telling it where to listen, what to listen for, and where to direct requests. In the case of Node, we want it to pass requests off via proxy to the upstream we defined earlier.

server {
    listen 0.0.0.0:80;
    server_name subdomain.example.com;
    access_log /var/log/nginx/subdomain.example.com.log;

    # tell nginx to forward all requests to the proxy
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;

        proxy_pass http://subdomain.example.com/;
        proxy_redirect off;
    }
}

And that's all you need to direct traffic to a Node server! Though if you'd like Nginx to handle static files with its insane speed, stick around.

Handling static files

If you want to get fancy, you can have Nginx handle static traffic instead of Node. Just place something like this below the previous location block.

location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$ {
    root /var/www/subdomain.example.com/public;
}

Putting it all together

Now that everything is configured correctly, we need to figure out how to put it in the right place to make it effective. I’ll assume you already have nginx installed on your machine. First, we’ll need to create a configuration file in the /etc/nginx/sites-available/ directory. For example, my file is located here: /etc/nginx/sites-available/example.com

This whole file looks like this:

upstream subdomain.example.com {
    server 127.0.0.1:3000;
}

server {
    listen 0.0.0.0:80;
    server_name subdomain.example.com;
    access_log /var/log/nginx/subdomain.example.com.log;

    # tell nginx to forward all requests to the proxy
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;

        proxy_pass http://subdomain.example.com/;
        proxy_redirect off;
    }

    location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$ {
        root /var/www/subdomain.example.com/public;
    }
}

Now that we have this site in our sites-available folder, we must tell nginx that it's enabled as well. The easiest way to do this is to create a symbolic link in our sites-enabled folder pointing to the file in sites-available, like this: ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/example.com

Finally, we'll need to restart nginx like this: service nginx restart. Everything should work once you point your domain's DNS A record at this server's IP address and wait for the changes to propagate through DNS (usually an hour, but it can take up to 48).