8.oct.2019
I wrote a toy compiler a few months back. I wanted people to see it, so I put the code up on GitHub. But as it turns out, not everyone is willing or able to go through the convoluted process of cloning the repository, compiling the program, installing a Nepali language keyboard and learning an obscure half-baked programming language just because some idiot put it on GitHub.
So, I started to write a web app to make the program easily accessible. The web app lets users write code in their browser, then compiles and executes the program on the server, and allows the user to send input from the browser to the server as it executes.
My first instinct was to use something like AWS Lambda to compile and run each process as a cloud function, but then I looked into the deep abyss of my wallet and found myself lost in the darkness.
Another idea was to forgo the cloud altogether. Compiling code into assembly can be done on any under-powered Virtual Private Server. I can write an implementation of a simple virtual machine in JavaScript, then add a new backend to my compiler to generate code for that virtual machine. Then I can embed the JS virtual machine in the webpage, and when the user hits compile, all the server has to do is compile the language into machine code for the virtual machine and send that back to the client. Execution becomes a client-side headache. Something like (a slightly saner version of) Brainfuck could be perfect for this kind of application. It’s relatively simple to make a Brainfuck virtual machine.
Anyway, I decided that AWS Lambda is too wasteful for my needs, and the virtual-machine-on-a-webpage idea would significantly increase code maintenance work in the future. So I tried to find some other way of executing users’ programs on the server.
But allowing user-generated executables to run on your server raises huge security issues. Just to name a few:
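To give a feel for how small such a virtual machine can be, here is a minimal sketch of a Brainfuck interpreter. It is written in Python rather than the JavaScript the idea calls for, purely for brevity. The function name `run_brainfuck`, the 30,000-cell tape, and the `max_steps` budget are my own assumptions for illustration, not part of any actual implementation.

```python
def run_brainfuck(program: str, input_data: str = "", max_steps: int = 1_000_000) -> str:
    """Interpret a Brainfuck program and return everything it printed."""
    # Pre-compute the position of each bracket's partner.
    jump = {}
    stack = []
    for pos, op in enumerate(program):
        if op == "[":
            stack.append(pos)
        elif op == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            start = stack.pop()
            jump[start] = pos
            jump[pos] = start
    if stack:
        raise ValueError("unmatched '['")

    tape = [0] * 30_000   # assumed tape size
    ptr = 0               # data pointer
    pc = 0                # program counter
    steps = 0
    pending_input = iter(input_data)
    output = []

    while pc < len(program):
        steps += 1
        if steps > max_steps:
            # A step budget keeps a runaway loop from hanging the page.
            raise RuntimeError("step budget exceeded")
        op = program[pc]
        if op == ">":
            ptr = (ptr + 1) % len(tape)
        elif op == "<":
            ptr = (ptr - 1) % len(tape)
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            output.append(chr(tape[ptr]))
        elif op == ",":
            # End of input is read as 0 in this sketch.
            tape[ptr] = ord(next(pending_input, "\0")) % 256
        elif op == "[" and tape[ptr] == 0:
            pc = jump[pc]   # skip the loop body
        elif op == "]" and tape[ptr] != 0:
            pc = jump[pc]   # go back to the start of the loop
        pc += 1
    return "".join(output)


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the character code of "A".
    print(run_brainfuck("++++++++[>++++++++<-]>+."))  # prints "A"
```

Since the interpreter never touches anything outside its own tape, running it in the user's browser would sidestep the server-side sandboxing question entirely. The cost is the extra compiler backend.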
This is just the tip of the iceberg. So many other malicious attacks are possible depending on the system and infrastructure arrangement. The system designer has to be very careful in setting up the system where untrusted and potentially unsafe executables are run, without causing significant lags for genuine users.
This was the first time I had to design a system like this, so I had to research quite a bit. In this post, I highlight some results of my research. First, let’s see how the above issues can be dealt with in a general way.
These are, of course, just the general guidelines. In my case, because i. the users actually can’t directly craft the machine code that runs in the server and ii. I have full control over and knowledge of the assembly code that is being generated, things are a little easier security-wise. But this will eventually change as I add more features and as more contributors join. So taking some time to tighten the security is more future-proof.
I’ll start with the features already provided by a relatively recent Linux kernel by default.
init process in its view. Namespaces allow containers like Docker to fake isolated systems without the overhead of a full virtual machine.
Because these features are provided natively by the Linux kernel, we can apply them using the corresponding syscalls and parameters and make a relatively robust sandbox. I toyed with the idea of making my own restricted micro-sandboxing program (and I really wanted to) but decided not to because I was already juggling more things than I like to.
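As a rough idea of what "using the corresponding syscalls" looks like, the sketch below forks a child, calls `unshare(2)` through `ctypes` to move it into fresh user and network namespaces, and then executes a command there. It is nowhere near a complete sandbox. The function name `run_in_new_namespaces` and the exit code 125 are hypothetical choices of mine, and whether an unprivileged process is allowed to create user namespaces depends on how the kernel and distribution are configured.

```python
import ctypes
import os
import sys

# Flag values from <linux/sched.h>.
CLONE_NEWUSER = 0x10000000
CLONE_NEWNET = 0x40000000

# Hypothetical exit code meaning "the kernel refused the unshare call".
UNSHARE_FAILED = 125


def run_in_new_namespaces(argv):
    """Run argv in new user + network namespaces and return its exit code.

    Returns UNSHARE_FAILED if the namespaces could not be created, and a
    negative number if the child was killed by a signal.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    pid = os.fork()
    if pid == 0:
        # Child: detach from the parent's namespaces, then exec the command.
        try:
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWNET) != 0:
                os._exit(UNSHARE_FAILED)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec (or the libc lookup) raised.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # List the network interfaces the child can see.
    code = run_in_new_namespaces(
        [sys.executable, "-c", "import socket; print(socket.if_nameindex())"]
    )
    print("exit code:", code)
```

When the call is permitted, the child should see only a loopback interface, so it has no route to the internet. A real sandbox would also need PID and mount namespaces, resource limits, and syscall filtering on top of this.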
There are programs which use these kernel features and more to sandbox applications for us. I had expected there to be many, especially in this age of cloud computing and lambda functions.
nsjail: I couldn’t get it to compile because of some strange protobuf dependency error. It doesn’t help that the GitHub readme doesn’t have any build steps or the versions of dependencies required. Which is a shame because this was almost exactly what I was looking for: a lightweight application sandbox. I might look into it some more later. This is the official site for nsjail.
mbox: It doesn’t seem to work in 2019. The example usages on GitHub fail to do any kind of blocking. The -n option still doesn’t block internet access. I’m guessing that it relies on some old kernel-specific features. It was last updated 3 or 4 years ago on GitHub. Also, reading the author’s paper and ycombinator comments, I got the feeling that he’s much more proud of his filesystem layering work than his sandboxing work. Also, Mbox seems more like some academic/proof-of-concept work. Doesn’t fit my requirements. This is the official site for Mbox.
Docker: Heavy and not built for my use case. It is also not as security-focused, and apparently it can be broken out of. At any rate, I’m not gonna be instantiating full containers for executing sub-megabyte programs. Although Docker running small distros like Puppy Linux is an interesting idea. This is the official site for Docker.
systemd-nspawn is really interesting. It needs a full chown-able filesystem. I didn’t use it this time, but I’m definitely going to use this in some future project.
minijail: This little software is apparently used by Google to sandbox Chromium programs. It is not as feature-rich as nsjail, so I ended up using it in conjunction with cgroups and some other programs to isolate the unsafe binary.
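Setting up cgroups needs root and some filesystem plumbing, so it doesn’t make for a short example. A lighter per-process cousin is `setrlimit`, which any process can apply to its own children. The sketch below is not my actual minijail + cgroups setup. It swaps in rlimits to show the same idea of capping what an untrusted binary can consume. The function name `run_limited` and the default limits are assumptions made up for illustration.

```python
import resource
import subprocess
import sys


def _cap(kind, value):
    """Set soft and hard limits to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(argv, stdin_text="", cpu_seconds=2,
                memory_bytes=512 * 1024 * 1024, wall_seconds=10):
    """Run argv with CPU, address-space and wall-clock limits applied."""

    def apply_limits():
        # Runs in the child between fork and exec.
        # (preexec_fn is not safe to use from a multi-threaded parent.)
        _cap(resource.RLIMIT_CPU, cpu_seconds)    # CPU time in seconds
        _cap(resource.RLIMIT_AS, memory_bytes)    # virtual memory in bytes
        _cap(resource.RLIMIT_CORE, 0)             # no core dumps

    return subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=wall_seconds,   # catches programs that sleep instead of spin
        preexec_fn=apply_limits,
    )


if __name__ == "__main__":
    # A well-behaved program that echoes its input.
    ok = run_limited([sys.executable, "-c", "print(input())"], stdin_text="hello")
    print(ok.returncode, ok.stdout.strip())
```

A program that spins forever gets killed by the kernel once it uses up its CPU budget, which shows up as a negative `returncode`. A program that blocks without using CPU is caught by the wall-clock timeout instead, which raises `subprocess.TimeoutExpired`.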
https://en.wikipedia.org/wiki/OS-level_virtualisation#Implementations
PS: Mosh is a really cool substitute for SSH, especially when you’re using vim to code directly on the server. Plus, the fact that I don’t have to restart the SSH connection every time I wake my laptop is such a convenience.
Resources