Pao Ramen
-
Fear of over-engineering has killed engineering altogether
Jul 26 ⎯ In the last 20 years, we have seen engineering go out of fashion. Developers must ship, ship, and ship, but, sir, please don't bother them; let them cook! Many arguments go like "programming is too complex; it is not a science but an art form, so let's YOLO it." And while there is a certain truth to that, I would like to make a counterargument.

Where does it come from?

If you've been in tech long enough, you have seen the pendulum swing from one end to the other. The more you've seen, the more you realize that the interesting things happen in the middle, in the land of nuance and "it depends."

Before the 2000s, academics ruled computer science. They tried to understand what "engineering" meant for programming, borrowing practices from other fields, like separating architects from practitioners and managing waterfall projects with rigorous planning. It was bad. Very bad. Projects were always late, too complex, and the engineers were not motivated by their work. This gave way to the Agile Manifesto, which disrupted not only how we program, but also how we do business in tech altogether. Both Lean Startup and later the famous Y Combinator insisted on moving fast, iterating, and not worrying about planning. To make things worse, engineers took Donald Knuth's quote "premature optimization is the root of all evil" and conveniently reinterpreted it as "just ship whatever and fix it later… or not."

I do think the pendulum has swung too far, and there is a sweet spot of engineering practices that are actually very useful. This is the realm of Napkin Math and Fermi problems.

The right amount of engineering

To predict where a planet will be on a given date, it's faster to model and calculate than to wait for the planet to move. This is because humans have a shortcut. It's called mathematics, and it can solve most linear problems. But if I want to predict a complex system such as a cellular automaton, we have no shortcut. There is no mathematics – and perhaps there never will be – to help us bypass the whole computation. We can only run the program and wait for it to reach the answer.

Most programs are complex systems; hence the resistance to engineering at all: just run it and we shall see. Yet they all contain simple linear problems that, if solved, are shortcuts to understanding whether the program makes sense to build at all. A very small cost that can save you months of work. Those linear problems are usually one of these three:

- Time: Will it run in 10 ms or 10 days? This is the realm of algorithmic complexity, the speed of light, and latencies.
- Space: How much memory or disk will it need? This is the land of encodings, compression, and data structures.
- Money: And ultimately, can I afford it? Welcome to the swamp of optimizations, cloud abuse, and why the fuck I've ended up living under a bridge.

And of course, the three are related. You can usually trade off time and space interchangeably, and money can usually buy you both.

Fermi problems and Napkin Math

One of the things that bothers most people is "not knowing the numbers." Predicting the future without past data can be very stressful. But once you get used to making things up, you will see that what matters is not the numbers, but the boundaries. One of the most well-known Fermi problems, infamously used in some job interviews, is to guess how many piano tuners there are in New York. You won't get the correct number, but you can be certain it must be between 10 and 10,000. You can get closer if you know how many people live in New York (8M). If 1% of them own a piano (80,000) and one tuner can serve 100 customers a year, then the upper bound is about 800, or 1,000 to round it up. Despite those guessed boundaries being orders of magnitude apart, they may be enough to convince you not to build a 1 euro/month app for that niche.
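The whole estimate is a handful of multiplications. Here it is as a minimal sketch, where every constant is a guess:

```ts
// Fermi estimate: how many piano tuners are there in New York?
// Every number below is an assumption; the point is the bounds, not precision.
const population = 8_000_000;    // people in New York
const pianoOwnership = 0.01;     // guess: ~1% of people own a piano
const customersPerTuner = 100;   // guess: pianos one tuner can serve per year

const pianos = population * pianoOwnership;    // 80,000
const tuners = pianos / customersPerTuner;     // ~800

console.log(`Upper bound: ~${tuners} tuners`); // round up to ~1,000
```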
You start by writing down extremely pessimistic assumptions, things that likely fall in the p99. For instance, if you want to calculate how much storage you need to store a book's content, assume a book has 5,000 pages. Most books have fewer than that, so if the numbers still work out under that assumption, you are more than good to go.

Calculations boil down to simple math: adding and multiplying your assumptions. Nothing fancy. But for some calculations you will need to know benchmarks or details of algorithms and data structures. For instance, if you are trying to estimate how much money you need to train an LLM, knowing how transformers work will help: it lets you calculate the memory needed. I recommend bookmarking some cheat sheets and keeping them around. I usually only do a worst-case calculation, but if you want to do interval calculations, you can use tools like Guesstimate. Keep the calculations around, since once you start having real data you will want to verify the assumptions and update the priors.

Example of the Napkin Math at fika

As an example, I will show the calculations I did while building fika to determine what was possible, what was not, and how the estimates held up. The main assumption is that a p99 user (which I modeled after myself) would have around 5,000 bookmarks in total and would generate 100 new ones a month. According to the HTTP Archive, the p90 website weighs ~9MB, which is very unfortunate because it makes storage (R2) costs too high. But if you look deeper into the data, most of that weight goes into images, CSS, JavaScript, and fonts. I could get clever and get rid of most of that content with Readability, compress the images, and finally gzip it all. This is a requirement I hadn't thought of before starting the project, but it became obvious once the numbers were on the table.
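A sketch of that calculation, using my numbers plus one assumption not in the post: R2's published storage price, which was $0.015 per GB-month when I last checked.

```ts
// Worst-case storage bill per user on R2.
const bookmarksPerUser = 5_000;  // p99 user
const rawPageMB = 9;             // p90 page weight (HTTP Archive)
const processedMB = 0.36;        // estimate after Readability + WebP + gzip
const r2UsdPerGBMonth = 0.015;   // assumption: R2 storage price, may change

const costPerUser = (mbPerBookmark: number) =>
  ((bookmarksPerUser * mbPerBookmark) / 1024) * r2UsdPerGBMonth;

console.log(costPerUser(rawPageMB).toFixed(2));   // ~$0.66/user/month: no good at a $2 seat
console.log(costPerUser(processedMB).toFixed(3)); // ~$0.026/user/month: fine
```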
I also calculated whether I could afford to use Inngest. The per-user price was too high until I discovered that batching most events could reduce the cost to a manageable amount. I also evaluated two more fantastic vendors: Microlink, which fetches a bookmark's metadata, and Browsercat, which provides a hosted Playwright solution. Unfortunately, their pricing models didn't fit my use case. I wanted to price the seat at $2, and these two providers would eat up all the margins.

Later I explored implementing hybrid search. OpenAI's pricing at the time was $0.10 per million tokens, which meant $0.60 per user per month. It was too pricey, but some months later they released a new model for only $0.02. Even though the new price made semantic search affordable, I had already migrated the search to the client. With the release of snowflake-arctic-embed-xs, I wanted to see if I could keep all the bookmark embeddings in memory. This was needed since implementing a disk-based vector database was not in scope. I calculated that it would need ~350MB, which is not great. But this space is moving fast, and small models are becoming more attractive, so I will wait a bit to see how it develops.

Lastly, one of my biggest fears about building a local-first app was ensuring that users would be able to hold all the data on the client side. I focused only on bookmarks, since this is where most of the weight will likely go.

- Origin Private File System (OPFS): To read the bookmarks offline, I want to save a copy on the user's device. Saving them all, in the worst-case scenario, is ~3GB. This means that with Chromium's storage quota of 80%, I could support any device with more than 4GB of storage. Nice.
- In-memory full-text search: I wanted to know whether I could keep all the bookmark bodies in memory and run a BM25-based search with orama. I don't know much about the inverted indexes and other data structures of full-text search databases. But, if we assume there is no overhead (unlikely), having all the bookmarks' text in memory would take around ~350MB. I haven't discarded this approach, but it is definitely something I need to look deeper into. I'm currently exploring whether SQLite + FTS5 would allow keeping those indexes on disk instead of in memory.

Current results

I've received the first 200 signups, proving that some assumptions were too pessimistic.

- ~1000 → ~200 stories a month/user: It's still early, but obviously most users still have few bookmarks and subscribe to very few feeds. This number alone brings the cost per user down to $0.02, which unlocks removing the paywall altogether without going bankrupt.
- 0.36MB → 0.13MB per bookmark: It turns out that bookmarks can be compressed more than I initially thought. Using WebP, limiting the image size, and getting rid of all CSS/JS/fonts makes bookmarks very lightweight.
- 5% → 3% overlap: My intuition says this number will go higher. As more people join the platform, the likelihood that you bookmark a story someone else has already bookmarked should increase. But at the moment, users are more unique than I thought.
- 20% → 108% feeds per bookmark: This means that bookmarking one story discovers, on average, 1.08 feeds. How can this be? Is RSS so popular that websites include more than one feed? Nope. This is one of the biggest surprises, and it was a complete miscalculation on my end. It turns out that the system has feedback loops: a bookmark recommends a feed, the feed contains stories, and those stories discover different feeds. For instance, subscribing to Hacker News is a very fast way to discover many other feeds.

It's too early to judge the usefulness of the numbers per se, but doing the exercise was a very important step in driving the architecture. It took me only one hour to put a spreadsheet together. A very low cost compared to all the hours I spent implementing fika.

Coda

I hope this post encourages you to check the fridge before you start cooking. Don't be afraid to do some basic calculations; doing so will not make others see you as a lesser alpha. It's not over-engineering; it's not premature optimization. It's a very basic form of hedging, with a ridiculously low cost and a potentially bonkers return. Because the best code is always the one that is never written.

Cheers, Pao
-
Building fika: Constraints and Architecture
Jul 23 ⎯ In this series of articles, I'm going to explain how fika is built. I've learned a lot building it and I would love to share these learnings with y'all.

Constraints

When I started this project, I wanted to build something different. I didn't start by identifying a pain point or a target audience, and I didn't have any economic incentive at all. I just wanted to explore, craft, and learn. For this reason I set myself the following constraints:

- Local-first: I've been frustrated with building applications around manual state transfer. They are hard to build, slow, and error-prone. When Ink & Switch published the local-first article, many people who were exploring the same problems got together and started to share ideas. I wanted to be part of that movement.
- Web based: Never bet against the web. Despite all its shortcomings as a platform, users demand that apps live in it; native applications are not enough.
- Affordable: Software is increasingly expensive, and more so now that ZIRP is over and companies are rushing to become profitable. I want to build products for the long tail, which means that the price per user should be negligible.

Above all, I wanted to maximize learning: to get more experience with Python, machine learning, design, CSS, and new frontend frameworks. So I didn't try very hard to optimize for "the right tool for the job" and instead aimed for "the most fun as long as it gets the job done".

The problem

With these constraints settled, I decided to find a problem easy enough to experiment with. I'm a person who curates and shares a lot of content in private communities, and after reading The Expanding Dark Forest and Generative AI, an idea popped into my head: I would build a product for people like me who enjoy bringing information from the dark forest to the cozy web. The infographic depicts email and RSS as the protocols of exchange in that liminal space. This gave me the idea of putting together 3 products in one: a bookmark manager, an RSS reader, and a blogging platform. A way to save, subscribe to, and publish content.

Obviously, I was overly optimistic. The project turned out to be a classic case of "We don't do this because it's easy, but because we thought it would be easy". It turns out that building a bookmark manager or an RSS reader is not easy. And doing both is pretty hard. I spent most of the time fighting pipelines and processing HTML, which was not my initial goal whatsoever. But hey, this was by far some of the most fun I've had while building software, so I won't complain.

The architecture

[Figure: fika's architecture]

Client

One of the decisions I was very clear about from the beginning was that fika should be built on top of a syncing engine. This would give me offline and realtime capabilities, but most importantly, a declarative data layer. No more fetch libraries, cascading, suspenses, caching, or loading states. I will write a deep dive on sync engines in another post, but after trying most players in the space, I decided to roll up my sleeves and implement a Python backend for Replicache. I knew I wanted an authoritative server backed by a DB, so Replicache seemed to be the best fit for the use case. While I'm extremely happy with the outcome, I have to tell you that it wasn't an easy task: the documentation is very sparse and the code is not open source, so you end up reverse-engineering the examples. But when you make it work, it works fantastically.

On top of Replicache I built a very thin layer with Solid so I could have all the state in memory as signals. This incurs a memory cost, but I'm convinced that the trade-off is worth it. The UX of zero-latency apps is unparalleled, not to mention the DX gains you get from this pattern. The team behind Replicache is working on a new approach called Zero, which will probably replace this layer at some point.
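To make the pattern concrete, here is a minimal sketch of that thin layer. The mutator, key layout, and endpoint paths are made up for illustration; the Replicache and Solid calls themselves are the libraries' documented APIs (recent Replicache versions).

```ts
import { Replicache, type ReadTransaction, type WriteTransaction } from 'replicache';
import { createSignal } from 'solid-js';

type Bookmark = { id: string; url: string; title: string };

const rep = new Replicache({
  name: 'fika-user-cache',         // hypothetical cache name
  pushURL: '/api/replicache/push', // hypothetical endpoints, served by
  pullURL: '/api/replicache/pull', // the Python backend
  mutators: {
    // Runs optimistically on the client, then is replayed on the server.
    async createBookmark(tx: WriteTransaction, b: Bookmark) {
      await tx.set(`bookmark/${b.id}`, b);
    },
  },
});

// The thin layer: mirror a Replicache query into a Solid signal.
// subscribe() re-runs the query whenever the underlying data changes,
// and the signal gives the UI fine-grained reactivity for free.
const [bookmarks, setBookmarks] = createSignal<Bookmark[]>([]);
rep.subscribe(
  async (tx: ReadTransaction) =>
    (await tx.scan({ prefix: 'bookmark/' }).values().toArray()) as Bookmark[],
  { onData: setBookmarks },
);
```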
Solid has been one of my biggest surprises. It works extremely well and the APIs are very well thought out. Local-first apps are the most stateful apps one can build, so having fine-grained reactivity saves you a lot of headaches with under- and over-rendering in React. Once you grasp the model – which is fairly easy if you've used MobX – all those issues vanish. Things render exactly when you expect them to render. Fantastic.

API service and jobs

The backend is implemented in Python with FastAPI. Coming from Ruby and wrestling with Sorbet for several years, Python's type hints felt like a blessing. But after working with them long enough, it is clear that the type system is actually not that great: you can't really model a domain with algebraic data types since there are no tagged unions, and generics are rather clumsy. That being said, I was positively impressed with Pydantic and the ecosystem around it.

For the bookmark-processing pipeline and RSS fetching, I experimented with durable execution vendors, settling on Inngest. The pattern is very interesting: it leads to a more declarative way of writing your jobs and a more robust handling of failures. Nevertheless, these tools are a bit pricey for the volume of this product, so eventually I might try harder to make Celery work with asyncio, or just host my own Temporal instance. Python's sync/async divide is dramatic; I would switch to TypeScript if I were starting over.

Fetcher service

I wanted to be able to read the bookmarks directly from fika, but because of the same-origin policy and X-Frame-Options headers, you can't just fetch or embed any website you want in the browser. Unlike native applications, which can fetch any origin and embed whatever they want in a web view, this is one of the biggest limitations of the web platform. Since building for the web was one of the constraints, I had to build a service to fetch and extract metadata from websites. Not only that: if I wanted people to download the bookmarks to read offline, I had to process them heavily, removing everything but the content and compressing the images. To store these HTML files for offline access, I ended up using the Origin Private File System inside a web worker (see the first sketch at the end of this section).

Search

For full-text search, I started building hybrid search in the backend. I went down the rabbit hole of "you just need Postgres", which turned out to be a major disappointment. Also, costs for embeddings were too high for the unit economics I was aiming for. On top of this, a backend search implementation would mean giving up offline search. And since local-first was one of the constraints, I ended up implementing the search in the client. I built another web worker to index all the content in orama (see the second sketch below). Thanks to having Replicache as a sync engine, this was very easy to implement: data comes in from the server, and the sync engine sends it to the web worker to be indexed straight away. I gave up on semantic search for now, but eventually I will implement it with transformers-js.
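First, the OPFS side. A minimal sketch of persisting a processed bookmark from inside a dedicated worker; the file layout is hypothetical, but the OPFS and quota calls are the real browser APIs:

```ts
// Runs inside a dedicated web worker: sync access handles are not
// available on the main thread.
async function persistBookmark(id: string, compressedHtml: Uint8Array) {
  const root = await navigator.storage.getDirectory(); // OPFS root
  const dir = await root.getDirectoryHandle('bookmarks', { create: true });
  const file = await dir.getFileHandle(`${id}.html.gz`, { create: true });

  const handle = await file.createSyncAccessHandle();
  try {
    handle.write(compressedHtml, { at: 0 });
    handle.truncate(compressedHtml.byteLength); // drop stale bytes from older copies
    handle.flush();
  } finally {
    handle.close();
  }
}

// The quota check behind the "~3GB worst case" napkin math.
async function hasRoomForWorstCase(): Promise<boolean> {
  const { quota = 0, usage = 0 } = await navigator.storage.estimate();
  return quota - usage > 3 * 1024 ** 3; // ~3GB
}
```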
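And the search side: a minimal orama index for the worker. The schema is an assumption; create/insert/search follow orama's documented API (the awaits are harmless if your orama version makes these calls synchronous):

```ts
import { create, insert, search } from '@orama/orama';

// Hypothetical schema: index the title and the text extracted by the fetcher.
const index = create({
  schema: { title: 'string', body: 'string' } as const,
});

// Called whenever the sync engine hands the worker a new bookmark.
async function indexBookmark(b: { title: string; body: string }) {
  await insert(index, b);
}

// BM25-ranked full-text query.
async function query(term: string) {
  return search(index, { term, limit: 10 });
}
```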
Database and infrastructure

To implement the syncing engine, I needed a way to listen to database changes in realtime. For that reason I went with Supabase. This, combined with RLS policies, makes the syncing engine completely transparent to the developer: any write in the DB pings the affected users to pull fresh data. Magic.

For the hosting, I went with a mix of Cloudflare and Fly.io:

- Cloudflare: I have the domains, everything JavaScript-related and, most importantly, R2.
- Fly.io: I run the API and fetcher services. Once I can remove the dependency on jsdom completely, I will probably move the fetcher to Cloudflare, to avoid redirects to R2 and save on egress costs.

Lastly, I do all the image optimizations through bunny. At $9.50, this is by far the cheapest image-transformation CDN you can get.

Public page

Finally, I implemented the public page using Astro. I wanted to see what the hype was about, but I will probably unify both the public page and the client app under a single Solid Start codebase.

Coda

I hope you enjoyed this article! I will go much deeper in further episodes, so stay tuned.

Thanks for reading, Pao.