the purpose of closure objects is to hold references to closed over variables. but how does a closure find those variables if they may or may not be on the stack? we can’t rely on the exact mechanism used for local resolution, because that mechanism assumes a local is always on the stack during its function’s execution, and a closed over variable may still be needed after that function has returned!
Since local variables are lexically scoped in Lox, we have enough knowledge at compile time to resolve which surrounding local variables a function accesses and where those locals are declared. That, in turn, means we know how many upvalues a closure needs, which variables they capture, and which stack slots contain those variables in the declaring function’s stack window.
the new abstraction introduced here is something called an upvalue. an upvalue is what the compiler sees as a closed over variable. what bob is saying above is that we can figure out exactly what our upvalues are at compile time and use that to make sure those variables are still accessible at runtime
exactly how that is done is a bit more complicated and its not immediately clear when reading the section on upvalues how the implementation supports the eventual runtime variable capturing behavior. in a way he’s basically saying, here’s how we want to compile upvalues – trust me we’ll need this information at runtime when we create closures.
one of the first questions i had reading this section was, how does the vm at runtime, given these upvalue indices, differentiate between locals and upvalues? we know that locals get pushed onto the vm stack when they’re referenced by other expressions and the OP_GET_LOCAL / OP_SET_LOCAL instructions index into the relative position of the stack inside call frames. but what about upvalues? not all upvalues are necessarily on the stack.
this wasn’t answered until later when he added an array of pointers to upvalues (ObjUpvalue** upvalues;) to closure objects. so these indices we’re building at compile time are going to index into that array in our closures. since these are pointers, they could be pointing at either captured variables that are still on the stack or maybe ones that bob eventually moves onto the heap.
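here’s a minimal sketch of that pointer idea, assuming heavily simplified types (a plain double instead of the vm’s Value, no object headers, no gc). the field names location and closed follow my memory of the book; captureUpvalue and getUpvalue are hypothetical helpers i made up for illustration, not the book’s functions.

```c
#include <stddef.h>

typedef double Value; /* assumption: the real vm uses a tagged Value type */

typedef struct {
  Value* location; /* points at a live stack slot, or at `closed` below */
  Value closed;    /* storage for the value once it has left the stack */
} ObjUpvalue;

typedef struct {
  ObjUpvalue** upvalues; /* indexed by the upvalue indices built at compile time */
  int upvalueCount;
} ObjClosure;

/* hypothetical helper: capture a variable that currently lives in a stack slot */
void captureUpvalue(ObjUpvalue* upvalue, Value* slot) {
  upvalue->location = slot;
  upvalue->closed = 0;
}

/* "close" the upvalue: copy the value off the stack and repoint location at it */
void closeUpvalue(ObjUpvalue* upvalue) {
  upvalue->closed = *upvalue->location;
  upvalue->location = &upvalue->closed;
}

/* hypothetical helper: what reading upvalue number `index` boils down to */
Value getUpvalue(const ObjClosure* closure, int index) {
  return *closure->upvalues[index]->location;
}
```

the nice part is that the reader of an upvalue never has to care where the variable lives: it always goes through location, and only closeUpvalue changes what that pointer refers to.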
at compile time, at the end of a function’s block compilation we now emit a new instruction, OP_CLOSURE, that the VM will use at runtime to wrap our function object in a new closure object. the idea is that we’re going to use this closure object to store references to closed over variables (upvalues).
as a refresher, each time we create a compiler instance per function declaration, we also create a new function object via newFunction.
now at the end of the function compilation we make sure to emit an OP_CLOSURE so that at runtime, we use that opcode to wrap the raw ObjFunction in a closure and push it onto the stack.
below is the disassembly of fun foo() { fun bar(){} }
there’s a couple of interesting things about this design choice
every function, regardless of whether it closes over variables, will be treated like a closure at runtime. this adds overhead, both through the creation of a closure for each function and through the extra indirection
closed over values are stored on the closure instead of the function, which nicely reflects the reality that we may have multiple different closures of the same function!
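to make the second point concrete, here’s a tiny sketch, assuming a simplified closure that stores captured values inline in a fixed-size array (the real closure holds pointers to upvalue objects instead). MAX_UPVALUES and the by-value newClosure are assumptions for illustration only.

```c
#define MAX_UPVALUES 4 /* assumption: arbitrary small cap for the sketch */

typedef struct {
  const char* name; /* the compiled function: shared, immutable code */
  int upvalueCount;
} ObjFunction;

typedef struct {
  ObjFunction* function;         /* the raw function being wrapped */
  double upvalues[MAX_UPVALUES]; /* captured state lives on the closure */
} ObjClosure;

/* wrap a raw function in a fresh closure, roughly what OP_CLOSURE does at runtime */
ObjClosure newClosure(ObjFunction* function) {
  ObjClosure closure = {function, {0}};
  return closure;
}
```

two closures created from the same ObjFunction share the code but not the captured state, which is why the captured values can’t live on the function itself.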
so far we’ve only been writing statements at the top level of the program. there’s no notion of a callable chunk of code. with the introduction of functions in chapter 24, all the current top level states like the compiler, locals, and chunks / instructions are moved into function objects
previously with locals we were effectively operating in a single function world. this effectively meant that all locals were allocated at the beginning of the global call stack. with functions that each have their own local environments, the author introduces an early idea that was implemented by fortran where different functions had their own fixed set of locals
this works if there’s no recursion and i’ll demo an example that shows why fixed, separate slots break down once you start to recurse:
fun factorial(n) {
  if (n <= 1) return 1;
  return n * factorial(n - 1);
}
factorial(3);
assume we give factorial its own fixed set of stack slots
Slot 0: parameter n
Slot 1: temporary result for multiplication (the value in slot 0 * factorial(slot 0 - 1))
now call factorial(2). this produces slot 0 = 2 and slot 1 = 2 * factorial(1)
then call factorial(1). this produces slot 0 = 1
OH CRAP, but that just overwrote slot 0 = 2 which we need to compute 2 * factorial(1) from the previous call. when factorial(1) returns, the previous call reads slot 0 again, finds 1 instead of 2, and ends up computing 1 * factorial(1), which screws up the entire expression
bob notes that fortran was able to get away with fixed stack slots simply because they didn’t support recursion!
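the same failure can be reproduced in C by faking the fixed slot with a file-level variable. this is only a sketch of the idea (slot_n, factorialFixed and factorialStack are names i made up), not how fortran or clox actually lay out memory.

```c
/* one fixed slot shared by every call to factorialFixed */
static int slot_n;

int factorialFixed(int n) {
  slot_n = n;                            /* every call writes the same slot */
  if (slot_n <= 1) return 1;
  int rest = factorialFixed(slot_n - 1); /* the inner call clobbers slot_n */
  return slot_n * rest;                  /* reads the clobbered value */
}

/* each call gets its own n in a fresh stack frame */
int factorialStack(int n) {
  if (n <= 1) return 1;
  return n * factorialStack(n - 1);
}
```

with these definitions factorialFixed(3) comes out as 1, because by the time each outer call multiplies, the slot holds 1. factorialStack(3) gives the expected 6.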
in chap 22 of crafting interpreters, bob nystrom walks through the implementation of local variables. it makes efficient use of memory by tracking local variable position and scope metadata during the compilation phase and leveraging that to locate the correct value in the immediate proximity within the execution stack (where we expect all local variables to end up, unlike globals which are late bound and may be defined far away from where they’re actually used).
what i found most complicated about this chapter is the number of states you need to track and hold to understand how the compile and runtime stages work together. it helped me to write down a few essential states in trying to understand it, so i figured i’d translate those notes to some sort of visualization because i think it might help others too
here’s a visualization of the compile phase where we’re converting the tokens into a byte code instruction sequence (chunks). the arrow indicates the parse position where the compiler is pointing in the source code and the variables on the right represent the state at that point.
side note: i didn’t bother doing character by character – i moved the arrow to positions where there are actually side effects since not all tokens produce the side effects i actually care about for this demo.
and here is the runtime execution of the resulting byte code sequence. as you can see, the first thing that happens is that the literal number 13 is pushed onto the stack. in this demo every variable is initialized with a literal, so the value is a constant that’s already known at compile time.
however, notice that there is no information about what the name of that constant is. is 13 the value of “foo”? or something else? what’s cool about this implementation is that it doesn’t matter at this time because during the compilation phase, we’ve already figured out where that local is going to be on the stack for the variable foo. based on the information about locals and offsets in the previous phase, it’s going to be at position or offset 0 based on the metadata from the locals array that was getting constructed at compile time.
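here’s a rough sketch of that compile time bookkeeping, assuming plain C strings for names (the book stores tokens) and a made-up MAX_LOCALS cap. the point is just that a local’s index in the locals array is its stack slot, so the name never needs to exist at runtime.

```c
#include <string.h>

#define MAX_LOCALS 256 /* assumption: arbitrary cap for the sketch */

typedef struct {
  const char* name; /* simplified: a C string instead of a token */
  int depth;        /* scope depth at which the local was declared */
} Local;

typedef struct {
  Local locals[MAX_LOCALS];
  int localCount;
  int scopeDepth;
} Compiler;

/* declaring a local records its name; its array index doubles as its stack slot */
int addLocal(Compiler* compiler, const char* name) {
  if (compiler->localCount == MAX_LOCALS) return -1;
  Local* local = &compiler->locals[compiler->localCount];
  local->name = name;
  local->depth = compiler->scopeDepth;
  return compiler->localCount++;
}

/* walk backwards so the innermost (most recently declared) match wins */
int resolveLocal(const Compiler* compiler, const char* name) {
  for (int i = compiler->localCount - 1; i >= 0; i--) {
    if (strcmp(compiler->locals[i].name, name) == 0) return i;
  }
  return -1; /* not a local */
}
```

the slot returned by resolveLocal is what ends up as the operand of the local get / set instructions, which is why the runtime only ever sees offsets.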
running economy is a complicated topic and hard to measure, but a common measure of economy is done through vo2 (volume of oxygen) measures as a proxy. according to wikipedia, “Those who are able to consume less oxygen while running at a given velocity are said to have a better running economy”.
i put together a few visuals to illustrate this concept better.
here’s a graph showing
oxygen consumption or vo2 on the y axis
velocity in meters per second on the x axis
as velocity increases, so does oxygen consumption. they increase together up to a point (vo2 max)
oxygen consumption plateaus / steady states at the vo2max at and beyond a specific velocity
now, if the athlete is able to train their aerobic system to run at the same velocity with lower oxygen consumption, you get this graph
the dotted black vo2 consumption at given pace is the original line. the new solid line is the result of training
same pace, but lower o2 consumption. this athlete has improved their running economy!
similarly, if you graph the relationship between vo2 and velocity for different athletes, the one with the lower vo2 consumption at any given pace is more economical
i also find this relationship interesting because it also tells you why increasing vo2 max is valuable. vo2 max sort of represents near maximum / max effort and running at vo2 max typically can’t really be sustained for longer than 11 minutes. right now, the athlete can only run at their max for 11 minutes. if you shift the max up, here’s what happens
the previous velocity is now a fraction of max, so less effort is required to sustain the same pace. they can now race at that same pace for longer! better endurance
the new max is associated with higher velocity. their previous 11 minute high effort pace is even faster
precise vo2 max testing is typically done in a lab, hooked up to a mask that measures oxygen consumption while running on a treadmill at increasing intensity. one of my favorite running youtubers / olympic athletes is luis orta (venezuelan runner). he does a vo2 max test here and gets an 80 mL/kg/min.
that is a ridiculous number because the average vo2 max for untrained individuals is around 30 – 40!
vo2 max at the end of the day is just a metric / one indicator. i used to see vo2 max videos everywhere on youtube when i first started running and it made me feel like i somehow needed to track it as part of my training. completely untrue.
jack daniels classifies running training into four categories (see his lectures here). i’ll summarize here because i found it to be a helpful framework for building my own training program for the new year. each type adheres to the same general principle of minimum effort for the maximum gain. he says if you want to improve physiological function, you want to stress it. but you want to stress it at the lowest intensity of stress
easy runs
build aerobic base and ability to do higher volume runs
train at max stroke volume to gradually create cellular adaptations
mitochondrial density
fat oxidation
60% of max heart rate
threshold training
build endurance through pushing the lactate threshold. blood lactate accumulation happens at different paces / effort levels. so goal is to push accumulation farther out relative to effort
accumulation is a function of how much is produced vs how much is cleared
the threshold is the running speed beyond which blood lactate rises continuously instead of plateauing
at or below threshold = steady state lactate accumulation (not rising)
training at threshold means training at the pace where going any faster results in lactate rising continuously
82 – 88% of mhr
threshold is basically pace you can hold for roughly 1 hour
interval
purpose is to maximize aerobic power. how much blood is delivered and how much of that o2 is converted to energy
aerobic power is approximated via vo2 max
o2 consumption measured by millilitres of oxygen per kilogram of the body mass per minute (e.g., mL/(kg·min)).
vo2 max is max rate of oxygen consumption
97 – 100% of MHR
repetition
kind of like intervals (honestly not sure why he called this out separately), except the focus is on even higher intensity followed by long rest periods. purpose is to improve running economy
as you go from easy running to repetitions, the main variables within a training session that change are intensity and volume. easy runs are high volume, low intensity. on the other end, repetitions and intervals are high intensity but low volume. this is a helpful lens through which to view running programs because the proportion of a type of training in a running program tells you the type of race or performance it’s effective for
while i really like doing threshold training, my current volume of training is low so right now i feel like i’m sacrificing base building when i really ought to aim at building more volume and developing a larger base. right now i do higher intensity training twice a week, but i may dial that back to just once a week and dedicate my other days to easy runs. it’s hard for me to do two intense sessions a week without feeling the impact on my joints / ligaments, particularly my right knee – which tells me i should probably scale back the intensity and just focus on volume
there’s a saying that all problems in computer science / programming can be solved by another level of indirection. in this chapter the pratt parser is a great example of that when it comes to parsing expressions such as
simple numeric literals e.g. 1 or 2
single operand / prefix expressions like -1
binary expressions like 1 * 2 involving numeric, equality, comparison, or logical operators
any complex combination of the above with groupings
back in jlox, expression parsing was based on recursive descent. in this chapter, the parse sequence is driven by a special function called parsePrecedence. two new abstractions (the parse rule table and the rule lookup function) come together in the parsePrecedence function which is going to be the new entry point to expression parsing
here’s a truncated example of some parse rules in our parse table. it’s a mapping of token types to a group of metadata (prefix parser, infix parser, and precedence level)
for the minus token, unary is the prefix parsing function, binary is the infix parsing function, and the precedence level is PREC_TERM. this is the getRule function that, given a token type, can retrieve that metadata
the relevant parse function for a given token consumed via advance is fetched dynamically from the parse rule table. so given a token type of NUMBER for parser.previous.type, the first thing parsePrecedence attempts to do is locate the prefix function for that token
other prefix functions may themselves call back to parsePrecedence such as grouping if a left parenthesis is encountered
for chained expressions involving infix operators e.g. 1 + 2 + 3, the current precedence level is used to continue consuming the following expressions in a left-associative manner. so parsing 1 + 2 + 3 becomes ((1 + 2) + 3)
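to see the rule table, getRule and parsePrecedence working together, here’s a toy version that evaluates arithmetic directly instead of emitting bytecode. it’s a sketch under a bunch of assumptions: tokens are single characters read straight from the string, there are only three precedence levels, and evaluate / parseFailed are names i made up. only the overall shape mirrors the chapter.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { PREC_NONE, PREC_TERM, PREC_FACTOR, PREC_UNARY } Precedence;
typedef double (*PrefixFn)(void);
typedef double (*InfixFn)(double left, char op);
typedef struct { PrefixFn prefix; InfixFn infix; Precedence precedence; } ParseRule;

static const char* src; /* cursor into the expression text */
static int hadError;

static double parsePrecedence(Precedence precedence);
static const ParseRule* getRule(char c);

static void skipSpaces(void) { while (*src == ' ') src++; }

static double number(void) {
  double value = 0;
  while (isdigit((unsigned char)*src)) value = value * 10 + (*src++ - '0');
  return value;
}

static double grouping(void) {
  src++; /* consume '(' */
  double value = parsePrecedence(PREC_TERM);
  skipSpaces();
  if (*src == ')') src++; else hadError = 1;
  return value;
}

static double unary(void) {
  src++; /* consume '-' */
  return -parsePrecedence(PREC_UNARY);
}

static double binary(double left, char op) {
  /* parse the right operand one level tighter: this gives left associativity */
  double right = parsePrecedence((Precedence)(getRule(op)->precedence + 1));
  switch (op) {
    case '+': return left + right;
    case '-': return left - right;
    case '*': return left * right;
    default:  return left / right; /* '/' is the only other infix rule */
  }
}

/* the rule table: character -> (prefix fn, infix fn, precedence) */
static const ParseRule* getRule(char c) {
  static const ParseRule none   = {NULL, NULL, PREC_NONE};
  static const ParseRule num    = {number, NULL, PREC_NONE};
  static const ParseRule group  = {grouping, NULL, PREC_NONE};
  static const ParseRule minus  = {unary, binary, PREC_TERM};
  static const ParseRule plus   = {NULL, binary, PREC_TERM};
  static const ParseRule factor = {NULL, binary, PREC_FACTOR};
  if (isdigit((unsigned char)c)) return &num;
  switch (c) {
    case '(': return &group;
    case '-': return &minus;
    case '+': return &plus;
    case '*': case '/': return &factor;
    default: return &none;
  }
}

static double parsePrecedence(Precedence precedence) {
  skipSpaces();
  PrefixFn prefix = getRule(*src)->prefix;
  if (prefix == NULL) { hadError = 1; return 0; }
  double left = prefix();
  for (;;) { /* keep consuming infix operators at or above this level */
    skipSpaces();
    const ParseRule* rule = getRule(*src);
    if (rule->infix == NULL || rule->precedence < precedence) break;
    char op = *src++;
    left = rule->infix(left, op);
  }
  return left;
}

double evaluate(const char* text) {
  src = text;
  hadError = 0;
  return parsePrecedence(PREC_TERM);
}

int parseFailed(void) { return hadError; }
```

adding a new operator here means adding one row to the table and nothing else, which is the extensibility win: parsePrecedence never has to change.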
adding new tokens involves setting a new rule for those tokens with their metadata (prefix operator, infix operator if it applies, and precedence level). the parsePrecedence function automatically obeys the precedence levels during parsing. in jlox, parsing precedence has to be carefully managed by ensuring that it’s reflected in the call sequence (top down execution where lower precedence parse functions call higher precedence ones)
unlike recursive descent top down parsers where the syntax reflects both the grammar and precedence order (lower precedence parse targets always invoke higher precedence ones), it’s harder to visualize the call sequence in a pratt parser because the exact call sequence is only apparent during runtime through calls to parsePrecedence (which decides how far to parse on the current precedence). nevertheless this seems like a more extensible / configurable way to manage expression rules
i went for an easy run this morning and was thinking about the purpose of training and zone 2. a cornerstone of pretty much any aerobic training program is the easy (zone 2, 60-70% of max heart rate or 5-6 RPE) run. there’s usually the long easy run combined with shorter easy runs throughout the week. when i first started training for longer races (15k), i thought the sole purpose of these longer runs was to progressively overload until i’m comfortable running the race distance. so if i’m training for a 15k, i’m increasing my ability to sustain a comfortable aerobic effort little by little until i’m able to do it for my desired distance.
if i’m training for a 5k, there must not really be a purpose of doing these longer runs. right? there’s a principle in training called specificity – basically it means you tailor your training to the specific energy system and skills that you are trying to improve. so if you’re trying to become a better long distance runner, run long distances. if you’re trying to become a better sprinter, sprint! this seems pretty intuitive, except what’s not obvious is that if you want to become a better runner at any distance, you also want to incorporate long runs!
base endurance
i’m not really an expert on physiology and there’s a ton of resources covering the benefits of long runs, but my layman understanding of this so far is that doing easy runs at roughly 60% of MHR is what allows you to
build your heart muscle (increasing stroke volume or how much blood can be pumped per beat) with minimal effort
these improvements are primarily a function of duration. so, generally speaking, the longer you are working your heart at that intensity the more of the benefits you get (up to a point, we can’t run forever without risking injury).
allow your body (muscles, bones, ligaments, joints, etc) to gradually adapt to higher volume
by doing easy runs at higher volume without injury, you unlock higher volume of more intense workouts into your schedule. someone who is comfortably running 30 miles a week can introduce a couple of 5k intense threshold runs into the week to build even more speed and endurance. if you’re doing 5 miles a week, there’s just no room for that. nothing wrong with running 5 miles a week, but my point here is to illustrate the relationship between steady state volume and training opportunity
the minimal effort point here is pretty key. you can train at far higher intensities to build your heart muscle, but it turns out your heart’s current maximum stroke volume is reached at 60% of MHR. so if you do a full out run, your stroke volume is still the same – you’re just expending more energy for the same heart muscle building benefits. doing high intensity runs all the time also means you likely sacrifice volume, aka less time overall in this zone. people are also all different – in some situations there may be runners that can do very high volume and intensity and that works for them. i know that’s not me 😀
there are also numerous other related responses that support this gradual volume buildup of the heart muscle, a couple that i notice come up often are:
increase mitochondrial density (mitochondria generate energy in a cell using oxygen and glucose) so higher numbers of mitochondria means being able to use more of the available oxygen and glucose during aerobic activity
increase in ability to use fat stores as fuel instead of glycolysis, using glucose and oxygen (able to run longer)
so over time, spending a lot of time in easy runs builds the heart muscle and its ability to pump out blood and increases your capacity to make use of that higher volume of blood per beat thanks to cellular level changes like mitochondrial density (more efficient). how this translates to races is that you’re able to do them at any distance without getting as tired because your aerobic system is more efficient. and because of the gradual buildup in your overall muscular strength you can run at higher volumes at a comfortable pace per week. this higher mileage then unlocks higher quality / higher volume intensity training.
jack daniels, a well known running coach, often says that you should know the purpose of your training. why are you running today? what is the purpose of this long run? well there’s the purpose of long runs. you do long easy runs because it builds the very foundation of your aerobic performance.
back in January this year i ordered a refurbished dyson v11 off newegg (the full model name is V11 Animal+ Cordless Vacuum) for about $300 (new ones were close to $600) and it was working great up until the end of November (last month). the problem was that the trigger had stopped working – it wasn’t springing back into its normal position after being pressed and wouldn’t turn on the vacuum anymore.
turns out this broken trigger on the v11 is a well known issue and it’s caused by a weak plastic arm / lever on the trigger assembly. it’s frustrating because why the hell would you make such a high use component that gets subjected to repeated force out of thin plastic instead of metal? or at least make the plastic arm thicker so it doesn’t just crack in less than a year of use.
thankfully because this is such a common issue there were repair tutorials online and spare parts available through ebay. i was able to finally finish the repair yesterday and in this post i’ll share what resources i used and some tips (both for others and for myself in the future if i need to do this again…)
here’s the youtube video that documents the disassembly process and required tools. just a heads up, the trigger mechanism is embedded pretty deep and requires basically an entire disassembly of the vacuum. the video is less than five minutes long but i think it took me closer to 45min to get it all apart.
tips
you WILL need all the tools mentioned in the video. definitely the long torx screwdriver and pliers. you won’t be able to remove the trigger assembly without a pair of pliers (i tried). it will also be helpful to have some kind of gripper (things that look like tweezers but for electronics, most electronic repair tool kits will come with this) to grip on to wires later during re-assembly
buy a new complete trigger assembly with metal switch (or at the very least a metal trigger piece to replace the plastic trigger with). yes it’s pretty funny that there’s apparently an entire market providing more durable switches for the v11 than dyson themselves. in my first go at this, i did what the video suggested and tried gluing the broken trigger with superglue. i do not recommend doing this because the trigger ended up breaking immediately again and i had to repeat the entire process. maybe i didn’t let it cure long enough. maybe my super glue wasn’t super enough. whatever, just save yourself the trouble and replace the entire assembly. below is an image of one i found on ebay (note that it says v10 – it’s also compatible with v11).
during reassembly, there will be a point where you need to straighten / bend the metal ends of the electric connectors in order to pass it through various parts of the vacuum. you’ll know what i’m talking about if you end up going through the full disassembly. try not to bend/re-bend them too many times because you can easily break off the metal ends (see below)
in my first pass at this after i had glued the trigger back together, i actually broke off the metal piece by accident when trying to bend it back and then spent over an hour trying to re-solder it back on. i also have no idea how to properly solder and ended up burning a hole in my table cloth. anyway when you’re re-connecting those metal connectors back, use your pliers to adjust them to be close to 90 degrees (as they were before you had to remove them) but it honestly doesn’t have to be perfect. just use the screws to tighten them against the motherboard.
in chapter 16 for the lox vm, the scanner implementation takes a completely different approach compared to jlox. when we implemented jlox, the scanner did a full scan of the source file and then created all the tokens in memory for the parsing phase
in the C implementation, the file is still read but we don’t create a separate list for all the tokens by doing a full read of the file. instead the scanner refers directly to the source and we only create as many tokens as necessary (no more than 2 tokens at a time, since lox’s grammar only requires a single token of lookahead). this is a lazier and more memory efficient approach.
for example, here’s the scanner struct and how it’s initialized
start refers to the beginning of a lexeme (say, an identifier)
current is the current character being scanned
there’s also some additional metadata like line number for debugging support
and this is the Token struct for representing a complete lexeme
typedef struct {
  TokenType type;
  const char* start;
  int length;
  int line;
} Token;
start is a pointer to the source – again we’re not allocating additional memory to hold token information
type is our special enum for things like TOKEN_IDENTIFIER
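putting the two structs together, here’s a cut down sketch of on-demand scanning. it only knows about identifiers and numbers, and the scanToken body is my own simplification rather than the book’s code. the thing to notice is that a token is just a pointer + length into the original source, so nothing gets copied or allocated.

```c
#include <ctype.h>

typedef enum { TOKEN_IDENTIFIER, TOKEN_NUMBER, TOKEN_ERROR, TOKEN_EOF } TokenType;

typedef struct {
  TokenType type;
  const char* start; /* points into the source, no copy */
  int length;
  int line;
} Token;

typedef struct {
  const char* start;   /* beginning of the lexeme being scanned */
  const char* current; /* character currently being looked at */
  int line;
} Scanner;

Scanner scanner;

void initScanner(const char* source) {
  scanner.start = source;
  scanner.current = source;
  scanner.line = 1;
}

static Token makeToken(TokenType type) {
  Token token;
  token.type = type;
  token.start = scanner.start;
  token.length = (int)(scanner.current - scanner.start);
  token.line = scanner.line;
  return token;
}

/* scan exactly one token; the caller asks again when it needs the next one */
Token scanToken(void) {
  for (;;) { /* skip whitespace, counting newlines */
    char c = *scanner.current;
    if (c == '\n') scanner.line++;
    else if (c != ' ' && c != '\t' && c != '\r') break;
    scanner.current++;
  }
  scanner.start = scanner.current;
  if (*scanner.current == '\0') return makeToken(TOKEN_EOF);
  if (isdigit((unsigned char)*scanner.current)) {
    while (isdigit((unsigned char)*scanner.current)) scanner.current++;
    return makeToken(TOKEN_NUMBER);
  }
  if (isalpha((unsigned char)*scanner.current) || *scanner.current == '_') {
    while (isalnum((unsigned char)*scanner.current) || *scanner.current == '_') {
      scanner.current++;
    }
    return makeToken(TOKEN_IDENTIFIER);
  }
  scanner.current++; /* anything else is an error in this sketch */
  return makeToken(TOKEN_ERROR);
}
```

because tokens borrow from the source string, the source buffer has to stay alive for as long as any token is in use.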
with the scanner and the token structs in place, the compiler drives the actual changes to these objects as it scans as much of the source code as it needs (and constructs tokens) to emit byte code sequences
ObjFunction* compile(const char* source) {
  initScanner(source);
  Compiler compiler;
  initCompiler(&compiler, TYPE_SCRIPT);
  parser.hadError = false;
  parser.panicMode = false;
  int line = -1;
  advance();
  while (!match(TOKEN_EOF)) {
    declaration();
  }
  ObjFunction* function = endCompiler();
  return parser.hadError ? NULL : function;
}
calls to advance and declaration both will eventually call out to scanToken which will make use of the scanner to read and construct the next token. for example if the token is a number, the compiler will emit two byte codes via a call to emitConstant(NUMBER_VAL(value));
the entire sequence of bytecodes is built this way, the compiler driving the scanner forward and emitting byte code sequences on the fly.
my team and i recently completed a database migration from mongodb to postgresql for one of our rails apps. the service is a graphql api built on rails 7 that was backed by a mongodb database (m40 cluster managed through mongo’s atlas platform) with ~500gb of data, and we performed a live zero-downtime migration to a db.m5.2xlarge RDS instance running in our own aws account. the application is organized like a pretty standard rails app. all data is represented by rails models and data access is done through an object mapping layer using mongo’s object document mapper (ODM), mongoid.
the requirements for this project were pretty straightforward
stop using mongo
don’t take our service down to do an offline migration (given the amount of data we needed to move, the maintenance window we would need would’ve been way too long anyway based on some of our initial tests)
our high level approach was to use the dual writing pattern: write to both data stores and put reads behind dynamic feature flags, backfill the tables one collection at a time, switch over the reads to the new database and then cut off the old reads and writes.
this is a very common technique in service to service migrations when teams undertake monolith to microservice transitions (which were all the rage five to ten years ago, but the trend is reversing as of late) and the same process can be applied to switching data stores within the same service. the only difference is that the new reading/writing code in the service hits a new data store instead of a new api / service.
setup phase
we started by setting up an initial connection to postgres and added some basic tooling
set up the postgres database and the rails integration. our infrastructure teams spun up our new postgres instance on RDS sized comparably to the current storage on atlas. in the rails app, we set up the active record ORM alongside the existing mongoid ODM and updated both our development and CI setup to spin up a postgres image
set up data transfer / backfilling utility scripts that extract mongo document data for a given collection, transform it into a postgres compatible format and insert it into the postgres database. for example, nested documents become normalized foreign key relationships
set up feature flagging (we used flipper) to dynamically control the reading switch (double writing was not behind switches but we made sure to wrap our new writes with catch-all exception handling to never interrupt requests)
double writes
we divvied up most of the work by resource types and tackled them in the order of some combination of entity complexity (lots of relationships, super nested) and data volume (getting an early start on the largest collections was important since we had deadlines to hit).
for each resource in the system, we did the following
create active record equivalents of the current ODM models. so this means bringing over model level unit tests, validations, and any database level constraints. to uniquely identify migrated data, we made sure to include a mongo_id column on every new table
set up dual writes. most of the writes happen through graphql mutation resolvers at the graphql API layer so this involves adding adjacent active record write logic.
duplicate existing unit and functional tests to cover the new models and code
set up the backfilling code. the shared migration script was sufficient for most of our data (simple batch read, transform, bulk insert), but a handful of our models with more complex entity relationships necessitated their own migration logic
backfill and read rollout
once dual writing was enabled for a while and we’re confident there are no issues with the new data, run the backfill scripts. depending on the collection, this took anywhere from minutes to days
upon backfill completion, verify the successful migration using a custom built data verifier script that ensures that all the mongo documents were successfully transferred. this script knew how to compare both simple flat docs and ones with very nested relationships by using rails model level reflection API
finally, switch the reads from mongo to postgres. this was done through flipper so no additional deploys are necessary
cleanup
once all dual writing is set up and all reads are done against postgres, remove the double writing and only keep our postgres active record reads and writes.
remove all traces of mongo
celebrate!
challenges
no project is without its challenges / setbacks and wow we had a number to deal with (and overcome!). we had issues at every stage of the sdlc
coordinating with other teams making changes to the service. we had to enact a code freeze since we were running into instances of people introducing new writes without the flags/dual writing stuff we required
wading through hard to understand business logic areas with low test coverage. we needed to create active record equivalents of a lot of writes, but some writes were fairly complex (very stateful, lots of conditions) and involved a coordination of multiple domain models
keeping the new active record models, tests, and scripts isolated. we couldn’t just delete the current application code, so the new models needed to live alongside the old ones. we wanted to preserve the model names as much as possible, but you cannot have two models of the same name in models/, so we introduced a postgres namespace across the board to house the new code. this was a fantastic solution that made it both easy to add new models and delete the old ones later
database schema migration automation problems. we were initially running the new rails schema migrations by hand, but when we switched over to automating the schema migrations using k8s/helm, we accidentally made the migrations run as one off jobs (instead of pre-release hooks). as a result, we had deploys that still succeeded despite failing migrations
some of our collections are large, so our backfill scripts need to run anywhere from several hours to several days. this increases the likelihood of running into issues mid data transfer, so it’s important for scripts to be idempotent and resumable. for the idempotent part, we did this by adding mongo_id primary key reference to all of our postgres tables to represent the identity of the mongo record migrated (in most backfilling instances with only a couple of exceptions, we skip the insert based on the mongo id if it’s already migrated). for resumability, during migration we always read mongo documents ordered by their primary key (lucky for us the first four bytes of the 12 byte id is the creation timestamp) and we log out the last key in the current batch during migration processing as a checkpoint to use later as a cursor
we set off alerts when running backfills because of elevated read / writes against postgres which were in the call path of all existing requests. we ended up creating a read only mongo replica off of our primary in atlas to use for our backfilling. unfortunately, while this solved the contention issue, it introduced new problems around data consistency. for example there was an instance where i ran the backfill against an outdated replica and ended up inserting stale records into the new database. luckily the verifier detected missing records and i was able to drop the table and re-run the backfill with a fresh / up to date database instance
missing mongo key constraints and existence of duplicate records. we had a number of collections containing dupes due to missing uniqueness indices, so when we added the appropriate uniqueness constraints to the new tables in postgres, the backfilling process blew up because the mongo data was bad. this required some data cleanup and one of my teammates wrote a handy de-duping script using mongo’s aggregation API to identify and remove dupes by gathering dupes for any given document key combination into lists and then keeping the latest and purging the rest.
one minor snafu we ran into with this was that the aggregation code does a lot of the grouping of documents in memory on a node and in one instance this caused a memory spike that impacted avg performance while the script was running
based on the logs, we seem to get a good number of duplicate insert errors due to race conditions of requests attempting to modify the same resource at the same time, which probably explains why we had so many dupes in the old database to begin with. most of these cases can be ignored but it would be good to figure out why they’re happening so often
bad new data being inserted into our postgres database due to incorrect new code. for example, we were writing a UTC offset attribute into mongo through the ODM, and when this got carried over to the active record class, it only wrote positive UTC offset values and excluded all negative offsets because of a bad guard clause i added. oopsie
we also had minor and more subtle bugs, like timestamps not being properly updated. for example, in active record we needed an explicit .touch to update the timestamp when no attributes changed but clients still expected an updated timestamp. this was happening out of the box with mongoid
data divergence in the dual writing code during upserts, caught by the verifier. for example, some records had fields that accrued values over time, but once dual writing was introduced and executed by a new request, only the most recent data in the payload was inserted into the new database (the values previously accrued on the field in mongo were not carried over). unfortunately, this data gap wasn’t addressed by our backfilling, because our backfilling code skips dual written records, so the historical values were never carried over for those records.
to illustrate this with a scenario: let’s say a mongo record was created before dual writing and its values field holds value 1. time passes. we release the dual writing code. a new request wants to upsert the same record, but this time with value 2. two writes happen: one to mongo, which ends up with [1,2], and one to postgres, which only has [2] (the most recent value).
to fix this, we wrote one off data sync / repair tasks for the diverged records. pretty much any record that gets upserted and whose backfilling strategy is an insert_all (which skips on conflict) is a candidate for divergence.
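the scenario above can be replayed with a tiny python simulation. dicts stand in for the two databases, and the function names (dual_write_upsert, backfill_skip_on_conflict, repair_diverged) are hypothetical. it’s a sketch of the failure mode, not our real code.

```python
def dual_write_upsert(mongo: dict, postgres: dict, record_id: str, value: int) -> None:
    # mongo accrues the value onto whatever the document already holds
    mongo.setdefault(record_id, []).append(value)
    # postgres only receives the value carried by this request's payload
    postgres.setdefault(record_id, []).append(value)


def backfill_skip_on_conflict(mongo: dict, postgres: dict) -> None:
    # mirrors an insert_all that skips rows which already exist
    for record_id, values in mongo.items():
        if record_id not in postgres:
            postgres[record_id] = list(values)


def repair_diverged(mongo: dict, postgres: dict) -> list[str]:
    # one off sync: overwrite any row that disagrees with mongo
    repaired = []
    for record_id, values in mongo.items():
        if postgres.get(record_id) != values:
            postgres[record_id] = list(values)
            repaired.append(record_id)
    return repaired


mongo = {"r1": [1]}   # record created before dual writing
postgres = {}         # new database starts empty
dual_write_upsert(mongo, postgres, "r1", 2)   # mongo: [1, 2], postgres: [2]
backfill_skip_on_conflict(mongo, postgres)    # skips r1, so postgres stays [2]
```

only the repair step brings the postgres row back in line with mongo, which is why the backfill alone never closed the gap.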
contending with ongoing performance problems in the service, and trying to tell whether degraded performance was caused by our new code or by what was already there (turns out a little bit of both!)
on rolling out reads for a single high traffic collection, the entire service went down for a solid 5-10 minutes, during which i couldn’t access the flipper UI because none of the pods were responsive. turns out this was caused by missing indexes, which pegged RDS CPU at 100% due to full table scans against the collection
we did a pretty great job managing these issues as a team and right now we’re fully on postgres and it looks like it’s running smoothly so yay!
so there’s an interesting property between the XOR operation and mod 2
turns out, the xor (^) of any sequence of bits is equal to the sum of those bits modulo 2
for example
1 ^ 0 ^ 1 ^ 1 is the same as (1 + 0 + 1 + 1) % 2
if you take this step by step, the xor side:
1 ^ 0 = 1
1 ^ 1 = 0
0 ^ 1 = 1 (answer)
the modulo side:
1 + 0 + 1 + 1 = 3
3 % 2 = 1
why?
lets look at the truth table for XORs using two bits
left bit | right bit | xor result
0        | 0         | 0
0        | 1         | 1
1        | 0         | 1
1        | 1         | 0
xor table
XOR is an exclusive OR, so it will only be 1 if exactly one of the two bits is on. if both bits are on or neither is, the result is 0. what other operation on two operands gives 0 for both (0, 0) and (1, 1), and 1 otherwise? addition modulo 2!
this equivalence exists because when both bits are 1, their sum is 2, and 2 mod 2 is 0. when both bits are 0, the sum is 0, and 0 mod 2 is 0. when only one of them (an odd number) is on, we always get a sum of 1, and 1 mod 2 is 1
even though we’re only looking at two bits, this generalizes to any sequence of bits: XOR is associative, so XORing a sequence of bits results in 0 when there is an even number of 1 bits (or none) and 1 when there is an odd number of 1 bits, which is exactly what the sum modulo 2 gives
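to convince myself of the general claim, here’s a small brute-force check in python. the helper names xor_all and sum_mod_2 are made up; the loop compares the two for every bit sequence of length 1 through 8.

```python
from functools import reduce
from itertools import product
from operator import xor


def xor_all(bits):
    # fold XOR across the whole sequence, starting from 0
    return reduce(xor, bits, 0)


def sum_mod_2(bits):
    return sum(bits) % 2


# exhaustively compare both sides for every sequence of 1 to 8 bits
for length in range(1, 9):
    for bits in product([0, 1], repeat=length):
        assert xor_all(bits) == sum_mod_2(bits)
```

this is a check over small cases rather than a proof, but it lines up with the parity argument above.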
hex notation shows up a lot in computing so it’s really useful to understand. it’s really hard though to learn to take your base 10 lens off because that’s what we’re so accustomed to!
in base 10 positional notation, each place holds one of 10 digits (0-9). this is really handy because when we go beyond 9, we can shift over and use a new position to denote 9 + 1. so the weight of each position in a base 10 integer is the radix (10) raised to the power of the position index, which starts at 0 on the right.
for example, in 128 the rightmost digit 8 represents the value 8 because it sits in the 10^0 place. once you move leftward to a new position, each digit is multiplied by 10^1, then 10^2, all the way to the leftmost position at 10^n.
the same digits in a base 16 system look the same, but the actual value is different. 128 in base 16 is 296 in base 10: from right to left, 8 × 16^0 + 2 × 16^1 + 1 × 16^2 = 8 + 32 + 256 = 296. this is because rather than having 10 symbols available in each place, hex has 16
in base 10, each place holds one of 0,1,2,3,4,5,6,7,8,9. in base 16, each place holds one of 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F where A = 10 (base 10), B = 11, C = 12, D = 13, E = 14, F = 15. so A in base 16 is equivalent to 10 in base 10. when looking at this for the first time, it looks wild because you’re so accustomed to equating the symbols “10” with the value “10” (both in base 10), so switching bases really requires you to decouple the numerical symbolic representation (may or may not be base 10) from the value (which you still want to think about and write in terms of base 10).
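a short python sketch of that positional rule, where the same symbols get a different value depending on the radix. positional_value is a name i made up; python’s built-in int(text, base) does the same job and is used in the usage note as a cross-check.

```python
DIGITS = "0123456789ABCDEF"


def positional_value(numeral: str, radix: int) -> int:
    """Sum each digit times radix**position, with position 0 on the right."""
    total = 0
    for position, symbol in enumerate(reversed(numeral.upper())):
        digit = DIGITS.index(symbol)  # A -> 10, B -> 11, ... F -> 15
        if digit >= radix:
            raise ValueError(f"{symbol!r} is not a valid base {radix} digit")
        total += digit * radix ** position
    return total
```

positional_value("128", 10) gives 128, positional_value("128", 16) gives 296, and both agree with int("128", 10) and int("128", 16).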
one of the handiest things about hex, and why it’s commonly used in computing, is its relationship with binary or base 2 notation. machines encode all information in binary. compared to decimal and hexadecimal, binary notation only allows 2 symbols in each position (0 and 1). the interesting relationship between binary and hexadecimal is that 16 is 2 raised to the 4th power. put another way, we can represent any single hexadecimal digit with exactly four binary digits and vice versa. this makes converting values between the two bases much easier than converting between binary and base 10. i highly recommend checking out this khan academy video to gain an intuition behind the why
thanks to this relationship, we can use hex as a far more compact literal representation of binary values. while binary is what computers work with, hex is far easier for humans to write and read. for example, the bits 1111 can be represented with just F since they both represent the value 15 (decimal). four bits can represent 16 values (0 through 15). what else represents 16 values? a single hexadecimal digit! and since 16 is a power of 2, this extends beyond just four bits: we can use hex to quickly convert pretty much any sequence of bits, whether it’s 32 bits (8 groups of 4 bits) or 64 bits (16 groups)
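the grouping trick can be sketched in a few lines of python: split the bits into groups of four and map each group to one hex digit, and back again. bits_to_hex and hex_to_bits are hypothetical helper names.

```python
HEX_DIGITS = "0123456789ABCDEF"


def bits_to_hex(bits: str) -> str:
    # left-pad with zeros so the length is a multiple of 4
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # each group of four bits maps to exactly one hex digit
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_bits(hex_digits: str) -> str:
    # each hex digit expands to exactly four bits
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)
```

no arithmetic across groups is needed, which is what makes binary to hex conversion so much easier than binary to decimal.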
a common task i do is open bash in a container to inspect the file system….
but what happens when there is no shell at all in the image?
for example
FROM scratch
WORKDIR src
COPY README.md .
and if i run docker build . -t minimal-image to build the image, how would i confirm the contents were indeed copied over?
if i run docker run minimal-image:latest bash, i get
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "bash": executable file not found in $PATH: unknown.
this makes sense because the scratch image doesn’t actually contain anything. it’s not shipped with a bash interpreter.
so what to do…
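the workaround is to use the docker export command. this requires a container rather than an image, so first create a container from the image (docker create can do this without running anything; the export step expects it to be named minimal-container, and since a scratch image has no default command, docker create may need a placeholder command argument)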
and then we can finally export this to a .tar file
docker export minimal-container -o out.tar
now lets extract the tar into a directory called tmp (docker export writes a plain, uncompressed tar, so no gzip flag is needed). if i don’t specify a destination directory, the contents will get extracted directly into my current directory, which includes my host files! don’t want that 🙂
mkdir tmp && tar -xf out.tar -C tmp
this gives me, with ls tmp
dev
etc
proc
src
sys
earlier, my WORKDIR instruction set the working directory to src right before my COPY instruction, and that is indeed where i find the file i copied.
anyway that’s how you inspect contents of an image without a shell!
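if you’d rather not extract anything onto the host at all, you can also just list what’s inside the exported tar. here’s a small sketch using python’s standard tarfile module; list_export is a made-up helper name, and the path you pass would be something like the out.tar produced above.

```python
import tarfile


def list_export(tar_path: str) -> list[str]:
    """Return the member paths inside an exported container tar without extracting."""
    with tarfile.open(tar_path, "r") as archive:
        return archive.getnames()
```

calling list_export("out.tar") should include the path of the copied README under src, alongside entries like dev, etc, proc and sys.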
bind mounts are what i’m accustomed to using for local docker dev. they’re the typical go to for being able to use host-native dev tooling to edit source code and ensuring that those changes are immediately reflected in the container environment. they work fine, for the most part.
my biggest issue with them has always been the speed on mac. every company i’ve worked at provisions macs as dev machines, and the biggest reason for the slowness is that on a mac, docker is actually running in a linux VM. afaik there is no native mac container technology; containers are built for linux. so docker makes this work by running a hypervisor using apple’s virtualization framework. virtualization is cool but it’s expensive.
when docker desktop starts up, it mounts paths like /Users into the virtual machine and makes them available to processes running inside the container. this is what allows source directories on the mac host machine to be bind-mounted. unfortunately, every i/o operation incurs an extra cost in mapping a read or write operation in the virtual machine’s virtual file system to a read or write on the actual host file system.
i don’t think there’s much you can do about this cost – it’s always going to be an extra level of indirection so it’s not going to ever match native, no-container i/o speeds. but there are some alternatives that i’m eager to try out in the future
dev containers is basically taking containerization to the extreme – what if your entire workflow / tooling was inside a container?!
docker released synchronized file shares (https://docs.docker.com/desktop/synchronized-file-sharing/), which tackles this problem by asking the question “what if we can copy/sync file changes really fast into the container?” instead of forcing the vm to reach across file system boundaries
lately i’ve been going through crafting interpreters by bob nystrom using racket and it’s got me thinking a lot about code gen in java vs lisps. the book uses java and there’s some use of code gen to template out common patterns. it’s about as awkward as i expected it to look. but i’m not writing to pick on java, mostly to appreciate the special niche that lisp continues to occupy
in java, if you want to generate a set of classes, you have to write a class that can output strings that will ultimately represent a valid java program. at runtime, you may have a method called generateObject that accepts some arguments and outputs the blueprint of a class. the output will be strings or even files but they are not being generated and compiled at the same time. another stage of compilation on this outputted source will need to be performed.
i don’t mean to pick on java. this separation of code generation and compilation in meta-programming (meta because we’re using a program to produce more source code) is common in most statically typed languages.
dynamically typed, interpreted languages such as ruby and python support both generating a program and executing the generated code at runtime.
for example, ruby supports defining methods dynamically at runtime via define_method, which become available to the rest of the system, and it can even evaluate arbitrary ruby code using methods like class_eval. these definitions are parsed and executed at runtime.
however, any sufficiently complex generated program can only be represented as strings, and thus the only way to manipulate it is as plain strings.
in both situations, whether dynamic or static, languages have:
a representation of a program that conforms to a special grammar, usually specified in something like EBNF
a set of primitive data structures that can be manipulated at program runtime
and the actual program representation does not conform to the same structures as its data structures. in other words, the language does not allow the program itself to be treated as a data structure, because the literal representation of the program (what a programmer sees) is not a data structure but rather just text.
take ruby as an example
class Animal
  def initialize(name, sound)
    @name = name
    @sound = sound
  end

  def make_sound
    puts "#{@name} says #{@sound}!"
  end
end

# Creating an instance of the Animal class
cat = Animal.new("Cat", "Meow")

# Calling the make_sound method
cat.make_sound
that’s a representation of a ruby program, and in no way does it resemble any of the data structures in ruby such as arrays or hashes.
program source representations do eventually undergo transformations as part of an interpretation or compilation process (e.g. via class_eval) that turn plain source text into data structures that can be manipulated. for example, the lexing and parsing phases of compilers produce syntax tree data structures that can in fact be expressed with the same primitive structures supported by the language itself.
the keyword here is “eventually”. the program as it is represented before any compilation or parsing occurs is not a data structure and cannot be manipulated as such. this is as true for ruby as it is for java
lisp
the one exception to this is lisp. the fancy academic word for this unique position held by lisp, where there is no discrepancy between the language’s external representation and its data structures, is homoiconicity.
lisp is homoiconic because lisp programs manipulate s-expressions and are also written in s-expressions.
here’s a simple demonstration using a dialect of lisp (racket scheme) where we define an original program (as a list) and then we transform the program literally before eval’ing it.
since our original-program is just a list, it can be manipulated like any other list using functions like car. notice here how there’s no distinction between a manipulated program and the surrounding program representation. it’s all just lists. that is code as data.
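for readers who don’t have a lisp handy, here’s a rough analog of the same idea in python. python is not homoiconic, so the “program” here is a nested list handed to a hypothetical mini evaluator (evaluate and OPS are made up for this sketch), not real python source. it only mimics what lisp gives you for free.

```python
import operator
from functools import reduce

# the only operators this toy evaluator understands
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested list shaped like an s-expression, e.g. ["+", 1, 2]."""
    if isinstance(expr, list):
        head, *args = expr
        values = [evaluate(arg) for arg in args]
        return reduce(OPS[head], values)
    return expr  # numbers evaluate to themselves


original_program = ["+", 2, 3]
# the program is just a list, so we can take it apart like any other list
# (original_program[0] plays the role of car) and swap the operator
transformed_program = ["*"] + original_program[1:]
```

evaluate(original_program) adds the numbers and evaluate(transformed_program) multiplies them. the difference from lisp is that here the list form and the host language’s own syntax are two separate things.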
this power extends not just to arbitrary program eval and manipulation – lisp also lets you extend its syntax in new ways to support custom language features that are not built into the language. these are known as macros. again, since the language is made up of s-expressions, any new formulation or semantic of the language can likewise be expressed in s-expressions and can be expanded and eval’d as if they were any other data structure during runtime, without having to drop into a “compilation” stage that converts something more primitive like strings into an AST
the whole language is an AST!
i’ve long wondered what the yin yang symbol of lisp represented, and it’s actually from Structure and Interpretation of Computer Programs (SICP), the MIT textbook. the yin yang represents eval and apply in the metacircular evaluator from the book. the metacircular evaluator is basically a lisp interpreter written in lisp: it is lisp evaluating itself through the use of both eval and apply.
the expression problem states that it may be easy to extend data types in a program without modifying existing code and it may be easy to extend behavior in a program without modifying existing code, but not both. this limit, as far as i know, is a limit imposed by the design of the underlying programming language. i really don’t like the name of this problem because the issue isn’t just a matter of expression, it’s also one of modification. so maybe we should call it the expression-modification problem…
object oriented languages, or languages that are oriented towards concepts like colocating data and behavior under class-like constructs, tend to be better at allowing you to add new types to a system (provided they follow the same behavior contracts) without having to open up and modify existing types. for example, if you have a set of data types representing cars and they’re all supposed to understand the method / message “accelerate”, you can easily add new cars with different acceleration behavior. however, the moment you need to add a new behavior that affects all existing cars (let’s say a new method named “recall”), every single class will need to be opened and modified.
functional languages or languages that are oriented towards separating data and behavior tend to be better at allowing you to add new behavior to a system without having to modify existing code. using the same example as above, if you needed to add a new function, you just need to add a new function that handles all the various car types. however, if you need to add a new data type like a new car, now you need to open all of the existing functions to handle the new type.
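here’s the car example sketched both ways in python. the class names, the kind field, and the strings are all invented for illustration; the point is only where a change would land in each style.

```python
# object oriented style: each type carries its own behavior
class Sedan:
    def accelerate(self) -> str:
        return "sedan accelerates smoothly"


class Truck:
    def accelerate(self) -> str:
        return "truck accelerates slowly"


# adding a new type touches no existing code...
class SportsCar:
    def accelerate(self) -> str:
        return "sports car accelerates fast"
# ...but adding recall() would mean opening Sedan, Truck and SportsCar.


# function oriented style: plain data, behavior lives in functions
def accelerate(car: dict) -> str:
    speeds = {"sedan": "smoothly", "truck": "slowly"}
    return f"{car['kind']} accelerates {speeds[car['kind']]}"


# adding a new behavior touches no existing code...
def recall(car: dict) -> str:
    notices = {"sedan": "airbag", "truck": "brakes"}
    return f"{car['kind']} recalled for {notices[car['kind']]}"
# ...but adding a "sports car" kind would mean opening accelerate and recall.
```

in the function style, passing a kind that the existing functions don’t know about fails with a KeyError until every function is updated, which is the mirror image of having to add recall to every class.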
from a practical standpoint, what this means is that choice of language matters depending on the problem at hand because the language orientation, if one exists, can either work with or against the program’s natural architecture. for example, if you’re dealing with a problem with a handful of fixed data types and most of the growth is in domain specific behavior, a function oriented design may be more compatible.
for example, let’s say you have a computing problem dealing with some fixed set of accounting related concepts that are stable and don’t change over time. but stakeholders frequently need to run different types of reports on these various types, and these reports frequently evolve and change. with a more strict OO approach, the reporting behavior may be co-located with the domain objects, but this means having to open up each one every time a new reporting behavior is added (OO languages make cross-cutting behavior sharing easier with inheritance, so let’s assume the worst case, where the concrete behavior of each new report needs to be specific to each type).
nowadays we have many “multi-paradigm” languages that allow programmers to choose the more suitable style or change it if it no longer fits, but i don’t think this solves the expression problem so much as it forces the user to pick the side of the problem they want to have. lots of problem domains also grow on both axes (data types AND behavior) and it’s not always clear which one you’re dealing with, so the problem cannot always be avoided with more planning.
from my experience, it’s easier to start with a function oriented program for most problems that are ill-specified because you can easily and quickly represent types using lightweight data types and start doing useful things with them. with object oriented approaches, particularly with statically typed ones like java, it often feels like there’s a much higher startup cost to expressing the program because before you can even get to defining any useful behavior, you have to define some set of classes (which are far more rigid and hard to change than more primitive data structures).
A common task I use React for is rendering large datasets in the UI. For example, a large list of movies or books. Here’s a simple component that renders a list of movies.
As long as you’re using a unique `key` attribute in this case, renders are pretty fast. In the example above, only simple list items are being rendered for each movie. It’s a small set of elements and there’s no complex state involved.
But what if…
The individual movie items are a lot more expensive to render and contain a lot of state
The state cannot be localized to the smaller child components and needs to live at the root level (because it’s shared with other components)
Here’s an example where we have a parent level state that maintains rating data and passes that down to render both the movie list and a sibling recommendations component.
In this case, if setMovieRatings gets called in any of the child ExpensiveMovieItem components, the parent state will update and all of its children will re-render (even if the props for the majority of components in the list stay the same). One common misunderstanding I had for a long time is that when props stay the same, a component does not re-render. In reality, any time a parent component’s state changes as a result of a state update, all of its descendants re-render by default.
If this is a large list (1000+ items), this re-render can create noticeable lag. In this case, if the re-render takes 200ms, it’ll take 200ms between when a rating for a movie is updated and when it’s actually reflected in the UI. Since React does not automatically skip re-renders based on props, it’s up to you to tell it when to skip a full re-render for a component.
React.memo
React.memo is a function that accepts a component (and an optional prop comparison function) and returns another component. This new component has special memoization behavior that skips re-render based on either the built-in or user provided prop check.
Going back to the original example, here’s how you turn a normal expensive component into a memoized one:
Now if you update the parent state, only the children with changed props will render. This technique will work out of the box if all of your props are non-object primitives (strings, numbers), but you’ll have to be more careful if you have objects, because the default comparison checks each prop with Object.is, and it’s pretty common for the identity of objects to change across re-renders in React even if the literal values are the same. For example, if you’re re-creating functions that are being passed in as props, then you’ll cause a re-render. The same goes for object cloning in setState, which creates new objects with the same values but different identities. You can get around these issues by either simplifying the props or providing a custom comparison function.
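To make the identity pitfall concrete, here’s a toy model of the memoization idea written in Python. It is not React’s implementation: memo, same_value, and the single cached render are simplifications I’m assuming for illustration. Primitives are compared by value and everything else by identity, which roughly mirrors how Object.is treats primitives versus objects.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def same_value(a, b) -> bool:
    # primitives compare by value, everything else by identity
    if isinstance(a, PRIMITIVES) and isinstance(b, PRIMITIVES):
        return type(a) is type(b) and a == b
    return a is b


def memo(render):
    """Wrap `render` so it is skipped when every prop is unchanged."""
    last_props = None
    last_output = None

    def memoized(**props):
        nonlocal last_props, last_output
        unchanged = (
            last_props is not None
            and props.keys() == last_props.keys()
            and all(same_value(props[k], last_props[k]) for k in props)
        )
        if not unchanged:
            last_output = render(**props)  # the expensive part
            last_props = props
        return last_output

    return memoized
```

Passing the same rating object twice skips the second render, while passing a freshly built object with equal contents renders again, which is the situation that cloning objects or re-creating callbacks puts you in.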
i’ve had a lot of JWT related discussions at work lately and today I wondered how big is too big for a JWT to fit through an HTTP header. The HTTP spec doesn’t really impose a limit, but most servers do set limits that range between 8K and 16K bytes.
I figured I can whip up a quick JWT generator to get a rough sense of how big JWTs can get!
for simplicity I made the key value pairs small strings (these will vary in real life of course) and defined a byte limit of 8K. Also to save battery I increased the key counts exponentially 😀
ok here’s the script. can you guess what the key limit is using back of napkin calc?
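a sketch of such a generator in python, using only the standard library. the HS256 signing, the dummy secret, the key0/value0 style claims, and treating 8K as 8 * 1024 bytes are all assumptions for this sketch, so the numbers it reports will differ from any other setup.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(claims: dict, secret: bytes = b"not-a-real-secret") -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def first_key_count_over(limit_bytes: int = 8 * 1024) -> tuple[int, int]:
    """Double the number of claims until the token no longer fits the limit."""
    count = 1
    while True:
        claims = {f"key{i}": f"value{i}" for i in range(count)}
        size = len(make_jwt(claims).encode("ascii"))
        if size > limit_bytes:
            return count, size
        count *= 2
```

first_key_count_over() returns the first doubled key count whose token exceeds the limit along with that token’s size, so the real cutoff sits somewhere between that count and half of it.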
chrome is planning on phasing out support for third party cookies in 2024.
Third-party cookies, also known as cross-site cookies, are cookies set by a website other than the one you are currently on. For example, cnn.com might have a Facebook like button on their site. The like button will set a cookie that can be read by Facebook.
these cookies are a big deal for big tech because they’re the primary enabler of web tracking and advertising. if you work for a tech company that relies on digital ads, chances are they depend on third party cookies for those ads to follow you around the web.
cookie basics
when a browser issues a request to example.com, example.com can set a cookie on the browser under the domain example.com by responding to the request with a Set-Cookie response header.
once the cookie is saved, if the browser makes additional requests to example.com again in the future, the cookie under the matching domain will be forwarded back to the server.
because HTTP is a stateless protocol, it’s this cookie behavior that allows websites to “remember” its clients.
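here’s a tiny python model of that round trip. Browser, example_server, and the session cookie are made up for this sketch, and the “server” is just a function that receives the Cookie header text and returns the cookies it wants set. real browsers also track paths, expiry, and security attributes, which are ignored here.

```python
seen_cookie_headers = []  # what the pretend server received on each request


class Browser:
    def __init__(self):
        self.jar = {}  # domain -> {cookie name: value}

    def request(self, domain: str, server) -> None:
        # forward any cookies already saved under the matching domain
        cookies = self.jar.get(domain, {})
        cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
        cookies_to_set = server(cookie_header)
        # save whatever the server asked us to set, under its own domain
        self.jar.setdefault(domain, {}).update(cookies_to_set)


def example_server(cookie_header: str) -> dict:
    seen_cookie_headers.append(cookie_header)
    if "session=" in cookie_header:
        return {}  # already knows this client
    return {"session": "abc123"}  # stands in for a Set-Cookie response header


browser = Browser()
browser.request("example.com", example_server)  # first visit: no cookie sent
browser.request("example.com", example_server)  # second visit: cookie sent back
```

the server itself holds no memory of the client between requests; the cookie the browser sends back is what lets it recognize the second visit.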
now, whether a cookie is first party or third party depends on the domain you’re currently on. if the cookie domain is the same as the current domain you’re on (in your browsers address bar), this is a first party cookie. The cookie belongs to the domain. every other cookie set by other domains is third party. so whether a cookie is first party or not depends on two things:
The domain of the cookie
The current domain of the page
if i’m on example.com, all the cookies set under example.com are first party. the browser may contain other cookies, saved under other domains like yahoo.com or wikipedia.org. those are all third party!
so from a user perspective, a cookie isn’t first party in an absolute sense. if the user visits a different site under a different domain, the same cookie is no longer considered first party. so again, the site you’re currently on according to the browser determines whether a cookie is considered first party or third party.
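that relativity is easy to express as a function of the two inputs. this is a deliberately simplified python sketch: classify_cookie is a made-up name, and it uses a plain domain suffix match, whereas real browsers compare registrable domains (“sites”) using the public suffix list.

```python
def classify_cookie(cookie_domain: str, page_domain: str) -> str:
    """Label a cookie relative to the page currently in the address bar."""
    cookie = cookie_domain.lstrip(".").lower()
    page = page_domain.lower()
    # simplification: exact match, or the page is a subdomain of the cookie domain
    same_party = page == cookie or page.endswith("." + cookie)
    return "first party" if same_party else "third party"
```

the same example.com cookie comes back as first party when the page is example.com and third party when the page is yahoo.com.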
issue with third party cookies
so remember how a website can set cookies under its own domain? well, turns out they can also indirectly set cookies under other (third party) domains! all thanks to the magic of
HTML documents can contain many references / links to resources on other domains. every <img src...> or <link ...> or <script src...> is basically a GET request to another server that may live under different domains than the one you’re currently on!
those third party servers, e.g. myspace.com and xanga.com, can also set cookies under their own domains. the browser doesn’t really make a distinction between a request you made directly by typing example.com in the address bar versus ones that are fired as a result of HTML rendering.
as a result, the same website you’re visiting transmits its own cookies AND cookies from other websites. some of those websites may even be… ad networks that set cookies for purposes of analytics and tracking.
this basic auto storage and transmission behavior of cookies combined with the hyperlinking nature of web sites is at the core of how ad tech works.
if you’re an advertiser, third party cookie data allows you to learn about visitor behaviors, such as websites they frequently visit and recent purchases, and target them with ads
real world scenario time
here’s a real example that i’ve researched and verified myself.
hypothetically speaking, lets say bob is researching a rice cooker on amazon.com. bob sees one he likes, but feels overwhelmed so gives up and decides to go bake cookies instead
bob then visits a site with a cookie recipe and sees an advertisement for the same rice cooker he was looking at earlier. since bob’s not on an amazon-owned site but is seeing amazon related ads, this advertisement was triggered by a transmission of third-party cookie data.
the third party domain in this case is amazon’s advertising network. amazon has an advertising network similar to Google’s doubleclick.
here’s an example of a real cookie that’s set by the ad network when you visit amazon.com:
when a page on amazon loads, it makes additional requests to the amazon ad network. the ad network sets this very long-lived cookie (at the time of writing, its expiry date is more than 4 years into the future) that will be used to track your behavior. for the next 4 years, this cookie data will be forwarded with any request to the amazon ad network’s domain .amazon-adsystem.com.
a quick aside about ad networks
there are three primary groups involved in an ad network
sellers that have something to sell (companies selling rice cookers on amazon.com) and want their products to be advertised as widely and cheaply as possible
those who want to make money by showing ads (content creators, bloggers, etc). amazon calls those that want to show ads through their web content associates.
consumers / buyers / site visitors
the value of an ad network is proportional to the size of these groups. if there are no consumers or buyers, there’s nobody to sell to. if there are no content creators / youtubers, ads are limited in their reach. if there are no sellers… well, then there are probably no ads either!
back to bob
now that the third party cookie from the ad network is set, when bob visits a recipe site that’s also part of the ad network, he’s going to load a third party tracking script (also known as a pixel) from .amazon-adsystem.com that receives the cookie that’s set, looks up the user by ID on the amazon servers, and serves a targeted ad based on the information the ad network has on the user identified by the cookie.
over time, these scripts loaded by sites that are affiliated with the ad network continue to track bob’s behavior through the long lived cookie for as long as they’re loaded, and transmit the data back to the ad network. this builds a rich representation of bob as a consumer for re-targeting purposes.
with the “end” of third party cookies, this basic mechanism is threatened. the recipe site will be restricted to transmitting only first party (its own) cookies. cookies previously set by an ad network (from a user’s visit to amazon.com properties) will not be transmitted behind the scenes for ad retargeting.
i should mention that most browsers do give you the option now of disabling third party cookies. yes even chrome. here’s what i see if i click into my chrome settings. as you would expect, third party cookies right now are allowed by default. i recommend turning that off.
so no more ads following me around in the future?
while this looks like a positive direction for data privacy, it’s definitely not the end for ad retargeting. remember that big tech makes an ungodly amount of money from their advertising platforms and it’s not in their interest to kill their golden goose.
the current google led proposal to replace third party cookies in chrome is called the privacy sandbox initiative. a major part of this initiative is to offer users more control over how their data gets shared. i’m going to be reading more on this in the future and will write about it in another post.