<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://akr.am/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://akr.am/blog/" rel="alternate" type="text/html" /><updated>2026-02-23T20:28:22+00:00</updated><id>https://akr.am/blog/feed.xml</id><title type="html">seize the dev</title><author><name>Mohamed Akram</name><email>mohd.akram@outlook.com</email></author><entry><title type="html">Will Software Engineering Survive?</title><link href="https://akr.am/blog/posts/will-software-engineering-survive" rel="alternate" type="text/html" title="Will Software Engineering Survive?" /><published>2026-02-17T00:00:00+00:00</published><updated>2026-02-17T00:00:00+00:00</updated><id>https://akr.am/blog/posts/will-software-engineering-survive</id><content type="html" xml:base="https://akr.am/blog/posts/will-software-engineering-survive"><![CDATA[<p>Since LLMs have been released, many claims about LLMs or in support of them
have left me scratching my head, and wondering if software engineering
principles were a myth all along.</p>

<p>Before LLMs, it was well accepted that lines of code written is not a measure
of productivity. Rather, the less code you have, the better, because it’s
easier to review and maintain. This is all the more important for
network-facing and security-sensitive code, as it reduces the attack surface.
Nowadays, this has been forgotten or ignored. For example, the recently
released and very popular vibe-coded software OpenClaw is more than 800k lines
of code.</p>

<p>In cases where a lot of code was needed, due to the essential complexity of the
problem, the solution was to build a component, library or framework that was
well-tested and then used as a module. If the problem was common enough (which
is the case for most problems), it was published as open-source. For
infrastructure software, open-source has become practically mandatory.
Altogether, this reduced the work that needed to be done and focused efforts in
one place rather than having N versions of the same type of software with a
different set of bugs each. Since LLMs, this has gone out the window.
Duplication is now encouraged, and vibe coding a hundred differently buggy
closed-source versions of the same thing is fine.</p>

<p>Even if one were to accept that LLMs provide an immense productivity boost in
terms of writing code, that has never been a bottleneck of software
engineering. As Dijkstra said, “computer science is no more about computers
than astronomy is about telescopes”. I would add to that “programming is no
more about coding”; rather, it’s about gathering and refining requirements,
thinking hard about architecture, the correct data structures, security,
deployment, and myriad other aspects. Any potential time-saving in writing the
code will quickly evaporate in the grand scheme of things, particularly in the
long run for any serious production project.</p>

<p>Before LLMs, programming was viewed as a deterministic endeavour described in a
precise language. Programming languages, and not English, were viewed as the
correct abstraction to improve clarity. Dijkstra (again)
<a href="https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667.html">argued</a>
against “natural language programming”, while
<a href="https://en.wikipedia.org/wiki/Leslie_Lamport">Lamport</a> argues in favor of even
more formalism. I suppose natural language programming is too tempting, but
thinking logically and symbolically is unavoidable whether in mathematics or
programming. The fewer programmers who know this, the worse software will become.</p>

<p>Another striking phenomenon is the suggestion that LLMs be used for “grunt
work”, such as tests. As some have argued, tests can be more important than the
code itself as they encode the desired behavior of a program. If any part
should be written by the human, it’s the test, since only they would know
what’s expected out of the program.</p>

<p>These are just some examples of software engineering principles that seem to
have been thrown out overnight. Some have justified this by saying that LLMs
are so revolutionary that we have to rethink software engineering altogether. I
find this hard to believe. Software engineering is rooted in formal concepts
from computer science and mathematics, including information theory,
complexity theory, and systems theory, among others. These cannot be bypassed with
or without AI. Not to mention the physical limits regarding the immense
resource consumption of these models. When the calculator was introduced,
formal proofs and mathematical rigor did not go anywhere. Students still learn
to practice mathematics without a calculator, and I believe the same will
happen with software engineering once the craze dies down—if it’s meant to
survive.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[Investigating recent claims amidst the rise of LLMs.]]></summary></entry><entry><title type="html">Why is the Gmail app 700 MB?</title><link href="https://akr.am/blog/posts/why-is-the-gmail-app-700-mb" rel="alternate" type="text/html" title="Why is the Gmail app 700 MB?" /><published>2026-01-06T00:00:00+00:00</published><updated>2026-01-06T00:00:00+00:00</updated><id>https://akr.am/blog/posts/why-is-the-gmail-app-700-mb</id><content type="html" xml:base="https://akr.am/blog/posts/why-is-the-gmail-app-700-mb"><![CDATA[<p>The Gmail app, <a href="https://apps.apple.com/us/app/422689480">on the App Store</a>, is
currently 760.7 MB in size. It is in the top three <a href="https://akr.am/app-bloat/">most bloated
apps</a> out of the top 100 free apps. I don’t use it
on my phone, but I was prompted to compare it with the seemingly hefty one I
(have to) use, Outlook, while clearing up some storage space. Its measly 428 MB
size pales in comparison.</p>

<p>This isn’t new. In 2017, Axios
<a href="https://www.axios.com/2017/12/15/the-top-iphone-apps-are-taking-up-a-lot-more-space-1513303131">reported</a>
that the top iPhone apps had been taking up an increasing amount of space over
the period from 2013 to 2017. For most of that period, the size of the Gmail
app hovered around 12 MB, with a sudden jump to more than 200 MB near the start
of 2017. Other popular apps also saw a 10x or more increase in size over the
same period.</p>

<p>Gmail isn’t even the worst offender, it’s just a more popular one. The Tesla
and Crypto.com apps are around 1 GB each. So is Samsung’s SmartThings app.
What about Google’s other popular apps? Google Home is another hefty one, at
630 MB, that I used for its remote feature, which I replaced with Google TV at
almost a tenth the size. Their other popular apps average around <a href="https://akr.am/app-bloat/?list=popular&amp;developer=Google">250
MB</a> in size. This
seems tame in comparison to Microsoft, with an average app size of around
<a href="https://akr.am/app-bloat/?list=popular&amp;developer=Microsoft+Corporation">330 MB</a>.
For reference, the average size of an app in the top 100 free apps is <a href="https://akr.am/app-bloat/?list=top">280
MB</a> or, in a more expanded set (including
games), <a href="https://akr.am/app-bloat/?list=popular">200 MB</a>.</p>

<p>Just to put this into perspective, on my device, apps (excluding their data)
use up 35 GB, and the data is another 35 GB. iOS takes up another 25 GB. Let’s
say, 100 GB for apps, data and the OS. That leaves me with 20 GB (leaving a
margin of free space for updates) meant to be used for capturing 4K video and
high-quality photos (why else get an iPhone), and storing music (don’t even
think about lossless). The reality is that running out of space also slows
things down, since most of my photos need to be fetched from the cloud before
viewing them, and I need to re-download these hefty offloaded apps when I need
them again. And good luck if you have a limited data bundle.</p>

<p>Maybe this doesn’t matter. The latest iPhones start at 256 GB, and surely I’ll
have plenty of space when I get a new one (although I remember saying this when
I upgraded to 64 GB from 32 GB). It’s not really about the space though. These
apps don’t have 50x or even 10x the functionality. But now they’re 100x larger,
and probably slower. Why?</p>

<p>Also, can someone explain why Microsoft Authenticator is 150 MB to show 6-digit
codes?</p>

<p>It’s not clear if this is specifically an iOS problem. I don’t have an Android
device and I could not find a way to get that information from the Play Store
without a device. That said, I checked the size of Gmail on someone’s Android
phone, and it’s around 185 MB, which certainly seems much better.</p>

<p>And if you’re considering switching from the default apps, these are the
installed sizes (which differ slightly from the App Store sizes) of the
alternatives on my iPhone running iOS 26.2:</p>

<table>
  <thead>
    <tr>
      <th>App</th>
      <th>Apple</th>
      <th>Google</th>
      <th>Microsoft</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Files - Drive - OneDrive</td>
      <td>2.6 MB</td>
      <td><strong>370 MB</strong></td>
      <td>283 MB</td>
    </tr>
    <tr>
      <td>Passwords - Authenticator</td>
      <td>3.2 MB</td>
      <td>35 MB</td>
      <td><strong>138 MB</strong></td>
    </tr>
    <tr>
      <td>FaceTime - Meet - Teams</td>
      <td>3.4 MB</td>
      <td>263 MB</td>
      <td><strong>423 MB</strong></td>
    </tr>
    <tr>
      <td>Photos</td>
      <td>4.2 MB</td>
      <td><strong>372 MB</strong></td>
      <td>-</td>
    </tr>
    <tr>
      <td>Safari - Chrome - Edge</td>
      <td>5.1 MB</td>
      <td>313 MB</td>
      <td><strong>397 MB</strong></td>
    </tr>
    <tr>
      <td>Reminders - Tasks - To Do</td>
      <td>7.7 MB</td>
      <td>89 MB</td>
      <td><strong>132 MB</strong></td>
    </tr>
    <tr>
      <td>Mail - Gmail - Outlook</td>
      <td>8.7 MB</td>
      <td><strong>673 MB</strong></td>
      <td>376 MB</td>
    </tr>
    <tr>
      <td>Home</td>
      <td>14.1 MB</td>
      <td><strong>584 MB</strong></td>
      <td>-</td>
    </tr>
    <tr>
      <td>Notes - Keep - OneNote</td>
      <td>17.3 MB</td>
      <td>171 MB</td>
      <td><strong>315 MB</strong></td>
    </tr>
    <tr>
      <td>Maps</td>
      <td>68 MB</td>
      <td><strong>385 MB</strong></td>
      <td>-</td>
    </tr>
    <tr>
      <td>Pages - Docs - Word</td>
      <td><strong>456 MB</strong></td>
      <td>311 MB</td>
      <td>434 MB</td>
    </tr>
    <tr>
      <td>Numbers - Sheets - Excel</td>
      <td><strong>500 MB</strong></td>
      <td>337 MB</td>
      <td>370 MB</td>
    </tr>
    <tr>
      <td>Keynote - Slides - PowerPoint</td>
      <td><strong>516 MB</strong></td>
      <td>270 MB</td>
      <td>376 MB</td>
    </tr>
  </tbody>
</table>

<p>So, why is the Gmail app almost 80x the size of the native Mail app? My guess
is as good as yours.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[The current state of app bloat.]]></summary></entry><entry><title type="html">Bits of Open-Source in 2025</title><link href="https://akr.am/blog/posts/bits-of-open-source-in-2025" rel="alternate" type="text/html" title="Bits of Open-Source in 2025" /><published>2025-12-25T00:00:00+00:00</published><updated>2025-12-25T00:00:00+00:00</updated><id>https://akr.am/blog/posts/bits-of-open-source-in-2025</id><content type="html" xml:base="https://akr.am/blog/posts/bits-of-open-source-in-2025"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>As part of my professional and personal work, I sometimes contribute fixes or
enhancements to open-source projects. I’ll be going over some of the ones that
I found interesting this year.</p>

<h2 id="nodejs">Node.js</h2>

<p>The year started off with the merge of a long-standing
<a href="https://github.com/typeorm/typeorm/pull/10349">PR</a> of mine for the Node.js
ORM, TypeORM, that improved the performance of hydrating
entities. This was the last in a series of PRs directed at performance
improvements in projects in the Node.js ecosystem that I had worked on as a
result of profiling a <a href="improving-date-formatting-performance-in-node-js">slow
application</a>.</p>

<p>Elsewhere in the Node.js world, I contributed a <a href="https://github.com/nodejs/import-in-the-middle/commits/main/?author=mohd-akram&amp;since=2025-01-01&amp;until=2025-12-31">few
fixes</a>
to the
<a href="https://www.npmjs.com/package/import-in-the-middle">import-in-the-middle</a>
project, which is a library that lets you intercept imports in Node.js, notably
used by Sentry to add instrumentation. I was already familiar with the
library’s code since last year when I had submitted a
<a href="https://github.com/nodejs/import-in-the-middle/pull/85">fix</a> that touched many
of its core parts. This made it slightly less tricky to debug the issues this
time around, which can be difficult due to the intricacies of module resolution
in a JavaScript runtime.</p>

<p>I also reported an <a href="https://github.com/nodejs/node/issues/57143">issue</a> in
Node.js regarding the <code class="language-plaintext highlighter-rouge">spawn</code> and <code class="language-plaintext highlighter-rouge">execFile</code> APIs that could lead to unsafe
usage. I was happy to see it resolved quickly and released as a <a href="https://nodejs.org/en/blog/release/v24.0.0#deprecations-and-removals">deprecation
warning</a>
in Node.js 24. With all the supply chain attacks seen this year, it was good to
contribute something to the security of the ecosystem. As part of investigating
this problem, I also learned that it is <a href="https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/">almost
impossible</a>
to pass arguments safely to cmd on Windows, and developed
<a href="https://www.npmjs.com/package/batspawn">batspawn</a> to help with that.</p>

<h2 id="macports">MacPorts</h2>

<p>As a maintainer for MacPorts, I sometimes run into projects that do not tag
their releases, so we need to rely on git to check if there was an update for
them. There wasn’t a good way to do this, so I contributed an
<a href="https://github.com/macports/macports-base/pull/364">enhancement</a> that checks
for updates in a repository using the helpful <code class="language-plaintext highlighter-rouge">git ls-remote</code> command. This was
my first contribution to MacPorts base as I had previously only contributed to
ports. For ports, I had a total of <a href="https://github.com/macports/macports-ports/commits/master/?author=mohd-akram&amp;since=2025-01-01&amp;until=2025-12-31">421
commits</a>
in 2025 as of writing.</p>
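
<p>As a rough illustration of the idea (not the actual MacPorts implementation,
which lives in the linked pull request), the sketch below asks a remote for the
commit that a ref points to without cloning it. The
<code class="language-plaintext highlighter-rouge">latest_commit</code> helper and its default of
<code class="language-plaintext highlighter-rouge">HEAD</code> are assumptions made for this example:</p>

```shell
# Hypothetical helper (not part of MacPorts): print the commit hash that a
# ref on a remote repository points to, without cloning the repository.
# $1 is the repository URL or path, $2 is the ref (defaults to HEAD).
latest_commit() {
	git ls-remote "$1" "${2:-HEAD}" | awk '{ print $1; exit }'
}
```

<p>Comparing the printed hash with the one recorded for the currently packaged
version would then be enough to tell whether upstream has moved.</p>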

<p>One notable addition to ports was the <a href="https://angleproject.org">ANGLE</a>
project, developed by Google, which provides a conformant OpenGL ES
implementation on almost every platform and is used by Chrome for WebGL. As
part of porting it, I set up <a href="https://github.com/gsource-mirror">git mirrors</a>
for some of the dependencies hosted on googlesource.com (which doesn’t offer
stable tarballs) and learned what a bare repository is.</p>

<p>I also maintain <a href="https://include-what-you-use.org">include-what-you-use</a> on
MacPorts, which is a neat utility that ensures you include the right headers in
a C or C++ project, and submitted <a href="https://github.com/include-what-you-use/include-what-you-use/commits/master/?author=mohd-akram&amp;since=2025-01-01&amp;until=2025-12-31">several
enhancements</a>
to the upstream project to improve its behavior on macOS.</p>

<h2 id="macos">macOS</h2>

<p>Since most of my personal work is done on macOS, I sometimes run into bugs in
the core tools (or userland) that come with it. Most of the tools in macOS come
from FreeBSD, so I submit the fixes there in the hopes that they will be picked
up in the next release of macOS. Two such issues were in sed, one of which
had originated in OpenBSD code that FreeBSD imported. I decided to submit the
<a href="https://marc.info/?l=openbsd-tech&amp;m=172381888706699">fix</a> for that one to
OpenBSD first, and reported the
<a href="https://marc.info/?l=openbsd-tech&amp;m=173383873205691">other</a>, which
interestingly had already been fixed in NetBSD years ago. They were both fixed
in OpenBSD, then picked up by FreeBSD, and finally brought to macOS 26. The
fixes themselves were simple, but it was interesting to see code move across
four different operating systems — NetBSD, OpenBSD, FreeBSD, and macOS.</p>

<p>Another issue encountered on macOS was that the default syntax highlighting for
shell scripts in Vim was poor. This was because the default highlighting was
for the ancient Bourne shell syntax and not the modern POSIX one. This finally
changed after I submitted a <a href="https://github.com/vim/vim/pull/16939">fix</a> for
it.</p>

<p>Last, but not least, less (the pager) had long had an issue that was a pet
peeve of mine: when trying to copy text from a git diff (paged through less),
tabs would get converted to spaces. Someone had mentioned the exact cause and
fix for this in a StackExchange
<a href="https://unix.stackexchange.com/questions/412060/how-to-get-less-to-show-tabs-as-tabs">post</a>
years ago, so I submitted the same
<a href="https://github.com/gwsw/less/pull/620">fix</a>, and now diffs are displayed as
expected when using <code class="language-plaintext highlighter-rouge">less -rU</code> as the git pager.</p>
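
<p>For reference, one way to set that pager is through git’s
<code class="language-plaintext highlighter-rouge">core.pager</code> setting. This is a minimal sketch that assumes a
per-user configuration and a version of less that includes the fix:</p>

```shell
# Assumption: a per-user setting and a less release that includes the fix;
# drop --global to apply it to a single repository instead.
git config --global core.pager 'less -rU'
```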

<h2 id="conclusion">Conclusion</h2>

<p>This year had more of a focus on macOS fixes while last year I followed up on
patches in the Linux world, largely to support an Arabic keyboard that I had
<a href="https://gitlab.freedesktop.org/xkeyboard-config/xkeyboard-config/-/merge_requests/549">submitted</a>
the year before, which required co-ordination with several projects under
Freedesktop.org, X.Org and GNOME to support a new
<a href="https://gitlab.freedesktop.org/xorg/proto/xorgproto/-/merge_requests/78">keysym</a>.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[My open-source interactions in the past year.]]></summary></entry><entry><title type="html">Parsing JSON in Forty Lines of Awk</title><link href="https://akr.am/blog/posts/parsing-json-in-forty-lines-of-awk" rel="alternate" type="text/html" title="Parsing JSON in Forty Lines of Awk" /><published>2025-03-09T00:00:00+00:00</published><updated>2025-03-09T00:00:00+00:00</updated><id>https://akr.am/blog/posts/parsing-json-in-forty-lines-of-awk</id><content type="html" xml:base="https://akr.am/blog/posts/parsing-json-in-forty-lines-of-awk"><![CDATA[<p>JSON is not a friendly format to the Unix shell — it’s hierarchical, and
cannot be reasonably split on any character (other than the newline, which is
not very useful) as that character might be included in a string. There are
well-known tools such as <a href="https://jqlang.org">jq</a> that let you correctly parse
JSON documents in the shell, but all require an additional dependency. Another
option is to use Python, which is ubiquitous enough that it can be expected to
be installed on virtually every machine, and for new projects would be the
recommended option.</p>
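
<p>To see the splitting problem concretely, here is a small sketch (the
<code class="language-plaintext highlighter-rouge">count_comma_fields</code> helper and the sample document are made
up for illustration) that splits a line of JSON on commas with awk:</p>

```shell
# Hypothetical helper: count the pieces awk produces when a single line of
# JSON is naively split on commas.
count_comma_fields() {
	printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# The object has two members, but the comma inside the string value is
# treated as a separator too, so awk reports three fields.
count_comma_fields '{"name":"Doe, Jane","age":30}'
```

<p>Any other single-character separator runs into the same issue, which is why
a tokenizer that understands strings is needed.</p>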

<p>However, I already had a working POSIX shell script that now had a requirement
to read and parse JSON. It had previously extracted values from HTML which,
while also being hierarchical, can be reliably split on certain characters (the
angle brackets) for basic extraction of values. awk is the closest thing to a
real programming language that’s available in the POSIX shell, so I thought I’d
try to write a basic JSON parser in it. I had already written a full-blown
<a href="https://github.com/mohd-akram/jawk">one</a> before, so I knew it was doable, but
I needed something more concise.</p>

<p>First, there are some caveats. JSON is <a href="https://seriot.ch/projects/parsing_json.html">notoriously
tricky</a> to get completely right,
despite its simple grammar. The following code assumes that it will be fed
valid JSON. It has some basic validation as a function of the parsing and will
most likely throw an error if it encounters something strange, but there are no
guarantees beyond that. In my case, I’m reading JSON from a single, trusted
source, so this is an acceptable constraint.</p>

<p>The interface is simple: a single function that accepts a JSON document and a
dotted path to a key or array index, and returns the corresponding value. It
can be used like so:</p>

<div class="language-awk highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Get one value</span>
<span class="nx">name</span> <span class="o">=</span> <span class="nx">decode_json_string</span><span class="p">(</span><span class="nx">get_json_value</span><span class="p">(</span><span class="nx">json</span><span class="p">,</span> <span class="s2">"author.name"</span><span class="p">))</span>

<span class="c1"># Loop over an object</span>
<span class="nx">get_json_value</span><span class="p">(</span><span class="nx">json</span><span class="p">,</span> <span class="s2">"dependencies"</span><span class="p">,</span> <span class="nx">deps</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="nx">name</span> <span class="o">in</span> <span class="nx">deps</span><span class="p">)</span>
	<span class="nx">version</span> <span class="o">=</span> <span class="nx">decode_json_string</span><span class="p">(</span><span class="nx">deps</span><span class="p">[</span><span class="nx">name</span><span class="p">])</span>

<span class="c1"># Loop over an array</span>
<span class="nx">get_json_value</span><span class="p">(</span><span class="nx">json</span><span class="p">,</span> <span class="s2">"payload.items"</span><span class="p">,</span> <span class="nx">items</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">items</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
	<span class="nx">get_json_value</span><span class="p">(</span><span class="nx">items</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">null</span><span class="p">,</span> <span class="nx">item</span><span class="p">)</span>
	<span class="nx">type</span> <span class="o">=</span> <span class="nx">decode_json_string</span><span class="p">(</span><span class="nx">item</span><span class="p">[</span><span class="s2">"type"</span><span class="p">])</span>
	<span class="nx">name</span> <span class="o">=</span> <span class="nx">decode_json_string</span><span class="p">(</span><span class="nx">item</span><span class="p">[</span><span class="s2">"name"</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>To keep things simple, the same function handles both arrays and objects. In
JavaScript, arrays are roughly equivalent to objects with integer keys, and we
use the same approach here. This is the
<a href="https://gist.github.com/mohd-akram/1c0d4cb337b62e3cce0ab7e02e6281fd">implementation</a>,
expanded and annotated:</p>

<div class="language-awk highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The function takes three parameters: the JSON object/array, the desired key,</span>
<span class="c1"># and an optional array to be filled if the key points to an object or array.</span>
<span class="c1"># The rest are local variables (awk only allows local variables in the form</span>
<span class="c1"># of function parameters)</span>
<span class="kd">function</span> <span class="nx">get_json_value</span><span class="p">(</span> <span class="err">\</span>
	<span class="nx">s</span><span class="p">,</span> <span class="nx">key</span><span class="p">,</span> <span class="nx">a</span><span class="p">,</span>
	<span class="nx">skip</span><span class="p">,</span> <span class="nx">type</span><span class="p">,</span> <span class="nx">all</span><span class="p">,</span> <span class="nx">rest</span><span class="p">,</span> <span class="nx">isval</span><span class="p">,</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">c</span><span class="p">,</span> <span class="nx">k</span><span class="p">,</span> <span class="nx">null</span> <span class="err">\</span>
<span class="p">)</span> <span class="p">{</span>
	<span class="c1"># Trim leading whitespace, if any</span>
	<span class="k">if</span> <span class="p">(</span><span class="nb">match</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="sr">/^</span><span class="se">[</span><span class="sr">[:space:</span><span class="se">]</span><span class="sr">]+/</span><span class="p">))</span> <span class="nx">s</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">RLENGTH</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>

	<span class="c1"># Get the type of value by its first character</span>
	<span class="nx">type</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

	<span class="c1"># This variable is needed for when we recursively call the function</span>
	<span class="c1"># It will be true if the key argument is undefined, since such</span>
	<span class="c1"># variables can behave as either a string or a number in awk</span>
	<span class="nx">all</span> <span class="o">=</span> <span class="nx">key</span> <span class="o">==</span> <span class="s2">""</span> <span class="o">&amp;&amp;</span> <span class="nx">key</span> <span class="o">==</span> <span class="mi">0</span>

	<span class="c1"># If this is a primitive</span>
	<span class="k">if</span> <span class="p">(</span><span class="nx">type</span> <span class="o">!=</span> <span class="s2">"{"</span> <span class="o">&amp;&amp;</span> <span class="nx">type</span> <span class="o">!=</span> <span class="s2">"["</span><span class="p">)</span> <span class="p">{</span>
		<span class="c1"># Ensure a key is not passed</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">all</span><span class="p">)</span> <span class="nx">error</span><span class="p">(</span><span class="s2">"invalid json array/object "</span> <span class="nx">s</span><span class="p">)</span>

		<span class="c1"># Parse the value</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nb">match</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="sr">/^</span><span class="se">(</span><span class="sr">null|true|false|"</span><span class="se">(\\</span><span class="sr">.|</span><span class="se">[^\\</span><span class="sr">"</span><span class="se">])</span><span class="sr">*"|</span><span class="se">[</span><span class="sr">.0-9Ee+-</span><span class="se">]</span><span class="sr">+</span><span class="se">)</span><span class="sr">/</span><span class="p">))</span>
			<span class="nx">error</span><span class="p">(</span><span class="s2">"invalid json value "</span> <span class="nx">s</span><span class="p">)</span>

		<span class="c1"># And return it</span>
		<span class="k">return</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">RLENGTH</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="c1"># Get the first part of the key (which we will be looking for)</span>
	<span class="c1"># if the path is dotted and save the rest for now</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">all</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="nx">i</span> <span class="o">=</span> <span class="nb">index</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span> <span class="s2">"."</span><span class="p">)))</span> <span class="p">{</span>
		<span class="nx">rest</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span> <span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
		<span class="nx">key</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="c1"># isval keeps track of whether we are looking at a JSON key or value</span>
	<span class="c1"># In an array, all items are values</span>
	<span class="c1"># k is the current key</span>
	<span class="c1"># If this is an array, it is the index, which starts at 0</span>
	<span class="k">if</span> <span class="p">((</span><span class="nx">isval</span> <span class="o">=</span> <span class="nx">type</span> <span class="o">==</span> <span class="s2">"["</span><span class="p">))</span> <span class="nx">k</span> <span class="o">=</span> <span class="mi">0</span>

	<span class="c1"># Loop over the characters in the provided JSON</span>
	<span class="c1"># Skip the opening brace or bracket (to avoid infinite recursion) and</span>
	<span class="c1"># increment the index by the length of the token</span>
	<span class="k">for</span> <span class="p">(</span><span class="nx">i</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;=</span> <span class="nb">length</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span> <span class="nx">i</span> <span class="o">+=</span> <span class="nb">length</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span> <span class="p">{</span>
		<span class="c1"># Skip over whitespace</span>
		<span class="k">if</span> <span class="p">(</span><span class="nb">match</span><span class="p">(</span><span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">i</span><span class="p">),</span> <span class="sr">/^</span><span class="se">[</span><span class="sr">[:space:</span><span class="se">]</span><span class="sr">]+/</span><span class="p">))</span> <span class="p">{</span>
			<span class="nx">c</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">RLENGTH</span><span class="p">)</span>
			<span class="k">continue</span>
		<span class="p">}</span>

		<span class="c1"># Temporarily assign the first character to our token variable</span>
		<span class="nx">c</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">i</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

		<span class="c1"># If it's a closing brace or bracket, we've reached the end of</span>
		<span class="c1"># the object or array, so exit the loop</span>
		<span class="k">if</span> <span class="p">(</span><span class="nx">c</span> <span class="o">==</span> <span class="s2">"}"</span> <span class="o">||</span> <span class="nx">c</span> <span class="o">==</span> <span class="s2">"]"</span><span class="p">)</span> <span class="k">break</span>

		<span class="c1"># If we find a comma in an object, the next item will be a key,</span>
		<span class="c1"># so reset isval. If it's an array, increment the index</span>
		<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">c</span> <span class="o">==</span> <span class="s2">","</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">((</span><span class="nx">isval</span> <span class="o">=</span> <span class="nx">type</span> <span class="o">==</span> <span class="s2">"["</span><span class="p">))</span> <span class="o">++</span><span class="nx">k</span> <span class="p">}</span>

		<span class="c1"># If we see a colon, the next token will be a value</span>
		<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">c</span> <span class="o">==</span> <span class="s2">":"</span><span class="p">)</span> <span class="nx">isval</span> <span class="o">=</span> <span class="mi">1</span>

		<span class="c1"># Otherwise, we expect a JSON value</span>
		<span class="k">else</span> <span class="p">{</span>
			<span class="c1"># If the key matches, this is our desired value,</span>
			<span class="c1"># so pass the rest of the key and return the result</span>
			<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">all</span> <span class="o">&amp;&amp;</span> <span class="nx">k</span> <span class="o">==</span> <span class="nx">key</span> <span class="o">&amp;&amp;</span> <span class="nx">isval</span><span class="p">)</span>
				<span class="k">return</span> <span class="nx">get_json_value</span><span class="p">(</span><span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">i</span><span class="p">),</span> <span class="nx">rest</span><span class="p">,</span> <span class="nx">a</span><span class="p">)</span>

			<span class="c1"># Otherwise, get the full value</span>
			<span class="nx">c</span> <span class="o">=</span> <span class="nx">get_json_value</span><span class="p">(</span><span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">i</span><span class="p">),</span> <span class="nx">null</span><span class="p">,</span> <span class="nx">null</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

			<span class="c1"># And add it to the associative array</span>
			<span class="k">if</span> <span class="p">(</span><span class="nx">all</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="nx">skip</span> <span class="o">&amp;&amp;</span> <span class="nx">isval</span><span class="p">)</span> <span class="nx">a</span><span class="p">[</span><span class="nx">k</span><span class="p">]</span> <span class="o">=</span> <span class="nx">c</span>

			<span class="c1"># If this is a string and we're not expecting a value,</span>
			<span class="c1"># then it's a key, so trim the quotes and save it</span>
			<span class="k">if</span> <span class="p">(</span><span class="nx">c</span> <span class="o">~</span> <span class="sr">/^"/</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="nx">isval</span><span class="p">)</span> <span class="nx">k</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">c</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="nb">length</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>
		<span class="p">}</span>
	<span class="p">}</span>

	<span class="c1"># Do a basic check that the object or array was properly closed</span>
	<span class="k">if</span> <span class="p">((</span><span class="nx">type</span> <span class="o">==</span> <span class="s2">"{"</span> <span class="o">&amp;&amp;</span> <span class="nx">c</span> <span class="o">!=</span> <span class="s2">"}"</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="nx">type</span> <span class="o">==</span> <span class="s2">"["</span> <span class="o">&amp;&amp;</span> <span class="nx">c</span> <span class="o">!=</span> <span class="s2">"]"</span><span class="p">))</span>
		<span class="nx">error</span><span class="p">(</span><span class="s2">"unterminated json array/object "</span> <span class="nx">s</span><span class="p">)</span>

	<span class="c1"># If we're here, it means we didn't find the value we're looking for</span>
	<span class="c1"># so only return something if the whole array or object was requested</span>
	<span class="k">if</span> <span class="p">(</span><span class="nx">all</span><span class="p">)</span> <span class="k">return</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>To make the parser more useful, you’ll also need a function to do some decoding
of JSON strings. This is a simple one that handles every escape sequence except
Unicode escapes, and throws an error if it encounters one of those:</p>

<div class="language-awk highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">decode_json_string</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">out</span><span class="p">,</span> <span class="nx">esc</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="nx">s</span> <span class="o">!~</span> <span class="sr">/^"./</span> <span class="o">||</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nb">length</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span> <span class="o">!=</span> <span class="s2">"\""</span><span class="p">)</span>
		<span class="nx">error</span><span class="p">(</span><span class="s2">"invalid json string "</span> <span class="nx">s</span><span class="p">)</span>

	<span class="nx">s</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="nb">length</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>

	<span class="nx">esc</span><span class="p">[</span><span class="s2">"b"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"\b"</span><span class="p">;</span> <span class="nx">esc</span><span class="p">[</span><span class="s2">"f"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"\f"</span><span class="p">;</span> <span class="nx">esc</span><span class="p">[</span><span class="s2">"n"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"\n"</span><span class="p">;</span> <span class="nx">esc</span><span class="p">[</span><span class="s2">"\""</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"\""</span>
	<span class="nx">esc</span><span class="p">[</span><span class="s2">"r"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"\r"</span><span class="p">;</span> <span class="nx">esc</span><span class="p">[</span><span class="s2">"t"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"\t"</span><span class="p">;</span> <span class="nx">esc</span><span class="p">[</span><span class="s2">"/"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"/"</span> <span class="p">;</span> <span class="nx">esc</span><span class="p">[</span><span class="s2">"\\"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"\\"</span>

	<span class="k">while</span> <span class="p">(</span><span class="nb">match</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="sr">/</span><span class="se">\\</span><span class="sr">/</span><span class="p">))</span> <span class="p">{</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">RSTART</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="o">in</span> <span class="nx">esc</span><span class="p">))</span>
			<span class="nx">error</span><span class="p">(</span><span class="s2">"unknown json escape "</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">RSTART</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
		<span class="nx">out</span> <span class="o">=</span> <span class="nx">out</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">RSTART</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="nx">esc</span><span class="p">[</span><span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">RSTART</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)]</span>
		<span class="nx">s</span> <span class="o">=</span> <span class="nb">substr</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">RSTART</span><span class="o">+</span><span class="mi">2</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="nx">out</span> <span class="nx">s</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And finally, since there is no built-in error function in awk, you can use
something like this:</p>

<div class="language-awk highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">error</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">printf</span> <span class="s2">"%s: %s\n"</span><span class="p">,</span> <span class="kc">ARGV</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="nx">msg</span> <span class="o">&gt;</span> <span class="s2">"/dev/stderr"</span>
	<span class="k">exit</span> <span class="mi">1</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[A single-function JSON parser for the POSIX shell.]]></summary></entry><entry><title type="html">A Tricky Floating-Point Calculation</title><link href="https://akr.am/blog/posts/a-tricky-floating-point-calculation" rel="alternate" type="text/html" title="A Tricky Floating-Point Calculation" /><published>2024-06-26T00:00:00+00:00</published><updated>2024-06-26T00:00:00+00:00</updated><id>https://akr.am/blog/posts/a-tricky-floating-point-calculation</id><content type="html" xml:base="https://akr.am/blog/posts/a-tricky-floating-point-calculation"><![CDATA[<p>I was working on my <a href="https://www.npmjs.com/package/simple-id">simple-id</a>
library and was curious about <a href="https://www.pcg-random.org/posts/bounded-rands.html">different
methods</a> to generate an
unbiased random number in a given range. As a way to test whether a random
number generator (RNG) produces good results, one can calculate the expected
number of repeats based on the <a href="https://en.wikipedia.org/wiki/Birthday_problem">birthday
problem</a> and compare it with
the output of the RNG. The formula can be found online in several places as it
can also be used for <a href="https://matt.might.net/articles/counting-hash-collisions/">counting expected hash
collisions</a> or cache
hits. The expected number of repeats can be calculated as follows:</p>

<span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>n</mi><mo>−</mo><mi>d</mi><mo>+</mo><mi>d</mi><msup><mrow><mo fence="true">(</mo><mfrac><mrow><mi>d</mi><mo>−</mo><mn>1</mn></mrow><mi>d</mi></mfrac><mo fence="true">)</mo></mrow><mi>n</mi></msup></mrow><annotation encoding="application/x-tex">n - d + d \left( \frac{d-1}{d} \right)^n</annotation></semantics></math></span>

<p>Where <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span> is the number of generated values and <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span> is the range.</p>

<p>Converting this to JavaScript code:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nf">repeats</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span> <span class="nx">d</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="nx">n</span> <span class="o">-</span> <span class="nx">d</span> <span class="o">+</span> <span class="nx">d</span> <span class="o">*</span> <span class="p">((</span><span class="nx">d</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="nx">d</span><span class="p">)</span><span class="o">**</span><span class="nx">n</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
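
<p>As a rough sanity check of the formula itself, the sketch below compares it
with a simulation over a small range, where the floating-point arithmetic is
well-behaved. The <code class="language-plaintext highlighter-rouge">countRepeats</code> helper and the use of <code class="language-plaintext highlighter-rouge">Math.random</code> are
assumptions made for illustration and are not taken from simple-id:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// The formula from above, unchanged.
function repeats(n, d) {
  return n - d + d * ((d - 1) / d) ** n;
}

// Hypothetical helper: draw n values from [0, d) and count how many of
// them had already been seen.
function countRepeats(n, d) {
  const seen = new Set();
  let count = 0;
  for (let i = 0; i !== n; i += 1) {
    const value = Math.floor(Math.random() * d);
    if (seen.has(value)) count += 1;
    else seen.add(value);
  }
  return count;
}

// With n = 1000 and d = 365, both numbers should be in the same ballpark.
console.log(repeats(1000, 365), countRepeats(1000, 365));
</code></pre></div></div>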

<p>In my case, I wanted to check the expected repeats after generating a million
random IDs, where each ID is 8 characters long and there are 31 characters in
the alphabet. That is, <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi><mo>=</mo><mn>1</mn><msup><mn>0</mn><mn>6</mn></msup></mrow><annotation encoding="application/x-tex">n=10^6</annotation></semantics></math></span> and <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi><mo>=</mo><mn>3</mn><msup><mn>1</mn><mn>8</mn></msup></mrow><annotation encoding="application/x-tex">d=31^8</annotation></semantics></math></span>. Running this in JavaScript:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="nf">repeats</span><span class="p">(</span><span class="mi">1</span><span class="nx">e6</span><span class="p">,</span> <span class="mi">31</span><span class="o">**</span><span class="mi">8</span><span class="p">));</span>
</code></pre></div></div>

<p>The result is -19.72998046875.</p>

<p>Wait, what? Negative repeats? I’ve known about the infamous <code class="language-plaintext highlighter-rouge">0.1 + 0.2</code>
inaccuracy, but this is different; it’s a much larger error, and from only a
handful of operations. I double (and triple) checked that I typed the formula
correctly, and tried it in both C and Python and got the same result. I then
put it into
<a href="https://www.wolframalpha.com/input?i=n%3D1e6%3B+d%3D31**8%3B+n+-+d+%2B+d+*+%28%28d+-+1%29+%2F+d%29**n">WolframAlpha</a>,
since it supports arbitrary precision, and it returned something longer, but
much more sensible:</p>

<blockquote>
  <p><small>0.5862405426220060595736096268572069594630842175323822481323009</small></p>
</blockquote>

<p>That seemed right. I started tweaking the formula to try to get rid of the
error. I figured that, most likely, the division leads to a loss of accuracy
that’s exacerbated by raising to a very high power. To avoid this, I reached
for <code class="language-plaintext highlighter-rouge">BigInt</code> to calculate the power of the numerator and denominator
separately, then divide. Since <code class="language-plaintext highlighter-rouge">BigInt</code> only supports integers, I also multiply
by a “precision” before converting to a number, then divide by it after to get
a decimal value:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nf">repeats</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span> <span class="nx">d</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">p</span> <span class="o">=</span> <span class="nx">n</span><span class="o">*</span><span class="nx">d</span><span class="p">;</span>
  <span class="kd">const</span> <span class="nx">e</span> <span class="o">=</span> <span class="nc">Number</span><span class="p">(</span><span class="nc">BigInt</span><span class="p">(</span><span class="nx">p</span><span class="p">)</span> <span class="o">*</span> <span class="nc">BigInt</span><span class="p">(</span><span class="nx">d</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">**</span><span class="nc">BigInt</span><span class="p">(</span><span class="nx">n</span><span class="p">)</span> <span class="o">/</span> <span class="nc">BigInt</span><span class="p">(</span><span class="nx">d</span><span class="p">)</span><span class="o">**</span><span class="nc">BigInt</span><span class="p">(</span><span class="nx">n</span><span class="p">))</span><span class="o">/</span><span class="nx">p</span><span class="p">;</span>
  <span class="k">return</span> <span class="nx">n</span> <span class="o">-</span> <span class="nx">d</span> <span class="o">+</span> <span class="nx">d</span> <span class="o">*</span> <span class="nx">e</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That returns <strong>0.586</strong>3037109375. On the one hand, the result is correct to
three significant figures, which is an improvement over the previous zero. On
the other, this takes a second and a half to finish computing. After trying a
few different things, I got a tip from a user on the
<a href="https://libera-math.github.io">##math</a> IRC channel to try <code class="language-plaintext highlighter-rouge">log1p</code> for the
calculation. What does <code class="language-plaintext highlighter-rouge">log</code> have to do with anything? A power can be
expressed through the exponential and the logarithm, which is conceptually how
computers calculate it:</p>

<span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msup><mi>x</mi><mi>n</mi></msup><mo>=</mo><msup><mi>e</mi><mrow><mo fence="true">(</mo><mi>n</mi><mi>log</mi><mo>⁡</mo><mi>x</mi><mo fence="true">)</mo></mrow></msup></mrow><annotation encoding="application/x-tex">x^n = e^{\left( n \log{x} \right)}</annotation></semantics></math></span>

<p>So we can rewrite the formula to:</p>

<span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>n</mi><mo>−</mo><mi>d</mi><mo>+</mo><mi>d</mi><mi>exp</mi><mo>⁡</mo><mrow><mo fence="true">(</mo><mi>n</mi><mi>log</mi><mo>⁡</mo><mfrac><mrow><mi>d</mi><mo>−</mo><mn>1</mn></mrow><mi>d</mi></mfrac><mo fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">n - d + d \exp{\left( n \log{\frac{d-1}{d}} \right)}</annotation></semantics></math></span>

<p>The <code class="language-plaintext highlighter-rouge">log1p</code> function returns the result of <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>log</mi><mo>⁡</mo><mrow><mo fence="true">(</mo><mn>1</mn><mo>+</mo><mi>x</mi><mo fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">\log{\left(1 + x\right)}</annotation></semantics></math></span>. The
reason it exists is that floating-point loses precision when adding a
small <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span> to <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1</mn></mrow><annotation encoding="application/x-tex">1</annotation></semantics></math></span>. Since <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span> is very large, this is the case in our
formula. We can rewrite the expression to make it usable in <code class="language-plaintext highlighter-rouge">log1p</code>:</p>

<span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>n</mi><mo>−</mo><mi>d</mi><mo>+</mo><mi>d</mi><mi>exp</mi><mo>⁡</mo><mrow><mo fence="true">(</mo><mi>n</mi><mi>log</mi><mo>⁡</mo><mrow><mo fence="true">(</mo><mn>1</mn><mo>−</mo><mfrac><mn>1</mn><mi>d</mi></mfrac><mo fence="true">)</mo></mrow><mo fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">n - d + d \exp{\left( n \log{\left( 1 - \frac{1}{d} \right)} \right)}</annotation></semantics></math></span>
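
<p>To see what <code class="language-plaintext highlighter-rouge">log1p</code> buys here, the following sketch compares the two ways of
computing the logarithm for the values used above. The variable names and the
first-order error estimate are my own illustration, not part of the original
calculation:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const n = 1e6;
const d = 31 ** 8;

// Dividing first rounds (d - 1) / d to the nearest double just below 1,
// which discards most of the digits of 1/d before the logarithm sees them.
const viaLog = Math.log((d - 1) / d);

// log1p receives the small term directly, so those digits survive.
const viaLog1p = Math.log1p(-1 / d);

// Rough first-order estimate of how far that rounding shifts the final
// result: the difference is scaled by n in the exponent and then by d.
const estimatedError = n * d * (viaLog - viaLog1p);

console.log(viaLog, viaLog1p, estimatedError);
</code></pre></div></div>

<p>The estimate should come out at roughly −20, which is about the gap between
the −19.73 result above and the WolframAlpha value.</p>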

<p>The code then becomes:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nf">repeats</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span> <span class="nx">d</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="nx">n</span> <span class="o">-</span> <span class="nx">d</span> <span class="o">+</span> <span class="nx">d</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="nx">n</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nf">log1p</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">/</span> <span class="nx">d</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Running it returns <strong>0.586</strong>3037109375, the same value as using <code class="language-plaintext highlighter-rouge">BigInt</code>, only
now it doesn’t take a second and a half to calculate. However, it still wasn’t
as accurate as I’d like. While looking into this, I came across another
function that also works better for small values, <code class="language-plaintext highlighter-rouge">expm1</code>, which returns
<span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mi>x</mi></msup><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">e^x - 1</annotation></semantics></math></span>. We can factor out <span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span> to make this usable too:</p>

<span class="katex"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>n</mi><mo>+</mo><mi>d</mi><mrow><mo fence="true">(</mo><mi>exp</mi><mo>⁡</mo><mrow><mo fence="true">(</mo><mi>n</mi><mi>log</mi><mo>⁡</mo><mrow><mo fence="true">(</mo><mn>1</mn><mo>−</mo><mfrac><mn>1</mn><mi>d</mi></mfrac><mo fence="true">)</mo></mrow><mo fence="true">)</mo></mrow><mo>−</mo><mn>1</mn><mo fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">n + d \left( \exp{\left( n \log{\left( 1 - \frac{1}{d} \right)} \right)} - 1 \right)</annotation></semantics></math></span>

<p>And in code:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nf">repeats</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span> <span class="nx">d</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="nx">n</span> <span class="o">+</span> <span class="nx">d</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nf">expm1</span><span class="p">(</span><span class="nx">n</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nf">log1p</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">/</span> <span class="nx">d</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Running this, we get <strong>0.586240542</strong>3540622, a much improved result accurate to
nine significant figures. This was as good as it was going to get in
JavaScript.</p>
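
<p>To see where the extra digits come from, this sketch compares subtracting 1
from <code class="language-plaintext highlighter-rouge">Math.exp</code> with calling <code class="language-plaintext highlighter-rouge">Math.expm1</code> directly, using the same inputs as
above. The variable names are chosen for illustration only:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const n = 1e6;
const d = 31 ** 8;
const x = n * Math.log1p(-1 / d); // about -1.17e-6

// exp(x) is very close to 1, so it gets rounded to a double near 1 before
// the subtraction, and only the leading digits of the small result survive.
const subtracted = Math.exp(x) - 1;

// expm1 computes the same quantity without forming exp(x) first.
const direct = Math.expm1(x);

// Any discrepancy between the two is multiplied by d in the final formula.
console.log(subtracted, direct, d * (subtracted - direct));
</code></pre></div></div>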

<p>Out of curiosity, I wanted to see if a better result was possible using C,
since it provides yet another function for preserving accuracy, <code class="language-plaintext highlighter-rouge">fma</code>, also
known as fused multiply-add. Rather than rounding twice, once after the
multiplication and once after the addition, <code class="language-plaintext highlighter-rouge">fma</code> rounds only once, after the
whole calculation. It can also be faster on CPUs that provide a native
instruction for it. C also provides the <code class="language-plaintext highlighter-rouge">long
double</code> type, which is at least as precise as a <code class="language-plaintext highlighter-rouge">double</code>, depending on the
implementation. On my machine, <code class="language-plaintext highlighter-rouge">sizeof(long double)</code> returns 16, i.e. it occupies
128 bits of storage, though that alone does not make it a true quad: on x86, for
example, it is typically the 80-bit extended-precision format padded to 16
bytes. Trying them both out:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">double</span> <span class="nf">repeats</span><span class="p">(</span><span class="kt">double</span> <span class="n">n</span><span class="p">,</span> <span class="kt">double</span> <span class="n">d</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">fma</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">expm1</span><span class="p">(</span><span class="n">n</span> <span class="o">*</span> <span class="n">log1p</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">/</span> <span class="n">d</span><span class="p">)),</span> <span class="n">n</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">long</span> <span class="kt">double</span> <span class="nf">repeatsl</span><span class="p">(</span><span class="kt">long</span> <span class="kt">double</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="kt">double</span> <span class="n">d</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">fma</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">expm1l</span><span class="p">(</span><span class="n">n</span> <span class="o">*</span> <span class="n">log1pl</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">/</span> <span class="n">d</span><span class="p">)),</span> <span class="n">n</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This yields <strong>0.586240542</strong>40892864431 and <strong>0.586240542</strong>58953528507,
respectively, with a very slight improvement in accuracy, but still only
accurate to nine significant figures. At this point, I was out of my depth. I
tried the <a href="https://herbie.uwplse.org">Herbie</a> project, which automatically
rewrites floating-point expressions, and it gave some good results, but the
suggestions were rather unwieldy. Let me know if you can find another way to
compute this with better accuracy — perhaps there is yet another trick
floating out there.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[Figuring out why a seemingly simple floating-point calculation returned a very wrong result.]]></summary></entry><entry><title type="html">Improving Date Formatting Performance in Node.js</title><link href="https://akr.am/blog/posts/improving-date-formatting-performance-in-node-js" rel="alternate" type="text/html" title="Improving Date Formatting Performance in Node.js" /><published>2024-05-21T00:00:00+00:00</published><updated>2024-05-21T00:00:00+00:00</updated><id>https://akr.am/blog/posts/improving-date-formatting-performance-in-node-js</id><content type="html" xml:base="https://akr.am/blog/posts/improving-date-formatting-performance-in-node-js"><![CDATA[<p>Some months ago, I was investigating why a particularly large response in a
Node.js application was taking too much time to produce. The application was an
aggregator for movie showtimes that allowed users to see relevant showtimes
based on selected filters. In some cases, many results could be returned which
made the response unusually slow. This is how I went about investigating the
issue and what came out of it.</p>

<h2 id="profiling-nodejs">Profiling Node.js</h2>

<p>Node.js has had a <code class="language-plaintext highlighter-rouge">--prof</code> option for quite some time, and it allows you to
generate a text file that shows which functions took the most time while
running your application. However, it doesn’t always work very well, as much of
the CPU time spent can end up marked as “unaccounted”. More recently, Node.js
added a new option, <code class="language-plaintext highlighter-rouge">--cpu-prof</code>. When run with this flag, <code class="language-plaintext highlighter-rouge">node</code> creates a
<code class="language-plaintext highlighter-rouge">.cpuprofile</code> file that can then be loaded into Chrome DevTools to
visually inspect where time is being spent in your code. Using the
<a href="https://developer.chrome.com/docs/devtools/performance/nodejs">DevTools</a>, I
proceeded to profile the application, specifically looking at the <em>Bottom-Up</em>
tab, sorted by <em>Self Time</em>. This tells you which functions are doing too much
work in and of themselves (as opposed to the total time, which includes time
spent calling other functions).</p>

<h3 id="whats-slow">What’s slow</h3>

<p>It turned out there were several libraries that the application depended on
that had performance issues, but the most prominent bottleneck was date and
time formatting.</p>

<p>I used the <a href="https://moment.github.io/luxon/">Luxon</a> library for date and time
handling in this project, particularly for time zone support. In order for
Luxon to get the offset of a particular time zone for a given datetime, it
resorts to the <code class="language-plaintext highlighter-rouge">Intl.DateTimeFormat</code> API. This API provides the ability to
format any datetime value in a locale-specific manner with <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat/DateTimeFormat">many
options</a>
to customize the output. It also allows you to format a datetime in a given
time zone.</p>

<p>Since there is no native way in JavaScript to get the offset of a time zone
(<a href="https://tc39.es/proposal-temporal/docs/">yet</a>), libraries resort to this
feature to essentially format the datetime in a given time zone, then parse the
components of the formatted version and calculate the timestamp from that. By
comparing this to the known UTC timestamp, the offset can be found.
Altogether, a rather expensive operation.</p>
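
<p>The format-then-parse trick is not specific to JavaScript. As a rough analogy only (this is not Luxon’s code, and <code class="language-plaintext highlighter-rouge">zone_offset_seconds</code> is a name made up for this sketch), the same idea can be written in C with the POSIX time functions: break a UTC timestamp into calendar fields in the target zone, read those fields back as if they were UTC, and subtract. It assumes a system that provides <code class="language-plaintext highlighter-rouge">timegm</code>, which is widely available (glibc, musl, the BSDs, macOS) but was historically an extension.</p>

```c
#define _DEFAULT_SOURCE /* exposes timegm() and setenv() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Offset of time zone `tz` from UTC, in seconds, at the instant `utc`.
 * Hypothetical helper for illustration. Note that it changes the
 * process-wide TZ setting as a side effect.
 */
long zone_offset_seconds(const char *tz, time_t utc)
{
	struct tm fields;

	assert(tz != NULL);
	setenv("TZ", tz, 1);
	tzset();

	/* "Format": split the instant into year, month, day, ... in that zone. */
	localtime_r(&utc, &fields);

	/* "Parse": read the same wall-clock fields back as if they were UTC. */
	time_t wall_clock_as_utc = timegm(&fields);

	/* The gap between the two timestamps is the zone's offset. */
	return (long)(wall_clock_as_utc - utc);
}
```

<p>With a POSIX-style zone string such as <code class="language-plaintext highlighter-rouge">JST-9</code> (nine hours ahead of UTC), the function returns 32400. Every call breaks the date down and reassembles it in full, which hints at why doing the same through a general-purpose formatter is costly.</p>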

<h2 id="from-nodejs-to-icu">From Node.js to ICU</h2>

<p>Since the <code class="language-plaintext highlighter-rouge">format</code> method of <code class="language-plaintext highlighter-rouge">Intl.DateTimeFormat</code> is implemented using native
C++ code, it won’t show up in the Chrome profiler, only its caller. As I’m
using macOS, I usually go for the
<a href="https://help.apple.com/instruments">Instruments</a> profiler that comes with
Xcode to profile native code.</p>

<p>Using Instruments, and specifically the <em>Time Profiler</em>, showed that
<code class="language-plaintext highlighter-rouge">Intl.DateTimeFormat</code> was implemented in the V8 engine used by Node.js, and it
did little more than call the ICU library — the International Components for
Unicode. This is the library that implements the Unicode standard, and is used
by most operating systems and browsers. While generally well-optimized, it has
a <a href="https://icu.unicode.org/">vast surface area</a> so improvements can always be
made.</p>

<h2 id="making-icu-faster">Making ICU faster</h2>

<p>The first step was to write a simple benchmark in C++ that did nothing more
than format a datetime several thousand times in a loop. I ran that through
Instruments and looked at the results. Here too, the bottom-up feature (<em>Invert
Call Tree</em> in Instruments) and self time (<em>Self Weight</em>) are most useful. After
some trial and error, it turned out there were problems in essentially four
areas:</p>

<ul>
  <li>Memory allocations</li>
  <li>Floating-point operations</li>
  <li>Unoptimized hot paths</li>
  <li>Missing fast paths</li>
</ul>

<h3 id="memory-allocations">Memory allocations</h3>

<p>This was by far the biggest culprit in the slow formatting performance. ICU
would heap allocate (<code class="language-plaintext highlighter-rouge">malloc</code>) an object for every number component formatted
in a date (eg. day, hour, minute) followed by a <code class="language-plaintext highlighter-rouge">free</code> soon after. A datetime
might have six components — year, month, day, hour, minute and second. That’s
six memory allocations, times a few hundred formatting operations and it adds
up quickly. One additional allocation was also used for a calendar object that
had to be cloned per formatting operation. Eliminating all those allocations
and using the stack provided a significant performance boost right off the bat.</p>

<h3 id="floating-point-operations">Floating-point operations</h3>

<p>This was a surprise to me, but while profiling I saw <code class="language-plaintext highlighter-rouge">fmod</code> show up a lot. This
function computes the floating-point remainder, similar to <code class="language-plaintext highlighter-rouge">%</code> for integers.
It’s not surprising for calendar calculations to utilize modular arithmetic
heavily, but surely there are no floats in dates? Indeed there aren’t, and
converting <code class="language-plaintext highlighter-rouge">fmod</code> to a regular <code class="language-plaintext highlighter-rouge">%</code> and ensuring integers are used throughout
provided another performance boost.</p>
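
<p>To illustrate the kind of change involved, here is a minimal sketch of a floored remainder done purely with integers. It is not ICU’s actual code: <code class="language-plaintext highlighter-rouge">floor_mod</code> and <code class="language-plaintext highlighter-rouge">day_of_week</code> are hypothetical names, and the day-of-week calculation is just one example of calendar arithmetic that needs no floating point.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Remainder in the range [0, divisor), also for negative values. */
int32_t floor_mod(int64_t value, int32_t divisor)
{
	assert(divisor > 0);
	int64_t r = value % divisor; /* C truncates toward zero, so r may be negative */
	if (r < 0)
		r += divisor;
	return (int32_t)r;
}

/*
 * Day of the week for a count of days since 1970-01-01, which was a
 * Thursday. The result uses 0 = Sunday through 6 = Saturday.
 */
int day_of_week(int64_t days_since_epoch)
{
	return floor_mod(days_since_epoch + 4, 7);
}
```

<p>Since <code class="language-plaintext highlighter-rouge">%</code> in C truncates toward zero, dates before the epoch need the adjustment shown. For example, <code class="language-plaintext highlighter-rouge">day_of_week(-1)</code> gives 3, as 31 December 1969 was a Wednesday.</p>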

<h3 id="unoptimized-hot-paths">Unoptimized hot paths</h3>

<p>When formatting a component in a datetime, such as the hour, the formatting
code needs to get the relevant information for that particular pattern
character (eg. <code class="language-plaintext highlighter-rouge">H</code> in an <code class="language-plaintext highlighter-rouge">HH</code> pattern). This was done by looping through an
array of those pattern characters until it found the matching one, and that
same index would be used for another array that contained the information. This
was changed to a simple lookup table, so it would take O(1) time to convert a
pattern character to an index rather than O(N).</p>
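
<p>The difference between the two approaches can be sketched as follows. The list of pattern characters here is a small made-up subset and the function names are hypothetical, so treat this as an illustration of the technique rather than ICU’s implementation.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of pattern characters, not ICU's actual list. */
static const char PATTERN_CHARS[] = "yMdHhmsa";

/* Before: scan the list until the character is found, O(N). */
int pattern_index_scan(char c)
{
	const char *p = c != '\0' ? strchr(PATTERN_CHARS, c) : NULL;
	return p ? (int)(p - PATTERN_CHARS) : -1;
}

/*
 * After: a table indexed by character code, O(1). Entries hold
 * index + 1 so that a zero-initialized slot means "not a pattern
 * character".
 */
static signed char pattern_table[128];

void pattern_table_init(void)
{
	for (int i = 0; PATTERN_CHARS[i] != '\0'; i++) {
		unsigned char u = (unsigned char)PATTERN_CHARS[i];
		assert(u < 128); /* the table only covers ASCII */
		pattern_table[u] = (signed char)(i + 1);
	}
}

int pattern_index_lookup(char c)
{
	unsigned char u = (unsigned char)c;
	return u < 128 ? pattern_table[u] - 1 : -1;
}
```

<p>After a single call to <code class="language-plaintext highlighter-rouge">pattern_table_init</code>, both functions return the same index for any character, but the lookup version does so with one array access regardless of how many pattern characters exist.</p>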

<h3 id="missing-fast-paths">Missing fast paths</h3>

<p>At the core of the ICU library is the <code class="language-plaintext highlighter-rouge">UnicodeString</code> object which is used for
anything string related. In a given formatting operation, a string is
constantly appended to, and for small strings <code class="language-plaintext highlighter-rouge">UnicodeString</code> uses the stack
while for large strings it allocates on the heap. When appending, it uses
<code class="language-plaintext highlighter-rouge">memmove</code>. For date formatting, short strings are exclusively used, so it’s
prudent to add a fast path for such strings that would fit into the existing
stack buffer. In that case, avoiding the call to <code class="language-plaintext highlighter-rouge">memmove</code> and doing a simple
unrolled loop copy for small strings proved to be noticeably faster.</p>
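
<p>A simplified sketch of such a fast path is shown below. The <code class="language-plaintext highlighter-rouge">small_string</code> type, its capacity and the four-unit threshold are all assumptions made for illustration and do not reflect how <code class="language-plaintext highlighter-rouge">UnicodeString</code> is actually laid out.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_CAPACITY = 27 }; /* hypothetical inline buffer size */

struct small_string {
	uint16_t buf[STACK_CAPACITY]; /* UTF-16 code units */
	int length;
};

/*
 * Append n code units from src, which must not overlap the string's
 * own buffer. Returns 1 on success and 0 if the data does not fit (a
 * real implementation would switch to a heap buffer at that point).
 */
int small_string_append(struct small_string *s, const uint16_t *src, int n)
{
	assert(n >= 0);
	if (n > STACK_CAPACITY - s->length)
		return 0;

	uint16_t *dst = s->buf + s->length;
	if (n <= 4) {
		/* Fast path: hand-unrolled copy, no library call. */
		switch (n) {
		case 4: dst[3] = src[3]; /* fall through */
		case 3: dst[2] = src[2]; /* fall through */
		case 2: dst[1] = src[1]; /* fall through */
		case 1: dst[0] = src[0]; /* fall through */
		default: break;
		}
	} else {
		memmove(dst, src, (size_t)n * sizeof *dst);
	}
	s->length += n;
	return 1;
}
```

<p>Date components such as a two-digit hour or a separator are only one or two code units long, so in this sketch they would always take the unrolled branch.</p>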

<h3 id="final-result">Final result</h3>

<p>With all these changes, formatting is now up to twice as fast, and sometimes
more. But is it as fast as it can get? I decided to compare with the reliable
<code class="language-plaintext highlighter-rouge">strftime</code> in the C standard library. On my 2016 MBP, running <code class="language-plaintext highlighter-rouge">strftime</code> with a
simple <code class="language-plaintext highlighter-rouge">%I:%M %p</code> format (hour:minute am/pm), it can do a million formatting
operations in around 630 ms. Doing the same with ICU 74.2 (before the
optimizations, using the equivalent <code class="language-plaintext highlighter-rouge">hh:mm a</code> format), it took almost three
times as long at 1700 ms. And now, with the recently released ICU 75.1 which
includes all the mentioned optimizations, it takes around 800 ms, making it
more than twice as fast as before.</p>

<h2 id="back-to-nodejs">Back to Node.js</h2>

<p>With the recently released versions
<a href="https://nodejs.org/en/blog/release/v20.13.0">20.13</a> and
<a href="https://nodejs.org/en/blog/release/v22.1.0">22.1</a> these changes have made
their way back to Node.js. To try them out, I ran a simple benchmark comparing
20.13 (ICU 75) and 18.20 (ICU 74). The official releases from the Node.js
website (such as the ones obtained via <a href="https://github.com/nvm-sh/nvm">nvm</a>)
bundle the ICU library in the same binary. If you get Node.js from your package
manager, it might use the system ICU library which might not be the latest
version. To check which version Node.js uses, run <code class="language-plaintext highlighter-rouge">node -p
process.versions.icu</code>. Finally, let’s check if <code class="language-plaintext highlighter-rouge">Intl.DateTimeFormat</code> is faster
as expected:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">fmt</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Intl</span><span class="p">.</span><span class="nc">DateTimeFormat</span><span class="p">(</span><span class="dl">"</span><span class="s2">en-US</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
  <span class="na">hour</span><span class="p">:</span> <span class="dl">"</span><span class="s2">2-digit</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">minute</span><span class="p">:</span> <span class="dl">"</span><span class="s2">2-digit</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">hour12</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">date</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Date</span><span class="p">();</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="nx">date</span><span class="p">);</span> <span class="c1">// Warmup - the first run is much slower</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">time</span><span class="p">(</span><span class="dl">"</span><span class="s2">format</span><span class="dl">"</span><span class="p">);</span>
<span class="k">for </span><span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="nx">_000_000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="nx">fmt</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="nx">date</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">timeEnd</span><span class="p">(</span><span class="dl">"</span><span class="s2">format</span><span class="dl">"</span><span class="p">);</span>
</code></pre></div></div>

<p>Running this script, I get 2100 ms for Node.js 18 and 1300 ms for Node.js 20
— a 1.6x improvement.</p>

<p>For something closer to production use, let’s try ten thousand calls to
<code class="language-plaintext highlighter-rouge">Date.prototype.toLocaleString</code>, which includes both date and time fields, and ensure
that it’s thoroughly warmed up, simulating a long-running application:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">d</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Date</span><span class="p">();</span>
<span class="k">for </span><span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="mi">100</span><span class="nx">_000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="nx">d</span><span class="p">.</span><span class="nf">toLocaleString</span><span class="p">();</span> <span class="c1">// Warmup</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">time</span><span class="p">(</span><span class="dl">"</span><span class="s2">format</span><span class="dl">"</span><span class="p">);</span>
<span class="k">for </span><span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="nx">_000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="nx">d</span><span class="p">.</span><span class="nf">toLocaleString</span><span class="p">();</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">timeEnd</span><span class="p">(</span><span class="dl">"</span><span class="s2">format</span><span class="dl">"</span><span class="p">);</span>
</code></pre></div></div>

<p>In this case, we get a 2x improvement, with Node.js 18 taking 36 ms and Node.js
20 taking just 18 ms.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[A look at how I was able to improve the performance of date formatting in Node.js and the ICU library.]]></summary></entry><entry><title type="html">Unix is not Linux</title><link href="https://akr.am/blog/posts/unix-is-not-linux" rel="alternate" type="text/html" title="Unix is not Linux" /><published>2022-08-21T00:00:00+00:00</published><updated>2022-08-21T00:00:00+00:00</updated><id>https://akr.am/blog/posts/unix-is-not-linux</id><content type="html" xml:base="https://akr.am/blog/posts/unix-is-not-linux"><![CDATA[<p>Too often on the internet, tutorials and guides are written about POSIX and Unix
tools that implicitly assume a Linux installation, and more specifically a
GNU-based one. This has many implications when it comes to everything from the
behavior of the shell, its utilities, and even the C standard library. While
the dominance of Linux might mean that one can ignore this distinction, it is
still useful to be aware of it. I’ve outlined some of the more prominent
discrepancies below.</p>

<h2 id="bash-is-not-the-standard-shell">Bash is not the standard shell</h2>

<p>The default shell that is present on all Unix systems is <code class="language-plaintext highlighter-rouge">sh</code>, not <code class="language-plaintext highlighter-rouge">bash</code>. The
language used in the portable <code class="language-plaintext highlighter-rouge">sh</code> shell is described in the
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html">POSIX</a>
standard. However, on many Linux systems <code class="language-plaintext highlighter-rouge">sh</code> is linked to <code class="language-plaintext highlighter-rouge">bash</code> - this makes
bash operate in a more compatible way with the standard, but still allows
certain <code class="language-plaintext highlighter-rouge">bash</code> features that may not work on other systems. When in doubt,
refer to the standard.</p>

<h2 id="long-options-are-not-unix">Long options are not Unix</h2>

<p>Many utilities accept a long option, eg. <code class="language-plaintext highlighter-rouge">grep --count</code> with a double
hyphen, in addition to a short option, eg. <code class="language-plaintext highlighter-rouge">grep -c</code>. Long options are a GNU
invention, and they generally do not exist on other systems, such as the BSDs. In
fact, the standard <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html"><code class="language-plaintext highlighter-rouge">getopts</code>
utility</a>,
and corresponding <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html"><code class="language-plaintext highlighter-rouge">getopt</code> C
function</a>
only support the short style.</p>

<h2 id="make-isnt-gnu-make">Make isn’t GNU make</h2>

<p>The version of make specified by POSIX is much more limited than the GNU
version. This one is harder to deal with as the specification is lacking in
many aspects, particularly any kind of logical or conditional operators. You
can work around this by moving some logic to a <code class="language-plaintext highlighter-rouge">configure</code> script that generates
another Makefile that is then included by the main one. Further, the BSD
<code class="language-plaintext highlighter-rouge">make</code>s have a completely different syntax than the GNU one for things like
conditionals. Luckily, if your focus is on macOS and Linux only, you can get
away with depending on GNU features as macOS’s make is based on the GNU one.</p>

<h2 id="the-c-compiler-is-not-gcc">The C compiler is not GCC</h2>

<p>This is related to the previous point, as it often comes up in Makefiles. When
referring to the C compiler in that context, it is better to use the implicit
<code class="language-plaintext highlighter-rouge">$(CC)</code> variable, and when compiling C++ code, to use the <code class="language-plaintext highlighter-rouge">$(CXX)</code> variable.
Most BSD systems have now switched to Clang as the default compiler and do not
provide a <code class="language-plaintext highlighter-rouge">gcc</code> binary. When referring to the C and C++ compilers outside of
Makefiles, the <code class="language-plaintext highlighter-rouge">cc</code> and <code class="language-plaintext highlighter-rouge">c++</code> commands are reliable and work across systems.</p>

<h2 id="gnu-is-not-linux">GNU is not Linux</h2>

<p>This is slightly different, but even GNU interfaces are not necessarily the
ones present on a Linux system. The Alpine Linux distribution for example,
popular as a base in Docker containers due to its light weight, forgoes the GNU
C Library for musl, and uses BusyBox instead of the GNU utilities. Therefore,
one would be well advised to try to stick to portable interfaces even if
targeting solely Linux systems.</p>

<h2 id="unix-is-not-unix">Unix is not UNIX</h2>

<p>Finally, even Unix is not UNIX. The latter is a trademark that requires
certification by <a href="https://www.opengroup.org/">The Open Group</a>. <a href="https://www.opengroup.org/openbrand/register/">Certified
operating systems</a>, the most
well-known of which is macOS, are guaranteed to follow the <a href="https://pubs.opengroup.org/onlinepubs/9699919799/">UNIX
specification</a>. That said,
most Unix-like operating systems, including the BSDs, as well as the GNU tools
make a strong effort to stick to the standard as much as possible.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[Things to be aware of when discussing the shell and Unix programs.]]></summary></entry><entry><title type="html">Installing FreeBSD on Oracle Cloud</title><link href="https://akr.am/blog/posts/installing-freebsd-on-oracle-cloud" rel="alternate" type="text/html" title="Installing FreeBSD on Oracle Cloud" /><published>2022-07-31T00:00:00+00:00</published><updated>2022-07-31T00:00:00+00:00</updated><id>https://akr.am/blog/posts/installing-freebsd-on-oracle-cloud</id><content type="html" xml:base="https://akr.am/blog/posts/installing-freebsd-on-oracle-cloud"><![CDATA[<p>Oracle Cloud has a fairly generous <a href="https://docs.oracle.com/en-us/iaas/Content/FreeTier/freetier.htm">free
tier</a>, but
one thing it does not include is the ability to use <a href="https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/managingcustomimages.htm">custom
images</a>,
which require a paid account. This guide will show you how to install FreeBSD
(or any custom image) using an alternative method.</p>

<p><em>Note: For arm64 instances, a FreeBSD image is currently available under the
partner images source when creating the instance, so you can skip the steps
below unless you need a specific version.</em></p>

<h2 id="install">Install</h2>

<ol>
  <li>
    <p><a href="https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/launchinginstance.htm">Create two
instances</a>
in Oracle Cloud using the default image. We will be using one for the FreeBSD
server, and a temporary one for the installation process. Make sure to specify
your SSH public key when creating the temporary instance.</p>
  </li>
  <li>
    <p>On the FreeBSD instance page, stop it if it is running, and <a href="https://docs.oracle.com/en-us/iaas/Content/Block/Tasks/detachingabootvolume.htm">detach the boot
volume</a>.</p>
  </li>
  <li>
    <p>On the temporary instance page, attach the FreeBSD instance’s boot volume as
a block volume. Make sure to select <em>Paravirtualized</em> as the attachment type.</p>
  </li>
  <li>
    <p>SSH into the temporary instance. Check the path of the newly attached volume.
You can do this by running <code class="language-plaintext highlighter-rouge">lsblk</code> and seeing which one has nothing mounted on
it (it will most likely be <code class="language-plaintext highlighter-rouge">/dev/sdb</code>).</p>

    <p>Then, run the following command to install a raw FreeBSD image onto the volume.
Modify the release version and the volume path as needed.</p>

    <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://download.freebsd.org/ftp/releases/VM-IMAGES/13.1-RELEASE/amd64/Latest/FreeBSD-13.1-RELEASE-amd64.raw.xz | xz <span class="nt">-dc</span> | <span class="nb">sudo dd </span><span class="nv">of</span><span class="o">=</span>/dev/sdb <span class="nv">bs</span><span class="o">=</span>1M <span class="nv">conv</span><span class="o">=</span>fdatasync
</code></pre></div>    </div>
  </li>
  <li>
    <p>Once the process is complete, detach the block volume from the temporary
instance.</p>
  </li>
  <li>
    <p>Re-attach the FreeBSD instance’s boot volume and start the instance.</p>
  </li>
</ol>

<h2 id="setup">Setup</h2>

<p>Once the FreeBSD instance has booted, we can configure it.</p>

<ol>
  <li>
    <p>In the instance’s page, launch a Cloud Shell connection (you may need to
press Enter in the console if it appears to be stuck for a while). This will
give us preliminary access and allow us to enable SSH on the new server.</p>
  </li>
  <li>
    <p>Run <code class="language-plaintext highlighter-rouge">passwd</code> to set a password for the <code class="language-plaintext highlighter-rouge">root</code> user - make sure to choose a
strong one.</p>
  </li>
  <li>
    <p>Create a new user using <code class="language-plaintext highlighter-rouge">adduser</code>. When asked to invite the user to other
groups, enter <code class="language-plaintext highlighter-rouge">wheel</code> to give the user root privileges.</p>
  </li>
  <li>
    <p>Enable and start the SSH service by running <code class="language-plaintext highlighter-rouge">service sshd enable</code>, followed
by <code class="language-plaintext highlighter-rouge">service sshd start</code>.</p>
  </li>
  <li>
    <p>Copy your public key from your local machine using <code class="language-plaintext highlighter-rouge">ssh-copy-id
user@freebsd-instance-ip</code>.</p>
  </li>
  <li>
    <p>Now you can SSH into your new FreeBSD server and do any additional setup.
Enjoy!</p>
  </li>
</ol>

<h2 id="additional-setup">Additional Setup</h2>

<h3 id="ipv6">IPv6</h3>

<p>To get IPv6 working on a FreeBSD instance on Oracle Cloud, we need support for
DHCPv6, which is not yet available in FreeBSD’s DHCP client <code class="language-plaintext highlighter-rouge">dhclient</code>. To get
around this, we can use the <code class="language-plaintext highlighter-rouge">dual-dhclient-daemon</code> package which supports
dual-stack DHCP. You can install and enable it with the following:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pkg <span class="nb">install </span>dual-dhclient-daemon
sysrc <span class="nv">dhclient_program</span><span class="o">=</span>/usr/local/sbin/dual-dhclient
</code></pre></div></div>

<p>You will also need to <a href="https://www.51sec.org/2021/09/20/enable-ipv6-on-oracle-cloud-infrastructure/">enable
IPv6</a>
for your instance in the Oracle Cloud settings.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[A short guide to installing FreeBSD on Oracle Cloud.]]></summary></entry><entry><title type="html">A Simple Setup for C and C++</title><link href="https://akr.am/blog/posts/a-simple-setup-for-c-and-cpp" rel="alternate" type="text/html" title="A Simple Setup for C and C++" /><published>2022-02-25T00:00:00+00:00</published><updated>2022-02-25T00:00:00+00:00</updated><id>https://akr.am/blog/posts/a-simple-setup-for-c-and-cpp</id><content type="html" xml:base="https://akr.am/blog/posts/a-simple-setup-for-c-and-cpp"><![CDATA[<p>When developing C and C++ programs, the default compiler and debugger options
can be very bare bones, and not particularly helpful while developing. This can
lead to many unnecessary footguns and friction during development. Typically,
a build system is introduced, whether via an IDE such as Visual Studio or using
some form of makefiles. For quick experimentation and small programs, these can
be heavy, slow or complicated. With some minor changes, we can get much more
help from the compiler to write safer programs from the get-go, all while
reducing friction throughout the process.</p>

<h2 id="compiler">Compiler</h2>

<p>The compiler comes with a lot of useful flags to catch common problems. A good
set of defaults is the following:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">-Wall</code>, <code class="language-plaintext highlighter-rouge">-Wextra</code> - enable many useful warnings</li>
  <li><code class="language-plaintext highlighter-rouge">-pedantic</code> - warn when using non-standard language extensions</li>
  <li><code class="language-plaintext highlighter-rouge">-fsanitize=address,leak,undefined</code> - catch issues relating to memory and
undefined behavior</li>
  <li><code class="language-plaintext highlighter-rouge">-D_LIBCPP_DEBUG=1</code> (for Clang’s <code class="language-plaintext highlighter-rouge">libc++</code> - used by default on macOS and FreeBSD) or <code class="language-plaintext highlighter-rouge">-D_GLIBCXX_DEBUG</code>
(GCC’s <code class="language-plaintext highlighter-rouge">libstdc++</code> - used on Linux) - catch undefined behavior when using the C++ library</li>
</ul>

<p>Now let’s try a quick example to see how these can help:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">int</span> <span class="n">values</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">14</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="o">-</span><span class="mi">5</span><span class="p">,</span> <span class="mi">24</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">96</span><span class="p">,</span> <span class="mi">23</span> <span class="p">};</span>
	<span class="kt">int</span> <span class="n">total</span><span class="p">;</span>
	<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">total</span> <span class="o">+=</span> <span class="n">total</span><span class="o">*</span><span class="n">values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
	<span class="p">}</span>
	<span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">total</span><span class="p">);</span>
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Save this file as <code class="language-plaintext highlighter-rouge">example.c</code>. You can then compile it by running
<code class="language-plaintext highlighter-rouge">make example</code> - this works because <code class="language-plaintext highlighter-rouge">make</code> will auto-detect the C file and
use a pre-defined implicit rule to build an <code class="language-plaintext highlighter-rouge">example</code> executable. Then,
run the file with <code class="language-plaintext highlighter-rouge">./example</code>. You will see that it will compile and run
without any issues, and print a total like <code class="language-plaintext highlighter-rouge">1503498210</code>. Now let’s try
compiling with some warnings:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make example CFLAGS="-Wall -Wextra -pedantic"
</code></pre></div></div>

<p>We get one warning:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>example.c:8:3: warning: variable 'total' is uninitialized when used here [-Wuninitialized]
                total += total*values[i] + 1;
                ^~~~~
example.c:6:11: note: initialize the variable 'total' to silence this warning
        int total;
                 ^
                  = 0
1 warning generated.
</code></pre></div></div>

<p>Easy enough to fix by setting <code class="language-plaintext highlighter-rouge">total</code> to 0 when declaring it. When compiling
again, we get no warnings and our little program is seemingly perfect!</p>

<h3 id="sanitizers">Sanitizers</h3>

<p>Unfortunately, compilers do not catch everything at compile-time. Sometimes
the code needs to run to detect other kinds of problems. This is known as
dynamic analysis, as opposed to static analysis, and is done with the help of
sanitizers. Let’s add a couple more flags to our build:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make -B example CFLAGS="-Wall -Wextra -pedantic \
	-fsanitize=address,leak,undefined"
</code></pre></div></div>

<p><em>Note</em>: If you get a compile error, it might be because you need to install
additional libraries for these to work, namely <code class="language-plaintext highlighter-rouge">libasan</code>, <code class="language-plaintext highlighter-rouge">liblsan</code> and
<code class="language-plaintext highlighter-rouge">libubsan</code>, which you can do via your package manager.</p>

<p>Compiling again will not yield any new warnings. However, run the program now
and you’ll discover some new things:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>example.c:8:17: runtime error: signed integer overflow: 420559700 * 23 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:8:17 in
example.c:8:18: runtime error: index 9 out of bounds for type 'int [9]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:8:18 in
=================================================================
...
</code></pre></div></div>

<p>The sanitizers have detected two issues in our code. Our bounds check is faulty:
it should be <code class="language-plaintext highlighter-rouge">i &lt; 9</code> rather than <code class="language-plaintext highlighter-rouge">i &lt; 10</code>. Our result was also overflowing
due to using an <code class="language-plaintext highlighter-rouge">int</code> instead of a <code class="language-plaintext highlighter-rouge">long</code> for the total. Fixing both those issues,
and adjusting the <code class="language-plaintext highlighter-rouge">printf</code> format string to use <code class="language-plaintext highlighter-rouge">%ld</code> to match the new type,
we compile and run again. This time we get the correct result, <code class="language-plaintext highlighter-rouge">10093432801</code>.</p>
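
<p>For reference, the corrected logic could look something like the sketch below. It is pulled out into a function, hypothetically named <code class="language-plaintext highlighter-rouge">accumulate</code>, that takes the element count from the caller rather than hard-coding it, and it assumes that <code class="language-plaintext highlighter-rouge">long</code> is 64 bits wide, which holds on 64-bit Linux, macOS and the BSDs but not everywhere.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Same accumulation as in example.c, with the fixes applied.
 * Assumes `long` is 64 bits wide.
 */
long accumulate(const int *values, size_t count)
{
	long total = 0; /* initialized, and wide enough to hold the result */

	assert(values != NULL || count == 0);
	/* The caller passes the real element count instead of a fixed 10. */
	for (size_t i = 0; i < count; i++) {
		total += total*values[i] + 1;
	}
	return total;
}
```

<p>Called with the nine values from the example and <code class="language-plaintext highlighter-rouge">sizeof values / sizeof values[0]</code> as the count, it returns 10093432801, which can be printed with <code class="language-plaintext highlighter-rouge">%ld</code>.</p>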

<h3 id="c">C++</h3>

<p>While sanitizers are useful, they can come with significant overhead to the
program, especially the address sanitizer. When writing modern C++, where use
of raw arrays and pointers can be less frequent, it might be helpful to just
check for out-of-bounds accesses at the library level. The C++ standard libraries
allow this when you define <code class="language-plaintext highlighter-rouge">_LIBCPP_DEBUG</code>/<code class="language-plaintext highlighter-rouge">_GLIBCXX_DEBUG</code>. <code class="language-plaintext highlighter-rouge">libstdc++</code>
also provides <code class="language-plaintext highlighter-rouge">_GLIBCXX_DEBUG_PEDANTIC</code> for further checks of non-standard
behavior.</p>

<p>Let’s rewrite our program in C++:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;vector&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
	<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">values</span> <span class="o">=</span> <span class="p">{</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">14</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="o">-</span><span class="mi">5</span><span class="p">,</span> <span class="mi">24</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">96</span><span class="p">,</span> <span class="mi">23</span> <span class="p">};</span>
	<span class="kt">int</span> <span class="n">total</span><span class="p">;</span>
	<span class="c1">// Use a range-based for loop!</span>
	<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">total</span> <span class="o">+=</span> <span class="n">total</span><span class="o">*</span><span class="n">values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
	<span class="p">}</span>
	<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">total</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Compile the program with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make example2 CPPFLAGS="-D_LIBCPP_DEBUG=1 -D_GLIBCXX_DEBUG" \
	CXXFLAGS="-std=c++20 -Wall -Wextra -pedantic -fsanitize=undefined"
</code></pre></div></div>

<p>When running our program, we get this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>example2.cpp:9:17: runtime error: signed integer overflow: 420559700 * 23 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example2.cpp:9:17 in
/usr/include/c++/v1/vector:1549: _LIBCPP_ASSERT '__n &lt; size()' failed. vector[] index out of bounds
Abort trap: 6
</code></pre></div></div>

<p>Now we can resolve the overflow and out-of-bounds errors as before.</p>
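
<p>As one possible fix, here is a minimal sketch that follows the advice in the
comment. It assumes that widening <code class="language-plaintext highlighter-rouge">total</code> to <code class="language-plaintext highlighter-rouge">long long</code> is acceptable, and the
helper name <code class="language-plaintext highlighter-rouge">running_total</code> is made up for illustration. The range-based <code class="language-plaintext highlighter-rouge">for</code>
loop only visits elements the vector actually holds, so the out-of-bounds access
cannot happen, and the initialized 64-bit total is wide enough for this input:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;iostream&gt;
#include &lt;vector&gt;

// Hypothetical helper (not from the original program): applies the same
// update rule as the loop above to every element of the vector.
long long running_total(const std::vector&lt;int&gt;&amp; values)
{
	// Initialized, and 64 bits wide so the products for this input fit.
	long long total = 0;
	// The range-based for loop cannot step past the end of the vector.
	for (int value : values) {
		total += total*value + 1;
	}
	return total;
}

int main()
{
	std::vector&lt;int&gt; values = { -1, 14, 32, -5, 24, 40, -3, 96, 23 };
	std::cout &lt;&lt; running_total(values) &lt;&lt; std::endl;
}
</code></pre></div></div>

<p>For this input the sketch prints <code class="language-plaintext highlighter-rouge">10093432801</code>, the same result as the
corrected C program. Larger inputs could still overflow a <code class="language-plaintext highlighter-rouge">long long</code>, so it is
worth keeping the undefined behavior sanitizer enabled while developing.</p>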

<h2 id="debugger">Debugger</h2>

<p>Clang comes with a powerful debugger, <code class="language-plaintext highlighter-rouge">lldb</code>, that also includes a convenient
GUI. If you use GCC, its respective debugger, <code class="language-plaintext highlighter-rouge">gdb</code>, also comes with a TUI mode
that can be accessed via <code class="language-plaintext highlighter-rouge">gdb -tui</code>. Both debuggers are quite similar and you
can find a helpful command map on the <a href="https://lldb.llvm.org/use/map.html">LLDB
website</a>. To use the debugger, rebuild the
executable with the <code class="language-plaintext highlighter-rouge">-g</code> flag to generate debug information:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make -B example CFLAGS="-g -Wall -Wextra -pedantic -fsanitize=address,leak,undefined"
</code></pre></div></div>

<p>Then, enter the debugger by running <code class="language-plaintext highlighter-rouge">lldb example</code>. Once in the REPL, you’ll
need to run the program. Enter <code class="language-plaintext highlighter-rouge">r</code> or <code class="language-plaintext highlighter-rouge">run</code> to do so. This will cause it to
run to completion, which is not exactly what we want. Instead, enter
<code class="language-plaintext highlighter-rouge">b main</code> to create a breakpoint at <code class="language-plaintext highlighter-rouge">main</code> so the debugger pauses just before
our code runs. Then, run again. The debugger will now pause at the
first line of our program. You can step to the next line using <code class="language-plaintext highlighter-rouge">n</code>. You can
view the current variables at any time using <code class="language-plaintext highlighter-rouge">v</code>, or a specific variable by naming it, e.g.
<code class="language-plaintext highlighter-rouge">v total</code>. To continue until the next breakpoint or the end of the program, use <code class="language-plaintext highlighter-rouge">c</code>.</p>

<h3 id="gui">GUI</h3>

<p>It might be helpful to use the debugger’s GUI instead. Right after running the
program with <code class="language-plaintext highlighter-rouge">r</code>, enter <code class="language-plaintext highlighter-rouge">gui</code> to go to GUI mode. The same keyboard shortcuts
will work there too, and there will be some additional ones that can be seen
with <code class="language-plaintext highlighter-rouge">h</code>. Once done, press Escape to exit the GUI.</p>

<h3 id="debugging-an-executable-that-uses-stdin">Debugging an executable that uses stdin</h3>

<p>One thing you might notice is that there is no obvious way to redirect stdin
to a debugged program. There’s a simple workaround: open the input file on a
spare file descriptor when launching lldb, like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lldb example 3&lt; input.txt
</code></pre></div></div>

<p>Then, run <code class="language-plaintext highlighter-rouge">settings set target.input-path /dev/fd/3</code> at the beginning of the debug
session.</p>
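
<p>To try this out, you need a program that actually reads from stdin. The
following is a small sketch that is not part of the earlier examples: the file
name <code class="language-plaintext highlighter-rouge">example3.cpp</code>, the helper <code class="language-plaintext highlighter-rouge">sum_input</code> and the contents of <code class="language-plaintext highlighter-rouge">input.txt</code>
are all assumptions made for illustration. It sums the whitespace-separated
integers it finds on its input:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;iostream&gt;

// Hypothetical helper: sums the whitespace-separated integers read from `in`.
long long sum_input(std::istream&amp; in)
{
	long long total = 0;
	long long value;
	// Extraction fails at end of input or at the first non-integer token,
	// which ends the loop.
	while (in &gt;&gt; value) {
		total += value;
	}
	return total;
}

int main()
{
	std::cout &lt;&lt; sum_input(std::cin) &lt;&lt; std::endl;
}
</code></pre></div></div>

<p>With an <code class="language-plaintext highlighter-rouge">input.txt</code> containing, say, <code class="language-plaintext highlighter-rouge">1 2 3</code>, you could build this as
<code class="language-plaintext highlighter-rouge">example3</code>, launch it with <code class="language-plaintext highlighter-rouge">lldb example3 3&lt; input.txt</code>, apply the
<code class="language-plaintext highlighter-rouge">target.input-path</code> setting above, and step through the loop while the program
reads from the file.</p>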

<h2 id="simplifying-the-process">Simplifying the process</h2>

<p>There’s a lot to take in here, and too many steps in some cases. We can simplify
the process with some configuration. First, add the following aliases to your
shell’s profile:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>alias build="make \
CPPFLAGS=\"-D_LIBCPP_DEBUG=1 -D_GLIBCXX_DEBUG\" \
CFLAGS=\"-g -Wall -Wextra -pedantic -fsanitize=address,leak,undefined\" \
CXXFLAGS=\"-std=c++20 -g -Wall -Wextra -pedantic -fsanitize=undefined\""

alias debug="lldb -o r"
</code></pre></div></div>

<p>You can tweak these aliases to your liking. For example, you can use <code class="language-plaintext highlighter-rouge">CC</code> and <code class="language-plaintext highlighter-rouge">CXX</code>
to specify your preferred C and C++ compilers respectively.</p>

<p>Then, add the following to <code class="language-plaintext highlighter-rouge">~/.lldbinit</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>settings set target.input-path /dev/fd/3
b main
</code></pre></div></div>

<p>This will automatically set the default input path and create a breakpoint at
<code class="language-plaintext highlighter-rouge">main</code> whenever you launch <code class="language-plaintext highlighter-rouge">lldb</code>. This leaves only one step, running the
program via <code class="language-plaintext highlighter-rouge">r</code>, which the <code class="language-plaintext highlighter-rouge">debug</code> alias above takes care of.</p>

<p>Now, it’s as simple as <code class="language-plaintext highlighter-rouge">build example</code> and <code class="language-plaintext highlighter-rouge">debug example</code> to quickly build
and debug programs. You can pass additional files and libraries to the <code class="language-plaintext highlighter-rouge">build</code>
command via the <code class="language-plaintext highlighter-rouge">LDLIBS</code> variable. For release builds, once you’re happy
with your program, call <code class="language-plaintext highlighter-rouge">make</code> directly with appropriate flags, such as <code class="language-plaintext highlighter-rouge">-O2</code>
for optimization, and avoid using the debug and sanitizer flags. Once your
project has become larger than a few files, and especially if your project is
shared with other people, it might be worth adding explicit targets to your
build system that perform similar functions to your local aliases.</p>

<h2 id="documentation">Documentation</h2>

<p>The easiest way to get documentation for any C standard library function is to
use <code class="language-plaintext highlighter-rouge">man</code>. For example, to see the documentation for <code class="language-plaintext highlighter-rouge">printf</code>, use <code class="language-plaintext highlighter-rouge">man 3
printf</code>. You can also view the list of functions in a header, e.g. <code class="language-plaintext highlighter-rouge">man 3
stdio</code>. For C++, you might need to install a libstdc++ docs package. This will
then allow you to look up documentation for C++ standard library namespaces and
classes, e.g. <code class="language-plaintext highlighter-rouge">man std::string</code>.</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[A few steps to allow quick development and debugging of C and C++ programs.]]></summary></entry><entry><title type="html">How to Break Software</title><link href="https://akr.am/blog/posts/how-to-break-software" rel="alternate" type="text/html" title="How to Break Software" /><published>2021-03-08T00:00:00+00:00</published><updated>2021-03-08T00:00:00+00:00</updated><id>https://akr.am/blog/posts/how-to-break-software</id><content type="html" xml:base="https://akr.am/blog/posts/how-to-break-software"><![CDATA[<p>Here are some tips to help you break software with modest effort and no special
tools:</p>

<ol>
  <li>Change your last name to
<a href="https://twitter.com/RachelTrue/status/1365461618977476610">True</a> or, for
nihilists, <a href="https://www.wired.com/2015/11/null/">Null</a></li>
  <li>Be
<a href="https://github.com/python/cpython/blob/fff3c28052e6b0750d6218e00acacd2fded4991a/Lib/logging/handlers.py">Turkish</a></li>
  <li>Move to
<a href="https://hitchdev.com/strictyaml/why/implicit-typing-removed/">Norway</a></li>
  <li>Be a horse <a href="https://bugs.launchpad.net/mahara/+bug/1615280">person</a></li>
  <li>Be born before <a href="https://bugs.php.net/bug.php?id=17123">1970</a></li>
  <li>Type
<a href="https://nakedsecurity.sophos.com/2013/08/30/apple-apps-turned-upside-down-writing-right-to-left-youre-only-6-characters-from-a-crash/">anything</a>
that isn’t a
<a href="https://nakedsecurity.sophos.com/2018/02/20/apple-fixes-that-1-character-to-crash-your-mac-and-iphone-bug/">printable</a>
ASCII
<a href="https://nakedsecurity.sophos.com/2020/04/28/iphone-word-of-death-could-crash-your-phone-what-you-need-to-know/">character</a></li>
  <li>Use the wrong printable ASCII
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1601905">character</a></li>
  <li>Practice good password
<a href="https://thenextweb.com/microsoft/2012/09/21/this-ridiculous-microsoft-longer-accepts-long-passwords-shortens/">hygiene</a></li>
</ol>

<p><a href="https://github.com/mohd-akram/blog/issues/3">Let us know</a> <em>your</em> favorite ways
to break software!</p>]]></content><author><name>Mohamed Akram</name></author><summary type="html"><![CDATA[A guide on how to break software with no special tools.]]></summary></entry></feed>