<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/assets/feed.xslt"?>
<rss version="2.0"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
<title>Web logs of McSinyx</title>
<link>https://lumvok.store</link>
<atom:link href="https://lumvok.store/feed.xml" rel="self" type="application/rss+xml"/>
<description>Random write-ups packed with pop culture references</description>
<copyright><![CDATA[🄯 2019–2024 Nguyễn Gia Phong under CC BY-SA 4.0]]></copyright>
<language>en</language>
<generator>Franklin</generator>
<item>
  <title>GNU as a Router, the Canonical Way</title>
  <link>https://lumvok.store/blog/route/index.html</link>
  <guid>https://lumvok.store/blog/route/index.html</guid>
  <description>How to set up an Ubuntu system as a router</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Sat, 03 Aug 2024 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="gnu_as_a_router_the_canonical_way">GNU as a Router, the Canonical Way</h1>
<p>A while ago I noticed that my ISP leases IPv4 addresses out indefinitely. It was everything I&#39;d ever wanted and I gotta seize it to truly <em>self</em>-host. As an experiment, I started on something cheaper, like a single-board compooter. In 2024, support for general-purpose RISC-V chips began to ripen, so naturally due to FOMO, I bought a board with JH-7110.  Boy, was that a mistake&#33; While the bootloaders&#39; support had been well <a href="https://rvspace.org/en/project/JH7110_Upstream_Plan">upstreamed</a>, certain essential features like PCIe &#40;for NVMe&#41; had yet to reach a mainline Linux release, even worse so on the BSDs.  I ended up flashing the <em>only</em> distribution with official support <a href="https://loa.loang.net/chung/D16T24MXDP3T.3BR1X04I90CGT@guix/t">at the time</a>, Ubuntu.</p>
<p>Funny enough, after over a decade of daily driving <a href="https://www.gnu.org">GNU</a>, twas the first time I installed Ubuntu on a machine of my own.  At the time of writing, the reason for that was more apparent than ever: Canonical had been forcing Snap<sup id="fnref:snap">[1]</sup> down the users&#39; throats, even on the <em>server</em> edition. Thankfully Snap was still managed by APT and twas easy enough to remove and prevent it from coming back.  Another annoyance was the lack of manual pages in the minimized installation <em>and</em> that the official way to enable them is through a script that also installs other bloat SMFH &#40;the script is quite short and the actually necessary commands can be trivially found, I&#39;d rather they&#39;re documented instead&#41;.</p>
<p>That being said, not everything Ubuntu includes due to NIH is bad. Unity &#40;not the game engine that&#39;s proprietary like Snap server&#41; was loved by many; and this article is basically an appreciation post for some others: <a href="https://netplan.io">Netplan</a> and <a href="https://launchpad.net/ufw">ufw</a>.  Before diving in, lemme finish the story to give you the full context of this setup. The SBC is the VisionFive 2 which is blessed with plenty of IO:</p>
<ul>
<li><p>8 GB of memory</p>
</li>
<li><p>4 USB 3.0 type-A ports</p>
</li>
<li><p>2 RJ45 ports &#40;1 Gb and 100/10 Mb&#41;</p>
</li>
<li><p>1 M.2 slot &#40;I used this as an excuse to buy a larger SSD and put the old 256 GB one here&#41;</p>
</li>
<li><p>1 eMMC slot<sup id="fnref:mmc">[2]</sup> &#40;eMMC are cheap, got one also with 256 GB&#41;</p>
</li>
<li><p>1 TF slot</p>
</li>
<li><p>40 pin GP&#40;and predefined-purpose&#41;IO</p>
</li>
<li><p>Other stuff for interfacing with humen like HDMI, audio jack, etc.</p>
</li>
</ul>
<p>Initially, my plan for the SBC was to host services unlisted on the <a href="https://loang.net">loang network</a>.  Official services were not considered because my home network has no IPv6 and sometimes I&#39;d like to have most of the bandwidth for meself.  Shortly afterwards, I also purchased a somewhat beefy desktop compooter with even more I/O, especially a bunch of SATA ports, which are a lot more attractive than connecting hard di&#42;ks via USB.  On the other hand, the SBC barely consumes any electricity, well under 10 W with the NVMe drive, a Wi-Fi dongle and a fan connected.  Since it costs virtually nothing to keep it up 24/7, I decided to hand it the following two tasks:</p>
<ul>
<li><p>Reverse proxying services running on more powerful machines in the local network.</p>
</li>
<li><p>Acting as a virtual router between nodes I manage. This is particularly useful for tunneling to my work network and accessing the servers, allowing me to work remotely with low latency.</p>
</li>
</ul>
<p>Setting up the VPN with WireGuard was relatively easy, so I assumed swapping the SBC for the home router couldn&#39;t be too hard.  Once again, I <a href="https://antifandom.com/how-i-met-your-mother/wiki/Knight_Vision">chose poorly</a>: this little project cost me so many sleepless nights that I figured I should note down what I learned here in case it can save someone else from the same pain.  <strong>Do not take inspiration from this&#33;</strong></p>
<div class="franklin-toc"><ol><li>Connecting to the Internet</li><li>Local Networking</li><li>Wireless Access Point</li><li>Name Resolution</li></ol></div>
<h2 id="connecting_to_the_internet">Connecting to the Internet</h2>
<p>My landlord handles the contract with the ISP so I don&#39;t know the details of the subscription, but there&#39;s certainly no IPv6 nor any static IPv4 address. Bandwidth to datacenters in the region is approximately 100 Mb/s and the wall socket connects to a Cat 5e cable.  I know about the latter because whatever dumb ass did the last maintenance wired that to another short one dangling from the wall socket<sup id="fnref:futa">[3]</sup>, and after getting stabbed in the eyes for months I finally opened it up and made the socket a proper socket.</p>
<p>It would not make the slightest difference but I connect the SBC&#39;s 1 Gb port &#40;identified in Ubuntu as end0&#41; to the Internet and the slower one &#40;end1&#41; to my desktop on the local network. Thankfully no <a href="https://docs.fsfe.org/en/teams/router-freedom-tech-wiki">special setup</a> was needed and here is the entire Netplan configuration to connect to the outside world:</p>
<pre><code class="language-yaml">network:
  ethernets:
    end0:
      dhcp4: true
  renderer: networkd
  version: 2</code></pre>
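<p>A cautious way to roll this out, assuming the snippet is saved somewhere under /etc/netplan/ &#40;the exact file name is up to you&#41;, is to let Netplan revert by itself if the new configuration cuts off the connection:</p>
<pre><code class="language-sh"># Apply the configuration, rolling back automatically unless confirmed in time
sudo netplan try
# Or apply it right away without the safety net
sudo netplan apply</code></pre>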
<h2 id="local_networking">Local Networking</h2>
<p>For simplicity&#39;s sake, I decided to use the same subnet for both Ethernet and Wi-Fi under a bridge br0, where addressing and routing are configured:</p>
<pre><code class="language-yaml">network:
  bridges:
    br0:
      addresses:
        - 192.168.147.254/25
      interfaces:
        - end1
      routes:
        - from: 192.168.147.128/25
          on-link: true
          to: 0.0.0.0/0
          type: nat
          via: 192.168.147.254
  ethernets:
    end1:
      dhcp4: false</code></pre>
<p>As Netplan doesn&#39;t configure any DHCP server, that&#39;s done separately by udhcpd from busybox:</p>
<pre><code class="language-plaintext">interface br0
start 192.168.147.128
end 192.168.147.253
max_leases 126
option subnet 255.255.255.128
option router 192.168.147.254</code></pre>
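<p>To try the lease pool out before wiring it into the init system, udhcpd can be run in the foreground.  This is only a sketch, assuming the configuration above is saved as /etc/udhcpd.conf &#40;a path picked for illustration&#41; and that your busybox build includes the udhcpd applet:</p>
<pre><code class="language-sh"># -f keeps udhcpd in the foreground instead of daemonizing
sudo busybox udhcpd -f /etc/udhcpd.conf</code></pre>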
<p>I couldn&#39;t seem to get concrete information on the ports used by DHCP so I open the firewall for UDP on both 67 and 68 &#40;I swear this isn&#39;t an engagement bait to test out the new mailing list&#41;:</p>
<pre><code class="language-plaintext">ufw allow in on br0 to any port 67 proto udp
ufw allow in on br0 to any port 68 proto udp</code></pre>
<h2 id="wireless_access_point">Wireless Access Point</h2>
<p>Thanks to systemd, the Wi-Fi dongle is recognized as wlx600dd0g8b33f. Yes, that abomination of a name includes the chip&#39;s full MAC address. That being said, I&#39;d like to stick to the basis of a systemd/Linux distro. Netplan supports Wi-Fi hotspots only with NetworkManager, not with systemd-networkd, so the interface had to be declared as Ethernet:</p>
<pre><code class="language-yaml">network:
  bridges:
    br0:
      interfaces:
        - wlx600dd0g8b33f
  ethernets:
    wlx600dd0g8b33f:
      dhcp4: false</code></pre>
<p>Actual wireless connectivity is handled by hostapd:</p>
<pre><code class="language-ini">interface&#61;wlx600dd0g8b33f
bridge&#61;br0
ssid&#61;YΦ
utf8_ssid&#61;1
country_code&#61;KR
channel&#61;6
ieee80211d&#61;1
ieee80211h&#61;1
ieee80211n&#61;1
hw_mode&#61;g
wmm_enabled&#61;1
wpa&#61;2
wpa_pairwise&#61;TKIP
wpa_passphrase&#61;just enter random characters</code></pre>
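<p>Before enabling the service, it may help to run hostapd by hand and watch clients associate.  A sketch, assuming the configuration is saved at /etc/hostapd/hostapd.conf &#40;the conventional location on Debian-based systems, adjust if yours differs&#41;:</p>
<pre><code class="language-sh"># -d increases verbosity; hostapd stays in the foreground unless -B is given
sudo hostapd -d /etc/hostapd/hostapd.conf</code></pre>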
<h2 id="name_resolution">Name Resolution</h2>
<p>My ISP is <a href="https://www.tomshardware.com/tech-industry/cyber-security/south-korean-telecom-company-attacks-torrent-users-with-malware-over-600000-people-report-missing-files-strange-folders-and-disabled-pcs">known to be evil</a> so I&#39;d rather rely on more reputable resolvers like <a href="https://opennic.org">OpenNIC</a>, which also offers free-of-charge &#40;&#33;&#41; domain names. Most of their <a href="https://servers.opennic.org">tier 2</a> servers are located on the other side of the globe &#40;200 to 300 ms RTT&#41;, so a local cache is almost required.  <a href="https://pymumu.github.io/smartdns/en">SmartDNS</a> seems to be the best fit for this purpose, as it queries upstream servers simultaneously and also checks for the IP with the lowest RTT among the results. Since I don&#39;t trust my ISP, connections to the upstream servers are encrypted:</p>
<pre><code class="language-plaintext">bind :53@br0
server-tls 51.254.162.59 -host-name ns1-dot.iriseden.fr
server-tls 202.61.197.122 -host-name dns.furrydns.de
server-tls 80.152.203.134 -host-name dot.kekew.info
server-tls 178.201.248.159 -host-name dot.kekew.info
server-tls 178.201.248.160 -host-name dot.kekew.info
server-tls 95.216.99.249 -host-name dns.froth.zone</code></pre>
<p>For the router itself, the nameserver is set in /etc/resolv.conf and Netplan is told not to change it:</p>
<pre><code class="language-yaml">network:
  ethernets:
    end0:
      dhcp4-use-dns: false</code></pre>
<p>After ufw is configured to allow UDP traffic on port 53 on br0, udhcpd is instructed to advertise this local DNS server:</p>
<pre><code class="language-plaintext">option dns 192.168.147.254</code></pre>
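<p>The firewall rule itself is not shown above; following the same pattern as the DHCP ones, it would presumably look like the first command below.  The second one is a quick check from a machine on the local network, assuming dig is installed there:</p>
<pre><code class="language-sh"># On the router: let LAN clients reach the local resolver
ufw allow in on br0 to any port 53 proto udp
# On a client: query the router directly and print only the answer
dig @192.168.147.254 example.org &#43;short</code></pre>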
<p>I might consider blocking ads at the domain-name level someday, but for now uBlock Origin is working well enough on my systems and I rarely have people over, especially not for looking at <em>their</em> electronic devices.</p>
<table class="fndef" id="fndef:snap">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">Not <a href="http://snap.berkeley.edu">the good one</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:mmc">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Innovation&#39;s gone full circle, <em>eMMC</em> is short for <em>embedded MMC</em>.</td>
    </tr>
</table><table class="fndef" id="fndef:futa">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Basically a futanari of the RJ45 world.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/route@cnx%3E&Subject=Re: GNU as a Router, the Canonical Way">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/route@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/route/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Best Ways to Watch YouTube Videos</title>
  <link>https://lumvok.store/blog/youtu/index.html</link>
  <guid>https://lumvok.store/blog/youtu/index.html</guid>
  <description>Do you know de wey?  Lemme show you de wey&#33;</description>
  <category>fun</category><category>recipe</category><category>net</category><category>nix</category><category>clipboard</category>
  <pubDate>Wed, 17 Jan 2024 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="best_ways_to_watch_youtube_videos">Best Ways to Watch YouTube Videos</h1>
<p>In today&#39;s episode of <a href="https://www.alexmolas.com/2023/07/15/nobody-cares-about-your-blog.html">guides nobody asked for and likely having been covered by someone more qualified</a>, lemme show you the <em>correct</em> ways to view videos hosted on YouTube and other hostile, tracker-riddled hellscapes.  Whilst I despise Google&#39;s mass surveillance practices, it stores a large proportion of culturally significant videos and clips that would be difficult to mirror to user-respecting services due to copyright.  Hell, even YouTube doesn&#39;t have the right to distribute many of them in the first place.</p>
<p>Because of YouTube&#39;s circumvention of advertisement blockers, the ad-blocking arms race finally caught mainstream media attention and tis kool to talk about that now.  Hence I&#39;m happy to jump on the bandwagon, albeit a wee bit late, but this ain&#39;t just that. Since I feed you poison &#40;over 4&#37; of the pages linked from my site are on YouTube&#41;, the least I can do is sell you my cures.</p>
<h2 id="using_a_proper_media_player">Using a Proper Media Player</h2>
<p>The most popular solutions are either to use a good blocker on a browser with &#40;supposedly&#41; long-term support for <a href="https://github.com/uBlockOrigin/uBlock-issues/issues/338#issuecomment-1332300191">Manifest V2</a> like <a href="https://ublockorigin.com">uBlock Origin</a> on Firefox, or use alternative front-ends such as <a href="https://invidious.io">Invidious</a> or <a href="https://github.com/TeamPiped/Piped">Piped</a>.  Although uBlock Origin is essential for a pleasant experience on the modern interwebs and alternative front-ends offer the best UX for browsing videos, in-browser and service-specific media players are inferior anyway when compared to programs properly designed for a decent playback experience.</p>
<p>My favorite has been <a href="https://mpv.io">mpv</a> for as long as I can remember, as it makes it easy to adjust video brightness/contrast/etc., playback speed, subtitle size and placements, and to overamplify quiet audios.  Out of the box, it integrates with <a href="https://github.com/yt-dlp/yt-dlp">yt-dlp</a>, a time shifter with support for <a href="https://github.com/yt-dlp/yt-dlp/raw/master/supportedsites.md">most online media services</a>. Just drop the URL into an mpv window and <a href="https://www.youtube.com/watch?v&#61;_FNzL5nW_u4">boom</a>, it werks&#33;</p>
<p>Either <a href="https://uno.starshard.studio/notes/9nmgldtdgghu8m2n">drag-and-drop</a> or invoking <code>mpv &#36;url</code> is quite convenient, but not that close to following an anchor, is it?  You&#39;d need to first open mpv or a program launcher<sup id="fnref:launch">[1]</sup>, then drag the URL there, or perhaps copy and paste it for the latter cases. What if you <a href="https://ziglang.org/perf">gotta go fast</a>, aye?  As a <a href="https://video.hardlimit.com/c/morbiwars">hedgehog-maxxer</a> meself, of course I can do better, and here&#39;s how.</p>
<h2 id="with_a_browser_add-on">With a Browser Add-on</h2>
<p>While drafting this article, I noticed that the <em>ff2mpv</em> extension I was using had <a href="https://github.com/woodruffw/ff2mpv/commit/2397193b36e6.patch">technically been non-free</a> for a while.  Although I understand and respect the author&#39;s noble intention against violence, I believe that, because of the power imbalance, discrimination never helps the oppressed enough for the false-positive exclusions to be worth it.</p>
<p>For this reason, I switched to <a href="https://addons.mozilla.org/en-US/firefox/addon/iina-open-in-mpv">Open in mpv</a> and recommend it instead. The usage is practically the same: open context menu at the video URL and select <em>Open this link in mpv</em>.  The internal mechanism is a bit different though, and because it influences the installation process, I will try to briefly explain <a href="https://www.youtube.com/watch?v&#61;1Fl2sMV7Hcc">how it works</a>.</p>
<p>The way <em>Open in mpv</em> works is a bit convoluted.  First, it wraps the specified URL in an <code>mpv</code> scheme.  The new URL, which starts with <code>mpv://</code>, is then passed back to Firefox, which must have been configured to open it in the native program <code>open-in-mpv</code>.  This program parses the URL into the equivalent mpv command and executes it.  If you are not on NixOS, see the <a href="https://github.com/Baldomo/open-in-mpv/raw/master/README.md">extension&#39;s README</a> to set it up yourself.</p>
<p>Otherwise, it can be declared in <a href="https://nixos.org/manual/nixos/stable/options#opt-programs.firefox.policies">configuration.nix&#40;5&#41;</a> as follows. The declarations should be self-explanatory after referencing Firefox&#39;s documentation for <a href="https://mozilla.github.io/policy-templates">policies.json</a>.  If you have trouble finding an extension&#39;s ID and download URL, search for it in <a href="https://gnuzilla.gnu.org/mozzarella">Mozzarella</a>.</p>
<pre><code class="language-nix">&#123; pkgs, ... &#125;:
&#123;
  programs.firefox &#61; &#123;
    enable &#61; true;
    policies &#61; &#123;
      ExtensionSettings.&quot;&#123;d66c8515-1e0d-408f-82ee-2682f2362726&#125;&quot; &#61; &#123;
        default_area &#61; &quot;menupanel&quot;;
        installation_mode &#61; &quot;normal_installed&quot;;
        install_url &#61;
          &quot;https://addons.mozilla.org/firefox&quot;
          &#43; &quot;/downloads/latest/iina-open-in-mpv/latest.xpi&quot;;
      &#125;;
      Handlers.scheme.mpv &#61; &#123;
        action &#61; &quot;useHelperApp&quot;;
        ask &#61; false;
        handlers &#61; &#91; &#123;
          name &#61; &quot;open-in-mpv&quot;;
          path &#61; &quot;&#36;&#123;pkgs.open-in-mpv&#125;/bin/open-in-mpv&quot;;
        &#125; &#93;;
      &#125;;
    &#125;;
  &#125;;
&#125;</code></pre>
<p>Even though Mozzarella is supposed to only show libre add-ons, be aware that the metadata it crawls from <a href="https://addons.mozilla.org">addons.mozilla.org</a> might not always be <a href="https://issues.guix.gnu.org/68361">correct</a>.  Ideally, browser extensions should be packaged in the distribution&#39;s repository, but packaging discipline is not exactly NixOS&#39;s strong suit.  I will probably post an update on how to declare <code>policies.json</code> in Guix once I figure that out.</p>
<h2 id="from_a_feed_reader">From a Feed Reader</h2>
<p>Now we can properly watch videos while browsing the web, but subscribing to YouTube channels on its web interface would require creating an account and subjecting one&#39;s self to more surveillance.  Fortunately, at the time of writing, YouTube still provides Atom <a href="https://en.wikipedia.org/wiki/Web_feed">feeds</a> for syndication. Funny enough, they are advertised on the channel pages as RSS:</p>
<pre><code class="language-html">&lt;link rel&#61;&quot;alternate&quot;
      type&#61;&quot;application/rss&#43;xml&quot;
      title&#61;&quot;RSS&quot;
      href&#61;&quot;https://www.youtube.com/feeds/videos.xml?channel_id&#61;…&quot;&gt;</code></pre>
<p>The referenced feed employs <a href="https://www.rssboard.org/media-rss">Media RSS</a> to communicate the video URL. This extension is widely supported by feed readers, as well as the previously mentioned feed-discovery mechanism.  I use <a href="https://lzone.de/liferea">Liferea</a>, which allows me to directly paste the YouTube channel&#39;s URL<sup id="fnref:ytc">[2]</sup>, and displays each video&#39;s description, thumbnail and enclosed media, e.g.</p>
<p><img src="https://lumvok.store/assets/liferea-youtube.png" alt="Liferea in action" /></p>
<p>For each MIME type, enclosures can be configured to be opened by a user-preferred program.  In this case, I set <code>mpv --ytdl-format&#61;b</code> for <code>application/x-shockwave-flash</code> &#40;a reminiscence of a time when browsers needed <a href="https://ruffle.rs">Flash</a> to play videos and animations&#41; for the <em>second</em> best quality to save some bandwidth.  YouTube encodes the highest resolution video separate from the audio, so the best combined format <code>b</code> is one level lower than yt-dlp&#39;s default best video and best audio together.</p>
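<p>To see what is actually on offer, yt-dlp can list the formats of a video, and the two selectors can then be compared directly.  In this sketch <code>&#36;url</code> stands for any video URL, and <code>bv&#42;&#43;ba/b</code> is, going by yt-dlp&#39;s README, its default selector:</p>
<pre><code class="language-sh"># List every available format with its resolution and codecs
yt-dlp --list-formats &quot;&#36;url&quot;
# Best pre-merged file, as configured above
mpv --ytdl-format&#61;b &quot;&#36;url&quot;
# Best video merged with best audio, falling back to the best combined format
mpv --ytdl-format&#61;&#39;bv&#42;&#43;ba/b&#39; &quot;&#36;url&quot;</code></pre>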
<h2 id="via_clipboard_integration">Via Clipboard Integration</h2>
<p>People also share videos with me via instant messaging. I find it cumbersome to open the URL in the browser then redirect it to the media player, so the clipboard is used as the bridge instead. To do this, I simply create a key binding to the command below.<sup id="fnref:wl">[3]</sup></p>
<pre><code class="language-sh">mpv --ytdl-format&#61;b &quot;&#36;&#40;xclip -out -selection clipboard&#41;&quot;</code></pre>
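<p>On Wayland, a possible equivalent is <code>wl-paste</code> from wl-clipboard; this is only a sketch assuming that package is installed:</p>
<pre><code class="language-sh">mpv --ytdl-format&#61;b &quot;&#36;&#40;wl-paste --no-newline&#41;&quot;</code></pre>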
<h2 id="musing">Musing</h2>
<p>There, I shared how I do it so <a href="https://lumvok.store/blog/youtu">you can too</a>&#33;  If they seem needlessly complex, you share my disappointment on the UX evolution of the mainstream web.  I dream of a more semantic web, not necessarily web 3.0, perhaps just more explicitly typed, where e.g. a YouTube URL for embedding would be a <code>video/webm</code> instead of a <code>text/html</code>.</p>
<p>If <code>mailto</code> URIs can launch our email client, and social media pages can bug us to open the post in their own app, why can&#39;t we have interoperable media handling?  Maybe we should, but I&#39;m not sure if we can. <a href="https://www.searchenginejournal.com/youtube-is-showing-ads-on-non-monetized-channels/388674">Greed</a> stands in our way.  Providers force us to use their proprietary <a href="https://pluralistic.net/2023/01/21/potemkin-ai">malware</a> to consume their service.  <a href="https://www.defectivebydesign.org">DRM</a> has become the foundation of media distribution.  Grassroots movements like <a href="https://framasoft.org">Framasoft</a> might never reach mainstream status.</p>
<p>I don&#39;t mean to tell you to give up though, just to direct your energy to where it matters.  Spend less on developing <a href="https://sr.ht/~benbusby/farside">alternative front-ends</a> than on ethical replacements, bridges and inviting people over. We need more <a href="https://sepiasearch.org">videos</a>, more <a href="https://www.funkwhale.audio">music</a>, more <a href="https://castopod.org">podcasts</a>, more <a href="https://en.wikipedia.org/wiki/Open_access">knowledge</a>, better <a href="https://xmpp.org">instant</a> <a href="https://matrix.org">messaging</a>, better <a href="https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes">search engines</a>, better <a href="https://browser.mt">translations</a>, better <a href="https://www.home-assistant.io">home</a> <a href="https://platypush.tech">automation</a>, and whatnot. Against all odds, maybe things will finally start to improve even for those outside of our bubble.  <a href="https://fe.disroot.org/@mcsinyx/posts/ALaW77HgCSPq4pLxpo">Perchance.</a></p>
<table class="fndef" id="fndef:launch">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">Or a terminal emulator</td>
    </tr>
</table><table class="fndef" id="fndef:ytc">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Something starting with https://www.youtube.com/@</td>
    </tr>
</table><table class="fndef" id="fndef:wl">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">On <a href="https://wayland.social/@compositor/110768798303454842">Wayland</a>, replace <code>xclip</code> with something equivalent</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/youtu@cnx%3E&Subject=Re: Best Ways to Watch YouTube Videos">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/youtu@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/youtu/comments.xml</wfw:commentRss>
</item>
<item>
  <title>XML and Photo Gallery Generation: A Love Story</title>
  <link>https://lumvok.store/blog/pixml/index.html</link>
  <guid>https://lumvok.store/blog/pixml/index.html</guid>
  <description>How I make my photo gallery in XML and what&#39;s lovely about it</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Fri, 17 Mar 2023 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="xml_and_photo_gallery_generation_a_love_story">XML and Photo Gallery Generation: A Love Story</h1>
<blockquote>
<p>I&#39;m just a language, whose style sheets are good<br />Oh, Lord, please, don&#39;t let me be misunderstood</p>
</blockquote>
<div class="admonition note"><p class="admonition-title">Tips</p><p>As usual, the article starts with a text wall of random rambling. If you are only interested in the technical aspects, feel free to skip the first two sections.</p>
</div>
<div class="franklin-toc"><ol><li>Introduction</li><li>Motivation</li><li>Preliminary</li><li>Approach</li><li>Implementation<ol><li>Page Generation</li><li>Feed Generation</li><li>Thumbnail Generation</li></ol></li><li>Discussion</li><li>Conclusion</li></ol></div>
<h2 id="introduction">Introduction</h2>
<p>Neural-optic live streaming probably, no, definitely offers the most photorealistic graphics one can set eyes on.  <a href="https://en.wikipedia.org/wiki/Computer-generated_imagery">CGI</a> is just a pathetic mimic, and photography or videography is no more than a poor plagiarism attempt when compared to quantum ray-tracing and other advanced physics simulations^W happenings.</p>
<p>On the other hand, we humen are rather shite at replaying visual memories, whilst &#40;<a href="https://en.wikipedia.org/wiki/Data_degradation">bit rot</a> aside&#41; media can be archived <a href="https://xkcd.com/1683">for forever</a>.  Besides, many of us are too busy to <em>touch grass</em> or go see cool things as regularly as we wish to.  This is how an industry based on showing us <a href="https://en.wikipedia.org/wiki/Drama">mundane stuff</a> or <a href="https://en.wikipedia.org/wiki/Fiction">obvious bullcrap</a> can still manage to make tens of thousands of <a href="https://antifandom.com/how-i-met-your-mother/wiki/Crapload">craploads</a> each year and why the interwebs are flooded with pictures of cats, kitties and pussies.</p>
<p>Finding new shits means dopamine dispensation and that&#39;s why <a href="https://www.youtube.com/watch?v&#61;1SNRULEnTVQ">they are dope</a>.  As a model netizen, I adhere to the web&#39;s social contract of mutual <a href="https://fe.disroot.org/@mcsinyx">shitposting</a> so that everyone can have a piece.  Every blue moon, I also enjoy posting more quality stuff like what you are reading right now, should you ignore the number of <a href="https://peervideo.club/w/uByA7Czy7PWYMqnu8FgXvW">Mozart</a> references in the last three paragraphs.</p>
<h2 id="motivation">Motivation</h2>
<p>Some other times, I also want to share the living things and sceneries I encounter in the <a href="https://github.com/zig-community/user-map/pull/120">new</a> place.  My camera was gifted by my father before I moved and yet I shared more photos <a href="https://fotofed.nl/cnx">with strangers</a> than with my family.  The PixelFed instance I landed on irreversibly shrank and lossily compressed them, while dumping 5 MB images to the family chat room just feels weird, hence I decided to gather the decency to build a photo gallery to show my loved ones &#40;and admittedly, flex with online strangers&#41;.</p>
<p>There are not many <a href="https://en.wikipedia.org/wiki/Content_management_system">CMS</a> in the wild for photo hosting, and they often act as a walled garden and/or a social network. Building and hosting a new one is quite overkill, thus the obvious solution left would be generating a static site.  Out of the gazillion <a href="https://en.wikipedia.org/wiki/Static_site_generator">SSG</a>, I couldn&#39;t find any that meets my requirements:</p>
<ol>
<li><p>Generate a <a href="https://en.wikipedia.org/wiki/Web_feed">web feed</a></p>
</li>
<li><p>Automate filling <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Img">image</a> title and alt text</p>
</li>
<li><p>Offer fine-grain control for permanent <a href="https://en.wikipedia.org/wiki/Pagination">pagination</a></p>
</li>
<li><p>Generate thumbnails with custom size and name</p>
</li>
</ol>
<p>I mean, they perhaps exist, but the number I had to try and fight through would cost more time than writing the web pages and feed by hand. So I wrote them from scratch.  Y&#39;all can stand up and clap now&#33;</p>
<h2 id="preliminary">Preliminary</h2>
<p>Yes, I really started with writing <a href="https://en.wikipedia.org/wiki/XHTML">XHTML</a> and <a href="https://www.rfc-editor.org/rfc/rfc4287">Atom</a> by hand. A web page has the following structure with namespaces omitted and denoted in WXML &#40;<a href="https://www.draketo.de/software/wisp">Wisp</a>\(\times\)<a href="https://okmij.org/ftp/Scheme/SXML.html">SXML</a>&#41; so I don&#39;t have to close the tags &#40;have I given up on XML too early?-&#41;.</p>
<div class="admonition note"><p class="admonition-title">Syntax hints</p><p>For the uninitiated, any indentation or colon in Wisp represents an additional nest level, while a dot escapes the nesting.  The at signs are used by SXML to denote attributes, which may remind you of <a href="https://www.w3.org/TR/xpath">XPath</a>. For example, the anchor to the previous page is <code>&lt;a href&#61;41&gt;PREV&lt;/a&gt;</code>.</p>
</div>
<pre><code class="language-julia">html
  head
    link
      @ : rel &quot;alternate&quot;
          type &quot;application/atom&#43;xml&quot;
          href &quot;/atom.xml&quot;
    ...
  body
    nav
      a : @ : href &quot;41&quot;
        . &quot;PREV&quot;
      h1 &quot;PAGE 42&quot;
      a : @ : href &quot;43&quot;
        . &quot;NEXT&quot;
    article
      @ : id &quot;foobar&quot;
      h2
        a : @ : href &quot;#foobar&quot;
          . &quot;foobar&quot;
      a : @ : href &quot;/42/foo.jpg&quot;
          img
            @ : src &quot;/42/foo.small.jpg&quot;
                alt &quot;pic of foo&quot;
                title &quot;pic of foo&quot;
      a : @ : href &quot;/42/bar.jpg&quot;
          ...
    article ...
    ...
    footer ...</code></pre>
<p>So far, adding an <code>article</code> is not yet too cumbersome, there&#39;s only a bit of redundancy for permanent links and the nesting level is acceptable with the deepest being <code>/html/body/article/a/img</code>.  It gets more repetitive once we publish it to the linked Atom feed:</p>
<pre><code class="language-julia">feed
  entry
    link
      @ : rel &quot;alternate&quot;
          type &quot;application/xhtml&#43;xml&quot;
          href &quot;https://gallery.example/42/#foobar&quot;
    id &quot;https://gallery.example/42/#foobar&quot;
    title &quot;foobar&quot;
    content
      @ : type &quot;xhtml&quot;
      div
        img
          @ : src &quot;https://gallery.example/42/foo.jpg&quot;
              alt &quot;pic of foo&quot;
              title &quot;pic of foo&quot;
        img ...
    updated ...
  entry ...
  ...</code></pre>
<p>Since web feeds are standalone documents, they must always use absolute URLs. &#40;Welp that&#39;s not entirely true, <a href="https://www.w3.org/TR/xmlbase">XML Base</a> does exist, but not all readers support it, and more importantly, certain elements such as <code>atom:id</code> disallow relative references.&#41;  In addition, whilst the web page links a thumbnail to the original image to save bandwidth, the feed can be consumed one post at a time, which thus points to the full size version.  Therefore, copying the markup to embed it inside the Atom is error-prone and doesn&#39;t exactly spark joy.</p>
<div class="admonition note"><p class="admonition-title">Fun fact</p><p>What does spark joy is that we can embed XHTML directly into the web feed, which means the content is still XML and we don&#39;t need to quote it in CDATA. For other sites where contents don&#39;t accumulate up to hundreds of megabytes, this will allow us to slap some &#40;SPOILER ALERT&#33;&#41; stylesheet on the Atom feed and let the user agent render it in a <a href="https://simonesilvestroni.com/blog/build-a-human-readable-rss-with-jekyll">human-readable form</a>.</p>
</div>
<h2 id="approach">Approach</h2>
<p>I actually already spoiled it in the epigraph,<sup id="fnref:spoiler">[1]</sup> but for the sake of completeness let us <a href="https://xkcd.com/1445">discuss a few possible solutions</a>. What I wanted was to reduce the redundancy of manual input, in other words, a system transforming a custom information-dense format to standard yet sparser ones, which in this case are XHTML and Atom.  Given some new photos and their relevant data, the purpose was to minimize the publishing friction.</p>
<p>It&#39;s worth mentioning that the goal was not to minimize just the input format, the transformation speed, or the feedback latency, but all of the above, plus the cost of constructing the tool, incrementally as our requirements slightly change over time.  Our choice for the base <a href="https://programming-journal.org/2023/7/13">programming system</a> shall affect each and every one of these aspects and more.</p>
<p>Some technical dimensions are <a href="https://en.wikipedia.org/wiki/Animal_Farm">more equal</a> than others, though. For this use case, IMHO an immediate feedback loop should be given the number one priority, not only because it&#39;d be frustrating to have to complete multiple rituals just to preview the changes, but also as watching and reflecting file system changes is &#40;sadly still&#41; a difficult problem.</p>
<p>For Linux<sup id="fnref:interjection">[2]</sup> there&#39;s <a href="https://man7.org/linux/man-pages/man7/inotify.7.html">inotify</a> which doesn&#39;t suck, except when it does and misses events,<sup id="fnref:entr">[3]</sup> and the standard POSIX build tool <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html">make</a> relies on <a href="https://apenwarr.ca/log/20181113">mtime which is also flaky</a>.  Some SSGs work around this by spinning up a server with a more sophisticated caching mechanism, and some even include an HTTP server sending out refresh events. Implementing such a system is easily <a href="https://xkcd.com/1319">more expensive</a> than doing the original task manually.</p>
<p>Luckily, there is another way.  <em>After</em> the birth of imperative DOM manipulation programs running on VMs inside browsers &#40;Ecma scripts&#41;, there came a &#40;now forgotten&#41; art of purely functional DOM transformation. More specifically, <a href="https://www.w3.org/standards/xml/transformation">XSLT</a> can declaratively transform any XML document to another, and its best part is that modern browsers natively support it, i.e. there&#39;s no difference between editing the input document and the hypothetical output XHTML.  For better portability and rendering performance, we can still generate the latter ahead-of-time &#40;AoT&#41; during deployment.</p>
<h2 id="implementation">Implementation</h2>
<p>Going back to the example, the input format could boil down to a more concise XML file, e.g. <code>42/index.xml</code>:</p>
<pre><code class="language-julia">page
  @ : prev &quot;41&quot;
      curr &quot;42&quot;
      next &quot;43&quot;
  post
    @ : title &quot;foobar&quot;
        time ...
    picture
      @ : filename &quot;foo&quot;
          desc &quot;pic of foo&quot;
    picture ...
    ...
  post ...
  ...</code></pre>
<h3 id="page_generation">Page Generation</h3>
<p>The stylesheet should then be declared at the beginning of the file, so that the user agent can automatically fetch and apply it to render the output XHTML:</p>
<pre><code class="language-julia">&lt;?xml-stylesheet href&#61;&quot;/page.xslt&quot; type&#61;&quot;text/xsl&quot;?&gt;</code></pre>
<p>XSLT is essentially a templating language, similar to PHP &#40;which is also older&#41; and template libraries in your favorite languages.  For the ease of reading, I will let the target document&#39;s namespace be the default, while aliasing the transformation one as <code>xsl</code>.  The stylesheet for the web pages would look something like the following, which should be self-explanatory.</p>
<pre><code class="language-julia">xsl:stylesheet
  xsl:template : @ : match &quot;/page&quot;
    xsl:variable : @ : name &quot;base&quot;
      xsl:text &quot;/&quot;
      xsl:value-of : @ : select &quot;@curr&quot;
      xsl:text &quot;/&quot;
    html
      head ...
      body
        nav
          xsl:if : @ : test &quot;@prev &#33;&#61; &#39;&#39;&quot;
            a : @ : href &quot;/&#123;@prev&#125;/&quot;
              . &quot;PREV&quot;
          h1 : xsl:text &quot;PAGE &quot;
               xsl:value-of : @ : select &quot;@curr&quot;
          xsl:if : @ : test &quot;@next &#33;&#61; &#39;&#39;&quot;
            ...
        xsl:for-each : @ : select &quot;post&quot;
          xsl:variable : @ : name &quot;id&quot;
            xsl:value-of
              @ : select &quot;translate&#40;@title, &#39; &#39;, &#39;-&#39;&#41;&quot;
          article
            @ : id &quot;&#123;&#36;id&#125;&quot;
            h2
              a : @ : href &quot;#&#123;&#36;id&#125;&quot;
                  xsl:value-of : @ : select &quot;@title&quot;
            xsl:for-each : @ : select &quot;picture&quot;
              a : @ : href &quot;&#123;&#36;base&#125;&#123;@filename&#125;.jpg&quot;
                  img
                    @ : src &quot;&#123;&#36;base&#125;&#123;@filename&#125;.small.jpg&quot;
                        alt &quot;&#123;@desc&#125;&quot;
                        title &quot;&#123;@desc&#125;&quot;
        footer ...</code></pre>
<h3 id="feed_generation">Feed Generation</h3>
<p>Similarly, for Atom entries on a single page,</p>
<pre><code class="language-julia">xsl:stylesheet
  xsl:variable : @ : name &quot;root&quot;
    . &quot;https://gallery.example/&quot;
  xsl:template : @ : match &quot;/page&quot;
    xsl:variable : @ : name &quot;base&quot;
      xsl:value-of : @ : select &quot;&#36;root&quot;
      xsl:value-of : @ : select &quot;@curr&quot;
      xsl:text &quot;/&quot;
    xsl:for-each : @ : select &quot;post&quot;
      xsl:variable : @ : name &quot;url&quot;
        xsl:value-of : @ : select &quot;&#36;base&quot;
        xsl:text &quot;#&quot;
        xsl:value-of
          @ : select &quot;translate&#40;@title, &#39; &#39;, &#39;-&#39;&#41;&quot;
      entry
        link
          @ : rel &quot;alternate&quot;
              type &quot;application/xhtml&#43;xml&quot;
              href &quot;&#123;&#36;url&#125;&quot;
        id : xsl:value-of : @ : select &quot;&#36;url&quot;
        title : xsl:value-of : @ : select &quot;@title&quot;
        content
          @ : type &quot;xhtml&quot;
          div
            xsl:for-each : @ : select &quot;picture&quot;
              img
                @ : src &quot;&#123;&#36;base&#125;&#123;@filename&#125;.jpg&quot;
                    alt &quot;&#123;@desc&#125;&quot;
                    title &quot;&#123;@desc&#125;&quot;
        updated : xsl:value-of : @ : select &quot;@time&quot;</code></pre>
<p>The trickier part here is concatenating the entries together. Simple enough, instead of linking to the stylesheet in the data, we can read XML files directly from XSLT.</p>
<pre><code class="language-julia">xsl:template
  @ : match &quot;/&quot;
  ...
  xsl:apply-templates
    @ : select &quot;document&#40;&#39;42/index.xml&#39;&#41;/page&quot;
  xsl:apply-templates ...
  ...</code></pre>
<p>This allows us to do other cool things, such as embedding SVG in XHTML to make use of the parent element&#39;s <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/color_value#currentcolor_keyword">currentcolor</a>, while keeping the source files separate.  It is especially useful for monochromatic icons, e.g.</p>
<pre><code class="language-julia">xsl:copy-of : @ : select &quot;document&#40;&#39;cc.svg&#39;&#41;/*&quot;
xsl:copy-of : @ : select &quot;document&#40;&#39;by.svg&#39;&#41;/*&quot;
xsl:copy-of : @ : select &quot;document&#40;&#39;sa.svg&#39;&#41;/*&quot;</code></pre>
<h3 id="thumbnail_generation">Thumbnail Generation</h3>
<p>So far, we have met three out of the four requirements; the only thing left is creating the thumbnails.  Inspired by Ethan Dalool, I am going for <a href="https://voussoir.net/writing/sharing_photos">fairly large ones of 1024 px in width</a>,</p>
<blockquote>
<p>large enough to comfortably browse the photos without clicking through to the big version of each, and the thumbnails are decently light and not too jpeggy at about 125-150 kilobytes on average.</p>
</blockquote>
<p>At such a size, I can aim for around ten photoes<sup id="fnref:toes">[4]</sup> per page while maintaining a somewhat decent load time.  Plus, since the width of the images is hardcoded, the page <a href="https://en.wikipedia.org/wiki/Margin_&#40;typography&#41;">margin</a> can be automatically inferred to never stretch them.</p>
<pre><code class="language-css">html &#123;
    box-sizing: border-box;
    margin: auto;
    max-width: calc&#40;1024px &#43; 2ch&#41;;
&#125;
body &#123; margin: 0 1ch &#125;</code></pre>
<p>To generate the thumbnails, I use <a href="https://github.com/mattes/epeg">epeg</a> together with <code>make</code> for wildcarding:</p>
<pre><code class="language-julia">PICTURES :&#61; &#36;&#40;filter-out &#37;.small.jpg &#36;&#40;PREFIX&#41;/&#37;.jpg, &#36;&#40;wildcard */*.jpg&#41;&#41;
THUMBNAILS :&#61; &#36;&#40;patsubst &#37;.jpg,&#37;.small.jpg,&#36;&#40;PICTURES&#41;&#41;

&#37;.small.jpg: &#37;.jpg
	epeg -w 1024 -p -q 80 &#36;&lt; &#36;@</code></pre>
<p>The Makefile also defines rules for AoT compilation using <a href="https://gnome.pages.gitlab.gnome.org/libxslt/xsltproc.html">xsltproc</a> for the web pages and feed.  Apparently no feed reader supports XSLT, and for pages, runtime processing negatively affects the performance due to the multiple round trips for the stylesheet and the vector icons.</p>
<pre><code class="language-julia">DATA :&#61; &#36;&#40;wildcard */index.xml&#41; index.xml
PAGES :&#61; &#36;&#40;patsubst &#37;.xml,&#37;.xhtml,&#36;&#40;DATA&#41;&#41;
OUTPUTS :&#61; &#36;&#40;THUMBNAILS&#41; &#36;&#40;PAGES&#41; atom.xml

all: &#36;&#40;OUTPUTS&#41;

index.xml: &#36;&#40;LATEST&#41;/index.xml
	ln -fs &#36;&lt; &#36;@

&#37;.xhtml: &#37;.xml page.xslt
	xsltproc page.xslt &#36;&lt; &gt; &#36;@

atom.xml: atom.xslt &#36;&#40;DATA&#41; &#36;&#40;wildcard *.svg&#41;
	xsltproc atom.xslt &gt; atom.xml</code></pre>
<p>The <a href="https://trong.loang.net/~cnx/px">full implementation</a> is deployed to <a href="https://px.lumvok.store">px.lumvok.store</a>, mirrored to the <a href="https://www.opennic.org">OpenNIC</a> domain <a href="https://pix.sinyx.indy">pix.sinyx.indy</a> reusing the former&#39;s TLS certificate, because CA/Browser Forum disallows support for domains not recognized by ICANN and no <a href="https://wiki.opennic.org/opennic/tls">CA for OpenNIC</a> is mature enough.</p>
<h2 id="discussion">Discussion</h2>
<blockquote>
<p><em>Okay you built your site using XML macros, so what? The syntax is clunky and you hate it so much yourself that not even a single line of code example here is in actual XML. Doesn&#39;t seem like a love story to me&#33;</em></p>
</blockquote>
<p>Like all relationships, it&#39;s not that simple.  I&#39;ve learned to not judge a book by its cover and come to the understanding that XML is the &#40;ugly&#41; equivalent of <a href="https://en.wikipedia.org/wiki/S-expression">sexp</a>.<sup id="fnref:sex">[5]</sup>  Unlike afterthoughts such as C preprocessors, <a href="https://docs.djangoproject.com/en/dev/topics/templates">Django</a>-like templates, or even the Wisp-lookalike syntax of <a href="https://github.com/slim-template/slim">Slim</a>, XML stylesheets are written in the same data structure as the documents they transform.  To put it another way, one can use XSLT to generate XSLT from XSLT.  Do I need it in this case or ever at all?  Probably not, but that certainly makes XSL a lot more attractive in my eyes.</p>
<p>Furthermore, the tooling for XML is highly mature, from editors to linters and processors to rendering engines.  It&#39;d be lying to say you ain&#39;t fascinated that tis possible to directly feed browsers pure data instead of markup representations.  More than that, one can have entirely static API endpoints that are both human- and machine-readable.</p>
<blockquote>
<p><em>XSL is just declarative JS&#33;  You are so blinded by your lust for functional programming that you have become <a href="https://lumvok.store/blog/reply">the very thing you swore to destroy</a>&#33;</em></p>
</blockquote>
<p>My distaste for Ecma scripts is not due to DOM manipulation. Sure, I do find in-place modification inelegant for documents, but if only that were the only issue.  I block them on most sites because they can interact with many things other than just the DOM, imposing <a href="https://en.wikipedia.org/wiki/Mouse_tracking">privacy</a> and <a href="https://react-etc.net/entry/exploiting-speculative-execution-meltdown-spectre-via-javascript">security</a> risks while <a href="https://meta.stackexchange.com/q/2980/698165">fucking up the UX</a>.</p>
<p>Architecturally, Ecma scripts enable the absolute bloody worst possible kind of web pages with zero data at all, fetching tiny pieces of content in JSON and turning performance <a href="https://unixsheikh.com/articles/so-called-modern-web-developers-are-the-culprits.html">to shit</a>.  The user agents then try to salvage efficiency by turning themselves into a distributed system component and adding optimizations that shall never be &#40;ab&#41;used for the sake of users. O ye <a href="https://en.wikipedia.org/wiki/Wirth&#37;27s_law">cycle of doom</a>&#33;</p>
<p>Note that one can make a similar mistake with XSL regarding the number of round trips, and XML stylesheets can provide the same front-end/back-end separation.  Both can be used to provide hot loading during development and AoT rendering in production &#40;if not all, then many JS libraries support pre-rendering, ignoring the monstrous <a href="https://lumvok.store/blog/dedep">dependency graph</a>&#41;. At the end of the day, it&#39;s not the matter of technology but principle: to be in the <a href="https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys">users&#39; best interest</a>.</p>
<blockquote>
<p><em>There is nothing complex about the photo gallery, any existing SSG can do the same with minor tweaks&#33; You never needed to write a new one to begin with&#33;</em></p>
</blockquote>
<p>I am wondering the same myself, but keep in mind there are details I&#39;ve been leaving out of the example.  I went all-in for the semantic web with the hope for best portability and accessibility.  One thing I haven&#39;t mentioned is the <code>lang</code> attribute, e.g. <code>en</code>, <code>vi</code> or <code>fr</code> depending on the post.  Adding this to the web pages requires the SSG to be somewhat modular, and it is even harder for the web feed.</p>
<p>Moreover, generic SSGs are not designed to handle the difference in content between a page&#39;s <code>article</code> and the feed&#39;s corresponding <code>entry</code>, nor to have multiple posts in a single page.  Pagination is also commonly implemented backwards, i.e. page 2 being the second latest one, making it impossible to avoid link rot.</p>
<p>Not to suggest that the majority of SSGs are poorly designed, just that beyond a certain amount of <a href="https://guide.handmade-seattle.com/c/2021/context-is-everything">context</a> difference, tis cheaper to just redesign from scratch.  This is not about XSL vs Go/Python/JS for SSG or web dev in general, but this specific and happen-to-be-far-from-complex case.</p>
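<p>The link rot argument can be illustrated with a small Python sketch.  It assumes posts are kept in chronological order and numbers pages from the oldest; the function name <code>paginate</code> and the page size are made up for the demonstration and are not taken from the gallery&#39;s implementation.</p>

```python
def paginate(posts, per_page=10):
    """Split chronologically ordered posts into pages numbered from 1.

    Page 1 always holds the oldest posts, so a full page never changes
    when newer posts are appended and links to it stay valid.
    """
    starts = range(0, len(posts), per_page)
    return {number: posts[start:start + per_page]
            for number, start in enumerate(starts, 1)}


before = paginate(list(range(25)))
after = paginate(list(range(31)))  # six more posts got published
# Full pages are untouched, only the last page grows or a new one appears.
print(before[1] == after[1], before[2] == after[2])
print(after[4])
```

<p>With backwards numbering, every new page would instead shift the content behind all existing page URLs.</p>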
<h2 id="conclusion">Conclusion</h2>
<p>At the time of writing, XML has pretty much been superseded by JSON or YAML, for better or worse.  I have no love for YAML for obvious reasons, but it also saddens me to sometimes see JSON being solely used as a container for HTML.  I hope that this essay can <a href="https://www.youtube.com/watch?v&#61;F3QPWrLFsOA">awaken something in you</a> about XML and remind you about the semantic web in your next project.  It worked out for me, maybe it&#39;ll work out for you too&#33;</p>
<p>The story between XML and my photo gallery is a fond love story. They were born for each other, there was no drama, everything just werkt. Their romance inspires me to better appreciate stability and maturity, and value those right in front of my eyes that I had been <em>too blind to see</em>. Anyway, this is getting too long, so Imma end it with another <a href="https://www.youtube.com/watch?v&#61;5LvOdWi3Qno">song</a>.</p>
<blockquote>
<p>Lookin&#39; for perfect<br />Surrounded by artificial<br />You&#39;re the closest thing to real I&#39;ve seen<br />Sure, everyone has their problems<br />That&#39;s a given<br />Yours are the easiest to tolerate</p>
</blockquote>
<table class="fndef" id="fndef:spoiler">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">If you know, you know.</td>
    </tr>
</table><table class="fndef" id="fndef:interjection">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Yup, just the kernel.</td>
    </tr>
</table><table class="fndef" id="fndef:entr">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">But in case it works for you, check out <a href="https://eradman.com/entrproject">entr</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:toes">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content"><em>Thumb</em>nails, pho<em>toes</em>, get it?-&#41;</td>
    </tr>
</table><table class="fndef" id="fndef:sex">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Or conventionally in most Lisp 1&#39;s, <code>sex?</code>.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/pixml@cnx%3E&Subject=Re: XML and Photo Gallery Generation: A Love Story">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/pixml@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/pixml/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Comments for Static Sites without JavaScripts</title>
  <link>https://lumvok.store/blog/reply/index.html</link>
  <guid>https://lumvok.store/blog/reply/index.html</guid>
  <description>Comments for Static Sites without JavaScript via Emails</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Sun, 09 Jan 2022 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="comments_for_static_sites_without_javascripts">Comments for Static Sites without JavaScripts</h1>
<blockquote>
<p>I&#39;m open for criticism<br />But really, is it any room for criticism?</p>
</blockquote>
<p>Recently, I&#39;ve switched my <a href="https://en.wikipedia.org/wiki/Web_feed">feed</a> reader from <a href="https://newsboat.org">Newsboat</a> to <a href="https://lzone.de/liferea">Liferea</a>. The latter has a GUI and some extra features which make the experience a lot more comfy.  For instance, custom enclosure handling lets me finally migrate all of my YouTube subscriptions to <a href="https://en.wikipedia.org/wiki/Atom_&#40;Web_standard&#41;">Atom</a> and <em>conveniently</em> browse and watch videos using <a href="https://mpv.io">mpv</a>.  Image support also allows me to directly view web comics.<sup id="fnref:image">[1]</sup>  One of them, <a href="https://themonsterunderthebed.net">The Monster Under the Bed</a>,<sup id="fnref:nsfw">[2]</sup> does not embed the strips in its feed, but it has comments.</p>
<p>Yes, <a href="https://www.rssboard.org/rss-specification">RSS</a> includes support for <code>&lt;comments&gt;</code>, and I was not aware of it until <a href="https://nixnet.social/notice/AEO3fYbuzYCJl85eD2">very recently</a>.  I suppose many other people late to the &#40;web feed&#41; party aren&#39;t either.  Since the rise of static sites, feeds have regained popularity, enough for <a href="https://www.theregister.com/2021/05/20/google_rss_chrome_android">Google to reconsider its direction</a>.  Compared to RSS or Atom, alternatives have the following shortcomings:</p>
<ul>
<li><p><a href="https://en.wikipedia.org/wiki/Usenet">Usenet</a> is generally obsolete to most people.</p>
</li>
<li><p><a href="https://en.wikipedia.org/wiki/Mailing_list">Mailing list</a> messages are immutable.</p>
</li>
<li><p>Fora and social media are silos.<sup id="fnref:silo">[3]</sup></p>
</li>
<li><p>Social media are designed for ephemeral discussions.</p>
</li>
<li><p>Instant messaging is awful for archival.</p>
</li>
</ul>
<p>On the other hand, news feeds are commonly read-only: only a few readers can render comments and even fewer are able to post one.  On the server side, a dynamic server is needed to accept comments.  Traditionally, it&#39;s the same as the system serving the website.  Although this works, it is significantly more costly than a server dedicated to static sites, which scale a lot better.</p>
<p><a href="https://en.wikipedia.org/wiki/Hacker">Hackers</a> have come up with multiple workarounds such as using <a href="https://carlschwan.eu/2020/12/29/adding-comments-to-your-static-blog-with-mastodon">microblogging</a> or <a href="https://cactus.chat">instant messaging</a> to add comments to their static sites, but all require client-side code execution, which is an option for neither RSS nor Atom.  Furthermore, <a href="https://unixsheikh.com/articles/so-called-modern-web-developers-are-the-culprits.html">JavaScript hurts portability and performance</a> on the WWW, hence it should be avoided unless it is absolutely impossible to implement a feature otherwise.  Commenting is not an exception.</p>
<p>Following is my adventure implementing a comment section for this very blog. If you&#39;re also up to the task, I think you should view what I did as an inspiration &#40;rather than a reference&#41; and don&#39;t be afraid to experiment around until satisfaction.</p>
<div class="franklin-toc"><ol><li>Choosing Back-End</li><li>Designing Data Flow</li><li>Implementation<ol><li>Accepting Replies</li><li>Rendering Comments</li><li>Injecting Comments</li></ol></li><li>Moderation</li></ol></div>
<h2 id="choosing_back-end">Choosing Back-End</h2>
<p>As mentioned earlier, static sites or not, there still needs to be a dynamic component to accept incoming replies.  HTTP requests would be the most portable since all netizens obviously have a web browser, but those are what we&#39;re trying to replace here.  What else does everyone have nowadays? Something so common that it can be used to identify people upon service registrations?  Exactly, emails and phone numbers&#33;</p>
<p>OK, Imma stop horsing around.  My back-end of choice would be emails. It&#39;s global, it&#39;s cheap and federated.  Cellular services almost fit the bill, except that they would cost an arm and a leg for one to comment around the web every day via SMS, whose character limit does not facilitate thoughtful discussions either.  As for forums, social media or instant messaging, no platform has nearly as large a user base as electronic mail.</p>
<p><img src="https://lumvok.store/assets/html5-js.png" alt="HTML is often a trojan horse for JavaScript" /></p>
<p>It&#39;s not like any email would fit the comment section though.  Especially not the HTML kind with a few hundred kilobytes of embedded CSS, JS and non-content images.  From the security standpoint alone &#39;tis already a no-go.  A light markup language like Markdown<sup id="fnref:mime">[4]</sup> would be much better.</p>
<p>One great thing about using a mature technology like email is that we have all use cases covered.  Filtering, exporting and parsing emails work out-of-box regardless of one&#39;s provider, <a href="https://en.wikipedia.org/wiki/Email_client">MUA</a> and programming preferences.  I have a SourceHut account with which I can create mailing lists on-demand so I&#39;m using it; however there&#39;s no reason exporting from your private inbox should be any more difficult, presuming you have set up <a href="https://drewdevault.com/2021/05/17/aerc-with-mbsync-postfix.html">offline email</a>.</p>
<div class="admonition note"><p class="admonition-title">Tips and tricks</p><p>Speaking of SourceHut, exporting a mailing list archive is rather easy, one could either use the button on the web UI or download from the API. As the operation is not exactly cost-free, the former is protected by a <a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">CSRF</a> token and the latter by <a href="https://man.sr.ht/meta.sr.ht/oauth.md">OAuth 2.0</a>.  If you are a fellow <a href="https://sr.ht">sr.ht</a> user, you can use <a href="https://man.sr.ht/builds.sr.ht/manifest.md#tasks">acurl</a> on the build service with the URL from the <a href="https://lists.sr.ht/graphql">GraphQL</a> <code>query &#123; me &#123; lists &#123; results &#123; name, archive &#125; &#125; &#125; &#125;</code>.</p>
</div>
<div class="admonition note"><p class="admonition-title">Update</p><p>I stopped paying for sr.ht in May 2024 after years of Sourcehut failing to show any measurable progress towards reaching the beta status. I am now using public-inbox for public, eh, inboxes.</p>
</div>
<h2 id="designing_data_flow">Designing Data Flow</h2>
<p>I promise, this sounds bigger than it really is, but first, let&#39;s have a glance at how static generators work.  Typically, there are three times templating happens:</p>
<ol>
<li><p>Conversion of individual articles into HTML <em>content</em></p>
</li>
<li><p>Inserting each article content in a page template to create a complete HTML document</p>
</li>
<li><p>Inserting multiple HTML contents into one RSS or Atom feed template</p>
</li>
</ol>
<p>At completion, two kinds of output are generated: website and web feed. Similarly, comments have to be rendered for both targets: an HTML comment section for web browsing and a separate RSS feed for each article&#39;s <code>&lt;wfw:commentRss&gt;</code>.<sup id="fnref:wfw">[5]</sup>  Therefore, injections should be done separately at stages 2 and 3.  The overall process of static site generation with email comments is illustrated as follows.</p>
<p><img src="https://lumvok.store/assets/formbox.svg" alt="Data transformation during generation process" /></p>
<p>For clarity, HTML and RSS input templates for comments and their parent page and web feed are omitted.  Path to each <em>comment feed</em> output being injected in the respective <em>web feed item</em> is also not shown in the figure.</p>
<h2 id="implementation">Implementation</h2>
<p>At the time of writing, this personal website of mine was generated by <a href="https://julialang.org">Julia</a> <a href="https://franklinjl.org">Franklin</a>, which was neither fast<sup id="fnref:speed">[6]</sup> nor <a href="https://github.com/tlienart/Franklin.jl/issues/936">semantic</a>, but was the only one I knew supporting LaTeX prerendering out of the box. Franklin is also rather <a href="https://franklinjl.org/syntax/utils">extendable</a> via Julia functions.</p>
<h3 id="accepting_replies">Accepting Replies</h3>
<p>Let&#39;s start with how each article can be programmatically and uniquely identified.  By default in RSS, a <a href="https://www.rssboard.org/rss-profile#element-channel-item-guid">GUID</a><sup id="fnref:guid">[7]</sup> is the permanent URL of the associated web page.  I am not exactly a creative person, so I mirrored this idea, although I only used the difference between URLs, i.e. minus the scheme, network location and trailing <code>index.html</code> &#40;Franklin always appends it to the target path of any source file that is neither <code>index.md</code> nor <code>index.html</code>&#41;:</p>
<pre><code class="language-julia">dir_url&#40;&#41; &#61; strip&#40;dirname&#40;locvar&#40;:fd_url&#41;&#41;, &#39;/&#39;&#41;
message_id&#40;&#41; &#61; &quot;&#37;3C&#36;&#40;dir_url&#40;&#41;&#41;@cnx&#37;3E&quot;</code></pre>
<p>For maximum portability, threading identification is used in emails&#39; <code>In-Reply-To</code> header, which expects a message ID, which must match <code>&lt;.&#43;@.&#43;&gt;</code>.  Once again, to avoid having to think, I opted for the path difference for the left hand side and my nickname <code>cnx</code> for the right.  The <code>mailto</code> URI could then be constructed accordingly:</p>
<pre><code class="language-julia">using Printf: @sprintf

function hfun_mailto_comment&#40;&#41;
  @sprintf&#40;&quot;mailto:&#37;s?&#37;s&#61;&#37;s&amp;&#37;s&#61;Re: &#37;s&quot;,
           &quot;cnx.site@loa.loang.net&quot;,
           &quot;In-Reply-To&quot;, message_id&#40;&#41;,
           &quot;Subject&quot;, locvar&#40;:title&#41;&#41;
end</code></pre>
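<p>For readers more at home in Python, the same URI can be sketched as follows.  This is an illustration rather than what the site runs: the function name <code>mailto_comment</code> is hypothetical, and unlike the Julia snippet it percent-encodes the subject, which is an extra precaution I am assuming to be desirable for titles containing spaces or ampersands.</p>

```python
from urllib.parse import quote


def mailto_comment(address, path, title):
    """Build a mailto URI whose reply is threaded under the article."""
    # %3C and %3E are the percent-encoded angle brackets of a message ID.
    message_id = f"%3C{path}@cnx%3E"
    # Percent-encode the subject so that spaces and ampersands
    # in the title cannot break the URI.
    subject = quote(f"Re: {title}")
    return f"mailto:{address}?In-Reply-To={message_id}&Subject={subject}"


uri = mailto_comment("cnx.site@loa.loang.net", "blog/reply",
                     "Comments for Static Sites without JavaScripts")
print(uri)
```

<p>Mail clients are expected to decode the header values before composing the reply, so the message ID ends up wrapped in angle brackets again.</p>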
<p>The anchor was then added to the page foot:</p>
<pre><code class="language-html">&lt;a href&#61;&quot;&#123;&#123;mailto_comment&#125;&#125;&quot;
   title&#61;&quot;Reply via email&quot;&gt;&#123;&#123;author&#125;&#125;&lt;/a&gt;</code></pre>
<h3 id="rendering_comments">Rendering Comments</h3>
<p>This is when the fun begins.  Julia&#39;s standard library does not include an email parser, and I doubt your favorite language does either, unless it is named after a British comedy troupe.  Python is often described as <em>batteries included</em>, or at least it used to &#40;seemingly the consensus among current core devs has shifted towards <a href="https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068">favoring third-party libraries</a>&#41;.</p>
<div class="admonition note"><p class="admonition-title">Off-topic rambling</p><p>Standard library inclusion wasn&#39;t really the deal breaker here though. I still needed a Markdown engine and an HTML sanitizer &#40;because Markdown can include HTML&#41;, and AFAICT no stdlib has them.  The real issue was with the lack of Julia packaging on most distributions &#40;apart from Guix&#41;, and most certainly <a href="https://github.com/NixOS/nixpkgs/issues/20649">not on NixOS</a>, my current distro.  For the same reason the idea of rewriting Franklin in Python has been running in my head for a while now.  Python packaging is much more downstream-friendly and, unlike Julia, compilation overhead is almost non-existent.</p>
</div>
<p>On the other hand, it&#39;s trivial to pipe an external program&#39;s output to Julia, e.g. <code>readchomp&#40;&#96;echo foo bar&#96;&#41;</code> would give you the string &quot;foo bar&quot;.  Thus, the to-be-written <em>comment generator</em> should take &#40;the path to&#41; a mail box, the message ID of the article and a template, and write the result to stdout. Argument parsing is, again, thankfully in Python&#39;s stdlib:</p>
<pre><code class="language-python">from argparse import ArgumentParser
from pathlib import Path
from urllib.parse import unquote

parser &#61; ArgumentParser&#40;&#41;
parser.add_argument&#40;&#39;mbox&#39;&#41;
parser.add_argument&#40;&#39;id&#39;, type&#61;unquote&#41;
parser.add_argument&#40;&#39;template&#39;, type&#61;Path&#41;
args &#61; parser.parse_args&#40;&#41;</code></pre>
<p>I then parsed the <a href="https://datatracker.ietf.org/doc/html/rfc4155">mbox</a> into a mapping indexed by parent message IDs as follows.  The IDs in the archive are not percent-encoded, which is why I needed to URL-unquote the input message ID as well.</p>
<pre><code class="language-python">from collections import defaultdict
from email.utils import parsedate_to_datetime
from mailbox import mbox

date &#61; lambda m: parsedate_to_datetime&#40;m&#91;&#39;Date&#39;&#93;&#41;.date&#40;&#41;
archive &#61; defaultdict&#40;list&#41;
for message in sorted&#40;mbox&#40;args.mbox&#41;, key&#61;date&#41;:
    archive&#91;message&#91;&#39;In-Reply-To&#39;&#93;&#93;.append&#40;message&#41;</code></pre>
<p>As said earlier, arbitrary HTML content is not exactly suitable for comments. However, it is undeniable that HTML emails have taken over the world and compromises must be made: allowing <code>multipart/alternative</code> of both <code>text/plain</code> and <code>text/html</code>.  It is not the only multipart type either: attachments and cryptographic signatures are carried the same way.  Since we are only interested in the plaintext part, it is actually easier done than said to extract it:</p>
<pre><code class="language-python">from bleach import clean, linkify
from markdown import markdown

def get_body&#40;message&#41;:
    if message.is_multipart&#40;&#41;:
        for payload in map&#40;get_body, message.get_payload&#40;&#41;&#41;:
            if payload is not None: return payload
    elif message.get_content_type&#40;&#41; &#61;&#61; &#39;text/plain&#39;:
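        # get_payload&#40;decode&#61;True&#41; returns bytes, while markdown expects str
        body &#61; message.get_payload&#40;decode&#61;True&#41;.decode&#40;&#41;
        return clean&#40;linkify&#40;markdown&#40;body, output_format&#61;&#39;html5&#39;&#41;&#41;,
                     tags&#61;..., protocols&#61;...&#41;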
    return None</code></pre>
<p>Now all that&#39;s left is to render that body and relevant headers as an HTML segment or an RSS item.  This is when we revisit the template. Jinja is probably the most popular in Python, thanks to Django and Flask, but its complexity is rather unnecessary.  Instead, I went with the built-in <code>str.format</code>.</p>
<p><img src="https://lumvok.store/assets/format.jpg" alt="Double braces are brilliant, but I prefer single ones" /></p>
<p>What are templates for, exactly?  Not the complete document, apparently, because that would differ from article to article and increase the complexity of injection.  Nor a single comment, as comments are threaded into trees &#40;or a forest&#41; and their relationship can be useful.  We gotta <a href="https://genius.com/Timbaland-meet-in-tha-middle-lyrics">meet in tha middle</a> and use recursive templates instead, e.g. for nested comments:</p>
<pre><code class="language-html">&lt;div class&#61;comment&gt;
  ...
  &#123;children&#125;
&lt;/div&gt;</code></pre>
<p>To render linear comments, such as for <code>&lt;wfw:commentRss&gt;</code>, simply move the children out of the item as follows.</p>
<pre><code class="language-xml">&lt;item&gt;
  ...
&lt;/item&gt;
&#123;children&#125;</code></pre>
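<p>To make the difference between the two shapes concrete, here is a minimal sketch of recursive rendering with <code>str.format</code>.  The simplified templates, the <code>author</code> field and the toy <code>THREAD</code> mapping are made up for illustration and are not taken from formbox:</p>
<pre><code class="language-python"># Hypothetical, simplified versions of the two templates above.
NESTED = '&lt;div class=comment&gt;{author}{children}&lt;/div&gt;'
LINEAR = '&lt;item&gt;{author}&lt;/item&gt;{children}'

# Hypothetical thread: each parent ID maps to its direct replies,
# in the same spirit as the archive mapping built earlier.
THREAD = {'article': ['alice'], 'alice': ['bob', 'carol'], 'bob': ['dave']}


def render(template, parent, thread):
    """Recursively format every reply to parent using template."""
    return ''.join(
        template.format(author=reply,
                        children=render(template, reply, thread))
        for reply in thread.get(parent, ()))


nested = render(NESTED, 'article', THREAD)
linear = render(LINEAR, 'article', THREAD)
print(nested)
print(linear)</code></pre>
<p>With the first template the replies end up inside their parent&#39;s element, while the second one flattens the same depth-first walk into a sequence of siblings.</p>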
<p>The remaining substitutions are mostly just extracted from the email&#39;s headers. Another bit that needs some extra decisions, though, is the parameters for the <code>mailto</code> URI to reply to each comment:</p>
<ul>
<li><p><code>In-Reply-To</code> set to current <code>Message-Id</code></p>
</li>
<li><p><code>Cc</code> set to current <code>Reply-To</code> &#40;if exists&#41; or <code>From</code></p>
</li>
<li><p><code>Subject</code> is inherited, with <code>Re:</code> prepended if missing</p>
</li>
</ul>
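<p>As a rough sketch of those three rules, assuming the headers are available as a plain mapping: the function name <code>reply_uri</code>, the list address and the sample headers below are hypothetical, and formbox may well build the URI differently.</p>
<pre><code class="language-python">from urllib.parse import quote, urlencode


def reply_uri(to, headers):
    """Build a mailto URI for replying to the message with given headers."""
    subject = headers['Subject']
    if not subject.lower().startswith('re:'):
        subject = 'Re: ' + subject
    params = {'In-Reply-To': headers['Message-Id'],
              'Cc': headers.get('Reply-To') or headers['From'],
              'Subject': subject}
    # quote (rather than the default quote_plus) encodes spaces as %20,
    # which is what mail clients expect in mailto URIs.
    query = urlencode(params, safe='/@', quote_via=quote)
    return 'mailto:' + quote(to, safe='@') + '?' + query


# Hypothetical headers of the comment being replied to
headers = {'Message-Id': '&lt;blog/reply@cnx&gt;',
           'From': 'alice@example.net',
           'Subject': 'Hello world'}
uri = reply_uri('list@example.net', headers)
print(uri)</code></pre>
<p>Percent-encoding the values matters here since message IDs are wrapped in angle brackets and subjects usually contain spaces.</p>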
<p>This is getting boring with a lot of trivial code, so I&#39;ll leave you with a pointer to the completed script named <a href="https://trong.loang.net/~cnx/formbox">formbox</a> and move on to more interesting stuff.</p>
<h3 id="injecting_comments">Injecting Comments</h3>
<p>Inserting HTML comment sections is pretty simple.  First I wrote a simple Julia function <code>render_comments</code> calling <code>formbox</code> under the hood, then</p>
<pre><code class="language-julia">hfun_comments_rendered&#40;&#41; &#61; render_comments&#40;&quot;comment.html&quot;&#41;</code></pre>
<p><code>comments_rendered</code> is then injected below the article.  For RSS, it took two extra steps:</p>
<ol>
<li><p>Insert <code>render_comments&#40;&quot;comment.xml&quot;&#41;</code> into the comment feed template <code>comments.xml</code> &#40;notice they are two different templates&#41; and write it next to the article&#39;s output <code>index.html</code></p>
</li>
<li><p>Insert the path of the written comment feed into the <code>&lt;wfw:commentRss&gt;</code> tag in the article&#39;s feed item</p>
</li>
</ol>
<p>That&#39;s it&#33;</p>
<h2 id="moderation">Moderation</h2>
<p>I don&#39;t want a <em>Terms of Service</em> page, it&#39;d feel too corporate for my <em>personal</em> website, so I will list the rules here:</p>
<ol>
<li><p>Please be excellent to each other.  Disagreements are okay, personal insults are not.</p>
</li>
<li><p>Stay on topic.  If you want to publicly discuss something else with me, start a new thread on a <a href="https://loa.loang.net/cnx.misc">mailing list</a> or reach me via social media.</p>
</li>
<li><p><a href="https://useplaintext.email">Use plaintext emails</a> and do not top post.  Markdown inline markup, block quotes, lists and code blocks are supported.</p>
</li>
<li><p>Comments are implied to be under <a href="https://creativecommons.org/licenses/by-sa/4.0">CC BY-SA 4.0</a> unless declared otherwise.</p>
</li>
<li><p>I reserve the right to remove any comment I don&#39;t like. I generally don&#39;t delete comments, but if you want to exercise your freedom of speech, publish it yourself.</p>
</li>
<li><p>I do not warrant the availability of the comments either. I will try my best but one day all comments may just disappear, just like this website itself.  Archive what you deem important.</p>
</li>
<li><p>These rules are subject to change according to my personal liking without notice.</p>
</li>
</ol>
<p>Replies will only be rendered on the website and feed after I see them, so please expect a delay of at least 24 hours.  If you are eager to reply to each other, subscribe to the <a href="https://loa.loang.net/cnx.site">site&#39;s mailing list</a> instead.</p>
<table class="fndef" id="fndef:image">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content">TBF there are image preview scripts in Newsboat&#39;s <a href="https://drewdevault.com/2020/06/06/Add-a-contrib-directory.html">contrib</a>.</td>
    </tr>
</table><table class="fndef" id="fndef:nsfw">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content">Content warning: occasionally NSFW</td>
    </tr>
</table><table class="fndef" id="fndef:silo">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Federation is getting there for social media; not so much for fora.</td>
    </tr>
</table><table class="fndef" id="fndef:mime">
    <tr>
        <td class="fndef-backref">[4]</td>
        <td class="fndef-content">But don&#39;t use <a href="https://blog.brixit.nl/markdown-email">text/markdown</a> for your emails.</td>
    </tr>
</table><table class="fndef" id="fndef:wfw">
    <tr>
        <td class="fndef-backref">[5]</td>
        <td class="fndef-content">Unfortunately there&#39;s no equivalent for Atom.</td>
    </tr>
</table><table class="fndef" id="fndef:speed">
    <tr>
        <td class="fndef-backref">[6]</td>
        <td class="fndef-content">Over 30 seconds to generate a few hundred kB of web pages.</td>
    </tr>
</table><table class="fndef" id="fndef:guid">
    <tr>
        <td class="fndef-backref">[7]</td>
        <td class="fndef-content">Not to be confused with the micro soft hijacked term for <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a>.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/reply@cnx%3E&Subject=Re: Comments for Static Sites without JavaScripts">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/reply@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/reply/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Teredo Tunnel Simulation</title>
  <link>https://lumvok.store/blog/teredo/index.html</link>
  <guid>https://lumvok.store/blog/teredo/index.html</guid>
  <description>Teredo tunnel simulation in virtual machines</description>
  <category>fun</category><category>recipe</category><category>net</category>
  <pubDate>Fri, 03 Jul 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="teredo_tunnel_simulation">Teredo Tunnel Simulation</h1>
<p>Internet Protocol version 6 &#40;IPv6&#41;, the most recent version of the Internet Protocol, was developed by the IETF to deal with the long-anticipated problem of IPv4 address exhaustion.  Despite being superior to IPv4 in multiple aspects &#40;e.g. larger address space, extension headers&#41;, IPv6 has not been widely adopted, even though it became a Draft Standard in 1998 and a full Internet Standard in 2017.<sup id="fnref:rfc8200">[1]</sup></p>
<p>During the transition period, Teredo tunneling has been used to give IPv6 connectivity to IPv6-capable hosts that are on the IPv4 Internet but have no native connection to an IPv6 network.<sup id="fnref:rfc4380">[2]</sup> In this article, I will demonstrate a way to set up such a tunnel on virtual machines, then examine the packets being sent by IPv6 nodes connected by the tunnel.</p>
<div class="franklin-toc"><ol><li>Configuration<ol><li>Virtual Machines</li><li>Teredo Tunnel Setup</li><li>Teredo Tunnel Usage</li></ol></li><li>Analysis<ol><li>Packets Capturing</li><li>Packet Contents<ol><li>Ethernet Header</li><li>IPv4 Header</li><li>IPv6 Header</li><li>ICMPv6</li></ol></li></ol></li><li>Conclusion</li></ol></div>
<h2 id="configuration">Configuration</h2><figure>
  <a href=https://video.hardlimit.com/w/d4dYuXm6p4g9sCWm156qkg>
    <img src=https://lumvok.store/assets/teredo.png alt=Screenshot>
  </a>
  <figcaption>Teredo Tunnel Simulation Speedrun</figcaption>
</figure><h3 id="virtual_machines">Virtual Machines</h3>
<p>In order to simulate Teredo tunneling, one needs two IPv6 nodes and two routers with both IPv4 and IPv6 access.  In total, four virtual machines need to be set up, so I went for Void Linux, which is known for its low memory footprint thanks to using <code>runit</code> instead of <code>systemd</code>. To minimize resource usage and speed up the setup process, I chose the barebone live image which uses <code>musl</code> instead of <code>glibc</code>.  At boot, the image uses only 40 MB of memory.</p>
<p>For virtualization, I used <code>virt-manager</code>, simply because it is available in the repository of Debian &#40;my host OS&#41;.  For some reason, on <code>amd64</code>, the kernel refuses to boot until I give it over 200 MB, but that is still a really modest number.  Networking is provided to the guest OSes via NAT with default configurations.</p>
<p>It is worth mentioning that through <code>virtio</code>, one may use SSH to log into the guest systems from the host OS.  I find this especially convenient as it enables me to copy and paste not only commands but also IP addresses between host and guests as well as between guests.</p>
<p>For convenience, from now on, the outside nodes will be referred to as PC A and PC B, while the routers are named Router A and Router B.  Upon boot, they were given an Ethernet interface <code>eth0</code> with the following addresses.</p><table><tr><th align="left">Node</th><th align="left">MAC address</th><th align="left">IPv4 address</th></tr><tr><td align="left">Router A</td><td align="left"><code>52:54:00:f0:85:c7</code></td><td align="left"><code>192.168.122.127</code></td></tr><tr><td align="left">Router B</td><td align="left"><code>52:54:00:2b:01:cc</code></td><td align="left"><code>192.168.122.134</code></td></tr><tr><td align="left">PC A</td><td align="left"><code>52:54:00:3b:82:36</code></td><td align="left"><code>192.168.122.86</code></td></tr><tr><td align="left">PC B</td><td align="left"><code>52:54:00:7b:ed:c0</code></td><td align="left"><code>192.168.122.255</code></td></tr></table><p>Local IPv6 addresses were also given but we are not going to need them.</p>
<h3 id="teredo_tunnel_setup">Teredo Tunnel Setup</h3>
<p>First, I set up an IPv6-over-IPv4 &#40;<code>sit</code>&#41; tunnel between the two routers:</p>
<pre><code class="language-sh"># On Router A
ip tunnel add tunn mode sit remote 192.168.122.134 ttl 255
ip link set tunn up
# On Router B
ip tunnel add tunn mode sit remote 192.168.122.127 ttl 255
ip link set tunn up</code></pre>
<p>For this tunnel to be able to act as a Teredo one, the two routers need to have IPv6 addresses prefixed by <code>2001::/32</code>.<sup id="fnref:rfc4380">[2]</sup></p>
<pre><code class="language-sh"># On Router A
ip -6 addr add 2001:2::1/64 dev eth0
# On Router B
ip -6 addr add 2001:3::1/64 dev eth0</code></pre>
<p>Finally, on both routers, I made the tunnel the fallback route for all IPv6 lookups and enabled IPv6 forwarding:</p>
<pre><code class="language-sh">ip -6 route add default dev tunn
sysctl -w net.ipv6.conf.all.forwarding&#61;1</code></pre>
<h3 id="teredo_tunnel_usage">Teredo Tunnel Usage</h3>
<p>The IPv6 addresses of the PCs were set up as follows &#40;the digits <code>8067</code> are the decimal ASCII codes of <code>PC</code>: 80 and 67&#41;.</p>
<pre><code class="language-sh"># On PC A
ip -6 address add 2001:2::8067/64 dev eth0
# On PC B
ip -6 address add 2001:3::8067/64 dev eth0</code></pre>
<p>By giving both Router A and PC A addresses prefixed by <code>2001:2::/64</code> &#40;similarly for Router B and PC B&#41;, I implied that they can find each other through the local IPv6 network, for example on PC B:</p>
<pre><code class="language-console">&#36; ip -6 route | head -n1
2001:3::/64 dev eth0 proto kernel metric 256 pref medium</code></pre>
<p>To use the newly created tunnel, the PCs simply had to be routed directly to the routers:</p>
<pre><code class="language-sh"># On PC A
ip -6 route add default via 2001:2::1
# On PC B
ip -6 route add default via 2001:3::1</code></pre>
<p>The connection could then be verified by running on PC A:</p>
<pre><code class="language-console">&#36; traceroute 2001:3::8067
traceroute to 2001:3::8067 &#40;2001:3::8067&#41;, 30 hops max, 80 byte packets
 1  2001:2::1 &#40;2001:2::1&#41;  0.572 ms  0.441 ms  0.328 ms
 2  2001:3::1 &#40;2001:3::1&#41;  0.906 ms  0.888 ms  1.049 ms
 3  2001:3::8067 &#40;2001:3::8067&#41;  1.325 ms  1.174 ms  1.091 ms</code></pre>
<h2 id="analysis">Analysis</h2>
<p>To gain further understanding of how packets are transferred over the Teredo tunnel, I captured and took a closer look at some of them.</p>
<h3 id="packets_capturing">Packets Capturing</h3>
<p>Fortunately for me<sup id="fnref:ipfs">[3]</sup>, all traffic of the guest OSes was wired to a separate interface named <code>virbr0</code>.  To capture the packets going through the tunnel, I simply had to tell Wireshark to listen on that interface, while letting PC A ping PC B through IPv6: <code>ping -c1 2001:3::8067</code>. I then skimmed through the packets sent between the two nodes and looked for the IPv6-in-IPv4 ones.</p>
<h3 id="packet_contents">Packet Contents</h3>
<p>The captured IPv6-in-IPv4 packets look exactly how I imagined them to be. The content of the ping request can be partially decoded as follows.</p>
<h4 id="ethernet_header">Ethernet Header</h4>
<ul>
<li><p><code>52 54 00 2b 01 cc</code>: MAC address of Router B &#40;destination&#41;</p>
</li>
<li><p><code>52 54 00 f0 85 c7</code>: MAC address of Router A &#40;source&#41;</p>
</li>
<li><p><code>08 00</code>: EtherType of IPv4</p>
</li>
</ul>
<h4 id="ipv4_header">IPv4 Header</h4>
<ul>
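<li><p><code>45 00 00 7c 9b 43 40 00 ff</code>: Version 4, header length of 5 words, total length of 124 bytes, identification, the <em>don&#39;t fragment</em> flag and TTL of 255</p>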
</li>
<li><p><code>29</code>: Protocol number 41, i.e. encapsulated <em>IPv6</em></p>
</li>
<li><p><code>69 be</code>: Checksum</p>
</li>
<li><p><code>c0 a8 7a 86</code>: IPv4 address of Router B &#40;destination&#41;</p>
</li>
<li><p><code>c0 a8 7a 7f</code>: IPv4 address of Router A &#40;source&#41;</p>
</li>
</ul>
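<p>The fields above can be double-checked with a few lines of Python.  This is only a sketch: the helper name <code>internet_checksum</code> is mine, the bytes are copied from the list above in the order they are listed, and the checksum field is zeroed before summing, following the one&#39;s complement algorithm from RFC 1071.  The sum does not depend on the order of the 16-bit words, so the order of the two addresses does not affect the result.</p>
<pre><code class="language-python">def internet_checksum(data):
    """Return the 16-bit one's complement checksum of even-length data."""
    total = sum(int.from_bytes(data[i:i + 2], 'big')
                for i in range(0, len(data), 2))
    while total // 0x10000:  # fold the carries back into the low 16 bits
        total = total % 0x10000 + total // 0x10000
    return 0xffff - total


# The 20 header bytes listed above, with the checksum field zeroed
header = bytes.fromhex('45 00 00 7c 9b 43 40 00 ff 29 00 00 '
                       'c0 a8 7a 86 c0 a8 7a 7f')
version, ihl = divmod(header[0], 16)
total_length = int.from_bytes(header[2:4], 'big')
ttl, protocol = header[8], header[9]
print(version, ihl, total_length, ttl, protocol)
print(hex(internet_checksum(header)))</code></pre>
<p>The result agrees with the <code>69 be</code> seen in the capture, and the TTL of 255 is the one given to <code>ip tunnel add</code> earlier.</p>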
<h4 id="ipv6_header">IPv6 Header</h4>
<ul>
<li><p><code>60 00 07 e7 00 40</code>: Version 6, traffic class 0, flow label <code>0x007e7</code> and payload length of 64 bytes</p>
</li>
<li><p><code>3a</code>: Next header &#40;ICMPv6&#41;</p>
</li>
<li><p><code>3f</code>: Hop limit of 63</p>
</li>
<li><p><code>20 01 00 02 00 00 00 00 00 00 00 00 00 00 80 67</code>: PC A&#39;s IPv6 address</p>
</li>
<li><p><code>20 01 00 03 00 00 00 00 00 00 00 00 00 00 80 67</code>: PC B&#39;s IPv6 address</p>
</li>
</ul>
<h4 id="icmpv6">ICMPv6</h4>
<ul>
<li><p><code>80</code>: Type 128, i.e. ping &#40;echo&#41; request</p>
</li>
<li><p><code>00 cf be 03 d9 00 01</code>: Code 0, checksum <code>0xcfbe</code>, identifier <code>0x03d9</code> and sequence number 1</p>
</li>
<li><p><code>e3 0d fe 5e 00 00 00 00 bc d6 0e 00 00 00
  00 00 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d
  1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d
  2e 2f 30 31 32 33 34 35 36 37</code>: Binary data to be echoed</p>
</li>
</ul>
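<p>The ICMPv6 part can be sanity-checked the same way.  Unlike in IPv4, the ICMPv6 checksum also covers a pseudo-header made of the source and destination addresses, the upper-layer length and the next header value &#40;58&#41;.  The sketch below rebuilds it from the bytes listed above; the helper name is mine, and reading the first 16 payload bytes as a little-endian <code>struct timeval</code> &#40;as the iputils <code>ping</code> does on <code>amd64</code>&#41; is an assumption on my part.</p>
<pre><code class="language-python">from datetime import datetime, timezone
from ipaddress import IPv6Address


def internet_checksum(data):
    """Return the 16-bit one's complement checksum of even-length data."""
    total = sum(int.from_bytes(data[i:i + 2], 'big')
                for i in range(0, len(data), 2))
    while total // 0x10000:  # fold the carries back into the low 16 bits
        total = total % 0x10000 + total // 0x10000
    return 0xffff - total


src = IPv6Address('2001:2::8067').packed
dst = IPv6Address('2001:3::8067').packed
icmp = bytes.fromhex(
    '80 00 00 00 03 d9 00 01 '  # type, code, zeroed checksum, ID, sequence
    'e3 0d fe 5e 00 00 00 00 '  # seconds (assumed little-endian)
    'bc d6 0e 00 00 00 00 00 '  # microseconds (assumed little-endian)
) + bytes(range(0x10, 0x38))    # padding pattern 10 11 ... 37
# Pseudo-header: addresses, 32-bit length, 3 zero bytes and next header
pseudo = src + dst + len(icmp).to_bytes(4, 'big') + bytes([0, 0, 0, 58])
checksum = internet_checksum(pseudo + icmp)
seconds = int.from_bytes(icmp[8:16], 'little')
sent = datetime.fromtimestamp(seconds, timezone.utc)
print(hex(checksum), len(icmp), sent.isoformat())</code></pre>
<p>The computed checksum matches the <code>cf be</code> in the capture, the 64 bytes match the IPv6 payload length, and under the assumption above the timestamp falls on the day before this article was published.</p>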
<h2 id="conclusion">Conclusion</h2>
<p>Via the activities elaborated above, the procedure to set up a Teredo tunnel and the content of the packets travelling through it could be well understood. This understanding may help facilitate the adoption of IPv6, even for IPv6 nodes having no native connection to an IPv6 network.  I hope that IPv6 will grow fast enough for me to see the day measures like this tunnel can be deprecated.</p>
<table class="fndef" id="fndef:rfc8200">
    <tr>
        <td class="fndef-backref">[1]</td>
        <td class="fndef-content"><a href="https://tools.ietf.org/html/rfc8200">RFC 8200</a></td>
    </tr>
</table><table class="fndef" id="fndef:rfc4380">
    <tr>
        <td class="fndef-backref">[2]</td>
        <td class="fndef-content"><a href="https://tools.ietf.org/html/rfc4380">RFC 4380</a></td>
    </tr>
</table><table class="fndef" id="fndef:ipfs">
    <tr>
        <td class="fndef-backref">[3]</td>
        <td class="fndef-content">Aside from web browsing, I also run an IPFS node and a bunch of local servers.  I probably need to retire some of them soon since they really clutter the traffic.</td>
    </tr>
</table>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/teredo@cnx%3E&Subject=Re: Teredo Tunnel Simulation">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/teredo@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/teredo/comments.xml</wfw:commentRss>
</item>
<item>
  <title>The Wonderful Wizard of O&#39;zip</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/2/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/2/index.html</guid>
  <description>GSoC 2020: The Wonderful Wizard of O&#39;zip</description>
  <category>gsoc</category><category>pip</category><category>python</category><category>net</category>
  <pubDate>Mon, 22 Jun 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="the_wonderful_wizard_of_ozip">The Wonderful Wizard of O&#39;zip</h1>
<blockquote>
<p>Never give up... No one knows what&#39;s going to happen next.</p>
</blockquote>
<div class="franklin-toc"><ol><li>Preface</li><li>The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</li><li>The file-like object mapping ZIP over HTTP</li><li>What&#39;s next?</li></ol></div>
<h2 id="preface">Preface</h2>
<p>Greetings and best wishes&#33;  I had a lot of fun during the last week, although admittedly nothing was really finished.  In summary, this is the work I carried out in the last seven days:</p>
<ul>
<li><p>Finalizing <a href=https://github.com/pypa/pip/pull/8320>utilities for parallelization</a></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8467>Continuing experimenting</a> on <a href=https://github.com/pypa/pip/pull/8442>using lazy wheels for dependency resolution</a></p>
</li>
<li><p>Polishing up <a href=https://github.com/pypa/pip/pull/8411>the patch</a> refactoring <code>operations.prepare.prepare_linked_requirement</code></p>
</li>
<li><p>Adding <code>flake8-logging-format</code> <a href=https://github.com/pypa/pip/pull/8423#issuecomment-645418725>to the linter</a></p>
</li>
<li><p>Splitting <a href=https://github.com/pypa/pip/pull/8456>the linting patch</a> from <a href=https://github.com/pypa/pip/pull/8332>the PR adding
  the license requirement to vendor README</a></p>
</li>
</ul>
<h2 id="the_multiprocessingdummy_wrapper">The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</h2>
<p>Yes, you read it right, this is the same section as last fortnight&#39;s blog. My mentor Pradyun Gedam gave me a green light to have <a href=https://github.com/pypa/pip/pull/8411>GH-8411</a> merged without support for Python 2 and the non-lazy map variant, which turns out to be troublesome for multithreading.</p>
<p>The tests still need to pass of course and the flaky tests &#40;see failing tests over Azure Pipeline in the past&#41; really gave me a panic attack earlier today. We probably need to mark them as xfail or investigate why they are nondeterministic specifically on Azure, but the real reason I was <em>all caught up and confused</em> was that the unit tests I added mess with the cached imports and as <code>pip</code>&#39;s tests are run in parallel, who knows what it might affect. I was so relieved to not discover any new set of tests made flaky by ones I&#39;m trying to add&#33;</p>
<h2 id="the_file-like_object_mapping_zip_over_http">The file-like object mapping ZIP over HTTP</h2>
<p>This is where the fun starts.  Before we dive in, let&#39;s recall some background information on this.  As discovered by Danny McClanahan in <a href=https://github.com/pypa/pip/pull/7819>GH-7819</a>, it is possible to only download a portion of a wheel and it&#39;s still valid for <code>pip</code> to get the distribution&#39;s metadata. In the same thread, Daniel Holth suggested that one may use HTTP range requests to specifically ask for the tail of the wheel, where the ZIP&#39;s central directory record and, usually, <code>dist-info</code> &#40;the directory containing <code>METADATA</code>&#41; can be found.</p>
<p>Well, <em>usually</em>.  While <a href=https://www.python.org/dev/peps/pep-0427>PEP 427</a> does indeed recommend</p>
<blockquote>
<p>Archivers are encouraged to place the <code>.dist-info</code> files physically at the end of the archive.  This enables some potentially interesting ZIP tricks including the ability to amend the metadata without rewriting the entire archive.</p>
</blockquote>
<p>one of the mentioned <em>tricks</em> is adding shared libraries to wheels of extension modules &#40;using e.g. <code>auditwheel</code> or <code>delocate</code>&#41;. Thus for non-pure Python wheels, it is unlikely that the metadata lie in the last few megabytes.  Ignoring source distributions is bad enough, we can&#39;t afford making an optimization that doesn&#39;t work for extension modules, which are still an integral part of the Python ecosystem &#41;-:</p>
<p>But hey, the ZIP&#39;s directory record is guaranteed to be at the end of the file&#33; Couldn&#39;t we do something about that?  The short answer is yes.  The long answer is, well, yessssssss&#33; With that, plus some magic provided by most operating systems, this is what we figured out:</p>
<ol>
<li><p>We can download a relatively small chunk at the end of the wheel until it is recognizable as a valid ZIP file.</p>
</li>
<li><p>In order for the end of the archive to actually appear as the end to <code>zipfile</code>, we feed to it an object with <code>seek</code> and <code>read</code> defined. As navigating to the rear of the file is performed by calling <code>seek</code> with relative offset and <code>whence&#61;SEEK_END</code> &#40;see <code>man 3 fseek</code> for more details&#41;, we are completely able to make the wheel in the cloud behave as if it were available locally.</p>
<p><img src="https://lumvok.store/assets/cloud.gif" alt="Wheel in the cloud" /></p>
</li>
<li><p>For large wheels, it is better to store them on disk instead of in memory. For smaller ones, it is also preferable to store them as files to avoid &#40;error-prone and often not really efficient&#41; manual tracking and joining of downloaded segments.  We only use a small portion of the wheel, however just in case one is wondering, we have very little control over when <code>tempfile.SpooledTemporaryFile</code> rolls over, so the memory-disk hybrid is not exactly working as expected.</p>
</li>
<li><p>With all these in mind, all we have to do is to define an intermediate object that checks for local availability and downloads if needed on calls to <code>read</code>, to lazily provide the data over HTTP and reduce execution time.</p>
</li>
</ol>
<p>The only theoretical challenge left is to keep track of downloaded intervals, which I finally figured out after some trial and error.  The code was submitted as a pull request to <code>pip</code> at <a href=https://github.com/pypa/pip/pull/8467>GH-8467</a>.  A more modern &#40;read: Python 3-only&#41; variant was packaged and uploaded to PyPI under the name of lazip.  I am unaware of any use case for it outside of <code>pip</code>, but it&#39;s certainly fun to play with d-:</p>
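<p>For the curious, here is a much simplified sketch of the idea, not the code from the pull request.  To keep it short, it tracks fixed-size chunks in a set instead of arbitrary intervals, and the hypothetical <code>fetch</code> callable stands in for an HTTP range request by slicing a ZIP archive built in memory:</p>
<pre><code class="language-python">import io
import zipfile
from tempfile import TemporaryFile


class LazyFile:
    """Seekable read-only file fetching missing chunks on demand."""

    def __init__(self, fetch, length, chunk_size=256):
        self.fetch = fetch    # hypothetical: fetch(start, stop) returns bytes
        self.length = length
        self.chunk_size = chunk_size
        self.have = set()     # indices of the chunks already downloaded
        self.file = TemporaryFile()
        self.file.truncate(length)

    def seekable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        return self.file.seek(offset, whence)

    def tell(self):
        return self.file.tell()

    def read(self, size=-1):
        start = self.tell()
        stop = self.length if size in (None, -1) else start + size
        stop = max(start, min(stop, self.length))
        if stop == start:
            return b''
        first = start // self.chunk_size
        last = -(-stop // self.chunk_size)  # ceiling division
        for index in sorted(set(range(first, last)) - self.have):
            begin = index * self.chunk_size
            end = min(begin + self.chunk_size, self.length)
            self.file.seek(begin)
            self.file.write(self.fetch(begin, end))
            self.have.add(index)
        self.file.seek(start)
        return self.file.read(stop - start)


# Build a fake wheel whose metadata is at the end of the archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('pkg/big.bin', bytes(10000))
    archive.writestr('pkg-1.0.dist-info/METADATA', 'Name: pkg\n')
wheel = buffer.getvalue()
fetched = []


def fetch(start, stop):
    """Stand-in for an HTTP range request for bytes start to stop - 1."""
    fetched.append((start, stop))
    return wheel[start:stop]


lazy = LazyFile(fetch, len(wheel))
with zipfile.ZipFile(lazy) as archive:
    metadata = archive.read('pkg-1.0.dist-info/METADATA')
lazy.file.close()
downloaded = sum(stop - start for start, stop in fetched)
print(metadata, downloaded, len(wheel))</code></pre>
<p>Since <code>zipfile</code> starts from the end of the file and only jumps to the members it is asked for, just the tail of the fake wheel gets fetched here.</p>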
<h2 id="whats_next">What&#39;s next?</h2>
<p>I have been falling short of getting the PRs mentioned above merged for quite a while.  With <code>pip</code>&#39;s next beta coming really soon, I have to somehow make the patches reach a certain standard and get enough attention to be part of the pre-release, since beta-testing would greatly help the success of the GSoC project. To other GSoC students and mentors reading this, I hope your projects turn out successful too&#33;</p>
    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/2@cnx%3E&Subject=Re: The Wonderful Wizard of O&#39;zip">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/2@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/2/comments.xml</wfw:commentRss>
</item>
</channel></rss>