<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/assets/feed.xslt"?>
<rss version="2.0"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
<title>Web logs of McSinyx</title>
<link>https://lumvok.store</link>
<atom:link href="https://lumvok.store/feed.xml" rel="self" type="application/rss+xml"/>
<description>Random write-ups packed with pop culture references</description>
<copyright><![CDATA[🄯 2019–2024 Nguyễn Gia Phong under CC BY-SA 4.0]]></copyright>
<language>en</language>
<generator>Franklin</generator>
<item>
  <title>Outro</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/7/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/7/index.html</guid>
  <description>GSoC 2020: Outro</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 31 Aug 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="outro">Outro</h1>
<blockquote>
<p>Steamed fish was amazing, matter of fact<br />Let me get some jerk chicken to go<br />Grabbed me one of them lemon pie theories<br />And let me get some of them benchmarks you theories too</p>
</blockquote>
<div class="franklin-toc"><ol><li>The Look</li><li>The Benchmark<ol><li>Average Distribution</li><li>Large Distribution</li><li>Distribution with Conflicting Dependencies</li></ol></li><li>What Now?</li></ol></div>
<h2 id="the_look">The Look</h2>
<p>At the time of writing, <a href=https://github.com/pypa/pip/pull/8771>implementation-wise parallel download is ready</a>:</p>
<p><a href="https://asciinema.org/a/356704"><img src="https://lumvok.store/assets/pip-8771.svg" alt="asciicast" /></a></p>
<p>Does this mean I&#39;ve finished everything just-in-time?  This sounds too good to be true&#33;  And how does it perform?  Welp...</p>
<h2 id="the_benchmark">The Benchmark</h2>
<p>Here comes the bad news: under a decent connection to the package index, using <code>fast-deps</code> does not make <code>pip</code> faster.  For best comparison, I will time <code>pip download</code> on the following cases:</p>
<h3 id="average_distribution">Average Distribution</h3>
<p>For convenience, let&#39;s refer to the commands used as follows:</p>
<pre><code class="language-console">&#36; pip --no-cache-dir download &#123;requirement&#125;  # legacy-resolver
&#36; pip --use-feature&#61;2020-resolver \
   --no-cache-dir download &#123;requirement&#125;  # 2020-resolver
&#36; pip --use-feature&#61;2020-resolver --use-feature&#61;fast-deps \
   --no-cache-dir download &#123;requirement&#125;  # fast-deps</code></pre>
<p>In the first test, I used <a href="https://sr.ht/~cnx/axuy">axuy</a> and obtained the following results:</p><table><tr><th align="right">legacy-resolver</th><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">7.709s</td><td align="right">7.888s</td><td align="right">10.993s</td></tr><tr><td align="right">7.068s</td><td align="right">7.127s</td><td align="right">11.103s</td></tr><tr><td align="right">8.556s</td><td align="right">6.972s</td><td align="right">10.496s</td></tr></table><p>Funnily enough, running <code>pip download</code> with <code>fast-deps</code> in a directory that already contained the downloaded files still took around 7-8 seconds.  This is because, to lazily download a wheel, <code>pip</code> has to <a href=https://github.com/pypa/pip/pull/8670>make many requests</a>, which are apparently more expensive than the actual data transmission on my network.</p>
<div class="admonition note"><p class="admonition-title">When is it useful then?</p><p>With an unstable connection to PyPI &#40;for some reason I am not confident enough to state&#41;, this is what I got:</p><table><tr><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">1m16.134s</td><td align="right">0m54.894s</td></tr><tr><td align="right">1m0.384s</td><td align="right">0m40.753s</td></tr><tr><td align="right">0m50.102s</td><td align="right">0m41.988s</td></tr></table><p>As the connection was <em>unstable</em> and the majority of <code>pip</code>&#39;s networking is performed in CI/CD with large and stable bandwidth, I am unsure what this result is supposed to tell &#40;-;</p>
</div>
<h3 id="large_distribution">Large Distribution</h3>
<p>In this test, I used <a href="https://www.tensorflow.org">TensorFlow</a> as the requirement and obtained the following figures:</p><table><tr><th align="right">legacy-resolver</th><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">0m52.135s</td><td align="right">0m58.809s</td><td align="right">1m5.649s</td></tr><tr><td align="right">0m50.641s</td><td align="right">1m14.896s</td><td align="right">1m28.168s</td></tr><tr><td align="right">0m49.691s</td><td align="right">1m5.633s</td><td align="right">1m22.131s</td></tr></table><h3 id="distribution_with_conflicting_dependencies">Distribution with Conflicting Dependencies</h3>
<p>Here I used a requirement that triggers a decent amount of backtracking in the current implementation of the new resolver, <code>oslo-utils&#61;&#61;1.4.0</code>:</p><table><tr><th align="right">2020-resolver</th><th align="right">fast-deps</th></tr><tr><td align="right">14.497s</td><td align="right">24.010s</td></tr><tr><td align="right">17.680s</td><td align="right">28.884s</td></tr><tr><td align="right">16.541s</td><td align="right">26.333s</td></tr></table><h2 id="what_now">What Now?</h2>
<p>I don&#39;t know, to be honest.  At this point I feel that I&#39;ve failed my own expectations &#40;and those of other stakeholders of <code>pip</code>&#41; and wasted the time and effort of <code>pip</code>&#39;s maintainers, who reviewed dozens of PRs I&#39;ve made in the last three months.</p>
<p>On the bright side, this has been an opportunity for me to explore the codebase of a package manager and discover various edge cases that the new resolver has yet to cover &#40;e.g. I&#39;ve just noticed that <code>pip download</code> would save to-be-discarded distributions, I&#39;ll file an issue on that soon&#41;.  Plus I got to know many new and cool people and ideas, which I hope will make me a more helpful individual to work on Python packaging in the future.</p>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/7@cnx%3E&Subject=Re: Outro">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/7@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/7/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Parallelizing Wheel Downloads</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/6/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/6/index.html</guid>
  <description>GSoC 2020: Parallelizing Wheel Downloads</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="parallelizing_wheel_downloads">Parallelizing Wheel Downloads</h1>
<blockquote>
<p>And now it&#39;s clear as this promise<br />That we&#39;re making<br />Two progress bars into one</p>
</blockquote><p>Hello there&#33; It has been raining a lot lately and some mosquito has given me dengue fever today.  To whoever is reading this, I hope it never happens to you.</p>
<h2>Download Parallelization</h2>
<p>I&#39;ve been working on <code>pip</code>&#39;s download parallelization for quite a while now. As distribution download in <code>pip</code> was modeled as a lazily evaluated iterable of chunks, parallelizing such a procedure is as simple as submitting routines that write files to disk to a worker pool.</p>
<p>Or at least that is what I thought.</p>
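<p>For illustration, here is a minimal sketch of that naive idea, which is not <code>pip</code>&#39;s actual code.  Each download is a lazily evaluated iterable of chunks, and a routine writing them to disk is submitted to a thread pool.  The names <code>fake_chunks</code>, <code>save</code> and <code>download_all</code> are hypothetical, and the chunks are generated locally instead of being fetched over the network.</p>
<pre><code class="language-python">import os
import tempfile
from multiprocessing.dummy import Pool  # thread-based pool


def fake_chunks(name, count=4):
    """Lazily yield chunks, standing in for an HTTP response body."""
    for i in range(count):
        yield '{}-{}\n'.format(name, i).encode()


def save(job):
    """Write every chunk of one download to disk and return its path."""
    path, chunks = job
    with open(path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)
    return path


def download_all(names, directory, workers=4):
    """Submit one writing routine per file to a pool of worker threads."""
    jobs = [(os.path.join(directory, name), fake_chunks(name))
            for name in names]
    with Pool(workers) as pool:
        return pool.map(save, jobs)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        print(download_all(['a.whl', 'b.whl'], tmp))</code></pre>
<p>Nothing here reports progress, which is exactly the part that turned out to be harder than expected.</p>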
<h2>Progress Reporting UI</h2>
<p><code>pip</code> is currently using custom progress reporting classes, which were not designed to work with multithreaded code.  First, I wanted to try using these instead of defining a separate UI for multithreaded progress. As they use system signals for termination, the progress bars have to run in the main thread.  Or sort of.</p>
<p>Since the progress bars are designed as iterators, I realized that we can call <code>next</code> on them.  So I quickly threw in some queues and locks, and prototyped the first <em>working</em> <a href=https://github.com/pypa/pip/pull/8771>implementation of progress synchronization</a>.</p>
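<p>To give a rough idea of what such synchronization can look like, below is a minimal sketch of one common pattern, which is not necessarily what the pull request does.  Worker threads report chunk sizes through a queue and only the main thread touches the progress state.  The names <code>worker</code> and <code>run</code> are hypothetical and the chunk sizes are made up.</p>
<pre><code class="language-python">import queue
import threading


def worker(name, sizes, progress):
    """Pretend to download chunks, reporting each size to the queue."""
    for size in sizes:
        progress.put((name, size))
    progress.put((name, None))  # sentinel: this download is finished


def run(downloads):
    """Start one thread per download, then report from the main thread."""
    progress = queue.Queue()
    threads = [threading.Thread(target=worker, args=(name, sizes, progress))
               for name, sizes in downloads.items()]
    for thread in threads:
        thread.start()
    done = dict.fromkeys(downloads, 0)
    remaining = len(threads)
    while remaining:
        name, size = progress.get()  # blocks until a worker reports
        if size is None:
            remaining -= 1
        else:
            done[name] += size  # a real UI would advance a progress bar
    for thread in threads:
        thread.join()
    return done


if __name__ == '__main__':
    print(run({'a.whl': [3, 3, 2], 'b.whl': [5, 5]}))</code></pre>
<p>Every report goes through a single queue, so the main thread is woken up once per chunk; with small chunks this kind of bookkeeping can easily dominate the running time.</p>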
<h2>Performance Issues</h2>
<p>Welp, I only said that it works, but I didn&#39;t mention the performance, which is terrible.  I am pretty sure that the slowdown comes from the synchronization, since the <code>map_multithread</code> call doesn&#39;t seem to trigger anything that may introduce any sort of blocking.</p>
<p>This seems like a lot of fun, and I hope I&#39;ll get better tomorrow to continue playing with it&#33;</p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/6@cnx%3E&Subject=Re: Parallelizing Wheel Downloads">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/6@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/6/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Sorting Things Out</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/5/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/5/index.html</guid>
  <description>GSoC 2020: Sorting Things Out</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 03 Aug 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="sorting_things_out">Sorting Things Out</h1>
<p>Hi&#33;  I really hope that everyone reading this is still doing okay, and if that isn&#39;t the case, I wish you a good day&#33;</p>
<h2 id="pip_202_released"><code>pip</code> 20.2 Released&#33;</h2>
<p>Last Wednesday, <code>pip</code> 20.2 was released, delivering the <code>2020-resolver</code> as well as many other improvements&#33;  I was lucky to be able to get the <code>fast-deps</code> feature included as part of the release. A brief description of this <em>experimental</em> feature as well as testing instructions can be found on <a href="https://discuss.python.org/t/announcement-pip-20-2-release/4863/2">Python Discuss</a>.</p>
<p>The public exposure of the feature also reminded me of some further <a href=https://github.com/pypa/pip/pull/8681>optimization</a> to make on <a href=https://github.com/pypa/pip/pull/8670>the lazy wheel</a>. Hopefully, even without download parallelization, it will not be so slow as to put concerned users of <code>pip</code> off testing it.</p>
<h2 id="preparation_for_download_parallelization">Preparation for Download Parallelization</h2>
<p>As of this moment, we already have:</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8162#issuecomment-667504162>Multithreading pool fallback working</a></p>
</li>
<li><p>An opt-in to use lazy wheel to obtain dependency information, and thus get a list of wheels at the end of resolution ready to be downloaded together</p>
</li>
</ul>
<p>What&#39;s left is <em>only</em> to interject a parallel download somewhere after the dependency resolution step.  Still, I have struggled with this way more than I ever imagined.  I got so stuck that I had to give myself a day off in the middle of the week &#40;and study some Rust&#41;, then I came up with <a href=https://github.com/pypa/pip/pull/8638>something that was agreed to be difficult to maintain</a>.</p>
<p>Indeed, a large part of this is my fault, for not communicating the design thoroughly with <code>pip</code>&#39;s maintainers and not carefully noting stuff down during &#40;verbal&#41; discussions with my mentor.  Thankfully <a href=https://github.com/pypa/pip/pull/8685>Chris Hunt came to the rescue</a> and did a refactoring that will make my future work much easier and cleaner.</p>
    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/5@cnx%3E&Subject=Re: Sorting Things Out">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/5@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/5/comments.xml</wfw:commentRss>
</item>
<item>
  <title>I&#39;ve Walked 500 Miles…</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/4/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/4/index.html</guid>
  <description>GSoC 2020: I&#39;ve Walked 500 Miles…</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 20 Jul 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="ive_walked_500_miles">I&#39;ve Walked 500 Miles…</h1>
<blockquote>
<p>... and I would walk 500 more<br />Just to be the man who walks a thousand miles<br />To fall down at your door</p>
<p><img src="https://lumvok.store/assets/500-miles.gif" alt="500 miles" /></p>
</blockquote>
<div class="franklin-toc"><ol><li>The Main Road</li><li>The Side Quests</li><li>Snap Back to Reality</li></ol></div>
<h2 id="the_main_road">The Main Road</h2>
<p>Hi, have you met <code>fast-deps</code>?  It&#39;s &#40;going to be&#41; the name of <code>pip</code>&#39;s experimental feature that may improve the speed of dependency resolution of the new resolver.  By avoiding downloading whole wheels just to obtain metadata, it is especially helpful when <code>pip</code> has to do heavy backtracking to resolve conflicts.</p>
<p>Thanks to <a href=https://github.com/pypa/pip/pull/8532#discussion_r453990728>Chris Hunt's review on GH-8532</a>, my mentor Pradyun Gedam and I worked out a less hacky approach to inject the call to lazy wheel during the resolution process.  A new PR <a href=https://github.com/pypa/pip/pull/8588>GH-8588</a> was filed to implement it.  I could have <em>just</em> worked on top of the old PR and rebased, but my <code>git</code> skill is far from gud enuff to confidently do it.</p>
<p>Testing this one has been a lot of fun though.  At first, integration tests were added as a rerun of the tests for the new resolver, with an additional flag to use feature <code>fast-deps</code>.  It indeed made me feel guilty towards <a href="https://travis-ci.com">Travis</a>, who had to work around 30 minutes more on every run. Per Chris Hunt&#39;s suggestion, in the new PR, I instead wrote a few functional tests for the areas relating the most to the feature, namely <code>pip</code>&#39;s subcommands <code>wheel</code>, <code>download</code> and <code>install</code>.</p>
<p>It was also suggested that a mock server with HTTP range requests support might be better for testing &#40;in terms of performance and reliability&#41;. However, <a href=https://github.com/pypa/pip/pull/8584#issuecomment-659227702>I have yet to be able to make Werkzeug do it</a>.</p>
<p>Why did I say I&#39;m halfway there?  With the parallel utilities merged and a way to quickly get the list of distributions to be downloaded being really close, what&#39;s left is <em>only</em> to figure out a way to properly download them in parallel. With no distribution to be added during the download progress, the model of this will fit very well with the architecture in <a href="https://lumvok.store/assets/pip-parallel-dl.pdf">my original proposal</a>. A batch downloader can be implemented to track the progress of each download and thus report them cleanly as e.g. a progress bar or a percentage. This is the part of my GSoC project this summer that I am second-most excited about &#40;after the synchronization of downloads written in my proposal, which was then superseded by <code>fast-deps</code>&#41; and I can&#39;t wait to do it&#33;</p>
<h2 id="the_side_quests">The Side Quests</h2>
<p>As usual, I make sure that I complete every side quest I see during the journey:</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8568>GH-8568</a>: Declare constants in <code>configuration.py</code> as such</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8571>GH-8571</a>: Clean up <code>Configuration.unset_value</code> and nit the class&#39; <code>__init__</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8578>GH-8578</a>: Allow verbose/quiet level to be specified via config file and env var</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8599>GH-8599</a>: Replace tabs by spaces for consistency</p>
</li>
</ul>
<h2 id="snap_back_to_reality">Snap Back to Reality</h2>
<p>A bit about me, I actually walked 500 meters earlier today to a bank and walked 500 more to another to prepare my Visa card for purchasing the upcoming <a href="https://wiki.pine64.org/index.php/PinePhone">PinePhone</a> prototype.  It&#39;s one of the first smartphones to fully support a GNU/Linux distribution, where one can run desktop apps &#40;including proper terminals&#41; as well as traditional services like SSH, HTTP server and IPFS node because why not?  Just a few hours ago, I pre-ordered the <a href="https://postmarketos.org/blog/2020/07/15/pinephone-ce-preorder/">postmarketOS community edition</a> with additional hardware for convergence.</p>
<p>If you did not come here for a PinePhone ad, please accept my apologies d-;  To those reading this, I hope you all can become the person who walks a thousand miles to fall down at the door opening to all that you ever wished for&#33;</p>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/4@cnx%3E&Subject=Re: I&#39;ve Walked 500 Miles…">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/4@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/4/comments.xml</wfw:commentRss>
</item>
<item>
  <title>I&#39;m Not Drowning On My Own</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/3/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/3/index.html</guid>
  <description>GSoC 2020: I&#39;m Not Drowning On My Own</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Mon, 06 Jul 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="im_not_drowning_on_my_own">I&#39;m Not Drowning On My Own</h1>
<div class="franklin-toc"><ol><li>Cold Water</li><li>Warm Water</li><li>Learning How To Swim</li><li>Diving Plan</li></ol></div>
<h2 id="cold_water">Cold Water</h2>
<p>Hello there&#33;  My school year is coming to an end, with some final assignments and group projects left to be done.  I for sure underestimated the workload of these and in the last &#40;and probably next&#41; few days I&#39;m drowning in work trying to meet my deadlines.</p>
<p>One project that might be remotely relevant is <a href="https://github.com/McSinyx/cheese-shop">cheese-shop</a>, which tries to manage the metadata of packages from the real <a href="https://pypi.org">Cheese Shop</a>.  Other than that, schoolwork is draining a lot of my time and I can&#39;t remember the last time I came up with something new for my GSoC project &#41;-;</p>
<h2 id="warm_water">Warm Water</h2>
<p>On the bright side, I received a lot of help and encouragement from contributors and stakeholders of <code>pip</code>.  In the last week alone, I had five pull requests merged:</p>
<ul>
<li><p><a href=https://github.com/pypa/pip/pull/8332>GH-8332</a>: Add license requirement to <code>_vendor/README.rst</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8320>GH-8320</a>: Add utilities for parallelization</p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8504>GH-8504</a>: Parallelize <code>pip list --outdated</code> and <code>--uptodate</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8411>GH-8411</a>: Refactor <code>operations.prepare.prepare_linked_requirement</code></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8467>GH-8467</a>: Add utility to lazily acquire wheel metadata over HTTP</p>
</li>
</ul>
<p>In addition to helping me get my PRs merged, my mentor Pradyun Gedam also gave me my first official feedback, including what I&#39;m doing right &#40;and wrong too&#33;&#41; and what I should keep doing to increase the chance of the project being successful.</p>
<p><a href=https://github.com/pypa/pip/pull/7819>GH-7819</a>&#39;s roadmap &#40;Danny McClanahan&#39;s discoveries and works on lazy wheels&#41; is being closely tracked by <code>hatch</code>&#39;s maintainer Ofek Lev, which really makes me proud and warms my heart, that what I&#39;m helping build is actually needed by the community&#33;</p>
<h2 id="learning_how_to_swim">Learning How To Swim</h2>
<p>With <a href=https://github.com/pypa/pip/pull/8467>GH-8467</a> and <a href=https://github.com/pypa/pip/pull/8530>GH-8530</a> merged, I&#39;m now working on <a href=https://github.com/pypa/pip/pull/8532>GH-8532</a> which aims to roll out the lazy wheel as the way to obtain dependency information via the CLI flag <code>--use-feature&#61;lazy-wheel</code>.</p>
<p><a href=https://github.com/pypa/pip/pull/8532>GH-8532</a> was failing initially, even though it was relatively trivial and the commit it was based on was passing.  Surprisingly, after I rebased it on top of <a href=https://github.com/pypa/pip/pull/8530>GH-8530</a>, it mysteriously turned green.  After the first &#40;early&#41; review, I was able to iterate on my earlier code, which used the ambiguous exception <code>RuntimeError</code>.</p>
<p>The rest to be done is <em>just</em> adding some functional tests &#40;I&#39;m pretty sure this will be either overwhelming or underwhelming&#41; to make sure that the command-line flag is working correctly.  Hopefully this can make it into the beta of the upcoming release <a href=https://github.com/pypa/pip/pull/8511>this month</a>.</p>
<p><img src="https://lumvok.store/assets/lazy-wheel.jpg" alt="Lazy wheel" /></p>
<p>In other news, I&#39;ve also submitted <a href=https://github.com/pypa/pip/pull/8538>a patch improving the tests for the parallelization utilities</a>, which were really messy when I wrote them. Better late than never&#33;</p>
<p>Metaphors aside, I actually can&#39;t swim d-:</p>
<h2 id="diving_plan">Diving Plan</h2>
<p>After <a href=https://github.com/pypa/pip/pull/8532>GH-8532</a>, I think I&#39;ll try to parallelize downloads of wheels that are lazily fetched only for metadata.  By the current implementation of the new resolver, for <code>pip install</code>, this can be injected directly between the resolution and build/installation process.</p>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/3@cnx%3E&Subject=Re: I&#39;m Not Drowning On My Own">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/3@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/3/comments.xml</wfw:commentRss>
</item>
<item>
  <title>The Wonderful Wizard of O&#39;zip</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/2/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/2/index.html</guid>
  <description>GSoC 2020: The Wonderful Wizard of O&#39;zip</description>
  <category>gsoc</category><category>pip</category><category>python</category><category>net</category>
  <pubDate>Mon, 22 Jun 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="the_wonderful_wizard_of_ozip">The Wonderful Wizard of O&#39;zip</h1>
<blockquote>
<p>Never give up... No one knows what&#39;s going to happen next.</p>
</blockquote>
<div class="franklin-toc"><ol><li>Preface</li><li>The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</li><li>The file-like object mapping ZIP over HTTP</li><li>What&#39;s next?</li></ol></div>
<h2 id="preface">Preface</h2>
<p>Greetings and best wishes&#33;  I had a lot of fun during the last week, although admittedly nothing was really finished.  In summary, this is the work I carried out in the last seven days:</p>
<ul>
<li><p>Finalizing <a href=https://github.com/pypa/pip/pull/8320>utilities for parallelization</a></p>
</li>
<li><p><a href=https://github.com/pypa/pip/pull/8467>Continuing experimenting</a> on <a href=https://github.com/pypa/pip/pull/8442>using lazy wheels for dependency resolution</a></p>
</li>
<li><p>Polishing up <a href=https://github.com/pypa/pip/pull/8411>the patch</a> refactoring <code>operations.prepare.prepare_linked_requirement</code></p>
</li>
<li><p>Adding <code>flake8-logging-format</code> <a href=https://github.com/pypa/pip/pull/8423#issuecomment-645418725>to the linter</a></p>
</li>
<li><p>Splitting <a href=https://github.com/pypa/pip/pull/8456>the linting patch</a> from <a href=https://github.com/pypa/pip/pull/8332>the PR adding
  the license requirement to vendor README</a></p>
</li>
</ul>
<h2 id="the_multiprocessingdummy_wrapper">The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</h2>
<p>Yes, you read it right, this is the same section as last fortnight&#39;s blog. My mentor Pradyun Gedam gave me a green light to have <a href=https://github.com/pypa/pip/pull/8411>GH-8411</a> merged without support for Python 2 and the non-lazy map variant, which turns out to be troublesome for multithreading.</p>
<p>The tests still need to pass of course and the flaky tests &#40;see failing tests over Azure Pipeline in the past&#41; really gave me a panic attack earlier today. We probably need to mark them as xfail or investigate why they are nondeterministic specifically on Azure, but the real reason I was <em>all caught up and confused</em> was that the unit tests I added mess with the cached imports and as <code>pip</code>&#39;s tests are run in parallel, who knows what it might affect. I was so relieved to not discover any new set of tests made flaky by ones I&#39;m trying to add&#33;</p>
<h2 id="the_file-like_object_mapping_zip_over_http">The file-like object mapping ZIP over HTTP</h2>
<p>This is where the fun starts.  Before we dive in, let&#39;s recall some background information on this.  As discovered by Danny McClanahan in <a href=https://github.com/pypa/pip/pull/7819>GH-7819</a>, it is possible to download only a portion of a wheel and still have it be valid for <code>pip</code> to get the distribution&#39;s metadata. In the same thread, Daniel Holth suggested that one may use HTTP range requests to specifically ask for the tail of the wheel, where the ZIP&#39;s central directory record and, usually, <code>dist-info</code> &#40;the directory containing <code>METADATA</code>&#41; can be found.</p>
<p>Well, <em>usually</em>.  While <a href=https://www.python.org/dev/peps/pep-0427>PEP 427</a> does indeed recommend</p>
<blockquote>
<p>Archivers are encouraged to place the <code>.dist-info</code> files physically at the end of the archive.  This enables some potentially interesting ZIP tricks including the ability to amend the metadata without rewriting the entire archive.</p>
</blockquote>
<p>one of the mentioned <em>tricks</em> is adding shared libraries to wheels of extension modules &#40;using e.g. <code>auditwheel</code> or <code>delocate</code>&#41;. Thus for non-pure Python wheels, it is unlikely that the metadata lie in the last few megabytes.  Ignoring source distributions is bad enough; we can&#39;t afford making an optimization that doesn&#39;t work for extension modules, which are still an integral part of the Python ecosystem &#41;-:</p>
<p>But hey, the ZIP&#39;s directory record is guaranteed to be at the end of the file&#33; Couldn&#39;t we do something about that?  The short answer is yes.  The long answer is, well, yessssssss&#33; With that, plus some magic provided by most operating systems, this is what we figured out:</p>
<ol>
<li><p>We can download a relatively small chunk at the end of the wheel until it is recognizable as a valid ZIP file.</p>
</li>
<li><p>In order for the end of the archive to actually appear as the end to <code>zipfile</code>, we feed to it an object with <code>seek</code> and <code>read</code> defined. As navigating to the rear of the file is performed by calling <code>seek</code> with relative offset and <code>whence&#61;SEEK_END</code> &#40;see <code>man 3 fseek</code> for more details&#41;, we are completely able to make a wheel in the cloud behave as if it were available locally.</p>
<p><img src="https://lumvok.store/assets/cloud.gif" alt="Wheel in the cloud" /></p>
</li>
<li><p>For large wheels, it is better to store them on disk instead of in memory. For smaller ones, it is also preferable to store them as files to avoid &#40;error-prone and often not really efficient&#41; manual tracking and joining of downloaded segments.  We only use a small portion of the wheel; however, just in case one is wondering, we have very little control over when <code>tempfile.SpooledTemporaryFile</code> rolls over, so the memory-disk hybrid does not exactly work as expected.</p>
</li>
<li><p>With all these in mind, all we have to do is to define an intermediate object that checks for local availability and downloads if needed on calls to <code>read</code>, to lazily provide the data over HTTP and reduce execution time.</p>
</li>
</ol>
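<p>The steps above can be sketched roughly as follows.  This is a simplified illustration under a few assumptions and not the code submitted to <code>pip</code>: the names <code>LazyFile</code>, <code>make_wheel</code>, <code>read_metadata</code> and <code>CHUNK</code> are hypothetical, a plain function slicing an in-memory wheel stands in for HTTP range requests, a <code>bytearray</code> stands in for the temporary file, and downloaded data is tracked in fixed-size chunks rather than arbitrary intervals.</p>
<pre><code class="language-python">import io
import zipfile

CHUNK = 1024  # hypothetical granularity of one range request


class LazyFile(io.RawIOBase):
    """Read-only file-like object fetching missing chunks on demand."""

    def __init__(self, fetch, length):
        super().__init__()
        self._fetch = fetch  # callable(start, stop) returning bytes
        self._length = length  # total size, e.g. from Content-Length
        self._data = bytearray(length)  # local copy, filled in lazily
        self._have = set()  # indices of chunks already fetched
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: self._length}[whence]
        self._pos = min(max(base + offset, 0), self._length)
        return self._pos

    def read(self, size=-1):
        if size in (None, -1):
            stop = self._length
        else:
            stop = min(self._pos + size, self._length)
        if stop == self._pos:
            return b''
        # Fetch only the chunks overlapping the requested range.
        for index in range(self._pos // CHUNK, -(-stop // CHUNK)):
            if index not in self._have:
                start = index * CHUNK
                end = min(start + CHUNK, self._length)
                self._data[start:end] = self._fetch(start, end)
                self._have.add(index)
        result = bytes(self._data[self._pos:stop])
        self._pos = stop
        return result


def make_wheel():
    """Build a fake wheel in memory with the metadata placed at the end."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as archive:
        archive.writestr('pkg/data.bin', bytes(100000))
        archive.writestr('pkg-1.0.dist-info/METADATA', 'Name: pkg\n')
    return buffer.getvalue()


def read_metadata(wheel):
    """Return the metadata and the number of bytes actually fetched."""
    fetched = []

    def fetch(start, stop):  # stands in for an HTTP range request
        fetched.append(stop - start)
        return wheel[start:stop]

    with zipfile.ZipFile(LazyFile(fetch, len(wheel))) as archive:
        metadata = archive.read('pkg-1.0.dist-info/METADATA')
    return metadata, sum(fetched)


if __name__ == '__main__':
    wheel = make_wheel()
    metadata, fetched = read_metadata(wheel)
    print(metadata, fetched, 'of', len(wheel), 'bytes fetched')</code></pre>
<p>Since <code>zipfile</code> only seeks to the end of the archive and reads the central directory and the requested member, only the chunks near the tail of this fake wheel are ever fetched.</p>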
<p>The only theoretical challenge left is to keep track of downloaded intervals, which I finally figured out after some trial and error.  The code was submitted as a pull request to <code>pip</code> at <a href=https://github.com/pypa/pip/pull/8467>GH-8467</a>.  A more modern &#40;read: Python 3-only&#41; variant was packaged and uploaded to PyPI under the name of lazip.  I am unaware of any use case for it outside of <code>pip</code>, but it&#39;s certainly fun to play with d-:</p>
<h2 id="whats_next">What&#39;s next?</h2>
<p>I have been falling short of getting the PRs mentioned above merged for quite a while.  With <code>pip</code>&#39;s next beta coming really soon, I have to somehow make the patches reach a certain standard and get enough attention to be part of the pre-release, since beta-testing would greatly help the success of the GSoC project. To other GSoC students and mentors reading this, I hope your projects turn out successful too&#33;</p>
    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/2@cnx%3E&Subject=Re: The Wonderful Wizard of O&#39;zip">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/2@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/2/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Unexpected Things When You&#39;re Expecting</title>
  <link>https://lumvok.store/blog/2020/gsoc/article/1/index.html</link>
  <guid>https://lumvok.store/blog/2020/gsoc/article/1/index.html</guid>
  <description>GSoC 2020: Unexpected Things When You&#39;re Expecting</description>
  <category>gsoc</category><category>pip</category><category>python</category>
  <pubDate>Tue, 09 Jun 2020 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1 id="unexpected_things_when_youre_expecting">Unexpected Things When You&#39;re Expecting</h1>
<p>Hi everyone, I hope that you are all doing well and wish you all good health&#33; The last week has not been really kind to me with a decent amount of academic pressure &#40;my school year is lasting until early July&#41;. It would be bold to say that I have spent 10 hours working on my GSoC project since the last check-in, let alone the 30 hours per week requirement. That being said, there were still some discoveries that I wish to share.</p>
<div class="franklin-toc"><ol><li>The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</li><li>The change in direction</li></ol></div>
<h2 id="the_multiprocessingdummy_wrapper">The <code>multiprocessing&#91;.dummy&#93;</code> wrapper</h2>
<p>Most of my time was spent finalizing the multi&#123;processing,threading&#125; wrapper for the <code>map</code> function that submits tasks to the worker pool. To my surprise, it is rather difficult to write something that is not only portable but also easy to read and test.</p>
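<p>A minimal sketch of what such a wrapper could look like, assuming a thread pool from <code>multiprocessing.dummy</code>; the name <code>map_multithread</code> and its signature are only illustrative, not necessarily what ends up in <code>pip</code>:</p>
<pre><code class="language-python">from multiprocessing.dummy import Pool


def map_multithread(func, iterable, chunksize=1):
    """Lazily yield func(item) for each item, in the input order.

    The work is spread over a pool of threads, which is torn down
    once the results are exhausted or the generator is closed.
    """
    with Pool() as pool:
        yield from pool.imap(func, iterable, chunksize)</code></pre>
<p>Since each result is yielded as soon as it is ready and all of the earlier ones have been yielded, the consumer can start working on the first results while the rest are still being computed.</p>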
<p>By <a href=https://github.com/pypa/pip/pull/8320>the latest commit</a>, I realized the following:</p>
<ol>
<li><p>The <code>multiprocessing</code> module was not designed for the implementation details to be abstracted away entirely.  For example, the lazy <code>map</code>s can be really slow without a suitable chunk size &#40;used to cut the input iterable into pieces and distribute them to the workers in the pool&#41;. By <em>suitable</em>, I mean only an order of magnitude smaller than the input.  This defeats half of the purpose of making it lazy: allowing the input to be evaluated lazily.  Luckily, in the use case I&#39;m aiming for, the length of the iterable argument is small and the laziness is only needed for the output &#40;to pipeline download and installation&#41;.</p>
</li>
<li><p>Mocking <code>import</code> for testing purposes can never be pretty.  One reason is that we &#40;Python users&#41; have very little control over the calls of <code>import</code> statements and their lower-level implementation <code>__import__</code>. In order to properly patch this built-in function, unlike for others of the same group, we have to <code>monkeypatch</code> the name from <code>builtins</code> &#40;or <code>__builtins__</code> under Python 2&#41; instead of the module that imports stuff. Furthermore, because of the special namespacing, to avoid infinite recursion we need to alias the function to a different name for fallback.</p>
</li>
<li><p>To add to the problem, <code>multiprocessing</code> lazily imports the fragile module during pool creation.  Since the failure is platform-specific &#40;the lack of <code>sem_open</code>&#41;, it was decided to check for it upon the import of <code>pip</code>&#39;s wrapper module.  Although this behavior is easier to reason about, testing it requires invalidating the cached import and re-importing the wrapper module.</p>
</li>
<li><p>Last but not least, I now understand the pain of keeping Python 2 compatibility that many package maintainers still need to deal with every day &#40;although Python 2 has reached its end-of-life, <code>pip</code>, for example, <a href=https://github.com/pypa/pip/pull/6148>will still support it for another year</a>&#41;.</p>
</li>
</ol>
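<p>The second point can be sketched as follows. This is a simplified illustration for Python 3 rather than the test code in the PR, and the helper name <code>failing_import</code> is made up:</p>
<pre><code class="language-python">import builtins
from contextlib import contextmanager


@contextmanager
def failing_import(name):
    """Make importing the named module raise ImportError temporarily."""
    # Alias the real function to a different name for fallback,
    # otherwise the replacement would end up calling itself.
    real_import = builtins.__import__

    def fake_import(module, *args, **kwargs):
        if module == name:
            raise ImportError(module)
        return real_import(module, *args, **kwargs)

    builtins.__import__ = fake_import
    try:
        yield
    finally:
        # Always restore the real function, even if the body raised.
        builtins.__import__ = real_import</code></pre>
<p>Inside the <code>with</code> block, an <code>import</code> statement for that exact name fails even if the module has been imported before.  Relative imports such as <code>from . import synchronize</code> pass a different name to <code>__import__</code> and would need extra handling.</p>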
<h2 id="the_change_in_direction">The change in direction</h2>
<p>Since last week, my mentor Pradyun Gedam and I have set up weekly real-time meetings &#40;a fancy term for video/audio chat in the worldwide quarantine era&#41; for the entire GSoC period. During the last session, we decided to put parallelization of download during resolution on hold, in favor of a more beneficial goal: <a href=https://github.com/pypa/pip/pull/7819>partially download the wheels during
dependency resolution</a>.</p>
<p><img src="https://lumvok.store/assets/swirl.png" alt="" /></p>
<p>As discussed by Danny McClanahan and the maintainers of <code>pip</code>, it is feasible to only download a few kB of a wheel to obtain enough metadata for dependency resolution.  While this is only applicable to wheels &#40;i.e. prebuilt packages&#41;, other packaging formats make up less than 20&#37; of the downloads &#40;at least on PyPI&#41;, and the figure is much lower for the most popular packages.  Therefore, this optimization alone could bring <a href="https://www.ei8fdb.org/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts">the upcoming backtracking resolver</a>&#39;s performance on par with the legacy one.</p>
<p>During the last few years, a lot of effort has been poured into replacing <code>pip</code>&#39;s current resolver, which is unable to resolve conflicts. While the correctness of the new one will be ensured by some of the most talented and hard-working developers in the Python packaging community, from the users&#39; point of view, it would be better for its performance not to lag behind the old one.  Aside from the increase in CPU cycles for more rigorous resolution, more I/O, especially networking operations, is expected to be performed.  This is due to <a href=https://github.com/pypa/pip/pull/7406#issuecomment-583891169>the lack
of a standard and efficient way to acquire the metadata</a>.  Therefore, unlike most package managers we are familiar with, <code>pip</code> has to fetch &#40;and possibly build&#41; the packages solely for dependency information.</p>
<p>Fortunately, <a href=https://www.python.org/dev/peps/pep-0427#recommended-archiver-features>PEP 427#recommended-archiver-features</a> recommends that package builders place the metadata at the end of the archive. This allows the resolver to fetch only the last few kB using HTTP range requests for the relevant information. Simply adding <code>Range: bytes&#61;-8000</code> to the request headers in <code>pip._internal.network.download</code> makes the resolution process <em>lightning</em> fast.  Of course this breaks the installation, but I am confident that it is not difficult to implement this optimization cleanly.</p>
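<p>To illustrate why the tail of the archive is enough, below is a rough sketch that builds a tiny fake wheel in memory and inspects only its last 8000 bytes, as if they came from a range request. Everything in it &#40;the package <code>spam</code>, the helper names&#41; is made up for demonstration, and a real implementation would also have to deal with ZIP64 and archive comments:</p>
<pre><code class="language-python">import io
import zipfile

# Signature of the end of central directory (EOCD) record of a ZIP file
EOCD_SIGNATURE = b"PK\x05\x06"


def make_fake_wheel():
    """Return the bytes of a wheel-like archive with METADATA placed last."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("spam/__init__.py", "x = 1\n" * 5000)
        archive.writestr("spam-1.0.dist-info/METADATA",
                         "Name: spam\nVersion: 1.0\n")
    return buffer.getvalue()


def tail(data, length):
    """Emulate the body of a response to a Range: bytes=-length request."""
    return data[-length:]


def central_directory_span(chunk):
    """Return (offset, size) of the central directory.

    Both are read from the EOCD record found in the chunk,
    and the offset is relative to the start of the whole archive.
    """
    position = chunk.rfind(EOCD_SIGNATURE)
    size = int.from_bytes(chunk[position+12:position+16], "little")
    offset = int.from_bytes(chunk[position+16:position+20], "little")
    return offset, size</code></pre>
<p>Calling <code>central_directory_span&#40;tail&#40;make_fake_wheel&#40;&#41;, 8000&#41;&#41;</code> locates the file listing of the archive without ever touching the first 20-odd kB, and the bytes of <code>METADATA</code> itself sit in the same chunk.</p>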
<p>One drawback of this optimization is compatibility.  Not every Python package index supports range requests, and it is not possible to verify the partial wheel.  While the first case is unavoidable, for the other, hash checking is usually used for pinned/locked-version requirements, thus no backtracking is done during dependency resolution.</p>
<p>Either way, before installation, the packages selected by the resolver can be downloaded in parallel.  This guarantees a larger batch of packages, compared to parallelization during resolution, where the number of downloads can be as low as one during the trial of different versions of the same package.</p>
<p>Unfortunately, I have not been able to do much other than <a href=https://github.com/pypa/pip/pull/8411>a minor cleanup</a>.  I am looking forward to accomplishing more this week and seeing what this path will lead us to&#33;  At the moment, I am happy that I&#39;m able to meet the blog deadline, at least in UTC&#33;</p>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/2020/gsoc/article/1@cnx%3E&Subject=Re: Unexpected Things When You&#39;re Expecting">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/2020/gsoc/article/1@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/2020/gsoc/article/1/comments.xml</wfw:commentRss>
</item>
<item>
  <title>Infinite Sequences: A Case Study in Functional Python</title>
  <link>https://lumvok.store/blog/conseq/index.html</link>
  <guid>https://lumvok.store/blog/conseq/index.html</guid>
  <description>SICP subsection 3.5.2 in Python</description>
  <category>fun</category><category>math</category><category>python</category>
  <pubDate>Thu, 28 Feb 2019 00:00:00 +0000</pubDate>
  <content:encoded><![CDATA[
<h1>Infinite Sequences: A Case Study in Functional Python</h1>
<p>In this article, we will only consider sequences defined by a function whose domain is a subset of the set of all integers.  Such sequences will be <em>visualized</em>, i.e. we will try to evaluate the first few &#40;thousand&#41; elements, using the functional programming paradigm, where functions are more similar to the ones in math &#40;in contrast to the imperative style with side effects confusing to inexperienced coders&#41;.  The idea is taken from <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-24.html#&#37;_sec_3.5.2">subsection 3.5.2 of SICP</a> and adapted to Python, which, compared to Scheme, is significantly more popular: Python is pre-installed on almost every modern Unix-like system, namely macOS, GNU/Linux and the &#42;BSDs; and even at MIT, the new 6.01 in Python has recently replaced the legendary 6.001 &#40;SICP&#41;.</p>
<p>One notable advantage of using Python is its huge <strong>standard</strong> library. For example the <em>identity sequence</em> &#40;sequence defined by the identity function&#41; can be imported directly from <code>itertools</code>:</p>
<pre><code class="language-python">&gt;&gt;&gt; from itertools import count
&gt;&gt;&gt; positive_integers &#61; count&#40;start&#61;1&#41;
&gt;&gt;&gt; next&#40;positive_integers&#41;
1
&gt;&gt;&gt; next&#40;positive_integers&#41;
2
&gt;&gt;&gt; for _ in range&#40;4&#41;: next&#40;positive_integers&#41;
... 
3
4
5
6</code></pre>
<p>To open a Python interpreter, simply launch your terminal and run <code>python</code>. If that is somehow still too much of a struggle, navigate to <a href="https://www.python.org/shell">the interactive shell</a> on Python.org.</p>
<p><em>Let&#39;s get it started</em> with something everyone hates: recursively defined sequences, e.g. the famous Fibonacci sequence &#40;\(F_n = F_{n-1} + F_{n-2}\), \(F_1 = 1\) and \(F_0 = 0\)&#41;.  Since <a href="https://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html">Python does not support</a> <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-11.html#call_footnote_Temp_48">tail recursion</a>, it&#39;s generally <strong>not</strong> a good idea to define anything recursively &#40;which is, ironically, the only trivial <em>functional</em> solution in this case&#41;, but since we will only evaluate the first few terms, it will do &#40;use the <strong>Tab</strong> key to indent the line when needed&#41;:</p>
<pre><code class="language-python">&gt;&gt;&gt; def fibonacci&#40;n, a&#61;0, b&#61;1&#41;:
...     # To avoid making the code look complicated,
...     # n &lt; 0 is not handled here.
...     return a if n &#61;&#61; 0 else fibonacci&#40;n - 1, b, a &#43; b&#41;
... 
&gt;&gt;&gt; fibo_seq &#61; &#40;fibonacci&#40;n&#41; for n in count&#40;start&#61;0&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;7&#41;: next&#40;fibo_seq&#41;
... 
0
1
1
2
3
5
8</code></pre>
<div class="admonition note"><p class="admonition-title">Note</p><p>The <code>fibo_seq</code> above is just to demonstrate how <code>itertools.count</code> can be used to create an infinite sequence defined by a function. For better performance, the following should be used instead:</p>
<pre><code class="language-python">def fibonacci_sequence&#40;a&#61;0, b&#61;1&#41;:
    yield a
    yield from fibonacci_sequence&#40;b, a&#43;b&#41;</code></pre>
</div>
<p>Notice that the elements that have been iterated through &#40;using <code>next</code>&#41; will disappear forever in the void &#40;oh no&#33;&#41;, but that is the cost we are willing to pay to save some memory, especially when we need to evaluate a member of &#40;arbitrarily&#41; large index to estimate the sequence&#39;s limit. One case in point is estimating a definite integral using the <a href="https://en.wikipedia.org/wiki/Riemann_sum#Left_Riemann_sum">left Riemann sum</a>.</p>
<pre><code class="language-python">def integral&#40;f, a, b&#41;:
    def left_riemann_sum&#40;n&#41;:
        dx &#61; &#40;b-a&#41; / n
        def x&#40;i&#41;: return a &#43; i*dx
        return sum&#40;f&#40;x&#40;i&#41;&#41; for i in range&#40;n&#41;&#41; * dx
    return left_riemann_sum</code></pre>
<p>The function <code>integral&#40;f, a, b&#41;</code> as defined above returns a function taking \(n\) as an argument.  As \(n\to\infty\), its result approaches \(\int_a^b f(x)\mathrm d x\).  For example, we are going to estimate \(\pi\) as the area of a semicircle whose radius is \(\sqrt 2\):</p>
<pre><code class="language-python">&gt;&gt;&gt; from math import sqrt
&gt;&gt;&gt; def semicircle&#40;x&#41;: return sqrt&#40;abs&#40;2 - x*x&#41;&#41;
... 
&gt;&gt;&gt; pi &#61; integral&#40;semicircle, -sqrt&#40;2&#41;, sqrt&#40;2&#41;&#41;
&gt;&gt;&gt; pi_seq &#61; &#40;pi&#40;n&#41; for n in count&#40;start&#61;2&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;3&#41;: next&#40;pi_seq&#41;
... 
2.000000029802323
2.514157464087051
2.7320508224700384</code></pre>
<p>Whilst the first few aren&#39;t quite close, at index around 1000, the result is somewhat acceptable:</p>
<pre><code class="language-julia">3.1414873191059525
3.1414874770617427
3.1414876346231577</code></pre>
<p>Since we are comfortable with sequences of sums, let&#39;s move on to sums of a sequence, which are called series.  For estimation, again, we are going to make use of infinite sequences of partial sums, which are implemented as <code>itertools.accumulate</code> by thoughtful Python developers.  <a href="https://en.wikipedia.org/wiki/Geometric_series">Geometric</a> and <a href="https://math.oregonstate.edu/home/programs/undergrad/CalculusQuestStudyGuides/SandS/SeriesTests/p-series.html">p-series</a> can be defined as follows:</p>
<pre><code class="language-python">from itertools import accumulate as partial_sums

def geometric_series&#40;r, a&#61;1&#41;:
    return partial_sums&#40;a*r**n for n in count&#40;0&#41;&#41;

def p_series&#40;p&#41;:
    return partial_sums&#40;1 / n**p for n in count&#40;1&#41;&#41;</code></pre>
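<p>For those unfamiliar with <code>accumulate</code>, here is a small sketch of what it yields, first on a finite list and then on the infinite geometric sequence with ratio 1/2 &#40;the variable names are arbitrary&#41;:</p>
<pre><code class="language-python">from itertools import accumulate, count, islice

# Partial sums of a finite sequence: 1, 1+2, 1+2+3, 1+2+3+4
finite = list(accumulate([1, 2, 3, 4]))
print(finite)  # [1, 3, 6, 10]

# Partial sums of 1/2 + 1/4 + 1/8 + ..., evaluated lazily
halves = accumulate(0.5**n for n in count(1))
print(list(islice(halves, 4)))  # [0.5, 0.75, 0.875, 0.9375]</code></pre>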
<p>We can then use these to determine whether a series is convergent or divergent. For instance, one can easily verify that the \(p\)-series with \(p = 2\) converges to \(\pi^2 / 6 \approx 1.6449340668482264\) via</p>
<pre><code class="language-python">&gt;&gt;&gt; s &#61; p_series&#40;p&#61;2&#41;
&gt;&gt;&gt; for _ in range&#40;11&#41;: next&#40;s&#41;
... 
1.0
1.25
1.3611111111111112
1.4236111111111112
1.4636111111111112
1.4913888888888889
1.511797052154195
1.527422052154195
1.5397677311665408
1.5497677311665408
1.558032193976458</code></pre>
<p>We can observe that it takes quite a lot of steps to get the precision we would generally expect &#40;\(s_{11}\) is only precise to the first decimal place; second decimal place: \(s_{101}\); third: \(s_{2304}\)&#41;. Luckily, many techniques for series acceleration are available. The <a href="https://en.wikipedia.org/wiki/Shanks_transformation">Shanks transformation</a>, for instance, can be implemented as follows:</p>
<pre><code class="language-python">from itertools import islice, tee

def shanks&#40;seq&#41;:
    return map&#40;lambda x, y, z: &#40;x*z - y*y&#41; / &#40;x &#43; z - y*2&#41;,
               *&#40;islice&#40;t, i, None&#41; for i, t in enumerate&#40;tee&#40;seq, 3&#41;&#41;&#41;&#41;</code></pre>
<p>In the code above, <code>lambda x, y, z: &#40;x*z - y*y&#41; / &#40;x &#43; z - y*2&#41;</code> denotes the anonymous function \((x, y, z) \mapsto \frac{xz - y^2}{x + z - 2y}\) and <code>map</code> is a higher-order function applying that function to respective elements of the subsequences starting from the first, second and third elements of <code>seq</code>. On Python 2, one should import <code>imap</code> from <code>itertools</code> to get the same <a href="https://en.wikipedia.org/wiki/Lazy_evaluation">lazy</a> behavior of <code>map</code> on Python 3.</p>
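<p>To make the <code>tee</code>/<code>islice</code> trick less magical, the sketch below &#40;with a hypothetical helper <code>windows</code>&#41; shows the triples fed to the anonymous function, then works out the first transformed term of the \(p\)-series with \(p = 2\) in exact fractions:</p>
<pre><code class="language-python">from fractions import Fraction
from itertools import count, islice, tee


def windows(seq, size=3):
    """Lazily yield tuples of consecutive elements of seq."""
    shifted = (islice(t, i, None) for i, t in enumerate(tee(seq, size)))
    return zip(*shifted)


print(list(islice(windows(count(1)), 3)))  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

# First three partial sums of the p-series: 1, 1 + 1/4, 1 + 1/4 + 1/9
x, y, z = Fraction(1), Fraction(5, 4), Fraction(49, 36)
first = (x*z - y*y) / (x + z - y*2)
print(first)  # 29/20</code></pre>
<p>Here 29/20 is 1.45, which agrees with the first value in the session below up to floating-point error.</p>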
<pre><code class="language-python">&gt;&gt;&gt; s &#61; shanks&#40;p_series&#40;2&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;10&#41;: next&#40;s&#41;
... 
1.4500000000000002
1.503968253968257
1.53472222222223
1.5545202020202133
1.5683119658120213
1.57846371882088
1.5862455815659202
1.5923993101138652
1.5973867787856946
1.6015104548459742</code></pre>
<p>The result was quite satisfying, yet we can go one step further by continuously applying the transformation to the sequence:</p>
<pre><code class="language-python">&gt;&gt;&gt; def compose&#40;transform, seq&#41;:
... 	yield next&#40;seq&#41;
... 	yield from compose&#40;transform, transform&#40;seq&#41;&#41;
... 
&gt;&gt;&gt; s &#61; compose&#40;shanks, p_series&#40;2&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;10&#41;: next&#40;s&#41;
... 
1.0
1.503968253968257
1.5999812811165188
1.6284732442271674
1.6384666832276524
1.642311342667821
1.6425249569252578
1.640277484549416
1.6415443295058203
1.642038043478661</code></pre>
<p>The Shanks transformation works on every sequence &#40;not just sequences of partial sums&#41;.  Back to the previous example of using the left Riemann sum to compute a definite integral:</p>
<pre><code class="language-python">&gt;&gt;&gt; pi_seq &#61; compose&#40;shanks, map&#40;pi, count&#40;2&#41;&#41;&#41;
&gt;&gt;&gt; for _ in range&#40;10&#41;: next&#40;pi_seq&#41;
... 
2.000000029802323
2.978391111182236
3.105916845397819
3.1323116570377185
3.1389379264270736
3.140788413965646
3.140921512857936
3.1400282163913436
3.1400874774021816
3.1407097229603256
&gt;&gt;&gt; next&#40;islice&#40;pi_seq, 300, None&#41;&#41;
3.1415061302492413</code></pre>
<p>Now that we have series defined, let&#39;s see if we can learn anything about power series. The sequence of partial sums of the power series \(\sum c_n (x - a)^n\) can be defined as</p>
<pre><code class="language-python">from operator import mul

def power_series&#40;c, start&#61;0, a&#61;0&#41;:
    # c yields the coefficients, starting from the one of &#40;x - a&#41;**start
    return lambda x: partial_sums&#40;map&#40;mul, c, &#40;&#40;x-a&#41;**n for n in count&#40;start&#41;&#41;&#41;&#41;</code></pre>
<p>We can use this to compute functions that can be written as <a href="https://en.wikipedia.org/wiki/Taylor_series">Taylor series</a>:</p>
<pre><code class="language-python">from math import factorial

def exp&#40;x&#41;:
    return power_series&#40;1/factorial&#40;n&#41; for n in count&#40;0&#41;&#41;&#40;x&#41;

def cos&#40;x&#41;:
    c &#61; &#40;&#40;1 - n&#37;2&#41; * &#40;1 - n&#37;4&#41; / factorial&#40;n&#41; for n in count&#40;0&#41;&#41;
    return power_series&#40;c&#41;&#40;x&#41;

def sin&#40;x&#41;:
    c &#61; &#40;n&#37;2 * &#40;2 - n&#37;4&#41; / factorial&#40;n&#41; for n in count&#40;1&#41;&#41;
    return power_series&#40;c, start&#61;1&#41;&#40;x&#41;</code></pre>
<p>Amazing&#33;  Let&#39;s test &#39;em&#33;</p>
<pre><code class="language-python">&gt;&gt;&gt; e &#61; compose&#40;shanks, exp&#40;1&#41;&#41; # this should converge to 2.718281828459045
&gt;&gt;&gt; for _ in range&#40;4&#41;: next&#40;e&#41;
... 
1.0
2.749999999999996
2.718276515152136
2.718281825486623</code></pre>
<p>Impressive, huh? For sine and cosine, series acceleration is not even necessary:</p>
<pre><code class="language-python">&gt;&gt;&gt; from math import pi as PI
&gt;&gt;&gt; s &#61; sin&#40;PI/6&#41;
&gt;&gt;&gt; for _ in range&#40;5&#41;: next&#40;s&#41;
... 
0.5235987755982988
0.5235987755982988
0.49967417939436376
0.49967417939436376
0.5000021325887924
&gt;&gt;&gt; next&#40;islice&#40;cos&#40;PI/3&#41;, 8, None&#41;&#41;
0.500000433432915</code></pre>
<p></p>    <a href="mailto:cnx.site@loa.loang.net?In-Reply-To=%3Cblog/conseq@cnx%3E&Subject=Re: Infinite Sequences: A Case Study in Functional Python">Reply via email</a>]]></content:encoded>
  <comments><![CDATA[https://lists.sr.ht/~cnx/site?search=In-Reply-To:%3Cblog/conseq@cnx%3E]]></comments>
  <wfw:commentRss>https://lumvok.store/blog/conseq/comments.xml</wfw:commentRss>
</item>
</channel></rss>