<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>COMSOL Blog &#187; Cluster &amp; Cloud Computing</title>
	<atom:link href="http://www.comsol.no/blogs/category/general/cluster-and-cloud-computing-general/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.comsol.no/blogs</link>
	<description></description>
	<lastBuildDate>Thu, 22 Nov 2018 09:33:42 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.9.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>	<item>
		<title>How to Use the Cluster Sweep Node in COMSOL Multiphysics®</title>
		<link>https://www.comsol.no/blogs/how-to-use-the-cluster-sweep-node-in-comsol-multiphysics/</link>
		<comments>https://www.comsol.no/blogs/how-to-use-the-cluster-sweep-node-in-comsol-multiphysics/#comments</comments>
		<pubDate>Tue, 12 Jun 2018 08:32:51 +0000</pubDate>
		<dc:creator><![CDATA[Pär Persson Mattsson]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Studies & Solvers]]></category>
		<category><![CDATA[Technical Content]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs?p=262281</guid>
		<description><![CDATA[In a previous blog post, we explained how to run a job from the COMSOL Multiphysics® software on clusters directly from the COMSOL Desktop® environment, without any interaction with a Linux® operating system terminal. Since this terminal is sometimes treated with excessive respect, the ability to start a cluster job directly from the graphical user interface is one of the most useful features in the COMSOL® software. Plus, there’s more to it&#8230; Enter the Cluster Sweep node. What Is the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>In a previous blog post, we explained how to run a job from the COMSOL Multiphysics® software on clusters directly from the COMSOL Desktop® environment, without any interaction with a Linux® operating system terminal. Since this terminal is sometimes treated with excessive respect, the ability to start a cluster job directly from the graphical user interface is one of the most useful features in the COMSOL® software. Plus, there’s more to it&#8230; Enter the <em>Cluster Sweep</em> node.</p>
<p><span id="more-262281"></span></p>
<h3>What Is the Cluster Sweep Node?</h3>
<p>One way to parallelize the computation of a parameter set is to combine the parametric sweep with the <em>Cluster Computing</em> node. When doing so, you create one large cluster job that spans a number of nodes. The more nodes you add, the more parameter values are computed in parallel (as long as there are more parameters than nodes, of course).</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/simple-cluster-diagram.png" title="" alt="A schematic showing four nodes in a typical cluster." class="alignnone size-full wp-image-252321" width="600" height="356" /><br />
<em>A cluster example.</em></p>
<p>You can also use the <em>Cluster Sweep</em> node to parallelize computations. It is designed for when you want to split up a parametric sweep into several cluster computing jobs. You define a list of parametric values in the <em>Cluster Sweep</em> node. For each of these values, a separate batch job is sent to your cluster queue. When the computations are done, COMSOL Multiphysics incorporates the results back into the main process.</p>
<p>You can even nest parametric sweeps this way, combining the cluster sweep with a &#8220;normal&#8221; parametric sweep. You decide which parameters you start separate jobs for and which parameters you want to keep “inside” the jobs.</p>
<p>In short, the <em>Cluster Sweep</em> node is a powerful tool that COMSOL Multiphysics supplies you with to help you be in full control of your modeling process.</p>
<blockquote><p>Note that to use a cluster sweep, a Floating Network License (FNL) is required. It is also recommended that you are familiar with the settings discussed in this blog post: <a href="/blogs/how-to-run-on-clusters-from-the-comsol-desktop-environment/">How to Run on Clusters from the COMSOL Desktop® Environment</a>. If you follow the steps in that blog post and save your settings, they will automatically be used in the <em>Cluster Sweep</em> node.</p></blockquote>
<h3>When to Use the Cluster Sweep Node</h3>
<p>By now, you know what the <em>Cluster Sweep</em> node is, and you might find yourself wondering two things:</p>
<ol>
<li>When should I use it?</li>
<li>When is it preferable over the <em>Cluster Computing</em> node?</li>
</ol>
<p>The first case that comes to mind is when you have a parameter set and you don’t know if your model will converge or even be valid for all parameter combinations. Your parameter set could control your geometry and, for some values, the geometry causes your meshing or solving to fail. If you compute the model with a parametric sweep, COMSOL Multiphysics cancels the computation at the first failing geometry &mdash; even if later ones would finish. By splitting the computation into individual jobs, a separate computation is started for each parameter value, so one failure does not stop the rest.</p>
<p>Another situation is when the number of interesting parameter values is simply too large to be feasible for a single cluster job. If you have parameters that control the frequency, geometry, materials, boundary conditions, and so on, you end up with <em>a lot</em> of computations if you want results for all available combinations. If you put all of these computations into one large job and send it to your cluster, you will almost surely end up with an unhappy cluster administrator and a lot of angry colleagues (more on that later).</p>
<p>Good news: Using the <em>Cluster Sweep</em> node, you can split your potentially enormous job into several smaller ones. To do so, you add a <em>Parametric Sweep</em> node to your model in addition to the <em>Cluster Sweep</em> node. Setting up your model in this way creates what is known as a <em>nested parametric sweep</em> (similar to a nested for-loop in programming). To learn how to do this, keep reading. We’ve included a short tutorial in this blog post.</p>
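<p>The nested for-loop analogy is quite literal: the outer loop plays the role of the cluster sweep (one cluster job per value) and the inner loop plays the role of the parametric sweep running inside each job. As a plain shell sketch, using the parameter values from the microactuator tutorial later in this post:</p>

```shell
# Analogy only: the outer loop stands for the Cluster Sweep (one cluster
# job per value of L), the inner loop for the Parametric Sweep running
# inside each job. COMSOL Multiphysics handles the real submissions for you.
for L in 100 170 240 310; do        # actuator length in um
  for DV in 1 2 3 4 5; do          # applied voltage in V
    echo "solve model with L=${L}um, DV=${DV}V"
  done
done
```
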
<p>There is one more thing to note about the <em>Cluster Sweep</em> node: You can use it to potentially increase the throughput of jobs on your cluster.</p>
<h3>Using a Cluster Sweep to Optimize Scheduling</h3>
<p>Earlier in this blog post, I mentioned an unhappy cluster admin, and you might be wondering why. Computation time is a valuable resource on high-performance clusters. Because of this, most clusters have some kind of queue or scheduling system implemented. How large jobs are handled is up to the cluster administrator, and a rule of thumb is that large jobs mean long waiting times. Why? Large jobs occupy a lot of computational resources, and they can take a long time to complete. Hence, in order to not hold up other users’ jobs, large jobs are assigned a low priority. Of course, this all depends on how your cluster admin has configured the scheduler; that is, your mileage may vary.</p>
<p>What does this have to do with the <em>Cluster Sweep</em> node? Suppose that you have access to a cluster where it is hard to get a large job scheduled, but smaller ones are easier, since they fill the gaps in the scheduler (an unused cluster node is an expensive cluster node). You can use a cluster sweep to split the large job into small ones.</p>
<p>Let’s look at it with the help of an example: Instead of starting 1 large job on 8 nodes to parallelize 800 parameter values, you can start 8 jobs that each use 1 node to compute their own sets of 100 parameter values. The jobs will then be scheduled independently and, depending on how your cluster is set up, the small jobs might finish faster than the large job would!</p>
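<p>The bookkeeping behind this example is simple enough to sketch in a few lines of shell. The chunk arithmetic below mirrors the numbers above; the <code>sbatch</code> line is commented out and its script name is hypothetical, since the <em>Cluster Sweep</em> node performs the actual submissions (and result synchronization) for you:</p>

```shell
#!/bin/sh
# Split 800 parameter indices into 8 single-node jobs of 100 values each
# (numbers taken from the example above).
TOTAL=800
JOBS=8
CHUNK=$((TOTAL / JOBS))   # 100 parameter values per job

i=0
while [ "$i" -lt "$JOBS" ]; do
  START=$((i * CHUNK))
  END=$((START + CHUNK - 1))
  echo "job $i: parameter indices $START-$END"
  # On a real cluster, each chunk would become its own submission, e.g.:
  # sbatch --nodes=1 run_comsol_chunk.sh "$START" "$END"
  i=$((i + 1))
done
```

<p>Because the eight jobs are independent, the scheduler can slot each one into whatever node happens to be free, which is exactly why they may finish sooner than one large eight-node job.</p>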
<h3>Setting Up Cluster Sweeps and Nested Parametric Sweeps</h3>
<p>If you have previous experience with the batch sweep and the <em>Cluster Computing</em> node, using the <em>Cluster Sweep</em> node is easy. (To find descriptions on how to set up a batch sweep, check out the blog posts &#8220;<a href="/blogs/the-power-of-the-batch-sweep/">The Power of the Batch Sweep</a>&#8221; and &#8220;<a href="/blogs/added-value-task-parallelism-batch-sweeps/">Added Value of Task Parallelism in Batch Sweeps</a>&#8221;.)</p>
<p>To demonstrate how to set up both a pure cluster sweep and a nested parametric sweep, let’s turn to my favorite example model: the <a href="/model/joule-heating-of-a-microactuator-8493">parameterized thermal microactuator</a>. (It&#8217;s my favorite because the model shows the multiphysics capabilities of the COMSOL® software.) Since it is parameterized, it’s very easy to add parametric and cluster sweeps to the model.</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/06/thermal-microactuator-modeled-cluster-sweep-.png" title="" alt="A thermal microactuator modeled using the cluster sweep functionality in COMSOL Multiphysics®." width="1000" height="750" class="alignnone size-full wp-image-262301" /><br />
<em>Modeling the Joule heating of a microactuator. Current flows through two of the arms, causing them to heat up. Then, the thermal expansion causes the actuator to bend.</em></p>
<h4>Adding a Cluster Sweep</h4>
<p>We start by adding a cluster sweep over the actuator length parameter, called <em>L</em>. To do so, first right-click <em>Study 1</em> and click <em>Cluster Sweep</em>. This adds a node where you can set up your cluster settings, analogous to the instructions in this <a href="/blogs/how-to-run-on-clusters-from-the-comsol-desktop-environment/">blog post on running clusters from the COMSOL Desktop®</a>. (If you haven’t read the blog post yet, now is a good time.)</p>
<p>Next, in the <em>Study Settings</em> window, you can add the parameters that you want to sweep over. Click the plus symbol and, in the drop-down list, choose the parameter <em>L</em>. Then, in the <em>Parameter value list</em> field, write (for example) “100 170 240 310”. In the <em>Parameter unit</em> field, write “um” (for micrometers).</p>
<p>If you want to bring the results back into your main model, make sure to check the <em>Synchronize solutions</em> check box. This way, you&#8217;ll have all the results available for further analysis and postprocessing.</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/06/adding-cluster-sweep-comsol-model.png" title="" alt="A screenshot showing the settings for adding cluster sweep results to a model." width="418" height="765" class="alignnone size-full wp-image-262331" /><br />
<em>Adding a cluster sweep to the model.</em></p>
<p>We have now created a cluster sweep, which loops over the length of the microactuator. Assuming that the cluster settings are correct, all we need to do now is click <em>Compute</em> and the separate jobs are sent to the cluster.</p>
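<p>For reference, the same kind of sweep can also be expressed on the command line with <code>comsol batch</code> and its <code>-pname</code>/<code>-plist</code> options. The file names below are placeholders, and how units are handled can depend on your model and COMSOL® version, so treat this as a sketch rather than a recipe:</p>

```shell
# Hypothetical command-line counterpart of the sweep over L set up above.
# File names are placeholders; the parameter values are interpreted in the
# unit defined for L in the model (um here).
comsol batch \
  -inputfile microactuator.mph \
  -outputfile microactuator_solved.mph \
  -pname L \
  -plist "100,170,240,310"
```
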
<h4>Adding a Nested Parametric Sweep</h4>
<p>Now, let’s create a nested parametric sweep so that each of our cluster jobs contains a parametric sweep of its own. We do this by adding a parametric sweep over the voltage parameter, called <em>DV</em>. To do so, follow these steps:</p>
<ol>
<li>Right-click <em>Study 1</em> and click <em>Parametric Sweep</em>, which adds a node where you can set up a parametric sweep</li>
<li>In the <em>Study Settings</em> window, click the plus symbol and, in the drop-down list, choose the parameter <em>DV</em></li>
<li>In the <em>Parameter value list</em> field, write “1 2 3 4 5”</li>
<li>In the <em>Parameter Unit</em> field, write “V”</li>
<li>Click <em>Compute</em> so that COMSOL Multiphysics will schedule the jobs for you</li>
</ol>
<p><img src="https://cdn.comsol.com/wordpress/2018/06/nested-parametric-sweep-comsol-screenshot.png" title="" alt="A screenshot of the settings for a nested parametric sweep in COMSOL Multiphysics®." width="520" height="647" class="alignnone size-full wp-image-262621" /><br />
<em>Adding a parametric sweep to the model, which creates a nested parametric sweep.</em></p>
<p>You can either wait for the jobs to complete (their status is shown in the <em>External Processes</em> window) or you can detach from the processes, save the model, close COMSOL Multiphysics, and let the jobs run on their own. When you come back to your workstation, just open your saved model and reattach, and the software will process the results just as when using the regular <em>Cluster Computing</em> node. This workflow is perfect for overnight simulations!</p>
<h3>Concluding Thoughts</h3>
<p>In this blog post, you have learned how you can optimize the parallelization of parametric computations on clusters using the <em>Cluster Sweep</em> node. You have also learned when it is beneficial to use different approaches and, as a bonus, how to avoid making your system admin unhappy.</p>
<p>As with cluster computing in general, you must decide what approach to use depending on the model you want to compute. To know when to use a cluster sweep versus a distributed parametric sweep, you have to try both approaches on your models and your cluster. As always, to master something, you have to test it!</p>
<p>As mentioned, you’ll need an FNL to use a cluster sweep, since this functionality is a network-based technology.</p>
<h3>Next Step</h3>
<p>If you want to learn more about the <em>Cluster Sweep</em> node, you can contact us by simply clicking the button below.</p>
<div class="flex-center"><a href="/contact" class="btn-solid btn-md btn-orange">Contact COMSOL</a></div>
<p>&nbsp;</p>
<p><em>Linux is a registered trademark of Linus Torvalds in the U.S. and other countries.</em></p>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/how-to-use-the-cluster-sweep-node-in-comsol-multiphysics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Run on Clusters from the COMSOL Desktop® Environment</title>
		<link>https://www.comsol.no/blogs/how-to-run-on-clusters-from-the-comsol-desktop-environment/</link>
		<comments>https://www.comsol.no/blogs/how-to-run-on-clusters-from-the-comsol-desktop-environment/#comments</comments>
		<pubDate>Fri, 09 Mar 2018 09:52:31 +0000</pubDate>
		<dc:creator><![CDATA[Lars Drögemüller]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Technical Content]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs?p=252261</guid>
		<description><![CDATA[Many types of analyses benefit from running the COMSOL Multiphysics® software on high-performance computing (HPC) hardware. This is one of the main reasons behind the Cluster Computing node, which helps seamlessly integrate the COMSOL® software with any kind of HPC infrastructure, while maintaining the convenience of a graphical user interface. In this blog post, learn how to run large simulations remotely on HPC hardware directly from the COMSOL Desktop® graphical environment. What Is Cluster Computing? The most common type of [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Many types of analyses benefit from running the COMSOL Multiphysics® software on high-performance computing (HPC) hardware. This is one of the main reasons behind the <em>Cluster Computing</em> node, which helps seamlessly integrate the COMSOL® software with any kind of HPC infrastructure, while maintaining the convenience of a graphical user interface. In this blog post, learn how to run large simulations remotely on HPC hardware directly from the COMSOL Desktop® graphical environment.</p>
<p><span id="more-252261"></span></p>
<h3>What Is Cluster Computing?</h3>
<p>The most common type of HPC hardware is a <em>cluster</em>: a set of individual computers (often called nodes) connected by a network. Even if you have only one dedicated simulation machine, you can think of it as a one-node cluster.</p>
<blockquote><p>The <em>COMSOL Reference Manual</em> also calls a single COMSOL Multiphysics process a node. The difference is rarely important, but when it does matter, we will call a computer a physical node or host and an instance of the COMSOL Multiphysics program a <em>compute node or process</em>.</p></blockquote>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/simple-cluster-diagram.png" title="" alt="A simple diagram illustrating what a cluster is." width="600" height="356" class="alignnone size-full wp-image-252321" /><br />
<em>An example of a cluster with four compute nodes.</em></p>
<p>The work that we want to perform on the cluster is bundled into atomic units, called <em>jobs</em>, that are submitted to the cluster. A job in this context is a study being run with COMSOL Multiphysics.</p>
<p>When you submit a job to a cluster, the cluster does two things:</p>
<ul>
<li>Decides which nodes run which jobs and at what time</li>
<li>Restricts the access to the nodes, so multiple jobs do not interfere with each other</li>
</ul>
<p>These tasks are performed by special programs called <em>schedulers</em> and <em>resource managers</em>, respectively. Here, we use the term <em>scheduler</em> for both, since most programs perform both tasks at once anyway.</p>
<p>Note that it is possible to submit COMSOL Multiphysics jobs to a cluster using the <code>comsol batch</code> command (on the Linux® operating system) or <code>comsolbatch.exe</code> (on the Windows® operating system) in a script that you submit to the cluster. You might prefer this method if you&#8217;re already familiar with console-based access to your cluster. For additional information, please see the COMSOL Knowledge Base article &#8220;<a href="/support/knowledgebase/1001/">Running COMSOL® in parallel on clusters</a>&#8221;.</p>
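<p>For completeness, here is roughly what such a script could look like for a SLURM® scheduler. The partition name, paths, and resource numbers are placeholders for illustration only; check the Knowledge Base article above and <code>comsol batch -help</code> for the options that match your installation:</p>

```shell
#!/bin/bash
#SBATCH --job-name=comsol_busbar    # placeholder job name
#SBATCH --partition=cluster         # queue/partition name from your admin
#SBATCH --nodes=2                   # physical nodes for this job
#SBATCH --ntasks-per-node=1         # one COMSOL process per node

# Placeholder paths; adjust to your installation and home directory.
COMSOL=/usr/local/comsol/v53a/multiphysics/bin/comsol

"$COMSOL" batch -nn "$SLURM_NNODES" \
  -inputfile "$HOME/models/busbar.mph" \
  -outputfile "$HOME/models/busbar_solved.mph" \
  -batchlog "$HOME/models/busbar.log"
```

<p>You would then submit this with <code>sbatch jobscript.sh</code> from the cluster&#8217;s login node.</p>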
<p>In the following sections, we will discuss using the <em>Cluster Computing</em> node to submit and monitor cluster jobs from the COMSOL Desktop® graphical interface.</p>
<h3>Adding the Cluster Computing Node to a Simple Model</h3>
<p>Whenever I want to configure the <em>Cluster Computing</em> node for a cluster that I am not familiar with yet, I like to start with a simple busbar model. This model solves in a few seconds and is available with any license, which makes testing the cluster computing functionality very easy.</p>
<p>To run the busbar model on a cluster, we add the <em>Cluster Computing</em> node to the main study. We might need to enable <em>Advanced Study Options</em> first, though. To do so, we activate the option in <em>Preferences</em> or click the <em>Show</em> button in the Model Builder toolbar.</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/advanced-study-options-comsol-multiphysics-.png" title="" alt="A screenshot showing the Advanced Study Options in the COMSOL Multiphysics GUI." width="376" height="559" class="alignnone size-full wp-image-252331" /><br />
<em>Activate</em> Advanced Study Options <em>to enable the</em> Cluster Computing <em>node.</em></p>
<p>Now the <em>Cluster Computing</em> node can be added to any study by right-clicking the study and selecting <em>Cluster Computing</em>.</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/cluster-computing-option-COMSOL-software.png" title="" alt="A screenshot showing the Cluster Computing option in COMSOL Multiphysics." width="519" height="687" class="alignnone size-full wp-image-252341" /><br />
<em>Right-click a study node and select</em> Cluster Computing <em>from the menu to add it to the model.</em></p>
<p><a href="https://cdn.comsol.com/wordpress/2018/03/computing-clusters-UI.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2018/03/computing-clusters-UI.png" title="Default settings" alt="A cropped screenshot showing the Cluster Computing settings in COMSOL Multiphysics." width="898" height="706" class="alignnone size-full wp-image-252351" /></a><br />
<em>The default settings for the</em> Cluster Computing <em>node.</em></p>
<p>If you can&#8217;t find the <em>Cluster Computing</em> node, chances are that your license type is not cluster-enabled (CPU licenses and academic class kit licenses, for example). In this case, you can contact your sales representative to discuss <a href="/products/licensing">licensing options</a>.</p>
<h3>Settings for the Cluster Computing Node</h3>
<p>The most complex part of using the <em>Cluster Computing</em> node is finding the right settings and using it for the first time. Once the node works on your cluster for one model, it is very straightforward to adjust the settings slightly for other simulations.</p>
<p>To store the settings as defaults, you can change them under <em>Preferences</em> in the sections <em>Multicore and Cluster Computing</em> and <em>Remote Computing</em>. Alternatively, you can apply the settings to the <em>Cluster Computing</em> node directly and click the <em>Save</em> icon at the top of the Settings window. Either way, it is highly recommended to store the settings as defaults so that you do not have to type everything in again for the next model.</p>
<p>Discussing all of the possible settings for the <em>Cluster Computing</em> node is out of scope of this blog post, so we will focus on a typical setup. The <em>COMSOL Multiphysics Reference Manual</em> contains additional information. In this blog post, the following is assumed:</p>
<ul>
<li>COMSOL Multiphysics® is running on a local Windows® machine and we want to submit jobs to a remote cluster</li>
<li>The cluster is running on Linux® and has SLURM® software installed as the scheduler</li>
</ul>
<p>These settings are shown in this screenshot:</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/cluster-computing-node-settings-screenshot.png" title="" alt="A screenshot showing the typical Cluster Computing node setup in COMSOL Multiphysics." width="477" height="729" class="alignnone size-full wp-image-252361" /></p>
<p>First, let&#8217;s talk about the section labeled <em>Cluster computing settings</em>. Since our cluster uses SLURM® software as its scheduler, we set the <em>Scheduler type</em> to &#8220;SLURM&#8221;. The following options are SLURM®-specific:</p>
<ul>
<li><em>Scheduler</em> is left empty to instruct SLURM® software to just use the one scheduler that is available</li>
<li><em>User</em> is our username, which can be left empty to use the username we log in with on the cluster</li>
<li><em>Queue name</em> is the name of the queue to which the job is submitted</li>
</ul>
<p>On the machine used in this example, we have two queues: &#8220;cluster&#8221; for jobs of up to 10 physical compute nodes with 64 GB of RAM each and &#8220;fatnode&#8221; for a single node with 256 GB. Every cluster will have different queues, so ask your cluster administrator what queues to use.</p>
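<p>If you are unsure which queues your cluster offers, the SLURM® software itself can list them. These commands are run on the cluster&#8217;s login node (shown for illustration; the output depends entirely on your cluster):</p>

```shell
# List the available partitions (queues), with node counts and time limits:
sinfo -s

# Show your own queued and running jobs:
squeue -u "$USER"
```
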
<p>The next field is labeled &#8220;Directory&#8221;. This is where the solved COMSOL Multiphysics files are placed on your local computer when the job is finished. This is also where the COMSOL® software stores any intermediate, status, and log files.</p>
<p>The next three fields specify locations on the cluster. Notice that <em>Directory</em> was a Windows® path (since we are working on a Windows® computer here), but these are Linux® paths (since our cluster uses Linux®). Make sure that the kind of path matches the operating system on the local and remote side!</p>
<p>The <em>Server Directory</em> specifies where files should be stored when using cluster computing from a COMSOL Multiphysics session in client-server mode. When executing cluster computing from a local machine, this setting is not used, so we leave it blank. We do need the <em>External COMSOL batch directory</em>, however. This is where model files, status files, and log files should be kept on the cluster during the simulation. For these paths, be sure to choose a directory that already exists and where you have write permissions; for example, some place in your home directory. (See this previous <a href="/blogs/getting-client-server-mode/">blog post on using client-server mode</a> for more details.)</p>
<p>The <em>COMSOL installation directory</em> is self-explanatory and should contain the folders <code>bin</code>, <code>applications</code>, and so on. This is usually something like &#8220;/usr/local/comsol/v53a/multiphysics/&#8221; by default, but it obviously depends on where COMSOL Multiphysics is installed on the cluster.</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/remote-computing-settings-clusters.png" title="" alt="A screenshot showing the Remote computing settings in COMSOL Multiphysics." width="477" height="759" class="alignnone size-full wp-image-252371" /><br />
<em>Remote connection settings.</em></p>
<p>The next important section is the <em>Remote and Cloud Access</em> tab. This is where we specify how to establish the connection between the local computer and remote cluster.</p>
<p>To connect from a Windows® workstation to a Linux® cluster, we need the third-party program <a href="https://www.putty.org/" target="_blank">PuTTY</a> to act as the SSH client for the COMSOL® software. Make sure to have PuTTY installed and that you can connect to your cluster with it. Also, make sure that you set up password-free authentication with a public-private key pair. There are many tutorials online on how to do this and your cluster administrator can help you. When this is done, enter the installation directory of PuTTY as the <em>SSH directory</em> and your private key file from the password-free authentication in the <em>SSH key file</em>. Set the <em>SSH user</em> to your login name on the cluster.</p>
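<p>For reference, the OpenSSH way of setting up such a key pair is sketched below; PuTTY users would instead generate the pair with the PuTTYgen tool (or convert the OpenSSH private key to PuTTY&#8217;s <code>.ppk</code> format). The host name and user name are placeholders:</p>

```shell
# Sketch: generate a key pair with no passphrase. A scratch directory is
# used here for illustration; in practice the key lives under ~/.ssh.
KEYDIR=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -f "$KEYDIR/id_cluster" -q

# Install the public key on the cluster (placeholder host, so commented out):
# ssh-copy-id -i "$KEYDIR/id_cluster.pub" username@cluster.example.com

# Afterwards, this should log in without a password prompt:
# ssh -i "$KEYDIR/id_cluster" username@cluster.example.com
```
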
<p>While SSH is used to log in to the cluster and run commands, SCP is used for file transfer, for example, when transferring model files to or from the cluster. PuTTY uses the same settings for SCP and SSH, so just copy the settings from SSH.</p>
<p>Lastly, enter the address of the cluster under <em>Remote hosts</em>. This may be a host name or an IP address. Remember to also set the <em>Remote OS</em> to the correct operating system on the cluster.</p>
<p>When you are done, you can click the <em>Save</em> icon at the top of the Settings window to start with these settings next time you want to run a remote cluster job.</p>
<p>Another way to test whether your cluster settings work is to use the <a href="/model/cluster-setup-validation-55711">Cluster Setup Validation app</a>, available as of COMSOL Multiphysics version 5.3a.</p>
<h3>Running a Study on a Cluster</h3>
<p>The settings that change every time you run a study include the model name and the number of physical nodes to use. When you click to run the study, COMSOL Multiphysics begins the process of submitting the job to the cluster. The first step is invisible and involves running SCP to copy the model file to the cluster. The second step is starting the simulation by submitting a job to the scheduler. Once this stage starts, the <em>External Process</em> window automatically appears and informs you of the progress of your simulation on the cluster. During this stage, the COMSOL Desktop® is locked and the software is busy tracking the remote job.</p>
<p><a href="https://cdn.comsol.com/wordpress/2018/03/cluster-computing-progress-start.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2018/03/cluster-computing-progress-start.png" title="First three stages" alt="The three first stages of cluster computing progress for a COMSOL Multiphysics study." width="970" height="743" class="alignnone size-full wp-image-252381" /></a><br />
<a href="https://cdn.comsol.com/wordpress/2018/03/cluster-computing-progress-log-finish.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2018/03/cluster-computing-progress-log-finish.png" title="Final three stages" alt="The three final stages of cluster computing progress for a COMSOL Multiphysics study." width="964" height="733" class="alignnone size-full wp-image-252391" /></a><br />
<em>Tracking the progress of the remote job in the</em> External Process <em>window from scheduling the job (top) to Done (bottom).</em></p>
<p>This process is very similar to how the <em>Batch Sweep</em> node works. In fact, you may recognize the <em>External Process</em> window from <a href="/blogs/the-power-of-the-batch-sweep/">using the batch sweep functionality</a>. Just like when using a batch sweep, we can regain control of the GUI by clicking the <em>Detach Job</em> button below the <em>External Process</em> window, to detach the GUI from the remote job. We can later reattach to the same job by clicking the <em>Attach Job</em> button, which replaces the <em>Detach Job</em> button while we are detached.</p>
<p>Normally, running COMSOL Multiphysics on two machines simultaneously requires two license seats, but you can check the <em>Use batch license</em> option to detach from a remote job and keep editing locally with only one license seat. In fact, you can even submit multiple jobs to the cluster and run them simultaneously, as long as both jobs are just variations of the same model file; i.e., they only differ in their global parameter values. The only restriction is that your local username needs to be identical to the username on the remote cluster so the license manager can tell that the same person is using both licenses. Otherwise, an extra license seat will be consumed, even when the <em>Use batch license</em> option is enabled.</p>
<p>As soon as the simulation is done, you are prompted to open the resulting file:</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/open-MPH-file-dialog-box-.png" title="" alt="A dialog box asking to open an MPH-file after the cluster job has finished." width="266" height="171" class="alignnone size-full wp-image-252401" /><br />
<em>Once the cluster job has finished, you are prompted to immediately open the solved file.</em> </p>
<p>If you select <em>No</em>, you can still open the file later, because it will have already been downloaded and copied to the directory that was specified in the settings. Let&#8217;s have a look at these files:</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/cluster-job-files.png" title="" alt="A list of files created during the cluster job." width="200" height="200" class="alignnone size-full wp-image-252411" /><br />
<em>Files created during the cluster job on the local side.</em></p>
<p>These files are created and updated as the simulation progresses. COMSOL Multiphysics periodically retrieves each file from the remote cluster to update the status in the <em>Progress</em> window and informs you as soon as the simulation is done. The same files are also present on the remote side:</p>
<p><img src="https://cdn.comsol.com/wordpress/2018/03/putty-files-cluster-job.png" title="" alt="A screenshot of PuTTY files created during the cluster job for a COMSOL Multiphysics model." width="595" height="340" class="alignnone size-full wp-image-252421" /><br />
<em>Files created during the cluster job on the remote side. Note: Colors have been changed from the default color scheme in PuTTY to emphasize MPH-files.</em></p>
<p>Here is a rundown of the most relevant file types:</p>
<table class="table-blog">
<tr>
<th>
File
</th>
<th>
Remote Side
</th>
<th>
Local Side
</th>
</tr>
<tr>
<td>
backup*.mph
</td>
<td>
N/A
</td>
<td>
<ul>
<li>Copy of the model file in the state it is in at the moment <em>Compute</em> is clicked</li>
</ul>
</td>
</tr>
<tr>
<td>
*.mph
</td>
<td>
<ul>
<li>Input file is stored here before the job starts</li>
<li>Output file is written here during the simulation</li>
</ul>
</td>
<td>
<ul>
<li>Output file is copied from the remote side as soon as the simulation is done</li>
</ul>
</td>
</tr>
<tr>
<td>
*.mph.log
</td>
<td>
<ul>
<li>Contains what is usually written to the <em>Log</em> window, plus additional information about memory usage and current progress percentage</li>
</ul>
</td>
<td>
<ul>
<li>Copied from the remote side continuously</li>
<li>Used to update progress information</li>
</ul>
</td>
</tr>
<tr>
<td>
*.mph.recovery
</td>
<td>
<ul>
<li>Tracks the location of the current recovery data, in case the simulation fails</li>
</ul>
</td>
<td>
<ul>
<li>Copied from the remote side continuously</li>
</ul>
</td>
</tr>
<tr>
<td>
*.mph.status
</td>
<td>
<ul>
<li>Checks for <em>Cancel</em> and <em>Stop</em> events</li>
<li>Written when the status changes</li>
</ul>
</td>
<td>
<ul>
<li>Copied from and to the remote side continuously</li>
<li>Used to update status information</li>
</ul>
</td>
</tr>
<tr>
<td>
*.mph.host
</td>
<td>
N/A
</td>
<td>
<ul>
<li>Contains the address of the host that the job was submitted to (relevant when there are multiple hosts)</li>
</ul>
</td>
</tr>
</table>
<h3>Using Cluster Computing Functionality for Your COMSOL Multiphysics® Simulations</h3>
<p>The busbar model is so small that we would not realistically run it on a cluster. After using that example to test the functionality, we can open any model file, add the <em>Cluster Computing</em> node (populated with the defaults we set before), change the number of nodes and the filename, and click <em>Compute</em>. The <em>Run remote</em> options, scheduler type, and all of the associated settings don&#8217;t need to be changed again.</p>
<p>What does the COMSOL® software do when we run a model on multiple hosts? How is the work split up? Most algorithms in the software are parallelized, meaning the COMSOL Multiphysics processes on all hosts work together on the same computation. Distributing the work over multiple computers provides more computing resources and can increase performance for many problems. </p>
<p>However, it should be noted that the required communication between cluster nodes can produce a performance bottleneck. How fast the model will solve depends a lot on the model itself, the solver configuration, the quality of the network, and many other factors. You can find more information in this <a href="/blogs/tag/hybrid-modeling-series/">blog series on hybrid modeling</a>.</p>
<p>Another reason to use the hardware power of a cluster is memory: the total memory that a simulation needs stays approximately constant, but with more hosts there is more memory available in total, so the memory needed per host goes down. This allows us to run very large models that could not otherwise be solved on a single computer. In practice, the total memory consumption of the problem goes up slightly, since each COMSOL Multiphysics process needs to track its own data as well as the (usually much smaller) data it receives from the other processes. Also, the exact amount of memory a process will need is often not predictable, so adding more processes can increase the risk that a single physical node will run out of memory and abort the simulation.</p>
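<p>As a back-of-the-envelope sketch of this memory argument (the numbers and the fixed per-process overhead here are hypothetical illustrations, not COMSOL values):</p>

```python
def per_host_memory_gb(total_gb, n_hosts, overhead_gb_per_host=0.5):
    """Rough per-host memory for a problem needing total_gb overall,
    distributed over n_hosts, with a hypothetical fixed overhead per
    process for its own bookkeeping and data received from peers."""
    return total_gb / n_hosts + overhead_gb_per_host

# A 120 GB problem fits on 4 hosts with ~32 GB of RAM each,
# even though no single 64 GB machine could hold it. Note the
# total across all hosts slightly exceeds 120 GB.
per_host_memory_gb(120, 4)  # 30.5 GB per host
```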
<p>A much easier case is running a distributed parametric sweep. We can speed up the computation by using multiple COMSOL Multiphysics processes and having each work on a different parameter value. We call this type of problem &#8220;embarrassingly parallel&#8221;, since the nodes do not need to exchange information across the network at all while solving. In this case, if the number of physical nodes is doubled, then ideally the simulation time will be cut in half. The actual speedup is typically not quite this good, as it takes some time to send the model to each node and additional time to copy the results back.</p>
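<p>The speedup argument above can be sketched as a toy model (the fixed transfer cost per node is a hypothetical number, not a COMSOL benchmark):</p>

```python
import math

def sweep_time(n_params, hours_per_param, n_nodes, transfer_hours=0.0):
    """Wall-clock time for an embarrassingly parallel sweep: the
    parameter values are divided among the nodes, and each node pays
    a fixed cost to receive the model and return its results."""
    params_per_node = math.ceil(n_params / n_nodes)
    return params_per_node * hours_per_param + transfer_hours

# Ideal case: doubling the nodes halves the time (speedup of 2.0).
ideal = sweep_time(8, 1.0, 1) / sweep_time(8, 1.0, 2)
# With transfer overhead, the actual speedup is somewhat lower.
actual = sweep_time(8, 1.0, 1) / sweep_time(8, 1.0, 2, transfer_hours=0.5)
```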
<p>To run a distributed parametric sweep, we need to activate the <em>Distribute parametric sweep</em> option at the bottom of the settings for the parametric sweep. Otherwise, the simulation will run one parameter at a time using all of the cluster nodes, with the parallelization performed on the level of the solver, which is much less efficient.</p>
<p>If you run an auxiliary sweep, you can also check the <em>Distribute parametric solver</em> option in the study step, for example, to run a frequency sweep over many frequencies in parallel using multiple processes on potentially many physical nodes. Note that if you use a continuation method, or if individual simulations depend on each other, then this method of distributing the parameters does not work.</p>
<blockquote><p>Note: Do not use the <em>Distribute parametric sweep</em> option in the <em>Cluster Computing</em> node itself, as it has been deprecated. It is better to specify this directly in the settings of the parametric sweep.</p></blockquote>
<p><a href="https://cdn.comsol.com/wordpress/2018/03/parametric-sweep-comsol-multiphysics-screenshot.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2018/03/parametric-sweep-comsol-multiphysics-screenshot.png" title="Parametric Sweep settings" alt="A cropped screenshot showing the Parametric Sweep settings in COMSOL Multiphysics." width="899" height="692" class="alignnone size-full wp-image-252431" /></a><br />
<em>Activate the</em> Distribute parametric sweep <em>option to run each set of parameters on a different node in parallel.</em></p>
<p>To run a sweep in parallel, we can also use the <em>Cluster Sweep</em> node, which combines the features of the <em>Batch Sweep</em> node with the ability of the <em>Cluster Computing</em> node to run jobs remotely. You can say that a cluster sweep is the remote version of the batch sweep, just like the <em>Cluster Computing</em> node is the remote version of the <em>Batch</em> node. We will discuss cluster sweeps in more detail in a <a href="/blogs/how-to-use-the-cluster-sweep-node-in-comsol-multiphysics/">future blog post</a>.</p>
<p>The most important difference to remember is that the <em>Cluster Computing</em> node submits one job for the entire study (even if it contains a sweep), while the <em>Cluster Sweep</em> and <em>Batch Sweep</em> nodes create one job for each set of parameter values.</p>
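<p>A minimal way to state this difference in code (the node names are taken from the text above; the function itself is just an illustration):</p>

```python
def jobs_submitted(node_type, n_parameter_values):
    """Number of scheduler jobs created for a study containing a sweep,
    per the rule described above: Cluster Computing submits one job for
    the whole study; Cluster Sweep and Batch Sweep submit one per value."""
    if node_type == "Cluster Computing":
        return 1
    if node_type in ("Cluster Sweep", "Batch Sweep"):
        return n_parameter_values
    raise ValueError(f"unknown node type: {node_type}")

jobs_submitted("Cluster Computing", 10)  # 1 job for the entire study
jobs_submitted("Cluster Sweep", 10)      # 10 jobs, one per parameter value
```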
<h3>Cluster Computing with Apps</h3>
<p>All of what is covered in this blog post is also available from simulation apps that are run from either COMSOL Multiphysics or <a href="/comsol-server">COMSOL Server™</a>. An app simply inherits the cluster settings from the model on which it is based.</p>
<p>When running apps from COMSOL Server™, you get access to cluster preferences in the administration web page of COMSOL Server™. You can let your app use these preferences to have the cluster settings hardwired and customized for a particular app. If you wish, you can design your apps so that the user of the app gets access to one or more of the low-level cluster settings. For example, in your app&#8217;s user interface, you can design a menu or list where users can select between different queues, such as the &#8220;cluster&#8221; or &#8220;fatnode&#8221; options mentioned earlier.</p>
<h3>Concluding Thoughts</h3>
<p>Whether you are using a university cluster, a virtual cloud environment, or your own hardware, the <em>Cluster Computing</em> node enables you to easily run your simulations remotely. You don&#8217;t usually need an expensive setup for this purpose. In fact, sometimes all you need is a <a href="/blogs/building-beowulf-cluster-faster-multiphysics-simulations/">Beowulf cluster for running parametric sweeps</a> while you take care of other tasks locally.</p>
<p>Cluster computing is a powerful tool to speed up your simulations, study detailed and realistic devices, and ultimately help you with your research and development goals.</p>
<div class="flex-center">
<a href="/contact" class="btn-solid btn-md btn-orange">Contact COMSOL</a>
</div>
<p><em>SLURM is a registered trademark of SchedMD LLC.</em></p>
<p><em>Linux is a registered trademark of Linus Torvalds in the U.S. and other countries.</em></p>
<p><em>Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.</em></p>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/how-to-run-on-clusters-from-the-comsol-desktop-environment/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Running COMSOL Multiphysics® with Cloud Computing</title>
		<link>https://www.comsol.no/blogs/running-comsol-multiphysics-with-cloud-computing/</link>
		<comments>https://www.comsol.no/blogs/running-comsol-multiphysics-with-cloud-computing/#comments</comments>
		<pubDate>Fri, 20 Feb 2015 14:30:17 +0000</pubDate>
		<dc:creator><![CDATA[Pär Persson Mattsson]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=64441</guid>
		<description><![CDATA[We have previously written about HPC with the COMSOL Multiphysics® software, clusters, and hybrid computing. But not all of us have a cluster available in the office (or the hardware to build a Beowulf cluster). So what possibilities do we have if we really need that extra compute power that a cluster can give us? One solution is to utilize cloud computing, a service that provides compute power on a temporary basis, to give our computations and productivity a boost. [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>We have previously written about HPC with the COMSOL Multiphysics® software, clusters, and <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">hybrid computing</a>. But not all of us have a cluster available in the office (or the hardware to <a href="http://www.comsol.com/blogs/building-beowulf-cluster-faster-multiphysics-simulations/">build a Beowulf cluster</a>). So what possibilities do we have if we really need that extra compute power that a cluster can give us? One solution is to utilize cloud computing, a service that provides compute power on a temporary basis, to give our computations and productivity a boost.</p>
<p><span id="more-64441"></span></p>
<h3>Three Cases Where You Need More Computer Power</h3>
<p>Imagine that you are modeling an electronic device and are interested in its temperature distribution during operation. After testing a few setups, you discover that the heat flux boundary condition you applied is not a well-suited approximation for your model. You realize that a fluid flow simulation is required in order to achieve more accurate results. The only problem is that you have used almost all of your laptop’s 4 GB of RAM to run the heat transfer simulation based on the heat flux approximation. You require two-way coupling, and including the fluid flow simulation will only add even more degrees of freedom to your computation &mdash; and require even more RAM.</p>
<p>What now? You need more computer power.</p>
<p>Now imagine, instead, that you are analyzing the mechanics of a structural component with a lot of small details for your customer. In order to optimize the design, you are required to run the analysis for a large number of different design dimensions. As you have only one processor locally and each run will take quite a bit of time, you realize that you will not meet your customer&#8217;s deadline.</p>
<p>The solution? You would need to run these simulations in parallel on multiple processors.</p>
<p>Finally, let’s look at an application independent of the physics involved, but still reliant on the analysis performed. You have set up your model using the physics interface of your choice, but it&#8217;s the end of the day and you just want to get the computed solution to your model as quickly and easily as possible, overnight. Utilizing a direct solver does not require much manipulation of solver settings, but the RAM required by a direct solver increases greatly with the number of degrees of freedom in your model. </p>
<p>What&#8217;s the fix this time? You need a bigger computer.</p>
<p>What if there was another solution to all three cases&#8230;</p>
<h3>Enter: Cloud Computing</h3>
<p>This is where cloud computing comes into the picture. Compute clouds are services that make computing power available to those who need it, when they need it.</p>
<p>The service has several advantages, especially if you don’t have the time, money, and experience to invest in a traditional cluster or server rack. You might also not need a cluster available 24/7, but only need that extra compute power during certain periods of time, for instance, for that one-off analysis or task that needs to be performed quicker.</p>
<p><img src="https://cdn.comsol.com/wordpress/2015/02/Running-COMSOL-Multiphysics-on-the-cloud.png" title="" alt="Running COMSOL Multiphysics on the cloud." width="1000" height="387" class="alignnone size-full wp-image-64471" /><br />
<em>An organization can access COMSOL Multiphysics® and the hardware resources of cloud computing to run many different analyses at the time they require, utilizing the resources they require.</em></p>
<p>Utilizing cloud computing will have a positive impact on your workflow. The ability to add more compute power directly when you need it will enable you to be more agile in your day-to-day COMSOL Multiphysics® simulation work. You won’t have to worry about the lack of adequate hardware on-site and you can go about your daily business with the certainty that you can expand into the cloud whenever the situation calls for it.</p>
<h3>Using the COMSOL® Software on Remote Computing Resources</h3>
<p>Traditionally, when using cloud computing services, you need expertise in the network and hardware technology being used, as well as the operating system and software implemented by the cloud service to support running your application. In an example workflow, you register with the cloud service, research the specifications of the machines on offer, rent a machine, and then connect it to your network to allow access to your license server. Next comes the easy part: installing COMSOL Multiphysics® and running your model.</p>
<p>However, since <a href="http://www.comsol.com/multiphysics/high-performance-computing">HPC</a> is becoming more and more important in the CAE community, we have partnered with cloud computing providers to make it as simple as possible for you to take the step into the cloud. </p>
<blockquote><p>Note: COMSOL Multiphysics has been able to utilize remote computing resources for a long time, either through batch jobs started from the user interface or the command line, or on-the-fly through client-server technology. For this, you only need a Floating Network License (FNL) for COMSOL Multiphysics®.</p></blockquote>
<h3>Get Started: Contact Our Cloud Computing Partners</h3>
<ul>
<li><a href="https://www.nimbix.net/new-contact/" target="_blank">Nimbix</a></li>
<li><a href="http://www.rescale.com/signup/vip" target="_blank">Rescale</a></li>
<li><a href="https://www.cpu-24-7.com/" target="_blank">CPU 24/7</a></li>
<li><a href="http://www.nor-tech.com/" target="_blank">Nor-Tech</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/running-comsol-multiphysics-with-cloud-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How Much Memory Is Needed to Solve Large COMSOL Models?</title>
		<link>https://www.comsol.no/blogs/much-memory-needed-solve-large-comsol-models/</link>
		<comments>https://www.comsol.no/blogs/much-memory-needed-solve-large-comsol-models/#comments</comments>
		<pubDate>Fri, 24 Oct 2014 20:26:29 +0000</pubDate>
		<dc:creator><![CDATA[Walter Frei]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Technical Content]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=38877</guid>
		<description><![CDATA[One of the most common questions we get is: How large of a model can you solve in COMSOL Multiphysics? It turns out that this is quite tricky to answer decisively, so in this blog entry, we will talk about memory requirements, model size, and how you can predict the amount of memory you will need for solving large 3D finite element problems. Let&#8217;s Look at Some Data The plot below shows the amount of memory needed to solve various [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>One of the most common questions we get is: How large of a model can you solve in COMSOL Multiphysics? It turns out that this is quite tricky to answer decisively, so in this blog entry, we will talk about memory requirements, model size, and how you can predict the amount of memory you will need for solving large 3D finite element problems.</p>
<p><span id="more-38877"></span></p>
<h3>Let&#8217;s Look at Some Data</h3>
<p>The plot below shows the amount of memory needed to solve various 3D finite element problems in terms of the number of degrees of freedom (DOF) in the model. </p>
<p><img src="https://cdn.comsol.com/wordpress/2014/10/Memory-requirements-with-a-second-polynomial-curve-fit-with-respect-to-degrees-of-freedom.png" title="" alt="Graph depicting memory requirements with respect to degrees of freedom." width="977" height="699" class="alignnone size-full wp-image-38903" /><br />
<em>Memory requirements (with a second-order polynomial curve fit) with respect to degrees of freedom for various representative cases.</em></p>
<p>There are five different cases presented here:</p>
<ul>
<li>Case 1: A heat transfer problem of a spherical shell. There is radiative heat transfer between all of the surfaces. The model is solved with the default iterative solver.</li>
<li>Case 2: A structural mechanics problem of a cantilevered beam, solved with the default direct solver.</li>
<li>Case 3: A wave electromagnetics problem solved with the default iterative solver.</li>
<li>Case 4: The same structural mechanics problem as Case 2, but using an iterative solver.</li>
<li>Case 5: A heat transfer problem of a block of material. Only conductive heat transfer is considered. The model is solved with the default iterative solver.</li>
</ul>
<p>What you should see from this graph is that, with a computer that has 64 GB of random access memory (RAM), you can solve problems that range in size anywhere from ~26,000 DOF on the low end all the way up to almost 14 million degrees of freedom. So why this wide range of numbers? Let&#8217;s look at how to interpret these data&#8230;</p>
<h3>Degrees of Freedom, Explained</h3>
<p>For most problems, <a href="http://www.comsol.com/comsol-multiphysics">COMSOL Multiphysics</a> solves a set of governing partial differential equations via the <a href="http://www.comsol.com/multiphysics/finite-element-method">finite element method</a>, which takes your CAD model and subdivides the domains into <em>elements</em>, which are defined by a set of nodes on the boundaries. </p>
<p>At each node, there will be at least one <em>unknown</em>, and the number of these unknowns is based upon the physics that you are solving. For example, when solving for temperature, you only have a single unknown (called T, by default) at each node. When solving a structural problem, you are instead computing strains and the resultant stresses, thus you are solving for three unknowns (u,v,w), which are the displacements of each node in the x-y-z space. </p>
<p>For a turbulent fluid flow problem, you are solving for the fluid velocities (also called u,v,w by default) and pressure (p) as well as extra unknowns describing the turbulence. If you are solving a <a href="http://www.comsol.com/multiphysics/what-is-diffusion">diffusion</a> problem with many different species, you will have as many unknowns per node as you have chemical species. Additionally, different physics within the same model can have a different default <em>discretization</em> order, meaning there can be additional nodes along the element edges, as well as in the element interior.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/10/elements.png" title="" alt="Diagram of various elements." width="827" height="348" class="alignnone size-full wp-image-38905" /><br />
<em>A second-order tetrahedral element solving for the temperature field, <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABIAAAARBAMAAAAidOHKAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADBQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////L2OGaQAAAA50Uk5TABGIZneqRFWZIt3uM8w0ZYRTAAAAAWJLR0QAiAUdSAAAAAlwSFlzAAAAeAAAAHgAnfVaYAAAAFtJREFUCNdjYMAGhJRAwADIck1kf8SQ0QBkBTCwvGTgVGBg4DRgYF3AwJbAwMAswMDdwMA2AaznnAFMt98EGCsKbuIOGIPnMYwFNA4KgMZBAXcDlGF0b5ECqpsAJWYPDhtP+s0AAAAldEVYdGRhdGU6Y3JlYXRlADIwMTgtMTEtMjFUMjM6MTQ6MzYrMDE6MDDKyUHiAAAAJXRFWHRkYXRlOm1vZGlmeQAyMDE4LTExLTIxVDIzOjE0OjM2KzAxOjAwu5T5XgAAACF0RVh0cHM6SGlSZXNCb3VuZGluZ0JveAAxMXgxMCszMDArNjM5GO98mwAAACd0RVh0cHM6TGV2ZWwAQWRvYmVGb250LTEuMDogQ01NSTEyIDAwMy4wMDIKMReWuwAAAEl0RVh0cHM6U3BvdENvbG9yLTAAL2Rldi9zaG0vemYyLWNhY2hlL2I5ZWNlMThjOTUwYWZiZmE2YjBmZGJmYTRmZjczMWQzLmR2aSAtb+cbL6UAAABFdEVYdHBzOlNwb3RDb2xvci0xAC9kZXYvc2htL3pmMi1jYWNoZS9iOWVjZTE4Yzk1MGFmYmZhNmIwZmRiZmE0ZmY3MzFkMy5wc5d5OxUAAAAASUVORK5CYII=" />, will have a total of 10 unknowns per element, while a first-order element solving the laminar Navier-Stokes equations for velocity, <img class="latexImg" 
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAIUAAAAZCAMAAAAooB02AAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TAGYzdxFEmaqIuyLMVe7dAs2r6AAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAAGaSURBVEjH7ZbZdoUgDEXDKCDD//9tgwy2NdDC8q3Ng9frBjyeAAHgP/aC8XfHE3KjkxIvf5XUG32Ol0Xgd5nlLlq9rgKsW7XifF8E+NWcsHX3fhGh/nJrE77A2jB/zeGLJcJkFz2ZnhmkmW1/5ZlVQLkWYeqOvjjP0t6AyS5GRqmYQZrpvu4uL+q1dPgc7WGqAuHARy5R02oGB+we/6GCjKJC4ThoD4tUkxkcsD0VuOHltaIHc3sGSTZRIcYZwXb5CT0t5pBkExWfZ2ef0rHeBV+yK4B7o5j5kucbMoP+a5qBsZyX4XRXdeT385B+tVIvNSaBd1KAOiHiy1xsM71D5gxiSzKUr0wVaPsadLhVYKQ03R1FFckOJdSJlU3K5qcLrcTcEELBBMs9mrQAa+GaRpm9LH72PafnrcPsQ8FPBqr6ur6DP6oZZ5Dwe6/7ZzLRB5kGLBcDc50t9Go1A/etsgsttBdXXvlzy3ZWHZZmnPlY1grfqE2MjQhx/MlTUQzYbe/OYWvhxOcilsmfGm2d+NYUm7fPh38jPgAZoA22UVDIcQAAACV0RVh0ZGF0ZTpjcmVhdGUAMjAxOC0xMS0yMlQxMjoxMjo1NSswMTowMLhvahMAAAAldEVYdGRhdGU6bW9kaWZ5ADIwMTgtMTEtMjJUMTI6MTI6NTUrMDE6MDDJMtKvAAAAIXRFWHRwczpIaVJlc0JvdW5kaW5nQm94ADgweDE1KzI2NSs2MzX/MFShAAAAJ3RFWHRwczpMZXZlbABBZG9iZUZvbnQtMS4wOiBDTUJYMTIgMDAzLjAwMgqB2uIdAAAASXRFWHRwczpTcG90Q29sb3ItMAAvZGV2L3NobS96ZjItY2FjaGUvNzcxMGE2ZTliZjljZGFiNzgyMTgwYzE1MWFlMjJiZDguZHZpIC1vKQnb7QAAAEV0RVh0cHM6U3BvdENvbG9yLTEAL2Rldi9zaG0vemYyLWNhY2hlLzc3MTBhNmU5YmY5Y2RhYjc4MjE4MGMxNTFhZTIyYmQ4LnBzw20c4AAAAABJRU5ErkJggg==" />, and pressure, <img class="latexImg" 
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA8AAAARBAMAAADwJOuSAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADBQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////L2OGaQAAAA50Uk5TADOIRHe7mRHdVSLuZsxpovMrAAAAAWJLR0QAiAUdSAAAAAlwSFlzAAAAeAAAAHgAnfVaYAAAAFdJREFUCNdjYMAEjErMTgIgBhNrIns7iCHJ4cAQAmIsWLeAYRNY0QkGrjdgRhoDzyswo4eBVQFEA8VtC0AMnpY56WAZtgQpiMkcB6BWAI0BA+ZUZQbsAACgQg7m5uhWgAAAACV0RVh0ZGF0ZTpjcmVhdGUAMjAxOC0xMS0yMVQyMzo1NDoxOSswMTowMM59RfQAAAAldEVYdGRhdGU6bW9kaWZ5ADIwMTgtMTEtMjFUMjM6NTQ6MTkrMDE6MDC/IP1IAAAAIHRFWHRwczpIaVJlc0JvdW5kaW5nQm94ADl4MTArMzAxKzYzNtljscgAAAAndEVYdHBzOkxldmVsAEFkb2JlRm9udC0xLjA6IENNTUkxMiAwMDMuMDAyCjEXlrsAAABJdEVYdHBzOlNwb3RDb2xvci0wAC9kZXYvc2htL3pmMi1jYWNoZS84Mzg3OGM5MTE3MTMzODkwMmUwZmUwZmI5N2E4YzQ3YS5kdmkgLW8m8b22AAAARXRFWHRwczpTcG90Q29sb3ItMQAvZGV2L3NobS96ZjItY2FjaGUvODM4NzhjOTExNzEzMzg5MDJlMGZlMGZiOTdhOGM0N2EucHOmynKCAAAAAElFTkSuQmCC" />, will have a total of 16 unknowns per element.</em></p>
<p>COMSOL Multiphysics will use the information about the physics, material properties, boundary conditions, element type, and element shape to assemble a system of equations (a <a href="http://en.wikipedia.org/wiki/Square_matrix" target="_blank">square matrix</a>), which needs to be solved to get the answer to the finite element problem. The size of this matrix is the number of <em>degrees of freedom</em> (DOFs) of the model, where the number of DOFs is a function of the number of elements, the discretization order used in each physics, and the number of variables solved for.</p>
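<p>The per-element numbers in the figure caption above can be reproduced with a small sketch (Lagrange tetrahedra have 4 nodes at first order and 10 at second order; note that the global DOF count is smaller than elements &times; per-element unknowns, because neighboring elements share nodes):</p>

```python
# Nodes per tetrahedral Lagrange element, by discretization order:
# order 1 has only the 4 vertices; order 2 adds the 6 edge midpoints.
TET_NODES = {1: 4, 2: 10}

def unknowns_per_element(order, unknowns_per_node):
    """Unknowns carried by a single tetrahedral element."""
    return TET_NODES[order] * unknowns_per_node

unknowns_per_element(2, 1)  # heat transfer, T only: 10
unknowns_per_element(1, 4)  # laminar flow, (u, v, w, p): 16
```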
<p>These systems of equations are typically <a href="http://en.wikipedia.org/wiki/Sparse_matrix" target="_blank">sparse</a>, which means that most of the terms in the matrix are zero. For most types of finite element models, each node is only connected to the neighboring nodes in the mesh. Note that element shape matters; a mesh composed of tetrahedra will have different matrix sparsity from a mesh composed of hexahedra (brick) elements.</p>
<p>Some models will include non-local couplings between nodes, resulting in a relatively dense system matrix. Radiative heat transfer is a typical problem that will have a dense system matrix. There is radiative heat exchange between any surfaces that can see each other, so each node on the radiating surfaces is connected to every other node. The result of this is clearly seen in the plots I shared at the beginning of this blog post. The thermal model that includes radiation has much higher memory requirements than the thermal model without radiation.</p>
<p>You should see, at this point, that it is not just the number of DOFs, but also the sparsity of the system matrix that will affect the amount of memory needed to solve your COMSOL Multiphysics model. Let&#8217;s now take a look at how your computer manages memory.</p>
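<p>To put the effect of sparsity into numbers, we can compare dense storage with a compressed sparse row (CSR) estimate. The storage sizes assumed here (8-byte values, 4-byte indices) are a common convention for illustration, not what the COMSOL software uses internally:</p>

```python
def dense_bytes(n):
    """A dense n-by-n matrix of 8-byte double-precision values."""
    return 8 * n * n

def csr_bytes(n, avg_nonzeros_per_row):
    """CSR storage: an 8-byte value and a 4-byte column index per
    nonzero, plus a 4-byte row pointer per row (and one extra)."""
    nnz = n * avg_nonzeros_per_row
    return 12 * nnz + 4 * (n + 1)

n = 1_000_000  # a million-DOF model
csr_bytes(n, 50) / 1e9   # ~0.6 GB with ~50 nonzeros per row
dense_bytes(n) / 1e9     # 8000 GB: dense storage would be hopeless
```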
<h3>How Your Operating System Manages Memory</h3>
<p>COMSOL Multiphysics uses the memory management algorithms provided by the operating system (OS) that you are working with. Regardless of which OS you are using, the performance of these algorithms is quite similar on all of the latest OSs that we support. </p>
<p>The OS creates a <a href="http://en.wikipedia.org/wiki/Virtual_memory" target="_blank">Virtual Memory Stack</a>, which the COMSOL software sees as a continuous space of free memory. This continuous block of virtual memory can actually map to different physical locations, so some part of the data may be stored in RAM while other parts are stored on the hard disk. The OS manages where (in RAM or on disk) the data is actually stored, and by default you do not have any control over this. The amount of virtual memory is controlled by the OS, and it is not something that you usually want to change.</p>
<p>Under ideal circumstances, the data that COMSOL Multiphysics needs to store will fit entirely within RAM, but once there is no longer enough space, part of the data will spill over to the hard disk. When this happens, performance of all programs running on the computer will be noticeably degraded. </p>
<p>If too much memory space is requested by the COMSOL software, then the OS will determine that it can no longer manage memory efficiently (even via the hard disk) and will tell COMSOL Multiphysics that there is no more memory available. This is the point at which you will get an out-of-memory message and COMSOL Multiphysics will stop trying to solve the model. </p>
<p>Next, let&#8217;s take a look at what COMSOL Multiphysics is doing when you get this out-of-memory message and what you can do about it.</p>
<h3>When Does COMSOL Use the Most Memory?</h3>
<p>When you set up and solve a finite element problem, there are three memory intensive steps: <em>Meshing</em>, <em>Assembly</em>, and <em>Solving</em>.</p>
<ul>
<li><strong>Meshing:</strong> During the meshing step, the CAD geometry is subdivided into finite elements. The default meshing algorithm applies a free tetrahedral mesh over most of the modeling space. Free tetrahedral meshing of large complex structures will require a lot of memory. In fact, it can sometimes require more memory than actually solving the system of equations, so it is possible to run out of memory even at this step. If you do find that meshing is taking significant time and memory, then you should subdivide (or <em>partition</em>) your geometry into smaller sub-domains. Generally, the smaller the domains, the less memory intensive they are to mesh. By meshing in a sequence of operations, rather than all at once, you can reduce the memory requirements. Within the context of this blog entry, it is also assumed that there are no modeling simplifications (such as exploiting symmetry or using thin layer boundary conditions) that could be leveraged to simplify the model and reduce the mesh size.</li>
<li><strong>Assembly:</strong> During the assembly step, COMSOL Multiphysics forms the system matrix as well as a vector describing the loads. Assembling and storing this matrix requires significant memory &#8212; possibly more than the meshing step &#8212; but always less than the solution step. If you run out of available memory here, you should increase the amount of RAM in your system.</li>
<li><strong>Solving:</strong> During the solution step, COMSOL Multiphysics employs very general and robust algorithms capable of solving <a href="http://www.comsol.com/blogs/solving-multiphysics-problems/">nonlinear problems, which can consist of arbitrarily coupled physics</a>. At the very core of these algorithms, however, the software will always be solving a system of linear equations, and this can be done using either <a href="http://www.comsol.com/blogs/solutions-linear-systems-equations-direct-iterative-solvers/">direct or iterative methods</a>. So let&#8217;s look at these two methods from the point of view of when they should be used and how much memory they need.</li>
</ul>
<h3>Direct Solvers</h3>
<p>Direct solvers are very robust and can handle essentially any problem that will arise during finite element modeling. The sparse matrix direct solvers used by COMSOL Multiphysics are the <a href="http://graal.ens-lyon.fr/MUMPS/" target="_blank">MUMPS</a>, <a href="http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-7E829836-0FEF-46B2-8943-86A022193462.htm" target="_blank">PARDISO</a>, and <a href="http://www.netlib.org/linalg/spooles/spooles.2.2.html" target="_blank">SPOOLES</a> solvers. There is also a dense matrix solver, which should only be used if you know the system matrix is fully populated. </p>
<p>The drawback to all of these solvers is that the memory and time required goes up very rapidly as the number of DOFs and the matrix density increase; the scaling is very close to quadratic with respect to number of DOFs.</p>
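<p>This scaling gives a rough way to extrapolate from a measured solve (a rule of thumb under the stated scaling exponent, not a COMSOL formula):</p>

```python
def extrapolate_memory_gb(measured_gb, measured_dofs, target_dofs, exponent=2.0):
    """Scale a measured memory footprint to a new DOF count, assuming
    memory ~ DOFs**exponent: close to 2 for a direct solver on a 3D
    problem, closer to 1 for an iterative solver on the same problem."""
    return measured_gb * (target_dofs / measured_dofs) ** exponent

# If 1 million DOF took 16 GB with a direct solver, doubling the DOFs
# roughly quadruples the memory; an iterative solver grows far slower.
extrapolate_memory_gb(16, 1e6, 2e6)                # ~64 GB (direct)
extrapolate_memory_gb(16, 1e6, 2e6, exponent=1.0)  # ~32 GB (iterative)
```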
<p>As of writing this, both the MUMPS and PARDISO direct solvers in the COMSOL software come with an <em>out-of-core</em> option. This option overrides the OS&#8217;s memory management and lets COMSOL Multiphysics directly control how much data will be stored in RAM and when and how to start writing data to the hard drive. Although this is superior to the OS&#8217;s memory management algorithm, it will be slower than solving the problem entirely in RAM.</p>
<p>If you have access to a cluster supercomputer, such as the Amazon Web Service™ Amazon Elastic Compute Cloud™, you can also use the MUMPS solver to distribute the problem over many nodes of the cluster. Although this does allow you to solve much larger problems, it is also important to realize that solving on a cluster may be <a href="http://www.comsol.com/blogs/added-value-task-parallelism-batch-sweeps/">slower than solving on a single machine</a>.</p>
<p>Due to their aggressive (approximately quadratic) scaling with problem size, the direct solvers are only used as the default for a few of the 3D physics interfaces (although they are almost always used for 2D models, for which their scaling is much better). </p>
<p>The most common case where the direct solver is used by default is for 3D structural mechanics problems. While this choice has been made for robustness, it is also possible to use an iterative solver for many structural mechanics problems. The method for switching the solver settings is demonstrated in the example model of the <a href="http://www.comsol.com/model/stresses-and-strains-in-a-wrench-8502">stresses in a wrench</a>.</p>
<h3>Iterative Solvers</h3>
<p>Iterative solvers require much less memory than the direct solvers, but they require more customization of the settings to get them to work well.</p>
<p>With all of the predefined physics interfaces where it is reasonable to do so, we have provided default iterative solver suggestions that are selected for robustness. These settings are handled automatically and do not require any user interaction, so as long as you are using the built-in physics interfaces, you do not need to worry about these settings.</p>
<p>The memory and time needed by an iterative solver will be much less than those of a direct solver for the same problem, so when they can be used, they should be. Their scaling as the problem size increases is much closer to linear, as opposed to the quadratic scaling typical of the direct solvers.</p>
<p>At the time of this writing, the iterative solvers should be used on a computer that has enough RAM to hold the problem, so if you get an out-of-memory message when using an iterative solver, you should upgrade the amount of RAM in your computer.</p>
<p>It is also possible to use an iterative solver on a cluster computer using <a href="http://en.wikipedia.org/wiki/Domain_decomposition_methods" target="_blank">Domain Decomposition methods</a>. This class of iterative methods has recently been introduced into the software, so stay tuned for more details about this in the future.</p>
<h3>Predicting Memory Requirements</h3>
<p>Although the data shown above do provide an upper and lower bound of memory requirements, these bounds are quite wide. We&#8217;ve seen that a small change to a model, such as introducing a non-local coupling like radiative heat transfer, can significantly change memory requirements. So let&#8217;s introduce a general recipe for how you can predict memory requirements.</p>
<p>Start with a representative model that contains the combination of physics you want to solve and approximates the true geometric complexity. Begin with as coarse a mesh as possible, and then gradually increase the mesh refinement. Alternatively, start with a smaller representative model and gradually increase the size. </p>
<p>Solve each model and monitor memory requirements. Observe the default solver being used. If it is a direct solver, use the out-of-core option in your tests, or consider if an iterative solver can be used instead. Fit a second-order polynomial to the data, and use this curve to predict the memory required for the larger problem size that you eventually want to solve. This is the most reliable way to predict the memory requirements of large, complex, 3D multiphysics models.</p>
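The fitting step above can be sketched in a few lines of Python outside of COMSOL Multiphysics. The measurements here are hypothetical placeholders for the DOF counts and peak memory values you would record from your own mesh refinement series:

```python
import numpy as np

# Hypothetical measurements (not from a real model): number of degrees of
# freedom vs. peak memory in GB, recorded from a series of refined meshes.
dofs = np.array([1.0e5, 2.0e5, 4.0e5, 8.0e5])
mem_gb = np.array([2.1, 4.8, 11.5, 28.0])

# Fit a second-order polynomial, matching the near-quadratic scaling of
# the direct solvers, and extrapolate to the target problem size.
coeffs = np.polyfit(dofs, mem_gb, deg=2)
target_dofs = 2.0e6
predicted_gb = np.polyval(coeffs, target_dofs)
print(f"Predicted peak memory at {target_dofs:.0e} DOFs: {predicted_gb:.0f} GB")
```

As noted below, such an extrapolation is only a guide: a structural change to the model, such as adding a non-local coupling, invalidates the fitted curve and calls for a fresh set of measurements.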
<p>As we have now seen, the memory needed will depend upon (at least) the geometry, mesh, element types, combination of physics being solved, couplings between the physics, and the scope of any non-local model couplings. At this point, it should also be made clear that it is not generally possible to predict the memory requirements in all cases. You may need to repeat this procedure several times for variations of your model.</p>
<p>It is also fair to say that setting up and solving large models in the most efficient way possible is something that can require some deep expertise of not just the solver settings, but also of finite element modeling in general. If you do have a particular modeling concern, please <a href="http://www.comsol.com/support">contact your COMSOL Support Team</a> for guidance.</p>
<h3>Summary</h3>
<p>You should now have an understanding of why the memory requirements for a COMSOL Multiphysics model can vary dramatically. You should also be able to predict with confidence the memory requirements of your larger models and decide what kind of hardware is appropriate for your modeling challenges.</p>
<p><em>Amazon Web Services and Amazon Elastic Compute Cloud are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.</em></p>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/much-memory-needed-solve-large-comsol-models/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Understanding Parallel Computing</title>
		<link>https://www.comsol.no/blogs/understanding-parallel-computing/</link>
		<comments>https://www.comsol.no/blogs/understanding-parallel-computing/#comments</comments>
		<pubDate>Tue, 07 Oct 2014 16:15:18 +0000</pubDate>
		<dc:creator><![CDATA[Walter Frei]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[COMSOL Now]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Parallel Computing]]></category>
		<category><![CDATA[Technical Content]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=38099</guid>
		<description><![CDATA[People are always asking how the performance of COMSOL Multiphysics® simulation software will improve on a parallel system, especially now that large multi-core desktop computers are relatively inexpensive and it&#8217;s easy to rent time on cloud services like the Amazon Elastic Compute Cloud™. It turns out, though, that it&#8217;s not always possible to get faster performance just by throwing more hardware at the problem. To understand why, let’s take a conceptual look at computers and the algorithms COMSOL® software uses. [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>People are always asking how the performance of COMSOL Multiphysics® simulation software will improve on a parallel system, especially now that large multi-core desktop computers are relatively inexpensive and it&#8217;s easy to rent time on cloud services like the Amazon Elastic Compute Cloud™. It turns out, though, that it&#8217;s not always possible to get faster performance just by throwing more hardware at the problem. To understand why, let’s take a conceptual look at computers and the algorithms COMSOL® software uses.</p>
<p><span id="more-38099"></span></p>
<h3>A Very Simple Model of a Typical Desktop Computer</h3>
<p>Let&#8217;s start by considering a very simplified model of a computer, composed of just three parts: random access memory (RAM), which is used to store information; a processing unit, which performs mathematical operations on the information; and a <a href="http://en.wikipedia.org/wiki/Bus_%28computing%29" target="_blank">bus</a>, which transfers the data between the two.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/10/Schematic-parts-of-computer.png" title="" alt="Design showing parallel computing in a typical desktop computer." width="600" height="368" class="alignnone size-full wp-image-38109" /><br />
<em>Schematic of the key parts of a computer.</em></p>
<p>For the purposes of this blog post, let&#8217;s imagine that all of the data about the problem is sitting in the RAM and that this data gets moved over to the processing unit via the bus. The memory bus itself can be composed of <a href="http://en.wikipedia.org/wiki/Multi-channel_memory_architecture" target="_blank">several channels operating in parallel</a>, effectively increasing throughput. The processing unit can be composed of several chips, each of which can have several computational cores that are able to work on data simultaneously, after it has been loaded from the memory via the bus. Let&#8217;s use this as our mental model of the computer sitting on our desktop.</p>
<h3>Shall We Play a Game?</h3>
<p>Many problems in computer science can be thought of as games that we played as children. Let&#8217;s look at three of the classics.</p>
<h4>Finding Walter (Waldo)</h4>
<p>First, let&#8217;s try to find a face in the crowd, à la <a href="http://en.wikipedia.org/wiki/Where%27s_Wally%3F" target="_blank"><em>Where&#8217;s Waldo?</em></a></p>
<p><img src="https://cdn.comsol.com/wordpress/2014/10/COMSOL-Conference-photo.jpg" title="" alt="A photo of attendees at the COMSOL Conference." width="700" height="321" class="alignnone size-full wp-image-38111" /><br />
<em>A photo from the COMSOL Conference. Can you find me?</em></p>
<p>Suppose we have a photo with hundreds of people &#8212; what&#8217;s the fastest way of finding one person? </p>
<p>You could scan through the entire image by yourself, checking faces one by one to see if they match the person you are searching for. But, this can be quite slow. You can also invite your friends over to help. In this case, you would first subdivide the picture into smaller pieces. Each person can then independently work on one piece at a time.</p>
<p>In the language of computer science, we would say that this game is <em>completely parallel</em>.</p>
<p>Having two people working will halve the solution time, four people will cut the solution time in four, and so on. But, there is a limit &#8212; you can only have as many friends helping you as there are faces in the crowd. Beyond that point, inviting more people to help won&#8217;t speed up the process, and it may even slow things down.</p>
<h4>Solving a Puzzle</h4>
<p>Next, let&#8217;s try to solve a jigsaw puzzle.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/10/Jigsaw-puzzle.png" title="" alt="An image of a jigsaw puzzle." width="700" height="439" class="alignnone size-full wp-image-38113" /><br />
<em>Can you put together the image?</em></p>
<p>This is a bit more complicated &#8212; you can have multiple people working at once, but they cannot work independently. Each person will take a few dozen puzzle pieces for themselves from the main pile and try to fit them together, both with their own pieces and with the pieces that their friends are putting together. They will pass pieces back and forth, and they will be in constant communication with each other.</p>
<p>A computer scientist would call this a <em>partially parallel</em> game.</p>
<p>Although adding more people will decrease the solution time, it will not be a simple mathematical relationship. Suppose you have a 1,000-piece puzzle, and 10 people with 100 pieces each. They will spend relatively more time working on their own pieces and less time talking. On the other hand, if you have 100 people with 10 pieces each, there will be a lot more talking and moving pieces around. And what will happen when you have 1,000 people working on a puzzle with 1,000 pieces? Try that one at home!</p>
<p>You can probably see that for a puzzle of a certain size, there is some maximum number of people that can be working on it. This number will be much lower than the number of puzzle pieces. Adding more people won&#8217;t speed things up noticeably.</p>
<h4>Stacking Blocks</h4>
<p>Finally, let&#8217;s try to stack some blocks on top of each other to form a tower, and then raise the height of the tower by taking the blocks from the lower levels and adding them to the top without causing the structure to topple over.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/10/JENGA-tower.jpg" title="" alt="A JENGA tower." width="300" height="447" class="alignnone size-full wp-image-38121" /><br />
<em>How high can you stack the tower? (JENGA® tower standing on one tile. &#8220;Jenga distorted&#8221; by Guma89 &#8212; Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via <a href="http://commons.wikimedia.org/wiki/File:Jenga_distorted.jpg#mediaviewer/File:Jenga_distorted.jpg" target="_blank">Wikimedia Commons</a>.)</em></p>
<p>In this game, only one person can play at a time, and we can say that the game is <em>completely serial</em>.</p>
<p>Playing with more people won&#8217;t finish the game any faster, and if you invite too many people to play, some of them will never get a chance to do anything. In fact, it&#8217;s probably fastest (albeit not very sociable) to play this game by yourself.</p>
<h3>How Does This Relate to COMSOL Multiphysics®?</h3>
<p>You can probably already see the relationships between playing these games and using COMSOL Multiphysics. How about we start classifying the problems you solve in COMSOL Multiphysics into these categories:</p>
<ul>
<li><strong>Partially Parallel</strong> &#8212; All problems in COMSOL Multiphysics have a partially parallel component. <a href="http://www.comsol.com/blogs/solutions-linear-systems-equations-direct-iterative-solvers/">Solving a system of linear equations</a> is a partially parallelizable problem. Thus, no matter what class of problem you are solving, some (usually significant) fraction of the solution time is spent solving a partially parallel problem. For problems where you are solving a stationary, frequency-domain, or eigenfrequency problem only, almost all of the time is spent solving the system of linear equations.</li>
<li><strong>Completely Parallel</strong> &#8212; A completely parallel problem arises when you use the <em>Parametric Sweep</em> functionality, such as when sweeping over a range of geometric dimensions. Each step in the parameter sweep still requires solving a partially parallelizable problem, but the parameter cases are independent of one another, with no information exchanged between them.</li>
<li><strong>Completely Serial</strong> &#8212; A serial problem arises when subsequent parts of the solution depend on previously computed values. Time-dependent models, models using continuation methods, and optimization problems fall into this category. All such models still need to solve a system of linear equations, but they do so sequentially. The possible speedup is mainly governed by the speedup possible when solving the partially parallel system of linear equations.</li>
</ul>
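The diminishing returns in the partially parallel case are captured by Amdahl's law: if a fraction <em>p</em> of the work parallelizes, then <em>n</em> cores give a speedup of at most 1/((1 − <em>p</em>) + <em>p</em>/<em>n</em>). A minimal sketch, where the parallel fractions are illustrative values for the three categories above, not numbers measured from the COMSOL software:

```python
def amdahl_speedup(p, n):
    """Ideal speedup for a job whose fraction p is parallelizable, run on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Illustrative parallel fractions (the real fraction is problem dependent):
for label, p in [("completely parallel", 1.0),
                 ("partially parallel", 0.9),
                 ("completely serial", 0.0)]:
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 16)]
    print(label, speedups)
```

Note how even a 90% parallel job gains only a 6.4x speedup on 16 cores, and a fully serial job gains nothing, mirroring the block-stacking game.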
<h4>How Does This Relate to My Desktop Hardware?</h4>
<p>When solving, COMSOL Multiphysics is spending most of its time solving a partially parallel problem. So, what actually happens in the hardware?</p>
<p>The COMSOL software starts with the information about the loads, boundary conditions, material properties, and finite element mesh and generates data that is used during the solution process. Many gigabytes of memory are needed to store the system matrices, generate intermediate information, and compute the solution. Ideally, all of this information should be stored in the RAM.</p>
<blockquote><p>It is worth noting that COMSOL Multiphysics does offer hard-disk based solvers as well, but these will be slower than when data is accessed directly from the RAM. Their advantage is that they allow you to solve larger problems.</p></blockquote>
<p>Of course, this data has to be operated on by the processors, so it turns out that the bottleneck in the solution on a desktop computer is actually the bandwidth of the memory bus &#8212; much more so than the processor speed, or even the number of processor cores. </p>
<h4>What About Clusters?</h4>
<p><a href="http://en.wikipedia.org/wiki/Computer_cluster" target="_blank">Cluster computers</a> are really nothing more than ordinary computers, or <em>nodes</em>, connected with an additional communication layer.</p>
<p>Let&#8217;s assume that we are working with a cluster where each node is equivalent in performance to that of our single computer. Data passes between the nodes via the interconnect hardware. The interconnect speed is dependent upon not just the type of hardware, but also the physical configuration. In practice, it is usually slower than the memory bus speed on any individual node. This introduces an additional consideration.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/10/Model-of-cluster.png" title="" alt="A model of cluster with four computer nodes." width="600" height="356" class="alignnone size-full wp-image-38117" /><br />
<em>A simple model of a cluster with four compute nodes.</em></p>
<p>We have already seen that the partially parallelizable case is the most important to understand, so we&#8217;ll focus on that.</p>
<p>On a cluster, we would say that we are solving this problem in a <em>distributed parallel</em> sense. In the context of our game of putting together a puzzle, we can think of this as grouping several of our friends in different rooms of our house, and giving them each a stack of pieces. You would now additionally need to send pieces and information back and forth between different rooms.</p>
<p>COMSOL Multiphysics adjusts the solution algorithm to maximize the amount of work done locally and minimize the amount of data that is passed back and forth. These distributed parallel solvers, which are available when you use the Floating Network License, adjust the solution algorithm to efficiently split the problem up onto the different nodes of the cluster. Again, we can see that there is a limit. If there are too many nodes involved, we will just be communicating data back and forth all the time. So, for each particular problem, there is some number of nodes beyond which <a href="http://www.comsol.com/blogs/added-value-task-parallelism-batch-sweeps/">solution speed will not improve</a>.</p>
<p>Now, if you have a completely parallel problem, such as a parametric sweep, where each step in the sweep can be solved entirely within the RAM available on one node, then a cluster is an almost perfect way to speed up your modeling. You could use up to as many nodes as there are parameter values that you want to sweep over.</p>
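Outside of the COMSOL software, the same completely parallel pattern can be sketched with Python's standard multiprocessing module. Here, <code>solve_case</code> is a hypothetical stand-in for solving one sweep step, not a COMSOL API call:

```python
from multiprocessing import Pool

def solve_case(param):
    """Hypothetical stand-in for solving one independent sweep step."""
    # No data is exchanged between cases; each worker only needs its parameter.
    return param, param ** 2

if __name__ == "__main__":
    params = [1.0, 2.0, 3.0, 4.0]
    # Up to one worker per parameter value, mirroring one cluster node per case.
    with Pool(processes=len(params)) as pool:
        results = pool.map(solve_case, params)
    print(results)
```

Because the workers never communicate, adding workers up to the number of parameter values gives nearly ideal speedup, just like the face-in-the-crowd game.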
<h3>Summary of Parallel Computing and COMSOL Multiphysics®</h3>
<p>You should now have an understanding of the different types of problems that COMSOL Multiphysics solves, in terms of parallelization and how these relate to performance.</p>
<p>When working on a single computer, the performance bottleneck is the bus speed rather than the clock speed and number of processors. For desktop machines, we also publish some more specific hardware purchasing guidelines in our <a href="http://www.comsol.com/support/knowledgebase/866/">Knowledge Base</a>. For cluster computers, performance can be much more variable, depending on problem size, cluster architecture, and the type of problem being solved. If you want more technical details about clusters, please see this <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">series of blogs on hybrid parallel computing</a>. </p>
<p><em>Amazon Web Services, the “Powered by Amazon Web Services” logo, and Amazon Elastic Compute Cloud are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.</em></p>
<p><em>JENGA® is a registered trademark owned by Pokonobe Associates.</em></p>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/understanding-parallel-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modeling Convective Cooling of Electrical Devices</title>
		<link>https://www.comsol.no/blogs/modeling-convective-cooling-electrical-devices/</link>
		<comments>https://www.comsol.no/blogs/modeling-convective-cooling-electrical-devices/#comments</comments>
		<pubDate>Mon, 23 Jun 2014 04:01:38 +0000</pubDate>
		<dc:creator><![CDATA[Fabian Scheuren]]></dc:creator>
				<category><![CDATA[AC/DC & Electromagnetics]]></category>
		<category><![CDATA[CAD Import & LiveLink for CAD Products]]></category>
		<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[Computational Fluid Dynamics (CFD)]]></category>
		<category><![CDATA[Electrical]]></category>
		<category><![CDATA[Fluid]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Heat Transfer & Phase Change]]></category>
		<category><![CDATA[Interfacing]]></category>
		<category><![CDATA[AC/DC Module]]></category>
		<category><![CDATA[CFD Module]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Technical Content]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=33343</guid>
		<description><![CDATA[One of the main issues with high-power electrical devices is thermal management. Together with BLOCK Transformatoren-Elektronik GmbH, we created a model using COMSOL Multiphysics simulation software that encompasses all of the important details when modeling heating of high-power electrical devices. To do so, we had to utilize high performance computing (HPC) with hybrid modeling. Here, we will discuss how to approach this real-life task with the COMSOL software. Modeling Thermal Management: Test Set-Up Our test set-up consists of a copper [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>One of the main issues with high-power electrical devices is thermal management. Together with <a href="http://block.eu/en_US/home/" target="_blank">BLOCK Transformatoren-Elektronik GmbH</a>, we created a model using COMSOL Multiphysics simulation software that encompasses all of the important details when modeling heating of high-power electrical devices. To do so, we had to utilize high performance computing (HPC) with <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">hybrid modeling</a>. Here, we will discuss how to approach this real-life task with the COMSOL software.</p>
<p><span id="more-33343"></span></p>
<h3>Modeling Thermal Management: Test Set-Up</h3>
<p>Our test set-up consists of a copper coil wound around a laminated iron core with some plastic and aluminum parts for stability. A conventional computer fan is placed one meter away from it. Both the electromagnetic losses that occur and the turbulent, non-isothermal fluid flow around the device have to be calculated. The iron core has an air gap, which is intentionally included in order to analyze the influence it has on the currents inside the coil and aluminum parts.</p>
<div class="row">
<div class="spanWP-side-by-side">
<img src="https://cdn.comsol.com/wordpress/2014/06/Inductor-device.png" title="Inductor device" alt="Inductor device" /></p>
<p><em>The inductor device.</em></p>
</div>
<div class="spanWP-side-by-side">
<img src="https://cdn.comsol.com/wordpress/2014/06/Test-set-up.png" alt="Test set-up" title="" width="400" height="248" class="alignnone size-full wp-image-33349" /></p>
<p><em>Schematic of the test set-up.</em></p>
</div>
</div>
<h3>First Things First</h3>
<p>Engineers &#8212; especially those working within project deadlines &#8212; are always looking for the right balance between computational (and modeling) efforts and accuracy. Therefore, it is a good idea to start by thinking of a suitable simplification, since the aspect ratio of the model geometry is quite challenging.</p>
<p>The distance between the fan and device is roughly one meter, while the interior gaps between the copper windings are about 0.1 millimeters, resulting in an aspect ratio of 10,000. In order to keep the processing time as low as possible, we chose a submodeling approach. A first model with a simplified transformer geometry is used to calculate the large-scale flow field around the device. Due to symmetry, only half of the geometry is modeled. The results of this model are exported and used as an inlet condition for the following step.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/06/velocity-field-streamline-plot.png" alt="A streamline plot generated with COMSOL Multiphysics of the velocity field" title="" width="650" height="433" class="alignnone size-full wp-image-33367" /><br />
<em>Streamline plot of the velocity field. This field was used as an inlet boundary condition in the detailed model (at the position of the slice plot).</em></p>
<h3>Detailed Geometry</h3>
<p>The geometry of the detailed electrical device is built in SolidWorks® software and imported into COMSOL Multiphysics® via the <a href="http://www.comsol.com/cad-import-module">CAD Import Module</a>. Only a small part is used for the non-isothermal flow calculation in the detailed submodel (about 400 mm by 900 mm). The electromagnetic part needs to be solved for an even smaller domain (200 mm by 200 mm).</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/06/Detailed-geometry-of-the-inductor-device.png" title="" alt="Screenshot of a detailed geometry of the inductor device" width="650" height="433" class="alignnone size-full wp-image-33369" /></p>
<h3>Modeling the Laminated Iron Core</h3>
<p>The iron core is laminated in order to reduce eddy currents. We&#8217;ll use the <a href="http://www.comsol.com/paper/homogenization-approaches-for-laminated-magnetic-cores-using-the-example-of-tran-15452">same approach as described by TU Dresden &amp; ABB</a>. The material is homogenized and defined with an orthotropic electrical conductivity. This allows us to keep a single domain and a coarser mesh rather than resolving the lamination geometrically with all small plates.</p>
<h3>Electromagnetic Losses</h3>
<p>Due to the alternating current at 500 Hz, inductive effects in the coil (skin and proximity effect) have to be resolved. Additionally, eddy currents in the aluminum plates and iron core will heat up the device.</p>
<div class="row">
<div class="spanWP-side-by-side">
<a href="https://cdn.comsol.com/wordpress/2014/06/Eddy-currents-surface-plot.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2014/06/Eddy-currents-surface-plot.png" alt="Surface plot generated by COMSOL Multiphysics of the eddy currents in the aluminum plates" title="Eddy currents in the aluminum plates (Click to enlarge)" width="770" height="513" class="alignnone size-full wp-image-33375" /></a></p>
<p><em>Surface plot of the eddy currents in the aluminum plates. The air gap in the iron core is highlighted in red. Most currents are induced close to this gap.</em></p>
</div>
<div class="spanWP-side-by-side">
<a href="https://cdn.comsol.com/wordpress/2014/06/Slice-plot-of-current-density.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2014/06/Slice-plot-of-current-density.png" alt="Current density inside the copper coil shown in a slice plot" title="Current density in the copper coil (Click to enlarge)" width="770" height="513" class="alignnone size-full wp-image-33377" /></a></p>
<p><em>Slice plot of the current density inside the copper coil. The air gap within the iron core is shown in red.</em></p>
</div>
</div>
<p>Due to hysteresis, there are also some magnetization losses. These are quite small in comparison to the eddy current losses and are not explicitly solved for. Instead of solving the hysteresis time-dependently, the magnetization losses can be described by an interpolation function of the magnetic flux density, Q<sub>mag</sub> = f(B). The table below summarizes the computed electromagnetic losses for each part.</p>
<table class="table-blog" cellpadding="4px">
<tr>
<th>
Part
</th>
<th>
<p style="text-align:right !important;padding:0;margin:0">Electromagnetic losses</p>
</th>
</tr>
<tr>
<td>
Copper coil
</td>
<td align="right">
<p style="text-align:right !important;padding:0;margin:0">37.2 W</p>
</td>
</tr>
<tr>
<td>
Aluminum, eddy currents
</td>
<td align="right">
<p style="text-align:right !important;padding:0;margin:0">36.2 W</p>
</td>
</tr>
<tr>
<td>
Laminated core, eddy currents
</td>
<td align="right">
<p style="text-align:right !important;padding:0;margin:0">0.02 W</p>
</td>
</tr>
<tr>
<td>
Laminated core, magnetic losses
</td>
<td align="right">
<p style="text-align:right !important;padding:0;margin:0">0.004 W</p>
</td>
</tr>
</table>
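The interpolation approach mentioned above can be sketched with NumPy. The sample points below are hypothetical placeholders, not BLOCK's measured loss curve:

```python
import numpy as np

# Hypothetical samples of a loss curve: flux density B (T) vs. loss (W).
B_samples = np.array([0.0, 0.5, 1.0, 1.5])
Q_samples = np.array([0.0, 0.8, 3.2, 7.5])

def q_mag(B):
    """Magnetization loss Q_mag = f(B) by linear interpolation of the samples."""
    return float(np.interp(B, B_samples, Q_samples))

print(q_mag(0.75))  # a flux density between the 0.5 T and 1.0 T samples
```

In COMSOL Multiphysics itself, the same idea corresponds to defining an interpolation function over the loss curve and evaluating it at the computed flux density, rather than resolving the hysteresis loop in a time-dependent study.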
<h3>Velocity Field and Temperature Distribution</h3>
<p>The device reaches a maximum temperature of 125°C on the backside of the coil.</p>
<div class="row">
<div class="spanWP-side-by-side">
<a href="https://cdn.comsol.com/wordpress/2014/06/Velocity-field-and-temperature-distribution.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2014/06/Velocity-field-and-temperature-distribution.png" alt="Surface plot of the temperature distribution and streamline plot of the velocity field" title="Velocity field and temperature distribution (Click to enlarge)" width="770" height="513" class="alignnone size-full wp-image-33391" /></a></p>
<p><em>Streamline plot of the velocity field and surface plot of the temperature distribution.</em></p>
</div>
<div class="spanWP-side-by-side">
<a href="https://cdn.comsol.com/wordpress/2014/06/Close-view-of-inductor-device-velocity-field.png" target="_blank"><img src="https://cdn.comsol.com/wordpress/2014/06/Close-view-of-inductor-device-velocity-field.png" alt="Close view of inductor device velocity field" title="Velocity field (Click to enlarge)" width="770" height="513" class="alignnone size-full wp-image-33393" /></a></p>
<p><em>Alternative view of the velocity field.</em></p>
</div>
</div>
<h3>Best Choice for Multiphysics High Performance Computing</h3>
<p>Our task here was to find the best approach for computing the thermal design of transformers. <a href="http://www.block.eu/en_US/home/" target="_blank">BLOCK Transformatoren</a> decided that COMSOL Multiphysics was the most suitable tool for their application after comparing the handling and results of several simulation tools.</p>
<p>In the end, this model involved simultaneously solving for up to 8 million degrees of freedom (DOFs), using a robust combination of direct and iterative solvers. Memory (RAM) usage peaked at 89 GB.</p>
<p>In order to be able to solve highly complex models, they chose the <a href="http://www.comsol.com/rtgplus">Ready-to-Go+ (RTG+) package</a> with a benchmarked cluster for optimal performance. With everything being set for advanced simulations at BLOCK, we can expect their products to be pushed even closer to the limit in the future.</p>
<p><em>SolidWorks is a registered trademark of Dassault Systèmes SolidWorks Corp.</em></p>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/modeling-convective-cooling-electrical-devices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a Beowulf Cluster for Faster Multiphysics Simulations</title>
		<link>https://www.comsol.no/blogs/building-beowulf-cluster-faster-multiphysics-simulations/</link>
		<comments>https://www.comsol.no/blogs/building-beowulf-cluster-faster-multiphysics-simulations/#comments</comments>
		<pubDate>Fri, 11 Apr 2014 18:07:43 +0000</pubDate>
		<dc:creator><![CDATA[Pär Persson Mattsson]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Clusters]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=29707</guid>
		<description><![CDATA[Many of us need up-to-date software and hardware in order to work efficiently. Therefore, we need to follow the pace of technological development. But, what should we do with the outdated hardware? It feels wasteful to send the old hardware to its grave or to just put it in a corner. Another, more productive, solution is to use the old hardware to build a Beowulf cluster and use it to speed up computations. About Beowulf Clusters In 1994, a group [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Many of us need up-to-date software and hardware in order to work efficiently. Therefore, we need to follow the pace of technological development. But, what should we do with the outdated hardware? It feels wasteful to send the old hardware to its grave or to just put it in a corner. Another, more productive, solution is to use the old hardware to build a Beowulf cluster and use it to speed up computations.</p>
<p><span id="more-29707"></span></p>
<h3>About Beowulf Clusters</h3>
<p>In 1994, a group of researchers at NASA built a small cluster consisting of normal workstations. They called this cluster, or parallel workstation, <a href="http://www.phy.duke.edu/~rgb/brahma/Resources/beowulf/papers/ICPP95/icpp95.html" target="_blank">Beowulf</a>. Since then, the term <em>Beowulf cluster</em> has been used to describe clusters that are built up from commodity hardware (for example, normal workstations), using open source software. The definition is quite loose regarding the computer hardware and network interconnects. The most important point is that the workstations are no longer used as workstations, but are used as nodes in a High Performance Computing (<a href="http://www.comsol.com/multiphysics/high-performance-computing">HPC</a>) cluster, instead.</p>
<p>Beowulf clusters can be used to compute all kinds of problems, but as we have mentioned earlier in the <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">Hybrid Modeling series</a>, a problem has to be parallelizable in order to take advantage of the added computing power of a cluster. Accordingly, Beowulf clusters have been used to compute particle simulations, genetics problems, and &#8212; probably most interesting for us COMSOL Multiphysics users &#8212; parametric sweeps and large matrix multiplications.</p>
<p>But, why would we want to build a cluster using non-HPC hardware? One reason might be “because we already have the hardware”. For example, after an office-wide workstation or laptop upgrade has taken place, we might not know what to do with the old, outdated computers, but we still don’t want to throw them away. An alternative could be to use the concentrated computational power of idle workstations after office hours or on the weekend.</p>
<h3>What Do We Need to Set Up the Cluster?</h3>
<p>First of all, we need the hardware that we are going to use. For this blog post, we used our old faithful laptops as nodes, but we could just as well have used workstations or old servers. Either way, when setting up a Beowulf cluster, we should try to choose the nodes in such a way that they have similar hardware. Our laptops are no longer &#8220;performance monsters&#8221;; they are equipped with an Intel® T2400 @1.83GHz processor and 2 GB of RAM each. They are also all supplied with Ethernet network cards, so we used these to connect them together. To do this, we also need a switch. In our case, we used an old HP® 1800 switch, but, even here, we could use normal commodity hardware (such as a home office five-port switch), depending on how many nodes we are going to use.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/04/Beowulf-cluster.jpg" alt="A Beowulf cluster consisting of an old switch and six old laptops" title="" width="770" height="867" class="alignnone size-full wp-image-29717" /><br />
<em>Our Beowulf cluster, built from six old laptops and an old switch.</em></p>
<p>Since a Beowulf cluster (according to the above definition) needs an open source operating system, we installed a Linux® distribution on the laptops. Although there are <a href="http://en.wikipedia.org/wiki/Beowulf_cluster" target="_blank">specially designed operating systems for Beowulf cluster computing</a>, it is possible to use a standard server operating system (e.g. Debian®).</p>
<p>When the set-up of the hardware, network, operating system, and a shared file system is done, the only step left is to install the software &#8212; COMSOL Multiphysics®. No further installation of a Message Passing Interface (MPI) or scheduler is necessary, since the COMSOL software contains all it needs in order to compute on a cluster.</p>
<h3>Setting Up the Beowulf Cluster and Installing COMSOL Multiphysics</h3>
<p>For our set-up, we chose Debian® 6 (the stable release), which is one of the distributions supported by COMSOL Multiphysics at the time of writing this blog post. Next, we set up the systems. We kept the installation as slim as possible by installing only the basic system plus an SSH server, which gives us access to the cluster over the network. A desktop environment was not needed in our case; it would only have reduced the performance of our Beowulf system.</p>
<p>After a successful installation of the operating system, we needed to set up the network and, of course, the shared file system for the compute nodes. For the shared file system, we installed the NFS server on the first node, which operates as the head node. Then, we exported the locations for the shared file system from there.</p>
<p>Here is one example of a set-up:</p>
<pre>
/srv/data/comsolapp     For the COMSOL application
/srv/data/comsoljobs    For the COMSOL cluster job storage for the users
</pre>
<p>On the compute nodes, we mounted these shares automatically.</p>
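<p>For reference, a minimal set of NFS entries might look like the lines below. The subnet, hostnames, and mount options are placeholders for illustration; adapt them to your own network (here we assume the head node is &#8220;cn01&#8221;):</p>
<pre>
# /etc/exports on the head node
/srv/data/comsolapp   192.168.0.0/24(ro,no_subtree_check)
/srv/data/comsoljobs  192.168.0.0/24(rw,sync,no_subtree_check)

# /etc/fstab on each compute node
cn01:/srv/data/comsolapp   /srv/data/comsolapp   nfs  defaults  0  0
cn01:/srv/data/comsoljobs  /srv/data/comsoljobs  nfs  defaults  0  0
</pre>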
<p>Since no desktop environment is installed on our systems, we need to use the automated installer (see page 77 of the <a href="http://static.comsol.com/doc/COMSOL_InstallationGuide.pdf" target="_blank">COMSOL Multiphysics Installation Guide</a>). We copied the &#8220;setupconfig.ini&#8221; file from the installation media and edited it for our needs.</p>
<p>The most important step here is to set the “showgui” option to “0” instead of “1”. Another important aspect is the destination path. Here, we chose the network share because it is much easier to maintain and upgrade to new versions of COMSOL Multiphysics.</p>
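<p>As an illustration, the relevant part of an edited &#8220;setupconfig.ini&#8221; could look like the lines below. The &#8220;showgui&#8221; option is the one discussed above; the name of the destination-path key is shown here only as an example, so check the installation guide for the exact keys and spelling:</p>
<pre>
showgui = 0
installdir = /srv/data/comsolapp/comsol
</pre>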
<p>To start the installation, just add the parameter &#8220;-s /path/to/the/setupconfig.ini&#8221;, for example:</p>
<pre>
cd /media/cdrom/
./setup -s /path/to/the/setupconfig.ini
</pre>
<p>Now, the text-based installer starts and the output is sent to the terminal.</p>
<p>To let COMSOL Multiphysics know what compute nodes can be used, we need to write a simple &#8220;mpd.hosts&#8221; file containing the list of hostnames:</p>
<pre>
<strong><span style="color:#000000">mpd.hosts</span></strong>
cn01
cn02
...
cn06
</pre>
<p>Finally, we start the COMSOL server on the first node, telling it to use all six compute nodes:</p>
<pre>
comsol server -f mpd.hosts -nn 6 -multi on
</pre>
<p>Now you can start COMSOL Multiphysics on your desktop and connect to the server.</p>
<h3>The Results Are in: Old Hardware, Increased Productivity</h3>
<p>To test our &#8220;brand new&#8221; cluster, we chose a modified version of the <a href="http://www.comsol.com/model/tuning-fork-computing-the-eigenfrequency-and-eigenmode-8499">Tuning Fork model</a>, available in the Model Gallery. For our test run, we decided to increase the number of parameters computed in the parametric sweep to 48. We then computed the model using the COMSOL Multiphysics <em>batch</em> command, letting it use one to six laptops. You can see the measured total simulations per day in the graph below.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/04/Beowulf-cluster-productivity.png" alt="Bar graph showing an increase in productivity when using Beowulf clusters with COMSOL Multiphysics" title="" width="600" height="379" class="alignnone size-full wp-image-29723" /><br />
<em>The productivity increase (Jobs/Day), taking into account the total time from opening the file to saving the result for the different number of laptops used.</em></p>
<p>As we can see, if we use six laptops, we reach almost 140 jobs per day in comparison to just under 40 jobs per day when using just one laptop. All in all, this gives us a speedup of almost 3.5. That&#8217;s impressive considering that we are using old laptops.</p>
<p>We have to note, though, that the measured time is not the solution time, but the total time for running the simulations. This includes opening, computing, and saving the model. Opening and saving is serial in nature, and due to Amdahl&#8217;s law (mentioned in our earlier blog post about <a href="http://www.comsol.com/blogs/added-value-task-parallelism-batch-sweeps/">batch sweeps</a>), this means that we do not see the true speedup of the solver. If we were to connect to our Beowulf cluster with the COMSOL Client/Server functionality and then compare the computation times, we would obtain an <em>even larger</em> productivity increase compared to the numbers above.</p>
<p>In conclusion, we can indeed use old hardware together with COMSOL Multiphysics to increase our productivity and speed up computations (especially parametric ones).</p>
<p><em>Debian is a registered trademark of Software in the Public Interest, Inc. in the United States.<br />
HP is a registered trademark of Hewlett-Packard Development Company, L.P.<br />
Intel is a trademark of Intel Corporation in the U.S. and/or other countries.<br />
Linux is a registered trademark of Linus Torvalds.<br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/building-beowulf-cluster-faster-multiphysics-simulations/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Automate Your Modeling Tasks with the COMSOL API for use with Java®</title>
		<link>https://www.comsol.no/blogs/automate-modeling-tasks-comsol-api-use-java/</link>
		<comments>https://www.comsol.no/blogs/automate-modeling-tasks-comsol-api-use-java/#comments</comments>
		<pubDate>Thu, 27 Mar 2014 18:06:27 +0000</pubDate>
		<dc:creator><![CDATA[Thorsten Koch]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Technical Content]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=29095</guid>
		<description><![CDATA[To keep up with today&#8217;s fast-paced development cycles, R&#38;D engineers and scientists need efficient tools to provide answers quickly and free them from routine tasks. COMSOL Multiphysics® has built-in features like parametric sweeps to increase simulation productivity. In addition to graphical modeling, COMSOL offers an Application Programming Interface (API) that you can use to automate any repetitive modeling step. Here&#8217;s how to get started with the COMSOL API for use with Java®. Intro to the COMSOL API The COMSOL API [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>To keep up with today&#8217;s fast-paced development cycles, R&amp;D engineers and scientists need efficient tools to provide answers quickly and free them from routine tasks. COMSOL Multiphysics® has built-in features like parametric sweeps to increase simulation productivity. In addition to graphical modeling, COMSOL offers an Application Programming Interface (API) that you can use to automate any repetitive modeling step. Here&#8217;s how to get started with the COMSOL API for use with Java®.</p>
<p><span id="more-29095"></span></p>
<h3>Intro to the COMSOL API</h3>
<p>The COMSOL API is an interface to all algorithms and data structures that define a COMSOL model. When you set up a model with the COMSOL Desktop®, you interact with the COMSOL API behind the scenes. LiveLink™ <em>for</em> MATLAB®, the topic of a <a href="http://www.comsol.com/blogs/solutions-starting-point-values-livelink-matlab/">recent blog post</a>, also operates using the COMSOL API but in an interactive fashion rather than compiled. Today, we focus on the COMSOL API for use with Java®.</p>
<h3>COMSOL Desktop as Code Generator</h3>
<p>You don&#8217;t have to be an expert Java® programmer to get started with the COMSOL API. You can dive right in from the tool you are already working with &#8212; the COMSOL Desktop. Each action you perform in this graphical modeling environment gets recorded in the model history. You can export this history as Java code by saving your model and choosing &#8220;Model File for Java®&#8221; as the file type. This method is useful to create building blocks for your program.</p>
<h4>Hello World!</h4>
<p>Let&#8217;s start with a simple example to get familiar with the process: the COMSOL API version of a &#8220;Hello, World!&#8221; program.</p>
<p>In the COMSOL Desktop, create a model that only contains a 3D geometry. Add a block to the geometry with dimensions of 0.1 m &#215; 0.2 m &#215; 0.5 m and save a &#8220;Model File for Java®&#8221; with the name &#8220;HelloWorld.java&#8221;.</p>
<p>If you open the output in a text editor, it will look like this:</p>
<pre>import com.comsol.model.*; 
import com.comsol.model.util.*;

public class HelloWorld {

   public static void main(String[] args) {
      run(); 
   }
   public static Model run() {
      Model model = ModelUtil.create("Model"); 
      model.modelNode().create("comp1"); 
      model.geom().create("geom1", 3); 
      model.geom("geom1").feature().create("blk1", "Block"); 
      model.geom("geom1").feature("blk1").set("size", new String[]{"0.1", "0.2", "0.5"});
      model.geom("geom1").run("fin");
      return model; 
   } 
}
</pre>
<p>The first two lines are <code>import</code> statements that point to the COMSOL API. This is followed by the definition of the class <code>HelloWorld</code>. The class name is the same as the file name, as is the rule in Java® programming.</p>
<p>The class contains a <code>main()</code> method that in turn calls a static <code>run()</code> method to build and return a <code>Model</code> object. For small programming projects, you can directly modify this method. In other words, you don&#8217;t necessarily have to use the more advanced object-oriented features of Java.</p>
<h4>The Compact History Function</h4>
<p>The COMSOL Desktop offers another feature that is useful when generating code: the &#8220;Compact History&#8221; function in the File menu. When you set up a model, you usually add some features that you later remove or move around. All of these modifications get recorded in the model history, which then holds many unnecessary steps.</p>
<p>The &#8220;Compact History&#8221; function cleans up the history, removes duplicate and deleted entries, and reorders everything according to the order in the Model Builder. If you use the function before exporting, you will get clean code.</p>
<p>Why not automatically clean up the history before saving a Java® file then? Well, it is sometimes useful not to compact the history.</p>
<p>Suppose you are developing code that uses the COMSOL API and it turns out you need some parts that you can quickly set up with the COMSOL Desktop. You start modifying the model you have been working with and save it as a Java file. Fortunately, you have not compacted the model history before saving and can therefore easily find all the changes right at the end of the exported code. That is much easier than trying to spot changes otherwise scattered throughout the entire model code.</p>
<h3>Compiling and Running COMSOL API Code</h3>
<p>Java® is a compiled language, and before you can do useful things with your code, you have to compile it into a class file. For this you need a Java compiler, such as the free <a href="http://www.oracle.com/java/" target="_blank">Java® Development Kit (JDK)</a>.</p>
<p>Once you have installed a JDK, you can use the</p>
<p><code>comsolcompile</code> (<code>comsol compile</code> on Linux® or Mac®)</p>
<p>command, which is part of a COMSOL software installation, to compile your code. The command automatically sets up the path to the COMSOL API for the Java® compiler.</p>
<p>To compile the example above, use the command</p>
<p><code>comsolcompile -jdkroot PATH_TO_JDK HelloWorld.java</code></p>
<p>Here, <code>PATH_TO_JDK</code> is the directory where you have installed the JDK. Note that the COMSOL API adheres to Java® 1.5 and the above method works with JDK 1.5 or 1.6.</p>
<p>You can also use an Integrated Development Environment (IDE), such as <a href="http://www.eclipse.org/" target="_blank">Eclipse™</a>. Set up your projects with Java 1.5 compatibility and add all JAR files in the &#8220;plugins&#8221; directory of the COMSOL Multiphysics® installation to your build path.</p>
<p>Once you have compiled your code into a class file, you can open it in the COMSOL Desktop via the File &gt; Open menu. If you do that with the example above, you will see your &#8220;Hello World&#8221; model with one block in a 3D geometry. The same result can be achieved with a regular COMSOL mph model file. Next, we want to change that and do something beyond the capabilities of a regular model file.</p>
<p>But, before we go ahead and modify the example, let&#8217;s examine the structure and meaning of the code in the <code>run()</code> method above.</p>
<h3>The COMSOL API in a Nutshell</h3>
<p>The &#8220;Hello World&#8221; example can already teach us some of the most important aspects of working with the COMSOL API. Let&#8217;s go through the <code>run()</code> method to understand what&#8217;s happening.</p>
<p>The first line,</p>
<p><code>Model model = ModelUtil.create("Model");</code></p>
<p>creates a new model with <code>ModelUtil.create()</code>, a static method that takes a name (the <code>String</code> <code>"Model"</code>) as an argument. <code>ModelUtil</code> is the COMSOL API&#8217;s little helper, a collection of utility methods. You can use it, for example, to load models or create new ones from scratch.</p>
<p><code>ModelUtil.create()</code></p>
<p>returns a <code>Model</code> object. This object holds all the settings of a COMSOL model, i.e. it contains the entire model tree that you usually see in the Model Builder of the COMSOL Desktop.</p>
<p>The line </p>
<p><code>model.modelNode().create("comp1");</code></p>
<p>creates a new component node in the model tree. A model component has an associated geometry that is added with the line </p>
<p><code>model.geom().create("geom1", 3);</code></p>
<p>The second argument (the numeral 3) makes the component geometry 3D.</p>
<p>Note that the first argument for both <code>create()</code> methods is a String, the so-called <em>tag</em>. Tags are used everywhere in the <code > model</code> object to uniquely identify the features that are contained in the model. This is necessary because a model can contain many features of the same type. You can, for instance, have more than one component in your model, each with its own geometry. The component geometry might contain many primitive shapes of the same type. The physics settings might then use many boundary conditions of the same type, and so on. Thus, giving each item a unique tag is a way of housekeeping.</p>
<p>You can make the tags of any COMSOL Multiphysics model visible in the COMSOL Desktop by enabling the &#8220;Show Name and Tag&#8221; or &#8220;Show Type and Tag&#8221; setting in the &#8220;Model Builder Node Label&#8221; menu of the Home tab.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/Tags-displayed-in-the-Model-Builder.png" alt="Tags displayed in the Model Builder" title="" width="550" height="262" class="alignnone size-full wp-image-29103" /><br />
<em>Tags are displayed in the Model Builder of the COMSOL Desktop when the &#8220;Show Name and Tag&#8221; or &#8220;Show Type and Tag&#8221; option is selected from the Model Builder Node Label settings.</em></p>
<p>The next line of code,</p>
<p><code>model.geom("geom1").feature().create("blk1", "Block");</code></p>
<p>creates a block in the first geometry <code>"geom1"</code>.</p>
<p>You can recognize the hierarchy of the model tree in this line. The first part, <code>model.geom("geom1")</code>, associates the command with geometry <code>"geom1"</code>, and the second part, <code>feature().create("blk1", "Block")</code>, adds a new feature to it. The feature is a block identified with the tag <code>"blk1"</code>. Thinking in terms of the COMSOL Desktop, you can picture the first part as a right-click on <code>"geom1"</code> and the second part as choosing Block from the geometry menu that pops up.</p>
<p>After the block is created, its properties are modified. This happens in the line</p>
<p><code>model.geom("geom1").feature("blk1").set("size", new String[]{"0.1", "0.2", "0.5"});</code></p>
<p>Again, the first part identifies the first block <code>"blk1"</code> of <code>"geom1"</code>, and the second part modifies the size property with a <code>set()</code> method.</p>
<p>The first argument of the latter specifies the property that you want to change, &#8220;size&#8221; in this case. The second argument assigns the new values &#8212; in this particular case that would be the width, height, and length properties of the block.</p>
<blockquote><p>Note that although these properties are set to real numbers, the argument is passed as an array of strings. Why is that? Remember that in the COMSOL software, you can type in a mathematical expression instead of specific numbers anywhere you want. This is also true for the COMSOL API, and for that reason such properties are passed as strings.</p></blockquote>
<p>Finally, before the new model is returned, the line </p>
<p><code>model.geom("geom1").run("fin");</code></p>
<p>runs the finalization of the geometry, just like when you press the &#8220;Build All&#8221; button in the COMSOL Desktop.</p>
<p>This is the COMSOL API in a nutshell and all you need to know to get started. There are, of course, many more details, but you can always pick them up as you go along, by using the COMSOL Desktop as a code reference together with the documentation of the COMSOL API for use with Java®.</p>
<h3>Example: Building a Spiral Inductor Geometry</h3>
<p>To illustrate an application of the COMSOL API, we can look at the <a href="http://www.comsol.com/model/integrated-square-shaped-spiral-inductor-129">model of a spiral inductor</a> from our Model Gallery.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/COMSOL-API-example-model.png" alt="COMSOL API example model" title="" width="1231" height="588" class="alignnone size-full wp-image-29119" /></p>
<p>The aim of the model is to compute the self-inductance of the given coil design. If you look at the geometry closely, you can see that it is built up from an arrangement of blocks. In the model, the dimensions of the spiral are fixed to particular values. When designing such a device, it is useful to look at different configurations, such as inductors with different cross sections and numbers of windings. To this end, you can set up the geometry parametrically in the COMSOL Desktop.</p>
<p>Parameterizing the cross section is easy to do there. The number of windings, on the other hand, cannot be parameterized in the COMSOL Desktop, because the length of a piece of wire changes every third turn. As you saw above, however, you can use the COMSOL API to specify all properties of a block in your Java® program. You can use that to build a spiral conductor geometry automatically.</p>
<p>From the &#8220;Hello World&#8221; example above, you already know how to build one block. To build a spiral conductor, you create many blocks with different size properties and orientations and arrange them in the desired configuration. In the code, you need a few variables to keep track of:</p>
<table cellpadding="7px">
<tr>
<th>Variable</th>
<th>Code</th>
</tr>
<tr>
<td>Cross section</td>
<td><code>wire_width</code>, <code>wire_height</code></td>
</tr>
<tr>
<td>Length of a piece of wire</td>
<td><code>piece_length</code></td>
</tr>
<tr>
<td>Position</td>
<td><code>pos_x</code>, <code>pos_y</code></td>
</tr>
<tr>
<td>Orientation</td>
<td><code>rotation_angle</code></td>
</tr>
<tr>
<td>Wire spacing</td>
<td><code>inner_spacing</code>, <code>loop_spacing</code></td>
</tr>
<tr>
<td>Number of turns</td>
<td><code>n_loop</code></td>
</tr>
</table>
<p>You then generate the blocks that make up the spiral in a loop.</p>
<p>As you now know, you need a unique tag to identify each block. You could create one yourself by concatenating a block counter to a base string, such as <code>"blk"</code>. The better alternative, however, is to use the <code>uniquetag()</code> method provided by the COMSOL API. This method does the same thing, but keeps track of counters internally and makes sure that a tag is not used twice.</p>
<p>In the code snippet below, the tag for each block is generated with </p>
<p><code>model.geom("geom1").feature().uniquetag("blk")</code></p>
<p>With the new unique tag, you now create a block and set its properties. Aside from the <code>"size"</code>, you also have to change the <code>"pos"</code> and <code>"rot"</code> properties for the position and orientation, respectively. After that, you update the variables for the next iteration.</p>
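<p>The bookkeeping in this loop can be sketched without the COMSOL API itself. The snippet below is a minimal, COMSOL-independent illustration: the variable names follow the table above, the &#8220;grow every second segment&#8221; rule is a simplification of the actual spiral layout, and the COMSOL API calls are shown only as comments in the style of the code discussed earlier.</p>

```java
// Hypothetical sketch of the spiral-building loop (not the blog's actual code).
public class SpiralSketch {

    // Compute the length of each straight wire piece of a square spiral.
    // For illustration, the piece length grows by loopSpacing every second
    // segment; the exact rule depends on the spiral layout you choose.
    static double[] segmentLengths(double innerLength, double loopSpacing, int nSegments) {
        double[] lengths = new double[nSegments];
        double pieceLength = innerLength;
        for (int i = 0; i < nSegments; i++) {
            lengths[i] = pieceLength;
            int rotationAngle = (90 * i) % 360; // each piece turns 90 degrees
            // Inside the real loop, each piece becomes a block, e.g.:
            //   String tag = model.geom("geom1").feature().uniquetag("blk");
            //   model.geom("geom1").feature().create(tag, "Block");
            //   model.geom("geom1").feature(tag).set("rot", String.valueOf(rotationAngle));
            if (i % 2 == 1) {
                pieceLength += loopSpacing; // the spiral widens as it turns
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        for (double l : segmentLengths(1.0, 0.5, 6)) {
            System.out.println(l);
        }
    }
}
```

<p>Running the sketch with an inner length of 1.0 and a spacing of 0.5 yields the piece lengths 1.0, 1.0, 1.5, 1.5, 2.0, 2.0, i.e. two pieces per &#8220;ring&#8221; before the spiral steps outward.</p>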
<div class="wistia_responsive_padding" style="padding:75.0% 0 0 0;position:relative;">
<div class="wistia_responsive_wrapper" style="height:100%;left:0;position:absolute;top:0;width:100%;"><iframe src="https://fast.wistia.net/embed/iframe/yzmkspbo6v?videoFoam=true" title="Wistia video player" allowtransparency="true" frameborder="0" scrolling="no" class="wistia_embed" name="wistia_embed" allowfullscreen mozallowfullscreen webkitallowfullscreen oallowfullscreen msallowfullscreen width="100%" height="100%"></iframe></div>
</div>
<p><script src="https://fast.wistia.net/assets/external/E-v1.js" async></script></p>
<p><em>A spiral inductor geometry is generated automatically with the COMSOL API for use with Java®.</em></p>
<h3>What&#8217;s Next?</h3>
<p>You can use the COMSOL API for many more things than creating geometries, of course. In fact, you can automate any modeling task you normally perform with the COMSOL Desktop. In the example of the spiral inductor, you could, for instance, continue to compute the new result of the updated geometry. You could also run a parametric sweep in your code to compute a whole range of spiral conductors automatically. Then you could proceed to process the results of the computation by creating plots that are exported to image files, extracting the inductance, and exporting all values to a file.</p>
<p>We have merely scratched the surface of the capabilities of the COMSOL API. Beyond programming the tasks you can do manually in the COMSOL Desktop, the COMSOL API gives you access to and control over data structures like the finite element mesh, finite element matrices, and the solution data sets.</p>
<p>Besides writing class files that can be opened by the COMSOL Desktop, you can write programs that connect to a COMSOL Server process or even standalone programs that integrate COMSOL technology. Therefore, if you routinely deal with a particular simulation task, the COMSOL API is a powerful and flexible tool for you to automate that task.</p>
<p><em>Eclipse is a trademark of Eclipse Foundation, Inc. Linux is a registered trademark of Linus Torvalds. Mac is a trademark of Apple Inc., registered in the U.S. and other countries. MATLAB is a registered trademark of The MathWorks, Inc. Oracle and Java are registered trademarks of Oracle and/or its affiliates. </em></p>

<script charset="ISO-8859-1" src="//fast.wistia.com/static/concat/iframe-api-v1.js"></script>]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/automate-modeling-tasks-comsol-api-use-java/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Added Value of Task Parallelism in Batch Sweeps</title>
		<link>https://www.comsol.no/blogs/added-value-task-parallelism-batch-sweeps/</link>
		<comments>https://www.comsol.no/blogs/added-value-task-parallelism-batch-sweeps/#comments</comments>
		<pubDate>Thu, 20 Mar 2014 15:29:42 +0000</pubDate>
		<dc:creator><![CDATA[Pär Persson Mattsson]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[COMSOL Now]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Hybrid Modeling series]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=28801</guid>
		<description><![CDATA[One thing we haven’t talked much about so far in the Hybrid Modeling blog series is what speedup we can expect when adding more resources to our computations. Today, we consider some theoretical investigations that explain the limitations in parallel computing. We will also show you how to use the COMSOL software&#8217;s Batch Sweeps option, which is a built-in, embarrassingly parallel functionality for improving performance when you reach these limits. Amdahl&#8217;s and Gustafson-Barsis&#8217; laws We have mentioned before how speedup [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>One thing we haven’t talked much about so far in the <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">Hybrid Modeling blog series</a> is what speedup we can expect when adding more resources to our computations. Today, we consider some theoretical investigations that explain the limitations in parallel computing. We will also show you how to use the COMSOL software&#8217;s <em>Batch Sweeps</em> option, which is a built-in, <a href="http://www.comsol.com/blogs/intro-distributed-memory-computing/#embarrassingly-parallel"> embarrassingly parallel functionality</a> for improving performance when you reach these limits.</p>
<p><span id="more-28801"></span></p>
<h3>Amdahl&#8217;s and Gustafson-Barsis&#8217; laws</h3>
<p>We have mentioned before how speedup through the addition of compute units is dependent on the algorithm (in this post we will use the term <em>processes</em>, but added compute units can also be <em>threads</em>). A <em>strictly serial algorithm</em>, like computing the elements of the <a href="http://www.comsol.com/blogs/intro-shared-memory-computing/#fibonacci-series">Fibonacci series</a>, does not benefit at all from an added process, while a <em>parallel algorithm</em>, like vector addition, can make use of as many processors as we have elements in the vector. Most algorithms in the real world are somewhere in between these two.</p>
<p>To analyze the possible maximum speedup of an algorithm, we will assume that it consists of a fraction of perfectly parallelizable code and a fraction of strictly serial code. Let us call the fraction of parallelized code <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABIAAAARCAMAAADnhAzLAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TADMRZruqIoiZVUR33e7MGhF5ZAAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAABgSURBVBjTjc/dCoAgDAXg/elsWr7/25aGwxKic3HAT5QN4HeQWrOE6KS90qYyhNul8VVpUPbS+R2i+6B+QB4U6OboX4EVtgykQtNcUmvdFR45DF/jx7JslMNCogshwWdOA0oBjeIai48AAAAldEVYdGRhdGU6Y3JlYXRlADIwMTgtMTEtMjFUMjM6MTQ6NDUrMDE6MDDx5FJmAAAAJXRFWHRkYXRlOm1vZGlmeQAyMDE4LTExLTIxVDIzOjE0OjQ1KzAxOjAwgLnq2gAAACF0RVh0cHM6SGlSZXNCb3VuZGluZ0JveAAxMXgxMCszMDArNjM2iFBhCgAAACd0RVh0cHM6TGV2ZWwAQWRvYmVGb250LTEuMDogQ01NSTEyIDAwMy4wMDIKMReWuwAAAEl0RVh0cHM6U3BvdENvbG9yLTAAL2Rldi9zaG0vemYyLWNhY2hlLzg3NTY3ZTM3YTFmZTY5OWZlMWM1ZDNhNzkzMjVkYTZmLmR2aSAtb5wf+2sAAABFdEVYdHBzOlNwb3RDb2xvci0xAC9kZXYvc2htL3pmMi1jYWNoZS84NzU2N2UzN2ExZmU2OTlmZTFjNWQzYTc5MzI1ZGE2Zi5wc6ovqmoAAAAASUVORK5CYII=" />, where <img class="latexImg" 
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABIAAAARCAMAAADnhAzLAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TADMRZruqIoiZVUR33e7MGhF5ZAAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAABgSURBVBjTjc/dCoAgDAXg/elsWr7/25aGwxKic3HAT5QN4HeQWrOE6KS90qYyhNul8VVpUPbS+R2i+6B+QB4U6OboX4EVtgykQtNcUmvdFR45DF/jx7JslMNCogshwWdOA0oBjeIai48AAAAldEVYdGRhdGU6Y3JlYXRlADIwMTgtMTEtMjFUMjM6MTQ6NDUrMDE6MDDx5FJmAAAAJXRFWHRkYXRlOm1vZGlmeQAyMDE4LTExLTIxVDIzOjE0OjQ1KzAxOjAwgLnq2gAAACF0RVh0cHM6SGlSZXNCb3VuZGluZ0JveAAxMXgxMCszMDArNjM2iFBhCgAAACd0RVh0cHM6TGV2ZWwAQWRvYmVGb250LTEuMDogQ01NSTEyIDAwMy4wMDIKMReWuwAAAEl0RVh0cHM6U3BvdENvbG9yLTAAL2Rldi9zaG0vemYyLWNhY2hlLzg3NTY3ZTM3YTFmZTY5OWZlMWM1ZDNhNzkzMjVkYTZmLmR2aSAtb5wf+2sAAABFdEVYdHBzOlNwb3RDb2xvci0xAC9kZXYvc2htL3pmMi1jYWNoZS84NzU2N2UzN2ExZmU2OTlmZTFjNWQzYTc5MzI1ZGE2Zi5wc6ovqmoAAAAASUVORK5CYII=" /> is a number between (and including) 0 and 1. 
This automatically means that our algorithm has a fraction of serial code that is equal to <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEEAAAAXCAMAAABj/tKdAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TAGYzdxFEIpmqiLvMVe7dxWS54QAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAADoSURBVDjLzZTdGoMgCIYFRUQr7/9uV5k/9TxrtJ2MA4v0eyMCjPkbA3yqsO7kki1Xryc4Pnmh6CU+CIJkcJjWBcWmJwQTe8Qu1WePCFP/DpCvCGZud2FSEnDPP8RQ4o9UNxIpCXuswtORcbZ1IxsdAbZXE2zaApQ7gqNutdpsW0RB8NKtpmkTIDbMB8K7NOxahAthUWYyuELxRyEwtB3l36QF1gZyEo+miq0d7ZEYnmeWu+bCmHOeW+y9onwyWkvUx8BQ1aWzNOaXweEhXB+UBDscxLG7DYCOEAcVn4eUcsphV12m3C/2Al+HBZbgg8gEAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDE4LTExLTIyVDAxOjMxOjIzKzAxOjAwQuTZSgAAACV0RVh0ZGF0ZTptb2RpZnkAMjAxOC0xMS0yMlQwMTozMToyMyswMTowMDO5YfYAAAAhdEVYdHBzOkhpUmVzQm91bmRpbmdCb3gAMzl4MTQrMjg2KzYzNnaIj8oAAAAmdEVYdHBzOkxldmVsAEFkb2JlRm9udC0xLjA6IENNUjEyIDAwMy4wMDIK7bPeSwAAAEl0RVh0cHM6U3BvdENvbG9yLTAAL2Rldi9zaG0vemYyLWNhY2hlL2FjNGE1M2E1MjgyZTk5ZThjNzkwMTZhM2U0ZTE1ODBhLmR2aSAtb5S8Z2sAAABFdEVYdHBzOlNwb3RDb2xvci0xAC9kZXYvc2htL3pmMi1jYWNoZS9hYzRhNTNhNTI4MmU5OWU4Yzc5MDE2YTNlNGUxNTgwYS5wc8jjVu0AAAAASUVORK5CYII=" />.</p>
<p>Considering the computation time, <em>T(n)</em>, for <em>n</em> active processes, and starting with the case <em>n</em> = 1, we can use the representation <em>T(1) = (1 &#8722; p)T(1) + pT(1)</em>. When running <em>n</em> processes, the serial fraction of the code is not affected, but the perfectly parallelized code will be computed <em>n</em> times faster. Therefore, the computation time for <em>n</em> processes is <em>T(n) = (1 &#8722; p)T(1) + (p/n)T(1)</em>, and the speedup is <em>S(n) = T(1)/T(n) = 1/((1 &#8722; p) + p/n)</em>.</p>
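<p>To make the speedup expression concrete, here is a minimal Python sketch (not from the original post; the function name <code>amdahl_speedup</code> is ours, for illustration) that evaluates Amdahl&#8217;s model:</p>

```python
def amdahl_speedup(p, n):
    """Speedup on n processes for a code with parallel fraction p.

    Implements S(n) = 1 / ((1 - p) + p / n) from Amdahl's law.
    """
    return 1.0 / ((1.0 - p) + p / n)

# A perfectly parallel code (p = 1) scales linearly with n:
print(amdahl_speedup(1.0, 8))   # 8.0
# A purely serial code (p = 0) never speeds up:
print(amdahl_speedup(0.0, 8))   # 1.0
```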
<h4>Amdahl&#8217;s Law</h4>
<p>This expression is at the heart of <a href="http://en.wikipedia.org/wiki/Amdahl%27s_Law" target="_blank">Amdahl&#8217;s law</a>. Plotting <em>S(n)</em> for different values of <em>p</em> and <em>n</em>, we now see something interesting in the graph below.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/Speedup-for-increasing-the-number-of-processes.png" alt="Speedup for increasing the number of processes" title="" width="770" height="400" class="alignnone size-full wp-image-28819" /><br />
<em>The speedup for increasing the number of processes for different fractions of parallelizable code.</em></p>
<p>For 100% parallelized code, the sky is the limit. Yet, we find that for <em>p &lt; 1</em>, the asymptotic limit, or theoretical maximal speedup, is <em>S<sub>max</sub> = lim<sub>n&#8594;&#8734;</sub> S(n) = 1/(1 &#8722; p)</em>.</p>
<p>For a 95% parallelized code, we find that <em>S<sub>max</sub> = 1/(1 &#8722; 0.95) = 20</em> &#8212; a maximum speedup of twenty times, even if we have an infinite number of processes. Furthermore, we have, for example, <em>S(16) &#8776; 9.1</em>, <em>S(32) &#8776; 12.5</em>, and <em>S(64) &#8776; 15.4</em>. The theoretical maximum speedup decreases quickly when decreasing the fraction of parallelized code.</p>
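<p>These diminishing returns are easy to reproduce. The short Python sketch below (illustrative, not from the original post; the sample process counts are our choices) evaluates Amdahl&#8217;s speedup for a 95% parallelized code and its asymptotic limit:</p>

```python
# Amdahl's law for a 95% parallelized code (p = 0.95).
speedup = lambda n: 1.0 / ((1.0 - 0.95) + 0.95 / n)

s_max = 1.0 / (1.0 - 0.95)    # asymptotic limit as n grows without bound
print(round(s_max))            # 20
print(round(speedup(16), 1))   # 9.1
print(round(speedup(64), 1))   # 15.4
```

<p>Even with 64 processes, the speedup stays well below the 20&#215; ceiling.</p>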
<p>But don&#8217;t give up and go home just yet!</p>
<h4>Gustafson-Barsis&#8217; Law</h4>
<p>There is one thing that Amdahl&#8217;s law does <em>not</em> consider: when we buy a faster and larger computer to be able to run more processes, we usually don&#8217;t want to compute yesterday&#8217;s small models faster. Instead, we want to compute new, larger (and cooler) models. That&#8217;s what <a href="http://en.wikipedia.org/wiki/Gustafson%27s_law" target="_blank">Gustafson-Barsis&#8217; law</a> is all about. It is based on the assumption that the size of the problem we want to compute increases linearly with the number of available processes.</p>
<p>Amdahl&#8217;s law assumes that the size of the problem is fixed: newly added processes work on parts of the problem that were originally handled by fewer processes. As more and more processes are added, they are no longer fully utilized, because the share of the problem each of them can work on eventually reaches a lower limit. If we instead assume that the size of the problem increases with the number of added processes, then every process remains utilized to an assumed level, and the speedup of the performed computations remains unbounded.</p>
<p>The equation describing this phenomenon is <em>S(n) = (1 &#8722; p) + p&#183;n</em>, which gives us a far more optimistic result for what is called <em>scaled speedup</em> (which is like productivity), as shown in the graph below:</p>
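<p>Under the same illustrative assumptions as before (the function name is ours, not from the original post), the scaled speedup can be sketched in Python as:</p>

```python
def gustafson_scaled_speedup(p, n):
    """Gustafson-Barsis' law: S(n) = (1 - p) + p * n.

    Assumes the problem size grows linearly with the number of processes n.
    """
    return (1.0 - p) + p * n

# With a 95% parallel fraction, the scaled speedup keeps growing with n:
print(round(gustafson_scaled_speedup(0.95, 16), 2))   # 15.25
# ...and, unlike Amdahl's law, it has no upper bound:
print(gustafson_scaled_speedup(0.95, 1024) > 900)     # True
```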
<p><img src="https://cdn.comsol.com/wordpress/2014/03/Graph-depicting-how-the-size-of-the-job-increases-with-the-number-of-available-processes.png" alt="Graph depicting how the size of the job increases with the number of available processes" title="" width="770" height="400" class="alignnone size-full wp-image-28821" /><br />
<em>When taking into account that the size of the job normally increases with the number of available processes, our predictions are more optimistic.</em></p>
<h3>The Cost of Communication</h3>
<p>Gustafson-Barsis&#8217; law implies that we are only restricted in the size of the problem we can compute by the resources we have for adding processes. Yet, there are other factors that affect speedup. Something we&#8217;ve tried to stress so far in <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">this blog series</a> is that communication is expensive. But we haven’t talked about how expensive it is yet, so let&#8217;s look at some examples.</p>
<p>Let&#8217;s consider an overhead that is dominated by the communication and synchronization required in parallel processing, and model this as time added to the computation time. This means that the amount of communication increases when we increase the number of processes, and that this increase will be modeled by a function <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJ4AAAAXCAMAAADN5AZZAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TAGYzdxFEiJkiVaruzLvdwwLpHgAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAAIDSURBVEjH7VbbdoQgDOQSEuT6/39bFRJ1l64eu2fbh+ZFJWGYDAmi1L99wLR5GrLwMWiHFpF4UONsTvnlgc1vN8ds0/IB9BNSBlFfhIawkjDRM9vaWFGEHiCOBdOHBGtO99lBhJCvQUNqeSjKLF9tgqc+l5w41gyg0c9e3TVdlPOXoCExVVf7IMXmqbZHcCyV9qxrItP97cXAi59BY+Gac3VqLyUcRNQidWmzps473qaXO+QptK8SwSXne0gXUYVJYlsZtAKZF3HqpYHttf5gGiuhuwRNVbZZV9ef1i3WRVTpWB+ectebbHcYt9l2Tphk5iZ8PjdmT4WL0GXbIWZKZV1HdxFVlYC4Nr8siCgvOxPtopOKf5SPFz2FrlkmxXwoAy69DYPlfKI3NEq7Dx/3sciLnkJv9HhvuQzoMUWuyYv0DoR83K/PnXEOXZgecAK8H3zqCYbI+UTPDjbX1e8bJ05XoUUjSlyV7QQCSai4h1A26uf5oTWYlGn0YFB7XgryFBpiP0W4Z7YTiBMKx+Nwt0ejrhQryzSgwc1hEjLn0CYhgM4sBOZa0M9/vVI5A7sKjWF1HPboFTtlwjRfNUYe6YxL0EbblzL4NB4//amZb2CzkL4NvTcaFznduhJ4r+I28R3QPoxGzb0LVSSzk+Yt0FoPBunedRn1YeJboH/3Mv9H7QuyDBHu6SZJ/AAAACV0RVh0ZGF0ZTpjcmVhdGUAMjAxOC0xMS0yMlQwMTozMToyNiswMTowMBDc9u0AAAAldEVYdGRhdGU6bW9kaWZ5ADIwMTgtMTEtMjJUMDE6MzE6MjYrMDE6MDBhgU5RAAAAIXRFWHRwczpIaVJlc0JvdW5kaW5nQm94ADk1eDE0KzI1OCs2MzZKcwnSAAAAJ3RFWHRwczpMZXZlbABBZG9iZUZvbnQtMS4wOiBDTU1JMTIgMDAzLjAwMgoxF5a7AAAASXRFWHRwczpTcG90Q29sb3ItMAAvZGV2L3NobS96ZjItY2FjaGUvMGQ5NWVhMzQ1YjUyZGFkNGY1Zjg2ZjFlODdiYWU0ZmMuZHZpIC1vAnGIpQAAAEV0RVh0cHM6U3BvdENvbG9yLTEAL2Rldi9zaG0vemYyLWNhY2hlLzBkOTVlYTM0NWI1MmRhZDRmNWY4NmYxZTg3YmFlNGZjLnBz3EKdigAAAABJRU5ErkJggg==" />, where <img class="latexImg" 
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA0AAAAMBAMAAABLmSrqAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADBQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////L2OGaQAAAA50Uk5TABF3iFUzzJnuRGYi3arnfTKrAAAAAWJLR0QAiAUdSAAAAAlwSFlzAAAAeAAAAHgAnfVaYAAAADdJREFUCNdjYMAAQsYOIIo1lbEUREtM4HoNosug0s0QivMlhGYF0tIgxhEGxkIQHetpBJEJBREAXRgIWprGqKgAAAAldEVYdGRhdGU6Y3JlYXRlADIwMTgtMTEtMjJUMDA6MDI6MjgrMDE6MDDvoIk3AAAAJXRFWHRkYXRlOm1vZGlmeQAyMDE4LTExLTIyVDAwOjAyOjI4KzAxOjAwnv0xiwAAAB90RVh0cHM6SGlSZXNCb3VuZGluZ0JveAA4eDcrMzAyKzYzOTdZhioAAAAndEVYdHBzOkxldmVsAEFkb2JlRm9udC0xLjA6IENNTUkxMiAwMDMuMDAyCjEXlrsAAABJdEVYdHBzOlNwb3RDb2xvci0wAC9kZXYvc2htL3pmMi1jYWNoZS80YThhMDhmMDlkMzdiNzM3OTU2NDkwMzg0MDhiNWYzMy5kdmkgLW+aeJNWAAAARXRFWHRwczpTcG90Q29sb3ItMQAvZGV2L3NobS96ZjItY2FjaGUvNGE4YTA4ZjA5ZDM3YjczNzk1NjQ5MDM4NDA4YjVmMzMucHNMqBoCAAAAAElFTkSuQmCC" /> is a constant and <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC8AAAAXCAMAAACs0OZeAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TAGYzdxGIRFWZ7qq7Iszdv9Ww8gAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAAD3SURBVDjLvZPbFoQgCEXxgqKS/P/fTmmaWjOrp+ElxC0cSAH+b0rfQsbeQhpRFceZ8xhW87tvacVtsDEVJ7aQkyMBx21P7nAVkcHx4ZDrISkBKwebeObxTGu3HqJcv3LU9YuihE11D+WKeCkNhUkNCmEREn2LsdT+i/49oRsPaDknti3ymVLdITMVaOXkkh/KONvPwGlCmG58jjARE9/avXgW84MPfuWV6O88S/sdufVL0wD3tRoWvm/2eS7yIY3lertgalmMknG6AlO51HvjDR5tuA/MEK5U5B55uohAerhMHJ9wPUwHFY3PR6kHnix8tXfv8Z19AFNBBeDUm6STAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDE4LTExLTIyVDAxOjMxOjI2KzAxOjAwENz27QAAACV0RVh0ZGF0ZTptb2RpZnkAMjAxOC0xMS0yMlQwMTozMToyNiswMTowMGGBTlEAAAAhdEVYdHBzOkhpUmVzQm91bmRpbmdCb3gAMjh4MTQrMjkxKzYzNknHFZUAAAAndEVYdHBzOkxldmVsAEFkb2JlRm9udC0xLjA6IENNTUkxMiAwMDMuMDAyCjEXlrsAAABJdEVYdHBzOlNwb3RDb2xvci0wAC9kZXYvc2htL3pmMi1jYWNoZS81YzMxNmEwYTk2MWY5ZDlmYjc0ODZjN2I5MDZkN2FlMC5kdmkgL
W/SwrbMAAAARXRFWHRwczpTcG90Q29sb3ItMQAvZGV2L3NobS96ZjItY2FjaGUvNWMzMTZhMGE5NjFmOWQ5ZmI3NDg2YzdiOTA2ZDdhZTAucHPSQxX4AAAAAElFTkSuQmCC" /> is some function. Hence, we can compute the speedup through: <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAVkAAAAXCAMAAACWCcxBAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TAGYzdxEiiJlVRLvMqu7dIpNltgAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAAQnSURBVFjD7VnZsqQgDJUtCIr8/9+OrAYMatt3pmqqbh56EchJDpAEnKZf+ZX/Xxg/PRLyp5QrNgRBre8tvZEfcmRMUQ8gAUDPSoTfRqTBkGSO7fobOxT6nfQVkF5ya4M9lqJETY/lK0c4AMPAFEUdALfBOLGELnItdvugRq1LmAUD73kFi/4tCX+l+y4U9pClldB/J184Ip1cbWM9RVELsKRZ96GDNvkh8/Gp9LGr/WBhYOEgFov/xq8K0ncmsbP0Kzgq6fTfS+/I/Hgk2yajGutJijDAnE0LC1mWZTPpLX372Dq/30YWeQ4xQB0grUAJXy120dN2rkrsR8z2jjwfDGsHTFOEASB3Dcyzukq23NGnoOLeEtt4npDYYEtWkztskoOq5DNme0eeD7bQAQ8oQgB7S41na9kdyqdoXWKdpTfwE4sO42fAIDxqZnZVTesZm+SgWnrLbIvTOdIPloLOnAy8BvOEIgxgvbciube0MURpm53TdDbHDphDcFWCPNcNSCRyr0lKCtIlQvXYJAfV0ltmW5zOkW4wX/ie36lqjnvZAQ8oagBmWLyLfvlKgosVRcUAqPyNHMBCM5t/JRAWAE2Ydt052WOTHHhCPykdDsCFVulMTUy9mrLL7yjqACbudDNs6wqj1N/ALOc1paF9e8x79tvgpvo+PGcC2ybqBzStZ+y0D5b0dcmsJPZNh1Mdp7RqnFuVwxzV4u6GIsQsxyPLMOUFMYxF9lVKiTz2tbcF4uH5KrFtYSDn1e3aesYWcW1s8bM2kcwqtG1KOOxwquOU1oZL5TB11dFrihCABPy8DGOen4cZl5z3cYpF9MjVRSTuokGtsA9mo22cNa1n7Kyn/fs4GnQ4V9HA+HGedmWmLinCACUjL7HfllXrvsrSu1m27OaUI+NiqA42GQwbWD1n5VYggYQ1GlQo3baesUlmt6cZrMPRbKyVJ88kEWdVDb6XFGEA2FJR0hREpxgyWb7j8gwSv53e1+aDYrJ2qak+gZiN7QdwCfnxUQhs9Nn3bdXV4/ALrVs84Gsic8yVxyuKGgDLNTNG55UrUpRf/QbtxLmwV8pchM+0Wte7MAva7VOQypPyLIFMfK/2fIlstZXAJpnNSg79Q2lwrk8KfJ0NkPXlcTtxRVEDwEPGZzV7DA6e4cyWmZWODLO3ImoBXkGWo/AVtwf4jtmRpZQgnLvTLR/cTNrK9xVF08U9wOCyJFbxaa/YfPIIU2f89FiQQRkkFxl960DIG5lHgnH0mxsZpSZ3jLui6AxwKCGDXLqF4gszLB3xFLgdgO3V7FP3OJrLDCJWsvUpX+vTnhjn1S2i04T1JEVXAIy61y8xnZuXt4noIusAQbUwfPx6YGApJQhHv3qpAKwZd0XRFcBfelvTTnQE4XLQ+lCevq05cP7525pf+XH5A7yMIoVBWHMcAAAAJXRFWHRkYX
RlOmNyZWF0ZQAyMDE4LTExLTIyVDAxOjMxOjI2KzAxOjAwENz27QAAACV0RVh0ZGF0ZTptb2RpZnkAMjAxOC0xMS0yMlQwMTozMToyNiswMTowMGGBTlEAAAAidEVYdHBzOkhpUmVzQm91bmRpbmdCb3gAMjA3eDE0KzIwMis2MzajpaHoAAAAJ3RFWHRwczpMZXZlbABBZG9iZUZvbnQtMS4wOiBDTU1JMTIgMDAzLjAwMgoxF5a7AAAASXRFWHRwczpTcG90Q29sb3ItMAAvZGV2L3NobS96ZjItY2FjaGUvOTg1ZGEzYmRhNzNkZTdmZGNjMWY0MDc0ZTAzYWM0YjQuZHZpIC1vm51t6gAAAEV0RVh0cHM6U3BvdENvbG9yLTEAL2Rldi9zaG0vemYyLWNhY2hlLzk4NWRhM2JkYTczZGU3ZmRjYzFmNDA3NGUwM2FjNGI0LnBzQtnw/wAAAABJRU5ErkJggg==" />.</p>
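<p>As a rough illustration, this model can be sketched in a few lines of code. This is a minimal sketch, assuming the speedup takes the form S(n) = 1 / ((1 - p) + p/n + c*f(n)), with parallel fraction p, overhead constant c, and overhead function f(n):</p>

```python
# Speedup model from the text: Amdahl's law extended with a
# communication-overhead term c*f(n) in the denominator (assumed form):
#   S(n) = 1 / ((1 - p) + p/n + c*f(n))

def speedup(n, p=0.95, c=0.005, f=lambda n: 0):
    """Speedup on n processes; p = parallel fraction, c*f(n) = overhead."""
    return 1.0 / ((1.0 - p) + p / n + c * f(n))

for n in (1, 4, 16, 64):
    amdahl = speedup(n)                    # no overhead
    linear = speedup(n, f=lambda n: n)     # linearly growing overhead
    quad = speedup(n, f=lambda n: n ** 2)  # all-to-all communication
    print(f"n={n:3d}  Amdahl={amdahl:6.2f}  "
          f"linear={linear:6.2f}  quadratic={quad:6.2f}")
```

Running this shows the qualitative behavior discussed below: without overhead the curve follows Amdahl's law, while linear and quadratic overheads make the speedup peak and then fall as processes are added.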
<p>The graph below shows the case where the fraction of parallelized code is 95%, and where we can see the speedup for an increasing number of processes, for different functions of <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC8AAAAXCAMAAACs0OZeAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TAGYzdxGIRFWZ7qq7Iszdv9Ww8gAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAAD3SURBVDjLvZPbFoQgCEXxgqKS/P/fTmmaWjOrp+ElxC0cSAH+b0rfQsbeQhpRFceZ8xhW87tvacVtsDEVJ7aQkyMBx21P7nAVkcHx4ZDrISkBKwebeObxTGu3HqJcv3LU9YuihE11D+WKeCkNhUkNCmEREn2LsdT+i/49oRsPaDknti3ymVLdITMVaOXkkh/KONvPwGlCmG58jjARE9/avXgW84MPfuWV6O88S/sdufVL0wD3tRoWvm/2eS7yIY3lertgalmMknG6AlO51HvjDR5tuA/MEK5U5B55uohAerhMHJ9wPUwHFY3PR6kHnix8tXfv8Z19AFNBBeDUm6STAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDE4LTExLTIyVDAxOjMxOjI2KzAxOjAwENz27QAAACV0RVh0ZGF0ZTptb2RpZnkAMjAxOC0xMS0yMlQwMTozMToyNiswMTowMGGBTlEAAAAhdEVYdHBzOkhpUmVzQm91bmRpbmdCb3gAMjh4MTQrMjkxKzYzNknHFZUAAAAndEVYdHBzOkxldmVsAEFkb2JlRm9udC0xLjA6IENNTUkxMiAwMDMuMDAyCjEXlrsAAABJdEVYdHBzOlNwb3RDb2xvci0wAC9kZXYvc2htL3pmMi1jYWNoZS81YzMxNmEwYTk2MWY5ZDlmYjc0ODZjN2I5MDZkN2FlMC5kdmkgLW/SwrbMAAAARXRFWHRwczpTcG90Q29sb3ItMQAvZGV2L3NobS96ZjItY2FjaGUvNWMzMTZhMGE5NjFmOWQ5ZmI3NDg2YzdiOTA2ZDdhZTAucHPSQxX4AAAAAElFTkSuQmCC" />, assuming <img class="latexImg" 
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA0AAAAMBAMAAABLmSrqAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADBQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////L2OGaQAAAA50Uk5TABF3iFUzzJnuRGYi3arnfTKrAAAAAWJLR0QAiAUdSAAAAAlwSFlzAAAAeAAAAHgAnfVaYAAAADdJREFUCNdjYMAAQsYOIIo1lbEUREtM4HoNosug0s0QivMlhGYF0tIgxhEGxkIQHetpBJEJBREAXRgIWprGqKgAAAAldEVYdGRhdGU6Y3JlYXRlADIwMTgtMTEtMjJUMDA6MDI6MjgrMDE6MDDvoIk3AAAAJXRFWHRkYXRlOm1vZGlmeQAyMDE4LTExLTIyVDAwOjAyOjI4KzAxOjAwnv0xiwAAAB90RVh0cHM6SGlSZXNCb3VuZGluZ0JveAA4eDcrMzAyKzYzOTdZhioAAAAndEVYdHBzOkxldmVsAEFkb2JlRm9udC0xLjA6IENNTUkxMiAwMDMuMDAyCjEXlrsAAABJdEVYdHBzOlNwb3RDb2xvci0wAC9kZXYvc2htL3pmMi1jYWNoZS80YThhMDhmMDlkMzdiNzM3OTU2NDkwMzg0MDhiNWYzMy5kdmkgLW+aeJNWAAAARXRFWHRwczpTcG90Q29sb3ItMQAvZGV2L3NobS96ZjItY2FjaGUvNGE4YTA4ZjA5ZDM3YjczNzk1NjQ5MDM4NDA4YjVmMzMucHNMqBoCAAAAAElFTkSuQmCC" />= 0.005 (this constant would vary between different problems and platforms). In the case of no overhead, the result is as predicted by Amdahl’s law, but when we start adding overhead, we see that something is happening.</p>
<p>For a linearly increasing overhead, we find that the speedup doesn’t reach a value greater than five before the communication starts to counteract the increased computation power added by more processes. For a quadratic function, <img class="latexImg" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC8AAAAXCAMAAACs0OZeAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAADNQTFRF////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////8T5qHgAAAA90Uk5TAGYzdxGIRFWZ7qq7Iszdv9Ww8gAAAAFiS0dEAIgFHUgAAAAJcEhZcwAAAHgAAAB4AJ31WmAAAAD3SURBVDjLvZPbFoQgCEXxgqKS/P/fTmmaWjOrp+ElxC0cSAH+b0rfQsbeQhpRFceZ8xhW87tvacVtsDEVJ7aQkyMBx21P7nAVkcHx4ZDrISkBKwebeObxTGu3HqJcv3LU9YuihE11D+WKeCkNhUkNCmEREn2LsdT+i/49oRsPaDknti3ymVLdITMVaOXkkh/KONvPwGlCmG58jjARE9/avXgW84MPfuWV6O88S/sdufVL0wD3tRoWvm/2eS7yIY3lertgalmMknG6AlO51HvjDR5tuA/MEK5U5B55uohAerhMHJ9wPUwHFY3PR6kHnix8tXfv8Z19AFNBBeDUm6STAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDE4LTExLTIyVDAxOjMxOjI2KzAxOjAwENz27QAAACV0RVh0ZGF0ZTptb2RpZnkAMjAxOC0xMS0yMlQwMTozMToyNiswMTowMGGBTlEAAAAhdEVYdHBzOkhpUmVzQm91bmRpbmdCb3gAMjh4MTQrMjkxKzYzNknHFZUAAAAndEVYdHBzOkxldmVsAEFkb2JlRm9udC0xLjA6IENNTUkxMiAwMDMuMDAyCjEXlrsAAABJdEVYdHBzOlNwb3RDb2xvci0wAC9kZXYvc2htL3pmMi1jYWNoZS81YzMxNmEwYTk2MWY5ZDlmYjc0ODZjN2I5MDZkN2FlMC5kdmkgLW/SwrbMAAAARXRFWHRwczpTcG90Q29sb3ItMQAvZGV2L3NobS96ZjItY2FjaGUvNWMzMTZhMGE5NjFmOWQ5ZmI3NDg2YzdiOTA2ZDdhZTAucHPSQxX4AAAAAElFTkSuQmCC" />, the result is even worse and, as you might recall from our earlier blog post on <a href="http://www.comsol.com/blogs/intro-distributed-memory-computing/">distributed memory computing</a>, the increase of communication is quadratic in the case of all-to-all communication.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/Speedup-with-added-overhead.png" alt="Speedup with added overhead" title="" width="770" height="400" class="alignnone size-full wp-image-28823" /><br />
<em>Speedup with added overhead. The constant, c, is chosen to be 0.005.</em></p>
<p>Due to this phenomenon, we cannot expect a small time-dependent problem, for example, to keep speeding up on a cluster as we add more and more processes; the cost of communication grows faster than the gain from the added compute power. Note, however, that we have only considered a fixed problem size here. The &#8220;slowdown&#8221; effect introduced by communication becomes less relevant as we increase the size of our problem.</p>
<h3>Batch Sweeps in COMSOL Multiphysics</h3>
<p>Let us now leave the theoretical aspect and learn how to make use of the batch sweep feature in COMSOL Multiphysics. As our example model, we will use the electrodeless lamp, which is available in the <a href="http://www.comsol.com/model/electrodeless-lamp-10062">Model Gallery</a>. This model is small, at around 80,000 degrees of freedom, but needs about 130 time steps in its solution. To make this transient model parametric as well, we will compute the model for several values of the lamp power, namely 50 W, 60 W, 70 W, and 80 W.</p>
<p>On my workstation, a Fujitsu® CELSIUS® equipped with an Intel® Xeon® E5-2643 quad-core processor and 16 GB of RAM, I measured the following compute times:</p>
<div style="text-align: center">
<table cellpadding="10">
<tr>
<th>Number of Cores</th>
<th>Compute Time per Parameter</th>
<th>Compute Time for Sweep</th>
</tr>
<tr>
<td>1</td>
<td>30 mins</td>
<td>120 mins</td>
</tr>
<tr>
<td>2</td>
<td>21 mins</td>
<td>82 mins</td>
</tr>
<tr>
<td>3</td>
<td>17 mins</td>
<td>68 mins</td>
</tr>
<tr>
<td>4</td>
<td>18 mins</td>
<td>72 mins</td>
</tr>
</table>
</div>
<p>The speedup here is far from perfect &#8212; just about 1.7 for three cores, and it even decreases for four cores. This is due to the fact that this is a small model, with a low number of degrees of freedom per thread within each time step.</p>
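<p>For reference, the speedup and parallel efficiency implied by these measurements can be computed directly. A simple sketch, using the per-parameter times from the table above:</p>

```python
# Speedup and parallel efficiency implied by the measured per-parameter
# compute times in the table above (minutes).
times = {1: 30, 2: 21, 3: 17, 4: 18}

def table_speedup(cores):
    return times[1] / times[cores]

def table_efficiency(cores):
    return table_speedup(cores) / cores

for n in sorted(times):
    print(f"{n} cores: speedup {table_speedup(n):.2f}, "
          f"efficiency {table_efficiency(n):.0%}")
```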
<p>We will now use the batch sweep functionality to parallelize this problem in another way: we will switch from <em>data parallelism</em> to <em>task parallelism</em>. We will create a batch job for each parameter value and see what this does to our computation times. To do this, we first activate the “Advanced Study Options”, then we right-click on “Study 1” and choose “Batch Sweep”, as illustrated in the animation below:</p>
<div class="wistia_responsive_padding" style="padding:77.81% 0 0 0;position:relative;">
<div class="wistia_responsive_wrapper" style="height:100%;left:0;position:absolute;top:0;width:100%;"><iframe src="https://fast.wistia.net/embed/iframe/y5yidn5r9z?videoFoam=true" title="Wistia video player" allowtransparency="true" frameborder="0" scrolling="no" class="wistia_embed" name="wistia_embed" allowfullscreen mozallowfullscreen webkitallowfullscreen oallowfullscreen msallowfullscreen width="100%" height="100%"></iframe></div>
</div>
<p><script src="https://fast.wistia.net/assets/external/E-v1.js" async></script></p>
<p><em>How to activate batch sweep in a model, include the parameter values, and specify the number of simultaneous jobs.</em></p>
<p>The graph below indicates the productivity, or &#8220;speedup&#8221;, we can get by controlling the parallelization. When running one batch job using four cores, we get the result from above: 72 minutes. When changing the configuration to two simultaneous batch jobs, each using two cores, we can compute all the parameters in 48 minutes. Finally, when computing four batch jobs at the same time, each using one core, the total computation time is 34 minutes. This gives speedups of 2.5 and 3.5 times, respectively &#8212; a lot better than parallelizing through shared memory alone.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/Batch-sweep-simulations-per-day-versus-configuration-of-processes-and-threads.png" alt="Batch sweeps simulations per day versus configuration of processes and threads" title="" width="615" height="351" class="alignnone size-full wp-image-28825" /><br />
<em>Simulations per day for the electrodeless lamp model. &#8220;4&#215;1&#8221; means four batch jobs run simultaneously, using one core each.</em></p>
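<p>The effect of splitting the four cores between simultaneous jobs can be approximated with a simple wall-time model built from the measured per-parameter times in the table above. This is an idealized sketch: it ignores memory contention between simultaneous jobs, which is why its estimates for the 2&#215;2 and 4&#215;1 cases come out somewhat lower than the measured 48 and 34 minutes:</p>

```python
import math

# Idealized wall-time model for a batch sweep of independent parameter jobs:
# n_jobs jobs run simultaneously, each on `cores` cores, over 4 parameters.
# Per-parameter times are the measured values from the table above; the model
# ignores contention between simultaneous jobs and so underestimates slightly.
per_param_minutes = {1: 30, 2: 21, 4: 18}
N_PARAMS = 4

def sweep_minutes(n_jobs, cores):
    rounds = math.ceil(N_PARAMS / n_jobs)  # batches of simultaneous jobs
    return rounds * per_param_minutes[cores]

for n_jobs, cores in [(1, 4), (2, 2), (4, 1)]:
    print(f"{n_jobs} x {cores}: ~{sweep_minutes(n_jobs, cores)} min")
```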
<h3>Concluding the Hybrid Modeling Series</h3>
<p>Throughout this <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">blog series</a>, we have learned about shared, distributed, and hybrid memory computing and what their weaknesses and strengths are, as well as the large potential of parallel computing. We have also learned that there is no such thing as a free lunch when it comes to computing; we cannot just add processes and hope for perfect speedup for all types of problems.</p>
<p>Instead, we need to choose the best way to parallelize a problem to get the most performance gain out of our hardware, much like we have to choose the correct solver to get the best solution time when solving a numerical problem.</p>
<p>Selecting the right parallel configuration is not always easy, and it can be hard to know beforehand how you should “hybridize” your parallel computations. But as in many other cases, experience comes from playing around and testing, and with COMSOL Multiphysics, you have the possibility to do that. Try it yourself with different configurations and different models, and you will soon know how to set the software up in order to get the best performance out of your hardware.</p>
<p><em>Fujitsu is a registered trademark of Fujitsu Limited in the United States and other countries. CELSIUS is a registered trademark of Fujitsu Technology Solutions in the United States and other countries. Intel and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.</em></p>

<script charset="ISO-8859-1" src="//fast.wistia.com/static/concat/iframe-api-v1.js"></script>]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/added-value-task-parallelism-batch-sweeps/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Hybrid Computing: Advantages of Shared and Distributed Memory Combined</title>
		<link>https://www.comsol.no/blogs/hybrid-computing-advantages-shared-distributed-memory-combined/</link>
		<comments>https://www.comsol.no/blogs/hybrid-computing-advantages-shared-distributed-memory-combined/#comments</comments>
		<pubDate>Thu, 06 Mar 2014 14:53:55 +0000</pubDate>
		<dc:creator><![CDATA[Jan-Philipp Weiss]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[COMSOL Now]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Hybrid Modeling series]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=28089</guid>
		<description><![CDATA[Previously in this blog series, my colleague Pär described parallel numerical simulations with COMSOL Multiphysics on shared and distributed memory platforms. Today, we discuss the combination of these two methods: hybrid computing. I will try to shed some light onto the various aspects of hybrid computing and modeling, and show how COMSOL Multiphysics can use hybrid configurations in order to squeeze out the best performance on parallel platforms. Introducing Hybrid Computing In recent years, cluster systems have become more and [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Previously in this <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">blog series</a>, my colleague Pär described parallel numerical simulations with COMSOL Multiphysics on shared and distributed memory platforms. Today, we discuss the combination of these two methods: <em>hybrid computing</em>. I will try to shed some light onto the various aspects of hybrid computing and modeling, and show how COMSOL Multiphysics can use hybrid configurations in order to squeeze out the best performance on parallel platforms.</p>
<p><span id="more-28089"></span></p>
<h3>Introducing Hybrid Computing</h3>
<p>In recent years, cluster systems have become more and more powerful by incorporating the latest multicore technologies. Parallelism now spreads out across several levels. Huge systems have to deal with parallelism across nodes, sockets, cores, and even vector units (where operations are performed on short vectors or 1D arrays of data, not on scalar values or single data items).</p>
<p>In addition, the memory systems are organized into several levels as well. And as these hierarchies get deeper and deeper and more complex, the programming and execution models need to reflect these nested configurations. It turns out that it is not sufficient to deal with just a <em>single</em> programming and execution model. Therefore, computing is becoming increasingly <em>hybrid</em>.</p>
<h3>Core&#8217;s Law and Clusters</h3>
<p>Multiple computing cores are ubiquitous and everyone needs to deal with parallelism. As clock frequencies have stalled at a high level of around 2-5 GHz, the ever increasing hunger for computing power can only be satisfied by adding more and more cores. The well-known <em><a href="http://en.wikipedia.org/wiki/Moore%27s_law" target="_blank">Moore&#8217;s law</a></em> has turned into a corollary <em>Core&#8217;s law</em> &#8212; stating that the core count per die area will keep on increasing exponentially.</p>
<p>A direct consequence of this development is that the resources per core (e.g. the cache memory and the number of memory channels) become scarcer, simply because more cores have to share the same die and memory interface. The latest multicore incarnations of the classical CPU type have up to sixteen cores, but only up to four memory channels.</p>
<p>Typically, several multicore CPUs provide highly capable shared memory nodes that bring impressive computing power in terms of GigaFLOP/s (i.e. <a href="http://en.wikipedia.org/wiki/Flops" target="_blank">billions of floating point operations per second</a>). Several of these shared memory nodes are then integrated into clusters via high-speed interconnects &#8212; providing nearly unlimited resources in terms of (Giga/Tera/Peta)FLOP/s and memory capacity. The only limits are your IT department&#8217;s budget and floor space.</p>
<p>These days, a cluster system needs to have more than 100,000 cores in order to receive a high ranking in the recent <a href="http://www.top500.org/" target="_blank">TOP500 list</a>.</p>
<h3>Reaching the Limits</h3>
<p>A cluster represents a distributed memory system where messages are sent between the nodes by means of message passing. Here, the <em><a href="http://en.wikipedia.org/wiki/Message_Passing_Interface" target="_blank">Message Passing Interface</a></em> (MPI), with several open source and commercial implementations, is the de facto standard. Inside a node, <a href="http://en.wikipedia.org/wiki/OpenMP" target="_blank">OpenMP</a> is typically used for shared memory programming.</p>
<p>Yet, numerical experiments easily unveil the limitations of multicore platforms: it is getting more and more difficult to feed the beasts. Put plainly, it is hard to get data to the cores quickly enough to keep them busy crunching numbers. Basically, you could say that FLOP/s are available for free, but you must keep an eye on the <em>computational intensity</em>, i.e. the number of FLOP per data element &#8212; a characteristic of the algorithm. Together with the memory bandwidth, the computational intensity provides an upper bound for the achievable FLOP/s rate.</p>
<p>If the computational intensity increases linearly with the problem size, bandwidth does not limit performance. But the typical operation in finite element numerical simulations is of sparse matrix-vector type, which is bandwidth-bound, and the bandwidth is typically proportional to the number of memory channels in a multicore system. Once the available memory bandwidth is saturated, turning on more and more cores adds no value. This is the reason why speedups on multicore CPUs and shared memory platforms often saturate: the memory traffic is already at its maximum level, even though not all cores are being used for computations. Cluster systems, on the other hand, have the additional benefit of increased aggregate bandwidth and therefore better performance. (For further details about bandwidth limitations, exemplified by the STREAM benchmark, go <a href="http://www.admin-magazine.com/HPC/Articles/Finding-Memory-Bottlenecks-with-Stream" target="_blank">here</a>.)</p>
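<p>This bound can be sketched in the spirit of the roofline model: the attainable FLOP rate is the minimum of the machine&#8217;s peak rate and bandwidth times computational intensity. The sparse matrix-vector numbers (2 flops and 12 bytes per nonzero for a CSR matrix) and the node figures below are illustrative assumptions, not measurements from this post:</p>

```python
# Roofline-style bound: attainable FLOP rate is capped either by the peak
# compute rate or by memory bandwidth times computational intensity.

def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """intensity is in flop/byte."""
    return min(peak_gflops, bandwidth_gbs * intensity)

# Illustrative sparse matrix-vector product (CSR): 2 flops per nonzero,
# 12 bytes per nonzero (8-byte value + 4-byte column index).
spmv_intensity = 2.0 / 12.0  # about 0.17 flop/byte

# Hypothetical node: 200 GFLOP/s peak, 40 GB/s memory bandwidth.
print(attainable_gflops(200.0, 40.0, spmv_intensity))  # bandwidth-bound
print(attainable_gflops(200.0, 40.0, 10.0))            # compute-bound
```

With these assumed numbers, the sparse matrix-vector product can only reach a few GFLOP/s out of the 200 GFLOP/s peak, no matter how many cores are switched on, which is exactly the saturation effect described above.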
<p>On the hardware side, attempts have been made to mitigate the bandwidth limitations by introducing a hierarchy of <a href="http://en.wikipedia.org/wiki/Cache_%28computing%29" target="_blank">caches</a>. These can range from small level-1 caches, restricted to a single core with only a few hundred KB of memory size, to up to big level-3 caches, shared between several cores with up to a few dozen MB of memory size. The aim of the caches is to keep the data as close to the cores as possible, so that data that is to be reused does not need to be transferred from the main memory over and over again. This removes some of the pressure from the memory channels.</p>
<p>Even a single multicore processor can bring about a nested memory hierarchy. Moreover, packaging several multicore processors into multiple sockets builds a shared memory node with <em><a href="http://en.wikipedia.org/wiki/Non-uniform_memory_access" target="_blank">non-uniform memory access</a></em> (NUMA). In other words, some parts of the application data are stored in memory local to a core, while other parts are stored in remote memory. Therefore, some of the data can be accessed very fast, while other accesses incur longer latencies. This means that the correct placement of data, and a corresponding distribution of compute tasks, are critical for performance.</p>
<h3>Being Aware of Hierarchies for Performance</h3>
<p>We have learned that shared memory systems build up a hierarchical system of cores and memories and that the programming model, algorithms, and implementations need to be fully aware of these hierarchies. As the computing resources of a shared memory node are limited, additional power can be added by connecting several shared memory nodes by a fast interconnection network in distributed computing.</p>
<p>To bring back our previous analogy from the <a href="http://www.comsol.com/blogs/intro-shared-memory-computing/#analogy-shared-memory">shared memory</a> and <a href="http://www.comsol.com/blogs/intro-distributed-memory-computing/#analogy-distributed-memory">distributed memory computing</a> blog posts, we are now using a variable number of conference locations to represent the cluster, where each location provides a conference room with a big table to represent a shared memory node.</p>
<p>If the overall work to be done increases more and more, the conference manager can call other locations of the company for help. Suppose the conference rooms are located in Boston, San Francisco, London, Paris, and Munich, for instance. These remote locations represent the distributed memory processes (the MPI processes). The manager can now include new locations on demand, such as adding Stockholm &#8212; or in terms of hybrid computing, she can set up additional processes (i.e. conference tables) per shared memory node (i.e. per conference room location).</p>
<p>Each conference room location (<em>process</em>) has a phone on the table in the meeting room, which employees can use to call any other location (another <em>process</em>) and ask for data or information (<em>message passing</em>). The local staff (a limited resource) is sitting around each conference table in a particular location. Every employee at the conference table represents a <em>thread</em> that helps solving the tasks at the conference room table.</p>
<p>On the table, local data is available in a report (<em>level-1 caches</em>), in folders (<em>level-2 caches</em>), in folders located inside cabinets (<em>level-3 caches</em>), in the library on the same floor (<em>main memory</em>), or filed in the archive in the basement (<em>hard disk</em>). Several assistants (<em>memory channels</em>) are running around in the building in order to fetch new folders with requested information from the library or the archive. The number of assistants is limited, and they can only carry a limited number of folders at the same time (<em>bandwidth</em>).</p>
<p>Having more people at the table does not add any value if there are no more assistants available who can bring them enough data to work with. It is clear that the conference manager needs to make sure that the data necessary for the work is available on the table and that all the employees in the room can contribute effectively to reach a solution to a given problem. She should also make sure that the number of calls to other conference locations via the phone on the table is kept at a minimum. In terms of numerics, the implementations should be hierarchy-aware, data should be kept local, and the amount of communication should be kept at a minimum.</p>
<p>The phone calls between conference locations represent MPI calls between processes. On the meeting room table, a shared memory mechanism should be employed. In total, a perfect interplay of distributed memory (MPI) and shared memory (OpenMP) is required.</p>
<h3>Example of a Hybrid Cluster Configuration</h3>
<p>Let&#8217;s take a closer look at some possible cluster and core configurations. In our test benchmark model below, we investigated a small cluster that is made up of three shared memory nodes. Every node has two sockets with a quad-core processor in each socket. The total core count is 24. Each processor has a local memory bank, also illustrating the NUMA configuration of main memory. </p>
<p>Now, we test the cases where three, six, twelve, or twenty-four MPI processes are configured on this cluster. With three MPI processes, we have one MPI process per node and eight threads per MPI process, which can communicate via shared memory/OpenMP inside the node and across the node&#8217;s two sockets. With six MPI processes, we have one MPI process per socket, i.e. one per processor, and each MPI process then needs four threads. The third possibility, twelve MPI processes, sets up two MPI processes per processor with two threads each. Finally, we can test one MPI process per core, totaling twenty-four MPI processes on the system. This is the non-hybrid case, where no shared memory parallelism is needed and all communication occurs through distributed memory.</p>
<p>Which configuration do you think is the best one?  </p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/MPI-configurations.png" alt="Illustration depicting different MPI configurations on a cluster with three shared memory nodes" title="" width="486" height="873" class="alignnone size-full wp-image-28115" /><br />
<em>Different MPI configurations on a cluster with three shared memory nodes consisting of two sockets, each with a quad-core processor and local memory banks.</em></p>
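<p>The candidate configurations can also be enumerated programmatically. A small sketch, assuming every MPI process gets the same number of threads, all 24 cores are active, and processes are spread evenly over the 3 nodes:</p>

```python
# Enumerate the hybrid MPI x thread configurations for the example cluster:
# 3 nodes x 2 sockets x 4 cores = 24 cores, with all cores active
# (n_procs * n_threads == 24) and processes spread evenly over the nodes.
NODES, TOTAL_CORES = 3, 24

configs = [
    (n_procs, TOTAL_CORES // n_procs)
    for n_procs in range(1, TOTAL_CORES + 1)
    if TOTAL_CORES % n_procs == 0 and n_procs % NODES == 0
]
print(configs)  # [(3, 8), (6, 4), (12, 2), (24, 1)]
```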
<h3>Why Hybrid?</h3>
<p>Why not use a single programming and execution model and ignore the hierarchical core and memory configuration? First of all, it&#8217;s because shared memory (OpenMP) mechanisms cannot be used globally on standard type systems with standard installations (no world-spanning table is available to share the data).</p>
<p>So why not use message passing globally across all cores &#8212; with 24 MPI processes, as in the example above? Of course it would be possible; you can use message passing even between cores on the same shared memory node. But it would mean that every employee in our analogy above would have his or her own phone and would have to make calls to all other employees worldwide. There would be problems with busy signals and people being put on hold and, as a result, not getting any work done.</p>
<p>In fact, the real scenario is more complex because modern MPI implementations are well aware of hierarchies and are trimmed to use shared memory mechanisms for local communication. One of the downsides of MPI is that memory resources are wasted quadratically in the number of participating processes. The reason for this is that internal buffers are set up where the data is stored (and possibly duplicated) before the actual messages can be sent between the processes. On 10<sup>6</sup> cores, this would require 10<sup>12</sup> memory buffers for a single global communication call (if the MPI implementation is not hierarchy-aware). In hybrid computing, the number of MPI processes is typically lower than the core count &#8212; saving resources in terms of memory and data transfers.</p>
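<p>A quick sketch of this quadratic growth, assuming (as in the non-hierarchy-aware case described above) one internal buffer per ordered pair of processes:</p>

```python
# Naive (non-hierarchy-aware) MPI buffering: one internal buffer per
# ordered pair of processes, so the count grows quadratically.
def buffer_count(n_procs):
    return n_procs ** 2

print(buffer_count(10 ** 6))       # 10^12 buffers on a million cores
# Hybrid mode with 8 cores sharing one MPI process cuts this by 64x:
print(buffer_count(10 ** 6 // 8))
```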
<p>Another big advantage of the hybrid model is that many mechanisms (like data placement, thread pinning, or load balancing, to name a few) need dedicated action from the programmer, and the hybrid model provides a much more versatile tool for expressing these details. It shares the advantages of both a global and a distributed view of the memory. It is a natural choice to consider the hybrid OpenMP + MPI model, since it fits the hybrid structure of memory and core configurations, as illustrated in the figures.</p>
<p>Most importantly, the hybrid model is flexible and adaptable and helps reduce overheads and demands for resources. In terms of finite element modeling, hierarchies in the data and tasks can often be derived from the physical model, its geometry, the algorithms, and the solvers used. These hierarchies can then be translated into shared and distributed memory mechanisms.</p>
<p>Of course, the hybrid model also combines the pitfalls of both shared memory and distributed computing, and ends up being much more complex. However, the final outcome is worth the effort! <a href="http://www.comsol.com/comsol-multiphysics">COMSOL Multiphysics</a> provides sophisticated data structures and algorithms that represent and exploit multi-level parallelism to a large extent. It supports shared memory and distributed memory at the same time and the user can tune the interaction of both by a set of parameters for best performance.</p>
<h3>Benchmarking Your Models and Your System</h3>
<p>After all these theoretical explanations, it&#8217;s now time to connect the concept with real models. I suppose you are quite curious to investigate the scalability, the speedups, and the productivity gains when running COMSOL Multiphysics in parallel on the compute servers and clusters in your department.</p>
<p>In order to obtain proper results, it is very important to keep an eye on the problem size. The subproblems on each shared memory node have to be large enough that every thread obtains a reasonable amount of work and that the ratio of computation per process to the amount of data exchanged via messages between the processes is sufficiently large. As Pär mentioned in his earlier <a href="http://www.comsol.com/blogs/intro-shared-memory-computing/">blog post</a>, it is crucial to consider whether problems are parallelizable at all. For example, if the major effort in your model is dedicated to computing a long time-stepping series (which may last for hours), but the problem size in every time step is not significantly large, you will not see significant benefits from ramping up additional nodes and cores.</p>
<p>I encourage you to try various configurations of the hybrid model, even when you only have a shared-memory machine available.</p>
<h3>A Hybrid Scalability Study</h3>
<p>In the test scenario presented here, we consider a structural mechanics model representing a ten-spoke wheel rim where the tire pressure and the load distribution are simulated.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/Model-of-a-wheel-rim.png" alt="A model and submodel of a wheel rim" width="314" height="300" class="alignnone size-full wp-image-28117" title="Model of a wheel rim" /><br />
<em>Model of a wheel rim and its corresponding submodel.</em></p>
<p>Our simulation is run on the three compute nodes mentioned above, where each node has two sockets with a quad-core Intel Xeon® E5-2609 processor and 64 GB RAM per node, with 32 GB associated with each processor. The nodes are interconnected by a (rather slow) Gigabit Ethernet connection. In total, we have 24 cores available on this particular cluster.</p>
<p>In the graph below, we compare the number of simulations of this one model that can be run per day, depending on the hybrid model configuration. We consider the cases of 1, 2, 3, 6, 12, and 24 MPI processes, which lead to 1, 2, 3, 4, 6, 8, 12, 16, and 24 active cores, depending on the configuration. Each bar in the graph represents an (<em>nn x np</em>)-configuration, where <em>nn</em> is the number of processes, <em>np</em> is the number of threads per process, and <em>nn*np</em> is the number of active cores. The bars are grouped into blocks with the same active core count, and each configuration is listed at the top of its bar.</p>
<p>The graph shows that the performance increases, in general, with the number of active cores. We see slight variations for the different configurations. When reaching the full system load with 24 active cores, we find that the best configuration is to assign one MPI process per socket (i.e. six MPI processes in total). The performance and productivity gain on the three-node system with a hybrid process-thread configuration (case 6&#215;4) is more than twice as good as the performance on a single shared memory node (case 1&#215;8). It is also almost 30% better than the completely distributed model (case 24&#215;1), which uses the same number of cores.</p>
<p>When comparing this to the neighboring fully distributed configuration (case 12&#215;1), we see that there is no real gain in performance despite doubling the number of cores. This is because the slow Gigabit Ethernet network is already very close to its limits with 12 MPI processes, so additional MPI processes are not beneficial. The situation is different when comparing the 12&#215;1 and 12&#215;2 configurations, where the number of threads per process is doubled along with the number of active cores. This basically means that, in this case, the amount of communication via Ethernet does not increase.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/03/Benchmarking-model-using-different-configurations-in-a-hybrid-model.png" alt="A graph used to benchmark a structural mechanics model of a wheel rim using different configurations in a hybrid model" width="756" height="388" class="alignnone size-full wp-image-28123" title="Benchmarking model using different configurations in a hybrid model" /><br />
<em>Benchmarking a structural mechanics model of a wheel rim using different configurations in a hybrid model. The y-axis indicates performance and productivity gain through the total number of simulations of this model that can be run during a day. The bars indicate different configurations of</em> nn x np, <em>where</em> nn <em>is the number of MPI processes, and</em> np <em>is the number of threads per process.</em></p>
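<p>The full-load (<em>nn x np</em>)-configurations are simply the divisor pairs of the core count. As a quick illustration (this enumeration is our own sketch, not part of the original benchmark), the configurations that occupy all 24 cores of the cluster above can be listed in a few lines of Python:</p>

```python
def configurations(total_cores):
    """Enumerate (nn, np) pairs where nn MPI processes running
    np threads each occupy exactly total_cores cores."""
    return [(nn, total_cores // nn)
            for nn in range(1, total_cores + 1)
            if total_cores % nn == 0]

# Full-load configurations for the 24-core cluster in the benchmark,
# including the best-performing 6 x 4 case (one process per socket):
for nn, np_ in configurations(24):
    print(f"{nn} x {np_}")
```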
<h3>Setting up Hybrid Runs in COMSOL Multiphysics</h3>
<p>When running COMSOL Multiphysics in the parallel hybrid mode, you have various possibilities to adjust the number of processes and threads used. Some settings can be found in the <em>Multicore</em> and <em>Cluster Computing</em> section in the <em>Preferences</em> dialogue or in the <em>Cluster Computing</em> subnode of the <em>Study</em> node in the Model Builder. You can fine-tune the settings in the <em>Cluster Computing</em> subnode of the <em>Job Configurations</em> node, where you can specify the number of (physical) nodes you want to use in your cluster computation. Using the drop-down menu in the <em>Settings</em> window, you have additional choices, like the number of processes per host or the node granularity, which decides whether to put one process per node, socket, or core.</p>
<p>The number of threads used per process is set automatically. COMSOL Multiphysics always utilizes the maximum number of available cores, i.e. the number of cores per process is set to the number of available cores on the node divided by the number of processes on the node. You can override the number of cores per process by setting the <em>Number of processors</em> in the <em>Multicore</em> section of the <em>Preferences</em> dialogue to the requested value.</p>
<p>On Linux systems, you have the command line options <em>-nn</em> for the number of processes, <em>-np</em> for overriding the automatically determined number of threads per process, and the option <em>-nnhost</em> that sets the number of processes per host. A natural choice for <em>nnhost</em> is either 1 or the number of sockets per node.</p>
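<p>Putting these options together, a batch run using the best configuration from the benchmark above (six processes, two per host, i.e. one per socket, four threads each) might be launched as sketched below. The model file names and the host file are placeholders of our own, not from the original post; check the COMSOL documentation for the exact host file option used by your installation:</p>

```shell
# 6 MPI processes in total, 2 per host (one per socket), 4 threads per process
comsol batch -nn 6 -nnhost 2 -np 4 \
    -f hosts.txt \
    -inputfile wheel_rim.mph -outputfile wheel_rim_solved.mph
```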
<h3>Next Steps</h3>
<ul>
<li>For additional options, use cases, and examples, please refer to the COMSOL Multiphysics documentation.</li>
<li>Our final post in this <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">Hybrid Modeling blog series </a>will be on the topic of batch sweeps.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/hybrid-computing-advantages-shared-distributed-memory-combined/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Intro to the What, Why, and How of Distributed Memory Computing</title>
		<link>https://www.comsol.no/blogs/intro-distributed-memory-computing/</link>
		<comments>https://www.comsol.no/blogs/intro-distributed-memory-computing/#comments</comments>
		<pubDate>Thu, 20 Feb 2014 14:03:45 +0000</pubDate>
		<dc:creator><![CDATA[Pär Persson Mattsson]]></dc:creator>
				<category><![CDATA[Cluster & Cloud Computing]]></category>
		<category><![CDATA[COMSOL Now]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Introduction]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Hybrid Modeling series]]></category>

		<guid isPermaLink="false">http://com.staging.comsol.com/blogs/?p=27641</guid>
		<description><![CDATA[In the latest post in this Hybrid Modeling blog series, we discussed the basic principles behind shared memory computing &#8212; what it is, why we use it, and how the COMSOL software uses it in its computations. Today, we are going to discuss the other building block of hybrid parallel computing: distributed memory computing. Processes and Clusters Recalling what we learned in the last blog post, we now know that shared memory computing is the utilization of threads to split [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>In the latest post in this Hybrid Modeling blog series, we discussed the basic principles behind shared memory computing &#8212; what it is, why we use it, and how the COMSOL software uses it in its computations. Today, we are going to discuss the other building block of hybrid parallel computing: <em>distributed memory computing</em>.</p>
<p><span id="more-27641"></span></p>
<h3>Processes and Clusters</h3>
<p>Recalling what we learned in the <a href="http://www.comsol.com/blogs/intro-shared-memory-computing/">last blog post</a>, we now know that shared memory computing is the utilization of threads to split up the work in a program into several smaller work units that can run in parallel within a node. These threads share access to a certain portion of memory &#8212; hence the name <em>shared memory computing</em>. By contrast, the parallelization in <em>distributed memory computing</em> is done via several processes executing multiple threads, each with a private space of memory that the other processes cannot access. All these processes, distributed across several computers, processors, and/or multiple cores, are the small parts that together build up a parallel program in the <em>distributed memory</em> approach.</p>
<p>To put it plainly, the memory is not shared anymore; it is distributed (check out the diagram in <a href="http://www.comsol.com/blogs/hybrid-parallel-computing-speeds-up-physics-simulations/">our first blog post</a> in this series).</p>
<p>To understand why distributed computing was developed in this way, we need to consider the basic concept of <em>cluster computing</em>. The memory and computing power of a single computer is limited by its resources. In order to get more performance and to increase the amount of available memory, scientists started connecting several computers together into what is called a <em>computer cluster</em>.</p>
<h3>Splitting of the Problem</h3>
<p>The idea of physically distributing processes across a computer cluster results in a new level of complexity when parallelizing problems. Every problem needs to be split into pieces &#8212; the data needs to be split and the corresponding tasks need to be distributed. Consider a matrix-type problem, where operations are performed on a huge array. This array can be split into blocks (maybe disjoint, maybe overlapping) and each process then handles its private block. Of course, the operations and data on each block might be coupled to the operations and data on other blocks, which makes it necessary to introduce a communication mechanism between the processes.</p>
<p>To this end, data or information required by other processes will be gathered into chunks that are then exchanged between processes by sending messages. This approach is called <em>message-passing</em>, and messages can be exchanged globally (all-to-all, all-to-one, one-to-all) or point-to-point (one sending process, one receiving process). Depending on the couplings of the overall problem, a lot of communication might be necessary.</p>
<p>The objective is to keep data and operations as local as possible in order to keep the communication volume as low as possible.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/02/Speeding-up-communications-distributed-memory-computing-copy.jpg" alt="Diagram showing message-passing among computers" width="2958" height="898" class="alignnone size-full wp-image-27675" title="Speeding up communications distributed memory computing copy" /><br />
<em>The number of messages that needs to be sent in all-to-all can be described by a <a href="http://mathworld.wolfram.com/CompleteGraph.html" target="_blank">complete graph</a>. The increase is quadratic with respect to the number of compute nodes used.</em></p>
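<p>The quadratic growth mentioned in the caption is easy to make concrete: a complete graph on <em>P</em> nodes has <em>P(P-1)/2</em> connections, and one full all-to-all exchange sends <em>P(P-1)</em> directed messages. A minimal sketch of our own:</p>

```python
def all_to_all_messages(p):
    """Directed messages in one all-to-all exchange:
    each of the p processes sends to the other p - 1."""
    return p * (p - 1)

# The message count grows quadratically with the number of processes:
for p in (2, 4, 8, 16):
    print(p, all_to_all_messages(p))  # 2, 12, 56, 240
```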
<h3>Speeding up Computations or Solving Larger Problems</h3>
<p>The scientist who is working on a computer cluster can benefit from the additional resources in the following two ways.</p>
<p>First, with more memory and more computing power available, she can increase the problem size and thereby solve larger problems in the same amount of time by adding additional processes and keeping the workload per process (i.e. the size of the subproblem and the number of operations) at the same level. This is called <em>weak scaling</em>.</p>
<p>Alternatively, she can maintain the overall problem size and distribute smaller subproblems to a larger number of processes. Every process then needs to deal with a smaller workload and can finish its tasks much faster. In the optimal case, the reduction in computing time for a problem of fixed size distributed on <em>P</em> processes will be <em>P</em>. Instead of one simulation per time unit (one hour, one day, etc.), <em>P</em> simulations can be run per time unit. This approach is known as <em>strong scaling</em>.</p>
<p>In short, distributed memory computing can help you solve larger problems within the same amount of time or help you solve problems of fixed size in a shorter amount of time.  </p>
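<p>The two notions of scaling can be summarized in a couple of lines. The sketch below is ours and assumes perfectly parallelizable work with zero communication cost, so it gives an idealized upper bound rather than a measured result:</p>

```python
def strong_scaling_time(t_serial_hours, p):
    """Ideal runtime of a fixed-size problem distributed on p processes."""
    return t_serial_hours / p

def weak_scaling_dofs(base_dofs, p):
    """Problem size solvable in constant time when the per-process
    workload is kept fixed and p processes are used."""
    return base_dofs * p

print(strong_scaling_time(6.0, 8))    # a 6-hour serial job ideally takes 0.75 hours
print(weak_scaling_dofs(750_000, 8))  # 8x the degrees of freedom in the same time
```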
<h3><a name="analogy-distributed-memory"></a>Communication Needed</h3>
<p>Let&#8217;s now take a closer look at message-passing. How do the processes know what the other parts of the program are doing? As we know from above, the processes explicitly have to send and receive the information and variables that they or other processes need. This, in turn, brings with it some drawbacks, especially concerning the time it takes to send the messages over the network.</p>
<p>As an illustration, we can recall <a href="http://www.comsol.com/blogs/intro-shared-memory-computing/#analogy-shared-memory">the conference room analogy discussed in the shared memory blog post</a>, where collaborative work occurred around a table and all the information was made freely available for everyone sitting at that table to access and work in, even in parallel. Suppose that this time, the conference room and its table have been replaced by individual offices, where the employees sit and make changes to the papers they have in front of them.</p>
<p>In this scenario, one employee, let&#8217;s call her Alice, makes a change to report A, and wants to alert and pass these changes to her coworker Bob. She now needs to stop what she&#8217;s doing, get out of her office, walk over to Bob’s office, deliver the new information, and then return to her desk before continuing to work. This is much more cumbersome than sliding a sheet of paper across the table in a conference room. The worst case in this scenario is that Alice will spend more time <em>alerting her coworkers about her changes</em> than actually <em>making</em> changes.</p>
<p>The communication step in the new version of our analogy can be a bottleneck, slowing down the overall work progress. If we were to reduce the amount of communication that needs to be done (or speed up the communication, perhaps by installing telephones in the offices, or using an even faster kind of network), we could spend less time waiting on messages being delivered and more time computing our numerical simulations. In distributed memory computing, the bottleneck is usually the interconnect that passes data between the nodes, the wires between the computers, if you like. The current industry standard for high throughput and low latency is <em>InfiniBand</em>, which passes messages a lot quicker than Ethernet.</p>
<h3>Why Use Distributed Memory?</h3>
<p>Distributed memory computing has a number of advantages. One of the reasons why you would utilize distributed memory is the same as in the shared memory case. When adding more compute power, either in the form of additional cores, sockets, or nodes in a cluster, we can start more and more processes and take advantage of the added resources. We can use the gained compute power to get the results of the simulations faster.</p>
<p>With the distributed memory approach, we also get the advantage that with every compute node added to a cluster, we have more memory available. We are no longer limited by the amount of memory our mainboard allows us to build in and we can, in theory, compute arbitrarily large models. In most cases, scalability of distributed memory computing exceeds that of shared memory computing, i.e. the speed-up will saturate at a much larger number of processes (compared to the number of threads used).</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/02/Number-of-simulations-graph.png" alt="Bar graph showing the number of simulations performed per day with respect to the number of processes used" title="" width="703" height="417" class="alignnone size-full wp-image-27655" /><br />
<em>Number of simulations per day with respect to the number of processes used for the Perforated Muffler model depicted below. A 1 Gb/s Ethernet network was used for communication. The first four processes are executed on one compute node, and the Ethernet network comes into use after that. The small difference in simulations per day between 4 and 5 processes shows the impact of a slow communication network, even for a parametric problem. The compute nodes used are equipped with Intel® Xeon® E5-2609 processors and 64 GB DDR3 @1600 MHz.</em></p>
<p>However, we do have to be aware of the limitations as well. Just as in the shared memory case, there are some problems that are well suited for computing with the distributed memory approach, while others are not. This time, we also need to look at the amount of communication that is needed to solve the problem, not only if it is easily parallelized.</p>
<p>Consider, for instance, a time-dependent problem where a large number of particles interact in such a way that after each step, every particle needs the information about every other particle. Assuming that each particle is computed by its own process, the amount of communication in such an example can be described by the fully connected graph (shown above), and the number of messages per iteration grows rapidly as the number of particles and processes increases. In contrast, a parametric sweep, where the parametric values can be computed independently of each other, requires almost no communication at all and will not suffer as much from the communication bottleneck.</p>
<p><img src="https://cdn.comsol.com/wordpress/2014/02/Muffler-model.png" alt="A small parametric model of a muffler for which speed-up was obtained" title="" width="580" height="463" class="alignnone size-full wp-image-27657" /></p>
<p><em>The model for which the speed-up was obtained. It is a small parametric model (750,000 degrees of freedom) using the PARDISO direct solver. This model is available in the <a href="http://www.comsol.com/model/muffler-with-perforates-1843">Model Gallery</a></em>.</p>
<h3>How COMSOL Takes Advantage of Distributed Memory Computing</h3>
<p>Users who have access to a <a href="http://www.comsol.com/products/licensing/">floating network license (FNL)</a> can use the distributed functionality of the <a href="http://www.comsol.com/products">COMSOL software</a> on single machines with multiple cores, on a cluster, or even in the cloud. The COMSOL software&#8217;s solvers can be used in distributed mode without any further configuration. Hence, you can run larger simulations in the same amount of time or speed up your fixed-size simulation. Either way, COMSOL Multiphysics helps you increase your productivity.</p>
<p><a name="embarrassingly-parallel"></a>The distributed functionality is also very useful if you are computing parametric sweeps. In this case, you can automatically distribute the runs associated with different parameter values across the processes you start COMSOL Multiphysics with. Since the different runs in such a sweep can be computed independently of each other, this is called an &#8220;embarrassingly parallel problem&#8221;. With a good interconnection network, the speed-up is almost equal to the number of processes.</p>
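<p>The distribution of such a sweep can be pictured as a simple round-robin assignment of parameter values to processes. The following is a schematic sketch of the idea, not COMSOL's actual scheduling code:</p>

```python
def distribute_sweep(param_values, n_procs):
    """Assign parameter values to processes round-robin; since the
    runs are independent, no messages pass between the processes."""
    buckets = [[] for _ in range(n_procs)]
    for i, value in enumerate(param_values):
        buckets[i % n_procs].append(value)
    return buckets

# Six hypothetical parameter values swept across three processes:
lengths = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(distribute_sweep(lengths, 3))  # [[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]]
```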
<p>For detailed instructions on how to set up your compute job for distributed memory computing, we recommend the COMSOL Reference Manual, which lists several examples of <a href="http://www.comsol.com/model/micromixer-cluster-and-batch-versions-7581">how to start COMSOL in distributed mode</a>. You may also refer to the user guide of your <a href="http://www.comsol.com/multiphysics/high-performance-computing">HPC</a> cluster for details about how to submit computing jobs.</p>
<p><em>Next up in this <a href="http://www.comsol.com/blogs/tag/hybrid-modeling-series/">blog series</a>, we will dig deeper into the concept of hybrid modeling &#8212; check back soon!</em></p>
]]></content:encoded>
			<wfw:commentRss>https://www.comsol.no/blogs/intro-distributed-memory-computing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
