DevHeads.net

Performance issues/difference of two servers running same task (one is quicker)

Hi

I need some advice on what to do next, even if it is only a pointer to
another mailing list or tuning site, or a better direction for solving
my annoying problem: one server is much faster at certain tasks even
though it runs on "shitty" hardware.

I have tried many things to solve my issue
- changed buffer/pool/cache/etc mysqld
- changed server settings apache/php
- changed various OS settings (sysctl) e.g. turned off IPV6
but haven't figured it out.

I have a development server (local) and live servers (data center),
used mainly for many different websites and one online training site.

The development and live servers in question run the same software setup:
- CentOS Linux release 7.6.1810
- bind 32:9.9.4-74.el7_6.1
- Apache/2.4.6 (CentOS)
- PHP 7.1.29
- mysqld Ver 5.7.26
- wordpress, woocommerce, wishlistmember, Sensei etc
- all software is at the same update level
- even many of the Linux conf files are the same (/etc/hosts, bind, etc)
- the databases are copies/identical

Live server is a Dell PowerEdge M710, 48GB, 2x Xeon L5630, LSI RAID1 SSD
Dev server is a DIY, GIGABYTE MX31-BS0, 32GB, 1x Xeon E3-1245, MDADM RAID0 1TB Seagate spinners

Clearly the development server is, hardware-wise, well below the specs of the Dell, but
software-wise they are identical (they get upgraded at the same time).

During normal operations (i.e. displaying websites, online training courses etc.) the DELL
displays the websites faster, although it sits 1000 km up north in a datacenter on
a different network, while the local server is on the same network as my machine.

Yet the DEV server outshines the DELL when creating a few large custom tables, ie
the local server takes 5s while the DELL takes 15s (small tables), more for bigger tables.

The task is based on:
- level, member, course, group are all IDs
- members can belong to a group, a level and can access many courses
- the ID restricts what they can access and what they belong to.
- a course for each member can have various stages of completion
- using an API (wishlist member) that performs LOCAL calls when accessed locally
I can get who belongs to what and make up my info I need, then use PHP
to make up the table.
- DB calls ARE LOCAL!

Now when I try to create a table of members belonging to the same group level and
doing the same course with different stages of completion, the DELL takes on average
3 times longer to complete the table (normally about 20 to 30 rows).

I have put microtime() calls before and after certain calls, and it's visibly different:
DEV
Jul 04 04:57:26 UTC _members took 0.0005459785461425 ms
Jul 04 04:57:26 UTC _members took 0.0005321502685546 ms
LIVE
Jul 04 05:00:36 UTC _members took 0.0014369487762451 ms
Jul 04 05:00:36 UTC _members took 0.0013291835784912 ms
If I do this 300+ times, the outcome is very different.
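To put those per-call figures in perspective, here is a minimal sketch that averages the two samples from each server and scales them to 300 calls. The helper names (mean, scale, ratio) are made up for illustration, and the result is in whatever unit the log values really are; the ratio between the servers does not depend on the unit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the sample values are copied from the log lines above.

# mean A B ... : arithmetic mean of the arguments
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.10f\n", s / NR }'
}

# scale VALUE N : VALUE multiplied by N calls
scale() {
    awk -v v="$1" -v n="$2" 'BEGIN { printf "%.4f\n", v * n }'
}

# ratio A B : how many times larger B is than A
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

dev=$(mean 0.0005459785461425 0.0005321502685546)
live=$(mean 0.0014369487762451 0.0013291835784912)

echo "DEV  x300: $(scale "$dev" 300)"
echo "LIVE x300: $(scale "$live" 300)"
echo "LIVE/DEV : $(ratio "$dev" "$live")"
```

With the four samples from the post, the live server comes out at roughly two and a half times the per-call cost of the dev server, which is in line with the "3 times longer" seen for the whole table.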

So my questions:

- How can it be that the DELL takes so much longer although it is on far better hardware?
- How can it be, although everything (software/os/plugins) is the same?
- This even happens if the DELL is on low load (i.e. middle of the night) and
only serves a few requests.

Same software, same config, same database, same amount of data in the database
yet on better hardware it's slower?

Any ideas anyone?

Comments

Re: Performance issues/difference of two servers runnin

By Steven Tardy at 07/05/2019 - 01:18

On Thu, Jul 4, 2019 at 2:43 AM Jobst Schmalenbach < ... at barrett dot com.au>
wrote:

As others have said, the DEV server has a CPU that is a generation newer. For CPU
details I often reference Intel's “ark” pages:

https://ark.intel.com/content/www/us/en/ark/products/47927/intel-xeon-processor-l5630-12m-cache-2-13-ghz-5-86-gt-s-intel-qpi.html
12M Cache, 2.13 GHz, 5.86 GT/s Intel® QPI

https://ark.intel.com/content/www/us/en/ark/products/52274/intel-xeon-processor-e3-1245-8m-cache-3-30-ghz.html
8M Cache, 3.30 GHz

The “generations” I mentioned are:
Code Name: Products formerly Westmere EP
<https://ark.intel.com/content/www/us/en/ark/products/codename/54534/westmere-ep.html>
Code Name: Products formerly Sandy Bridge
<https://ark.intel.com/content/www/us/en/ark/products/codename/29900/sandy-bridge.html>

Westmere systems used DDR3 at 800/1066MHz.
Sandy Bridge systems used DDR3 at 1066/1333MHz.
Not a huge difference, but likely another contributing factor to
performance.

I would also look at power settings in the BIOS and c-state settings in the
BIOS and OS as disabling c-states (often enabled by default to meet
green/energy star compliance) can make a noticeable performance difference.

Hope that helps.

Have you run "tuned-adm profile throughput-performance"

By Gordon Messmer at 07/05/2019 - 14:48

On 7/4/19 10:18 PM, Steven Tardy wrote:

I'd be surprised if it did, but now that you mention it, I think that we
should probably mention more often that CentOS's default performance
policy is power-saving, which will cut maximum performance in half. 
Every physical system running CentOS should have run "tuned-adm profile
throughput-performance".

http://jperrin.org/centos/boosting-centos-server-performance/
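Before switching profiles it may help to see what each machine is currently doing. The sketch below is one way to check, assuming the kernel exposes cpufreq governors under the usual sysfs path and that the tuned package is installed; governor_summary is a hypothetical helper name, and its optional directory argument exists only so it can be pointed at a test tree.

```shell
#!/usr/bin/env bash
# governor_summary [CPU_ROOT]
# Counts how many CPUs use each cpufreq scaling governor.
# CPU_ROOT defaults to the standard sysfs location (assumption: cpufreq is
# available; on machines without it the function simply prints nothing).
governor_summary() {
    local root="${1:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cat "$f"
    done | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

governor_summary

# tuned-adm is only present when the tuned package is installed.
if command -v tuned-adm >/dev/null 2>&1; then
    tuned-adm active
fi
```

Running this on both servers and comparing the output shows quickly whether they really use the same power policy, which is not guaranteed just because the package versions match.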

Re: Have you run "tuned-adm profile throughput-performa

By Pete Biggs at 07/06/2019 - 09:52

On Fri, 2019-07-05 at 11:48 -0700, Gordon Messmer wrote:
I'm a bit confused.

I've just done some quick experiments on an HPC system. It was
previously set to whatever the default is and then changed to
"throughput-performance". There was no discernible change in
computation time for an 8-core job (on a dual 4-core Xeon; don't judge,
it's an old system I use for testing) - the overall time for the run
was just under an hour for both give or take 10 seconds.

So my question is, would the tuning parameters be expected to make a
difference on long-term CPU bound processes? Or does the CPU just go
at full speed if necessary? Does it depend on the CPU generation?

I'm perfectly willing to set all my HPC cluster nodes to whatever is
necessary to get the best performance, but will changing the profile to
a performance one mean that the machine will use more power when idle?

Finally, is there a decent online source where I can read up on what
the different tuned profiles/parameters mean?

Thanks

P.

Re: Have you run "tuned-adm profile throughput-performa

By fred smith at 07/05/2019 - 16:46

On Fri, Jul 05, 2019 at 11:48:45AM -0700, Gordon Messmer wrote:
Not for my (admittedly dog-like) Acer Aspire One netbook, dual core
1.6 GHz Atom with a whopping 2 gigs of RAM.

It would run for a little while, pause for a minute or two while the
hard drive went chunka-chunka, then eventually come back to life.
Not pleasant.

Re: Have you run "tuned-adm profile throughput-performa

By Gordon Messmer at 07/05/2019 - 14:52

On 7/5/19 11:48 AM, Gordon Messmer wrote:

I take that back.  Disabling power-saving in the firmware probably also
disabled CPU frequency scaling, which would prevent CentOS's default
policy from scaling the frequency down to its minimum, so I wouldn't be
surprised.

Re: Performance issues/difference of two servers runnin

By Gordon Messmer at 07/04/2019 - 13:46

On 7/3/19 11:43 PM, Jobst Schmalenbach wrote:

It looks like the DIY system has a CPU that's nearly twice as fast as
the Dell's.  The additional CPU in the Dell will run more tasks
concurrently, but it won't make a single process faster.
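A quick way to compare single-core speed directly is to time a loop that can only ever use one core. This is a minimal sketch, not a real benchmark: busy_loop is a made-up helper, the iteration count is arbitrary, and the result also depends on the bash version, so it only makes sense as a rough comparison between two machines with the same software.

```shell
#!/usr/bin/env bash
# busy_loop N : spin through N iterations of a trivial integer loop on one
# core and print the final counter. Hypothetical helper for illustration.
busy_loop() {
    local n="$1" i sum=0
    for ((i = 0; i < n; i++)); do
        sum=$((sum + 1))
    done
    echo "$sum"
}

# Time the loop; run this on both machines while they are idle and compare
# the "real" and "user" figures that bash prints.
time busy_loop 500000 >/dev/null
```

Extra cores or a second socket will not change this number, which is the point: a PHP request building one table is in the same position.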

You might also think that the SSD RAID would make the Dell faster, but
that will only be true if the process that you're testing performs a
significant amount of IO.  If your DB operations are happening mostly in
memory (that is, if the data is cached), then the faster CPU will be the
primary determining factor.
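Whether the DB work really happens in memory can be estimated from two InnoDB status counters, Innodb_buffer_pool_read_requests (logical reads) and Innodb_buffer_pool_reads (reads that had to go to disk). The sketch below only does the arithmetic; hit_pct is a hypothetical helper, and the numbers in the example call are invented, not taken from either server.

```shell
#!/usr/bin/env bash
# The counters can be read with something like (credentials assumed):
#   mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"

# hit_pct READ_REQUESTS DISK_READS
# Percentage of logical reads served from the buffer pool.
hit_pct() {
    awk -v req="$1" -v disk="$2" 'BEGIN {
        if (req == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * (req - disk) / req
    }'
}

# Example with invented counter values.
hit_pct 1000000 2500
```

If both servers show a hit rate close to 100%, disk speed is unlikely to explain the difference and CPU speed becomes the more plausible suspect.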

The other thing that you left out of your description is the amount of
data on each server.  If your live server has a lot of data in its DB
and the dev system has a small dataset suitable for testing, then
generally you'd expect that the dev system's data is more likely to live
in cache and avoid disk IO, and processing the smaller set will also
take less CPU time.

https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+E3-1245+%40+3.30GHz&id=1202

https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+L5630+%40+2.13GHz&id=2086

Re: Performance issues/difference of two servers runnin

By Jobst Schmalenbach at 07/05/2019 - 20:52

On Thu, Jul 04, 2019 at 10:46:19AM -0700, Gordon Messmer wrote:
I made the buffer pool size on the DELL double the size of the DIY's
when I started trying to figure out the reason for the speed difference.

Most of the DBs are small as they contain websites.
The biggest DB is the Online Training DB, which is the same on both machines
as I constantly copy the data from the live server to the DIY.

Very good analysis indeed.
Makes total sense.

Re: Performance issues/difference of two servers runnin

By Roberto Ragusa at 07/06/2019 - 10:26

Could you try the same operations on COPIES of the databases, on both machines?
An original live DB can be slower than a copy, because of data structure
fragmentation, garbage collections etc. (on the filesystem, but also in tables)

Just a thought about another thing to try, since we have established that
the production hardware is indeed faster.

Regards.

Re: Performance issues/difference of two servers runnin

By Roberto Ragusa at 07/04/2019 - 03:39

On 7/4/19 8:43 AM, Jobst Schmalenbach wrote:
Try this to see how fast the CPU and kernel are (including meltdown/spectre
slowdowns):

time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000

Then try this to see how fast your disks are for DB operations:

cd /a/directory/on/the/filesystem/you/want/to/test
time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
rm test

Regards.

Re: Performance issues/difference of two servers runnin

By Jobst Schmalenbach at 07/05/2019 - 20:37

On Thu, Jul 04, 2019 at 09:39:18AM +0200, Roberto Ragusa wrote:
Thank you for the tips.
Here are the results (DELL is faster overall):

[DIY ~] #>time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000
real 0m1.931s
user 0m1.022s
sys 0m0.896s
[DELL ~] #>time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000
real 0m1.308s
user 0m0.389s
sys 0m0.919s

Dell faster overall

[DIY /mnt] #>time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
real 1m12.944s
user 0m1.604s
sys 0m2.595s
[DELL /mnt] #>time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
real 0m2.270s
user 0m0.509s
sys 0m1.475s

Expected the DIY to be slower here; it's running MDADM RAID1 on Seagate spinners compared to LSI RAID1 SSD.

The results show the DELL is faster overall, so it's back to the drawing board after I followed all the other hints in this thread.

Jobst
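The fsync results above are easier to compare as a per-operation latency. Here is a small sketch that converts the "real" time printed by bash into seconds and divides by the number of iterations; to_seconds and per_op_ms are hypothetical helper names, and the figure is an upper bound because each iteration also pays for starting a dd process.

```shell
#!/usr/bin/env bash
# to_seconds TIMESTR : convert bash `time` output such as 1m12.944s to seconds
to_seconds() {
    awk -v t="$1" 'BEGIN {
        sub(/s$/, "", t)
        n = split(t, p, "m")
        if (n == 2) printf "%.3f\n", p[1] * 60 + p[2]
        else        printf "%.3f\n", p[1]
    }'
}

# per_op_ms TIMESTR COUNT : average milliseconds per operation
per_op_ms() {
    awk -v s="$(to_seconds "$1")" -v n="$2" \
        'BEGIN { printf "%.2f\n", s * 1000 / n }'
}

# "real" values taken from the two fsync runs quoted above (1000 iterations).
echo "DIY  fsync: $(per_op_ms 1m12.944s 1000) ms"
echo "DELL fsync: $(per_op_ms 0m2.270s 1000) ms"
```

That works out to roughly 73 ms per synced write on the DIY spinners against a little over 2 ms on the DELL's SSD RAID, so storage is clearly not what makes the DELL slower at building the tables.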

Re: Performance issues/difference of two servers runnin

By Johnny Hughes v... at 07/04/2019 - 03:07

Two ideas:

a) the DELL may be faster overall, but if I'm right its single-core speed is
slower than the DEV machine's.

b) how do the LSI/SSD perform compared to the MDADM/RAID0 on the DEV
server? I'm not sure the DELL is a clear winner here.

Regards,
Simon

Re: Performance issues/difference of two servers runnin

By Jobst Schmalenbach at 07/05/2019 - 20:40

On Thu, Jul 04, 2019 at 09:07:35AM +0200, Simon Matter via CentOS wrote:
Yes, but since BOTH have "other" things to do at the same time, the sheer number of CPUs in the DELL should help.

See my answer about the disk test in another email.