Lame compiles for hyperthreading?

2004-01-23 19:50:04

The last time I ran LAME 3.90.3, I noticed that "both processors" in Task Manager did not appear to be running at 100%. Can anything be done at compile time to take advantage of hyperthreading and encode MP3s even faster? Or maybe I just need a different compile of LAME that already does this. One final thought, maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen.

Reply #1 – 2004-01-23 19:56:57

Quote

maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen

This will not benefit you at all.

With hyperthreading you don't have double the resources... you can just share the whole CPU across more than one thread:

part A doing something for thread X
part B (the integer unit) doing something for thread Y

If you run two processes that use the same resources on the CPU, there will be no benefit, as you still only have the same part to run the threads on.

Reply #2 – 2004-01-23 20:19:11

Quote

Quote
maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen

This will not benefit you at all.

With hyperthreading you don't have double the resources... you can just share the whole CPU across more than one thread:

part A doing something for thread X
part B (the integer unit) doing something for thread Y

If you run two processes that use the same resources on the CPU, there will be no benefit, as you still only have the same part to run the threads on.

COMPLETELY WRONG.

A real-world program is not just "integer", "memory" or "floating point". Even if it is well optimized, there will be benefits from running two LAME instances simultaneously.

For example if LAME(1) is waiting on the RAM to perform a floating-point read, at the same time the other thread can do an integer, MMX or whatever operation which is not reading from memory.

If sven_bent cared to verify it, he would have seen that:
- lame alone => takes 'u' seconds
- two lame together => takes 'v' seconds

I expect that v < 2u, therefore hyperthreading is useful. I'd guess a gain of roughly 15-25%.

If the cache is bigger, then hyperthreading is more useful.

PS: I'm not advocating intel cpus in particular (actually I use AMD).
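
To make the u/v comparison in this reply concrete (a sketch only; the symbol S for speedup is introduced here and does not appear in the post): encoding two files one after the other takes 2u seconds, and encoding them together takes v seconds, so

```latex
S = \frac{2u}{v}, \qquad v < 2u \iff S > 1 .
```

Under that definition, a gain of 15-25% means S between 1.15 and 1.25, i.e. v between 2u/1.25 = 1.6u and 2u/1.15 ≈ 1.74u.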

Reply #3 – 2004-01-23 21:22:13

Two LAME processes is the ideal solution here since encoding 2 mp3s is perfectly parallel.
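
A minimal sketch of what "two LAME processes" could look like when driven from a script. The helper names (`lame_command`, `encode_all`) are hypothetical, and the command line assumes a `lame` binary on the PATH that accepts `--alt-preset standard`; adjust it to whatever your build expects.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def lame_command(wav_path, mp3_path):
    # Assumption: `lame` is on the PATH and understands --alt-preset standard.
    return ["lame", "--alt-preset", "standard", wav_path, mp3_path]


def encode_all(jobs, build_command=lame_command, workers=2):
    """Run one encoder process per (input, output) job, `workers` at a time.

    Returns the exit code of each process, in the same order as `jobs`.
    """
    def run(job):
        # Each job becomes its own OS process, so the scheduler is free
        # to place the two encoders on different logical processors.
        return subprocess.run(build_command(*job)).returncode

    # The threads here only wait on the child processes; the encoding
    # work itself happens in the separate encoder processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, jobs))
```

For example, `encode_all([("a.wav", "a.mp3"), ("b.wav", "b.mp3")])` would encode both files at once. Because the files are independent, no coordination between the two encoders is needed.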

Reply #4 – 2004-01-24 00:07:00

Perfectly parallel? Do you mean performance is nearly doubled?

Reply #5 – 2004-01-24 05:39:16

I have a 2.8 GHz Pentium 4 with Hyperthreading. Running one LAME session, I get approximately 7.3x encoding at -aps. Running two sessions, each session runs a bit slower (approximately 4.5x each), but the net effect is that you're getting approximately 9.0x encoding across the two sessions. So it's certainly not 2x the original encoding speed, but it's nothing to sneeze at, either.
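
The arithmetic behind those figures, as a small sketch (the function names are illustrative; the numbers are the ones reported in this post):

```python
def combined_speed(per_session_speed, sessions):
    """Total encoding speed when several sessions run side by side."""
    return per_session_speed * sessions


def relative_gain(single_speed, per_session_speed, sessions):
    """Combined speed divided by the speed of one session running alone."""
    return combined_speed(per_session_speed, sessions) / single_speed


single = 7.3        # one LAME session at -aps, as reported above
per_session = 4.5   # each of two simultaneous sessions, as reported above

total = combined_speed(per_session, 2)
gain = relative_gain(single, per_session, 2)
print(f"combined: {total:.1f}x, gain over one session: {gain:.2f}x")
# prints "combined: 9.0x, gain over one session: 1.23x"
```

So on this machine, two sessions give roughly a 23% throughput improvement over one.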

Reply #6 – 2004-01-24 09:03:48

There is a possibility for LAME to become multithreaded in the future. The expected performance gain on an HT system would be about 1.2 - 1.3x.

However, this is only a possibility, and it currently has a major blocker: none of the LAME developers has a hyperthreading or multiprocessor computer.

Reply #7 – 2004-01-24 19:41:43

To borrow something from Folding@home (for those who don't know, it's a distributed computing project that simulates protein folding for disease research): when you are using hyperthreading, there is a setting where you set the first client to run on "Machine ID = 1" and another to run on "Machine ID = 2".

As much as it was interesting to hear that running two lame sessions improves efficiency, I believe we were still doing this on 1 of the 2 "processors" that are on a hyperthreading CPU. To take this one step further, it would make sense to run 4 sessions with 2 on each CPU. So I'm wondering if we take this experiment to the max what kind of improvement are we seeing.

It is interesting that none of the developers are on a hyperthreading CPU. Some people go the Intel route, some go the AMD route (I had a shitty 486 AMD CPU once and never went back). I guess what I am not clear about is how much of the programming needs to be geared toward hyperthreading. When you read the Intel compiler marketing B.S., it almost sounds like you just throw the new compiler at it and magical things happen. That's the marketing crap to get your money, I guess. In reality, as Sven_bent was trying to say, the code itself would need to change in order to split the work between threads as close to 50-50 as possible to get maximum synergy. This number may not be 50-50, because I'm fairly certain both "CPUs" in an Intel "C" processor are not identical; I think one is more of a mini processor. Somebody more technical can answer this question.

I think this is interesting, though. To me, the next big step in terms of speed should be making use of hyperthreading. I'm pretty sure these guys are focusing on quality at minimal MP3 size, though.

Reply #8 – 2004-01-24 19:57:34

Quote

Perfectly parallel? Do you mean performance is nearly doubled?

With 2 CPUs, yes. With HT, the improvement is whatever the HT logic in Intel's CPUs can manage (as opposed to many multithreaded problems, where the improvement is limited by the nature of the problem).

A 20 or 30% improvement seems pretty reasonable for most encoding, I'd think, since encoding doesn't seem to be very cache dependent and is very CPU intensive.

Reply #9 – 2004-01-24 20:04:40

Quote

Quote

Quote
maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen

This will not benefit you at all.

With hyperthreading you don't have double the resources... you can just share the whole CPU across more than one thread:

part A doing something for thread X
part B (the integer unit) doing something for thread Y

If you run two processes that use the same resources on the CPU, there will be no benefit, as you still only have the same part to run the threads on.

COMPLETELY WRONG.

A real-world program is not just "integer", "memory" or "floating point". Even if it is well optimized, there will be benefits from running two LAME instances simultaneously.

For example if LAME(1) is waiting on the RAM to perform a floating-point read, at the same time the other thread can do an integer, MMX or whatever operation which is not reading from memory.

If sven_bent cared to verify it, he would have seen that:
- lame alone => takes 'u' seconds
- two lame together => takes 'v' seconds

I expect that v < 2u, therefore hyperthreading is useful. I'd guess a gain of roughly 15-25%.

If the cache is bigger, then hyperthreading is more useful.

PS: I'm not advocating intel cpus in particular (actually I use AMD).

Yes, you are right. I was thinking of two totally identical threads running at precisely the same time.

Of course, when you do two different encodings, there may be several periods where each thread uses different resources on the CPU.

My bad.

Reply #10 – 2004-01-24 20:15:45

Quote

There is a possibility for LAME to become multithreaded in the future. The expected performance gain on an HT system would be about 1.2 - 1.3x.

However, this is only a possibility, and it currently has a major blocker: none of the LAME developers has a hyperthreading or multiprocessor computer.

If there is actually real interest in getting LAME multithreaded and adding support for SMP/HT (and there is someone with the time to do the development), I could donate a shell account or something like that on my Dual Xeon box to one of the LAME developers.

Reply #11 – 2004-01-26 03:56:06

Dibrom, it would be pretty cool to see what the speed gain would be, so hopefully somebody takes you up on your offer.

I guess the question is whether they have run out of other things to do and can take on this task!

Reply #12 – 2004-01-26 22:25:08

sven_bent, actually, you might want to do a bit more research to understand exactly what is going on in a hyperthreading P4. You seem to be under the impression that a CPU is a CPU is a CPU, and it can only do one thing at a time. Well, the P4 has multiple execution units and it can indeed do more than one thing at a time. No, it isn't the same as two physically separate CPUs; that's not what I'm trying to say. But it isn't the same as a single CPU handling multiple threads either. It does actually handle more than one thing at a time, in a fashion similar to what dual CPUs do. It just isn't as elaborate as dual CPUs, and therefore isn't as fast. It's not just letting one task do some work while the other task waits for something. More than one thing is actually happening at once.

Reply #13 – 2004-01-27 00:09:17

Good article on Hyper Threading:

http://anandtech.com/cpu/showdoc.html?i=1576

Reply #14 – 2004-01-30 05:04:10

So it appears that it relies on the developer making good use of threads in their programming.

I'm not a C++ guy, so I'm not positive that I understood the slideshow by Intel, but I think that is what I read!

Reply #15 – 2004-01-30 07:21:13

Yes, all that is required is having more than one thread doing work at the same time. And I believe there may be certain ways of optimizing to take better advantage of hyperthreading and how it works in the P4 (or rather, to work better with this somewhat sub-par "dual" processor setup). Two true processors can obviously chew through two threads better than a hyperthreading P4, but you can still alter/optimize your code to better utilize what the hyperthreading P4 can offer.

Reply #16 – 2004-02-04 05:14:30

Multi-threaded LAME would be cool.

Reply #17 – 2004-03-26 03:40:04

Was any progress made with a multi-threading LAME compile?

Reply #18 – 2004-03-26 04:36:41

I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Obviously though you'd have to sit down and try it to be sure.

Reply #19 – 2004-03-26 12:17:11

Quote

I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Doesn't this already break some rule of this forum? Why do you make such arguments if you can't know for sure?

Reply #20 – 2004-03-26 12:51:11

Quote

Quote
I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Doesn't this already break some rule of this forum? Why do you make such arguments if you can't know for sure?

Which one? TOS #8 is about sound quality related statements. Besides, he said "I doubt", "almost certainly" and "you'd have to sit down and try it to be sure", so this post seems to critically question things rather than making statements.

If you want to know what these questions/assumptions/ideas are based on, you simply might want to ask.

Reply #21 – 2004-03-26 13:24:01

Quote

Doesn't this already break some rule of this forum? Why do you make such arguments if you can't know for sure?

Please ABX your opinion.

Reply #22 – 2004-03-26 13:55:59

Quote

I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Actually, I'd guess that a carefully HT optimized version of LAME would be slightly faster than running two copies of LAME at once -- on an HT CPU. However, I don't think that an SMP optimized version of LAME would be appreciably faster than running two copies of LAME at once on an SMP system.

This is all just shooting from the hip though, so to speak.

Reply #23 – 2004-03-26 20:01:01

I think that running 2 instances of LAME would run both of them on the same "CPU" as opposed to spreading it across the "2 CPUs".

I am saying CPU in quotes because it is kind of 1 CPU, but the way resources are set up it's faked to be 2 CPUs. Because the 2 CPUs share some resources you don't see exactly twice the speed. If 1 or 2 instances of LAME running on 1 "CPU" bottlenecks the areas where resources are shared then using the 2nd "CPU" won't gain too much.

_Shorty had mentioned that it would make a difference earlier in the post here. I think to put a number to it we need somebody with 2 skills. 1. Somebody that has programmed with multithreading in mind and 2. Somebody that can speak to the extent of resource bottlenecks between "CPUs" to determine the gain from multithreading specifically for mp3 encoding.

If anybody has connections to talk to the LAME developers about this, I'd be interested to hear the feedback.

Reply #24 – 2004-03-26 22:00:55

Quote

I think that running 2 instances of LAME would run both of them on the same "CPU" as opposed to spreading it across the "2 CPUs".

Not on any version of NT. This will happen on 9x, though (because it does not support HT).

What HT does is allow two threads at once. If you run two instances of LAME, you have two threads; the scheduler sees two CPUs (NT <= 5.0) or one CPU with a two-thread capacity (NT > 5.0) and assigns both threads. So in theory the ideal way to do this is to run two complete copies of LAME, since that would max out the CPU's available execution resources to the fullest (at least the fullest possible with a given program). A single instance of LAME that was HT aware would probably be slower, simply because it adds the complexity of trying to run the encoder in parallel with itself, that is, of figuring out a way to have two threads doing productive work on one song. I don't doubt that it's possible; I just think it's probably less efficient than simply working on two independent files at once (assuming LAME is running out of cache or anything like that when running two instances).

However, the reason I said "probably" is that 1) I haven't tried it, and 2) Intel has added some new stuff to Prescott that allows apps to use HT more intelligently than in Northwood. I don't really know what this does or whether it could help LAME, since I don't do this type of programming (actually, I do very little real programming). So if someone wants to try it, I'm all for it, since I'd like to see what happens. But I'd be pleasantly surprised if they were able to make one instance run faster than two instances.
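
As a rough illustration of the "one encoder per logical processor" idea in this reply, here is a sketch that picks how many independent encoder processes to start. The function name `suggested_encoder_count` is hypothetical; `os.cpu_count()` reports logical CPUs, so a single hyperthreaded P4 would typically show up as 2 on an HT-aware OS.

```python
import os


def suggested_encoder_count(logical_cpus=None):
    """How many independent encoder processes to run at once.

    Assumption: one process per logical CPU is a sensible starting point;
    the real optimum depends on the CPU and should be measured.
    """
    if logical_cpus is None:
        # os.cpu_count() can return None when the count is unknown.
        logical_cpus = os.cpu_count()
    return max(1, logical_cpus or 1)
```

The OS scheduler still decides where each process actually runs; this only chooses how many to launch.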
