HydrogenAudio

Lossy Audio Compression => MP3 => MP3 - General => Topic started by: postul8or on 2004-01-23 19:50:04

Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-01-23 19:50:04
I noticed the last time I ran LAME 3.90.3 that "both processors" in the task manager did not appear to be running at 100%.  Is there something that can be done in the compiler to take advantage of hyperthreading and encode mp3s even faster?  Or maybe I need another compile of LAME that does this already.  One final thought: maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen.
Title: Lame compiles for hyperthreading?
Post by: sven_Bent on 2004-01-23 19:56:57
Quote
maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen


This will benefit you nothing at all.

With hyperthreading you don't have double the resources... you just share the whole CPU over more than one thread:

part A doing something for thread X
part B (integer unit) doing something for thread Y

If you run two processes that utilise the same resources on the CPU, there will be no benefit, as you still only have the same parts to run the threads on.
Title: Lame compiles for hyperthreading?
Post by: NumLOCK on 2004-01-23 20:19:11
Quote
Quote
maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen


This will benefit you nothing at all.

With hyperthreading you don't have double the resources... you just share the whole CPU over more than one thread:

part A doing something for thread X
part B (integer unit) doing something for thread Y

If you run two processes that utilise the same resources on the CPU, there will be no benefit, as you still only have the same parts to run the threads on.

COMPLETELY WRONG.

A real-world program is not just "integer", "memory" or "floating point". Even if it is well optimized, there will be benefits from running two LAMEs simultaneously.

For example, if LAME(1) is waiting on the RAM to perform a floating-point read, the other thread can at the same time do an integer, MMX or whatever operation which is not reading from memory.

If sven_Bent had cared to verify it, he would have seen that:
- lame alone => takes 'u' seconds
- two lames together => takes 'v' seconds

I expect that v < 2u, and therefore hyperthreading is useful. There would probably be a gain of ~15-25% or so.

If the cache is bigger, then hyperthreading is more useful.

PS: I'm not advocating intel cpus in particular (actually I use AMD).
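NumLOCK's u-versus-v check reduces to one line of arithmetic. A minimal sketch (the `ht_gain` helper and the 60 s / 100 s timings are hypothetical, purely for illustration):

```python
def ht_gain(u: float, v: float) -> float:
    """Throughput gain from running two encodes concurrently.

    u: wall time (s) for one LAME encode run alone.
    v: wall time (s) for two identical encodes run together.
    Serial time for two encodes is 2u; the gain is how much faster
    the concurrent run is than the serial one.
    """
    return (2 * u / v - 1) * 100  # percent

# Hypothetical numbers: one encode takes 60 s alone, two together take 100 s.
print(f"{ht_gain(60, 100):.0f}% gain")  # 20% gain - v < 2u, so HT helped
```

If v came out equal to 2u, the gain would be 0% and hyperthreading would have bought nothing.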
Title: Lame compiles for hyperthreading?
Post by: saratoga on 2004-01-23 21:22:13
Two LAME processes are the ideal solution here, since encoding 2 mp3s is perfectly parallel.
Title: Lame compiles for hyperthreading?
Post by: NumLOCK on 2004-01-24 00:07:00
Perfectly parallel? Do you mean performance is nearly doubled?
Title: Lame compiles for hyperthreading?
Post by: jhwpbm on 2004-01-24 05:39:16
I have a 2.8 GHz Pentium 4 with Hyperthreading.  Running one LAME session, I get approximately 7.3x encoding at -aps.  Running two sessions, each session runs a bit slower (approximately 4.5x each), but the net effect is that you're getting approximately 9.0x encoding across the two sessions.  So it's certainly not 2x the original encoding speed, but it's nothing to sneeze at, either.
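The figures above can be sanity-checked in a few lines (the `aggregate_speed` helper is mine; the 7.3x and 4.5x numbers are taken straight from the post):

```python
def aggregate_speed(per_session: list[float]) -> float:
    """Total realtime throughput across concurrent encode sessions."""
    return sum(per_session)

single = 7.3            # x-realtime, one LAME session at -aps
dual = [4.5, 4.5]       # x-realtime, two concurrent sessions

total = aggregate_speed(dual)             # 9.0x across both sessions
improvement = (total / single - 1) * 100  # ~23% over a single session
print(f"{total:.1f}x total, {improvement:.0f}% faster")
```

So each session slows down, but aggregate throughput rises by roughly a quarter, in line with the 20-30% HT estimates elsewhere in the thread.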
Title: Lame compiles for hyperthreading?
Post by: Gabriel on 2004-01-24 09:03:48
There is a possibility for Lame to become multithreaded in the future. Expected performance gain on a HT system would be about 1.2 - 1.3x.

However, this is only a possibility that currently has a major blocker: none of the Lame developers has a hyperthreading or multiprocessor computer.
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-01-24 19:41:43
To borrow something from Folding@home (for those that don't know, it's a distributed computing project that simulates protein folding to help cure disease): when you are using hyperthreading there is a setting where you set the first instance to run on "Machine ID = 1" and another to run on "Machine ID = 2".

As much as it was interesting to hear that running two LAME sessions improves efficiency, I believe we were still doing this on 1 of the 2 "processors" that are on a hyperthreading CPU.  To take this one step further, it would make sense to run 4 sessions with 2 on each CPU.  So I'm wondering, if we take this experiment to the max, what kind of improvement we'd see.

It is interesting that none of the developers are on a hyperthreading CPU.  Some people go the Intel route, some go the AMD route (I had a shitty 486 AMD CPU once and never went back).  I guess what I am not clear about is how much of the programming needs to be geared for hyperthreading.  It almost sounds, when you read the Intel compiler marketing B.S., like you just throw the new compiler at it and magical things happen.  That's the marketing crap to get your money, I guess.  In reality, as sven_Bent was trying to say, the code itself would need to change in order to allocate the work to threads as close to 50-50 as possible to get maximum benefit.  This number may not be 50-50 because I'm fairly certain both "CPUs" in an Intel "C" processor are not identical; I think one is more of a mini processor -- somebody more technical can answer this question.

I think this is interesting though; to me, the next big step in terms of speed should be focused on using hyperthreading.  I'm pretty sure these guys are focusing on quality at minimal mp3 size though.
Title: Lame compiles for hyperthreading?
Post by: saratoga on 2004-01-24 19:57:34
Quote
Perfectly parallel? Do you mean performance is nearly doubled?

With 2 CPUs, yes.  With HT the improvement is whatever the HT logic in Intel's CPUs can manage (as opposed to many multithreaded problems where the improvement is limited by the nature of the problem). 

A 20 or 30% improvement seems pretty reasonable for most encoding, I'd think, since encoding doesn't seem to be very cache-dependent and is very CPU intensive.
Title: Lame compiles for hyperthreading?
Post by: sven_Bent on 2004-01-24 20:04:40
Quote
Quote
Quote
maybe to take advantage of hyperthreading I'd have to run 2 sessions of LAME to make it happen


This will benefit you nothing at all.

With hyperthreading you don't have double the resources... you just share the whole CPU over more than one thread:

part A doing something for thread X
part B (integer unit) doing something for thread Y

If you run two processes that utilise the same resources on the CPU, there will be no benefit, as you still only have the same parts to run the threads on.

COMPLETELY WRONG.

A real-world program is not just "integer", "memory" or "floating point". Even if it is well optimized, there will be benefits from running two LAMEs simultaneously.

For example, if LAME(1) is waiting on the RAM to perform a floating-point read, the other thread can at the same time do an integer, MMX or whatever operation which is not reading from memory.

If sven_Bent had cared to verify it, he would have seen that:
- lame alone => takes 'u' seconds
- two lames together => takes 'v' seconds

I expect that v < 2u, and therefore hyperthreading is useful. There would probably be a gain of ~15-25% or so.

If the cache is bigger, then hyperthreading is more useful.

PS: I'm not advocating intel cpus in particular (actually I use AMD).

Yes, you are right; I was thinking of two totally identical threads at precisely the same time.

Of course, when you do two different encodings, they might have several periods where each thread utilises different resources on the CPU.

My bad.
Title: Lame compiles for hyperthreading?
Post by: Dibrom on 2004-01-24 20:15:45
Quote
There is a possibility for Lame to become multithreaded in the future. Expected performance gain on a HT system would be about 1.2 - 1.3x.

However, this is only a possibility that currently has a major blocker: none of the Lame developers has a hyperthreading or multiprocessor computer.

If there is actually real interest in getting LAME multithreaded and adding support for SMP/HT (and there is someone with the time to do the development), I could donate a shell account or something like that on my Dual Xeon box to one of the LAME developers.
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-01-26 03:56:06
Dibrom, it would be pretty cool to see what the speed gain would be so hopefully somebody takes you up on your offer.

I guess the question is are they running out of other things to do to take on this task!
Title: Lame compiles for hyperthreading?
Post by: _Shorty on 2004-01-26 22:25:08
sven_bent, actually, you might want to do a bit more research to understand exactly what is going on in a hyperthreading P4.  You seem to be under the impression that a CPU is a CPU is a CPU, and that it can only do one thing at a time.  Well, the P4 has multiple execution units and it can indeed do more than one thing at a time.  No, it isn't the same as true physically dual CPUs; that's not what I'm trying to say.  But it isn't the same as a single CPU handling multiple threads either; it actually handles more than one thing at a time in a fashion similar to dual CPUs.  It just isn't as elaborate as dual CPUs, and therefore isn't as fast.  It's not letting one task do some work while the other task waits for something; more than one thing is actually happening at once.
Title: Lame compiles for hyperthreading?
Post by: Jens Rex on 2004-01-27 00:09:17
Good article on Hyper Threading:

http://anandtech.com/cpu/showdoc.html?i=1576
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-01-30 05:04:10
So it appears that it is reliant on the developer making good use of threads in their programming.

I'm not a C++ guy so I'm not positive that I understood the slideshow by Intel, but I think that is what I read!
Title: Lame compiles for hyperthreading?
Post by: _Shorty on 2004-01-30 07:21:13
Yes, all that is required is having more than one thread doing work at the same time.  And I believe there may be certain ways of optimizing to better take advantage of hyperthreading and how it works in the P4 (or rather, to better work with this somewhat sub-par "dual" processor setup).  Two true processors can obviously chew through two threads better than a hyperthreading P4, but you can still alter/optimize your code to better utilize what the hyperthreading P4 can offer.
Title: Lame compiles for hyperthreading?
Post by: JohnMK on 2004-02-04 05:14:30
Multi-threaded LAME would be cool. 
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-03-26 03:40:04
Was any progress made with a multi-threading LAME compile?
Title: Lame compiles for hyperthreading?
Post by: saratoga on 2004-03-26 04:36:41
I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Obviously though you'd have to sit down and try it to be sure.
Title: Lame compiles for hyperthreading?
Post by: askoff on 2004-03-26 12:17:11
Quote
I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Doesn't this already break some rule of this forum? Why do you make such arguments if you can't know for sure?
Title: Lame compiles for hyperthreading?
Post by: tigre on 2004-03-26 12:51:11
Quote
Quote
I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Doesn't this already break some rule of this forum? Why do you make such arguments if you can't know for sure?

Which one? TOS #8 is about sound-quality-related statements. Besides, he said "I doubt", "almost certainly" and "you'd have to sit down and try it to be sure", so this post seems to critically question things rather than make statements.

If you want to know what these questions/assumptions/ideas are based on, you simply might want to ask.
Title: Lame compiles for hyperthreading?
Post by: cheerow on 2004-03-26 13:24:01
Quote
Doesn't this already break some rule of this forum? Why do you make such arguments if you can't know for sure?

Please ABX your opinion.
Title: Lame compiles for hyperthreading?
Post by: Ardax on 2004-03-26 13:55:59
Quote
I doubt there's any real interest, since an HT-optimized LAME would almost certainly be slower than simply running two threads at once (which has been supported in EAC for years).

Actually, I'd guess that a carefully HT optimized version of LAME would be slightly faster than running two copies of LAME at once -- on an HT CPU.  However, I don't think that an SMP optimized version of LAME would be appreciably faster than running two copies of LAME at once on an SMP system.

This is all just shooting from the hip though, so to speak.
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-03-26 20:01:01
I think that running 2 instances of LAME would run both of them on the same "CPU" as opposed to spreading it across the "2 CPUs".

I am saying CPU in quotes because it is kind of 1 CPU, but the way resources are set up it's faked to be 2 CPUs.  Because the 2 CPUs share some resources you don't see exactly twice the speed.  If 1 or 2 instances of LAME running on 1 "CPU" bottlenecks the areas where resources are shared then using the 2nd "CPU" won't gain too much.

_Shorty mentioned earlier in the thread that it would make a difference.  I think to put a number on it we need somebody with 2 skills: 1. somebody that has programmed with multithreading in mind, and 2. somebody that can speak to the extent of resource bottlenecks between "CPUs", to determine the gain from multithreading specifically for mp3 encoding.

If anybody has connections to talk the LAME developers about this, I'd be interested to hear the feedback.
Title: Lame compiles for hyperthreading?
Post by: saratoga on 2004-03-26 22:00:55
Quote
I think that running 2 instances of LAME would run both of them on the same "CPU" as opposed to spreading it across the "2 CPUs".


Not on any version of NT.  This will happen on 9x though (because it does not support HT).

What HT does is allow two threads at once.  If you run two instances of LAME, you have two threads; the scheduler sees two CPUs (NT <= 5.0) or one CPU with a two-thread capacity (NT > 5.0) and assigns both threads.  So in theory the ideal way to do this is to run two complete copies of LAME, since that would max out the CPU's available execution resources to the fullest (at least the fullest possible with a given program).  A single instance of LAME that was HT-aware would probably be slower simply because it adds the additional complexity of trying to run the encoder in parallel with itself - that is, figure out a way to have two threads doing work productively on one song.  I don't doubt that it's possible, just that I think it's probably less efficient than simply working on two independent files at once (assuming LAME isn't running out of cache or anything like that when running two instances).

However, the reason I said probably was because 1) I haven't tried it, and 2) Intel's added some new stuff to Prescott that allows apps to use HT more intelligently than in Northwood.  I don't really know what this does or if it could help LAME, since I don't do this type of programming (actually I do very little real programming).  So if someone wants to try it, I'm all for it, since I'd like to see what happens.  But I'd be pleasantly surprised if they were able to make one instance run faster than two instances.
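The two-independent-instances approach can be sketched with Python's multiprocessing: two worker processes stand in for two independent LAME sessions, and the OS scheduler decides which logical CPU runs which, exactly as described above (the `fake_encode` busy loop is a stand-in, not real MP3 encoding):

```python
from multiprocessing import Pool

def fake_encode(track: str) -> str:
    """Stand-in for one CPU-bound LAME encode of a single file."""
    total = 0
    for i in range(100_000):  # busy loop simulating DSP work
        total += i * i
    return track.replace(".wav", ".mp3")

if __name__ == "__main__":
    tracks = ["01.wav", "02.wav"]
    # Two worker processes == two independent "LAME sessions";
    # the OS decides which logical CPU runs each worker.
    with Pool(processes=2) as pool:
        encoded = pool.map(fake_encode, tracks)
    print(encoded)  # ['01.mp3', '02.mp3']
```

No HT-specific code is needed here; the program just exposes two runnable processes and the scheduler does the rest.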
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-03-30 21:10:23
To clarify, yes I'm thinking of an HT-enabled OS such as WinXP using a CPU with hyperthreading enabled.  If the program couldn't auto-detect the CPU type and OS to determine if hyperthreading code should be enabled, then the user would need to set that in the .ini file etc.

My understanding of hyperthreading is that its effectiveness is all about how you do the coding for it.  If you can pawn off processes (e.g. the mathematical calculations to make a VBR mp3) onto different threads and bring the results back into one place without much overhead, then I don't see why one song couldn't use both CPUs.  Maybe the effort to code this is very high and the probability of adding bugs is too great.

If it's too hard to hyperthread within one track then sure, it makes sense to begin encoding track 1 on CPU#1 and track 2 on CPU#2.  If you could encode both songs on CPU#1 simultaneously and the speed is no different than assigning one to CPU#1 and one to CPU#2, I have to question why Intel ever bothered with hyperthreading.  Unless hyperthreading is a useless gimmick, by definition something that is coded to use hyperthreading should tend to be faster (yes, there is overhead in synchronizing results between 2 CPUs, but well-made programs will optimize the process).

I think we are all making decent theories about the effect hyperthreading may have.  What we really need are some programmers that have used hyperthreading before to vouch for its effect.  The effect in mp3 encoding vs. other processes may be completely different so we may not really know how useful it is until the product is finished, or at least well prototyped in a sample encoder.
Title: Lame compiles for hyperthreading?
Post by: maikmerten on 2004-03-30 21:19:59
Quote
(I had a shitty 486 AMD CPU once and never went back)

Off topic:

AMD 486 processors were absolutely identical Intel clones. The only company having its own 486 design was Cyrix with their Cx486. It's usually safe to blame the mainboard for problems with AMD 486 processors.
Title: Lame compiles for hyperthreading?
Post by: Gabriel on 2004-03-30 21:56:48
Quote
The effect in mp3 encoding vs. other processes may be completely different so we may not really know how useful it is until the product is finished, or at least well prototyped in a sample encoder.

You can expect a 20-25% increase (at least it is the results of Shine HT)
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-04-01 21:00:00
Hmmm, if NT is able to allocate things to threads by itself then why with the Folding@home application do you have to assign each one a "machineID".  Everything I have read has said to make the hyperthreading "work" you assign different machine IDs to 2 instances of Folding@home and let them both run at the same time.

My contention is that whatever significance this "Machine ID" has to Folding@home, it's the same thing we need in LAME encoding.  If this machine ID can be autodetected by the OS, then LAME should use hyperthreading, and setting Machine IDs on Folding@home is a waste.
Title: Lame compiles for hyperthreading?
Post by: plonk420 on 2004-04-03 14:43:23
Quote
Quote
(I had a shitty 486 AMD CPU once and never went back)

Off topic:

AMD 486 processors were absolutely identical Intel clones. The only company having its own 486 design was Cyrix with their Cx486. It's usually safe to blame the mainboard for problems with AMD 486 processors.

Still off topic: I pity postul8or, as XviD has horrid (or at least 10-20+% worse) performance running on a P4 compared to an Athy XP... it's one of the few reasons I'm not completely dissatisfied with the XP's often inferior multimedia app performance...

(I wish I could find the page that had the exact numbers, but I seem to have lost it...)
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-04-04 20:21:17
The last thing I want to do is get into an AMD vs. Intel battle, especially in a post where I'm trying to get insight into hyperthreading.  I'm too anal to avoid it, I guess.

The 486 AMD ran a lot hotter and had a higher defect rate than the Intel 486.  In terms of being the same chip, I have no idea.  It was too long ago, and it was the first computer I ever owned, so I can't claim to be an expert on it.  I believe it was the same design, and that was the reason Intel called their next chip the Pentium: they could trademark a name ("Pentium"), but not a number ("486").  Same design and same fabrication are different things.

The last time I saw an AMD with my own 2 eyes was an AMD K6-2, I believe 450 MHz.  At the time I recall benchmarks and AMD lovers saying this machine was better than Intel etc.  Since I had a P3 450 MHz myself the comparison was easily made... the AMD was crap in comparison.  At the time AMD was overhyped.  Since they burned a bridge with me to start with I wasn't going to buy one personally, and I've never really been on one of their newer machines first hand.  I am coming around though; my friend wants me to put together a cheap machine, and a 2400 or 2500+ XP is probably the direction I'll go.

I still debate when people say that AMD is faster than Intel.  I think in 90% of cases the top end Intel beats out AMD. For the price AMD is usually a better pick.
Title: Lame compiles for hyperthreading?
Post by: saratoga on 2004-04-04 21:02:42
No one ever claimed the K6-2 was better than the P2/P3 overall.  The only thing it was better at was raw integer and single-precision vector code (the P2 had only MMX).  It wasn't until the Athlon days that AMD finally managed to match or exceed the P6 core overall, and even then it was seldom clear cut until the end of the P6 days...

Quote
I think in 90% of cases the top end Intel beats out AMD. For the price AMD is usually a better pick.


Depends.  Intel can't really keep up in games lately thanks to leakage problems at 90nm, but Prescott seems to really love encoding media, so that's one area where AMD won't catch up anytime soon.  Though this isn't really fair at the moment, since Intel seems to be having some fabrication issues while AMD does not.

Quote
Hmmm, if NT is able to allocate things to threads by itself


It is.  Or do you need to manually schedule each of the 20 threads Winamp spawns every time you run it?

That's the whole point of having a task scheduler.

Quote
why with the Folding@home application do you have to assign each one a "machineID". Everything I have read has said to make the hyperthreading "work" you assign different machine IDs to 2 instances of Folding@home and let them both run at the same time.


Perhaps the distributed network expects that only one client will be run per machine?  Anyway, I doubt the machine will BSOD instantly or flat out refuse to run if you fail to do that, though I've never run Folding.

Do you need to set this separately if you run dual CPUs as well?
Title: Lame compiles for hyperthreading?
Post by: PowerPigg on 2004-04-04 23:13:11
<quick jump to OT world>

AMD started out competing with supercharged versions of Intel's previous-generation offerings (as anyone who remembers the 386SX/DX battles knows, AMD's 40 MHz 386DX clone beat out Intel's 486SX/25 in many benchmarks).  It tried, without much success at first, to move up to competing head to head.  Its first near-success at head-to-head competition faltered due to slow software support for the 3DNow! instruction set and the larger cache sizes introduced with the K6s, versus Intel's better floating-point architecture (which is what turned out to be the real boon for the gamers and enthusiasts AMD was trying to attract).  AMD's ability to learn from its mistakes came through with the first Athlons.  This time around, they chose to design chips with higher efficiency than Intel, even when it meant dropping behind in the megahertz race later on, without skimping on core functionality.  That was the beginning of strong competition for Intel and good days for us.  Nowadays both AMD and Intel have strong product lines, although Intel's newest Prescott line has problems that are going to keep it from competing effectively with AMD64 for a while.

<returning several gold coins richer>

As it stands, there's no reason for LAME to internalize multithreading.  Allowing two instances to run concurrently lets it gain benefits from either HT or SMP systems.  Just let the OS do that kind of juggling work; that's what it's there to do.  Most programs that make use of the LAME engine allow you to run multiple instances easily anyway.

Also, you're best off running one more instance than the number of total simultaneous threads that can be processed.  So with an HT CPU, or two CPUs in SMP mode, three instances are best, simply because otherwise you have short periods of inactivity in one thread processor while you close one instance of LAME and open the next one.
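The "one more instance than hardware threads" tip is essentially a worker pool fed from a shared queue, so no logical CPU sits idle while a finished worker picks up its next file. A sketch, assuming a dummy `encode` in place of shelling out to LAME:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def encode(track: str) -> str:
    """Stand-in for invoking LAME on one wav file."""
    return track.replace(".wav", ".mp3")

def encode_all(tracks: list[str]) -> list[str]:
    # One more worker than hardware threads, so the pool always has
    # a runnable process queued up when another one finishes a file.
    workers = (os.cpu_count() or 1) + 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode, tracks))

if __name__ == "__main__":
    print(encode_all(["01.wav", "02.wav", "03.wav"]))
```

With a real encoder, `encode` would call `subprocess.run` on the LAME command line instead of renaming a string; the pooling logic stays the same.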
Title: Lame compiles for hyperthreading?
Post by: postul8or on 2004-04-09 04:27:52
From what I'm hearing, the benefits of hyperthreading will be achieved simply by running multiple encoder sessions at once.  The point of a single LAME encoder session doing the multithreading is somewhat moot because the same thing can already be done just by opening other sessions.  I'd be interested to see what hyperthreading can do.

I need to take another approach here, something that is more quantitative.

Here's a test I could run. 

1.  I'll rip a track to a wav file (eg. 01.wav)
2.  I'll duplicate it 5 times so that I have 6 tracks in total that are the same length.  (01, 02, 03, 04, 05, 06)
3.  I'll get 3 encoder sessions going, and each session will do 2 songs 01,02 & 03,04 & 05,06
(The reason I'm duplicating the same track is so that each encoder session will finish at about the same time although maybe this assumption won't work out with hyperthreading).
4.  I'll do the test with hyperthreading enabled and then with it disabled (ie. turn the BIOS setting on and off).
5.  Tally up the total time taken by the 3 encoders for each method...

Result:  How much of a difference does hyperthreading make....

Good test?
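Steps 2, 3 and 5 of this plan are easy to script. A sketch of the bookkeeping (the timings are hypothetical; and since the three sessions run concurrently, comparing the slowest session's wall time is a fairer single number than summing all three):

```python
def partition(tracks: list[str], sessions: int) -> list[list[str]]:
    """Split the track list into contiguous per-session batches."""
    per = len(tracks) // sessions
    return [tracks[i * per:(i + 1) * per] for i in range(sessions)]

tracks = [f"{n:02d}.wav" for n in range(1, 7)]  # 01.wav .. 06.wav
batches = partition(tracks, 3)
print(batches)  # [['01.wav', '02.wav'], ['03.wav', '04.wav'], ['05.wav', '06.wav']]

# Step 5: with per-session wall times measured under each BIOS setting,
# compare when the slowest session finished (all three run concurrently).
ht_on = [130.0, 131.0, 129.0]   # hypothetical seconds, HT enabled
ht_off = [150.0, 151.0, 149.0]  # hypothetical seconds, HT disabled
print(f"HT saves {max(ht_off) - max(ht_on):.0f} s per batch")
```

Using identical copies of one track, as proposed in step 2, keeps the batches balanced so the slowest-session number is meaningful.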
Title: Lame compiles for hyperthreading?
Post by: p0wder on 2004-04-09 09:30:22
Quote
Depends.  Intel can't really keep up in games lately thanks to leakage problems at 90nm, but Prescott seems to really love encoding media, so that's one area where AMD won't catch up anytime soon.  Though this isn't really fair at the moment, since Intel seems to be having some fabrication issues while AMD does not.

AMD's gaming advantage can be offset by a high-end video card.  A computer with a high-end video card will fly in games regardless of the CPU.  I'd like to see AMD start pumping out 90nm chips without any problems.  My old CPU was an AMD Thunderbird that ran superhot but nobody seemed to mind.  Now that Intel is having temp problems with the Prescott people are screaming bloody freakin murder.
Title: Lame compiles for hyperthreading?
Post by: saratoga on 2004-04-10 00:40:58
Quote
Quote
Depends.  Intel can't really keep up in games lately thanks to leakage problems at 90nm, but Prescott seems to really love encoding media, so that's one area where AMD won't catch up anytime soon.  Though this isn't really fair at the moment, since Intel seems to be having some fabrication issues while AMD does not.

AMD's gaming advantage can be offset by a high-end video card.  A computer with a high-end video card will fly in games regardless of the CPU.  I'd like to see AMD start pumping out 90nm chips without any problems.  My old CPU was an AMD Thunderbird that ran superhot but nobody seemed to mind.  Now that Intel is having temp problems with the Prescott people are screaming bloody freakin murder.

Well, since the TDP of the coolest Prescott is roughly double that of the hottest Thunderbird, I can see why some people would be concerned that it's hot.  Though no one is screaming about it, at least no one except stupid fanboys.  I think people are just disappointed that Prescott wasn't nearly as extraordinary an advance as Northwood was.

Quote
I'll get 3 encoder sessions going, and each session will do 2 songs 01,02 & 03,04 & 05,06
(The reason I'm duplicating the same track is so that each encoder session will finish at about the same time although maybe this assumption won't work out with hyperthreading).


It's even easier.  Run one session of LAME to encode a few tracks.  Then encode the same tracks, this time using two sessions of LAME.  Compare the run times (and hope that both sessions finish at about the same time on the HT test so that you aren't single-threading for very long).  You'll be off by a few seconds probably, but if you encode for 5 or 10 minutes it shouldn't be too big a deal.