[avr-gcc-list] AVR byte swap optimization

Discussion:

[avr-gcc-list] AVR byte swap optimization

Shaun Jackman

19 years ago

The following macro expands to some rather frightful code on the AVR:

#define BSWAP_16(x) \
((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))

uint16_t bswap_16(uint16_t x)
{
0: 9c 01 movw r18, r24
2: 89 2f mov r24, r25
4: 99 27 eor r25, r25
6: 32 2f mov r19, r18
8: 22 27 eor r18, r18
return BSWAP_16(x);
}
a: 82 2b or r24, r18
c: 93 2b or r25, r19
e: 08 95 ret

Ideally, this macro would expand to three mov instructions and a ret.
Is there anything I can do to help GCC along here? I'm using GCC 4.1.0
with -O2.

I won't bother to show bswap_32 here, which produces a real disaster!
Think 47 instructions, for what should be 6.

Cheers,
Shaun

$ avr-gcc --version |head -1
avr-gcc (GCC) 4.1.0

Eric Weddington

19 years ago

You could always equate the macro to some inline assembly like what is done
for a number of avr-libc macros.

Eric Weddington

> -----Original Message-----
> From:
> avr-gcc-list-bounces+eweddington=***@nongnu.org
> [mailto:avr-gcc-list-bounces+eweddington=***@nongnu.
> org] On Behalf Of Shaun Jackman
> Sent: Friday, November 17, 2006 4:31 PM
> To: avr-gcc-***@nongnu.org; ***@gcc.gnu.org
> Subject: [avr-gcc-list] AVR byte swap optimization
>
> The following macro expands to some rather frightful code on the AVR:
>
> #define BSWAP_16(x) \
> ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))
>
> uint16_t bswap_16(uint16_t x)
> {
> 0: 9c 01 movw r18, r24
> 2: 89 2f mov r24, r25
> 4: 99 27 eor r25, r25
> 6: 32 2f mov r19, r18
> 8: 22 27 eor r18, r18
> return BSWAP_16(x);
> }
> a: 82 2b or r24, r18
> c: 93 2b or r25, r19
> e: 08 95 ret
>
> Ideally, this macro would expand to three mov instructions and a ret.
> Is there anything I can do to help GCC along here? I'm using GCC 4.1.0
> with -O2.
>
> I won't bother to show bswap_32 here, which produces a real disaster!
> Think 47 instructions, for what should be 6.
>
> Cheers,
> Shaun
>
> $ avr-gcc --version |head -1
> avr-gcc (GCC) 4.1.0
>
>
> _______________________________________________
> AVR-GCC-list mailing list
> AVR-GCC-***@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>

Larry Barello

19 years ago

Although a bit wordy, the following produces good code under a lot of
different circumstances. It allows the compiler to shove things around &
optimize.

static inline uint16_t ByteSwap(uint16_t data)
{
union byteswap
{
uint16_t word;
struct
{
uint8_t low;
uint8_t high;
};
} foo;
foo.word = data;

uint8_t t = foo.high;
foo.high = foo.low;
foo.low = t;
return foo.word;
}

| -----Original Message-----
| From: avr-gcc-list-bounces+larry=***@nongnu.org [mailto:avr-gcc-
| list-bounces+larry=***@nongnu.org] On Behalf Of Shaun Jackman
| Sent: Friday, November 17, 2006 3:31 PM
| To: avr-gcc-***@nongnu.org; ***@gcc.gnu.org
| Subject: [avr-gcc-list] AVR byte swap optimization
|
| The following macro expands to some rather frightful code on the AVR:
|
| #define BSWAP_16(x) \
| ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))
|
| uint16_t bswap_16(uint16_t x)
| {
| 0: 9c 01 movw r18, r24
| 2: 89 2f mov r24, r25
| 4: 99 27 eor r25, r25
| 6: 32 2f mov r19, r18
| 8: 22 27 eor r18, r18
| return BSWAP_16(x);
| }
| a: 82 2b or r24, r18
| c: 93 2b or r25, r19
| e: 08 95 ret
|
| Ideally, this macro would expand to three mov instructions and a ret.
| Is there anything I can do to help GCC along here? I'm using GCC 4.1.0
| with -O2.
|
| I won't bother to show bswap_32 here, which produces a real disaster!
| Think 47 instructions, for what should be 6.
|
| Cheers,
| Shaun
|
| $ avr-gcc --version |head -1
| avr-gcc (GCC) 4.1.0
|
|
| _______________________________________________
| AVR-GCC-list mailing list
| AVR-GCC-***@nongnu.org
| http://lists.nongnu.org/mailman/listinfo/avr-gcc-list

Rask Ingemann Lambertsen

19 years ago

On Fri, Nov 17, 2006 at 04:30:53PM -0700, Shaun Jackman wrote:
> The following macro expands to some rather frightful code on the AVR:
>
> #define BSWAP_16(x) \
> ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))
[snip]
>
> Ideally, this macro would expand to three mov instructions and a ret.
> Is there anything I can do to help GCC along here? I'm using GCC 4.1.0
> with -O2.

Sure. Add a HImode rotate left by eight instruction to the AVR backend.
For example (untested, and I know very little about the AVR):

(define_insn_and_split "*rotlhi3_const8"
[(set (match_operand:HI 0 "register_operand" "=r")
(rotate:HI (match_operand:HI 1 "register_operand" "0")
(const_int 8)))]
"reload_completed"
[(set (match_dup 2) (xor:QI (match_dup 2) (match_dup 3)))
(set (match_dup 3) (xor:QI (match_dup 3) (match_dup 2)))
(set (match_dup 2) (xor:QI (match_dup 2) (match_dup 3)))]
{
operands[2] = simplify_gen_subreg (QImode, operands[0], HImode, 0);
operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
}

If that isn't enough, try adding an expander also:

(define_expand "rotlhi3"
[(set (match_operand:HI 0 "register_operand")
(rotate:HI (match_operand:HI 1 "register_operand")
(match_operand:HI 2 "const_int_operand")))]
""
{
if (INTVAL (operands[2]) != 8)
FAIL;
})

--
Rask Ingemann Lambertsen

Paul Brook

19 years ago

> Ideally, this macro would expand to three mov instructions and a ret.
> Is there anything I can do to help GCC along here? I'm using GCC 4.1.0
> with -O2.
>
> I won't bother to show bswap_32 here, which produces a real disaster!
> Think 47 instructions, for what should be 6.

Use gcc head, __builtin_bswap and make sure the AVR backend implements the
bswap rtl patterns.

Future versions of gcc may also be able to recognise these idioms without
using the builtin, but AFAIK that's not been implemented yet.

Paul

Eric Weddington

19 years ago

> -----Original Message-----
> From:
> avr-gcc-list-bounces+eweddington=***@nongnu.org
> [mailto:avr-gcc-list-bounces+eweddington=***@nongnu.
> org] On Behalf Of Paul Brook
> Sent: Saturday, November 18, 2006 9:46 AM
> To: ***@gcc.gnu.org; Shaun Jackman
> Cc: avr-gcc-***@nongnu.org
> Subject: [avr-gcc-list] Re: AVR byte swap optimization
>
> > Ideally, this macro would expand to three mov instructions
> and a ret.
> > Is there anything I can do to help GCC along here? I'm
> using GCC 4.1.0
> > with -O2.
> >
> > I won't bother to show bswap_32 here, which produces a real
> disaster!
> > Think 47 instructions, for what should be 6.
>
> Use gcc head, __builtin_bswap and make sure the AVR backend
> implements the
> bswap rtl patterns.

There's the problem. You can't just glibly say "make sure the AVR backend
implements the bswap rtl patterns". There are precious few volunteers who
are familiar enough with gcc internals and the avr port in particular to go
do just that. AFAIK, there is no bswap rtl pattern in the avr port, at least
there doesn't seem to be in 4.1.1.

> Future versions of gcc may also be able to recognise these
> idioms without
> using the builtin, but AFAIK that's not been implemented yet.

Plus there is a long lead time between when it is implemented on HEAD, then
branched, released from a branch, and then when it shows up in binary
distributions.

Eric Weddington

Steven Bosscher

19 years ago

On 11/19/06, Eric Weddington <***@cso.atmel.com> wrote:
> > Use gcc head, __builtin_bswap and make sure the AVR backend
> > implements the
> > bswap rtl patterns.
>
> There's the problem. You can't just glibly say "make sure the AVR backend
> implements the bswap rtl patterns". There are precious few volunteers who
> are familiar enough with gcc internals and the avr port in particular to go
> do just that. AFAIK, there is no bswap rtl pattern in the avr port, at least
> there doesn't seem to be in 4.1.1.

Why is that a problem?
Do you have a different solution in mind?

> > Future versions of gcc may also be able to recognise these
> > idioms without
> > using the builtin, but AFAIK that's not been implemented yet.
>
> Plus there is a long lead time between when it is implemented on HEAD, then
> branched, released from a branch, and then when it shows up in binary
> distributions.

That happens with all improvements that are implemented between
releases, so I don't see your point.

Gr.
Steven

Eric Weddington

19 years ago

> -----Original Message-----
> From: Steven Bosscher [mailto:***@gmail.com]
> Sent: Sunday, November 19, 2006 3:55 AM
> To: Eric Weddington
> Cc: Paul Brook; ***@gcc.gnu.org; Shaun Jackman;
> avr-gcc-***@nongnu.org
> Subject: Re: [avr-gcc-list] Re: AVR byte swap optimization
>
> On 11/19/06, Eric Weddington <***@cso.atmel.com> wrote:
> > > Use gcc head, __builtin_bswap and make sure the AVR backend
> > > implements the
> > > bswap rtl patterns.
> >
> > There's the problem. You can't just glibly say "make sure
> the AVR backend
> > implements the bswap rtl patterns". There are precious few
> volunteers who
> > are familiar enough with gcc internals and the avr port in
> particular to go
> > do just that. AFAIK, there is no bswap rtl pattern in the
> avr port, at least
> > there doesn't seem to be in 4.1.1.
>
> Why is that a problem?
> Do you have a different solution in mind?

>
> > > Future versions of gcc may also be able to recognise these
> > > idioms without
> > > using the builtin, but AFAIK that's not been implemented yet.
> >
> > Plus there is a long lead time between when it is
> implemented on HEAD, then
> > branched, released from a branch, and then when it shows up
> in binary
> > distributions.
>
> That happens with all improvements that are implemented between
> releases, so I don't see your point.

The original message went to the avr-gcc-list mailing list. I wasn't aware
that it also went to the gcc mailing list. There are different sets of
assumptions based on the audience.

A lot depends on where the OP is coming from, in wanting help:
Does the OP want a pre-built toolchain?
Does the OP build the toolchain from source?
Is the OP familiar with even patching the toolchain?

There are a number of people on the avr-gcc-list that when they say they
want help, they want the final, pre-built toolchain from a distributor to
fix their problem. They don't want to build the toolchain from source, or
they don't even know how to. Yes there are also people who can and will
build the toolchain from source but they are a small minority.

The point that I was making was to the original OP: yes, it would be great
to get it fixed, permanently. Historically there has never been enough
volunteers with the knowledge, capability, and the time on the avr port.

The audience on the gcc list is a completely different sort with different
assumptions. My apologies for annoying the people on the gcc list.

And yes I do have a different solution in mind:
Shaun, could you open a GCC bug report on this issue? Rask, could you then
post your implementation to that GCC bug report as a patch? Steven and Paul,
could you eyeball Rask's implementation and see if it is reasonable
impelementation? Rask, do you have FSF paperwork in place? If not, then
could somebody else, with FSF assignments in place, create a patch?

After all, you guys have a process to follow, right? ;-)

Eric Weddington

'Rask Ingemann Lambertsen'

19 years ago

On Sun, Nov 19, 2006 at 08:31:22AM -0700, Eric Weddington wrote:

> Rask, do you have FSF paperwork in place?

Yes, it was completed recently.

--
Rask Ingemann Lambertsen

Rask Ingemann Lambertsen

19 years ago

On Fri, Nov 17, 2006 at 04:30:53PM -0700, Shaun Jackman wrote:
> Is there anything I can do to help GCC along here? I'm using GCC 4.1.0
> with -O2.
>
> I won't bother to show bswap_32 here, which produces a real disaster!
> Think 47 instructions, for what should be 6.

You may get better code if you write it something like this:

#include <string.h>

uint32_t bswap_32 (uint32_t x)
{
unsigned char c[4], temp;

memcpy (c, &x, 4);
temp = c[0];
c[0] = c[3];
c[3] = temp;
temp = c[1];
c[1] = c[2];
c[2] = temp;
memcpy (&x, c, 4);

return (x);
}

It isn't only on the AVR that bswap_32() is nontrivial to get right.
These two versions would rule on the i386 if GCC would be just a little bit
smarter:

#include <string.h>
#define BSWAP_16(x) \
((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))

uint32_t bswap_32_a (uint32_t y)
{
uint16_t d[2];
uint32_t x = y;

memcpy (d, &x, sizeof (d));
d[0] = BSWAP_16 (d[0]);
memcpy (&x, d, sizeof (x));
x = ((x >> 16) & 0xffff) | ((x & 0xffff) << 16);
memcpy (d, &x, sizeof (d));
d[0] = BSWAP_16 (d[0]);
memcpy (&x, d, sizeof (x));

return (x);
}
/*
bswap_32_a:
subl $16, %esp
movl 20(%esp), %eax
movl %eax, 12(%esp)
rolw $8, 12(%esp)
roll $16, 12(%esp)
rolw $8, 12(%esp)
movl 12(%esp), %eax
addl $16, %esp
ret
*/
uint32_t bswap_32_b (uint32_t y)
{
union { uint16_t d[2]; uint32_t x; } t;
t.x = y;

t.d[0] = BSWAP_16 (t.d[0]);
t.x = ((t.x >> 16) & 0xffff) | ((t.x & 0xffff) << 16);
t.d[0] = BSWAP_16 (t.d[0]);

return (t.x);
}
/*
bswap_32_b:
movl 4(%esp), %edx
movl %edx, %eax
rolw $8, %ax
movw %ax, %dx
movl %edx, %eax
roll $16, %eax
movl %eax, %edx
rolw $8, %ax
movw %ax, %dx
movl %edx, %eax
ret
*/

--
Rask Ingemann Lambertsen

Eric Christopher

19 years ago

>
> It isn't only on the AVR that bswap_32() is nontrivial to get
> right.
> These two versions would rule on the i386 if GCC would be just a
> little bit
> smarter:
>

I prefer the single instruction bswap that we now generate for
__builtin_bswap[32,64] myself...

-eric

Rask Ingemann Lambertsen

19 years ago

On Sun, Nov 19, 2006 at 02:46:44PM -0800, Eric Christopher wrote:

> >These two versions would rule on the i386 if GCC would be just a
> >little bit smarter:
>
> I prefer the single instruction bswap that we now generate for
> __builtin_bswap[32,64] myself...

But that's only available starting with the i486. We should still produce
good code for an i386. On the m68k, we also want something with three
rotates, say

rol.w $8, d0
swap.l d0
rol.w $8, d0

--
Rask Ingemann Lambertsen

Eric Christopher

19 years ago

On Nov 19, 2006, at 10:18 PM, Rask Ingemann Lambertsen wrote:

> On Sun, Nov 19, 2006 at 02:46:44PM -0800, Eric Christopher wrote:
>
>>> These two versions would rule on the i386 if GCC would be just a
>>> little bit smarter:
>>
>> I prefer the single instruction bswap that we now generate for
>> __builtin_bswap[32,64] myself...
>
> But that's only available starting with the i486. We should still
> produce
> good code for an i386. On the m68k, we also want something with three
> rotates, say
>

I suppose, not much is being shipped that doesn't have the i486
instructions
these days.

> rol.w $8, d0
> swap.l d0
> rol.w $8, d0
>

Well, there's a bswap opcode now...

-eric

Denis Vlasenko

19 years ago

On Saturday 18 November 2006 00:30, Shaun Jackman wrote:
> The following macro expands to some rather frightful code on the AVR:
>
> #define BSWAP_16(x) \
> ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))

Sometimes gcc is generating better code if you cast
values instead of masking. Try:

( (uint8_t)((x) >> 8) | ((uint8_t)(x)) << 8 )

--
vda

Shaun Jackman

19 years ago

On 11/26/06, Denis Vlasenko <***@googlemail.com> wrote:
> On Saturday 18 November 2006 00:30, Shaun Jackman wrote:
> > The following macro expands to some rather frightful code on the AVR:
> >
> > #define BSWAP_16(x) \
> > ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))
>
> Sometimes gcc is generating better code if you cast
> values instead of masking. Try:
>
> ( (uint8_t)((x) >> 8) | ((uint8_t)(x)) << 8 )

Your suggestion seemed like a good one and gave me some hope.
Unfortunately, it produces identical results to my BSWAP_16 macro. I
also tried the following:

uint8_t a = x >> 8, b = x;
return b << 8 | a;

Different register allocations this time, but identical instructions.

Cheers,
Shaun

Anton Erasmus

19 years ago

On 18 Dec 2006 at 11:28, Shaun Jackman wrote:

> On 11/26/06, Denis Vlasenko <***@googlemail.com> wrote:
> > On Saturday 18 November 2006 00:30, Shaun Jackman wrote:
> > > The following macro expands to some rather frightful code on the AVR:
> > >
> > > #define BSWAP_16(x) \
> > > ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))
> >
> > Sometimes gcc is generating better code if you cast
> > values instead of masking. Try:
> >
> > ( (uint8_t)((x) >> 8) | ((uint8_t)(x)) << 8 )
>
> Your suggestion seemed like a good one and gave me some hope.
> Unfortunately, it produces identical results to my BSWAP_16 macro. I
> also tried the following:
>
> uint8_t a = x >> 8, b = x;
> return b << 8 | a;
>
> Different register allocations this time, but identical instructions.
>

Hi,

Not a macro, but the following seems to generate reasonable code.

typedef struct
{
unsigned char h;
unsigned char l;
}B2;

typedef union
{
B2 x;
unsigned short us;
}BU;

inline unsigned short byteswap(unsigned short xx)
{
BU tmp;
unsigned char tmp1;

tmp.us=xx;
tmp1=tmp.x.h;
tmp.x.h=tmp.x.l;
tmp.x.l=tmp1;
return(tmp.us);
}

volatile unsigned short gX;

void test(void)
{
gX=byteswap(gX);

}

Regards
Anton Erasmus

--
A J Erasmus

David VanHorn

19 years ago

>
> tmp.us=xx;
> tmp1=tmp.x.h;
> tmp.x.h=tmp.x.l;
> tmp.x.l=tmp1;

Am I missing something here?
Why not pop to assembler, push the high, push the low, pop the high, pop the
low?

Shaun Jackman

19 years ago

On 12/18/06, David VanHorn <***@microbrix.com> wrote:
> Am I missing something here?
> Why not pop to assembler, push the high, push the low, pop the high, pop the
> low?

* Inline assembler cannot be used at compile time, for example to
initialize a static variable.

* If the swap function is called on a constant, the compiler cannot
remove the inline assembler. In general, any inline assembler tends to
handcuff the optimizer to some degree.

* Push and pop take two cycles since they access memory. Three mov
instructions are faster than one push and one pop.

Cheers,
Shaun

Shaun Jackman

19 years ago

On 12/18/06, David VanHorn <***@microbrix.com> wrote:
> > * If the swap function is called on a constant, the compiler cannot
> > remove the inline assembler. In general, any inline assembler tends to
> > handcuff the optimizer to some degree.
>
> I'm new to C, but why would you want to swap a constant?
> CAN you swap a constant? Seems an inconstant constant :)

Quite! My usual case for this is when the constant is defined in host
order, but I'm comparing against a network order value. For example,

if (incoming == htons(EXPECTED))

Here, incoming is a 16-bit network order value and htons (a.k.a.
byteswap_16) is taking the 16-bit host order constant EXPECTED. If
htons is implemented as a mathematical expression, the compiler can
optimise it away to nothing.

Cheers,
Shaun

Shaun Jackman

19 years ago

On 12/18/06, Anton Erasmus <***@sentechsa.com> wrote:
> Hi,
>
> Not a macro, but the following seems to generate reasonable code.
...

Thanks Anton,

I came to the same conclusion.

Cheers,
Shaun

static inline uint16_t bswap_16_inline(uint16_t x)
{
union {
uint16_t x;
struct {
uint8_t a;
uint8_t b;
} s;
} in, out;
in.x = x;
out.s.a = in.s.b;
out.s.b = in.s.a;
return out.x;
}

static inline uint32_t bswap_32_inline(uint32_t x)
{
union {
uint32_t x;
struct {
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
} s;
} in, out;
in.x = x;
out.s.a = in.s.d;
out.s.b = in.s.c;
out.s.c = in.s.b;
out.s.d = in.s.a;
return out.x;
}

Nils Springob

19 years ago

Hi,

and it is possible to use an anonymous struct:

static inline uint16_t bswap_16_inline(uint16_t x)
{
union {
uint16_t x;
struct {
uint8_t a, b;
};
} in, out;
in.x = x;
out.a = in.b;
out.b = in.a;
return out.x;
}

Regards,

Nils

Shaun Jackman

19 years ago

On 12/18/06, Nils Springob <***@nicai-systems.de> wrote:
> Hi,
>
> and it is possible to use an anonymous struct:

True. However, unnamed struct/union fields are an extension of the C
language provided by GCC and not strictly portable. It is a nice
feature though. It would be nice if it crept its way into a future C
standard.

Cheers,
Shaun

21 Replies
218 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Shaun Jackman 19 years ago

Eric Weddington 19 years ago

Larry Barello 19 years ago

Rask Ingemann Lambertsen 19 years ago

Paul Brook 19 years ago

Eric Weddington 19 years ago

Steven Bosscher 19 years ago

Eric Weddington 19 years ago

'Rask Ingemann Lambertsen' 19 years ago

Rask Ingemann Lambertsen 19 years ago

Eric Christopher 19 years ago

Rask Ingemann Lambertsen 19 years ago

Eric Christopher 19 years ago

Denis Vlasenko 19 years ago

Shaun Jackman 19 years ago

Anton Erasmus 19 years ago

David VanHorn 19 years ago

Shaun Jackman 19 years ago

Shaun Jackman 19 years ago

Shaun Jackman 19 years ago

Nils Springob 19 years ago

Shaun Jackman 19 years ago

about - legalese

Loading...