shifting a large amount of data


Strogian
04-30-2004, 08:32 PM
here's a function I just wrote; I don't really like it:


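#include <stddef.h> /* size_t */

#define CHARBYTE 2 /* number of 8-bit bytes held in an unsigned char */

/* shift the data at ptr left, as a whole */
void adjustb(unsigned char *ptr, long int offset, size_t size)
{
    size_t i;
    int shift = (int)(offset % CHARBYTE);
    if (shift <= 0 || size == 0)
        return; /* already aligned (or nothing to shift) */
    ptr[0] <<= shift * 8;
    for (i=1; i<size; i++) {
        /* carry the top of ptr[i] into the bottom of ptr[i-1] */
        ptr[i-1] |= ptr[i] >> (CHARBYTE - shift) * 8;
        ptr[i] <<= shift * 8;
    }
}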


I don't need to explain why I need this, do I? :)

ptr points to a buffer read in from a binary file, and I'm writing this to deal with the case that a char actually contains more than one "byte" (defined as an 8-bit unit). It shifts the buffer left by the amount needed, to make buf[0] point to the thing I want it to, instead of having it be buf[0.5]

Anyone have any suggestions? I could just put an "offset" byte in front of it I guess, but I don't think I really want to do that unless I find a really good reason to.

bwkaz
04-30-2004, 11:51 PM
Originally posted by Strogian
and I'm writing this to deal with the case that a char actually contains more than one "byte" (defined as an 8-bit unit).

That will never happen. You don't need this code.

;)

If your characters are more than 8 bits wide (e.g. on a (*shudder*) EBCDIC system, or in a UTF-8 locale), then you need to use the <wctype.h> / <wchar.h> headers and the wchar_t and wctype_t types (which are up to 4 bytes long in UTF-8, but only 2 in straight Unicode or *shudder* EBCDIC). The char type cannot be anything bigger than a byte, otherwise there would be no way to point to an odd address on that machine (there is no type smaller than a char).

Strogian
05-01-2004, 11:11 AM
When I say "byte," I don't mean the C definition of a byte. I'm talking about 8 bits here, that's it. That's how a byte is defined in the file I'm reading. As far as I know, there's nothing in the C language spec stopping a char from being 16 bits, and if that's true, then I might be getting two "8-bit bytes" packed in one char.

Still, you're right in that this will never happen, realistically. I just want the thing to be bulletproof. :D

On that note, there could actually be something like 9-bits put into one byte, too. Then I'd need something a little different...
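
For the "two 8-bit bytes packed in one char" case, one alternative to shifting the whole buffer is to read octets through a small accessor and leave the buffer alone. A rough sketch only: get_octet is a made-up name, and it assumes the octets are packed most significant first within each char, which is an assumption about the platform and not something the C language promises.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical accessor: return the n-th 8-bit octet stored in buf.
   ASSUMPTION: each unsigned char holds CHAR_BIT / 8 octets, packed
   most significant octet first. On a CHAR_BIT == 8 machine this
   simply returns buf[n]. */
unsigned get_octet(const unsigned char *buf, size_t n)
{
    size_t per_char = CHAR_BIT / 8;          /* octets held by one char */
    size_t slot = n % per_char;              /* position inside that char */
    unsigned shift = (unsigned)((per_char - 1 - slot) * 8);

    assert(buf != NULL);
    return ((unsigned)buf[n / per_char] >> shift) & 0xFFu;
}
```

With this, "the thing at buf[0.5]" is just get_octet(buf, 1), and an odd starting offset becomes get_octet(buf, offset + k) with no data being moved.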

bwkaz
05-01-2004, 01:03 PM
Originally posted by Strogian
When I say "byte," I don't mean the C definition of a byte. I'm talking about 8 bits here, that's it.

I know. A C byte IS 8 bits (well... it's a machine byte, which is always 8 bits nowadays). ALWAYS.

If you need access to 16-bit entities (or longer for UTF-8), you use the wchar_t type and friends.

That's how a byte is defined in the file I'm reading.

That's how a byte is defined in EVERY file EVERYWHERE, actually. The byte was 9 bits on like the PDP-6 or some other ancient architecture that never supported C anyway, but nothing since then has had a machine byte be anything other than 8 bits.

As far as I know, there's nothing in the C language spec stopping a char from being 16 bits,

There is. I don't know what exactly (since I haven't read it), but I can say that you should read question 7.8 of the comp.lang.c FAQ, available at http://www.faqs.org/faqs/C-faq/faq/.

Still, you're right in that this will never happen, realistically. I just want the thing to be bulletproof. :D

It already is... ;)

Strogian
05-01-2004, 02:38 PM
Originally posted by bwkaz
I know. A C byte IS 8 bits (well... it's a machine byte, which is always 8 bits nowadays). ALWAYS.
Oh boy! Well here's the deal -- I see a lot of pages out there with information like this in it:
http://home.att.net/~jackklein/c/inttypes.html

"The value of CHAR_BIT is required to be at least 8. Almost all modern computers today use 8 bit bytes (technically called octets), but there are still some in production and use with other sizes, such as 9 bits. Also some processors (especially Digital Signal Processors) cannot efficiently access memory in smaller pieces than the processor's word size. There is at least one DSP I have worked with where CHAR_BIT is 32. The char types, short, int and long are all 32 bits."

Now, maybe all these pages were written in 1988 (a lot of the USENET posts were, at least ... it's fun to see what sorts of arguments there were about C back then, talking about what needed to be changed :D) But I haven't seen anything stating that things are different nowadays. Has something changed?
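
The number that quote is talking about is easy enough to look at on any given compiler, since CHAR_BIT comes from <limits.h>. A minimal sketch, where octets_per_char and spare_bits_per_char are hypothetical helper names made up for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: how many whole 8-bit octets fit in one char on
   this implementation (1 on typical hardware, 4 where CHAR_BIT is 32). */
int octets_per_char(void)
{
    /* The C standard guarantees at least 8 bits per char. */
    assert(CHAR_BIT >= 8);
    return CHAR_BIT / 8;
}

/* Hypothetical helper: bits left over after packing whole octets
   (0 when CHAR_BIT is a multiple of 8, 1 when CHAR_BIT is 9). */
int spare_bits_per_char(void)
{
    return CHAR_BIT % 8;
}
```

If octets_per_char() comes back as 1 and spare_bits_per_char() as 0, a char and an 8-bit "file byte" are the same size on that machine and none of the packing questions come up.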

That's how a byte is defined in EVERY file EVERYWHERE, actually. The byte was 9 bits on like the PDP-6 or some other ancient architecture that never supported C anyway, but nothing since then has had a machine byte be anything other than 8 bits.

OK, OK, I get it. That code isn't necessary at all, and I'll never use it. :D

.

.

.

Let's pretend that I have an encryption algorithm that shifts all the data to the right by 8 bits, and then I have to shift everything left to decode it. No, make that 4 bits, because I don't want an answer, "start at array[n+1]." :D

Maybe I'm going to port this program to a DSP... You never know!
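
For the 4-bit version of the question, shifting the whole buffer left as one long bit string could look roughly like this. It is only a sketch: shift_left_bits and its parameters are names invented here, the shift count is assumed to be smaller than one char (anything larger can be handled by starting at a later element first), and bits pushed out of the front of the buffer are simply dropped.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: shift buf[0..len-1] left by `bits` bits, treating
   the array as one big bit string with buf[0] most significant.
   ASSUMPTION: 0 < bits < CHAR_BIT. Bits shifted out of buf[0] are lost
   and zeros are shifted into the last element. */
void shift_left_bits(unsigned char *buf, size_t len, unsigned bits)
{
    size_t i;

    assert(bits > 0 && bits < CHAR_BIT);
    for (i = 0; i < len; i++) {
        /* the part of this element that stays, moved up */
        unsigned hi = (unsigned)buf[i] << bits;
        /* the top bits of the next element, carried in at the bottom */
        unsigned lo = (i + 1 < len)
                    ? (unsigned)buf[i + 1] >> (CHAR_BIT - bits)
                    : 0u;
        buf[i] = (unsigned char)((hi | lo) & UCHAR_MAX);
    }
}
```

Working front to back means each element is read before it is overwritten, so no temporary copy of the buffer is needed. With 8-bit chars, shifting the three bytes 0x01 0x23 0x45 left by 4 gives 0x12 0x34 0x50.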

bwkaz
05-01-2004, 03:35 PM
Originally posted by Strogian
Now, maybe all these pages were written in 1988 (a lot of the USENET posts were, at least ... <snip>)

Given that the specific page that you linked to is validating against HTML three point two, I tend to think of it as a relic of history (just like HTML 3.2 itself, and just like HTML 4.0 and 4.01 in my mind, incidentally).

Of course, the nice thing about standards is that there are so many ways to get around them... but that's a different discussion altogether.

Has something changed?

I don't know, you tell me. Is it still 1995? Are Netscape 3 and IE2 still the newest browsers out there? I'm sorry, I'm sorry, but I really had to. :p

... Let's pretend I have an encryption...

OK, let's. If you absolutely require that low level of control, why use C at all? Why not use assembly or machine code? ;)