ASSIGNMENT OF AN ARGV VALUE TO A VARIABLE

Status
Not open for further replies.

jdechirico

Programmer
Jul 11, 2006
13
US
Hello,

I am trying to take data entered via the command line and assign it to a variable.


The affected values/fields:

Before main

#define KEY_BYTES 24

After main

unsigned char keyData[KEY_BYTES];

keyData = argv[2];

The compiler is complaining that keyData must be a modifiable lvalue. What's it crabbing about?
 
An array can't be assigned to as a whole. You need to copy the string into it with strncpy(), not just copy the pointer.
 
thx,

any chance you could give me a code snippet?
 
More to the point - what if your string in argv[2] is larger than 23 characters - you are going to blow the end of the array !!

I would have thought you needed to either :

a) declare keyData as the length of the argv[2] string ( +1 for \0 )
or
b) Check the length of argv[2] before copying into keyData, and if larger than 23 characters, then error .
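
Option (b) might look roughly like the sketch below. The helper name copy_key and its 0 / -1 return convention are made up for illustration; only KEY_BYTES and keyData come from the original post. It uses memcpy() after a length check rather than strncpy(), since strncpy() does not add a '\0' when the source fills the buffer.

```c
#include <stddef.h>
#include <string.h>

#define KEY_BYTES 24

/* Hypothetical helper: copies src into dst, a buffer of KEY_BYTES bytes.
 * Returns 0 on success, or -1 if src is NULL or would not fit together
 * with its terminating '\0'. */
int copy_key(unsigned char dst[KEY_BYTES], const char *src)
{
    size_t len;

    if (src == NULL)
        return -1;

    len = strlen(src);
    if (len >= KEY_BYTES)        /* need room for the '\0' too */
        return -1;

    memcpy(dst, src, len + 1);   /* len + 1 copies the '\0' as well */
    return 0;
}
```

In main() you would first check that argc is at least 3 (otherwise argv[2] does not exist), then call copy_key(keyData, argv[2]) and report an error if it returns -1.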

--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
Code:
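char* pKeyData = NULL;

pKeyData = malloc( strlen( argv[2] ) + 1 );  /* +1 for the terminating '\0' */
if ( pKeyData == NULL )
    return 1;  /* allocation failed; assumes this code is in main() */
strcpy( pKeyData, argv[2] );
...
free( pKeyData );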
 
cpjust

i'm new to C since a couple of weeks, but shouldn't that be

char *pKeyData = NULL;

?

i thought the char keyword DIRECTLY followed by the indirection operator was only used as a return type from a function, like this:

char* returnstring()
{
return "mystring";
}

but i could be mistaken..
 
you are mistaken :)

BTW, your example there would probably fail, because you have allocated the char* on the stack - not the heap - so when you return from the function, the pointer is deallocated (at some point) - *usually* before you try to access it (but not always - hence the expected behaviour is not guaranteed).

 
C doesn't really care where the * is, so it's just a matter of preference where you put it.
Code:
  char* pKeyData = NULL;

Same as:

  char *pKeyData = NULL;

Same as:

  char * pKeyData = NULL;
 
C doesn't care how you format it, as long as it can parse the statements.
Code:
char * pKeyData = NULL;

// is the same as

char
*
pKeyData
=
NULL
;
Not as readable, but they compile the same.
 
That example wouldn't fail since the string literal gets placed into the initialized static data section, and the function is just returning its address; it isn't tied to the stack at all.

However, it should be returning a [tt]const char *[/tt], not a [tt]char *[/tt], as you can't modify a string literal.
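
A const-correct version of that function might look like the sketch below (same function name as in the earlier post, with a void parameter list added):

```c
/* Returns the address of a string literal. The literal has static storage
 * duration, so the pointer stays valid after the function returns.
 * The const stops callers from writing through the pointer by accident. */
const char *returnstring(void)
{
    return "mystring";
}
```

With this signature, something like returnstring()[0] = 'M'; is rejected at compile time instead of misbehaving at run time.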
 
Oh, OK - did not realize that, thanks for pointing it out :)

 
chipperMDW.

i think you can't modify a literal string in one go, but of course you can modify it in parts:
Code:
  char array0[] = "0123456789";
  char array1[] = "ABCDEFGHIJ";
  int len = strlen(array0), i;
  for (i = 0;i<len;++i)
    *(array0+i) = *(array1+i);
can you tell me what the difference is between the "initialized static data section" as you mentioned earlier, and the heap?
 
denc4 said:
i think you can't modify a literal string in one go, but of course you can modify it in parts:
Your code snippet works. However, in your code snippet, you aren't modifying the string literal; you're modifying an array initialized to the value of the string literal. That array is non-[tt]const[/tt], and is located on the stack.

If you try to modify an actual string literal, you're likely to get a memory error. For example:
Code:
[COLOR=blue]$ cat >lit.c[/color]
int main ()
{
  char *literal = "hello!";
  literal[0] = 'H';
  return 0;
}
[COLOR=blue]$ gcc -Wall -s -o lit lit.c[/color]
[COLOR=blue]$ ./lit[/color]
[highlight]Segmentation fault[/highlight]

That's because my compiler/linker built an object file that stored that string literal in a section called [tt].rodata[/tt], which stands for "read-only data":
Code:
[COLOR=blue]$ objdump -s -j .rodata lit[/color]

lit:     file format elf32-i386

Contents of section .rodata:
 8048490 03000000 01000200 [highlight]68656c6c 6f2100[/highlight]    ........[highlight]hello!.[/highlight]

On further investigation, we can see that the section [tt].rodata[/tt] is indeed marked with a read-only flag in the object file:
Code:
[COLOR=blue]$ objdump -h lit |grep -A1 '\.rodata'[/color]
 13 .rodata       0000000f  08048490  08048490  00000490  2**2
                  CONTENTS, ALLOC, LOAD, [highlight]READONLY[/highlight], DATA

So when my operating system kernel (Linux) loads that executable object file into memory, it takes the hint and loads the [tt].rodata[/tt] section into an area of memory for which it has disabled writing, enabling it to enforce that constraint. When the process attempts to modify data in that area of memory, the kernel kills it for accessing memory in a way it wasn't supposed to. That was the "Segmentation Fault" above.


denc4 said:
can you tell me what the difference is between the "initialized static data section" as you mentioned earlier, and the heap?
The "initialized static data section" is a section for static data that's initialized :)

Taken in pieces:
[ul]
[li]data
This one's easy. It's not code; it's data.
[/li]
[li]static
Here, "static" is almost a synonym for "global." It just means the data is going to stay around for the whole process. It also implies that you know what the data is and how much of it there will be when you write the program.
[/li]
[li]initialized
This just means you specified an initial value for the data.[/li]
[/ul]

I probably also should have used the word "constant" in my description; that would mean the same as "read-only."

When your compiler/linker builds your executable, it puts static data into sections. That way, the OS kernel can load that data up when the process starts, and have it stick around for the duration of the process.

The "initialized" part basically just means the object file has to contain the initial values for the data. Static data that doesn't need an initial value doesn't have to be stored in the object file, which can save quite a bit of space. (However, a constant such as our string literal has to be initialized because, as we've seen, your program doesn't get to change it.)


The heap, on the other hand, is a pool of memory from which your program can, at runtime, request chunks using functions like [tt]malloc[/tt]; when it's done, it can give them back with [tt]free[/tt].

When you don't know exactly how much data you're going to need, you store it in heap memory instead of static memory. That way, you can allocate as much as you need and no less (and usually not much more).

You also have control over how long you have heap memory allocated. It doesn't stick around for the entire lifetime of the process like static data does (unless you want it to).

Finally, since your program determines everything about how (and if) heap memory gets used at runtime, you don't store heap data in the object file like you do with static data.
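
To put the two side by side, here is a small sketch. The names greeting, get_greeting and duplicate are invented for this example and don't come from anyone's code in the thread.

```c
#include <stdlib.h>
#include <string.h>

/* Static data: size and initial value are known at build time, and the
 * object exists for the whole run of the process. */
static const char greeting[] = "hello!";

/* Hypothetical accessor so other code can read the static data. */
const char *get_greeting(void)
{
    return greeting;
}

/* Heap data: the size is decided at run time. Returns a malloc'd copy
 * of s that the caller must free(), or NULL if allocation fails. */
char *duplicate(const char *s)
{
    size_t n = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

The copy returned by duplicate() is ordinary writable memory, so modifying it is fine, and it only lives until you hand it back with free().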


*Whew,* you asked a mouthful. ;-)
 
Incidentally, on that placing of the "*"; this is one of the really horrible features of the C syntax. Really, really horrible, and unworthy of a language that prides itself on strong typing and conciseness.

Although the * can be placed anywhere, without the compiler complaining, it belongs with the name of the variable and not with the type. Many people put it near the type, to indicate psychologically that it specifies the type that will be associated with the name that follows.

The problem happens if you try to define several pointers-to-ints (for example) on one line, with just the one "int*". Only the first will be a pointer to an int; the rest will be straightforward ints.
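
A quick sketch of that trap (the function name is made up for the example):

```c
/* Hypothetical demo of "int* a, b;": the * binds to a only. */
int one_pointer_one_int(void)
{
    int* a, b;   /* a is a pointer to int, b is a plain int */

    b = 42;      /* ordinary int assignment */
    a = &b;      /* a now points at b */
    return *a;   /* reads b through the pointer */
}
```

To get two pointers on one line you would have to write int *a, *b; with a * on each name.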
 
lionelhill said:
The problem happens if you try to define several pointers-to-ints (for example) on one line
And that's why I always declare one variable per statement. Doing it all in one line might save some space and some typing, but it makes the code a little less readable and could introduce bugs like the one you mentioned.
 
you are right about the arrays being on the stack, that's why i should have used

char *pointer0 = "0123456789";
char *pointer1 = "ABCDEFGHIJ";

instead of the array initializations to make my point. But what makes that weird is that you claim they (my pointers above) are readonly (in static area). when i compile & run your gcc code with my microsoft C 7.0 compiler under DOS, it runs perfectly. what i am wondering right now is, should the static memory area be readonly in ANSI C?

you described the static area well, i believe this is also where the static storage class specifier comes from, right?

static int i; /* reserves space for i in static area */

thx.
 
denc4 said:
But what makes that weird is that you claim they (my pointers above) are readonly (in static area). when i compile & run your gcc code with my microsoft C 7.0 compiler under DOS, it runs perfectly. what i am wondering right now is, should the static memory area be readonly in ANSI C?
ANSI C doesn't, as far as I know, explicitly state that string literals must be stored in a read-only area of memory. That's why I only said a memory error was likely to happen.

However, ANSI C's treatment of string literals would allow them to be placed in read-only memory, and as such, it requires that you treat them as if they were. Modifying a string literal in C is considered "undefined behavior" (someone correct me if that's wrong); that means if you try doing that in your program, you have no idea what's going to happen; it might work, or it might crash, or it might work and do something completely different than you intended.


For any number of reasons, when you tried my code, your string literals ended up being modifiable. It's possible that your object file format doesn't support the notion of a read-only data section. It's possible that your compiler/linker chose to put the string literals in a section that wasn't marked read-only. It's also possible that your OS kernel decided to ignore a read-only flag and load the section into writable memory anyway.

Whatever the cause, that program is now exhibiting behavior undefined under ANSI C, so without disassembling it, you can't ever really be sure it's doing what the C source code might imply it's doing.


I was slightly surprised that my compiler didn't spit out a warning for a declaration like:
Code:
char *str = "test";

That declaration is risky because, for the reasons given above, a string literal should never be treated as a modifiable [tt]char *[/tt]; it should always be treated as a [tt]const char *[/tt].

Unfortunately, C gives string literals the type "array of [tt]char[/tt]" rather than "array of [tt]const char[/tt]," so assigning the address of a string literal to a [tt]char *[/tt] is legal and needs no warning. That's because the [tt]const[/tt] keyword wasn't in C to start with, and changing the type would break old code. (gcc will warn about it if you ask with [tt]-Wwrite-strings[/tt].)

However, any pointer to a string literal you declare should always be a [tt]const char *[/tt]. That way, your compiler can enforce the constraint that string literals must not be modified. I declared my string literal as a [tt]char *[/tt] only to bypass the compiler's enforcement so I could show what can happen when you do.
 
denc4 said:
you described the static area well, i believe this is also where the static storage class specifier comes from, right?
Yes, I'd bet that's probably the reason for the name of the keyword. However, [tt]static[/tt] is more commonly used as a linkage specifier in C, and is used for declaring global variables that can only be used within a single source file. I'm not sure how that meaning relates to the name.
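
For what it's worth, the two uses of the keyword can be sketched like this (all names here are invented for the example):

```c
/* File scope: static gives internal linkage, so call_count is only
 * visible inside this source file. */
static int call_count = 0;

/* Block scope: static gives id static storage duration, so it keeps
 * its value between calls instead of living on the stack. */
int next_id(void)
{
    static int id = 0;   /* initialized once, not on every call */

    ++call_count;
    return ++id;
}

/* Hypothetical accessor for the file-local counter. */
int calls_so_far(void)
{
    return call_count;
}
```

In both cases the variable lives in the static data area for the whole run of the process; the file-scope use additionally hides the name from other source files.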
 
chipperMDW - I'm sure I'm wrong here - but ...

Say you have two source files - a.c and b.c
If you declare a static int in a.c, I was pretty sure you could access it in b.c (ie access global statics from more than one source file).

Or am I way off ?

 
Nope, declaring something [tt]static[/tt] means you can only access it from that source file.

Code:
[COLOR=blue]$ cat >a.c[/color]
static int foo;
[COLOR=blue]$ cat >b.c[/color]
[highlight]extern[/highlight] int foo;

int main ()
{
  foo = 42;
  return 0;
}

First, note that I declared foo as [tt]extern[/tt] in b.c. That tells the compiler that we expect [tt]foo[/tt] to be defined in another source file.

Code:
[COLOR=blue]$ gcc -Wall -c a.c[/color]
a.c:1: warning: [highlight]'foo' defined but not used[/highlight]
When I compile a.c, it works, but my compiler warns me that [tt]foo[/tt] is defined and never used. It knows to give that warning because, since [tt]foo[/tt] is static, this is the only source file in which it can be used.

Code:
[COLOR=blue]$ gcc -Wall -c b.c
$ [/color]
b.c compiles fine; we won't see a problem until link time.

Code:
[COLOR=blue]$ gcc -s -o static a.o b.o[/color]
b.o: In function `main':
b.c:(.text+0x1e): [highlight]undefined reference to `foo'[/highlight]
collect2: ld returned 1 exit status
When we try to link, though, the reference to [tt]foo[/tt] in b.o can't resolve to the definition in a.o because that was declared static.

Code:
[COLOR=blue]$ readelf -s a.o |egrep 'Num:|foo'[/color]
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     5: 00000000     4 OBJECT  [highlight]LOCAL[/highlight]  DEFAULT    3 foo
We can see in the object file's symbol table that the symbol has a [tt]LOCAL[/tt] binding, which means it can't be used to resolve any references outside this object file.


Code:
[COLOR=blue]$ cat >a.c[/color]
int foo;
[COLOR=blue]$ gcc -Wall -c a.c[/color]
[COLOR=blue]$ gcc -s -o static a.o b.o[/color]
[COLOR=blue]$ [/color]
If we remove that [tt]static[/tt] from the declaration, everything compiles and links just fine. The only thing stopping that reference from resolving the first time around was the fact that [tt]foo[/tt] was declared [tt]static[/tt].

Code:
[COLOR=blue]$ readelf -s a.o |egrep 'Num:|foo'[/color]
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     7: 00000004     4 OBJECT  [highlight]GLOBAL[/highlight] DEFAULT  COM foo
We can confirm this by looking in the symbol table again. Since we removed [tt]static[/tt] from the declaration of [tt]foo[/tt] in a.c, it now has a [tt]GLOBAL[/tt] binding, which means it can be used to resolve undefined references from other object files.
 