this post was submitted on 21 Aug 2023
26 points (100.0% liked)

Programming

17558 readers
349 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



founded 2 years ago
MODERATORS
 

cross-posted from: https://lemm.ee/post/4890334

cross-posted from: https://lemm.ee/post/4890282

let's say I have this code

` #include #include char name[50]; int main(){ fgets(name,50,stdin); name[strcspn(name, "\n")] = '\0'; printf("hi %s", name); }

` and I decide my name is "ewroiugheqripougheqpiurghperiugheqrpiughqerpuigheqrpiugherpiugheqrpiughqerpioghqe4r", my program will throw some unexpected behavior. How would I mitigate this?

all 20 comments
sorted by: hot top controversial new old
[–] CasualTee@beehaw.org 22 points 1 year ago* (last edited 1 year ago)

First of, especially in C, you should very carefully read the documentation of the functions you use. It then should be obvious to you you are currently misusing it on two accounts:

  • You are not checking for errors
  • You are assuming the presence of a \n that might never be there (this one leads to your unexpected behavior)

The manual tells you it will insert a \0 at the end of what it reads within the limits of the buffer. So this \0 is what you will need to look for when determining the size of the input.

If there is a \n, it will precede the \0. Just make sure the \0 is not at index 0 before trying to erase the \n. If there is no \n before the \0, you are in either of two cases (again, this is detailed in the documentation): the input is truncated (you did not read the full line, as in your unexpected behavior above) or you are reaching the end of the stream. Note that even if the stream ends with \n, you might need to issue an additional fgets to know you are at the end of the stream in which case a \0 will be placed as the first byte of your buffer.

If you really want to handle input that exceeds your initial buffer, then you need to dynamically allocate one and grow it as needed. A well behaved program will have an upper limit to the size of the input anyway (and this is why you don't use gets). So you will need a combination of malloc/realloc and string concatenation. That means you need to learn all the pitfalls of dynamic memory allocation in C and how to use valgrind. For the string concatenation, even though strcat should be OK in your case, I'd recommend against it.

In order to use strcat properly you need to keep track of the usage of the dynamically allocated buffer by hand anyway because you want to know when you will attempt to store more bytes in the buffer than is currently allocated. And once you know the number of bytes stored in the buffer, copying over the bytes that fgets returns by hand is fairly trivial and has less pitfalls. This also circumvent one of the performance pitfall of strcat: it needs to find the \0 in the destination buffer for every call. So effectively, it can transform all by itself a trivial usecase such as yours, that one would expect to be linear in algorithmic complexity, to be of O(N^2) complexity.

On a final note: fgets does not allow you to handle binary data properly because you wont be able to tell apart a legitimate \0 coming from your input from a \0 inserted by fgets. So you will need to use fread in this case. I actually recommend using fread instead of fgets because it directly returns the number of bytes read, no need to use strlen to guess it and it makes error handling easier. Though you'll need to add the trailing \0 yourself.

[–] frankfurt_schoolgirl@hexbear.net 13 points 1 year ago (2 children)

If you want to accept a user input of any length, you have to read the input piece by piece and allocate a new buffer if the original becomes full. Basic steps would be:

  1. Use malloc to make a char * buffer
  2. Read one character at a time in a while loop, keep track of your position in the buffer
  3. If you get an EOF character, add a \0 to your buffer and break the loop. You're done!
  4. If the position is greater than the length, allocate a new buffer that has twice the length. Use memcpy to copy the stuff from the old buffer to the new one. Use free to get rid of the old buffer.

This will work until you fill the entire memory of your computer. You should probably set a max length and print an error if it is reached.

[–] buh@hexbear.net 7 points 1 year ago (1 children)

this is the right answer for the question, the only thing I would suggest is in step 4, to use realloc instead of doing malloc/memcpy/free cycles, since realloc does all that and will simply extend the allocated space so it can skip the memcpy and free steps if possible, which is a little faster

Didn't know about that one, thanks!

[–] xigoi@lemmy.sdf.org 1 points 1 year ago
  1. Realize that you'll have to do something like this every time you want to work with a string of unknown size
  2. Cry
  3. Use a different language
[–] mo_ztt@lemmy.world 7 points 1 year ago
  1. In almost all cases you want to be using some encapsulated string class (glib string, C++ string, something like that). The reason is that your question honestly doesn't really have a good answer. I.e., if you're storing the name in a statically-allocated char buffer, someone has a name that's more than 50 characters, you're screwed. "Screwed" can include all kinds of things up to and include crashing your program or introducing a way for people to enter a malicious name and take over your program.
  2. If you're really set on doing this, e.g. you just want to do this learn about C memory management, then probably what you want is a dynamic buffer. Find out the length of the name before you allocate the place for it, use malloc() to get a buffer of the size you actually need, and put the string there. It's highly unusual that in a modern application, doing this type of thing yourself is worth the effort and risk it creates though.
[–] DirigibleProtein@aussie.zone 6 points 1 year ago (2 children)
[–] hairyballs@programming.dev 7 points 1 year ago* (last edited 1 year ago)

The first article is funny, because I moved from my native country to the one right next to it, and everybody is confused by my name. They have one given name and 2 family names, while I have 4 first names, and a compound last name.

No need to travel to the other side of the planet to meet a different culture of naming.

[–] floppy@rabbitea.rs 2 points 1 year ago (1 children)
[–] DirigibleProtein@aussie.zone 2 points 1 year ago

Wow. Almost worth changing my name again just to fuck with them.

[–] paysrenttobirds@sh.itjust.works 6 points 1 year ago (1 children)

Hello, ewroiugheqripougheqpiurghperiugheqrpiughqerpuigheq,

The fgets function will only read in as many characters as you tell it to (50) in the second parameter, so the rest of the input will simply be lost and the name will be truncated.

[–] fubo@lemmy.world 7 points 1 year ago* (last edited 1 year ago) (1 children)

And, mind you, fgets is the safer replacement to the original gets, which attempts to read a variable-length line into a fixed-length buffer.

The manual has this to say about gets

BUGS
       Never use gets().  Because it is impossible to tell without knowing the
       data  in  advance  how  many  characters  gets() will read, and because
       gets() will continue to store characters past the end of the buffer, it
       is  extremely dangerous to use.  It has been used to break computer se‐
       curity.  Use fgets() instead.
[–] mo_ztt@lemmy.world 3 points 1 year ago (1 children)

Why is this even still in the library 🥲

Twenty years ago it kind of made sense. Ok it's bad, but sometimes we're just reading a local file fully under our control, maybe from old code that the source doesn't exist anymore for, it's such a core function that taking it out however badly needed will have some negative consequences.

At this point though, I feel like calling it should just play a loud, stern "NO!" over your speakers and exit the program.

[–] fubo@lemmy.world 6 points 1 year ago

Why is this even still in the library 🥲

The linker will complain at you —

dumb.c:(.text+0x2f): warning: the `gets' function is dangerous and should not be used.
[–] nothacking@discuss.tchncs.de 5 points 1 year ago

Your going to have to read the input one piece at a time, allocating a bigger buffer as you go. (realloc is the way to go here) I recommend putting the input reading code in a function do you can easily use it multiple times.

[–] buh@hexbear.net 2 points 1 year ago* (last edited 1 year ago)

I dont really see the point of the 2nd line in main, fgets already makes sure to null terminate

[–] PaX@hexbear.net 1 points 1 year ago* (last edited 1 year ago)

In addition to the other answers, you might consider using getline() if you're on a POSIX-compliant system. It can automatically allocate a buffer for you.