Andrei Burdulescu's personal website

Choosing the Upper Limit for the Number of Function Args

Some coding style guides have a recommended or mandatory upper limit for the number of arguments a function/method can have.

In most cases, this limit is arbitrary or based on some argument about readability or maintainability.

In any case, there is no real empirical motivation behind the chosen limit.

I once heard a story (I don't remember where) about a park that was first built without any paved paths. The administrators waited for people to walk around the park and then, after some time, paved the paths created by the park's "users". This is apparently a well-known phenomenon called a "desire path".

So, let's try to find the "desire path" for our problem: the upper limit on a function's number of arguments.

The best way I can see to do that is through the magic of sampling.

In our case, this means selecting some popular open-source projects, finding all their functions and counting the arguments of each one.

Then, when we have that data, we can do some simple math to see what the limit should be.

But how do we count the functions and their number of arguments?

Enter: fnargc

fnargc is a simple Python script that uses libclang to parse the files listed in a compilation database and counts all the functions it finds and their number of arguments.
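fnargc's own source is the authority on how it works. To give a rough idea of the approach, here is a minimal sketch using libclang's Python bindings (`clang.cindex`, available for example from the `libclang` package on PyPI). It is not fnargc's code. The following are all assumptions of this sketch:

- which compiler flags are kept;
- which cursor kinds count as functions;
- the de-duplication by USR;
- the names `parse_flags` and `count_function_args`, which are hypothetical.

```python
import os
from collections import Counter

# Hypothetical helper names; this is NOT fnargc's API.

# Compiler flags that change how a file is parsed. Everything else in a compile
# command (compiler path, -c, -o <obj>, warning/optimization flags, the source
# file itself) is dropped. This selection is an assumption of the sketch.
JOINED = ("-I", "-D", "-U", "-std=")        # e.g. -Iinclude, -DNDEBUG, -std=c11
SEPARATE = ("-I", "-D", "-U", "-isystem")   # e.g. -isystem /usr/local/include


def parse_flags(arguments):
    """Return the subset of a compile command's arguments passed to libclang."""
    args = list(arguments)
    flags = []
    i = 1  # args[0] is the compiler executable
    while i < len(args):
        if args[i] in SEPARATE and i + 1 < len(args):
            flags += args[i:i + 2]  # flag and its value are separate arguments
            i += 2
            continue
        if args[i].startswith(JOINED):
            flags.append(args[i])
        i += 1
    return flags


def count_function_args(build_dir, src_dir):
    """Histogram: number of args -> number of function definitions under src_dir."""
    # Imported here so that parse_flags() can be used without libclang installed.
    from clang.cindex import CompilationDatabase, CursorKind, Index

    kinds = (CursorKind.FUNCTION_DECL, CursorKind.CXX_METHOD, CursorKind.CONSTRUCTOR)
    # Resolve paths before os.chdir() below changes the working directory.
    prefix = os.path.join(os.path.abspath(src_dir), "")
    db = CompilationDatabase.fromDirectory(os.path.abspath(build_dir))
    index = Index.create()
    nargs_by_usr = {}  # keyed by USR so a function seen in several files counts once
    for cmd in db.getAllCompileCommands():
        os.chdir(cmd.directory)  # relative paths in the command are relative to it
        tu = index.parse(cmd.filename, args=parse_flags(cmd.arguments))
        for cur in tu.cursor.walk_preorder():
            f = cur.location.file
            if (cur.kind in kinds and cur.is_definition() and f is not None
                    and os.path.abspath(f.name).startswith(prefix)):
                nargs_by_usr[cur.get_usr()] = len(list(cur.get_arguments()))
    return Counter(nargs_by_usr.values())
```

Called as `count_function_args("sqlite/b", "sqlite/src")`, it returns a `Counter` shaped like the breakdowns below. Note that this sketch also counts zero-argument functions, which do not appear in those breakdowns.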

To install it (in a virtual env, if you want):

$ pip install git+https://github.com/aburdulescu/fnargc.git

Now let's start ...

Sampling

You can download the samples for all projects from here.

I'll present the steps to generate and analyze the samples for only one project, sqlite.

For all the others I only present the results, since the steps are the same.

sqlite

We start with sqlite because it is generally considered a good example of what a quality software project should be. See its testing page for why.

These are the steps:

Generate the compilation database and build the project:

$ git clone --depth 1 https://github.com/aburdulescu/sqlite.git
$ cmake -S sqlite -B sqlite/b -GNinja -DCMAKE_EXPORT_COMPILE_COMMANDS=1
$ cmake --build sqlite/b
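`-DCMAKE_EXPORT_COMPILE_COMMANDS=1` makes CMake write a `compile_commands.json` file into the build directory (here `sqlite/b`). Each entry records:

- the working `directory`;
- the source `file`;
- either a `command` string or an `arguments` list.

The sketch below only lists the translation units under a given source directory, to show what the database contains. The function name `translation_units` is hypothetical, and this is not how fnargc itself reads the file.

```python
import json
import os


def translation_units(build_dir, src_dir):
    """List the files in build_dir/compile_commands.json that live under src_dir.

    Hypothetical helper for illustration only.
    """
    with open(os.path.join(build_dir, "compile_commands.json")) as f:
        entries = json.load(f)  # a JSON array of objects
    prefix = os.path.join(os.path.abspath(src_dir), "")
    files = []
    for entry in entries:
        # "file" may be relative to the entry's "directory".
        path = os.path.abspath(os.path.join(entry["directory"], entry["file"]))
        if path.startswith(prefix):
            files.append(path)
    return files
```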

Parse the code and count the functions and their arguments:

$ fnargc -r sqlite -s src -b b -o sqlite.csv

Print stats:

$ fnargc-stats sqlite.csv
total no. of funcs: 2445

breakdown by no. of args:
  1 args        55      2.25%
  2 args        704     28.79%
  3 args        755     30.88%
  4 args        479     19.59%
  5 args        227     9.28%
  6 args        130     5.32%
  7 args        47      1.92%
  8 args        18      0.74%
  9 args        15      0.61%
  10 args       7       0.29%
  11 args       3       0.12%
  12 args       4       0.16%
  14 args       1       0.04%

what percentage of funcs have?
  3 or less args        61.92%
  4 or less args        81.51%
  5 or less args        90.80%
  6 or less args        96.11%
  7 or less args        98.04%
  8 or less args        98.77%
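The "N or less" lines are running sums of the breakdown divided by the total number of functions. The sketch below re-derives them from the sqlite numbers above. It is a re-computation for illustration, not fnargc-stats' code.

```python
# Breakdown printed by fnargc-stats for sqlite: number of args -> number of functions.
SQLITE = {1: 55, 2: 704, 3: 755, 4: 479, 5: 227, 6: 130, 7: 47,
          8: 18, 9: 15, 10: 7, 11: 3, 12: 4, 14: 1}


def share_up_to(breakdown, limit):
    """Percentage of functions with at most `limit` arguments."""
    total = sum(breakdown.values())
    within = sum(n for nargs, n in breakdown.items() if nargs <= limit)
    return 100.0 * within / total


for limit in range(3, 9):
    print(f"{limit} or less args\t{share_up_to(SQLITE, limit):.2f}%")
```

It prints the same six percentages as the output above, from 61.92% up to 98.77%.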

curl

total no. of funcs: 1779

breakdown by no. of args:
  1 args        48      2.70%
  2 args        423     23.78%
  3 args        568     31.93%
  4 args        394     22.15%
  5 args        176     9.89%
  6 args        115     6.46%
  7 args        36      2.02%
  8 args        7       0.39%
  9 args        6       0.34%
  10 args       4       0.22%
  11 args       1       0.06%
  15 args       1       0.06%

what percentage of funcs have?
  3 or less args        58.40%
  4 or less args        80.55%
  5 or less args        90.44%
  6 or less args        96.91%
  7 or less args        98.93%
  8 or less args        99.33%

libuv

total no. of funcs: 666

breakdown by no. of args:
  1 args        42      6.31%
  2 args        270     40.54%
  3 args        168     25.23%
  4 args        96      14.41%
  5 args        49      7.36%
  6 args        22      3.30%
  7 args        15      2.25%
  8 args        4       0.60%

what percentage of funcs have?
  3 or less args        72.07%
  4 or less args        86.49%
  5 or less args        93.84%
  6 or less args        97.15%
  7 or less args        99.40%
  8 or less args        100.00%

flatbuffers

total no. of funcs: 1460

breakdown by no. of args:
  1 args        290     19.86%
  2 args        413     28.29%
  3 args        318     21.78%
  4 args        277     18.97%
  5 args        112     7.67%
  6 args        35      2.40%
  7 args        14      0.96%
  8 args        1       0.07%

what percentage of funcs have?
  3 or less args        69.93%
  4 or less args        88.90%
  5 or less args        96.58%
  6 or less args        98.97%
  7 or less args        99.93%
  8 or less args        100.00%

leveldb

total no. of funcs: 327

breakdown by no. of args:
  1 args        77      23.55%
  2 args        93      28.44%
  3 args        88      26.91%
  4 args        42      12.84%
  5 args        14      4.28%
  6 args        8       2.45%
  7 args        3       0.92%
  8 args        2       0.61%

what percentage of funcs have?
  3 or less args        78.90%
  4 or less args        91.74%
  5 or less args        96.02%
  6 or less args        98.47%
  7 or less args        99.39%
  8 or less args        100.00%

ninja

total no. of funcs: 469

breakdown by no. of args:
  1 args        141     30.06%
  2 args        159     33.90%
  3 args        86      18.34%
  4 args        62      13.22%
  5 args        14      2.99%
  6 args        6       1.28%
  8 args        1       0.21%

what percentage of funcs have?
  3 or less args        82.30%
  4 or less args        95.52%
  5 or less args        98.51%
  6 or less args        99.79%
  7 or less args        99.79%
  8 or less args        100.00%

sqlite + curl + libuv

total no. of funcs: 4890

breakdown by no. of args:
  1 args        145     2.97%
  2 args        1397    28.57%
  3 args        1491    30.49%
  4 args        969     19.82%
  5 args        452     9.24%
  6 args        267     5.46%
  7 args        98      2.00%
  8 args        29      0.59%
  9 args        21      0.43%
  10 args       11      0.22%
  11 args       4       0.08%
  12 args       4       0.08%
  14 args       1       0.02%
  15 args       1       0.02%

what percentage of funcs have?
  3 or less args        62.02%
  4 or less args        81.84%
  5 or less args        91.08%
  6 or less args        96.54%
  7 or less args        98.55%
  8 or less args        99.14%
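The combined sections pool the functions of several projects; they do not average their percentages. The totals add up (2445 + 1779 + 666 = 4890), so a big project such as sqlite weighs more than a small one. Summing the per-project counts with `collections.Counter` reproduces the breakdown above:

```python
from collections import Counter

# Per-project breakdowns (number of args -> number of functions), copied from above.
SQLITE = Counter({1: 55, 2: 704, 3: 755, 4: 479, 5: 227, 6: 130, 7: 47,
                  8: 18, 9: 15, 10: 7, 11: 3, 12: 4, 14: 1})
CURL = Counter({1: 48, 2: 423, 3: 568, 4: 394, 5: 176, 6: 115, 7: 36,
                8: 7, 9: 6, 10: 4, 11: 1, 15: 1})
LIBUV = Counter({1: 42, 2: 270, 3: 168, 4: 96, 5: 49, 6: 22, 7: 15, 8: 4})

# Adding Counters sums the number of functions for each argument count.
c_projects = SQLITE + CURL + LIBUV
total = sum(c_projects.values())
print("total no. of funcs:", total)
for nargs in sorted(c_projects):
    print(f"  {nargs} args\t{c_projects[nargs]}\t{100 * c_projects[nargs] / total:.2f}%")
```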

flatbuffers + leveldb + ninja

total no. of funcs: 2256

breakdown by no. of args:
  1 args        508     22.52%
  2 args        665     29.48%
  3 args        492     21.81%
  4 args        381     16.89%
  5 args        140     6.21%
  6 args        49      2.17%
  7 args        17      0.75%
  8 args        4       0.18%

what percentage of funcs have?
  3 or less args        73.80%
  4 or less args        90.69%
  5 or less args        96.90%
  6 or less args        99.07%
  7 or less args        99.82%
  8 or less args        100.00%

all

total no. of funcs: 7146

breakdown by no. of args:
  1 args        653     9.14%
  2 args        2062    28.86%
  3 args        1983    27.75%
  4 args        1350    18.89%
  5 args        592     8.28%
  6 args        316     4.42%
  7 args        115     1.61%
  8 args        33      0.46%
  9 args        21      0.29%
  10 args       11      0.15%
  11 args       4       0.06%
  12 args       4       0.06%
  14 args       1       0.01%
  15 args       1       0.01%

what percentage of funcs have?
  3 or less args        65.74%
  4 or less args        84.63%
  5 or less args        92.92%
  6 or less args        97.34%
  7 or less args        98.95%
  8 or less args        99.41%

Conclusion

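The first thing to note is that, in general, functions in the C projects (sqlite, curl, libuv) have one argument more than those in the C++ projects (flatbuffers, leveldb, ninja). The most common count is 3 args for the former and 2 args for the latter.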

My guess is that this is because most C functions look like this:

struct mytype { int value; };

/* "self" plays the role of C++'s implicit "this" */
void mytype_do_something(struct mytype* self);

In my opinion, the "self" argument shouldn't be counted. So, in general, a C function with 5 args is really a function with 4 args plus "self".
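One rough way to check this guess against the combined numbers is to discount one argument from every C function. Under that discount, "N + 1 or less" for the C projects is compared with "N or less" for the C++ projects. This assumes every C function takes a self pointer, which won't hold for all of them, so treat the result as indicative only.

```python
# Combined breakdowns from above: number of args -> number of functions.
C = {1: 145, 2: 1397, 3: 1491, 4: 969, 5: 452, 6: 267, 7: 98, 8: 29,
     9: 21, 10: 11, 11: 4, 12: 4, 14: 1, 15: 1}    # sqlite + curl + libuv
CPP = {1: 508, 2: 665, 3: 492, 4: 381, 5: 140, 6: 49, 7: 17, 8: 4}  # flatbuffers + leveldb + ninja


def share_up_to(breakdown, limit):
    """Percentage of functions with at most `limit` arguments."""
    total = sum(breakdown.values())
    return 100.0 * sum(n for k, n in breakdown.items() if k <= limit) / total


# Assumption: every C function carries one "self" argument, so a C function
# with limit + 1 args is compared with a C++ function with limit args.
for limit in range(3, 8):
    c = share_up_to(C, limit + 1)
    cpp = share_up_to(CPP, limit)
    print(f"<= {limit} args: C (minus self) {c:.2f}%  C++ {cpp:.2f}%")
```

From 4 args up, the two columns stay within 0.7 percentage points of each other (for example 91.08% vs 90.69% at 4 args), which is consistent with the guess. At 3 args they still differ (81.84% vs 73.80%).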

So, what limit should we choose?

Based on the data above, I would choose the following rule:

If the no. of args is:

> 8: error! fix your code
<= 8 and > 5: warning! maybe try refactoring?
<= 5: all is good!
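As a lint check, the rule boils down to two thresholds. Here is a minimal sketch; `check_arg_count` and its keyword parameters are hypothetical and not part of fnargc.

```python
def check_arg_count(nargs, warn_above=5, error_above=8):
    """Apply the rule: > 8 args is an error, 6 to 8 a warning, <= 5 fine.

    Hypothetical helper; thresholds are the ones proposed in the text.
    """
    if nargs > error_above:
        return "error"
    if nargs > warn_above:
        return "warning"
    return "ok"


for n in (3, 5, 6, 8, 9, 14):
    print(n, check_arg_count(n))
```

Following the "self" argument above, one could pass `nargs - 1` for C functions that take a self pointer.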

Why?

Because, as the "all" numbers above show, the percentage of functions with:

9 or more args is approx. 0.6%
6, 7 or 8 args is approx. 6.5%
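Both percentages follow directly from the "all" breakdown. Counting the functions in each band explicitly:

```python
# "all" breakdown from above: number of args -> number of functions.
ALL = {1: 653, 2: 2062, 3: 1983, 4: 1350, 5: 592, 6: 316, 7: 115,
       8: 33, 9: 21, 10: 11, 11: 4, 12: 4, 14: 1, 15: 1}

total = sum(ALL.values())                                # 7146
errors = sum(n for k, n in ALL.items() if k > 8)         # 9 or more args
warnings = sum(n for k, n in ALL.items() if 5 < k <= 8)  # 6, 7 or 8 args
print(f"error:   {errors} funcs, {100 * errors / total:.2f}%")
print(f"warning: {warnings} funcs, {100 * warnings / total:.2f}%")
```

This prints 42 functions (0.59%) for the error band and 464 functions (6.49%) for the warning band.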

In my opinion, this rule gives developers enough autonomy to decide whether a function needs refactoring (<= 8 and > 5), while still setting a hard limit that cannot be crossed (> 8).