Some coding style guides have a recommended or mandatory upper limit for the number of arguments a function/method can have.
In most cases, this limit is arbitrary or chosen based on some kind of argument about readability or maintainability.
In any case, there is no real empirical motivation behind the chosen limit.
I heard a story once (don't remember where) about a park that was first built without any paved paths. The park administrators waited for people to walk around the park and then, after some time, paved the paths created by the park "users". This is apparently a well-known thing called a "desire path".
So, let's try and find the "desire path" for our problem, the upper limit of a function's number of arguments.
The best way I see to do that is by using the magic of sampling.
In our case this means we need to select some popular open-source projects, count their functions and, for each function, its number of arguments.
Then, when we have that data, we can do some simple math to see what the limit should be.
But how do we count the functions and their number of arguments?
fnargc is a simple Python script which uses libclang to parse the files specified in a compilation database and counts all functions found and their number of arguments.
To install it, run (in a virtual env if you want):
$ pip install git+https://github.com/aburdulescu/fnargc.git
Now let's start ...
You can download the samples for all projects from here.
I'll present the steps to generate the samples and analyze them for only one project, sqlite.
For all the others I only present the results, since everything else should be the same.
We start with sqlite because it is generally considered a good example of what a quality software project should be. See its testing page to see why.
These are the steps:
Generate compilation database and build it:
$ git clone --depth 1 https://github.com/aburdulescu/sqlite.git
$ cmake -S sqlite -B sqlite/b -GNinja -DCMAKE_EXPORT_COMPILE_COMMANDS=1
$ cmake --build sqlite/b
Parse the code and count the functions and their arguments:
$ fnargc -r sqlite -s src -b b -o sqlite.csv
Print stats:
$ fnargc-stats sqlite.csv
total no. of funcs: 2445
breakdown by no. of args:
1 args 55 2.25%
2 args 704 28.79%
3 args 755 30.88%
4 args 479 19.59%
5 args 227 9.28%
6 args 130 5.32%
7 args 47 1.92%
8 args 18 0.74%
9 args 15 0.61%
10 args 7 0.29%
11 args 3 0.12%
12 args 4 0.16%
14 args 1 0.04%
what percentage of funcs have?
3 or less args 61.92%
4 or less args 81.51%
5 or less args 90.80%
6 or less args 96.11%
7 or less args 98.04%
8 or less args 98.77%
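The "N or less args" percentages are just running sums over the per-count breakdown. Here is a minimal sketch that recomputes them for sqlite from the numbers printed above. It is not the actual fnargc-stats code, and `share_at_most` is a name made up for illustration.

```python
# sqlite histogram copied from the stats output above:
# {number of args: number of functions}
sqlite_hist = {1: 55, 2: 704, 3: 755, 4: 479, 5: 227, 6: 130, 7: 47,
               8: 18, 9: 15, 10: 7, 11: 3, 12: 4, 14: 1}


def share_at_most(hist, limit):
    """Return the percentage of functions with `limit` or fewer args."""
    total = sum(hist.values())
    covered = sum(count for argc, count in hist.items() if argc <= limit)
    return 100.0 * covered / total


# Print the same cumulative table as the stats output (limits 3 to 8).
for limit in range(3, 9):
    print(f"{limit} or less args {share_at_most(sqlite_hist, limit):.2f}%")
```

The same function works for any of the other samples if you swap in their histogram.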
total no. of funcs: 1779
breakdown by no. of args:
1 args 48 2.70%
2 args 423 23.78%
3 args 568 31.93%
4 args 394 22.15%
5 args 176 9.89%
6 args 115 6.46%
7 args 36 2.02%
8 args 7 0.39%
9 args 6 0.34%
10 args 4 0.22%
11 args 1 0.06%
15 args 1 0.06%
what percentage of funcs have?
3 or less args 58.40%
4 or less args 80.55%
5 or less args 90.44%
6 or less args 96.91%
7 or less args 98.93%
8 or less args 99.33%
total no. of funcs: 666
breakdown by no. of args:
1 args 42 6.31%
2 args 270 40.54%
3 args 168 25.23%
4 args 96 14.41%
5 args 49 7.36%
6 args 22 3.30%
7 args 15 2.25%
8 args 4 0.60%
what percentage of funcs have?
3 or less args 72.07%
4 or less args 86.49%
5 or less args 93.84%
6 or less args 97.15%
7 or less args 99.40%
8 or less args 100.00%
total no. of funcs: 1460
breakdown by no. of args:
1 args 290 19.86%
2 args 413 28.29%
3 args 318 21.78%
4 args 277 18.97%
5 args 112 7.67%
6 args 35 2.40%
7 args 14 0.96%
8 args 1 0.07%
what percentage of funcs have?
3 or less args 69.93%
4 or less args 88.90%
5 or less args 96.58%
6 or less args 98.97%
7 or less args 99.93%
8 or less args 100.00%
total no. of funcs: 327
breakdown by no. of args:
1 args 77 23.55%
2 args 93 28.44%
3 args 88 26.91%
4 args 42 12.84%
5 args 14 4.28%
6 args 8 2.45%
7 args 3 0.92%
8 args 2 0.61%
what percentage of funcs have?
3 or less args 78.90%
4 or less args 91.74%
5 or less args 96.02%
6 or less args 98.47%
7 or less args 99.39%
8 or less args 100.00%
total no. of funcs: 469
breakdown by no. of args:
1 args 141 30.06%
2 args 159 33.90%
3 args 86 18.34%
4 args 62 13.22%
5 args 14 2.99%
6 args 6 1.28%
8 args 1 0.21%
what percentage of funcs have?
3 or less args 82.30%
4 or less args 95.52%
5 or less args 98.51%
6 or less args 99.79%
7 or less args 99.79%
8 or less args 100.00%
All C projects combined:
total no. of funcs: 4890
breakdown by no. of args:
1 args 145 2.97%
2 args 1397 28.57%
3 args 1491 30.49%
4 args 969 19.82%
5 args 452 9.24%
6 args 267 5.46%
7 args 98 2.00%
8 args 29 0.59%
9 args 21 0.43%
10 args 11 0.22%
11 args 4 0.08%
12 args 4 0.08%
14 args 1 0.02%
15 args 1 0.02%
what percentage of funcs have?
3 or less args 62.02%
4 or less args 81.84%
5 or less args 91.08%
6 or less args 96.54%
7 or less args 98.55%
8 or less args 99.14%
All C++ projects combined:
total no. of funcs: 2256
breakdown by no. of args:
1 args 508 22.52%
2 args 665 29.48%
3 args 492 21.81%
4 args 381 16.89%
5 args 140 6.21%
6 args 49 2.17%
7 args 17 0.75%
8 args 4 0.18%
what percentage of funcs have?
3 or less args 73.80%
4 or less args 90.69%
5 or less args 96.90%
6 or less args 99.07%
7 or less args 99.82%
8 or less args 100.00%
All projects combined:
total no. of funcs: 7146
breakdown by no. of args:
1 args 653 9.14%
2 args 2062 28.86%
3 args 1983 27.75%
4 args 1350 18.89%
5 args 592 8.28%
6 args 316 4.42%
7 args 115 1.61%
8 args 33 0.46%
9 args 21 0.29%
10 args 11 0.15%
11 args 4 0.06%
12 args 4 0.06%
14 args 1 0.01%
15 args 1 0.01%
what percentage of funcs have?
3 or less args 65.74%
4 or less args 84.63%
5 or less args 92.92%
6 or less args 97.34%
7 or less args 98.95%
8 or less args 99.41%
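A side note on the numbers: the counts in the seventh block (4890 funcs) match the bucket-by-bucket sum of the first three blocks, sqlite and the two samples after it. So a combined block can be reproduced by simply adding histograms. A minimal sketch, with the counts copied from the output above and the hypothetical names `block_1` to `block_3`; fnargc-stats may well do this differently.

```python
from collections import Counter

# Per-sample histograms copied from the first three stats blocks above:
# {number of args: number of functions}
block_1 = Counter({1: 55, 2: 704, 3: 755, 4: 479, 5: 227, 6: 130, 7: 47,
                   8: 18, 9: 15, 10: 7, 11: 3, 12: 4, 14: 1})  # sqlite
block_2 = Counter({1: 48, 2: 423, 3: 568, 4: 394, 5: 176, 6: 115, 7: 36,
                   8: 7, 9: 6, 10: 4, 11: 1, 15: 1})
block_3 = Counter({1: 42, 2: 270, 3: 168, 4: 96, 5: 49, 6: 22, 7: 15, 8: 4})

# Adding Counters sums the counts bucket by bucket.
combined = block_1 + block_2 + block_3

print(f"total no. of funcs: {sum(combined.values())}")
for argc in sorted(combined):
    print(f"{argc} args {combined[argc]}")
```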
First thing to note is that in general the functions in C projects have one arg more than in C++ projects.
My guess as to why is that most C functions look like this:
struct MyType {};
void mytype_do_something(struct MyType* self);
In my opinion, the "self" arg shouldn't be counted and therefore, in general, when we talk about C functions with 5 args we are actually talking about the other 4 args.
So, what limit should we choose?
Based on the data above, I would choose the following rule:
If the no. of args is:
> 8: error! fix your code
<= 8 and > 5: warning! maybe try refactoring?
<= 5: all is good!
Why?
Because, as we see above, across all sampled projects 92.92% of functions have 5 or fewer args and 99.41% have 8 or fewer.
This rule, in my opinion, would give developers enough autonomy to decide whether a function needs refactoring (<= 8 and > 5), and it also sets a hard limit which cannot be crossed (> 8).
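To make the rule concrete, here is a minimal sketch of it as a check that a linter-style script could apply to each function. The names `check_arg_count`, `WARN_ABOVE` and `ERROR_ABOVE` are made up for illustration and are not part of fnargc.

```python
# Thresholds from the rule above.
WARN_ABOVE = 5   # more than 5 args: warning
ERROR_ABOVE = 8  # more than 8 args: error


def check_arg_count(argc):
    """Classify a function by its number of args: 'ok', 'warning' or 'error'."""
    if argc > ERROR_ABOVE:
        return "error"    # hard limit, fix your code
    if argc > WARN_ABOVE:
        return "warning"  # maybe try refactoring?
    return "ok"           # all is good


for argc in (3, 5, 6, 8, 9):
    print(argc, check_arg_count(argc))
```

Keeping the two thresholds as named constants makes it easy to adjust them, for example if you decide not to count the "self" arg in C code.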