512 * 8192 bytes is only 4 MB. You're giving PG just 4 MB of shared buffer cache when you've got 1 GB of RAM??? How miserly of you! :)
You've got two million rows with 30 columns apiece - try giving it enough shared buffer space that the whole table fits in memory. The first count(*) won't be any faster, but subsequent ones will.
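A rough sketch of how you'd size it (the table name access_log is invented here, pg_relation_size/pg_size_pretty only exist on newer releases, and older releases take shared_buffers as a count of 8 KB blocks rather than a unit string):

    -- how big is the table on disk?
    SELECT pg_size_pretty(pg_relation_size('access_log'));

    # postgresql.conf - sized so the whole table fits, with headroom
    shared_buffers = 32768        # old style: 8 KB blocks = 256 MB
    #shared_buffers = 256MB       # newer releases accept unit strings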
PG is particularly stupid about sequential scans and its LRU cache. It turns out that if the table is just one block longer than your buffer cache, each sequential scan will read every block of the table from disk and no blocks will come from cache: by the time the scan wraps back around to the start of the table, LRU has just evicted the very block it needs next, and the same thing happens for every block after it. Oracle and probably other commercial DBs (at least some of them) mitigate this disaster by limiting how much cache a sequential scan may use, but PG doesn't.
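You can actually watch this happen on newer releases (9.0 and up), which will report buffer hits and reads per query; the table name is again invented:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM access_log;

    -- Look at the "Buffers: shared hit=... read=..." line in the output.
    -- If read= stays near the table's total block count even on a second
    -- run, every scan is going back to disk instead of the cache.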
So it is extremely important to have your buffer cache larger than your largest table if at all possible, at least if you expect to do things like count the entries.
Yes, the filesystem caches too, but PG still has to read/write the blocks through its own buffer cache, and this involves library and kernel overhead, and of course the memory bandwidth to shuffle those bytes back and forth between the filesystem cache and PG's buffer cache.
16 MB for your sort buffers is too low, too, since the "group by" has to sort the rows in order to pick out the duplicates as it groups.
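For what it's worth, a sketch of how you might check that (sort_mem is the old name; newer releases call it work_mem, and the column name here is invented):

    SET work_mem = '64MB';   -- on old releases: SET sort_mem = 65536 (in KB)
    EXPLAIN ANALYZE
    SELECT url, count(*) FROM access_log GROUP BY url;

    -- On 8.3+ the plan says "Sort Method: external merge  Disk: ..." if
    -- the sort spilled to disk; raise work_mem/sort_mem until it sorts
    -- in memory ("Sort Method: quicksort  Memory: ...").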
Of course, your server may already be devoting all its memory to something serious like whomping up web pages like crazy, in which case you'll have to live with slower query times for your log reports.