Fyo,
I've never tried Gaussian on an x86 computer, so I have no idea what its specific performance characteristics are like.
In DFT with a Gaussian LCAO basis, you essentially have three steps: (a) evaluating the integrals, which involves the error function and recursion relations, plus a contraction that looks like a matrix-times-vector product; (b) exchange-correlation quadrature, which involves things like DEXP, DSIN, ERF, etc., plus polynomial evaluations; and (c) diagonalization of the Hamiltonian matrix. This is a pretty nice combo. In the benchmarks I was mentioning, the P4 did extremely well on PRISM (integrals), where memory bandwidth could be an important issue ...
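To make the three kinds of arithmetic concrete, here is a toy sketch in Python. It is not what Gaussian or PRISM actually does; the function names (boys_f0, radial_quadrature, eigvals_sym_2x2) are made up for illustration, and each one is only a tiny stand-in for the corresponding step: an erf-based integral kernel, a numerical quadrature full of exp() calls, and a symmetric diagonalization.

```python
import math


def boys_f0(t):
    """Step (a) stand-in: zeroth-order Boys function,
    F0(t) = integral from 0 to 1 of exp(-t*u^2) du,
    which reduces to an error-function evaluation."""
    if t < 1e-8:
        # Leading terms of the Taylor series avoid dividing by ~0.
        return 1.0 - t / 3.0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def radial_quadrature(alpha, n=2000, r_max=10.0):
    """Step (b) stand-in: midpoint-rule quadrature of
    4*pi*r^2*exp(-alpha*r^2) on [0, r_max].
    The exact value over [0, infinity) is (pi/alpha)**1.5."""
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += 4.0 * math.pi * r * r * math.exp(-alpha * r * r)
    return total * h


def eigvals_sym_2x2(a, b, d):
    """Step (c) stand-in: eigenvalues of the symmetric matrix
    [[a, b], [b, d]], returned in ascending order."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


if __name__ == "__main__":
    print("F0(1.0)      =", boys_f0(1.0))
    print("quadrature   =", radial_quadrature(1.0))
    print("exact        =", math.pi ** 1.5)
    print("eigenvalues  =", eigvals_sym_2x2(2.0, 1.0, 2.0))
```

Even in this toy form you can see why the mix matters for a CPU comparison: (a) and (b) lean on transcendental function throughput, while (c) in a real code is a dense linear algebra call whose cost grows much faster with matrix size.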
Mainly CI, MC-SCF and (more recently) Coupled Cluster.
You are on the right track! Why bother with CAS, CI, or QCI if you can do CCSD(T)? [I've made a contribution or two to the literature on the latter ;-)]
fp