Fake Online Reviews

We’ve had means of expressing our opinions on various things for a long time.  Amazon has had book reviews pretty much since the beginning.  But how do we know that the reviews are real?  Virus writers took the opportunity presented by Amazon to trash my books when they were published.  (Even though they used different names, it only took a very simple form of forensic linguistics to figure out their identities.)

More recently, review spam has become more important, since many people rely on online reviews when buying items or booking services.  A number of “companies” have determined that it is more cost effective to have bots or other entities flood the review systems with fake positive reviews than it is to make quality products or services.  So, some nice people from Cornell University produced and tested some software to detect the fakes.

Note that these slides do not give a lot of detail about exactly how they determine the fakes.  However, there is enough to indicate that sophisticated algorithms are less accurate than some fairly simple metrics.  When I teach about software forensics (aspects of which are similar to forensic linguistics, or stylistic forensics), this seems counterintuitive and surprises a lot of students.  Generally they object that, if you know about the metrics, you should be able to avoid them.  In practice, this doesn’t seem to be the case.  Simple metrics do seem to be very effective in both forensic linguistics and software forensics.
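
To give a sense of what "fairly simple metrics" can look like, here is a minimal Python sketch.  It is not the Cornell group's method (the slides don't give enough detail for that).  The function names, the word list, and the particular metrics chosen are all assumptions, picked only for illustration.

```python
import re

# Hypothetical word list: first-person pronouns, chosen only for illustration.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}


def simple_metrics(text):
    """Compute a few crude stylistic measurements for one piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        # Empty input: return zeros rather than dividing by zero.
        return {
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
            "type_token_ratio": 0.0,
            "first_person_rate": 0.0,
        }
    return {
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Distinct words divided by total words (vocabulary richness).
        "type_token_ratio": len(set(words)) / len(words),
        # Fraction of words that are first-person pronouns.
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / len(words),
    }


def distance(a, b):
    """Sum of absolute differences between two metric profiles (unscaled)."""
    return sum(abs(a[k] - b[k]) for k in a)


if __name__ == "__main__":
    review_a = "I loved this hotel. My room was perfect and my stay was amazing!"
    review_b = "The room was clean. Check-in took ten minutes. Breakfast was average."
    profile_a = simple_metrics(review_a)
    profile_b = simple_metrics(review_b)
    print(profile_a)
    print(profile_b)
    print(distance(profile_a, profile_b))
```

Comparing profiles like these across reviews posted under different names is the sort of thing I mean by a simple metric.  A real system would need to scale the measurements and test them against known examples, which this sketch does not attempt.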
