Employment Scam Aegean Dataset

University of the Aegean

Laboratory of Information & Communication Systems Security


The Employment Scam Aegean Dataset (EMSCAD) is a publicly available dataset of 17,880 real-life job ads. It aims to give the research community a clear picture of the employment scam problem and can serve as a valuable testbed for scientists working in the field. Our first publication is available online in the MDPI journal Future Internet.

EMSCAD records were manually annotated and classified into two categories: the dataset contains 17,014 legitimate and 866 fraudulent job ads published between 2012 and 2014.
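The class counts quoted above imply a strong imbalance between the two categories. A minimal sketch of the arithmetic, using only those figures (the variable names are illustrative):

```python
# Class counts as stated in the dataset description.
LEGITIMATE = 17_014
FRAUDULENT = 866

total = LEGITIMATE + FRAUDULENT             # should match the 17,880 ads reported
fraud_share = FRAUDULENT / total            # fraction of fraudulent ads
imbalance_ratio = LEGITIMATE / FRAUDULENT   # legitimate ads per fraudulent ad

print(f"total ads: {total}")
print(f"fraudulent share: {fraud_share:.2%}")
print(f"legitimate per fraudulent: {imbalance_ratio:.1f}")
```

Slightly under one ad in twenty is fraudulent, so a classifier that labels every ad as legitimate would still score high on plain accuracy. Metrics such as precision and recall on the fraudulent class are usually more informative in this setting.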

Email addresses, phone numbers and URLs found in the texts were masked using the pattern #(EMAIL|PHONE|URL)_Keyed_SHA2#.
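As an illustration of how the masked tokens might be located in the text fields, the sketch below matches the pattern with a regular expression. It assumes that the Keyed_SHA2 part is rendered as a hexadecimal digest, which the description does not state. The function names and the sample string are hypothetical.

```python
import re
from collections import Counter

# Assumption: the keyed SHA-2 value appears as hexadecimal characters.
MASK_RE = re.compile(r"#(EMAIL|PHONE|URL)_([0-9A-Fa-f]+)#")

def find_masked_tokens(text: str) -> list[tuple[str, str]]:
    """Return (kind, digest) pairs for every masked token in text."""
    return MASK_RE.findall(text)

def count_masked_kinds(text: str) -> Counter:
    """Count how many emails, phones and URLs were masked in text."""
    return Counter(kind for kind, _ in find_masked_tokens(text))

# Made-up sample; the digests are placeholders, not real dataset values.
sample = "Apply at #URL_0a1b2c3d# or write to #EMAIL_deadbeef#, #EMAIL_cafe1234#."
print(count_masked_kinds(sample))
```

Because identical contact details presumably map to identical digests under the same key, the digest itself could also serve as a grouping key across ads. If the digests turn out to use a different encoding, the character class in the expression needs adjusting.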

Dataset description


Title: The title of the job ad entry.
Location: Geographical location of the job ad.
Department: Corporate department (e.g. sales).
Salary range: Indicative salary range (e.g. $50,000-$60,000).

HTML fragment

Company profile: A brief company description.
Description: The detailed description of the job ad.
Requirements: Listed requirements for the job opening.
Benefits: Benefits offered by the employer.

Telecommuting: True for telecommuting positions.
Company logo: True if company logo is present.
Questions: True if screening questions are present.
Fraudulent: Classification attribute.
In balanced: Selected for the balanced dataset.

Employment type: Full-time, Part-time, Contract, etc.
Required experience: Executive, Entry level, Intern, etc.
Required education: Doctorate, Master’s Degree, Bachelor, etc.
Industry: Automotive, IT, Health care, Real estate, etc.
Function: Consulting, Engineering, Research, Sales, etc.
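The fields described above could be modelled as a typed record when loading the dataset. This is a minimal sketch: the snake_case attribute names, the Python types and the grouping comments are assumptions chosen for illustration, and the actual column names and encodings in the distributed files may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobAd:
    """Hypothetical in-memory representation of one EMSCAD record."""
    # Short text fields
    title: str
    location: Optional[str] = None
    department: Optional[str] = None
    salary_range: Optional[str] = None
    # HTML fragments
    company_profile: Optional[str] = None
    description: Optional[str] = None
    requirements: Optional[str] = None
    benefits: Optional[str] = None
    # True/false attributes
    telecommuting: bool = False
    company_logo: bool = False
    questions: bool = False
    fraudulent: bool = False      # classification attribute
    in_balanced: bool = False     # selected for the balanced dataset
    # Attributes drawn from a fixed set of values
    employment_type: Optional[str] = None
    required_experience: Optional[str] = None
    required_education: Optional[str] = None
    industry: Optional[str] = None
    function: Optional[str] = None

# Made-up example record, not taken from the dataset.
ad = JobAd(title="Sales assistant", employment_type="Part-time", telecommuting=True)
print(ad.title, ad.fraudulent)
```

Optional fields default to None here on the assumption that real job ads often leave attributes such as salary range or department empty.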


To date, the following universities, research labs and companies have requested this dataset: