Employment Scam Aegean Dataset

University of the Aegean

Laboratory of Information & Communication Systems Security

Brief

The Employment Scam Aegean Dataset (EMSCAD) is a publicly available dataset of 17,880 real-life job ads. It aims to give the research community a clear picture of the employment scam problem and to serve as a testbed for scientists working in the field. Our first publication is available online in the MDPI journal Future Internet.

EMSCAD records were manually annotated and classified into two categories: the dataset contains 17,014 legitimate and 866 fraudulent job ads, published between 2012 and 2014.
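The class counts above imply a strongly imbalanced dataset. As a small worked illustration (derived only from the two counts stated here), the snippet below computes the fraud rate, the accuracy a trivial "always legitimate" classifier would reach, and the legitimate-to-fraudulent ratio. The variable names are illustrative.

```python
# Class counts as stated in the dataset description.
LEGITIMATE = 17_014
FRAUDULENT = 866
TOTAL = LEGITIMATE + FRAUDULENT  # should equal the 17,880 ads reported

# Share of fraudulent ads in the full dataset.
fraud_rate = FRAUDULENT / TOTAL

# Accuracy of a baseline that labels every ad as legitimate.
majority_baseline_accuracy = LEGITIMATE / TOTAL

# Number of legitimate ads per fraudulent ad.
imbalance_ratio = LEGITIMATE / FRAUDULENT

print(TOTAL)                                 # 17880
print(f"{fraud_rate:.2%}")                   # 4.84%
print(f"{majority_baseline_accuracy:.2%}")   # 95.16%
print(f"{imbalance_ratio:.1f}")              # 19.6
```

Because a model that never predicts fraud already scores about 95% accuracy, metrics such as precision, recall, or F1 on the fraudulent class are likely more informative when evaluating on the full dataset.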

Email addresses, phone numbers, and URLs found in the texts were masked using the pattern #(EMAIL|PHONE|URL)_Keyed_SHA2#.
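A minimal sketch of how such tokens might be produced and consumed, under two assumptions that are not stated above: that the Keyed_SHA2 part is a hexadecimal digest, and that the keyed hash is HMAC-SHA256 (one common keyed SHA-2 construction). The key and the helper names `mask_value`, `count_masked`, and `generalize_masked` are hypothetical; this is not the procedure actually used to build EMSCAD.

```python
import hashlib
import hmac
import re
from collections import Counter

# Matches tokens such as #URL_<hex digest>#.
# ASSUMPTION: the Keyed_SHA2 part is a hexadecimal digest.
MASK_RE = re.compile(r"#(EMAIL|PHONE|URL)_([0-9a-fA-F]+)#")


def mask_value(kind: str, value: str, key: bytes) -> str:
    """Hypothetical masking step using HMAC-SHA256 as the keyed SHA-2 hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"#{kind}_{digest}#"


def count_masked(text: str) -> Counter:
    """Count masked tokens by type (EMAIL, PHONE, URL) in one text field."""
    return Counter(m.group(1) for m in MASK_RE.finditer(text))


def generalize_masked(text: str) -> str:
    """Replace each masked token with a type-only placeholder such as [URL]."""
    return MASK_RE.sub(lambda m: f"[{m.group(1)}]", text)


if __name__ == "__main__":
    key = b"example-secret"  # hypothetical key, for illustration only
    ad = "Apply at " + mask_value("URL", "http://example.com", key) + " today."
    print(count_masked(ad))       # Counter({'URL': 1})
    print(generalize_masked(ad))  # Apply at [URL] today.
```

A keyed hash keeps masking deterministic (the same address always maps to the same token, so repeated contact details across ads remain detectable) while preventing anyone without the key from confirming a guessed value by hashing it.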



Dataset description

| Name | Type | Description |
|---|---|---|
| Title | String | The title of the job ad entry. |
| Location | String | Geographical location of the job ad. |
| Department | String | Corporate department (e.g. sales). |
| Salary range | String | Indicative salary range (e.g. $50,000-$60,000). |
| Company profile | HTML fragment | A brief company description. |
| Description | HTML fragment | The detailed description of the job ad. |
| Requirements | HTML fragment | Listed requirements for the job opening. |
| Benefits | HTML fragment | Benefits offered by the employer. |
| Telecommuting | Binary | True for telecommuting positions. |
| Company logo | Binary | True if a company logo is present. |
| Questions | Binary | True if screening questions are present. |
| Fraudulent | Binary | Classification attribute (True for fraudulent job ads). |
| In balanced | Binary | True if the entry was selected for the balanced dataset. |
| Employment type | Nominal | Full-time, Part-time, Contract, etc. |
| Required experience | Nominal | Executive, Entry level, Intern, etc. |
| Required education | Nominal | Doctorate, Master’s Degree, Bachelor, etc. |
| Industry | Nominal | Automotive, IT, Health care, Real estate, etc. |
| Function | Nominal | Consulting, Engineering, Research, Sales, etc. |
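As a rough illustration of how the four attribute types (string, HTML fragment, binary, nominal) might be handled in code, the sketch below parses a subset of the fields from a CSV export. The CSV layout, the snake_case column names (e.g. `has_company_logo`), and the `t`/`f` encoding of binary fields are assumptions rather than documented properties of EMSCAD; `JobAd`, `parse_row`, `parse_bool`, and `html_to_text` are hypothetical helpers, and the sample row is made up.

```python
import csv
import io
from dataclasses import dataclass
from html.parser import HTMLParser

# ASSUMPTION: binary fields are encoded as "t"/"f" (or "true"/"1").
TRUE_VALUES = {"t", "true", "1"}


def parse_bool(raw: str) -> bool:
    """Map an assumed textual boolean encoding to a Python bool."""
    return raw.strip().lower() in TRUE_VALUES


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Reduce an HTML fragment field to whitespace-normalized plain text."""
    extractor = _TextExtractor()
    extractor.feed(fragment)
    extractor.close()  # flush any buffered text
    return " ".join(" ".join(extractor.parts).split())


@dataclass
class JobAd:
    title: str              # string
    location: str           # string
    description: str        # HTML fragment, reduced to plain text
    telecommuting: bool     # binary
    has_company_logo: bool  # binary
    fraudulent: bool        # binary classification attribute
    employment_type: str    # nominal, e.g. "Full-time"


def parse_row(row: dict[str, str]) -> JobAd:
    # Hypothetical column names; adjust to the headers of the actual file.
    return JobAd(
        title=row["title"],
        location=row["location"],
        description=html_to_text(row["description"]),
        telecommuting=parse_bool(row["telecommuting"]),
        has_company_logo=parse_bool(row["has_company_logo"]),
        fraudulent=parse_bool(row["fraudulent"]),
        employment_type=row["employment_type"],
    )


if __name__ == "__main__":
    # Made-up sample row in the assumed format.
    sample = io.StringIO(
        "title,location,description,telecommuting,has_company_logo,fraudulent,employment_type\n"
        'Sales Rep,"US, NY, New York","<p>Sell <b>things</b></p>",f,t,f,Full-time\n'
    )
    for ad in (parse_row(r) for r in csv.DictReader(sample)):
        print(ad)
```

Keeping nominal attributes as raw strings leaves the choice of encoding (one-hot, ordinal for required experience or education) to the downstream model.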

Downloads

To date, the following universities, research labs, and companies have requested this dataset: