Undirected email correspondence between users of a large organization with over 1,000 individuals for four consecutive years (2007-2010)

This dataset records undirected email correspondence between users of a large organization with over 1,000 individuals over four consecutive years (2007-2010). For this period, we have the sender, the receiver, and the total number of emails sent within the organization using corporate email addresses. To preserve users' privacy, individuals are fully anonymized and we do not have access to email content (see Ethics statement).

The data is in the following format:

user1ID user2ID #emails

Where #emails is the total number of emails exchanged (sent and received) between the two users in one calendar year. There is one file per year.
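As an illustration of the format above, the following is a minimal Python sketch for reading one yearly file into an undirected weighted edge list. It rests on several assumptions not stated in the description: fields are separated by whitespace, #emails is a non-negative integer, user IDs can be treated as opaque strings, and each pair of users appears on one line only (if a file listed both orientations of a pair, the summing below would double-count). The function names `read_edge_list` and `strength` and the sample records are hypothetical.

```python
import io
from collections import defaultdict


def read_edge_list(lines):
    """Parse 'user1ID user2ID #emails' records into {(userA, userB): emails}.

    Assumes whitespace-separated fields. Lines that do not have exactly
    three fields, or whose third field is not a non-negative integer
    (for example a header line), are skipped.
    """
    edges = defaultdict(int)
    for raw in lines:
        parts = raw.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        user_a, user_b = sorted(parts[:2])  # undirected: order the pair
        edges[(user_a, user_b)] += int(parts[2])
    return dict(edges)


def strength(edges):
    """Total number of emails exchanged by each user in one yearly file."""
    totals = defaultdict(int)
    for (user_a, user_b), count in edges.items():
        totals[user_a] += count
        if user_b != user_a:
            totals[user_b] += count
    return dict(totals)


# Made-up records standing in for one yearly file.
sample = io.StringIO("a1 b2 12\nb2 c3 5\nc3 a1 1\n")
edges = read_edge_list(sample)
totals = strength(edges)
```

To read a real file, pass an open file handle instead of the in-memory sample, for example `with open(path) as fh: edges = read_edge_list(fh)`, where `path` points to one of the yearly files.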

Ethics statement

This data is exempt from IRB review because: i) the research involves the study of existing data, namely email logs from 2007 to 2010 that the IT service of the organization archived routinely, as mandated by law; ii) the information is recorded by the investigators in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects. Indeed, subjects were assigned a "hash" by the IT service before our research began, so none of the investigators can link a "hash" back to its subject. We have no demographic information of any kind, so de-anonymization is also impossible.

Archival

The dataset is permanently stored at Figshare.

If you use this dataset in a publication, please cite the following articles:
  • Long-term email network: Email network of an organization for 4 years