Collecting data is not only a technical process; it is also an ethical responsibility. Data collection must follow legal, regulatory, and moral guidelines to ensure user privacy, secure handling, and responsible usage. This submodule covers the fundamentals of how data is collected and the principles that govern ethical data practices.
Data Collection Methods
Data can be collected through multiple methods depending on the nature of the project:
1. Direct Collection
Users voluntarily share data via forms, surveys, applications, or feedback mechanisms.
Examples: on-boarding details, demographic information, customer surveys.
2. Automated System Logs
Digital platforms record interactions automatically.
Examples: clickstream, app usage, scroll depth, transaction logs.
3. Sensor-Based Collection
IoT devices collect data passively.
Examples: temperature sensors, GPS data, machine logs.
4. Third-Party Data Acquisition
Organizations purchase or license data.
Examples: credit bureau data, market datasets.
Each method has its own challenges in terms of accuracy, completeness, and compliance. Understanding how data enters the system helps ensure that preprocessing is correct and ethical boundaries are maintained.
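One way to keep track of how data entered the system is to attach the collection method to every record at ingestion time. The following is a minimal sketch, not a prescribed design: the names Source, CollectedRecord, and needs_license_check are hypothetical, and the rule that third-party data triggers a license review is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    """The four collection methods described above."""
    DIRECT = "direct"            # forms, surveys, feedback
    SYSTEM_LOG = "system_log"    # clickstream, transaction logs
    SENSOR = "sensor"            # IoT readings, GPS data
    THIRD_PARTY = "third_party"  # purchased or licensed datasets


@dataclass
class CollectedRecord:
    payload: dict
    source: Source
    # Record when the data arrived, in UTC, for later audits.
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def needs_license_check(record: CollectedRecord) -> bool:
    """Hypothetical rule: third-party data is reviewed against its license terms."""
    return record.source is Source.THIRD_PARTY


survey = CollectedRecord({"age_band": "25-34"}, Source.DIRECT)
bureau = CollectedRecord({"credit_score": 710}, Source.THIRD_PARTY)
print(needs_license_check(survey), needs_license_check(bureau))
```

With the source stored alongside the payload, preprocessing and compliance steps can branch on where a record came from instead of guessing from its shape.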
Ethical Considerations in Data Collection
Ethical practice keeps data science responsible, fair, and respectful of user rights. Data misuse can lead to loss of trust, legal penalties, reputational damage, and biased models.
Key ethical principles include:
1. Transparency
Users must know what data is being collected and why. Hidden or unclear data practices undermine trust and can breach privacy regulations.
2. Consent
Collection must be based on explicit, informed, and freely given consent. Consent also needs to be revocable at any time.
3. Data Minimization
Collect only the data that is necessary. Unnecessary data increases risk and storage cost and complicates compliance.
4. Privacy and Anonymization
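Personal data must be anonymized or pseudonymized wherever possible to protect identities and reduce privacy risks. Pseudonymized data can still be linked back to a person with additional information, so it remains personal data under laws such as GDPR.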
5. Fairness and Bias Prevention
Data must be collected in a way that does not disadvantage any group. Biased data leads to biased models, which can propagate unfair outcomes.
6. Security and Compliance
Organizations must comply with applicable laws such as GDPR, CCPA, and HIPAA, as well as local privacy regulations. Strong encryption, access control, and monitoring are essential.
7. Responsible Usage
Data collected for one purpose must not be reused for another without consent. For example, using app data for targeted advertising without permission violates ethical boundaries.
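Several of these principles can be enforced in code at the point where a record is prepared for use. The sketch below combines a consent check tied to a specific purpose (Consent, Responsible Usage), a per-purpose allow-list of fields (Data Minimization), and a keyed hash of the user identifier (pseudonymization). Everything named here is an assumption for illustration: PSEUDONYM_KEY, ALLOWED_FIELDS, the purposes "analytics" and "advertising", and the field names are hypothetical, and a real system would load the key from a secrets manager.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secrets manager.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

# Hypothetical allow-list of fields per purpose (data minimization).
ALLOWED_FIELDS = {
    "analytics": {"user_id", "country", "plan"},
    "advertising": {"user_id", "country", "interests"},
}


def pseudonymize(user_id: str) -> str:
    """Replace an identifier with a keyed hash (pseudonymization, not anonymization)."""
    return hmac.new(PSEUDONYM_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_record(record: dict, purpose: str, consented_purposes: set) -> dict:
    """Return a minimized, pseudonymized copy of the record for one purpose."""
    # Consent is checked per purpose, so data is not silently reused.
    if purpose not in consented_purposes:
        raise PermissionError(f"no consent recorded for purpose: {purpose}")
    # Keep only the fields this purpose needs; unknown purposes keep nothing.
    allowed = ALLOWED_FIELDS.get(purpose, set())
    minimized = {k: v for k, v in record.items() if k in allowed}
    if "user_id" in minimized:
        minimized["user_id"] = pseudonymize(minimized["user_id"])
    return minimized


raw = {"user_id": "u-1001", "email": "a@example.com", "country": "IN", "plan": "pro"}
consents = {"analytics"}  # the user agreed to analytics only
clean = prepare_record(raw, "analytics", consents)
print(sorted(clean))  # the email field is dropped
```

Calling prepare_record(raw, "advertising", consents) raises PermissionError because the user never agreed to that purpose. Revoking consent amounts to removing the purpose from the user's set, after which the same call fails for that purpose as well.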