Data organization refers to:
Good data organization considers:
You'll want to plan and use a file and folder organizational structure that is informative and keeps all the research information and materials associated with a project together.
Here are some tips for organizing directories:
Directory Structure Examples:
| Organized by File Type | Organized by Analysis |
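A planned folder structure can also be created by script so every project starts from the same layout. The sketch below is one possible approach, not a prescribed structure: the folder names in PROJECT_FOLDERS and the function name create_project_skeleton are hypothetical and should be replaced with whatever fits your project.

```python
from pathlib import Path

# Hypothetical folder names, chosen only for illustration.
PROJECT_FOLDERS = ["data/raw", "data/processed", "docs", "scripts", "results"]


def create_project_skeleton(root, folders=PROJECT_FOLDERS):
    """Create the project root and its subfolders; return the folder paths."""
    root = Path(root)
    created = []
    for name in folders:
        path = root / name
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # A top-level README keeps the documentation together with the data.
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text(
            "Project: describe the contents and folder layout here.\n",
            encoding="utf-8",
        )
    return created
```

Calling create_project_skeleton("my_project") would create the folders under my_project in the current working directory and leave any existing files untouched.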
File Naming
Make file names unique. Avoid using the same file name in different folders—this can cause confusion or even data loss.
Include key details in the file name to describe what it is. This might include:
Date (use YYYYMMDD or YYYY-MM-DD for easy sorting)
Project name
Experiment or sample name
Data type (raw, processed, final, notes, etc.)
Instrument or location
Author or creator’s name
Version number (e.g., v01, v02)
Be clear and consistent:
Use short, meaningful terms or abbreviations
Use CamelCase, dashes ( - ) or underscores ( _ ) to separate words
Avoid spaces and special characters
Use leading zeros for numbers (e.g., 01, 02, 03... 10) so files sort properly
Arrange details from general to specific to help files group logically in folders.
Keep names to under 50 characters if possible, for easier reading and compatibility.
Record your naming rules and apply them consistently.
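One way to apply naming rules consistently is to generate and check file names with a small script. The sketch below is an illustration under assumed rules: the element order (date, project, sample, data type, version), the underscore separator, and the function names build_file_name and is_valid_file_name are all assumptions, not a required convention.

```python
import re
from datetime import date

MAX_LENGTH = 50  # assumed limit, following the "under 50 characters" guideline


def build_file_name(project, sample, data_type, version, extension, when=None):
    """Join name elements from general to specific, separated by underscores."""
    when = when or date.today()
    parts = [when.strftime("%Y%m%d"), project, sample, data_type, f"v{version:02d}"]
    cleaned = []
    for part in parts:
        # Drop spaces and special characters; keep letters, digits, and dashes.
        part = re.sub(r"[^A-Za-z0-9-]", "", part)
        if not part:
            raise ValueError("every name element needs at least one valid character")
        cleaned.append(part)
    return "_".join(cleaned) + "." + extension.lstrip(".")


def is_valid_file_name(name):
    """Check a name against the assumed rules: safe characters and length."""
    pattern = r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+"
    return len(name) < MAX_LENGTH and re.fullmatch(pattern, name) is not None


name = build_file_name("SoilStudy", "Plot 03", "raw", 2, "csv", when=date(2024, 3, 15))
print(name)  # 20240315_SoilStudy_Plot03_raw_v02.csv
```

Because the date comes first and the version number is zero-padded, files named this way sort chronologically and then by version in a folder listing.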
Data documentation is the information you create to explain your research data so that others (and your future self) can understand and use it correctly. It includes details about how the data were collected, what each variable means, how files are organized, and any processing steps that were taken. Common forms of documentation include README files, data dictionaries, and codebooks. Clear, consistent documentation is essential for ensuring your data is reusable, reproducible, and meaningful over time.
Research data doesn’t explain itself. Without context, it can be difficult—or even impossible—for others (or even you in the future) to understand what the data means, how it was collected, or how it should be used. That’s where metadata comes in.
Metadata is descriptive information about your data. It tells the who, what, when, where, why, and how behind your research. Think of it as the user manual for your dataset.
Good metadata should include:
The purpose of the project or study
Who collected or created the data
When and where the data was collected
What instruments, tools, or software were used
How the data was processed or transformed
Descriptions of variables, file formats, and units of measurement
Any limitations, assumptions, or known issues with the data
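The elements above can be captured in a simple machine-readable record that is stored alongside the data. The sketch below is only an illustration: the field names in REQUIRED_FIELDS are hypothetical and do not follow any particular metadata standard, so a discipline-specific standard should take precedence where one exists.

```python
import json

# Hypothetical field names that mirror the elements listed above.
REQUIRED_FIELDS = [
    "purpose",
    "creators",
    "collection_dates",
    "collection_location",
    "instruments",
    "processing",
    "variables",
    "known_issues",
]


def missing_fields(metadata):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def write_metadata(metadata, path):
    """Write the record as JSON, refusing to save an incomplete one."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("metadata is missing: " + ", ".join(missing))
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, ensure_ascii=False)
```

Checking for missing fields before saving is a design choice that makes gaps visible while the details are still fresh, rather than months later.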
Creating thorough metadata helps:
You remember the details of your work months or years later
Others understand and reuse your data properly
Make your data easier to find in repositories or through search engines
Support long-term preservation and reproducibility
There are many established metadata standards designed for different disciplines. Find them via the links below.
README files explain the contents of your data files. They tell others (or your future self) what the files are, how they were created, and how to use them, so that your data can be interpreted correctly. They might include:
Data dictionaries describe each variable in your dataset. They explain:
This makes your data easier to understand, analyze, and reuse.
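A first draft of a data dictionary can be generated from the column headers of a CSV file and then completed by hand. The sketch below is a starting point under stated assumptions: the entry fields (variable, type, example, description, units) and the function names are hypothetical, and the type guess is deliberately crude.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label from the non-empty values in a column."""
    non_empty = [value for value in values if value.strip()]
    if not non_empty:
        return "unknown"
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for value in non_empty:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_file):
    """Build one draft entry per column from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] or "" for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(values),
            "example": next((v for v in values if v.strip()), ""),
            "description": "",  # to be written by the researcher
            "units": "",  # to be written by the researcher
        })
    return entries


sample = io.StringIO("site_id,temp_c,notes\nA1,21.5,clear\nA2,19,\n")
draft = draft_data_dictionary(sample)
```

The description and units are left blank on purpose, since only the person who collected the data can supply them reliably.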
Codebooks are sometimes used in place of a data dictionary, but can also mean something more detailed. Codebooks can include:
Codebooks are extremely helpful when working with complex surveys or instruments.
Researchers are increasingly publishing their research protocols, particularly when publisher word limits leave too little room to describe methods in enough detail for others to reproduce them. Here are links to places where protocols can be discovered or published.