Skip to main content

Building a Project Management Tool: Django CRUD with PostgreSQL and ETL Integration

·1366 words·7 mins
Table of Contents

Managing infrastructure projects efficiently requires robust data tracking and seamless information flow between teams. In this post, I’ll walk you through how I built a comprehensive project management system using Django, PostgreSQL, and Pentaho Kettle to address these challenges in a real-world scenario.

The Challenge
#

Infrastructure projects involve complex hierarchical relationships, multiple stakeholders, various contracts, and constantly changing data. Traditional spreadsheet-based tracking systems often fall short when it comes to:

  • Managing project hierarchies and dependencies
  • Tracking staff assignments across multiple projects
  • Maintaining data integrity across different systems
  • Providing real-time visibility into project status
  • Facilitating efficient batch updates from external data sources

System Architecture Overview
#

The solution I developed consists of three main components:

  1. Django Web Application - Provides the user interface and business logic
  2. PostgreSQL Database - Stores project data with robust referential integrity
  3. Pentaho Kettle ETL Pipeline - Handles batch updates and data synchronization

Database Design and Models
#

The heart of the system lies in a well-designed database schema that captures the complex relationships in infrastructure project management.

Core Models
#

class Program(models.Model):
    program_id = models.CharField(max_length=50, primary_key=True)
    program_name = models.CharField(max_length=255)
    description = models.TextField(blank=True, null=True)

Programs represent high-level organizational units that group related projects together. This hierarchical approach allows for better resource allocation and strategic oversight.

class Project(models.Model):
    project_id = models.CharField(max_length=50, primary_key=True)
    parent_project = models.ForeignKey('self', on_delete=models.SET_NULL, null=True, blank=True)
    program = models.ForeignKey('Program', on_delete=models.CASCADE)
    title = models.CharField(max_length=255)
    activity_status = models.CharField(max_length=50, choices=[('Active', 'Active'), ('Inactive', 'Inactive')])
    start_date = models.DateField()
    end_date = models.DateField(null=True, blank=True)
    current_phase = models.CharField(max_length=255, blank=True, null=True)
    project_manager = models.ForeignKey('Staff', on_delete=models.SET_NULL, null=True)

The Project model is the centerpiece, featuring:

  • Self-referencing relationships for handling sub-projects
  • Program association for organizational hierarchy
  • Flexible date handling for ongoing projects
  • Manager assignment for accountability

Staff and Contract Management
#

class Staff(models.Model):
    name = models.CharField(max_length=255, unique=True)
    bureau = models.CharField(max_length=255, blank=True, null=True)
    email = models.EmailField(unique=True)

class ProjectStaff(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    staff = models.ForeignKey(Staff, on_delete=models.CASCADE)
    role = models.CharField(max_length=255)
    
    class Meta:
        unique_together = ('project', 'staff', 'role')

The many-to-many relationship between projects and staff allows for flexible role assignments while maintaining data integrity through unique constraints.

PostgreSQL Configuration
#

Leveraging PostgreSQL’s robustness was crucial for this enterprise-level application:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'postgres',
        'USER': 'postgres',
        'PASSWORD': '75395123',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}

PostgreSQL provides:

  • ACID compliance for data integrity
  • Advanced indexing for fast queries
  • JSON field support for flexible data structures
  • Excellent Django ORM integration

Building the CRUD Interface
#

Class-Based Views for Efficiency
#

I implemented a mix of function-based and class-based views to handle different complexity levels:

def project_list(request):
    projects = Project.objects.all()
    return render(request, 'projects/project_list.html', {'projects': projects})

def project_create(request):
    if request.method == 'POST':
        form = ProjectForm(request.POST)
        if form.is_valid():
            form.save()
            return redirect('project_list')
    else:
        form = ProjectForm()
    return render(request, 'projects/project_form.html', {'form': form})

For more complex operations, class-based views provided better organization:

class ProjectStaffManageView(ListView):
    model = ProjectStaff
    template_name = 'projects/project_lvl_staff_list.html'
    context_object_name = 'project_staff'

    def get_queryset(self):
        return ProjectStaff.objects.filter(
            project_id=self.kwargs['pk']
        ).select_related('staff')

Form Design with Crispy Forms
#

Django Crispy Forms integration provided clean, Bootstrap-compatible forms:

class ProjectForm(forms.ModelForm):
    class Meta:
        model = Project
        fields = ['project_id', 'parent_project', 'program', 'title', 
                 'activity_status', 'start_date', 'end_date', 'current_phase', 
                 'project_manager']
        widgets = {
            'start_date': forms.DateInput(format=('%Y-%m-%d'), attrs={'type': 'date'}),
            'end_date': forms.DateInput(format=('%Y-%m-%d'), attrs={'type': 'date'}),
        }

Frontend: Bootstrap for Responsive Design
#

The user interface leverages Bootstrap 5 for a modern, responsive experience:

<nav class="navbar navbar-expand-lg navbar-light bg-light">
    <div class="container-fluid">
        <a class="navbar-brand" href="{% url 'home' %}">PMB Project List</a>
        <div class="collapse navbar-collapse" id="navbarNav">
            <ul class="navbar-nav">
                <li class="nav-item">
                    <a class="nav-link" href="{% url 'project_list' %}">Projects</a>
                </li>
                <li class="nav-item">
                    <a class="nav-link" href="{% url 'project_staff_list' %}">All Project Staff</a>
                </li>
            </ul>
        </div>
    </div>
</nav>

DataTables Integration
#

For enhanced data presentation, I integrated DataTables for sorting, filtering, and pagination:

$(document).ready(function() {
    $('#projectsTable thead tr:eq(1) th').each( function (i) {
        var title = $(this).text();
        $(this).html( '<input type="text" placeholder="Search '+title+'" />' );
        
        $( 'input', this ).on( 'keyup change clear', function () {
            if ( table.column(i).search() !== this.value ) {
                table.column(i).search( this.value ).draw();
            }
        } );
    } );

    var table = $('#projectsTable').DataTable({
        orderCellsTop: true,
        fixedHeader: true,
    });
});

This provides users with powerful search and filtering capabilities directly in the browser.

ETL Integration with Pentaho Kettle
#

While the Django application handles day-to-day CRUD operations, Pentaho Kettle (PDI) manages batch data operations and external system integration.

Why Pentaho Kettle?
#

  1. Visual ETL Design - Drag-and-drop interface for complex transformations
  2. Database Connectivity - Native PostgreSQL support
  3. Scheduling Capabilities - Automated batch processing
  4. Error Handling - Robust logging and error recovery
  5. Performance - Optimized for large dataset processing

ETL Pipeline Architecture
#

The ETL process typically follows this pattern:

  1. Extract - Pull data from external sources (CSV files, other databases, APIs)
  2. Transform - Clean, validate, and format data according to business rules
  3. Load - Insert or update records in the PostgreSQL database
  4. Validation - Verify data integrity and generate reports

Benefits of the ETL Approach
#

  • Batch Processing - Handle large volumes of data efficiently
  • Data Validation - Implement business rules before database insertion
  • Audit Trail - Track all data changes and transformations
  • Decoupled Architecture - ETL processes run independently of the web application

Key Features and Functionality
#

1. Hierarchical Project Management
#

  • Parent-child project relationships
  • Program-level organization
  • Flexible project phase tracking

2. Staff Assignment Management
#

  • Multiple roles per person per project
  • Bureau-based organization
  • Email-based communication integration

3. Contract Tracking
#

  • Multiple contracts per project
  • Vendor and contract manager tracking
  • Date-based contract lifecycle management

4. Advanced Search and Filtering
#

  • Column-based filtering in DataTables
  • Real-time search across all project attributes
  • Export capabilities for reporting

5. Responsive Design
#

  • Mobile-friendly Bootstrap interface
  • Progressive enhancement for desktop users
  • Accessibility considerations

Performance Optimizations
#

Database Level
#

  • Select Related - Reduce N+1 query problems
  • Database Indexing - Optimize frequently searched fields
  • Query Optimization - Use Django’s ORM efficiently
def get_queryset(self):
    return ProjectStaff.objects.filter(
        project_id=self.kwargs['pk']
    ).select_related('staff')  # Prevents N+1 queries

Application Level
#

  • Caching Strategy - Redis for session and query caching
  • Pagination - Handle large datasets efficiently
  • Lazy Loading - Load related data only when needed

Security Considerations
#

Data Protection
#

  • SQL Injection Prevention - Django ORM provides automatic protection
  • CSRF Protection - Built-in Django middleware
  • Input Validation - Form-based validation at multiple layers
  • Access Control - User authentication and authorization

Database Security
#

  • Connection Encryption - SSL connections to PostgreSQL
  • Least Privilege Access - Database users with minimal required permissions
  • Regular Backups - Automated backup strategies with point-in-time recovery

Lessons Learned
#

What Worked Well
#

  1. Django’s ORM made complex relationships manageable
  2. Bootstrap provided rapid UI development
  3. PostgreSQL handled complex queries efficiently
  4. Crispy Forms streamlined form development
  5. DataTables enhanced user experience significantly

Areas for Improvement
#

  1. API Development - Future integration with mobile applications
  2. Real-time Updates - WebSocket integration for live data updates
  3. Advanced Reporting - More sophisticated analytics and dashboards
  4. Microservices Architecture - Break down monolithic structure
  5. Container Deployment - Docker-based deployment strategy

Future Enhancements
#

Planned Features
#

  • REST API - Enable third-party integrations
  • Advanced Analytics - Project performance metrics and trends
  • Document Management - File upload and version control
  • Notification System - Email and in-app notifications
  • Mobile Application - Native iOS/Android apps

Technical Improvements
#

  • Async Processing - Celery for background tasks
  • Caching Layer - Redis for improved performance
  • Load Balancing - Handle increased user load
  • Monitoring - Application performance monitoring
  • CI/CD Pipeline - Automated testing and deployment

Conclusion
#

Building this project management tool taught me valuable lessons about enterprise application development. The combination of Django’s robust framework, PostgreSQL’s reliability, and Pentaho Kettle’s ETL capabilities created a powerful solution for infrastructure project tracking.

The key to success was:

  • Understanding the domain - Infrastructure projects have unique requirements
  • Choosing the right tools - Each technology serves a specific purpose
  • Planning for scalability - Building with future growth in mind
  • User-centric design - Focusing on actual user workflows
  • Data integrity - Ensuring reliable information for decision-making

This project demonstrates how modern web technologies can solve real-world business problems while providing a foundation for future enhancements. The modular architecture allows for incremental improvements while maintaining system stability.

Whether you’re building a similar project management system or exploring Django for enterprise applications, I hope this walkthrough provides valuable insights into the development process and architectural decisions that make such systems successful.


Want to learn more about Django, PostgreSQL, or ETL processes? Feel free to reach out or check out the related articles in my development series.