REVSYS Blog: Latest News and Tips from REVSYS
https://www.revsys.com/blog/feeds/latest/

Happy Holidays from REVSYS
http://www.revsys.com/tidbits/happy-holidays-from-revsys/
Wed, 24 Dec 2025 15:39:00 +0000

Django Christmas Pony

’Twas the deploy before Christmas, when all through the office
Not a pipeline was failing, not even a bug.
The containers were built by the runner with care,
In hopes that green checkmarks soon would be there.

The devs were all nestled, eyes burning bright,
While Postgres ran queries all snug through the night.
And I in my hoodie, with coffee in hand,
Had just kicked off migrations like “this should go as planned…”

When out in the cluster there arose such a clatter,
I sprang to the logs to see what was the matter.
Away to kubectl I flew like a flash,
kubectl get pods and I gasped at the crash.

The moon on the YAML, so cruelly aglow,
Gave luster to objects in CrashLoopBackOff below.
When what to my weary eyes should appear,
But a brand-new Ingress and a cert-manager deer.

With a little old operator, so lively and quick,
I knew in a moment: it’s GitOps’ old trick.
More rapid than eagles the webhooks they came,
And it whistled and shouted and called them by name:

“Now Django! Now Gunicorn! Now Celery, hurry!
On Redis! On Nginx! No time to be blurry!
To the top of the node pool! To the edge of the firewall!
Now roll out! roll out! roll out!”

As requests that before the load balancer fly,
When they meet with an outage, they spin and retry,
So up to the Service the packets they flew,
With headers and cookies and CSRF too!

And then, in a twinkling, I heard on the page
The clickity-clack of a React built in rage.
As I drew in my breath and was turning around,
HTMX swapped the div with barely a sound.

It was dressed all in Tailwind, from head to toe,
And its DOM was all tidy—no extra “hello.”
A bundle of modules it had flung on its back,
And it looked like a vendor just opening a pack.

The design how it twinkled! JS console how merry!
Its warnings were minor, not scary, not hairy.
Its droll little tooltips were perfectly placed,
And the lint on its beard had been neatly erased.

It spoke not a word, but went straight to its work,
Fixed CORS and SameSite (no need to be a jerk),
Then laying a finger aside of its nose,
It patched my .env and up prod it rose.

It sprang to the cluster, to its team gave a Slack message,
And away they all scaled like a well-tuned missile.
But I heard it exclaim, ere it rolled out of sight:

“Happy Christmas to all, and to all an outage-free night!”

Sometimes LFU > LRU
http://www.revsys.com/tidbits/sometimes-lfu-lru/
Sun, 17 Aug 2025 21:46:00 +0000

Stop letting bot traffic evict your customers' sessions. A simple Redis configuration switch from LRU to LFU solved our crawler problem, with a Django configuration example.

We've seen a dramatic increase in malicious AI/web crawlers with our clients.

These crawlers aren't honoring things like robots.txt or headers to indicate a page shouldn't be crawled (think links to facets like those in search results) that cause crawlers to get lost in a web of mostly useless or duplicated content.

On top of that annoyance they're making tens if not hundreds of requests per second for hours at a time.

These sites are reasonably fast and in many cases are behind awesome cache systems like Cloudflare, so why is it impacting our clients negatively?

User sessions! These bots are creating thousands of new sessions that push the real human user's sessions out of the cache.

We're using Django's sessions framework, but this technique works for any browser session token where you want to retain real human users' sessions for as long as possible and expire the robotic ones.

Let's talk about sessions

Our default Django site template defaults to using the cached_db backend. This tries to get the session from Django's default cache (Redis in most cases); if it's not found, it is retrieved from the database and stored in the cache for the next request.

Because of these asshole crawlers we've had to rethink this default and instead only keep the session in a cache for some of our customers.

On most sites it doesn't really matter if the session data is retained. The worst case is that the user is forced to log in again.

With other sites, the user's session is absolutely crucial to the business. A simple example: we have e-commerce clients who want to provide a nice experience to their users. They want a random Internet user to be able to add items to their cart over time and only be forced to log in to actually check out.

You need a session for that and you need that session to persist in cache for a reasonable amount of time.

Default Redis

By default Redis will use as much memory as it's allowed, and under its default noeviction policy it won't evict keys when it hits that limit, even keys that have a TTL set. Instead, writes start failing, which usually causes whatever session system you're using to raise an exception, and your site is broken.

LRU (Least Recently Used) cache expiration is probably the most commonly known way to expire data. It's built into Python and it is the default for systems like memcached. You could configure or customize your session handling to use an LRU mechanism to keep RAM usage under control, but then you still have a problem.

When you get a half million bogus AI requests in an hour and you only have RAM to hold say half of that, your real human customer who painstakingly curated the perfect cart of high dollar premium widgets but had the audacity to take a break to make dinner returns to their browser to find an empty cart and a fresh session.

And unless you have a monopoly on widgets, they're likely to take their business elsewhere. I mean, if you can't keep my cart for an hour why would I trust you with more complicated things like widget manufacturing and shipping logistics!?!?

LFU to the Rescue

I didn't know this until recently, but Redis can be configured with a "volatile" eviction policy, which causes it to evict keys that have a TTL set before that TTL expires when memory runs low. Redis provides several options here, one of which is volatile-lru, which is what most people would probably choose.

However, what we really need here is LFU (Least Frequently Used)! Let me explain.

Each of these bots generates a new session and uses it exactly once. This means that any regular human user who browses around a couple of pages will see their session remain for far longer. Frequent users, such as the website staff, likely would never have their session expire if not for the TTL.

To illustrate this a bit I used an LLM to help me quickly generate a demo of LRU vs LFU expiration in Redis .

The demo shows how we can keep shoving single-use session keys into a Redis configured for LFU while retaining sessions that have been used even just a second time.
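
If you want to poke at the idea without the full demo, here's a minimal sketch of the same experiment using redis-py against a throwaway local Redis (the key names, value sizes, and counts are made up for illustration):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Cap memory and evict the least-frequently-used keys among those with a TTL.
r.config_set("maxmemory", "5mb")
r.config_set("maxmemory-policy", "volatile-lfu")

# A "real" user session that gets touched a couple of times.
r.set("session:real-user", "cart-full-of-widgets", ex=3600)
r.get("session:real-user")
r.get("session:real-user")

# Flood Redis with single-use "bot" sessions until evictions kick in.
for i in range(100_000):
    r.set(f"session:bot-{i}", "x" * 100, ex=3600)

# With volatile-lfu, the frequently read key should survive the flood.
print("real user session still cached:", bool(r.exists("session:real-user")))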

Configuring Redis for LFU

To configure Redis for LFU you need to set these three parameters in your redis.conf:

maxmemory 9mb
maxmemory-policy volatile-lfu 
lazyfree-lazy-eviction yes

This constrains Redis to use at most 9MB of memory. Obviously you're going to want more session storage than that, so adjust it to as much RAM as makes sense for your situation. The second line turns on LFU eviction for keys that have a TTL set (volatile), and the third tells Redis to free evicted keys asynchronously rather than blocking during eviction.

To do this in Docker Compose is fairly easy as well. You just need to write your own config and load it like this:

redis-lfu:
  image: "redis:8.2-alpine"
  command: redis-server /usr/local/redis/redis.conf
  ports:
    - "6001:6379"
  init: true
  stop_signal: SIGKILL
  volumes:
    - ./docker/redis-lfu.conf:/usr/local/redis/redis.conf:ro

Configuring Django Sessions for LFU

I ran into something I didn't think about when configuring Redis this way for a customer, and I want to point it out so you don't make the same mistake: many Django sites use Redis and the Django cache for other things, NOT just sessions. Celery is one good example of where this can go wrong. In that case, we actually deploy two small Redis instances: one configured "normally" and one configured for LFU specifically for user sessions.

So our local compose has services like this:

  redis:
    image: "redis:8.2-alpine"
    ports:
      - "6000:6379"
    init: true
    stop_signal: SIGKILL

  redis-sessions:
    image: "redis:8.2-alpine"
    command: redis-server /usr/local/redis/redis.conf
    ports:
      - "6001:6379"
    init: true
    stop_signal: SIGKILL
    volumes:
      - ./docker/redis-lfu.conf:/usr/local/redis/redis.conf:ro

Obviously you'll need to do something slightly different in Kubernetes or however you're ultimately deploying your production applications.

And our Django settings cache and session changes look like this:

# Define two different caches
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": f"redis://redis:6379/0",
    },
    "sessions": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": f"redis://redis-sessions:6379/0",
    },
}

# Use cached sessions and specifically the LFU configured "sessions" cache 
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "sessions"

Hopefully knowing that Redis can support LFU key expiration proves helpful to you!

P.S. Dear AI/web bot crawlers, can we chat? I know you're in a hurry—gathering that big heap of data for your 'research' or racing to meet your boss's deadline—but maybe honor robots.txt? Or at least keep it to one request at a time?

The Internet might hate you less. Hell, we might even say something nice about your new cryptocurrency or whatever.

Boosting SEO with Django Ninja, Pydantic, and JSON-LD
http://www.revsys.com/tidbits/boosting-seo-with-django-ninja-pydantic-and-json-ld/
Thu, 07 Aug 2025 17:44:00 +0000

Search engines rely on structured data to better understand your pages, and providing it can improve your website's search ranking. JSON-LD, which stands for JSON for Linking Data, is a special format that allows you to embed machine-readable information into your web pages.

How JSON-LD is used for SEO

JSON-LD is used for search engine optimization (SEO) because it provides search engines with explicit information about the content on your web pages. Instead of relying on algorithms to infer meaning from raw text and HTML, you can use JSON-LD to provide search engines with clear information about your content's meaning, creator, offerings, and relationships with other online resources.

Schema.org provides the basis for this structured information by serving as a universal dictionary of types and their associated properties. By using Schema.org types, you ensure that search engines like Google can understand the information you provide.

Why Structured Data is Important for SEO

  • Structured data enables rich snippets such as star ratings, event times, and recipe instructions in search results. These elements capture users' attention and visually invite them to click on your site, improving your click-through rate.
  • It also informs Google and other search engines about the meaning of your page content. The search engine is better able to tell whether your link is to a product with price and reviews, or a blog post by an author.
  • Structured data can unlock special search features and improve your odds of appearing in knowledge panels, carousels, FAQ boxes, and voice search results.
  • All these features provide a better user experience: when people can preview key information directly in search results, they are more likely to click.
  • Your site stays ahead of the curve as search engines get more semantic and context-aware. Structured data helps future-proof your site by aligning it more closely with a format that is easier for search engines to parse.

Adding JSON-LD metadata to your site

JSON-LD metadata is added to the <head> section of your HTML document through a script tag like this:

<script type="application/ld+json">
    <!-- Your structured data goes here -->
</script>

Having the JSON-LD data in the <head> section makes your HTML cleaner and easier to read for both humans and crawlers.

JSON-LD directly in the Django template

At Revsys, our first attempt at adding JSON-LD to our sites relied on embedding the data in the Django template. For the most part, this has worked fine and we've had good results from an SEO perspective. But in terms of maintainability, it has not been the most efficient approach. We have now started transitioning to generating the structured data using Django Ninja and Pydantic . As a result, we now have cleaner templates and better maintainability.

The code below illustrates how we used to embed our JSON-LD for our blog post page, with the data directly in the template:

{% block extra_head %}
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://www.revsys.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://www.revsys.com/blog/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "{{ self.title|escapejs }}",
      "item": "https://www.revsys.com{{ request.path }}"
    }
  ]
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "{{ self.title|escapejs }}",
  "description": "{{ self.get_description|escapejs }}",
  "datePublished": "{{ self.first_published_at|date:'c' }}",
  "dateModified": "{{ self.latest_revision_created_at|date:'c' }}",

  "author": {
    "@type": "Person",
    "name": "{{ self.get_author_name|escapejs }}"
    {% if self.author.specific.url %},"url": "{{ self.author.specific.url }}"{% elif self.author.specific.slug %},"url": "https://www.revsys.com{% routablepageurl self.get_parent.specific 'posts_by_author' self.author.specific.slug %}"{% endif %}

  },
  "publisher": {
    "@type": "Organization",
    "name": "REVSYS",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.revsys.com{% static 'images/revsys_logo_white.png' %}"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.revsys.com{{ request.path }}"
  }
}
</script>
{% endblock %}

This works, but we don't love this approach because:

  • Mixing JSON-LD logic with presentation logic makes templates cluttered and harder to read and maintain
  • There's no way to validate that your JSON-LD is properly formatted or follows schema.org standards
  • It's easy to make syntax errors in the JSON that break the structured data
  • Testing the JSON-LD is harder because the output requires rendering the entire template
  • The same schema logic gets duplicated across different templates

Generating JSON-LD with Pydantic and Django Ninja

We refactored our schema generation using Django Ninja and Pydantic. Now, instead of embedding logic in templates, we generate structured data server-side and pass it to templates as context variables.

This has given us several benefits:

  • Our code is more modular because we keep all schema generation logic in one file per app, making the codebase more organized and easier to navigate.
  • The Pydantic models we create are reusable , which is handy since many JSON-LD types use the same subtypes.
  • Utilizing Pydantic's type-checking and validation capabilities ensures that our structured data is valid and adheres to Schema.org standards, reducing the chance that we accidentally share invalid data with search engines.
  • Our SEO is future-proofed : with our centralized approach, expanding our schema to new content types is simpler and more manageable.

By moving schema generation out of templates and into Django Ninja and Pydantic, we have created a system that is both maintainable and developer-friendly.

We created a file schema.py that holds our Pydantic models that represent Schema.org types we need to use for the data we are turning into JSON-LD:

# schema.py
from typing import Any, List, Optional
from pydantic import BaseModel, Field
from ninja import ModelSchema
from pydantic.config import ConfigDict

class PersonSchema(BaseModel):
    type: str = Field(default="Person", alias="@type")
    name: str
    url: Optional[str] = None


class OrganizationSchema(BaseModel):
    type: str = Field(default="Organization", alias="@type")
    name: str
    logo: "ImageObjectSchema"


class ImageObjectSchema(BaseModel):
    type: str = Field(default="ImageObject", alias="@type")
    url: str


class WebPageSchema(BaseModel):
    type: str = Field(default="WebPage", alias="@type")
    id: str = Field(alias="@id")

    model_config = ConfigDict(populate_by_name=True)


class ListItemSchema(BaseModel):
    type: str = Field(default="ListItem", alias="@type")
    position: int
    name: str
    item: str


class BreadcrumbListSchema(BaseModel):
    context: str = Field(default="https://schema.org", alias="@context")
    type: str = Field(default="BreadcrumbList", alias="@type")
    itemListElement: List[ListItemSchema]


class BlogPostingSchema(BaseModel):
    context: Optional[str] = Field(default="https://schema.org", alias="@context")
    type: str = Field(default="BlogPosting", alias="@type")
    headline: str
    description: Optional[str] = None
    datePublished: str
    dateModified: Optional[str] = None
    author: PersonSchema
    publisher: Optional[OrganizationSchema] = None
    mainEntityOfPage: Optional[WebPageSchema] = None
    url: Optional[str] = None
    blogPost: Optional[BaseModel] = None

Using Django Ninja's ModelSchema for automatic schema generation

One of the features of Django Ninja is ModelSchema , which automatically generates Pydantic schemas from your Django models. This is useful when you want to include model data in your JSON-LD without manually defining every field.

In our blog post implementation, we can use ModelSchema to automatically include blog page data alongside our structured schema:

# schema.py
from ninja import ModelSchema
import blog.models

class BlogPageSchema(ModelSchema):
    class Config:
        model = blog.models.BlogPage
        model_fields = [
            "title",
            "subtitle",
            "first_published_at",
            "category",
            "main_url",
            "main_url_text",
            "featured",
            "slug",
        ]

Then we integrate this ModelSchema into our blog posting schema generation:

def get_post_schema(post) -> str:
    author = PersonSchema(
        name=post.get_author_name() or "REVSYS"
    )

    # ... author URL logic ...

    schema = BlogPostingSchema(
        headline=post.title,
        description=post.get_description(),
        datePublished=post.first_published_at.isoformat(),
        author=author,
        publisher=publisher,
        mainEntityOfPage=main_entity,
        # Include the model data as additional structured information
        blogPost=BlogPageSchema.from_orm(post)
    )

    return schema.model_dump_json(by_alias=True, indent=2)

This approach gives you the flexibility of hand-built JSON-LD schemas and the convenience of automatically generated model schemas.

We then create helper functions to generate JSON-LD from our Pydantic models and update the schema.py :

# schema.py

def get_breadcrumb_schema(name: str, path: str, post_title: Optional[str] = None) -> str:
    """Generate JSON-LD breadcrumb schema for navigation structure."""
    items = [
        ListItemSchema(
            position=1,
            name="Home",
            item="https://www.revsys.com/"
        ),
        ListItemSchema(
            position=2,
            name=name,
            item=f"https://www.revsys.com{path if not post_title else '/blog/'}"
        )
    ]

    if post_title:
        items.append(ListItemSchema(
            position=3,
            name=post_title,
            item=f"https://www.revsys.com{path}"
        ))

    schema = BreadcrumbListSchema(itemListElement=items)
    return schema.model_dump_json(by_alias=True, indent=2)


def get_post_schema(post: Any) -> str:
    """Generate JSON-LD schema for a blog post using Schema.org BlogPosting type."""
    author = PersonSchema(
        name=post.get_author_name() or "REVSYS"
    )

    if post.author and hasattr(post.author, 'specific'):
        author_specific = post.author.specific
        if hasattr(author_specific, 'url') and author_specific.url:
            author.url = author_specific.url
        elif hasattr(author_specific, 'slug') and author_specific.slug:
            parent_page = post.get_parent()
            if parent_page:
                author.url = f"https://www.revsys.com{parent_page.url}author/{author_specific.slug}/"

    publisher = OrganizationSchema(
        name="REVSYS",
        logo=ImageObjectSchema(
            url="https://www.revsys.com/static/images/2017/revsys_logo_white.png"
        )
    )

    main_entity = WebPageSchema(
        id=f"https://www.revsys.com{post.url}"
    )

    schema = BlogPostingSchema(
        headline=post.title,
        description=post.get_description(),
        datePublished=post.first_published_at.isoformat(),
        author=author,
        publisher=publisher,
        mainEntityOfPage=main_entity
    )

    if hasattr(post, 'latest_revision_created_at') and post.latest_revision_created_at:
        schema.dateModified = post.latest_revision_created_at.isoformat()

    return schema.model_dump_json(by_alias=True, indent=2)

The model_dump_json() method is a Pydantic feature that converts your schema objects into JSON strings. As for the arguments: by_alias=True ensures that field aliases (like @context, @type, and @id) are used instead of the Python field names, and indent=2 formats the JSON with proper indentation, making it readable as well as easier to debug.

Here's what the output looks like for a blog post:

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Building Better Django Apps with Pydantic",
  "description": "Learn how to integrate Pydantic with Django for better validation and cleaner code.",
  "datePublished": "2024-01-15T10:30:00",
  "url": "https://www.revsys.com/blog/building-better-django-apps-pydantic/",
  "author": {
    "@type": "Person", 
    "name": "Jane Developer",
    "url": "https://www.revsys.com/blog/author/jane-developer/"
  },
  "publisher": {
    "@type": "Organization",
    "name": "REVSYS",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.revsys.com/static/images/2017/revsys_logo_white.png"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.revsys.com/blog/building-better-django-apps-pydantic/"
  }
}

Without by_alias=True , you would get Python field names like type instead of @type , which would break the JSON-LD standard.
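
For example, with the PersonSchema defined earlier (expected output shown in comments):

from blog.schema import PersonSchema

person = PersonSchema(name="Jane Developer")

print(person.model_dump_json())
# {"type":"Person","name":"Jane Developer","url":null}

print(person.model_dump_json(by_alias=True))
# {"@type":"Person","name":"Jane Developer","url":null}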

The next step is to update our models to include schema generation in their context. At Revsys, we use Wagtail for our blog, so this example shows overriding the Page model's get_context method to add the JSON-LD schema elements we need for each blog post. If you are using regular Django models, you might create a get_schemas() method on your model, which you could then call from your Django view to pass the JSON-LD schemas into your context.

from blog.schema import get_breadcrumb_schema, get_post_schema
from wagtail.models import Page

class BlogPage(Page):
    def get_context(self, request):
        """Add JSON-LD schema data to the page context."""
        context = super().get_context(request)
        context["breadcrumb_schema"] = get_breadcrumb_schema("Blog", request.path, post_title=self.title)
        context["post_schema"] = get_post_schema(self)
        return context

In our template, we removed the raw JSON-LD code and replaced it with the context variables. Our updated template is now much cleaner.

{% block extra_head %}
{% if breadcrumb_schema %}
  <script type="application/ld+json">
    {{ breadcrumb_schema|safe }}
  </script>
{% endif %}

{% if post_schema %}
  <script type="application/ld+json">
    {{ post_schema|safe }}
  </script>
{% endif %}
{% endblock %}
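
A nice side effect: because schema generation is now plain Python, you can test the JSON-LD output directly without rendering a template. Here's a minimal pytest sketch; the SimpleNamespace stand-in for a post object is hypothetical:

import json
from datetime import datetime
from types import SimpleNamespace

from blog.schema import get_post_schema


def test_post_schema_is_valid_json_ld():
    # A fake post with just the attributes get_post_schema() touches.
    post = SimpleNamespace(
        title="Hello",
        get_description=lambda: "A test post",
        first_published_at=datetime(2025, 1, 1),
        url="/blog/hello/",
        author=None,
        get_author_name=lambda: "Jane Developer",
        latest_revision_created_at=None,
    )
    data = json.loads(get_post_schema(post))
    assert data["@type"] == "BlogPosting"
    assert data["headline"] == "Hello"
    assert data["author"]["name"] == "Jane Developer"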

Reuse one Pydantic model for multiple schemas

We can reuse many of the same components (like PersonSchema and OrganizationSchema) in structured data for other pages, such as the pages for our conference talks and presentations. Talks are a great candidate for structured data using the Event schema type, and this markup helps Google show them in event carousels and highlights. Since we are following the types defined by Schema.org, we can create a new EventSchema Pydantic model that makes use of our existing schemas.

# schema.py
class PlaceSchema(BaseModel):
    type: str = Field(default="Place", alias="@type")
    name: str
    address: Optional[str] = None

class EventSchema(BaseModel):
    context: str = Field(default="https://schema.org", alias="@context")
    type: str = Field(default="EducationEvent", alias="@type")
    name: str
    startDate: str
    location: Optional[PlaceSchema] = None
    performer: PersonSchema # from our first example
    organizer: OrganizationSchema # from our first example
    url: Optional[str] = None

Then, we add a new helper function for our Talk model:

# schema.py
def get_talk_schema(talk: Any) -> str:
    """Generate JSON-LD schema for a conference talk using Schema.org Event type."""
    speaker = PersonSchema(
        name=talk.speaker_name,
        url=getattr(talk, "speaker_url", None)
    )

    organizer = OrganizationSchema(
        name="REVSYS",
        logo=ImageObjectSchema(
            url="https://www.revsys.com/static/images/2017/revsys_logo_white.png"
        )
    )

    location = None
    if hasattr(talk, "venue_name"):
        location = PlaceSchema(
            name=talk.venue_name,
            address=getattr(talk, "venue_address", None)
        )

    schema = EventSchema(
        name=talk.title,
        startDate=talk.date.isoformat(),
        location=location,
        performer=speaker,
        organizer=organizer,
        url=f"https://www.revsys.com{talk.url}"
    )

    return schema.model_dump_json(by_alias=True, indent=2)

And update our Wagtail TalkPage model so we can add this new schema to the page context:

# models.py
from app.schema import get_talk_schema

class TalkPage(Page):
    def get_context(self, request):
        """Add event schema data to the page context for conference talks."""
        context = super().get_context(request)
        context["event_schema"] = get_talk_schema(self)
        return context

Finally, make sure the template page for the talks also renders the new schema:

{% block extra_head %}
  {% if event_schema %}
    <script type="application/ld+json">
      {{ event_schema|safe }}
    </script>
  {% endif %}
{% endblock %}
Better Django management commands with django-click and django-typer
http://www.revsys.com/tidbits/better-django-management-commands-with-django-click-and-django-typer/
Mon, 09 Jun 2025 21:49:00 +0000

Writing Django management commands can involve a ton of boilerplate code. But Revsys uses two libraries that cut our management command code in half while making it more readable and powerful: django-click and django-typer .

With django-click, adding arguments is as simple as a decorator, and with django-typer, you get rich terminal output that can help you understand your data. My management commands are faster to develop, easier to test, and more pleasant to use.

I'll show you real examples of both libraries in action and share when we use each one for client projects. By the end, you'll understand why we reach for these libraries most of the time.

Why we write custom Django management commands at Revsys

Before we dive into the code, let's talk about management commands. At Revsys , I write management commands frequently. They're one of my most-used tools for handling the kinds of tasks that come up in real client projects.

Here are some typical use cases I encounter:

  • Data operations : I import CSV files from clients, export data for reports, or set up realistic test data for local development. Management commands make these repeatable and scriptable.
  • API integrations : Sometimes I need to manually trigger a webhook that's failed, or run a one-off API call that would normally happen automatically. Having a management command ready means I can handle these situations quickly without writing throwaway scripts.
  • Complex data transformations : When you've got thousands of records that need to go through a multi-step process (maybe updating related models, generating computed fields, or migrating data formats) a management command gives you a controlled environment to do that work.
  • Development and debugging : I'll write commands to reset data to a specific state, run quick reports to answer questions about data ("show me all users missing email addresses"), or test specific parts of the application in isolation.

The key insight is that management commands give you a way to write self-contained, reusable pieces of code that operate within your Django application's context. They've got access to your models, settings, and all your business logic, but they're separate from your request/response cycle.

Instead of writing one-off scripts that you'll lose track of, or cramming logic into the Django shell, management commands give you a proper home for all these ad-hoc but important tasks. The right tools can make writing them much more pleasant.

The standard Django management command experience (and why it's frustrating)

To understand why these libraries are helpful, let's look at what Django gives us out of the box . I'll use a movie data loader as an example:

# management/commands/load_movies_django.py
import json
from pathlib import Path

from django.conf import settings
from django.core.management.base import BaseCommand

from movies.utils import clear_movie_data, load_movies_from_data


class Command(BaseCommand):
    help = "Load movies data from JSON file into the database."

    def add_arguments(self, parser):
        parser.add_argument(
            "--file", default="data/movies.json", help="Path to movies JSON file"
        )
        parser.add_argument(
            "--clear", action="store_true", help="Clear existing data before loading"
        )
        parser.add_argument(
            "count", type=int, nargs="?", help="Number of movies to load (optional)"
        )

    def handle(self, *args, **options):
        if options["clear"]:
            self.stdout.write(self.style.WARNING("Clearing existing movie data..."))
            clear_movie_data()
            self.stdout.write(self.style.SUCCESS("Existing data cleared."))

        file_path = Path(settings.BASE_DIR) / options["file"]

        if not file_path.exists():
            self.stdout.write(self.style.ERROR(f"Error: File {file_path} not found"))
            return

        self.stdout.write(self.style.NOTICE(f"Loading movies from {file_path}..."))

        with open(file_path) as f:
            movies_data = json.load(f)

        count = options["count"]
        if count is not None:
            movies_data = movies_data[:count]
            self.stdout.write(self.style.NOTICE(f"Loading first {count} movies..."))

        total_created_movies, total_created_genres, total_created_cast = (
            load_movies_from_data(movies_data)
        )

        self.stdout.write(self.style.SUCCESS("\nLoading complete!"))
        self.stdout.write(self.style.SUCCESS(f"Created {total_created_movies} movies"))
        self.stdout.write(self.style.SUCCESS(f"Created {total_created_genres} genres"))
        self.stdout.write(
            self.style.SUCCESS(f"Created {total_created_cast} cast members")
        )

This works fine, but there's a lot of boilerplate and its output is pretty simple.

  • Class inheritance and method structure : You need to inherit from BaseCommand and implement specific methods before you can do anything useful.
  • Verbose argument setup : The add_arguments() method requires you to manually configure an argument parser. You have to specify types, defaults, and help text separately from where you'll use them.
  • Manual option parsing : Throughout your handle() method, you're constantly accessing options["key"] instead of having clean function parameters.
  • Basic styling : Django's built-in styling with self.style.SUCCESS() works but feels verbose and limited.

The business logic (loading movies from JSON) is buried under all this infrastructure code.

The output looks like this:

[Screenshot: terminal output from the standard Django command]

There are cleaner approaches.

Django-click: simpler command definition

django-click is a Django wrapper around the Click library. It transforms management commands from classes with methods into simple functions with decorators.

Installation and setup

pip install django-click

No configuration needed.

What I like about django-click

For me, django-click's appeal comes from a few key concepts:

  • Function-based commands : Instead of classes, you write a simple function decorated with @click.command() .
  • Decorator-driven arguments : Use @click.option() and @click.argument() decorators to define your command's interface right above the function, so it's easy to see what your arguments and options are at a glance.
  • Direct parameter access : Your function receives arguments as regular Python parameters, not through an options dictionary. It's a more intuitive way of handling arguments.
  • Built-in colorful output : click.secho() provides easy styled terminal output.
  • Automatic help generation : Click generates help text from your decorators and docstrings.

Personally, I really like the pattern of having the arguments or options be listed in the function definition and as decorators. It's very clear, it gives me an at-a-glance view of what my options are, and I immediately have those variables available to use like any other argument. The whole command feels more minimal and simpler than a standard Django management command, so the commands come together really quickly.

Real-world example: Movie import with django-click

Now let me show you that same command using django-click:

# management/commands/load_movies_click.py
import json
from pathlib import Path

import djclick as click
from django.conf import settings

from movies.utils import clear_movie_data, load_movies_from_data


@click.command()
@click.option("--file", default="data/movies.json", help="Path to movies JSON file")
@click.option("--clear", is_flag=True, help="Clear existing data before loading")
@click.argument("count", type=int, required=False)
def command(file, clear, count):
    """Load movies data from JSON file into the database."""

    if clear:
        click.secho("Clearing existing movie data...", fg="yellow")
        clear_movie_data()
        click.secho("Existing data cleared.", fg="green")

    file_path = Path(settings.BASE_DIR) / file

    if not file_path.exists():
        click.secho(f"Error: File {file_path} not found", fg="red", err=True)
        return

    click.secho(f"Loading movies from {file_path}...", fg="blue")

    with open(file_path) as f:
        movies_data = json.load(f)

    if count is not None:
        movies_data = movies_data[:count]
        click.secho(f"Loading first {count} movies...", fg="cyan")

    total_created_movies, total_created_genres, total_created_cast = (
        load_movies_from_data(movies_data)
    )

    click.secho("\nLoading complete!", fg="green", bold=True)
    click.secho(f"Created {total_created_movies} movies", fg="green")
    click.secho(f"Created {total_created_genres} genres", fg="green")
    click.secho(f"Created {total_created_cast} cast members", fg="green")

The actual work being done is identical, but the command structure is much cleaner.

The command definition is right there at the top with the decorators. The function signature tells you exactly what parameters you're working with, with no more options["key"] lookups. Here is the output (similar to regular Django):

[Screenshot: terminal output from the django-click command]

A few other improvements:

  • Arguments vs Options : Notice that count is a positional argument while file and clear are optional flags. Click handles the difference automatically.
  • Colorful output : click.secho() with fg="green" is much cleaner than self.style.SUCCESS() .
  • Boolean flags : The is_flag=True parameter makes --clear work as a simple boolean flag.

Another cool django-click feature: the lookup utility

Django-click also includes a useful lookup utility for working with Django models. You can use it to accept model instances as command arguments:

import djclick as click
from myapp.models import User

@click.command()
@click.argument('user', type=click.ModelInstance(User))
def command(user):
    """Do something with a user."""
    click.echo(f"Processing user: {user.username}")

The click.ModelInstance(User) type handles lookups by primary key by default. You can also specify custom lookup fields:

# Lookup by username field
@click.argument('user', type=click.ModelInstance(User, lookup='username'))

This returns the actual User instance to your function, making it easy to work with Django models in your commands.

Django-typer: When you need beautiful output that helps you think

django-typer takes a different approach. Built on Typer , it uses Python type annotations to define command interfaces and includes the Rich library for beautiful terminal output.

Installation and setup

pip install django-typer

This brings in Typer. If you install with pip install django-typer[rich] , you will also get the Rich library and its capabilities, which we will go into below.

Key differences from django-click

  • Type annotation driven : Instead of decorators, you use Python type annotations with typer.Option() and typer.Argument() to define your interface.
  • Class-based but simpler : You can still inherit from a base class ( TyperCommand ), but the interface is much cleaner than standard Django commands. There is also a decorator available if you prefer that style.
  • Rich integration : Beautiful progress bars, tables, panels, and colorful output are easy to implement if you include Rich in your installation.
  • Better error handling : Typer provides more sophisticated error handling and validation.

Recently, I needed to dig into some messy client data and answer questions like "Do all the records that are missing a FK to this other model also share these other characteristics?" My goal was to figure out if I needed to write some custom code to "fix" some records I suspected were broken, or if there was a valid reason the records were in the state they were in.

With django-typer, I wrote a command that answered my questions and helped me identify patterns in my data. The structured output made it easier to spot patterns I might have missed in a plain text dump. Django-typer is great when you need output that helps you analyze data, not just dump it to the terminal.

Real-world example: Movie data import command

Converting our movie import command to django-typer shows how type annotations replace decorators:

import json
from pathlib import Path

import typer
from django.conf import settings
from django_typer.management import Typer

from movies.utils import clear_movie_data, load_movies_from_data

app = Typer(help="Load movies data from JSON file into the database.")

@app.command()
def main(
    count: int | None = typer.Argument(None, help="Number of movies to load"),
    file: str = typer.Option("data/movies.json", help="Path to movies JSON file"),
    clear: bool = typer.Option(False, help="Clear existing data before loading"),
):
    if clear:
        typer.secho("Clearing existing movie data...", fg=typer.colors.YELLOW)
        clear_movie_data()
        typer.secho("Existing data cleared.", fg=typer.colors.GREEN)

    file_path = Path(settings.BASE_DIR) / file

    if not file_path.exists():
        typer.secho(f"Error: File {file_path} not found", fg=typer.colors.RED, err=True)
        raise typer.Exit(1)

    typer.secho(f"Loading movies from {file_path}...", fg=typer.colors.BLUE)

    with open(file_path) as f:
        movies_data = json.load(f)

    total_created_movies, total_created_genres, total_created_cast = (
        load_movies_from_data(movies_data)
    )

    typer.secho("\nLoading complete!", fg=typer.colors.GREEN, bold=True)

Pretty similar to the django-click version, just with type annotations instead of decorators. (I trimmed some logic for brevity, but you get the idea.)

You could strip the django-typer function definition down even further, like so:

def main(count: int | None = None, file: str = "data/movies.json", clear: bool = False):

Then, the function definition would very closely resemble any standard Python function. But then you lose the help text for your arguments and options, and you lose access to some of the extra validation that Typer can do on your behalf.

Making the movie import command output sparkle

If you need structured, visual output from a management command, django-typer can be helpful. If you install it with pip install django-typer[rich] and include the Rich integration, you can create very well-formatted output in your CLI.

import json

from django_typer.management import Typer
from rich.console import Console
from rich.panel import Panel
from rich.progress import Progress, SpinnerColumn, TextColumn
from rich.table import Table

app = Typer(help="Load movies data from JSON file into the database.")

@app.command()
def main(
    # same function definition as before
):
    """Load movies data from JSON file into the database."""
    console = Console()

    # Display a pretty welcome banner
    console.print(Panel.fit("🎬 Movie Database Loader", style="bold blue"))

    # ... same file_path and clear handling as the earlier example ...
    with open(file_path) as f:
        movies_data = json.load(f)

    if count is not None:
        movies_data = movies_data[:count]
        console.print(f"🔢 Loading first [bold yellow]{count}[/bold yellow] movies...")

    # Add a progress bar
    with Progress(console=console) as progress:
        task = progress.add_task("🎭 Processing movies...", total=len(movies_data))
        total_created_movies, total_created_genres, total_created_cast = (
            load_movies_from_data(movies_data)
        )
        progress.update(task, completed=len(movies_data))

    # Add a table to summarize the output
    table = Table(title="📊 Loading Summary", style="green")
    table.add_column("Category", style="cyan", no_wrap=True)
    table.add_column("Count", style="magenta", justify="right")
    table.add_column("Icon", justify="center")

    table.add_row("Movies", str(total_created_movies), "🎬")
    table.add_row("Genres", str(total_created_genres), "🎭")
    table.add_row("Cast Members", str(total_created_cast), "👥")

    console.print()
    console.print(table)

Adding these elements gives you output like this:

[Screenshot: Rich-formatted output with a banner, progress bar, and summary table]

This example only scratches the surface of what the Rich library can add to your CLI output. The Rich elements I used were:

  • Rich panels : The welcome banner gets displayed in a Panel with pretty borders.
  • Progress indicators : We get a progress indicator via Rich's Progress class.
  • Beautiful tables : We used a Table to display our output in an organized and easy-to-read way.
  • Rich markup : We use familiar-sounding arguments like style and justify to style our output.

When to use django-click

If you prefer decorator syntax over type annotations, you want minimal dependencies in your project, you're already familiar with Click from other projects, you need the lookup utility for Django model integration, or you're writing simple commands that don't need fancy output, then django-click might be the library for you.

When to use django-typer

If you love type annotations and want automatic validation, you need beautiful output with minimal effort, you're building complex command suites with subcommands , you don't mind the extra dependencies, or you want Rich integration for tables, progress bars, and panels, then give django-typer a try.

Integration with Just

This is off the topic of django-typer and django-click, but I wanted to mention it: I often use Just to handle situations where I need to run multiple management commands in a specific way. Here's how I set up commands for all three approaches in my justfile:

# Load movies data into database
load-fresh-movie-data:
    just load-genres --clear 1000
    just load-people --clear 1000
    just load-movies --clear 1000

load-genres *args:
    docker compose exec web python manage.py load_genres {{args}}

load-people *args:
    docker compose exec web python manage.py load_people {{args}}

load-movies *args:
    docker compose exec web python manage.py load_movies {{args}}

This pattern lets you create shortcuts for your management commands, and link them together.

My honest take

I use django-click for most of the management commands I need to write. It's clean, fast to write, and gets out of my way. But when I need to build something that helps me understand complex data or provides structured feedback during long-running operations, django-typer is the better choice.

The next time you need to write a management command, try one of these libraries and let me know what you think!

Give Your Django Admin X‑Ray Vision—Automatic Dead‑Link Detection
http://www.revsys.com/tidbits/django-deadlink-detection-linkchecker/
Thu, 29 May 2025 15:21:00 +0000

Stop link rot before users ever see it: this guide shows how one lightweight app scans your models, flags broken URLs, and slots neatly into any Django workflow.

Most Django projects store URLs of some sort, whether in a URLField() on some of your models or buried deep in TextField()s full of HTML.

We all know cool URLs don't change, but not all URLs are cool and some rot over time. How do you find out which of your links have disappeared on you?

Turns out there is a Django app for that named django-linkcheck. It will check your URLs on a schedule you set and give you a nice admin page for dealing with the broken ones.

Screenshot of django-linkcheck admin page

It helpfully gives you the status it received, links to recheck or ignore each item, and quick 'Edit' links to take you right to the model object in question!

It hasn't had a release in a while, but I can confirm it works just fine with Django 5.1.x. The docs are a bit sparse, so I stumbled around a little getting it to work. This post is here to save you from having to stumble around in the dark too.

Install and Setup

Installing the app is straightforward: you add it to INSTALLED_APPS and include the special admin view in your main urls.py.
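
The setup looks roughly like this (double-check the README for the exact URL prefix your version expects):

# settings.py
INSTALLED_APPS = [
    # ...
    "linkcheck",
]

# urls.py
from django.urls import include, path

urlpatterns = [
    # ...
    path("admin/linkcheck/", include("linkcheck.urls")),
]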

It then needs to create its own tables, so run:

$ ./manage.py migrate 

You then need to tell it which models and fields you want it to handle. The docs suggest you run:

$ ./manage.py linkcheck_suggest_config --model myapp.SomeModel > myapp/linklists.py

Which is a great idea, but it's a bit broken actually. This generates something like:

from myapp.models import SomeModel

class SomeModelLinklist(Linklist):
    model = SomeModel
    url_fields = [myapp.SomeModel.url]
    ignore_empty = [myapp.SomeModel.url]
    object_filter = {"active": True}

linklists = {
    "SomeModel": SomeModelLinklist,
}

It fails to import Linklist for one, but the larger issue is that it leads you to believe you set the fields in the form <app>.<model>.<fieldname> when in fact it needs to be just a list of field name strings. The correct version would be:

from linkcheck import Linklist   # actually import Linklist
from myapp.models import SomeModel

class SomeModelLinklist(Linklist):
    model = SomeModel
    url_fields = ["url"]   # list fields as strings 
    ignore_empty = ["url""]
    object_filter = {"active": True}

linklists = {
    "SomeModel": SomeModelLinklist,
}

This could be very confusing to a first time user or newer Django developer, but don't let that stop you. It's a really nice app that solves a real world issue.

Checking your links

To actually check your URLs from your models you then need to run two commands:

$ ./manage.py findlinks 

This populates linkcheck's tables with a TODO list of sorts.

You can then actually check them with:

$ ./manage.py checklinks 

NOTE: Both of these should be re-run periodically, perhaps via a cron job, based on your personal needs.

If you have any broken links they should be visible and ready to handle at /admin/linkcheck/ now.

Check out the README for more information on how you can set this up including settings for how frequently you would like the links to be rechecked.

Hopefully these small tips make it easier to start using this great Django app. Cheers!

Faster Python Docker Builds
http://www.revsys.com/tidbits/faster-python-docker-builds/
Mon, 26 May 2025 20:30:43 +0000

The 4 Docker Sins Killing Your Python Productivity. You changed a single line of code and Docker has to rebuild everything! Dramatically improve both fresh builds and rebuilds when working in Docker.

If you're waiting more than 10 seconds for your Docker rebuilds, you're wasting hours of your life!

So that you keep reading, here are the results!

Results

  • Fresh builds: 50s → 18s (2.7x faster)
  • Rebuilds: 47s → 0.4s (117x faster)
  • Adding dependencies: 50s → 6s (8x faster)

Overview

We work with Python development teams of all sizes and levels of sophistication. One area where many teams struggle is optimizing their Docker image build times.

I was reminded of this yesterday replying to a recent r/django reddit thread where the author assumed they needed to break their Django monolith up into a few services to reduce their build time, which isn't accurate, but is a common mistaken assumption.

There are a few small things you can do to dramatically reduce the amount of time it takes to build (and more importantly REBUILD) a Python or Django docker image.

The TL;DR is you need to:

  • Only rebuild the dependency layer when your dependencies actually change
  • Cache your PyPI downloads locally and in CI
  • Switch to using uv which is stupid stupid fast
  • Use a multi-stage build

If you're already doing all of those things, you can safely skip the rest of this post.

Naive Approach

The most common and most impactful problem is not ordering the lines in your Dockerfile so that only the layers that actually need to be rebuilt get rebuilt.

Not having to do anything at all is the fastest thing there is! It's a one million percent improvement! 🤣

Here is what many people start with:

FROM python:3.13-slim

RUN mkdir /code
WORKDIR /code

COPY . /code/

RUN pip install -r requirements.txt

# ... rest of lines ending in a `CMD` to run

So what's wrong with this? Because every time you change ANYTHING in your git repository you're re-installing your pip dependencies.

This is thankfully easy to resolve. Instead of the above, you should be doing this:

FROM python:3.13-slim

RUN mkdir /code
WORKDIR /code

# Copy just the requirements first 
COPY ./requirements.txt /code/requirements.txt 

# Install dependencies 
RUN pip install -r requirements.txt

# Copy in everything else 
COPY . /code/

# ... rest of lines ending in a `CMD` to run

Now when you rebuild this image it will only need to perform the pip install step when there has actually been a change to your requirements.txt !

Dependencies change somewhat frequently, but nowhere near as frequently as your code, docs, tests, and README. This ordering stops Docker from wasting time rebuilding that particular layer on every single change.

Caching the PyPI dependencies

Ok, so now we're only doing this when there is really something new to do. The next step is to not bother downloading all of these dependencies each and every time we build our Docker image. By default pip caches your downloads when you use it locally, so this little optimization is often overlooked. Python developers either assume it IS happening inside Docker or that it is hard or impossible to make it do so.

Where does pip cache things?

You can manage your pip cache, but the most useful thing is simply knowing where this cache lives. So run pip cache dir (or uv cache dir if you're already using uv; we'll talk about it more later). If you look into that directory, hopefully you'll see a bunch of files.

Now this is the cache on your HOST OS, not inside of Docker. There are a couple of ways to expose this into your Docker image, but it's much easier to just have your Docker daemon cache it for you.

If you're using a default Python Docker image, you're running in Debian and by default everything runs as the root user. FYI, there are security implications to this and you should look into running your code as a non-root user, but that's a topic for another post.

So for the root user on a Debian system, the pip and uv cache locations are going to be in /root/.cache/, which means we need to make a small change to the RUN line that installs everything.

Instead of:

RUN pip install -r requirements.txt

We need to use:

RUN --mount=type=cache,target=/root/.cache,id=pip \
    python -m pip install -r /code/requirements.txt

This is instructing the Docker daemon to cache this folder with the id pip and it will then be reused across builds.

What about in CI?

Things are a bit harder in CI. Depending on what CI system you're using it's sometimes built in, sometimes you need to make configuration adjustments. In any case, the goal you're after here is that the /root/.cache/ folder is preserved and reused across builds so that the downloads are cached between CI runs.

You can read up on all of the details of how to optimize Docker cache usage in the Docker docs.

Use uv

If you're not familiar with uv it's a near drop-in replacement for pip from the folks at Astral who also brought us the great ruff linting and formatting tool and the soon to be beta ty type checker.

For most things you just prefix your normal pip command with uv and it works as expected, just a HELL OF A LOT faster.

Switching to uv and adding in the cache mount makes our example Dockerfile now look like this:

FROM python:3.13-slim

RUN mkdir /code
WORKDIR /code

# Install uv 
RUN --mount=type=cache,target=/root/.cache,id=pip \
    python -m pip install uv 

# Copy just the requirements first 
COPY ./requirements.txt /code/requirements.txt 

# Run uv pip install with caching! 
RUN --mount=type=cache,target=/root/.cache,id=pip \
    uv pip install --system -r /code/requirements.txt

# Copy in everything else 
COPY . /code/

# ... rest of lines ending in a `CMD` to run

So how fast is it now?

Things are quite a bit faster at the small expense of a slightly more complicated Dockerfile.

Naive Fresh - 50 seconds

Naive Rebuild - 47 seconds.

The difference here is just the speed of downloading the pip dependencies between runs.

Fixing things so pip install only re-runs when the requirements actually change gives us the biggest benefit.

Naive Fixed Fresh - 50 seconds

Naive Fixed Rebuild - 10 seconds

With Caching

Caching our downloads improves our situation even further!

Cached Fresh - 44 seconds

Cached Rebuild - 0.4 seconds

With Caching and uv

UV Fresh - 18.5 seconds

UV Rebuild - 0.4 seconds

Why isn't uv faster here? Well, it IS faster at downloading the files initially; I'm guessing it parallelizes better, or simply being written in Rust makes that aspect twice as fast as normal pip. But for these last two numbers we're really just testing how fast Docker can create the layer, since there are essentially no calls to pip or uv going on.

Adding a new pip dependency into the mix

The real speed up is when you need to add a new dependency. In our original requirements.txt we neglected to add the very useful django-debug-toolbar package. So I added it and re-ran all of these.

Naive

Naive Fresh - 50 seconds

Naive Rebuild - 47 seconds.

Naive Rebuild w/ new dependency - 50 seconds

Naive Fixed Fresh - 50 seconds

Naive Fixed Rebuild - 10 seconds

Naive Fixed Rebuild w/ new dependency - 51 seconds

With Caching

Cached Fresh - 44 seconds

Cached Rebuild - 0.4 seconds

Cached Rebuild w/ new dependency - 24 seconds

With Caching and uv

UV Fresh - 18.5 seconds

UV Rebuild - 0.4 seconds

UV Rebuild w/ new dependency - 6 seconds

So we went from a consistent 50ish seconds per build to 18 seconds for a fresh build, 6 seconds when adding a new dependency and nearly instant for rebuilds with no dependency changes.

Bonus info

Multi-stage Docker Builds with Python

What are multi-stage builds? In short, they are Dockerfiles with multiple FROM lines.

Why would I want to do that? Well size and security mainly.

On the security front, using a multi-stage build allows you to deploy an image that does not include any compilers or build tools, but still use those tools to build the dependencies you use. In terms of size, your final image only includes the base runtime environment and your built dependencies, not any of the tools or dev packages needed to build those dependencies.

So you get a smaller and more secure image, which are good things, at the cost of just a BIT more complexity in your Dockerfile. Once you've been walked through it, it should be fairly clear.

FROM python:3.13-slim AS builder-py

RUN mkdir /code
WORKDIR /code

# Install uv 
RUN --mount=type=cache,target=/root/.cache,id=pip \
    python -m pip install uv 

# Copy just the requirements first 
COPY ./requirements.txt /code/requirements.txt 

# Run uv pip install with caching! 
RUN --mount=type=cache,target=/root/.cache,id=pip \
    uv pip install --system -r /code/requirements.txt

FROM python:3.13-slim AS release 

# Copy our system wide installed pip dependencies from builder-py
COPY --from=builder-py /usr/local /usr/local

# Copy in everything else 
COPY . /code/

# ... rest of lines ending in a `CMD` to run

Benchmark / Testing done here

You can find the exact Dockerfiles and bits I used to do this testing in this repo.

I did this testing on an M4 Max MacBook Pro with 128GB of RAM on a 1.2 Gbps fiber internet connection while catching up on some PyCon 2025 talks. I'm also using OrbStack, which improves the overall performance of Docker on macOS. Your results will almost certainly vary, but doing any of these steps will save you and your team time in your CI pipelines and when building images locally. The small differences in download speed or available CPU don't really matter; we aren't doing a CPU-heavy micro-benchmark here.

Our time on this planet is short, too short to spend it waiting for Docker to needlessly rebuild images.

Do yourself a favor and start using these tips now!

]]>/>
18 Years of REVSYShttp://www.revsys.com/tidbits/18-years-of-revsys/Frank left his last corporate job in April of 2007 to start working on REVSYS full time. Prior to that he was just doing the occasional consulting and freelance work under the REVSYS brand.Mon, 05 May 2025 14:27:00 +0000http://www.revsys.com/tidbits/18-years-of-revsys/

I turned in my two weeks' notice on April 23rd, 2007 to Sunflower Broadband, a subsidiary of the Lawrence Journal-World. I know this only because it just so happened that my last day ended up being Cinco de Mayo, which makes remembering our anniversary thankfully very easy.

My last corporate job was also my best, so good I came back to the company after taking a year and a half tour through various KC tech companies that sucked for one reason or another.

My various roles at the company were the absolute perfect breeding ground to grow into a consultant. Most of my time was spent running critical ISP services and writing custom software for the company, but the company was involved in so many things it was easy to get pulled into problems like:

  • antiquated nearly unsupported accounting systems
  • ancient Solaris newspaper printing press software
  • satellite issues receiving cable content
  • Emergency Broadcast System and 911
  • Fleet operations and running a good NOC
  • Customer service and marketing needs
  • Running some of the medium/large web properties of the day
  • Crazy side projects like trying to build an early Peloton

All of those have proven useful with clients over the years, but it was really the experience across so many areas that made it easy to adapt oneself to the client's situation at hand.

The Process

I've written about the process in more depth (it was my first post to make it onto Hacker News, actually), but in short I focused on learning areas that weren't tech, reduced my debt, increased my savings, and jumped when I had a contract that supposedly would have made the first 6 months easy. Spoiler: it didn't.

Starting your own business is hard and scary, but 18 years later I'm VERY glad that I did.

The Result

I've had the pleasure of working with so many amazing people at great companies, often doing important things with technology. The constant learning, improving, and running a business can be tiring, but it is never boring!

It's been my pleasure to bring in great business partners like Jacob Kaplan-Moss and Jeff Triplett, amazing employees like Jacob Burch, Lacey Henschel, Stephen Spencer, Kojo Idrissa, and Catherine Holmes, and to have had former employees like Flavio Curella and Daniel Lindsley over the years.

We've even had one sub-contractor, Greg Newman, around for nearly all 18 years!

Speaking of sub-contractors, we also currently have Natalia Bidart, Velda Kiara, Mark Wirblich, and Sean Parsons helping us out on various projects.

In the past we've had a string of really amazing contractors work for us, including Dr. Russell Keith-Magee, Idan Gazit, and the late Malcolm Tredinnick. I'm probably forgetting at least a few people over the full 18 years, and for that I apologize.

The Clients

We've done massively scaled and fun projects with companies like Netflix, Sony, and The Wharton School. We've helped with critical infrastructure, internal skunkworks projects, and companies moving to Python, Django, and/or the cloud. From cancer research to custom furniture to event tickets and AI, we've had a lot of fun helping a lot of our clients.

The Work

It is exciting to see your work go live and to see it talked about in the media, but the most meaningful work to us is helping improve the development teams we work with. By introducing new tools and better processes, we absolutely love leading them down a path to better code, in less time, with less hassle, and of course, fewer bugs!

Some things we've done:

  • Reduced cloud spend 90% while reducing page load time 50% for large websites like Politifact.com
  • Numerous code reviews
  • System architectural help, design, and sanity checking
  • Taught ops automation with Python to scores of staff at companies like KPMG and Jump Trading
  • Moving clients to Kubernetes and streamlining CI/CD pipelines for efficiency
  • Helped companies move to Python, Django and the Cloud like DealerTrack and eMoney
  • Solved performance issues in too many companies to list
  • Building startup MVPs from the ground up in all sorts of industries
  • Acquisition due diligence
  • Filling in as team lead or fractional CTO when clients have an unexpected staffing issue
  • Python and Django upgrades to keep critical systems running and patched
  • Building Python AI into EdTech and ecommerce products and consulting with some of the behind the scenes players in that space.
  • And bespoke development of course!

Here's to another 18 years!

P.S. Oh and do let us know if we can help your company. We have some availability in our calendar coming up this summer due to a company canceling their projects because of the silly tariffs!

]]>/>
What IS a Django App?http://www.revsys.com/tidbits/what-is-a-django-app/To kick off a semi-regular series of blog posts about useful Django apps, we start with some definitionsFri, 02 May 2025 15:57:00 +0000http://www.revsys.com/tidbits/what-is-a-django-app/

Introduction

We'll be doing a semi-regular series of blog posts about various useful Django apps. However, it seemed like a good idea to lay some groundwork first. That will help us do a few things:

  • define app along with a few other similar terms, and get some namespace confusion out of the way
  • understand the value, as a developer, of creating separate apps in your Django projects
  • understand the value, as a developer or end-user of a Django project, of using separate apps (third-party or internally-created) in your Django projects

What is an App? Context matters.

If you go to the Django home page, https://www.djangoproject.com, the blurb at the top says: Django makes it easier to build better web apps more quickly and with less code. Django is often described as a web application framework or something that you use to create web applications (web apps). I've used this description myself. I have also purposefully distinguished between a web app and a website during a Django tutorial I taught. I've also written a blog post on that topic. So, if Django is used to BUILD web apps, why are we talking about apps IN Django? The context shifts a bit if you're looking at something built with Django from the outside vs. from the inside.

A View From The Inside

From the outside, the specific nomenclature is much less important. But from the inside, as someone building something with Django, the specific terminology has more meaning. But, it can still be a little confusing.

Internally, Django views an entire collection of code, what most would call a "website," as a project. This is where the settings and configuration for a specific Django website are contained. Inside that project are 1 or more apps. An app is usually a specific sub-system used within the project. It's a collection of related code that provides specific functionality that's available to the rest of the project.

A Github Example

We can look at GitHub and how its functionality is split up as an example. Note: GitHub is NOT built with Django. But it's a web application that's known to the people who'll probably be reading this post. So, we'll pretend GitHub is built with Django for this example.

Looking at the repository for Python, we can see that the issue tracker, management of pull requests, and tracking CI/CD activity are all handling different tasks in different parts of the GitHub web application. But those bits of functionality are available to be reused across the web app on each different repository. When I talk about "bits of functionality" from a Django perspective, I'm talking about the database tables, views, other code and endpoints related to each task. So, the tables, views, other code and endpoints needed to manage GitHub Issues would all live in one app, perhaps called issues. It would be the same for the pull_requests and ci_actions apps. As such, that functionality is available to any other part of the larger GitHub project. At the same time, any changes that needed to be made to one of those apps (model updates, view changes, new endpoints, etc.) can be localized in that specific app.

In Django, as with most web application frameworks, visiting a certain URL causes a specific section of code to be run. In this example, that would be the code in the issues, pull_requests, or ci_actions apps.
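To make that concrete, here's a minimal sketch of how such a project might be wired together. The app names (issues, pull_requests, ci_actions) and URL prefixes are just the hypothetical example from above, not real GitHub code:

# settings.py (sketch using the hypothetical app names above)
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    # ... the rest of your Django and third-party apps ...
    "issues",
    "pull_requests",
    "ci_actions",
]

# urls.py — each URL prefix routes to the code in one app
from django.urls import include, path

urlpatterns = [
    path("issues/", include("issues.urls")),
    path("pulls/", include("pull_requests.urls")),
    path("actions/", include("ci_actions.urls")),
]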

One of the best things about apps is that they can be used between different projects. It's this reusability that makes Django apps especially valuable. In many ways, a Django app can be seen as Django's analog to a Python package, and they serve a very similar role: you don't have to re-invent something that already exists. Most 3rd party Django apps get downloaded from PyPI like other Python packages, so you can search for a Django app to fit your needs there or at Django Packages.

Building Your Own Django App

Developer-Facing Benefits

The benefits of using a third-party app are clear: you get new functionality added to your project. And you don't have to write/test/debug new code. You usually only have to make a few configuration changes.

But when it comes to the code you're writing yourself, some developers may wonder why they should bother creating separate apps inside their projects. Won't that just add extra complexity? If not, then what are the benefits?

Code Modularity

By having code that performs a specific set of functions separated into an app, it's not mixed in with your larger project monolith. As a result, you end up with code that's easier to test, document, and update. Instead of having to dig through your entire code base, you can focus on JUST the parts you need to look at. You may also improve the effectiveness of your testing by realizing that some subsystems (apps) need more or less testing than others. You can tune your coverage strategy on a per-app basis. This modularity can also provide clearer internal imports due to app namespacing.
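As a tiny illustration (the issues app and its modules here are hypothetical), app namespacing keeps both imports and test runs pointed at the subsystem you care about:

# Imports name the subsystem directly instead of one giant models.py
from issues.models import Issue
from issues.views import IssueListView

# And you can run just that app's tests:
#   python manage.py test issues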

Reusability

You may find yourself writing code that is nearly identical across multiple projects. Instead of copying/pasting that code several times and hoping you've made the right changes, you can install your app into that project. This way, any changes that need to be made to that app can be made in one place instead of several. This might not be one you've considered, given that we don't always know what the full design will be as we're working on something, and it can take time to tell if a certain chunk of code will be useful in other projects.

On the topic of reusability, this doesn't mean you have to share this app with the world. You can put it on GitHub and pip install it from a Git repository. That way, it's easily available to all your other projects and other developers on your team.

A Microservices Analogy

Here's another way of looking at the relationship between a Django project and an app. We can make a comparison to the monolithic and microservice approaches to software architecture. An app in Django can be compared to a microservice but with MUCH less overhead. But an app and a microservice fill similar roles. They help deal with complexity by decomposing your solution into smaller, more manageable pieces. So, if you're working on a Django project and thinking you need a microservice, that's probably a sign that you should convert part of your monolith into an app. And, if at some point you realize you truly DO need a microservice, it's much easier to convert an app into a microservice. You already have the encapsulated structure in place. In short, breaking the codebase of a monolithic Django project into apps (where appropriate, not just for the sake of doing it) can provide many of the benefits of distributed systems without most of the problems.

There may be a question about how much "overhead" there is in turning your working code into a reusable app. The answer is, "not much". I won't go into detail here, but the Django docs have you covered from start to finish.

End-User Facing Benefits

The assumption here is you are building a Django project for other people to use, not just a personal project. Perhaps it's a client you're doing freelance or agency work for. Or, you may be a developer inside a larger organization, building web apps for your co-workers to use. In situations like these, there may be concerns about "adding complexity" or "slowing down development" by creating separate apps within a larger project. As noted above, that's not really a problem, but here are a few things to keep in mind if a concern like that is raised.

There are two main benefits users of your Django project will see from creating separate apps within the project. First, all users will benefit from better naming conventions and clarity about which part of the web app they're using. Most won't care about the implementation details, but knowing that they can go directly to https://CompanyName/task_1/ or https://CompanyName/task_2/ to do what they need will simplify things for them. Contrast this with needing to go to https://CompanyName/all_the_tasks/ and then search for what they need.

Second, there may be a group of staff or superusers who will have access to the Django Admin. These may be people such as managers, department heads, and the like. For those users, having your code separated into apps will create automatic separation in the Django admin. This will make it easier for those folks to find and focus on the specific app they need, instead of sorting through the entire project.

Next: Django App Of The {{ variable_time_period }}

Now that we've set a bit of foundation, we'll be looking at various Django apps and how they can be used. Some will be apps we're currently using on projects. Some will be apps we think are interesting. We probably won't be doing an app every week, but we'll try to keep the cadence regular enough to be compelling! Stay tuned!

]]>/>
REVSYS Anniversary and Office Warming Partyhttp://www.revsys.com/tidbits/revsys-anniversary-and-office-warming-party/Come have some snacks and drinks at our new office.Thu, 01 May 2025 21:40:14 +0000http://www.revsys.com/tidbits/revsys-anniversary-and-office-warming-party/

Can you believe it's been 18 years? We're celebrating our anniversary and our new offices on May 5th, 2025 between 3pm and 7pm.

Stop by, have a drink, a snack, and talk tech with us.

  • Date: May 5th, 2025
  • Time: 3pm to 7pm
  • Location: 888 New Hampshire, Suite E, Lawrence, KS 66044 (The Lofts Building)

The only issue is our exterior door is fairly hidden. It's in the alley BEHIND the Lofts building at 8th and New Hampshire. Here is a Google Maps pin for the door.

We'll have the door propped open since it requires a key fob, so look for the door that is slightly ajar and for a small REVSYS sign.

]]>/>
How to Add Blazing Fast Search to Your Django Site with Meilisearchhttp://www.revsys.com/tidbits/how-to-add-blazing-fast-search-to-your-django-site-with-meilisearch/Step-by-step guide to integrating Meilisearch with Django, complete with automatic indexing, typo tolerance, and relevant filtering capabilities.Mon, 21 Apr 2025 15:25:00 +0000http://www.revsys.com/tidbits/how-to-add-blazing-fast-search-to-your-django-site-with-meilisearch/

TL;DR: Add powerful search to your Django site in an afternoon

Last year, I needed to add robust search functionality to a client's Django-powered surplus parts website. With approximately 70,000 items spanning hundreds of categories and subcategories, each with unique part numbers and extensive metadata, the site needed a powerful search solution. Surplus Sales sells all kinds of surplus equipment, making it critical that customers quickly find specific parts using various search terms.

Based on positive experiences with previous clients, we chose Meilisearch from the start. We knew how quickly we could implement it and how effectively we could refine search results with minimal tweaking. This approach paid off. Meilisearch delivered fast, typo-tolerant search that could handle all the complex metadata of our client's inventory.

The client's requirements were particularly challenging. Users need to search by part numbers (and there might be multiple part numbers for a single product), product names, descriptions, and other technical specifications. They also need to find results even when making typos or using abbreviations. All the more reason to use Meilisearch!

Let's walk through how to add Meilisearch to your own Django project.

Why Meilisearch for Your Django Project?

If you've ever tried to implement search in Django using just the ORM, you know it can quickly become a performance bottleneck. While Django's icontains lookups work for simple cases, they fall short when you need:

  • Fast response times on large datasets
  • Typo tolerance ("djagno" should still find "django")
  • Tighter control over relevance ranking
  • Filtering and faceting capabilities
  • Complex multi-field searches across related models

Meilisearch solves all these problems with a simple API and Python client that integrates easily with Django's model structure. The official documentation is excellent and comprehensive, but in this post, I'll focus specifically on integrating it with Django.

Meilisearch versus other search solutions

When comparing search solutions for Django projects, Meilisearch stands out for several reasons. Unlike PostgreSQL full-text search, which requires complex query construction and index management, Meilisearch provides a dedicated search API with minimal configuration. Compared to Elasticsearch, Meilisearch is significantly easier to set up, maintain, and scale for most Django applications. The day-to-day operation of Meilisearch is quite simple, with no need for cluster management or index optimization tasks that Elasticsearch can require. Now that it's been in production for a few months, the most maintenance I need to do is manually run a management command once in a while (and even that is pretty rare).

For many Revsys clients, Meilisearch hits the sweet spot between powerful features and developer-friendly implementation. You get advanced search capabilities without the operational complexity of heavier solutions. This makes it particularly well-suited for small to medium-sized teams that need robust search solutions they can implement quickly.

Step 0: Setting Up Meilisearch

Before we dive into our Django models and search schemas, let's set up Meilisearch.

Docker Configuration

The easiest way to run Meilisearch in your development environment is with Docker and Compose.

# compose.yml 
services:
  search:
    image: getmeili/meilisearch:v1.7
    volumes:
      - ./meili-data:/meili_data
    ports:
      - "7700:7700"

This configuration:

  • Uses the official Meilisearch Docker image (version 1.7)
  • Creates a persistent volume for your search data
  • Exposes the service on port 7700

The persistent volume is handy for local development. It ensures your search index survives container restarts and updates, so you won't need to rebuild your index every time you restart your containers.

To start Meilisearch, run:

docker-compose up -d search

Django Configuration

Now let's configure our Django settings to work with Meilisearch. First, install the Python client:

pip install meilisearch

(Or add to your dependencies using whatever method you choose.)

Then add these settings to your Django project:

# settings.py
SEARCH_API_URL = env("SEARCH_API_URL", default="http://search:7700/")
SEARCH_API_KEY = env("SEARCH_API_KEY", default=None)
SEARCH_INDEXES = {
    "main": "main_search",
    # You can define additional indexes here for different purposes
    # "autocomplete": "autocomplete_search",
    # "admin": "admin_search",
}
# Control whether we index the search or not
INDEX_SEARCH = env.bool("INDEX_SEARCH", default=True)

The SEARCH_INDEXES dictionary is particularly powerful. It allows you to define multiple indexes for different purposes. For example:

  • A "main" index for general site search
  • An "autocomplete" index optimized for quick suggestions with different ranking rules
  • An "admin" index that includes sensitive fields only staff should search

Each index can have different settings, ranking rules, and even different data schemas, all while drawing from the same Django models. This flexibility lets you optimize each search experience for its specific use case.
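As a rough sketch of what that could look like, here's how you might give a second index its own settings. The "autocomplete" key is the commented-out example from the settings above, and the attribute choices are illustrative; update_settings and rankingRules are standard Meilisearch index options:

import meilisearch
from django.conf import settings


def configure_autocomplete_index():
    """Sketch: apply lighter settings to a hypothetical autocomplete index."""
    client = meilisearch.Client(settings.SEARCH_API_URL, settings.SEARCH_API_KEY)
    index = client.index(settings.SEARCH_INDEXES["autocomplete"])
    index.update_settings(
        {
            # Only the fields needed for quick suggestions
            "searchableAttributes": ["title", "sku"],
            # Meilisearch's default ranking rules, listed explicitly so they can be tuned
            "rankingRules": ["words", "typo", "proximity", "attribute", "sort", "exactness"],
        }
    )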

Pro-tip: That INDEX_SEARCH setting is particularly useful. During development and testing, you can set it to False to prevent your test data from being indexed. This keeps your search index clean with only real data. It's also helpful when running tests that create and destroy many model instances, as it prevents unnecessary indexing operations that would slow down your tests.
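For example, once your models are wired up to the search index (which we'll do in the later steps), a test might flip it off explicitly. This is a minimal sketch; the test class and the values passed to Product are hypothetical:

# products/tests.py — keep test fixtures out of the search index
from django.test import TestCase, override_settings

from products.models import Product


@override_settings(INDEX_SEARCH=False)
class ProductTests(TestCase):
    def test_str(self):
        # Saving won't trigger an indexing call while INDEX_SEARCH is False
        product = Product.objects.create(name="Test part", sku="TEST-123", price="9.99")
        self.assertEqual(str(product), "Test part")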

Now with Meilisearch running and configured, we're ready to start building our searchable models.

Step 1: Start with Your Django Models

Let's start by looking at a typical Django model that we want to make searchable:

# products/models.py
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=255)
    description = models.TextField(blank=True)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    sku = models.CharField(max_length=50, unique=True)
    brand = models.CharField(max_length=100, blank=True)
    active = models.BooleanField(default=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def get_absolute_url(self):
        return f"/products/{self.id}/"

    def __str__(self):
        return self.name

This is a standard Django model for a product. Nothing special here yet, but we'll soon make it searchable with Meilisearch.

The get_absolute_url() method is particularly important for search integration. When displaying search results, you'll need a URL to link to each result. Having this method on your model makes it easy to generate these links consistently.

Before moving forward, consider what users will actually search for. For a product catalog, users might search by:

  • Product names
  • Brand names
  • SKUs or part numbers
  • Words in the description
  • Technical specifications
  • Categories or tags (which might be related models)

Understanding these search patterns helps you design an effective search schema in the next step.

Thinking About Your Search Schema Design

Before diving into code, let's take a step back and think conceptually about how to design an effective search schema. This is one of the most important decisions you'll make when implementing search because it determines what data is searchable and how your search results will be structured.

The Unified Search Approach

When implementing search across multiple Django models, you have two main approaches:

  1. Separate indexes - Create different indexes for each model type (products, categories, etc.)
  2. Unified schema - Create a single schema that can represent any searchable entity

For most Django sites, I recommend the unified approach. This allows users to search across all content types with a single query, providing a more intuitive experience. It also simplifies your frontend code since you only need to query one index.

The unified approach is particularly valuable when users don't necessarily know or care about the underlying data structure. For example, a user searching for "vacuum capacitors" might find it useful to get results that include products, categories, or blog posts about capacitors. They just want relevant information. A unified schema lets you return all these result types in a single search.

Using separate indexes can be useful when you have different needs for your customer-facing frontend and your admin side. For example, you would index your customer information in its own search index, and then ensure those results could only be read by your staff members. Separate indexes are also useful when different parts of your application have different search requirements or access patterns.

Mapping Different Models to a Common Schema

The key to a unified search schema is designing fields that can accommodate different models while maintaining their unique characteristics. Let's look at how this works with a concrete example, using a Product model and a Category model.

class Product(models.Model):
    name = models.CharField(max_length=255)
    sku = models.CharField(max_length=50, unique=True)
    description = models.TextField()
    price = models.DecimalField(max_digits=10, decimal_places=2)

class Category(models.Model):
    name = models.CharField(max_length=255)
    slug = models.SlugField(max_length=100)
    description = models.TextField()
    active = models.BooleanField(default=True)

Both have a name field, but their secondary fields differ. In our search schema, we map them like this, using model-specific field names to maintain the unique characteristics of each model type:

  • Schema title maps to Product name and Category name
  • Schema product_name maps to Product name and isn't used for Category
  • Schema category_name isn't used for Product and maps to Category name
  • Schema product_description maps to Product description and isn't used for Category
  • Schema category_description isn't used for Product and maps to Category description
  • Schema sku maps to Product sku and isn't used for Category
  • Schema type is set to "product" for Product and "category" for Category

This approach gives us a consistent way to display search results while preserving the unique aspects of each model. The title field serves as a common display field, while the model-specific fields allow for targeted searching within each model type.
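Concretely, the documents that end up in the index for one Product and one Category might look like this (the values are made up for illustration):

# Two documents living side by side in the same index
product_doc = {
    "id": "product-42",
    "type": "product",
    "title": "Vacuum Capacitor",
    "product_name": "Vacuum Capacitor",
    "product_description": "50pF, 15kV variable vacuum capacitor",
    "sku": "VC-50-15",
}

category_doc = {
    "id": "category-7",
    "type": "category",
    "title": "Capacitors",
    "category_name": "Capacitors",
    "category_description": "Fixed and variable capacitors of all types",
}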

This pattern scales well to more complex scenarios. For example, if you were building a university website, you might have models for Courses, Faculty, and Events. Your schema might include fields like:

  • title - The main display title for any result
  • subtitle - Which might be a course code for Courses, a department for Faculty, or a date for Events
  • type - "course", "faculty", or "event"
  • course_description , faculty_bio , event_details - Type-specific content fields

The key takeaway is that your search schema doesn't have to mirror your database schema. It should be designed specifically for search, with fields that make sense for how users will search and how you'll display results.

What to Include in Your Schema

When deciding which fields to include in your search schema, consider:

  1. What users search for : Include fields that contain terms users are likely to search for
  2. What helps with relevance : Fields that help determine if a result is relevant
  3. What's needed for display : Information needed to render search results
  4. What's needed for filtering : Fields users might want to filter on

For our client, we found that including part numbers, alternate part numbers, and technical specifications was crucial because many customers knew exactly what part they needed. We also included category information so users could find related parts even if they didn't know the exact part number.

Equally important is what to exclude:

  1. Large text fields : Unless they're critical for search, large text fields can bloat your index
  2. Sensitive information : Never include passwords, private notes, etc.
  3. Frequently changing data : Data that changes very often might be better queried directly from your database
  4. Derived data : If it can be calculated from other fields, consider computing it at display time

Multiple Indexes for Different Purposes

While a unified schema works well for general search, sometimes you need specialized indexes for specific features. For example:

  • A main index for general site search
  • An autocomplete index optimized for quick suggestions as users type
  • A products index with additional fields specific to product search
  • An admin index that includes fields only relevant to staff users

Each index can have different settings optimized for its specific use case, while still drawing from the same underlying data models.

Step 2: Define Your Search Schema with Pydantic

Now that we understand the conceptual approach, let's define what data from our model should be searchable. This is where Pydantic comes in. It helps us create a clean, type-checked schema that will be sent to Meilisearch:

# search/schemas.py
from pydantic import BaseModel

class SearchSchema(BaseModel):
    """
    Pydantic model for our main search schema
    """
    id: str
    title: str | None = None
    type: str  # 'product', 'category', etc.
    url: str
    active: bool = True

    # Product fields
    product_name: str | None = None
    product_description: str | None = None
    brand_name: str | None = None
    sku: str | None = None

This schema defines exactly what will be stored in our Meilisearch index. The beauty of this approach is that we can:

  1. Have strict type checking for our search documents
  2. Control exactly which model fields get indexed for searching
  3. Clearly separate which fields belong to which model types

The fields in our schema serve different purposes:

  • id : A unique identifier for each document in the index. We'll prefix this with the model type to ensure uniqueness across different models.
  • title : A common display field used for all result types.
  • type : Identifies what kind of object this is (product, category, etc.). This is crucial for filtering and displaying results appropriately.
  • url : The link to the full object, used when a user clicks a search result.
  • active : Whether this item should appear in search results. This allows us to hide items without removing them from the index.

The model-specific fields (prefixed with product_ or later category_ ) allow us to search within specific model types while maintaining a unified schema.

Later, we'll add a second model to our search schema, demonstrating how this approach scales to multiple model types.

Why Pydantic?

Using Pydantic for this purpose offers several advantages over a simple dictionary:

  1. It provides type validation, ensuring that your search documents always have the expected structure. This catches errors early, before they cause problems in your search index.
  2. Pydantic's schema documentation capabilities make it easy to understand what data is being indexed.
  3. Pydantic models can be easily serialized to JSON, which is what Meilisearch expects.
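As a quick illustration of the first point, here's a small sketch using the SearchSchema defined above:

from pydantic import ValidationError

from search.schemas import SearchSchema

# A well-formed document validates and serializes cleanly
doc = SearchSchema(id="product-1", type="product", url="/products/1/", title="Widget")
payload = doc.dict()  # the dict we'll eventually hand to Meilisearch

# A malformed document fails loudly *before* it ever reaches the index
try:
    SearchSchema(id="product-2", type="product")  # missing the required "url" field
except ValidationError as exc:
    print(exc)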

Step 3: Connect Your Model to the Search Schema

Now let's add a method to our Product model that maps its fields to our search schema:

# products/models.py
from django.db import models
from search.schemas import SearchSchema

class Product(models.Model):
    # fields and methods as before

    def search_data(self):
        """Return data for the search index"""
        return SearchSchema(
            # prefix the ID with the "type" for the SearchSchema
            id=f"product-{self.id}",
            title=self.name,
            type="product",
            url=self.get_absolute_url(),
            active=self.active,
            product_name=self.name,
            product_description=self.description,
            brand_name=self.brand,
            sku=self.sku,
        )

We've added a search_data() method that transforms our Django model into the Pydantic schema. The id field uses a prefix to ensure uniqueness across different model types (which we'll need later when we add more models to our search index).

This method serves as the bridge between your Django model and your search index. It's responsible for:

  1. Selecting which model data should be searchable
  2. Transforming that data into the structure expected by your search schema
  3. Adding any computed or derived fields needed for search

The prefix in the id field ( product-{self.id} ) is particularly important. Without it, you might have ID collisions when indexing different model types. For example, both a Product with ID 1 and a Category with ID 1 would have the same ID in the search index. By adding the type prefix, we ensure each document has a truly unique identifier.

Notice how we map fields from our Django model to our search schema:

  • name becomes both title (the generic display field) and product_name (the product-specific field)
  • We include the active status to control visibility in search results
  • We use get_absolute_url() to generate the link for search results

Step 4: Indexing Our Search Data in Meilisearch

4.1 What is Search Indexing?

So what exactly is "indexing" in the context of search engines? Think of it like creating a highly optimized lookup table for your data. When we talk about indexing in Meilisearch, we're:

  • Extracting specific fields from our Django models (like names, descriptions, SKUs)
  • Transforming this data into a format optimized for search
  • Organizing it so Meilisearch can quickly find matching results when users search

Without indexing, searching would require scanning through every record in your database. Indexing creates that organization system, allowing Meilisearch to deliver results in milliseconds rather than seconds or minutes.

Our SearchSchema is how we tell Meilisearch which fields should be searchable (like product names and descriptions) and which should be filterable (like product types and whether they're active). These decisions shape how the index is structured and optimized behind the scenes.

4.2 Creating a Meilisearch Client Connection

Let's build our helper functions step by step, starting with establishing a connection to Meilisearch:

def get_meilisearch_client():
    """Return a meilisearch client"""
    url = settings.SEARCH_API_URL
    api_key = settings.SEARCH_API_KEY
    client = meilisearch.Client(url, api_key)
    return client

This function creates a connection to your Meilisearch server using the URL and API key from your Django settings. We'll use this in all our other functions to avoid repeating the connection code.

4.3 Managing Individual Objects in the Index

Next, we need a way to handle individual objects (like products in our inventory) when they're created, updated, or deleted:

def update_object(obj):
    """Update a single object in the search index or delete if inactive"""
    client = get_meilisearch_client()
    index_name = settings.SEARCH_INDEXES["main"]
    index = client.index(index_name)

    if obj["active"]:
        # Update the object in the index
        index.add_documents([obj])
    else:
        # Delete the object from the index
        index.delete_document(obj["id"])

This function handles adding, updating, or removing a single object in the search index. When you save a product, this function gets called to either:

  • Add/update the product in the index if it's active
  • Remove it from the index if it's inactive (like when a product is discontinued)

The conditional logic based on the active field is particularly useful. It automatically handles the case where items can be deactivated but not deleted from the database. By removing inactive items from the search index, you ensure users only find items they can actually purchase or access.

4.4 Building a Complete Search Index

Now for our workhorse function that handles populating the entire search index:

def build_main_indexes(verbose=False):
    """Build the main search index with all searchable models"""
    from products.models import Product  # Import your models

    client = get_meilisearch_client()
    index_name = settings.SEARCH_INDEXES["main"]
    index = client.index(index_name)

    # Configure the index settings
    index_settings = {
        "filterableAttributes": ["type", "active"],
        "searchableAttributes": [
            "title",
            "product_name",
            "product_description",
            "sku",
            "brand_name",
        ],
    }
    index.update_settings(index_settings)

    # Index all active products
    products = Product.objects.filter(active=True)
    if verbose:
        print(f"Indexing {products.count()} products...")

    product_docs = [product.search_data().dict() for product in products]
    if product_docs:
        index.add_documents(product_docs)

This function:

  • Configures which fields should be searchable (like product names and descriptions)
  • Defines which fields can be used for filtering (type and active status)
  • Fetches all active products from your database
  • Converts each product to its search schema format
  • Adds all products to Meilisearch in a single batch operation

The searchableAttributes setting is particularly important. It tells Meilisearch which fields should be included in the search index. Fields not listed here won't be searchable, even if they're included in your documents. This gives you fine-grained control over what's searchable and what's not.

Similarly, filterableAttributes defines which fields can be used for filtering results. In our case, we want to filter by type (to show only products or only categories) and by active status (to hide inactive items).

The optional verbose parameter lets you see progress output, which is helpful when indexing large datasets.

4.5 Complete Index Rebuilds

Finally, we need a "nuclear option" for when we want to start fresh:

def rebuild_main_indexes(verbose=False):
    """Completely rebuild the main search index"""
    client = get_meilisearch_client()
    index_name = settings.SEARCH_INDEXES["main"]

    # Delete the index if it exists
    try:
        client.delete_index(index_name)
    except meilisearch.errors.MeiliSearchApiError:
        pass

    # Create the index and build it
    client.create_index(index_name)
    build_main_indexes(verbose=verbose)

This function completely wipes and rebuilds your search index. You'll want to use this when:

  • You've made changes to your search schema
  • You want to reset your index after testing
  • Your index has gotten out of sync with your database
  • You've made significant changes to your ranking rules or index settings

When I first implemented this for our client, I was amazed at how quickly Meilisearch could index tens of thousands of products. Even with our full catalog of 70,000+ items, the initial indexing took only a couple of minutes.

Step 5: Creating a Mixin for Automatic Indexing

Now let's create a mixin that will automatically update the search index whenever a model is saved. This way, if a product's data changes, or the website administrator marks a product inactive, the search index updates and the search results for users remain accurate.

# search/mixins.py
from django.conf import settings
from search.utils import update_object

class UpdateSearchMixin:
    """
    Mixin to update the search index when a model is saved
    """
    def update_search(self):
        if settings.INDEX_SEARCH:
            update_object(self.search_data().dict())

    def save(self, *args, **kwargs):
        """ Override the save method so we can call update_search after save """
        # Call the original save method
        super().save(*args, **kwargs)
        # Update the search index
        self.update_search()

Now we can update our Product model to use this mixin:

# products/models.py
from django.db import models
from search.schemas import SearchSchema
from search.mixins import UpdateSearchMixin

# Add UpdateSearchMixin
class Product(UpdateSearchMixin, models.Model):
    ...

With this implementation, whenever a Product is saved:

  1. The save() method from our mixin is called
  2. It calls update_search() , which uses the search_data method on the model to update the search index
  3. The search_data() method transforms our Django model into the Pydantic schema
  4. The data is sent to Meilisearch in the exact format we've defined

This automatic indexing approach ensures your search index stays in sync with your database without requiring any additional code at the point of use. Once you've set up the mixin and applied it to your models, the indexing happens automatically whenever models are saved. This means your search results stay up-to-date.

For more complex scenarios, you might want to extend this mixin to handle bulk operations or to update related objects. For example, if changing a Category should update all its related Products in the search index, you could add that logic to the Category model's save method.
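Here's one way that might look. This is a sketch that assumes Product has a ForeignKey to Category with related_name="products"; adjust it to your own relationships:

# products/models.py — sketch: cascade search updates from a Category to its Products
class Category(UpdateSearchMixin, models.Model):
    # ... fields as before ...

    def save(self, *args, **kwargs):
        # UpdateSearchMixin's save() re-indexes the category itself
        super().save(*args, **kwargs)
        # Re-index related products so their category data stays current in search
        for product in self.products.filter(active=True):
            product.update_search()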

One of the great advantages of Meilisearch compared to other search engines like Elasticsearch is that it handles document indexing asynchronously. When you send a document to Meilisearch, it quickly acknowledges receipt and then processes the indexing in the background. This means your Django save() method won't be slowed down waiting for indexing to complete, a common issue with other search backends where indexing can add seconds to each save operation.

This asynchronous approach eliminates the need for background task queues like Celery or Dramatiq just to handle search indexing, greatly simplifying your architecture.

Step 6: Creating Management Commands for Index Operations

While our automatic indexing handles day-to-day updates, we sometimes need more powerful tools for managing our search index. Django management commands are perfect for this.

Let's create a management command using django-click for re-indexing our search:

# search/management/commands/index_search.py
from __future__ import annotations

import djclick as click

from search.utils import rebuild_main_indexes


@click.command()
@click.option("--verbose", is_flag=True, default=False)
def main(verbose):
    """Index objects in Meilisearch."""
    click.secho("Rebuilding main indexes...", fg="green", nl=False)
    rebuild_main_indexes(verbose=verbose)
    click.echo("Indexes built.")

This simple command provides a convenient way to rebuild your search index from the command line.

For larger projects, we can enhance this with more options and batch processing for handling large datasets:

# search/management/commands/index_search.py
from __future__ import annotations

import time
import djclick as click
from django.conf import settings
from products.models import Product, Category
from search.utils import get_meilisearch_client

@click.command()
@click.option("--verbose", is_flag=True, default=False)
@click.option("--batch-size", default=1000, help="Batch size for indexing")
def main(verbose, batch_size):
    """Rebuild search indexes in Meilisearch with batch processing."""
    start_time = time.time()
    click.secho("Rebuilding search indexes...", fg="green")

    # Get client and recreate index
    client = get_meilisearch_client()
    index_name = settings.SEARCH_INDEXES["main"]

    # Delete and recreate index
    try:
        client.delete_index(index_name)
    except Exception:
        pass

    client.create_index(index_name)
    index = client.index(index_name)

    # Configure index settings
    click.echo("Configuring index settings...")
    index_settings = {
        "filterableAttributes": ["type", "active"],
        "searchableAttributes": [
            "title", "product_name", "product_description", 
            "sku", "brand_name", "category_name"
        ],
    }
    index.update_settings(index_settings)

    # Index products in batches
    products = Product.objects.filter(active=True)
    total_products = products.count()

    if verbose:
        click.echo(f"Indexing {total_products} products in batches of {batch_size}...")

    # Process in batches to avoid memory issues with large datasets
    batch_count = 0
    for i in range(0, total_products, batch_size):
        batch = products[i:i+batch_size]
        product_docs = [p.search_data().dict() for p in batch]
        if product_docs:
            index.add_documents(product_docs)

        batch_count += 1
        if verbose:
            click.echo(f"Processed batch {batch_count}, {min(i+batch_size, total_products)}/{total_products} products")

    # Index categories (typically smaller, so we do it all at once)
    categories = Category.objects.filter(active=True)
    category_docs = [cat.search_data().dict() for cat in categories]
    if category_docs:
        index.add_documents(category_docs)

    elapsed_time = time.time() - start_time
    click.secho(f"✓ Indexing completed in {elapsed_time:.2f} seconds", fg="green")

This enhanced command is what we used for our client with 70,000+ items. Processing records in chunks of 1,000 avoids the memory issues that can occur when trying to process all 70,000+ records at once, while still maintaining good performance.

The entire indexing process for 70,000+ items took about 3-4 minutes, which is remarkably fast considering the volume of data. This is another area where Meilisearch shines compared to other search solutions. Its indexing performance is excellent even with large datasets.

This command is invaluable when:

  • You've made schema changes and need to rebuild the entire index
  • You're deploying to a new environment and need to populate the index
  • You suspect the index has gotten out of sync with your database
  • You're performing data migrations that affect searchable content
  • You want to update your index settings or ranking rules

The ability to easily and quickly rebuild your search index is one of the many reasons Meilisearch is so developer-friendly.

Step 7: Adding More Models to the Search Index

Now that we have the basic structure in place, we can easily add more models to our search index. Let's integrate a Category model:

# products/models.py
class Category(UpdateSearchMixin, models.Model):
    name = models.CharField(max_length=255)
    description = models.TextField(blank=True)
    active = models.BooleanField(default=True)

    def get_absolute_url(self):
        return f"/categories/{self.id}/"

    def search_data(self):
        return SearchSchema(
            id=f"category-{self.id}",
            title=self.name,
            type="category",
            url=self.get_absolute_url(),
            active=self.active,
            category_name=self.name,
            category_description=self.description,
            products_count=self.products.count(),
        )

And update our SearchSchema to include category fields:

# search/schemas.py
class SearchSchema(BaseModel):
    """
    Pydantic model for our main search schema
    """
    # other fields as before... 

    # Category fields
    category_name: str | None = None
    category_description: str | None = None
    products_count: int | None = None

Then update our build_main_indexes function to include categories:

def build_main_indexes(verbose=False):
    """Build the main search index with all searchable models"""
    from products.models import Product, Category  # Import your models

    client = get_meilisearch_client()
    index_name = settings.SEARCH_INDEXES["main"]
    index = client.index(index_name)

    # Configure the index settings
    index_settings = {
        "filterableAttributes": ["type", "active"],
        "searchableAttributes": [
            ...
            "category_name",
            "category_description",
        ],
    }
    index.update_settings(index_settings)

    # Index all active products as before... 

    # Index all active categories
    categories = Category.objects.filter(active=True)
    if verbose:
        print(f"Indexing {categories.count()} categories...")

    category_docs = [cat.search_data().dict() for cat in categories]
    if category_docs:
        index.add_documents(category_docs)

This pattern scales well to any number of models. Each model:

  1. Implements the UpdateSearchMixin
  2. Provides a search_data() method that returns a SearchSchema instance
  3. Sets appropriate values for common fields ( id , title , type , url , active )
  4. Populates its model-specific fields in the schema

The beauty of this approach is that you can add new model types to your search index without changing your existing code. You just:

  1. Update your SearchSchema to include fields for the new model type
  2. Add the mixin and search_data() method to the new model
  3. Update your build_main_indexes function to include the new model

This extensibility is particularly valuable as your application grows. You might start with just products, then add categories, then blog posts, then user profiles, all without having to redesign your search architecture.

Step 8: Implementing the Search Frontend

Now let's create a simple view to search our index:

# search/views.py
from django.shortcuts import render
from django.conf import settings
from search.utils import get_meilisearch_client

def search(request):
    """Search view"""
    query = request.GET.get("q", "").strip()
    type_filter = request.GET.get("type", None)
    results = []

    if query:
        client = get_meilisearch_client()
        index = client.index(settings.SEARCH_INDEXES["main"])

        # Build filter string using your filterable attributes 
        filter_str = "active = true"
        if type_filter:
            filter_str += f" AND type = {type_filter}"

        # Perform search
        search_results = index.search(
            query, 
            {
                "filter": filter_str,
                "limit": 50
            }
        )
        results = search_results["hits"]

    return render(
        request,
        "search/results.html",
        {
            "results": results,
            "query": query,
            "type_filter": type_filter
        },
    )

This view handles the search process:

  1. It extracts the search query and any filters from the request
  2. It connects to Meilisearch and performs the search
  3. It renders a template with the search results

The filter_str construction shows how to use Meilisearch's filtering. We always filter to show only active items. You can optionally filter by type if the user has selected a specific type filter.

For more advanced search needs, Meilisearch supports additional parameters like:

  • attributesToHighlight : Highlight matching terms in results
  • facets : Return facet counts for filtered attributes
  • sort : Sort results by specific attributes
  • offset : Paginate through results
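Putting a few of those together, a more feature-rich search call might look like this. It's a sketch: the field names match our schema, and the parameter names (attributesToHighlight, facets, offset) are standard Meilisearch search options:

search_results = index.search(
    query,
    {
        "filter": "active = true",
        "attributesToHighlight": ["title", "product_description"],  # adds a "_formatted" key to each hit
        "facets": ["type"],  # per-type counts; "type" is already a filterable attribute
        "limit": 20,
        "offset": 20,  # page 2 when the page size is 20
    },
)
results = search_results["hits"]
facet_counts = search_results.get("facetDistribution", {})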

Now let's create a simple template to display the results:

<!-- templates/search/results.html -->
{% extends "base.html" %}

{% block content %}
<div class="search-container">
    <!-- Search form -->
    <form method="get" action="{% url 'search' %}">
        <input type="text" name="q" value="{{ query }}" placeholder="Search...">
        <button type="submit">Search</button>
    </form>

    <!-- Results -->
    {% if query %}
        <h2>Results for "{{ query }}"</h2>
        {% if results %}
            <div class="search-results">
                {% for result in results %}
                    <div class="search-result">
                        <h3><a href="{{ result.url }}">{{ result.title }}</a></h3>
                    </div>
                {% endfor %}
            </div>
        {% else %}
            <p>No results found.</p>
        {% endif %}
    {% endif %}
</div>
{% endblock %}

This template includes:

  • A search form with the current query pre-filled
  • A results section that links to each result's title and URL (you can extend it to display different details based on the result type)

Don't forget to add the URL pattern:

# urls.py
from django.urls import path
from search.views import search

urlpatterns = [
    # ... other URLs
    path('search/', search, name='search'),
]

For a production application, you might want to enhance this with:

  1. Pagination : For handling large result sets
  2. Highlighting : To show users where their search terms matched
  3. Faceted search : To allow filtering by multiple attributes
  4. Autocomplete : To suggest search terms as users type
  5. Analytics : To track popular searches and improve your search experience

Meilisearch supports all these features through its API, making it easy to build a sophisticated search experience as your application grows.

See It In Action: Surplus Sales Case Study

Want to see what a production Meilisearch implementation looks like? Visit our client Surplus Sales and try out their search functionality. Pay special attention to:

  • Typo tolerance : Try searching for "trnasistors" (misspelled) and notice how it still finds what you need
  • Speed : The millisecond response times, even when filtering through their 70,000+ item catalog
  • Relevance : How the most relevant parts appear at the top of results
  • Complex metadata handling : Search by part numbers, specifications, or descriptions

Their implementation handles all the concepts we've covered in this tutorial. It's a great example of how these techniques scale to real-world applications.

Alternative Approach: Using django-meili

If you prefer a more Django-native approach with less custom code, you might want to check out django-meili. This package provides a more Haystack-like experience for integrating Meilisearch with Django. I haven't used it, but from what I can tell, these are the main differences between using django-meili and a custom implementation like ours:

  • Model-focused : django-meili is designed around indexing individual Django models, which works well if you're searching just one model at a time
  • Less flexible : Our unified schema approach gives you more control over how different models are represented in search results
  • Easier setup : django-meili requires less boilerplate code to get started
  • Django-native : It follows Django conventions more closely with a familiar API

django-meili might be a good choice if:

  • You're primarily searching within a single model type
  • You prefer a more Django-integrated approach with less custom code
  • You don't need the flexibility of a unified schema across different model types

Further Reading

Need help implementing advanced search for your Django project? Contact our team at Revsys for expert Django consulting. We've implemented search solutions for clients ranging from small startups to large enterprises, and we'd be happy to help you build a search experience that delights your users.

Redis connections growing unbounded
http://www.revsys.com/tidbits/redis-connections-growing-unbounded/
gunicorn+gevent and Celery can be tricky. There is a setting that is often missed that keeps your Redis connections from increasing until they reach the maximum.
Fri, 30 Aug 2024 21:04:31 +0000

We use gunicorn and gevent in most of our production deployments. There are many options in this area, but this is one we've had great success with and at one point was the fastest based on some benchmarking we did many years ago. (We should re-run that, but that's another blog post).

The Problem

The number of active Redis connections keeps growing. At first glance you would think this would be happening in your Celery workers or even Celery Beat, but it's actually happening in your WSGI (gunicorn) process.

While gevent gives you green threads, Celery isn't aware it's running in a thread-safe context, so when you call a task in your application code it creates a NEW connection pool for each task.

In one client's setup this manifested as the connection count growing by 4-5 connections every time a task was called. If things are restarted frequently enough, which happens in a lot of our apps with CI/CD, it goes unnoticed. If, however, your app runs for weeks on end, it's pretty easy to reach the Redis default of 10k connections.

The Solution

You just need to tell Celery it is working in a thread-safe environment by setting result_backend_thread_safe = True.

NOTE: For Django specifically this means adding CELERY_RESULT_BACKEND_THREAD_SAFE = True to your settings.py.
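
For reference, here's a minimal sketch of that in a settings file. This assumes the usual Django/Celery integration where your Celery app calls app.config_from_object("django.conf:settings", namespace="CELERY"):

# settings.py
# Maps to Celery's result_backend_thread_safe setting via the CELERY_ namespace
CELERY_RESULT_BACKEND_THREAD_SAFE = True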

Once this is set it should properly re-use Redis connections and stop being a problem for you.

Hope this helps!

How to upgrade FluxCD
http://www.revsys.com/tidbits/how-to-upgrade-fluxcd/
Streamline your FluxCD upgrade in Kubernetes with this concise guide. Learn how a simple re-run of the flux bootstrap command can effortlessly update your FluxCD version, saving you time and hassle.
Mon, 27 May 2024 15:51:57 +0000

It's Memorial Day here in the USA. My family celebrated it together yesterday, so my plan for the day was to "lift and shift" my personal Kubernetes cluster from one cluster to another, MOSTLY because I wanted to upgrade FluxCD.

I was under the incorrect assumption that doing an in-place upgrade of Flux was difficult, if not impossible, without some severe disruptions to the services running on that cluster.

I was mostly thinking it would be hard to upgrade the various flux manifests from one CRD version to the next and especially hard to upgrade around the ingress-nginx I have installed WITHOUT it dropping the load balancer and forcing me to repoint all of my DNS.

Considering I was prepared to rebuild all of my services from scratch in the new cluster, I realized this was a perfect time to attempt an in-place upgrade and see what sort of problems would arise. Worst case, I'd waste a few minutes and proceed with my rebuild.

Luckily, I was wrong, and it took me all of 3 minutes!

Upgrading Flux

I previously searched around a couple of times trying to find any documentation or blog posts on how to upgrade Flux. I mostly found older posts related to upgrading various v1 situations and this was v2. In a last-ditch effort, I tried a couple of different searches today and finally ran across this Github Discussion.

If you installed it using flux bootstrap, upgrading should be as easy as running that again with the right repository flags...

Wait what? I can just re-run the flux bootstrap command with a newer version of the cli tool and it just works?!?!?!

I'm pleased to say it does!

All I had to do was re-run the exact same flux bootstrap ... command I had used previously which was:

$ flux bootstrap github \
  --components-extra=image-reflector-controller,image-automation-controller \
  --owner=frankwiles \
  --repository=personal-ops \
  --branch=main \
  --path=cluster/ \
  --read-write-key \
  --personal

After it spit out some progress output of what it was doing, all of the containers kicked over and boom, all done.

Before Flux v0.41.2

$ flux version
flux: v0.41.2
helm-controller: v0.31.2
image-automation-controller: v0.31.0
image-reflector-controller: v0.26.1
kustomize-controller: v0.35.1
notification-controller: v0.33.0
source-controller: v0.36.1

v0.41.2 was the last "v2"-ish release prior to them starting the 2.0 release candidates. So while conceptually "v2", I was actually running a really odd version to attempt this with.

After Flux Upgrade

$ flux version
flux: v2.3.0
distribution: flux-v2.3.0
helm-controller: v1.0.1
image-automation-controller: v0.38.0
image-reflector-controller: v0.32.0
kustomize-controller: v1.3.0
notification-controller: v1.3.0
source-controller: v1.3.0

Quite the shift of version numbers!

Why use FluxCD?

I mentioned how easy this went on social media and that I was going to use the time saved to write up this blog post. I got a few questions about why we use Flux vs ArgoCD or why we use these tools at all.

When we first started using Kubernetes with our clients, it was common to simply give your CI systems admin-level k8s credentials and have them run kubectl apply or helm upgrade commands to initiate deploys. This never sat well with us, as the security of most CI systems is usually an afterthought. In most cases we deployed our own private CI runners INSIDE our k8s clusters to avoid the bulk of this attack vector, but it was a pain.

We then found a long-abandoned project, whose name I forget, that was very similar to Flux. When it was obvious that tool wasn't going to continue, we made the switch.

What tools like Flux and Argo do for you is provide true gitops YAML manifests that you change to effect changes in your Kubernetes clusters.

They run services inside your cluster and watch a specific git repository looking for changes to these manifests. Let's say we have a repo named gitops and a cluster named bob , the flow is essentially:

  • Make a change to a YAML manifest in the gitops repo, say changing a Docker image from v9.0.1 to v9.0.2. Doesn't really matter what the change is, just that there is some change you want to have happen in your cluster.
  • The repo is configured to fire a webhook that alerts the CD system (Flux or Argo) that there is a change. This may run inside of bob or in the case of ArgoCD can actually be a centralized Argo service that powers several clusters.
  • The CD containers perform a git pull and reconcile the change.
  • Assuming there aren't any errors, like a YAML syntax error or something, it applies the changes inside of your bob cluster.

Benefits of CD and gitops in a Kubernetes landscape are:

  • No cluster credentials need to exist in your CI systems. All communication of the intent of your change happens via a git repo. While an attacker could still wreak havoc on your cluster by removing services or adjusting security configurations, they would not have direct access to the cluster.
  • Developers can also make changes directly without the need for k8s access or knowledge of kubectl and helm .
  • Changes can be vetted using Pull Requests.
  • Changes are tracked in the git commit history for posterity.

Lessons Learned

You should always do one more search before embarking on a long task that seems like it should be easier than it first appears. You might get lucky like I did!

Also always write down the exact flux bootstrap ... command you used when bringing up a new cluster. Personally, I like to document this in the README.md at the root of the gitops repository I'm using for that cluster.

Hopefully this helps other people confused by how to easily upgrade FluxCD. If your company struggles with Kubernetes or gitops, you might reach out; we help many of our clients with Kubernetes infrastructure and operations needs, including being your entire ops staff if that makes sense.

"En-Rich" your Python testinghttp://www.revsys.com/tidbits/en-rich-your-python-testing/Test output is often an afterthought. You can improve your flow and slightly gamify your bug hunting by combining the wonderful Rich Python library and pytest.Sat, 03 Feb 2024 16:49:11 +0000http://www.revsys.com/tidbits/en-rich-your-python-testing/

Debugging complex code is not usually fun

It's hard to get motivated when faced with a big pile of numbers that just aren't lining up in the ways they are supposed to. Stupid numbers, do what you're told!

The task at hand

I've been rebuilding a 15 to 20-year-old ColdFusion currency trading learning simulation for a client. The math involved is mostly contained in SQL Server stored procedures, some values in tables are really constants (or at least most of the time), and variable names between steps of the process aren't consistent. This has made the rebuilding effort harder than it originally appeared. Add in discovering some undesired behaviors in the old app, plus the desire to improve those situations going forward, and it has honestly been a bit of a slog.

I knew that today was the day I had to just dig in and iterate on this code until all of the numbers were correct now that I had a known good set of values to work against.

Since I needed to compare several numbers together, I thought I might use a Rich Table to make the output a little better organized. And I'm VERY glad I did.

Here is what I started with:

Initial boring pytest output

Not exactly pretty. It spit out the numbers it was calculating and I could compare them against the known good values visually. After looking at and re-re-reading the numbers a half dozen times, I was already bored and feeling tired despite having too much caffeine in my system.

So I put all of the expected values, the calculated values, and the differences between the two into a Rich Table. I took a couple of moments to adjust some colors and right-justify all of these numbers because I knew I was going to be staring at this all freaking day.

Initial Python Rich table formatting

The debugging

I then went about debugging the first Total Unhedged number because all of the following calculations depended on this number being correct. And I quickly found the bug!

I would love to say it was because of this great new visualization, but it was just closely re-reading my Python code and comparing it ( yet again ) to the original portion of a stored procedure to find my subtle mistake.

One down, 6 more to go, I thought. This was going to be a long day, but it turned out two more of the values were NEARLY correct now, just off due to rounding. This WAS easier to spot because of the tabular layout. I treated myself to a bit more table improvement: I wrote a little function named calc_diff() that took the expected and actual values and color-coded the difference, red if the values differed and green if they matched.

Improved table with red/green color coding

Table code example

The code for this is surprisingly easy. Rich was already a dependency of something else in my project's virtualenv so I didn't even need to install it. Here is a simplified example of the code for you:

from decimal import Decimal

from rich.console import Console
from rich.table import Table
from rich.text import Text 

def calc_diff(expected, actual):
        diff = expected - actual

        color = "red"
        if diff == 0:
            color = "green"

        text = Text(str(diff))
        text.stylize(color)

        return text

# My actual pytest test
def test_ex3_math(ex3_params):
    expected_total_unhedged = Decimal("3.394148")

    # call a method on a fixture class that does the math calculation for us,
    # yours would obviously be different
    total_unhedged = ex3_params.total_unhedged()

    # Build our table
    table = Table(
        title="Exercise 3 Answers", title_style="gold1", style="deep_sky_blue3"
    )
    table.add_column(
        "Name", style="turquoise2", header_style="turquoise2", no_wrap=True
    )
    table.add_column("Expected", header_style="turquoise2", justify="right")
    table.add_column("Actual", header_style="turquoise2", justify="right")
    table.add_column("Difference", header_style="turquoise2", justify="right")

    # Total Unhedged - just showing you one row here for sake of brevity
    table.add_row(
        "Total Unhedged",
        str(expected_total_unhedged),
        str(total_unhedged),
        calc_diff(expected_total_unhedged, total_unhedged),
    )

    # Output the table
    console = Console(color_system="256")
    console.print(table)

    # Trigger our test to fail artificially
    assert False

Conclusion

Was it worth the extra time? DEFINITELY! It not only made it easier to compare the values as I worked, but it also motivated me a little more than usual as I turned each red line green.

Every little bit helps, right? Hopefully, you enjoyed this protip and can put it to use in one of your upcoming debugging sessions.

If you or your company struggles with testing your Python or Django apps, check out TestStart, our testing consulting package, to help jump-start your team into better productivity.

pytest fixtures are magic!
http://www.revsys.com/tidbits/pytest-fixtures-are-magic/
Fixtures are building blocks for good tests and can increase development speed. The main issue with writing tests is setting up necessary data before the test, but pytest fixtures make it easier by injecting necessary data into your tests.
Tue, 13 Jun 2023 12:56:00 +0000

I usually hate magic in code, but in this case, the magic is fantastic!

I must admit I used pytest to run my tests for an embarrassingly long time before I started writing and using pytest style tests. I'm sure it was a mixture of laziness and not understanding the magic behind pytest fixtures.

Because I didn't understand the magic happening, it scared me away from looking at pytest more deeply. Once I did, I wished someone had held me down and forced the explanation down my throat.

Fixtures are the building blocks of writing good tests. Writing great tests is easy if you have great fixtures, and your development velocity will skyrocket.

The main problem with writing tests

One of the problems with writing automated tests is setting the stage for the test. Generating the specific data necessary to exist before we execute the thing we want to test.

It is tedious work, but it does not have to be!

Why is this such a problem? In many systems, it is cumbersome to generate all of the branches of a tree, its roots, the ground, and the sky to verify the leaves turn out to be green in spring.

As an example, if you worked at Github and were tasked with adding a feature that replaces curse words in issue comments on public repositories, you have to:

  • create a user because Organizations need an owner
  • create an Organization to own these repositories
  • create a public repo to test the main functionality
  • create a private repo to ensure it doesn't touch those comments
  • create an issue

And THEN, you can create comments with and without curse words to exercise the feature you're working on. That's a lot of crap to make before you test. Is it any surprise people get lazy and opt to not test the feature end to end?

The answer is not to avoid the test but to make it far easier to generate that data in the first place.

The magic of pytest fixtures is how they are injected into your tests for you to use and their composability. When done well, writing tests is considerably easier and actually fun.

The Magic

pytest fixtures are injected into your test by including the name of the fixture function into the argument list of your test.

import pytest
from seuss.characters import ThingOne, ThingTwo
from seuss.comparisons import are_twins

@pytest.fixture
def thing_one():
    t1 = ThingOne()
    return t1

@pytest.fixture
def thing_two():
    t2 = ThingTwo()
    return t2

def test_book(thing_one, thing_two):
    assert are_twins(thing_one, thing_two)

(This is a reference to Dr. Seuss' book The Cat In The Hat if you aren't familiar with it.)

What happens here is pytest walks your code base looking for all of your tests. This is called test discovery . During that process it inspects the list of arguments to your test functions and matches those to fixtures.

It then organizes the test run to minimize the time of all fixture creation based on the dependency tree. The author of the fixture can control the scope/lifecycle of the fixture data. You can read up more on pytest fixture scopes in this excellent explanation.
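
For example, a fixture that is expensive to build can be created once per test session instead of once per test by giving it a scope. A minimal sketch, where load_giant_catalog() is a made-up stand-in for your own expensive setup:

import pytest

@pytest.fixture(scope="session")
def catalog():
    # Built once for the entire test run and shared by every test that asks for it
    return load_giant_catalog()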

So, now we know that if we create a fixture named bob, just adding bob to the argument list of our test will get us that fixture's data. That makes sharing fixture data around hundreds of tests easy.

The composability of fixtures

It's important to realize that fixtures can build on top of each other.

Let's go back to our Github naughty issue comment example. I would structure things so that I had the following fixtures:

  • owner
  • public_repo
  • private_repo
  • public_issue
  • private_issue

It is likely that, back when we built our registration process, we already created a fixture for a registered user, one who can act as the repository owner. Let’s assume we have a function named make_repo that handles making a repository for us. But since every repo needs an owner, we create an owner fixture. We can now pass the owner fixture into make_repo() when we create our repo fixtures. So for the next two fixtures, we just use that fixture as an argument, and it gets pulled in automatically:

@pytest.fixture
def owner():
    return make_registered_user()

@pytest.fixture
def public_repo(owner):
    return make_repo(owner=owner, visibility="public")

@pytest.fixture
def private_repo(owner):
    return make_repo(owner=owner, visibility="private")

We can now quickly get a public and/or a private repo for all future tests we write. Keep in mind these fixtures are available across ALL of the tests in our project. They are injected across Python namespaces and source files. This removes the hassle of having to import dozens of things at the top of your test files.

However, that also means you should take a moment and ensure your fixture's name is a good one that represents it well and is easy to remember, type, and not misspell. Ok, let's get back to work.

Our task is to ensure curse words are rewritten in public issue comments. These fixtures help but only take us halfway to where we want to be. We can solve this with even more fixtures!

from issues.utils import create_issue

@pytest.fixture
def public_issue(public_repo):
    return create_issue(repo=public_repo, title="Something is broken")

@pytest.fixture
def private_issue(private_repo):
    return create_issue(repo=private_repo, title="Something is broken in private")

Now we have the two bits of data (and the tree of data that comes with them) all created so we can actually test our task. It might look something like this:

def test_naughty_comment_public(public_issue):
    comment = public_issue.new_comment("Frak off butthole")
    assert comment.body == "F*** off b**hole"

def test_naughty_comment_private(private_issue):
    comment = private_issue.new_comment("Frak off butthole")
    assert comment.body == "Frak off butthole"

Things you can return

Hopefully, it is evident by now that you can return all sorts of things from fixtures. Here are some ideas that might get your brain seeing how you can use these in your projects.

Simple static data

@pytest.fixture
def password():
    # We can return simple strings
    return "top-secret"

@pytest.fixture
def pi():
    # We can return special/constant values
    return 3.14159

@pytest.fixture
def default_preferences():
    # We can return structures of things that are commonly needed
    return {
        "default_repo_visibility": "public",
        "require_2fa": true,
        "naughty_comments_allowed": false,
    }

Instantiated objects

Often we need access to an object that is either complex to instantiate or expensive in terms of processing time. In those cases, we can turn that instantiation itself into a fixture and reuse it across tests more easily.

from core.models import Organization

@pytest.fixture
def big_org():
    return Organization(members=1000)

@pytest.fixture
def small_org():
    return Organization(members=5)

Callables to make other things

You can have fixtures that return functions or other callables to make it easier to inject them into your tests:

from core.utils import make_new_user

@pytest.fixture
def make_user():
    return make_new_user

def test_user_creation(make_user):
    new_user = make_user(username='frank')
    assert new_user.is_active
    assert new_user.username == 'frank'

Or you can use it to curry these functions to make them easier to use:

from core.utils import make_complicated_user

@pytest.fixture
def make_admin():
    def inner_function(username, email, password):
        return make_complicated_user(
            username=username,
            email=email,
            password=password,
            is_admin=True,
            can_create_public_repos=True,
            can_create_private_repos=True,
        )
    return inner_function

def test_admin_creation(make_admin):
    admin = make_admin(username='frank', email='frank@revsys.com', password='django')
    assert admin.username == 'frank'
    assert admin.is_admin == True

Organizing Your Fixtures

Here at REVSYS, we primarily work with Django projects, so our fixtures tend to live with the Django app with the models used for that fixture. It doesn't have to be this way, of course, it just makes sense in the context of a Django project.

There is nothing keeping you from defining fixtures all over the place in your tests. It just makes it more challenging to know where to look for them.

If a fixture isn't going to be reused outside of a single test file, then it's absolutely appropriate to keep it in that file. Any fixtures used across a project should live in some central or semi-central location.

pytest helps you with this as well, of course. pytest looks for a file named conftest.py in the directory where you're running pytest (and in sub-directories, FYI), and that is where you can define, or import, fixtures that live outside of the test files themselves.

Here is a typical project structure here at REVSYS:

project-repo/
    config/
        settings.py
        urls.py
        wsgi.py
    users/
        models.py
        views.py
        tests/
            fixtures.py
            test_models.py
            test_views.py
    orgs/
        models.py
        views.py
        tests/
            fixtures.py
            test_models.py
            test_views.py

We put all of the fixtures we intend to share in the project into the tests/fixtures.py files. We then define project-repo/conftest.py like this:

import pytest

from users.tests.fixtures import *
from orgs.tests.fixtures import *

@pytest.fixture
def globalthing():
    return "some-global-project-data-here"

As you can see, we also are able to define any project wide or global fixtures directly in the conftest.py file if it's more appropriate for something to live there and not in a particular Django app directory.

Plugin Fixtures

pytest plugins can also create and provide fixtures to you. For Django users, pytest-django provides several valuable fixtures:

  • db which automatically marks a test as needing the database to be configured
  • client provides an instance of django.test.Client
  • admin_user returns an automatically created superuser
  • admin_client returns an instance of the test client, already logged in with the admin_user
  • settings gives you django.conf.settings

To show you another example, our library django-test-plus provides itself as a pytest fixture using the name tp. This allows you to take advantage of the test helping functions it provides more easily in pytest style tests. Here is a quick example:

def test_named_view(tp, admin_user):
    # GET the view named 'my-named-view' handling the reverse for you
    response = tp.get('my-named-view')
    # Ensure the response is a 401 because we need to be logged in
    tp.response_401(response)

    # Login as the admin_user and ensure a 200
    with tp.login(admin_user):
        response = tp.get('my-named-view')
        tp.response_200(response)

This is accomplished with just a couple lines of config in our library and then the fixtures themselves. That means you can also provide fixtures in internal libraries that are pip installed and not confine yourself to only having fixtures in single repository situations.

Automatically using pytest fixtures everywhere

Everything we've shown so far is nice, but you may face a code base that needs TONS of fixtures most everywhere. Or even just a handful of fixtures that are needed across most of the code base. No worries, pytest has you covered.

You can define a fixture to automatically be run for all of your tests without specifying it as an argument to each test individually.

@pytest.fixture(autouse=True)
def global_thing():
    # Create something in our database we need everywhere or always
    Config.objects.create(something=True)

def test_something_lame():
    assert Config.objects.filter(something=True).exists()

Built-in Fixtures

pytest comes with a bunch of built-in fixtures as well. I admit I didn't even know about some of these for a long time and they can come in very handy.

Some examples include (there's a quick usage sketch after this list):

  • tmpdir returns a temporary directory path object that is unique to each test
  • tmp_path gives you a temporary directory as a Path object
  • capsys fixture for capturing stdout and stderr
  • caplog fixture that captures logging output
  • monkeypatch quickly and easily monkeypatch objects, dicts, etc.
  • cache a cache object that can persist state between testing sessions
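
Here's a quick sketch using two of these built-ins together; the file name and messages are invented, but the fixture behavior is what you get out of the box:

def test_writes_report(tmp_path, capsys):
    # tmp_path is a unique pathlib.Path directory created just for this test
    report = tmp_path / "report.txt"
    report.write_text("all good")

    print("report written")
    captured = capsys.readouterr()

    assert report.read_text() == "all good"
    assert "report written" in captured.out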

Further Learning

Learn more details in the pytest fixture docs or look over the full list of built-in fixtures.

Conclusion

The more of our codebase that is covered by meaningful tests, the faster we can develop new features and refactor cruft. It is not just about bug-free code. It's about the speed at which you can develop good code.

The easier we make it to write tests, the faster we can deliver great software.

P.S. If you or your team aren't writing tests or just struggling with them, we can help get you there with our TestStart product. One of our experts will swoop in, get you set up, and write a bunch of your first tests for you to show you how it's done in your own code base!

REVSYS is 16 Years Old Today!
http://www.revsys.com/tidbits/revsys-is-16-years-old-today/
Find out the details of our origin story along with examples of some of the more interesting projects we've tackled in our history.
Fri, 05 May 2023 14:17:19 +0000

Picture of Kid's Birthday

REVSYS is 16 today

Well, technically, it's 21 years old, but it got held back 5 years. I was still employed at The World Company / Sunflower Broadband, the company that was the 2nd cable modem ISP in the US and ultimately released the Django web framework. I did some consulting on the side and a few clients needed me to be a "company" and not just "some guy".

It wasn't super expensive to register or cumbersome at tax time, so I started working to find a name. I grabbed some ice cream with the then marketing director of Linux Professional Institute to help me come up with a decent name. I had recently watched the documentary RevolutionOS and floated using revolution in the name somehow. We settled on "Revolution Systems", which we eventually shortened to REVSYS for branding purposes.

So it was registered in 2002, but I left my corporate job to focus on it full time on May 5th, 2007. I put in my notice and it just so happened to line up with Cinco De Mayo so the customary after work drinks leaned toward tequila.

The early days

I wouldn't recommend starting a business just before a major recession. The Great Recession hit in December 2007 just a few months later and I barely survived it. However, once the economy started to pick back up after the initial shock wore off it was a feast of work as everyone had some budget, work to do, but were still in a hiring freeze so contracting was their only option.

After maybe a year, Jacob Kaplan-Moss also had left the World Company and wondered if I wanted to share an office. I had been working from home and coffee shops and didn't really feel like I needed or wanted an office but it was a reasonable price so we did it.

Soon after that Jacob joined me in REVSYS as a partner and I switched our focus from Perl/PostgreSQL/Ops consulting to Python/Django/PostgreSQL/Ops consulting.

Funny story about how that happened. Rikki Endsley was my editor at SysAdmin and LinuxPro magazines at the time and she asked me to write up an interview with Jacob about the creation of the Django Software Foundation and a brief tutorial on Django.

So I had to learn Python and Django for the article and I quickly realized it was soooooo much better than the tools I was using in the Perl world (some of which I had personally written hahaha) that I essentially immediately switched.

Thanks Rikki!!!

We operated just the two of us for a while and eventually hired Jeff Triplett as Employee #1. We made Jeff a partner in REVSYS in 2019.

We followed up that smart hiring decision with LOTS of smart hiring decisions to the team that you can find here .

Our work

While at a PyCon dinner, a long time friend asked if we "still primarily do rescue projects". I hadn't thought about it much, but we do actually take on a LOT of rescue projects.

Rescuing a project with bad performance, stability, or both.

Rescuing a project from the clutches of a team of ineffective programmers.

Rescuing a project's deadline by bringing a lot of senior developer velocity.

Rescuing projects isn't all we do, there are plenty of projects that are greenfield or even legacy projects that are in need of some help but don't really qualify as a "rescue" in my mind.

Some examples of our more interesting previous projects:

  • Sony corporate homepage for a few years
  • Keep CashStar from crumbling on Cyber Mondays
  • Optimizing tasks into the queues of expensive proprietary software systems to allow the client to maximize their use of the expensive product (Cirrus Logic, Boston University, etc)
  • Helping set the architecture of eMoney's foray into cloud based Django microservices and doing quite a bit of the ops and development
  • Game for Netflix's German speaking market that was played by over 750k people over a weekend
  • Help RueLaLa with a high-performance inventory caching expiration issue
  • Integrating several POS systems into a cohesive whole for beer gardens with multiple vendors for a company named Porter
  • Building and refreshing Python.org itself for the PSF
  • Hosted DjangoBirthday when Django turned 10 in Lawrence
  • Lots of Django upgrades for clients. Some relatively easy. Some insane. (Don't hack up the ORM internals ok)
  • Tons of Python/Django/Kubernetes performance-related projects

Some of our current projects include:

  • Politifact.com (the Liar Liar Pants on Fire people)
  • Custom system for an event promotion company to track concert events they promote. Which is far more complicated than you may realize.
  • Oil pumpjack inspection and repair tracking software
  • Rewriting a popular Open Source library project website
  • An interesting EdTech product that smartly uses ChatGPT in a way that truly helps students learn
  • Business simulations at The Wharton School
  • Helping to optimize 2TBs of dairy cow data in a heavily used mission-critical app (if you're into milk and cheese in North America anyway). It's an older legacy code base that we've been helping modernize and optimize to improve its resiliency and reduce infrastructure costs.

We also help a lot of companies improve their DX (Developer Experience) to be able to achieve vastly more efficient development by identifying and removing all of the small barriers that get in the way of your developer's day. Sometimes it's code or CI related, often it's people and process related.

Our experience as consultants allows us to have a razor sharp wide angle lens of what will and won't work for the current team and the project at hand. Each situation is a bit different, but the results we achieve on those sorts of projects are often surprising to everyone involved!

We have some availability coming up over the summer, so if your company needs help, reach out to us on the website or email us.

New Consulting Products

Are you still reading this? Wow you must really like us! While I have you then...

If you're interested we are working on a couple new consulting product packages. Specifically:

Testing Sprint

1- or 2-week Python/Django testing consulting and development. If you currently have some tests, we'll supercharge them and lend our experience in making them easier to write and faster to run.

If you aren't currently testing your code programmatically, we'll get you set up and going.

Architecture Sprint

One of the best value-for-money aspects of working with us is to have us design your high-level architecture (tech choices, database schema design, etc.) in a way that does not force you to build for the scale you won't have in the first few years but MAY need down the road and make those changes easier when the time comes.

An architecture sprint can easily save your team several developer MONTHS of going down bad roads with good intentions.

Documentation Sprint

Have one of our amazing writers (who are also senior devs) help put in place a good documentation set up for your project to accelerate your internal development or ease customers using your products.

If you have questions or an idea for something you think we might be able to help you with, reach out; it's quite possible we can help and just haven't thought to advertise it as a service we provide.

Conclusion

No big finish here. 16 years has been a wild ride and thankfully, I don't see this ride ending any time soon.

Heartfelt thanks to all of our employees, customers, and tech community friends for making all of this even possible.

Formatting Gone Wrong
http://www.revsys.com/tidbits/formatting-gone-wrong/
Your code formatter may have reformatted your API key. This could cause many confusing errors.
Wed, 22 Feb 2023 17:30:00 +0000

I posted this tweet after feeling defeated by trying to set up SendGrid with Django. I kept getting a smtplib.SMTPServerDisconnected: Connection unexpectedly closed error and had NO IDEA why.

Turns out, it was my editor. I had two settings that I usually depend on, but today, they were working against me.

Background

I've been going through Will Vincent's "Django For Beginners" book to help catch gaps in my knowledge. It's a great book and Will's a great person. In addition, I often get asked where/how to start learning Django, so I wanted to go through the book in more detail, so I could make a stronger, more informed recommendation. QA isn't just for software! Spoiler alert: It's a great book and Will's a great person.

In Chapter 1, Will has you set up VS Code as your editor (but it could have happened with most editors). He tells you to configure it to do two things:

  • Set up black to be your Python formatting provider (an excellent suggestion)
  • Configure your editor to format on save (an excellent suggestion)

I already had VS Code configured to do one more thing, and this is the thing that bit me: Autosave on focus change.

Lots of people use this combination of settings. Automatically linting and formatting your code when you save a file can help you avoid CI fails and having to make commits with messages like "fixed linting" . And if you were to ask me, I'd suggest you use black as your Python formatter. It's what the Django codebase uses. Here's where things went wrong for me:

  • an API key
  • not following along closely enough!

API Keys: They're Sensitive

Jumping ahead to Chapter 12, I was setting up my Django project to be able to send emails using SendGrid . While setting up SendGrid, I had to generate an API key, then add it to my settings.py file, so my Django project could connect to the service. When you generate an API key, there's an instant of panic where you think, "Ok, let me copy/paste this somewhere so I don't lose it!" A common approach is to just paste it into the file where it'll be used, then assign/use it later. Since this was a practice project, I was more concerned with, "Does this work?" than application security. So, I hard-coded the API key into my settings file:

EMAIL_HOST_PASSWORD = MY.secret-API.key-dont-look

But, since it was a Python file, black took one look, said, " That's not right! " and 'fixed' it. So, as soon as I switched to another window (remember: I've turned on Autosave on focus change , which was not one of Will's suggestions), black turned it into this:

EMAIL_HOST_PASSWORD = MY.secret - API.key - dont - look

I didn't notice the change, but if you know anything about API keys, those spaces make a difference. Now I was using an invalid API key and wondering why I couldn't connect. After some time pairing with Jeff, he realized there were way too many spaces in that API key. 🤦🏾‍♂️

How To Avoid This?

There are a few approaches I could have taken.

Hard-code the API key as a string

This is exactly what Will suggested! When I go back and look at the screenshot of settings.py in his book, I see:

EMAIL_HOST_PASSWORD = "MY.secret-API.key-dont-look"

black won't change the contents of a string literal. That would have worked in the short term (Remember kids: don't put your API keys into files you commit to Git!) .

Will literally reminds you of this in the paragraph after the settings.py screenshot. I have that entire paragraph highlighted. He tells you (in Chapter 12), "In Chapter 16 we will learn how to add environment variables to our project so that we can keep secret information truly secret." So, if I'd followed the screenshot more closely, I'd have been finished with this mid-afternoon Friday.🤦🏾‍♂️

Disable autosave on focus change

I go back and forth on this. The convenience of not forgetting to save a change is nice. However, there's also something to be said for being mindful and intentional about saving your work. This is especially true if you have something else set to run when you save. For now, I've opted for convenience. I could also disable format on save, but I don't like that option. If I'm going to save a change, I want it formatted.

Current best practice

The approach I took was to do what Will foreshadowed from Chapter 16. I made the API key an environment variable and set the project up to use it. Don't ever think you're too fancy to remember your 12 Factor App fundamentals , even on practice projects. In this case that means:

  • adding a .env file of some sort, to store my secret
  • making sure that file does NOT go into version control
  • importing something into settings.py to allow it to read environment variables

Again, this is part of the challenge of being self and community-taught. Going through a book for beginners, there are times I already know to do things that will come later, and there are things that are new to me that might be old hat for others.

The Final Product

I'm using direnv, so I made a .envrc file that contains:

export EMAIL_HOST_PASSWORD="MY.secret-API.key-dont-look"

I've also added a .envrc line to my .gitignore file.

As a team, we like django-environ over os.environ (but os.environ is built-in to Python and it will also get the job done) . So my settings.py looks like this:

# settings.py
import environs
env = environs.Env()
...
*** various Django settings ***
...
EMAIL_HOST_PASSWORD = env("EMAIL_HOST_PASSWORD")

And there you go: it all works now! Teamwork Makes The Dream Work (thanks to Lacey for all the helpful input)! And Reading Is Fundamental!

devdata - Improving developer velocity and experience
http://www.revsys.com/tidbits/devdata-improving-developer-velocity-and-experience/
Developers are the largest expense with software. Improving their productivity doesn't just make sense to the company's bottom line, but it improves morale as they no longer have to wade through a bunch of crap to get started on the job at hand. This data generation technique helps improve your developer experience and testing.
Fri, 17 Feb 2023 00:24:00 +0000

Developers spend the majority of their time either debugging existing code or writing new bugs into new code. Except for all the time we spend in meetings talking about code of course.

I'm half joking, but there is some truth to this. After 25+ years writing code professionally, I've given up looking for magic bullets that solve all the problems. Today's AI ain't gonna help you here folks.

Instead, I've focused on small tactical changes that seem to improve developer productivity.

One technique we've been using recently I call "devdata".

I'm going to cover the problem and then a high-level explanation of why devdata works better and finish off with some deeper details to be successful if you choose to try this technique.

The Challenge

One of the biggest challenges to debugging is setting up your data exactly like the scenario where the bug happened. It's the same with writing tests. We have to set the stage before asserting our code does the right thing.

One way to do this is to snapshot your production database(s), pull it local to your laptop, and start debugging. This can work and is often the first thing developers reach for. It sure is more manageable than hand-crafting the exact situation.

This technique isn't optimal if your data is sensitive. Now you have to spend some time writing code to sanitize your data. Did you do it right and obfuscate it enough? Who knows. I don't and you probably don't either because maybe Susan added a new Social Security Number field since the last time anyone updated the sanitization code.

Can we trust this sub-contractor to have the sanitized version? To be safe maybe he can try and work without it?

And it falls apart when your production database is 2TB. Even if it will fit on your laptop, transferring it there means you get to start debugging in a couple hours.

Another way to tackle the problem is to not use a copy of production and instead craft a specific test and fixture data to closely mimic the problem. Safer, but this specific fixture of data isn't likely applicable to any other bug or feature.

A Better Way

Developers are the largest expense with software. Improving their productivity doesn't just make sense to the company's bottom line, but it improves morale as they no longer have to wade through a bunch of crap to get started on the job at hand.

The suggestion I'm proposing is to invest some time in a realistic but fake data generation tool. We call it devdata but the name isn't at all important.

This tool should:

  • Dynamically generate common scenarios in your application in your local development environment
  • It has to be able to reset to known states quickly and easily
  • It should keep some names and credentials consistent
  • It should be easy to extend as your application changes over time
  • And it has to be used by the vast majority of your team

At this point I think it's easier to start talking in terms of how this works using a real application.

The Example App: Pizza

Everybody is familiar with pizza, so we're going to talk about a SaaS product that does pizzeria management.

In software development we often talk about User Stories and Personas. What I'm essentially suggesting here is that you automate bringing some of these story scenarios and personas to life.

Pizzerias come in all shapes and sizes, but what I would start with is a few specific scenarios.

Scenario 1: Lombardi's

Lombardi's is your typical small, single-location, local pizza place. There is one owner, Gennaro, who is also the only manager and has 7 employees. They only offer two crusts in three sizes, have regular toppings, and don't do delivery.

Scenario 2: Kansas Pizza Kitchen

The lesser known cousin of the California variety. KPK has 6 locations in 4 cities. There are three owners, 8 managers, 12 supervisors, 41 employees, and they do delivery in 4 of their 6 locations. Along with pizza they also have a few pasta dishes and need a fairly complicated intake form for their occasional catering gig.

Scenario 3: ACME Pizza

ACME is a huge publicly traded pizza empire. Lots of locations across many states, several layers of management, and more employees than you think. Complicated in every way possible. Uses all the features of your app.

They're so large you're certain they're going to stop using your app and build their own in-house any day now.

So what does this do for us?

First, this gives us some named scenarios we can talk about.

"I see how this is useful for Lombardi's, but how is the UI going to work for KPK and ACME users?"

"So I think I found a weird sales tax bug that doesn't happen for Lombardi's, but does when you order a delivery with KPK."

Here at REVSYS we mostly work with Django, so our devdata tool is built as a Django management command. Because ACME is a big beast with lots of data, it takes a while to generate all the fake pizza orders, so we set up two initial options.

./manage.py devdata common

It will wipe away your existing local development database, leaving just your Django superusers in place so you can get to the Django admin without generating a new user/password each time. You're going to run this command at least a few times a day, if not dozens, so that would be annoying.

This common scenario then sets up Lombardi's and KPK. Here is where consistency of naming comes in handy. We should hard code Gennaro Lombardi and gennaro@lombardispizza.com to be the owner. The rest of the employees, orders, and customers should be random-ish and generated with something like Faker.
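
As a loose sketch of the random-ish part, assuming the Faker package and an invented Employee model:

from faker import Faker

from pizzerias.models import Employee  # hypothetical app and model

fake = Faker()

def generate_employees(pizzeria, count=7):
    # Create count random-ish employees attached to the given pizzeria
    for _ in range(count):
        Employee.objects.create(
            pizzeria=pizzeria,
            name=fake.name(),
            email=fake.email(),
        )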

The hard coded bits are our anchors we can use to quickly hijack a user who is of a certain persona to poke around in the UI.

We can also then set up ./manage.py devdata complex which will run the same common scenario as above, but add in the larger, more time-consuming ACME scenario when needed.

Hopefully you're seeing how this can apply to your own application, but some other things I would likely generate are:

  • Customers for each pizzeria with varying levels of previous orders, reward points, etc.
  • A few orders for each at various stages (new, cooking, out for delivery, etc) with random-ish data. A few simple "large cheese" orders and a couple more complicated orders.
  • I'd randomly set each to run out of something. No mushrooms for you!
  • Maybe we'd also set one of KPK's locations to have weird hours so they are closed during the day, but open midnight to 6am. Timezones are hard and this helps us test them.
  • A common sale or promotion or two.

The main benefit of all of this is that we can quickly jump into a variety of situations in these scenarios as different user personas.

Did that little logic change I just made to coupons break something in the UI? Customers reported if you add pineapple to thin crust pizza it shows up on the checkout screen as extra cheese, but shows the kitchen staff the right information.

Easy, just hijack a user with a persona nearest your problem and adjust things a bit to your situation.

And then, when you've found and fixed your bug, run ./manage.py devdata common again and you're back to a known state.

Additional Benefits

  • Done right this generation code can often be re-used or built from the same code you're using to generate test data for your automated tests. If you are consistent with this both your ability to generate awesome test fixtures AND your ability to manually test are greatly improved
  • Your frontend developers can get started more easily as they have realistic data to throw on the screen
  • I've found several small UI bugs simply because much of the data was random and faked. Oh look, that wraps weirdly when the user's last name is longer than 12 characters

Technical Details

Since we are typically using Django, our devdata commands are implemented using django-click, which makes it extremely easy to build a great Django management command with all of the argument parsing and power of Click.
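
A skeleton of that command might look something like this; the scenario names and the generation helpers are stand-ins for whatever your own application needs:

# File: core/management/commands/devdata.py  (hypothetical location)
import djclick as click

from devdata.generation import generate_common, generate_complex, wipe_dev_data  # hypothetical helpers

@click.command()
@click.argument("scenario", default="common")
def command(scenario):
    """Reset the local database to a known devdata scenario."""
    wipe_dev_data()         # clear out previously generated data
    generate_common()       # Lombardi's and Kansas Pizza Kitchen
    if scenario == "complex":
        generate_complex()  # add the much larger ACME scenario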

We also use pytest for our tests, but you can't just call a pytest fixture directly. You can, however, wrap a function that generates something in a fixture function so that you can share the generation code between the two. For example:

# File: toppings/tests/fixtures.py

import pytest
from devdata.generation import generate_available_toppings

@pytest.fixture
def available_toppings():
    return generate_available_toppings()

I would also encourage you to write an option into your command to clear out all of your generated test data completely. This is useful if you ever want to generate (and then wipe it all away) safely in production.

Conclusion and Challenge

I encourage you to give this technique a try. Now that I've been using it in a few projects, I immediately miss it in the ones that do not have it. Automated testing, manual testing, and even just exploring around a new UI feature are more of a pain and hence don't happen as frequently as they should.

I promise your team's velocity will improve far beyond the time investment.

P.S. If you need help convincing your boss, I'd be happy to help convince them of the benefits.

Breaking up django-ninja views
http://www.revsys.com/tidbits/breaking-up-django-ninja-views/
A simple import trick to make it easy to keep your django-ninja view functions in different Django apps and files.
Sat, 04 Feb 2023 22:44:00 +0000

I've been evaluating django-ninja recently, using it in a personal project to see when it might be appropriate for our clients.

The Problem - How to have multiple view files

One thing that was not clear was how to break up your API views and sprinkle them around in multiple files. The default tutorial and documentation are not bad, but they just do not cover this small thing.

The way django-ninja works is that you decorate function-based views with your api instance (e.g. @api.get), indicating which HTTP method it is for and some other parameters. Everything is perfect until you realize your views.py has grown huge or you want to include api views from several Django applications.

For those of us who are very familiar with DRF, it is not immediately clear how you can break your views into multiple files.

Let's use an example with Django apps named core, users, and posts to mimic some sort of blog-like software. Our project directory, which contains our settings and urls files, will be named config/, which is the standard way we roll at REVSYS.

Your first guess might be that you should create another instance of NinjaAPI and use that to decorate your views in each file. Unfortunately, because of how the internal registry works for these, that would create multiple django-ninja "sites", each with their own docs, etc. which is obviously not what we want.

The Solution

The easiest way to do this is to pass around your API decorator and ensure your view files are imported before you call on Ninja to generate your URLs.

While I hate having an app named core, I used it in this case. So we create our API instance with the following:

# File: core/api.py
from ninja import NinjaAPI

api = NinjaAPI(csrf=True)

And then in our views we import this and use it:

# File: users/views.py
from core.api import api

@api.get("users/something/")
def get_user(request):
    # ... do something interesting here ...
    ...

# File: posts/views.py
from core.api import api

@api.get("posts/{username}/")
def get_posts(request, username: str):
    # ... do something interesting here ...
    ...

Then the final bit is to make sure we import these files in our urls.py before we call api.urls like this:

# File: config/urls.py
from django.urls import path

import users.views
import posts.views
from core.api import api

urlpatterns = [
    path("api/", api.urls),
]

Alternative Solution

Turns out there is also a DRF-like concept of routers, which is honestly probably a better, more maintainable solution than the above.

Thanks to Ville Säävuori for pointing this out after I posted this.
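
I haven't rearranged this project around them yet, but based on the django-ninja docs a router-based version of the posts views would look roughly like this:

# File: posts/views.py
from ninja import Router

router = Router()

@router.get("{username}/")
def get_posts(request, username: str):
    # ... do something interesting here ...
    ...

# File: core/api.py
from ninja import NinjaAPI

from posts.views import router as posts_router

api = NinjaAPI(csrf=True)
api.add_router("posts/", posts_router)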

Hopefully, this saves you a bit of time when you first start using django-ninja!

If you need help creating Django REST APIs, don't hesitate to reach out to us!

Website vs Web App: Hidden Complexities
http://www.revsys.com/tidbits/website-vs-web-app-hidden-complexities/
Most of what are called `websites` today are `web apps`. These are computer applications that use the web as their delivery method and GUI. However, since they look like websites, their complexity and design challenges are often underestimated.
Mon, 19 Dec 2022 20:42:48 +0000

The Old Days: Static Files & the Request/Response Cycle

Back in The Old Days, we built websites that were mostly for reading text or looking at pictures. Someone's web client sent a request to your web server and it sent static HTML files back as a response. Then we added CSS for style and JavaScript for dynamism. And while websites became increasingly more complex, they remained, primarily, static files sent as a response to an HTTP request. The request identified which set of pre-existing static files to send as a response. This model (each page on a site roughly equates to one .html file that can be 'jazzed up' with CSS and JS) is fairly straightforward and easily held in the mind. It can also be easily explained to people who don't create websites, but want one created. Think of a department manager or business owner. Sadly, that led many people to think, "Building a website is easy!"

Modern Web Apps: Database-Backed Web Sites

The majority of web addresses people type into their browsers are not websites. They're web applications or web apps. But what is a web app and how is it different from a website? In simple terms, most modern web apps are a database of related content with an HTML GUI. The HTML GUI makes the database easily accessible to the intended audience. Amazon is a database of books and various other items for sale. Instagram is a database of pictures. Most blogs are primarily databases of text documents.

The benefit of a database is its ability to store large amounts of information. The problem with a database is getting the specific information you want from it, as well as getting that information in the format you want.

As I said before, these days most websites are web applications . And while the two often look the same, the design and implementation challenges are vastly different.

The Difference: Processing the Request

One of the main ways that a web app differs from a website is that a web app adds an extra step to the request-response cycle that's often overlooked: processing.

Most websites don't need to do much processing. They send a response of HTML/CSS/JavaScript to the client that made the request. That request specifies which documents are desired via a URI with a path to those pre-existing documents. So, https://send.me/this/thing.html where thing.html already exists.

Contrast this with a web app. They frequently get requests that don’t end in a file extension like .html. This request ends in a query string, which begins with the ? character: https://send.me/search?q=thing&SomeOtherStuff. In this case, the app is being asked (queried), "Tell me what you know about thing and SomeOtherStuff and how the two are related." This is where the processing happens. It includes:

  • converting the search terms into something useable by the backend software
  • forming a valid query to send to the database with those terms
  • getting the database response back
  • converting that database response into a properly formed HTTP response that can be sent back to the client
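
In Django terms, a stripped-down version of that processing step might look like this (the model and template names are invented for illustration):

# A hypothetical view handling https://send.me/search?q=thing
from django.shortcuts import render

from catalog.models import Thing  # invented model for illustration

def search(request):
    query = request.GET.get("q", "")  # pull the search term out of the query string
    results = Thing.objects.filter(name__icontains=query)  # ask the database
    return render(request, "search/results.html", {"results": results})  # build the HTTP response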

While websites have a request/response cycle, web apps almost always have a request/processing/response cycle. That processing step can vary quite a bit in terms of time and resource cost. A lot of it can relate to having to talk to your database and wait for it to respond. Then, multiply that processing cost (in time and resources) by the number of visitors to your web app. It can get out of hand fast.

As an aside, this is one reason static-site generators have made a return to popularity for blogs and many other uses. Depending on your use case, you don't always need a database. As much as we at REVSYS love Django, the Upgrade Django and DjangoCon US sites are both built with SSGs.

Speed and Data Structure

Shortly after starting at REVSYS, I was talking to my co-worker Jeff about best practices when starting a new Django project. He said when working on a new web app (I'm going to paraphrase), he starts by thinking about the data models, or:

  • What information do you want in the database (what are you trying to keep track of with this app)?
  • How do those bits of data relate to each other?

When someone goes to a web app, they could be doing multiple things. Most frequently, they're looking for something. But search terms that seem simple to most people ("Chinese restaurant near me" or "1-inch green washi tape") are ONLY simple in a web app if proper consideration has been given to the data models and how they can be queried. When every click is a database query, you want those queries to be as fast as possible. And as your data becomes more complex or increases in volume (imagine the options on a growing fleet of cars or laptops or different models of pens or notebooks), the structure of your data becomes even more important. Beyond the structure of the data (Do you want categories? How many?), the structure of the queries themselves becomes more critical. Writing a nested query may seem to make sense as you step through the logic, but it's almost always a bad idea.
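As a tiny illustration of "thinking about the data models first," here is what a hypothetical product catalog might start with. The model and field names are made up for the example:

# models.py -- a sketch of "what are we keeping track of, and how does it relate?"
from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100, db_index=True)


class Product(models.Model):
    name = models.CharField(max_length=200, db_index=True)
    category = models.ForeignKey(
        Category, on_delete=models.CASCADE, related_name="products"
    )
    price = models.DecimalField(max_digits=8, decimal_places=2)

Indexing the fields you expect to filter on, and keeping the relationships explicit, is a big part of what keeps those "simple" searches fast once every click is a database query.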

The related issues of query speed and data schemas are things most websites don't have to deal with. But almost every web app has to be designed with this constraint in mind.

Efficiency vs Effectiveness

Slow queries are problematic. They turn off potential users and can use more computing resources, becoming expensive from a time and financial standpoint. But an ineffective query, one that returns a useless result, is just as bad. It doesn't matter how fast it runs if the result has no value. Structuring your data properly helps ensure you can run queries that will return the results people want. I won't name names, but there's at least ONE internet destination whose terrible search function almost ALWAYS disappoints me.😭

Other Things and Remote Control

There are two other issues I want to discuss.

First, more on the processing step. I've focused on database queries, but depending on your web app, there can be a LOT of other things going on. The app could be making API calls to other web apps or software (so each request triggers a new, external request/processing/response cycle). It could be doing all manner of complex calculations on your request or the results of the database query. Anything a desktop app could be doing, a web app could also be doing in that processing step. This is very different from a website, which is usually able to skip the processing step.

Second, let's recall that a web app is a computer application, similar in many ways to a desktop application. But instead of running on your local machine, you're driving it by remote control, over the web. Modern web apps like Google Sheets and Gmail make this look simple (how that happens is an entirely different conversation and beyond the scope of this post), but I assure you, it isn't. Also, most of us don't have Google's resources.

Conclusion

While they look the same, websites and web apps are VERY different things. And with web apps becoming more common, the confusion has only grown.

This is not to say that one is better or worse than the other. But because they look identical from the outside, it's easy to confuse what's needed to build one that works well. There's usually a lot more that goes into designing and building a well-functioning web app than a well-functioning website that primarily delivers static or mildly dynamic content.

Something most web (and desktop application) developers have heard is, "It's just a button!" when someone wants to add something new. The visual representation, the colored shape, is easy. It's making the button function that contains the difficulty. If you click a button, you want a result. That means kicking off a brand new request/processing/response cycle and writing and maintaining the code that does the underlying processing.

]]>/>
Nullable DRF FloatFields and Javascripthttp://www.revsys.com/tidbits/nullable-drf-floatfields-and-js/Sometimes you want your FloatFields to be nullable, but your JS code isn't quite smart enough to handle it.Tue, 15 Nov 2022 21:43:00 +0000http://www.revsys.com/tidbits/nullable-drf-floatfields-and-js/

I run into this situation every couple of years, and I always forget the solution. I even asked around the office and everyone had the same reaction: "Oh yeah, I remember doing that once... but not how."

When these situations happen to us, it is often time to write a blog post!

The situation I was faced with was that I had:

  • A model with a few nullable Django Model FloatFields
  • Django Rest Framework, using a ModelSerializer
  • react-hook-form, react-query and standard fetch calls to POST the data to the endpoint

The issue is that we have a simple HTML text input to collect a possible float value. It's easy enough to mark this as not required on the form and any client-side form validation. However, it is NOT always that easy to handle what happens next. Because it's a text field, the JS tooling wants to send it as an empty string. So we end up with a payload that looks like:

{
  "name": "Frank Wiles",
  "weight": ""
}

When what we'd really like to have is:

{
  "name": "Frank Wiles",
  "weight": null
}

DRF rejects this, saying that an empty string is not a number. It is correct, of course, just not super helpful in this exact situation. So what is the solution? Enter BlankableFloatField:

from rest_framework.serializers import FloatField


class BlankableFloatField(FloatField):
    """
    This accepts an empty string as None
    """

    def to_internal_value(self, data):
        if data == "":
            return None
        return super().to_internal_value(data)

You can then override the field on your ModelSerializer by specifying it manually, like this:

class PersonSerializer(serializers.ModelSerializer):
    # ... your other fields here ... 
    weight = BlankableFloatField(required=False, allow_null=True, default=None)

The solution isn't hard to execute, but it's hard to reason about and figure out the best place to adjust this in your Django project.
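As a quick sanity check (assuming the PersonSerializer above), the empty string should now come through as None:

serializer = PersonSerializer(data={"name": "Frank Wiles", "weight": ""})
serializer.is_valid()      # True -- the empty string no longer fails validation
serializer.validated_data  # {'name': 'Frank Wiles', 'weight': None}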

Hopefully, the next time you find yourself with a JavaScript error in your console saying that an empty string is not a number, this helps you out!

]]>/>
Helpful Shell alias to git repo roothttp://www.revsys.com/tidbits/shell-alias-to-git-repo-root/I created a simple `r` shell alias that will take me to the root of the current git repository. As this is a common thing I need to do on my MacBook Pro when I've been exploring deep into a code base and now need to run some command at the root.Tue, 08 Nov 2022 22:43:00 +0000http://www.revsys.com/tidbits/shell-alias-to-git-repo-root/

I've recently been on a mission to take my own advice and spend a little bit of time each week optimizing my personal coding setup. Lately I've constantly needed to get to the root of the current git repository I'm in.

It doesn't sound like a big deal, does it?

It isn't that big of a deal. However, it saves me dozens to hundreds of `cd ../../..`-style keystrokes every day! This won't single-handedly keep RSI away, but every little bit helps.

So what did I set up? I set up two zsh aliases.

  • `r` which will immediately cd me to the root of the current git repository I'm in
  • `rd` which will run the fd command from the perspective of the root of the git repo

I'm on OSX, so the only new dependency was that I needed to install `git-extras` with `brew install git-extras`. git-extras is useful all on its own, but I haven't used that many of its features yet. My muscle memory for such things takes a while to set in, as I'm sure it does for you.

I added the following to the end of my `~/.zshrc`:

# Git customization
alias r='cd $(git root)'
function rd() {
    fd "$1" $(git root)
}

Now, if I type `r` I'm immediately taken to the git root. And if I run `rd settings` from a couple of directories deep in a code base, I get:

git zsh rd function shell output

I hope you'll join me in taking a little time to improve our workflows for better developer experience (DX) and a slightly easier day!

Fish Shell Version

Daniel worked up the fish shell version for us all:

# ~/fish-configs/functions/r.fish
function r
    cd (git root)
end

function rd -a pattern
    fd "$pattern" (git root)
end
]]>/>
A Tip About DRF Permissionshttp://www.revsys.com/tidbits/tip-about-drf-permissions/I needed to structure permissions in an API view that had multiple user roles, and I learned about using the & (and), | (or) and ~ (not) operators with Django REST Framework permissions.Tue, 11 Feb 2020 23:44:26 +0000http://www.revsys.com/tidbits/tip-about-drf-permissions/

Recently, I needed to structure permissions in an API view so that someone could access an API if they were a superuser or they had some other status, like being a manager. The Django Rest Framework Permissions docs say that “If any permission check fails an exceptions.PermissionDenied or exceptions.NotAuthenticated exception will be raised,” which made me think I’d have to override the get_permissions() method of the view or write a series of complicated IsUserTypeA, IsUserTypeAOrUserTypeB permission classes in order to do what I wanted. That didn’t seem ideal.

Then a colleague pointed out one line in the DRF docs further down on the same page: “Note: it supports & (and), | (or) and ~ (not).” This was exactly what I needed.

Here’s how it works.

All permissions must return True

The expected way that DRF permissions work is as stated above; when you list multiple permission classes in permission_classes , each of them must return True in order for the user to have access to that view. This is most likely what you’re used to when implementing DRF permissions.

from rest_framework.permissions import IsAuthenticated, IsAdminUser
from rest_framework.viewsets import ModelViewSet

from my_app.permissions import IsStudent, IsFaculty, ReadOnly


class MyModelViewSet(ModelViewSet):
    permission_classes = (IsAuthenticated, IsAdminUser)

In this case, the user must be authenticated and be an admin user, or else they cannot access any part of the view.

The | (or) operator

Using the | (or) operator in DRF permissions allows you to set multiple conditions the user might meet and be able to access this view. For instance, for an educational app, a view might be available to students or faculty.

from rest_framework.permissions import IsAuthenticated, IsAdminUser
from rest_framework.viewsets import ModelViewSet

from my_app.permissions import IsStudent, IsFaculty, ReadOnly


class MyModelViewSet(ModelViewSet):
    permission_classes = [IsAdminUser | IsStudent | IsFaculty]

This expression (IsAdminUser | IsStudent | IsFaculty) allows a user who is an admin, a student, or a faculty member to access the view. Instead of needing to pass all three of these conditions, a user only needs to pass one.

The way I accomplished this before was a custom permissions class named something like IsAdminStudentOrFaculty, but with multiple roles and complex permissions, this meant creating custom permission classes for nearly every combination of user role. (In this example, I refer to custom permissions IsStudent, IsFaculty and ReadOnly; there are some examples in the docs of how to implement custom permissions like these, and a rough sketch follows below.)
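Custom permissions like these are usually just small BasePermission subclasses. Here is a minimal sketch; the is_student flag is an assumption about the user model, not something from the original project:

# my_app/permissions.py -- a rough sketch, not the only way to do this
from rest_framework.permissions import SAFE_METHODS, BasePermission


class IsStudent(BasePermission):
    def has_permission(self, request, view):
        # Assumes the user model has an is_student boolean
        return bool(request.user and getattr(request.user, "is_student", False))


class ReadOnly(BasePermission):
    def has_permission(self, request, view):
        # GET, HEAD and OPTIONS are the "safe" read-only methods
        return request.method in SAFE_METHODS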

The & (and) operator

Especially combined with the | (or) operator, the & (and) operator is really powerful. It enables you to set clear permissions boundaries without writing increasingly complex custom permissions classes.

from rest_framework.permissions import IsAuthenticated, IsAdminUser
from rest_framework.viewsets import ModelViewSet

from my_app.permissions import IsStudent, IsFaculty, ReadOnly


class MyModelViewSet(ModelViewSet):
    permission_classes = [IsAuthenticated & (IsAdminUser | IsFaculty | ReadOnly)]

This example, (IsAuthenticated & (IsAdminUser | IsFaculty | ReadOnly)), requires that all users accessing this view be authenticated. In addition, it sets other permissions: a user must be authenticated and be an admin or a faculty member, or they only have “read” access to this view.

The ~ (not) operator

The ~ (not) operator allows you to simplify your permissions to permit everyone except users who meet specific criteria. Personally, I prefer to set permissions with positive language, like “allow admins and faculty” rather than “don’t allow students,” but there are probably cases where the ~ operator is a great option.

from rest_framework.permissions import IsAuthenticated, IsAdminUser
from rest_framework.viewsets import ModelViewSet

from my_app.permissions import IsStudent, IsFaculty, ReadOnly


class MyModelViewSet(ModelViewSet):
    permission_classes = [~IsStudent & IsAuthenticated]

(~IsStudent & IsAuthenticated) says that anyone who is authenticated and who is not a student can access this view.

I hope you, like me, can now free yourself from the cumbersome task of writing too many custom permission classes in apps with several different types of user roles, and instead make use of simpler permission classes and the &, | and ~ operators!

Further Reading

Thanks to Jeff Triplett for reviewing a draft of this article and finding this tip in the docs in the first place!

]]>/>
Custom Exceptions in Django REST Frameworkhttp://www.revsys.com/tidbits/custom-exceptions-django-rest-framework/I was working on a project with a coworker recently and I noticed in one of their pull requests that they used a custom exception in one of our Django REST Framework viewsets. I prefer this way to what I was doing before, so I wanted to share it with you!Tue, 04 Feb 2020 23:06:00 +0000http://www.revsys.com/tidbits/custom-exceptions-django-rest-framework/

I was working on a project with a coworker recently and I noticed in one of their pull requests that they used a custom exception in one of our Django REST Framework viewsets. I prefer this way to what I was doing before, so I wanted to share it with you!

My Less Clean, Less Consistent Exception Handling

I made good use of the status codes page of the DRF docs. For example, if a user uploaded a CSV file that my view couldn’t read, I might have had something like this:

from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView


class MyCSVUploadView(APIView):
    def post(self, request, *args, **kwargs):
        
        if file_error:
            return Response(
                {"detail": "Unable to read file."}, 
                status=status.HTTP_400_BAD_REQUEST
            )

This works just fine. But in some cases, I might return the same type of error response in multiple places. For a university that is no longer accepting applications, I might have multiple places where I am essentially passing the same “Deadline passed” message. Although the response is just one line of code, I have to find it, copy it, and paste it to make sure I’m using the same wording, or deal with inconsistency in my exception messages.

Custom Exception Classes for Clean Consistency

For exception conditions that you encounter frequently in your code, you can subclass DRF’s APIException class and create your own exceptions! Read more in the docs.

Import the APIException class and create a new class that inherits from it. Then set the status_code, default_detail, and default_code attributes.

# exceptions.py
from rest_framework.exceptions import APIException


class UnreadableCSVFile(APIException):
    status_code = 400
    default_detail = "Unable to read file."
    default_code = "unreadable_csv_file"

The status_code attribute defines the HTTP status code you want to send in the response with this exception; refer to the DRF list of status codes for help. The default_detail attribute is the human-readable exception message you want to pass in the response. The default_code is a string that represents this specific exception.

In your view, instead of manually returning a Response() object, raise your custom exception.

# views.py
from rest_framework.views import APIView

from my_app.exceptions import UnreadableCSVFile

class MyCSVUploadView(APIView):
    def post(self, request, *args, **kwargs):
        
        if file_error:
            raise UnreadableCSVFile()

This allows my exceptions to be reusable across my app, which means the language I use in exception messages is always consistent.
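One small extra that comes in handy: APIException subclasses accept a detail argument, so you can override the message for a single raise while keeping the default everywhere else. The wording here is just an illustration:

if file_error:
    raise UnreadableCSVFile(detail="Unable to read file: the header row is missing.")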

Custom Exception Handlers

You can further customize your exceptions by creating a custom handler function ( docs on custom exception handling ). This is especially helpful if there are pieces of data, like the status code, that you want to ensure are included in all of your HTTP responses. This is a slightly different goal than what I refer to above, but it’s worth knowing about. Here is the example from the docs:

from rest_framework.views import exception_handler

def custom_exception_handler(exc, context):
    # Call REST framework's default exception handler first, 
    # to get the standard error response.
    response = exception_handler(exc, context)

    # Now add the HTTP status code to the response.
    if response is not None:
        response.data['status_code'] = response.status_code

    return response

This adds the status_code to each exception response sent in your app.

You would then add your custom exception handler to the REST_FRAMEWORK settings in your settings.py so this custom exception handler will be executed every time there is an API exception:

REST_FRAMEWORK = {
    'EXCEPTION_HANDLER': 'my_project.my_app.utils.custom_exception_handler'
}

For my example above, with the unreadable CSV file, the response would look like this:

{"status_code": 400, "detail": "Unable to read file."}

With the status code returned as part of the response dictionary.

Thanks to Frank Wiles for teaching me about custom exceptions, and Jeff Triplett for reviewing a draft of this article.

]]>/>
SSH_AUTH_SOCK, tmux and youhttp://www.revsys.com/tidbits/ssh_auth_sock-tmux-and-you/tmux is awesome. I have used it for years. This post is not about tmux but about solving a problem in the context of tmux.Sat, 30 Nov 2019 03:16:00 +0000http://www.revsys.com/tidbits/ssh_auth_sock-tmux-and-you/

Problem Definition

I use tmux (nearly) every day. One of the roles it fulfills is as a remote session persistence manager. I login to a remote host, connect to the tmux session I've been using for the last week and do things . That tmux session may persist for up to a month. Often my laptop stays at the office for days at a time. Assuming there have been no wifi router reboot, ISP hiccup, OS update-related laptop reboot, dog snagging the power adapter out the side of the laptop, fuse on the power strip the power adapter for the laptop is plugged into blowing because the hedgehog is cold and wants to snuggle events, tmux sessions are exactly as they were when last I interacted with them!

The one exception: ssh --specifically, ssh-agent forwarding. Each request for agent forwarding is assigned a local socket named /tmp/ssh-blanrndthng/agent.$( pidof local end of the agent link ). This path is assigned to the environment variable SSH_AUTH_SOCK.

When a new tmux session or window is created, the main tmux process reads the environment of the initiating process. Now, the prime directive of a program such as tmux is that it must persist the user's session. The best way for it to achieve this goal is to take on some daemon-like characteristics, such as dissociating itself from its parent process and becoming a ward of the init process. This facilitates the feature of being able to disconnect from a tmux client-session leaving the editor sessions, log tail windows, etc. in whatever state they are currently in.

:# ps f -o session,pid,ppid,cmd -s 26804,26761
 SESS   PID  PPID CMD
26761 26761     1 tmux
26804 26804 26761  \_ -bash
26804 28003 26804      \_ ps f -o session,pid,ppid,cmd -s 2680

When tmux creates a new session (usually a shell), the environment passed to that session is the environment as the main tmux process knows it: the state of the environment when the master tmux process was created.

If I have an active/connected terminal connection from the laptop to this host, there is no problem. ssh works as expected. It is when the user disconnects from all sessions that things get out of sync. So, let's say, one of the aforementioned events has occurred. Everything is back up. Systems are rebooted. Blah blah blah. I do the ssh dance, am reconnected to the remote host, and reattach the tmux client-session I was working with previously. Everything works great until ssh-agent comes into play: Error connecting to agent: No such file or directory.

Fuck.

<Ctrl>-Bc (new tmux window with properly set SSH_AUTH_SOCK)

git push

<Ctrl>-Bp (return to previous window)

<Ctrl>-D (exit shell/close window with unset SSH_AUTH_SOCK)


NOTE

Absolutely nothing is malfunctioning. There is nothing that tmux or ssh-agent are doing (or not doing) that they shouldn't be (or should be) doing. When a new window/session is created, tmux learns the environment of the calling process and passes it on. It has no mechanism (nor should it) to influence the environment of existing processes.


Solution

The problem can be summed up as: there is a dynamic element involved when creating a static session. The solution is to make this element (our erstwhile problem variable, SSH_AUTH_SOCK ) another static element for existing or new sessions.

check & repair

  • is there a functional agent configuration (test against existing SSH_AUTH_SOCK present by default when ssh-agent forwarding is used)
  • no
    • do nothing
  • yes
    • reset SSH_AUTH_SOCK to point to the (static) filesystem location ( ~/.ssh/ssh_auth_sock )
    • is this a functional agent configuration?
      • yes: do nothing
      • no: ln -sf ${SSH_AUTH_SOCK} $HOME/.ssh/ssh_auth_sock

~/.bashrc_agent

# vim: ft=sh

function _check_ssh_agent() {
        return $( ssh-add -l >& /dev/null )
}

function set_ssh_agent() {

        local SAS=${SSH_AUTH_SOCK}

        _check_ssh_agent &&
                local SSH_AUTH_SOCK=${HOME}/.ssh/ssh_auth_sock
                _check_ssh_agent ||
                        ln -sf ${SAS} $HOME/.ssh/ssh_auth_sock

        # recall, "||" and "&&" operate on the 0/non-0 property
        # of the called function's return value. If the check succeeds
        # with the alternative socket path, the "ssh-add" call returns
        # 0, so there is nothing more to do. It is only if the alternative
        # path does not have a functional agent that a non-0 value will
        # be returned.  "&&" proceeds if 0 is returned. "||" proceeds
        # if non-0 is returned, thus, "||" is the correct glyph to
        # use since we have additional work to do.
}

set_ssh_agent

Run the check on login

~/.bashrc

[...]
[[ -f ~/.bashrc_agent ]] && . ~/.bashrc_agent

This directive lets ssh client utilities know where to look:

~/.ssh/config

Host *
  IdentityAgent ~/.ssh/ssh_auth_sock

Dark Corners

The one bit that popped into my head as I was writing this: This sort of thing should never be done with login accounts used by multiple individuals. The key material belonging to the last individual to log in to the account (and who is still logged in) will be what every tmux session for that account has access to.

]]>/>
How to Add Django Models to the Wagtail Adminhttp://www.revsys.com/tidbits/how-add-django-models-wagtail-admin/When working with Wagtail, you might find that you're using Wagtail Page models for some of your database models, but regular Django models for others. In this post, learn how to add your Django models to the Wagtail admin to make managing multiple types of models easier.Tue, 27 Aug 2019 20:47:00 +0000http://www.revsys.com/tidbits/how-add-django-models-wagtail-admin/

Versions

  • Python 3.7
  • Django 2.2
  • Wagtail 2.6

When working with Wagtail, you might find that you're using Wagtail Page models for some of your database models, but regular Django models for others.

A built-in example of this is the Django User model. When you log into the Wagtail admin, you can see the Django User model in the Settings submenu. The User model is not a Wagtail model; it's the same User model you see in a Django project that doesn't use Wagtail. Wagtail just exposes it to the Admin for you.

users_admin.png

We can do the same thing with our Django models: we can expose them to the Wagtail admin so we don't have to maintain two separate admin interfaces to manage our website content.

For this example, let's assume we're working with these models:

from django.db import models 


class Pizza(models.Model):

    name = models.CharField(max_length=30)
    toppings = models.ManyToManyField("Topping")


class Topping(models.Model):

    name = models.CharField(max_length=30)

Adding a single model

The Wagtail docs are pretty clear on how to accomplish this, but let's walk through the steps.

First, make sure wagtail.contrib.modeladmin is in your INSTALLED_APPS :

# settings.py 

INSTALLED_APPS = [
    ...
    "wagtail.contrib.modeladmin",
]

Next, in the same app as the model you want to expose to the Wagtail admin, add a file called wagtail_hooks.py .

# wagtail_hooks.py

from wagtail.contrib.modeladmin.options import ModelAdmin, modeladmin_register 

from .models import Pizza


class PizzaAdmin(ModelAdmin):
    model = Pizza 
    menu_label = "Pizza"  
    menu_icon = "pick" 
    menu_order = 200 
    add_to_settings_menu = False 
    exclude_from_explorer = False 
    list_display = ("name",)
    list_filter = ("toppings",)
    search_fields = ("name",)


modeladmin_register(PizzaAdmin)

Let's step through these options in the ModelAdmin class:

  • model : The name of the model you're adding.
  • menu_label : Leave this blank to use the verbose_name_plural from your model. Give it a value to specify a new label for the Wagtail menu.
  • menu_icon : Every menu item in the Wagtail admin has an icon, and you can specify the one you want to use. Here is a list of the available icons .
  • menu_order : What order you want this model to appear in. 000 is first, 100 is second, etc. Note: if you add multiple models to the admin, you won't get an error if two of them have the same menu_order ; Wagtail will just pick for you.
  • add_to_settings_menu : Whether you want this menu item to appear in the Settings submenu in the Wagtail admin.
  • exclude_from_explorer : Set to True if you do not want the explorer (the search box in the admin) to return results from this model. Set to False if you do want the explorer to return results from this model. (It's confusing.)
  • list_display : Same as the Django admin; list the fields you want to display on the listing page for this model in the Wagtail admin.
  • list_filter : Same as the Django admin; supply the fields you want to use to filter in the sidebar of the Wagtail admin.
  • search_fields : Same as the Django admin; supply the fields that you want the explorer to use to return search results.

The final step is to register the admin class. Once you've done that and started your server, you'll be able to see your model in the Wagtail admin:

pizzas_admin.png

Adding related models

In our example models, we have two models: Pizza and Topping. We could manually add the Topping model to the Wagtail admin and have it appear just below the Pizza model. We just learned how!

But it's so closely related to the Pizza model that it might be nice if we were able to relate those two models together in a submenu, kind of like how Settings is its own submenu in the admin that contains Users, Redirects, Sites, etc.

Go back to wagtail_hooks.py :

# wagtail_hooks.py

from wagtail.contrib.modeladmin.options import (
    ModelAdmin, 
    ModelAdminGroup, 
    modeladmin_register 
)

from .models import Pizza, Topping


class PizzaAdmin(ModelAdmin):
    ...
    menu_order = 000 
    ...


class ToppingAdmin(ModelAdmin):
    model = Topping 
    menu_label = "Toppings"  
    menu_icon = "edit" 
    menu_order = 100 
    add_to_settings_menu = False 
    exclude_from_explorer = False 
    list_display = ("name",)
    search_fields = ("name",)

Relating our two models together starts off in the same way: we create a class that inherits from ModelAdmin for each model and identify the necessary attributes like model and menu_icon to control things like their listing pages and search behavior.

Then, we add a new class that inherits from ModelAdminGroup :

# wagtail_hooks.py

from wagtail.contrib.modeladmin.options import (
    ModelAdmin, 
    ModelAdminGroup, 
    modeladmin_register 
)

from .models import Pizza, Topping


class PizzaAdmin(ModelAdmin):
    ...
    menu_order = 000 
    ...


class ToppingAdmin(ModelAdmin):
    ...
    menu_order = 100 
    ... 

class PizzaGroup(ModelAdminGroup):
    menu_label = "Pizzas" 
    menu_icon = "pick"
    menu_order = 500 
    items = (PizzaAdmin, ToppingAdmin)


modeladmin_register(PizzaGroup)

In the PizzaGroup class, we have some of the same attributes:

  • menu_label : We set what we want this group of related models to be called in the Wagtail admin menu
  • menu_icon : Which icon we want to use for this menu
  • menu_order : Where we want this menu to appear in the sidebar, in relation to the other menu items

We also add a new attribute, items , where we list which ModelAdmin classes we want to be part of this group. In our case, we want PizzaAdmin and ToppingAdmin to be in this group, so we add those.

Note the change we made to menu_order in PizzaAdmin and ToppingAdmin : Now those are set to 000 and 100 . When the ModelAdmin classes will be part of a group, set the menu_order how you want them to relate to each other, not to the other menu items in the Wagtail admin. Then set the menu_order for the ModelAdminGroup class to the proper value for the order you want it to appear in the side menu in the admin.

Then we register the whole group, instead of the ModelAdmin classes individually, to the Wagtail admin. When we reload the admin, we see this:

pizzas_group_admin.png

On the far left, there is a new menu item Pizzas that expands a submenu. The submenu contains links to the admin interfaces for Pizzas and Toppings !

Note : If you have the Django admin enabled and have your models already in the Django admin, this doesn't disable them from the regular Django admin. You are free to access your models in both the Wagtail admin and the Django admin, or at this point you can choose to remove your models from the Django admin (or disable the Django admin altogether, if you prefer).
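If you do decide to remove a model from the Django admin at that point, it is a one-liner. This assumes Pizza was previously registered with the Django admin:

# admin.py
from django.contrib import admin

from .models import Pizza

admin.site.unregister(Pizza)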

Helpful Links

Special thanks to Jeff Triplett and Jacob Burch for their help with this post.

]]>/>
Using Different Read and Write Serializers in Django REST Frameworkhttp://www.revsys.com/tidbits/using-different-read-and-write-serializers-django-rest-framework/On a recent project, we needed to use different serializers for GET vs POST/PUT/PATCH requests to our API. Read on to learn how we used a mixin to accomplish this goal.Tue, 20 Aug 2019 20:47:00 +0000http://www.revsys.com/tidbits/using-different-read-and-write-serializers-django-rest-framework/

Versions :

  • Python 3.7
  • Django 2.2
  • Django REST Framework 3.10

On a recent project, we needed to use different serializers for GET vs. POST/PUT/PATCH requests to our Django REST Framework API. In our case, this was because the GET serializer contained a lot of nested data; for example, it contained expanded fields from other serializers to foreign-key relationships. The requests to update data via the API, though, didn't need these expanded fields.

The first way we approached using different serializers for read and update actions was to override get_serializer_class() on each viewset to decide which serializer to return depending on the action in the request. We returned the "read" serializer for list and retrieve actions, and the "update" serializer for everything else. (The full list of API actions is in the DRF codebase .) But we wound up repeating ourselves across several viewsets, so we wrote a mixin to take care of some of this work for us!

A mixin is a Python class that contains custom attributes and methods ( more explanation ). It's not very useful on its own, but when it's inherited into a class, that class has access to the mixin's special attributes and methods.

This was our mixin:

class ReadWriteSerializerMixin(object):
    """
    Overrides get_serializer_class to choose the read serializer
    for GET requests and the write serializer for POST requests.

    Set read_serializer_class and write_serializer_class attributes on a
    viewset. 
    """

    read_serializer_class = None
    write_serializer_class = None

    def get_serializer_class(self):        
        if self.action in ["create", "update", "partial_update", "destroy"]:
            return self.get_write_serializer_class()
        return self.get_read_serializer_class()

    def get_read_serializer_class(self):
        assert self.read_serializer_class is not None, (
            "'%s' should either include a `read_serializer_class` attribute,"
            "or override the `get_read_serializer_class()` method."
            % self.__class__.__name__
        )
        return self.read_serializer_class

    def get_write_serializer_class(self):
        assert self.write_serializer_class is not None, (
            "'%s' should either include a `write_serializer_class` attribute,"
            "or override the `get_write_serializer_class()` method."
            % self.__class__.__name__
        )
        return self.write_serializer_class

This mixin defines two new attributes, read_serializer_class and write_serializer_class . Each attribute has a corresponding method to catch the error where the mixin is being used, but those attributes haven't been set. The get_*_serializer_class() methods will raise an AssertionError if your viewset hasn't set the appropriate attribute or overridden the necessary method.

The get_serializer_class method makes the final decision on which serializer to use. For the "update" actions to the API, it returns write_serializer_class ; otherwise it returns read_serializer_class .

The mixin gets used in a viewset like this:

from rest_framework import viewsets

from .mixins import ReadWriteSerializerMixin
from .models import MyModel
from .serializers import ModelReadSerializer, ModelWriteSerializer


class MyModelViewSet(ReadWriteSerializerMixin, viewsets.ModelViewSet):
    queryset = MyModel.objects.all() 
    read_serializer_class = ModelReadSerializer 
    write_serializer_class = ModelWriteSerializer

Now the viewset MyModelViewSet has access to the attributes and methods from the mixin ReadWriteSerializerMixin . This means that when a call is made to the API that uses MyModelViewSet , the get_serializer_class() method from ReadWriteSerializerMixin will automatically be called and will decide, based on the kind of API request being made, which serializer to use. If we needed to make even more granular decisions about the serializer returned (maybe we want to use a more limited serializer for a list request and one with more data in a retrieve request), then our viewset can override get_write_serializer_class() to add that logic.

Note: Custom DRF actions aren't part of the DRF list of accepted actions (because they are custom actions you're creating), so when you call get_serializer_class from inside your action method, it will return whatever your "default" serializer class is. In the example above, the "default" serializer is the read_serializer_class because it's what we return when we fall through the other conditional.

Depending on your action, you will want to override get_serializer_class to change your default method or explicitly account for your custom action.
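Here is a rough sketch of that override. The summary action and ModelSummarySerializer are hypothetical, added only to show the pattern:

from rest_framework import viewsets
from rest_framework.decorators import action
from rest_framework.response import Response

from .mixins import ReadWriteSerializerMixin
from .models import MyModel
from .serializers import ModelReadSerializer, ModelSummarySerializer, ModelWriteSerializer


class MyModelViewSet(ReadWriteSerializerMixin, viewsets.ModelViewSet):
    queryset = MyModel.objects.all()
    read_serializer_class = ModelReadSerializer
    write_serializer_class = ModelWriteSerializer

    def get_serializer_class(self):
        # Explicitly account for the custom action instead of letting it fall
        # through to the mixin's default (the read serializer).
        if self.action == "summary":
            return ModelSummarySerializer
        return super().get_serializer_class()

    @action(detail=False, methods=["get"])
    def summary(self, request):
        serializer = self.get_serializer(self.get_queryset(), many=True)
        return Response(serializer.data)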

Mixins are a DRY (Don't Repeat Yourself) way to add functionality that you wind up needing to use across several viewsets. We hope you get to experiment with using them soon!

Thanks to Jeff Triplett for his help with this post.

]]>/>
Dataclasses and attrs: when and whyhttp://www.revsys.com/tidbits/dataclasses-and-attrs-when-and-why/Python 3.7 introduced dataclasses, whose design is based on the "attrs" library. This article will show the way I use dataclasses and attrs, why I think you should use both, and why I think attrs is still very relevant.Tue, 04 Jun 2019 20:40:00 +0000http://www.revsys.com/tidbits/dataclasses-and-attrs-when-and-why/

Python 3.7 introduced dataclasses (PEP 557). Dataclasses can be a convenient way to generate classes whose primary goal is to contain values.

The design of dataclasses is based on the pre-existing attr.s library. In fact, Hynek Schlawack, the very same author of attrs, helped with the writing of PEP 557.

Basically dataclasses are a slimmed-down version of attrs. Whether this is an improvement or not really depends on your specific use-case.

I think the addition of dataclasses to the standard library makes attrs even more relevant. The way I see it is that one is a subset of the other, and having both options is a good thing. You should probably use both in your project, according to the level of formality you want in that particular piece of code.

In this article I will show the way I use dataclasses and attrs, why I think you should use both, and why I think attrs is still very relevant.

What do they do

Both the standard library's dataclasses and the attrs library provide a way to define what I'll call "structured data types" (I would put namedtuple, dict and TypedDict in the same family).

PS: There's probably some more correct CS term for them, but I didn't go to CS School, so ¯\_(ツ)_/¯

They are all variations on the same concept: a class representing a data type containing multiple values, each value addressed by some kind of key.

They also do a few more useful things: they provide ordering, serialization, and a nice string representation. But for the most part, the most useful purpose is adding a certain degree of formalization to a group of values that need to be passed around.

An example

I think an example would better illustrate what I use dataclasses and attrs for. Suppose you want to render a template containing a table. You want to make sure the table has a title, a description, and rows:

def render_document(title: str, caption: str, data: List[Dict[str, Any]]):
    return template.render({
        "title": title,
        "caption": caption,
        "data": data,
    })

Now, suppose you want to render a document, which consists of a title, description, status ("draft", "in review", "approved"), and a list of tables. How would you pass the tables to render_document ?

You may choose to represent each table as a dict :

{
    "title": "My Table",
    "caption": "2019 Earnings",
    "data": [
        {"Period": "QT1", "Europe": 500, "USA": 467},
        {"Period": "QT2", "Europe": 345, "USA": 765},
    ]
}

But how would you express the type annotation for the tables argument so that it's correct, explicit and simple to understand?

def render_document(title: str, description: str, status: str, tables: List[Dict[str, Any]]):
    return template.render({
        "title": title,
        "description": description,
        "status": status,
        "tables": tables,
    })

That only gets us to describe the first level of tables. It doesn't tell us that a Table has a title or a caption. Instead, you could use a dataclass:

@dataclass
class Table:
    title: str
    data: List[Dict[str, Any]]
    caption: str = ""

def render_document(title: str, description: str, tables: List[Table]):
    return template.render({
        "title": title,
        "description": description,
        "tables": tables,
    })

This way we have type hinting, helping our IDE help us.

But we can go one step further, and also provide type validation at runtime. This is where dataclasses stops, and attrs comes in:

@attr.s
class Table(object):
    title: str = attr.ib(validator=attr.validators.instance_of(str))  # don't you pass no bytes!
    data: List[Dict[str, Any]] = attr.ib(validator=...)
    description: str = attr.ib(validator=attr.validators.instance_of(str), default="")


def render_document(title: str, description: str, tables: List[Table]):
    return template.render({
        "title": title,
        "description": description,
        "tables": tables,
    })

Now, suppose we also need to render a "Report", which is a collection of "Document"s. You can probably see where this is going:

@dataclass
class Table:
    title: str
    data: List[Dict[str, Any]]
    caption: str = ""

@attr.s
class Document(object):
    status: str = attr.ib(validator=attr.validators.in_(
        ["draft", "in review", "approved"]
    ))
    tables: List[Table] = attr.ib(factory=list)

def render_report(title: str, documents: List[Document]):
    return template.render({
        "title": title,
        "documents": documents,
    })

Note how I am validating that Document.status is one of the allowed values. This comes in particularly handy when you're building abstractions on top of Django models with a field that uses choices. Dataclasses can't do that.
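With that validator in place, the check happens at instantiation time. A rough illustration, using the Document class above:

Document(status="approved")   # fine
Document(status="published")  # raises ValueError -- "published" is not an allowed status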

A couple of patterns I keep finding myself in are the following:

  1. Write a function that accepts some arguments
  2. Group some of the arguments into a tuple
  3. Hm, I want field names -> namedtuple .
  4. Hm, I want types -> dataclass .
  5. Hm, I want validation -> attrs .

Another situation that happens quite often is this:

  1. write a function that accepts some arguments
  2. add typing so my IDE can help me out
  3. oh, by the way, it needs to support a list of those things, not just one at a time!
  4. refactor to use dataclasses
  5. This argument can only be one of those values, or
    1. I ask myself: How do I make sure other developers are passing the right type and/or values?
  6. switch to attrs

Sometimes I stop at the dataclasses. Lots of times I get to the attrs step.

And sometimes, this happens: one half of a legacy codebase uses -1 as a special value for False, the other half uses False. Switch to attr.s so I can use converter= to normalize.

Comparison

The two libraries do appear very similar. To get a clearer picture of how they compare, I've made a table of the features I use most:

feature          dataclasses  attrs
frozen           ✓            ✓
defaults         ✓            ✓
totuple          ✓            ✓
todict           ✓            ✓
validators                    ✓
converters                    ✓
slotted classes               ✓

As you can see, there's a lot of overlap. But the additional features on attrs provide functionality that I need more often than not.

When to use dataclasses

Dataclasses are just about the "shape" of the data. Choose dataclasses if:

  • You don't care about values in the fields, only their type
  • adding a dependency is not trivial

When to use attrs

attrs is about the shape and the values. Choose attrs if:

  • you want to validate values. A common case would be the equivalent of a ChoiceField.
  • you want to normalize, or sanitize the input
  • whenever you want more formalization than dataclasses alone can offer
  • you are concerned about memory and performance. attrs can create slotted classes, which are optimized by CPython (see the short sketch after this list).
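For example, a slotted attrs class is one extra argument to the decorator. A minimal sketch:

import attr


@attr.s(slots=True)
class Point:
    x: int = attr.ib()
    y: int = attr.ib()


p = Point(1, 2)
p.z = 3  # raises AttributeError: slotted classes have a fixed set of attributes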

I often find myself using dataclasses and later switching to attr.s because the requirements changed or I find out I need to guard against some particular value. I think that's a normal aspect of developing software and what I call "continuous refactoring".

Why I like dataclasses

I'm glad dataclasses have been added to the standard library, and I think it's a beneficial addition. It's a very convenient thing to have at your disposal whenever you need.

For one, it will encourage a more structured style of programming from the beginning.

But I think the most compelling case is a practical one. Some high-risk corporate environments (e.g. financial institutions) require every package to be vetted (with good reason: we've already had incidents of malicious code in libraries). That means that adding attrs is not as simple as adding a line to your requirements.txt, and will involve waiting on approval from your corpops team. Those developers can use dataclasses right away and their code will immediately benefit from using more formalized data types.

Why I like attrs

Most people don't work in such strictly-controlled environments.

And sure, sometimes you don't need all the features from attrs, but it doesn't hurt having them.

More often than not, I end up needing them anyway, as I formalize more and more of my code's API. Dataclasses only get me halfway to where I want to go.

Conclusion

I think dataclasses encompass only a subset of what attrs has to offer. Admittedly, it is a big subset. But the features that are not covered are important enough and needed often enough that they make attrs not only still relevant and useful, but also necessary.

In my mind, using both allows developers to progressively refactor their code, moving the contracts amongst functions from loosely-defined arguments all the way up to formally described data structures as the requirements of the app stabilize over time.

One nice effect of having dataclasses is that now developers are more incentivized to refactor their code toward more formalization. At some point dataclasses is not going to be enough, and that's when developers will refactor to use attrs. In this way, dataclasses actually acts as an introduction to attrs. I wouldn't be surprised if attrs becomes more popular thanks to dataclasses.

References

Acknowledgments + Thanks

I want to thank the following people for revising drafts and providing input and insights:

  • Hynek Schlawack
  • Jacob Kaplan-Moss
  • Jacob Burch
  • Jeff Triplett
]]>/>
jetstack/cert-manager on GKE Private Clustershttp://www.revsys.com/tidbits/jetstackcert-manager-gke-private-clusters/how to train your validating admission controller webhook without losing a handFri, 12 Apr 2019 03:08:23 +0000http://www.revsys.com/tidbits/jetstackcert-manager-gke-private-clusters/

Kubernetes Admission Controllers

If, dear reader, you are not familiar with this controller-type, I encourage you to hit up a favorite search engine--many much less obtuse descriptions of their use and implementation exist.

This post focuses on their use by Jetstack's cert-manager controller and how to make it happy running on a GKE private cluster.

The Point

The cert-manager webhook process is an example of a validating admission webhook. Generically, VA webhooks are a useful tool to enforce policy decisions; a common example is denying submission of manifests requesting container images from non-private registries.

Do One Thing Well

The webhook is responsible for making sure any manifests that match one of its CRDs are syntactically and structurally valid before being submitted to the actual cert-manager controller. This takes the validation load off the controller, as well as relieving it of the overhead of processing connections that carry invalid manifest data.

The Context

Google Kubernetes Engine (private)

The Reader's Digest version: communications from the master subnet are restricted. Nodes are not granted public addresses. Users are charged for Kubernetes nodes. Master functionality is provided via a shared environment.

The Problem

NOTE: cert-manager already in place


TL;DR

The webhook registration includes connection info for the webhook process. GKE private clusters do not allow connections from the master network to the service network on port 6443.


apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  annotations: {}
  name: something
spec:
  acme:
    ca: {}

... graces us with:

Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request

That error message is absolutely correct and just as unhelpful.

The server ? Yeah... let me just find... the server . Are we talking about... maybe the API server? Or the cert-manager controller? Maybe we're just talking about the guy that just brought the check to the table...


Thanks to the author of an issue for the old cert-manager helm chart , it is now common(ish?) knowledge that TCP:6443 is the listening port for the cert-manager webhook. The cert-manager-webhook pod runs on a non-master node. Because of the environment, user workloads aren't deployable to master nodes because... there aren't any.

The Kube API is still a process. It runs.

.

.

.

Somewhere.

It is where the webhook has been registered; the process that waits patiently for requests to validate relevant manifests! And it will continue to wait. Every time the API receives a properly authz'd connection with a cert-manager-related payload, the aforementioned error will be delivered because the API can't connect to the webhook service.

Because this needs a bonus...

When a namespace is deleted, the relevant controller goes through a house-keeping process, walking all registered CRDs and built-in object types and removing any objects of those types before actually deleting the namespace. The admission controllers registered with the API fire during the course of this process. If one of these fails, the namespace remains in a Terminating state until the failing webhook is either deregistered or it is able to eventually resolve its requests.

Retrospectively, this makes sense, though seeing a namespace that was deleted yesterday still present and "terminating" is rather disturbing.

Because the bonus needs icing...

The aforementioned namespace problem also rears its head when cordoning a node for upgrades. The node will never reach a state of readiness (anti-readiness) that indicates the instance is ready for destruction. (First noticed with kops )

The Solution

GCE VPC firewall-rules are created using either a source tag, IP range or service account. We know the source range for the master network from when the cluster was created (in our case: 172.16.0.0/28). The target can only be selected via target tag or service account.

obtain the GCE instance name for a cluster node

gcloud compute instances list

display the GCE tags for that node:

gcloud compute instances describe --format=json [name of instance] | jq .tags.items

[
  "gke-revsys-production-deux-a2f2de43-node"
]

create the firewall rule:

gcloud compute firewall-rules create allow-master-to-webhook \
  --source-ranges 172.16.0.0/28 \
  --target-tags gke-revsys-production-deux-a2f2de43-node \
  --allow tcp:6443

That's it. With that command, 3 poorly logged, unremarked error states are done away with. I hope this short post is helpful.

Now, on that note, go do something interesting.

]]>/>
Helm & RBAC: Polishing the Doorknobhttp://www.revsys.com/tidbits/helm-rbac-polishing-doorknob/Because someone has to write this stuff down...Wed, 20 Feb 2019 05:59:26 +0000http://www.revsys.com/tidbits/helm-rbac-polishing-doorknob/

Helm. Tiller. You have probably heard these words in reference to the former's description as the Kubernetes Package Manager. The latter is the server/cluster-side element that helm communicates with to do its work. The common practice is to create a role binding between tiller's service account and the cluster-admin role. This allows for installation, management, updating, etc. of any Kubernetes object.

Like with most RBAC configuration that denies haphazard assignment of the cluster-admin role, once it works it makes sense. Getting to that state for random-kubernetes-thing-X can be a time-consuming and often frustrating task.

The doc for properly setting up tiller in an RBAC-enabled cluster is... elusive. The following RBAC definition was observed on some issue board (gitlab-runner?)--I was fortunate enough to recognize this gem for what it was and wrote it down before losing track of that particular issue forever.

Whoever's brain is the origin of this knowledge, I owe you a drop of something nice.

Apply this and remove one more thing that runs with needlessly powerful privileges!


apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tiller-for-mortals
rules:
  -
    apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - list
  -
    apiGroups:
      - ""
    resources:
      - pods/portforward
    verbs:
      - create


Once this manifest is applied to the namespace occupied by tiller, a rolebinding is all it takes to have access to helm/tiller's functions.


$ kubectl --namespace=tillerhut create rolebinding helm4stephen \
     --serviceaccount=tokens:stephen-at-revsys --role=tiller-for-mortals


NOTE

The constraint is not directly with tiller but with the rolebinding. This role can be (and is) used to bind to a gitlab-runner deployment used for CI-driven deployments. The runner doesn't have direct unfettered access to the host cluster--it just has gRPC access to an internal service with unfettered access to the cluster.

Baby steps, dig?

Thanks for reading!

Stephen

]]>/>
GKE: part 2http://www.revsys.com/tidbits/gke-part-2/Accessing a New GKE ClusterMon, 04 Feb 2019 02:31:04 +0000http://www.revsys.com/tidbits/gke-part-2/

kubectl

kubectl should, ideally, match the server version. At the very least, the client binary should usually be within the minor version ballpark of the server.


NOTE: Sometimes, in the immortal words of Freddie Childress, "shit be broke!" An example can be found in the v1.10.x family. The only known work-around is to use a client binary from v1.11.x. The takeaway: your mileage may vary, but if using an off-version client and receiving strange results, try using the version-equivalent client.


Client binary downloads for v1.11.6 can be found here

Authorizing the Client Environment (part 1)

It is necessary to add the Kubernetes Engine Cluster-Admin role to the target (Google) userid. This is accomplished via the IAM & admin section of the GCP console.

  • click the pencil widget under the inheritance column
  • click Add Another Role
  • scroll or search to Kubernetes Cluster Admin

Authenticating the Client Environment

The Google Cloud SDK is required.

Once installed and configured for the desired Google account, the following commands are used to populate ~/.kube/config with the correct bits to authenticate with the new cluster:

(revsys) [[revsys-prod-cluster:revsys-com-playground]]
gladiatr@s
:# gcloud container clusters list
NAME                    LOCATION       MASTER_VERSION  MASTER_IP    MACHINE_TYPE   NODE_VERSION      NUM_NODES  STATUS
revsys-production-deux  us-central1-c  1.11.6-gke.6    23.24.25.26  n1-highmem-2   1.11.6-gke.6      2          RUNNING

:# gcloud container clusters get-credentials revsys-production-deux
Fetching cluster endpoint and auth data.
kubeconfig entry generated for revsys-production-deux.

(revsys) [[gke_revsys-150116_us-central1-c_revsys-production-deux:]]
gladiatr@s
[~/git/revsys/revsys/src]
:#

Holy progenitor of mythological figure, Batman! That's ugly!

Later (post v1.9.x?) kubectl binaries added the very welcome feature of easily renaming kube contexts!


NOTE: ~/.kube/config is a YAML file. The outer structure is a map with most of the values expressed as lists of single-element maps.

skeleton

apiVersion: v1
kind: Config
clusters: []
users: []
contexts: []
current-context: str
preferences: {}  <- I have no idea what this can be used for

meta-outline of a populated context + associated bits

apiVersion: v1
kind: Config
clusters:
  - name: cluster-x
    cluster:
      server: https://api.somewhere.on.the.internets
      certificate-authority-data: (base64 PEM certificate string)
users:
  - name: user-tokens:cluster-x-user
    token: (service account token (**no base64**))
contexts:
  - name: cluster-x
    context:
      user: user-tokens:cluster-x-user
      cluster: cluster-x
      namespace: (configured default namespace)(_optional_)
current-context: cluster-x
preferences: {}

The following command simply changes the name of the context. (No cluster or user definitions were harmed in the filming of this episode):

(revsys) [[gke_revsys-150116_us-central1-c_revsys-production:]]
gladiatr@s
[~]
:# kubectl config rename-context gke_revsys-150116_us-central1-c_revsys-production revsys-prod-new

(revsys) [[revsys-prod-new:]]
gladiatr@s
[~]
:#

Authorizing the Client Environment (part 2)

GCP IAM integration is ongoing. Currently, however, assigning the GCP IAM role (kubernetes-cluster-admin) only gets you halfway to the proverbial church.

Completing the authorization of the target account requires creating a Kubernetes cluster role binding that grants full access to 100% of the kube API.

Without this step, the target user account will be unable to create RBAC roles or cluster roles. The next few posts will involve the creation and/or manipulation of such roles.

kubectl create clusterrolebinding he-man --clusterrole=cluster-admin --user=googleid@googledomain

Epilogue

This post covered the steps for setting up a local environment to allow for interacting with the new GKE cluster.

Next up: the helm package manager

Thanks for reading!

-Stephen

]]>/>
GKE: part 1http://www.revsys.com/tidbits/gke-part-1/Beyond wet feet...Sun, 03 Feb 2019 13:26:53 +0000http://www.revsys.com/tidbits/gke-part-1/

The Cluster


This series is a journal of how we currently configure our GKE clusters. It should not be construed, necessarily, as a best-practices guide but as a written account that will be augmented as knowledge is gained (or perhaps as a series of internal memos that has boundary issues).


The following is a list of values used with a brief explanation where the key's purpose is not obvious:

  • Name (of cluster): prod-cluster-deux
  • Location Type: Zonal (default)
    • This field affects how the cluster is laid out within the chosen availability zone
  • Region: us-central1
  • Master Version: 1.11.6-gke.6
    • latest release as of 02/02/2019

NOTE: Access to the following config keys requires clicking the Advanced Configuration widget


  • Availability
    • Additional Node Locations: clear (default)
    • Maintenance Window (beta): 3:00 AM
  • Networking
    • VPC-Native: check
    • Network: default
    • Subnet: default
    • Pod Address Range: blank (default)
    • Maximum Pods-Per-Node (beta): 110 (default)
    • Enable HTTP Load Balancing: clear
    • Network Security
    • Private Cluster: check
    • Enable Master Authorized Networks: check
      • add a reasonably stable source IP ( echo -en $( curl -s api.ipify.org )/32 )
    • Enable Network Policy: check
      • There is a reason why a UI toggle is provided for this standard part of the Kube API. There are people, somewhere, who know this reason. I am not among them.
  • Security
    • Enable Basic Authentication: uncheck
    • Issue a Client Certificate: uncheck
    • Enable Legacy Authorization: uncheck
    • Enable Binary Authorization (beta): uncheck
    • Enable Application Layer Secrets Encryption (beta): check
    • Unlike AWS, GCP does not generate default crypto keys for this sort of thing (see the CLI sketch after this list).
    • GCP crypto keys are filed under the IAM & admin heading.
    • Create a keyring
    • Create a Symmetric encrypt/decrypt key
    • Copy the key's ( not the keyring's) resource ID
      • available from the key's hamburger menu
    • paste the RID into the Select a Customer-Managed Key box (GKE UI)
  • Additional Features
    • Enable Stackdriver Logging Service: check (default)
    • Enable Stackdriver Monitoring Service: check (default)
    • Try the new Stackdriver beta Monitoring and Logging Experience: check (what the hell, right?)
    • Enable Cloud TPU (beta): uncheck (default)
    • Enable Kubernetes Alpha Features: uncheck (default)
    • Enable Kubernetes Dashboard: uncheck
    • Enable Istio: uncheck
    • Enable Node Auto-Provisioning (beta): uncheck
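
For reference, the keyring/key creation from the Security steps above can also be done from the CLI (the names here are placeholders, not the values used on the real cluster):

$ gcloud kms keyrings create prod-cluster-ring --location=us-central1

$ gcloud kms keys create prod-cluster-key \
    --keyring=prod-cluster-ring --location=us-central1 \
    --purpose=encryption

# The key's resource ID (for the Select a Customer-Managed Key box):
$ gcloud kms keys describe prod-cluster-key \
    --keyring=prod-cluster-ring --location=us-central1 \
    --format='value(name)'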

Egress NAT

Without a NAT, private clusters cannot establish connections external to GCP networks.

  • Navigate to the GCP Network Services -> Cloud NAT
  • Create
    • Gateway Name: prod-cluster-nat
    • Select Cloud Router
      • VPC Network: default
      • Region: us-central1
      • Cloud Router: create new cloud router
    • NAT mapping
      • Source (internal): Primary and secondary ranges for all subnets
      • NAT IP Addresses: Automatic (default)

Epilogue

The next part in this series will cover setting up a local environment to communicate with the new cluster.

Thanks for Reading!

-Stephen

]]>/>
Keeping Django Models Orderedhttp://www.revsys.com/tidbits/keeping-django-model-objects-ordered/How to smartly keep a set of Django ORM objects in order for drag-n-drop like situations.Wed, 28 Nov 2018 15:25:19 +0000http://www.revsys.com/tidbits/keeping-django-model-objects-ordered/

Most of us know how to order objects with the Django ORM using order_by('field'), which creates a SQL query for us with ORDER BY field. That's not the type of ordering I'm going to talk about today.

I'm talking about keeping a set of objects in order with respect to each other, without any gaps. This has come up for us a couple of times in the last 6 months of client projects, and I've found a good pattern for dealing with it I wanted to share with you.

Django Drag and Drop Ordering

Think of a drag-n-drop interface your app might need. You have some subset of objects that are currently in order 1, 2, 3, 4, ..., n and you want the item in position 4 to move into position 2. The fact that drag and drop is the UI we're using isn't essential to the discussion here, just an example. This pattern would also work for any arrow based nudging movement or even a rudimentary queue system that may not have a visible UI to it at all.

The naive approach is to loop over all the objects and adjust the order manually. This honestly works fine for small sets of objects, but performance will suffer if we're talking about hundreds or thousands of elements in your set.
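
For reference, that naive version looks something like this (a sketch of mine, assuming the models defined in the next section and an old_position/new_position pair):

# One UPDATE per Step -- fine for a dozen rows, painful for thousands
steps = list(task.steps.order_by('order'))
moving = steps.pop(old_position - 1)
steps.insert(new_position - 1, moving)
for position, step in enumerate(steps, start=1):
    step.order = position
    step.save()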

Setting the Stage

To give us a concrete example to work with we're going to create a couple of models. We're going to have a Task model that has some number of Steps that need to be completed in order. We'll be reordering the steps in our UI often so let's make it easy and elegant.

Our Models

class Task(models.Model):
    name = models.CharField(max_length=100)

class Step(models.Model):
    task = models.ForeignKey(Task, related_name='steps', on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    order = models.IntegerField(default=1)

    objects = StepManager()

I'm purposefully keeping the number of fields small for illustration purposes, but you would probably also want creation/modification times and booleans for whether or not the Step or Task is completed. They aren't useful for our example here.

Where you put the meat of this logic is up to you, but I think it makes the most sense to do as a manager method.

Ordering Logic

Here is the manager you need, skim it and we'll dive into the details in a second.

from django.db import models, transaction
from django.db.models import F

class StepManager(models.Manager):
    """ Manager to encapsulate bits of business logic """

    def move(self, obj, new_order):
        """ Move an object to a new order position """

        qs = self.get_queryset()

        with transaction.atomic():
            if obj.order > int(new_order):
                qs.filter(
                    task=obj.task,
                    order__lt=obj.order,
                    order__gte=new_order,
                ).exclude(
                    pk=obj.pk
                ).update(
                    order=F('order') + 1,
                )
            else:
                qs.filter(
                    task=obj.task,
                    order__lte=new_order,
                    order__gt=obj.order,
                ).exclude(
                    pk=obj.pk,
                ).update(
                    order=F('order') - 1,
                )

            obj.order = new_order
            obj.save()

Assuming we had several steps already created for a task we can move it like this:

Step.objects.move(step_3_obj, 5)

This would move the object to the 5th position. In the real world, you would probably also want to ensure that the new order position is not below 1 or beyond the total count for the steps to avoid weird gaps.
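
One way to add that guard (my own addition, not part of the manager above) is a small helper that clamps the requested position before any updates happen:

class StepManager(models.Manager):
    # ... move() as above ...

    def clamped(self, obj, new_order):
        """Clamp a requested position into 1..count for this object's Task."""
        total = self.get_queryset().filter(task=obj.task).count()
        return max(1, min(int(new_order), total))

Calling new_order = self.clamped(obj, new_order) at the top of move() keeps every position inside that range.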

Let's go through this step by step (pun intended):

  • After getting our default QuerySet we start an atomic transaction. We want to renumber the tasks that need updating, including the one moving, all together or not at all. This transaction helps ensure that.
  • We then limit our query to Steps which are part of this Task. This is a crucial bit and would depend on the nature of your application. If we didn't set this we would be reordering Steps without regard for the Tasks they were a part of which would definitely be a bug.
  • Next, we have to make a decision: is our new order larger or smaller than our current order? This determines which items need to move around.
  • If the new order is larger, we need the Steps that are greater than our current order and less than or equal to our new order. Otherwise, we need the ones that are greater than or equal to our new order and less than our current order.
  • We then exclude ourselves.
  • Then we issue an update call using an F() expression to either increment or decrement the object's order by 1. F expressions allow us to modify each row in our query by a value contained on that row.
  • Finally, we set our object's order directly and save.

Creating new objects

If we're going to have this setup, we also need to handle creating new objects and putting them in order. To do that, override the create method on your manager like so:

from django.db import models, transaction
from django.db.models import F, Max

class StepManager(models.Manager):
    # ... previous code here ...

    def create(self, **kwargs):
        instance = self.model(**kwargs)

        with transaction.atomic():
            # Get our current max order number
            results = self.filter(
                task=instance.task
            ).aggregate(
                Max('order')
            )

            # Increment and use it for our new object
            current_order = results['order__max']
            if current_order is None:
                current_order = 0

            value = current_order + 1
            instance.order = value
            instance.save()

            return instance

Now we can create new objects normally with Step.objects.create(task=<task_obj>, name='Testing') and it will give us the proper order.

Benefits

The most significant benefit to this pattern is that we're pushing most of the work onto our database to do the incrementing and decrementing of all the objects. Also, we're doing it with two fairly speedy UPDATE queries: one to renumber all of the intervening rows and one to save the object we're moving.

If we had merely looped over our objects, we would be forced to generate an individual update query per Step which is inefficient and does not scale well.

We've also encapsulated this logic into one spot to stay DRY and make writing tests for this logic easier.

Production Indexing

There is an index you should consider adding to make your common queries faster. On your Step model you may want:

class Step(models.Model):

    # ... original fields and manager definitions ... 

    class Meta:
        index_together = ('task', 'order')

This will create a multi-column index on Task ID and order, which makes a query like Step.objects.filter(task=<TaskObj>).order_by('order') fast.
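
If you're on a newer Django release, note that index_together has since been deprecated in favor of Meta.indexes; the equivalent declaration would be:

class Step(models.Model):

    # ... original fields and manager definitions ...

    class Meta:
        indexes = [
            models.Index(fields=['task', 'order']),
        ]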

DRF Ordering API

So this is all dandy, but if you're ordering some Django models with a drag and drop interface, you're doing it with JavaScript, so we need an API. If we're building an API with Django, we're using Django REST Framework, so here is an easy way to implement that.

from rest_framework import viewsets, status
from rest_framework.decorators import action
from rest_framework.response import Response
from django_filters.rest_framework import DjangoFilterBackend

from . import models
from . import serializers

class StepViewSet(viewsets.ModelViewSet):
    queryset = models.Step.objects.all().order_by('order')
    serializer_class = serializers.StepSerializer
    filter_backends = (DjangoFilterBackend, )
    filter_fields = ('task', )

    @action(methods=['post'], detail=True)
    def move(self, request, pk):
        """ Move a single Step to a new position """
        obj = self.get_object()
        new_order = request.data.get('order', None)

        # Make sure we received an order 
        if new_order is None:
            return Response(
                data={'error': 'No order given'},
                status=status.HTTP_400_BAD_REQUEST,
            )

        # Make sure our new order is not below one
        if int(new_order) < 1:
            return Response(
                data={'error': 'Order cannot be zero or below'},
                status=status.HTTP_400_BAD_REQUEST,
            )

        models.Step.objects.move(obj, new_order)

        return Response({'success': True, 'order': new_order})

Where these endpoints reside is going to depend on how you configure things in your Django project, but assuming you're using a DRF SimpleRouter and have things set up on /api/ , your URL patterns are going to be something like:

  • GET /api/steps/ for a list of all steps
  • GET /api/steps/?task=<task_id> for all Steps by Task (in Step order)
  • POST /api/steps/<step_id>/move/ with a JSON body of {'order': <int>} giving that Step's new order position.
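
For example, moving Step 7 into position 2 from the command line might look like this (the host and IDs are made up):

$ curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"order": 2}' \
    https://example.com/api/steps/7/move/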

NOTE This API example excludes any authentication and authorization concerns which you will need to address before using in production. You don't want just anyone reordering your steps!

Hopefully, you've found this informative and can put this pattern to use in your next project. If you need help implementing things of this nature at your company reach out to us to check on our availability.

]]>/>
Tips for Using Django's ManyToManyFieldhttp://www.revsys.com/tidbits/tips-using-djangos-manytomanyfield/ManyToManyFields confuse a lot of people. The way you relate objects to each other is just different enough from dealing with ForeignKeys and just uncommon enough in day-to-day Django development that it&#x27;s easy to forget all the little tricks for dealing with them.Fri, 31 Aug 2018 20:37:58 +0000http://www.revsys.com/tidbits/tips-using-djangos-manytomanyfield/

Versions:

  • Python 3.7
  • Django 2.1

ManyToManyFields confuse a lot of people. The way you relate objects to each other using a many-to-many relationship is just different enough from dealing with ForeignKeys and just uncommon enough in day-to-day Django development that it's easy to forget all the little tricks for dealing with them.

When should you use a ManyToManyField instead of a regular ForeignKey ? To remember that, let's think about pizza. A pizza can have many toppings (a Hawaiian pizza usually has Canadian bacon and pineapple), and a topping can go on many pizzas (Canadian bacon also appears on meat lovers' pizzas). Since a pizza can have more than one topping, and a topping can go on more than one pizza, this is a great place to use a ManyToManyField .

So let's dive in: assume the following models in a pizzas app.

from django.db import models 


class Pizza(models.Model):

    name = models.CharField(max_length=30)
    toppings = models.ManyToManyField('Topping')

    def __str__(self):
        return self.name


class Topping(models.Model):

    name = models.CharField(max_length=30)

    def __str__(self):
        return self.name

Both objects must exist in the database

You have to save a Topping in the database before you can add it to a Pizza , and vice versa. This is because a ManyToManyField creates an invisible "through" model that relates the source model (in this case Pizza , which contains the ManyToManyField ) to the target model ( Topping ). In order to create the connection between a pizza and a topping, they both have to be added to this invisible "through" table. From the Django docs :

" [T]here is … an implicit through model class you can use to directly access the table created to hold the association. It has three fields to link the models. If the source and target models differ, the following fields are generated:

  • id : the primary key of the relation.
  • <containing_model>_id : the id of the model that declares the ManyToManyField.
  • <other_model>_id : the id of the model that the ManyToManyField points to."

The invisible "through" model that Django uses to make many-to-many relationships work requires the primary keys for the source model and the target model. A primary key doesn't exist until a model instance is saved, so that's why both instances have to exist before they can be related. (You can't add spinach to your pizza if you haven't bought spinach yet, and you can't add spinach to your pizza if you haven't even started rolling out the crust yet either.)

See what happens when you try to add a topping to a pizza before you've added that topping to the database:

>> from pizzas.models import Pizza, Topping
>> hawaiian_pizza = Pizza.objects.create(name='Hawaiian')
>> pineapple = Topping(name='pineapple')
>> hawaiian_pizza.toppings.add(pineapple)
Traceback (most recent call last):
...
ValueError: Cannot add "<Topping: pineapple>": instance is on database "default", 
value is on database "None"
>> 

A ValueError is raised because the pineapple hasn't yet been saved, so its value on the database doesn't exist yet. But when I save pineapple , I can add it to my pizza.

>> pineapple.save() 
>> hawaiian_pizza.toppings.add(pineapple)
>> hawaiian_pizza.toppings.all()
<QuerySet [<Topping: pineapple>]>

The reverse doesn't work either: I can't create a topping in the database, and then add it to a pizza that hasn't been saved.

>> pepperoni = Topping.objects.create(name='pepperoni')
>> pepperoni_pizza = Pizza(name='Pepperoni')
>> pepperoni_pizza.toppings.add(pepperoni)
Traceback (most recent call last):
...
ValueError: "<Pizza: Pepperoni>" needs to have a value for field "id" before this many-to-many 
relationship can be used.

This error is more explicit (it states that an id is required) but it's essentially the same error. It's just coming from the other side of the relationship.

To retrieve the stuff in a ManyToManyField , you have to use *_set ...

Since the field toppings is already on the Pizza model, getting all the toppings on a specific pizza is pretty straightforward.

>> hawaiian_pizza.toppings.all()
<QuerySet [<Topping: pineapple>, <Topping: Canadian bacon>]>

But if I try to see what pizzas use Canadian bacon, I get an AttributeError :

>> canadian_bacon.pizzas.all()
Traceback (most recent call last):
...
AttributeError: 'Topping' object has no attribute 'pizzas'

That's because Django automatically refers to the target ManyToManyField objects as "sets." The pizzas that use specific toppings are in their own "set":

>> canadian_bacon.pizza_set.all()
<QuerySet [<Pizza: Hawaiian>]>

… unless you add the related_name option to the field

Adding the related_name option to a ManyToManyField will let you choose a more intuitive name to use when you want to retrieve the stuff in that field.

class Pizza(models.Model):
    ...
    toppings = models.ManyToManyField('Topping', related_name='pizzas')

The related_name should usually be the lowercase, plural form of your model name. This is confusing for some people because shouldn't the related_name for toppings just be… toppings?

No; the related_name isn't referring to how you want to retrieve the stuff in this field ; it specifies the term you want to use instead of *_set when you're on the target object (which in this case is a topping) and want to see which source objects point to that target (what pizzas use a specific topping).

Without a related_name , we would retrieve all the pizzas that use a specific topping with pizza_set :

>> canadian_bacon.pizza_set.all()
<QuerySet [<Pizza: Hawaiian>]>

Adding a related_name of "pizzas" to the toppings attribute lets us retrieve all the pizzas that use a specific topping like this:

>> canadian_bacon.pizzas.all()
<QuerySet [<Pizza: Hawaiian>]>

You can add things from both sides of the relationship.

Earlier, we created a Pizza object, and then a Topping object, and then ran hawaiian_pizza.toppings.add(pineapple) to associate the pineapple topping with the Hawaiian pizza.

But we could, instead, add a pizza to a topping.

>> cheese_pizza = Pizza.objects.create(name='Cheese')
>> mozzarella = Topping.objects.create(name='mozzarella')
>> mozzarella.pizzas.add(cheese_pizza)
>> mozzarella.pizzas.all()
<QuerySet [<Pizza: Cheese>]>

You can query the items in the many-to-many set from both sides

Say we want to find all the pizzas that have toppings that start with the letter "p." We can write that query like this:

>> Pizza.objects.filter(toppings__name__startswith='p')
<QuerySet [<Pizza: Pepperoni Pizza>, <Pizza: Hawaiian Pizza>]>

A Hawaiian pizza contains pineapple, and a pepperoni pizza contains pepperonis. Pineapple and pepperoni both start with the letter "p," so both of those pizzas are returned.

We can do the same from the Topping model, to find all the toppings used on pizzas that contain "Hawaiian" in their name:

>> Topping.objects.filter(pizzas__name__contains='Hawaiian')
<QuerySet [<Topping: pineapple>, <Topping: Canadian bacon>]>

You might need a custom "through" model

Remember when I mentioned the invisible "through" model that Django creates to manage your many-to-many relationships? You might want to keep track of more data about those relationships, and to do that you would use a custom "through" model.

The example used in the Django docs is of a Group, Person, and Membership relationship. A group can have many people as members, and a person can be part of many groups, so the Group model has a ManyToManyField that points to Person . Then, a Membership model contains ForeignKey s to both Person and Group , and can store extra information about a person's membership in a specific group, like the date they joined, who invited them, etc.

But we're not here to talk about people. We are all about pizza.

Using our existing models, we can create all kinds of pizzas with a wide range of toppings. But we can't make a pizza like "Super Pepperoni" that contains double the usual amount of pepperonis. We can't add pepperoni to a pizza more than once:

>> pepperoni_pizza.toppings.all()
<QuerySet [<Topping: pepperoni>]>
>> pepperoni_pizza.toppings.add(pepperoni)
>> pepperoni_pizza.toppings.all()
<QuerySet [<Topping: pepperoni>]>

Django just ignores us if we try. A "through" model would let us specify a quantity for each topping, enabling us to add "pepperoni" once, but specify that we wanted twice the amount for the Super Pepperoni pizza.

Note : If you're going to use a "through" model, you have to start with that in mind… or be willing to either drop your database or do some very advanced database finagling. If you try to add a "through" model later, you will see an error like this one when you run migrate :

ValueError: Cannot alter field pizzas.Pizza.toppings into pizzas.Pizza.toppings - 
they are not compatible types (you cannot alter to or from M2M fields, or add or remove through= on M2M fields)

If you're following along, now is the time to drop your database, delete your migration files, and add this new model:

class ToppingAmount(models.Model):

    REGULAR = 1
    DOUBLE = 2
    TRIPLE = 3
    AMOUNT_CHOICES = (
        (REGULAR, 'Regular'),
        (DOUBLE, 'Double'),
        (TRIPLE, 'Triple'),
    )

    pizza = models.ForeignKey('Pizza', related_name='topping_amounts', on_delete=models.SET_NULL, null=True)
    topping = models.ForeignKey('Topping', related_name='topping_amounts', on_delete=models.SET_NULL, null=True, blank=True)
    amount = models.IntegerField(choices=AMOUNT_CHOICES, default=REGULAR)

Now, add the through option to the toppings field on the Pizza model:

class Pizza(models.Model):
   ...
    toppings = models.ManyToManyField('Topping', through='ToppingAmount', related_name='pizzas')

And run makemigrations and migrate .

If we try to add a pizza, a topping, and then associate them the way we used to, we will get an error:

>> super_pep = Pizza.objects.create(name='Super Pepperoni')
>> pepperoni = Topping.objects.create(name='pepperoni')
>> super_pep.toppings.add(pepperoni)
Traceback (most recent call last):
...
AttributeError: Cannot use add() on a ManyToManyField which specifies an intermediary model. 
Use pizzas.ToppingAmount's Manager instead.

Using a custom "through" model forces us to use that model to associate the pizza and toppings.

>>> super_pep_amount = ToppingAmount.objects.create(pizza=super_pep, topping=pepperoni, amount=ToppingAmount.DOUBLE)

But the benefit is that we can now add some extra information about that relationship, like the fact that the amount of pepperonis on a Super Pepperoni pizza should be double the regular amount.

We can still access the toppings from the pizza:

>> super_pep.toppings.all()
<QuerySet [<Topping: pepperoni>]>

We'll only see pepperoni once, since the amount is on the "through" model. And we can access the pizzas that use a specific topping:

>> pepperoni.pizzas.all()
<QuerySet [<Pizza: Super Pepperoni>]>

But now, we can use our "through" model to get all the toppings and their amount for a specific pizza from the ToppingAmount model:

for top_amt in ToppingAmount.objects.filter(pizza=super_pep):
    print(top_amt.topping.name, top_amt.get_amount_display())

pepperoni Double

You can also see the topping amounts from the Pizza objects themselves.

for top_amt in super_pep.topping_amounts.all():
    print(top_amt.topping.name, top_amt.get_amount_display()) 

pepperoni Double

And you can also access the amount for a specific topping from the topping on the Pizza object.

for topping in super_pep.toppings.all():
    for top_amt in topping.topping_amounts.all():
        print(topping.name, top_amt.amount, top_amt.get_amount_display())

pepperoni 2 Double

You could extend this ToppingAmount through model to hold information about the left and right halves of the pizza, or notes about topping preparation ("put peppers under cheese"). You can also add methods to the through model or the source/target models to more easily access some of the topping amount information.
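
For example, a small helper like this (my own addition, not from the original models) keeps that lookup in one place:

class Pizza(models.Model):
    # ... fields as defined earlier ...

    def topping_summary(self):
        """Return strings like 'pepperoni (Double)' for this pizza's toppings."""
        return [
            f"{ta.topping.name} ({ta.get_amount_display()})"
            for ta in self.topping_amounts.select_related('topping')
        ]

With the Super Pepperoni data above, super_pep.topping_summary() returns ['pepperoni (Double)'].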

A through model is also useful for relationships between players and teams; the through model could contain information about the players' positions, jersey numbers, and dates they joined the team. A through model joining movie theatres and films could contain the number of screens the film is showing on and the start and end run dates. Students' relationships to their Degree Programs could track information like GPA, whether the program is the student's major or minor, whether it's a double major, and start/end semesters the student was in a program.



Thank you to Monique Murphy and Jeff Triplett for their assistance.

]]>/>
Slugs as primary keyshttp://www.revsys.com/tidbits/slugs-primary-keys/A very common pattern in a Django Project is to have some kind of &#x27;model-type&#x27; relationship, where you have some kind of object that can only belong to one of the types defined in the database.Mon, 20 Aug 2018 16:39:00 +0000http://www.revsys.com/tidbits/slugs-primary-keys/

Slugs as primary keys

A very common pattern I've seen in Django projects is to have some kind of 'model-type' relationship, where you have some kind of object that can only belong to one of the types defined in the database.

A typical implementation would look something like this:

class EventType(models.Model):
    slug = models.SlugField(unique=True)
    name = models.CharField(max_length=100)


class Event(models.Model):
    type = models.ForeignKey(EventType, on_delete=models.CASCADE)
    title = models.CharField(max_length=100)
    day = models.DateField()

A subtle issue with this implementation is that you may have to query the DB for an EventType when creating an Event :

work_type = EventType.objects.get(slug='work')
Event.objects.create(
    type=work_type,
    title='release',
    day=datetime(2018, 12, 31),
)

In other words, the minimum number of queries to create an Event is 2: a SELECT to fetch the type, and the INSERT that saves the instance.

If somehow you already have the pk of the EventType (for example, it might come from an API payload or from the URL), then you can easily avoid the lookup by setting the primary key directly into the type_id column:

def post(self, request, *args, **kwargs):
    type_id = self.kwargs['pk']
    Event.objects.create(
        type_id=type_id,
        title='release',
        day=datetime(2018, 12, 31),
    )

But dealing directly with column names is discouraged by the docs

However, your code should never have to deal with the database column name

We can get around this by instantiating an EventType instance with just the primary key:

def post(self, request, *args, **kwargs):
    type_id = self.kwargs['pk']
    Event.objects.create(
       type=EventType(pk=type_id),
       title='release',
       day=datetime(2018, 12, 31),
    )

But this requires us to know the numerical id beforehand.

We already have slug as a source of uniqueness for the event_eventtypes table, and it's URL-friendly. We could just use that as the EventType primary key.

class EventType(models.Model):
    slug = models.SlugField(primary_key=True)
    name = models.CharField(max_length=100)


class Event(models.Model):
    type = models.ForeignKey(
        EventType, 
        on_delete=models.CASCADE,
    )
    title = models.CharField(max_length=100)
    day = models.DateField()

def post(self, request, *args, **kwargs):
    type_slug = self.kwargs['slug']
    Event.objects.create(
       type=EventType(pk=type_slug),
       title='release',
       day=datetime(2018, 12, 31),
    )

This also allows us to easily create Events of a specific type without having to fetch any EventType. This is especially useful in data migrations, or tests:

types =  ['home', 'work', 'community']
for type_slug in types:
    Event.objects.create(
       type=EventType(pk=type_slug),
       title='release',
       day=datetime(2018, 12, 31),
    )

In cases like this, you may want to consider using a slug as the primary key, rather than the default integer-based one. It's more performant and just as straightforward.

There are some caveats to consider, though: you won't be able to modify the slug from the Django admin.

Another thing to consider is that changing the type of your primary key is a one-way road: once you've made the slug your pk, you won't be able to convert it back to a regular AutoField.

]]>/>
Sentinel values in Pythonhttp://www.revsys.com/tidbits/sentinel-values-python/It is often necessary to differentiate between an argument that has not been provided, and an argument provided with the value `None`. For that purpose, we create what&#x27;s called a &#x27;sentinel value&#x27;.Thu, 16 Aug 2018 20:35:04 +0000http://www.revsys.com/tidbits/sentinel-values-python/

Sometimes it is necessary to differentiate between an argument that has not been provided, and an argument provided with the value None . For that purpose, we create what's called a 'sentinel value'.

For example, let's assume you want to define a Field class. Field instances must have a value or declare a default, and None could be a perfectly valid value to have:

class Field:
    def __init__(self, default=sentinel):  # `sentinel` is defined below
        self.value = default

    def set(self, value):
        self.value = value

    def get(self):
        if self.value is sentinel:
            raise ValueError("this field has no value!")
        return self.value

eula_accepted = Field()
eula_accepted.get()  # raises `ValueError`

eula_accepted = Field(default=None)
eula_accepted.get()  # doesn't raise; `None` means the EULA has been neither accepted nor rejected yet.

The most common approach is to declare the sentinel value with object() :

sentinel = object()

This approach is the quickest and most common, but it has some issues. To quote Joseph Jevnik :

One is that they don't repr well so they make debugging harder. Another issue is that they cannot be pickled or copied. You also cannot take a weak reference to a sentinel which can break some caching code and makes them harder to use.

For example:

sentinel = object()

repr(sentinel)
# '<object object at 0x10823e8d0>'

To work around this issue, some people create their own Sentinel class. But I've found a quicker way in the unittest.mock module.

from unittest.mock import sentinel

NotSet = sentinel.NotSet

repr(NotSet)
# 'sentinel.NotSet'
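
A minimal usage sketch (the config lookup is my own example, not from the original post):

from unittest.mock import sentinel

NotSet = sentinel.NotSet
CONFIG = {'debug': None}    # None here is a real, meaningful stored value

def get_config(key, default=NotSet):
    value = CONFIG.get(key, NotSet)
    if value is NotSet:
        if default is NotSet:
            raise KeyError(key)
        return default
    return value

get_config('debug')         # returns None, a legitimate value
get_config('missing', 42)   # returns 42
get_config('missing')       # raises KeyError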

If you don't feel like importing from unittest in your application code, you could install the mock package, or "hide it under the rug" by aliasing somewhere in your code base and importing it from there:

# in `myproject.types`
from unittest.mock import sentinel
# somewhere else in your project:
from myproject.types import sentinel

An alternative to unittest.mock.sentinel is to declare your own sentinel class and use it as a value:

class NotSet:
    pass

# PS: Remember to use the class _itself_.
def fn(default=NotSet):
    pass

This will give you a not really pretty, but useful enough repr :

repr(NotSet)
# "<class '__main__.NotSet'>"

Of course, you could go one step further and declare your own repr. Note that repr() looks up __repr__ on the object's type, so for the custom repr to take effect the sentinel needs to be an instance rather than the bare class:

class _NotSet:
    def __repr__(self):
        return 'NotSet'

NotSet = _NotSet()

repr(NotSet)
# 'NotSet'

Of all the options, I think using unittest.mock.sentinel is my favorite. Importing from unittest in my application code is a compromise that I'm willing to make in exchange for having something ready to use.

]]>/>
Celery and Django and Docker: Oh My!http://www.revsys.com/tidbits/celery-and-django-and-docker-oh-my/In this post, you will learn how to create a Celery task inside a Django project in a Docker container. Sounds awesome, right?Thu, 19 Jul 2018 19:37:00 +0000http://www.revsys.com/tidbits/celery-and-django-and-docker-oh-my/

In this post, you will learn about how to:

  • create a Celery task
  • inside a Django project
  • in a Docker container

Versions: Django 1.11, Python 3.6, Celery 4.2.1, Redis 2.10.6, and Docker 17.12.

Note that especially for Celery, versions matter a lot. Celery changed the names of many of their settings between versions 3 and 4, so if internet tutorials have been tripping you up, that might be why. Be careful when Googling for advice and always check the version number if something isn’t working.

If you need a refresher on using Docker with Django, check out A Brief Intro to Docker for Djangonauts and Docker: Useful Command Line Stuff .

What is Celery?

Celery is a tool that helps you manage tasks that should occur outside the request/response cycle. It’s not specific to Django.

Celery is especially helpful for transforming blocking transactions on your site into non-blocking transactions. For example, you might have a site that takes payment information. Validating credit card information, processing a charge, and producing a receipt might take 3-15 seconds, during which time your user is waiting and other users’ requests are also held up. Celery can help by offloading that work to different tasks. Instead of waiting until the credit card has been processed to show your user a confirmation page, you can quickly show them a confirmation screen that assures them that a receipt is forthcoming in their email. Then, outside the request/response cycle in a series of Celery tasks, you can validate their credit card, charge it, create a receipt, and email the receipt to the user. This experience is much smoother for your user, a better use of your server resources, and increases the number of requests your website can process for other users.

You can use Celery to send email, update your database with side effects from the request that was just processed, query an API and store the result, and a lot more. Any task that takes more than half a second is a great candidate for turning into a Celery task.

Another thing Celery is helpful for is scheduling tasks to run at specific times. You might be familiar with cron jobs, which are tasks that run at specific intervals you define. Django doesn’t have the cleanest ways of handling scheduling jobs, but using Celery with Django to schedule jobs is pretty smooth. You might set up scheduled Celery tasks to send user notification emails, scrape a website, or process vendor payments.

This post focuses on getting a scheduled task to run inside Docker in a Django project.

Setting Up a Task

Assume this project has the following structure:

proj/
├── app/
     ├── __init__.py
     ├── tasks.py
├── proj/
     ├── __init__.py
     ├── celery.py
     ├── settings.py
├── docker-compose.yml
├── Dockerfile
├── manage.py
└── requirements.txt

You should already have Django specified in your requirements file, and the most recent version of Docker downloaded onto your computer. Since the Dockerfile takes care of installing packages for us, to access Celery and Redis we need to add the current versions of those libraries to the requirements.txt file:

celery==4.2.1
redis==2.10.6

Open proj/celery.py and add the following code. Most of it is boilerplate that you will see in all Celery configuration files.

import os
from celery import Celery


os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

In this code, you are identifying a default Django settings module to use and doing some configuration setup. You are also setting up Celery to “autodiscover” tasks from all apps in your project. (This project is, creatively, called proj .) Finally, you have a debug task.

Now let’s create a task. In app/tasks.py , add this code:

from celery import shared_task


@shared_task
def hello():
    print("Hello there!")

The task itself is the function hello() , which prints a greeting. The shared_task decorator creates an instance of the task for each app in your project, which makes the tasks easier to reuse. There’s a great explanation of shared_task here .

In proj/__init__.py , add the following:

from .celery import app as celery_app

__all__ = ['celery_app']

This code ensures that Celery finds the tasks you’ve written when your Django application starts.

To test that your hello() task works, you can run it locally as a regular Python function. Start a Python shell using docker-compose run web ./manage.py shell . Run:

>>> from app.tasks import hello
>>> hello()
Hello there! 

If you would like to test running your task as a Celery task, run:

>>> hello.delay()
<AsyncResult: ba845cf3-e60b-4432-a9d8-9943621cb8a0>

Back in the console where your server is running, you will see the output from your task. delay() lets Celery execute the task, so instead of seeing the output in your shell like you're used to, your output is logged over there.

Introducing: Celery Beat

But that task only runs when we explicitly call it, and we'd like to schedule it to happen more frequently, like once a minute. We will use a feature called Celery beat to schedule our task to run periodically. Celery beat is the Celery scheduler. It executes tasks as often as you tell it to.

Open settings.py . Before we run our task through Celery, we need to configure some Django settings. All settings specific to Celery should have the CELERY_ prefix.

CELERY_BROKER_URL = 'redis://redis:6379'
CELERY_RESULT_BACKEND = 'redis://redis:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

For CELERY_BROKER_URL and CELERY_RESULT_BACKEND , you may see tutorials that instruct you to set these to something like redis://localhost:6379 , but you should replace localhost with the service name defined in your docker-compose file, redis . (We’ll get to that in a moment.)

Import crontab in your settings file.

from celery.schedules import crontab

Now add the following variable below your other CELERY_ settings.

CELERY_BEAT_SCHEDULE = {
    'hello': {
        'task': 'app.tasks.hello',
        'schedule': crontab()  # execute every minute
    }
}

This code sets up a dictionary, CELERY_BEAT_SCHEDULE , that contains the names of your tasks as keys and a dictionary of information about your task and its schedule as the value. In the dictionary that contains the keys “task” and “schedule,” the value of “task” should be a string with the fully qualified path to your task. The value of “schedule” is the information about how often you want this task to run. The default is to execute every minute; check out the docs for examples on more complex schedules.
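
For instance (an illustrative schedule, not part of this tutorial's setup), crontab accepts keyword arguments for more precise timing:

CELERY_BEAT_SCHEDULE = {
    'hello': {
        'task': 'app.tasks.hello',
        # Run at 7:30 every Monday instead of every minute
        'schedule': crontab(hour=7, minute=30, day_of_week=1),
    }
}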

Configuring Celery In Docker

The Dockerfile is here and doesn’t need any changes in order to work with Celery.

The docker-compose.yml file, however, needs some new services:

version: '3'

services:
  db:
    image: postgres:9.6.5
    volumes:
      - postgres_data:/var/lib/postgresql/data/
  redis:
    image: "redis:alpine"
  web:
    build: .
    command: bash -c "python /code/manage.py migrate --noinput && python /code/manage.py runserver 0.0.0.0:8000"
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
  celery:
    build: .
    command: celery -A proj worker -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis
  celery-beat:
    build: .
    command: celery -A proj beat -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis

volumes:
  postgres_data:

Let’s walk through the services we’ve added.

Redis is a data store and message broker that works with Celery to manage storing and processing your messages.

redis:
    image: "redis:alpine"

In your web service, add redis to the depends_on section. This ensures that your db and redis services will start before the web service. (Note: this won't guarantee that the db and redis services will be fully ready before the web service starts; look into restart: on-failure and other options for making sure a service doesn't start until other services it needs are ready.)*

depends_on:
      - db
      - redis

In order to have that task execute without needing to explicitly tell it to execute via the command line, we added the celery service.

celery:
    build: .
    command: celery -A proj worker -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis

This code adds a Celery worker to the list of services defined in docker-compose. Now our app can recognize and execute tasks automatically from inside the Docker container once we start Docker using docker-compose up .

The celery worker command starts an instance of the celery worker, which executes your tasks. -A proj passes in the name of your project, proj , as the app that Celery will run. -l info sets the log-level as info . The Django docs have more info on logging ; the log-level you set won’t matter until you have some code to determine how the different levels should be handled.

We also added a celery-beat service that will run this command automatically inside the Docker container.

celery-beat:
    build: .
    command: celery -A proj beat -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis

The command is similar, but instead of celery -A proj worker we run celery -A proj beat to start the Celery beat service, which will run tasks on the schedule defined in CELERY_BEAT_SCHEDULE in settings.py .

Start Docker with docker-compose up . You should see the output from your task appear in the console once a minute (or on the schedule you specified).

celery_1 | [2018-02-01 22:44:00,957: WARNING/ForkPoolWorker-1] Hello there!

And there you have it! For even more fun, you might try:

  • Adding more complex tasks, like tasks that take arguments
  • Setting up different schedules
  • Setting up a non-periodic task

What next?

Review the Celery Tasks Checklist for a great introduction to Celery best practices. In particular, pay attention to:

  • Set a retry_limit so that failed tasks don’t keep trying forever.
  • Set a high default task_time_limit to avoid tasks that block the entire Celery worker.
  • For tasks that need to take in a Django model object as a parameter, pass in a primary key and not the object itself (see the sketch after this list).
  • Do a lot of logging!
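
On the model-object point, a minimal sketch (Receipt and its send_email() method are hypothetical, not part of this tutorial):

from celery import shared_task

from .models import Receipt   # hypothetical model with a send_email() method


@shared_task
def email_receipt(receipt_pk):
    # Look the object up inside the task; passing only the pk keeps the
    # message small and avoids serializing a possibly stale model instance.
    receipt = Receipt.objects.get(pk=receipt_pk)
    receipt.send_email()

You would queue it with email_receipt.delay(receipt.pk) rather than email_receipt.delay(receipt).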

You will also want to monitor your tasks for success or failure. A great tool for this is Flower , Celery’s monitoring tool. Flower will show you a dashboard of all your workers and tasks and let you drill down into specific tasks, show you task statistics, let you restart workers, and let you rate-limit tasks (among many other things).

If you use an error-tracking system like Rollbar or Sentry , you can also set Celery up to report exceptions to those services.



* Thanks to kurashu89 for their correction on an earlier version of this article.

]]>/>
pytest support for django-test-plushttp://www.revsys.com/tidbits/pytest-support-django-test-plus/django-test-plus has long been a useful helper library with Django. Now we&#x27;ve added some simple pytest fixture support to make it even more useful!Fri, 15 Jun 2018 21:02:34 +0000http://www.revsys.com/tidbits/pytest-support-django-test-plus/

We're obviously fans of django-test-plus , having written it, and have found it very valuable over the last few years at cutting down on all of the boilerplate test code we have to write with our clients.

It even helps sell certain stubborn clients on the idea that testing can be done in a not so painful way!

Today we're happy to announce that you can now use django-test-plus as a real pytest fixture. Here's a quick example:

def test_some_view(tp):
    tp.get('some-view-name')
    tp.response_200()


def test_some_url_stuff(tp):
    expected_url = '/api/'
    reversed_url = tp.reverse('api')
    assert expected_url == reversed_url

And you get all of the other test plus methods as part of that "tp" fixture. This is a new experimental feature, so we'd love to hear how we can make this better and more "pytest like".

To start using it, all you need to do is install it and pull in "tp" as a fixture. You can install it with just:

pip install django-test-plus
]]>/>
Caching uuid's for the win!http://www.revsys.com/tidbits/caching-uuids-win/More and more often, we see schema designs that use UUIDs as primary keys. That&#x27;s a valid choice if you&#x27;re concerned about sharding and partitioning your database, but it has its own drawbacks, sometimes in unexpected places.Tue, 22 May 2018 17:43:28 +0000http://www.revsys.com/tidbits/caching-uuids-win/

More and more often, we see schema designs that use UUIDs as primary keys. That's a valid choice if you're concerned about sharding and partitioning your database, but it has its own drawbacks, sometimes in unexpected places.

If you're working on a system big enough that needs to use UUIDs, chances are that at some point you'll have to turn those UUIDs into strings, perhaps in order to pass them across different systems.

In order to serialize the UUID into a string, Python has to format the UUID's hex property into a string containing hyphens, as RFC 4122 requires. And in order to do that, it has to slice that property 5 times .

    def __str__(self):
        hex = '%032x' % self.int
        return '%s-%s-%s-%s-%s' % (
            hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])

This is the fastest way to do it, and it isn't usually a bottleneck per se, as the overhead is minimal.

However, it's common for UUIDs to be used in scenarios where there's a lot of data and computation, and in some situations you might find yourself serializing a homogenous set of UUIDs over and over. This might happen when you have a 'heap' of hotter model instances that are processed more often than most.

import random
import uuid


uuids = [uuid.uuid4() for _ in range(99)]
# A skewed workload: the same few UUIDs keep getting picked
normal_distrib = [random.choice(uuids) for _ in range(9999)]
for picked in normal_distrib:  # renamed so we don't shadow the uuid module
    [str(picked) for _ in range(100)]  # the same value gets serialized over and over

In this situation, the overhead adds up, and could become a bottleneck you'd never think about.

You could use uuid.hex instead of the serialized value, but that would break the RFC and, more pragmatically, it would make the data not really portable across systems or languages.

Your best option is going to be the lru_cache decorator.

First, abstract the logic that's calling the serialization into a function that you can decorate:

def some_logic():
    for instance in instances:
        requests.post(url, {'uuid': str(instance.pk)})

Which, once abstracted and decorated, becomes:

from functools import lru_cache


@lru_cache(maxsize=None)
def post_instance(instance):
    requests.post(url, {'uuid': str(instance.pk)})


def some_logic():
    for instance in instances:
        post_instance(instance)

Note: You might be tempted to abstract the str call itself:

@lru_cache(maxsize=None)
def uuid_to_str(uuid):
    return str(uuid)


def some_logic():
    for instance in instances:
        requests.post(url, {'uuid': uuid_to_str(instance.pk)})

That's not really a good idea. You'll end up caching too much. The cache size will be wasted on calls that are rarely reused, making the more common calls incur cache misses.

Next, add logging to get some insight. This will tell us if our caching is actually effective, and gives us some indication of how big the cache should be:

import logging


logger = logging.getLogger(__name__)


def some_logic():
    for instance in instances:
        post_instance(instance)
    logger.debug(post_instance.cache_info())

Briefly run the code against production data, and you'll get something like this in your logs:

CacheInfo(hits=126, misses=8, maxsize=32, currsize=8)

If hits is really low, then you know lru_cache won't help you with this data. Your dataset of UUIDs is too heterogenous to take advantage of caching.

currsize gives you an idea of how big of a cache you might need and what you should set lru_cache's maxsize argument to.

Keep in mind that higher values will use more RAM, and we're not really interested in perfectly caching every single value that's used more than once. We just want to cut down on the most common ones most of the time.

As a rule of thumb, if currsize is smaller than 512, I would set maxsize to that value plus some room (I like to go to the next power of 2). Otherwise, I would cap maxsize at 128. In our example, I would use maxsize=16:

@lru_cache(maxsize=16)
def post_instance(instance):
    requests.post(url, {'uuid': str(instance.pk)})
]]>/>
Testing import file mismatchhttp://www.revsys.com/tidbits/testing-import-file-mismatch/Ran into a weird error with coverage, pytest, and Travis today and wanted to document an easy fix when you get a &#x27;import file mistmatch&#x27; error from pip packages you do not control.Mon, 21 May 2018 14:08:19 +0000http://www.revsys.com/tidbits/testing-import-file-mismatch/

So over the weekend I was working on adding pytest support to django-test-plus, our library which makes testing in Django easier. We of course have tests for our testing library, but have up until now used normal Unittest style tests that we're all familiar with.

Now that I'm adding full pytest support, I needed to have the tests run using pytest rather than Django's normal test runner. In switching I needed pytest itself, pytest-django, and pytest-cov so we could keep track of our test coverage. Which of these caused the problem isn't really important, but what started happening was this:

==================================== ERRORS ====================================
_____________ ERROR collecting .eggs/py-1.5.3-py2.7.egg/py/test.py _____________
import file mismatch:
imported module 'py.test' has this __file__ attribute:
  /home/travis/build/revsys/django-test-plus/.eggs/pytest-3.5.1-py2.7.egg/pytest.py
which is not the same as the test file we want to collect:
  /home/travis/build/revsys/django-test-plus/.eggs/py-1.5.3-py2.7.egg/py/test.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules

If you don't speak fluent Python, this is just saying that, because of some import magic going on, the files 'pytest.py' and 'py/test.py' appear to be the same thing to pytest, and it's properly bailing out of the test run so you can fix it.

Unfortunately, neither of those are my projects and I didn't have a direct way to fix this. Luckily, I thought of a way around it: if we don't generate the bytecode .pyc files, then pytest won't see any duplicates.

To do this I just needed to add the following to the project's tox.ini:

[testenv]
   .... lots of other configuration ...

setenv =
    PYTHONDONTWRITEBYTECODE=true

Hopefully this helps you sort out this odd error message should you run into it when using Travis-CI and tox.

And more importantly, hopefully Google helps me find this page when I completely forget about this and run into it again next year!

]]>/>
Copying Kubernetes Secrets Between Namespaceshttp://www.revsys.com/tidbits/copying-kubernetes-secrets-between-namespaces/easy-mode, cross-namespace object copySun, 29 Apr 2018 19:19:00 +0000http://www.revsys.com/tidbits/copying-kubernetes-secrets-between-namespaces/

It's a fairly long command, but it actually is pretty easy to use. You will however need to have yq installed, as we need to scrub out a few elements from our YAML that are automatically created by Kubernetes. yq is just like jq but for working with YAML.

If we have a secret named database in the namespace app-dev and we want to copy it, as is, to the app-production namespace we need to execute:

$ kubectl -n app-dev get secret database -o yaml | \
yq 'del(.metadata.creationTimestamp, .metadata.uid, .metadata.resourceVersion, .metadata.namespace)' | \
kubectl apply --namespace app-production -f -

Breakdown

What we're doing here is grabbing the current secret in YAML format. We then use yq to remove the uid, namespace, creationTimestamp, and resourceVersion elements from the metadata stanza so it can be applied into a new namespace.

just

If you're a fan of the just tool like we are, you can drop this command into your Justfile :

# Copy a secret from one namespace to another
copy-secret from-namespace secret-name to-namespace:
   kubectl -n {{from-namespace}} get secret {{secret-name}} -o yaml| yq 'del(.metadata.creationTimestamp, .metadata.uid, .metadata.resourceVersion, .metadata.namespace)' | kubectl apply --namespace {{to-namespace}} -f -

This can then be used quickly like:

$ just copy-secret app-dev pg app-production

Original Deprecated Example

UPDATE: Unfortunately, the --export flag has been deprecated by the kubectl team, so the original advice below no longer works with modern Kubernetes clusters.

The secret resource-type is unique--it cannot be accessed from pods outside of its namespace. A simple way of copying common secret data (e.g.: docker registry credentials) between namespaces is provided by the --export flag of kubectl get . Pipe its output to kubectl apply -n <target namespace> -f - , and you are done!

kubectl get secret gitlab-registry --namespace=revsys-com --export -o yaml |\
   kubectl apply --namespace=devspectrum-dev -f -
]]>/>
What the, Kubernetes! -- part 1http://www.revsys.com/tidbits/what-the--kubernetes-1/Overcoming the read-only configMap with HelmFri, 30 Mar 2018 21:53:56 +0000http://www.revsys.com/tidbits/what-the--kubernetes-1/

Greetings, and welcome to the first edition of, What the, Kubernetes !

Today's topics: CVE-2017-1002101 , init-containers and YOU!

The context

Upgrading a cluster instance group from v1.7.13 to v1.7.14 introduced me to the first-run attempt at solving the problem outlined in the CVE.

The solution to the vulnerability (for the most part affecting untrusted, multi-tenant clusters) involved forcing all configMap and secret bind-mounts to read-only.

The tool we'll use in this post is Helm 2.7.2 .

The Problem

A CI component that was running successfully on v1.7.13 started failing after the upgrade to v1.7.14.

$ kubectl get pod -l app=docker-ci
NAME                         READY     STATUS             RESTARTS   AGE
docker-ci-4050235671-n487p   0/1       CrashLoopBackOff   4          1m

$ kubectl logs docker-ci-4050235671-n487p

[...]
time="2018-03-26T04:57:21Z" level=info msg="containerd successfully booted in 0.016248s" module=containerd
Error starting daemon: Error saving key file: open /etc/docker/.tmp-key.json853378281: read-only file system

Whether or not the problem presents depends on the workload. It affects docker and minio but not gitlab-ci-runner. The explanation is simple: it depends on whether the programmer who wrote the code treats the config directory as scratch space.


chdir(2) : it's a syscall, not a law.


Contrary to kube issue #58720 , most temp file writes to one of these mounts are performed by programs during initialization and are gone before initialization is complete.

Regardless, since I've written exactly zero lines of Kubernetes code, I will leave public declaration of opinion on the quality of this fix to others in favor of presenting a solution that can help mitigate the results of this change.

The Solution

  • initContainer
    • Mount a shared emptyDir volume on /etc/docker
    • Mount the configMap/secret to an alternate directory ( /etc/docker_ )
    • Copy the contents to the expected/configured location
    • exit
  • runtime container
    • Mount the shared emptyDir volume on /etc/docker
    • Initialize dockerd normally

One functional loss is incurred with this method: the files in the target directory will not be magically updated. A directly mounted configMap or secret will eventually reflect changes made to the source object; a copy into an emptyDir will not. Replicating that functionality could be done with a side-car container (a runtime container, rather than an init-container) that monitors the API event bus for changes to the configMap of interest, copying new data to the shared volume when necessary.
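If you wanted to experiment with that side-car idea, a minimal sketch using the official Python kubernetes client could look something like the following. Everything specific here is an assumption for illustration: the "docker-config" configMap name, the /etc/docker target path, and the POD_NAMESPACE variable (supplied via the downward API) are placeholders.

import os
from kubernetes import client, config, watch

config.load_incluster_config()   # we are running inside the pod
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_config_map,
                      namespace=os.environ["POD_NAMESPACE"]):
    cm = event["object"]
    if cm.metadata.name != "docker-config":   # placeholder configMap name
        continue
    # Re-copy every key into the shared emptyDir whenever the source changes
    for filename, contents in (cm.data or {}).items():
        with open(os.path.join("/etc/docker", filename), "w") as fh:
            fh.write(contents)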

The Hiccup

Init-containers aren't new. They have, however, had a rough start. When deploying to a v1.5-1.7 cluster, it is necessary to use the pod.beta.kubernetes.io/init-containers annotation to avoid issue #45627 . Post v1.8, it will all be just another part of .spec.template.spec.initContainers[] .

What the...

NOTE: if you are uninterested in the why, this subsection can be skipped.

I'm not sure when init-containers entered the codebase. The feature graduated to beta status in kube v1.5 and ostensibly to GA status in v1.6.

  • beta feature annotation: .spec.template.metadata.annotations["pod.beta.kubernetes.io/init-containers"] .
  • GA spec path: .spec.template.spec.initContainers[] .

The theory goes as follows:

  • v1.5 : use the beta annotation
  • v1.6 through v1.7 : GA/beta deprecation phase; either the annotation or spec form are valid
  • v1.8 : full GA; annotation is removed.

The full story: issue #45627

The reality of how init-containers are processed on v1.6-v1.7:

  • Initial Deployment Received
    • Does the deployment define .spec.template.spec.initContainers[] ?
    • No
      • Does the deployment define beta annotation init-containers?
      • Yes
        • ingest annotation JSON
        • sync data from annotation ingestion to ...spec.initContainers[]
      • No (well, then)
    • Yes
      • Does the deployment define beta annotation init-containers?
      • Yes
        • Are the ...spec.initContainers[] sync'd with the beta annotation?
        • Yes
          • Capital! Nothing to see here. Carry on!
        • No
          • Hrmph! We know what's really making the wheels turn here!
          • ...spec.initContainers[] dropped into /dev/null .
          • re-synchronize ...spec.initContainers[] to reflect the beta annotation.

Summary: in kube v1.6 through v1.7, the spec definition never has primacy with the scheduler except on the initial Deployment. When such a manifest is received, kube converts the ....initContainers[] spec structure into a JSON string and stores it as a beta annotation value. On subsequent updates, modifications to ....initContainers[] not only have no effect, they are overwritten by the existing (deserialized) annotation structure. The only way around this situation is to use only the annotation form on Deployment updates. The API will entice you toward ...spec.initContainers[] by deserializing your annotation value to its GA spec location. Be strong! Until v1.8, define init-containers as if you were still on v1.5--pretend ...spec.initContainers[] doesn't exist until then!

The Helm Chart

So we need a method that will allow for gradual cut-over to v1.8 without having to manage separate charts.

The application used for this demonstration is Docker. dockerd is one of those binaries that uses its config directory for pre-initialization scratch space.

Configuration

The fix for the read-only configuration path can be seen towards the bottom of ./values.yaml, with the keys that drive our init-container named templates sitting above Env.


NOTE : The following chart files have been pruned for this post. The unpruned version

All indentation is at 2-space increments. Look for any lines in the chart with the term indent for adjustment if you adapt this to a different indentation interval.


values.yaml

  2
  3 Image: docker
  4 ImageTag: &itag "18-dind"
  5
  6 deploymentEnvironment: &env demo
  7
  8 Plug: docker
  9
 10 NodeSelectors:  []
 11
 12 InitCommands:
 13   -
 14     name: config
 15     command: cp /etc/docker_/config.json /etc/docker/
 16
 17
 18 Env:
 19   -
 20     name: DOCKER_HOST
 21     value: localhost:49152
 22   -
 23     name: IMAGE_TAG
 24     value: *itag
 52
 53 # Volumes
 54 Volumes:
 55   -
 56     name: docker-config
 57     configMap:
 58       name: docker-config
 59       items:
 60         -
 61           key: config
 62           path: config.json
 63           mode: 0600
 64   -
 65     name: docker-config-directory
 66     emptyDir: {}
 67
 68
 69
 70 VolumeMounts:
 71   -
 72     name: docker-config
 73     mountPath: /etc/docker_
 74   -
 75     name: docker-config-directory
 76     mountPath: /etc/docker



Q : Dear Stephen: Why are your init-container commands listed in values.yaml ?

A : I am glad you asked! As template markup gets thicker, readability decreases. Having critical aspects of a deployment hidden within a tangle of unrelated symbols and formatting has the danger of obscuring what the target workload is. I've been meaning to work out the gotpl incantations to make this happen and this series seemed to be a perfect reason to do it!



./settings/one

  1
  2 {
  3   "log-driver": "gcplogs",
  4   "group": "root",
  5   "iptables": true,
  6   "ip-masq": true
  7 }


Named Templates ( doc link )


op.ed time

The following is the meat of the presented solution. It involves Helm. Helm is a templating utility that is working its way towards fulfilling its stated goal of being a package manager for Kubernetes.

Because, um... well.. Kubernetes and uh... Golang, Helm, unsurprisingly, uses Go templates. If its notably inelegant appearance displeases you, well, the large pile of sand is over there. And here is your mallet. And you were born with the other critical piece to that puzzle. Go for it!

For everyone else, without further ado, third party plugins or wrapper scripts, I give you...


The Meat (or salty, smokey-flavored tempeh)

NOTE : the filenames prefixed with an underscore signal to helm that the contents are not Kube manifests.

First up: the InitMethod template. See this if you are unfamiliar with postfix notation (reference to lines 9 and 13).

Within this wee mess we have a thing that, when included in another template, will emit a term, annotation or spec , indicative of the form supported by the target kube cluster.


./templates/_helpers.yaml

  1 {{/* vim: set filetype=sls sw=2 ts=2: */}}
  2
  3
  4 {{- define "InitMethod" -}}
  5   {{- $major := .Capabilities.KubeVersion.Major -}}
  6   {{- $minor_ := ( splitList "+" .Capabilities.KubeVersion.Minor ) -}}
  7   {{- $minor := index $minor_ 0 -}}
  8   {{- if and (lt (int $major) 2) (lt (int $minor) 8) }}
  9     {{- printf "annotation" -}}
 10   {{- else -}}
 11   {{- if and (eq (int $major) 1) (ge (int $minor) 8) }}
 12       {{- printf "spec" -}}
 13     {{- end -}} {{/* else if */}}
 14   {{- end -}} {{/* if */}}
 15 {{- end -}} {{/* define */}}


Once your eyes are able to blur past the template markup, it is quite straightforward:

  • InitMethod
    • what version of Kubernetes are we talking to?
      • less than v1.8: we use the annotation form
      • v1.8 and beyond: use the spec form

NOTE: GKE decided to augment the kube version with a "+". Lines 6-7 are required to deal with this anomaly.


./templates/_init-containers.yaml

 1 {{/* vim: set filetype=sls sw=2 ts=2: */}}
  2
  3 {{- define "InitSpec" }}
  4 {{- if eq (include "InitMethod" .) "spec" }}
  5 {{- $env := .Values.Env }}
  6 {{- $volumes := .Values.VolumeMounts }}
  7 {{- $image := ( printf "%s:%s" .Values.Image .Values.ImageTag ) }}
  8 initContainers:
  9 {{- range .Values.InitCommands }}
 10   -
 11     name: {{ .name }}
 12     image: {{ $image }}
 13     command: ["/bin/sh", "-c"]
 14     args:
 15       - {{ .command | quote }}
 16     env:
 17 {{ toYaml $env | indent 8 }}
 18     volumeMounts:
 19 {{ toYaml $volumes | indent 8 }}
 20 {{- end }} {{/* range */}}
 21 {{- end }} {{/* if */}}
 22 {{- end }} {{/* define */}}
 23
 24
 25
 26 {{- define "InitAnnotation" }}
 27 {{- if eq (include "InitMethod" .) "annotation" }}
 28 {{- $env := .Values.Env }}
 29 {{- $volumes := .Values.VolumeMounts }}
 30 {{- $image := ( printf "%s:%s" .Values.Image .Values.ImageTag ) }}
 31 pod.beta.kubernetes.io/init-containers: |
 32   [
 33 {{- range $ic_index, $ic := .Values.InitCommands }}
 34 {{- if $ic_index }},{{end}}
 35     {
 36       "name": {{ .name | quote }},
 37       "image": {{ $image | quote }},
 38       "command": ["/bin/sh", "-c"],
 39       "args": [ {{ .command | quote }} ],
 40       "env":
 41         [
 42 {{- range $ev_index, $ev := $env }}
 43 {{- if $ev_index}},{{end}}
 44 {{ toJson $ev | indent 12 }}
 45 {{- end }}
 46         ],
 47       "volumeMounts":
 48         [
 49 {{- range $vm_index, $vm := $volumes }}
 50 {{- if $vm_index }},{{end}}
 51 {{ toJson $vm | indent 12 }}
 52 {{- end }}
 53         ]
 54     }
 55 {{- end }}
 56   ]
 57 {{- end }}
 58 {{- end }}
 59
 60 {{- define "InitContainers" }}
 61 {{- if eq ( include "InitMethod" . ) "annotation" }}
 62 {{- include "InitAnnotation" . }}
 63 {{- end }}
 64 {{- if eq ( include "InitMethod" . ) "spec" }}
 65 {{- include "InitSpec" . }}
 66 {{- end }}
 67 {{- end }}

See lines 4 & 27 for how InitMethod is called.


The Cheese (or congealed soy paste cheese analog)

  • InitSpec
  • InitAnnotation

These are the two templates that inject the appropriate init-containers definition when included by a Deployment manifest. They can be included as part of a Chart's boilerplate (if one is so inclined), as they add nothing to the manifest's structure if no init-container commands are defined to drive them.

Encapsulating the if/else logic within the helper templates allows a Deployment manifest template to get away with only two template-related statements. Only the form that provides full functionality on the target cluster version actually renders anything, which satisfies the desire for a chart that is version-agnostic with respect to init-containers.


templates/deployment.yaml

  1 apiVersion: extensions/v1beta1
  2 kind: Deployment
  3 metadata:
  4   name: {{.Values.Plug}}-{{.Values.deploymentEnvironment}}
  5   namespace: {{.Release.Namespace}}
  6   labels:
  7     app: {{.Values.Plug}}
  8     env: {{.Values.deploymentEnvironment}}
  9     imageTag: {{.Values.ImageTag | quote }}
 10     heritage: {{.Release.Service | quote }}
 11     release: {{ .Release.Name | quote }}
 12     chart: {{.Chart.Name}}-{{.Chart.Version}}
 13 spec:
 14   selector:
 15     matchLabels:
 16       app: {{.Values.Plug}}-{{.Values.deploymentEnvironment}}
 17       env: {{.Values.deploymentEnvironment}}
 18       imageTag: {{.Values.ImageTag | quote }}
 19       release: {{ .Release.Name | quote }}
 20   template:
 21     metadata:
 22       labels:
 23         app: {{.Values.Plug}}-{{.Values.deploymentEnvironment}}
 24         env: {{.Values.deploymentEnvironment}}
 25         imageTag: {{.Values.ImageTag | quote }}
 26         release: {{ .Release.Name | quote }}
 27       annotations:
 28         chksum/config: {{ include (print $.Template.BasePath  "/configmap.yaml") . | sha256sum | quote }}
 29 {{- include "InitAnnotation" . | indent 8 }}
 30     spec:
 31 {{- include "InitSpec" . | indent 6 }}
 32 {{- if .Values.NodeSelectors }}
 33       nodeSelector:
 34 {{- toYaml .Values.NodeSelectors | indent 10 }}
 35 {{- end }}
 36       volumes:
 37 {{ toYaml .Values.Volumes | indent 8 }}
 38       containers:
 39         -
 40           name: docker
 41           image: {{.Values.Image}}:{{.Values.ImageTag}}
 42           command:
 43             - /usr/local/bin/dockerd
 44           args:
 45             - --config-file=/etc/docker/config.json
 46             - -H
 47             - 0.0.0.0:49152
 48             - --dns
 49             - 8.8.8.8
 50             - --insecure-registry
 51             - registry--ci.ci
 52           securityContext:
 53             privileged: true
 54           ports:
 55             -
 56              protocol: TCP
 57              containerPort: 49152
 58           volumeMounts:
 59 {{ toYaml .Values.VolumeMounts | indent 12 }}
 60           env:
 61 {{ toYaml .Values.Env | indent 12 }}


Conclusion

The original target example for this post was to have been demonstrating a Django and Celery application. Once I encountered CVE-2017-1002101 I decided to refocus my initial foray towards the simpler and much more immediate problem domain. This example doesn't have complex requirements for making it work. As long as the init-container image has a functioning cp binary it will fit the bill--a valid argument can be made that the post-initial-deployment functionality of this init-container has no effect on the long-term viability of that Deployment (as long as the command is entered correctly the first time).

Because Python runtimes (e.g., Django, Celery, Gunicorn) directly consume application code, it is critical that those runtimes' environments are always in sync. The next post will cover such a deployment and will include the methods demonstrated today.

Thanks for reading!

]]>/>
Using Private Packages in Pythonhttp://www.revsys.com/tidbits/using-private-packages-python/With companies moving to microservices, it's becoming common to have a system scattered across multiple repositories. It's frequent to abstract common patterns and code into private repositories that are then included in each service. But using packages from private repos with Python can be tricky. This guide will walk you through the necessary steps.Thu, 21 Dec 2017 17:17:00 +0000http://www.revsys.com/tidbits/using-private-packages-python/

With companies moving to microservices, it's becoming common to have a system scattered across multiple repositories. It's frequent to abstract common patterns and code into private repositories that are then included in each service.

But using packages from private repos with Python can be tricky. This guide will walk you through the necessary steps to correctly use private packages with pip or pipenv . We have also built a little app that will generate the right incantations for you to copy and paste.

Figuring out the URL

First, you need to choose which format you want to use: tarball (i.e., .zip) or vcs (most likely git). I prefer just downloading the tarball because it doesn't require any additional vcs software. But if you use pipenv, you'll have to use vcs (see below).

Authentication

In order to have access to the private repo or tarball, you'll need an authentication token. GitHub and other providers allow you to create a personal access token for your user, but because we are going to hard-code this token, I strongly recommend creating a 'bot' user and using its access token.

  • Github: access token at https://github.com/settings/tokens . You can select just the repo scopes.
  • Githost: deploy token at https://<myorg>.githost.io/<org>/<project>/settings/repository . You can just select read_repository scope.
  • Gitlab: deploy token at https://gitlab.com/profile/<org>/<project>/settings/repository . You can just select read_repository scope.
  • Bitbucket: access token at https://bitbucket.org/account/user/<botusername>/app-passwords . You can just select repositories/read scope.

Versioning

You will also need to decide which version of your package you want to install. We'll call this the ref . You could target a branch, a commit, or a tag/version. I prefer to use version because it gives me a human-readable point of reference.

Building the url

Once you have your access token and your ref, the next step is figuring out a URL at which pip can download or clone the code.

Github

  • Tarball URL will be https://<access_token>@github.com/<myorg>/<myproject>/archive/<ref>.zip
  • VCS Tarball will be git+https://<access_token>@github.com/<myorg>/<myproject>.git@<ref>

Githost

  • Tarball URL will be https://<myorg>.githost.io/<myteam>/<myproject>/repository/archive.zip?private_token=<access_token>&ref=<ref>
  • VCS Tarball will be git+https://<token_username>:<deploy_token>@<myorg>.githost.io/<myteam>/<myproject>.git@<ref>

Gitlab

  • Tarball URL will be https://gitlab.com/<myteam>/<myproject>/repository/archive.zip?private_token=<access_token>&ref=<ref>
  • VCS Tarball will be git+https://<token_username>:<deploy_token>@gitlab.com/<myorg>/<myproject>.git@<ref>

Bitbucket

  • Tarball URL will be https://<botusername>:<access_token>@bitbucket.org/<myorg>/<myproject>/get/<ref>.zip
  • VCS Tarball will be git+https://<botusername>:<access_token>@bitbucket.org/<myorg>/<myproject>.git@<ref>

Specifying the dependency

pip requirements file

Once you have the URL, you can just add it to your requirements file, or use it with pip install :

$ pip install "<URL>"  # You'll need the quotes to escape the URL

Pipfile

To add the URL to your Pipfile , use this syntax:

[packages]
revsys-teams = {file = "<URL>"}

Private dependency of a private package

If you need to specify your private package as a dependency of another package, you'll have to add the URL to that package's setup.py . Note that the exact syntax of the specification is a little tricky:

import os
from setuptools import setup, find_packages

setup(
    name='upstreamprivatepackage',
    install_requires=[
        # The 'version' is required. Doesn't have to be an actual version of the package.
        # but it must match what's in the `egg` fragment of its entry in `dependency_links`.
        "<myproject>==0.0.1",
    ],
    dependency_links=[
        # there must be a version identifier in the `egg=` fragment and it must match what's
        # in `install_requires`
        '<url>#egg=<myproject>-0.0.1',
    ],
)

When installing the upstream package, you'll also have to tell pip to actually parse the entries in dependency_links :

# `pip` has to be run with the `PIP_PROCESS_DEPENDENCY_LINKS` env var or with `--process-dependency-links` option.
# As of v9.0.1, `pip` will print a big red scary deprecation warning, but that's the only way.
PIP_PROCESS_DEPENDENCY_LINKS=1 pip install ...

# or
pip install ... --process-dependency-links
]]>/>
Docker: Useful Command Line Stuffhttp://www.revsys.com/tidbits/docker-useful-command-line-stuff/Let’s talk about what it’s like to actually use Docker in your day-to-day.Tue, 05 Dec 2017 02:43:00 +0000http://www.revsys.com/tidbits/docker-useful-command-line-stuff/

In my  first post about Docker , I took you through what goes in the  Dockerfile  and  docker-compose.yml  files. In this post, let’s talk about what it’s like to actually use Docker.

This post assumes that the following are true for you:

  • You have worked through the  Get Started with Docker  tutorial
  • You have a Django project with a Dockerfile and docker-compose.yml file (see my  first post  for help setting these up)
  • You can run  docker-compose build  and  docker-compose up , start your server (if that isn’t part of your docker-compose.yml file already), and see a Django blue screen of success (or other “YAY!” indication) when you look at your web page in a browser

To see what my Dockerfile and docker-compose.yml file look like for the purposes of this post, head over to my  first post .

On a New Day

If you've changed anything in your Dockerfile or requirements, then you will want to rebuild your image. This will retrieve any updates from the Python image you are using, re-install your requirements if they have changed, and make sure your image is up to date. If not much has changed in your Dockerfile or requirements, this will only take a few seconds. If you have made more substantial changes, this might take a little longer.

Open Terminal or your command line, navigate to the directory that contains your Dockerfile, and run this command:


$ docker-compose build

We’re using  docker-compose build  instead of  docker build .  because once you start using Compose, you should consistently use the Compose commands. I’ve found through some trial and error that the two commands don’t always have the same result, and that can be frustrating to debug.

You will see a lot of output that looks something like this:

$ docker-compose build
db uses an image, skipping
Building web
Step 1/10 : FROM python:3.6.2
---> 26acbad26a2c

Step 7/10 : RUN pip install -r /code/requirements.txt
---> Using cache
---> 6494267dadea
Step 8/10 : COPY . /code/
---> d80b24eb2470
Step 9/10 : WORKDIR /code/
---> c0e80ec3605d
Removing intermediate container 8d15670dbe57
Step 10/10 : EXPOSE 8000
---> Running in aec283f5123c
---> c8f944861cbf
Removing intermediate container aec283f5123c
Successfully built c8f944861cbf
Successfully tagged my_app_web:latest

You can see from this output that Docker treats each line in your Dockerfile as its own step. Step 1 retrieves the Python image you specified, Step 7 installs your requirements, and so on. This is what folks mean when they describe Docker as layered. Each line in your Dockerfile is a layer. Docker will skip any step that was completed the last time you built the image… as long as all the steps above it have stayed the same. If I changed the Python image that I’m basing this image on, then each step below that would be re-run in its entirety. If I change the WORKDIR on line 9, then steps 9 and 10 both have to be re-run. Jeff Triplett says, “I think of layers like dominoes. If you have 10 lines and you push the fourth one over, lines 4-10 are going to fall over and have to be rebuilt.” Read more about images and layers in the Docker docs.

Assuming your image built with no errors, you now have an image! Docker indicates that to you with the final lines:

Successfully built c8f944861cbf
Successfully tagged my_app_web:latest

Docker successfully built an image for you, gave it an ID, labeled that image with whatever you told it to name your image ( my_app_web  in this example), and tagged it “latest.”

Now you’re ready to run your container(s). Run this command:

$ docker-compose up

You will see some output and see your server start (as long as starting your server is part of your docker-compose.yml file).

Starting my_app_db_1 ...
Starting my_app_db_1 ... done
Recreating my_app_web_1 ...
Recreating my_app_web_1 ... done
Attaching to my_app_db_1, my_app_web_1
db_1   | LOG:  database system was shut down at 2017-10-19 00:59:19 UTC
db_1   | LOG:  MultiXact member wraparound protections are now enabled
db_1   | LOG:  autovacuum launcher started
db_1   | LOG:  database system is ready to accept connections
web_1  | Performing system checks...
web_1  |
web_1  | System check identified no issues (0 silenced).
web_1  |
web_1  | October 19, 2017 - 01:08:50
web_1  | Django version 1.11.5, using settings 'my_app.settings.local'
web_1  | Starting development server at http://0.0.0.0:8000/
web_1  | Quit the server with CONTROL-C.

Remember in our docker-compose.yml file when we asked Docker to not start our web service until our db service had successfully started? If you look at the top lines, you can see Docker is doing as we asked:

Starting my_app_db_1 ...
Starting my_app_db_1 ... done
Recreating my_app_web_1 ...
Recreating my_app_web_1 ... done

Running manage.py commands

Once you’re using Compose, you can run your normal manage.py commands without any extra work or setup in the docker-compose.yml file. We set docker-compose.yml up to automatically start our server, but running the server isn’t the only thing we do with manage.py.

To access your manage.py commands, enter:

$ docker-compose run --rm web ./manage.py [command] [arguments]


docker-compose run tells Docker that we’re about to run a command. The --rm option will remove this container when we’re finished with it. web identifies the service we want to run; manage.py commands will generally be in the web service.

The rest is the standard way to run manage.py commands:  ./manage.py [command] . Since the Dockerfile defines our WORKDIR as /code/, we can run  manage.py  from the current directory. For example, to run tests, we would run:

$ docker-compose run --rm web ./manage.py test [app]

The output is what we expect:

Starting my_app_db_1 ... done
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
...............
----------------------------------------------------------------------
Ran 15 tests in 0.577s

OK
Destroying test database for alias 'default'...

On the top line we see Docker start the db service, but the other output is probably familiar to you.

To make migrations, we would run:

$ docker-compose run --rm web ./manage.py makemigrations

Your container's bash prompt

You can also run commands from Docker’s bash shell. This is handy when you’ve run  docker-compose up  and you don’t want to stop your server while you do something else.

But in order to enter the bash shell of a container, you need the container id. While your server is running, open a new Terminal tab. Run the command  docker ps  to get your running containers and their IDs. You will see something like this:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
[container_1_id]        my_app_web             "python /code/mana..."   21 minutes ago      Up 21 minutes       0.0.0.0:8000->8000/tcp   my_app_web_1
[container_2_id]        postgres:9.6.5      "docker-entrypoint..."   9 hours ago         Up 21 minutes       5432/tcp                 my_app_db_1

Expand your Terminal window so you can see everything lined up pretty. You can see both of your containers,  my_app_web_1  and  my_app_db_1 , listed. Copy the container ID of your web container.

Now run this:

$ docker exec -it [container_id] bash

exec is the command used to run a command in a running container. The -it option combines two options, -i and -t, which together make the shell interactive and set up a pseudo-TTY. Identify your container by its ID, then type bash to indicate that you want a bash prompt. Your Terminal window will look a little different:

$ docker exec -it [container_id] bash
root@[container_id]:/code#

You’re now inside your container’s bash shell. If you set your  WORKDIR  as  /code/  in your Dockerfile, then you will automatically be in the /code/ directory. If you  cd ..  and then  ls , you can see what other directories live in your container. Feel free to poke around.

You can also enter a container’s bash prompt with a Compose command:

$ docker-compose run [service] bash
Starting my_app_db_1 ... done
root@[container_id]:/code#

The difference between this command and docker exec -it [container_id] bash is that the exec command puts you into a bash prompt inside a container that was already running. The docker-compose command starts a brand new container and puts you into a bash prompt inside that container. You can see this happen in the output: the docker-compose command will print something like “Starting my_app_db_1,” to indicate it’s starting a new db service, since the new web service will require one. The difference probably won’t matter when things are going well, but it’s good to know.

Once you are in your bash prompt, you can enter the Python shell:

root@[container_id]:/code# ./manage.py shell
Python 3.6.2 (default, Sep 13 2017, 14:26:54)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

To exit your bash prompt (whichever way you got into it), just type  exit  or  ctrl + d .

Going Home

I’m not sure how neat or messy you like to leave your desktop at the end of the workday. But if you like to button things up, you might want to stop your containers from running all weekend when you leave the office on Friday. To do this, run:

$ docker-compose down

This will stop your running containers. If you haven’t been using the --rm option when running management commands, you will probably see more container notifications than you expected to.

Resources

Thanks to  Nick Lang  for advice on this post, and to  Frank Wiles  and  Jeff Triplett  for their feedback on drafts and their patient debugging help.

]]>/>
A Brief Intro to Docker for Djangonautshttp://www.revsys.com/tidbits/brief-intro-docker-djangonauts/Lacey didn't have the opportunity to work with Docker at her last job. In this tidbit she steps you through getting started with Docker for Django developers.Tue, 05 Dec 2017 01:41:27 +0000http://www.revsys.com/tidbits/brief-intro-docker-djangonauts/

I’ll be honest: I was pretty trepidatious about using Docker. It wasn’t something we used at my last job and most tutorials felt like this comic by Van Oktop.

How to Draw a Horse

This post won’t teach you everything you need to know about Docker. But if you’re getting started with Docker and feeling a little lost, hopefully this will help demystify it a bit.

What is Docker?

But first, some frequently asked (by me to my colleagues) questions:

  • So… what’s Docker? Glad you asked! Docker helps you run different projects with different dependencies totally separately from each other, without needing to download a bunch of stuff onto your machine that you may never need again.
  • How is that different from virtualenv ? A virtual environment does some of this. You can use different versions of Python, Django, etc. in different projects when you run each project in its own virtualenv. Docker has the added benefit of isolating your database, caching service like Redis, and other tools as well. For my current project, I’m running a Postgres database and I didn’t have to download Postgres or configure it locally at all!
  • So do you use Docker alongside virtualenv? Not quite. You use Docker containers instead of virtual environments. If you’re committed to Docker, you don’t need to worry about virtualenvs anymore. (They can still be useful… but that’s another post.)

A few Docker definitions

  • Docker : a software container platform. In practice, this means that Docker is something you download onto your machine. You will run Docker for your projects the way you used to use virtual environments, but you will write a little extra code to set up your stuff in Docker.
  • Image : a “lightweight, stand-alone, executable package that includes everything needed to run a piece of software.” You will set up a specific image for each project you work on that will tell Docker which packages your project needs, where your code lives, etc.
  • Container : “a runtime instance of an image.” Containers are running copies of images, and are what your code will actually run in. This part is closest to what used to be the virtual environment.
  • Dockerfile : the name of the file that contains the instructions for setting up your image.
  • docker-compose.yml : the file where you can set up your database, automatically start your server when you start your container, and cool stuff like that.

Stop reading

I highly recommend working through the Get started with Docker tutorial on the Docker website. It will introduce you to the parts of a Dockerfile and the basics of how Docker works. The rest of this post assumes you’ve done the tutorial and are ready to use Docker with a Django project.

Setting up a new project in Docker

You should not consider these instructions as set in stone. It’s what made it easiest for me to get set up and verify that everything was working with Docker.

First, download Docker and complete the Get started with Docker tutorial.

Follow your normal process for starting a new project, including using cookie-cutter and creating a virtual environment . (You’ll discard this virtual environment later.) Create a requirements.txt file and add the packages you need. Inside your virtual environment, run pip install -r requirements.txt . Then run ./manage.py runserver and make sure you have the blue screen of success in your browser. Yay! Make your initial commit.

Dockerfile

In the same directory as your manage.py file, create a file called Dockerfile . Remember that a Dockerfile contains the instructions for creating your image. It should look something like this (but yours might not need everything mine does, and yours might include some instructions that mine does not):

FROM python:3.6

ENV PYTHONUNBUFFERED 1
ENV DJANGO_ENV dev
ENV DOCKER_CONTAINER 1

COPY ./requirements.txt /code/requirements.txt
RUN pip install -r /code/requirements.txt

COPY . /code/
WORKDIR /code/

EXPOSE 8000

Let's break this down:

FROM python:3.6

You don’t need to create your Docker image from scratch. You can base your image off of code in another image in the Docker Hub, a repository of existing Docker images.

On this line, I’ve told Docker to base my image off of the Python 3.6 image, which (you guessed it) contains Python 3.6. Pointing to Python 3.6 versus 3.6.x ensures that we get the latest 3.6.x version, which will include bug fixes and security updates for that version of Python.

ENV PYTHONUNBUFFERED 1

ENV creates an environment variable called PYTHONUNBUFFERED and sets it to 1 (which, remember, is “truthy”). All together, this statement means that Docker won’t buffer the output from your application; instead, you will get to see your output in your console the way you’re used to.

ENV DJANGO_ENV dev

If you use multiple environment-based settings.py files, this creates an environment variable called DJANGO_ENV and sets it to the development environment. You might call that "test" or "local" or something else.

ENV DOCKER_CONTAINER 1

This creates an environment variable called DOCKER_CONTAINER that you can use in settings.py to load different databases depending on whether you’re running your application inside a Docker container.
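For example, a settings.py could branch on that variable with something like the sketch below. This is only an illustration; the database name, user, and port are placeholders, and the "db" host name happens to match the Compose service we will define later.

import os

if os.environ.get('DOCKER_CONTAINER'):
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': 'postgres',
            'USER': 'postgres',
            'HOST': 'db',       # the docker-compose service name
            'PORT': 5432,
        }
    }
else:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': 'db.sqlite3',
        }
    }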

COPY ./requirements.txt /code/requirements.txt

Remember that "." means “the current directory,” so this line copies your project’s requirements.txt file into a new directory in Docker called /code/ .

RUN pip install -r /code/requirements.txt

Just like in a regular virtual environment, you need to install your required packages.

COPY . /code/

This line copies the rest of the code in your current directory "." (your project code) into the /code/ directory.

Each Docker container will already contain some subdirectories, so a good practice is to put your project code into its own directory.

WORKDIR /code/

You’re probably used to running things like ./manage.py runserver . But when you run that command in your Docker container, you’re likely to forget that your code doesn’t live in the current directory (.) anymore; it lives in /code/. This line tells Docker that you want your “working directory” to be /code/ so you can still continue to run commands from the current directory to your heart’s content.

EXPOSE 8000

In order to runserver like a champ, your Docker container will need access to port 8000. This bestows that access.

Huzzah! Your first Dockerfile is ready to go.

Deactivate your virtual environment. In Terminal or your command line, run docker build . from the same directory that contains your Dockerfile. You will see a lot of output in the console.

Your Dockerfile defines the rules and instructions for your image, and "docker build ." actually creates your image. You can’t run containers until you have a valid image to base them on. Assuming you had no errors when you ran docker build . , you now have a functioning image to run containers from!

Docker Compose

If you are not on a Mac, install Docker Compose . (Mac users: Docker Compose ships with Docker, so you’re good to go!)

Docker Compose lets you run more than one container in a Docker application. It’s especially useful if you want to have a database, like Postgres, running in a container alongside your web app. (Docker’s overview of Compose is helpful.) Compose allows you to define several services that will make up your app and run them all together. Examples of services you might define include:

  • web : defines your web service
  • db : your database
  • redis or another caching service

Compose can also help you relate those services to each other. For example, you likely don’t want your web service to start running until your db is ready, right?

Create a new file called docker-compose.yml in the same directory as your Dockerfile. While Dockerfile doesn’t have an extension, the docker-compose file is written in YAML, so it has the extension .yml . Mine defines two services, web and db , and looks like this:

version: '3'

services:
  db:
    image: postgres:9.6.5
    volumes:
      - postgres_data:/var/lib/postgresql/data/
  web:
    build: .
    command: bash -c "python /code/manage.py migrate --noinput && python /code/manage.py runserver 0.0.0.0:8000"
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    depends_on:
      - db

volumes:
  postgres_data:

Just like we did with the Dockerfile, let’s go through the parts of this docker-compose.yml file.

version: '3'

This line defines the version of Compose we want to use. We’re using version 3, the most recent version .

.

services:

Indented under this line, we will define the services we want our image to run in separate containers when we run our project.

db:
    image: postgres:9.6.5
    volumes:
      - postgres_data:/var/lib/postgresql/data/

This is where Compose gets exciting: this section sets up the db service as a Postgres database and instructs Compose to pull version 9.6.5 of Postgres from the image that already exists in Docker Hub. This means that I don’t need to download Postgres on my computer at all in order to use it as my local database.

Upgrading Postgres from one minor version to another while keeping your data requires running some extra tools, pg_dump and pg_restore, and can get a little complicated. If you don’t want to mess with this, set your Postgres image to a specific version (like 9.6.5). You will probably want to upgrade the Postgres version eventually, but this will save you from having to upgrade with every minor version release.

volumes tells Compose where in the container I would like it to store my data: in /var/lib/postgresql/data/ . Remember when I said that each container had its own set of subdirectories and that is why you needed to copy your application code into a directory named /code/ ? /var/ is one of those other subdirectories. A volume also lets your data persist beyond the lifecycle of a specific container.

web:
    build: .
    command: bash -c "python /code/manage.py migrate --noinput && python /code/manage.py runserver 0.0.0.0:8000"
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    depends_on:
      - db

This section sets up the web service, the one that will run my application code. build: . tells Compose to build the image from the current directory. command: bash -c "python /code/manage.py migrate --noinput" will automatically run migrations when I run the container, without prompting me for any input. && python /code/manage.py runserver 0.0.0.0:8000 will start the server when I run the container. (The && lets us put two commands on one line.)

volumes:
      - .:/code

This section sets up another volume for the code in your current directory and where your code will live in the Docker container (/code/).

ports:
      - "8000:8000"

Here we map our own port 8000 to the port 8000 in the Docker container. A more technical explanation is, “We map port 8000 to the host’s port 8000, meaning that our app server will be reachable in the host via `127.0.0.1:8000` once it’s running;” thanks to Oliver Eidel for that!

depends_on:
      - db

The depends_on statement declares that our web service depends on our db service, so Compose will get the db service up and running before it tries to run the web service.

volumes:
  postgres_data:

Finally, Compose has a rule where you have to list your named volumes in a top-level volumes key, so we have done that.

Save the docker-compose.yml file.

In Terminal or your console and from the same directory that contains your Dockerfile and docker-compose.yml file, run docker-compose up .

Assuming you have no errors, navigate to http://localhost:8000/ in a browser and see your blue screen of success once again!

Ready for your next step? Check out Docker: Useful Command Line Stuff next!

Resources

Thanks to Frank Wiles and Jeff Triplett for reviewing drafts of this post.

]]>/>
Optimized Pythonhttp://www.revsys.com/tidbits/optimized-python/Smaller and faster Python 3.6.x images for you to take advantage of.Sat, 02 Dec 2017 16:50:15 +0000http://www.revsys.com/tidbits/optimized-python/

Turns out there are some optimizations you can do when compiling Python 3.6 and Python 3.7 that give you some significant speed improvements without any real downside.

We've paired this with Google's base Debian image that rips out systemd and all its dependencies, which you weren't going to be using in a Docker container anyway; this yields a MUCH smaller image.

How do you use it? Pretty simple: just start your Dockerfile with:

FROM revolutionsystems/python:3.7.1-wee-optimized-lto

And rebuild your image.

How much faster is it? Depending on the benchmark in question it's up to 19% faster and nearly 75% smaller than the official Python images.

You can get more details on benchmarks and how all this works on the dockerhub page . Information on build details is also available.

Enjoy!

Need help effectively using Docker with Python or Django? We can help! Contact us or send us a quick email .

]]>/>
Python 3 run command over SSHhttp://www.revsys.com/tidbits/python-3-run-command-over-ssh/Sometimes you just need to run a command on a remote host with ssh and Python 3 without any real fuss. Here is the simplest way we have found to do that and retrieve the output. Wed, 28 Dec 2016 17:35:49 +0000http://www.revsys.com/tidbits/python-3-run-command-over-ssh/

Fabric is great, but isn't 100% compatible with Python 3.  There is Fabric3 , which does work, but sometimes you don't want to have to issue `fab <something>`  but instead want to issue a command as part of a larger bit of Python.  

I ran into this situation today where I wanted to issue a few commands over ssh to a remote host as part of a Click command I'm building to do some ops automation here at REVSYS. 

It's probably not perfect in terms of error handling, but it sure is simple! Here is all you need to do:

from fabric.api import env
from fabric.operations import run as fabric_run
from fabric.context_managers import settings, hide

def run(user, host, command):
    """ Run a command on the host. Assumes user has SSH keys setup """
    env.user = user
    env.use_ssh_config = True

    with settings(hide('everything'), host_string=host):
        results = fabric_run(command)

    return results

You'll need to install the Fabric3 library with a simple:

pip install Fabric3

And away you go! All this is doing is re-using Fabric3's run() function while silencing all of the normal Fabric-style output, setting the user and host, and telling it to use the calling environment's ssh configuration settings (like your ~/.ssh/config).

We could accomplish this more directly with Paramiko or work some subprocess magic to call our ssh, but this was the simplest and fastest way I solved the ssh problem today.
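For the curious, a rough Paramiko equivalent might look like this sketch. It assumes your SSH keys are already in place for the target host; note that, unlike Fabric, Paramiko will not read your ~/.ssh/config for you.

import paramiko

def run(user, host, command):
    """ Run a command on the host and return its stdout as a string """
    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=user)

    stdin, stdout, stderr = ssh.exec_command(command)
    output = stdout.read().decode()

    ssh.close()
    return output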

Hope you find this little trick useful! Happy Hacking! 

]]>/>
Python: atexithttp://www.revsys.com/tidbits/python-atexit/Python's atexit handler is one of those things people should use more often.  Some developers aren't even aware that it exists and that makes me sad. Thu, 29 Sep 2016 18:47:09 +0000http://www.revsys.com/tidbits/python-atexit/

Did you know Python can automatically do things for you when the process exits? The atexit module is one of those handy, but often forgotten about, aspects of Python. I had cause to use it yet again today with a certain process that was being naughty.

The idea is simple, atexit instructs Python to run your function just before the process exits.  So when would you want to do that? I'm sure there are other situations, but I typically end up using them when:

  • I want to log both process startup AND exit
  • Alert someone that a process has died, especially cron jobs that are easily forgotten about

A quick example to show you the idea:

import atexit


@atexit.register
def closing():
    print("=== Closing down ===")


while True:
    print("whee!")

Using the atexit decorator, we wrap a little function that just prints out that we're closing down. Our while loop, er I mean our whee loop, will run until we interrupt the process with a Ctrl-C.  The output looks like this:

whee!
whee!
whee!
whee!
whee!
whee!
whee!
whee!
whee!
whee!
whee!
whee!
whee^Cwhee!
whee!
Traceback (most recent call last):
  File "example.py", line 10, in <module>
    print("whee!")
KeyboardInterrupt
=== Closing down ===

We see our usual KeyboardInterrupt traceback and then our custom closing message. 

You can also register your atexit functions without the decorator so you can pass args and kwargs to it to be flexible.  
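For example, the non-decorator form takes the function plus any args and kwargs you want passed to it at exit time:

import atexit

def closing(name, code=0):
    print("=== {} closing down with code {} ===".format(name, code))

# Same effect as the decorator, but now we can parameterize it
atexit.register(closing, "worker", code=1)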

Seems pretty perfect, doesn't it? It's not all roses.  Python only calls these atexit functions in a couple of scenarios, specifically if:

  • sys.exit() is called
  • The process completes and is exiting normally

However, it does not call your atexit handlers if Python is killed by a signal, when Python itself has a fatal internal error, or if os._exit() is called directly. So it's not useful to detect kills or kill -9s for sure, but in most other usual exit scenarios it can be handy.
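One common workaround, if you care about a plain kill (SIGTERM), is to install a signal handler yourself that calls sys.exit(); raising SystemExit that way lets your atexit functions run. A quick sketch:

import atexit
import signal
import sys

@atexit.register
def closing():
    print("=== Closing down ===")

# Turn SIGTERM into a normal SystemExit so the atexit handler still runs.
# (This won't help with kill -9; SIGKILL can't be caught at all.)
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))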

Hope you find a use for it! 

]]>/>
The Vagary of AWS Availability Zoneshttp://www.revsys.com/tidbits/vagary-aws-availability-zones/What are Amazon availability zones anyway?Tue, 30 Aug 2016 16:56:16 +0000http://www.revsys.com/tidbits/vagary-aws-availability-zones/

In daily parlance, “zone” is used in a variety of contexts:

  1. the neutral zone (Romulan or Klingon--take your pick)
  2. the demilitarized zone
  3. a safe zone
  4. zoned-out
  5. in the zone
  6. the "friend" zone
  7. land/property-use zoning

The commonality these contexts share is the idea of a specific space;  whether physical or metaphysical, virtual or real, a “zone” defines a discrete space for a specific use or set of uses (or non-use in the case of #4).

I hadn’t thought much about whether or not the term ‘availability zone’ was etymologically accurate. 

On some level (Amazon NOCs), I imagine it is precise enough. From the perspective of an external consumer of AWS resources--I imagine most people will go through life without giving it a second thought.  Apparently, I was not destined to walk amongst them.

My goal was simple: Using the Python boto3  and botocore  modules, I wanted to be able to specify a supernet , an AWS region, a CIDR mask  and a VPC  id and have it do the math and create the desired subnets that would subsequently be attached to the target region’s availability zones.

What I learned:

  • the concept of "zone" in AWS-land is fluid.  Something I noticed: the AZ is the only AWS thing that does not have an associated ID (at least not available to mortals)
  • the EC2 API will give you a list of possible availability zones. They may all be active or... not. (e.g: us-east-1  currently has five zones. Four of them exist. The letter associated with the dead zone is chosen by Amazon when you create your account)

When you ask for the list of an AZ within us-east-1:

[
    "AvailabilityZones",
    [
        {
            "State": "available",
            "RegionName": "us-east-1",
            "Messages": [],
            "ZoneName": "us-east-1a"
        },
        {
            "State": "available",
            "RegionName": "us-east-1",
            "Messages": [],
            "ZoneName": "us-east-1b"
        },
        {
            "State": "available",
            "RegionName": "us-east-1",
            "Messages": [],
            "ZoneName": "us-east-1c"
        },
        {
            "State": "available",
            "RegionName": "us-east-1",
            "Messages": [],
            "ZoneName": "us-east-1d"
        },
        {
            "State": "available",
            "RegionName": "us-east-1",
            "Messages": [],
            "ZoneName": "us-east-1e"
        }
    ]
]

Today there are only 4 active zones.  Which one is the Dead Zone(tm)?  The solution: simple brute force!

  1. Get the list of possible AZs for the region
  2. Create a dummy VPC using a moderate sized IPv4 supernet (/21)
  3. Divide the supernet into several subnets -- there are 8 /24 subnets in a /21 network so unless Amazon rolls out a region with more than 8 AZs, we're covered
  4. Start creating subnets on the dummy VPC
  5. The response from Amazon is a dictionary with a ResponseMetadata key. If response['ResponseMetadata']['HTTPStatusCode'] is 200, you have a live AZ
  6. Cache this on a per-account basis.

import boto3
from botocore.exceptions import ClientError
from ipaddr import IPNetwork    # https://github.com/google/ipaddr-py


def verified_az_list(region='us-east-1', key=None, keyid=None, profile=None,
                     supernet='10.255.248.0/21'):
    retval = []

    rc_boto = boto3.Session(aws_access_key_id=keyid,         # 1
                            aws_secret_access_key=key,
                            region_name=region)

    mc_ec2 = rc_boto.client('ec2')

    az_dat = mc_ec2.describe_availability_zones()['AvailabilityZones']

    az_list = [t['ZoneName'] for t in az_dat]

    resp = mc_ec2.create_vpc(                                # 2
        CidrBlock=supernet, InstanceTenancy='default'
    )

    if resp['ResponseMetadata']['HTTPStatusCode'] != 200:
        return False

    vpc_id = resp['Vpc']['VpcId']

    subnets = [                      # for list comp. haters: using the ipaddr.IPNetwork.exploded
        subnet.exploded              # property method to extract the CIDR string--we only need one
        for subnet in IPNetwork(     # subnet-per-reported AZ
            supernet
        ).subnet(
            new_prefix=24
        )[:len(az_list)]
    ]

    # THE BRUTALITY
    for az, subnet in zip(az_list, subnets):
        try:
            subnet_dat = mc_ec2.create_subnet(
                VpcId=vpc_id,
                CidrBlock=subnet,
                AvailabilityZone=az
            )['Subnet']

            retval.append(az)
        except ClientError:
            pass
        else:
            mc_ec2.delete_subnet(SubnetId=subnet_dat['SubnetId'])

    mc_ec2.delete_vpc(VpcId=vpc_id)

    return retval

to be fair...

From Amazon’s very own “ What is Amazon EC2? ” document:

An Availability Zone is represented by a region code followed by a letter identifier; for example, us-east-1a. To ensure that resources are distributed across the Availability Zones for a region, we independently map Availability Zones to identifiers for each account. For example, your Availability Zone us-east-1a might not be the same location as us-east-1a for another account. There's no way for you to coordinate Availability Zones between accounts.

I used to thoughtlessly cause the Death of Trees by printing off, binding then storing such manuals on the back of toilets for perusal when entrapped by biological necessity.  I don’t do that anymore--the ascension of tablets and smartphones have made reaching for 3-ring binders filled with pounds of API documentation a rare compulsion.  The result: I don’t think I’ve given the EC2 introductory manual more than a passing glance until recently.

P.S.

To the keepers of Amazon’s AWS documentation: your mobile user experience is rotten. Save a forest; give us a nice mobile-friendly doc option!  (PDF -> MOBI doesn’t count! :P )

]]>/>
Python 12-factor apps with envparsehttp://www.revsys.com/tidbits/python-12-factor-apps-envparse/Getting your configuration from the system environment for your Python and/or Django apps is often the best way to provide security and flexibility. envparse makes it easy. Sat, 27 Aug 2016 18:39:41 +0000http://www.revsys.com/tidbits/python-12-factor-apps-envparse/

Making  12-Factor apps  is all the rage these days and not without good reasons. Using the system environment to get your app's configuration is probably the most flexible way to pass configuration information once you get used to it.

One mental barrier I know I had was I didn't want to always  have to pass a bunch of variables just to run my code. The ideal setup is:

  • Load the env vars from a file in the current directory or any directory above
  • Have any variables from the actual environment included and give the ability to override them on a per execution basis on the command line

Luckily, the envparse library for Python gives us this pretty easily. There are two ways you can use it: the "standard", more manual way, or you can define up front a schema of variables you expect and ensure they are cast to the types you need.

Let's explore the standard way first. 

from envparse import env

env.read_envfile()

That's all you need to do to load up the environment. The library will walk your file system path looking for a .env file; if it finds one it will load it. If it doesn't, it will issue a warning you can safely ignore.

You're on your own for validation, however.  If you go to access a variable that doesn't exist you'll get a KeyError. 

Let's assume we have the following in a .env in the current directory:

AWS_ACCESS_KEY='blahblahblahblah'
AWS_SECRET_KEY='WeAreNotThatDumb'
CONSULTING_LEVEL='Pro'

If we then called our script like this:

OTHER='bar' CONSULTING_LEVEL='Expert' python envparse-example.py

Now let's look at envparse-example.py, what do you think it would output?

from envparse import env

env.read_envfile()

print('ACCESS =', env.str('AWS_ACCESS_KEY'))
print('SECRET =', env.str('AWS_SECRET_KEY'))
print('CONSULTING_LEVEL =', env.str('CONSULTING_LEVEL'))
print('OTHER =', env.str('OTHER', default='foo'))

You win a cookie or something if you correctly guessed:

ACCESS = blahblahblahblah
SECRET = WeAreNotThatDumb
CONSULTING_LEVEL = Expert
OTHER = bar

I want to point out two things about this. First, as you probably already understood, since we have 'CONSULTING_LEVEL' set in both the file and on the command line, the command line version takes precedence.

The second thing I want to point out is the last line in the source above.  We overrode 'OTHER' on the command line, but if we hadn't it would be set to the default of 'foo'.  This helps allow you to set sane defaults in the absence of the variable being set anywhere at all. 

envparse using a Schema

Now that we've covered the more manual and direct way of using envparse, let's play with the Schema support real quick. 

from envparse import Env

env = Env(
    AWS_ACCESS_KEY=str,
    AWS_SECRET_KEY=str,
    CONSULTING_LEVEL=str,
    OTHER=dict(cast=str, default='foo')

)
env.read_envfile()

print('ACCESS =', env.str('AWS_ACCESS_KEY'))
print('SECRET =', env.str('AWS_SECRET_KEY'))
print('CONSULTING_LEVEL =', env.str('CONSULTING_LEVEL'))
print('OTHER =', env.str('OTHER'))

Assuming the same .env file and overrides on the command line, this will produce the exact same results. The difference here is we import 'Env' (note the capitalization) and declare up front the variables we expect to have.

If something is missing from our environment, envparse raises an envparse.ConfigurationError rather than the more generic KeyError we would get when accessing a missing value.

Using a schema like this also gives us options for setting defaults as part of the definition rather than when calling for the value itself. envparse has a few other useful options for casting more complex types, plus the ability to pre- or post-process values when they are read, which can be useful for normalizing values.
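For instance, a few of those extras might look like this; the variable names here are made up purely for illustration:

from envparse import env

DEBUG = env.bool('DEBUG', default=False)
WORKERS = env.int('WORKER_COUNT', default=4)
ALLOWED_HOSTS = env.list('ALLOWED_HOSTS', default=['localhost'])

# post-process the value after it has been read and cast
REGION = env.str('AWS_REGION', default='us-east-1',
                 postprocessor=lambda v: v.lower())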

Using envparse with Click

I'm a huge fan of Click  and while it has great support for pulling values out of the environment for you , it doesn't support using a .env file directly.  Lucky for us it's straightforward to use envparse with Click. If you need options that can come from a .env file, the environment directly, or passed on the command line you can do this:


import click

from envparse import Env

env = Env(
    AWS_ACCESS_KEY=str,
    AWS_SECRET_KEY=str,
    CONSULTING_LEVEL=str,
    OTHER=dict(cast=str, default='foo')
)
env.read_envfile()

@click.command()
@click.option('--access-key', default=env.str('AWS_ACCESS_KEY'))
@click.option('--secret-key', default=env.str('AWS_SECRET_KEY'))
def testing(access_key, secret_key):
    print(access_key)
    print(secret_key)

if __name__ == '__main__':
    testing()

The trick here is to use envparse to read in your expected values; it doesn't matter whether they come from a .env file or are passed in via the environment. We then use the value from envparse to set the default for Click. If we need to, we can still override it on the command line and everyone is happy.

Hope this helps you make your Python app's configuration more flexible! 

]]>/>
reverse()'s soulmate resolve()http://www.revsys.com/tidbits/django-resolve-to-view/Have a URL but can't figure out what view it is supposed to use? Use resolve() to easily and quickly determine it just as Django does.Thu, 11 Aug 2016 12:29:46 +0000http://www.revsys.com/tidbits/django-resolve-to-view/

One common practice in Django is reversing URLs:

from django.urls import reverse

reverse('my_url_name', kwargs={'pk': 1})

But very often, in large projects, it's hard to remember exactly what the name of the URL is. Furthermore, you may be using 3rd party libraries that include URLs you want to refer to.

If you know what the URL looks like, but you can't remember the name or its arguments, a quick and easy way to find them is to use the `resolve` function:

from django.urls import resolve

resolve('/myurl/2')
# ResolverMatch(func=myapp.views.MyModelViewSet, args=(), kwargs={u'pk': u'2'}, url_name=mymodel-detail, app_name=None, namespaces=['api'])

As you can see, the output includes all you need to reverse the URL, including arguments, app_name and namespaces.
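
For example, here's a quick sketch of feeding those pieces straight back into reverse(), using the same '/myurl/2' URL as above:

from django.urls import resolve, reverse

match = resolve('/myurl/2')

# view_name is the namespaced name, e.g. 'api:mymodel-detail'
reverse(match.view_name, kwargs=match.kwargs)
# ...which rebuilds the same URL from the resolved pieces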

UPDATE

The django-extensions package provides another easy way to find out which view handles a URL. After installing it, you can simply use its show_urls command and grep for the URL you're looking for:

$ ./manage.py show_urls | grep "/myurl/<pk>"

Thanks to @bmihelac  for the suggestion!


]]>/>
Extend Django Templates Smartlyhttp://www.revsys.com/tidbits/extend-django-templates-smartly/You can use a variable to set the name of the template you are extending when using Django templates.Mon, 08 Aug 2016 16:55:49 +0000http://www.revsys.com/tidbits/extend-django-templates-smartly/

One of my all-time favorite template tricks goes back to my LJ World days.

Most people don't realize that the extends tag accepts a variable and not just a hard-coded literal value. This allows for magical things to happen when combined with a default.

The following block extends a base template named "base.html" if the base_template variable is not set.

{% extends base_template|default:"base.html" %}

For any page where you want to programmatically override the base template, you may do so by adding a base_template variable to your view's context values, which are exposed to your template.

from django.views.generic import TemplateView


class MarketingView(TemplateView):

    template_name = 'marketing_landing.html'

    def get_context_data(self, **kwargs):
        context = super(MarketingView, self).get_context_data(**kwargs)
        context['base_template'] = 'base_marketing.html'
        return context

This simple trick opens the door to doing fun things with your website based on the day of the year, or something useful like A/B testing two different versions of your website's landing page. The possibilities are endless.
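
For example, here's a rough sketch of an A/B test that randomly picks between two hypothetical base templates:

import random

from django.views.generic import TemplateView


class LandingView(TemplateView):

    template_name = 'landing.html'

    def get_context_data(self, **kwargs):
        context = super(LandingView, self).get_context_data(**kwargs)
        # Hypothetical variants -- pick one per request
        context['base_template'] = random.choice(
            ['base_variant_a.html', 'base_variant_b.html']
        )
        return context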

]]>/>
Django Birthday: Recaphttp://www.revsys.com/tidbits/django-birthday/Our recap of the Django&#x27;s 10th Birthday PartyTue, 14 Jul 2015 21:02:13 +0000http://www.revsys.com/tidbits/django-birthday/

Happy 10th Birthday Django!

Whew, what a conference! The talks were great, the venue was comfortable, and the BBQ was amazing. But that was only part of it. Many of the talks focused on the Django community and you could feel it in the room. We laughed from our bellies and teared up each time someone mentioned Malcolm.

I had the honor of giving the closing address, which was really just a thinly veiled way for me to have a Steve Jobs "oh and one more thing" moment. We had secretly conspired with The World Company, the place where Django was born, to open up the building so all of the attendees could get a quick tour and see EXACTLY where Django originated. Liberty Hall, where we held the conference, is conveniently across the alley from the Journal-World building so we all stood up, grabbed our bags, and walked over. The photo above is most of us in front of the News Center building where Django was born.

The night before the conference we had a very loosely planned evening of dinner and drinks which also included a quick Skype session with the Django Girls group in Australia to kick off the celebration. It was a great relaxed way to catch up with old friends.

Videos

The videos are up on PyVideo.org for you all to enjoy. As always Next Day Video had them up so fast it was a constant stream of "OMG my talk is already up" Tweets.

Photos

Even before Adrian mentioned in his talk about taking more photos of your projects for nostalgia purposes, we had it in our plans for Django Birthday. There is no telling if Django will make it to 2025, but assuming it does we have plenty of ammo for the next birthday party. I took a bunch of photos and we hired Dailey Images to ensure we captured all of the right moments.

Some initial quickly edited shots from Jason Dailey, the photos I took, and some from my friend Joe Griffin are in my Django Birthday photo album.

Kenneth Love also set up an iCloud album of some of his shots, including one of the newspaper the next day with a front page photo of the block party.

The Party

Eldarion and RevSys sponsored the party with Lincoln Loop sponsoring the birthday cake.

While it was a bit warmer than I would like, the party was epic. We had all of the things you would expect at a 10 year old geeky kids birthday party. Snow cones, balloon animals, Gizmo the robot wandering around, and of course a bouncy castle. The Batmobile even made an appearance, but that wasn't planned... it's just #LFK

For the more adult of us, we also had 3 musical performances and started the night off with Adrian himself playing Django Reinhardt. Yep that's right, the guy who created Django, playing Django, at Django's 10th birthday.

Our friends at The SandBar were awesome enough to help us throw this block party and showed everyone an amazing time. We had the usual things you would expect, but also a special Python/Django themed craft cocktail tent and this awesome shrimp & rice in a pineapple food option that I have to try to recreate at home now.

When we started planning this party Jacob made me promise to not have any ponies, but I couldn't resist one. So in addition to our big cakes for all attendees we had the baker create a small cutting cake and let Adrian have the honor of cutting the pony in half.

I want to personally thank everyone who made the trip out to our little weird bit of the world. I hope you enjoyed yourselves and come back again very soon. Things went so well, I'm not sure I can wait until 2025. We may have to celebrate Django's Sweet 16 or something in the mean time.

Many thanks to our great sponsors for helping us put on a great and fun event!

And our travel sponsors:

]]>/>
Django Birthday Partyhttp://www.revsys.com/tidbits/django-birthday-party/Django is turning 10 years old this summer and we&#x27;re throwing it a birthday party in the city where it was born.Wed, 17 Jun 2015 23:01:07 +0000http://www.revsys.com/tidbits/django-birthday-party/

So as you may know, Django is turning 10 years old this summer and we're throwing it a birthday party. It's a one day tech conference and one day of sprints mere feet from where Adrian, Simon, Jacob, and Wilson toiled away on "the CMS", as it was called then. That code became the first of a few amazing refreshes of Lawrence.com and the very first Django website.

We’ve got an amazing line up of speakers of course. And we want to celebrate Django obviously, but secretly we have another small motivation. See we love Lawrence, despite the fact it’s surrounded by Kansas, and we’ve grown tired of answering questions along the lines of “Ummm why are you in Kansas?” So we’re going to expose our Django community friends to as much as this area has to offer to help explain why we choose to live here. And who knows maybe one of them will fall in love, as we have, and build their next startup here.

So what can you expect?

Most everyone agrees the best part of Lawrence is our downtown, so we’re having the event downtown and encouraging everyone to stay in or near the area. This makes everything special in Lawrence within walking distance. There’s great local restaurants, quirky stores, and being a college town lots of interesting places to grab a libation should you so choose.

We’re having the event at Liberty Hall which has acted as a community-meeting house and venue since 1856! If you’re interested you can read more about its history . Lawrence is weird and funky, which is reflected in the wide variety of entertainers that have performed there. Some highlights include: Oscar Wilde, Tina Turner, Wu Tang Clan, Henry Rollins, Tori Amos, The Civil Wars, and for its 100th anniversary party The Flaming Lips which I’m quite sad to have missed. Lawrence has a long and great music history, but much of that happened in the 90s at The Outhouse (which is now sadly a strip club so we won’t be going there) and another venue down the street named the Bottleneck .

Kansas is known for its BBQ, so of course we’re having Kansas City’s best Oklahoma Joe’s cater lunch. In 2009, Anthony Bourdain listed Joe’s in his “Thirteen Places to Eat Before You Die”. Yeah it’s that good.

But it’s a party right?

Oh yes it’s a party all right! We’ve gotten special permission from the City of Lawrence to close down a block of downtown and hold a block party! We’ll have a few live bands, including Adrian, food, beer, a special cocktail tent for our more refined guests, snow cones, and some other fun family friendly things for the kids.

See many in Lawrence don’t know about Django and we want to change that. By having a block party all of Lawrence can come down and help celebrate, instead of having it just be a small gathering of us geeks.

Oh and there will be cake of course… it’s not a birthday party without cake.

Many thanks to our great sponsors for helping us put on a great and fun event!

And our travel sponsors:

]]>/>
Improved Django Testshttp://www.revsys.com/tidbits/django-test-plus/Improve your Django Tests using django-test-plus&#x27; helpful additions to the standard Django TestCaseFri, 29 May 2015 12:20:52 +0000http://www.revsys.com/tidbits/django-test-plus/

You write tests. Or at least you should be writing tests! As Jacob always says, "Code without tests is broken as designed."

Unfortunately you have to write a ton of boilerplate code to test even the simplest of things. This isn't true of just Django, but most languages and frameworks. There are lots of tools like Factory Boy and pytest that help, but we can do better.

If you're looking for how to get started writing Django tests, I'd suggest reading the excellent testing documentation first and also picking up Harry Percival's excellent book Test-Driven Development with Python.

After the 400th time of copying over the same base test methods into various personal and client projects, I finally got around to packaging up and releasing all of our useful test additions as a module, django-test-plus. I confess, I did this so I didn't have to copy it around anymore as much as wanting to share it with the world. My laziness is your gain. Come be lazy with me!

Let's have some testing fun

I think this module is best explained by examples. So let's just dive right in. First off you need to install django-test-plus with:

pip install django-test-plus

Then instead of this...

from django.test import TestCase
from django.core.urlresolvers import reverse

class MyViewTest(TestCase):

    def test_some_view(self):
        url = reverse('my-url-name')
        response = self.client.get(url)
        self.assertEqual(response.status_code, 200)

Do this...

from test_plus.test import TestCase

class MyViewTest(TestCase):

    def test_some_view(self):
        response = self.get('my-url-name')
        self.response_200(response)

For the sake of brevity, I'll be excluding the imports and test class in the rest of these examples and just focus on the methods themselves.

Or even this...

Getting a named URL and checking that the status is 200 is such a common pattern, it has its own helper. This will get the named URL, check that the status is 200, and return the response for other checks.

def test_some_view(self):
    response = self.get_check_200('my-url-name')

Instead of...

def test_with_reverse_args(self):
    url = reverse('some-url', kwargs={
        'year': 2015, 
        'slug': 'my-this-is-getting-long'
    })
    response = self.client.get(url)

Do this...

def test_with_reverse_args(self):
    response = self.get('some-url', year=2015, slug='much-better')

HTTP Posts work the same way

Instead of...

def post_with_reverse_args(self):
    url = reverse('some-url', kwargs={
        'year': 2015, 
        'slug': 'my-this-is-getting-long'
    })
    data = {'body': 'long-way'}
    response = self.client.post(url, data=data)


Save your keystrokes for another day with...

def test_post_better(self):
    response = self.post('some-url', 
        year=2015, 
        slug='much-better', 
        data={'body': 'lazy-way'}
    )

Sometimes you still need reverse...

So it's included for you. No need to import it yourself.

def test_reversing(self):
    response = self.get('my-named-url')
    test_url = self.reverse('some-other-url', pk=12)
    self.assertEqual(response.context['next'], test_url)

That's better, but that's still a bit rough on the hands. Let's make it a touch better with this version which uses our get_context() method:

def test_reversing(self):
    response = self.get('my-named-url')
    test_url = self.reverse('some-other-url', pk=12)
    self.assertEqual(self.get_context('next'), test_url)

We keep the last response received in self.last_response for you so there isn't a reason to have to pass it around all over the place.

Often you need to test the values of several context variables, so let's make that a bit easier.

def test_several_values(self):
    self.get('my-view')
    self.assertContext('key1', 'value1')
    self.assertContext('key2', 'value2')
    self.assertContext('key3', False)

What about other statuses?

Don't worry, we've got you covered: there are several response_XXX() methods to test for other common status codes, for example:

def test_not_found(self):
    response = self.get('no-there')
    self.response_404(response)

Authentication and Users

When testing out Django views you often need to make some users, log in as them, and poke around. Let's make that easier too!

def test_needs_login(self):
    # Make a user 
    user = self.make_user('testuser')

    # Make sure we protected the view
    self.assertLoginRequired('my-protected-view')

    with self.login(user):
        self.get('my-protected-view')
        self.assertContext('secret', True)

Performance

It's easy to make a few template changes that seem inconsequential only to have your database query count blow up to the size of Warren Buffett's checking account. Django provides the assertNumQueries context manager to check the query count, but that is a static count. Often the results are slightly variable, so django-test-plus has assertNumQueriesLessThan.

def test_does_not_get_crazy(self):
    with self.assertNumQueriesLessThan(25):
        variable_query_count_function()

Ultimate Lazy

Maybe it's a toy project or you're adding tests to a project that previously didn't have any. Some tests are better than no tests right? So we provide a quick view check with assertGoodView :

def test_better_than_nothing(self):
    self.assertGoodView('first-view')
    self.assertGoodView('second-view')
    self.assertGoodView('something-else')

What's this do? It gets the view at the named URL, ensures the status code is 200 and tests that the number of queries run is less than 50. Oh and it returns the response if you want it for other purposes. Not amazing, but better than having no test coverage.

The future

I've been using various versions of this module for years, but there is always room for improvement. Happy to take pull request submissions for new methods that will be generally useful. Or maybe one day I'll be less lazy and see about merging these into Django core.

Hope you like the module and find it useful. Happy Testing!

]]>/>
Django Performance: 4 Simple Thingshttp://www.revsys.com/tidbits/django-performance-simple-things/Four simple steps you can take to improve the performance of your Django applicationsWed, 06 May 2015 15:13:51 +0000http://www.revsys.com/tidbits/django-performance-simple-things/

Optimizing for performance often comes with a bunch of headaches and hard problems, but it doesn’t always have to be that way.

Below are four simple things you can do to quickly and easily improve the performance of your sites. They’re so simple you should make them part of your standard setup.

Persistent database connections

New in Django 1.6, Django has built-in database connection persistence. Most everyone uses PostgreSQL for their production systems, and while connecting to PG isn't particularly slow, it is something we can optimize.

Without persistent connections every request to your site also makes a connection to your database, authenticates the user, etc. This can easily take as much as 20–75ms if your database is on another host even on a fast network.

To set up connection persistence, you add the CONN_MAX_AGE parameter to your DATABASES setting to look like this:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'whoohoodb',
        'CONN_MAX_AGE': 600,
    }
}

With this, we've set up persistent connections to stay alive for 10 minutes at a time. This helps cut down on any possible memory leaks or a flaky connection causing a long-term problem. You can certainly use a longer value, but I wouldn't go much beyond an hour as there will be little benefit.

You can read more details about persistent database connections in the Django documentation.

Template Loading

By default Django comes configured to use the two standard template loaders:

TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.Loader',
    'django.template.loaders.app_directories.Loader',
)

These loaders search the file system and parse your templates on every request. Kinda surprising that it works as quickly as it does right?

You can turn on cached loading, so Django only has to find and parse your templates one time, by simply changing your configuration to be:

TEMPLATE_LOADERS = (
    ('django.template.loaders.cached.Loader', (
        'django.template.loaders.filesystem.Loader',
        'django.template.loaders.app_directories.Loader',
    )),
)

Now, don’t be silly and turn on cached loading in your development environments as it will get annoying. You’ll be forced to restart runserver (or whatever you’re using locally) on each and every template change.

Optimize Django Sessions

If you poll Django users nearly 82% of them have no idea where their sessions are stored. Ok, I admit I made that number up, but in our experience it’s pretty true.

By default, Django stores your user sessions in your database and expects you to occasionally prune out old entries. Very few people do this.

So on each request you're doing a SQL query to get the session data and another to grab the User object information. If you're the sort of person who's concerned about performance you likely already have memcached or Redis set up to cache some things. You can switch to storing your sessions in your cache and easily remove a SQL query from every single request to your site with:

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'

Now this assumes you can safely lose any data you store in your user sessions. If that’s not the case, you can still get some benefit from using:

SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'

This backend caches your data, but also writes any changes to your database for long term storage.

select_related() and prefetch_related()

The first two tips above were super easy, right? Unfortunately this last one, while still easy compared to totally changing up your data storage, does require a bit more than a single configuration setting change.

select_related() and prefetch_related() aren’t in everyone’s bag of tricks but they are incredibly useful for cutting down on the number of SQL queries your ORM code is firing.

The situation that usually happens is you have some BlogPost model with a ForeignKey to User. Then in a list view you use queryset = BlogPost.objects.active(). Then in your templates you have something like:

<ul>
{% for post in object_list %}
  <li>{{ post.title }} - {{ post.user.email }}</li>
{% endfor %}
</ul>

This ends up doing the expected SELECT against the blog_post table, but then is forced in the template loop to do another SELECT against auth_user for each and every post in the list.

Instead, we can ask the Django ORM to go ahead and JOIN the auth_user table into our results so that post.user is already a full object for us, taking us from BlogPost.objects.active().count() queries down to a single query.

We can fix it by simply changing the view’s queryset to be:

queryset = BlogPost.objects.select_related().active()

While not a simple settings change, still pretty simple to accomplish.

prefetch_related is the same mechanism, but for ManyToManyField relationships. Prefetch can't use a join here, so the join happens in Python at the ORM level. This can be a HUGE performance win for situations where you have a small number of rows in your ManyToManyField() vs a large number of rows in the model itself. However, in general, it's less useful and thus used less often than select_related().
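
As a rough sketch, assuming a hypothetical tags ManyToManyField on BlogPost:

# One query for the posts (with auth_user JOINed in) plus a single extra
# query for all of the tags, instead of one tags query per post
posts = BlogPost.objects.select_related().active().prefetch_related('tags')

for post in posts:
    tag_names = [tag.name for tag in post.tags.all()]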

So when do you use these you ask? The easiest thing to do is fire up django-debug-toolbar with your project and some representative data. By representative I mean you shouldn’t have a single BlogPost and a single User, try more like 100+ of each.

Then all you need to do is browse around your app locally with the debug toolbar open looking specifically at the query counts and times. If you see anything over 5–10 queries going on this is your clue to investigate.

Most of the time with 100s of objects in your database you'll stumble across a view that is doing 200–500+ queries where one of these two options will be a huge win for you. Change your queryset, refresh, and make sure that both your query count and your overall time decreased. Occasionally you'll have a situation where it's actually faster to do the extra queries than to use joins. It happens rarely, but it's still worth checking the overall query time to make sure you aren't making things worse.

]]>/>
Wagtail 1.0 (beta) best Django CMS?http://www.revsys.com/tidbits/wagtail-best-django-cms/Is Wagtail the best Django CMS? Review of Wagtail 1.0 features and our experiences using it.Wed, 29 Apr 2015 15:25:21 +0000http://www.revsys.com/tidbits/wagtail-best-django-cms/

Now that the Wagtail CMS is gearing up for its 1.0 release, I wanted to take some time to introduce you to the all around best and most flexible Django CMS currently available. Wagtail has been around for a while, but doesn’t seem to get the attention I believe it deserves.

We’ve used Wagtail recently on a number of projects, and the overall experience has been great. It strikes the right balance of making the easy things easy and the hard things not only possible, but relatively easy as well.

Feature Highlights

  • Non-technical end-user ease. Custom admin with great UI/UX
  • Plays nicely alongside any other Django apps on your site
  • Easy admin customization and branding
  • Flexibility of CMS models for more structured data beyond just having a “Page”
  • Built in Form builder system
  • Great Image and Document/File support and UI
  • StreamField for ultimate flexibility allowing you to define and organize small blocks of content
  • Ability to organize admin tabs and field layouts
  • Control/Flexibility of what page models can be added below certain URLs
  • Hooks into ElasticSearch for searching
  • Compatible with Varnish and static site generators to help with performance at scale

Admin Interface

Let's face it, the Django admin leaves a lot to be desired. It's very CRUD-oriented and confusing for all but the most technical of users. Even giving it a facelift with packages like Django Suit, or swapping it out entirely for something like Grappelli, isn't really what your end users want. Don't get me wrong: both of these packages are great and you should check them out, but they simply can't get past all of the hurdles and pain that come with attempting to customize the Django admin beyond a certain point.

Wagtail comes out of the box with its own custom admin interface that is specifically geared toward a typical CMS workflow. Check out this great promo video about Wagtail and you'll see what I mean. No seriously, go watch it. I'll wait.

Isn’t that great looking? My first thought when seeing the Wagtail video for the first time was “Nice, but I bet customizing it is a huge pain in the…”. Thankfully, I gave it a whirl anyway and came to find that customizing the Wagtail admin is actually pretty simple.

There is a great editor's guide in the docs that is all most end users need to get started. So far in our use, the only things that confuse users are the Explorer, Root pages, and the hierarchical nature of pages in general. Even those are small issues; one quick chat with the user and they grok it and are on their way.

Oh, and a huge bonus: the admin is surprisingly usable on both mobile and tablets!

Ways to customize the Wagtail Admin

There are a few ways you can customize the admin. First off, you can determine what fields are visible to your users and on what tab of the interface with just a bit of configuration. Consider this the bare-bones entry level of "customization" you'll be doing.

Customizing the branding of the admin is also a very frequent need. Techies often don’t see the point, but if you can put on your end user hat for a moment it seems weird and often confusing to come to a login page for www.revsys.com that reads “Welcome to the Wagtail CMS Admin”.

If you install django-overextends you can easily customize the logo, login message, and welcome messages used by the CMS to match your user’s expectations.

For me, these two customization options are what I expect from a CMS. However, Wagtail goes a step further and gives you hooks to allow for much richer customizations. You can do things like:

  • Add items to the Wagtail User Bar that appears for logged in users on the right side of the page much like Django Debug Toolbar
  • Add or remove panels from the main Wagtail admin homepage
  • Add or remove summary items (Pages, Documents, Images, etc) from the homepage
  • Use hooks for taking behind-the-scenes actions or for returning your own customized Responses after creating, editing, or deleting a Page
  • Add your own admin menu items, which can go to any Django views or offsite URLs you desire.

I used that last ability to add admin menu items with great success on TEDxLawrence.com. We needed a way for our Speaker Committee to view the speaker submissions, vote, and make comments. Instead of attempting to shoehorn all of this into a Django Admin or even Wagtail Admin universe, I simply linked off to entirely customized Class Based Views to give me complete end-to-end control.

Wagtail Pages

Most content management systems operate around the concept of a page that usually has a title, some sort of short description of the page, and then the page content itself. Many give you nice WYSIWYG editing tools to make things like adding headers, lists, bold, and italics relatively easy.

The problem comes when what you are wanting to represent on a page doesn’t fit cleanly in this data model. Do you just shove it into the content field? Maybe your CMS has some clunky mechanism to relate additional content to a page or via some plugin system. Or maybe you’re just out of luck, punt, and load some stuff with javascript from the template.

With Wagtail you build your own models that inherit from its Page model. This gives you the ability to customize specific fields for specific data and ultimately removes a lot of the usual shenanigans one goes through to fit your data concepts into your CMS' vision of the world.

This probably works best as an example. Let's build two different types of pages: a simple blog type page and a more complex Staff Member page one might use for individual staff members.

Our simple page can look like this:

from django.db import models
from wagtail.wagtailcore.models import Page
from wagtail.wagtailcore.fields import RichTextField
from wagtail.wagtailadmin.edit_handlers import FieldPanel

class BlogPage(Page):
    sub_title = models.CharField(max_length=500, blank=True)
    published = models.DateField()
    author = models.CharField(max_length=100)
    summary = RichTextField(blank=True)
    body = RichTextField()
    closing_content = RichTextField(blank=True)

    content_panels = [
        FieldPanel('title'),
        FieldPanel('sub_title'),
        FieldPanel('published'),
        FieldPanel('author'),
        FieldPanel('summary'),
        FieldPanel('body'),
        FieldPanel('closing_content'),
    ]

Wagtail automatically sets up some fields for you, like title, the slug of the page, start/end visibility times, and SEO/meta related fields so you just need to define the fields you want beyond those.

Here we've defined some additional structured information we want on a blog post: a possible sub_title and summary, an author, the date the entry was published, and the usual body field. We've also added an additional closing_content field we might use for an ending call to action or other content that we want highlighted and shown below the post.

All you need to do is add this to a Django app’s models.py, run makemigrations and migrate and you’re good to go.

Now let’s make a slightly more complicated Staff Page:

DEPARTMENT_CHOICES = (
    ('admin', 'Administration'),
    ('accounting', 'Accounting'),
    ('marketing', 'Marketing'),
    ('sales', 'Sales'),
    ('engineer', 'Engineering'),
)

class StaffPage(Page):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    active = models.BooleanField(default=True)
    start_date = models.DateField()
    end_date = models.DateField(blank=True, null=True)
    department = models.CharField(max_length=50, choices=DEPARTMENT_CHOICES)
    email = models.EmailField()
    twitter = models.CharField(max_length=50)
    short_bio = RichTextField(blank=True)
    bio = RichTextField(blank=True)
    education = RichTextField(blank=True)
    work_history = RichTextField(blank=True)

    # Panel options left out for brevity

As you can see the StaffPage model has quite a few more fields, most of them optional, which allows the staff member to update their information over time and not get strangled into putting 'Bio coming soon' into an otherwise required field.

Pretty simple right? You’re probably thinking there is some catch, luckily you’re wrong. It’s really pretty much that simple. Easy things easy right?

Harder things in Wagtail

So what are the hard things in Wagtail? Well it’s mostly just getting familiar with the system in general. A few things that may trip you up are:

  • You can't have a field named url on your derived models as Wagtail uses that field name in the parent Page model. Unfortunately if you do add one, which I've done more times than I care to admit, you get a not very useful "can't set attribute" error and not much else to go on.
  • On many listing type pages it's fine to simply show all of the items, with pagination, in some sort of chronological order. Other times users want to be able to manually curate what shows up on given pages. Wagtail makes this relatively easy as you can define a ForeignKey relationship using a through type model to other pages and use a PageChooserPanel to give the user a nice interface for doing this. The user can also manually order them right in the admin, no additional work necessary.
  • Limiting which pages can be created as children (aka beneath) a page can be handled by setting a list named parent_page_types on the child model; see the sketch after this list. Then it can only be added below pages of those defined types. On complex sites with lots of different page types this helps keep the Page Choosing and Creation option choices to a manageable level for the users. And it also obviously helps to keep users from creating the wrong types of pages in the wrong parts of the site.
  • Wagtail currently doesn’t have a great story around building navigation menus for you, but there are a dozen reusable Django apps to help handle that. Often a site’s menu remains relatively static and isn’t edited day-to-day anyway.
  • Supporting multiple sites with the same CMS. This isn't technically hard, but more conceptually difficult to grok. Wagtail supports having multiple sites via its wagtailsites app. The way this works is you simply set the Root page for each hostname and it basically takes it from there. However, in most circumstances it's probably easier and cleaner to just have two different instances of Wagtail and use different databases.
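
Here's a minimal sketch of the parent_page_types restriction mentioned above, using a hypothetical blog app:

from wagtail.wagtailcore.fields import RichTextField
from wagtail.wagtailcore.models import Page


class BlogIndexPage(Page):
    intro = RichTextField(blank=True)


class BlogPage(Page):
    body = RichTextField()

    # BlogPage can only be created beneath a BlogIndexPage
    parent_page_types = ['blog.BlogIndexPage']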

Images and Documents

Documents are files of any type you want to be able to upload into the system. This handles any sort of situations where a user needs to upload a PDF, Excel, or Word document and be able to link to it from any other bit of content.

Images are exactly what you think, however you can define your own base model for this if you choose and attach additional fields for things like copyright, license, attribution, or even something like EXIF data if you wanted.

Both Documents and Images have tagging support via django-taggit and a really nicely designed look and UX in the admin interface.

And yes, before you ask, it has built-in support for flexible thumbnails in your templates and the ability for you to manually define the main focal point of the image to avoid cropping things weirdly.

Form Builder interface

Wagtail also has a nice form builder built into it that can easily suffice for your typical contact form scenarios or more complicated collection needs.

Much like Pages, you simply subclass from Wagtail and define what fields you want to collect. On your model you can also override the process_form_submission method to do more complex validation or, more commonly, to email the interested parties that there is a new submission.

One great feature of the form builder is the built-in viewing and downloading interface. Viewing the data that has come in is great, but you just know your users are going to want to pull it out and use it for some other purpose. Wagtail smartly anticipates this and allows the user to download the submitted data, by date range, as a CSV file anytime they want.

Snippets

Wagtail Snippets are reusable bits of content that aren’t full web pages. Often these are used for things like sidebar content, advertisements, or calls to action.

Unlike with Pages or Forms, you don't subclass a model but instead define a model and simply register it as a snippet. You're then free to give your users the option of attaching snippets of the types you want to other pages. Or if you just want to give them the ability to edit things like footer content, for example, you can manually include the snippet data in your templates and the Snippet admin UI really just becomes their editing interface.
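
A minimal sketch of what that looks like (the model and its fields are hypothetical):

from django.db import models

from wagtail.wagtailadmin.edit_handlers import FieldPanel
from wagtail.wagtailsnippets.models import register_snippet


@register_snippet
class CallToAction(models.Model):
    title = models.CharField(max_length=100)
    link = models.URLField()

    panels = [
        FieldPanel('title'),
        FieldPanel('link'),
    ]

    def __str__(self):
        return self.title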

Best Feature? Content Streams

While being able to define your own Page types with their own fields goes a long way, it's still quite a stretch from truly free form content. New in version 1.0 is Wagtail's killer feature: the StreamField.

Users want free form content, while developers, designers, and even ops want nicely structured data. StreamField satisfies both camps.

When you define a StreamField on a Page you set what types of blocks are available to be added into that stream. A block can be something as simple as a CharBlock of text or as complicated as a Staff type record like the one above, built using structural block types.

The end user can then add, say, a few headers of various types and some rich text content blocks, and have it all interspersed with a few images and code blocks. Each of the block types you define can then be styled differently with CSS and/or have its markup structured entirely differently if needed.
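
Here's a minimal sketch of what defining a StreamField like that can look like (the page model and block names here are just examples):

from wagtail.wagtailadmin.edit_handlers import StreamFieldPanel
from wagtail.wagtailcore import blocks
from wagtail.wagtailcore.fields import StreamField
from wagtail.wagtailcore.models import Page
from wagtail.wagtailimages.blocks import ImageChooserBlock


class FlexiblePage(Page):
    body = StreamField([
        ('heading', blocks.CharBlock(classname='full title')),
        ('content', blocks.RichTextBlock()),
        ('image', ImageChooserBlock()),
        ('code', blocks.TextBlock()),
    ])

    content_panels = Page.content_panels + [
        StreamFieldPanel('body'),
    ]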

Prior to this feature being added to 1.0, I had to resort to complicated Page relationships that weren’t actually pages we intended to make visible on the site. We just subverted the Page Choosing features of Wagtail to give the users the flexibility they needed and keep it all in the same admin interface.

Here is what the admin UI looks like for StreamFields. Here we've defined a field named Body that has header, content, and code block types. Each of these lines is a different block, the top and bottom being headers. As you can see, you can simply click the plus icons to add new blocks in between others or use the arrows on the right to move blocks around. They are currently a bit hard to see due to a CSS bug I anticipate being fixed quickly.

Wagtail’s Future

I think Wagtail has a VERY bright future in general and especially inside the Django community. However, like any 1.0 product there are definitely some things I would like to see in future versions. The two main things I hope to see are:

  • A community collection of high quality and flexible common Page and Block types to make most sites more of a Lego exercise than a coding one.
  • The ability to more easily customize and control the Publish/Moderate/Save as Draft options that appear at the bottom of the screen while editing content. On many smaller sites or those with a flat workflow it should be trivial to make ‘Publish’ or ‘Submit for Moderation’ be the default action presented to the user.
]]>/>
REVSYS Roundup - March 2015http://www.revsys.com/tidbits/revsys-roundup-march-2015/March 2015 update on happenings around Revolution Systems including our new team member Stephen SpencerThu, 26 Mar 2015 15:48:00 +0000http://www.revsys.com/tidbits/revsys-roundup-march-2015/

Revolution Systems has been having a great time recently and wanted to highlight a few things that have been happening in our world.

TEDxLawrence 2015

We were happy to have partnered with TEDxLawrence to put on the first ever TEDx event in Lawrence. Frank helped organize and MC the event as a member of the steering committee. We also built the website using our new favorite Django based CMS, Wagtail. The videos of the awesome collection of speakers we had will be up in a few weeks on TED.com; we'll tweet about that when it happens.

Kansas Linux Fest

Frank also had the pleasure of giving the ending keynote on the first day of the conference. Frank talked about his experience working in devops environments, sharing tips, tricks, and advice in his talk Smells like Teen Systems: Advice for raising healthy happy systems and getting to DevOps Nirvana. Frank tried to make it entertaining, so prepare yourself for lots of cheesy 90s music references.

It's great to have a local Open Source event just a block down the street from our offices. We hope this event will grow much larger in the future, and based on the quality of the program put on, we have no doubt that it will.

PyCon 2015

We're of course sponsoring PyCon again this year and nearly all of the REVSYS team will be there. Be sure to swing by our booth in the expo hall to chat. Frank will be manning the booth most of the time and he will run out of new Reddit posts to keep him occupied if you don't.

Spectrum

We've been hard at work on our hopefully soon-to-be-released desktop developer tool Spectrum. We had hoped to get a beta released to coincide with PyCon, but unfortunately we've been too busy helping our clients to make that happen.

Spectrum is a local desktop app that gives you powerful filtering for your applications logs. In the vein of hosted logging services such as Loggly or Splunk, or self-hosted solutions like Kibana, but specifically targeted at the needs of a developer working locally with features like:

  • Inject logs via a local syslog daemon, REST API, or simply tailing local files to support as many local development scenarios as possible
  • Send a firehose of logs to Spectrum and instantly show or hide log entries by things like facility, priority, or simple string matching. No longer will you have to adjust the verbosity of your logs and then recreate a scenario or error just to see the logs you need
  • Separate project level settings to avoid having to change your setup as you move between projects

We started building Spectrum because relaying a firehose of logs to any of the hosted or self-hosted services we tried simply had too much lag time between the generation of the logs and when they were visible to the developer to make it truly productive. We're building Spectrum using Atom Shell and ReactJS so it will be available for Windows, OSX, and Linux developers and stupid stupid fast.

Follow @devspectrum on Twitter or signup for the Spectrum mailing list for announcements and early beta discounts!

And our biggest news so far in 2015, we added another team member!

In January we were pleased to add our friend Stephen Spencer to the RevSys team. While most of the team are 80% dev and 20% ops in experience, Stephen is exactly the opposite which helps not only round out our skill sets in general but also drastically increases our capacity to take on more operations and devops projects without overloading Frank's already busy schedule.

Stephen has a ton of experience with tools we like such as SaltStack and in fact is the guy who got Frank his first *NIX job and one of his original Unix/Linux mentors. Give a shoutout to Stephen on Twitter to welcome him to our team!

]]>/>
Loading Django FileField and ImageFields from the file systemhttp://www.revsys.com/tidbits/loading-django-files-from-code/How to load files into Django models, not via a file upload, but from a script or Python shell.Wed, 03 Dec 2014 21:44:12 +0000http://www.revsys.com/tidbits/loading-django-files-from-code/

I don’t know about you, but I run into a situation every so often where I need to programmatically push a file from the local file system or remotely via a URL into a Django model. Using a Python shell, a one off script, or a Django management command.

This should be easy, but with a typical web app you do it infrequently. Mostly I find we do it when we’re converting someone’s site from something else to Django and need to associate an image with another Content item for example.

If you start Googling around for answers to how to do this you get the default docs for File Uploads, File objects, and Managing Files, which are all great. However, they don't exactly spell out how you're supposed to do this.

My first inclination is to just assign the model field to a file, but this is missing the relative file path information Django needs.

I took the time to write this post mostly for my own benefit so I don’t have to figure this out yet again in the future, but hopefully you’ll get some use out of it as well.

Steps for loading Django ImageField with a local file

Here is a quick example showing you the moving pieces. The steps you need to perform are:

  1. Retrieve the file (if remote) and store it locally
  2. Open that file as a normal Python file object
  3. Convert that open file to a Django File
  4. Attach it to your model field

Example Model

Suppose we have a model like this:

from django.db import models

class Company(models.Model):
    name = models.CharField(max_length=100)
    logo = models.ImageField()

Let’s create an entry for RevSys, pulling the logo in with requests:

import requests
from django.core.files import File

from .models import Company

r = requests.get('http://media.revsys.com/img/revsys-logo.png')

with open('/tmp/revsys-logo.png', 'wb') as f:
    f.write(r.content)

reopen = open('/tmp/revsys-logo.png', 'rb')
django_file = File(reopen)

revsys = Company()
revsys.name = 'Revolution Systems'
revsys.logo.save('revsys-logo.png', django_file, save=True)

The last line here is the important bit. save is given three arguments here:

  1. The relative path and filename for inside MEDIA_ROOT
  2. The open file using Django’s File object
  3. Whether or not we want to save the revsys Company instance after the image is saved.

Now if you’re doing this for a bunch of images, you’ll want to parse your URLs or filenames to build the paths for you. Something like this would be a good starting point, note this is using Python 3:

import os
from urllib.parse import urlparse
# from urlparse import urlparse for Python 2.7

url = 'http://media.revsys.com/img/revsys-logo.png'
filename = os.path.basename(urlparse(url).path)
# This returns 'revsys-logo.png' from the URL

revsys.logo.save(filename, django_file, save=True)

Hope this helps!

]]>/>
Recommended Django Project Layouthttp://www.revsys.com/tidbits/recommended-django-project-layout/Our recommendation for the optimal Django project layout for your files, settings, and templates.Fri, 21 Nov 2014 22:11:53 +0000http://www.revsys.com/tidbits/recommended-django-project-layout/

What’s the optimal layout for your Django applications, settings files, and various other associated directories?

When Django 1.4 was released it included an updated project layout which went a long way to improving the default Django project’s layout, but here are some tips for making it even better.

This is a question we get asked all of the time so I wanted to take a bit of time and write down exactly how we feel about this subject so we can easily refer clients to this document. Note that this was written using Django version 1.7.1, but can be applied to any Django version after 1.4 easily.

Why this layout is better

The project layout we’re recommending here has several advantages namely:

  • Allows you to pick up, repackage, and reuse individual Django applications for use in other projects. Often it isn’t clear as you are building an app whether or not it is even a candidate for reuse. Building it this way from the start makes it much easier if that time comes.
  • Encourages designing applications for reuse
  • Environment specific settings. No more if DEBUG==True nonsense in a single monolithic settings file. This allows you to easily see which settings are shared and what is overridden on a per environment basis.
  • Environment specific PIP requirements
  • Project level templates and static files that can, if necessary, override app level defaults.
  • Smaller, more specific test files which are easier to read and understand.

Assuming you have two apps blog and users and 2 environments dev and prod your project layout should be structured like this:

myproject/
    manage.py
    myproject/
        __init__.py
        urls.py
        wsgi.py
        settings/
            __init__.py
            base.py
            dev.py
            prod.py
    blog/
        __init__.py
        models.py
        managers.py
        views.py
        urls.py
        templates/
            blog/
                base.html
                list.html
                detail.html
        static/
           …
        tests/
            __init__.py
            test_models.py
            test_managers.py
            test_views.py
    users/
        __init__.py
        models.py
        views.py
        urls.py
        templates/
            users/
                base.html
                list.html
                detail.html
        static/
            …
        tests/
            __init__.py
            test_models.py
            test_views.py
     static/
         css/
             …
         js/
             …
     templates/
         base.html
         index.html
     requirements/
         base.txt
         dev.txt
         test.txt
         prod.txt

The rest of this article explains how to move a project to this layout and why this layout is better.

Current Default Layout

We're going to call our example project foo, and yes, I realize it's a very creative name. We're assuming here that we're going to be launching foo.com, but while we like to have our project names reflect the ultimate domain(s) the project will live on, this isn't by any means required.

If you kick off your project using django-admin.py startproject foo you get a directory structure like this:

    foo/
        manage.py
        foo/
           __init__.py
           settings.py
           urls.py
           wsgi.py

This layout is a great starting place: we have a top level directory foo which contains our manage.py and the project directory foo/foo/ inside it. This is the directory you would check into your source control system, such as git.

You should think of this foo/foo/ subdirectory as being the project where everything else is either a Django application or ancillary files related to the project.

Fixing Settings

We’re on a mission to fix your bad settings files here. We show this layout to new clients and I’m constantly surprised how few people know this is even possible to do. I blame the fact that while everyone knows that settings are just Python code, they don’t think about them as Python code.

So let's fix up our settings. For our foo project we're going to have 4 environments: dev, stage, jenkins, and production. So let's give each its own file. The process to do this is:

  1. In foo/foo/ make a settings directory and create an empty __init__.py file inside it.
  2. Move foo/foo/settings.py into foo/foo/settings/base.py
  3. Create the individual dev.py, stage.py, jenkins.py, and prod.py files in foo/foo/settings/. Each of these 4 environment specific files should simply contain the following:
    from base import *
    

So why is this important? Well, for local development you want DEBUG=True, but it's pretty easy to accidentally push out production code with it on, so just open up foo/foo/settings/prod.py and, after the initial import from base, add DEBUG=False. Now your production site is safe from that silly mistake.

What else can you customize? Well it should be pretty obvious you’ll likely have staging, jenkins, and production all pointing at different databases, likely even on different hosts. So adjust those settings in each environment file.

Using these settings

Using these settings is easy, no matter which method you typically use. To use the OS’s environment you just do:

export DJANGO_SETTINGS_MODULE=foo.settings.jenkins

And boom, you’re now using the jenkins configuration.

Or maybe you prefer to pass them in as a commandline option like this:

./manage.py migrate --settings=foo.settings.production

Same if you’re using gunicorn:

gunicorn foo.wsgi -w 4 -b 127.0.0.1:8001 --env DJANGO_SETTINGS_MODULE=foo.settings.dev

What else should be customized about settings?

Another useful tip with Django settings is to change several of the default settings collections from being tuples to being lists. For example INSTALLED_APPS, by changing it from:

INSTALLED_APPS = (
   …
)

to:

INSTALLED_APPS = [
    …
]

With INSTALLED_APPS defined as a list in foo/settings/base.py, we can now more easily add and remove apps in each environment specific settings file. For example, maybe you only want django-debug-toolbar installed in dev, but not your other environments.

This trick is also often useful for the TEMPLATE_DIRS and MIDDLEWARE_CLASSES settings.
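
For example, a dev.py along these lines (a sketch, assuming base.py defines INSTALLED_APPS and MIDDLEWARE_CLASSES as lists as described above):

# foo/settings/dev.py
from base import *

DEBUG = True

# Only pull in the debug toolbar for local development
INSTALLED_APPS += ['debug_toolbar']
MIDDLEWARE_CLASSES += ['debug_toolbar.middleware.DebugToolbarMiddleware']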

Another useful trick we often use is to break up your apps into two lists, one your prerequisites and another for your actual project applications. So like this:

PREREQ_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',

    'debug_toolbar',
    'imagekit',
    'haystack',
]

PROJECT_APPS = [
    'homepage',
    'users',
    'blog',
]

INSTALLED_APPS = PREREQ_APPS + PROJECT_APPS

Why is this useful? For one it helps better distinguish between Django core apps, third party apps, and your own internal project specific applications. However, PROJECT_APPS often comes in handy as a list of your specific apps for things like testing and code coverage. You have a list of your apps, so you can easily and automagically make sure their tests are run and coverage is recorded just for them, not including any third party apps, without having to maintain the list in two separate places.

Fixing requirements

Most projects have a single requirements.txt file that is installed like this:

pip install -r requirements.txt

This is sufficient for small simple projects, but a little known feature of requirements files is that you can use the -r flag to include other files. So we can have a base.txt of all the common requirements and then, if we need to be able to run tests, a specific requirements/test.txt that looks like this:

-r base.txt
pytest==2.5.2
coverage==3.7.1

I'll admit this is not a HUGE benefit, but it does help separate out what is a requirement in which environment. And for the truly performance conscious it reduces your pip install time in production a touch by not installing a bunch of things that won't actually be used in production.

Test Files

Why did we separate out the test files so much? One main reason: if you're writing enough tests, a single tests.py file per application will end up being one huge honking file. This is bad for readability, but also for the simple fact that you have to spend time scrolling around a lot in your editor.

You'll also end up with fewer merge conflicts when working with other developers, which is a nice side benefit. Small files are your friends.

URLs

For small projects it’s tempting to put all of your url definitions in foo/urls.py to keep them all in one place. However, if your goal is clarity and reusability you want to define your urls in each app and include them into your main project. So instead of:

urlpatterns = patterns('',
    url(r'^$', HomePageView.as_view(), name='home'),
    url(r'^blog/$', BlogList.as_view(), name='blog_list'),
    url(r'^blog/(?P<pk>\d+)/$', BlogDetail.as_view(), name='blog_detail'),
    ...
    url(r'^user/list/$', UserList.as_view(), name='user_list'),
    url(r'^user/(?P<username>\w+)/$', UserDetail.as_view(), name='user_detail'),
)

you should do this:

urlpatterns = patterns('',
    url(r'^$', HomePageView.as_view(), name='home'),
    url(r'^blog/', include('blog.urls')),
    url(r'^user/', include('user.urls')),
)

Templates and static media

Having per app templates/ and static/ directories gives us the ability to reuse an application basically as is in another project.

We get the default templates the app provides and any associated static media like special Javascript for that one cool feature all in one package.

However, it also gives us the ability to override those templates on a per project basis in the main foo/templates/ directory. By adding a templates/blog/detail.html template we override, or mask, the default blog/templates/blog/detail.html template.

Reusing a Django application

So assuming you've been using this layout for a while, one day you'll realize that your new project needs a blog and the one from your foo project would be perfect for it. So you copy and paste the files in… pssst, WRONG! Now you have two copies of the application out there. Bug fixes or new features in one have to be manually moved between the projects, and that assumes you even remember to do that.

Instead, make a new repo for your blog and put the foo/blog/ directory in it. And adjust both your existing foo project and your new project to pip install it.

They can still both track different versions of the app, if necessary, or keep up to date and get all of your bug fixes and new features as they develop. You still can override the templates and static media as you need to on a per project basis, so there really aren't any issues doing this.

Additional Resources

Our friends Danny and Audrey over at CartWheel Web reminded us about Cookiecutter and specifically Danny's cookiecutter-django as useful tools for making your initial project creation easier and repeatable.

Also, if you're looking for all around great Django tips and best practices, you can't go wrong with their book Two Scoops of Django: Best Practices For Django 1.6 which we recommend to all of our clients.

Feedback

We hope you find this improved project layout useful. If you find any bugs, have a suggestion, or just want to chat feel free to reach out to us. Thanks for reading!

]]>/>
Upgrade salt-master and minions on Ubuntu servershttp://www.revsys.com/tidbits/upgrade-salt-master-and-minions-ubuntu-servers/Quickly, safely and easily upgrade SaltStack entirely from your Salt masterSun, 16 Nov 2014 17:58:49 +0000http://www.revsys.com/tidbits/upgrade-salt-master-and-minions-ubuntu-servers/

Step by step guide to safely upgrading your SaltStack master and minions

We’ve noticed a bit of confusion and understandable trepidation around upgrading SaltStack with some of our customers. Salt is pretty darn important to your stack, so keeping it working is essential. To help, we’ve written up this HOWTO guide for easily and safely upgrading your master and minions.

We mostly work with Ubuntu and Debian based systems, so while the steps to upgrade are the same on other Linux systems, the exact commands will differ.

This guide also assumes you’ve got the Salt Stack PPA configured in your apt sources.

Step 1 - Update your apt repositories

First you need to make sure your apt repositories are up to date, so you get the latest stable versions. Easiest way to do this is via salt itself:

sudo salt '*' cmd.run "aptitude update"

Step 2 - Upgrade your master

Upgrading the master first ensures you don’t run into any version compatibility issues between your master and minions. So ssh into your master and run:

sudo apt-get upgrade salt-master

Step 3 - Upgrade your minions

Before we attempt to upgrade, let's take a quick look at the existing versions we have running. This might surprise you; I definitely found a couple of cloud instances that were running an older version of salt-minion that I somehow had not upgraded in the past. To get a list of what version of salt your minions are running, issue this salt command:

sudo salt '*' test.version

And you'll get a nice display of every version currently in use. Another useful option here is the 'manage.versions' command, which shows you a list of up to date minions vs. those that need updating. Here is how you run it:

salt-run manage.versions

Now that we know our baseline of versions, let's upgrade them all:

sudo salt '*' pkg.install salt-minion refresh=True

This is the correct way to do it with Salt, but if you run into any trouble you can always issue the usual Ubuntu commands directly like this:

sudo salt '*' cmd.run "apt-get -y install salt-minion"

Step 4 - Verify everything worked

Everything should be upgraded now and running the latest version of salt-minion. You can verify this by running the test.version command again:

sudo salt '*' test.version

If you see some minions aren’t using the latest version you may need to manually intervene to see what is stopping apt from upgrading things for you.

Hope this helps you keep your SaltStack ecosystem in sync and upgraded to the latest stable versions!

]]>/>
Ultimate Front End Development Setuphttp://www.revsys.com/tidbits/ultimate-front-end-development-setup/A quick howto guide on setting up modern front-end development tools to improve your day to day work flow using Gulp and LivereloadTue, 21 Oct 2014 15:09:39 +0000http://www.revsys.com/tidbits/ultimate-front-end-development-setup/

A quick howto guide on setting up modern front-end development tools to improve your day to day work flow.

You'll learn how to:

  • Setup gulp to orchestrate everything
  • Install gulp plugins to combine, minify and otherwise transform your files
  • Use bower, a package manager for "web things". It helps you install, track and manage CSS frameworks, JS libraries and other tools
  • Add LiveReload so your browser window(s) auto-refresh when you make changes to anything
  • Integrate all of this with a Django project with django-compressor and django-bower

I've always struggled working on frontend tasks. I know CSS well enough, but it can be a pain to get it just right. I'm more comfortable with Javascript. However, I tend to end up with spaghetti code all over the place and a dozen randomly included files and things quickly become insane.

So I've spent some time over the last several weeks trying to up my frontend development game. As it turns out, many of the pain points I experienced are now solved really well.

What is gulp?

Gulp is a node module that runs various tasks based on rules you've set up. It has its own plugin ecosystem for doing common tasks; we'll install the specific plugins we need in a moment.

What does bower do for us?

Bower is a package manager for web things. With a couple of easy commands you can install things like jQuery, Zurb Foundation or Bootstrap into your project. It also helps track and install the dependencies. It's CPAN, PyPI, gems or npm for "the web" rather than a single language.

Initial Setup

We need to install a few things before we can continue.

First, let's install gulp and bower system wide, since once you start using these tools you're going to want them in all your web projects:

npm install --global gulp bower

Now move into your project directory so that npm and bower will install everything else into your project and not system wide. Once you're there, install the gulp plugins we will be using:

npm install gulp-watch gulp-sass gulp-minify-css gulp-rename gulp-gzip gulp-livereload

Setup Zurb Foundation

Before we get too far into setting up Gulp we need something for it to do. So let's install Zurb Foundation 5 and set things up to customize it with SASS:

bower install foundation

Now let's create a directory named scss and create two files inside of it. The first, named _main_settings.scss, is going to be where we override any of Foundation's default settings. For example purposes we just want to change something here, so let's make Foundation's grid rows 100% wide. To do that, put this inside the file:

$row-width: 100%;

And we also need a "main" SASS file to tie everything together so inside scss/main.scss place the following SASS code:

@charset 'UTF-8';
@import "main_settings";

@import "../bower_components/foundation/scss/foundation/components/grid";
@import "../bower_components/foundation/scss/foundation/components/accordion";
@import "../bower_components/foundation/scss/foundation/components/alert-boxes";
@import "../bower_components/foundation/scss/foundation/components/block-grid";
@import "../bower_components/foundation/scss/foundation/components/breadcrumbs";
@import "../bower_components/foundation/scss/foundation/components/button-groups";
@import "../bower_components/foundation/scss/foundation/components/buttons";
@import "../bower_components/foundation/scss/foundation/components/clearing";
@import "../bower_components/foundation/scss/foundation/components/dropdown";
@import "../bower_components/foundation/scss/foundation/components/dropdown-buttons";
@import "../bower_components/foundation/scss/foundation/components/flex-video";
@import "../bower_components/foundation/scss/foundation/components/forms";
@import "../bower_components/foundation/scss/foundation/components/icon-bar";
@import "../bower_components/foundation/scss/foundation/components/inline-lists";
@import "../bower_components/foundation/scss/foundation/components/joyride";
@import "../bower_components/foundation/scss/foundation/components/keystrokes";
@import "../bower_components/foundation/scss/foundation/components/labels";
@import "../bower_components/foundation/scss/foundation/components/magellan";
@import "../bower_components/foundation/scss/foundation/components/orbit";
@import "../bower_components/foundation/scss/foundation/components/pagination";
@import "../bower_components/foundation/scss/foundation/components/panels";
@import "../bower_components/foundation/scss/foundation/components/pricing-tables";
@import "../bower_components/foundation/scss/foundation/components/progress-bars";
@import "../bower_components/foundation/scss/foundation/components/range-slider";
@import "../bower_components/foundation/scss/foundation/components/reveal";
@import "../bower_components/foundation/scss/foundation/components/side-nav";
@import "../bower_components/foundation/scss/foundation/components/split-buttons";
@import "../bower_components/foundation/scss/foundation/components/sub-nav";
@import "../bower_components/foundation/scss/foundation/components/switches";
@import "../bower_components/foundation/scss/foundation/components/tables";
@import "../bower_components/foundation/scss/foundation/components/tabs";
@import "../bower_components/foundation/scss/foundation/components/thumbs";
@import "../bower_components/foundation/scss/foundation/components/tooltips";
@import "../bower_components/foundation/scss/foundation/components/top-bar";
@import "../bower_components/foundation/scss/foundation/components/type";
@import "../bower_components/foundation/scss/foundation/components/offcanvas";
@import "../bower_components/foundation/scss/foundation/components/visibility";

This just instructs SASS to use our settings file and to load all of the various components of Foundation. In a real project you likely won't be using all of these features, so you can comment out the ones you don't need and drastically reduce the size of the CSS you send to the browser.

Setup our gulpfile.js

You control Gulp by writing out a gulpfile.js. Put this in the root of your project; I'll explain what is going on below:

var gulp = require('gulp');
var sass = require('gulp-sass');
var watch = require('gulp-watch');
var minifycss = require('gulp-minify-css');
var rename = require('gulp-rename');
var gzip = require('gulp-gzip');
var livereload = require('gulp-livereload');

var gzip_options = {
    threshold: '1kb',
    gzipOptions: {
        level: 9
    }
};

/* Compile Our Sass */
gulp.task('sass', function() {
    return gulp.src('scss/*.scss')
        .pipe(sass())
        .pipe(gulp.dest('static/stylesheets'))
        .pipe(rename({suffix: '.min'}))
        .pipe(minifycss())
        .pipe(gulp.dest('static/stylesheets'))
        .pipe(gzip(gzip_options))
        .pipe(gulp.dest('static/stylesheets'))
        .pipe(livereload());
});

/* Watch Files For Changes */
gulp.task('watch', function() {
    livereload.listen();
    gulp.watch('scss/*.scss', ['sass']);

    /* Trigger a live reload on any Django template changes */
    gulp.watch('**/templates/*').on('change', livereload.changed);

});

gulp.task('default', ['sass', 'watch']);

The first few lines simply include all of the various plugins we will be using. We then set some common configuration information for the gzip plugin. Here we're saying we don't want to bother compressing anything smaller than 1kb and to use gzip's maximum compression. Gulp is incredibly fast; most everything happens in milliseconds, so there isn't any reason to skimp here.

We then define two tasks, sass and watch. The sass task is used by the watch task, so let's look at watch first.

The watch task looks for any file matching scss/*.scss and runs it through the sass task. We also set up a livereload server, which can be used with the LiveReload desktop app or, in my case, the Chrome extension from their browser extensions page.

Finally we have another rule that looks for any files that look like Django templates and triggers a live reload. Why is that awesome? Well, you can open up multiple browsers, say one desktop and one mobile sized, and as you make edits to your Django templates and/or CSS your browser windows will automatically refresh. That saves you from having to lift your hands off the keyboard, move your mouse all the way over to the windows, and refresh each one manually. This feature alone saves me time and likely a bit of carpal tunnel.

Our sass task is a bit more complicated. Gulp processes files as streams piped from one step to the next, so it doesn't have to write any intermediary files to disk, which is part of how it achieves such great speed. Reading the source of that task from top to bottom, here is what is going on:

  • Look for any files matching 'scss/*.scss'
  • Compile them with libsass
  • Write the results to this point to 'static/stylesheets'. In our case, with our original being main.scss it will write 'static/stylesheets/main.css'
  • Now rename the stream to append '.min' to the end of the filename and before the extension
  • Minify the CSS at this point
  • Write the results again into 'static/stylesheets' this time creating 'static/stylesheets/main.min.css'
  • Compress the stream using the gzip options we setup up top
  • Yet again, write the results to 'static/stylesheets' this time creating 'static/stylesheets/main.min.css.gz'
  • Trigger a live reload

Our final task is the 'default' task. With Gulp you can call tasks individually on the command line, but anything you define in your 'default' task will be run continuously in the foreground after just running gulp without any arguments.

Django Integration

Up to this point everything we've configured is backend agnostic. You could use everything we've set up with a totally static site, Rails, or whatever your preferred backend stack is. Here at RevSys we primarily use Django, so being able to integrate this with Django is pretty important; otherwise it wouldn't be very useful to us.

Luckily Django is totally agnostic about your frontend setup. What we have configured already would work just fine with a Django site. We're building our CSS and JS as combined files, and all one would need to do is include them using a standard Django {% static %} template tag. However, I'm a big performance nerd, and one of my favorite features of django-compressor is its default configuration, which creates hashed file names based on the underlying CSS and JS it is compressing.

When you're using django-compressor you control what is combined and compressed using template tags like this:

{% load compress %}
<head>
    {% compress css %}
    <link rel="stylesheet" href="{% static "stylesheets/main.css" %}" />
    <link rel="stylesheet" href="{% static "stylesheets/some-other.css" %}" />
    {% endcompress %}
</head>

With this configuration compressor will combine, minify, and otherwise transform these files, based on the rules you have set up in your Django settings, into a single file like:

<link rel="stylesheet" href="/static/CACHE/css/ed3523606236.css" type="text/css" />

Note that because gulp is already combining and minifying our CSS, all we really want django-compressor to do is create this hashed file name. So to set up django-compressor all we need to do is 'pip install django-compressor', add 'compressor' to our INSTALLED_APPS and add 'compressor.finders.CompressorFinder' to the STATICFILES_FINDERS setting. By default compressor doesn't compress when DEBUG=True, so I also suggest adding COMPRESS_ENABLED=True to your local dev settings to help ensure your local environment mimics production as much as possible.

We are, however, also using bower, so to make Django's static files system able to find the bower components we need to 'pip install django-bower', add 'djangobower' to INSTALLED_APPS, and add 'djangobower.finders.BowerFinder' to STATICFILES_FINDERS. So ultimately we end up with this in settings:

INSTALLED_APPS = [
    ... your other django apps ...
    'django.contrib.staticfiles',
    'djangobower',
    'compressor',
]

COMPRESS_ENABLED = True

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    'djangobower.finders.BowerFinder',
    'compressor.finders.CompressorFinder',
)

We can then easily set up far-future Expires headers for anything living in /static/CACHE/ without any worries that we might serve up old CSS to our users. A typical nginx configuration, setting the Expires header 90 days into the future, looks like this:

location /static/CACHE {
    alias /home/site/static/CACHE;
    expires 90d;
}

Why not just use django-compressor for all of this?

One of the drawbacks of django-compressor is that when you have several transformations going on it can get a bit slow. With all of the SASS compilation, concatenation, minification, and whatever else you have configured, it can take half a second to a second or two for pages to render after a CSS or JS change when working locally with Django's runserver.

This doesn't sound like much or anything to be concerned about, but having worked with this new setup for a few days the perceived daily performance difference is pretty impressive. Add in the immediately useful LiveReload features that are easy to achieve with gulp and I can attest this setup helps you stay in a useful flow state more easily.

Switching over to this setup might take a half day of time at most, but I would wager you will gain that back in spades in less than 2 months of using it. Happy hacking!

]]>/>
Python Dev Tip: DRY your shell with PYTHONSTARTUPhttp://www.revsys.com/tidbits/python-tip-shell-pythonstartup/Simple way to use virtualenvwrapper's postactivate and postdeactivate hooks to execute arbitrary Python code for any of your Python shells while working on that project.Sat, 12 Jul 2014 20:08:55 +0000http://www.revsys.com/tidbits/python-tip-shell-pythonstartup/

Do you find yourself re-doing the same Python shell stuff?

I know I do. I'm constantly doing it and replaying some of my IPython history isn't cutting it for me anymore. IPython has profiles but they're not terribly easy to get working with Django and your typical manage.py shell command.

Site wide is easy, but what about per project?

You can load things and execute code for ALL your Python shells system wide pretty easily, but what about when you need different things loaded depending on the project?

Turns out it's pretty easy to do with an environment variable and virtualenvwrapper . If you aren't already using virtualenvwrapper you should really check it out. I use it even outside of Python projects just to have an easy mechanism for different "shell environments" to have certain things executed when I start and finish working on a project.

The situation I was in...

While the issue that led me to figure this out was a Django project, nothing about this technique is Django specific. I have a pretty common workflow when doing green field Django development where I set up Fabric tasks to rebuild my local dev database and repopulate it with certain database objects I need to make the site functional enough for testing. For example, I typically set up a User object with username 'frank' (I know, big surprise there) that is a superuser. So if I launch a shell, 9 out of 10 times I probably need my frank User object for something.

This is complicated, however, by the fact that on some projects I work on the User model is customized to not have a username. So in some projects I need to run:

frank = User.objects.get(username='frank')

And in others I need:

frank = User.objects.get(email='frank@revsys.com')

What I realized was that Python honors and executes the code in any file you specify in the PYTHONSTARTUP environment variable in your shell. So you can adjust your virtualenv's postactivate and postdeactivate hooks to point it at the code you want run on a per project basis. For example:

In ~/.virtualenvs/projectX/bin/postactivate

  #!/bin/bash
  export PYTHONSTARTUP=/Users/frank/work/src/projectX/startup.py
  export DJANGO_SETTINGS_MODULE="projectX.settings.dev"
  cd /Users/frank/work/src/projectX

In ~/.virtualenvs/projectX/bin/postdeactivate

  #!/bin/bash
  # unset takes the variable name itself, not its $ expansion
  unset DJANGO_SETTINGS_MODULE
  unset PYTHONSTARTUP

So what can you do with this?

Well, you can execute any arbitrary Python code, so you could do a lot, even going so far as to hit your time tracking software's API to log your time. But realize that anything you run will slow down the startup of each Python shell you launch, so keep it to what you really need. On this particular project I kept it to a few small things:

  # Import datetime and some of my commonly used Models 
  import datetime
  from django.contrib.auth import get_user_model
  from app1.models import ThatOneModel
  from app2.models import TheOtherCommonModel

  try:
      User = get_user_model()
      frank = User.objects.get(username='frank')
  except User.DoesNotExist:
      # Frank not being in the DB shouldn't raise an error
      pass

Hope this helps you stay in flow more and delay your carpal tunnel a few days at least!

]]>/>
Django Debugging Bookmarklet Trickhttp://www.revsys.com/tidbits/django-debugging-bookmarklet-trick/A little bookmarklet that redirects you from the current page to the same path on http://localhost:8000/. Helpful when debugging Django.Wed, 19 Feb 2014 14:28:41 +0000http://www.revsys.com/tidbits/django-debugging-bookmarklet-trick/

So this is one of those little things where you think, why the hell didn't I think of this years ago?

I'm sure many Django developers end up doing this on a fairly regular basis. You're working on a site, go to the production/staging version, navigate around and then want to view the same URL path but on your local development server.

If you're like me, you cut-n-paste the current URL into a new tab and manually remove the host and replace it with http://localhost:8000. That way you have a browser tab of the real site and a tab with the local version for comparison. I've done this at least a billion times. It's a silly little annoyance because it's easy to mess up the URL, leaving off the slash after 8000 for example.

This morning while working on a site I realized, WTF I can do this with a bookmarklet! They're super easy as it turns out.

Version that replaces the current tab:

javascript:(function() { window.location.replace("http://localhost:8000" + window.location.pathname); }())

Version that creates a new tab:

javascript:(function() {  window.open("http://localhost:8000" + window.location.pathname, '_blank'); }())

I still can't quite believe I didn't think to do this long ago! I wonder how many hours of time I would have saved over the years? I've included two links below that you can create your own bookmarks from; you'll obviously need to edit them if you don't use the Django default of localhost:8000.

Simply drag whichever version you prefer into your bookmarks and you're ready to go.

Make localhost:8000

Make localhost:8000 tab

You could easily modify these to create bookmarklets that take you to your staging or QA system and even when working with development frameworks other than Django.

Hope this saves you as much time as it is going to save me!

UPDATE: Jacob sent me a great add-on tip for this. Michael Mahemoff has a good tip on how to change the favicon in your browser between different environments like staging and production. So you can combine these two techniques for maximal win!

]]>/>
Setting up Mailgun with SaltStackhttp://www.revsys.com/tidbits/configure-mailgun-saltstack/How to easily configure email relaying for Ubuntu servers with Mailgun, SaltStack, and PostfixSat, 15 Feb 2014 17:37:13 +0000http://www.revsys.com/tidbits/configure-mailgun-saltstack/

Here at RevSys we've been moving away from Chef and using SaltStack for all our new server setups. Salt is still pretty new and while I run into bugs every now and again, they're releasing new versions with fixes at an impressive rate.

Today I want to show you how to setup a new server with Postfix and Mailgun in a reusable way. This tutorial assumes that you already have a working Salt setup.

The steps are pretty easy, we need to:

  1. Install postfix and sasl
  2. Configure postfix to use our Mailgun account for its SMTP relay server

We'll make this salt state easily reusable by having the Mailgun credentials and optional email alias configurable with Salt Pillar data.

Setup Pillar Data

So first we'll setup the Pillar data. For the purposes of the tutorial we'll assume we are setting up the server 'example.revsys.com'.

We'll add:

mailgun_username: example@example.mailgun.org
mailgun_password: MySillySecretPassword
mailgun_alias_file: example/templates/example_mail_aliases

To the pillar data. This could be a separate per server pillar file such as /srv/pillar/example.sls or a more reusable /srv/pillar/mailgun.sls, depending on how you want to do it.

Setup Salt State File

Then we create a mailgun state in /srv/salt/mailgun/init.sls:

postfix:
    pkg:
        - installed
        - names:
            - postfix
            - libsasl2-modules
    service:
        - running
        - require:
            - pkg: postfix
        - watch:
            - file: /etc/postfix/main.cf

/etc/postfix/main.cf:
    file.managed:
        - source: salt://mailgun/templates/main.cf
        - user: root
        - group: root
        - mode: 644
        - require:
            - pkg: postfix

/etc/postfix/sasl_passwd:
    file.managed:
        - source: salt://mailgun/templates/sasl_passwd
        - user: root
        - group: root
        - mode: 600
        - template: jinja
        - context:
            username: {{ pillar['mailgun_username'] }}
            password: {{ pillar['mailgun_password'] }}
        - require:
            - pkg: postfix
    cmd.wait:
        - name: postmap /etc/postfix/sasl_passwd
        - user: root
        - watch:
            - file: /etc/postfix/sasl_passwd
        - watch_in:
            - service: postfix
{% if pillar.get('mailgun_alias_file', False) %}
/etc/aliases:
    file.managed:
        - source: salt://{{ pillar['mailgun_alias_file'] }}
        - user: root
        - group: root
        - mode: 600
        - template: jinja
        - require:
            - pkg: postfix
    cmd.wait:
        - name: newaliases
        - user: root
        - watch:
            - file: /etc/aliases
{% endif %}

How this works

What it does is pretty simple. We install postfix and sasl, then push out a main.cf template that adds the configuration necessary to have postfix use Mailgun as an SMTP relay. Those configuration options are well documented by Rackspace in this post on Configuring Postfix for Mailgun and Sendgrid. You just need to add this to the bottom of the default main.cf:

smtp_sasl_auth_enable = yes
relayhost = smtp.mailgun.org
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps=hash:/etc/postfix/sasl_passwd

Our Salt state then creates the file /etc/postfix/sasl_passwd using a template that looks like:

smtp.mailgun.org {{ username }}:{{ password }}

Which is built using the Pillar data we created above and can be customized on a per host/role basis if necessary. Then, optionally, we can define an aliases file which will replace /etc/aliases and run the standard newaliases command. We're using cmd.wait here to only rebuild the sasl_passwd map and the optional aliases if those files have changed. The watch_in requisite then restarts postfix if we add or change sasl_passwd.
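
Once the state has been applied, one quick sanity check is to push a single message through the local postfix and make sure it arrives via Mailgun. A rough sketch using the Python standard library (the addresses are placeholders):

# relay_check.py -- send a single test message through the local postfix relay
import smtplib
from email.mime.text import MIMEText

msg = MIMEText('Test message relayed through Mailgun via the local postfix.')
msg['Subject'] = 'Mailgun relay test'
msg['From'] = 'root@example.revsys.com'   # placeholder sender
msg['To'] = 'you@example.com'             # placeholder recipient

server = smtplib.SMTP('localhost')
server.sendmail(msg['From'], [msg['To']], msg.as_string())
server.quit()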

Targeting

To actually deploy this against a server, all you need to do is adjust your /srv/salt/top.sls to include the state like so:

base:
  'example.revsys.com':
    - mailgun

Obviously along with whatever other targeting configuration you have setup already.

Hope this helps you get SMTP relaying with Mailgun up and running a bit faster!

]]>/>
$90 million is that the best we can do?http://www.revsys.com/tidbits/90-million-best-we-can-do/Government contracts are broken and make it less likely smart tech companies will even participate.Tue, 14 Jan 2014 21:12:39 +0000http://www.revsys.com/tidbits/90-million-best-we-can-do/

Yesterday one of our team posted a link in our work chat channel to a post by Philip Greenspun on how Accenture recently landed a $90 million per year contract to take care of healthcare.gov, joking that we should have bid on it.

I don't think anyone can dispute that the original contractors bungled the job severely. [1] However, before I rant on why this contract disgusts me let's get a few things out of the way:

  • We all know that working inside a big bureaucracy slows everything down and basically sucks. Design by committee. A dozen or so project managers. Pre-meetings to get ready for the real meeting, etc.
  • While I've only had a few small dealings with HIPAA I'm sure it was a big pain, for better or worse.[2]
  • The process required coordinating data, feeds, and integrations with several different government and private health insurance systems to one degree or another. Legacy systems, speaking legacy protocols, on legacy hardware. Not an easy task, but as someone who has presided over dozens and dozens of weird systems integrations not exactly Nobel Prize worthy stuff.
  • They had to deal with ancient IE browsers and a bunch of other crap those of us working with more modern tech don't bother with any longer. Grandma isn't likely running the latest Chrome.
  • It was a project with a high public profile and was assured to get traffic that by comparison makes most large startup launches look like my personal blog. And no matter how awesome it might have gone, it was going to get some level of complaints.

Some cost comparisons

All that aside, I'm confident $90 million is easily more than double what this should cost. Let's look at some quick comparisons:

We sent two robots to Mars for $820 million. When they performed better than expected we extended their missions over 5 years for about $100 million. Apparently it costs roughly twice as much per year to travel to and explore Mars with freaking robots controlled from earth as it does to sign up a few million people on a website.

While on the topic of space, and not to sound like an Elon Musk fanboy, a SpaceX Falcon 9 rocket only costs $57 million.

So let's break it down a bit in actual costs for something like this.

  • Let's assume they need a lot more hardware than I think, let's call it $10 million a year in hardware costs[3].
  • Toss in another $10 million a year to cover bureaucracy, lawyers, lobbyists, etc.
  • Add in another $10 million a year in consultants and anything else I'm forgetting for good measure.
  • Average US programmer salary in 2011 was $72k, so let's call it $250k/engineer/year to round up because fuck it why not . So another $40 million of the budget would get you 160 extremely highly paid engineers[4].

And we're still at only $70 million with these inflated estimates.

So what's wrong?

When I read Greenspun's post it reminded me of a conversation I had a few years ago with a colleague. He's tangentially involved in state level healthcare policy and mentioned there were a few contracts out for bid that he thought RevSys might be interested in. The exact details escape me, but one involved a few web forms for doctors and hospitals in the state to check on the enrollment of patients in some Medicare program. It would have involved integrating a few legacy state systems, but otherwise pretty straight forward. He asked how much something like this would cost and I ball parked it around $200-300k assuming the integrations would be painful and that the project would eat up tons of time in meetings. His response floored me, he laughed and said "Oh you'd need to bid at least a few million or they won't take you seriously."

The whole process of bidding and then winning/running a project with the government is a feat unto itself. I've been loosely involved with a few proposals, usually as the "tech expert" and not the party actually putting together the full proposal materials, and I can tell you it eats up an inordinate amount of time and energy just to bid on the project.

The process almost assures that the company best at the process wins rather than the one that might deliver the best or most cost effective technological solution. Luckily I've never been a part of a winning proposal as I imagine the day-to-day to be equally grueling and professionally unfulfilling.

When the government announced they were bringing in "tech's best and brightest" I rolled my eyes, assuming they would fail miserably at getting decent talent. However they brought in Red Hat, which was a pleasant surprise. But as many others have commented on, it was too little and way too late.

Most projects like this aren't sexy. They certainly aren't pushing the envelope tech challenges. So perhaps they can't attract even the Top 10% of tech talent either because the problem is boring or they're too busy trying to strike it rich with a startup. But what about the next 10-20% below them on the tech totem pole?

Is $90 million the best we can do? Should I be happy it wasn't $120 million? Does it really all boil down to boring problems and the burden of the process? We all know it's broken. The question is can it be fixed and how do we go about it?

Until then I'm sure our company, and many companies like us, will, as Jeff Hammerbacher was quoted as saying, continue "thinking about how to make people click ads".[5]

If you have ideas on how to fix this mess you can find me on Twitter @fwiles or via email frank@revsys.com .

Footnotes

[1] Besides the widely reported outages and bugs the site has experienced, it STILL gets a low C rating on simple performance tools such as YSlow. It doesn't take a lot of imagination to think how badly the internals were constructed in terms of performance.

[2] What I mean here is that with most laws and even corporate policies the original intention is usually good. The spirit if you will, but you quickly get mired in details and wording and CYA behavior.

[3] I'd ball park it in the $2-3 million dollar range and I consider that an inflated estimate. $10 million would work out to 475 c3.8xlarge On Demand AWS instances running for the entire year. Which would be about 51k ECUs, 27 TBs of RAM, and 148 TBs of mirrored SSD instance storage! Even if we shift some of those monies around to S3, EBS, and traffic charges it's still a metric ton of horsepower. This works out to be roughly 2 Pinterests or about 4 circa 2011 Instagrams in terms of hardware.

[4] From US News . If we used the actual average you could have a tech staff of 555 people FYI.

[5] The full quote is "The best minds of my generation are thinking about how to make people click ads. That sucks." from a Business Week article.

CYA: Cover Your Ass
ECU: Elastic Compute Units

]]>/>
Grove isn't dead, it's only restinghttp://www.revsys.com/tidbits/grove-lives/Grove isn't dead, it's only resting. We've acquired Grove!Fri, 28 Sep 2012 21:54:28 +0000http://www.revsys.com/tidbits/grove-lives/

The news is out: we acquired Grove today, and we're taking over operations immediately.

We think Grove is an incredibly cool tool: it's an IRC server masquerading as a web-based chat application. It's got a really slick web UI that anyone can use, but behind the scenes is IRC, a proven technology trusted by über-geeks everywhere. We love the way Grove bridges the new and the old.

We're first and foremost Grove users, so we think we've got a good grasp of what Grove's other users want. In the short term our focus is going to be on fixing connectivity and improving site performance, but after that we've got some big ideas we're planning to implement. And of course we'd love to hear your ideas for what Grove should do; let us know at team@grove.io.

We're really glad that we could give Grove a home at Revsys. We think it's a great match for our company, and we can't wait to start working on our big ideas.

If you're already a Grove user, I hope you're as excited as we are to hear that it's got a bright future. And if you're not already using Grove, why not check it out ? We think you'll love it as much as we do.

]]>/>
Three things you should never put in your databasehttp://www.revsys.com/tidbits/three-things-you-should-never-put-your-database/There are several things that are not usually appropriate to put into a relational databaseTue, 01 May 2012 21:24:03 +0000http://www.revsys.com/tidbits/three-things-you-should-never-put-your-database/

As I've said in a few talks, the best way to improve your systems is by first not doing "dumb things". I don't mean you or your development staff is "dumb"; it's easy to overlook the implications of these types of decisions and not realize how bad they are for maintainability, let alone scaling. As a consultant I see this stuff all of the time and I have yet to ever see it work out well for anyone.

Images, files, and binary data

Your database supports BLOBs, so it must be a good idea to shove your files in there, right? No, it isn't! Hell, it isn't even very convenient to use with many DB language bindings.

There are a few problems with storing files in your database:

  • read/write to a DB is always slower than a filesystem
  • your DB backups grow to be huge and more time consuming
  • access to the files now requires going through your app and DB layers

The last two are the real killers. Storing your thumbnail images in your database? Great, now you can't use nginx or another lightweight web server to serve them up.

Do yourself a favor and store a simple relative path to your files on disk in the database or use something like S3 or any CDN instead.
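
In Django, for example, a FileField already works this way: the database column holds a relative path and the bytes live on disk, or in S3 via a custom storage backend. A minimal sketch (model and path are hypothetical):

# models.py -- the database stores only a relative path, never the bytes
from django.db import models

class Attachment(models.Model):
    name = models.CharField(max_length=100)
    # The column ends up holding something like 'attachments/2012/report.pdf';
    # the file itself lives under MEDIA_ROOT or in S3 with a custom storage.
    upload = models.FileField(upload_to='attachments/%Y/')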

Ephemeral data

Usage statistics, metrics, GPS locations, session data: anything that is only useful to you for a short period of time or that frequently changes. If you find yourself DELETEing an hour's, day's, or week's worth of some table with a cron job, you're using the wrong tool for the job.

Use redis, statsd/graphite, Riak, or anything else that is better suited to that type of workload. The same advice goes for aggregations of ephemeral data that don't live for very long.
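
With redis, for instance, expiry is built in, so short-lived values clean themselves up instead of needing a DELETE cron job. A sketch using the redis-py client (the key names are made up):

# ephemeral.py -- short-lived values with a built-in lifetime, no cleanup cron needed
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Session-ish data that disappears on its own after an hour
r.setex('session:abc123', 3600, 'serialized session data')

# A per-hour page view counter that expires after a day
key = 'pageviews:2012-05-01-14'
r.incr(key)
r.expire(key, 86400)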

Sure it's possible to use a backhoe to plant some tomatoes in the garden, but it's far faster to grab the shovel in the garage than schedule time with a backhoe and have it arrive at your place and dig. Use the right tool(s) for the job at hand.

Logs

This one seems ok on the surface and the "I might need to use a complex query on them at some point in the future" argument seems to win people over. Storing your logs in a database isn't a HORRIBLE idea, but storing them in the same database as your other production data is.

Maybe you're conservative with your logging and normally emit only one log line per web request. That is still a log INSERT for every action on your site, competing for resources your users could be using. Turn up your logging to a verbose or debug level and watch your production database catch on fire!

Instead, use something like Splunk, Loggly, or plain old rotating flat files for your logs. The occasional hassle of inspecting them in odd ways, even to the point of having to write a bit of code to find your answers, is easily outweighed by the constant load that database logging puts on your system.
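
If plain rotating flat files are enough for you, the Python standard library will even handle the rotation itself. A minimal sketch (the path and sizes are arbitrary):

# app_logging.py -- log to a rotating flat file instead of your production database
import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    '/var/log/myapp/requests.log',   # placeholder path
    maxBytes=50 * 1024 * 1024,       # roll over around 50MB
    backupCount=10,                  # keep the last 10 files
)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))

logger = logging.getLogger('myapp.requests')
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info('GET /some/path 200 0.023s')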

But wait, you're a unique snowflake and your problem is SO different that it's ok for you to do one of these three. No, you aren't, and no, it really isn't. Trust me.

]]>/>
PEP712 - Proposal to make unittest2 more accuratehttp://www.revsys.com/tidbits/pep712-proposal-make-unittest2-more-accurate/Thu, 20 Oct 2011 19:35:56 +0000http://www.revsys.com/tidbits/pep712-proposal-make-unittest2-more-accurate/
PEP: 712
Title: Proposal to make unittest2 more accurate
Version: a58437babcaa
Last-Modified: 2011-10-20T14:40:01.661119 (Thu, 20 Oct 2011)
Authors: Frank Wiles frank@revsys.com , Jacob Kaplan-Moss jacob@jacobian.org , Jeff Triplett jeff@revsys.com
Status: Draft
Type: Humor
Created: 20-Oct-2011
Python-Version: 2.7

Introduction

This PEP describes a proposal to make unittest2 output more accurate and fun.

The Proposed Solution

Upon having more than 7 failing tests in a test run replace all 'F' character output with 'U's for the remainder of the test run.

Rationale

This output formatting more accurately describes the mental state of the developer.

Example

.....FFFFFFFUUUUUUUUUUUUUUUUU
CK
----------------------------------------------------------------------
Ran 25 tests in 3.14159s
FAILED (failures=19)


Optional Add-On

Replace next passing test after 19 failures with 'CK'.

Reference Implementations

See Reddit.com

References

@f7u12
F7U12

]]>/>
longjmp() 2011 - Impromptu Partyhttp://www.revsys.com/tidbits/longjmp-2011-impromptu-party/Fri, 22 Apr 2011 18:30:35 +0000http://www.revsys.com/tidbits/longjmp-2011-impromptu-party/

Lawrence has been having a bunch of downtown block parties lately, which are always great fun. The last couple of days the Kansas Relays have been going on, and the long jump event was set up directly outside our office windows. Since we had such a great view, we quickly threw together a little party and invited all of our geeky friends to come watch. We had a pretty good turnout (free beer works, apparently). Here are some photos I took from our office. You can find more professional quality photos of the event here.

Thanks to everyone for coming out!

]]>/>
We're hiring!http://www.revsys.com/tidbits/were-hiring/Thu, 07 Apr 2011 21:52:42 +0000http://www.revsys.com/tidbits/were-hiring/

UPDATE: I completely forgot to update this post until now, but we have filled this position.

We're looking for a full time developer. If you're reading this you probably know what we do, so you'll obviously need to know Django pretty well. Knowledge of PostgreSQL and devops skills are a big plus. We would prefer if you lived in Lawrence, Kansas, obviously, but a few visits here may be fine. Only because we get lonely.

Lawrence is an awesome town, so don't let the surroundings (the rest of Kansas) scare you off. It's the hometown of Django and basketball. People often compare it to a smaller version of Austin or Portland, OR. For example, check out the local music happening this weekend on the great Lawrence.com .

Interested? Email frank@revsys.com with your resume and code samples. Links to public repositories are ideal, also feel free to talk yourself up with respect to Open Source contributions. Docs, code, answering questions on a mailing list, your StackOverflow account, etc. are all good things to bring to our attention. We probably shouldn't need to tell you this, but if you've contributed code to Django you might want to lead with that!

]]>/>
Django 1.3 is out - time to upgrade!http://www.revsys.com/tidbits/django-13-upgrade/Wed, 23 Mar 2011 19:41:25 +0000http://www.revsys.com/tidbits/django-13-upgrade/

Nearly a year in the making, Django 1.3 is now shipping. It includes a ton of bugfixes along with a bunch of major new features:

  • Class-based views.
  • Better support of Python’s logging tools.
  • A new tool to help with handling static files.
  • Greatly improved testing utilities via the unittest2 library.
  • Configurable on-delete behavior.
  • And more!

To help people get a jump on upgrading, I'll be holding a webinar next week . We'll talk about the new features, go over the steps to follow for a safe and easy upgrade, and cover the "gotchas" to avoid as you upgrade.

You should join us — March 31 from 1-3 (central). It'll be a blast.

]]>/>
The big secret project I've been working on...http://www.revsys.com/tidbits/big-secret-project-ive-been-working/Mon, 07 Feb 2011 17:07:11 +0000http://www.revsys.com/tidbits/big-secret-project-ive-been-working/

With the official press release out the door, I can finally start talking about the project that has been consuming most of my time for the last several months!

StoryMarket is an online system that makes finding, buying, selling, and sharing content a la carte easy. It's primarily focused on journalists and the news industry, but we expect there to be a large contingent of bloggers and other new media content creators from all sectors. StoryMarket was developed as a partnership between Revolution Systems and The World Company, publisher of the Lawrence Journal-World, the same great company that brought us our beloved Django.

The news industry is in trouble: with declining revenues and increasing costs, the old content syndication models are proving to be more than smaller organizations can bear. You can think of it as a mashup of eBay or Etsy and iTunes with a strong social component similar to LinkedIn. Sellers are given extremely flexible tools for managing their individual prices and rights, from being really open and cheap to being really restrictive and expensive, and everything in between.

For example, I could give this particular blog post a price of $20 to all StoryMarket users, $10 for users in the 'Django' group, $5 to my Mom, and make it free to Steve Holden (because you have to keep the Steve Holden happy). As for rights, I could choose to only allow it to be used in print and not online, or vice versa. I could also exclude certain competitors from even being able to view the content inside of StoryMarket, let alone be able to purchase the rights to republish it.

But it's not limited to textual content. StoryMarket also supports images/photos, audio, video, and data sets. The uses of each should be obvious, with the possible exception of data sets: organizations could, for example, share polling data in say CSV format between each other, or demographic data they've collected, or anything else that might be useful to another party.

What I think will be most interesting to watch about this project is the "long tail" of the content. If the Lawrence Journal-World writes a story about a new organic beer being made at the local FreeState Brewery , many other publishers wouldn't think there would be a secondary market for that story, being so locally focused. But they would be wrong, beer related blogs might want to re-run the story, treehugger.com might want it, and so might zymurgy related magazines.

Or imagine a college football game and the thousands of press photos taken at one. Only a couple make it into print or online, but I would wager most of them are in focus. I'm sure these other shots, even if not the best or most interesting shots of the game, might find buyers in the featured player's home town paper.

Many of you might be saying, "But doesn't the industry already do this today?". They do, but not with the frequency they should be due to all the friction in the process. You have to hunt down someone to talk with at the original organization, wander your way through voicemail hell, and hope they even respond. Then comes the price negotiation, faxing a contract around, etc, etc. Could be days before you're given the rights you wanted last week. With StoryMarket we make this a simple e-commerce transaction so if you find a story you want at 3:14 AM on a Sunday morning, you can have it live on YOUR site a few moments later.

So why is this project of interest to the Django community? Three main reasons jump out:

  1. Larger funded projects like this help employ Django developers. While I did most of the original development, we've been really lucky to snag two huge Django stars Malcolm Tredinnick and Danny Greenfeld to help us finish up the last bits before our public beta launch. And we managed to entice the awesome Greg Newman to do the design.
  2. Projects like this also bring small hidden benefits to the community as a whole. Because of this project there are a couple of new re-useable Django Apps ( django-tos , django-app-metrics , and django-taggit-suggest ) out there for everyone to use and build upon. We've also contributed back several bug fixes to various Open Source projects we use in StoryMarket.
  3. While Django is used for all sorts of projects these days, it's very heavily used in the news industry from which it came. By helping keep journalism strong and vibrant we're also indirectly helping Django itself.

I think it's important for all commercial projects like this to highlight their contributions back to our community. Not only for the warm fuzzy feelings, but to help other developers out there convince their superiors that it is ok and/or even very useful to Open Source the generic reusable aspects of their work. This needs to become more common place and the best way to do that is to talk about it publicly.

So what app/libraries/patches has your company released lately? Go shout their praises so we all know about it!

]]>/>
Giving Back to the Communityhttp://www.revsys.com/tidbits/giving-back-community/REVSYS sponsors Read The Docs and PyCon 2011Mon, 31 Jan 2011 22:02:00 +0000http://www.revsys.com/tidbits/giving-back-community/

We're big fans of open source, so we like to give back to the community whenever we can. Having a strong community is actually personally and professionally selfish: a vibrant community not only helps ensure we keep getting to eat on a regular basis, but it also means there is more useful code available for us to use.

Neither Jacob nor myself would be in the position we're in if it wasn't for the open source community. So even though we're a small company, in 2011 we're trying to really step up the ways we give back to the community.

Office Hours

We'll be holding another one of our IRC "office hours" in #revsys on Freenode this coming Friday, February 4th, from 3-5pm Central. We held our first a few months ago and it was not only successful, but fun so we're doing it again. We hope to have enough time to do this at least a few times a year.

Sponsoring PyCon

Where would we be without Python? Sure we could have spent this money on a new hot tub for the office, or perhaps adding another runway to the RevSys HQ airstrip, but we figured helping PyCon be awesome might be a better use of our money. So we signed up as a sponsor, joining some other great companies (and few of our clients).

Sponsoring readthedocs.org

It's no secret that we're huge documentation nerds . We've been really enjoying the Sphinx documentation toolkit, so we were super excited when we saw Read the Docs launch after the 2010 DjangoDash . If you haven't checked it out, you really should: It makes it trivial to setup a spot on the web for your docs. It even gives you free searching and doc versions based on tags in your repository.

We think this is a great project that Eric , Charles , and Bobby put together, so when they approached us to help cover their server costs we said "hell yes." We're looking forward to the new features they're working on, and we're honored to be able to help out.

]]>/>
2010: A Year in Reviewhttp://www.revsys.com/tidbits/2010-year-review/A review of what REVSYS did in 2010Sat, 01 Jan 2011 00:48:00 +0000http://www.revsys.com/tidbits/2010-year-review/

2010 was a great year for REVSYS and for Jacob and I, both personally and professionally. Here are some of the highlights:

  • Jacob gave a ton of training classes around the US in cities such as NYC, Boston, Los Angeles, and Atlanta. Between his training schedule and speaking engagements he managed to log close to 60,000 miles of travel, hitting 31 cities in 4 countries this year. I'd list links to all of Jacob's talks this year, but unfortunately our webserver doesn't have enough storage for that large of a list! (Ok I'm lying, I'm just lazy and running late for a New Year's Eve party.)
  • We gave a half day tutorial at OSCON on deploying Django .
  • I gave a talk at DjangoCon and we helped sponsor the conference this year. It was also the first time I got to meet Steve Holden
  • We were able to work with some of the best developers out there this year on projects, people such as Malcolm, Danny, Daniel, and Travis. Danny even camped out with us at Chez REVSYS for a few months while he taste tested Lawrence.
  • We refreshed our site with the amazing help of Greg Newman and Ed Frazier .
  • Jacob bought a farm. Note: "a farm" not bought the proverbial "farm".
  • I fell in love with some new bits of tech, most notably Redis (thanks to Simon), virtualenv, and virtualenvwrapper. As a consultant juggling many projects at once, the simple 'workon' aspect of virtualenvwrapper alone is enough of a reason to switch from buildout. Tip: it's even great to use on entirely non-Python related projects to create any sort of easily switchable shell environment.

So what's in store for next year? Well we've got a big announcement coming out soon about a project I've been working on for most of 2010. We're helping sponsor PyCon this year, so Jacob and I will be there. I'll also likely be at both the US and EU DjangoCons this year and I'm sure Jacob will be as well (travel/schedule/farming permitting).

Thanks all around to our clients and the wonderful Open Source communities we're lucky to be a part of. 2010 was a great year because of you!

]]>/>
Office hours transcript postedhttp://www.revsys.com/tidbits/office-hours-transcript/Transcript of the REVSYS Office Hours from November 2010Sun, 07 Nov 2010 23:05:00 +0000http://www.revsys.com/tidbits/office-hours-transcript/

We held our first open "office hours" this past Friday, November 5th. We've taken the IRC transcript, organized and cleaned it up a bit, and posted it online for those who couldn't make it.

We had about fifty people show in IRC where they asked questions on a wide variety of topics including deployment, NoSQL, cloud computing, and contributing to Django. It was a blast! A few of my favorite questions:

We had a great time with this, and will certainly be doing it again in the future. If you'd like to hear about future office hours, follow us on Twitter .

Also, if you'd like a chance to ask questions like this in person, you might want to attend one of my Django classes in LA and Boston next month.

]]>/>
Two new Django classes: deployment and the ecosystemhttp://www.revsys.com/tidbits/django-classes-dec/Django Training Classes Thu, 04 Nov 2010 23:16:00 +0000http://www.revsys.com/tidbits/django-classes-dec/

Want to buff up your Django chops? Learn some cool new tricks for the new year? You should come to one of my new Django classes next month! I'll be teaching two new one-day classes, both of which evolved from common questions we get here at Revsys. Each class is going to be offered twice, once in LA and once in Boston.

For the impatient here are some links and quick info:

For more details on each class, read on...

The first class attempts to answer this question: "So... I've written this cool Django site, and it works on my laptop... but now what?"

We've increasingly been spending our time helping clients learn how to successfully deploy Django, so I've taken the things we've learned and turned them into the Django Deployment Workshop . You're invited to bring your own Django sites and I'll show you how to put them into production. At the end of the class, you'll walk away with a production environment that could easily serve most small-to-medium Django sites.

To sign up:

The second class focuses on Django's app ecosystem. Our clients tell us that they love all the apps available out there, but that they have a really hard time finding the best ones and figuring out how to use them. So I've put together The Best of the Django Ecosystem, a class that tries to answer both questions. We'll look at some of the coolest tools out there in the ecosystem (including South, Celery, and Haystack) and learn how to integrate them with an existing site. Again, you're invited to bring your existing Django site; I'll give plenty of time and hands-on help getting these apps wired up.

Dates and details:

Early-bird registration is open now . You'll save over $100 if you register before November 22nd, so get on it. There's also a combo ticket available in both cities if you'd like to attend both classes and group rates if you'd like to bring the whole team.

Oh, and to answer the inevitable FAQ: yes, we'd love to bring these classes to other cities! We're looking at a bunch of locations for 2011, and if you'd like to throw your hometown in the ring we've got a quick little survey to help us figure out where to host future classes.

If you have other questions, please get in touch , or drop into our office hours tomorrow afternoon.

]]>/>
Open "Office Hours"http://www.revsys.com/tidbits/open-office-hours/REVSYS held an open office hours meeting. Tue, 02 Nov 2010 16:55:00 +0000http://www.revsys.com/tidbits/open-office-hours/

We've been wanting to try this for awhile, but have been too busy to actually try it out until now. Jacob Kaplan-Moss and I will be holding open "office hours" this Friday November 5th between 1-3 PM CDT in IRC. Come join us in #revsys on freenode and pick our brains. Or just come to listen and learn.

The concept is pretty simple: Jacob and I (and likely many other knowledgeable Djangonauts) will be in the channel and will attempt to answer your questions related to:

  • Django/Python
  • Deployment
  • Scaling/Performance
  • PostgreSQL
  • Anything else we happen to know...

We're going to log the discussion and will provide a transcript online afterwards.

Update: The transcript is now available.

]]>/>
Centralized logging for fun and profit!http://www.revsys.com/tidbits/centralized-logging-fun-and-profit/Fri, 27 Aug 2010 00:12:23 +0000http://www.revsys.com/tidbits/centralized-logging-fun-and-profit/

Setting up a centralized log server using syslog isn't as hard as many may believe. Whether it's logs from Apache, nginx, email services, or even your own Python applications, having a central log server gives you many benefits:

Benefits of centralized logging

  • Reduces disk space usage and disk I/O on core servers that should be busy doing something else. This is especially true if you want to log all queries to your database. Doing this on the same disk as your actual database creates a write for every read and an extra write for every write.
  • Removes logs from the server in the event of an intrusion or system failure. By having the logs elsewhere you at least have a chance of finding something useful about what happened.
  • All of your logs are in one place, duh! This makes things like grepping through, say, Apache error logs across multiple webservers easier than bouncing around between boxes. Any log processing and log rotation can also be centralized, which may delay your sysadmin from finally snapping and killing everyone.

Syslog Review

In case you aren't terribly familiar with how syslog works, here's a quick primer. Syslog separates out various logs using two items: facilities and levels. Here are the standard facilities:

  • 0 kernel messages
  • 1 user-level messages
  • 2 mail system
  • 3 system daemons
  • 4 security/authorization messages
  • 5 messages generated internally by syslogd
  • 6 line printer subsystem
  • 7 network news subsystem
  • 8 UUCP subsystem
  • 9 clock daemon
  • 10 security/authorization messages
  • 11 FTP daemon
  • 12 NTP subsystem
  • 13 log audit
  • 14 log alert
  • 15 clock daemon
  • 16 local use 0 (local0)
  • 17 local use 1 (local1)
  • 18 local use 2 (local2)
  • 19 local use 3 (local3)
  • 20 local use 4 (local4)
  • 21 local use 5 (local5)
  • 22 local use 6 (local6)
  • 23 local use 7 (local7)

For each facility, logs are sent at a particular level. The levels are:

  • 0 Emergency: system is unusable
  • 1 Alert: action must be taken immediately
  • 2 Critical: critical conditions
  • 3 Error: error conditions
  • 4 Warning: warning conditions
  • 5 Notice: normal but significant condition
  • 6 Informational: informational messages
  • 7 Debug: debug-level messages

So for any given log message you set these two options to give a hint as to where the log should be directed. For example, if an email server receives a new message it would likely be logged as mail.info, and a kernel panic would be sent as kern.emerg.

The receiving syslog server then can be configured to direct log messages of a certain facility and/or log level to various files. For example, a default Ubuntu system has some settings like this:

daemon.*        /var/log/daemon.log
kern.*          /var/log/kern.log
mail.*          /var/log/mail.log

But you can also do more granular separation. For example, you might want to log mail.err to a separate file from the main mail log to make it easier to spot new errors:

mail.*        /var/log/mail.log
mail.err      /var/log/mail-errors.log

Setting up your central server

Configuring the master log server is pretty easy. On Ubuntu the default syslog server is rsyslog, and that's what I'll be using as an example here. You'll need to edit /etc/rsyslog.conf and uncomment the UDP module. You could also use the TCP module, but that one binds to all of your interfaces, so you will need to restrict access to it with iptables (or some other mechanism) in order to not allow hackers to fill up your disks remotely. Your configuration should now contain these uncommented lines, where x.x.x.x is an internal, protected IP address:

$ModLoad imudp
$UDPServerAddress x.x.x.x
$UDPServerRun 514

And then restart rsyslogd. See, that wasn't so hard...

Setting up the remote log sending servers

Setting up your remote servers is even easier. If you want to send ALL of your logs to the central server, it's just a matter of adding one line to the top of /etc/rsyslog.d/50-default.conf. That line is:

*.* @x.x.x.x:514

This will send all logs of any facility and any level to the server. Note that the local syslog will, as configured by default, still log locally. So if you don't want that, be sure to remove the rest of the configuration in this file.

You can also get fancy here and keep some logs on the local server and only send some things remotely. For most of your custom apps and logs you'll want to be using the LOCAL[0-7] facilities. Let's say we want to centrally log our Python logs and Apache error logs. We'll use LOCAL0 and LOCAL1 for them respectively. That config would look like:

local0.* @x.x.x.x:514
local1.* @x.x.x.x:514

Keep in mind, however, that most systems have *.info, *.debug, etc. configurations set up, so you might be duplicating your data. If you poke around this file you'll see lots of configurations ending in .none; this instructs rsyslog to not include those facilities in that particular file. So, for example, you'd want to edit your /var/log/syslog entry to resemble this:

*.*;auth,authpriv,local0,local1.none        /var/log/syslog

Additional help and features

While most applications are easy to set up for use with syslog, here are some pointers for more info on the subject:

  • Apache supports sending error logs to syslog via the ErrorLog syslog:local1 configuration option. However, it does not support sending access logs directly. To do that, you'll need a small script to pipe your access logs through.
  • For more information on setting up your own Python code to use syslog, check out the logging.handlers.SysLogHandler handler for the logging module; a quick sketch follows below.
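
To make that Python bullet concrete, here's a minimal sketch of pointing the logging module at the central server over UDP, using the LOCAL0 facility from the config above. The "myapp" logger name is made up for the example, and x.x.x.x is the same placeholder address used throughout this post.

import logging
from logging.handlers import SysLogHandler

# Ship log records to the central rsyslog box over UDP port 514,
# tagged with the LOCAL0 facility so the server can route them.
handler = SysLogHandler(address=("x.x.x.x", 514),
                        facility=SysLogHandler.LOG_LOCAL0)
handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))

logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Arrives at the central server as a local0.info message.
logger.info("user %s logged in", "frank")

With the local0.* line from earlier in place, these messages land on the central log server instead of filling up the local disk.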

We've only really scratched the surface of the features of rsyslog with this setup. You can configure it to do some fairly advanced separation of logs based on the sending host, application name, and various other aspects of the message itself. Refer to the rsyslog documentation for more information on that.

Happy Logging!

]]>/>
Investing in Yourself - A review of Django 1.1 Testing and Debugging by Karen M. Traceyhttp://www.revsys.com/tidbits/investing-yourself-review-django-11-testing-and-debugging-karen-m-tracey/Tue, 06 Jul 2010 18:20:05 +0000http://www.revsys.com/tidbits/investing-yourself-review-django-11-testing-and-debugging-karen-m-tracey/

Packt Publishing recently asked me to read and review Django 1.1 Testing and Debugging, and I have to admit I really enjoyed reading this book. Often I find myself debating whether or not to purchase a new development book. I'm usually thinking, "If I spend $XX.XX on this book, will I really learn anything worth that much?", especially considering most answers are a few Google searches away. I can happily attest this book is definitely worth the cost.

The book starts off with the usual introduction to testing, discussing both doctests and unittest, which is obviously required for a book on this subject. However, this book differs greatly from many in that it walks you through testing and debugging your application as you would when building a real application. Many tech books strive to be a great tutorial, but often fall short of the mark and end up just being some verbiage around a rehashing of the available documentation. They end up being more reference than tutorial. Django 1.1 Testing and Debugging, however, does a great job of walking you through real world scenarios. For example, it covers topics (and in the proper order, in my opinion) like:

  • Test coverage
  • Everything you would ever want to know about the default Django debug page
  • How to debug urls.py issues
  • How debugging differs between the development server and what to do in production
  • Getting ahold of what database queries were actually generated on a given page
  • Using the wonderful Django Debug Toolbar
  • Using logging to help debug your code
  • Effectively using the Python Debugger
  • How to report Django bugs

All of these tools and techniques should be in all of our development arsenals, but this is the first book I've seen that puts them all together in a way that is accessible to Django developers of nearly any skill level. Beginners should read the book, even if they don't understand everything right away, just to know what options they do have at their disposal. However, more advanced developers will likely find a few golden nuggets of wisdom that will pay productivity dividends for years to come.

Hence the title of this post, Investing in Yourself. Packt was nice enough to give me a free copy of the book to review, but had I paid full list price, the time savings from the few little nuggets of wisdom that I either didn't know about or that had never quite clicked for me before would have paid for the book many times over in just the first few weeks.

I highly encourage you to pick this book up and use it to improve your code going forward. You can check out a sample chapter of the book to whet your appetite a bit.

]]>/>
Early registration for our Advanced Django class ends soonhttp://www.revsys.com/tidbits/early-registration-advanced-django-ends-soon/Tue, 16 Feb 2010 22:32:44 +0000http://www.revsys.com/tidbits/early-registration-advanced-django-ends-soon/

Early registration ends Friday for the March Advanced Django Class I'm teaching, so if you're planning on coming, you should sign up soon!

I'm really excited about this class: we'll get to dive really deep into the good parts of Django. We'll cover all sorts of advanced usage of Django's APIs, spend a bunch of time playing with all the cool stuff out there in the Django ecosystem, and actually spend a whole day setting up and configuring a real-world deployment stack.

Many more details are over on the class description page, along with contact info if you've got any questions.

]]>/>
Django Training in 2010http://www.revsys.com/tidbits/django-training-2010/Django Training in 2010Mon, 18 Jan 2010 23:28:00 +0000http://www.revsys.com/tidbits/django-training-2010/

The new year's shaping up to be a great one for Django: Django 1.2 is on track to ship this March, and there's no doubt in my mind that it'll be the best release ever.

Because of this, we expect to see a lot of new people wanting to learn Django, so we're stepping up our training offerings in the new year.

This Friday, I'll be holding a one-day Django workshop in New York City. The class is mostly full, but a few spots are still available. This is the second in a series of workshops we've been holding in conjunction with HoldenWeb, and we plan to offer more (see below).

In March, I'll be teaching a week-long advanced Django class in Kansas City. This class is perfect for folks who know Django but want to really learn all the ins and outs. We'll look at a whole bunch of advanced uses of Django, learn how to use optional add-ons like GeoDjango, Piston, Haystack, and even dig into the internals of Django a bit. We'll also spend a whole day setting up a real-world Django production environment and tuning it for performance.

I'm really looking forward to this class: there are some really fantastic bits buried deep in Django that I don't get much call to talk about.

Finally, we're currently scouting locations and topics for future classes and would like your input! If you'd be interested in attending a future Django class, there's a quick questionnaire you can fill out.

]]>/>
Django 1.0 Template Development Reviewhttp://www.revsys.com/tidbits/django-10-template-development-review/Thu, 17 Sep 2009 17:52:52 +0000http://www.revsys.com/tidbits/django-10-template-development-review/

Django 1.0 Template Development by Scott Newman actually surprised me with its depth of coverage of the topic.

I assumed it would be written with the absolute Django Template beginner in mind. While it is definitely an appropriate book for beginners, it also covers more advanced topics such as:

  • Writing custom template tags and filters (Chapter 7)
  • Pagination, which is covered quite well (Chapter 8)
  • Customizing the Django admin's look and feel (Chapter 9)
  • A good designer-level introduction to caching (Chapter 10)

This book should definitely be required reading for web designers who are looking to use Django. As someone who has read and re-read the wonderful Django documentation many times, I think this book does a great job of distilling all of that reference knowledge into book form.

]]>/>
Django Traininghttp://www.revsys.com/tidbits/django-training/Django Training 2009 by JKMThu, 16 Jul 2009 18:14:00 +0000http://www.revsys.com/tidbits/django-training/

I'm pleased to announce that this September, Revolution Systems will be offering a four-day Django training course here in Kansas City, taught by Jacob Kaplan-Moss. Jacob has taught a number of similar courses privately in the past, but this is the first time we're opening this up to the public.

If you've been wanting to really learn Django, this is a great chance. The class is intended for relative beginners to Django, but by the time you leave you'll know enough to create some pretty awesome websites.

The class will be Monday, September 21 - Friday, September 25, 2009 in Kansas City, MO.

For more, or to sign up, check out the full details.

]]>/>
Some quick updateshttp://www.revsys.com/tidbits/some-quick-updates/Mon, 30 Mar 2009 15:15:46 +0000http://www.revsys.com/tidbits/some-quick-updates/

It's been a busy and exciting week for us. Jacob has been at PyCon in Chicago, where he is participating in a number of panel discussions and giving quite a few talks as well. Right now I imagine he's neck deep in code at the Django sprint, helping to finish up the upcoming 1.1 release. If you're running a production site built with Django, you should absolutely check out the talk he is giving with James Bennett on Real World Django.

My week has been busy hacking away on several client projects and moving my main work machine to a shiny new MacBook Pro (can't recommend these highly enough), and I was also interviewed by Daniel Dern of Business Trends Quarterly in his post about scaling and performance titled For Scaling, Brains May Beat Brawn. We talk about how just throwing more money and hardware at a problem is not always the best solution. Often there are architectural, design, and/or configuration changes that can bring significant cost savings to your project, both in terms of the hardware necessary to keep everything flowing and in ongoing system maintenance labor costs. I'm not talking about premature optimization evils or complicating things for your admin; often these changes are transparent to day-to-day operations, but certainly not to your bottom line. For example, just using the proper RAID levels and physical disk configurations for your particular PostgreSQL database can be a huge win in performance.

I also added a tidbit of wisdom in an advice post to budding entrepreneurs called 163 Ways How To Become An Entrepreneur.

]]>/>
Welcome Jacob Kaplan-Mosshttp://www.revsys.com/tidbits/welcome-jacob-kaplan-moss/Jacob Kaplan-Moss joins REVSYS as a partner.Thu, 05 Mar 2009 13:02:00 +0000http://www.revsys.com/tidbits/welcome-jacob-kaplan-moss/

I'm very pleased to announce that Jacob Kaplan-Moss has joined Revolution Systems to head up a new line of services around the ever-growing Django web development framework. First up are commercial Django Support Plans, but look for more Django-related offerings in the near future.

Jacob has been a good friend of mine since before Django was even released. It was a pleasure to work with him at our previous day jobs, and I'm very excited for the future ahead. Not only is he obviously an authority on Django, he's an amazing developer and generally an expert on all things tech. Jacob and Adrian are both great examples of how to lead an Open Source project and grow a real community around it.

By offering Django Support packages we hope to help drive adoption of Django in the business world, which in turn helps grow the community at large.

]]>/>
ORD Camp a Huge Successhttp://www.revsys.com/tidbits/ord-camp-a-huge-success/Tue, 03 Feb 2009 07:30:39 +0000http://www.revsys.com/tidbits/ord-camp-a-huge-success/

I was lucky enough to be invited to attend ORD Camp this last weekend in blisteringly cold Chicago. ORD Camp is an invite-only, FooCamp-style unconference targeted at geeks living in the Midwest. Having never attended a FooCamp-style event, I wasn't sure what to expect. I can now say that if you ever have the opportunity to attend an event like this, it is well worth your time.

As you can see from the attendee list, it was a very diverse group of people, not just the usual crowd of notable Open Source geeks. The amount of brain power in that room was simply amazing, and I can't remember when I last had as much fun. Some sessions were presentations, others were just focused discussions, covering everything from how words work, brewing beer, and life hacking to what not to do as a startup.

While I loved the sessions, the most fun was getting into random conversations (some ended up being NSFW after midnight and many beers; others were more typical). I spent some time talking with people about PostgreSQL's advantages over MySQL, alternative business models, how a certain entrepreneur might improve the performance of their servers, etc.

It is difficult to determine how important this conference will be to my business in the future, but I can easily say that it has increased my drive, ambition, and overall excitement level. Passionate people doing amazing things will do that to you! I can't wait to attend next year.

]]>/>
ResumeBucket.com Launcheshttp://www.revsys.com/tidbits/resumebucketcom-launches/Mon, 29 Dec 2008 18:19:16 +0000http://www.revsys.com/tidbits/resumebucketcom-launches/

I hope everyone had a great holiday this year. For the past few months I've been working on an online resume site, ResumeBucket.com, and I need your help taking it for a test drive. Our goal is to create a site where you can upload your current resume in Word form, build a new resume using our online resume creation tool, or even just type in what you want using our online text editor.

The site gives you a unique URL you can give out to friends and prospective employers so they can instantly access an up-to-date copy of your resume. There are also options for them to download a copy of your resume in Word or PDF format. For an example of how your resume can look, see our CEO's resume page.

Unlike the other resume services out there, employers are able to search the database without paying any huge fees, which will drive more qualified employment leads to your inbox.

Please take a few minutes to kick the tires. You can leave feedback here in the comments or e-mail me at frank@revsys.com. Thanks!

]]>/>
Why isn't PostgreSQL using my index?http://www.revsys.com/tidbits/why-isnt-postgresql-using-my-index/Reasons Postgres might not be using the INDEX you created. Tue, 16 Sep 2008 15:16:00 +0000http://www.revsys.com/tidbits/why-isnt-postgresql-using-my-index/

This is a common question I see from PostgreSQL users. In fact, someone was just in IRC asking it and it prompted this post. The exchange usually goes:

Steve: I have this column foo and I have an index on it, but when I do SELECT * FROM table WHERE foo = X, EXPLAIN doesn't use the index. What am I doing wrong?

90% of the time the answer is, unfortunately, "Because the query optimizer is smarter than you are." Or maybe it's fortunate, depending on how you think about it!

In this user's particular case he was mocking up a database schema and had only one row in the table he was querying against. People who are more familiar with PostgreSQL will probably roll their eyes at the question, but if you put yourself in the user's shoes you can see how someone would be confused by this. They are thinking, "I put an index on there on purpose, why the hell isn't it working?"

PostgreSQL's query optimizer is smart, really smart, and unless you have evidence otherwise you should trust what it is doing. In this particular case, the optimizer realizes that if a table has only a few rows, using the index is actually slower than just spinning through the entire table. Just because PostgreSQL isn't using your index today with a small number of rows does not mean it won't choose to use it later when you have more data or the query changes. Because he was just mocking up a design he didn't have real-world data, which is almost always a bad way to performance tune your system unless you are very familiar with how PostgreSQL behaves.

Now there are other reasons why it might not be using the index. If you have lots of data and the query you're running looks like it would benefit from the index, it might be a simple matter of forgetting to run an ANALYZE on the table or not having autovacuum turned on. Until PostgreSQL re-analyzes your table it doesn't have up-to-date statistics about your data, so it can't properly weigh the index when building the query plan.
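
If you want to watch this happen, here's a small sketch of checking the plan before and after a fresh ANALYZE. The accounts table, email column, and connection string are hypothetical; the same EXPLAIN statements work just as well straight from psql.

import psycopg2

# Connect to a scratch database; adjust the DSN for your environment.
conn = psycopg2.connect("dbname=example")
conn.autocommit = True
cur = conn.cursor()

def show_plan():
    # EXPLAIN prints the plan the optimizer would pick right now.
    cur.execute("EXPLAIN SELECT * FROM accounts WHERE email = %s",
                ("someone@example.com",))
    for (line,) in cur.fetchall():
        print(line)

show_plan()                      # tiny or never-analyzed table: likely a Seq Scan
cur.execute("ANALYZE accounts")  # refresh the planner's statistics
show_plan()                      # with enough rows, this may switch to an Index Scan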

While performance tuning PostgreSQL is much easier and better documented than in days gone by, it can still be very confusing and time consuming for the inexperienced. If your business needs help tuning its systems, you might consider my PostgreSQL Tuning Service.

]]>/>
Fret Free -- Introduction to Django and the Django Software Foundationhttp://www.revsys.com/tidbits/fret-free-introduction-to-django-and-the-django-software-foundation/Wed, 20 Aug 2008 17:07:40 +0000http://www.revsys.com/tidbits/fret-free-introduction-to-django-and-the-django-software-foundation/

LinuxPro Magazine just released my latest article, an introduction to Django and some discussion about the newly created Django Software Foundation. Being a lifelong Perl user, I didn't think I would enjoy Django at all. I have to admit that it is a VERY polished system. It has great PostgreSQL support; in fact, the core developers smartly prefer it over MySQL for their own systems.

You can download a PDF copy of the article at Fret Free -- Django and the Django Software Foundation. The print issue will hit the stands in October. Hope you enjoy it!

]]>/>
Installing Apache2::Request on a 64-bit systemhttp://www.revsys.com/tidbits/installing-apache2request-on-a-64-bit-system/Tue, 17 Jun 2008 07:44:02 +0000http://www.revsys.com/tidbits/installing-apache2request-on-a-64-bit-system/

I usually shy away from using 64-bit systems unless there is a clear need for it. But with more and more hosting companies installing 64-bit Linux distributions by default, I sometimes have to suffer through.

Today I ran into a very small bug when installing the Apache2::Request module via the CPAN shell. This simply installs the default libapreq2 library with the Perl bindings for you. When attempting to start Apache I received this error:

Can't load '/usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/auto/APR/Request/Request.so' for module APR::Request: libapreq2.so.2: cannot open shared object file: No such file or directory at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/DynaLoader.pm line 230.
at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/APR/Request/Param.pm line 27
Compilation failed in require at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/APR/Request/Param.pm line 27.
BEGIN failed--compilation aborted at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/APR/Request/Param.pm line 27.
Compilation failed in require at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/Apache2/Request.pm line 2.
BEGIN failed--compilation aborted at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/Apache2/Request.pm line 2.

Basically it is saying it can't load libapreq2.so.2. Digging around in the system, I found the libraries had been installed in /usr/lib instead of the proper /usr/lib64. The actual shared library was compiled for 64-bit, but was just installed in the wrong location. A quick symlink like this fixes the issue:

ln -s /usr/lib/libapreq2.so.2 /usr/lib64/libapreq2.so.2

Hope this helps you get over this small issue.

]]>/>
Django Software Foundationhttp://www.revsys.com/tidbits/django-software-foundation/Tue, 17 Jun 2008 06:38:38 +0000http://www.revsys.com/tidbits/django-software-foundation/

The Django guys have scored another milestone today with the creation of the Django Software Foundation. Being a huge Open Source advocate, I love hearing great news like this. You can read more about the foundation at:

On an unrelated note, an article I wrote for LinuxPro Magazine last November about using Perlbal was recently put online. You can read it at The Juggler -- Let the nimble Perlbal webserver keep your traffic in balance.

]]>/>
Test Driven Development and Getting Things Donehttp://www.revsys.com/tidbits/test-driven-development-and-getting-things-done/Fri, 16 May 2008 15:54:26 +0000http://www.revsys.com/tidbits/test-driven-development-and-getting-things-done/

I'm sure someone, somewhere, has already determined this. It's probably been blogged about before, but I just realized something today: test-driven development dovetails nicely with David Allen's time management book Getting Things Done. Creating your tests first, with or without even marking them as TODO tests, builds you a TODO list of sorts for your project.

Other than it just being a different way of doing things, I never really took to test-driven development for small or personal projects. Most of my projects are web applications, and "unit testing" modules never seemed worthwhile because you weren't easily able to test the database and web interactions. Well, I can easily say that Perl Testing: A Developer's Notebook by Ian Langworth and chromatic has shown me the way.

I had always focused too much on ensuring that 'make test' could be run from anywhere, and simply did not think about the fact that I could automate most of the work of building a test environment and, when I couldn't, bail on those tests so they would be skipped. For example, you could automate creating your PostgreSQL database, but what if PostgreSQL isn't installed on the server, or your user isn't allowed to create databases? You just bail out of those tests (a quick sketch of the idea follows below).
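
The book's examples are in Perl, but the "bail out when the environment isn't there" idea translates to any test framework. Here's a rough sketch of the same thing in Python, purely for illustration; the psycopg2 dependency, connection string, and test names are all hypothetical.

import unittest

# Probe the environment once; tests that need PostgreSQL get skipped
# instead of failing when it isn't available.
try:
    import psycopg2
    psycopg2.connect("dbname=postgres").close()
    HAVE_POSTGRES = True
except Exception:
    HAVE_POSTGRES = False

class ReportTests(unittest.TestCase):

    @unittest.skipUnless(HAVE_POSTGRES, "PostgreSQL not available")
    def test_monthly_report_query(self):
        # Would create a scratch database, load fixtures, and run the query.
        self.assertTrue(HAVE_POSTGRES)

    @unittest.skip("TODO: write this once the export format is settled")
    def test_csv_export(self):
        pass

if __name__ == "__main__":
    unittest.main()

The skipped TODO tests double as the project's to-do list, which is exactly the Getting Things Done tie-in.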

Now that I think about it, I was definitely too concerned with getting 100% of the tests to pass 100% of the time, in all possible scenarios.

]]>/>
Want to be a better manager?http://www.revsys.com/tidbits/want-to-be-a-better-manager/Tue, 11 Mar 2008 06:24:03 +0000http://www.revsys.com/tidbits/want-to-be-a-better-manager/

I've been reading a great book recently titled First, Break All the Rules: What the World's Greatest Managers Do Differently. I highly recommend it for anyone who manages employees, but it makes two great points early on that are especially appropriate for technology managers:

Treat your Employees Differently

You should treat your employees differently. Each has unique strengths and weaknesses and differs in the way they learn, and you should capitalize on that, not try to homogenize everyone. Sally might be an awesome systems architect, so you don't want to weigh her down pounding out mundane features. Bob might be the fastest developer, and giving him those same mundane features makes sense. Steve, on the other hand, might be your most efficient debugger, etc.

They also learn in different ways: some by reading, others by doing, and some need a more traditional classroom environment. Sending a doer to a lecture class is a waste of everyone's time.

Don't Let Human Resources Get in the Way of Your Hiring

I've touched on this topic before, but you can't leave the resume collection and filtering process to your HR department. Even in our more tech-savvy age, I still hear horror stories from programmers about how they lost out on being interviewed for a job because the HR person didn't know that J2EE or J2ME were 'Java', that Ubuntu is a Linux distribution, or that having 6 years of mod_perl experience probably means they know Perl pretty well.

A great technology manager has to hire a person based on several things, one of which is experience with a particular technology or technologies. But that's only part of it. Troubleshooting skills, breadth of experience in many technologies, personality, and how this new hire will fit into the overall team structure are just as important. The worst possible case is when the manager is barely or not at all involved in the hiring process.

Now there are exceptions to this. The best example is my friend Josh Stomel, who runs NeoHire. Josh is one of those rare recruiters who actually understands technology and, more importantly, knows when he doesn't know something and has a huge network of geeks like me to ask. You can just hand NeoHire a list of qualifications and they will bring you back a group of top-notch talent.

So it is possible to have HR deeply involved in the hiring process; it just has to be the right people.

]]>/>
Building mongrel on Solarishttp://www.revsys.com/tidbits/building-mongrel-on-solaris/Fri, 29 Feb 2008 17:28:08 +0000http://www.revsys.com/tidbits/building-mongrel-on-solaris/

I keep running into this same situation, so I figured I would write a post for everyone's benefit. When trying to build recent versions of mongrel or other gem dependencies such as fastthread on a Solaris system, the automatically generated Makefile is all sorts of messed up.

This testing was done on two different Joyent Accelerators; your mileage may vary on other Solaris installations.

The steps are pretty easy:
1. Attempt to install the gem via the normal 'gem install mongrel'
2. This will fail, but will download the gem for you.
3. Move into your gems directory. On these systems that is /opt/csw/lib/ruby/gems/1.8/gems/
4. Move into the particular gem you are building that is causing you trouble. In this case mongrel-1.1.3
5. Move into the ext/ directory and the subdirectory that can't be built. You'll see it in the error messages from step 1.
6. Edit the generated Makefile as such:

  • Make sure CC is pointing to your C compiler; I had to edit this to reference gcc rather than cc.
  • Make sure LD is pointing to a real ld tool. On this system that meant changing ld -G to gld -G.
  • Edit CFLAGS to be CFLAGS= -O3 -I/path/to/libs. Essentially you just need to remove the -KPIC flag and all of the -x options from the generated file.

7. Run make install
8. Copy the gemspec file to /opt/csw/lib/ruby/gems/1.8/specifications so it will be registered with the gem tools.

Hope this helps.

]]>/>
PostgreSQL version 8.3 Releasedhttp://www.revsys.com/tidbits/postgresql-version-83-released/Mon, 04 Feb 2008 17:39:06 +0000http://www.revsys.com/tidbits/postgresql-version-83-released/

I just got word that version 8.3 of PostgreSQL has been released. Along with the usual improvements, there are some new features in 8.3 that should be of interest to PostgreSQL admins and developers, such as:

  • Integrated TSearch
  • ENUM and UUID data types
  • Faster sorting technique used for LIMIT operations
  • Faster LIKE and ILIKE operations
  • Lazy XID assignment, which will make many read-only operations much faster

Check out the full list of features at the PostgreSQL site or download it from the download section of their site.

]]>/>
EveryBlock.com is now launchedhttp://www.revsys.com/tidbits/everyblockcom-is-now-launched/Thu, 24 Jan 2008 17:50:11 +0000http://www.revsys.com/tidbits/everyblockcom-is-now-launched/

My friend and former co-worker Adrian Holovaty and his team just launched their new project EveryBlock.com. EveryBlock takes the term hyperlocal to a whole new level. They aggregate tons of public data sources by geographic location so you can, for example, find all of the recent crime around a particular address, neighborhood, ZIP code, etc. Or maybe you might be interested in the building code violations where you live or work?

Right now they have San Francisco, Chicago, and New York up and running, but will be adding more cities as time goes on. Adrian asked me to help performance tune their PostgreSQL database a couple of months ago and so far things seem to be humming along nicely.

Here are some links to other blogs talking about EveryBlock.com:

Congrats and the best of luck to EveryBlock! I'm sure we'll see even more new and interesting things from this team in the future!

]]>/>
Some interesting linkshttp://www.revsys.com/tidbits/some-interesting-links/Fri, 18 Jan 2008 09:31:13 +0000http://www.revsys.com/tidbits/some-interesting-links/

Hope all of my readers have recovered from the holiday season. Here are a couple of links I've come across recently but neglected to write about during the holidays.

Zed Shaw, author of mongrel, which is used by many Ruby on Rails applications, posted an interesting rant about the state of Rails development and the personalities of some of the major players.

I've never been a huge fan of Rails because I've never been shown a compelling reason to switch away from the more mature Perl equivalents. Lately I've been working with mongrel and Rails apps more and more, as many of my clients have switched to that as their platform of choice. The more I work with it, the less fond of it I am; it really is just immature compared to other platforms out there. Toward the end he mentions that the Rails creator himself has an application that had to be restarted 400 times per day... with some fixes it only has to be restarted 10 times a day. 400 is just insane, but 10 is still very troubling. If the creator can't make things run better than that, it really doesn't speak well for the platform. Normally I wouldn't link to such a huge rant, but when it comes from a major Rails contributor there has to be at least some truth to it.

My friend Stas Bekman has created a new site where you can make it known how you want a better version of something. It's called i-want-a-better.com. I think it's a great way for people to vent their frustrations with products, but I think the really winning idea is that entrepreneurs can peruse the site for new products and untapped markets. Hop on the site and let us all know what's frustrating you!

]]>/>
Making the software written in any language more readablehttp://www.revsys.com/tidbits/making-the-software-written-in-any-language-more-readable/Thu, 17 Jan 2008 08:47:50 +0000http://www.revsys.com/tidbits/making-the-software-written-in-any-language-more-readable/

There are two very simple ways to improve the readability and maintenance of the software you write. They are so simple they are often ignored in favor of more complicated tools and the various programming methodologies people blather on about. This comes from our human nature to think our own problems are more special and complicated than they really are and from not following the KISS principle.

So how do you improve your software?

By using better names and being consistent. It really is that simple, which is probably why it is overlooked. A development manager might score some points with his boss by switching to Agile programming, having some scrums, doubling the amount of developer documentation, or maybe even switching to a whole new platform like Ruby on Rails. But who scores any points by saying, "We're going to do better about naming things appropriately"?

Appropriate Naming

Obviously having a variable named temp_user isn't all that descriptive, but it is an improvement over just temp or simply t. Most, if not all, programmers realize this. However, you will often see variables named clt when they should really be named client, as if those three extra characters were single-handedly going to cause carpal tunnel. Your variable names should be as descriptive as possible without being absurd about it. A variable named the_current_user_object_we_are_working_with is obviously overkill. But perhaps current_user_object or even current_user is appropriate.

Your functions and methods should be named with the same amount of care. Naming a function fix_client doesn't tell us anything useful; we can only speculate that something is wrong with the client or the data and that this function does something about it. normalize_client_name is a much better name when you read that all the function does is properly set the case of the letters in the client's name; a small sketch of the difference follows below.
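
To make that concrete, here's a tiny, hypothetical example (shown in Python, though the point is language-agnostic): both functions do the same thing, but only one of them tells you what that is.

# Hypothetical example: the same behavior, named two different ways.

def fix_client(c):
    # "Fix" how? And what is c? The reader has to go dig.
    c.name = c.name.title()
    return c

def normalize_client_name(client):
    # The name says exactly what happens: the client's name gets
    # consistent capitalization.
    client.name = client.name.title()
    return client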

The names you choose for your libraries are also very important. This is one of the reasons programmers find Perl's CPAN much more useful than other programming languages' library collections: things are named and categorized (for the most part) sanely. Need something related to e-mail? Check the libraries in the Mail category. Can you guess what Net::SSH, IO::File::CompressOnClose, and WWW::Google::News do? Yeah, I thought you could.

If I stumble upon your use of a library called Util, it doesn't tell me anything; I still have to go look at the library to see how it fits in with this code. If it had been named something like DB::Util or Client::Utils, I would at least know the library is probably related to the database or the client.

Consistency

Consistency is another area where you can improve your code base without much effort. If all of your classes contain an initialization method, they should all be named the same: not initialize in some classes and init in others. Things should be consistent throughout your code base, not only within a single application. If possible, the source in your department should be consistent with other departments and business units.

Consistency through conventions is one of the main reasons people like web frameworks like Ruby on Rails. I'm not advocating Ruby on Rails necessarily; you can accomplish these same things in any language.

Things that should be consistent:

  • variable, function, and method naming conventions (underscores or CamelCase, but not both)
  • frequency and layout of comments
  • documentation
  • configuration file syntax
  • on disk layout of source code, binaries, configuration files, etc.
  • installation, upgrades and package management
  • language(s) used for development

The point being: the more consistent you are in how you build an application, the easier it is to get down to the task at hand, be it new development or bug fixing. Got a configuration file format you like and have already written libraries to parse it? Then use it EVERYWHERE, in every single darn application you build. No one has to learn the new format to start developing, no time is spent discussing which format is better, and another developer in your organization can jump in to fix bugs or add a new feature.

At the last development shop I ran, we focused on being consistent. 95% of the time we were building web applications that were either used internally by fellow employees or externally by our customers. If you saw the source to one of our applications, then all of our other applications would seem very familiar. One might be a ticket tracking system and the other an accounting package, but the layout, coding style, and use of common libraries let the developers dive right in and not worry so much about how this particular app was written differently than the others.

Consistency in your applications also makes refactoring easier. If all of your applications use a particular technique, library, etc. in the same way, replacing it with a new tool you have fallen in love with is much easier. If everyone is doing everything even slightly differently, you have to worry more and more about how the change might impact your code.

I'm definitely a "right tool for the right job" sort of fellow, but mixing and matching tools and techniques for every single application you build is a recipe for disaster. One shop I know (name withheld to protect the guilty) has two developers: one who works entirely in Perl, the other entirely in Ruby. Another application I saw was written mostly in Java, but with a smattering of C# and Python around the edges for kicks. None of these three were chosen from a "right tool for the job" perspective, but simply because those were the favorite languages of the specific developers tasked with those sub-systems. These are obviously worst case examples.

These ideas aren't new, I'm positive I am not the first to use them or even write about them. But I see these two simple rules violated so often by programmers of all experience levels that I felt the need to reiterate them.

]]>/>
Log Buffer #65: a Carnival of the Vanities for DBAshttp://www.revsys.com/tidbits/log-buffer-65-a-carnival-of-the-vanities-for-dbas/Fri, 05 Oct 2007 16:00:00 +0000http://www.revsys.com/tidbits/log-buffer-65-a-carnival-of-the-vanities-for-dbas/

Welcome to the 65th edition of Log Buffer, the weekly survey of database-related blogs.

First let's start with some miscellaneous entries that could be of interest to any DBA. Crazy DBA has an interesting post about how attending conferences helped to grow his professional network, which in turn has made him a better DBA. And Thomas Kyte has a great post about why it's the data, not the application itself, that matters. Brian Aker gives us a great link to a Werner Vogels entry on Dynamo, one of the key technologies used behind the scenes at Amazon.

Oracle users will certainly find these two links of interest. First off, Frederik Visser shows you how to play with Oracle 11g RAC in VMWare. And Alex Gorbachev has a nice write-up about Miracle Open World.

SQL Server DBAs might enjoy the following posts. If you're thinking about using or upgrading to Idera SQLsafe v4.5, you'll want to check out Sean McCown's post about some of that product's issues. Steve Jones has some thoughts on monitoring and alerting with your SQL Server that are valid for any database. Need to know when your SQL Server instance was started? Check out Joe Webb's tip on how to find out. And Mladen Prajdić has some advice on how to notify a client in a long running process with SQL Server.

MySQL users will appreciate this post on accurately measuring how far behind your slave is lagging. Over at the MySQL Performance Blog there is an opportunity to ask questions of Heikki Tuuri, the creator of InnoDB, and Peter has some thoughts on a few serious bugs in the MySQL 5.0 release. Kaj Arnö has an interesting post on how MySQL GmbH and MySQL AB help birds of a feather to flock together, quite literally, and on how they have opened up the call for papers for the 2008 MySQL Users Conference.

Kevin Burton talks about how to avoid swapping insanity with InnoDB. Want a free MySQL Magazine? Lewis Cunningham has found one for us all. Jan Kneschke introduces us to the Wormhole storage engine for MySQL. Not really sure how useful it is, but it is definitely interesting.

Hubert Lubaczewski has written a great tool to help you determine the optimal layout of tables, indexes, etc. on your various tablespaces for PostgreSQL. Robert Treat follows up with some additional thoughts to consider.

Joshua Drake has announced the speakers and topics for the PostgreSQL Conference Fall 2007, which is October 20th 2007 at Portland State University. Greg Sabino Mullane has a nice explanation of why you can't use prepared queries when using DBD::Pg and pg_bouncer. And to finish out this week's links, Francisco Figueiredo Jr mentions that PostgreSQL will have a UUID data type in version 8.3.

Enjoy!

]]>/>
Safely making changes to remote systemshttp://www.revsys.com/tidbits/safely-making-changes-to-remote-systems/Thu, 20 Sep 2007 17:17:50 +0000http://www.revsys.com/tidbits/safely-making-changes-to-remote-systems/

I stumbled across this post about a great technique for safely making configuration changes to remote systems. Everyone has done it before: the "Oops, I just shut down sshd" moment, or turning off the wrong network interface. Ask Bjørn Hansen shows you how you can use atd to give yourself a safety net when doing these types of operations.

]]>/>
Under-used CPAN Moduleshttp://www.revsys.com/tidbits/under-used-cpan-modules/Tue, 11 Sep 2007 16:20:40 +0000http://www.revsys.com/tidbits/under-used-cpan-modules/

Perl often suffers from its history. While it is great that there are so many online tutorials and code samples out there for new Perl developers to learn from, they sometimes miss out on all of the useful new code that has been created since those articles were written. Here are a few CPAN modules that are, in my opinion, under-used by new developers.

These modules are great for several reasons. The most compelling ones are that there is less code for you to write and debug, the resulting code is often clearer to your fellow developers, and in many cases the module will outperform your hand-hacked code.

List manipulations

There are two list-related modules you should familiarize yourself with. They are List::Util and its cousin List::MoreUtils.

Need to shuffle the values in an array? Don't reinvent the wheel, use List::Util's shuffle function.

use strict;   
use warnings;

use List::Util qw( shuffle );

my @array = qw( 1 2 3 4 5 6 7 8 9 );   
my @shuffled = shuffle( @array );

Need to make a list only contain unique values? You can use the method found in the Perl Cookbook, or you can use the simpler uniq() function from List::MoreUtils like so:

use strict;
use warnings;

use List::MoreUtils qw( uniq );

my @array = qw( 1 1 2 2 3 3 4 4 5 5 );   
my @unique_values = uniq( @array );

Not to mention List::MoreUtils' implementation is done in C, so it is much faster than a Pure Perl implementation. I once saw this improve the performance of a large web application by 5% because the developers were uniquing several lists for each page view.

Be sure to also check out the other functions in these two modules such as any(), all(), first(), max(), etc. While you will have to install List::MoreUtils from CPAN, List::Util is part of Perl core.

Merging Hashes

While you can use the simple:

my %merged_hash = ( %hash_one, %hash_two );

Sometimes you need a bit more power; enter Hash::Merge. It gives you several options on how to handle conflicting keys, based on a left/right precedence or by storage method, and you can even control whether or not your data is cloned. I find this very useful for merging command-line arguments against a configuration file.

Speaking of configuration options and files....

Repeat after me. I will not write my own configuration parsing code. I will not invent my own configuration file format. I will not parse my own command line options without a damn good reason.

Why waste your time on the most boring part of your code? Use one of the already existing modules and configuration file formats. This saves you time, debugging headaches, and makes the configuration file syntax familiar to your users.

I strongly suggest looking into Getopt::Long for handling command line arguments. It might take you a bit of time to get used to this module, but once you're over the initial learning curve you'll wonder why you ever bothered doing this by hand in the first place.

If you like Apache-style configuration files (who doesn't?), start using Config::General and cut your configuration processing code down to a use statement and a couple of lines of code. Or, if you happen to prefer .INI-style configuration files, take a look at Config::Tiny.

]]>/>
Technique for improving code over timehttp://www.revsys.com/tidbits/technique-for-improving-code-over-time/Thu, 30 Aug 2007 17:47:04 +0000http://www.revsys.com/tidbits/technique-for-improving-code-over-time/

I was talking with a friend of mine recently about an all too common problem in software development. That problem is finding the time and resources to revisit working, but not perfect, code, for whatever your definition of "perfect" may be.

The developers feel this particular bit of code is just plain nasty. Maybe it is written in a confusing manner, poorly documented, or the landscape has changed so much that the entire architecture could use a rethink. The programmers may have thought up a better way, changed their style/best practices, or perhaps have just improved their skills since that code was written. But it works. It accomplishes the business goals.

Because it works, the management/marketing/sales folks are not very interested in making it better. They would rather the programmers focus their efforts on new features, because they don't feel they can sell code quality. "Maybe next month/year/release we can do that, but right now go knock out some of these TODO list items...", is what you typically hear.

Both sides of this struggle need to realize that they are both partly correct and partly incorrect. The developers are right in saying the code isn't perfect and could be improved. They are also wrong, because sometimes what you have is "good enough".

The biz folks are correct in their thinking that they can't directly sell the results of the programmers' effort. They can't put it on a brochure and it certainly won't get them talked about in FastCompany. But they are also wrong, because they are missing the long term impact the code quality/design will have on their future business. It is difficult for them to understand how changing, for example, to a messaging-oriented architecture helps their modularity and scalability. How it might drastically reduce future development time, improve testing, etc, etc, etc.

Arguing this out could take forever. Not to mention you will have to argue each architecture change as you find/want them.

A technique I have used with great success in the past is what I have so cleverly named "Friday Afternoons". Friday afternoons are probably the least productive part of an employee's week. They are watching the clock in anticipation of the weekend. It happens to everyone. Experienced sales people will tell you that it is basically impossible for them to get any real sales on Monday mornings and Friday afternoons.

As a compromise, I suggest management give the developers free rein to make architecture, design, and code cleanup changes on Friday afternoons. You may need to implement rules around this, such as: the changes can't impact any deadlines, they can't create any new work for others, they can't impact the product in adverse ways from the customer's perspective, all work must be done in a cleanup branch, etc.

Some of you managers out there are probably thinking, "But no way, that's 10% of our development time." If you feel this way, you're really deluding yourself. No one is 100% productive, and I would estimate no one is even 25% productive on Friday afternoons, yourself included.

So why do this? There are several reasons:

  • Developers can clean up the code, effectively making their workplace (aka the code) more enjoyable. And everyone knows that great code comes from happy coders.
  • The programmers feel their concerns are being heard. That management cares about the quality and not just the dollar signs.
  • You eliminate the time drain of the meetings arguing about these issues.
  • The coders are self-motivated in their Friday afternoon sessions. You weren't going to get much out of them anyway, but at least there is a chance that you can reap future benefits from this work.

Maybe one of those architecture changes means that a requested new feature takes 2 days to implement rather than 2 weeks. Or perhaps the code will run 5% faster or scale 2x as well. Even if you just end up with better documentation and slightly improved employee morale, this is a win.

If you end up using this technique for a while, please send me an e-mail or leave a comment here. I would love to see how this idea plays out in different environments.

]]>/>
I'm in the top 10....http://www.revsys.com/tidbits/im-in-the-top-10/Sun, 26 Aug 2007 09:03:20 +0000http://www.revsys.com/tidbits/im-in-the-top-10/

Based on this article at PerlBuzz.com, my humble little blog is in the Top 10 of all Perl-related blogs. While the author admits to having used a very simple/naive ranking system, it is still an honor to be listed.

For anyone who wants to subscribe to only my Perl-related posts, you'll want to use this RSS link. You can use the RSS feed link on the right to receive all of my posts.

I guess I better start writing more about Perl!

]]>/>
Followup to "A Guide to Hiring Programmers"http://www.revsys.com/tidbits/followup-to-a-guide-to-hiring-programmers/A follow up blog post to Frank's populate guide to how to hire programmers. Wed, 08 Aug 2007 07:11:00 +0000http://www.revsys.com/tidbits/followup-to-a-guide-to-hiring-programmers/

Please excuse my laziness, but I simply don't have time to respond to each and every person who has e-mailed or left comments on digg, reddit, or the original post itself. I would like to respond to a few of the larger themes I've seen in the questions and responses:

This applies to more than just programming

I definitely agree that this can be applied to nearly any type of job, not just programming. A great designer is worth much more than an average one. And I honestly wasn't trying to single out sales and customer service people. I do agree that a great sales person or customer service rep is worth more than the average, and should be paid accordingly. And yes, every employee is important to the company.

Using customer service as an example, I've worked with everyone from the worst, who would literally scream at the customer on the phone, to the best. The problem with customer service is that the metrics are against them: even the best customer service person can only take a few more calls/tickets than the worst, just because of the nature of the interactions. It is also very difficult to measure whether one person has pleased or retained more customers than another. With programming it is often easier to see how an individual's contributions impact the whole project.

Sales is a different monster, but hopefully this has cleared up what I was trying to express.

Only experts?

Do I think it's reasonable to hire only experts? Yes, in many situations a company can and should staff itself with a vast majority of experts. Is it possible for larger companies with larger products? Probably not. If the problem simply demands 50 developers, it would be difficult to staff that entirely with experts. However, I do believe they would see a boost if they were able to have at least 10-15 of those developers be experts. Instead most companies have 1-3 experts that lead the team of the masses.

If you can't find experts, you should attempt to hire staff that could become experts over time as they gain experience.

How do you become an expert?

Everyone is correct in saying that experts started out as novices; I was certainly a novice, and in many ways I still am. Being personally interested in martial arts, I remember a story of someone, after years of training, finally receiving their black belt in Aikido and being told, "Now you are ready to learn." I believe this is true of programming, technology, and most professions. The learning doesn't and shouldn't stop.

So how can you become an expert? I think the best advice I can give is to read up as much as possible on your field. You don't become an expert simply by repeating what you did yesterday for many years until, poof, you're an expert. You need to be learning new idioms, patterns, and tips from your peers.

Too many developers sit in their cubes and pound out code and never look up. You need to be reading up on your profession as much as possible, exploring new languages/tools to determine if you could be doing something easier or better.

An example of what I see far too often happened again recently at OSCON. A professional Python developer did not know Django was the predominant web framework for that language. I'm not a Python user, but even I know this. Maybe it's because I'm friends with the core Django team, but even if that had not been the case I would at least be aware of it and in general what it was from my day to day tech reading.

The other advice I would give is to read and become involved in an Open Source project. This improves your code quality and allows you to see how other, presumably senior, developers work. Even if you aren't able to contribute to the project directly, get on the mailing lists and examine how those developers work.

How do you find and hire experts?

I think the biggest mistake managers make is leaving this up to HR. I've always made sure I received every resume that came in for a position I was hiring for. HR will often reject a candidate because their resume says "Years of J2EE experience" rather than "Java", not realizing those describe the same skill set, so it goes in the trash. Perhaps it is time we start hiring "HR Engineers" like we have "Sales Engineers."

The first place I look when hiring programmers is the Open Source community. If they are involved in an Open Source project you can easily see how they work with others on the mailing lists, see examples of their code, etc.

They also tend to be of higher quality because Open Source is a meritocracy. Not to mention the simple act of being involved in a project, for no monetary gain, shows a strong love of their craft.

I think multiple choice tests are a very poor indicator of programming prowess. Too often they have a couple of esoteric or even trick questions that really compare the test writer's ability to confuse with the test takers' ability to decipher. It is much more important for your new hire to know how to find the answer than it is for him to actually have it tucked away in a brain cell. The ability to effectively use Google to search for the answer is much more important than many realize.

If you happen to be one of the people who are looking for an expert Perl programmer I suggest you get in touch with my new friend Uri Guttman, The Perl Hunter, at uri@perloncall.com. He specializes in finding excellent Perl programmers for companies. Being an accomplished developer himself he easily separates the wheat from the chaff and can find someone who will be a good overall fit for your organization.

Many problems are marketing and management's fault...

This is also very true. Bad management will bring down any team or project, no matter how many experts they have on staff. This isn't even restricted to technology management.

Marketing often over promises what can be delivered and demands it in an unreasonable time frame. Unfortunately most of the time we blame the developers, because long after the sale all that we see is the code and not the brochure.

My advice to marketing and management is that you bring a problem to your developers and then base your plans on when they believe they can deliver the solution. Far too often management has already determined time lines and set things in motion before the development team has even been told about the project. This is backwards. You don't schedule your building contractors before you have the proper permits or before even speaking with the architect about the project.

Even Microsoft gets this right. They realized it was much better to delay Vista until it was ready than to ship it too early just because they had originally announced a certain date.

Obviously you can't always just wait around for something to be perfect. There are always restrictions and requirements that are outside of your control. No one could move January 1st, 2000 out a few more weeks just because their Y2K cleanup wasn't done. But often I see companies attempt to move mountains to hit some arbitrary date when the largest consequence of delaying would be that everyone had to update their Outlook calendars.

My language bias

I received a bunch of comments on my use of "Perl vs Java" in the example; that simply was what we were talking about at dinner that night. I probably should have used "agile language X vs cumbersome language Y" to keep the flames down to a minimum.

You can write efficient, readable, and maintainable Perl. I've even had some notable Python programmers say that about code I was in charge of and honestly the code in question wasn't what I would consider the best of the best. I think Python is a great language, but for me personally I haven't been shown any compelling reason to switch.

You can write crappy unreadable code in any language. You can make most any language/framework/toolset scale and perform to your needs. For every "large app/website/etc" that uses language X I'm sure I can find you a comparable app/website using language Y. Any performance differences between language X and Y can usually be solved with $100 worth of extra CPU. What really matters is programmer efficiency. That is where you save money and reap benefits. I simply don't see how having to write, read, troubleshoot, and maintain 10x the number of lines of code is an efficient use of the programmer's time.

However, I do agree that you should use the right tool for the right job. Java/C++/C# are definitely the right tools in many situations. I just feel that because everyone has seen a horribly written Perl CGI ( or written one themselves ) they think this is somehow ingrained in the language and because of this Perl simply isn't an option for anything "real."

Perl is a language where the developer must use some self-control rather than having it imposed on him by his tool. Which is why Perl (or many of the more agile languages such as PHP/Python/etc) written by novice programmers is so awful. The knowledge and self-control comes with experience.

The largest problem with any language is the use of poor variable, function, class, and method names. Using adequately long and descriptive names is probably the single best way to improve code quality and no language out there enforces this. Some enforce a certain style, others force certain methodologies, but this is really only picking at less important aspects of the problem.

Company bias

By comparing Apple vs Microsoft I wasn't really singling out their development staffs. I'm sure their management, design, and marketing departments share as much of the credit or blame for any successes or failures these companies have.

What I was trying to get across was the "It simply works" factor. I would say the second most common comment I hear from Mac users, after how pretty/well designed they are, is that it "just works." I don't hear that very often from Microsoft users.

]]>/>
A Guide to Hiring Programmers: The High Cost of Low Qualityhttp://www.revsys.com/tidbits/a-guide-to-hiring-programmers-the-high-cost-of-low-quality/Sun, 05 Aug 2007 11:11:57 +0000http://www.revsys.com/tidbits/a-guide-to-hiring-programmers-the-high-cost-of-low-quality/

I was invited to a wonderful dinner party ( I swear it wasn't too spicy Sarah! ) with some St. Louis Perl peoples this week while I'm here on business. At one point we were talking about hiring programmers, specifically Perl programmers.

We agreed on the following:

  • Finding good programmers is hard in any language. And that a good programmer can be as effective as 5-10 average programmers.
  • Average pay rates between equivalent programmers are out of sync and are based more on the language used than the skill of the programmer.
  • You don't need to hire an expert in language X; you can and should look for expert programmers who are willing to learn language X. An expert can easily cross over from being a novice in a new language to being productive in a matter of a few weeks.
  • You should seriously consider allowing your expert developers to telecommute full-time. Restricting your search to programmers who live in your area or are willing to move limits the talent you can acquire. Arguments regarding "face time", productivity, etc. can easily be nullified when you look at how some of the largest and most successful Open Source projects such as Linux, Apache, and Firefox are developed by individuals rarely living in the same time zone or even country.
  • We love Perl and think it's a great language that you graduate to after you have been forced to use less agile languages such as Java, C/C++/C#, etc. Not necessarily a first language you get your feet wet with before moving on to a, cough, "real" language.

Many people in the Perl community have been writing on this topic lately, and I wanted to share my opinions on the subject, as it is one I have put many hours of thought into. I'm doing my best to keep this language agnostic, as I believe these tips can be applied to any programming language. I will, however, use Perl in some examples as it is my preferred language.

Why is it so hard to find good programmers?

The simplest reason is that when a company finds a good developer they do more to make sure that person is happy, which leads to longer tenures. Better salary, more flexible working conditions, good tools, interesting projects, and better perks can often keep a programmer working for you longer.

Another obvious reason is that experts in any field are small in number, so your possible talent pool is limited. This leads managers and HR departments to settle for average or even below average developers. I believe this is the single biggest mistake a technology oriented company can make, regarding developers, just short of not using a good version control system.

We're not talking about customer service representatives or sales people here. Just having a body to fill the seat is not, I repeat not, always a win for the company. Sub-standard programmers drag down the efficiency of your other developers with beginner questions, poor comments/documentation, and bad code that someone else will later be forced to spend time fixing.

Companies need to stop thinking about their developers as cogs in the machine. They are more akin to artists, authors, designers, architects, scientists, or CEOs. Would your HR department rush to find the first person who would be willing to take on the role of Chief Scientist, Art Director, or CEO in your company? Of course not, they would spend the time to do a thorough talent search for just the right candidate, court them, and then compensate them appropriately. They realize that having the wrong person in that seat is much worse than having the seat empty. It is absolutely the same with programming.

Anyone who has been a developer or managed developers can tell you that an expert can accomplish as much as 10 average developers. However, companies typically pay only a 10-20% premium for an expert over the average programmer, whether their title is Lead, Architect, Development Manager, Guru, or whatever nomenclature the company uses. I am not saying that if your average developer is paid $50k/year you should pony up $500k/year for an expert. The employer/employee relationship never works like that, but what employers don't seem to realize is that in the end paying more saves them more.

Let's look at an example. One common argument from HR departments is that they "can't find any Perl programmers, but they can't swing a cat without hitting a Java developer". While this is fairly accurate, they are approaching the problem from the wrong direction. If you fill your shop with 15 average Java developers, paying an average of $60k per developer, you have an approximate labor cost of $900k/year for your development staff, not considering any non-salary benefits.

Suppose you instead took the time to find 5 expert, or at least above average, Perl developers at $120k each per year. Here is a partial list of the pros and cons of such a scenario:

Cons:

  • You must spend extra time finding, evaluating, and courting these more sought after developers.
  • Your company and what the developer may be asked to build may simply not be attractive to this class of developer. Very few people want to work for a spammer or a small web design firm that caters solely to freelance accountants for example. Smart people find boring things even more boring than the masses.
  • When one of them leaves the company, there is the feeling that your company's business objectives are more at risk due to having only 4/5ths of your normal resources. Or that a larger chunk of your corporate knowledge just walked out the door. This is more of a perceived problem than an actual one as good developers are better at writing readable/maintainable code, commenting their work, and writing effective documentation.

Pros:

  • Each developer will be more content with their job, due in part to the higher than average salary, but also because his or her co-workers are of a much higher quality which improves anyone's job satisfaction.
  • Development would require less overall communication as there are fewer people to communicate with. This obviously improves efficiency, as anyone who has been on a 20+ person conference call can attest. Or read the Mythical Man Month if you want a more in-depth analysis of this phenomenon.
  • Experts travel in the same social circles. Having one expert on staff makes it much easier to find other experts in the same field, no matter what field that may be.
  • You would save 2/3rds on infrastructure costs. Things like cubicles, computers, cell phones, free lunches, training costs, travel, office space, air conditioning, electricity, etc, etc. The list is essentially endless.
  • Your HR department would have 1/3rd the number of developers that it would need to take care of. Less paperwork, fewer questions, less everything, and less turnover because of the lower number of employees.
  • Oh and you'd save $300k/year on your labor costs. Not to mention non-salary benefits such as stock options, retirement matches, health insurance premiums, perks, etc. You could spend as much as $100k/year on your talent searches and still be $200k/year ahead. Hell, you could dedicate an entire HR person just to this task.

What is an expert programmer?

Experience is key, but not necessarily in ways you might imagine. Time in the saddle with a particular language is not as important as diversity of experience. Someone who has worked in several disparate industries, a generalist, is often a much better developer than one who has spent years in the same industry. There are exceptions to this, but in general I have found this to be the case. Bonus points if your developer was a systems administrator in a former life.

Some of the best developers I know were originally trained as journalists, mathematicians, linguists, and other professions not normally associated with software development.

Experts use better tools and care deeply about their craft. They aren't assembling bits on an assembly line, they are crafting a unique product to solve a unique problem. Experts are lazy, they work smarter rather than harder. Experts prefer the easiest solution that gets the job done. Experts aren't interested in creating complex solutions simply to have the complexity, that misguided egoism is the territory of more junior developers. They often get it right the first try and almost always on the second one.

Simply put, experts write readable code. They comment and document it appropriately based on the complexity and criticality of that particular piece of code.

All of this pays huge dividends when the next developer has to pick up where they left off. Especially if the next person isn't an expert.

More reasons you want an expert programmer

Is your business technology oriented? Perhaps the software you create is even your main product. If nothing else, I'm sure we can agree that the software your developers create is to some degree critical to your business.

I've worked in many different environments, with people of every skill level, and it's very easy to tell whether or not a company has expert developers. Do you often find that the software is down? That it has as many bugs, or even just idiosyncrasies that make no sense to the user, as it does features? Do the users find it difficult to use? Is the problem at hand relatively simple compared to the training or documentation necessary to begin using the software?

If you answered yes to any of those questions you more than likely have average or below average developers.

When you work in an environment with experts things simply work. They are easier to use and require less initial training. The software is easier to modify. Requested changes happen more frequently and easily. Things just flow. It is the difference between Apple and Microsoft. It's the difference between the iPod and a 400 disc CD changer with 50+ buttons.

As with many things in life, sometimes you get what you pay for. I'd love to hear your comments and opinions on the subject.

UPDATE: I've written a response to some of the questions and comments I've received on this article in a follow up post: A follow up to "A Guide for Hiring Programmers"

]]>/>
Which PostgreSQL backend am I using?http://www.revsys.com/tidbits/which-postgresql-backend-am-i-using/Mon, 30 Jul 2007 09:25:38 +0000http://www.revsys.com/tidbits/which-postgresql-backend-am-i-using/

Someone asked me how to determine which PostgreSQL backend a particular client was connected to. Everyone's first thought is to do a ps aux | grep postgres, which will show you the IP and user, but if you have different processes connecting from the same IP with the same usernames, how do you know which is which?

One way to tell would be to see which queries are being executed by which backend and match that up to your client side. But you can quickly get confused, especially if the various connections are all executing the same SQL statements, a web application for example.

The simplest way was suggested by Jacob Kaplan-Moss, which is to use the pg_backend_pid() function like so:

SELECT pg_backend_pid();

I love it when the solution is something really simple!
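
If you want to match things up from the application side, you can grab the same value over your existing connection and log it. A minimal Perl/DBI sketch ( the DSN and credentials are made up for illustration ):

use DBI;

# Connection details are hypothetical; use your own DSN and credentials.
my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb;host=localhost',
                        'frank', 'secret', { RaiseError => 1 } );

# Ask the server which backend PID is serving this connection,
# then log it so you can match it against ps output later.
my ($backend_pid) = $dbh->selectrow_array('SELECT pg_backend_pid()');
print "Connected to PostgreSQL backend PID $backend_pid\n";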

]]>/>
Real PostgreSQL Benchmarkhttp://www.revsys.com/tidbits/real-postgresql-benchmark/Mon, 09 Jul 2007 15:50:35 +0000http://www.revsys.com/tidbits/real-postgresql-benchmark/

Josh Berkus and Sun have put together the first real PostgreSQL performance benchmark. I run into people often who are still working under dubious performance comparisons done years ago against competition like MySQL and Oracle. Hopefully this in depth comparison will put some of these arguments to rest.

If you're just interested in the conclusions, PostgreSQL is as fast or faster than MySQL and nearly as fast as Oracle. I know the performance improvements over the last few years have been nothing short of phenomenal; I'm glad to see there is now a report to back up my gut feelings.

]]>/>
SCALE Talk Slides and Audiohttp://www.revsys.com/tidbits/scale-talk-slides-and-audio/Slides and Audio from Frank's 2007 SCALE 5X talk. Mon, 09 Jul 2007 09:10:00 +0000http://www.revsys.com/tidbits/scale-talk-slides-and-audio/

I just noticed that the Southern California Linux Expo (SCALE) has my slides and an MP3 of the audio of the talk up on their site. The audio isn't actually too bad, especially considering I forgot I was being taped and moved away from the microphone during the Q&A section.

You can also find OpenOffice and PDF versions of the slides on the Revolution Systems site. Enjoy!

]]>/>
Gantry Articlehttp://www.revsys.com/tidbits/gantry-article/Fri, 22 Jun 2007 14:01:00 +0000http://www.revsys.com/tidbits/gantry-article/

Phil Crow, one of the Gantry core developers and the creator of Bigtop has a great article on how easy it is to build and modify Gantry applications using Bigtop .

I highly recommend checking it out, it shows off some of Gantry and Bigtop's best features.

]]>/>
Sprint Customer Service Experiencehttp://www.revsys.com/tidbits/sprint-customer-service-experience/Thu, 21 Jun 2007 09:26:19 +0000http://www.revsys.com/tidbits/sprint-customer-service-experience/

I've been a Sprint customer for years. I don't think any of the current cell phone providers have what I would call good customer service or a good customer experience, but Sprint has finally gotten around to really pissing me off.

It all started a few weeks ago. My cell rings and a guy announces that he's with Sprint, says my contract is about to expire, and asks if it would be a good time to discuss my plan, phone, etc. It was approximately 12:15. Right in the middle of lunch. I told him it wasn't a good time, that I was planning on getting a new Treo phone, with PalmOS, in a few months as mine is getting older, and that I'd review my plan at that point. Sprint Guy thanked me. So far not too bad.

About a week later I received essentially the same call and gave the same response hoping this Sprint Girl would mark me off as Contacted, Handled or whatever they needed to do so I wouldn't be bothered again.

I guess Sprint isn't interested in the fact that I'm not interested in them, as they have responded by increasing the frequency of their calls. It started with a couple of calls the next week, but this week has been ridiculous. They have called me with THE TWO EXACT SAME OFFERS 5 times since last Saturday. Each time I told them I wasn't interested at this time, secretly hoping this would be the last call I had to endure.

When I was called again earlier today I asked to be put on their internal "do not call list" after I explained the issue. Sprint Girl apologized and then said she would put me on the list, but that I should know it can take several weeks before that request would filter through the proper channels.

So why does this piss me off? It isn't that they called in the first place or that the person on the other end sounds so beaten down and bored reading their little scripts. It pisses me off because it proves that they do not care about me as a customer. Why do I know they don't care? Let me count the ways...

  1. Calling during lunch. Sorry if it hurts your calls per hour numbers, but you shouldn't bug your customers first thing in the morning, during their lunch, during dinner, or late at night. Please remember that you are calling AN EXISTING customer and the last thing you want to do is actively annoy them.
  2. The first call did not stop the subsequent calls. This is just plain laziness on Sprint's part. It's extra sad when you consider that they are wasting their own money re-calling people with the same offer they weren't interested in last week ( or yesterday ).
  3. They have no process to ensure they aren't calling their customers too much. I don't mean from the same list/telemarketing plan ( which should have been covered if they did #2 ), but company wide. How hard is it to put together a near real-time database of who has been called and stop any and all subsequent calls for a time period? Well I'm a software developer and I can tell you it isn't hard at all.
  4. The fact that it takes several weeks for my do-not-call request to be processed. What are they doing, chiseling it into a granite wall at the call center? Do they need to get signed approval in triplicate from some paper pushing manager? You're a technology company, start to act like it.

Sprint always seems to be having money troubles and laying off staff. What they probably don't realize is that it's marketing/customer service mistakes like this that are going to kill them in the end.

Cell phone providers are all basically the same. Coverage seems about the same from what I've read, same with dropped call numbers, and all of the usual issues. You can't really do much with stats like "We have 1.7% more coverage area" as no one cares. Every competing product works basically like yours. So what does a cell phone provider need to do? They need to make EVERYTHING about the customer experience as easy and painless as possible.

And you don't do that by having the only real contact I've had with your business in years be ridiculously annoying. Hopefully some Sprint folks will run across this post and change their ways before they lose customers like myself, who care more about ease of use and the customer experience than about brands, phones, features, or even price.

UPDATE: Looks like the amazing Seth Godin agrees with me that Sprint needs to start caring more about their customers before someone else does

]]>/>
Plat_Forms Results Announcedhttp://www.revsys.com/tidbits/plat_forms-results-announced/Thu, 21 Jun 2007 06:37:20 +0000http://www.revsys.com/tidbits/plat_forms-results-announced/

The results from the Plat_forms contest have finally been announced. You can find a brief overview and the full results here.

While the Revolution Systems team did not come out the winner, overall I'm happy with how the contest turned out. They seemed to have focused on some areas that I didn't expect, but they definitely show everyone in a fair light.

As I believe I mentioned in a previous blog post, our two biggest mistakes were strategic and not really code related. We should have opted to completely ignore the SOAP aspects of the application. We knew going in that SOAP support in Perl is pretty poor, and Phil even took some time to code up some libraries to make it a bit easier for us.

The other mistake we made was to do "big bang" integration at the end. While many developers loathe big bang integration, in our day-to-day environment it rarely if ever causes us any trouble. But leaving the integration until the end of a 30 hour work session, when we were exhausted, was a mistake. I know of at least a couple of requirements we missed simply because we forgot to link them into the site properly. The code was all there; we were just tired!

]]>/>
Email, Templates, and Perlhttp://www.revsys.com/tidbits/email-templates-and-perl/Templating emails when using PerlThu, 31 May 2007 06:54:00 +0000http://www.revsys.com/tidbits/email-templates-and-perl/

I have been meaning to talk about one of my new favorite Perl modules, MIME::Lite::TT::HTML, for quite a while now. As I mentioned in a previous post, there are a bazillion different ways to send an Email message from Perl. This one is just my new favorite.

Here is a short list as to why:

  • Can be used for complex multi-part messages and handles attachments easily
  • Built upon the equally great MIME::Lite module
  • Allows you to easily template your messages using the familiar Template Toolkit package

The templating part is, in my opinion, the important part. How many times have you had to go edit some source code just to change the text or subject of a message? Isn't that just terribly annoying? We use configuration files, MVC with HTML templates, etc, etc. to avoid hard coding things into our apps, but for some reason many people ( myself included for years ) have neglected Email.

Not any longer; I've switched to using this module as my standard way of sending Email these days. If you are interested in learning more about MIME::Lite::TT::HTML, check out my short howto Sending Email with Perl Best Practice on the subject.
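
Here is a minimal sketch of what a templated message can look like with this module; the addresses, template file names, and parameters below are made up purely for illustration:

use MIME::Lite::TT::HTML;

# Values the templates can reference as [% first_name %], [% url %], etc.
my %params = (
    first_name => 'Frank',
    url        => 'http://www.revsys.com/',
);

my $msg = MIME::Lite::TT::HTML->new(
    From        => 'frank@example.com',
    To          => 'customer@example.com',
    Subject     => 'Welcome aboard',
    Template    => {
        text => 'welcome.txt.tt',     # plain text part
        html => 'welcome.html.tt',    # HTML part
    },
    TmplParams  => \%params,
    TmplOptions => { INCLUDE_PATH => '/path/to/templates' },
);

$msg->send;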

]]>/>
PostgreSQL error messages confusing to new usershttp://www.revsys.com/tidbits/postgresql-error-messages-confusing-to-new-users/PostgreSQL errors can be confusingWed, 23 May 2007 08:37:00 +0000http://www.revsys.com/tidbits/postgresql-error-messages-confusing-to-new-users/

In the spirit of my blog post last week, I've created a new page that shows a couple of the more common error messages that confuse newer PostgreSQL users. It is my intention to expand this over time as I see people having trouble.

If you have any error messages you feel should be included or you find any technical inaccuracies please post a comment and I'll include it on the page.

]]>/>
Dumb Server Policieshttp://www.revsys.com/tidbits/dumb-server-policies/Sat, 19 May 2007 17:37:50 +0000http://www.revsys.com/tidbits/dumb-server-policies/

I was chatting with someone recently about what may truly be the dumbest server policy I've ever heard of. He indicated that the company required that:

" All company servers were required to be rebooted each day at noon and midnight".

At first you might be thinking this is an old vestige of a Windows shop policy from days long gone, but no, this included their BSD and Linux servers AND all of their desktop PCs. He also mentioned a couple of choice quotes from the policy:

" The policy has been considered to save the company thousands of dollars in fewer crashes. And synergizes with our risk management initiatives".

This just screams of a policy created by someone who doesn't understand the real underlying problem. While their heart is in the right place, I seriously doubt this saves the company money. In fact, I'm quite sure it costs them much more in early hardware failures and lost productivity when systems are offline.

The policy is almost as bad as someone instituting a mandate that requires everyone to change their password twice a day at noon and midnight in an effort to "strengthen our security". ( For the record, all that does is weaken your security as EVERYONE just has to write it down ).

The moral of the story is to handle the problem at hand with the proper solution. Restarting might be good for Windows based systems, but it is completely unnecessary in a Linux/Unix/*BSD system.

]]>/>
Common PostgreSQL problemhttp://www.revsys.com/tidbits/common-postgresql-problem/Wed, 16 May 2007 07:18:16 +0000http://www.revsys.com/tidbits/common-postgresql-problem/

I see this problem pop up in the #postgresql IRC channel so often I felt it was necessary to blog about it. This problem trips up so many new users it might even be worth changing the default error message to indicate what is going on. The error message happens when the user tries to run psql for the first time:

psql: FATAL: database "root" does not exist

Where " root" is the current Unix username of the operator. By default PostgreSQL attempts to log you into a database that is the same as your username. However, it does not setup this database for you because it would be silly to setup 500 databases for all of the Unix users on your system, if only two of them are going to be using PostgreSQL.

When setting up PostgreSQL for the first time you need to do the following:

  1. su ( or otherwise ) become your root user
  2. su ( or otherwise ) become your PostgreSQL user, typically 'postgres'
  3. Create your first database

The ultimate goal here is to become your PostgreSQL user; typically this involves becoming root and then switching to the postgres user. Upon initial setup this is the only user that is allowed to create users and databases.

Your "first" database can be created in one of two ways:

  1. Run the command 'psql template1' followed by a 'CREATE DATABASE' SQL call
  2. Run the command 'createdb <database name>'

While you're still the postgres user it is probably best to also create a user with 'createuser <username>' or a 'CREATE USER' SQL call. See this section of the PostgreSQL documentation for more information on creating users and roles. You'll also want to read up on managing databases.
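
Once the database and user exist, it also helps to connect from application code with an explicit database name so you never rely on the username default. A minimal Perl DBI sketch ( the database name, user, and password are made up ):

use DBI;

# Name the database explicitly instead of relying on the
# "database named after your Unix user" default behavior.
my $dbh = DBI->connect(
    'dbi:Pg:dbname=myfirstdb;host=localhost',
    'frank',
    'secret',
    { RaiseError => 1, AutoCommit => 1 },
) or die $DBI::errstr;

print "Connected!\n";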

NOTE: The programs createdb and createuser may not be, by default, in your PATH so it may be necessary to use locate or type in the full path to your PostgreSQL bin/ directory.

Hope this helps!

]]>/>
Gantry book released to the worldhttp://www.revsys.com/tidbits/gantry-book-released-to-the-world/Wed, 11 Apr 2007 16:12:12 +0000http://www.revsys.com/tidbits/gantry-book-released-to-the-world/

Phil Crow has released a new book on using Gantry and Bigtop. It's titled Building Web Applications with Gantry and Bigtop I encourage you to check it out!

]]>/>
Installing Mail::Cclient on RHELhttp://www.revsys.com/tidbits/installing-mailcclient-on-rhel/Thu, 22 Mar 2007 15:23:32 +0000http://www.revsys.com/tidbits/installing-mailcclient-on-rhel/

Ran into a small build problem when trying to install the CPAN module Mail::Cclient on a Red Hat Enterprise Linux system. Figured I would go ahead and document the process in full for others ( and for myself later in life ).

First off, you will most likely need to install two RPMs:

  • libc-client
  • libc-client-devel

Then you will need to download a distribution of Mail::Cclient, don't even bother trying to install this from the CPAN shell as it won't work.

Unpack your distribution with the normal tar -xvzf Mail-Cclient-x.xx.tar.gz and cd into the directory. You will then, unfortunately, have to edit the Makefile.PL by hand. Specifically, you will have to change the INC argument to WriteMakefile() to be:

**" INC"    =>   "-I$CCLIENT_DIR -I/usr/include/imap",**

This instructs the build process to look for the shared library in $CCLIENT_DIR and the headers in /usr/include/imap.

Then it is just a matter of calling:

perl Makefile.PL CCLIENT_DIR=/usr/lib
make
make install
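
Once the install finishes, a quick one-liner will confirm the module actually loads ( assuming it defines the usual $VERSION package variable ):

perl -MMail::Cclient -e 'print "Mail::Cclient $Mail::Cclient::VERSION\n"'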

Hope this helps someone else who gets bit by this annoyance.

UPDATE:
Turns out that I was grabbing an older version of Mail::Cclient ( version 1.1 specifically ). If you use Mail::Cclient 1.12, the install process requires a few other RPMs:

  • openssl-devel
  • pam-devel

And then you install it with:

perl Makefile.PL --cclient_dir=/usr/lib --with-pam --with-cclient-includes=/usr/include/imap/ --with-shared_cclient

]]>/>
DateTime: Common Calculationshttp://www.revsys.com/tidbits/datetime-common-calcuations/Mon, 19 Mar 2007 08:33:13 +0000http://www.revsys.com/tidbits/datetime-common-calcuations/

Over the course of the last year or so I've fallen in love with the DateTime set of modules. However, I think their docs are a bit lacking for some simple calculations. Often I need to know how many hours, minutes, or seconds ( or some combo of them ) exist between two DateTime objects.
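
As a minimal sketch of one way to do this ( the dates below are made up for illustration ), subtract_datetime_absolute() returns a duration expressed purely in seconds, which makes the rest simple arithmetic:

use DateTime;

my $start = DateTime->new(
    year => 2007, month => 3, day => 19,
    hour => 8, time_zone => 'UTC',
);
my $end = DateTime->now( time_zone => 'UTC' );

# Absolute subtraction yields a DateTime::Duration holding only
# seconds and nanoseconds, so no unit juggling is required.
my $dur     = $end->subtract_datetime_absolute($start);
my $seconds = $dur->seconds;

printf "%d hours, %d minutes, %d seconds\n",
    int( $seconds / 3600 ),
    int( ( $seconds % 3600 ) / 60 ),
    $seconds % 60;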

]]>/>
SMTP Connections: How to handle large loadshttp://www.revsys.com/tidbits/smtp-connections-how-to-handle-large-loads/Mon, 12 Mar 2007 17:35:08 +0000http://www.revsys.com/tidbits/smtp-connections-how-to-handle-large-loads/

I recently came across several blog posts about the declining state of E-mail due to spam. Specifically these:

I've been running E-mail servers for myself and others for over 10 years now and I have to agree that with the current version of SMTP we all use, there isn't much that can be done about spam that isn't already being done. If you've got some RBLs, SPF, anti-virus, and a decent spam filter setup, there isn't much more you can do. Sure you can get +/- another percentage point, but you won't really find a solution that is 100% effective 100% of the time. It just isn't possible with the current standards.

However, these articles also discuss another issue that is overlooked by everyone who doesn't run a large E-mail system. By large I'll say over 1,000 users. Around that point you start to run into problems being able to handle the sheer number of incoming SMTP connections. Note that you may hit this point long before 1,000 users depending on hardware and your personal traffic levels. But I digress.

I haven't found any Open Source solution to this problem, but I've come pretty close. TrafficControl is a commercial product from MailChannels that is built upon the good Open Source base of Apache and mod_perl. TrafficControl does two things for you:

  • It uses async I/O to proxy incoming SMTP connections. This has allowed me to handle 10x the number of connections on the exact same hardware
  • Allows you to configure throttling rules based on RBLs, Operating System, etc. and choke off the bandwidth to suspicious sending servers. This reduces a great deal of spam, as most bots simply move on if they are not getting good flow. After a configurable time period it will remove the choke hold and allow the message to continue on normally. Servers that follow the SMTP RFC have no problem with this as they hang on during the process.

I've been using it for about a year now and couldn't imagine trying to run a large E-mail system without it now. I encourage you to check it out if you see you are having a similar problem.

]]>/>
Early bird registration for www.read-it-later.com beginshttp://www.revsys.com/tidbits/early-bird-registration-for-wwwread-it-latercom-begins/Sat, 03 Mar 2007 13:14:18 +0000http://www.revsys.com/tidbits/early-bird-registration-for-wwwread-it-latercom-begins/

Revolution Systems will soon be offering a new web service called www.read-it-later.com.

What is read-it-later.com?

read-it-later.com is a browser based application to help you better keep up with your online reading material. It was built because I often found myself running across a long blog post, HOWTO, PDF, or otherwise large piece of text I wanted or even needed to read, but didn't have time at that particular moment.

Like many of you I suspect, I had a "TO READ" Firefox bookmark folder where I would plop these links with every intention of going back to read them. Unfortunately, links went in, but they rarely if ever came out.

Enter read-it-later.com. read-it-later.com allows you to bookmark these links into categories of your choosing and then receive a certain number of links, your Reading List, via E-mail, RSS, or ATOM feeds on a schedule of your choosing.

For example, I currently have my read-it-later.com account setup like this:

  • Send me 2 links via E-mail from my 'Work' and 'Perl' categories every morning, Monday through Friday
  • Send me 3 links via E-mail from my 'Personal' category and 2 links from my 'Business/Marketing' category every Tuesday and Thursday evenings
  • Send me 3 links via RSS from my 'Technology to Check Out' category every morning.

I have a few more categories and Reading Lists setup, but this gives you an idea of the options available to you. With this, I was able to trim down my "TO READ" folder in just a few short weeks.

I already use del.icio.us or ma.gnolia.com, will this work for me?

If you are a del.icio.us or ma.gnolia.com user you can continue to use those services to keep your bookmarks and read-it-later.com will just augment them by giving you the ability to populate your Reading Lists from tags you setup for that purpose in these services.

It works like this: you set up a specific tag in your current bookmark service, say 'rl-work', and then set up a read-it-later category that is associated with that tag and your login from the service you use. The preceding 'rl-' is just a convention we've been using; your tag can be anything you like.

Once this link between your bookmark service's tag and your read-it-later category is established, read-it-later will check that tag hourly to see if there are any new links you have added. If it finds any, they will be added to a future reading list for you automatically.

Other features available at launch:

  • Set a specific date you must read a link by and it will appear in your Reading Lists prior to that date.
  • Links are tracked by when they are actually clicked through. While we can't guarantee that you'll actually read the material, this does remove any worry about missing the material if you neglect a particular E-mail or feed entry.
  • If you find you have some extra time to do some reading you can browse your un-read links in whole or by category and get some extra reading out of the way. Links read in this manner will not be re-sent to you in another Reading List.
  • You can also have the system generate an extra, one time, Reading List you have previously defined and it will be sent to you immediately.

Will there be an API?

We plan on making available a REST style API to all aspects of the service in the coming months. We want to ensure we have the overall structure of the application baked in a bit more before we make the API public so we don't end up having to release a 2.0 version of the API too quickly and run into compatibility problems.

Pricing?

read-it-later.com will be priced at $3/month or $30/year. All users will receive a 30 day free trial to make sure read-it-later is something they find useful. No credit card or unnecessary personal information will be required to register and try out the service.

As it stands right now, your E-mail address and a password are all that are required to register and we hope to keep it that way. We hate filling out 97 different pieces of information as much as you do.

In addition, early beta testers who sign up now will receive 90 days of free service for being early adopters and to help us find any remaining bugs or user interface confusion as soon as possible. You can sign up at the site right now . The early beta registration site isn't very visually pleasing, we've focused all of our efforts on the actual site, so please bear with us.

Questions or Comments?

Got a question about the service or an idea on how we might make it better for you? Feel free to leave a comment on this blog post or send us an E-mail at info@read-it-later.com. We look forward to hearing from you!

]]>/>
Next actionshttp://www.revsys.com/tidbits/next-actions/Sun, 25 Feb 2007 10:00:50 +0000http://www.revsys.com/tidbits/next-actions/

We've all been to websites that break long articles up into multiple pages. It tends to be used mostly on news sites these days. Long ago this technique was primarily used to reduce the amount of time it took for a page to load, but now it is an attempt at increasing the site's page view count.

If a user is on, say, Page 1 of 7, there is a high probability that they will want to read the rest of the article. I'll go so far as to say that the two most probable actions for a user to take are to move on to the next page or to leave the site entirely. The second option the web designer has no real control over, but why oh why do they make the most likely option so damn hard?

Here is an example of a site that sorta gets it right:

[ Screenshot: Example 1 ]

They have small navigation in case I want to jump ahead several pages and the 'Next Page' link is at least fairly prominent.

Here is another example where a site is trying, but still gets it wrong:

[ Screenshot: Example 2 ]

I have to admit I've been using this particular site for a long time and did not notice the "CONTINUED:" link at all until I was cropping the example image above. We have all been conditioned to look for the "Next" or page number links, so my eye is always immediately drawn to the small '2' link. Which, it is sad to say, is smaller than the Digg icon. Not to mention the entire 'Digg this' link's size.

The fragment of the title, with the trailing ellipsis, serves only to confuse me. I have to think "Oh, is that a fragment of the title of the article I'm currently reading, or is it a link off to some other article on a similar subject?" What is the title of the article I'm currently reading?

Not to harp on this site too much, but why are the Print, del.icio.us, and Digg links better visually designed than the most probable next action of reading the next page?

In my opinion, it should look more like this:

[ Screenshot: Example 3 ]

You still have the option of jumping ahead to a certain page, but your most likely action is big, bold, and visible. I think the key here is to not make the user think. The font size of the "Next" link needs to be considerably larger than the article's text, not just the same size and bold.

Speaking of which, if you haven't read Steve Krug's book on the subject I encourage you to pick up a copy of Don't Make Me Think: A Common Sense Approach to Web Usability (2nd Edition).

]]>/>
Speaking at SCALE in Los Angeleshttp://www.revsys.com/tidbits/speaking-at-scale-in-los-angeles/Frank Wiles spoke at SCALE 5X in 2007Tue, 06 Feb 2007 14:50:00 +0000http://www.revsys.com/tidbits/speaking-at-scale-in-los-angeles/

I nearly forgot to blog about this, but next Sunday (February 11th, 2007) I'll be giving a short introduction to using mod_perl 2.0 at the 5th Annual Southern California Linux Expo. I hope to see you there!

UPDATE: The slides for this talk can be found here.

]]>/>
Plat_Forms Recaphttp://www.revsys.com/tidbits/plat_forms-recap/Thu, 01 Feb 2007 17:25:42 +0000http://www.revsys.com/tidbits/plat_forms-recap/

So we survived Plat_Forms . Overall I think the contest was fun, interesting, and very well run. But the real test of this is going to be how the results come out.

I think we fared well, compared to the other Perl teams, but we certainly did make some mistakes. We didn't prepare our VMWare server image as well as we should have. We also took the approach of building major pieces of the functionality and then linking them all up towards the end. In retrospect, we should have taken a more (ugh I hate saying this) Agile Programming approach and done more small iterations. Normally this isn't at all an issue, but with the entire development happening in 30 hours it would have been a better approach.

What I really underestimated was how much of a role fatigue played in this contest. I guess I'm not as young as I used to be! We had a few big mistakes that can only be explained by being way too tired to be coding. If the contest had been three 10 hour sessions or even two 15 hour sessions we would have done at least 50% better than we ultimately did.

]]>/>
Progress at Plat_formshttp://www.revsys.com/tidbits/progress-at-plat_forms/Fri, 26 Jan 2007 18:32:17 +0000http://www.revsys.com/tidbits/progress-at-plat_forms/

Well the contest began roughly 21 hours ago and I think we are making pretty good progress. I just got back from a quick 4 hour nap in the hotel; my team mates are more hard core and only took 3 hour naps.

It's difficult to tell how we are stacking up to the other teams, as there is very little interaction between the teams; we are all too busy building our applications. Check out the contest blog for more regular updates from yesterday and today, including some photos of all of the various team members. Luckily those photos were taken around 11pm last night so we don't look too tired!

]]>/>
Arrival in Germanyhttp://www.revsys.com/tidbits/arrival-in-germany/Wed, 24 Jan 2007 08:39:02 +0000http://www.revsys.com/tidbits/arrival-in-germany/

After an interesting flight where two people got sick over the Atlantic ( luckily there were 2 doctors and 4 nurses traveling on the flight ) and a ground crew who added too much oil to the engines, we finally made it into Germany yesterday morning.

One of the cultural differences I forgot about when traveling in Europe is the lack of free wireless Internet access. Turns out our hotel only really has wifi from 6pm to 9am for a reasonable price. We just finished registering at the Open Source Meets Business conference so I'm taking advantage of their free wifi to download the 3 metric tons of E-mail that is waiting for me.

]]>/>
And some people say programmers are boringhttp://www.revsys.com/tidbits/and-some-people-say-programmers-are-boring/Thu, 18 Jan 2007 17:23:13 +0000http://www.revsys.com/tidbits/and-some-people-say-programmers-are-boring/

It is a common stereotype that computer programmers tend to be on the boring side. Not Perl programmers it turns out. We're a wacky bunch. Take for example our benevolent dictator Larry Wall; he's anything but boring.

Not to mention Schwern's Shirt .

And what other programming language has an elder like brian d foy who has violated the Posse Comitatus Act?

This is just a small sampling of Perl's fun and wacky side. Personally, I think it's one of the reasons our community is so strong.

]]>/>
Netgear only a part time peripheralhttp://www.revsys.com/tidbits/netgear-only-a-part-time-peripheral/Wed, 17 Jan 2007 15:27:40 +0000http://www.revsys.com/tidbits/netgear-only-a-part-time-peripheral/

Looks like Ask Bjørn Hansen has been bitten by the bad customer service at Netgear. While I do somewhat understand that they feel they are building a product for home or SOHO users, seriously, who turns off their switch or router at home? That's right, only when it hangs.

I've never been a big fan of Netgear products, but based on this wonderful word of mouth marketing I'm sure to stay far away from them. They could really use some help from someone like Seth Godin; specifically, I think they need to read this post.

]]>/>
Plat_forms Web Framework Contesthttp://www.revsys.com/tidbits/plat_forms-web-framework-contest/Tue, 16 Jan 2007 07:02:26 +0000http://www.revsys.com/tidbits/plat_forms-web-framework-contest/

As some of you may have heard, along with some help from the German Perl Workshop, Revolution Systems is sponsoring a team for the upcoming Plat_forms Web Development Platform Comparison.

I will be blogging about the trip and contest here, and my team member Phil Crow will be as well over at use.perl.org.

Personally, I think the results of this contest are going to be fascinating. We'll finally have some in depth analysis of some of the more popular technologies in as close to a real world scenario as you can create artificially.

My only worry is that the results will be dismissed by the community at large due to the lack of Python, Ruby, and .NET being involved in the contest. I'm not sure, but I bet the fact that it is in Germany ( and in the middle of winter no less ) is part of the reason they aren't!

We're obviously a Perl team, but here is a list of the technologies we'll be using during the contest:

]]>/>
Article claims Open Source databases have lower TCOhttp://www.revsys.com/tidbits/articles-claims-open-source-databases-to-have-lower-tco/Wed, 22 Nov 2006 14:00:00 +0000http://www.revsys.com/tidbits/articles-claims-open-source-databases-to-have-lower-tco/

Ran across this article that shows some Forrester Research data indicating Open Source databases such as PostgreSQL are 60% cheaper than the commercial alternatives.

I think most geeks are already aware of this. What I found interesting is the quote:

"Eighty per cent of the applications typically use only 30 per cent of the features found in commercial databases," Yuhanna told vnunet.com. "The open source databases deliver those features today."

In my experience working with clients it is more like 95% of applications use only 10% of the features found in commercial databases. I can't even count the number of times a company absolutely needed Oracle for a 100MB, 3 table, simple CRUD database.

One of my favorite themes that comes up is when a company again needs Oracle/DB2/whatever because it has all of the mission critical features they need such as clustering, fail over, etc. Then when it comes to implementation time the tune changes to "Oh we don't really need a cluster. And now that I think about it, we can handle downtime easily, so fail over isn't required either."

I think it comes from the fear that they might need those features and so they play a safe bet. Much like when you buy something like a car and get the luggage rack on the roof, "just in case", but then realize years later that you've never used it.

Moral of the story? As with any type of project, even non-technical ones, you should worry the most about what you know you need today and let tomorrow take care of itself. The tomorrow you are worried about may never come.

]]>/>
More distros should do this...http://www.revsys.com/tidbits/more-distros-should-do-this/Tue, 14 Nov 2006 14:00:00 +0000http://www.revsys.com/tidbits/more-distros-should-do-this/

I was reading this article on the Ubuntu Developer Summit and found this quote:

However, Shuttleworth says that "Feisty will actually warn you that you're running proprietary drivers and point you to alternative hardware ... so you're better educated for your next hardware decision. I don't expect this to be very popular with Nvidia or ATI, or with manufacturers of proprietary-only Wi-Fi cards, but of course the easiest solution for them is to open source their drivers."

While I'm not certain of how well this will work, shaming hardware companies in general is a great new tactic in helping the situation.

So far most people's experience with closed-source binary drivers is typically one of these:

  • "Oh that's a shame Linux doesn't support my card." A common misperception about the whole issue.

  • Attempting ( and often failing ) to get the binary drivers to work.

  • Holding true to your Open Source ideals and not using the binary driver.

Maybe as the user base of Linux on the desktop increases and more users become aware of this situation the hardware companies will come around.

]]>/>
Introduction to Gantry, Bigtop, and Tentmaker screencastshttp://www.revsys.com/tidbits/introduction-to-gantry-bigtop-and-tentmaker-screencasts/Thu, 05 Oct 2006 12:00:00 +0000http://www.revsys.com/tidbits/introduction-to-gantry-bigtop-and-tentmaker-screencasts/

Thought I would mention that there are now two screencasts that show off some of the main features of the Gantry framework that I play a minor role in.

You can find the screencasts here under movies

]]>/>
Automatically updating a timestamp column in PostgreSQLhttp://www.revsys.com/tidbits/automatically-updating-a-timestamp-column-in-postgresql/Fri, 04 Aug 2006 12:00:00 +0000http://www.revsys.com/tidbits/automatically-updating-a-timestamp-column-in-postgresql/

One of the great things about modern databases is you can let your database automate some of what used to only happen in application logic. The example I love to show people is automatically updating a "last modified time" timestamp column in a table.

This is easily accomplished if you always use the same name for those types of columns. I like to use 'created' for the creation timestamp and 'modified' for the last modified time. First we create a simple function:

CREATE OR REPLACE FUNCTION update_modified_column() 
RETURNS TRIGGER AS $$
BEGIN
    NEW.modified = now();
    RETURN NEW; 
END;
$$ language 'plpgsql';

This function simply sets any column named 'modified' to the current timestamp for each row passed to it by the trigger. If you use the same column name consistently you only have to do this step once. Now, you just have to create your trigger like so:

CREATE TRIGGER update_customer_modtime BEFORE UPDATE ON customer FOR EACH ROW EXECUTE PROCEDURE  update_modified_column();

This technique is very useful when you don't want to have to rely on your application developers to always remember to update the time stamps. You can just let PostgreSQL handle it for you.
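
To see the effect from application code, here is a minimal Perl/DBI sketch ( the table columns and connection details are hypothetical ); the application only touches the name column and the trigger takes care of modified:

use DBI;

my $dbh = DBI->connect( 'dbi:Pg:dbname=example', 'frank', '',
                        { RaiseError => 1 } );

# No mention of the modified column here; the BEFORE UPDATE trigger sets it.
$dbh->do(q{UPDATE customer SET name = 'New Name' WHERE id = 1});

my ($modified) = $dbh->selectrow_array(
    q{SELECT modified FROM customer WHERE id = 1}
);
print "Last modified: $modified\n";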

You should note that you will have to create a separate trigger for each table, which isn't a big deal. Also, the BEFORE UPDATE is very important. If you attempt to use AFTER UPDATE you put yourself into an infinite loop!

]]>/>
Doing a LEFT OUTER join with DBIx::Classhttp://www.revsys.com/tidbits/doing-a-left-outer-join-with-dbixclass/Tue, 25 Jul 2006 12:00:00 +0000http://www.revsys.com/tidbits/doing-a-left-outer-join-with-dbixclass/

I have recently been using DBIx::Class instead of the more popular Class::DBI. It has many advantages over Class::DBI that I won't go into here, but if you haven't used it yet you should definitely check it out.

One thing I found the other day is how to set up a special LEFT OUTER join query for situations where you need a LEFT OUTER join on your data, but only in, say, one particular script or a one off report that you won't be keeping around. You could go ahead and put this relationship in your main model class, but for a one off that is a bit of overkill.

What I hadn't thought about was that you can define those relationships from outside the MyModelClass.pm file itself. Take for example a simple Artist -> CD relationship, where you want all artists even if they don't have any CDs:

use ExampleSchema;

ExampleSchema::Artist->has_many('left_outer_albums' =>  
                       'ExampleSchema::Cd', 'artist_id',  
                       { join_type => 'LEFT_OUTER' } );

my $schema = ExampleSchema->connect('dbi:Pg:dbname=outer', '', '');

my $rs = $schema->resultset('Artist')->search( undef );
while( my $artist = $rs->next ) {
    print "Name: " . $artist->name . "\n";
    print "Albums: \n";
    foreach my $album ( $artist->left_outer_albums ) {
        print "\t" . $album->title . "\n";
    }
}

The nice thing about this is that the special left_outer_albums relationship is defined and used in the one off and doesn't have to pollute your main ExampleSchema::Artist relationships, where it might confuse someone. It may not be the best practice, but it is something to consider.

]]>/>
Learn something new about Perl every dayhttp://www.revsys.com/tidbits/learn-something-new-about-perl-every-day/Wed, 26 Apr 2006 12:00:00 +0000http://www.revsys.com/tidbits/learn-something-new-about-perl-every-day/

Just when you think you know everything about Perl, something silly rises up and shows you how ignorant you really are.

How many times have you written code similar to this?

      my $filename = "/path/to/file.txt";  
      my @dir_parts = split('/', $filename);

      my $file = pop( @dir_parts );  
      my $path = join('/', @dir_parts );

or

      my ($name) = $filename =~ m{/([^/]+)$};

While I knew about the existence of File::Basename, the last time I looked at it I don't believe it was part of the Perl core. I should have suspected, but now it is a standard Perl module that makes this trivial:

      use File::Basename;

      # Retrieve just the filename
      my $filename_only = basename($filename);

      # Get just the path in this filename
      my $path_only     = dirname($filename);

You can get even fancier with the fileparse() function provided in this module.

      my ($base, $path, $suffix) = fileparse( $filename );

Would yield the filename only in $base, the path in $path, and
nothing in $suffix. This is because we did not provide a regular
expression to match on.

If we instead used:

      my ($base, $path, $suffix) = fileparse( $filename, qr{\.txt} );

and ran it against $filename = '/home/frank/test.txt' and
$filename2 = '/home/frank/test.doc', it would give us:

      Base: test
      Path: /home/frank
      Type: .txt

and

      Base: test.doc
      Path: /home/frank
      Type:

If the filename given to fileparse() does not match the suffix pattern, nothing is
stripped from the basename.

It just goes to show that no matter how long you've been using Perl, or how much you think you know, there is always something out there you could be learning.

]]>/>
Been far too long...http://www.revsys.com/tidbits/been-far-too-long/Thu, 20 Apr 2006 15:01:34 +0000http://www.revsys.com/tidbits/been-far-too-long/

It's been far too long since I've had time to post on this blog. Time passes at a frightening pace, doesn't it? Here are two tidbits of info you might be interested in:

  1. Released a new version of Apache::DB , Apache::DProf , and Apache::SmallProf . The new release fixes a bug where you couldn't use Apache::DProf under taint mode and allows you to specify anywhere on the file system for your dprof info to be dumped. Previously it had to be relative to ServerRoot.
  2. Gantry, yet another web framework that I have been helping out with, was finally released publicly. I helped mostly with Gantry::Conf, a configuration abstraction interface. It is essentially DBI for configuration files. Right now it doesn't support everything I would like it to, but the hooks are all there and I plan on fleshing the rest out in the near future.
]]>/>
Seriously funnyhttp://www.revsys.com/tidbits/seriously-funny/Fri, 10 Mar 2006 14:00:00 +0000http://www.revsys.com/tidbits/seriously-funny/

I only have two things to say about this quote from James Gosling regarding how Java is under no serious threat from PHP, Ruby, or C#.

  1. HA HA HA HA HA HA HA HA HA HA
  2. Seriously James, do you live in a secret Java cave somewhere?
]]>/>
New product marketing trendhttp://www.revsys.com/tidbits/new-product-marketing-trend/Thu, 09 Mar 2006 14:00:00 +0000http://www.revsys.com/tidbits/new-product-marketing-trend/

I've noticed a trend with software businesses that I think is wonderful. It may not be very new, but I sure haven't seen it before. The trend is to donate money to a charity or Open Source project for each product purchased.

For example, there is a company out there with a product that retails for around $500 for, I believe, a 25 user license. They set up a promotion, only linked to by blogs, that allowed users to download a 5 user copy for $5. And they donate the $5 to the Red Cross. I unfortunately can't seem to find the link in my bookmarks.

Now there is another company, AM Software Design, which is offering to donate a dollar to the PostgreSQL project provided they sell 1,000 copies of their product. The product is a GUI PostgreSQL administration application.

My question is: why don't we see more of this in the FOSS world? Why aren't there more companies who will donate a small share of their profits to help out an Open Source project that is directly related to their product? Why isn't every vendor who sells a Perl application donating one, five, ten, or more dollars to the Perl Foundation?

If I were debating between two vendors' products, or whether or not to purchase a particular product at all, a donation to a good cause would help persuade me. It obviously would not make up for a shoddy product or one that doesn't fit my needs, but it would make a big difference when choosing between two like products.

I encourage more vendors to adopt this approach in their marketing promotions. I wager that you will be pleasantly surprised at the results.

]]>/>
The need for a 12 step program for software developers...http://www.revsys.com/tidbits/the-need-for-a-12-step-program-for-software-developers/Fri, 24 Feb 2006 14:00:00 +0000http://www.revsys.com/tidbits/the-need-for-a-12-step-program-for-software-developers/

I think the wise kilt wearing Ben Hammersley has it right that developers spend too much time worrying about details that simply don't matter to non-geeks.

Is my site better because it validates? Do you seriously come back here because the blog software I wrote is built upon an MVC framework and uses a nice, coherent, and flexible database schema? Would you like me more if I had used some REST interfaces? If you do... you really need to get a life.

If you don't like my content, topics, writing style, or me personally would it help if my permanent URLs were /blog/archive/some_story_frank_wrote instead of /blog/archive/37 ???? Would you become my reader if I rewrote all of this in Ruby On Rails instead of using my beloved mod_perl? I really doubt it.

I think this syndrome, I'm going to call it CSD for Compulsive Standards Disorder , is a cousin of the common ED syndrome. No not erectile dysfunction, I'm talking about Efficiency Disorder. The disorder that causes developers to prematurely optimize their software.

I'm as bad about this as the next developer, but sometimes stuff just needs to ship. Holding it back until it is perfect is essentially putting it on hold forever. Just face it, it is NEVER going to be perfect. There is always going to be a slightly better way, a newer standard, a new best practice, etc. What you should be asking yourself is this "Is this perfect enough for the user?". If the answer is even close to yes, ship it.

The first step is admitting you have a problem.

]]>/>
Random musing...http://www.revsys.com/tidbits/random-musing/Thu, 16 Feb 2006 14:00:00 +0000http://www.revsys.com/tidbits/random-musing/

I remember back when I first got into computers reading John C. Dvorak's columns in PC Magazine. But seriously, has this guy lost his mind or what? In a recent opinion piece he posits that Apple has been secretly moving towards adopting Windows in place of OS X. Maybe I keep missing it, but I'm still waiting for this guy to be right about something.

He goes on to bring Linux on the desktop into the mix saying " Linux on the desktop never caught on because too many devices don't run on that OS.". First off, I think it's safe to say that Linux on the Desktop is far from "done", so it's.... I dunno... several years too early to say it "never caught" on. Hasn't caught on yet, sure.

No one disagrees that, especially in the early days, getting random hardware to run on Linux was difficult at best. But those days are gone. It used to be "I need a X that runs on Linux" and now it's closer to "Are there any Xs that don't run on Linux?". And just in case Mr. Dvorak ends up reading this, "John, all three of my different DVD burners work just fine on my Linux desktops, just plugged them in and started burning. Thanks for your concern!".

]]>/>
Application Configuration Best Practiceshttp://www.revsys.com/tidbits/application-configuration-best-practices/Advice we had on some configuration best practices related to ModPerl back in the day. Sun, 12 Feb 2006 14:00:00 +0000http://www.revsys.com/tidbits/application-configuration-best-practices/

I signed myself up to write application configuration management into the soon to be released $SUPER_SECRET_OSS_PROJECT. The goal is to have a single default configuration file located on the system (with conf.d/* style includes ) that would define each instance of a particular application.

An instance is simply a name and an easy way for the programmers and admins to talk about each install of a particular application. By simply knowing the instance's name the web application, whether it is a standard Perl CGI or uses FastCGI, mod_perl 1.x, or mod_perl 2.x, can essentially automatically configure itself based on the configuration method preferred by the admin. That's right the admin .

Often there exists a complete disconnect between the programmers and the system administrators responsible for actually making the app run. Programmers want a configuration setup and style that is easy to parse , while sysadmins want a system that is flexible and easy to use.

The configuration infrastructure that I'm building into $SUPER_SECRET_OSS_PROJECT should give everyone the best of several worlds. Here is the idea in a nutshell:

  • You have a central config file, let's call it /etc/myconf.conf, that will contain an entry for each instance of each application.
  • The instance information will contain either the entire configuration or enough configuration information to bootstrap the config engine enough to gather the rest of the config.
  • The application and any programs or associated cron jobs only need to be given ( via environment variables, command line options, or eek hard coding ) the name of the instance they are a part of to configure themselves.

This is probably best explained with a short example; here is a mock-up of the central config file /etc/myconf.conf:

<instance foo>
    ConfigViaFlatFile Config::Tiny /etc/apps/foo/foo.cfg
</instance>

<instance bar>
    ConfigViaParamBuilder Apps::Foo::Params
</instance>

The first instance, foo, wants to configure itself using the Config::Tiny module, and the config information is in /etc/apps/foo/foo.cfg.

The second instance bar is saying that it wants to configure itself using ModPerl::ParamBuilder
and to use the Apps::Foo::Params module to do so.

So far this is all pretty boring. The real exciting piece is that these two instances could very well be the same code base. One where the admin wants to configure it via custom Apache directives in the httpd.conf and the other using a simple flat text file for configuration. It doesn't matter to the application. You might be asking yourself how that could be. Time for another example:

use SSOP::Conf;   # For Super Secret OSS Project

my $conf = SSOP::Conf->retrieve( $instance );

my $template = $conf->template;
my $dbuser   = $conf->dbuser;

# Or by using a hash reference
my $conf_ref = SSOP::Conf->retrieve_hashref( $instance );

my $template = $$conf_ref{template};
my $dbuser   = $$conf_ref{dbuser};

That is all the application needs to know about, the "instance" name,
to configure itself via several methods. Right now the plan is to support Config::General, Config::Tiny, PerlSetVars in mod_perl 1 and 2, and ModPerl::ParamBuilder in mod_perl 2.x.

The infrastructure is written so that it doesn't matter to the application how the configuration is gathered and parsed, just so long as it has it. This frees the admin to use whatever configuration method makes the most sense for not only each particular application, but each particular instance of that application.

I think the most exciting aspect is that, provided you use a separately included httpd.conf for each of your instances, your behind the scenes programs, scripts, and cron jobs can share their configuration information. If you like PerlSetVars or ModPerl::ParamBuilder and want all of your app's config to live in your httpd.conf, you don't have to have a separate configuration scheme for your cron jobs.

You're probably saying to yourself, "Why the hell is he telling us this when we can't see the code yet." Well the whole purpose of this entry was to ask all of my readers which configuration file syntaxes they prefer to use, Apache style, .INI style, etc. so we know what to support out of the box. There are plans for a generic SQL backend and LDAP support. The system is built so you can expand on it yourself if there is something we don't support.

Comments, suggestions, rants, death threats, etc. are most welcome. Send them to blog@revsys.com . We want this system to be as useful to as many developers and admins as possible.

]]>/>
Apache debugging and performance tuning articlehttp://www.revsys.com/tidbits/apache-debugging-and-performance-tuning-article/Thu, 09 Feb 2006 14:00:00 +0000http://www.revsys.com/tidbits/apache-debugging-and-performance-tuning-article/ ]]>/>Commercial DB vendors "opening up"http://www.revsys.com/tidbits/commerical-db-vendors-opening-up/Sun, 05 Feb 2006 14:00:00 +0000http://www.revsys.com/tidbits/commerical-db-vendors-opening-up/

NewsForge recently published an article discussing the new trend of opening up commercial database products. PostgreSQL's own Josh Berkus is quoted in the article.

I agree with Robert Treat's blog What the free non-free databases signal that in the end the commercial database vendors are going to lose out in the same manner web server vendors have lost out to Apache, lighttpd, tux, etc.

But I think there is another aspect of this that helps the Open Source database community more than anything. Providing these free installs tarnishes all of the mystique of their products.

I remember when I was starting out as a programmer and had my first opportunity to work with a big name commercial database product. I felt I had finally arrived. I was one of the Big Dogs ™ now. Able to leap tall data structures in a single bound!

Little did I know that I would be quickly disappointed and spend the rest of my tenure at that company lobbying to replace the commercial package with PostgreSQL. Not only to reduce costs and improve performance, but to simply make my life/job easier.

If I had not been previously exposed to a product like PostgreSQL, I would have probably thought that insert commercial vendor here was the best thing available, based on advertising and hype. I probably would not even have investigated any further.

Having these free/non-free installations of the popular commercial databases will allow the next generation of geeks to evaluate both the commercial and the truly open options on equal footing. And in this fair fight, we are sure to win!

]]>/>
E-Trade VP Talks about Open Sourcehttp://www.revsys.com/tidbits/e-trade-vp-talks-about-open-source/Mon, 30 Jan 2006 14:00:00 +0000http://www.revsys.com/tidbits/e-trade-vp-talks-about-open-source/

eWeek.com has an article where they interview the VP of Architecture, Lee Thompson, of E-Trade . It's a great overview of how much time, money, and headaches you can solve by moving to Open Source software. The most interesting comment Mr. Thompson made however is on page four , where he discusses the rate of change in Open Source:

OK, so you know the phenomenon the phenomena is, the amount of change that you are sustaining on a Gentoo system is orders of magnitude larger than the amount of change that a typical proprietary operating system from anybody Solaris, HP-UX, mainframes, whatever [would go through].

Whatever operating system, the rate of patches coming out of the vendor is much lower than what you enjoy on, you know, my Gentoo laptop or your Gentoo machine.

And then I started looking, kind of watching this, obviously, from a technology management perspective. If you can sustain change faster than somebody else, you're going to survive, and the person who can't sustain the change is not going to evolve, and they're going to die off. This is almost more important a realization than the direct cost savings, which is still phenomenal.

I'm not a big fan of the Gentoo distribution personally ( not to knock Gentoo, I just had enough of compiling my whole system back when I ran FreeBSD ), but I think Mr. Thompson has stumbled onto yet another reason why Open Source is succeeding so well.

Many theories on business and management, not to mention pundits and journalists, talk about a business' ability to react to change. You will often hear "Company X failed because they were unable to react to change Y in the market" or "When Company Z entered the market it changed everything!" . With technology being so pervasive in companies these days, I believe a company's ability to change is directly related to how fast they can change their technology.

But changing their own technology is only part of the equation. The technology they either purchase commercially or Open Source software they use needs to be flexible in handling the rate of change the business needs require.

I had never thought of it before today, but the ability of Open Source software to handle the rapid changes being thrown at it is one of its greatest strengths. And considering most Open Source developers are working either to fix a problem they are having or to make life better for someone else, I think this quote from Gandhi is appropriate:

"You must be the change you wish to see in the world" Ghandi

]]>/>
Apache::DB, Apache::DProf, and Apache::SmallProf updateshttp://www.revsys.com/tidbits/apachedb-apachedprof-and-apachesmallprof-updates/Tue, 24 Jan 2006 14:00:00 +0000http://www.revsys.com/tidbits/apachedb-apachedprof-and-apachesmallprof-updates/

Recently it came to my attention that there were some fairly serious bugs in some of the CPAN modules I maintain . Stas Bekman was trying to use Apache::DProf to profile the performance of one of his projects and could not get it to work.

The problem essentially boiled down to the fact that the generally accepted best practice for determining whether your code is running under mod_perl 1.x or 2.x doesn't work in all cases or in all environments. This is because of when the mod_perl or mod_perl2 modules are loaded in relation to the Apache lifecycle.

Instead of using:

use constant MP2 => eval { require mod_perl; $mod_perl::VERSION > 1.99 };

We're going to use:

use constant MP2 => ( exists $ENV{MOD_PERL_API_VERSION} and $ENV{MOD_PERL_API_VERSION} >= 2 );

Because this environment variable is guaranteed to always be available and accurate.
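To see it in context, here is a rough sketch of how the constant typically gets used to pull in the right API at load time; the specific modules required here are only for illustration:

use constant MP2 => ( exists $ENV{MOD_PERL_API_VERSION} and $ENV{MOD_PERL_API_VERSION} >= 2 );

BEGIN {
    if ( MP2 ) {
        # mod_perl 2.x API
        require Apache2::RequestRec;
        require Apache2::RequestUtil;
    }
    else {
        # mod_perl 1.x API
        require Apache;
    }
}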

I've uploaded a new distribution of these modules to CPAN just now and it should make it to your favorite mirror in the next 24-48 hours. If you currently use these modules, or the above technique to make your mod_perl code work in both a 1.x and a 2.x environment, you are encouraged to make these changes as well.

]]>/>
greet_pause -- A new anti-spam feature in Sendmail 8.13.xhttp://www.revsys.com/tidbits/greet_pause-a-new-anti-spam-feature-in-sendmail-813x/Thu, 19 Jan 2006 14:00:00 +0000http://www.revsys.com/tidbits/greet_pause-a-new-anti-spam-feature-in-sendmail-813x/

With the release of Sendmail 8.13.x there is a new anti-spam feature that you should seriously consider taking advantage of. The new feature is called greet_pause .

During an SMTP session, after the sender connects to port 25 on the recipient's MX, it should wait to receive the Sendmail banner prior to sending any further data. In the interest of spewing as much spam as possible, many spammers disregard this and simply force send all of the SMTP commands at once and move on. The receiving MTA typically takes this spew and delivers it.

greet_pause tells Sendmail to wait for a specified amount of time before sending the banner and if the sender spews commands early the message is rejected. This can drastically cut down on the amount of spam you receive on a Sendmail system. You implement this by adding the following to your sendmail.mc:

FEATURE(`greet_pause',5000)

This tells sendmail to wait 5 seconds before displaying the banner. This should be plenty of time for a spammer to begin spewing, but short enough to not cause any timeout problems for legitimate E-mail. You can set it to whatever you think is best, I would suggest something in the 2-10 second range.

Some legit MTAs might still have problems with this, so you can whitelist them by adding the following to your access file and rebuilding the database:

GreetPause:localhost 0
GreetPause:24.124.0.1 0

This would not delay the banner for localhost and the IP 24.124.0.1. I recommend at least putting in localhost as there is no need to delay yourself.

]]>/>
Article showing PostgreSQL materialized viewshttp://www.revsys.com/tidbits/article-showing-postgresql-materialized-views/Tue, 17 Jan 2006 14:00:00 +0000http://www.revsys.com/tidbits/article-showing-postgresql-materialized-views/

I recently came across this article Materialized Views In PostgreSQL which shows different ways of using this technique.

For example, it shows techniques much like the one I used in my previous blog entry on the subject, along with going into depth on techniques for tables whose data changes frequently.

]]>/>
ModPerl::ParamBuilder releasedhttp://www.revsys.com/tidbits/modperlparambuilder-released/Noting the release of ModPerl::ParamBuilderThu, 12 Jan 2006 09:00:00 +0000http://www.revsys.com/tidbits/modperlparambuilder-released/

I just released a new module to CPAN ModPerl::ParamBuilder which makes it much easier to build custom Apache directives for your mod_perl 2.0 applications.

You might be asking why you would want to do such a thing. The main reasons are:

  • Custom directives are more efficient than using PerlSetEnv or PerlSetVar. They are evaluated only on server startup and not for each request like PerlSetVars

  • It gives your application a more polished and professional look and makes your configuration more intuitive for end users

  • It's just plain cool

Assume you're building an application called MyApp that needs to be passed various parameters such as database user, password, database name, and database server address. Here is how ModPerl::ParamBuilder fits in.

First you create a separate module that will hold your custom directives. We'll call that MyApp::Params; it would look like this:

package MyApp::Params;

use ModPerl::ParamBuilder;
use base qw( ModPerl::ParamBuilder );

my $builder = ModPerl::ParamBuilder->new( __PACKAGE__ );

$builder->param('DBUser');
$builder->param('DBPass');
$builder->param('DBName');
$builder->param('DBServer');
$builder->on_off('AutoCommit');

$builder->load;

1;

Putting these directives to use in your Apache's httpd.conf is easy; you just need to load your MyApp::Params module.

PerlLoadModule MyApp::Params

SetHandler perl-script
DBUser     apache
DBPass     secret
DBServer   127.0.0.1
DBName     myapp
AutoCommit On
PerlResponseHandler MyApp::Main

NOTE: You must use PerlLoadModule and not the more common PerlModule Apache directive for your parameter module. This is because Apache needs to load this module very early in the server startup so that it can read its own configuration files.

To retrieve and use these directives from your application you add
the following to MyApp::Main :

use MyApp::Params;

my $params = MyApp::Params->new;
my $config = $params->get_config;

my $dbuser = $$config{'DBUser'};
# etc.

Hopefully everyone will find this module as useful as I have. Personally, I think being able to build custom Apache directives easily is one of the neatest features of mod_perl 2.0.

UPDATE: I've also put up a short tutorial on how to use ModPerl::ParamBuilder.

Feel free to contact me if you have any questions or have a problem using it.

]]>/>
Why vulnerability counts are stupid...http://www.revsys.com/tidbits/why-vulnerability-counts-are-stupid/Fri, 06 Jan 2006 18:39:10 +0000http://www.revsys.com/tidbits/why-vulnerability-counts-are-stupid/

Every year US-CERT releases a list of vulnerabilities for the previous year. And every year the press and media misrepresent these raw vulnerability counts, treating them as proof that Microsoft is clearly winning the security battle against all other operating systems. This simply isn't true, and if one of these lame reporters would just call anyone they know who understands Open Source software, or computers in general, they would get set straight. There are already some great articles out there discussing this.

I see this happen all of the time with various media outlets. They report on some technology story and get the entire concept wrong. Don't get me wrong, I do not fault the reporter for not knowing everything about technology. That isn't their job. But what large media outlet on the planet doesn't have an IT department or other department filled with resident computer geeks? Why don't these reporters take advantage of the resources they have in house? On a story like this it would only take 10 minutes to explain why their conclusions are just plain wrong and save themselves the embarrassment of having a million geeks E-mail them corrections. Not to mention that some of the corrections are going to be, well... unfriendly in nature :)

If any reporter out there has a quick technology question they need fact checked, please send it to a geek you know. If you don't know any geeks, feel free to send it to me. I can't guarantee a response time, but the lack of your own personal embarrassment should be, well... priceless!

]]>/>
Perl DB2 Articlehttp://www.revsys.com/tidbits/perl-db2-article/Fri, 06 Jan 2006 18:38:58 +0000http://www.revsys.com/tidbits/perl-db2-article/

Marina Greenstein has written a short introduction to using DB2 with Perl. While it is geared specifically toward DB2, the concepts apply to pretty much all DBI work. I still prefer my favorite database for all of my work, but if you're looking for a DBI tutorial this is definitely worth checking out.

]]>/>
Why you should date geekshttp://www.revsys.com/tidbits/why-you-should-date-geeks/Fri, 06 Jan 2006 16:31:53 +0000http://www.revsys.com/tidbits/why-you-should-date-geeks/

Maryam Ghaemmaghami Scoble, the wife of Robert Scoble, has put up a Top 10 list of reasons to date a geek, titled In Defense of Geeks or Ten Reasons Why You Should Date a Geek. While I think this is hilarious, now I'm worried that I will have to start lying about my geekness, otherwise I run the risk of being trampled to death by hordes of technologically challenged women!

Ok... maybe that wouldn't be so bad...

]]>/>
Tuning your PostgreSQL Databasehttp://www.revsys.com/tidbits/tuning-your-postgresql-database/PostgreSQL Tuning Service Fri, 06 Jan 2006 16:31:00 +0000http://www.revsys.com/tidbits/tuning-your-postgresql-database/

Several months ago I wrote an article on tuning your PostgreSQL database for performance that has gained a lot of attention. While I think the article covers most of the basic to intermediate level options you can use to better tune your database server, it is by no means all you're ever going to need to know. If you use PostgreSQL often I strongly suggest you at least scan the posts on the postgresql-performance mailing list.

What surprised me most is how many companies and individual developers are in need of a consultant to help them get the most out of their PostgreSQL setup. Because of this we've launched a new PostgreSQL Performance Tuning Service designed to help organizations receive better performance out of their systems and reduce the need to upgrade their server hardware. We often find that a few well placed configuration, query, or stored procedure changes can dramatically impact the speed of your application or website.

The problem with online tuning guides and the standard documentation is that every company's database is designed and/or used just differently enough from everyone else that a customized tuning is the best option. Contact REVSYS to find out more and schedule a performance analysis.

]]>/>
Sending E-mail from Perlhttp://www.revsys.com/tidbits/sending-e-mail-from-perl/One way to send templated emails with PerlThu, 05 Jan 2006 17:22:00 +0000http://www.revsys.com/tidbits/sending-e-mail-from-perl/

UPDATE: I've recently found an even better way of sending Email messages than using any of the information listed here. Check out my new post on the subject for the details

I'm always amazed at how many people have trouble doing something as simple as sending E-mail from Perl or mod_perl. I think this is because new Perl programmers are either unaware of CPAN or afraid to use it. Trust me, CPAN is your friend. :) If you install the Net::SMTP module (part of the libnet CPAN package) on your system, it is trivial to send plain text E-mail messages.

Here is a brief example:

use Net::SMTP;

# Create an instance of the module
my $smtp = Net::SMTP->new('smtp.example.com') or die "Cannot connect to host: $!";

# Set the envelope sender and recipient
$smtp->mail('sender@example.com');
$smtp->to('recipient@example.com');

# Send the message headers and body
$smtp->data();
$smtp->datasend("To: recipient\@example.com\n");
$smtp->datasend("From: sender\@example.com\n");
$smtp->datasend("Subject: Test Subject\n");
$smtp->datasend("\n");
$smtp->datasend("This is where the body of your message goes!\n");
$smtp->dataend();
$smtp->quit();

This module assumes you know a little bit about the SMTP protocol, but there are tons of modules on CPAN that are even easier to use; this one just happens to be the one I use the most.

While I haven't personally used all of these modules, they all seem to have clean interfaces for sending E-mail from your Perl programs. MIME::Lite especially is useful when you want to send attachments along with your message. Hopefully this information helps you in your future projects!
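As a quick addendum, here is a rough MIME::Lite sketch for the attachment case; the addresses and the file path are made up for illustration:

use MIME::Lite;

my $msg = MIME::Lite->new(
    From    => 'sender@example.com',
    To      => 'recipient@example.com',
    Subject => 'Report attached',
    Type    => 'multipart/mixed',
);

# Plain text body of the message
$msg->attach(
    Type => 'TEXT',
    Data => "The report you asked for is attached.\n",
);

# The attachment itself ( the path is made up )
$msg->attach(
    Type     => 'application/pdf',
    Path     => '/tmp/report.pdf',
    Filename => 'report.pdf',
);

# Uses sendmail by default; $msg->send('smtp', 'smtp.example.com') works for SMTP
$msg->send;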

]]>/>
Using SSH ControlMaster to speed up your connect timeshttp://www.revsys.com/tidbits/using-ssh-controlmaster-to-speed-up-your-connect-times/Using the ControlMaster option with OpenSSH for faster connections. Thu, 05 Jan 2006 14:48:00 +0000http://www.revsys.com/tidbits/using-ssh-controlmaster-to-speed-up-your-connect-times/

A while back I wrote a short article on how to speed up subsequent connections to the same host via SSH. Originally I had an issue using the ControlMaster technique with Subversion when using svn+ssh. Rob Holland from Inverse Path sent me a message showing how to fix the issue. The article has been updated with the fix. Thanks, Rob, for figuring this out for all of us.

]]>/>
DNS & BIND Securityhttp://www.revsys.com/tidbits/dns-bind-security/DNS is baffling for many people. Especially the security aspects. Thu, 05 Jan 2006 07:28:00 +0000http://www.revsys.com/tidbits/dns-bind-security/

Understanding DNS is hard. REALLY HARD . It takes most people several years of running a DNS server on the public Internet before they really become comfortable with all of the various nuances. I've been doing it for many years and I even make silly mistakes sometimes.

A recent study of DNS server configurations found that 20% had security holes and nearly 70% were improperly configured in some aspect or another. I shouldn't be surprised by this, but deep down part of me is surprised these numbers are quite so high. I really recommend, if you can, having someone else run your DNS for you. If you aren't comfortable with it there is no reason to put yourself through the trouble of getting everything set up properly. However, if you do need or want to run your own DNS server, I highly recommend the DNS & BIND book from O'Reilly. It covers everything you will need to know and then some.

If you do run your own DNS, I suggest you have someone who is very familiar with DNS verify that your configuration is accurate and secure, to help ensure nothing bad happens to you.

]]>/>
When to use a materialized view in PostgreSQLhttp://www.revsys.com/tidbits/when-to-use-a-materialized-view-in-postgresql/Situations where it makes sense to use a materialized view in PostgreSQL. Tue, 03 Jan 2006 13:49:00 +0000http://www.revsys.com/tidbits/when-to-use-a-materialized-view-in-postgresql/

A materialized view is defined as a table which is actually physically stored on disk, but is really just a view of other database tables. In PostgreSQL , like many database systems, when data is retrieved from a traditional view it is really executing the underlying query or queries that build that view. This is great for better representation of data for users, but does not do anything to help performance.

Materialized views are different in that they are an actual physical table that is built from the data in other tables. To use another example from my NewsCloud application: in order to achieve the performance I needed, I used a materialized view to represent the tag cloud.

In this particular application the data used to build the tag cloud changes very infrequently, but to generate the actual tag cloud the ORDER BY needed to rank the results was terribly slow. The query in question is:

SELECT k.id, k.keyword, c.count FROM news_keywords AS k, news_keyword_total_count AS c WHERE k.id = c.keyword ORDER BY c.count DESC;

This query was taking an average of 2 seconds to complete, which, once you figure in all of the other time costs such as mod_perl, Apache, and transporting the HTML back to the browser, could easily mean the user would see a 3-4 second page load time. However, by creating a new table with:

CREATE TABLE test AS SELECT k.id, k.keyword, c.count FROM news_keywords AS k, news_keyword_total_count AS c WHERE k.id = c.keyword ORDER BY c.count DESC;

I then dropped my old view table (named count_mview) and renamed the test table to the old name. A quick VACUUM ANALYZE afterwards and everything is happy. With this simple change I can directly query the count_mview data, and it comes back in the order I need in just slightly less than 1 millisecond!
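Spelled out, the swap amounts to something like the following sketch, using the table names from this post; count_mview_new is just a scratch name:

BEGIN;
-- Build a fresh copy of the materialized data
CREATE TABLE count_mview_new AS
    SELECT k.id, k.keyword, c.count
    FROM news_keywords AS k, news_keyword_total_count AS c
    WHERE k.id = c.keyword
    ORDER BY c.count DESC;
-- Swap it into place under the old name
DROP TABLE count_mview;
ALTER TABLE count_mview_new RENAME TO count_mview;
COMMIT;
-- VACUUM cannot run inside the transaction, so it comes after the COMMIT
VACUUM ANALYZE count_mview;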

If the data in your underlying tables changes more frequently, you will be better served by using triggers on those tables that fire when INSERTs, UPDATEs, and/or DELETEs are performed on them and update the materialized view table accordingly. For a good introduction to this, check out the PostgreSQL manual sections on triggers and PL/pgSQL Trigger Procedures.

Hopefully you can use this technique in the future to speed up some of your slower performing queries.

UPDATE: If you're interested in this topic you might also want to check out my DjangoCon talk on this subject Making your PostgreSQL database sing .

]]>/>
Perl in 2005http://www.revsys.com/tidbits/perl-in-2005/Tue, 03 Jan 2006 10:59:06 +0000http://www.revsys.com/tidbits/perl-in-2005/

No one can argue that 2005 has been an exciting year for the Perl community. Chromatic has written up a nice year in review of what happened in the Perl community over the last year. He also touches on a few things that are going to happen in 2006.

With the release of Perl 6 drawing closer and closer, it is only a matter of time before we can begin to reap all of the benefits this new version has to offer. Check out these sites for more detailed information on Parrot and/or Perl 6 if you aren't familiar with them.

]]>/>
Using Apache's mod_deflate for speedhttp://www.revsys.com/tidbits/using-apaches-mod_deflate-for-speed/How to add mod_deflate to compress your Apache content. Mon, 02 Jan 2006 14:29:00 +0000http://www.revsys.com/tidbits/using-apaches-mod_deflate-for-speed/

If you have a site that sends large textual content ( XHTML, XML, CSS, plain text, etc. ) to a browser via Apache you should investigate using mod_deflate . This Apache module compresses your content at the server, before sending to the browser. This can dramatically speed up the time it takes for your pages to load. Especially for users on slower Internet connections. Many Linux distributions already have mod_deflate installed, but if yours does not you will need to install it by hand.

For example, an application I wrote called NewsCloud has some pages that are over 350K of XHTML. This can take a while to load even on the fastest of home Internet connections. But by having mod_deflate compress the content before sending it to the browser, 350K of XHTML becomes a much more manageable 40K. Obviously your results will vary based upon your content. Because NewsCloud is a mod_perl application, this is what I added to my config:

<Location /newscloud>
    SetOutputFilter DEFLATE
    ExpiresByType text/html "access plus 2 hours"
</Location>

The ExpiresByType instructs the browser to keep cached any content from this particular Location for 2 hours after it was last accessed. This also is useful to ensure a user's experience is good, but obviously varies depending on the nature of the application you are building. It wouldn't be too smart in a shopping cart for instance... " Hey?!?!?! Where the $@#$# did my stuff go?"
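If you would rather compress by content type than by Location, mod_deflate can also be switched on with AddOutputFilterByType; this is just a sketch, so adjust the MIME types to match your own content:

AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css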

To learn more about mod_deflate you should check out this article: "Apache 2 mod_deflate Benchmark".

]]>/>
Why you should make your PostgreSQL tables without OIDshttp://www.revsys.com/tidbits/why-you-should-make-your-postgresql-tables-without-oids/Mon, 02 Jan 2006 14:16:45 +0000http://www.revsys.com/tidbits/why-you-should-make-your-postgresql-tables-without-oids/

NOTE: This advice only applies to really old versions of PostgreSQL. User created tables stopped having OIDs added to them in the early 8.x versions.

Something most people don't know about the PostgreSQL RDBMS is that when you create tables, each row is given a unique object identifier (aka OID) whether or not you define any primary keys for that table. This is useful if you need to be able to delete an individual row from the database that doesn't have any unique way of referring to it. You can retrieve the OID for any row by just adding it to the items you wish to SELECT:

SELECT oid, id FROM table1;

If you try this on one of your existing tables you will see the various OIDs for your database rows. While these OIDs can be useful in certain circumstances, if you aren't aware of them then you probably aren't using them! Instructing PostgreSQL to not create OIDs for you can save some space on your filesystem and speed up your queries. However, don't expect a noticeable difference in either speed or disk space, as this is just one extra integer per row; it really depends on how your schema is set up. To create a table without OIDs you simply need to do the following:

CREATE TABLE test (
    .....
) WITHOUT OIDS;

If your tables were built with OIDs (the default), then you can remove them by issuing the following:

ALTER TABLE test SET WITHOUT OIDS;

However, this will only keep new rows and rows that are updated from having OIDs. It isn't optimal, but certainly a step in the right direction.

As I don't ever use OIDs in my queries I now explicitly define all of my tables without OIDs. Perhaps you should do the same?

]]>/>
Sendmail Virtual User Trickhttp://www.revsys.com/tidbits/sendmail-virtual-user-trick/Mon, 02 Jan 2006 14:00:00 +0000http://www.revsys.com/tidbits/sendmail-virtual-user-trick/

One of the things people find difficult about Sendmail is virtual users. These are defined in the virtusertable file (usually in /etc/mail/virtusertable). This file instructs Sendmail to translate a "virtual" user into a real user or alias. The reason I mention aliases here is because, with Sendmail, you can have a virtual user that translates into an alias for multiple local and/or remote E-mail accounts.

A situation some people run into is that they want all usernames at domain2.com to be delivered to the same username at domain1.com, except for a few users. If you really wanted all users to map to the other domain it would be as simple as adding the domain to local-host-names. It is the need to have most, but not all, users map to the other domain where the virtusertable file comes in handy. For this example, let's assume that we want every username @domain2.com except for postmaster, webmaster, and support to map to the same username @domain1.com. This is accomplished by adding domain2.com to the local-host-names file, to tell Sendmail that we wish to receive mail for that domain, and then adding the following to the virtusertable file:

postmaster@domain2.com: mailadmin@example.com
webmaster@domain2.com: webmaster@example.com
support@domain2.com: support@example.com
@domain2.com: %1@domain1.com

The first three lines tell Sendmail to send messages addressed to those three usernames at domain2.com to the appropriate remote E-mail addresses. The last line instructs Sendmail to send any other username at domain2.com to the same username at domain1.com. Note that this will also include any fake usernames that a spammer might send to; the E-mail server at domain1.com will still be responsible for determining what is or is not a valid username. After you've added those entries to the virtusertable file, all you need to do is rebuild it and it becomes active.
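If you're not sure how to rebuild it, on most systems something along these lines will do the trick; the exact path can vary by distribution, and many setups also ship a Makefile in /etc/mail that handles this for you:

makemap hash /etc/mail/virtusertable < /etc/mail/virtusertable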

]]>/>
Happy New Year!http://www.revsys.com/tidbits/happy-new-year/Sun, 01 Jan 2006 14:00:00 +0000http://www.revsys.com/tidbits/happy-new-year/

Happy New Year from Revolution Systems!

We felt the best way to start off the New Year was to launch a new service for our customers and the general public. This blog will be used to discuss various technology, business/marketing trends, and on topic news we think you will be interested in. We hope you will check back often to see what we have to say.

Frank Wiles

President and Founder

]]>/>