
Monday, August 2, 2010

HTTP: Chunked Encoding

In chunked encoding, the content is broken up into a number of chunks, each of which is prefixed by its size in bytes. A zero-size chunk indicates the end of the response message. If a server uses chunked encoding, it must set the Transfer-Encoding header to "chunked".

Chunked encoding is not the same thing as the Content-Encoding header. Content-Encoding is an entity-body header, whereas transfer-encodings are a property of the message, not of the entity-body. ("Entity-body" refers to the body or payload [e.g. a JPG image] of an HTTP request [e.g. a POST or PUT request] or response.)

 

Q: When is chunked encoding really useful?

A: Chunked encoding is useful when a large amount of data is being returned to the client and the total size of the response may not be known until the request has been fully processed. An example of this is generating an HTML table of results from a database query. If you wanted to use the Content-Length header you would have to buffer the whole result set before calculating the total content size. However, with chunked encoding you could just write the data one row at a time back to the client. At the end, you could write a zero-sized chunk when the end of the SQL query is reached.

This is the HTTP header that is sent by the server:

Transfer-Encoding: chunked

In the HTTP 1.1 specification, "chunked" is the only transfer-coding that every HTTP/1.1 recipient is required to understand; other transfer-codings (such as gzip) are optional.
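As a sketch, here is how a server might frame that row-at-a-time response on the wire (plain Python, no real server involved):

```python
def encode_chunked(chunks):
    """Yield the wire format of an HTTP chunked message body."""
    for chunk in chunks:
        data = chunk.encode("utf-8")
        # each chunk: size in hex, CRLF, the data itself, CRLF
        yield b"%x\r\n" % len(data) + data + b"\r\n"
    # a zero-sized chunk marks the end of the body
    yield b"0\r\n\r\n"

# e.g. streaming two "database rows" back without knowing the total size:
body = b"".join(encode_chunked(["row 1\n", "row 2\n"]))
# body == b"6\r\nrow 1\n\r\n6\r\nrow 2\n\r\n0\r\n\r\n"
```

Note that the server can emit each chunk as soon as the corresponding row is ready; nothing about the total length needs to be known up front.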

 

source

apache: what are .htaccess files typically used for?

.htaccess files allow us to make configuration changes on a per-directory basis. They work in the Apache Web Server on both Linux/Unix and Windows operating systems. In Apache, the format of an .htaccess file is the same as that of the server's global configuration file.

There are several things that developers, site owners and webmasters can do with .htaccess files, for example:

  • Prevent directory browsing
  • Redirect visitors from one page or directory to another
  • Password protection for sensitive directories
  • URL-Rewriting (e.g. rewriting long, unwieldy URLs to shorter ones)
  • Change the default index page of a directory
  • Prevent hot-linking of images from your website
  • Per-directory cache control
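As an illustration, several of the items above might look roughly like this in an .htaccess file (directives assume the relevant Apache modules, e.g. mod_rewrite and mod_authn_file, are enabled; paths and URLs are made up):

```apache
# prevent directory browsing
Options -Indexes

# redirect visitors from an old page to a new one
Redirect 301 /old-page.html /new-page.html

# password-protect this directory
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /path/to/.htpasswd
Require valid-user

# rewrite a long, unwieldy URL to a shorter one
RewriteEngine On
RewriteRule ^article/([0-9]+)$ show_article.php?id=$1 [L]

# change the default index page of this directory
DirectoryIndex home.html
```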

 

sources:

http://www.bloghash.com/2006/11/beginners-guide-to-htaccess-file-with-examples/

http://en.wikipedia.org/wiki/Htaccess

Saturday, July 31, 2010

web services vs REST

The fundamental difference between REST and document-style Web services is how the service consumer knows what to expect of the service. Web services have contracts, defined in WSDL. Since Web services focus on the service rather than on the resource, the consumer has clear visibility into the behavior of the service's various operations; in REST's resource-oriented perspective, we have visibility into the resources, but the behavior is implicit, since no contract governs the behavior of each URI-identified resource.

For a very good introductory article on REST, check out the article here.

Friday, February 12, 2010

python: make an HTTP request with any kind of HTTP method (put, delete, head) using urllib2

Sometimes you don't want to sit and migrate all your existing codebase from urllib/urllib2 to httplib/httplib2, just because the former doesn't support HTTP PUT or HTTP DELETE methods out-of-the-box. Well, here's a workaround for just that problem:
import urllib2

opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request('http://example.org', data='your_put_data')
request.add_header('Content-Type', 'your/contenttype')
# override the method urllib2 would otherwise pick (GET, or POST when data is set)
request.get_method = lambda: 'PUT'
response = opener.open(request)
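For what it's worth, Python 3's urllib.request.Request grew a method argument (in 3.3), so there the lambda trick is no longer needed:

```python
from urllib.request import Request

request = Request('http://example.org', data=b'your_put_data', method='PUT')
request.add_header('Content-Type', 'your/contenttype')
print(request.get_method())  # → PUT
```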


Thanks to stackoverflow yet again for solving another programming conundrum :)

Wednesday, September 23, 2009

python: writing your own urllib2 redirect_handler

import urllib2

class SmartRedirectHandler(urllib2.HTTPRedirectHandler):

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # record the URL the server redirected us to
        global intermediateURL
        intermediateURL = newurl

        if code in (301, 302, 303) or (code == 307 and not req.has_data()):
            newRequest = urllib2.Request(
                newurl,
                headers=req.headers,
                origin_req_host=req.get_origin_req_host(),
                unverifiable=True
            )
            # remember the request that started the redirect chain
            newRequest._origin_req = getattr(req, "_origin_req", req)
            return newRequest
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
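The same idea ports to Python 3's urllib.request. As a sketch (no network needed; we call redirect_request directly to show what it returns), a handler can record the intermediate URL and defer to the base class for everything else:

```python
import urllib.request

class SmartRedirectHandler(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # remember where the server sent us before following the redirect
        self.intermediate_url = newurl
        return super().redirect_request(req, fp, code, msg, headers, newurl)

handler = SmartRedirectHandler()
original = urllib.request.Request('http://example.org/old')
follow_up = handler.redirect_request(
    original, None, 302, 'Found', {}, 'http://example.org/new')
# follow_up is the new Request for http://example.org/new, and
# handler.intermediate_url records the redirect target
```

In real use you would install the handler with urllib.request.build_opener(SmartRedirectHandler()) and open URLs through that opener.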

source

Wednesday, May 27, 2009

apache: worker vs prefork models

Apache's Prefork Model

This Multi-Processing Module (MPM) implements a non-threaded, pre-forking web server that handles requests in a manner similar to Apache 1.3. It is appropriate for sites that need to avoid threading for compatibility with non-thread-safe libraries. It is also the best MPM for isolating each request, so that a problem with a single request will not affect any other.
[http://httpd.apache.org/docs/2.0/mod/prefork.html]



Apache's Worker Model

This Multi-Processing Module (MPM) implements a hybrid multi-process multi-threaded server. By using threads to serve requests, it is able to serve a large number of requests with less system resources than a process-based server. Yet it retains much of the stability of a process-based server by keeping multiple processes available, each with many threads.

The most important directives used to control this MPM are ThreadsPerChild, which controls the number of threads deployed by each child process, and MaxClients, which controls the maximum total number of threads that may be launched.
[http://httpd.apache.org/docs/2.0/mod/worker.html]
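For illustration, the stock worker configuration shipped with Apache 2.0/2.2 looked roughly like this (the values are the historical defaults, not tuning recommendations):

```apache
<IfModule mpm_worker_module>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>
```

With ThreadsPerChild at 25 and MaxClients at 150, Apache runs at most 150/25 = 6 child processes at full load.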




For example, sites that need a great deal of scalability can choose to use a threaded MPM like worker, while sites requiring stability or compatibility with older software can use the prefork MPM.
[http://httpd.apache.org/docs/2.0/mpm.html]


I compiled 2 different versions of apache 2.2.4 on Solaris 10 (06/06, on a crappy U10, but...) one using the prefork MPM (compile --with-mpm=prefork) and the other using the worker MPM (compile --with-mpm=worker). Prefork is supposed to generally be better for single or dual cpu systems, and worker is supposed to be generally better for multi-CPU systems.

So for this setup, the worker MPM was almost twice as fast as the prefork.
I'm going to run these same tests on a multi-cpu server and see what the results look like.
[http://www.camelrichard.org/apache-prefork-vs-worker]




On most Unixes, the worker MPM results in considerable performance enhancements over the prefork MPM, and it results in much greater scalability, since threads are a lot cheaper (less memory and CPU to create and run) than forked processes.
[http://www.onlamp.com/pub/a/apache/2004/06/17/apacheckbk.html]

Friday, October 3, 2008

web security: input validation attacks

The following are some common input validation attacks:

Path or directory traversal
This attack is also known as the "dot dot slash" attack because it is perpetrated by inserting the characters "../" several times into a URL to back up or traverse into directories that weren't supposed to be accessible from the Web. The command "../" at the command prompt tells the system to back up to the previous directory (try it: "cd ../"). If a web server's default directory were "c:\inetpub\www", a URL requesting http://www.website.com/scripts/../../../../../windows/system32/cmd.exe?/c+dir+c:\ would back up several directories to ensure it has gone all the way to the root of the drive, then change to the operating system directory (windows\system32) and run cmd.exe, listing the contents of the c: drive.

Unicode encoding
Unicode is an industry-standard mechanism developed to represent the entire range of over 100,000 textual characters in the world as a standard coding format. Web servers support Unicode to support different character sets (like Chinese), and, at one time, many supported it by default. So even if we told our systems not to allow the "../" directory traversal request mentioned earlier, an attacker using Unicode could effectively make the same directory traversal request without using "/", but with any of the Unicode representations of that character (three exist: %c1%1c, %c0%9v, and %c0%af). That request may slip through unnoticed and be processed.

URL encoding
If you've ever noticed that a "space" appears as "%20" in a URL in a web browser (Why is it only me who notices that?), the "%20" represents the space, because spaces aren't allowed characters in a URL. Much like the attacks using Unicode characters, attackers found that they could bypass filtering techniques and make requests by representing characters differently.
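A tiny Python sketch (function and path names are made up) of why a naive "../" filter fails against URL encoding: the filter runs before percent-decoding, so an encoded traversal slips through.

```python
from urllib.parse import unquote

def naive_filter(url_path):
    # rejects only the literal form of the attack
    return "../" not in url_path

encoded = "/scripts/%2e%2e%2f%2e%2e%2fwindows/system32/cmd.exe"

naive_filter(encoded)   # passes the filter...
unquote(encoded)        # ...but decodes to "/scripts/../../windows/system32/cmd.exe"
```

The lesson is to canonicalize (decode) input first and validate the result, not the raw string.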


source: http://www.amazon.com/gp/blog/post/PLNK225DWOGWJIEBG

Tuesday, September 30, 2008

HTTP: digest auth example

This "HTTP Digest Authentication" example from wikipedia was just too good. I had to post this here - a real collector's item! ;)


Step 3 contains the crucial part of the whole process: the inclusion of the server's nonce in the MD5 hash computation, which thwarts replay attacks.



1 . Client request (no authentication):


GET /dir/index.html HTTP/1.0
Host: localhost



2. Server response:



HTTP/1.0 401 Unauthorised
Server: HTTPd/0.9
Date: Sun, 10 Apr 2005 20:26:47 GMT
WWW-Authenticate: Digest realm="testrealm@host.com",
qop="auth,auth-int",
nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
opaque="5ccc069c403ebaf9f0171e9517f40e41"
Content-Type: text/html
Content-Length: 311


"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">





401 Unauthorised.





3. Client request (user name "Mufasa", password "Circle Of Life"):

GET /dir/index.html HTTP/1.0
Host: localhost
Authorization: Digest username="Mufasa",
realm="testrealm@host.com",
nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
uri="/dir/index.html",
qop=auth,
nc=00000001,
cnonce="0a4f113b",
response="6629fae49393a05397450978507c4ef1",
opaque="5ccc069c403ebaf9f0171e9517f40e41"


4. Server response:

HTTP/1.0 200 OK
Server: HTTPd/0.9
Date: Sun, 10 Apr 2005 20:27:03 GMT
Content-Type: text/html
Content-Length: 7984
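The response value in step 3 can be reproduced in a few lines of Python. Per RFC 2617, with qop=auth it is MD5(HA1:nonce:nc:cnonce:qop:HA2), where HA1 = MD5(username:realm:password) and HA2 = MD5(method:uri):

```python
import hashlib

def md5(s):
    return hashlib.md5(s.encode()).hexdigest()

ha1 = md5("Mufasa:testrealm@host.com:Circle Of Life")
ha2 = md5("GET:/dir/index.html")
response = md5(":".join([ha1,
                         "dcd98b7102dd2f0e8b11d0f600bfb0c093",  # nonce
                         "00000001",                            # nc
                         "0a4f113b",                            # cnonce
                         "auth",                                # qop
                         ha2]))
print(response)  # → 6629fae49393a05397450978507c4ef1
```

Note the password never crosses the wire; only this hash does, and it is bound to the server's nonce.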



(source: http://en.wikipedia.org/wiki/Digest_access_authentication)

web security: nonce

Nonce (wrt HTTP digest authentication)
    
A nonce is a parameter that varies with time. A nonce can be a time stamp, a visit counter on a Web page, or a special marker intended to limit or prevent the unauthorized replay or reproduction of a file.

Because a nonce changes with time, the server can tell whether an attempt at replay or reproduction of a file is legitimate: the current time is compared against the nonce. If the nonce is too old, or if no nonce is present at all, the attempt is rejected; otherwise, it is authorized.
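As a sketch of that idea (the scheme, secret, and names here are illustrative, not how any particular server does it): embed a timestamp in the nonce, sign it so clients can't forge one, and reject nonces that are missing, forged, or too old.

```python
import hashlib
import time

SECRET = "server-side-secret"   # assumed known only to the server
MAX_AGE = 300                   # nonces older than 5 minutes are stale

def make_nonce(now=None):
    ts = str(int(time.time() if now is None else now))
    return ts + ":" + hashlib.md5((ts + SECRET).encode()).hexdigest()

def nonce_is_valid(nonce, now=None):
    now = time.time() if now is None else now
    try:
        ts, digest = nonce.split(":")
    except ValueError:
        return False  # missing or malformed nonce: reject
    if digest != hashlib.md5((ts + SECRET).encode()).hexdigest():
        return False  # forged nonce: reject
    return now - int(ts) <= MAX_AGE  # stale nonce: reject
```

A replayed request eventually fails the staleness check even though the rest of it is byte-for-byte identical to a legitimate one.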